Re: Hive built-in functions not working

2013-06-21 Thread Ramki Palle
Try to add the jar explicitly from the hive prompt and see if that works.
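
For example, something along these lines before re-running the query (the paths and versions are placeholders; the missing class org/codehaus/jackson/JsonFactory normally lives in a jackson-core jar, which Hive usually ships under its lib directory):

hive> ADD JAR /usr/lib/hive/lib/jackson-core-asl-1.8.8.jar;
hive> ADD JAR /usr/lib/hive/lib/jackson-mapper-asl-1.8.8.jar;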

Regards,
Ramki.


On Fri, Jun 21, 2013 at 7:32 AM, fab wol darkwoll...@gmail.com wrote:

 I'm using Hadoop 0.20.2 with Hive 0.11. I have successfully inserted some
 csv files into hive/hdfs as separate tables. Selects and joins work
 flawlessly. When trying to analyse some data, I needed to make use of the
 built-in functions of hive like:

  - substr
  - to_date
  - rand
  - etc.

 for example:

 select sid, request_id, to_date(times), to_unix_timestamp(times) from
 contents where sid = '500032066010373';

 sid and request_id are strings here, times is a timestamp column.
 Unfortunately I only get errors (always the same error stack) when using
 these functions:

 java.lang.RuntimeException: Error in configuring object
 at
 org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
 at
 org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
 at
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
 at
 org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:354)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
 at org.apache.hadoop.mapred.Child.main(Child.java:170)
 Caused by: java.lang.reflect.InvocationTargetException
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:601)
 at
 org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
 ... 5 more
 Caused by: java.lang.RuntimeException: Error in configuring object
 at
 org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
 at
 org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
 at
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
 at
 org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
 ... 10 more
 Caused by: java.lang.reflect.InvocationTargetException
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:601)
 at
 org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
 ... 13 more
 Caused by: java.lang.RuntimeException: Map operator initialization
 failed
 at
 org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:121)
 ... 18 more
 Caused by: java.lang.NoClassDefFoundError:
 org/codehaus/jackson/JsonFactory
 at
 org.apache.hadoop.hive.ql.udf.generic.GenericUDTFJSONTuple.clinit(GenericUDTFJSONTuple.java:56)
 at
 sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
 at
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
 at
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
 at
 java.lang.reflect.Constructor.newInstance(Constructor.java:525)
 at
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:113)
 at
 org.apache.hadoop.hive.ql.exec.FunctionRegistry.registerGenericUDTF(FunctionRegistry.java:526)
 at
 org.apache.hadoop.hive.ql.exec.FunctionRegistry.registerGenericUDTF(FunctionRegistry.java:520)
 at
 org.apache.hadoop.hive.ql.exec.FunctionRegistry.clinit(FunctionRegistry.java:423)
 at
 org.apache.hadoop.hive.ql.exec.DefaultUDFMethodResolver.getEvalMethod(DefaultUDFMethodResolver.java:59)
 at
 org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.initialize(GenericUDFBridge.java:154)
 at
 org.apache.hadoop.hive.ql.udf.generic.GenericUDF.initializeAndFoldConstants(GenericUDF.java:111)
 at
 org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.initialize(ExprNodeGenericFuncEvaluator.java:141)
 at
 org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:970)
 at
 org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:996)
 at
 org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:60)
 at
 org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
 at
 org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:451)
 at
 

Re: Is there a mechanism similar to hadoop -archive in hive (add archive is not apparently)

2013-06-20 Thread Ramki Palle
In *Attempt two*, aren't you supposed to use hivetry as the
directory?

Maybe you should try giving the full path
/opt/am/ver/1.0/hive/hivetry/classifier_wf.py and see if it works.
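
Something like this, for instance (just a sketch of the same transform query with the absolute path dropped in):

hive> from (select transform (aappname,qappname) using '/opt/am/ver/1.0/hive/hivetry/classifier_wf.py'
      as (aappname2 string, qappname2 string) from eqx ) o
      insert overwrite table c select o.aappname2, o.qappname2;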

Regards,
Ramki.


On Thu, Jun 20, 2013 at 9:28 AM, Stephen Boesch java...@gmail.com wrote:


 Stephen:  would you be willing to share an example of specifying a
 directory as the  add file target?I have not seen this working

 I have attempted to use it as follows:

 *We will access a script within the hivetry directory located here:*
 hive> ! ls -l  /opt/am/ver/1.0/hive/hivetry/classifier_wf.py;
 -rwxrwxr-x 1 hadoop hadoop 11241 Jun 18 19:37
 /opt/am/ver/1.0/hive/hivetry/classifier_wf.py

 *Add the directory  to hive:*
 hive> add file /opt/am/ver/1.0/hive/hivetry;
 Added resource: /opt/am/ver/1.0/hive/hivetry

 *Attempt to run transform query using that script:*
 *
 *
 *Attempt one: use the script name unqualified:*

 hive> from (select transform (aappname,qappname) using 'classifier_wf.py' 
 as (aappname2 string, qappname2 string) from eqx ) o insert overwrite table c 
 select o.aappname2, o.qappname2;

 (Failed:   Caused by: java.io.IOException: Cannot run program 
 classifier_wf.py: java.io.IOException: error=2, No such file or directory)


 *Attempt two: use the script name with the directory name prefix: *

 hive> from (select transform (aappname,qappname) using 
 'hive/classifier_wf.py' as (aappname2 string, qappname2 string) from eqx ) o 
 insert overwrite table c select o.aappname2, o.qappname2;

 (Failed:   Caused by: java.io.IOException: Cannot run program 
 hive/classifier_wf.py: java.io.IOException: error=2, No such file or 
 directory)




 2013/6/20 Stephen Sprague sprag...@gmail.com

 yeah.  the archive isn't unpacked on the remote side. I think add archive
 is mostly used for finding java packages since CLASSPATH will reference the
 archive (and as such there is no need to expand it.)


 On Thu, Jun 20, 2013 at 9:00 AM, Stephen Boesch java...@gmail.comwrote:

 thx for the tip on add file where file is directory. I will try
 that.


 2013/6/20 Stephen Sprague sprag...@gmail.com

 i personally only know of adding a .jar file via add archive but my
 experience there is very limited.  i believe if you 'add file' and the file
 is a directory it'll recursively take everything underneath, but i know of
 nothing that inflates or untars things on the remote end automatically.

 i would 'add file' your python script and then within that untar your
 tarball to get at your model data. it's just a matter of figuring out the
 path to that tarball that's kinda up in the air when it's added as 'add
 file'.  Yeah, the local downloads directory.  What the literal path is,
 is what i'd like to know. :)


 On Thu, Jun 20, 2013 at 8:37 AM, Stephen Boesch java...@gmail.comwrote:


 @Stephen:  given the  'relative' path for hive is from a local
 downloads directory on each local tasktracker in the cluster,  it was my
 thought that if the archive were actually being expanded then
 somedir/somefileinthearchive  should work.  I will go ahead and test this
 assumption.

 In the meantime, is there any facility available in hive for making
 archived files available to hive jobs?  archive or hadoop archive (har)
 etc?


 2013/6/20 Stephen Sprague sprag...@gmail.com

 what would be interesting would be to run a little experiment and
 find out what the default PATH is on your data nodes.  How much of a pain
 would it be to run a little python script to print to stderr the value of
 the environmental variable $PATH and $PWD (or the shell command 'pwd') ?
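
 One quick way to see that without even writing a script (a sketch only: /usr/bin/env ignores its input and just dumps the task's environment, one variable per row, so the partial-consumption flag is needed to keep Hive from treating the early exit as a broken pipe):

 hive> set hive.exec.script.allow.partial.consumption = true;
 hive> select transform (aappname) using '/usr/bin/env' as (line string) from eqx limit 50;

 PATH and PWD in that output are the values the task process actually sees.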

 that's of course going through normal channels of add file.

 the thing is given you're using a relative path hive/parse_qx.py
 you need to know what the current directory is when the process runs on
 the data nodes.




 On Thu, Jun 20, 2013 at 5:32 AM, Stephen Boesch java...@gmail.comwrote:


 We have a few dozen files that need to be made available to all
 mappers/reducers in the cluster while running  hive transformation 
 steps .

 It seems the add archive  does not make the entries unarchived and
 thus available directly on the default file path - and that is what we 
 are
 looking for.

 To illustrate:

add file modelfile.1;
add file modelfile.2;
..
 add file modelfile.N;

   Then, our model that is invoked during the transformation step *does
 *have correct access to its model files in the default path.

 But .. those model files take a few *minutes* to all load..

 instead when we try:
add archive  modelArchive.tgz;

 The problem is the archive does not get exploded apparently ..

 For example, I have an archive that contains shell scripts stored under a
 hive directory inside it.  I am *not *able to access
 hive/my-shell-script.sh  after adding the archive. Specifically the
 following fails:

 $ tar -tvf appm*.tar.gz | grep launch-quixey_to_xml
 -rwxrwxr-x stephenb/stephenb664 2013-06-18 17:46
 appminer/bin/launch-quixey_to_xml.sh

 from (select 

Re: Hive External Table issue

2013-06-20 Thread Ramki Palle
1. I was under the impression that you cannot point the table location to a
file. But, it looks like it works. Please see the discussion in the thread
 http://mail-archives.apache.org/mod_mbox/hive-user/201303.mbox/%
3c556325346ca26341b6f0530e07f90d96017084360...@gbgh-exch-cms.sig.ads%3e

2. If there is more than one file in the directory, your query gets the
data from all the files in that directory.

In your case, the regex may not be parsing the data properly.

Regards,
Ramki.


On Thu, Jun 20, 2013 at 11:03 AM, sanjeev sagar sanjeev.sa...@gmail.comwrote:

 Two issues:

 1. I've created external tables in hive based on a file location before and
 it worked without any issue. It doesn't have to be a directory.

 2. If there is more than one file in the directory, and you create the
 external table based on the directory, then how does the table know which
 file it needs to look in for the data?

 I tried to create the table based on the directory; it created the table but
 all the rows were NULL.

 -Sanjeev


 On Thu, Jun 20, 2013 at 10:30 AM, Nitin Pawar nitinpawar...@gmail.comwrote:

 in hive when you create a table and use the location to refer to an hdfs
 path, that path is supposed to be a directory.
 If the directory does not exist, hive will try to create it, and if it's a
 file it will throw an error because it's not a directory.

 that's the error you are getting: the location you referred to is a file.
 Change it to the directory and see if that works for you
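
 For example, roughly (a sketch: the same DDL, but with LOCATION naming the parent directory rather than the file):

 hive> CREATE EXTERNAL TABLE access ( ... )
     > ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
     > WITH SERDEPROPERTIES ( ... )
     > STORED AS TEXTFILE
     > LOCATION '/user/flume/events/request_logs/ar1.vgs.mypoints.com/13-06-13/';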


 On Thu, Jun 20, 2013 at 10:57 PM, sanjeev sagar 
 sanjeev.sa...@gmail.comwrote:

 I did mention in my mail the hdfs file exists in that location. See below

 In HDFS: file exists



 hadoop fs -ls

 /user/flume/events/request_logs/
 ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033

 Found 1 items

 -rw-r--r--   3 hdfs supergroup 2242037226 2013-06-13 11:14

 /user/flume/events/request_logs/
 ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033

 so the directory and file both exists.


 On Thu, Jun 20, 2013 at 10:24 AM, Nitin Pawar 
 nitinpawar...@gmail.comwrote:

 MetaException(message:hdfs://
 h1.vgs.mypoints.com:8020/user/flume/events/request_logs/ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033

 is not a directory or unable to create one)


 it clearly says it's not a directory. Point to the directory and it will
 work


 On Thu, Jun 20, 2013 at 10:52 PM, sanjeev sagar 
 sanjeev.sa...@gmail.com wrote:

 Hello Everyone, I'm running into the following Hive external table
 issue.



 hive> CREATE EXTERNAL TABLE access(

 host STRING,

 identity STRING,

 user STRING,

 time STRING,

 request STRING,

 status STRING,

 size STRING,

 referer STRING,

 agent STRING)

 ROW FORMAT SERDE

 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'

 WITH SERDEPROPERTIES (

   "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\"[^\"]*\") ([^ \"]*|\"[^\"]*\"))?",

   "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s %8$s %9$s"

 )

 STORED AS TEXTFILE

 LOCATION

 '/user/flume/events/request_logs/
 ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033';

 FAILED: Error in metadata:

 MetaException(message:hdfs://
 h1.vgs.mypoints.com:8020/user/flume/events/request_logs/ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033

 is not a directory or unable to create one)

 FAILED: Execution Error, return code 1 from
 org.apache.hadoop.hive.ql.exec.DDLTask





 In HDFS: file exists



 hadoop fs -ls

 /user/flume/events/request_logs/
 ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033

 Found 1 items

 -rw-r--r--   3 hdfs supergroup 2242037226 2013-06-13 11:14

 /user/flume/events/request_logs/
 ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033



 I've downloaded the serde2 jar file too and installed it in
 /usr/lib/hive/lib/hive-json-serde-0.2.jar, and I've bounced all the hadoop
 services after that.



 I even added the jar file manually in hive and ran the above sql but it
 still fails.

 hive> add jar /usr/lib/hive/lib/hive-json-serde-0.2.jar;

 Added /usr/lib/hive/lib/hive-json-serde-0.2.jar to class path Added
 resource: /usr/lib/hive/lib/hive-json-serde-0.2.jar



 Any help would be highly appreciable.



 -Sanjeev









 --
 Sanjeev Sagar

 ***Separate yourself from everything that separates you from others
 ! - Nirankari Baba Hardev Singh ji *

 **




 --
 Nitin Pawar




 --
 Sanjeev Sagar

 ***Separate yourself from everything that separates you from others !- 
 Nirankari
 Baba Hardev Singh ji *

 **




 --
 Nitin Pawar




 --
 Sanjeev Sagar

 ***Separate yourself from everything that separates you from others ! - 
 Nirankari
 Baba Hardev Singh ji *

 **



Re: Hive External Table issue

2013-06-20 Thread Ramki Palle
Nitin,

Can you go through the thread with subject "S3/EMR Hive: Load contents of a
single file" on Tue, 26 Mar, 17:11 at


http://mail-archives.apache.org/mod_mbox/hive-user/201303.mbox/thread?1

 This gives the whole discussion about the topic of table location pointing
to a filename vs. directory.

Can you give your insight from this discussion and the discussion you
mentioned at the stackoverflow link?

Regards,
Ramki.



On Thu, Jun 20, 2013 at 11:14 AM, Nitin Pawar nitinpawar...@gmail.comwrote:

 Also see this JIRA
 https://issues.apache.org/jira/browse/HIVE-951

 I think issue you are facing is due to the JIRA


 On Thu, Jun 20, 2013 at 11:41 PM, Nitin Pawar nitinpawar...@gmail.comwrote:

 Mark has answered this before

 http://stackoverflow.com/questions/11269203/when-creating-an-external-table-in-hive-can-i-point-the-location-to-specific-fil

 If this link does not answer your question, do let us know


 On Thu, Jun 20, 2013 at 11:33 PM, sanjeev sagar 
 sanjeev.sa...@gmail.comwrote:

 Two issues:

 1. I've created external tables in hive based on a file location before
 and it worked without any issue. It doesn't have to be a directory.

 2. If there is more than one file in the directory, and you create the
 external table based on the directory, then how does the table know which
 file it needs to look in for the data?

 I tried to create the table based on the directory; it created the table but
 all the rows were NULL.

 -Sanjeev


 On Thu, Jun 20, 2013 at 10:30 AM, Nitin Pawar 
 nitinpawar...@gmail.comwrote:

 in hive when you create a table and use the location to refer to an hdfs
 path, that path is supposed to be a directory.
 If the directory does not exist, hive will try to create it, and if it's a
 file it will throw an error because it's not a directory.

 that's the error you are getting: the location you referred to is a file.
 Change it to the directory and see if that works for you


 On Thu, Jun 20, 2013 at 10:57 PM, sanjeev sagar 
 sanjeev.sa...@gmail.com wrote:

 I did mention in my mail the hdfs file exists in that location. See
 below

 In HDFS: file exists



 hadoop fs -ls

 /user/flume/events/request_logs/
 ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033

 Found 1 items

 -rw-r--r--   3 hdfs supergroup 2242037226 2013-06-13 11:14

 /user/flume/events/request_logs/
 ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033

 so the directory and file both exists.


 On Thu, Jun 20, 2013 at 10:24 AM, Nitin Pawar nitinpawar...@gmail.com
  wrote:

 MetaException(message:hdfs://
 h1.vgs.mypoints.com:8020/user/flume/events/request_logs/ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033

 is not a directory or unable to create one)


 it clearly says it's not a directory. Point to the directory and it will
 work


 On Thu, Jun 20, 2013 at 10:52 PM, sanjeev sagar 
 sanjeev.sa...@gmail.com wrote:

 Hello Everyone, I'm running into the following Hive external table
 issue.



 hive> CREATE EXTERNAL TABLE access(

 host STRING,

 identity STRING,

 user STRING,

 time STRING,

 request STRING,

 status STRING,

 size STRING,

 referer STRING,

 agent STRING)

 ROW FORMAT SERDE

 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'

 WITH SERDEPROPERTIES (

 "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\"[^\"]*\") ([^ \"]*|\"[^\"]*\"))?",

 "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s %8$s %9$s"

 )

 STORED AS TEXTFILE

 LOCATION

 '/user/flume/events/request_logs/
 ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033';

 FAILED: Error in metadata:

 MetaException(message:hdfs://
 h1.vgs.mypoints.com:8020/user/flume/events/request_logs/ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033

 is not a directory or unable to create one)

 FAILED: Execution Error, return code 1 from
 org.apache.hadoop.hive.ql.exec.DDLTask





 In HDFS: file exists



 hadoop fs -ls

 /user/flume/events/request_logs/
 ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033

 Found 1 items

 -rw-r--r--   3 hdfs supergroup 2242037226 2013-06-13 11:14

 /user/flume/events/request_logs/
 ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033



 I've downloaded the serde2 jar file too and installed it in
 /usr/lib/hive/lib/hive-json-serde-0.2.jar, and I've bounced all the hadoop
 services after that.



 I even added the jar file manually in hive and ran the above sql but it
 still fails.

 hive> add jar /usr/lib/hive/lib/hive-json-serde-0.2.jar;

 Added /usr/lib/hive/lib/hive-json-serde-0.2.jar to class path Added
 resource: /usr/lib/hive/lib/hive-json-serde-0.2.jar



 Any help would be highly appreciable.



 -Sanjeev









 --
 Sanjeev Sagar

 ***Separate yourself from everything that separates you from
 others ! - Nirankari Baba Hardev Singh ji *

 **




 --
 Nitin Pawar




 --
 Sanjeev 

Re: LZO compression implementation in Hive

2013-06-17 Thread Ramki Palle
Hi Sanjay,

Can you quickly give your insights on this topic, if possible?

Regards,
Ramki.


On Mon, May 20, 2013 at 2:51 PM, Sanjay Subramanian 
sanjay.subraman...@wizecommerce.com wrote:

  Hi Programming Hive Book authors

  Maybe a lot of you have already successfully implemented this, but only
 these last two weeks we implemented our aggregations using LZO
 compression in Hive - MR jobs creating LZO files as input for Hive ---
 thereafter Hive aggregations creating more LZO files as output.
 As usual nothing was straightforward :-)  Also the other challenge was to
 neatly tie it all into actions in Oozie workflows….but after being underwater
 for weeks I think I am able to rise above water and breathe !

  In the next version of the book, if you guys are planning to add detailed
 sections on using lzo compression in Hive, let me know…my experiences
 might be useful :-)

  Thanks

  sanjay



 CONFIDENTIALITY NOTICE
 ==
 This email message and any attachments are for the exclusive use of the
 intended recipient(s) and may contain confidential and privileged
 information. Any unauthorized review, use, disclosure or distribution is
 prohibited. If you are not the intended recipient, please contact the
 sender by reply email and destroy all copies of the original message along
 with any attachments, from your computer system. If you are the intended
 recipient, please be advised that the content of this message is subject to
 access, review and disclosure by the sender's Email System Administrator.



Re: Table present in HDFS but 'show table' Returns Empty

2013-04-23 Thread Ramki Palle
Maybe you are using derby as your metastore. It creates the metastore in
the current directory from where you started your hive session. You may
have started your hive session from a different directory the next time.

Please use either mysql as your metastore, or set a definite directory in
your config file as your metastore location if you continue to use derby as
your metastore.
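
For example, one way to pin derby to a fixed location (a sketch; the path is just a placeholder, and the same property can instead be set once in hive-site.xml):

hive -hiveconf javax.jdo.option.ConnectionURL='jdbc:derby:;databaseName=/home/youruser/hive_metastore_db;create=true'

Started that way, show tables should return the same list no matter which directory you launch hive from.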

Regards,
Ramki.


On Mon, Apr 22, 2013 at 9:48 PM, Xun TANG tangxun.al...@gmail.com wrote:

 Hi guys,

 I've created hive tables via 'hive -f' or the hive interactive shell. Data are
 loaded into the tables right after, so the tables are not empty. However, when I
 log out and log in to the hive shell again, 'show tables' returns 0 tables, while
 the 'hadoop list' command shows the hdfs files are still where they were.

 So my questions are,
 1. What triggered hive to 'forget' the tables?
 2. How to make hive permanently (or within the lifecycle of hadoop
 daemons) remember the tables until they were dropped?

 This problem has puzzled me for a while, and I could not find similar
 question/answer online... Did I miss some configuration?


 Thanks,
 Alice



Re: Exception comes out when counting the rows

2013-04-23 Thread Ramki Palle
Normally, you do not use load data when you use an external table. You either
specify the location in the create external table query, or use the alter table
tablename set location syntax to point the table at the directory where the
external table's data files live.

In this case, try describe extended NMS_CMTS_CPU_CDX_TEST and see what the
value for location is.

It is better to use the location as I mentioned above for external tables
instead of using load data. This way there is no confusion between managed
table and external table locations.
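
As a sketch, using the same path as your load data statement (some Hive versions want a fully qualified URI here, hence the hdfs:// prefix; adjust the namenode host/port to your cluster):

hive> ALTER TABLE NMS_CMTS_CPU_CDX_TEST SET LOCATION 'hdfs://your-namenode:8020/user/sqoop/NMS_CMTS_CPU_CDX3';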

Regards,
Ramki.


On Mon, Apr 22, 2013 at 2:01 AM, YouPeng Yang yypvsxf19870...@gmail.comwrote:


 Hi hive users

   Sorry for missing the title on the previous mail.

   This is my first time to post a question here.

 I have gotten an exception when I count the rows of  my hive table after I
 have loaded the data:

 hive> create EXTERNAL TABLE  NMS_CMTS_CPU_CDX_TEST (CMTSID INT,MSEQ
 INT,GOTTIME BIGINT,CMTSINDEX INT,CPUTOTAL INT,DESCR STRING) ROW FORMAT
  DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n'  STORED AS
 TEXTFILE;

 hive> load data inpath '/user/sqoop/NMS_CMTS_CPU_CDX3/NMS_CMTS_CPU_CDX3'
  into table NMS_CMTS_CPU_CDX_TEST;

 hive> select count(1) from NMS_CMTS_CPU_CDX_TEST;
 I get an exception on step 3; logs are as follows.

 Any help will be greatly appreciated.

 Regards

 ---
 
 WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please
 use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties
 files.
 Execution log at:
 /tmp/hive/hive_20130422162020_791a7b61-6ba0-466d-99ba-5c2556bafaa4.log
 Job running in-process (local Hadoop)
 Hadoop job information for null: number of mappers: 1; number of reducers:
 1
 2013-04-22 16:24:21,604 null map = 0%,  reduce = 0%
 2013-04-22 16:25:21,965 null map = 0%,  reduce = 0%
 2013-04-22 16:26:22,902 null map = 0%,  reduce = 0%
 2013-04-22 16:26:27,312 null map = 100%,  reduce = 0%
 Ended Job = job_1364348895095_0055 with errors
 Error during job, obtaining debugging information...
 Examining task ID: task_1364348895095_0055_m_00 (and more) from job
 job_1364348895095_0055
 Unable to retrieve URL for Hadoop Task logs. Does not contain a valid
 host:port authority: local

 Task with the most failures(4):
 -
 Task ID:
   task_1364348895095_0055_m_00

 URL:
   Unavailable
 -
 Diagnostic Messages for this Task:
 Error: java.lang.RuntimeException: java.io.FileNotFoundException:
 /tmp/hive/hive_2013-04-22_16-20-45_720_3839682514463028560/-mr-10001/89dd576e-fb9d-409a-8b46-2e46b7d21160
 (No such file or directory)
 at
 org.apache.hadoop.hive.ql.exec.Utilities.getMapRedWork(Utilities.java:224)
  at
 org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:255)
 at
 org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:381)
  at
 org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:374)
 at
 org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:536)
  at
 org.apache.hadoop.mapred.MapTask$TrackedRecordReader.init(MapTask.java:160)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:381)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:334)
 at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:152)
  at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
  at
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:147)
 Caused by: java.io.FileNotFoundException:
 /tmp/hive/hive_2013-04-22_16-20-45_720_3839682514463028560/-mr-10001/89dd576e-fb9d-409a-8b46-2e46b7d21160
 (No such file or directory)
 at java.io.FileInputStream.open(Native Method)
  at java.io.FileInputStream.init(FileInputStream.java:120)
 at java.io.FileInputStream.init(FileInputStream.java:79)
  at
 org.apache.hadoop.hive.ql.exec.Utilities.getMapRedWork(Utilities.java:215)
 ... 12 more


 Execution failed with exit status: 2
 13/04/22 16:26:28 ERROR exec.Task: Execution failed with exit status: 2
 Obtaining error information
 13/04/22 16:26:28 ERROR exec.Task: Obtaining error information

 Task failed!
 Task ID:
   Stage-1

 Logs:

 13/04/22 16:26:28 ERROR exec.Task:
 Task failed!
 Task ID:
   Stage-1

 Logs:

 13/04/22 16:26:28 ERROR exec.ExecDriver: Execution failed with exit
 status: 2
 FAILED: Execution Error, return code 2 from
 org.apache.hadoop.hive.ql.exec.MapRedTask
 13/04/22 16:26:28 ERROR ql.Driver: FAILED: Execution Error, return code 2
 from org.apache.hadoop.hive.ql.exec.MapRedTask
 13/04/22 16:26:28 INFO ql.Driver: /PERFLOG method=Driver.execute
 start=1366618840573 end=1366619188646 duration=348073
 13/04/22 16:26:28 INFO ql.Driver: PERFLOG method=releaseLocks
 13/04/22 16:26:28 INFO ql.Driver: /PERFLOG method=releaseLocks
 

Re: Partition performance

2013-04-05 Thread Ramki Palle
Can you tell how many map tasks there are in each scenario?

If my assumption is correct, you should have 336 in the first case and 14
in the second case.


It looks like it is combining all the small files in a folder and running one
map task for all 24 files in a folder, whereas it is running a separate
task when these files are in different partitions (folders).

You can try to reuse the JVM and see if the response time is similar.

Can you please try the following and let us know how long each strategy
takes?

hive> set mapred.job.reuse.jvm.num.tasks = 24;

Run your  query that has more partitions and see if the response time is
lower.

Regards,
Ramki.


On Fri, Apr 5, 2013 at 11:36 AM, Ian liu...@yahoo.com wrote:

 Thanks. This is just a test from my local box. So each file is only 1kb. I
 shared the query plans of these two tests at:
 http://codetidy.com/paste/raw/5198
 http://codetidy.com/paste/raw/5199

 Also in the Hadoop log, there is this line for each partition:
 org.apache.hadoop.hive.ql.exec.MapOperator: Adding alias test1 to work
 list for file hdfs://localhost:8020/test1/2011/02/01/01
 Does that mean each partition will become a map task?

 I'm still new in Hive, just wondering what are the common strategy for
 partitioning the hourly logs? I know we shouldn't have too many partitions
 but I'm wondering what's the reason behind it? If I run this on a real
 cluster, maybe it won't perform so differently?

 Thanks.

   *From:* Dean Wampler dean.wamp...@thinkbiganalytics.com
 *To:* user@hive.apache.org
 *Sent:* Thursday, April 4, 2013 4:28 PM
 *Subject:* Re: Partition performance

 Also, how big are the files in each directory? Are they roughly the size
 of one HDFS block or a multiple? Lots of small files will mean lots of
 mapper tasks with little to do.

 You can also compare the job tracker console output for each job. I bet
 the slow one has a lot of very short map and reduce tasks, while the faster
 one has fewer tasks that run longer. A rule of thumb is that any one task
 should take 20 seconds or more to amortize over the few seconds spent in
 start up per task.

 In other words, if you think about what's happening at the HDFS and MR
 level, you can learn to predict how fast or slow things will run. Learning
 to read the output of EXPLAIN or EXPLAIN EXTENDED helps with this.

 dean

 On Thu, Apr 4, 2013 at 6:25 PM, Owen O'Malley omal...@apache.org wrote:

 See slide #9 from my Optimizing Hive Queries talk
 http://www.slideshare.net/oom65/optimize-hivequeriespptx . Certainly, we
 will improve it, but for now you are much better off with 1,000 partitions
 than 10,000.

 -- Owen


 On Thu, Apr 4, 2013 at 4:21 PM, Ramki Palle ramki.pa...@gmail.com wrote:

 Is it possible for you to send the explain plan of these two queries?

 Regards,
 Ramki.


 On Thu, Apr 4, 2013 at 4:06 PM, Sanjay Subramanian 
 sanjay.subraman...@wizecommerce.com wrote:

   The slowdown is most probably due to the large number of partitions.
 I believe the Hive book authors tell us to be cautious with large numbers
 of partitions :-)  and I abide by that.

  Users
 Please add your points of view and experiences

  Thanks
 sanjay

   From: Ian liu...@yahoo.com
 Reply-To: user@hive.apache.org user@hive.apache.org, Ian 
 liu...@yahoo.com
 Date: Thursday, April 4, 2013 4:01 PM
 To: user@hive.apache.org user@hive.apache.org
 Subject: Partition performance

   Hi,

 I created 3 years of hourly log files (totally 26280 files), and use
 External Table with partition to query. I tried two partition methods.

 1). Log files are stored as /test1/2013/04/02/16/00_0 (A directory per
 hour). Use date and hour as partition keys. Add 3 years of directories to
 the table partitions. So there are 26280 partitions.
 CREATE EXTERNAL TABLE test1 (logline string) PARTITIONED BY (dt
 string, hr int);
 ALTER TABLE test1 ADD PARTITION (dt='2013-04-02', hr=16) LOCATION
 '/test1/2013/04/02/16';

 2). Log files are stored as /test2/2013/04/02/16_00_0 (A directory per
 day, 24 files in each directory). Use date as partition key. Add 3 years of
 directories to the table partitions. So there are 1095 partitions.
  CREATE EXTERNAL TABLE test2 (logline string) PARTITIONED BY (dt
 string);
 ALTER TABLE test2 ADD PARTITION (dt='2013-04-02') LOCATION
 '/test2/2013/04/02';

 When doing a simple query like
  SELECT * FROM  test1/test2  WHERE  dt >= '2013-02-01' and dt <=
 '2013-02-14'
  Using approach #1 takes 320 seconds, but #2 only takes 70 seconds.

 I'm wondering why there is a big performance difference between these two?
 These two approaches have the same number of files, only the directory
 structure is different. So Hive is going to load the same amount of files.
 Why does the number of partitions have such big impact? Does that mean #2
 is a better partition strategy?

 Thanks.




Re: Partition performance

2013-04-04 Thread Ramki Palle
Is it possible for you to send the explain plan of these two queries?

Regards,
Ramki.


On Thu, Apr 4, 2013 at 4:06 PM, Sanjay Subramanian 
sanjay.subraman...@wizecommerce.com wrote:

   The slowdown is most probably due to the large number of partitions.
 I believe the Hive book authors tell us to be cautious with large numbers
 of partitions :-)  and I abide by that.

  Users
 Please add your points of view and experiences

  Thanks
 sanjay

   From: Ian liu...@yahoo.com
 Reply-To: user@hive.apache.org user@hive.apache.org, Ian 
 liu...@yahoo.com
 Date: Thursday, April 4, 2013 4:01 PM
 To: user@hive.apache.org user@hive.apache.org
 Subject: Partition performance

   Hi,

 I created 3 years of hourly log files (totally 26280 files), and use
 External Table with partition to query. I tried two partition methods.

 1). Log files are stored as /test1/2013/04/02/16/00_0 (A directory per
 hour). Use date and hour as partition keys. Add 3 years of directories to
 the table partitions. So there are 26280 partitions.
 CREATE EXTERNAL TABLE test1 (logline string) PARTITIONED BY (dt
 string, hr int);
 ALTER TABLE test1 ADD PARTITION (dt='2013-04-02', hr=16) LOCATION
 '/test1/2013/04/02/16';

 2). Log files are stored as /test2/2013/04/02/16_00_0 (A directory per
 day, 24 files in each directory). Use date as partition key. Add 3 years of
 directories to the table partitions. So there are 1095 partitions.
  CREATE EXTERNAL TABLE test2 (logline string) PARTITIONED BY (dt
 string);
 ALTER TABLE test2 ADD PARTITION (dt='2013-04-02') LOCATION
 '/test2/2013/04/02';

 When doing a simple query like
  SELECT * FROM  test1/test2  WHERE  dt >= '2013-02-01' and dt <=
 '2013-02-14'
  Using approach #1 takes 320 seconds, but #2 only takes 70 seconds.

 I'm wondering why there is a big performance difference between these two?
 These two approaches have the same number of files, only the directory
 structure is different. So Hive is going to load the same amount of files.
 Why does the number of partitions have such big impact? Does that mean #2
 is a better partition strategy?

 Thanks.






Re: S3/EMR Hive: Load contents of a single file

2013-03-26 Thread Ramki Palle
First of all, you cannot point a table to a file. Each table has a
corresponding directory. If you want all the data the table contains to be in
only one file, simply copy that one file into the directory. The table does
not need to know the name of the file. It only matters whether the
structure of the data in the file is similar to the table structure.

When you query the table, it gets the data from whatever files are there
in the corresponding directory.
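
As a sketch for your case (the bucket and paths reuse your own example; copy the file, with hadoop fs -cp, s3cmd, or the AWS console, into a directory that holds only that file, then point LOCATION at the directory):

$ hadoop fs -cp s3://mybucket/path/to/data/src1.txt s3://mybucket/path/to/src1_only/

create external table myData (str1 string, str2 string, count1 int)
partitioned by ... row format ... stored as textfile location
's3://mybucket/path/to/src1_only/';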

Regards,
Ramki.


On Tue, Mar 26, 2013 at 10:11 AM, Tony Burton tbur...@sportingindex.comwrote:

 Hi list,


 I've been using hive to perform queries on data hosted on AWS S3, and my
 tables point at data by specifying the directory in which the data is
 stored, eg 


 $ create external table myData (str1 string, str2 string, count1 int)
 partitioned by snip row format snip stored as textfile location
 's3://mybucket/path/to/data';


 where s3://mybucket/path/to/data is the directory that contains the
 files I'm interested in. My use case now is to create a table with data
 pointing to a specific file in a directory:


 $ create external table myData (str1 string, str2 string, count1 int)
 partitioned by snip row format snip stored as textfile location
 's3://mybucket/path/to/data/src1.txt';

 

 and I get the error: FAILED: Error in metadata: MetaException(message:Got
 exception: java.io.IOException Can't make directory for path
 's3://spinmetrics/global/counter_Fixture.txt' since it is a file.). Ok,
 let's try to create the table without specifying the data source:


 $ create external table myData (str1 string, str2 string, count1 int)
 partitioned by snip row format snip stored as textfile


 Ok, no problem. Now let's load the data


 $ LOAD DATA INPATH 's3://mybucket/path/to/data/src1.txt' INTO TABLE myData;
 


 (referring to https://cwiki.apache.org/Hive/languagemanual-dml.html -
 ...filepath can refer to a file (in which case hive will move the file
 into the table))


 Error message is:  FAILED: Error in semantic analysis: Line 1:17 Path is
 not legal ''s3://mybucket/path/to/data/src1.txt: Move from: s3://
 mybucket/path/to/data/src1.txt to: hdfs://
 10.48.97.97:9000/mnt/hive_081/warehouse/gfix is not valid. Please check
 that values for params default.fs.name and
 hive.metastore.warehouse.dir do not conflict.


 So I check my default.fs.name and hive.metastore.warehouse.dir (which
 have never caused problems before):


 $ set fs.default.name;

 fs.default.name=hdfs://10.48.97.97:9000

 $ set hive.metastore.warehouse.dir;

 hive.metastore.warehouse.dir=/mnt/hive_081/warehouse


 Clearly different, but which is correct? Is there an easier way to load a
 single file into a hive table? Or should I just put each file in a
 directory and proceed as before?


 Thanks!


 Tony

 *Tony Burton
 Senior Software Engineer*
 e: tbur...@sportingindex.com

 

  http://www.sportingsolutions.com/
 





  Please consider the environment before printing this email or
 attachments


 This email and any attachments are confidential, protected by copyright
 and may be legally privileged. If you are not the intended recipient, then
 the dissemination or copying of this email is prohibited. If you have
 received this in error, please notify the sender by replying by email and
 then delete the email completely from your system. Neither Sporting Index
 nor the sender accepts responsibility for any virus, or any other defect
 which might affect any computer or IT system into which the email is
 received and/or opened. It is the responsibility of the recipient to scan
 the email and no responsibility is accepted for any loss or damage arising
 in any way from receipt or use of this email. Sporting Index Ltd is a
 company registered in England and Wales with company number 2636842, whose
 registered office is at Gateway House, Milverton Street, London, SE11 4AP.
 Sporting Index Ltd is authorised and regulated by the UK Financial Services
 Authority (reg. no. 150404) and Gambling Commission (reg. no.
 000-027343-R-308898-001). Any financial promotion contained herein has been
 issued and approved by Sporting Index Ltd.

 Outbound email has been scanned for viruses and SPAM



Re: Using TABLESAMPLE on inner queries

2013-03-20 Thread Ramki Palle
You may use percent-based sampling (block sampling) for non-bucketed
tables, though there are some restrictions.

https://cwiki.apache.org/Hive/languagemanual-sampling.html
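
A sketch of what that looks like on your example (block sampling grabs whole splits/HDFS blocks, so on small inputs the percentage is only approximate):

select distinct s
from testtable TABLESAMPLE(1 PERCENT) t
where month = 201211;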

Regards,
Ramki.


On Wed, Mar 20, 2013 at 12:27 PM, Mark Grover
grover.markgro...@gmail.comwrote:

 Hey Dean,
 I am not a power user of the sampling feature but my understanding was
 that sampling in Hive only works on bucketed tables. I am happy to be
 corrected though.

 Mark


 On Wed, Mar 20, 2013 at 12:20 PM, Dean Wampler 
 dean.wamp...@thinkbiganalytics.com wrote:

 Mark,

 Aside from what might be wrong here, isn't it true that sampling with the
 bucket clause still works on non-bucketed tables; it's just inefficient
 because it still scans the whole table? Or am I an idiot? ;)

 dean

 On Wed, Mar 20, 2013 at 2:17 PM, Mark Grover grover.markgro...@gmail.com
  wrote:

 Hi Robert,
 Sampling in Hive is based on buckets. Therefore, your table needs to be
 appropriately bucketed.

 I would recommend storing the results of your inner query in a bucketed
 table. See how to populate a bucketed table at
 https://cwiki.apache.org/Hive/languagemanual-ddl-bucketedtables.html

 Then you will be able to sample through it.
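
 A rough sketch of that approach (names follow your example; the bucket count of 4 is arbitrary):

 create table mytest_bucketed (s string) clustered by (s) into 4 buckets;
 set hive.enforce.bucketing = true;
 insert overwrite table mytest_bucketed
   select distinct s from testtable where month = 201211;
 select s from mytest_bucketed tablesample(bucket 1 out of 4 on s);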

 If the predicate is on a partition column, you may be able to get around
 the intermediate table requirement, but in general, as far as I know, an
 intermediate bucketed table might be the only choice.

 Mark

 On Wed, Mar 20, 2013 at 9:56 AM, Robert Li robert...@kontagent.comwrote:

 Hi Everyone

 I'm trying to use the TABLESAMPLE function to sample data, however it's
 a little more complicated and I am having trouble getting it to run.

 I know that this works fine and it will give me about 25% of the whole
 dataset

 select distinct s
 from testtable TABLESAMPLE(BUCKET 1 OUT OF 4 ON s)
 where month = 201211

 However, in my situation I need to do a TABLESAMPLE on an outer query,
 a simple example is

 *select mytest.s *
 *from *
 * (select distinct s from testtable where month = 201211)mytest*

 or something like

 *select table1.s*
 *from *
 * (select distinct s from testtable)table1*
 * join*
 * (select distinct s from test2table)table2*
 * on table1.s=table2.s*


 How do I use TABLESAMPLE in this case to sample the results of the
 outer query? I tried placing TABLESAMPLE(BUCKET 1 OUT OF 4 ON s) in various
 places of my query but it always returns some sort of syntax error and thus
 not allowing the query to run.

 Any help is appreciated.

 Robert





 --
 *Dean Wampler, Ph.D.*
 thinkbiganalytics.com
 +1-312-339-1330






Re: how to handle variable format data of text file?

2013-03-18 Thread Ramki Palle
One way you can try is to make your ldata a map field, as it contains
variable formatted data, and write a UDF to get whatever information you
need to get.

Regards,
Ramki.



On Mon, Mar 18, 2013 at 1:23 AM, Zhiwen Sun pens...@gmail.com wrote:

 As you defined in the create table hql, fields are delimited by blank space, so
 the other data is omitted.

 If you want to keep the rest of the data at the end of the line, I suggest you
 use the org.apache.hadoop.hive.contrib.serde2.RegexSerDe row format instead of
 the default delimited format.
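
 A rough sketch (the regex captures the first five space-separated fields and leaves everything else in ldata; note that the contrib RegexSerDe expects all columns to be strings, so gid is declared string here and can be cast in queries):

 create external table handresult (hdate string, htime string, thid string,
 gid string, userid string, ldata string)
 row format serde 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
 with serdeproperties (
   "input.regex" = "(\\S+) (\\S+) (\\S+) (\\S+) (\\S+) (.*)"
 )
 stored as textfile;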


 Zhiwen Sun



 On Mon, Mar 11, 2013 at 12:04 PM, 周梦想 abloz...@gmail.com wrote:

 I have files like this:
 03/11/13 10:59:52 0ec0 1009 180538126 92041 2300 0 0 7 21|47|20|33|11
 0:2775
 03/11/13 10:59:52 0744 1010 178343610 92042 350 1 0 -1 NULL NULL 22 45
 the format is separated by blank space:
 date time threadid gid userid [variable formatted data grouped by fields
 separated by space ]

 I'd like to create a table like:

 hive> create external table handresult (hdate string,htime string, thid
 string, gid int, userid string,ldata string) row format delimited fields
 terminated by ' ';
 OK

 but the above table will only have a part of the data.
 select * from handresult;
 03/11/13 10:59:52 0ec0 1009 180538126 92041
 03/11/13 10:59:52 0744 1010 178343610 92042

 The remaining data, like 2300 0 0 7 21|47|20|33|11 0:2775, I can't get.

 ldata may be of variable length and format, separated by ' ' or an
 array, and we will parse ldata differently for each gid.

 How do I do this?

 Thanks,
 Andy Zhou





Re: hive issue with sub-directories

2013-03-11 Thread Ramki Palle
One way it was solved by a user earlier was by subclassing the InputFormat
class and overriding the listStatus method so that you can ignore
subdirectories. This was done on the 0.7.1 version. Not sure if there is any
better way in later versions. At least you can use this approach until
someone comes up with a better way.

Please see the message at

http://mail-archives.apache.org/mod_mbox/hive-user/201108.mbox/%3CCAC80dVSwRz3GpTSAzpS1eQw_P79ko+-=jmQVkOE2VMvMam=0...@mail.gmail.com%3E
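
If you do go that route, wiring the custom class into Hive is mostly DDL (a sketch; the class name and jar path are hypothetical, the jar has to be on Hive's classpath, e.g. via --auxpath, and for an Avro-backed table you would subclass the Avro container input format rather than a text one):

$ hive --auxpath /path/to/subdir-ignoring-inputformat.jar
hive> create external table mytable ( ... )
    > stored as
    >   inputformat 'com.example.SubdirIgnoringTextInputFormat'
    >   outputformat 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
    > location '/test/a';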


-Ramki.



On Sun, Mar 10, 2013 at 11:36 PM, bejoy...@yahoo.com wrote:

 **
 Hi Suresh

 AFAIK as of now a partition cannot contain sub directories, it can contain
 only files.

 You may have to move the sub dirs out of the parent dir 'a' and create
 separate partitions for those.
 Regards
 Bejoy KS

 Sent from remote device, Please excuse typos
 --
 *From: * Suresh Krishnappa suresh.krishna...@gmail.com
 *Date: *Mon, 11 Mar 2013 10:58:05 +0530
 *To: *user@hive.apache.org
 *ReplyTo: * user@hive.apache.org
 *Subject: *Re: hive issue with sub-directories

 Hi Mark,
 I am using external table in HIVE.

 This is how I am adding the partition

  alter table mytable add partition (pt=1) location '/test/a/';

 I am able to run HIVE queries only if '/test/a/b' folder is deleted.

 How can I retain this folder structure and still issue queries?

 Thanks
 Suresh

 On Sun, Mar 10, 2013 at 12:48 AM, Mark Grover grover.markgro...@gmail.com
  wrote:

 Suresh,
 By default, the partition column name has to appear in the HDFS
 directory structure.

 e.g.
 /user/hive/warehouse/table name/partition col name=partition col
 value/data1.txt
 /user/hive/warehouse/table name/partition col name=partition col
 value/data2.txt


 On Thu, Mar 7, 2013 at 7:20 AM, Suresh Krishnappa
 suresh.krishna...@gmail.com wrote:
  Hi All,
  I have the following directory structure in hdfs
 
  /test/a/
  /test/a/1.avro
  /test/a/2.avro
  /test/a/b/
  /test/a/b/3.avro
 
  I created an external HIVE table using Avro Serde and added /test/a as a
  partition to this table.
 
  I am not able to run a select query. Always getting the error 'not a
 file'
  on '/test/a/b'
 
  Is this by design, a bug or am I missing some configuration?
  I am using HIVE 0.10
 
  Thanks
  Suresh
 





Re: Error while table creation

2013-03-10 Thread Ramki Palle
Please use the if not exists clause and let us know if it works.

I am wondering if a table already exists in your database with the same
name.

-Ramki.


On Sat, Mar 9, 2013 at 11:23 PM, Abhishek Gayakwad a.gayak...@gmail.comwrote:

 it is an old installation, we recently upgraded hive 0.7.0 to 0.9.0

 and we are not using if not exists clause


 On Sun, Mar 10, 2013 at 4:37 AM, Ramki Palle ramki.pa...@gmail.comwrote:

 Just wondering if your create table syntax includes if not exists, such
 as

 CREATE EXTERNAL TABLE IF NOT EXISTS my_table (

 ...
 ...
 ...
 )



 On Sun, Mar 10, 2013 at 2:34 AM, Viral Bajaria 
 viral.baja...@gmail.comwrote:

 Is this is a new installation of Hive or did you upgrade ? How many
 tables do you already have ?


 On Sat, Mar 9, 2013 at 12:00 PM, Abhishek Gayakwad a.gayak...@gmail.com
  wrote:

 while trying to create an external table in an oozie hive action, I am
 getting the following error; hive version is 0.9.0



 Caused by: javax.jdo.JDODataStoreException: Insert of object 
 org.apache.hadoop.hive.metastore.model.MColumnDescriptor@46c1dc0b using 
 statement INSERT INTO `CDS` (`CD_ID`) VALUES (?) failed : Duplicate 
 entry 'X' for key 'PRIMARY'
 NestedThrowables:
 com.mysql.jdbc.exceptions.jdbc4.MySQLIntegrityConstraintViolationException:
  Duplicate entry 'X' for key 'PRIMARY'
at 
 org.datanucleus.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:313)
at 
 org.datanucleus.jdo.JDOPersistenceManager.jdoMakePersistent(JDOPersistenceManager.java:660)
at 
 org.datanucleus.jdo.JDOPersistenceManager.makePersistent(JDOPersistenceManager.java:680)
at 
 org.apache.hadoop.hive.metastore.ObjectStore.createTable(ObjectStore.java:617)
... 43 more
 Caused by: 
 com.mysql.jdbc.exceptions.jdbc4.MySQLIntegrityConstraintViolationException:
  Duplicate entry 'X' for key 'PRIMARY'


 Thanks,
 Abhishek







Re: java.lang.NoClassDefFoundError: com/jayway/jsonpath/PathUtil

2013-03-10 Thread Ramki Palle
When you execute the following query,

hive> select * from twitter limit 5;

Hive runs it in local mode and does not use MapReduce.

For the query,

hive> select tweet_id from twitter limit 5;

I think you need to add the JSON jars to overcome this error. You might have
added these in a previous session. If you want these jars available for all
sessions, insert the add jar statements into your $HOME/.hiverc file.
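
For example, something along these lines at the hive prompt (the jar names and paths are placeholders; the missing class com/jayway/jsonpath/PathUtil normally comes from a json-path jar, alongside whatever JSON SerDe jar your twitter table uses):

hive> add jar /path/to/json-path.jar;
hive> add jar /path/to/your-json-serde.jar;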


To bypass MapReduce,

set hive.exec.mode.local.auto = true;

to suggest that Hive use local mode to execute the query. If it still uses
MR, try

set hive.fetch.task.conversion = more;


-Ramki.



On Sun, Mar 10, 2013 at 12:19 AM, Sai Sai saigr...@yahoo.in wrote:

 Just wondering if anyone has any suggestions:

 This executes successfully:

 hive> select * from twitter limit 5;

 This does not work:

 hive> select tweet_id from twitter limit 5; // I have given the exception
 info below:

 Here is the output of this:

 hive> select * from twitter limit 5;
 OK

 tweet_idcreated_attextuser_iduser_screen_nameuser_lang
 122106088022745088Fri Oct 07 00:28:54 + 2011wkwkw -_- ayo saja
 mba RT @yullyunet: Sepupuuu, kita lanjalan yok.. Kita karokoe-an.. Ajak mas
 galih jg kalo dia mau.. @Dindnf: doremifas124735434Dindnfen
 122106088018558976Fri Oct 07 00:28:54 + 2011@egg486 특별히
 준비했습니다!252828803CocaCola_Koreako
 122106088026939392Fri Oct 07 00:28:54 + 2011My offer of free
 gobbies for all if @amityaffliction play Blair snitch project still
 stands.168590073SarahYoungBlooden
 122106088035328001Fri Oct 07 00:28:54 + 2011the girl nxt to me
 in the lib got her headphones in dancing and singing loud af like she the
 only one here haha267296295MONEYyDREAMS_en
 122106088005971968Fri Oct 07 00:28:54 + 2011@KUnYoong_B2UTY
 Bị lsao đấy269182160b2st_b2utyhpen
 Time taken: 0.154 seconds

 This does not work:

 hive> select tweet_id from twitter limit 5;


 Total MapReduce jobs = 1
 Launching Job 1 out of 1
 Number of reduce tasks is set to 0 since there's no reduce operator
 Starting Job = job_201303050432_0094, Tracking URL =
 http://ubuntu:50030/jobdetails.jsp?jobid=job_201303050432_0094
 Kill Command = /home/satish/work/hadoop-1.0.4/libexec/../bin/hadoop job
 -kill job_201303050432_0094
 Hadoop job information for Stage-1: number of mappers: 1; number of
 reducers: 0
 2013-03-10 00:14:44,509 Stage-1 map = 0%,  reduce = 0%
 2013-03-10 00:15:14,613 Stage-1 map = 100%,  reduce = 100%
 Ended Job = job_201303050432_0094 with errors
 Error during job, obtaining debugging information...
 Job Tracking URL:
 http://ubuntu:50030/jobdetails.jsp?jobid=job_201303050432_0094
 Examining task ID: task_201303050432_0094_m_02 (and more) from job
 job_201303050432_0094

 Task with the most failures(4):
 -
 Task ID:
   task_201303050432_0094_m_00

 URL:

 http://ubuntu:50030/taskdetails.jsp?jobid=job_201303050432_0094tipid=task_201303050432_0094_m_00
 -
 Diagnostic Messages for this Task:
 java.lang.RuntimeException: Error in configuring object
 at
 org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
 at
 org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
 at
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:432)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
 at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:416)
 at
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
 at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.lang.reflect.InvocationTargetException

 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:616)
 at
 org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
 ... 9 more
 Caused by: java.lang.RuntimeException: Error in configuring object
 at
 org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
 at
 org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
 at
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
 at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
 ... 14 more
 Caused by: java.lang.reflect.InvocationTargetException

 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at
 

Re: java.lang.NoClassDefFoundError: com/jayway/jsonpath/PathUtil

2013-03-10 Thread Ramki Palle
Well, you get the results faster.

Please check this:

https://cwiki.apache.org/Hive/gettingstarted.html#GettingStarted-Runtimeconfiguration

Under the section "Hive, Map-Reduce and Local-Mode", it says:

"This can be very useful to run queries over small data sets - in such cases
local mode execution is usually significantly faster than submitting jobs
to a large cluster."

-Ramki.






On Sun, Mar 10, 2013 at 5:26 AM, Sai Sai saigr...@yahoo.in wrote:

 Ramki/John
 Many thanks, that really helped. I have run the add jars in the new
 session and it appears to be running. However, I was wondering about
 bypassing MR: why would we do it and what is the use of it? Will appreciate
 any input.
 Thanks
 Sai


   --
 *From:* Ramki Palle ramki.pa...@gmail.com

 *To:* user@hive.apache.org; Sai Sai saigr...@yahoo.in
 *Sent:* Sunday, 10 March 2013 4:22 AM
 *Subject:* Re: java.lang.NoClassDefFoundError:
 com/jayway/jsonpath/PathUtil

 When you execute the following query,

 hive> select * from twitter limit 5;

 Hive runs it in local mode and does not use MapReduce.

 For the query,

 hive> select tweet_id from twitter limit 5;

 I think you need to add the JSON jars to overcome this error. You might have
 added these in a previous session. If you want these jars available for all
 sessions, insert the add jar statements into your $HOME/.hiverc file.


 To bypass MapReduce,

 set hive.exec.mode.local.auto = true;

 to suggest that Hive use local mode to execute the query. If it still uses
 MR, try

 set hive.fetch.task.conversion = more;


 -Ramki.



 On Sun, Mar 10, 2013 at 12:19 AM, Sai Sai saigr...@yahoo.in wrote:

 Just wondering if anyone has any suggestions:

 This executes successfully:

 hive> select * from twitter limit 5;

 This does not work:

 hive> select tweet_id from twitter limit 5; // I have given the exception
 info below:

 Here is the output of this:

 hive> select * from twitter limit 5;
 OK

 tweet_idcreated_attextuser_iduser_screen_nameuser_lang
 122106088022745088Fri Oct 07 00:28:54 + 2011wkwkw -_- ayo saja
 mba RT @yullyunet: Sepupuuu, kita lanjalan yok.. Kita karokoe-an.. Ajak mas
 galih jg kalo dia mau.. @Dindnf: doremifas124735434Dindnfen
 122106088018558976Fri Oct 07 00:28:54 + 2011@egg486 특별히
 준비했습니다!252828803CocaCola_Koreako
 122106088026939392Fri Oct 07 00:28:54 + 2011My offer of free
 gobbies for all if @amityaffliction play Blair snitch project still
 stands.168590073SarahYoungBlooden
 122106088035328001Fri Oct 07 00:28:54 + 2011the girl nxt to me
 in the lib got her headphones in dancing and singing loud af like she the
 only one here haha267296295MONEYyDREAMS_en
 122106088005971968Fri Oct 07 00:28:54 + 2011@KUnYoong_B2UTY
 Bị lsao đấy269182160b2st_b2utyhpen
 Time taken: 0.154 seconds

 This does not work:

 hive> select tweet_id from twitter limit 5;


 Total MapReduce jobs = 1
 Launching Job 1 out of 1
 Number of reduce tasks is set to 0 since there's no reduce operator
 Starting Job = job_201303050432_0094, Tracking URL =
 http://ubuntu:50030/jobdetails.jsp?jobid=job_201303050432_0094
 Kill Command = /home/satish/work/hadoop-1.0.4/libexec/../bin/hadoop job
 -kill job_201303050432_0094
 Hadoop job information for Stage-1: number of mappers: 1; number of
 reducers: 0
 2013-03-10 00:14:44,509 Stage-1 map = 0%,  reduce = 0%
 2013-03-10 00:15:14,613 Stage-1 map = 100%,  reduce = 100%
 Ended Job = job_201303050432_0094 with errors
 Error during job, obtaining debugging information...
 Job Tracking URL:
 http://ubuntu:50030/jobdetails.jsp?jobid=job_201303050432_0094
 Examining task ID: task_201303050432_0094_m_02 (and more) from job
 job_201303050432_0094

 Task with the most failures(4):
 -
 Task ID:
   task_201303050432_0094_m_00

 URL:

 http://ubuntu:50030/taskdetails.jsp?jobid=job_201303050432_0094tipid=task_201303050432_0094_m_00
 -
 Diagnostic Messages for this Task:
 java.lang.RuntimeException: Error in configuring object
 at
 org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
 at
 org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
 at
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:432)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
 at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:416)
 at
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
 at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.lang.reflect.InvocationTargetException

 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
 sun.reflect.NativeMethodAccessorImpl.invoke

Re: HIVE issues when using large number of partitions

2013-03-09 Thread Ramki Palle
Check this for your first question:

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Recoverpartitions
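
That page describes recovering partitions in bulk instead of issuing one ADD
PARTITION statement per directory; a minimal sketch (table name invented, and
the directory layout must already match the table's partition scheme):

  MSCK REPAIR TABLE my_table;
  -- on Amazon EMR the equivalent command is:
  -- ALTER TABLE my_table RECOVER PARTITIONS;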

Please post if you find any solution for your 2nd and 3rd questions.

Regards,
Ramki.


On Thu, Mar 7, 2013 at 8:01 PM, Suresh Krishnappa 
suresh.krishna...@gmail.com wrote:

 Hi All,
 I have a hadoop cluster with data present in a large number of directories
 (> 10,000).
 To run Hive queries over this data I created an external partitioned table
 and pointed each directory as a partition to the external table using the
 'alter table add partition' command.
 Is there a better way to create a Hive external table over a large number of
 directories?
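
 One way to cut down the per-partition DDL overhead is to batch several
 partitions into a single ALTER TABLE ... ADD statement, which recent Hive
 releases accept; a hedged sketch, with the table name, partition column,
 and paths invented for illustration:

   ALTER TABLE my_events ADD IF NOT EXISTS
     PARTITION (dt='2013-03-01') LOCATION '/data/events/dt=2013-03-01'
     PARTITION (dt='2013-03-02') LOCATION '/data/events/dt=2013-03-02';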

 Also I am facing the following issues due to the large number of partitions:
 1) The DDL operations of creating the table and adding partitions to the
 table take a very long time. It takes about an hour to add around 10,000
 partitions
 2) Getting 'out of memory' java exception while adding partitions  5
 3) Sometimes getting 'out of memory' java exception for select queries for
 partitions  1

 What is the recommended limit on the number of partitions that we can
 create for a Hive table?
 Are there any configuration settings in Hive/Hadoop to support a large
 number of partitions?

 I am using HIVE 0.10.0. I re-ran the tests by replacing derby with
 postgresql as metastore and still faced similar issues.

 Would appreciate any inputs on this

 Thanks
 Suresh




Re: difference between add jar in hive session and hive --auxpath

2013-03-09 Thread Ramki Palle
I am not sure whether it works, but try this:

Put your ADD JAR commands into a file and invoke hive with the -i <file> option.

or

insert your ADD JAR commands in your $HOME/.hiverc file and start hive.
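
For example, a file jars.hql (name invented) containing the ADD JAR lines
already listed in this thread:

  ADD JAR /nfs_home/common/userlibs/google-collections-1.0.jar;
  ADD JAR /nfs_home/common/userlibs/elephant-bird-hive-3.0.7.jar;
  ADD JAR /nfs_home/common/userlibs/elephant-bird-core-3.0.7.jar;
  ADD JAR /nfs_home/common/userlibs/protobuf-java-2.3.0.jar;

could be passed at startup with

  hive -i jars.hql

or the same ADD JAR lines could go straight into $HOME/.hiverc.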


-Ramki.


On Fri, Mar 8, 2013 at 11:55 PM, Edward Capriolo edlinuxg...@gmail.com wrote:

 Essentially anything that is part of the InputFormat needs to be in
 auxlib/auxpath. Anything part of a UDF can be added with 'add jar'.


 On Fri, Mar 8, 2013 at 1:01 PM, Dean Wampler 
 dean.wamp...@thinkbiganalytics.com wrote:

 --auxpath adds more jars to Hive's classpath before invoking Hive. ADD
 JARS copies jars around the cluster and adds them to the task classpath, so
 the jars you add aren't visible to hive itself. Annoying, but...
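
 To make the distinction concrete: a jar needed by the table's InputFormat or
 SerDe has to be there when the CLI starts, while a jar needed only by a UDF
 can be added inside the session. A sketch (the elephant-bird jar is from this
 thread; the UDF jar name is invented):

   hive --auxpath /nfs_home/common/userlibs/elephant-bird-hive-3.0.7.jar

   hive> add jar /nfs_home/common/userlibs/my-udfs.jar;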

 On Fri, Mar 8, 2013 at 11:53 AM, java8964 java8964 
 java8...@hotmail.com wrote:

  This is in HIVE-0.9.0

 hive> list jars;
 /nfs_home/common/userlibs/google-collections-1.0.jar
 /nfs_home/common/userlibs/elephant-bird-hive-3.0.7.jar
 /nfs_home/common/userlibs/protobuf-java-2.3.0.jar
 /nfs_home/common/userlibs/elephant-bird-core-3.0.7.jar
 file:/usr/lib/hive/lib/hive-builtins-0.9.0-cdh4.1.2.jar
 hive> desc table;
 java.lang.NoClassDefFoundError:
 com/twitter/elephantbird/mapreduce/io/ProtobufConverter
 at
 com.twitter.elephantbird.hive.serde.ProtobufDeserializer.initialize(ProtobufDeserializer.java:45)
 at
 org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:203)
 at
 org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:260)
 at
 org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:253)
 at
 org.apache.hadoop.hive.ql.metadata.Table.getCols(Table.java:490)
 at
 org.apache.hadoop.hive.ql.metadata.Table.checkValidity(Table.java:162)
 at
 org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:930)
 at
 org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:844)
 at
 org.apache.hadoop.hive.ql.exec.DDLTask.describeTable(DDLTask.java:2545)
 at
 org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:309)
 at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
 at
 org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
 at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1331)
 at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1117)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:950)
 at
 org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258)
 at
 org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:215)
 at
 org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406)
 at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:744)
 at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:607)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
 Caused by: java.lang.ClassNotFoundException:
 com.twitter.elephantbird.mapreduce.io.ProtobufConverter
 at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
 ... 25 more
 FAILED: Execution Error, return code -101 from
 org.apache.hadoop.hive.ql.exec.DDLTask
 hive> exit;
 [y130zhan@daca2 userlibs]$ jar tvf
 /nfs_home/common/userlibs/elephant-bird-core-3.0.7.jar | grep
 ProtobufConverter
   4825 Mon Mar 04 16:50:46 UTC 2013
 com/twitter/elephantbird/mapreduce/io/ProtobufConverter.class
732 Mon Mar 04 16:50:46 UTC 2013
 com/twitter/elephantbird/mapreduce/io/ProtobufConverter$1.class


 --
 From: vkavul...@outlook.com
 To: user@hive.apache.org
 Subject: RE: difference between add jar in hive session and hive
 --auxpath
 Date: Thu, 7 Mar 2013 16:44:41 -0800


 If properly done, add jar <jar-file> should work the same as passing
 the jar with --auxpath. Can you run the list jars; command from the CLI or
 Hue and check whether you see the jar file?

 --
 From: java8...@hotmail.com
 To: user@hive.apache.org
 Subject: difference between add jar in hive session and hive --auxpath
 Date: Thu, 7 Mar 2013 17:47:26 -0500

  Hi,

 I have a Hive table which uses the jar file provided by
 elephant-bird, a framework that integrates LZO and Google protobuf data
 with Hadoop/Hive.

 If I use the hive command like this:

 hive --auxpath path_to_jars, it works fine 

Re: Rename external table, including HDFS directory

2013-03-09 Thread Ramki Palle
Since you can run hadoop dfs commands from inside Hive, you can put both
commands in a script file and run that file with hive. They are still two
different commands, but they sit in one place and can be executed in the
same environment in one go.
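
A sketch of such a script, say rename_table.hql (table names and paths
invented; for an external table, ALTER TABLE ... SET LOCATION is what keeps
the metastore pointing at the moved directory):

  ALTER TABLE old_name RENAME TO new_name;
  dfs -mv /user/hive/external/old_name /user/hive/external/new_name;
  ALTER TABLE new_name SET LOCATION '/user/hive/external/new_name';

Run it with hive -f rename_table.hql. For a partitioned external table, each
partition keeps its own LOCATION, so those may need the same treatment.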

-Ramki.


On Fri, Mar 8, 2013 at 5:32 AM, Dean Wampler 
dean.wamp...@thinkbiganalytics.com wrote:

 You can rename the LOCATION with an ALTER TABLE command. I don't believe
 there's a single, atomic statement to rename both the table and the
 directory.


 On Thu, Mar 7, 2013 at 5:29 PM, Keith Wiley kwi...@keithwiley.com wrote:

 My understanding is that renaming an external table doesn't rename the
 corresponding HDFS directory to match.  I would like to do this; I want to
 rename the table in Hive and also rename the HDFS directory so they match
 (they already match with the old name).  I'm not sure how to do this.  I'm
 worried that if I rename the HDFS directory through Hadoop, not Hive, that
 Hive will lose track of the directory, regardless of whether I rename the
 table in Hive as well.

 I also don't want to drop and recreate the table because I don't want to
 notify Hive of all the partitions all over again.

 Thoughts?


 
 Keith Wiley kwi...@keithwiley.com keithwiley.com
 music.keithwiley.com

 Luminous beings are we, not this crude matter.
--  Yoda

 




 --
 *Dean Wampler, Ph.D.*
 thinkbiganalytics.com
 +1-312-339-1330




Re: Error while table creation

2013-03-09 Thread Ramki Palle
Just wondering if your create table syntax includes IF NOT EXISTS, such as

CREATE EXTERNAL TABLE IF NOT EXISTS my_table (

...
...
...
)



On Sun, Mar 10, 2013 at 2:34 AM, Viral Bajaria viral.baja...@gmail.com wrote:

 Is this is a new installation of Hive or did you upgrade ? How many tables
 do you already have ?


 On Sat, Mar 9, 2013 at 12:00 PM, Abhishek Gayakwad 
 a.gayak...@gmail.com wrote:

 While trying to create an external table in an Oozie Hive action, I am getting
 the following error; the Hive version is 0.9.0


 Caused by: javax.jdo.JDODataStoreException: Insert of object 
 org.apache.hadoop.hive.metastore.model.MColumnDescriptor@46c1dc0b using 
 statement INSERT INTO `CDS` (`CD_ID`) VALUES (?) failed : Duplicate entry 
 'X' for key 'PRIMARY'
 NestedThrowables:
 com.mysql.jdbc.exceptions.jdbc4.MySQLIntegrityConstraintViolationException: 
 Duplicate entry 'X' for key 'PRIMARY'
  at 
 org.datanucleus.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:313)
  at 
 org.datanucleus.jdo.JDOPersistenceManager.jdoMakePersistent(JDOPersistenceManager.java:660)
  at 
 org.datanucleus.jdo.JDOPersistenceManager.makePersistent(JDOPersistenceManager.java:680)
  at 
 org.apache.hadoop.hive.metastore.ObjectStore.createTable(ObjectStore.java:617)
  ... 43 more
 Caused by: 
 com.mysql.jdbc.exceptions.jdbc4.MySQLIntegrityConstraintViolationException: 
 Duplicate entry 'X' for key 'PRIMARY'


 Thanks,
 Abhishek





Re: Hive sample test

2013-03-08 Thread Ramki Palle
If none of the 100 rows that the sub-query returns satisfies the where
clause, there will be no rows in the overall result. Do we still consider
the Hive query verified in this case?

Regards,
Ramki.


On Wed, Mar 6, 2013 at 1:14 AM, Dean Wampler 
dean.wamp...@thinkbiganalytics.com wrote:

 NIce, yea that would do it.


 On Tue, Mar 5, 2013 at 1:26 PM, Mark Grover 
 grover.markgro...@gmail.com wrote:

 I typically change my query to query from a limited version of the whole
 table.

 Change

 select really_expensive_select_clause
 from
 really_big_table
 where
 something=something
 group by something=something

 to

 select really_expensive_select_clause
 from
 (
 select
 *
 from
 really_big_table
 limit 100
 )t
 where
 something=something
 group by something=something


 On Tue, Mar 5, 2013 at 10:57 AM, Dean Wampler
 dean.wamp...@thinkbiganalytics.com wrote:
  Unfortunately, it will still go through the whole thing, then just
 limit the
  output. However, there's a flag that I think only works in more recent
 Hive
  releases:
 
  set hive.limit.optimize.enable=true
 
  This is supposed to apply limiting earlier in the data stream, so it
  will give different results than limiting just the output.
 
  Like Chuck said, you might consider sampling, but unless your table is
  organized into buckets, you'll at least scan the whole table, but maybe
 not
  do all computation over it ??
 
  Also, if you have a small sample data set:
 
  set hive.exec.mode.local.auto=true
 
  will cause Hive to bypass the Job and Task Trackers, calling APIs
 directly,
   when it can do the whole thing in a single process. Not lightning fast,
   but faster.
 
  dean
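
  Putting those two settings together with the kind of query being tested, a
  quick sanity-check session might look like this (a sketch only; the table
  and clause names are the placeholders used earlier in this thread):

    set hive.limit.optimize.enable=true;
    set hive.exec.mode.local.auto=true;
    select really_expensive_select_clause from really_big_table limit 100;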
 
  On Tue, Mar 5, 2013 at 12:48 PM, Joey D'Antoni jdant...@yahoo.com
 wrote:
 
  Just add a limit 1 to the end of your query.
 
 
 
 
  On Mar 5, 2013, at 1:45 PM, Kyle B kbi...@gmail.com wrote:
 
  Hello,
 
  I was wondering if there is a way to quick-verify a Hive query before
 it
  is run against a big dataset? The tables I am querying against have
 millions
  of records, and I'd like to verify my Hive query before I run it
 against all
  records.
 
  Is there a way to test the query against a small subset of the data,
  without going into full MapReduce? As silly as this sounds, is there a
 way
  to MapReduce without the overhead of MapReduce? That way I can check my
  query is doing what I want before I run it against all records.
 
  Thanks,
 
  -Kyle
 
 
 
 
  --
  Dean Wampler, Ph.D.
  thinkbiganalytics.com
  +1-312-339-1330
 




 --
 *Dean Wampler, Ph.D.*
 thinkbiganalytics.com
 +1-312-339-1330




Re: Find current db we r using in Hive

2013-03-07 Thread Ramki Palle
Sai,

I do not think there is any command to show the current db in Hive. One
alternative for you is to set a property so that the current database is
shown as part of the prompt:

set hive.cli.print.current.db=true;

This one shows your current db as part of your hive prompt.
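
For example (database name invented; with the property set, the prompt shows
the current database in parentheses):

  hive> set hive.cli.print.current.db=true;
  hive> use analytics;
  hive (analytics)>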

Regards,
Ramki.


On Fri, Mar 8, 2013 at 11:13 AM, Sai Sai saigr...@yahoo.in wrote:

 Just wondering if there is any command in Hive which will show us the
 current db we are using, similar to pwd in Unix.
 Thanks
 Sai