Re: java.lang.NoClassDefFoundError: com/jayway/jsonpath/PathUtil

2013-03-08 Thread Sai Sai
I have added the jar files successfully like this:


hive (testdb)> ADD JAR lib/hive-json-serde-0.3.jar;
   Added lib/hive-json-serde-0.3.jar to class path
   Added resource: lib/hive-json-serde-0.3.jar



hive (testdb)> ADD JAR lib/json-path-0.5.4.jar;
   Added lib/json-path-0.5.4.jar to class path
   Added resource: lib/json-path-0.5.4.jar



hive (testdb)> ADD JAR lib/json-smart-1.0.6.3.jar;
   Added lib/json-smart-1.0.6.3.jar to class path
   Added resource: lib/json-smart-1.0.6.3.jar


After this I am getting this error:



CREATE EXTERNAL TABLE IF NOT EXISTS twitter (tweet_id BIGINT, created_at 
STRING, text STRING, user_id BIGINT, user_screen_name STRING, user_lang STRING) 
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde' WITH 
SERDEPROPERTIES ( 
"tweet_id"="$.id", "created_at"="$.created_at", "text"="$.text", "user_id"="$.user.id",
"user_screen_name"="$.user.screen_name", "user_lang"="$.user.lang") LOCATION '/home/satish/data/twitter/input';
java.lang.NoClassDefFoundError: com/jayway/jsonpath/PathUtil
    at org.apache.hadoop.hive.contrib.serde2.JsonSerde.initialize(Unknown 
Source)
    at 
org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:207)
    at 
org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:266)
    at org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:259)
    at org.apache.hadoop.hive.ql.metadata.Table.getCols(Table.java:585)
    at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:550)
    at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:3698)
    at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:253)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
    at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1336)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1122)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:935)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:755)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.lang.ClassNotFoundException: com.jayway.jsonpath.PathUtil
    at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
    ... 23 more
FAILED: Execution Error, return code -101 from 
org.apache.hadoop.hive.ql.exec.DDLTask



Any help would be really appreciated.
Thanks
Sai


Re: Hive sample test

2013-03-08 Thread Ramki Palle
If any of the 100 rows that the sub-query returns do not satisfy the where
clause, there would be no rows in the overall result. Do we still consider
that the Hive query is verified in this case?

Regards,
Ramki.


On Wed, Mar 6, 2013 at 1:14 AM, Dean Wampler 
dean.wamp...@thinkbiganalytics.com wrote:

 Nice, yeah, that would do it.


 On Tue, Mar 5, 2013 at 1:26 PM, Mark Grover 
 grover.markgro...@gmail.com wrote:

 I typically change my query to query from a limited version of the whole
 table.

 Change

 select really_expensive_select_clause
 from
 really_big_table
 where
 something=something
 group by something=something

 to

 select really_expensive_select_clause
 from
 (
 select
 *
 from
 really_big_table
 limit 100
 )t
 where
 something=something
 group by something=something


 On Tue, Mar 5, 2013 at 10:57 AM, Dean Wampler
 dean.wamp...@thinkbiganalytics.com wrote:
  Unfortunately, it will still go through the whole thing, then just
 limit the
  output. However, there's a flag that I think only works in more recent
 Hive
  releases:
 
  set hive.limit.optimize.enable=true
 
  This is supposed to apply limiting earlier in the data stream, so it
  will give different results than limiting just the output.
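 
  The related knobs, as a sketch (these property names exist in Hive's
  limit-optimization support; the values shown are just the defaults, which
  is an assumption on my part):
 
  set hive.limit.optimize.enable=true;
  set hive.limit.row.max.size=100000;
  set hive.limit.optimize.limit.file=10;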
 
  Like Chuck said, you might consider sampling, but unless your table is
  organized into buckets, you'll still scan the whole table; you just might
  not do all of the computation over it.
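 
  For example, a sketch of bucket sampling (this assumes really_big_table
  was created CLUSTERED BY (some_id) INTO 32 BUCKETS; only then can Hive
  prune input rather than scanning everything):
 
  SELECT really_expensive_select_clause
  FROM really_big_table TABLESAMPLE(BUCKET 1 OUT OF 32 ON some_id)
  WHERE something = something;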
 
  Also, if you have a small sample data set:
 
  set hive.exec.mode.local.auto=true
 
  will cause Hive to bypass the Job and Task Trackers, calling APIs
 directly,
  when it can do the whole thing in a single process. Not lightning
 fast,
  but faster.
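 
  The companion thresholds, as a sketch (these property names exist in this
  era of Hive; the values shown are the usual defaults, which is an
  assumption on my part):
 
  set hive.exec.mode.local.auto=true;
  set hive.exec.mode.local.auto.inputbytes.max=134217728;
  set hive.exec.mode.local.auto.tasks.max=4;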
 
  dean
 
  On Tue, Mar 5, 2013 at 12:48 PM, Joey D'Antoni jdant...@yahoo.com
 wrote:
 
  Just add a limit 1 to the end of your query.
 
 
 
 
  On Mar 5, 2013, at 1:45 PM, Kyle B kbi...@gmail.com wrote:
 
  Hello,
 
  I was wondering if there is a way to quick-verify a Hive query before
 it
  is run against a big dataset? The tables I am querying against have
 millions
  of records, and I'd like to verify my Hive query before I run it
 against all
  records.
 
  Is there a way to test the query against a small subset of the data,
  without going into full MapReduce? As silly as this sounds, is there a
 way
  to MapReduce without the overhead of MapReduce? That way I can check my
  query is doing what I want before I run it against all records.
 
  Thanks,
 
  -Kyle
 
 
 
 
  --
  Dean Wampler, Ph.D.
  thinkbiganalytics.com
  +1-312-339-1330
 




 --
 *Dean Wampler, Ph.D.*
 thinkbiganalytics.com
 +1-312-339-1330




Re: Accessing sub column in hive

2013-03-08 Thread bejoy_ks
Hi Sai


You can do it as
Select address.country from employees;
 

Regards 
Bejoy KS

Sent from remote device, Please excuse typos

-Original Message-
From: Bennie Schut bsc...@ebuddy.com
Date: Fri, 8 Mar 2013 09:09:49 
To: user@hive.apache.org; 'Sai Sai' saigr...@yahoo.in
Reply-To: user@hive.apache.org
Subject: RE: Accessing sub column in hive

Perhaps worth posting the error. Some might know what the error means.

Also a bit unrelated to hive but please do yourself a favor and don't use float 
to store monetary values like salary. You will get rounding issues at some 
point in time when you do arithmetic on them. Considering you are using hadoop 
you probably have a lot of data so adding it all up will get you there really 
really fast. 
http://stackoverflow.com/questions/3730019/why-not-use-double-or-float-to-represent-currency
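
A sketch of the safer alternatives (DECIMAL is an assumption here, since it 
only exists in newer Hive releases; storing whole cents in a BIGINT works 
everywhere):

CREATE TABLE IF NOT EXISTS employees_exact (
  name STRING,
  salary_cents BIGINT  -- 7250050 means $72,500.50; exact integer arithmetic
);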


From: Sai Sai [mailto:saigr...@yahoo.in]
Sent: Thursday, March 07, 2013 12:54 PM
To: user@hive.apache.org
Subject: Re: Accessing sub column in hive

I have a table created like this successfully:

CREATE TABLE IF NOT EXISTS employees (name STRING, salary FLOAT, subordinates 
ARRAY<STRING>, deductions MAP<STRING,FLOAT>, address STRUCT<street:STRING, 
city:STRING, state:STRING, zip:INT, country:STRING>)

I would like to access/display country column from my address struct.
I have tried this:

select address[country] from employees;

I get an error.

Please help.

Thanks
Sai



Re: Accessing sub column in hive

2013-03-08 Thread Dean Wampler
I recognize this example ;)

You reference struct elements with the dot notation, as Bejoy said, map
elements with what you tried, deductions['Federal taxes'], and arrays by
index, starting from zero, subordinates[0].
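
Putting the three together, a sketch against the employees table from the 
original mail:

SELECT address.country,             -- struct element: dot notation
       deductions['Federal taxes'], -- map element: bracketed key
       subordinates[0]              -- array element: zero-based index
FROM employees;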

On Fri, Mar 8, 2013 at 6:35 AM, bejoy...@yahoo.com wrote:

 Hi Sai


 You can do it as
 Select address.country from employees;

 Regards
 Bejoy KS

 Sent from remote device, Please excuse typos
 --
 From: Bennie Schut bsc...@ebuddy.com
 Date: Fri, 8 Mar 2013 09:09:49 +0100
 To: user@hive.apache.org; 'Sai Sai' saigr...@yahoo.in
 Reply-To: user@hive.apache.org
 Subject: RE: Accessing sub column in hive

 Perhaps worth posting the error. Some might know what the error means.


 Also a bit unrelated to hive but please do yourself a favor and don’t use
 float to store monetary values like salary. You will get rounding issues at
 some point in time when you do arithmetic on them. Considering you are
 using hadoop you probably have a lot of data so adding it all up will get
 you there really really fast.
 http://stackoverflow.com/questions/3730019/why-not-use-double-or-float-to-represent-currency
 


 From: Sai Sai [mailto:saigr...@yahoo.in]
 Sent: Thursday, March 07, 2013 12:54 PM
 To: user@hive.apache.org
 Subject: Re: Accessing sub column in hive


 I have a table created like this successfully:


 CREATE TABLE IF NOT EXISTS employees (name STRING, salary
 FLOAT, subordinates ARRAY<STRING>, deductions MAP<STRING,FLOAT>, address
 STRUCT<street:STRING, city:STRING, state:STRING, zip:INT, country:STRING>)
 


 I would like to access/display country column from my address struct.

 I have tried this:


 select address[country] from employees;


 I get an error.


 Please help.


 Thanks

 Sai




-- 
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330


Re: java.lang.NoClassDefFoundError: com/jayway/jsonpath/PathUtil

2013-03-08 Thread Dean Wampler
Unfortunately, you have to also add the json jars to Hive's class path
before it starts, e.g.,

env HADOOP_CLASSPATH=/path/to/lib/*.jar hive

Use the appropriate path to your lib directory.
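
For example, with the jars from your session below (a sketch; note that 
HADOOP_CLASSPATH entries are colon-separated, and a bare *.jar glob may be 
expanded by the shell in surprising ways):

env HADOOP_CLASSPATH=lib/hive-json-serde-0.3.jar:lib/json-path-0.5.4.jar:lib/json-smart-1.0.6.3.jar hive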

On Fri, Mar 8, 2013 at 4:53 AM, Sai Sai saigr...@yahoo.in wrote:

 I have added the jar files successfully like this:


 hive (testdb)> ADD JAR lib/hive-json-serde-0.3.jar;
Added lib/hive-json-serde-0.3.jar to class path
Added resource: lib/hive-json-serde-0.3.jar


 hive (testdb)> ADD JAR lib/json-path-0.5.4.jar;
Added lib/json-path-0.5.4.jar to class path
Added resource: lib/json-path-0.5.4.jar


 hive (testdb)> ADD JAR lib/json-smart-1.0.6.3.jar;
Added lib/json-smart-1.0.6.3.jar to class path
Added resource: lib/json-smart-1.0.6.3.jar


 After this I am getting this error:


 CREATE EXTERNAL TABLE IF NOT EXISTS twitter (tweet_id BIGINT, created_at
 STRING, text STRING, user_id BIGINT, user_screen_name STRING, user_lang
 STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde'
 WITH SERDEPROPERTIES (
 "tweet_id"="$.id", "created_at"="$.created_at", "text"="$.text",
 "user_id"="$.user.id", "user_screen_name"="$.user.screen_name",
 "user_lang"="$.user.lang") LOCATION '/home/satish/data/twitter/input';
 java.lang.NoClassDefFoundError: com/jayway/jsonpath/PathUtil
 at org.apache.hadoop.hive.contrib.serde2.JsonSerde.initialize(Unknown
 Source)
 at
 org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:207)
 at
 org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:266)
 at
 org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:259)
 at org.apache.hadoop.hive.ql.metadata.Table.getCols(Table.java:585)
 at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:550)
 at
 org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:3698)
 at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:253)
 at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
 at
 org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
 at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1336)
 at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1122)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:935)
 at
 org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
 at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
 at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
 at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:755)
 at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:616)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
 Caused by: java.lang.ClassNotFoundException: com.jayway.jsonpath.PathUtil
 at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
 ... 23 more
 FAILED: Execution Error, return code -101 from
 org.apache.hadoop.hive.ql.exec.DDLTask


 Any help would be really appreciated.
 Thanks
 Sai




-- 
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330


Re: Find current db we r using in Hive

2013-03-08 Thread Dean Wampler
It's odd that there is no such command. The trick Ramki mentioned is the
only one I know of. Two points about it, though:

1. It only works on Hive v0.8+.
2. I've seen a few cases where the prompt did NOT change when first used,
but started working a little later! I have no idea why, and of course it
happened while teaching a class where I'm supposed to be the expert ;)

dean

On Fri, Mar 8, 2013 at 12:36 AM, Ramki Palle ramki.pa...@gmail.com wrote:

 Sai,

 I do not think there is any command to show the current db in Hive. One
 alternative for you is to set a property so that the current database is
 shown as part of the prompt:

 set hive.cli.print.current.db=true;

 This one shows your current db as part of your hive prompt.
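 
 For example, a sketch of what the prompt looks like afterwards:
 
 hive> set hive.cli.print.current.db=true;
 hive (default)> USE testdb;
 hive (testdb)>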

 Regards,
 Ramki.


 On Fri, Mar 8, 2013 at 11:13 AM, Sai Sai saigr...@yahoo.in wrote:

 Just wondering if there is any command in Hive which will show us the
 current db we are using, similar to pwd in Unix.
 Thanks
 Sai





-- 
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330


RE: Data mismatch when importing data from Oracle to Hive through Sqoop without an error

2013-03-08 Thread Ajit Kumar Shreevastava
Hi Venkat,

Almost all columns have some value except these three.

Regards,
Ajit

-Original Message-
From: Venkat Ranganathan [mailto:vranganat...@hortonworks.com] 
Sent: Wednesday, March 06, 2013 9:36 PM
To: user@hive.apache.org
Cc: u...@sqoop.apache.org
Subject: Re: Data mismatch when importing data from Oracle to Hive through 
Sqoop without an error

Hi Ajit

Do you know if the rest of the columns are also null when the three non-null 
columns are null?

Venkat

On Wed, Mar 6, 2013 at 12:35 AM, Ajit Kumar Shreevastava 
ajit.shreevast...@hcl.com wrote:
 Hi Abhijeet,



 Thanks for your response.

 If the values that don't fit in a double were being inserted as NULL, 
 then the counts should not mismatch between the two.

 Here the inserted null values are extra rows, in addition to the values 
 already present in both the Oracle table and the Hive table.



 Correct me if I am wrong in interpretation.



 Thanks and Regards,

 Ajit Kumar Shreevastava



 From: abhijeet gaikwad [mailto:abygaikwa...@gmail.com]
 Sent: Wednesday, March 06, 2013 1:46 PM
 To: user@hive.apache.org
 Cc: u...@sqoop.apache.org
 Subject: Re: Data mismatch when importing data from Oracle to Hive 
 through Sqoop without an error



 Sqoop maps numeric and decimal types (RDBMS) to double (Hive). I think 
 the values that don't fit in double must be getting inserted as NULL.
 You can see this warning in your logs.
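 
 One way to avoid the lossy default mapping is to override it per column, as
 a sketch (--map-column-hive is a real Sqoop option, but the connect string
 and the column list here are assumptions based on this thread):
 
 sqoop import \
   --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
   --username scott -P \
   --table BTTN \
   --hive-import --hive-table bttn \
   --map-column-hive BTTN_ID=BIGINT,DATA_INST_ID=BIGINT,SCR_ID=BIGINT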

 Thanks,
 Abhijeet

 On Wed, Mar 6, 2013 at 1:32 PM, Ajit Kumar Shreevastava 
 ajit.shreevast...@hcl.com wrote:

 Hi all,

 I have noticed one interesting thing in the result sets below.

 I fired the same query in both the Oracle and Hive shells and got the 
 following results:



 SQL> select count(1) from bttn
   2  where bttn_id is null or data_inst_id is null or scr_id is null;

   COUNT(1)
 ----------
          0

 hive> select count(1) from bttn

  where bttn_id is null or data_inst_id is null or scr_id is null;

 Total MapReduce jobs = 1

 Launching Job 1 out of 1

 Number of reduce tasks determined at compile time: 1

 In order to change the average load for a reducer (in bytes):

   set hive.exec.reducers.bytes.per.reducer=<number>

 In order to limit the maximum number of reducers:

   set hive.exec.reducers.max=<number>

 In order to set a constant number of reducers:

   set mapred.reduce.tasks=<number>

 Starting Job = job_201303051835_0020, Tracking URL =
 http://NHCLT-PC44-2:50030/jobdetails.jsp?jobid=job_201303051835_0020

 Kill Command = /home/hadoop/hadoop-1.0.3/bin/hadoop job  -kill
 job_201303051835_0020

 Hadoop job information for Stage-1: number of mappers: 1; number of
 reducers: 1

 2013-03-06 13:22:56,908 Stage-1 map = 0%,  reduce = 0%

 2013-03-06 13:23:05,928 Stage-1 map = 100%,  reduce = 0%, Cumulative 
 CPU 5.2 sec

 2013-03-06 13:23:06,931 Stage-1 map = 100%,  reduce = 0%, Cumulative 
 CPU 5.2 sec

 2013-03-06 13:23:07,934 Stage-1 map = 100%,  reduce = 0%, Cumulative 
 CPU 5.2 sec

 2013-03-06 13:23:08,938 Stage-1 map = 100%,  reduce = 0%, Cumulative 
 CPU 5.2 sec

 2013-03-06 13:23:09,941 Stage-1 map = 100%,  reduce = 0%, Cumulative 
 CPU 5.2 sec

 2013-03-06 13:23:10,944 Stage-1 map = 100%,  reduce = 0%, Cumulative 
 CPU 5.2 sec

 2013-03-06 13:23:11,947 Stage-1 map = 100%,  reduce = 0%, Cumulative 
 CPU 5.2 sec

 2013-03-06 13:23:12,956 Stage-1 map = 100%,  reduce = 0%, Cumulative 
 CPU 5.2 sec

 2013-03-06 13:23:13,959 Stage-1 map = 100%,  reduce = 0%, Cumulative 
 CPU 5.2 sec

 2013-03-06 13:23:14,962 Stage-1 map = 100%,  reduce = 33%, Cumulative 
 CPU
 5.2 sec

 2013-03-06 13:23:15,965 Stage-1 map = 100%,  reduce = 33%, Cumulative 
 CPU
 5.2 sec

 2013-03-06 13:23:16,969 Stage-1 map = 100%,  reduce = 33%, Cumulative 
 CPU
 5.2 sec

 2013-03-06 13:23:17,974 Stage-1 map = 100%,  reduce = 100%, Cumulative 
 CPU
 6.95 sec

 2013-03-06 13:23:18,977 Stage-1 map = 100%,  reduce = 100%, Cumulative 
 CPU
 6.95 sec

 2013-03-06 13:23:19,981 Stage-1 map = 100%,  reduce = 100%, Cumulative 
 CPU
 6.95 sec

 2013-03-06 13:23:20,985 Stage-1 map = 100%,  reduce = 100%, Cumulative 
 CPU
 6.95 sec

 2013-03-06 13:23:21,988 Stage-1 map = 100%,  reduce = 100%, Cumulative 
 CPU
 6.95 sec

 2013-03-06 13:23:22,995 Stage-1 map = 100%,  reduce = 100%, Cumulative 
 CPU
 6.95 sec

 2013-03-06 13:23:23,998 Stage-1 map = 100%,  reduce = 100%, Cumulative 
 CPU
 6.95 sec

 MapReduce Total cumulative CPU time: 6 seconds 950 msec

 Ended Job = job_201303051835_0020

 MapReduce Jobs Launched:

 Job 0: Map: 1  Reduce: 1   Cumulative CPU: 6.95 sec   HDFS Read: 184270926
 HDFS Write: 4 SUCCESS

 Total MapReduce CPU Time Spent: 6 seconds 950 msec

 OK

 986

 Time taken: 35.983 seconds

 hive>



 and 739169 - 738183 = 986



 Can anyone tell me why this happened, as BTTN_ID, DATA_INST_ID, and 
 SCR_ID have NOT NULL constraints on the BTTN table and also form its 
 composite primary key?

 Also, tell me how I can prevent this unnecessary data generation in the 
 HIVE table.



 Regards

 Ajit Kumar Shreevastava



 From: 

RE: difference between add jar in hive session and hive --auxpath

2013-03-08 Thread java8964 java8964

This is in HIVE-0.9.0

hive> list jars;
/nfs_home/common/userlibs/google-collections-1.0.jar
/nfs_home/common/userlibs/elephant-bird-hive-3.0.7.jar
/nfs_home/common/userlibs/protobuf-java-2.3.0.jar
/nfs_home/common/userlibs/elephant-bird-core-3.0.7.jar
file:/usr/lib/hive/lib/hive-builtins-0.9.0-cdh4.1.2.jar
hive> desc table;
java.lang.NoClassDefFoundError: com/twitter/elephantbird/mapreduce/io/ProtobufConverter
    at com.twitter.elephantbird.hive.serde.ProtobufDeserializer.initialize(ProtobufDeserializer.java:45)
    at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:203)
    at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:260)
    at org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:253)
    at org.apache.hadoop.hive.ql.metadata.Table.getCols(Table.java:490)
    at org.apache.hadoop.hive.ql.metadata.Table.checkValidity(Table.java:162)
    at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:930)
    at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:844)
    at org.apache.hadoop.hive.ql.exec.DDLTask.describeTable(DDLTask.java:2545)
    at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:309)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1331)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1117)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:950)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:215)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:744)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:607)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: java.lang.ClassNotFoundException: com.twitter.elephantbird.mapreduce.io.ProtobufConverter
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
    ... 25 more
FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.DDLTask
hive> exit;
[y130zhan@daca2 userlibs]$ jar tvf /nfs_home/common/userlibs/elephant-bird-core-3.0.7.jar | grep ProtobufConverter
  4825 Mon Mar 04 16:50:46 UTC 2013 com/twitter/elephantbird/mapreduce/io/ProtobufConverter.class
   732 Mon Mar 04 16:50:46 UTC 2013 com/twitter/elephantbird/mapreduce/io/ProtobufConverter$1.class

From: vkavul...@outlook.com
To: user@hive.apache.org
Subject: RE: difference between add jar in hive session and hive --auxpath
Date: Thu, 7 Mar 2013 16:44:41 -0800




If properly done, add jar <jar-file> should work the same as passing the jar 
with --auxpath. Can you run the list jars; command from CLI or Hue and check if 
you see the jar file.

From: java8...@hotmail.com
To: user@hive.apache.org
Subject: difference between add jar in hive session and hive --auxpath
Date: Thu, 7 Mar 2013 17:47:26 -0500





Hi, 
I have a hive table which uses the jar files provided by elephant-bird, 
which is a framework that integrates lzo and google protobuf data with 
hadoop/hive.
If I use the hive command like this:
hive --auxpath path_to_jars, it works fine to query my table, 
but if I use add jar after I start the hive session, I will get a 
ClassNotFoundException at runtime for the classes in those jars when I query.
My questions are:
1) What is the difference between hive --auxpath and add jar in the hive 
session?
2) This problem makes it hard to access my table in Hue, as it only 
supports add jar, not the --auxpath option. Any suggestions?

Thanks
Yong
  

Re: difference between add jar in hive session and hive --auxpath

2013-03-08 Thread Dean Wampler
--auxpath adds more jars to Hive's classpath before invoking Hive. ADD JARS
copies jars around the cluster and adds them to the task classpath, so the
jars you add aren't visible to hive itself. Annoying, but...
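
A sketch of the two styles, using the jars from the list below (--auxpath 
takes a comma-separated list):

# visible to the Hive client itself, e.g. for SerDes/InputFormats:
hive --auxpath /nfs_home/common/userlibs/elephant-bird-hive-3.0.7.jar,/nfs_home/common/userlibs/elephant-bird-core-3.0.7.jar

# shipped to the task classpath only:
hive> ADD JAR /nfs_home/common/userlibs/elephant-bird-hive-3.0.7.jar;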

On Fri, Mar 8, 2013 at 11:53 AM, java8964 java8964 java8...@hotmail.com wrote:

  This is in HIVE-0.9.0

 hive> list jars;
 /nfs_home/common/userlibs/google-collections-1.0.jar
 /nfs_home/common/userlibs/elephant-bird-hive-3.0.7.jar
 /nfs_home/common/userlibs/protobuf-java-2.3.0.jar
 /nfs_home/common/userlibs/elephant-bird-core-3.0.7.jar
 file:/usr/lib/hive/lib/hive-builtins-0.9.0-cdh4.1.2.jar
 hive> desc table;
 java.lang.NoClassDefFoundError:
 com/twitter/elephantbird/mapreduce/io/ProtobufConverter
 at
 com.twitter.elephantbird.hive.serde.ProtobufDeserializer.initialize(ProtobufDeserializer.java:45)
 at
 org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:203)
 at
 org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:260)
 at
 org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:253)
 at org.apache.hadoop.hive.ql.metadata.Table.getCols(Table.java:490)
 at
 org.apache.hadoop.hive.ql.metadata.Table.checkValidity(Table.java:162)
 at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:930)
 at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:844)
 at
 org.apache.hadoop.hive.ql.exec.DDLTask.describeTable(DDLTask.java:2545)
 at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:309)
 at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
 at
 org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
 at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1331)
 at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1117)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:950)
 at
 org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258)
 at
 org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:215)
 at
 org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406)
 at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:744)
 at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:607)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
 Caused by: java.lang.ClassNotFoundException:
 com.twitter.elephantbird.mapreduce.io.ProtobufConverter
 at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
 ... 25 more
 FAILED: Execution Error, return code -101 from
 org.apache.hadoop.hive.ql.exec.DDLTask
 hive> exit;
 [y130zhan@daca2 userlibs]$ jar tvf
 /nfs_home/common/userlibs/elephant-bird-core-3.0.7.jar | grep
 ProtobufConverter
   4825 Mon Mar 04 16:50:46 UTC 2013
 com/twitter/elephantbird/mapreduce/io/ProtobufConverter.class
732 Mon Mar 04 16:50:46 UTC 2013
 com/twitter/elephantbird/mapreduce/io/ProtobufConverter$1.class


 --
 From: vkavul...@outlook.com
 To: user@hive.apache.org
 Subject: RE: difference between add jar in hive session and hive --auxpath
 Date: Thu, 7 Mar 2013 16:44:41 -0800


 If properly done, add jar <jar-file> should work the same as passing the
 jar with --auxpath. Can you run the list jars; command from CLI or Hue and
 check if you see the jar file.

 --
 From: java8...@hotmail.com
 To: user@hive.apache.org
 Subject: difference between add jar in hive session and hive --auxpath
 Date: Thu, 7 Mar 2013 17:47:26 -0500

  Hi,

 I have a hive table which uses the jar file provided from the
 elephant-bird, which is a framework integrated between lzo and google
 protobuf data and hadoop/hive.

 If I use the hive command like this:

 hive --auxpath path_to_jars, it works fine to query my table,

 but if I use the add jar after I started the hive session, I will get
 ClassNotFoundException in the runtime of my query of the classes in those
 jars.

 My questions are:

 1) What is the difference between hive --auxpath and add jar in the hive
 session?
 2) This problem makes it hard to access my table in Hue, as it only
 supports add jar, not the --auxpath option. Any suggestions?


 Thanks

 Yong




-- 
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330


Re: difference between add jar in hive session and hive --auxpath

2013-03-08 Thread Edward Capriolo
Essentially anything that is part of the InputFormat needs to be in
auxlib/auxpath. Anything part of a UDF can be added with 'add jar'.
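
A sketch of that distinction (the UDF jar and class name here are 
hypothetical):

-- UDF: only the tasks need the class, so 'add jar' is enough
hive> ADD JAR /tmp/my-udfs.jar;
hive> CREATE TEMPORARY FUNCTION my_lower AS 'com.example.udf.Lower';

-- InputFormat/SerDe: the client must load the class just to plan the query,
-- so the jar has to be on --auxpath (or dropped into auxlib/)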

On Fri, Mar 8, 2013 at 1:01 PM, Dean Wampler 
dean.wamp...@thinkbiganalytics.com wrote:

 --auxpath adds more jars to Hive's classpath before invoking Hive. ADD
 JARS copies jars around the cluster and adds them to the task classpath, so
 the jars you add aren't visible to hive itself. Annoying, but...

 On Fri, Mar 8, 2013 at 11:53 AM, java8964 java8964 
 java8...@hotmail.com wrote:

  This is in HIVE-0.9.0

 hive> list jars;
 /nfs_home/common/userlibs/google-collections-1.0.jar
 /nfs_home/common/userlibs/elephant-bird-hive-3.0.7.jar
 /nfs_home/common/userlibs/protobuf-java-2.3.0.jar
 /nfs_home/common/userlibs/elephant-bird-core-3.0.7.jar
 file:/usr/lib/hive/lib/hive-builtins-0.9.0-cdh4.1.2.jar
 hive> desc table;
 java.lang.NoClassDefFoundError:
 com/twitter/elephantbird/mapreduce/io/ProtobufConverter
 at
 com.twitter.elephantbird.hive.serde.ProtobufDeserializer.initialize(ProtobufDeserializer.java:45)
 at
 org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:203)
 at
 org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:260)
 at
 org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:253)
 at
 org.apache.hadoop.hive.ql.metadata.Table.getCols(Table.java:490)
 at
 org.apache.hadoop.hive.ql.metadata.Table.checkValidity(Table.java:162)
 at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:930)
 at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:844)
 at
 org.apache.hadoop.hive.ql.exec.DDLTask.describeTable(DDLTask.java:2545)
 at
 org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:309)
 at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
 at
 org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
 at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1331)
 at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1117)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:950)
 at
 org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258)
 at
 org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:215)
 at
 org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406)
 at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:744)
 at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:607)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
 Caused by: java.lang.ClassNotFoundException:
 com.twitter.elephantbird.mapreduce.io.ProtobufConverter
 at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
 ... 25 more
 FAILED: Execution Error, return code -101 from
 org.apache.hadoop.hive.ql.exec.DDLTask
 hive> exit;
 [y130zhan@daca2 userlibs]$ jar tvf
 /nfs_home/common/userlibs/elephant-bird-core-3.0.7.jar | grep
 ProtobufConverter
   4825 Mon Mar 04 16:50:46 UTC 2013
 com/twitter/elephantbird/mapreduce/io/ProtobufConverter.class
732 Mon Mar 04 16:50:46 UTC 2013
 com/twitter/elephantbird/mapreduce/io/ProtobufConverter$1.class


 --
 From: vkavul...@outlook.com
 To: user@hive.apache.org
 Subject: RE: difference between add jar in hive session and hive --auxpath
 Date: Thu, 7 Mar 2013 16:44:41 -0800


 If properly done, add jar <jar-file> should work the same as passing
 the jar with --auxpath. Can you run the list jars; command from CLI or Hue
 and check if you see the jar file.

 --
 From: java8...@hotmail.com
 To: user@hive.apache.org
 Subject: difference between add jar in hive session and hive --auxpath
 Date: Thu, 7 Mar 2013 17:47:26 -0500

  Hi,

 I have a hive table which uses the jar file provided from the
 elephant-bird, which is a framework integrated between lzo and google
 protobuf data and hadoop/hive.

 If I use the hive command like this:

 hive --auxpath path_to_jars, it works fine to query my table,

 but if I use the add jar after I started the hive session, I will get
 ClassNotFoundException in the runtime of my query of the classes in those
 jars.

 My questions are:

 1) What is the difference between hive --auxpath and add jar in the hive
 session?
 2) This problem 

A bug belongs to Hive or Elephant-bird

2013-03-08 Thread java8964 java8964

Hi, 
Hive 0.9.0 + Elephant-Bird 3.0.7
I faced a problem using elephant-bird with hive. I know what probably causes 
this problem, but I don't know which side this bug belongs to. Let me 
explain what the problem is.
If we define a google protobuf file with a field name like 'dateString' (the 
field contains an uppercase 'S'), then when I query the table like this: 
select dateString from table ...

I will get the following exception trace:

Caused by: java.lang.RuntimeException: cannot find field datestring from
[org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector$MyField@49aacd5f ...]
    at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:321)
    at org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector.getStructFieldRef(UnionStructObjectInspector.java:96)
    at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:57)
    at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:878)
    at org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:904)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:60)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433)
    at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389)
    at org.apache.hadoop.hive.ql.exec.FilterOperator.initializeOp(FilterOperator.java:73)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433)
    at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:133)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
    at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:444)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
    at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:98)

Here is the code for the method that throws this error:

  public static StructField getStandardStructFieldRef(String fieldName,
      List<? extends StructField> fields) {
    fieldName = fieldName.toLowerCase();
    for (int i = 0; i < fields.size(); i++) {
      if (fields.get(i).getFieldName().equals(fieldName)) {
        return fields.get(i);
      }
    }
    // For backward compatibility: fieldNames can also be integer Strings.
    try {
      int i = Integer.parseInt(fieldName);
      if (i >= 0 && i < fields.size()) {
        return fields.get(i);
      }
    } catch (NumberFormatException e) {
      // ignore
    }
    throw new RuntimeException("cannot find field " + fieldName + " from " + fields);
    // return null;
  }
I understand the problem happens because at this point the fieldName is 
"datestring" (all lowercase characters), but the List<? extends StructField> 
fields contains a field whose fieldName is "dateString", and that is why the 
RuntimeException is thrown.
But I don't know which side this bug belongs to, and I want to know more 
inside detail about the Hive implementation contract.
From this link: 
https://cwiki.apache.org/Hive/user-faq.html#UserFAQ-AreHiveQLidentifiers%2528e.g.tablenames%252Ccolumnnames%252Cetc%2529casesensitive%253F
I know that in hive, table names and column names should be case insensitive. 
So even though my query used "select dateString", the fieldName is changed 
to "datestring" in the code, while the StructField of the ObjectInspector from 
elephant-bird returns the EXACT fieldname defined in the code, "dateString" 
in this case. Of course, I can change my proto file to use only lowercase 
field names to bypass this bug, but my questions are:
1) If I implement my own ObjectInspector, should I pay attention to the field 
name? Does it need to be lowercase?
2) I would consider this a bug in hive, right? If this line:
fieldName = fieldName.toLowerCase();
lowercases the data, then the comparison should also lowercase before 
comparing, by changing
if (fields.get(i).getFieldName().equals(fieldName))
to 
if (fields.get(i).getFieldName().toLowerCase().equals(fieldName))
right?
Thanks
Yong
  

Re: Hive query started map task being killed during execution

2013-03-08 Thread Dean Wampler
Do you have more than one hive process running? It looks like you're using
Derby, which only supports one process at a time. Also, you have to start
Hive from the same directory every time, where the metastore database is
written, unless you edit the JDBC connection property in the Hive config
file to point to a particular path. Here's what I use:

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby:;databaseName=/Users/somedufus/hive/metastore_db;create=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>


On Fri, Mar 8, 2013 at 4:09 PM, Dileep Kumar dileepkumar...@gmail.com wrote:

 Hi All,

 I am running a hive query which does an insert into a table.
 From the symptoms it looks like it has to do with some settings, but I am
 not able to figure out which.

 When I submit the query it starts 2130 map tasks in the job; 150 of
 them complete fine without any error, then the next batch of 75 gets killed,
 and all tasks after that get killed as well.
 When I submit a similar query based on a smaller table, it starts only
 around 135 map tasks, runs to completion without any error, and does
 the insert into the appropriate table.

 I don't find any obvious error message in any of the task logs apart from
 this:


 ./hadoop-0.20-mapreduce/userlogs/job_201303080834_0001/attempt_201303080834_0001_m_001636_0/syslog:2013-03-08
 08:54:06,910 INFO org.apache.hadoop.hive.ql.exec.MapOperator:
 DESERIALIZE_ERRORS:0
 ./hadoop-0.20-mapreduce/userlogs/job_201303080834_0001/attempt_201303080834_0001_m_001646_0/syslog:2013-03-08
 08:41:06,060 INFO org.apache.hadoop.hive.ql.exec.MapOperator:
 DESERIALIZE_ERRORS:0
 ./hadoop-0.20-mapreduce/userlogs/job_201303080834_0001/attempt_201303080834_0001_m_001646_0/syslog:2013-03-08
 08:46:54,390 ERROR org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher:
 Error during instantiating JDBC driver org.apache.derby.jdbc.EmbeddedDriver.
 ./hadoop-0.20-mapreduce/userlogs/job_201303080834_0001/attempt_201303080834_0001_m_001646_0/syslog:2013-03-08
 08:46:54,394 ERROR org.apache.hadoop.hive.ql.exec.FileSinkOperator:
 StatsPublishing error: cannot connect to database

 Please suggest if I need to set anything in Hive when I invoke this query.
 The query that runs successfully involves far fewer rows than the one that
 fails.

 Thanks,
 DK




-- 
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330


Re: Hive query started map task being killed during execution

2013-03-08 Thread Dileep Kumar
Thanks for your attention !
No, only one hive process is running. The thing that bothers me is that the
smaller query, which I invoke the same way, runs to completion. It is using
the embedded db; if that is the problem I can change it to an external DB,
but since my smaller query runs fine I thought this should be OK.


On Fri, Mar 8, 2013 at 2:16 PM, Dean Wampler 
dean.wamp...@thinkbiganalytics.com wrote:

 Do you have more than one hive process running? It looks like you're using
 Derby, which only supports one process at a time. Also, you have to start
 Hive from the same directory every time, where the metastore database is
 written, unless you edit the JDBC connection property in the Hive config
 file to point to a particular path. Here's what I use:

  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=/Users/somedufus/hive/metastore_db;create=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>


  On Fri, Mar 8, 2013 at 4:09 PM, Dileep Kumar dileepkumar...@gmail.com wrote:

 Hi All,

  I am running a hive query which does an insert into a table.
  From the symptoms it looks like it has to do with some settings, but I am
  not able to figure out which.

  When I submit the query it starts 2130 map tasks in the job; 150 of
  them complete fine without any error, then the next batch of 75 gets killed,
  and all tasks after that get killed as well.
  When I submit a similar query based on a smaller table, it starts only
  around 135 map tasks, runs to completion without any error, and does
  the insert into the appropriate table.

  I don't find any obvious error message in any of the task logs apart
  from this:


 ./hadoop-0.20-mapreduce/userlogs/job_201303080834_0001/attempt_201303080834_0001_m_001636_0/syslog:2013-03-08
  08:54:06,910 INFO org.apache.hadoop.hive.ql.exec.MapOperator:
 DESERIALIZE_ERRORS:0
 ./hadoop-0.20-mapreduce/userlogs/job_201303080834_0001/attempt_201303080834_0001_m_001646_0/syslog:2013-03-08
  08:41:06,060 INFO org.apache.hadoop.hive.ql.exec.MapOperator:
 DESERIALIZE_ERRORS:0
 ./hadoop-0.20-mapreduce/userlogs/job_201303080834_0001/attempt_201303080834_0001_m_001646_0/syslog:2013-03-08
  08:46:54,390 ERROR org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher:
 Error during instantiating JDBC driver org.apache.derby.jdbc.EmbeddedDriver.
 ./hadoop-0.20-mapreduce/userlogs/job_201303080834_0001/attempt_201303080834_0001_m_001646_0/syslog:2013-03-08
  08:46:54,394 ERROR org.apache.hadoop.hive.ql.exec.FileSinkOperator:
 StatsPublishing error: cannot connect to database

  Please suggest if I need to set anything in Hive when I invoke this
  query. The query that runs successfully involves far fewer rows than the
  one that fails.

 Thanks,
 DK




 --
 *Dean Wampler, Ph.D.*
 thinkbiganalytics.com
 +1-312-339-1330




Re: Hive query started map task being killed during execution

2013-03-08 Thread Abdelrhman Shettia
Hi Dileep, 

Have you tried setting the following values in hive and running the query 
again? More info on why the query may fail is in the following link: 

https://cwiki.apache.org/Hive/statsdev.html


set hive.stats.autogather=false;
As well as: 

set hive.stats.dbclass=jdbc:derby;
set 
hive.stats.dbconnectionstring=jdbc:derby:;databaseName=TempStatsStore;create=true;
set hive.stats.jdbcdriver=org.apache.derby.jdbc.EmbeddedDriver;

Hope this helps. 


 

 Abdelrahman Shettia
ashet...@hortonworks.com


On Mar 8, 2013, at 2:31 PM, Dileep Kumar dileepkumar...@gmail.com wrote:

 Thanks for your attention !
 No, only one hive process is running. The thing that bothers me is that the 
 smaller query, which I invoke the same way, runs to completion. It is using 
 the embedded db; if that is the problem I can change it to an external DB, 
 but since my smaller query runs fine I thought this should be OK.
 
 
 On Fri, Mar 8, 2013 at 2:16 PM, Dean Wampler 
 dean.wamp...@thinkbiganalytics.com wrote:
 Do you have more than one hive process running? It looks like you're using 
 Derby, which only supports one process at a time. Also, you have to start 
 Hive from the same directory every time, where the metastore database is 
 written, unless you edit the JDBC connection property in the Hive config file 
 to point to a particular path. Here's what I use:
 
 <property>
   <name>javax.jdo.option.ConnectionURL</name>
   <value>jdbc:derby:;databaseName=/Users/somedufus/hive/metastore_db;create=true</value>
   <description>JDBC connect string for a JDBC metastore</description>
 </property>
 
 
 On Fri, Mar 8, 2013 at 4:09 PM, Dileep Kumar dileepkumar...@gmail.com wrote:
 Hi All,
 
 I am running a hive query which does an insert into a table.
 From the symptoms it looks like it has to do with some settings, but I am 
 not able to figure out which.
 
 When I submit the query it starts 2130 map tasks in the job; 150 of them 
 complete fine without any error, then the next batch of 75 gets killed, and 
 all tasks after that get killed as well.
 When I submit a similar query based on a smaller table, it starts only 
 around 135 map tasks, runs to completion without any error, and does the 
 insert into the appropriate table.
 
 I don't find any obvious error message in any of the task logs apart from 
 this:
 
 
 ./hadoop-0.20-mapreduce/userlogs/job_201303080834_0001/attempt_201303080834_0001_m_001636_0/syslog:2013-03-08
  08:54:06,910 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 
 DESERIALIZE_ERRORS:0
 ./hadoop-0.20-mapreduce/userlogs/job_201303080834_0001/attempt_201303080834_0001_m_001646_0/syslog:2013-03-08
  08:41:06,060 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 
 DESERIALIZE_ERRORS:0
 ./hadoop-0.20-mapreduce/userlogs/job_201303080834_0001/attempt_201303080834_0001_m_001646_0/syslog:2013-03-08
  08:46:54,390 ERROR org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher: 
 Error during instantiating JDBC driver org.apache.derby.jdbc.EmbeddedDriver.
 ./hadoop-0.20-mapreduce/userlogs/job_201303080834_0001/attempt_201303080834_0001_m_001646_0/syslog:2013-03-08
  08:46:54,394 ERROR org.apache.hadoop.hive.ql.exec.FileSinkOperator: 
 StatsPublishing error: cannot connect to database
 
 Please suggest if I need to set anything in Hive when I invoke this query. 
 The query that runs successfully involves far fewer rows than the one that 
 fails.
 
 Thanks,
 DK
 
 
 
 -- 
 Dean Wampler, Ph.D.
 thinkbiganalytics.com
 +1-312-339-1330
 
 



Re: why apache hive 0.10 document not found?

2013-03-08 Thread Ashutosh Chauhan
This is now fixed via HIVE-4074. Docs are now online again. Thanks, Gunther!

Ashutosh
On Tue, Mar 5, 2013 at 6:42 PM, 周梦想 abloz...@gmail.com wrote:

 Starting from version 0.8.0, the release documentation is not found.

 http://hive.apache.org/docs/r0.10.0/

 Not Found

 The requested URL /docs/r0.10.0/ was not found on this server.
 --
 Apache/2.4.4 (Unix) OpenSSL/1.0.0g Server at hive.apache.org Port 80