Re: java.lang.NoClassDefFoundError: com/jayway/jsonpath/PathUtil
I have added the jar files successfully like this:

hive (testdb)> ADD JAR lib/hive-json-serde-0.3.jar;
Added lib/hive-json-serde-0.3.jar to class path
Added resource: lib/hive-json-serde-0.3.jar
hive (testdb)> ADD JAR lib/json-path-0.5.4.jar;
Added lib/json-path-0.5.4.jar to class path
Added resource: lib/json-path-0.5.4.jar
hive (testdb)> ADD JAR lib/json-smart-1.0.6.3.jar;
Added lib/json-smart-1.0.6.3.jar to class path
Added resource: lib/json-smart-1.0.6.3.jar

After this I am getting this error:

CREATE EXTERNAL TABLE IF NOT EXISTS twitter (
  tweet_id BIGINT,
  created_at STRING,
  text STRING,
  user_id BIGINT,
  user_screen_name STRING,
  user_lang STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde'
WITH SERDEPROPERTIES (
  "tweet_id"="$.id",
  "created_at"="$.created_at",
  "text"="$.text",
  "user_id"="$.user.id",
  "user_screen_name"="$.user.screen_name",
  "user_lang"="$.user.lang")
LOCATION '/home/satish/data/twitter/input';

java.lang.NoClassDefFoundError: com/jayway/jsonpath/PathUtil
    at org.apache.hadoop.hive.contrib.serde2.JsonSerde.initialize(Unknown Source)
    at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:207)
    at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:266)
    at org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:259)
    at org.apache.hadoop.hive.ql.metadata.Table.getCols(Table.java:585)
    at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:550)
    at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:3698)
    at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:253)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1336)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1122)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:935)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:755)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.lang.ClassNotFoundException: com.jayway.jsonpath.PathUtil
    at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
    ... 23 more
FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.DDLTask

Any help would be really appreciated.

Thanks
Sai
Re: Hive sample test
If any of the 100 rows that the sub-query returns do not satisfy the where clause, there would be no rows in the overall result. Do we still consider the Hive query verified in this case? Regards, Ramki.

On Wed, Mar 6, 2013 at 1:14 AM, Dean Wampler dean.wamp...@thinkbiganalytics.com wrote: Nice, yeah, that would do it.

On Tue, Mar 5, 2013 at 1:26 PM, Mark Grover grover.markgro...@gmail.com wrote: I typically change my query to read from a limited version of the whole table. Change

select really_expensive_select_clause
from really_big_table
where something=something
group by something=something

to

select really_expensive_select_clause
from (
  select * from really_big_table limit 100
) t
where something=something
group by something=something

On Tue, Mar 5, 2013 at 10:57 AM, Dean Wampler dean.wamp...@thinkbiganalytics.com wrote: Unfortunately, it will still go through the whole thing, then just limit the output. However, there's a flag that I think only works in more recent Hive releases:

set hive.limit.optimize.enable=true;

This is supposed to apply limiting earlier in the data stream, so it will give different results than limiting just the output. Like Chuck said, you might consider sampling, but unless your table is organized into buckets, you'll at least scan the whole table, though maybe not do all the computation over it. Also, if you have a small sample data set:

set hive.exec.mode.local.auto=true;

will cause Hive to bypass the Job and Task Trackers, calling APIs directly, when it can do the whole thing in a single process. Not lightning fast, but faster. dean

On Tue, Mar 5, 2013 at 12:48 PM, Joey D'Antoni jdant...@yahoo.com wrote: Just add a limit 1 to the end of your query.

On Mar 5, 2013, at 1:45 PM, Kyle B kbi...@gmail.com wrote: Hello, I was wondering if there is a way to quick-verify a Hive query before it is run against a big dataset? The tables I am querying against have millions of records, and I'd like to verify my Hive query before I run it against all records. Is there a way to test the query against a small subset of the data, without going into full MapReduce? As silly as this sounds, is there a way to MapReduce without the overhead of MapReduce? That way I can check my query is doing what I want before I run it against all records. Thanks, -Kyle

-- Dean Wampler, Ph.D. thinkbiganalytics.com +1-312-339-1330
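[Editor's sketch of Dean's sampling suggestion, in HiveQL, using the placeholder names from Mark's example. The bucket count is made up, and bucket pruning only avoids a full scan if really_big_table is actually CLUSTERED BY the sampled column; with ON rand() it samples any table but still reads every split.]

-- Sample roughly 1/32 of the table instead of limiting the output:
select really_expensive_select_clause
from really_big_table tablesample(bucket 1 out of 32 on rand()) s
where something = something
group by something;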
Re: Accessing sub column in hive
Hi Sai

You can do it as:

Select address.country from employees;

Regards
Bejoy KS

Sent from remote device, Please excuse typos

-----Original Message-----
From: Bennie Schut bsc...@ebuddy.com
Date: Fri, 8 Mar 2013 09:09:49
To: user@hive.apache.org; 'Sai Sai' saigr...@yahoo.in
Reply-To: user@hive.apache.org
Subject: RE: Accessing sub column in hive

Perhaps worth posting the error. Some might know what the error means. Also, a bit unrelated to Hive, but please do yourself a favor and don't use float to store monetary values like salary. You will get rounding issues at some point when you do arithmetic on them. Considering you are using Hadoop, you probably have a lot of data, so adding it all up will get you there really, really fast. http://stackoverflow.com/questions/3730019/why-not-use-double-or-float-to-represent-currency

From: Sai Sai [mailto:saigr...@yahoo.in]
Sent: Thursday, March 07, 2013 12:54 PM
To: user@hive.apache.org
Subject: Re: Accessing sub column in hive

I have a table created like this successfully:

CREATE TABLE IF NOT EXISTS employees (
  name STRING,
  salary FLOAT,
  subordinates ARRAY<STRING>,
  deductions MAP<STRING,FLOAT>,
  address STRUCT<street:STRING, city:STRING, state:STRING, zip:INT, country:STRING>);

I would like to access/display the country column from my address struct. I have tried this:

select address[country] from employees;

I get an error. Please help.

Thanks
Sai
Re: Accessing sub column in hive
I recognize this example ;) You reference struct elements with the dot notation, as Bejoy said; map elements with what you tried, e.g., deductions['Federal taxes']; and array elements by index, starting from zero, e.g., subordinates[0].

On Fri, Mar 8, 2013 at 6:35 AM, bejoy...@yahoo.com wrote: Hi Sai You can do it as Select address.country from employees; Regards Bejoy KS ...

-- *Dean Wampler, Ph.D.* thinkbiganalytics.com +1-312-339-1330
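[Editor's sketch putting the three access patterns together against the employees table from this thread; 'Federal taxes' stands in for whatever key the deductions map actually holds.]

select address.country,              -- struct field: dot notation
       deductions['Federal taxes'],  -- map value: bracket notation with the key
       subordinates[0]               -- array element: zero-based index
from employees;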
Re: java.lang.NoClassDefFoundError: com/jayway/jsonpath/PathUtil
Unfortunately, you also have to add the JSON jars to Hive's class path before it starts, e.g.:

env HADOOP_CLASSPATH=/path/to/lib/*.jar hive

Use the appropriate path to your lib directory.

On Fri, Mar 8, 2013 at 4:53 AM, Sai Sai saigr...@yahoo.in wrote: I have added the jar files successfully like this: hive (testdb)> ADD JAR lib/hive-json-serde-0.3.jar; ... After this I am getting this error: CREATE EXTERNAL TABLE IF NOT EXISTS twitter ... java.lang.NoClassDefFoundError: com/jayway/jsonpath/PathUtil ...

-- *Dean Wampler, Ph.D.* thinkbiganalytics.com +1-312-339-1330
Re: Find current db we r using in Hive
It's odd that there is no such command. The trick Ramki mentioned is the only one I know of. Two points about it, though:

1. It only works on Hive v0.8+.
2. I've seen a few cases where the prompt did NOT change when first used, but started working a little later! I have no idea why and, of course, it happened while teaching a class where I'm supposed to be the expert ;)

dean

On Fri, Mar 8, 2013 at 12:36 AM, Ramki Palle ramki.pa...@gmail.com wrote: Sai, I do not think there is any command to show the current db in Hive. One alternative for you is to set a property so that the current database is shown as part of the prompt:

set hive.cli.print.current.db=true;

This one shows your current db as part of your hive prompt. Regards, Ramki.

On Fri, Mar 8, 2013 at 11:13 AM, Sai Sai saigr...@yahoo.in wrote: Just wondering if there is any command in Hive which will show us the current db we are using, similar to pwd in Unix. Thanks Sai

-- *Dean Wampler, Ph.D.* thinkbiganalytics.com +1-312-339-1330
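[Editor's sketch of what Ramki's trick looks like in a CLI session; the database name is illustrative.]

hive> set hive.cli.print.current.db=true;
hive (default)> use testdb;
OK
hive (testdb)>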
RE: Data mismatch when importing data from Oracle to Hive through Sqoop without an error
Hi Venkat,

Almost all columns have some value, except these three.

Regards,
Ajit

-----Original Message-----
From: Venkat Ranganathan [mailto:vranganat...@hortonworks.com]
Sent: Wednesday, March 06, 2013 9:36 PM
To: user@hive.apache.org
Cc: u...@sqoop.apache.org
Subject: Re: Data mismatch when importing data from Oracle to Hive through Sqoop without an error

Hi Ajit

Do you know if the rest of the columns are also null when the three non-null columns are null?

Venkat

On Wed, Mar 6, 2013 at 12:35 AM, Ajit Kumar Shreevastava ajit.shreevast...@hcl.com wrote: Hi Abhijeet, Thanks for your response. If values that don't fit in a double were getting inserted as NULL, then the counts should not mismatch in the two cases. Here the rows with null values are extra rows, on top of the other values that are already present in both the Oracle table and the Hive table. Correct me if I am wrong in my interpretation. Thanks and Regards, Ajit Kumar Shreevastava

From: abhijeet gaikwad [mailto:abygaikwa...@gmail.com]
Sent: Wednesday, March 06, 2013 1:46 PM
To: user@hive.apache.org
Cc: u...@sqoop.apache.org
Subject: Re: Data mismatch when importing data from Oracle to Hive through Sqoop without an error

Sqoop maps numeric and decimal types (RDBMS) to double (Hive). I think the values that don't fit in a double must be getting inserted as NULL. You can see this warning in your logs. Thanks, Abhijeet

On Wed, Mar 6, 2013 at 1:32 PM, Ajit Kumar Shreevastava ajit.shreevast...@hcl.com wrote: Hi all, I have noticed one interesting thing in the result set below. I fired one query in both the Oracle and Hive shells and got the following result sets:

SQL> select count(1) from bttn
  2  where bttn_id is null or data_inst_id is null or scr_id is null;

  COUNT(1)
----------
         0

hive> select count(1) from bttn where bttn_id is null or data_inst_id is null or scr_id is null;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201303051835_0020, Tracking URL = http://NHCLT-PC44-2:50030/jobdetails.jsp?jobid=job_201303051835_0020
Kill Command = /home/hadoop/hadoop-1.0.3/bin/hadoop job -kill job_201303051835_0020
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2013-03-06 13:22:56,908 Stage-1 map = 0%, reduce = 0%
2013-03-06 13:23:05,928 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 5.2 sec
2013-03-06 13:23:06,931 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 5.2 sec
2013-03-06 13:23:07,934 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 5.2 sec
2013-03-06 13:23:08,938 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 5.2 sec
2013-03-06 13:23:09,941 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 5.2 sec
2013-03-06 13:23:10,944 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 5.2 sec
2013-03-06 13:23:11,947 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 5.2 sec
2013-03-06 13:23:12,956 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 5.2 sec
2013-03-06 13:23:13,959 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 5.2 sec
2013-03-06 13:23:14,962 Stage-1 map = 100%, reduce = 33%, Cumulative CPU 5.2 sec
2013-03-06 13:23:15,965 Stage-1 map = 100%, reduce = 33%, Cumulative CPU 5.2 sec
2013-03-06 13:23:16,969 Stage-1 map = 100%, reduce = 33%, Cumulative CPU 5.2 sec
2013-03-06 13:23:17,974 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 6.95 sec
2013-03-06 13:23:18,977 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 6.95 sec
2013-03-06 13:23:19,981 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 6.95 sec
2013-03-06 13:23:20,985 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 6.95 sec
2013-03-06 13:23:21,988 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 6.95 sec
2013-03-06 13:23:22,995 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 6.95 sec
2013-03-06 13:23:23,998 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 6.95 sec
MapReduce Total cumulative CPU time: 6 seconds 950 msec
Ended Job = job_201303051835_0020
MapReduce Jobs Launched:
Job 0: Map: 1 Reduce: 1 Cumulative CPU: 6.95 sec HDFS Read: 184270926 HDFS Write: 4 SUCCESS
Total MapReduce CPU Time Spent: 6 seconds 950 msec
OK
986
Time taken: 35.983 seconds
hive>

and 739169 - 738183 = 986.

Can anyone tell me why this happened, given that BTTN_ID, DATA_INST_ID, and SCR_ID carry NOT NULL constraints in the BTTN table and also form its composite primary key? Also, how can I prevent this unnecessary data generation in the Hive table?

Regards,
Ajit Kumar Shreevastava
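[Editor's sketch: to pin down what was lost, one could pull a few of the 986 affected rows on the Hive side and match them against the Oracle source; per Abhijeet's diagnosis these are likely the rows whose Oracle NUMBER values did not fit in Hive's double. Column names are the ones from the thread.]

select *
from bttn
where bttn_id is null
   or data_inst_id is null
   or scr_id is null
limit 10;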
RE: difference between add jar in hive session and hive --auxpath
This is in HIVE-0.9.0:

hive> list jars;
/nfs_home/common/userlibs/google-collections-1.0.jar
/nfs_home/common/userlibs/elephant-bird-hive-3.0.7.jar
/nfs_home/common/userlibs/protobuf-java-2.3.0.jar
/nfs_home/common/userlibs/elephant-bird-core-3.0.7.jar
file:/usr/lib/hive/lib/hive-builtins-0.9.0-cdh4.1.2.jar

hive> desc table;
java.lang.NoClassDefFoundError: com/twitter/elephantbird/mapreduce/io/ProtobufConverter
    at com.twitter.elephantbird.hive.serde.ProtobufDeserializer.initialize(ProtobufDeserializer.java:45)
    at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:203)
    at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:260)
    at org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:253)
    at org.apache.hadoop.hive.ql.metadata.Table.getCols(Table.java:490)
    at org.apache.hadoop.hive.ql.metadata.Table.checkValidity(Table.java:162)
    at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:930)
    at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:844)
    at org.apache.hadoop.hive.ql.exec.DDLTask.describeTable(DDLTask.java:2545)
    at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:309)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1331)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1117)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:950)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:215)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:744)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:607)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: java.lang.ClassNotFoundException: com.twitter.elephantbird.mapreduce.io.ProtobufConverter
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
    ... 25 more
FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.DDLTask

hive> exit;
[y130zhan@daca2 userlibs]$ jar tvf /nfs_home/common/userlibs/elephant-bird-core-3.0.7.jar | grep ProtobufConverter
  4825 Mon Mar 04 16:50:46 UTC 2013 com/twitter/elephantbird/mapreduce/io/ProtobufConverter.class
   732 Mon Mar 04 16:50:46 UTC 2013 com/twitter/elephantbird/mapreduce/io/ProtobufConverter$1.class

From: vkavul...@outlook.com
To: user@hive.apache.org
Subject: RE: difference between add jar in hive session and hive --auxpath
Date: Thu, 7 Mar 2013 16:44:41 -0800

If properly done, add jar <jar-file> should work the same as passing the jar with --auxpath. Can you run the list jars; command from the CLI or Hue and check if you see the jar file?

From: java8...@hotmail.com
To: user@hive.apache.org
Subject: difference between add jar in hive session and hive --auxpath
Date: Thu, 7 Mar 2013 17:47:26 -0500

Hi,

I have a Hive table which uses the jar files provided by elephant-bird, a framework that integrates LZO- and Google-protobuf-encoded data with Hadoop/Hive. If I start Hive like this: hive --auxpath path_to_jars, it works fine to query my table, but if I use add jar after I start the Hive session, I get a ClassNotFoundException at the runtime of my query for the classes in those jars.

My questions are:

1) What is the difference between hive --auxpath and add jar in the Hive session?
2) This problem makes it hard to access my table in HUE, as it only supports add jar, but not the --auxpath option. Any suggestions?

Thanks
Yong
Re: difference between add jar in hive session and hive --auxpath
--auxpath adds more jars to Hive's classpath before invoking Hive. ADD JARS copies jars around the cluster and adds them to the task classpath, so the jars you add aren't visible to Hive itself. Annoying, but...

On Fri, Mar 8, 2013 at 11:53 AM, java8964 java8964 java8...@hotmail.com wrote: This is in HIVE-0.9.0: hive> list jars; /nfs_home/common/userlibs/google-collections-1.0.jar ... java.lang.NoClassDefFoundError: com/twitter/elephantbird/mapreduce/io/ProtobufConverter ... FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.DDLTask ...

-- *Dean Wampler, Ph.D.* thinkbiganalytics.com +1-312-339-1330
Re: difference between add jar in hive session and hive --auxpath
Essentially, anything that is part of the InputFormat needs to be in auxlib/auxpath. Anything that is part of a UDF can be added with 'add jar'.

On Fri, Mar 8, 2013 at 1:01 PM, Dean Wampler dean.wamp...@thinkbiganalytics.com wrote: --auxpath adds more jars to Hive's classpath before invoking Hive. ADD JARS copies jars around the cluster and adds them to the task classpath, so the jars you add aren't visible to Hive itself. Annoying, but...
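[Editor's sketch of that rule of thumb; the jar path and class name are hypothetical, and the elephant-bird jar is the one from this thread.]

-- A UDF jar can be added from inside the session:
ADD JAR /tmp/my-udfs.jar;
CREATE TEMPORARY FUNCTION my_udf AS 'com.example.hive.MyUdf';

-- A SerDe/InputFormat jar must be on Hive's own classpath at startup, e.g. started as:
--   hive --auxpath /nfs_home/common/userlibs/elephant-bird-hive-3.0.7.jar
-- after which DESC and SELECT on the protobuf-backed table can initialize the SerDe.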
A bug belongs to Hive or Elephant-bird
Hi,

Hive 0.9.0 + Elephant-Bird 3.0.7.

I ran into a problem using elephant-bird with Hive. I know what probably causes this problem, but I don't know which side the bug belongs to. Let me explain the problem. If we define a Google protobuf file with a field name like 'dateString' (the field contains an uppercase 'S'), then when I query the table like this: select dateString from table, I get the following exception trace:

Caused by: java.lang.RuntimeException: cannot find field datestring from [org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector$MyField@49aacd5f ...]
    at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:321)
    at org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector.getStructFieldRef(UnionStructObjectInspector.java:96)
    at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:57)
    at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:878)
    at org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:904)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:60)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433)
    at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389)
    at org.apache.hadoop.hive.ql.exec.FilterOperator.initializeOp(FilterOperator.java:73)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433)
    at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:133)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
    at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:444)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
    at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:98)

Here is the code for the method that throws this error:

public static StructField getStandardStructFieldRef(String fieldName,
    List<? extends StructField> fields) {
  fieldName = fieldName.toLowerCase();
  for (int i = 0; i < fields.size(); i++) {
    if (fields.get(i).getFieldName().equals(fieldName)) {
      return fields.get(i);
    }
  }
  // For backward compatibility: fieldNames can also be integer Strings.
  try {
    int i = Integer.parseInt(fieldName);
    if (i >= 0 && i < fields.size()) {
      return fields.get(i);
    }
  } catch (NumberFormatException e) {
    // ignore
  }
  throw new RuntimeException("cannot find field " + fieldName + " from " + fields);
  // return null;
}

I understand why the problem happens: at this point the fieldName is datestring (all lowercase characters), but in the fields list the fieldName for that field is dateString, and that is why the RuntimeException happens. But I don't know which side this bug belongs to; or rather, I want to know more about the Hive implementation contract.

From this link: https://cwiki.apache.org/Hive/user-faq.html#UserFAQ-AreHiveQLidentifiers%2528e.g.tablenames%252Ccolumnnames%252Cetc%2529casesensitive%253F

I know that in Hive, table names and column names should be case insensitive, so even though my query used select dateString, the fieldName was changed to datestring in the code; but the StructField of the ObjectInspector from elephant-bird returns EXACTLY the field name defined in the proto file, dateString in this case. Of course, I can change my proto file to use only lowercase field names to bypass this bug, but my questions are:

1) If I implement my own ObjectInspector, should I pay attention to the field names? Do they need to be lowercase?

2) I would consider this a bug in Hive, right? If this line:

fieldName = fieldName.toLowerCase();

lowercases the lookup name, then the comparison should also lowercase the other side, changing

if (fields.get(i).getFieldName().equals(fieldName))

to

if (fields.get(i).getFieldName().toLowerCase().equals(fieldName))

right?

Thanks
Yong
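[Editor's note, making the contract Yong cites concrete: HiveQL identifiers are case-insensitive, so both of the following statements reach getStandardStructFieldRef with the lowercased name "datestring" and fail identically against a field registered as 'dateString'. my_table is a placeholder for his protobuf-backed table.]

select dateString from my_table;  -- Hive lowercases the identifier before the lookup,
select datestring from my_table;  -- so both queries perform the same (failing) lookup.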
Re: Hive query started map task being killed during execution
Do you have more than one Hive process running? It looks like you're using Derby, which only supports one process at a time. Also, you have to start Hive from the same directory every time, where the metastore database is written, unless you edit the JDBC connection property in the Hive config file to point to a particular path. Here's what I use:

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby:;databaseName=/Users/somedufus/hive/metastore_db;create=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>

On Fri, Mar 8, 2013 at 4:09 PM, Dileep Kumar dileepkumar...@gmail.com wrote: Hi All, I am running a Hive query which does an insert into a table. From the symptoms, it looks like it has to do with some settings, but I am not able to figure out which. When I submit the query, it starts 2130 map tasks in the job; 150 of them complete fine without any error, then the next batch of 75 gets killed, and all tasks after that get killed. When I submit a similar query based on a smaller table, it starts only around 135 map tasks, runs to completion without any error, and does the insert into the appropriate table. I don't find any obvious error messages in any of the task logs apart from this:

./hadoop-0.20-mapreduce/userlogs/job_201303080834_0001/attempt_201303080834_0001_m_001636_0/syslog:2013-03-08 08:54:06,910 INFO org.apache.hadoop.hive.ql.exec.MapOperator: DESERIALIZE_ERRORS:0
./hadoop-0.20-mapreduce/userlogs/job_201303080834_0001/attempt_201303080834_0001_m_001646_0/syslog:2013-03-08 08:41:06,060 INFO org.apache.hadoop.hive.ql.exec.MapOperator: DESERIALIZE_ERRORS:0
./hadoop-0.20-mapreduce/userlogs/job_201303080834_0001/attempt_201303080834_0001_m_001646_0/syslog:2013-03-08 08:46:54,390 ERROR org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher: Error during instantiating JDBC driver org.apache.derby.jdbc.EmbeddedDriver.
./hadoop-0.20-mapreduce/userlogs/job_201303080834_0001/attempt_201303080834_0001_m_001646_0/syslog:2013-03-08 08:46:54,394 ERROR org.apache.hadoop.hive.ql.exec.FileSinkOperator: StatsPublishing error: cannot connect to database

Please suggest anything I need to set in Hive when I invoke this query. The query that runs successfully has far fewer rows than the one that fails. Thanks, DK

-- *Dean Wampler, Ph.D.* thinkbiganalytics.com +1-312-339-1330
Re: Hive query started map task being killed during execution
Thanks for your attention! No, only one Hive process is running, and the thing that bothers me is that the smaller query, which I invoke the same way, runs to completion. It is using the embedded DB; if that is the problem I can change it to an external DB, but since my smaller query runs fine I thought this should be OK.

On Fri, Mar 8, 2013 at 2:16 PM, Dean Wampler dean.wamp...@thinkbiganalytics.com wrote: Do you have more than one Hive process running? It looks like you're using Derby, which only supports one process at a time. ...
Re: Hive query started map task being killed during execution
Hi Dileep,

Have you tried setting the following values in Hive and running the query again? More info on why the query may fail is in the following link: https://cwiki.apache.org/Hive/statsdev.html

set hive.stats.autogather=false;

as well as:

set hive.stats.dbclass=jdbc:derby;
set hive.stats.dbconnectionstring="jdbc:derby:;databaseName=TempStatsStore;create=true";
set hive.stats.jdbcdriver=org.apache.derby.jdbc.EmbeddedDriver;

Hope this helps.

Abdelrahman Shettia ashet...@hortonworks.com

On Mar 8, 2013, at 2:31 PM, Dileep Kumar dileepkumar...@gmail.com wrote: Thanks for your attention! No, only one Hive process is running... ...
Re: why apache hive 0.10 document not found?
This is now fixed via HIVE-4074. Docs are now online again. Thanks, Gunther!

Ashutosh

On Tue, Mar 5, 2013 at 6:42 PM, 周梦想 abloz...@gmail.com wrote: From version 0.8.0 on, the release documentation is not found. http://hive.apache.org/docs/r0.10.0/ returns:

Not Found
The requested URL /docs/r0.10.0/ was not found on this server.
Apache/2.4.4 (Unix) OpenSSL/1.0.0g Server at hive.apache.org Port 80