Re: Alter table is giving error
The RECOVER PARTITIONS clause is an enhancement Amazon added to its version of Hive. See http://docs.amazonwebservices.com/ElasticMapReduce/latest/DeveloperGuide/emr-hive-additional-features.html

<shameless-plug>Chapter 21 of Programming Hive discusses this feature and other aspects of using Hive in EMR.</shameless-plug>

dean

On Mon, Nov 5, 2012 at 5:34 AM, Chunky Gupta chunky.gu...@vizury.com wrote:

Hi,

I have a cluster set up on EC2 with Hadoop 0.20.2 and Hive 0.8.1 (I configured everything myself). I created a table using:

CREATE EXTERNAL TABLE XXX ( YYY )
PARTITIONED BY ( ZZZ )
ROW FORMAT DELIMITED FIELDS TERMINATED BY 'WWW'
LOCATION 's3://my-location/data/';

Now I am trying to recover partitions using:

ALTER TABLE XXX RECOVER PARTITIONS;

but I am getting this error:

FAILED: Parse Error: line 1:12 cannot recognize input near 'XXX' 'RECOVER' 'PARTITIONS' in alter table statement

Doing the same steps on a cluster set up on EMR with Hadoop 1.0.3 and Hive 0.8.1 (configured by EMR) works fine. So is this a version issue, or am I missing some configuration change in the EC2 setup? I have not been able to find a solution to this problem online.

Please help me.

Thanks,
Chunky.

--
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330
Re: Alter table is giving error
Hi Dean,

I originally had my Hadoop and Hive cluster on EMR, with S3 storage containing logs that update daily, partitioned by date (dt), and I was using RECOVER PARTITIONS there. Now I want to move to EC2 and run my own Hadoop and Hive cluster. What is the alternative to RECOVER PARTITIONS in this case, if you have any idea? One way I found is to add the partitions for all dates individually, which means writing a script to do so for every date. Is there any easier way than this?

Thanks,
Chunky

On Mon, Nov 5, 2012 at 6:28 PM, Dean Wampler dean.wamp...@thinkbiganalytics.com wrote:

[...]
Re: Alter table is giving error
Writing a script to add the external partitions individually is the only way I know of.

Sent from my rotary phone.

On Nov 5, 2012, at 8:19 AM, Chunky Gupta chunky.gu...@vizury.com wrote:

[...]
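Such a script can be short. Below is a minimal sketch that generates one ALTER TABLE ... ADD PARTITION statement per date; the table name, bucket, and dt=YYYY-MM-DD directory layout are hypothetical, so adjust them to the actual S3 layout:

```python
from datetime import date, timedelta

def add_partition_statements(table, start, end, location_root):
    """Generate one ALTER TABLE ... ADD PARTITION statement per date in [start, end]."""
    stmts = []
    d = start
    while d <= end:
        dt = d.strftime("%Y-%m-%d")
        stmts.append(
            "ALTER TABLE {t} ADD IF NOT EXISTS PARTITION (dt='{dt}') "
            "LOCATION '{root}/dt={dt}';".format(t=table, dt=dt, root=location_root)
        )
        d += timedelta(days=1)
    return stmts

# Print the statements; they could be piped into `hive -f` or executed per line.
for stmt in add_partition_statements("my_table", date(2012, 11, 1), date(2012, 11, 3),
                                     "s3://my-bucket/data"):
    print(stmt)
```

The generated statements can be saved to a file and run with `hive -f partitions.hql`, or executed one by one from a client.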
ClassNotFoundException when use hive java client of hive + hbase integration
Hi, all.

I have a hive+hbase integration cluster. When I try to execute a query through the Hive Java client, sometimes a ClassNotFoundException happens.

My Java code:

final Connection conn = DriverManager.getConnection(URL);
final Statement stmt = conn.createStatement();
final ResultSet rs = stmt.executeQuery("SELECT count(*) FROM test_table WHERE (source = '0' AND ur_createtime BETWEEN '2012103100' AND '20121031235959')");

I can execute the SQL (SELECT count(*) FROM test_table WHERE (source = '0' AND ur_createtime BETWEEN '2012103100' AND '20121031235959')) in Hive CLI mode and get the query result, so there is no error in my SQL.

The client-side exception:

Caused by: java.sql.SQLException: Query returned non-zero code: 9, cause: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
    at org.apache.hadoop.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:189)
    ... 23 more

The server-side exception (hadoop-jobtracker):

2012-11-05 18:55:39,443 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201210301133_0112_m_00_3: java.io.IOException: Cannot create an instance of InputSplit class = org.apache.hadoop.hive.hbase.HBaseSplit:org.apache.hadoop.hive.hbase.HBaseSplit
    at org.apache.hadoop.hive.ql.io.HiveInputFormat$HiveInputSplit.readFields(HiveInputFormat.java:146)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
    at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:396)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:412)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Unknown Source)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.hbase.HBaseSplit
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Unknown Source)
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:819)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat$HiveInputSplit.readFields(HiveInputFormat.java:143)
    ... 10 more

My hive-env.sh:

export HIVE_AUX_JARS_PATH=/data/install/hive-0.9.0/lib/hive-hbase-handler-0.9.0.jar,/data/install/hive-0.9.0/lib/hbase-0.92.0.jar,/data/install/hive-0.9.0/lib/zookeeper-3.4.2.jar

My hive-site.xml:

<property>
  <name>hive.zookeeper.quorum</name>
  <value>hadoop01,hadoop02,hadoop03</value>
  <description>The list of zookeeper servers to talk to. This is only needed for read/write locks.</description>
</property>

And I start the Thrift service as below:

hive --service hiveserver -p 1

The server-side error log says that HBaseSplit is not found. But why? How can I fix this?

--
Regards,
Cheng Su
Re: Alter table is giving error
Recover partitions should work the same way for different file systems.

Edward

On Mon, Nov 5, 2012 at 9:33 AM, Dean Wampler dean.wamp...@thinkbiganalytics.com wrote:

[...]
Hive compression with external table
Hi all,

I'm looking into finding a suitable format to store data in HDFS, so that it's available for processing by Hive. Ideally I would like to satisfy the following:

1. store the data in a format that is readable by multiple Hadoop projects (e.g. Pig, Mahout, etc.), not just Hive
2. work with a Hive external table
3. store data in a compressed format that is splittable

(1) is a requirement because Hive isn't appropriate for all the problems we want to throw at Hadoop. (2) is really more of a consequence of (1): ideally we want the data stored in some open, compressed format in HDFS, so we can just point Hive, Pig, Mahout, etc. at it depending on the problem. (3) is obviously so it plays well with Hadoop.

Gzip is no good because it is not splittable. Snappy looked promising, but it is splittable only if used with a non-external Hive table. LZO also looked promising, but I wonder whether it is future-proof given the licensing issues surrounding it.

So far, the only solution I could find that satisfies all the above seems to be bzip2 compression, but concerns about its performance make me wary of choosing it. Is bzip2 the only option I have? Or have I missed some other compression option?

Cheers,
Krishna
Re: Hive compression with external table
Compression is a confusing issue. Sequence files in block format are always splittable, regardless of which compression codec is chosen for the blocks. The Programming Hive book has an entire section dedicated to the permutations of compression options.

Edward

On Mon, Nov 5, 2012 at 10:57 AM, Krishna Rao krishnanj...@gmail.com wrote:

[...]
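To make that concrete, block-compressed sequence files can be produced from Hive itself. A minimal sketch, assuming a target table declared STORED AS SEQUENCEFILE (the table names here are hypothetical; the property names are the pre-YARN-era ones matching the Hadoop versions in this thread):

```sql
SET hive.exec.compress.output=true;
SET mapred.output.compression.type=BLOCK;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;

INSERT OVERWRITE TABLE my_seq_table
SELECT * FROM my_text_table;
```

Because sequence files are a plain Hadoop container format, the resulting data stays readable from Pig, Mahout, and MapReduce, not just Hive, which addresses requirement (1) as well.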
Re: Alter table is giving error
Chunky,

I have used the RECOVER PARTITIONS command on EMR, and it worked fine. However, take a look at https://issues.apache.org/jira/browse/HIVE-874. It seems the msck command in Apache Hive does the same thing. Try it out and let us know how it goes.

Mark

On Mon, Nov 5, 2012 at 7:56 AM, Edward Capriolo edlinuxg...@gmail.com wrote:

[...]
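For reference, the msck command mentioned above is run like this in Apache Hive (table name hypothetical; availability of the REPAIR form depends on the Hive version, so check the release you are running):

```sql
-- Report partitions present on the file system but missing from the metastore
MSCK TABLE my_table;

-- Add those partitions to the metastore (the Apache equivalent of
-- EMR's ALTER TABLE my_table RECOVER PARTITIONS)
MSCK REPAIR TABLE my_table;
```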
Hive 0.7.1 with MySQL 5.5 as metastore
I am working on copying existing Hive metadata (Hive 0.7.1 with MySQL 5.1) to a new cluster environment (Hive 0.7.1 with MySQL 5.5). I copied over the metastore tables and modified the data in the SDS (storage descriptors) table to reflect the new data path. However, I am getting a MySQL integrity constraint violation on the SDS.SD_ID column when trying to create new Hive tables.

Is this a problem with the MySQL version I am using? Does Hive 0.7.1 support MySQL 5.5 as the metastore?

Thanks,
Venkatesh
Re: Hive 0.7.1 with MySQL 5.5 as metastore
Moving underlying data files around is not the correct way to perform an upgrade. See https://dev.mysql.com/doc/refman/5.5/en/upgrading-from-previous-series.html

I would do a mysqldump and then re-insert the data for maximum compatibility.

On Mon, Nov 5, 2012 at 6:21 PM, Venkatesh Kavuluri vkavul...@outlook.com wrote:

[...]
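A sketch of that dump-and-restore approach (hostnames, user, and database name are hypothetical; substitute your actual metastore credentials):

```shell
# Dump the metastore database from the old MySQL 5.1 server
mysqldump --single-transaction -h old-host -u hive -p metastore > metastore.sql

# Load it into the new MySQL 5.5 server
mysql -h new-host -u hive -p metastore < metastore.sql
```

Any path rewrites (e.g. new warehouse locations in SDS.LOCATION) are then best done with UPDATE statements against the restored database rather than by editing table files.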
RE: Hive 0.7.1 with MySQL 5.5 as metastore
Sorry for the confusion, the problem is not with the MySQL version upgrade - I did perform the upgrade by doing a mysqldump and restoring the data. The problem is with how Hive 0.7.1 interacts with the same metastore data on a different version of MySQL server.

Date: Mon, 5 Nov 2012 18:31:37 -0500
Subject: Re: Hive 0.7.1 with MySQL 5.5 as metastore
From: edlinuxg...@gmail.com
To: user@hive.apache.org

[...]
Re: ClassNotFoundException when use hive java client of hive + hbase integration
Cheng,

You will have to add the appropriate HBase-related jars to your classpath. You can do that by running add jar command(s) or by putting them in aux_lib. See this thread for reference:
http://mail-archives.apache.org/mod_mbox/hive-user/201103.mbox/%3caanlktingqlgknqmizgoi+szfnexgcat8caqtovf8j...@mail.gmail.com%3E

Mark

On Mon, Nov 5, 2012 at 6:53 AM, Cheng Su scarcer...@gmail.com wrote:

[...]
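Concretely, the add jar approach looks like this; the paths below are the same three jars already listed in the hive-env.sh from the original message, and the commands are run once per session:

```sql
ADD JAR /data/install/hive-0.9.0/lib/hive-hbase-handler-0.9.0.jar;
ADD JAR /data/install/hive-0.9.0/lib/hbase-0.92.0.jar;
ADD JAR /data/install/hive-0.9.0/lib/zookeeper-3.4.2.jar;
```

Note that HIVE_AUX_JARS_PATH in hive-env.sh affects the process it is set for; a jar that is visible to the CLI may still be missing from the session of a Thrift client, which matches the symptom of the CLI working while JDBC queries fail.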
Re: Hive 0.7.1 with MySQL 5.5 as metastore
Venkatesh,

What's the exact integrity constraint error you are seeing? I'd be curious to see whether you would still get the error if you restored the data from the mysqldump onto a separate schema/db on a MySQL 5.1 server.

Mark

On Mon, Nov 5, 2012 at 3:37 PM, Venkatesh Kavuluri vkavul...@outlook.com wrote:

[...]
RE: Hive 0.7.1 with MySQL 5.5 as metastore
Hi Mark,

I just started restoring the data to a separate MySQL 5.1 schema; I will try to create a table and post back here. The error stack trace is below.

Nov 5 22:24:02 127.0.0.1/127.0.0.1 local3:[ETLManager] ERROR [pool-2-thread-1] exec.MoveTask - Failed with exception Insert of object org.apache.hadoop.hive.metastore.model.MStorageDescriptor@1db0454f using statement INSERT INTO `SDS` (`SD_ID`,`LOCATION`,`OUTPUT_FORMAT`,`IS_COMPRESSED`,`NUM_BUCKETS`,`INPUT_FORMAT`,`SERDE_ID`) VALUES (?,?,?,?,?,?,?) failed : Duplicate entry '5152711' for key 'PRIMARY'
javax.jdo.JDODataStoreException: Insert of object org.apache.hadoop.hive.metastore.model.MStorageDescriptor@1db0454f using statement INSERT INTO `SDS` (`SD_ID`,`LOCATION`,`OUTPUT_FORMAT`,`IS_COMPRESSED`,`NUM_BUCKETS`,`INPUT_FORMAT`,`SERDE_ID`) VALUES (?,?,?,?,?,?,?) failed : Duplicate entry '5152711' for key 'PRIMARY'
    at org.datanucleus.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:313)
    at org.datanucleus.jdo.JDOTransaction.commit(JDOTransaction.java:132)
    at org.apache.hadoop.hive.metastore.ObjectStore.commitTransaction(ObjectStore.java:315)
    at org.apache.hadoop.hive.metastore.HiveAlterHandler.alterTable(HiveAlterHandler.java:172)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler$29.run(HiveMetaStore.java:1687)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler$29.run(HiveMetaStore.java:1684)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.executeWithRetry(HiveMetaStore.java:307)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_table(HiveMetaStore.java:1684)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.alter_table(HiveMetaStoreClient.java:166)
    at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:354)
    at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1194)
    at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:197)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:131)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1063)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:900)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:748)

Thanks,
Venkatesh

Date: Mon, 5 Nov 2012 17:17:46 -0800
Subject: Re: Hive 0.7.1 with MySQL 5.5 as metastore
From: grover.markgro...@gmail.com
To: user@hive.apache.org

[...]
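One possibility worth ruling out (a guess, not a confirmed diagnosis): a duplicate primary key on SDS after a dump/restore can mean the metastore's ID-generation counter in SEQUENCE_TABLE fell behind the restored rows, so DataNucleus hands out an SD_ID that already exists. A quick check, assuming the standard metastore schema (the exact SEQUENCE_NAME values may differ by version, so list the table's rows first):

```sql
-- What sequence names does this metastore actually use?
SELECT SEQUENCE_NAME, NEXT_VAL FROM SEQUENCE_TABLE;

-- Compare the counter against the highest ID actually present
SELECT MAX(SD_ID) FROM SDS;
```

If NEXT_VAL for the storage-descriptor sequence is at or below MAX(SD_ID), bumping it above the maximum should stop the duplicate-key errors.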
hive integrate with hbase, map to existed hbase table report column family not exist
hi all:

I'm mapping to an existing HBase table, and I got the following error:

FAILED: Error in metadata: MetaException(message:Column Family data is not defined in hbase table df_money_files)
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask

My Hive QL is:

create external table hbase_money_files (rowkey string, user_no string, mon int, mon_sn int, group_no int, sn int, write_sect_no string, write_sn int, business_place_code string, power_no int, trans_group int, price_code string, ts_flag string, elec_type_code string, trade_type_code string, ms_mode string, user_ms_type string, write_power double, chg_power double, add_power double, kb_power double, share_power double, total_power double, total_money double, num_money double, add_money1 double, add_money2 double, add_money3 double, add_money4 double, add_money5 double, add_money6 double, add_money7 double, add_money8 double, add_money9 double, add_money10 double, rp_power double, rp_money double, should_money double, create_date string, creator string, warrant_no int, line_code string, trans_no string, add_taxflag string, write_date string, compute_date string, calculator_id string, status string, user_type1 string, rela_user_no string, part_sn int, have_ext string, id_fragment string, check_date string, check_man string, start_date string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,data:user_no, data:mon, data:mon_sn, data:group_no, data:sn, data:write_sect_no, data:write_sn, data:business_place_code, data:power_no, data:trans_group, data:price_code, data:ts_flag, data:elec_type_code, data:trade_type_code, data:ms_mode, data:user_ms_type, data:write_power, data:chg_power, data:add_power, data:kb_power, data:share_power, data:total_power, data:total_money, data:num_money, data:add_money1, data:add_money2, data:add_money3, data:add_money4, data:add_money5, data:add_money6, data:add_money7, data:add_money8, data:add_money9, data:add_money10, data:rp_power, data:rp_money, data:should_money, data:create_date, data:creator, data:warrant_no, data:line_code, data:trans_no, data:add_taxflag, data:write_date, data:compute_date, data:calculator_id, data:status, data:user_type1, data:rela_user_no, data:part_sn, data:have_ext, data:id_fragment, data:check_date, data:check_man, data:start_date")
TBLPROPERTIES ("hbase.table.name" = "df_money_files");

However, the data column family does exist! When I describe the table in the hbase shell, it reports:

hbase(main):001:0> describe 'df_money_files'
DESCRIPTION                                                            ENABLED
 {NAME => 'df_money_files', FAMILIES => [{NAME => 'data',              true
 BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3',
 COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647',
 BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
1 row(s) in 0.8470 seconds

I am confused now; can anyone give me some information?

Chris Gong
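One thing worth checking: in some Hive versions, hbase.columns.mapping is split on commas without trimming whitespace, so the space after each comma in the mapping above makes Hive look for a column family literally named " data" (with a leading space) instead of "data". A small sketch of that effect, using pure string handling (no Hive involved):

```python
def mapping_families(mapping):
    """Return the column-family names that a naive comma-split of an
    hbase.columns.mapping string produces, preserving any whitespace."""
    families = set()
    for entry in mapping.split(","):
        if entry == ":key":
            continue  # the row-key entry has no column family
        families.add(entry.split(":", 1)[0])
    return families

# With spaces after the commas, a second "family" with a leading space appears:
print(sorted(mapping_families(":key,data:user_no, data:mon")))
# Without spaces, only the real family remains:
print(sorted(mapping_families(":key,data:user_no,data:mon")))
```

If this is the cause, removing all spaces from the hbase.columns.mapping value should make the CREATE TABLE succeed.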
Re: ClassNotFoundException when use hive java client of hive + hbase integration
Mark, thank you so much for your suggestion. Although I've already add necessary jars to my hive aux path, thus I can execute my sql in hive CLI mode without getting any error. But when I use a java client to access the tables through the thrift service, I need to add these jars manually. I execute the ADD JAR .jar sql and the problem is solved! Thank you again! On Tue, Nov 6, 2012 at 9:03 AM, Mark Grover grover.markgro...@gmail.com wrote: Cheng, You will have to add the appropriate HBase related jars to your class path. You can do that by running add jar command(s) or put it in aux_lib. See this thread for reference: http://mail-archives.apache.org/mod_mbox/hive-user/201103.mbox/%3caanlktingqlgknqmizgoi+szfnexgcat8caqtovf8j...@mail.gmail.com%3E Mark On Mon, Nov 5, 2012 at 6:53 AM, Cheng Su scarcer...@gmail.com wrote: Hi, all. I have a hive+hbase integration cluster. When I try to execute query through the java client of hive, sometimes a ClassNotFoundException happens. My java code : final Connection conn = DriverManager.getConnection(URL); final ResultSet rs = conn.executeQuery(SELECT count(*) FROM test_table WHERE (source = '0' AND ur_createtime BETWEEN '2012103100' AND '20121031235959')); I can execute the sql:SELECT count(*) FROM test_table WHERE (source = '0' AND ur_createtime BETWEEN '2012103100' AND '20121031235959') in hive cli mode, and get the query result, so there is no error in my sql. The client side exception: Caused by: java.sql.SQLException: Query returned non-zero code: 9, cause: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask at org.apache.hadoop.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:189) ... 
23 more

The server-side exception (hadoop-jobtracker):

2012-11-05 18:55:39,443 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201210301133_0112_m_00_3:
java.io.IOException: Cannot create an instance of InputSplit class = org.apache.hadoop.hive.hbase.HBaseSplit:org.apache.hadoop.hive.hbase.HBaseSplit
        at org.apache.hadoop.hive.ql.io.HiveInputFormat$HiveInputSplit.readFields(HiveInputFormat.java:146)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
        at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:396)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:412)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Unknown Source)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
        at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.hbase.HBaseSplit
        at java.net.URLClassLoader$1.run(Unknown Source)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(Unknown Source)
        at java.lang.ClassLoader.loadClass(Unknown Source)
        at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
        at java.lang.ClassLoader.loadClass(Unknown Source)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Unknown Source)
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:819)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat$HiveInputSplit.readFields(HiveInputFormat.java:143)
        ...
10 more

My hive-env.sh:

export HIVE_AUX_JARS_PATH=/data/install/hive-0.9.0/lib/hive-hbase-handler-0.9.0.jar,/data/install/hive-0.9.0/lib/hbase-0.92.0.jar,/data/install/hive-0.9.0/lib/zookeeper-3.4.2.jar

My hive-site.xml:

<property>
  <name>hive.zookeeper.quorum</name>
  <value>hadoop01,hadoop02,hadoop03</value>
  <description>The list of zookeeper servers to talk to. This is only needed for read/write locks.</description>
</property>

And I start the Thrift service as below:

hive --service hiveserver -p 1

The server-side error log says that HBaseSplit is not found. But why? How can I fix this?

--
Regards,
Cheng Su
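As Cheng's follow-up at the top of the thread notes, the session-level fix is to register the jars through the same client connection before running the query, so they are shipped with the MapReduce job. A sketch, reusing the jar paths from the HIVE_AUX_JARS_PATH above (the exact jar set needed depends on the installation):

```sql
-- Run these in the same Hive session (e.g. via Statement.execute from the
-- JDBC client) before the SELECT; the jars then travel with the MR tasks,
-- so the task JVMs can load org.apache.hadoop.hive.hbase.HBaseSplit.
ADD JAR /data/install/hive-0.9.0/lib/hive-hbase-handler-0.9.0.jar;
ADD JAR /data/install/hive-0.9.0/lib/hbase-0.92.0.jar;
ADD JAR /data/install/hive-0.9.0/lib/zookeeper-3.4.2.jar;
```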
Which is the postgres version is compatible for hive-trunk..?
Hi Guys, I am planning to use postgres as the metastore with Hive. Can anyone point me to which postgres version is compatible with Hive? Regards Rohith Sharma K S
Re: Which is the postgres version is compatible for hive-trunk..?
Hey Rohith, the last time I used Postgres it was with postgresql-8.4-701.jdbc4.jar, and it was working great. I'd guess all 8.x versions should work; I personally wouldn't choose Postgres 9.x. best, Alex On Nov 6, 2012, at 5:29 AM, rohithsharma rohithsharm...@huawei.com wrote: Hi Guys, I am planning to use postgres as metastore with Hive. Can anyone point me which is the postgres version is compatible with Hive..? Regards Rohith Sharma K S -- Alexander Alten-Lorenz http://mapredit.blogspot.com German Hadoop LinkedIn Group: http://goo.gl/N8pCF
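For anyone setting this up, the metastore connection is configured in hive-site.xml. A hedged sketch for a Postgres-backed metastore follows; the host, database name, and credentials are placeholders, and the Postgres JDBC driver jar (e.g. the postgresql-8.4-701.jdbc4.jar Alex mentions) must be on Hive's classpath:

```xml
<!-- Placeholder values: replace metastore-host, hive_metastore, and the
     credentials with your own. -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:postgresql://metastore-host:5432/hive_metastore</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>org.postgresql.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hive_password</value>
</property>
```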