RE: Which postgres version is compatible with hive-trunk?
Hi Lorenz,

Thanks for your fast reply :-)

> Postgres 9.x I personally wouldn't choose.

Can you please tell me what the problem is with using a 9.x version with Hive? I tried 9.0.7; it integrated successfully and the functionality was fine too. But when I changed to 9.1, a basic SHOW TABLES in Hive failed. I was wondering what the major changes are between 9.0.7 and 9.1.4.

Regards
Rohith Sharma K S

-----Original Message-----
From: Alexander Lorenz [mailto:wget.n...@gmail.com]
Sent: Tuesday, November 06, 2012 12:40 PM
To: user@hive.apache.org; rohithsharm...@huawei.com
Subject: Re: Which postgres version is compatible with hive-trunk?

Hey Rohith,

last time I used psql it was with postgresql-8.4-701.jdbc4.jar, and it was working great. But I guess all 8.x versions should work. Postgres 9.x I personally wouldn't choose.

best,
Alex

On Nov 6, 2012, at 5:29 AM, rohithsharma <rohithsharm...@huawei.com> wrote:

> Hi Guys,
> I am planning to use postgres as the metastore with Hive. Can anyone point me to which postgres version is compatible with Hive?
> Regards
> Rohith Sharma K S

--
Alexander Alten-Lorenz
http://mapredit.blogspot.com
German Hadoop LinkedIn Group: http://goo.gl/N8pCF
Re: Hive compression with external table
Thanks for the reply. Sequence files with compression might work. However, it's not clear to me whether it's possible to read sequence files using an external table.

On 5 November 2012 16:04, Edward Capriolo <edlinuxg...@gmail.com> wrote:

> Compression is a confusing issue. Sequence files that are in block format are always splittable, regardless of which compression codec is chosen for the blocks. The Programming Hive book has an entire section dedicated to the permutations of compression options.
> Edward

On Mon, Nov 5, 2012 at 10:57 AM, Krishna Rao <krishnanj...@gmail.com> wrote:

> Hi all,
>
> I'm looking into finding a suitable format to store data in HDFS, so that it's available for processing by Hive. Ideally I would like to satisfy the following:
>
> 1. store the data in a format that is readable by multiple Hadoop projects (e.g. Pig, Mahout, etc.), not just Hive
> 2. work with a Hive external table
> 3. store data in a compressed format that is splittable
>
> (1) is a requirement because Hive isn't appropriate for all the problems that we want to throw at Hadoop. (2) is really more of a consequence of (1): ideally we want the data stored in some open format that is compressed in HDFS, so we can just point Hive, Pig, Mahout, etc. at it depending on the problem. (3) is obviously so it plays well with Hadoop. Gzip is no good because it is not splittable. Snappy looked promising, but it is splittable only if used with a non-external Hive table. LZO also looked promising, but I wonder whether it is future-proof given the licensing issues surrounding it.
>
> So far, the only solution I could find that satisfies all of the above seems to be bzip2 compression, but concerns about its performance make me wary of choosing it. Is bzip2 the only option I have? Or have I missed some other compression option?
>
> Cheers,
> Krishna
Re: Which postgres version is compatible with hive-trunk?
Hi,

I don't know exactly what the major changes were between 9.0.x and 9.1.x, but if you've had a good experience with 9.0.x, use it :)
For 9.1.x, it would be great to get a log or stack trace to see what's going wrong there.

best,
Alex

On Nov 6, 2012, at 9:53 AM, rohithsharma <rohithsharm...@huawei.com> wrote:

> Hi Lorenz,
> Thanks for your fast reply :-)
> Can you please tell me what the problem is with using a 9.x version with Hive? I tried 9.0.7; it integrated successfully and the functionality was fine too. But when I changed to 9.1, a basic SHOW TABLES in Hive failed. I was wondering what the major changes are between 9.0.7 and 9.1.4.
> Regards
> Rohith Sharma K S

--
Alexander Alten-Lorenz
http://mapredit.blogspot.com
German Hadoop LinkedIn Group: http://goo.gl/N8pCF
Re: Which postgres version is compatible with hive-trunk?
Hi,

Some major differences between these versions are:

PostgreSQL 9.0 - Streaming Replication and Hot Standby (latest is 9.0.10)
PostgreSQL 9.1 - Synchronous Replication (latest is 9.1.6)
PostgreSQL 9.2 - Cascading Replication (latest is 9.2.1)
PostgreSQL 9.3 - not yet released (Multiple-master replication)

Also, here is a link to a PostgreSQL feature matrix per version:
http://www.emblocsoft.com/PostgreSQL/PostgreSQL-Feature-Matrix

Could you share what issues you hit with "9.1, basic show table in hive failed" so we might be able to help?

With Regards
TC

On 6 Nov 2012, at 6:03 PM, Alexander Lorenz wrote:

> Hi,
> I don't know exactly what the major changes were between 9.0.x and 9.1.x, but if you've had a good experience with 9.0.x, use it :)
> For 9.1.x, it would be great to get a log or stack trace to see what's going wrong there.
> best,
> Alex

--
Alexander Alten-Lorenz
http://mapredit.blogspot.com
German Hadoop LinkedIn Group: http://goo.gl/N8pCF
Re: Alter table is giving error
Hi Mark,

I tried msck, but it is not working for me. I have written a python script to add the partitions individually.

Thank you Edward, Mark and Dean.
Chunky.

On Mon, Nov 5, 2012 at 11:08 PM, Mark Grover <grover.markgro...@gmail.com> wrote:

> Chunky,
> I have used the recover partitions command on EMR, and that worked fine. However, take a look at https://issues.apache.org/jira/browse/HIVE-874. It seems like the msck command in Apache Hive does the same thing. Try it out and let us know how it goes.
> Mark

On Mon, Nov 5, 2012 at 7:56 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:

> Recover partitions should work the same way for different file systems.
> Edward

On Mon, Nov 5, 2012 at 9:33 AM, Dean Wampler <dean.wamp...@thinkbiganalytics.com> wrote:

> Writing a script to add the external partitions individually is the only way I know of.
> Sent from my rotary phone.

On Nov 5, 2012, at 8:19 AM, Chunky Gupta <chunky.gu...@vizury.com> wrote:

> Hi Dean,
> Actually, I had a Hadoop and Hive cluster on EMR, with S3 storage containing logs that update daily, partitioned by date (dt), and I was using this recover partitions feature. Now I want to shift to EC2 and run my own Hadoop and Hive cluster. So what is the alternative to recover partitions in this case, if you have any idea? I found one way, adding the partitions for all dates individually, so I would have to write a script to do that for every date. Is there an easier way than this?
> Thanks,
> Chunky

On Mon, Nov 5, 2012 at 6:28 PM, Dean Wampler <dean.wamp...@thinkbiganalytics.com> wrote:

> RECOVER PARTITIONS is an enhancement added by Amazon to their version of Hive:
> http://docs.amazonwebservices.com/ElasticMapReduce/latest/DeveloperGuide/emr-hive-additional-features.html
> <shameless-plug> Chapter 21 of Programming Hive discusses this feature and other aspects of using Hive in EMR. </shameless-plug>
> dean

On Mon, Nov 5, 2012 at 5:34 AM, Chunky Gupta <chunky.gu...@vizury.com> wrote:

> Hi,
> I have a cluster setup on EC2 with Hadoop version 0.20.2 and Hive version 0.8.1 (I configured everything). I have created a table using:
>
> CREATE EXTERNAL TABLE XXX ( YYY )
> PARTITIONED BY ( ZZZ )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY 'WWW'
> LOCATION 's3://my-location/data/';
>
> Now I am trying to recover partitions using:
>
> ALTER TABLE XXX RECOVER PARTITIONS;
>
> but I am getting this error:
>
> FAILED: Parse Error: line 1:12 cannot recognize input near 'XXX' 'RECOVER' 'PARTITIONS' in alter table statement
>
> Doing the same steps on a cluster setup on EMR with Hadoop version 1.0.3 and Hive version 0.8.1 (configured by EMR) works fine. So is this a version issue, or am I missing some configuration changes in the EC2 setup? I am not able to find an exact solution for this problem on the internet. Please help me.
>
> Thanks,
> Chunky.

--
Dean Wampler, Ph.D.
thinkbiganalytics.com
+1-312-339-1330
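For reference, a minimal sketch of the per-date statement such a script can issue, using the table name and S3 layout from the thread (the partition value is illustrative):

    -- register one date's directory as a partition of the external table
    ALTER TABLE XXX ADD PARTITION (dt='2012-11-01')
    LOCATION 's3://my-location/data/dt=2012-11-01/';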
Re: Alter table is giving error
Glad to hear it, Chunky. Out of curiosity, what errors did you get when using msck?

On Tue, Nov 6, 2012 at 5:14 AM, Chunky Gupta <chunky.gu...@vizury.com> wrote:

> Hi Mark,
> I tried msck, but it is not working for me. I have written a python script to add the partitions individually.
> Thank you Edward, Mark and Dean.
> Chunky.
Re: Hive compression with external table
Hi Krishna

Sequence files compressed with Snappy would be my recommendation as well. They can be processed by managed as well as external tables; there is no difference in storage format between managed and external tables. They can also be consumed by MapReduce or Pig directly.

Regards
Bejoy KS

Sent from handheld, please excuse typos.

-----Original Message-----
From: Krishna Rao <krishnanj...@gmail.com>
Date: Tue, 6 Nov 2012 09:50:33
To: user@hive.apache.org
Reply-To: user@hive.apache.org
Subject: Re: Hive compression with external table

> Thanks for the reply. Sequence files with compression might work. However, it's not clear to me whether it's possible to read sequence files using an external table.
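A minimal sketch of that setup, with illustrative table names (logs_raw holding the existing input) and an illustrative location; the SET properties are the standard Hadoop/Hive compression settings of this era:

    SET hive.exec.compress.output=true;
    SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
    SET mapred.output.compression.type=BLOCK;

    CREATE EXTERNAL TABLE logs_seq (line STRING)
    STORED AS SEQUENCEFILE
    LOCATION '/data/logs_seq';

    -- Writes block-compressed, Snappy-coded sequence files that Hive,
    -- Pig, or plain MapReduce can read back and split.
    INSERT OVERWRITE TABLE logs_seq
    SELECT line FROM logs_raw;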
Re: hive integration with hbase: mapping to an existing hbase table reports that the column family does not exist
I found the reason: the column mapping section can't contain any whitespace, including \r\n.

Chris Gong

From: Chris Gong
Sent: 2012-11-06 10:56
To: user-hive
Subject: hive integration with hbase: mapping to an existing hbase table reports that the column family does not exist

hi all:

I am mapping a table to an existing hbase table, and I got the following error:

FAILED: Error in metadata: MetaException(message:Column Family data is not defined in hbase table df_money_files)
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask

My Hive QL is:

create external table hbase_money_files (
  rowkey string, user_no string, mon int, mon_sn int, group_no int, sn int,
  write_sect_no string, write_sn int, business_place_code string, power_no int,
  trans_group int, price_code string, ts_flag string, elec_type_code string,
  trade_type_code string, ms_mode string, user_ms_type string, write_power double,
  chg_power double, add_power double, kb_power double, share_power double,
  total_power double, total_money double, num_money double, add_money1 double,
  add_money2 double, add_money3 double, add_money4 double, add_money5 double,
  add_money6 double, add_money7 double, add_money8 double, add_money9 double,
  add_money10 double, rp_power double, rp_money double, should_money double,
  create_date string, creator string, warrant_no int, line_code string,
  trans_no string, add_taxflag string, write_date string, compute_date string,
  calculator_id string, status string, user_type1 string, rela_user_no string,
  part_sn int, have_ext string, id_fragment string, check_date string,
  check_man string, start_date string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,data:user_no, data:mon, data:mon_sn, data:group_no, data:sn, data:write_sect_no, data:write_sn, data:business_place_code, data:power_no, data:trans_group, data:price_code, data:ts_flag, data:elec_type_code, data:trade_type_code, data:ms_mode, data:user_ms_type, data:write_power, data:chg_power, data:add_power, data:kb_power, data:share_power, data:total_power, data:total_money, data:num_money, data:add_money1, data:add_money2, data:add_money3, data:add_money4, data:add_money5, data:add_money6, data:add_money7, data:add_money8, data:add_money9, data:add_money10, data:rp_power, data:rp_money, data:should_money, data:create_date, data:creator, data:warrant_no, data:line_code, data:trans_no, data:add_taxflag, data:write_date, data:compute_date, data:calculator_id, data:status, data:user_type1, data:rela_user_no, data:part_sn, data:have_ext, data:id_fragment, data:check_date, data:check_man, data:start_date")
TBLPROPERTIES ("hbase.table.name" = "df_money_files");

However, the data column family does exist! When I describe the table in the hbase shell, it reports:

hbase(main):001:0> describe 'df_money_files'
DESCRIPTION: {NAME => 'df_money_files', FAMILIES => [{NAME => 'data', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
ENABLED: true
1 row(s) in 0.8470 seconds

I am confused now; can anyone give me some information?

Chris Gong
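Restating the fix as a minimal sketch, trimmed to two mapped columns for brevity: the entire hbase.columns.mapping value must be one unbroken string, with no spaces or line breaks between the entries.

    -- mapping string has no whitespace between entries
    create external table hbase_money_files (rowkey string, user_no string, mon int)
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,data:user_no,data:mon")
    TBLPROPERTIES ("hbase.table.name" = "df_money_files");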
Re: ClassNotFoundException when using the hive java client with hive + hbase integration
This doesn't work. In CLI mode you can export the environment variable to avoid adding the jars every time. I did this, but I still hit the error when accessing from the java client. And I can't even specify the --auxpath param when starting a hive thrift service. So, at least in my situation, I have to add the jars myself.

On Wed, Nov 7, 2012 at 12:31 AM, kulkarni.swar...@gmail.com <kulkarni.swar...@gmail.com> wrote:

> FWIW, you can also drop all your needed jars (including the hbase and zookeeper ones) in a folder and then set this property in your hive-env.sh:
>
> export HIVE_AUX_JARS_PATH=<path to the folder>
>
> This way you need not add them manually every time.

On Mon, Nov 5, 2012 at 9:18 PM, Cheng Su <scarcer...@gmail.com> wrote:

> Mark, thank you so much for your suggestion. I had already added the necessary jars to my hive aux path, so I can execute my sql in hive CLI mode without getting any error. But when I use a java client to access the tables through the thrift service, I need to add these jars manually. I executed the ADD JAR <...>.jar statements and the problem is solved! Thank you again!

On Tue, Nov 6, 2012 at 9:03 AM, Mark Grover <grover.markgro...@gmail.com> wrote:

> Cheng,
> You will have to add the appropriate HBase-related jars to your classpath. You can do that by running add jar command(s) or by putting them in aux_lib. See this thread for reference:
> http://mail-archives.apache.org/mod_mbox/hive-user/201103.mbox/%3caanlktingqlgknqmizgoi+szfnexgcat8caqtovf8j...@mail.gmail.com%3E
> Mark

On Mon, Nov 5, 2012 at 6:53 AM, Cheng Su <scarcer...@gmail.com> wrote:

> Hi, all.
> I have a hive+hbase integration cluster. When I try to execute a query through the java client of hive, sometimes a ClassNotFoundException happens.
>
> My java code:
>
> final Connection conn = DriverManager.getConnection(URL);
> final Statement stmt = conn.createStatement();
> final ResultSet rs = stmt.executeQuery("SELECT count(*) FROM test_table WHERE (source = '0' AND ur_createtime BETWEEN '2012103100' AND '20121031235959')");
>
> I can execute the same SELECT count(*) query in hive cli mode and get the query result, so there is no error in my sql.
>
> The client side exception:
>
> Caused by: java.sql.SQLException: Query returned non-zero code: 9, cause: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
>         at org.apache.hadoop.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:189)
>         ... 23 more
>
> The server side exception (hadoop-jobtracker):
>
> 2012-11-05 18:55:39,443 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201210301133_0112_m_00_3:
> java.io.IOException: Cannot create an instance of InputSplit class = org.apache.hadoop.hive.hbase.HBaseSplit:org.apache.hadoop.hive.hbase.HBaseSplit
>         at org.apache.hadoop.hive.ql.io.HiveInputFormat$HiveInputSplit.readFields(HiveInputFormat.java:146)
>         at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
>         at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
>         at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:396)
>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:412)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
>         at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Unknown Source)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>         at org.apache.hadoop.mapred.Child.main(Child.java:249)
> Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.hbase.HBaseSplit
>         at java.net.URLClassLoader$1.run(Unknown Source)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at java.net.URLClassLoader.findClass(Unknown Source)
>         at java.lang.ClassLoader.loadClass(Unknown Source)
>         at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
>         at java.lang.ClassLoader.loadClass(Unknown Source)
>         at java.lang.Class.forName0(Native Method)
>         at java.lang.Class.forName(Unknown Source)
>         at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:819)
>         at org.apache.hadoop.hive.ql.io.HiveInputFormat$HiveInputSplit.readFields(HiveInputFormat.java:143)
>         ... 10 more
>
> My hive-env.sh:
>
> export HIVE_AUX_JARS_PATH=/data/install/hive-0.9.0/lib/hive-hbase-handler-0.9.0.jar,/data/install/hive-0.9.0/lib/hbase-0.92.0.jar,/data/install/hive-0.9.0/lib/zookeeper-3.4.2.jar
>
> My hive-site.xml:
>
> <property>
>   <name>hive.zookeeper.quorum</name>
>   <value>hadoop01,hadoop02,hadoop03</value>
>   <description>The list of zookeeper servers to talk to. This is only needed for
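For reference, a sketch of the statements such a client can issue over the same connection before running the query, reusing the jar paths from the hive-env.sh above (assumption: when sent through JDBC, each statement goes in its own execute call, without the trailing semicolon):

    -- ship the storage-handler jars to the job before querying
    ADD JAR /data/install/hive-0.9.0/lib/hive-hbase-handler-0.9.0.jar;
    ADD JAR /data/install/hive-0.9.0/lib/hbase-0.92.0.jar;
    ADD JAR /data/install/hive-0.9.0/lib/zookeeper-3.4.2.jar;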
Re: hive integration with hbase: mapping to an existing hbase table reports that the column family does not exist
Indeed. https://issues.apache.org/jira/browse/HIVE-3243

Sorry you found out about it the hard way!

On Tue, Nov 6, 2012 at 5:46 PM, Chris Gong <fellowt...@163.com> wrote:

> I found the reason: the column mapping section can't contain any whitespace, including \r\n.
> Chris Gong
RE: Which postgres version is compatible with hive-trunk?
Thanks for your reply. I will use the 9.0.x version.

-----Original Message-----
From: Alexander Lorenz [mailto:wget.n...@gmail.com]
Sent: Tuesday, November 06, 2012 3:34 PM
To: rohithsharm...@huawei.com
Cc: user@hive.apache.org
Subject: Re: Which postgres version is compatible with hive-trunk?

Hi,

I don't know exactly what the major changes were between 9.0.x and 9.1.x, but if you've had a good experience with 9.0.x, use it :)
For 9.1.x, it would be great to get a log or stack trace to see what's going wrong there.

best,
Alex

--
Alexander Alten-Lorenz
http://mapredit.blogspot.com
German Hadoop LinkedIn Group: http://goo.gl/N8pCF
Re: TRANSFORM + LATERAL VIEW?
Jamie,

Not that I know of. Assuming you will be using LATERAL VIEW for exploding the data, I can think of two options off the top of my head:

1. Pass the 'id' column to your transform script. You will then have to take care of exploding the data in your transform script; it would no longer be a simple 'cat'.
2. Return an array of entries from your transform script, and then use explode/lateral view on this array to achieve the desired result. The transform script will no longer be a 'cat' in this case either.

Hope that helps. Let us know how it goes.

Mark

On Mon, Nov 5, 2012 at 11:32 PM, Jamie Olson <jamie.f.ol...@gmail.com> wrote:

> Is it possible to select columns not included in the TRANSFORM(...) operation? I did not see anything about this in the language definition for transform. I'm looking for something comparable to LATERAL VIEW for UDTFs, but with TRANSFORM statements.
>
> Ex: Not (seemingly?) legal:
>
> FROM (
>   FROM xtab dat
>   SELECT dat.id, TRANSFORM (x, y) USING 'cat'
> ) map_out
> INSERT INTO xresult SELECT map_out.*
>
> Jamie Olson
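A minimal sketch of option 1, assuming a hypothetical script explode_xy.py that reads (id, x, y) rows on stdin and emits one (id, val) row per exploded entry, so the id column survives the transform:

    -- ship the (hypothetical) script with the job first:
    ADD FILE explode_xy.py;

    FROM xtab
    INSERT OVERWRITE TABLE xresult
    SELECT TRANSFORM (id, x, y)
      USING 'python explode_xy.py'
      AS (id, val);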
Re: Hbase connection closed when querying multiple complicated hql with hive+hbase integration
The exceptions seem to be another problem. They all happened on one node, and after the task attempts failed on that node, they were retried on other nodes with no exceptions. So the exceptions may have nothing to do with the performance issue.

On Wed, Nov 7, 2012 at 11:07 AM, Cheng Su <scarcer...@gmail.com> wrote:

> Hi, all.
> I have a hive + hbase integration cluster, and I've hit a performance issue. I executed several complicated hql queries, but only the first one actually runs. The rest are shown as running on the job tracker web UI, but their task attempts fail with the exception below:
>
> java.io.IOException: java.io.IOException: org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@5117a20 closed
>         at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
>         at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
>         at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:243)
>         at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:522)
>         at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:197)
>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:418)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
>         at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Unknown Source)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>         at org.apache.hadoop.mapred.Child.main(Child.java:249)
> Caused by: java.io.IOException: org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@5117a20 closed
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:794)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:782)
>         at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:249)
>         at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:213)
>         at org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat.getRecordReader(HiveHBaseTableInputFormat.java:92)
>         at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:240)
>         ... 9 more
>
> So the rest are actually queuing. I guess it might be because the number of hbase handlers that serve access requests from hive is too small, so that when a big job occupies all the handlers, the rest have to wait until handlers are released. Is my assumption right? And what settings should I tune?
>
> Thanks.
>
> --
> Regards,
> Cheng Su

--
Regards,
Cheng Su
Re: Alter table is giving error
Chunky,

You should have run:

msck repair table <Table name>;

Sorry, I should have made that clear in my last reply. I have added an entry to the Hive wiki for the benefit of others:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Recoverpartitions

Mark

On Tue, Nov 6, 2012 at 9:55 PM, Chunky Gupta <chunky.gu...@vizury.com> wrote:

> Hi Mark,
> I didn't get any error. I ran this on the hive console:
>
> msck table Table_Name;
>
> It said OK and showed an execution time of 1.050 sec. But when I checked the partitions for the table using
>
> show partitions Table_Name;
>
> it didn't show me any partitions.
>
> Thanks,
> Chunky.
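Spelled out against the thread's example table (XXX, as in the original CREATE EXTERNAL TABLE), the repair-and-verify sequence looks like this:

    -- scan the table's location and register any missing partitions:
    MSCK REPAIR TABLE XXX;
    -- then verify:
    SHOW PARTITIONS XXX;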
Re: Alter table is giving error
Hi Mark,

Sorry, I forgot to mention: I also tried msck repair table <Table name>; and got the same output as with msck alone. Do I need any other settings for this to work? I prepared the Hadoop and Hive setup from scratch on EC2.

Thanks,
Chunky.

On Wed, Nov 7, 2012 at 11:58 AM, Mark Grover <grover.markgro...@gmail.com> wrote:

> Chunky,
> You should have run:
>
> msck repair table <Table name>;
>
> Sorry, I should have made that clear in my last reply. I have added an entry to the Hive wiki for the benefit of others:
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Recoverpartitions
>
> Mark