Hive import issue
Hi, I am new to Hive. When I try to import data into HBase via a table managed by Hive, I get the following errors: "mismatched input 'Timestamp' expecting Identifier in column specification" and "mismatched input 'data' expecting Identifier in column specification". Removing these columns, or renaming them to something like 'data_something', makes it work. Any idea why this is happening? And what is the full list of keywords that cannot be used as column names? Any help will be greatly appreciated. Vivek
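The errors above come from Hive's grammar: words like TIMESTAMP and DATA are reserved and are rejected where a plain identifier is expected. A minimal sketch of the workaround (the table and column names here are illustrative, not from the original message):

```sql
-- Fails to parse: 'timestamp' and 'data' are reserved words in Hive's grammar
-- CREATE TABLE events (timestamp STRING, data STRING);

-- Works: rename the columns to non-reserved identifiers
CREATE TABLE events (event_ts STRING, event_data STRING);
```

Newer Hive releases also accept backtick-quoted identifiers (e.g. `` `timestamp` ``), but renaming is the portable fix for older versions like the one in this thread.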
Re: Hive HBase integration scan failing
Thanks for the info. Also, how can we make sure that our region servers are running on the same datanodes (locality)? Is there a way to verify this? On Thu, Dec 9, 2010 at 11:09 PM, John Sichi jsi...@fb.com wrote: Try set hbase.client.scanner.caching=5000; Also, check to make sure that you are getting the expected locality, so that mappers are running on the same nodes as the region servers they are scanning (assuming that you are running HBase and MapReduce on the same cluster). When I was testing this, I encountered this problem (but it may have been specific to our cluster configuration): https://issues.apache.org/jira/browse/HBASE-2535 JVS On Dec 9, 2010, at 10:46 PM, vlisovsky wrote: Hi guys, I wonder if anybody could shed some light on how to reduce the load on an HBase cluster when running a full scan. The need is to dump everything I have in HBase into a Hive table. The HBase data size is around 500 GB. The job creates 9000 mappers; after about 1000 maps, things go south every time. If I run the insert below, it runs for about 30 minutes and then starts bringing down the HBase cluster, after which the region servers need to be restarted. I wonder if there is a way to throttle it somehow, or otherwise whether there is any other method of getting the structured data out? Any help is appreciated. Thanks, -Vitaly

create external table hbase_linked_table (
  mykey string,
  info map<string,string>
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:")
TBLPROPERTIES ("hbase.table.name" = "hbase_table2");

set hive.exec.compress.output=true;
set io.seqfile.compression.type=BLOCK;
set mapred.output.compression.type=BLOCK;
set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
set mapred.reduce.tasks=40;
set mapred.map.tasks=25;
INSERT overwrite table tmp_hive_destination select * from hbase_linked_table;
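John's scanner-caching suggestion, written out as a complete session (the caching value is his; it trades fewer RPC round trips per mapper against more memory held per scan on the region servers, so smaller values are gentler on a struggling cluster):

```sql
-- Higher caching = fewer scanner RPCs per mapper; lower it if region
-- servers are falling over under the load of a full scan.
set hbase.client.scanner.caching=5000;

INSERT overwrite table tmp_hive_destination
SELECT * FROM hbase_linked_table;
```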
Metastore compatibility
Is it safe to share a 0.7 metastore between 0.7 clients/servers and 0.5 clients/servers? Thanks.
Re: Documentation related to DB operations
On Fri, Dec 10, 2010 at 9:24 PM, Ashutosh Chauhan hashut...@apache.org wrote: It would really help to have the database behavior in Hive documented. I thought of doing it myself, but then got stumped by the LOCATION clause. Reading the ticket https://issues.apache.org/jira/browse/HIVE-675, it looks like one can specify a custom location. When I tried to do it, I got a ParseException. Then I went looking in the code, and it seems the user is not allowed to specify a location. So I gave up on documenting it. Rather than write a buggy document and confuse other users, better to have no document :) Ashutosh On Fri, Oct 29, 2010 at 08:16, Carl Steinbach c...@cloudera.com wrote: Hi Ashutosh, You're correct that this is currently not documented anywhere. I'll try to write something up in the next couple of days. Thanks. Carl On Thu, Oct 28, 2010 at 10:55 AM, Ashutosh Chauhan hashut...@apache.org wrote: I see that https://issues.apache.org/jira/browse/HIVE-675 is committed and Hive now has support for databases and the operations associated with them. But I am unable to find any user-facing documentation for it. Can someone update http://wiki.apache.org/hadoop/Hive/LanguageManual/DDL with the new grammar for DB-related operations, as well as example usages? Or is it already documented on some other wiki page? Ashutosh In the JIRA issue the comments are not definitive. The best way to understand what is going on is to look at the test cases (*.q). Usually the test cases exercise the code and are self-documenting. Edward
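For reference, a sketch of the basic database operations that HIVE-675 introduced, pieced together from the thread (note that, per Ashutosh's report, a LOCATION clause on CREATE DATABASE is rejected with a ParseException in this version, so it is omitted here):

```sql
CREATE DATABASE mydb;        -- or: CREATE DATABASE IF NOT EXISTS mydb;
SHOW DATABASES;
USE mydb;
CREATE TABLE t (id INT);     -- created inside mydb
DROP DATABASE mydb;          -- fails if the database is not empty
```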