Hive import issue

2010-12-10 Thread Vivek Mishra
Hi,

I am a newbie to hive.



When I am trying to import data into HBase via a table managed by Hive, I am 
getting the following errors:

mismatched input 'Timestamp' expecting Identifier in column specification

mismatched input 'data' expecting Identifier in column specification



Removing these columns, or renaming them to something like 'data_something', makes it work.



Any idea why this is happening? And what is the full list of keywords which cannot 
be used as column names? Any help will be greatly appreciated.
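
For illustration, a minimal sketch of the failure and the workaround described above. The table name is hypothetical; 'Timestamp' and 'data' appear to be keywords in Hive's grammar, which is why the parser rejects them in a column specification:

-- fails with "mismatched input ... expecting Identifier in column specification"
create table hbase_import_staging (
  Timestamp string,
  data string
);

-- renaming the offending columns works (names here are illustrative)
create table hbase_import_staging (
  event_timestamp string,
  data_payload string
);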



Vivek







Re: Hive HBase integration scan failing

2010-12-10 Thread vlisovsky
Thanks for the info. Also, how can we make sure that our region servers
are running on the same nodes as the datanodes (locality)? Is there a way we
can verify this?

On Thu, Dec 9, 2010 at 11:09 PM, John Sichi jsi...@fb.com wrote:

 Try

 set hbase.client.scanner.caching=5000;

 Also, check to make sure that you are getting the expected locality so that
 mappers are running on the same nodes as the region servers they are
 scanning (assuming that you are running HBase and mapreduce on the same
 cluster).  When I was testing this, I encountered this problem (but it may
 have been specific to our cluster configurations):

 https://issues.apache.org/jira/browse/HBASE-2535

 JVS

 On Dec 9, 2010, at 10:46 PM, vlisovsky wrote:

 
  Hi Guys,
  Wonder if anybody could shed some light on how to reduce the load on the
 HBase cluster when running a full scan.
  The need is to dump everything I have in HBase into a Hive table. The
 HBase data size is around 500g.
  The job creates 9000 mappers; after about 1000 maps, things go south every
 time.
  If I run the insert below, it runs for about 30 minutes and then starts
 bringing down the HBase cluster, after which the region servers need to be
 restarted.
  Wonder if there is a way to throttle it somehow, or otherwise whether there
 is any other method of getting structured data out?
  Any help is appreciated,
  Thanks,
  -Vitaly
 
  create external table hbase_linked_table (
  mykey string,
  info map<string, string>
  )
  STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
  WITH
  SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:")
  TBLPROPERTIES ("hbase.table.name" = "hbase_table2");
 
  set hive.exec.compress.output=true;
  set io.seqfile.compression.type=BLOCK;
  set mapred.output.compression.type=BLOCK;
  set
 mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
 
  set mapred.reduce.tasks=40;
  set mapred.map.tasks=25;
 
  INSERT overwrite table tmp_hive_destination
  select * from hbase_linked_table;
 

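
Putting John's suggestion together with the query above, a minimal sketch of the export with scanner caching raised (assuming the hbase_linked_table definition from the original message):

-- hbase.client.scanner.caching controls how many rows each scanner fetch returns,
-- cutting down the number of round trips during the full scan
set hbase.client.scanner.caching=5000;
set hive.exec.compress.output=true;
set io.seqfile.compression.type=BLOCK;
set mapred.output.compression.type=BLOCK;
set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;

INSERT overwrite table tmp_hive_destination
select * from hbase_linked_table;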



Metastore compatibility

2010-12-10 Thread Steven Wong
Is it safe to share a 0.7 metastore between 0.7 clients/servers and 0.5 
clients/servers?

Thanks.



Re: Documentation related to DB operations

2010-12-10 Thread Edward Capriolo
On Fri, Dec 10, 2010 at 9:24 PM, Ashutosh Chauhan hashut...@apache.org wrote:
 It would really help to have the database behavior documented for
 Hive. I thought of doing it myself but then got stumped by the location
 clause. Reading the ticket
 https://issues.apache.org/jira/browse/HIVE-675, it looks like one can
 specify a custom location. When I tried to do it, I got a
 ParseException. Then I went looking in the code, and it seems the user is
 not allowed to specify a location. So I gave up on documenting it. Rather
 than me writing a buggy document and confusing other users, it's better
 to have no document :)

 Ashutosh
 On Fri, Oct 29, 2010 at 08:16, Carl Steinbach c...@cloudera.com wrote:
 Hi Ashutosh,
 You're correct that this is currently not documented anywhere. I'll try to
 write something up in the next couple of days.
 Thanks.
 Carl

 On Thu, Oct 28, 2010 at 10:55 AM, Ashutosh Chauhan hashut...@apache.org
 wrote:

 I see that https://issues.apache.org/jira/browse/HIVE-675 is committed
 and Hive now has support for databases and related operations
 associated with it. But I am unable to find any user-facing
 documentation for it. Can someone update
 http://wiki.apache.org/hadoop/Hive/LanguageManual/DDL with the new grammar
 for DB-related operations as well as example usages? Or is it
 already documented on some other wiki page?

 Ashutosh




In the JIRA issue the comments are not definitive. The best way to
understand what is going on is to look at the *.q test cases. The test
cases usually exercise the code and are self-documenting.

Edward
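
For reference, a minimal sketch of the basic database operations added by HIVE-675, along the lines of what the *.q test cases exercise (database and table names are illustrative; per this thread, a LOCATION clause on CREATE DATABASE was not accepted by the parser at the time):

-- create and use a database, then clean up
CREATE DATABASE analytics;
SHOW DATABASES;
USE analytics;
CREATE TABLE events (id INT, msg STRING);
SHOW TABLES;
DROP TABLE events;
USE default;
-- DROP DATABASE only succeeds when the database is empty
DROP DATABASE analytics;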