RE: Which is the postgres version is compatible for hive-trunk..?

2012-11-06 Thread rohithsharma
Hi Lorenz, 

Thanks for your fast reply :-)

 "Postgres 9x I personally wouldn't choose."
Can you please tell me what the problem is with using a 9.x version with Hive?

I tried 9.0.7; it integrated successfully and the functionality was also
fine. But when I changed to 9.1, a basic "show tables" in Hive failed.

I was wondering what the major changes are between 9.0.7 and 9.1.4.


Regards
Rohith Sharma K S

-Original Message-
From: Alexander Lorenz [mailto:wget.n...@gmail.com] 
Sent: Tuesday, November 06, 2012 12:40 PM
To: user@hive.apache.org; rohithsharm...@huawei.com
Subject: Re: Which is the postgres version is compatible for hive-trunk..?

Hey Rohith,

The last time I used psql was with postgresql-8.4-701.jdbc4.jar, and it was working
great. But I guess all 8.x versions should work. Postgres 9.x I personally
wouldn't choose.

best,
 Alex

On Nov 6, 2012, at 5:29 AM, rohithsharma rohithsharm...@huawei.com wrote:

 Hi Guys,
 
 I am planning to use Postgres as the metastore with Hive. Can anyone point me
 to which Postgres version is compatible with Hive?
 
 
 
 Regards
 
 Rohith Sharma K S
 

--
Alexander Alten-Lorenz
http://mapredit.blogspot.com
German Hadoop LinkedIn Group: http://goo.gl/N8pCF



Re: Hive compression with external table

2012-11-06 Thread Krishna Rao
Thanks for the reply. Sequence files with compression might work.
However, it's not clear to me whether it's possible to read sequence files using
an external table.
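For reference, Hive external tables can be declared STORED AS SEQUENCEFILE. A minimal
sketch (the table name, columns, and location here are made up):

CREATE EXTERNAL TABLE logs_seq (
  id STRING,
  payload STRING
)
STORED AS SEQUENCEFILE
LOCATION '/data/logs_seq/';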

On 5 November 2012 16:04, Edward Capriolo edlinuxg...@gmail.com wrote:

 Compression is a confusing issue. Sequence files that are in block
 format are always splittable regardless of which compression codec is chosen
 for the block. The Programming Hive book has an entire section
 dedicated to the permutations of compression options.

 Edward
 On Mon, Nov 5, 2012 at 10:57 AM, Krishna Rao krishnanj...@gmail.com
 wrote:
  Hi all,
 
  I'm looking into finding a suitable format to store data in HDFS, so that
  it's available for processing by Hive. Ideally I would like to satisfy
 the
  following:
 
  1. store the data in a format that is readable by multiple Hadoop
 projects
  (eg. Pig, Mahout, etc.), not just Hive
  2. work with a Hive external table
  3. store data in a compressed format that is splittable
 
  (1) is a requirement because Hive isn't appropriate for all the problems
  that we want to throw at Hadoop.
 
  (2) is really more of a consequence of (1). Ideally we want the data
 stored
  in some open format that is compressed in HDFS.
  This way we can just point Hive, Pig, Mahout, etc at it depending on the
  problem.
 
  (3) is obviously so it plays well with Hadoop.
 
  Gzip is no good because it is not splittable. Snappy looked promising,
 but
  it is splittable only if used with a non-external Hive table.
  LZO also looked promising, but I wonder whether it is future-proof
  given the licensing issues surrounding it.
 
  So far, the only solution I could find that satisfies all the above
 seems to
  be bzip2 compression, but concerns about its performance make me wary
 about
  choosing it.
 
  Is bzip2 the only option I have? Or have I missed some other compression
  option?
 
  Cheers,
 
  Krishna



Re: Which is the postgres version is compatible for hive-trunk..?

2012-11-06 Thread Alexander Lorenz
Hi,

I don't know exactly what the major changes were between 9.0.x and 9.1.x, but
if you've had a good experience with 9.0.x, use it :)
For 9.1.x, it would be great to get a log or stack trace to see what's going
wrong there.

best,
 Alex

On Nov 6, 2012, at 9:53 AM, rohithsharma rohithsharm...@huawei.com wrote:

 Hi Lorenz, 
 
 Thanks for you fast reply :-)
 
 Postgres 9x I personally wouldn't choose.
 Can you please tell what is the problem using 9x version with hive.?
 
 I tried with 9.0.7, it was successfully integrated and functionality also
 fine. But when I changed to 9.1, basic show table in hive failed.
 
 I was wondering what is the major change between 9.0.7 to 9.1.4.?
 
 
 Regards
 Rohith Sharma K S
 
 -Original Message-
 From: Alexander Lorenz [mailto:wget.n...@gmail.com] 
 Sent: Tuesday, November 06, 2012 12:40 PM
 To: user@hive.apache.org; rohithsharm...@huawei.com
 Subject: Re: Which is the postgres version is compatible for hive-trunk..?
 
 Hey Rohith,
 
 last time I used psql was with postgresql-8.4-701.jdbc4.jar, and was working
 great. But I guess all 8.x version should work. Postgres 9x I personally
 wouldn't choose.
 
 best,
 Alex
 
 On Nov 6, 2012, at 5:29 AM, rohithsharma rohithsharm...@huawei.com wrote:
 
 Hi Guys,
 
 I am planning to use postgres as metastore with Hive.  Can anyone point me
 which is the postgres version is compatible with Hive..?
 
 
 
 Regards
 
 Rohith Sharma K S
 
 
 --
 Alexander Alten-Lorenz
 http://mapredit.blogspot.com
 German Hadoop LinkedIn Group: http://goo.gl/N8pCF
 

--
Alexander Alten-Lorenz
http://mapredit.blogspot.com
German Hadoop LinkedIn Group: http://goo.gl/N8pCF



Re: Which is the postgres version is compatible for hive-trunk..?

2012-11-06 Thread t...@postgresql.hk
Hi,

Some major differences between these versions are:
PostgreSQL 9.0 - Streaming Replication and Hot Standby  (latest is 9.0.10)
PostgreSQL 9.1 - Synchronous Replication (latest is 9.1.6)
PostgreSQL 9.2 - Cascading Replication (latest is 9.2.1) 
PostgreSQL 9.3 - (not yet released: multi-master replication)

Also, here is a link to the PostgreSQL feature matrix per version: 
http://www.emblocsoft.com/PostgreSQL/PostgreSQL-Feature-Matrix

Could you share more details about the issue where '9.1, basic show table in hive 
failed', so we might be able to help?

With Regards
TC

On 6 Nov 2012, at 6:03 PM, Alexander Lorenz wrote:

 Hi,
 
 I don't know what exactly was the major changes between 9.0.x and 9.1.x, but 
 when you've a good experience with 9.0.x, use it :)
 For 9.1.x, it would be great to get a log or stack trace to see whats going 
 wrong there.
 
 best,
 Alex
 
 On Nov 6, 2012, at 9:53 AM, rohithsharma rohithsharm...@huawei.com wrote:
 
 Hi Lorenz, 
 
 Thanks for you fast reply :-)
 
 Postgres 9x I personally wouldn't choose.
 Can you please tell what is the problem using 9x version with hive.?
 
 I tried with 9.0.7, it was successfully integrated and functionality also
 fine. But when I changed to 9.1, basic show table in hive failed.
 
 I was wondering what is the major change between 9.0.7 to 9.1.4.?
 
 
 Regards
 Rohith Sharma K S
 
 -Original Message-
 From: Alexander Lorenz [mailto:wget.n...@gmail.com] 
 Sent: Tuesday, November 06, 2012 12:40 PM
 To: user@hive.apache.org; rohithsharm...@huawei.com
 Subject: Re: Which is the postgres version is compatible for hive-trunk..?
 
 Hey Rohith,
 
 last time I used psql was with postgresql-8.4-701.jdbc4.jar, and was working
 great. But I guess all 8.x version should work. Postgres 9x I personally
 wouldn't choose.
 
 best,
 Alex
 
 On Nov 6, 2012, at 5:29 AM, rohithsharma rohithsharm...@huawei.com wrote:
 
 Hi Guys,
 
 I am planning to use postgres as metastore with Hive.  Can anyone point me
 which is the postgres version is compatible with Hive..?
 
 
 
 Regards
 
 Rohith Sharma K S
 
 
 --
 Alexander Alten-Lorenz
 http://mapredit.blogspot.com
 German Hadoop LinkedIn Group: http://goo.gl/N8pCF
 
 
 --
 Alexander Alten-Lorenz
 http://mapredit.blogspot.com
 German Hadoop LinkedIn Group: http://goo.gl/N8pCF
 



Re: Alter table is giving error

2012-11-06 Thread Chunky Gupta
Hi Mark,
I tried msck, but it did not work for me. I have written a Python script
to add the partitions individually.
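For reference, adding partitions individually usually means issuing one statement per
date, roughly like this (using the table name, partition column, and S3 location from
the quoted mail below; the date value is just an example):

ALTER TABLE XXX ADD PARTITION (dt='2012-11-01')
LOCATION 's3://my-location/data/dt=2012-11-01/';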

Thank you Edward, Mark and Dean.
Chunky.

On Mon, Nov 5, 2012 at 11:08 PM, Mark Grover grover.markgro...@gmail.comwrote:

 Chunky,
 I have used the RECOVER PARTITIONS command on EMR, and that worked fine.

 However, take a look at https://issues.apache.org/jira/browse/HIVE-874. It seems
 like the msck command in Apache Hive does the same thing. Try it out and let us
 know how it goes.

 Mark

 On Mon, Nov 5, 2012 at 7:56 AM, Edward Capriolo edlinuxg...@gmail.comwrote:

 Recover partitions should work the same way for different file systems.

 Edward

 On Mon, Nov 5, 2012 at 9:33 AM, Dean Wampler
 dean.wamp...@thinkbiganalytics.com wrote:
  Writing a script to add the external partitions individually is the
 only way
  I know of.
 
  Sent from my rotary phone.
 
 
  On Nov 5, 2012, at 8:19 AM, Chunky Gupta chunky.gu...@vizury.com
 wrote:
 
  Hi Dean,
 
   Actually, I had a Hadoop and Hive cluster on EMR, with S3 storage
   containing logs that update daily and are partitioned by date (dt), and
   I was using this recover partitions feature.
   Now I want to shift to EC2 and run my own Hadoop and Hive cluster. So
   what is the alternative to recover partitions in this case, if you have
   any idea?
   I found one way, which is adding partitions for all dates individually, so I
   would have to write a script to do that for every date. Is there any easier
   way than this?
 
  Thanks,
  Chunky
 
 
 
  On Mon, Nov 5, 2012 at 6:28 PM, Dean Wampler
  dean.wamp...@thinkbiganalytics.com wrote:
 
  The RECOVER PARTITIONS is an enhancement added by Amazon to their
 version
  of Hive.
 
 
 
 http://docs.amazonwebservices.com/ElasticMapReduce/latest/DeveloperGuide/emr-hive-additional-features.html
 
   <shameless-plug>
   Chapter 21 of Programming Hive discusses this feature and other aspects
   of using Hive in EMR.
   </shameless-plug>
 
  dean
 
 
  On Mon, Nov 5, 2012 at 5:34 AM, Chunky Gupta chunky.gu...@vizury.com
  wrote:
 
  Hi,
 
  I am having a cluster setup on EC2 with Hadoop version 0.20.2 and Hive
  version 0.8.1 (I configured everything) . I have created a table
 using :-
 
  CREATE EXTERNAL TABLE XXX ( YYY )PARTITIONED BY ( ZZZ )ROW FORMAT
  DELIMITED FIELDS TERMINATED BY 'WWW' LOCATION
 's3://my-location/data/';
 
  Now I am trying to recover partition using :-
 
  ALTER TABLE XXX RECOVER PARTITIONS;
 
  but I am getting this error :- FAILED: Parse Error: line 1:12 cannot
  recognize input near 'XXX' 'RECOVER' 'PARTITIONS' in alter table
 statement
 
  Doing same steps on a cluster setup on EMR with Hadoop version 1.0.3
 and
  Hive version 0.8.1 (Configured by EMR), works fine.
 
  So is this a version issue or am I missing some configuration changes
 in
  EC2 setup ?
  I am not able to find exact solution for this problem on internet.
 Please
  help me.
 
  Thanks,
  Chunky.
 
 
 
 
 
 
  --
  Dean Wampler, Ph.D.
  thinkbiganalytics.com
  +1-312-339-1330
 
 
 





Re: Alter table is giving error

2012-11-06 Thread Mark Grover
Glad to hear, Chunky.

Out of curiosity, what errors did you get when using msck?

On Tue, Nov 6, 2012 at 5:14 AM, Chunky Gupta chunky.gu...@vizury.comwrote:

 Hi Mark,
 I tried msck, but it is not working for me. I have written a python script
 to partition the data individually.

 Thank you Edward, Mark and Dean.
 Chunky.


 On Mon, Nov 5, 2012 at 11:08 PM, Mark Grover 
 grover.markgro...@gmail.comwrote:

 Chunky,
 I have used recover partitions command on EMR, and that worked fine.

 However, take a look at https://issues.apache.org/jira/browse/HIVE-874. Seems
 like msck command in Apache Hive does the same thing. Try it out and let us
 know it goes.

 Mark

 On Mon, Nov 5, 2012 at 7:56 AM, Edward Capriolo edlinuxg...@gmail.comwrote:

 Recover partitions should work the same way for different file systems.

 Edward

 On Mon, Nov 5, 2012 at 9:33 AM, Dean Wampler
 dean.wamp...@thinkbiganalytics.com wrote:
  Writing a script to add the external partitions individually is the
 only way
  I know of.
 
  Sent from my rotary phone.
 
 
  On Nov 5, 2012, at 8:19 AM, Chunky Gupta chunky.gu...@vizury.com
 wrote:
 
  Hi Dean,
 
  Actually I was having Hadoop and Hive cluster on EMR and I have S3
 storage
  containing logs which updates daily and having partition with
 date(dt). And
  I was using this recover partition.
  Now I wanted to shift to EC2 and have my own Hadoop and Hive cluster.
 So,
  what is the alternate of using recover partition in this case, if you
 have
  any idea ?
  I found one way of individually partitioning all dates, so I have to
 write
  script for that to do so for all dates. Is there any easiest way other
 than
  this ?
 
  Thanks,
  Chunky
 
 
 
  On Mon, Nov 5, 2012 at 6:28 PM, Dean Wampler
  dean.wamp...@thinkbiganalytics.com wrote:
 
  The RECOVER PARTITIONS is an enhancement added by Amazon to their
 version
  of Hive.
 
 
 
 http://docs.amazonwebservices.com/ElasticMapReduce/latest/DeveloperGuide/emr-hive-additional-features.html
 
  shameless-plus
Chapter 21 of Programming Hive discusses this feature and other
 aspects
  of using Hive in EMR.
  /shameless-plug
 
  dean
 
 
  On Mon, Nov 5, 2012 at 5:34 AM, Chunky Gupta chunky.gu...@vizury.com
 
  wrote:
 
  Hi,
 
  I am having a cluster setup on EC2 with Hadoop version 0.20.2 and
 Hive
  version 0.8.1 (I configured everything) . I have created a table
 using :-
 
  CREATE EXTERNAL TABLE XXX ( YYY )PARTITIONED BY ( ZZZ )ROW FORMAT
  DELIMITED FIELDS TERMINATED BY 'WWW' LOCATION
 's3://my-location/data/';
 
  Now I am trying to recover partition using :-
 
  ALTER TABLE XXX RECOVER PARTITIONS;
 
  but I am getting this error :- FAILED: Parse Error: line 1:12 cannot
  recognize input near 'XXX' 'RECOVER' 'PARTITIONS' in alter table
 statement
 
  Doing same steps on a cluster setup on EMR with Hadoop version 1.0.3
 and
  Hive version 0.8.1 (Configured by EMR), works fine.
 
  So is this a version issue or am I missing some configuration
 changes in
  EC2 setup ?
  I am not able to find exact solution for this problem on internet.
 Please
  help me.
 
  Thanks,
  Chunky.
 
 
 
 
 
 
  --
  Dean Wampler, Ph.D.
  thinkbiganalytics.com
  +1-312-339-1330
 
 
 






Re: Hive compression with external table

2012-11-06 Thread Bejoy KS
Hi Krishna

Sequence files with Snappy compression would be my recommendation as well. They can 
be processed by managed as well as external tables.

There is no difference in storage formats between managed and external tables. 


Also, these files can be consumed by MapReduce or Pig directly.
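As an illustration, writing Snappy-compressed, block-format sequence files from Hive
usually just takes a few session settings before the insert (a sketch; the table names
here are made up):

SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
SET mapred.output.compression.type=BLOCK;

INSERT OVERWRITE TABLE logs_seq
SELECT * FROM logs_raw;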


Regards
Bejoy KS

Sent from handheld, please excuse typos.

-Original Message-
From: Krishna Rao krishnanj...@gmail.com
Date: Tue, 6 Nov 2012 09:50:33 
To: user@hive.apache.org
Reply-To: user@hive.apache.org
Subject: Re: Hive compression with external table

Thanks for the reply. Compressed sequence files with compression might work.
However, it's not clear to me if it's possible to read Sequence files using
an external table.

On 5 November 2012 16:04, Edward Capriolo edlinuxg...@gmail.com wrote:

 Compression is a confusing issue. Sequence files that are in block
 format are always split table regardless of what compression for the
 block is chosen.The Programming Hive book has an entire section
 dedicated to the permutations of compression options.

 Edward
 On Mon, Nov 5, 2012 at 10:57 AM, Krishna Rao krishnanj...@gmail.com
 wrote:
  Hi all,
 
  I'm looking into finding a suitable format to store data in HDFS, so that
  it's available for processing by Hive. Ideally I would like to satisfy
 the
  following:
 
  1. store the data in a format that is readable by multiple Hadoop
 projects
  (eg. Pig, Mahout, etc.), not just Hive
  2. work with a Hive external table
  3. store data in a compressed format that is splittable
 
  (1) is a requirement because Hive isn't appropriate for all the problems
  that we want to throw at Hadoop.
 
  (2) is really more of a consequence of (1). Ideally we want the data
 stored
  in some open format that is compressed in HDFS.
  This way we can just point Hive, Pig, Mahout, etc at it depending on the
  problem.
 
  (3) is obviously so it plays well with Hadoop.
 
  Gzip is no good because it is not splittable. Snappy looked promising,
 but
  it is splittable only if used with a non-external Hive table.
  LZO also looked promising, but I wonder about whether it is future proof
  given the licencing issues surrounding it.
 
  So far, the only solution I could find that satisfies all the above
 seems to
  be bzip2 compression, but concerns about its performance make me wary
 about
  choosing it.
 
  Is bzip2 the only option I have? Or have I missed some other compression
  option?
 
  Cheers,
 
  Krishna




Re: hive integrate with hbase, map to existed hbase table report column family not exist

2012-11-06 Thread Chris Gong
I found the reason: the column mapping section can't contain any whitespace,
including \r\n.
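In other words, the mapping has to be one continuous string. A trimmed-down sketch of
the working form (only the first few columns shown; the rest follow the same pattern):

CREATE EXTERNAL TABLE hbase_money_files (rowkey string, user_no string, mon int)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,data:user_no,data:mon")
TBLPROPERTIES ("hbase.table.name" = "df_money_files");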




Chris Gong

From: Chris Gong
Sent: 2012-11-06 10:56
To: user-hive
Subject: hive integrate with hbase, map to existed hbase table report column family 
not exist
hi all:
I'm mapping Hive to an existing HBase table, and I got the following error:

FAILED: Error in metadata: MetaException(message:Column Family
data is not defined in hbase table df_money_files)
FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.DDLTask

   My Hive QL is:

create external table hbase_money_files (rowkey string, 
user_no string,
mon int,
mon_sn int,
group_no int,
sn int,
write_sect_no string,
write_sn int,
business_place_code string,
power_no int,
trans_group  int,
price_code string,
ts_flag string,
elec_type_code string,
trade_type_code string,
ms_mode string,
user_ms_type string,
write_power double,
chg_power double,
add_power double,
kb_power double,
share_power  double,
total_power  double,
total_money  double,
num_money double,
add_money1   double,
add_money2   double,
add_money3   double,
add_money4   double,
add_money5   double,
add_money6   double,
add_money7   double,
add_money8   double,
add_money9   double,
add_money10  double,
rp_power double,
rp_money double,
should_money double,
create_date  string,
creator  string,
warrant_no   int,
line_code string,
trans_no string,
add_taxflag  string,
write_date   string,
compute_date string,
calculator_id string,
status string,
user_type1 string,
rela_user_no  string,
part_sn   int,
have_ext string,
id_fragment  string,
check_date   string,
check_manstring,
start_date  string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,data:user_no,
data:mon,
data:mon_sn,
data:group_no,
data:sn,
data:write_sect_no,
data:write_sn,
data:business_place_code,
data:power_no,
data:trans_group,
data:price_code,
data:ts_flag,
data:elec_type_code,
data:trade_type_code,
data:ms_mode,
data:user_ms_type,
data:write_power,
data:chg_power,
data:add_power,
data:kb_power,
data:share_power,
data:total_power,
data:total_money,
data:num_money,
data:add_money1,
data:add_money2,
data:add_money3,
data:add_money4,
data:add_money5,
data:add_money6,
data:add_money7,
data:add_money8,
data:add_money9,
data:add_money10,
data:rp_power,
data:rp_money,
data:should_money,
data:create_date,
data:creator,
data:warrant_no,
data:line_code,
data:trans_no,
data:add_taxflag,
data:write_date,
data:compute_date,
data:calculator_id,
data:status,
data:user_type1,
data:rela_user_no,
data:part_sn,
data:have_ext,
data:id_fragment,
data:check_date,
data:check_man,
data:start_date")
TBLPROPERTIES ("hbase.table.name" = "df_money_files");

However, the 'data' column family does exist! When I describe the table in the hbase 
shell, it reports:

hbase(main):001:0> describe 'df_money_files'
DESCRIPTION  ENABLED
 {NAME => 'df_money_files', FAMILIES => [{NAME => 'd true
 ata', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '
 0', VERSIONS => '3', COMPRESSION => 'NONE', MIN_VER
 SIONS => '0', TTL => '2147483647', BLOCKSIZE => '65
 536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
1 row(s) in 0.8470 seconds

I am confused now. Can anyone give me some information?



Chris Gong

Re: ClassNotFoundException when use hive java client of hive + hbase integration

2012-11-06 Thread Cheng Su
This doesn't work. In CLI mode you can export the environment
variable to avoid running ADD JAR every time.
I did this, but I still encounter the error when I access Hive from the Java client.
And I can't even specify the --auxpath param when starting the Hive
Thrift service.
So, at least in my situation, I have to add the jars myself.
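In practice that means issuing the ADD JAR statements over the same connection before
running the query, for example (using the jar paths from the hive-env.sh quoted below;
adjust them to your install):

ADD JAR /data/install/hive-0.9.0/lib/hive-hbase-handler-0.9.0.jar;
ADD JAR /data/install/hive-0.9.0/lib/hbase-0.92.0.jar;
ADD JAR /data/install/hive-0.9.0/lib/zookeeper-3.4.2.jar;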

On Wed, Nov 7, 2012 at 12:31 AM, kulkarni.swar...@gmail.com
kulkarni.swar...@gmail.com wrote:
 FWIW, you can also drop all your needed jars (including the hbase and
 zookeeper ones) in a folder and then set this property in your hive-env.sh.

 export HIVE_AUX_JARS_PATH=<path to the folder>

 This way you need not add them manually every time.


 On Mon, Nov 5, 2012 at 9:18 PM, Cheng Su scarcer...@gmail.com wrote:

 Mark, thank you so much for your suggestion.

 I had already added the necessary jars to my Hive aux path, so I
 can execute my SQL in Hive CLI mode without getting any error.
 But when I use a Java client to access the tables through the Thrift
 service, I need to add these jars manually.
 I executed the ADD JAR .jar statement and the problem is solved!

 Thank you again!

 On Tue, Nov 6, 2012 at 9:03 AM, Mark Grover grover.markgro...@gmail.com
 wrote:
  Cheng,
  You will have to add the appropriate HBase related jars to your class
  path.
 
  You can do that by running add jar command(s) or put it in aux_lib.
  See
  this thread for reference:
 
  http://mail-archives.apache.org/mod_mbox/hive-user/201103.mbox/%3caanlktingqlgknqmizgoi+szfnexgcat8caqtovf8j...@mail.gmail.com%3E
 
  Mark
 
 
  On Mon, Nov 5, 2012 at 6:53 AM, Cheng Su scarcer...@gmail.com wrote:
 
  Hi, all. I have a hive+hbase integration cluster.
 
  When I try to execute a query through the Java client of Hive, sometimes
  a ClassNotFoundException happens.
 
  My java code :
 
  final Connection conn = DriverManager.getConnection(URL);
  final ResultSet rs = conn.createStatement().executeQuery(
      "SELECT count(*) FROM test_table WHERE (source = '0' "
      + "AND ur_createtime BETWEEN '2012103100' AND '20121031235959')");
 
  I can execute the sql:SELECT count(*) FROM test_table WHERE (source =
  '0' AND ur_createtime BETWEEN '2012103100' AND '20121031235959')
  in hive cli mode, and get the query result, so there is no error in my
  sql.
 
  The client side exception:
 
  Caused by: java.sql.SQLException: Query returned non-zero code: 9,
  cause: FAILED: Execution Error, return code 2 from
  org.apache.hadoop.hive.ql.exec.MapRedTask
  at
 
  org.apache.hadoop.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:189)
  ... 23 more
 
  The server side exception(hadoop-jobtracker):
 
  2012-11-05 18:55:39,443 INFO org.apache.hadoop.mapred.TaskInProgress:
  Error from attempt_201210301133_0112_m_00_3: java.io.IOException:
  Cannot create an instance of InputSplit class =
 
 
  org.apache.hadoop.hive.hbase.HBaseSplit:org.apache.hadoop.hive.hbase.HBaseSplit
  at
 
  org.apache.hadoop.hive.ql.io.HiveInputFormat$HiveInputSplit.readFields(HiveInputFormat.java:146)
  at
 
  org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
  at
 
  org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
  at
  org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:396)
  at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:412)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
  at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Unknown Source)
  at
 
  org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
  at org.apache.hadoop.mapred.Child.main(Child.java:249)
  Caused by: java.lang.ClassNotFoundException:
  org.apache.hadoop.hive.hbase.HBaseSplit
  at java.net.URLClassLoader$1.run(Unknown Source)
  at java.security.AccessController.doPrivileged(Native Method)
  at java.net.URLClassLoader.findClass(Unknown Source)
  at java.lang.ClassLoader.loadClass(Unknown Source)
  at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
  at java.lang.ClassLoader.loadClass(Unknown Source)
  at java.lang.Class.forName0(Native Method)
  at java.lang.Class.forName(Unknown Source)
  at
 
  org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:819)
  at
 
  org.apache.hadoop.hive.ql.io.HiveInputFormat$HiveInputSplit.readFields(HiveInputFormat.java:143)
  ... 10 more
 
 
  My hive-env.sh
 
  export
 
  HIVE_AUX_JARS_PATH=/data/install/hive-0.9.0/lib/hive-hbase-handler-0.9.0.jar,/data/install/hive-0.9.0/lib/hbase-0.92.0.jar,/data/install/hive-0.9.0/lib/zookeeper-3.4.2.jar
 
 
  My hive-site.xml
 
  property
  namehive.zookeeper.quorum/name
  valuehadoop01,hadoop02,hadoop03/value
  descriptionThe list of zookeeper servers to talk to. This is
  only needed for 

Re: hive integrate with hbase, map to existed hbase table report column family not exist

2012-11-06 Thread Mark Grover
Indeed. https://issues.apache.org/jira/browse/HIVE-3243

Sorry you found out about it the hard way!

On Tue, Nov 6, 2012 at 5:46 PM, Chris Gong fellowt...@163.com wrote:

 i got the reason,  the column mapping section can't have any white space,
 including \r\n

 --
 Chris Gong

  *From:* Chris Gong fellowt...@163.com
 *Sent:* 2012-11-06 10:56
 *To:* user-hive user@hive.apache.org
 *Subject:* hive integrate with hbase, map to existed hbase table report column
 family not exist
  hi all:
 now, I'm map to an existed hbase table, i got the following infomation
 as:


 FAILED: Error in metadata: MetaException(message:Column Family
 data is not defined in hbase table df_money_files)

 FAILED: Execution Error, return code 1 from 
 org.apache.hadoop.hive.ql.exec.DDLTask


my hive QL is


 create external table hbase_money_files (rowkey string,
 user_no string,
 mon int,
 mon_sn int,
 group_no int,
 sn int,
 write_sect_no string,
 write_sn int,
 business_place_code string,
 power_no int,
 trans_group  int,
 price_code string,
 ts_flag string,
 elec_type_code string,
 trade_type_code string,
 ms_mode string,
 user_ms_type string,
 write_power double,
 chg_powerdouble,
 add_powerdouble,
 kb_power double,
 share_power  double,
 total_power  double,
 total_money  double,
 num_moneydouble,
 add_money1   double,
 add_money2   double,
 add_money3   double,
 add_money4   double,
 add_money5   double,
 add_money6   double,
 add_money7   double,
 add_money8   double,
 add_money9   double,
 add_money10  double,
 rp_power double,
 rp_money double,
 should_money double,
 create_date  string,
 creator  string,
 warrant_no   int,
 line_codestring,
 trans_no string,
 add_taxflag  string,
 write_date   string,
 compute_date string,
 calculator_id string,
 statusstring,
 user_type1string,
 rela_user_no  string,
 part_sn   int,
 have_ext string,
 id_fragment  string,
 check_date   string,
 check_manstring,
 start_date  string)
 STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
 WITH SERDEPROPERTIES (hbase.columns.mapping = :key,data:user_no,
 data:mon,
 data:mon_sn,
 data:group_no,
 data:sn,
 data:write_sect_no,
 data:write_sn,
 data:business_place_code,
 data:power_no,
 data:trans_group,
 data:price_code,
 data:ts_flag,
 data:elec_type_code,
 data:trade_type_code,
 data:ms_mode,
 data:user_ms_type,
 data:write_power,
 data:chg_power,
 data:add_power,
 data:kb_power,
 data:share_power,
 data:total_power,
 data:total_money,
 data:num_money,
 data:add_money1,
 data:add_money2,
 data:add_money3,
 data:add_money4,
 data:add_money5,
 data:add_money6,
 data:add_money7,
 data:add_money8,
 data:add_money9,
 data:add_money10,
 data:rp_power,
 data:rp_money,
 data:should_money,
 data:create_date,
 data:creator,
 data:warrant_no,
 data:line_code,
 data:trans_no,
 data:add_taxflag,
 data:write_date,
 data:compute_date,
 data:calculator_id,
 data:status,
 data:user_type1,
 data:rela_user_no,
 data:part_sn,
 data:have_ext,
 data:id_fragment,
 data:check_date,
 data:check_man,
 data:start_date)
 TBLPROPERTIES(hbase.table.name = df_money_files);

 however the data column does exist! when i describe the table in hbase
 shell, it reported:


  hbase(main):001:0 describe 'df_money_files'
 DESCRIPTION  ENABLED
  {NAME = 'df_money_files', FAMILIES = [{NAME = 'd true
  ata', BLOOMFILTER = 'NONE', REPLICATION_SCOPE = '
  0', VERSIONS = '3', COMPRESSION = 'NONE', MIN_VER
  SIONS = '0', TTL = '2147483647', BLOCKSIZE = '65
  536', IN_MEMORY = 'false', BLOCKCACHE = 'true'}]}
 1 row(s) in 0.8470 seconds

 i am confused now, anyone can give some infomation?
 --
 Chris Gong



RE: Which is the postgres version is compatible for hive-trunk..?

2012-11-06 Thread rohithsharma
Thanks for your reply. I will use the 9.0.x version.

-Original Message-
From: Alexander Lorenz [mailto:wget.n...@gmail.com] 
Sent: Tuesday, November 06, 2012 3:34 PM
To: rohithsharm...@huawei.com
Cc: user@hive.apache.org
Subject: Re: Which is the postgres version is compatible for hive-trunk..?

Hi,

I don't know what exactly was the major changes between 9.0.x and 9.1.x, but
when you've a good experience with 9.0.x, use it :)
For 9.1.x, it would be great to get a log or stack trace to see whats going
wrong there.

best,
 Alex

On Nov 6, 2012, at 9:53 AM, rohithsharma rohithsharm...@huawei.com wrote:

 Hi Lorenz, 
 
 Thanks for you fast reply :-)
 
 Postgres 9x I personally wouldn't choose.
 Can you please tell what is the problem using 9x version with hive.?
 
 I tried with 9.0.7, it was successfully integrated and functionality also
 fine. But when I changed to 9.1, basic show table in hive failed.
 
 I was wondering what is the major change between 9.0.7 to 9.1.4.?
 
 
 Regards
 Rohith Sharma K S
 
 -Original Message-
 From: Alexander Lorenz [mailto:wget.n...@gmail.com] 
 Sent: Tuesday, November 06, 2012 12:40 PM
 To: user@hive.apache.org; rohithsharm...@huawei.com
 Subject: Re: Which is the postgres version is compatible for hive-trunk..?
 
 Hey Rohith,
 
 last time I used psql was with postgresql-8.4-701.jdbc4.jar, and was
working
 great. But I guess all 8.x version should work. Postgres 9x I personally
 wouldn't choose.
 
 best,
 Alex
 
 On Nov 6, 2012, at 5:29 AM, rohithsharma rohithsharm...@huawei.com
wrote:
 
 Hi Guys,
 
 I am planning to use postgres as metastore with Hive.  Can anyone point
me
 which is the postgres version is compatible with Hive..?
 
 
 
 Regards
 
 Rohith Sharma K S
 
 
 --
 Alexander Alten-Lorenz
 http://mapredit.blogspot.com
 German Hadoop LinkedIn Group: http://goo.gl/N8pCF
 

--
Alexander Alten-Lorenz
http://mapredit.blogspot.com
German Hadoop LinkedIn Group: http://goo.gl/N8pCF



Re: TRANSFORM + LATERAL VIEW?

2012-11-06 Thread Mark Grover
Jamie,
Not that I know of.

Assuming you will be using LATERAL VIEW for exploding the data, I can think
of 2 options off the top of my head:
1. Pass the 'id' column to your transform script. You will have to take care of
exploding the data in your transform script. It would no longer be a simple
'cat'. A sketch of this option follows below.
2. Return an array of entries from your transform script. Then use
explode/lateral view on this array to achieve the desired result. The
transform script would no longer be a simple 'cat' in this case either.
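A minimal sketch of option 1, assuming a hypothetical 'explode_entries.py' that reads
id, x, y on stdin and emits one already-exploded output row per entry, carrying id
through (table and column names are taken from the quoted question below):

ADD FILE explode_entries.py;

FROM (
  FROM xtab dat
  SELECT TRANSFORM (dat.id, dat.x, dat.y)
  USING 'explode_entries.py'
  AS id, x, y
) map_out
INSERT INTO TABLE xresult
SELECT map_out.*;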

Hope that helps. Let us know how it goes.

Mark

On Mon, Nov 5, 2012 at 11:32 PM, Jamie Olson jamie.f.ol...@gmail.comwrote:

 Is it possible to select columns that are not in the TRANSFORM(...) operation?  I
 did not see it anywhere in the language definition for TRANSFORM.  I'm looking
 for something comparable to LATERAL VIEW for UDTFs, but with TRANSFORM
 statements.

 Ex: Not (seemingly?) legal:
 FROM (
 FROM xtab dat
 id, TRANSFORM (x, y)
 USING 'cat'
 ) map_out
 INSERT INTO xresult SELECT map_out.*


 Jamie Olson




Re: Hbase connection closed when query multiple complicated hql with hive+hbase integration

2012-11-06 Thread Cheng Su
The exceptions seem to be another problem.
They all happened on one node, and after the task attempts failed on that
node, they were retried on other nodes with no exceptions.
So the exceptions may have nothing to do with the performance issue.

On Wed, Nov 7, 2012 at 11:07 AM, Cheng Su scarcer...@gmail.com wrote:
 Hi, all.

 I have a hive + hbase integration cluster. I have hit a performance issue.
 I executed some complicated HQL queries, but only the first one is actually running.
 The rest are shown as running on the JobTracker web UI, but their task
 attempts failed with the exceptions below:

 java.io.IOException: java.io.IOException:
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@5117a20
 closed
 at 
 org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
 at 
 org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
 at 
 org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:243)
 at 
 org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:522)
 at 
 org.apache.hadoop.mapred.MapTask$TrackedRecordReader.init(MapTask.java:197)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:418)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
 at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Unknown Source)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
 at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.IOException:
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@5117a20
 closed
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:794)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:782)
 at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:249)
 at org.apache.hadoop.hbase.client.HTable.init(HTable.java:213)
 at 
 org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat.getRecordReader(HiveHBaseTableInputFormat.java:92)
 at 
 org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:240)
 ... 9 more

 So actually the rest are queuing.

 I guess it might be because the number of HBase handlers, which handle
 access requests from Hive, is too small.
 So when a big job occupies all the handlers, the rest have to wait
 until the handlers are released.

 Is my assumption right? And what settings should I be tuning?

 Thanks.

 --

 Regards,
 Cheng Su



-- 

Regards,
Cheng Su


Re: Alter table is giving error

2012-11-06 Thread Mark Grover
Chunky,
You should have run:
msck repair table <table name>;

Sorry, I should have made that clear in my last reply. I have added an entry
to the Hive wiki for the benefit of others:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Recoverpartitions
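For completeness, the usual sequence on the Hive console would look roughly like this
(the table name is illustrative):

MSCK REPAIR TABLE XXX;
SHOW PARTITIONS XXX;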

Mark


On Tue, Nov 6, 2012 at 9:55 PM, Chunky Gupta chunky.gu...@vizury.comwrote:

 Hi Mark,
 I didn't get any error.
 I ran this on hive console:-
  msck table Table_Name;
 It says Ok and showed the execution time as 1.050 sec.
 But when I checked partitions for table using
   show partitions Table_Name;
 It didn't show me any partitions.

 Thanks,
 Chunky.


 On Tue, Nov 6, 2012 at 10:38 PM, Mark Grover 
 grover.markgro...@gmail.comwrote:

 Glad to hear, Chunky.

 Out of curiosity, what errors did you get when using msck?


 On Tue, Nov 6, 2012 at 5:14 AM, Chunky Gupta chunky.gu...@vizury.comwrote:

 Hi Mark,
 I tried msck, but it is not working for me. I have written a python
 script to partition the data individually.

 Thank you Edward, Mark and Dean.
 Chunky.


 On Mon, Nov 5, 2012 at 11:08 PM, Mark Grover 
 grover.markgro...@gmail.com wrote:

 Chunky,
 I have used recover partitions command on EMR, and that worked fine.

 However, take a look at https://issues.apache.org/jira/browse/HIVE-874. 
 Seems
 like msck command in Apache Hive does the same thing. Try it out and let us
 know it goes.

 Mark

 On Mon, Nov 5, 2012 at 7:56 AM, Edward Capriolo 
 edlinuxg...@gmail.comwrote:

 Recover partitions should work the same way for different file systems.

 Edward

 On Mon, Nov 5, 2012 at 9:33 AM, Dean Wampler
 dean.wamp...@thinkbiganalytics.com wrote:
  Writing a script to add the external partitions individually is the
 only way
  I know of.
 
  Sent from my rotary phone.
 
 
  On Nov 5, 2012, at 8:19 AM, Chunky Gupta chunky.gu...@vizury.com
 wrote:
 
  Hi Dean,
 
  Actually I was having Hadoop and Hive cluster on EMR and I have S3
 storage
  containing logs which updates daily and having partition with
 date(dt). And
  I was using this recover partition.
  Now I wanted to shift to EC2 and have my own Hadoop and Hive
 cluster. So,
  what is the alternate of using recover partition in this case, if
 you have
  any idea ?
  I found one way of individually partitioning all dates, so I have to
 write
  script for that to do so for all dates. Is there any easiest way
 other than
  this ?
 
  Thanks,
  Chunky
 
 
 
  On Mon, Nov 5, 2012 at 6:28 PM, Dean Wampler
  dean.wamp...@thinkbiganalytics.com wrote:
 
  The RECOVER PARTITIONS is an enhancement added by Amazon to their
 version
  of Hive.
 
 
 
 http://docs.amazonwebservices.com/ElasticMapReduce/latest/DeveloperGuide/emr-hive-additional-features.html
 
  shameless-plus
Chapter 21 of Programming Hive discusses this feature and other
 aspects
  of using Hive in EMR.
  /shameless-plug
 
  dean
 
 
  On Mon, Nov 5, 2012 at 5:34 AM, Chunky Gupta 
 chunky.gu...@vizury.com
  wrote:
 
  Hi,
 
  I am having a cluster setup on EC2 with Hadoop version 0.20.2 and
 Hive
  version 0.8.1 (I configured everything) . I have created a table
 using :-
 
  CREATE EXTERNAL TABLE XXX ( YYY )PARTITIONED BY ( ZZZ )ROW FORMAT
  DELIMITED FIELDS TERMINATED BY 'WWW' LOCATION
 's3://my-location/data/';
 
  Now I am trying to recover partition using :-
 
  ALTER TABLE XXX RECOVER PARTITIONS;
 
  but I am getting this error :- FAILED: Parse Error: line 1:12
 cannot
  recognize input near 'XXX' 'RECOVER' 'PARTITIONS' in alter table
 statement
 
  Doing same steps on a cluster setup on EMR with Hadoop version
 1.0.3 and
  Hive version 0.8.1 (Configured by EMR), works fine.
 
  So is this a version issue or am I missing some configuration
 changes in
  EC2 setup ?
  I am not able to find exact solution for this problem on internet.
 Please
  help me.
 
  Thanks,
  Chunky.
 
 
 
 
 
 
  --
  Dean Wampler, Ph.D.
  thinkbiganalytics.com
  +1-312-339-1330
 
 
 








Re: Alter table is giving error

2012-11-06 Thread Chunky Gupta
Hi Mark,
Sorry, I forgot to mention. I have also tried
msck repair table <table name>;
and got the same output that I got from msck alone.
Do I need to change any other settings for this to work? I have
set up Hadoop and Hive from scratch on EC2.

Thanks,
Chunky.



On Wed, Nov 7, 2012 at 11:58 AM, Mark Grover grover.markgro...@gmail.comwrote:

 Chunky,
 You should have run:
 msck repair table Table name;

 Sorry, I should have made it clear in my last reply. I have added an entry
 to Hive wiki for benefit of others:

 https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Recoverpartitions

 Mark


 On Tue, Nov 6, 2012 at 9:55 PM, Chunky Gupta chunky.gu...@vizury.comwrote:

 Hi Mark,
 I didn't get any error.
 I ran this on hive console:-
  msck table Table_Name;
 It says Ok and showed the execution time as 1.050 sec.
 But when I checked partitions for table using
   show partitions Table_Name;
 It didn't show me any partitions.

 Thanks,
 Chunky.


 On Tue, Nov 6, 2012 at 10:38 PM, Mark Grover grover.markgro...@gmail.com
  wrote:

 Glad to hear, Chunky.

 Out of curiosity, what errors did you get when using msck?


 On Tue, Nov 6, 2012 at 5:14 AM, Chunky Gupta chunky.gu...@vizury.comwrote:

 Hi Mark,
 I tried msck, but it is not working for me. I have written a python
 script to partition the data individually.

 Thank you Edward, Mark and Dean.
 Chunky.


 On Mon, Nov 5, 2012 at 11:08 PM, Mark Grover 
 grover.markgro...@gmail.com wrote:

 Chunky,
 I have used recover partitions command on EMR, and that worked fine.

 However, take a look at https://issues.apache.org/jira/browse/HIVE-874. 
 Seems
 like msck command in Apache Hive does the same thing. Try it out and let 
 us
 know it goes.

 Mark

 On Mon, Nov 5, 2012 at 7:56 AM, Edward Capriolo edlinuxg...@gmail.com
  wrote:

 Recover partitions should work the same way for different file
 systems.

 Edward

 On Mon, Nov 5, 2012 at 9:33 AM, Dean Wampler
 dean.wamp...@thinkbiganalytics.com wrote:
  Writing a script to add the external partitions individually is the
 only way
  I know of.
 
  Sent from my rotary phone.
 
 
  On Nov 5, 2012, at 8:19 AM, Chunky Gupta chunky.gu...@vizury.com
 wrote:
 
  Hi Dean,
 
  Actually I was having Hadoop and Hive cluster on EMR and I have S3
 storage
  containing logs which updates daily and having partition with
 date(dt). And
  I was using this recover partition.
  Now I wanted to shift to EC2 and have my own Hadoop and Hive
 cluster. So,
  what is the alternate of using recover partition in this case, if
 you have
  any idea ?
  I found one way of individually partitioning all dates, so I have
 to write
  script for that to do so for all dates. Is there any easiest way
 other than
  this ?
 
  Thanks,
  Chunky
 
 
 
  On Mon, Nov 5, 2012 at 6:28 PM, Dean Wampler
  dean.wamp...@thinkbiganalytics.com wrote:
 
  The RECOVER PARTITIONS is an enhancement added by Amazon to their
 version
  of Hive.
 
 
 
 http://docs.amazonwebservices.com/ElasticMapReduce/latest/DeveloperGuide/emr-hive-additional-features.html
 
  shameless-plus
Chapter 21 of Programming Hive discusses this feature and other
 aspects
  of using Hive in EMR.
  /shameless-plug
 
  dean
 
 
  On Mon, Nov 5, 2012 at 5:34 AM, Chunky Gupta 
 chunky.gu...@vizury.com
  wrote:
 
  Hi,
 
  I am having a cluster setup on EC2 with Hadoop version 0.20.2 and
 Hive
  version 0.8.1 (I configured everything) . I have created a table
 using :-
 
  CREATE EXTERNAL TABLE XXX ( YYY )PARTITIONED BY ( ZZZ )ROW FORMAT
  DELIMITED FIELDS TERMINATED BY 'WWW' LOCATION
 's3://my-location/data/';
 
  Now I am trying to recover partition using :-
 
  ALTER TABLE XXX RECOVER PARTITIONS;
 
  but I am getting this error :- FAILED: Parse Error: line 1:12
 cannot
  recognize input near 'XXX' 'RECOVER' 'PARTITIONS' in alter table
 statement
 
  Doing same steps on a cluster setup on EMR with Hadoop version
 1.0.3 and
  Hive version 0.8.1 (Configured by EMR), works fine.
 
  So is this a version issue or am I missing some configuration
 changes in
  EC2 setup ?
  I am not able to find exact solution for this problem on
 internet. Please
  help me.
 
  Thanks,
  Chunky.
 
 
 
 
 
 
  --
  Dean Wampler, Ph.D.
  thinkbiganalytics.com
  +1-312-339-1330