Re: Alter table is giving error

2012-11-05 Thread Dean Wampler
The RECOVER PARTITIONS statement is an enhancement added by Amazon to their
version of Hive.

http://docs.amazonwebservices.com/ElasticMapReduce/latest/DeveloperGuide/emr-hive-additional-features.html

<shameless-plug>
  Chapter 21 of Programming Hive discusses this feature and other aspects
of using Hive in EMR.
</shameless-plug>

dean

On Mon, Nov 5, 2012 at 5:34 AM, Chunky Gupta chunky.gu...@vizury.com wrote:

 Hi,

 I have a cluster set up on EC2 with Hadoop version 0.20.2 and Hive
 version 0.8.1 (I configured everything). I have created a table using:

 CREATE EXTERNAL TABLE XXX ( YYY )PARTITIONED BY ( ZZZ )ROW FORMAT
 DELIMITED FIELDS TERMINATED BY 'WWW' LOCATION 's3://my-location/data/';

 Now I am trying to recover partitions using:

 ALTER TABLE XXX RECOVER PARTITIONS;

 but I am getting this error: FAILED: Parse Error: line 1:12 cannot
 recognize input near 'XXX' 'RECOVER' 'PARTITIONS' in alter table statement

 Doing the same steps on a cluster set up on EMR with Hadoop version 1.0.3
 and Hive version 0.8.1 (configured by EMR) works fine.

 So is this a version issue, or am I missing some configuration changes in
 the EC2 setup?
 I am not able to find an exact solution for this problem on the internet.
 Please help me.

 Thanks,
 Chunky.






-- 
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330


Re: Alter table is giving error

2012-11-05 Thread Chunky Gupta
Hi Dean,

Actually, I had a Hadoop and Hive cluster on EMR, with S3 storage containing
logs that update daily, partitioned by date (dt), and I was using this
recover-partitions feature.
Now I want to shift to EC2 and run my own Hadoop and Hive cluster. So,
what is the alternative to RECOVER PARTITIONS in this case, if you have
any idea?
I found one way: adding partitions for all dates individually, so I would
have to write a script that does so for every date. Is there any easier way
than this?

Thanks,
Chunky







Re: Alter table is giving error

2012-11-05 Thread Dean Wampler
Writing a script to add the external partitions individually is the only way I 
know of. 

Sent from my rotary phone. 
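
A minimal sketch of what such a script would emit, reusing the table and S3
layout from earlier in the thread (one statement per date; the dt=YYYY-MM-DD
directory naming is an assumption, not something stated in the thread):

-- Generated by a small wrapper script, one statement per day:
ALTER TABLE XXX ADD PARTITION (dt='2012-11-01')
LOCATION 's3://my-location/data/dt=2012-11-01';
ALTER TABLE XXX ADD PARTITION (dt='2012-11-02')
LOCATION 's3://my-location/data/dt=2012-11-02';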




ClassNotFoundException when use hive java client of hive + hbase integration

2012-11-05 Thread Cheng Su
Hi, all. I have a Hive + HBase integration cluster.

When I try to execute a query through the Java client of Hive, sometimes
a ClassNotFoundException happens.

My Java code:

final Connection conn = DriverManager.getConnection(URL);
final Statement stmt = conn.createStatement();
final ResultSet rs = stmt.executeQuery("SELECT count(*) FROM " +
    "test_table WHERE (source = '0' AND ur_createtime BETWEEN " +
    "'2012103100' AND '20121031235959')");

I can execute the SQL (SELECT count(*) FROM test_table WHERE (source =
'0' AND ur_createtime BETWEEN '2012103100' AND '20121031235959'))
in Hive CLI mode and get the query result, so there is no error in my
SQL.

The client side exception:

Caused by: java.sql.SQLException: Query returned non-zero code: 9,
cause: FAILED: Execution Error, return code 2 from
org.apache.hadoop.hive.ql.exec.MapRedTask
at org.apache.hadoop.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:189)
... 23 more

The server side exception(hadoop-jobtracker):

2012-11-05 18:55:39,443 INFO org.apache.hadoop.mapred.TaskInProgress:
Error from attempt_201210301133_0112_m_00_3: java.io.IOException:
Cannot create an instance of InputSplit class =
org.apache.hadoop.hive.hbase.HBaseSplit:org.apache.hadoop.hive.hbase.HBaseSplit
at org.apache.hadoop.hive.ql.io.HiveInputFormat$HiveInputSplit.readFields(HiveInputFormat.java:146)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:396)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:412)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Unknown Source)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.ClassNotFoundException:
org.apache.hadoop.hive.hbase.HBaseSplit
at java.net.URLClassLoader$1.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Unknown Source)
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:819)
at org.apache.hadoop.hive.ql.io.HiveInputFormat$HiveInputSplit.readFields(HiveInputFormat.java:143)
... 10 more


My hive-env.sh

export 
HIVE_AUX_JARS_PATH=/data/install/hive-0.9.0/lib/hive-hbase-handler-0.9.0.jar,/data/install/hive-0.9.0/lib/hbase-0.92.0.jar,/data/install/hive-0.9.0/lib/zookeeper-3.4.2.jar


My hive-site.xml

<property>
  <name>hive.zookeeper.quorum</name>
  <value>hadoop01,hadoop02,hadoop03</value>
  <description>The list of zookeeper servers to talk to. This is
  only needed for read/write locks.</description>
</property>


And I start the Thrift service as below:

hive --service hiveserver -p 1 


The server side error log says that HBaseSplit is not found. But why?
How can I fix this?

-- 

Regards,
Cheng Su


Re: Alter table is giving error

2012-11-05 Thread Edward Capriolo
Recover partitions should work the same way for different file systems.

Edward






Hive compression with external table

2012-11-05 Thread Krishna Rao
Hi all,

I'm looking into finding a suitable format to store data in HDFS, so that
it's available for processing by Hive. Ideally I would like to satisfy the
following:

1. store the data in a format that is readable by multiple Hadoop projects
(eg. Pig, Mahout, etc.), not just Hive
2. work with a Hive external table
3. store data in a compressed format that is splittable

(1) is a requirement because Hive isn't appropriate for all the problems
that we want to throw at Hadoop.

(2) is really more of a consequence of (1). Ideally we want the data stored
in some open format that is compressed in HDFS.
This way we can just point Hive, Pig, Mahout, etc at it depending on the
problem.

(3) is obviously so it plays well with Hadoop.

Gzip is no good because it is not splittable. Snappy looked promising, but
it is splittable only if used with a non-external Hive table.
LZO also looked promising, but I wonder whether it is future-proof
given the licensing issues surrounding it.

So far, the only solution I could find that satisfies all the above seems
to be bzip2 compression, but concerns about its performance make me wary
about choosing it.

Is bzip2 the only option I have? Or have I missed some other compression
option?

Cheers,

Krishna


Re: Hive compression with external table

2012-11-05 Thread Edward Capriolo
Compression is a confusing issue. Sequence files in block format are always
splittable, regardless of which compression codec is chosen for the blocks.
The Programming Hive book has an entire section dedicated to the
permutations of compression options.

Edward
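
For reference, a minimal sketch of that combination (the table names logs_seq
and logs_text are illustrative, and codec availability depends on the cluster):

-- Write query output as block-compressed SequenceFiles; these stay
-- splittable and remain readable by Pig, Mahout, etc.
SET hive.exec.compress.output=true;
SET mapred.output.compression.type=BLOCK;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;

CREATE EXTERNAL TABLE logs_seq (line STRING)
STORED AS SEQUENCEFILE
LOCATION '/data/logs_seq';

INSERT OVERWRITE TABLE logs_seq SELECT line FROM logs_text;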


Re: Alter table is giving error

2012-11-05 Thread Mark Grover
Chunky,
I have used the RECOVER PARTITIONS command on EMR, and that worked fine.

However, take a look at https://issues.apache.org/jira/browse/HIVE-874. It
seems the MSCK command in Apache Hive does the same thing. Try it out and
let us know how it goes.

Mark
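
A sketch of the MSCK usage being suggested, against the table from earlier in
the thread (whether the REPAIR form is available depends on the Hive version):

-- Report partitions that exist on the filesystem but not in the metastore:
MSCK TABLE XXX;
-- Add the missing partitions (the rough equivalent of EMR's RECOVER PARTITIONS):
MSCK REPAIR TABLE XXX;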



Hive 0.7.1 with MySQL 5.5 as metastore

2012-11-05 Thread Venkatesh Kavuluri
I am working on copying existing Hive metadata (Hive 0.7.1 with MySQL 5.1) to a
new cluster environment (Hive 0.7.1 with MySQL 5.5). I copied over the
metastore tables and modified the data under the SDS (storage descriptors)
table to reflect the new data path. However, I am getting a MySQL integrity
constraint violation against the SDS.SD_ID column while trying to create new
Hive tables. Is this a problem with the MySQL version I am using? Does Hive
0.7.1 support MySQL 5.5 as the metastore?

Thanks,
Venkatesh

Re: Hive 0.7.1 with MySQL 5.5 as metastore

2012-11-05 Thread Edward Capriolo
Moving underlying data files around is not the correct way to perform
an upgrade.

https://dev.mysql.com/doc/refman/5.5/en/upgrading-from-previous-series.html

I would do a mysqldump and then re-insert the data for maximum compatibility.



RE: Hive 0.7.1 with MySQL 5.5 as metastore

2012-11-05 Thread Venkatesh Kavuluri
Sorry for the confusion; the problem is not with the MySQL version upgrade. I
have indeed performed the upgrade by doing a mysqldump and restoring the data.
The problem is with how Hive 0.7.1 is interacting with the same metastore data
on a different version of the MySQL server.


Re: ClassNotFoundException when use hive java client of hive + hbase integration

2012-11-05 Thread Mark Grover
Cheng,
You will have to add the appropriate HBase-related jars to your classpath.

You can do that by running ADD JAR command(s) or by putting them in aux_lib.
See this thread for reference:
http://mail-archives.apache.org/mod_mbox/hive-user/201103.mbox/%3caanlktingqlgknqmizgoi+szfnexgcat8caqtovf8j...@mail.gmail.com%3E

Mark
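
A minimal sketch, reusing the jar paths from the hive-env.sh quoted earlier in
the thread (run these in the same session that issues the query, e.g. over the
JDBC connection, since the Thrift server may not ship HIVE_AUX_JARS_PATH jars
to the MapReduce tasks the way the CLI does):

ADD JAR /data/install/hive-0.9.0/lib/hive-hbase-handler-0.9.0.jar;
ADD JAR /data/install/hive-0.9.0/lib/hbase-0.92.0.jar;
ADD JAR /data/install/hive-0.9.0/lib/zookeeper-3.4.2.jar;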





Re: Hive 0.7.1 with MySQL 5.5 as metastore

2012-11-05 Thread Mark Grover
Venkatesh,
What's the exact integrity constraint error you are seeing?

I'd be curious to see whether you still get the error if you restore the data
from the mysqldump onto a separate schema/db on a MySQL 5.1 server.

Mark




RE: Hive 0.7.1 with MySQL 5.5 as metastore

2012-11-05 Thread Venkatesh Kavuluri
Hi Mark,
I just started restoring the data to a separate MySQL 5.1 schema; I will try
to create a table and post back here.
I copied the error stack trace below.
Nov  5 22:24:02 127.0.0.1/127.0.0.1 local3:[ETLManager] ERROR [pool-2-thread-1]
exec.MoveTask - Failed with exception Insert of object
org.apache.hadoop.hive.metastore.model.MStorageDescriptor@1db0454f using
statement INSERT INTO `SDS`
(`SD_ID`,`LOCATION`,`OUTPUT_FORMAT`,`IS_COMPRESSED`,`NUM_BUCKETS`,`INPUT_FORMAT`,`SERDE_ID`)
VALUES (?,?,?,?,?,?,?) failed : Duplicate entry '5152711' for key 'PRIMARY'
javax.jdo.JDODataStoreException: Insert of object
org.apache.hadoop.hive.metastore.model.MStorageDescriptor@1db0454f using
statement INSERT INTO `SDS`
(`SD_ID`,`LOCATION`,`OUTPUT_FORMAT`,`IS_COMPRESSED`,`NUM_BUCKETS`,`INPUT_FORMAT`,`SERDE_ID`)
VALUES (?,?,?,?,?,?,?) failed : Duplicate entry '5152711' for key 'PRIMARY'
at org.datanucleus.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:313)
at org.datanucleus.jdo.JDOTransaction.commit(JDOTransaction.java:132)
at org.apache.hadoop.hive.metastore.ObjectStore.commitTransaction(ObjectStore.java:315)
at org.apache.hadoop.hive.metastore.HiveAlterHandler.alterTable(HiveAlterHandler.java:172)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler$29.run(HiveMetaStore.java:1687)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler$29.run(HiveMetaStore.java:1684)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.executeWithRetry(HiveMetaStore.java:307)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_table(HiveMetaStore.java:1684)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.alter_table(HiveMetaStoreClient.java:166)
at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:354)
at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1194)
at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:197)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:131)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1063)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:900)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:748)
Thanks,
Venkatesh
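
One common cause of this symptom after copying metastore rows between
databases (a hedged aside, not a confirmed diagnosis): DataNucleus hands out
new SD_ID values from its SEQUENCE_TABLE, and if that table was not carried
over, or lags behind the copied rows, new inserts collide with existing
primary keys. A sketch of the check, assuming a stock DataNucleus-backed
metastore schema:

-- Run against the metastore database:
SELECT MAX(SD_ID) FROM SDS;
SELECT NEXT_VAL FROM SEQUENCE_TABLE
WHERE SEQUENCE_NAME = 'org.apache.hadoop.hive.metastore.model.MStorageDescriptor';

-- If NEXT_VAL is at or below MAX(SD_ID), move it past the copied rows
-- (restart the metastore afterwards, since sequence values may be cached):
UPDATE SEQUENCE_TABLE
SET NEXT_VAL = (SELECT MAX(SD_ID) + 1 FROM SDS)
WHERE SEQUENCE_NAME = 'org.apache.hadoop.hive.metastore.model.MStorageDescriptor';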

hive integrate with hbase, map to existed hbase table report column family not exist

2012-11-05 Thread Chris Gong
Hi all,
I'm mapping to an existing HBase table, and I got the following error:

FAILED: Error in metadata: MetaException(message:Column Family
data is not defined in hbase table df_money_files)
FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.DDLTask
 
My Hive QL is:

create external table hbase_money_files (rowkey string, 
user_no string,
mon int,
mon_sn int,
group_no int,
sn int,
write_sect_no string,
write_sn int,
business_place_code string,
power_no int,
trans_group  int,
price_code string,
ts_flag string,
elec_type_code string,
trade_type_code string,
ms_mode string,
user_ms_type string,
write_power double,
chg_power double,
add_power double,
kb_power double,
share_power  double,
total_power  double,
total_money  double,
num_money double,
add_money1   double,
add_money2   double,
add_money3   double,
add_money4   double,
add_money5   double,
add_money6   double,
add_money7   double,
add_money8   double,
add_money9   double,
add_money10  double,
rp_power double,
rp_money double,
should_money double,
create_date  string,
creator  string,
warrant_no   int,
line_codestring,
trans_no string,
add_taxflag  string,
write_date   string,
compute_date string,
calculator_id string,
status string,
user_type1 string,
rela_user_no  string,
part_sn   int,
have_ext string,
id_fragment  string,
check_date   string,
check_manstring,
start_date  string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'  
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,data:user_no,
data:mon,
data:mon_sn,
data:group_no,
data:sn,
data:write_sect_no,
data:write_sn,
data:business_place_code,
data:power_no,
data:trans_group,
data:price_code,
data:ts_flag,
data:elec_type_code,
data:trade_type_code,
data:ms_mode,
data:user_ms_type,
data:write_power,
data:chg_power,
data:add_power,
data:kb_power,
data:share_power,
data:total_power,
data:total_money,
data:num_money,
data:add_money1,
data:add_money2,
data:add_money3,
data:add_money4,
data:add_money5,
data:add_money6,
data:add_money7,
data:add_money8,
data:add_money9,
data:add_money10,
data:rp_power,
data:rp_money,
data:should_money,
data:create_date,
data:creator,
data:warrant_no,
data:line_code,
data:trans_no,
data:add_taxflag,
data:write_date,
data:compute_date,
data:calculator_id,
data:status,
data:user_type1,
data:rela_user_no,
data:part_sn,
data:have_ext,
data:id_fragment,
data:check_date,
data:check_man,
data:start_date")
TBLPROPERTIES ("hbase.table.name" = "df_money_files");

However, the data column family does exist! When I describe the table in the
HBase shell, it reports:

hbase(main):001:0> describe 'df_money_files'
DESCRIPTION                                          ENABLED
 {NAME => 'df_money_files', FAMILIES => [{NAME =>    true
 'data', BLOOMFILTER => 'NONE', REPLICATION_SCOPE =>
 '0', VERSIONS => '3', COMPRESSION => 'NONE',
 MIN_VERSIONS => '0', TTL => '2147483647', BLOCKSIZE
 => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
1 row(s) in 0.8470 seconds

I am confused now. Can anyone give me some information?



Chris Gong
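
One hedged observation for readers, not a confirmed diagnosis: some Hive
versions do not trim whitespace inside hbase.columns.mapping, so the newlines
and indentation in the mapping string above can make Hive look up a family
name other than data. A minimal sketch with the mapping collapsed into one
unbroken string (only the first two columns shown; the remaining entries
follow the same pattern):

CREATE EXTERNAL TABLE hbase_money_files (rowkey string, user_no string, mon int)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,data:user_no,data:mon")
TBLPROPERTIES ("hbase.table.name" = "df_money_files");
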

Re: ClassNotFoundException when use hive java client of hive + hbase integration

2012-11-05 Thread Cheng Su
Mark, thank you so much for your suggestion.

I had already added the necessary jars to my Hive aux path, which is why I
can execute my SQL in Hive CLI mode without getting any error.
But when I use a Java client to access the tables through the Thrift
service, I need to add these jars manually.
I executed the ADD JAR .jar statements and the problem is solved!

Thank you again!






-- 

Regards,
Cheng Su


Which postgres version is compatible with hive-trunk?

2012-11-05 Thread rohithsharma
Hi Guys,

I am planning to use Postgres as the metastore with Hive. Can anyone tell me
which Postgres version is compatible with Hive?

 

Regards

Rohith Sharma K S



Re: Which postgres version is compatible with hive-trunk?

2012-11-05 Thread Alexander Lorenz
Hey Rohith,

The last time I used Postgres was with postgresql-8.4-701.jdbc4.jar, and it
was working great. I guess all 8.x versions should work; Postgres 9.x I
personally wouldn't choose.

best,
 Alex

 

--
Alexander Alten-Lorenz
http://mapredit.blogspot.com
German Hadoop LinkedIn Group: http://goo.gl/N8pCF