Hive crashing after an upgrade - issue with existing larger tables

2011-08-18 Thread Bejoy Ks
Hi Experts

        I was working with larger volumes of data on Hive 0.7. Recently
my Hive installation was upgraded to 0.7.1. After the upgrade I'm having a lot
of issues with queries that were already working fine on larger data. Queries
that took seconds to return results are now taking hours, and for most larger
tables the map-reduce jobs are not even getting triggered. Queries like
SELECT * and DESCRIBE work fine since they don't involve any map-reduce jobs.
For the jobs that didn't even get triggered I got the following error from the
JobTracker:

Job initialization failed: java.io.IOException: Split metadata size exceeded 1000.
Aborting job job_201106061630_6993
    at org.apache.hadoop.mapreduce.split.SplitMetaInfoReader.readSplitMetaInfo(SplitMetaInfoReader.java:48)
    at org.apache.hadoop.mapred.JobInProgress.createSplits(JobInProgress.java:807)
    at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:701)
    at org.apache.hadoop.mapred.JobTracker.initJob(JobTracker.java:4013)
    at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:79)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)


Looks like some metadata issue. My cluster is on CDH3-u0. Has anyone faced
similar issues before? Please share your thoughts on what could be the probable
cause of the error.

Thank You


Re: Hive crashing after an upgrade - issue with existing larger tables

2011-08-18 Thread bejoy_ks
A small correction to my previous post: the CDH version is CDH3 u1, not u0.
Sorry for the confusion.

Regards
Bejoy K S




Re: Hive crashing after an upgrade - issue with existing larger tables

2011-08-18 Thread Carl Steinbach
Hi,

The original CDH3U1 release of Hive contained a configuration bug which we
recently fixed in an update. You can get the update by refreshing your Hive
packages. Afterwards, please verify that you are using the following Hive
package: hive-0.7.1+42.9

You can also fix the problem by modifying your hive-site.xml file to include
the following setting:

mapred.max.split.size=25600
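
In hive-site.xml form that would look roughly like the following (a sketch
only; the property name and value are the ones given above):

<property>
  <name>mapred.max.split.size</name>
  <value>25600</value>
</property>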

Thanks.

Carl




Re: Hive DDL issue

2011-08-18 Thread Carl Steinbach
Hive does not work on Cygwin.

On Wed, Aug 17, 2011 at 3:38 PM, Siddharth Tiwari siddharth.tiw...@live.com
 wrote:


 I am encountering the following issue on Cygwin (Windows); please help:

 hive> show tables;
 FAILED: Hive Internal Error:
 java.lang.IllegalArgumentException(java.net.URISyntaxException: Relative
 path in absolute URI:
 file:C:/cygwin/tmp//siddharth/hive_2011-08-18_04-08-25_850_5502285238716420526)
 java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative
 path in absolute URI:
 file:C:/cygwin/tmp//siddharth/hive_2011-08-18_04-08-25_850_5502285238716420526
 at org.apache.hadoop.fs.Path.initialize(Path.java:140)
 at org.apache.hadoop.fs.Path.<init>(Path.java:132)
 at org.apache.hadoop.hive.ql.Context.getScratchDir(Context.java:142)
 at org.apache.hadoop.hive.ql.Context.getLocalScratchDir(Context.java:168)
 at org.apache.hadoop.hive.ql.Context.getLocalTmpFileURI(Context.java:282)
 at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeInternal(DDLSemanticAnalyzer.java:205)
 at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:238)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:340)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:736)
 at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:164)
 at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:241)
 at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:456)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
 Caused by: java.net.URISyntaxException: Relative path in absolute URI:
 file:C:/cygwin/tmp//siddharth/hive_2011-08-18_04-08-25_850_5502285238716420526
 at java.net.URI.checkPath(URI.java:1787)
 at java.net.URI.<init>(URI.java:735)
 at org.apache.hadoop.fs.Path.initialize(Path.java:137)
 ... 16 more


 Cheers !!!
 Siddharth Tiwari
 Have a refreshing day !!!



RE: Hive DDL issue

2011-08-18 Thread Siddharth Tiwari

hey Carl,

Isn't there any way to enable it? If not, what is this error about? What is
the problem?

Cheers !!!

Siddharth Tiwari

Have a refreshing day !!!



Re: Hive DDL issue

2011-08-18 Thread Carl Steinbach
Adding to what Ed said, we don't run regression tests on Cygwin, so Hive on
Cygwin is de facto unmaintained.

On Thu, Aug 18, 2011 at 12:37 PM, Edward Capriolo edlinuxg...@gmail.com wrote:

 It did work with Cygwin at one point, but since it is rarely used in that
 environment it is not well supported. Your best bet is QEMU or VMware
 emulating a Linux environment.



RE: Hive DDL issue

2011-08-18 Thread Siddharth Tiwari

Okay Ed and Carl, I get the point. The only thing that bothered me was whether
it would be able to run on Cygwin at all, and what actually was wrong.

Cheers !!!

Siddharth Tiwari

Have a refreshing day !!!



Ignore subdirectories when querying external table

2011-08-18 Thread Dave
Hi,

I have a partitioned external table in Hive, and in the partition
directories there are other subdirectories that are not related to the table
itself. Hive seems to want to scan those directories, as I am getting an
error message when trying to do a SELECT on the table:

Failed with exception java.io.IOException:java.io.IOException: Not a file:
hdfs://path/to/partition/path/to/subdir

Also, it seems to ignore directories prefixed by an underscore (_directory).

I am using hive 0.7.1 on Hadoop 0.20.2.

Is there a way to force Hive to ignore all subdirectories in external tables
and only look at files?

Thanks in advance,
-Dave
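
A possible workaround, guessing from the underscore behaviour noted above
rather than from any documented Hive option: Hadoop's FileInputFormat skips
paths whose names start with '_' or '.', so renaming the unrelated
subdirectories to begin with an underscore should hide them from the table
scan, e.g. (using the placeholder paths from the message above):

hadoop fs -mv hdfs://path/to/partition/path/to/subdir hdfs://path/to/partition/path/to/_subdir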


Re: Setting up stats database

2011-08-18 Thread wd
Maybe you should use the
'hive.stats.jdbcdriver=org.apache.mysql.jdbc.EmbeddedDriver'
setting?

via
http://mail-archives.apache.org/mod_mbox/hive-user/201103.mbox/%3c42360b00-72ec-437a-9d95-93f3ad9f1...@fb.com%3E

On Fri, Aug 19, 2011 at 5:45 AM, bharath vissapragada 
bharathvissapragada1...@gmail.com wrote:

 Hi,

 I am also getting the same error. However, I am using MySQL for stats.

 The thing is, I configured MySQL for the metastore and it works fine, and all
 the metadata gets populated normally. If the metastore classes can find the
 MySQL jar in the classpath, why can't the stats publisher find it? I looked
 at the stats source and everything looks fine.

 My conn string is:
 jdbc:mysql://ip:3306/TempStatsStore&amp;user=name&amp;password=pwd

 Am I missing something?

 Thanks




 On Thu, Aug 18, 2011 at 8:19 AM, wd w...@wdicc.com wrote:

 The error in the log is 'java.lang.ClassNotFoundException:
 org.postgresql.Driver', not a connection failure or a username/password
 error.


 On Wed, Aug 17, 2011 at 3:53 PM, Jander g jande...@gmail.com wrote:

 Hi, wd

 You should configure hive.stats.dbconnectionstring as follows:

 <property>
   <name>hive.stats.dbconnectionstring</name>
   <value>jdbc:postgresql://localhost/hive_statsdb?createDatabaseIfNotExist=true&amp;user=hive&amp;password=pwd</value>
   <description>The default connection string for the database that
   stores temporary hive statistics.</description>
 </property>

 Regards,

 Jander.


 On Mon, Aug 15, 2011 at 3:09 PM, wd w...@wdicc.com wrote:

 hi,

 I'm trying to use Postgres as the stats database, and made the following
 settings in hive-site.xml:

 <property>
   <name>hive.stats.dbclass</name>
   <value>jdbc:postgresql</value>
   <description>The default database that stores temporary hive
   statistics.</description>
 </property>

 <property>
   <name>hive.stats.autogather</name>
   <value>true</value>
   <description>A flag to gather statistics automatically during the
   INSERT OVERWRITE command.</description>
 </property>

 <property>
   <name>hive.stats.jdbcdriver</name>
   <value>org.postgresql.Driver</value>
   <description>The JDBC driver for the database that stores temporary
   hive statistics.</description>
 </property>

 <property>
   <name>hive.stats.dbconnectionstring</name>
   <value>jdbc:postgresql://localhost/hive_statsdb?createDatabaseIfNotExist=true;user=hive;password=pwd</value>
   <description>The default connection string for the database that
   stores temporary hive statistics.</description>
 </property>

 I use Postgres as the Hive metastore database, so there is a
 postgresql-9.0-801.jdbc4.jar file in lib.

 After running 'analyze table t1 partition(dt) compute statistics;' in the
 Hive CLI, it outputs some stats info in the CLI, but nothing shows up in the
 db. And I found the following errors:

 2011-08-15 14:54:54,767 INFO
 org.apache.hadoop.hive.ql.exec.TableScanOperator: Stats Gathering
 found a new partition spec = dt=20110805
 2011-08-15 14:54:54,767 INFO
 org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 forwarding 1 rows
 2011-08-15 14:54:54,767 INFO ExecMapper: ExecMapper: processing 1
 rows: used memory = 39953640
 2011-08-15 14:54:54,768 INFO
 org.apache.hadoop.hive.ql.exec.MapOperator: 1 finished. closing...
 2011-08-15 14:54:54,768 INFO
 org.apache.hadoop.hive.ql.exec.MapOperator: 1 forwarded 2 rows
 2011-08-15 14:54:54,768 INFO
 org.apache.hadoop.hive.ql.exec.MapOperator: DESERIALIZE_ERRORS:0
 2011-08-15 14:54:54,768 INFO
 org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 finished.
 closing...
 2011-08-15 14:54:54,768 INFO
 org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 forwarded 2 rows
 2011-08-15 14:54:54,772 ERROR
 org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher: Error during
 JDBC connection to

 jdbc:postgresql://localhost/hive_statsdb?createDatabaseIfNotExist=true;user=hive;password=pwd.
 java.lang.ClassNotFoundException: org.postgresql.Driver
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:169)
at
 org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher.connect(JDBCStatsPublisher.java:55)
at
 org.apache.hadoop.hive.ql.exec.TableScanOperator.publishStats(TableScanOperator.java:202)
at
 org.apache.hadoop.hive.ql.exec.TableScanOperator.closeOp(TableScanOperator.java:164)
at
 org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:557)
at
 org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
at
 org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:193)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
at
 org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
 

How to skip the malformatted records while loading data

2011-08-18 Thread XieXianshan
Hi, everyone,

Is there an option to ignore malformatted records while loading data
into a Hive table?
Or an option to ignore bad rows while querying data?

For instance:
1. Specify a row format explicitly for a new table:
hive> create table tb (id int, pref string, zip string) row format
delimited fields terminated by ',' lines terminated by '\n';

2. Load data into the table from a CSV file that has bad records:
hive> load data local inpath 'data.csv' overwrite into table tb;

The data.csv might look like:
32,aaa,422
--Blank line
33:bbb:423 --Invalid field delimiter :
aa,ccc,424 --Non-int number aa

3. Select data:
hive> select * from tb;
OK
32 aaa 422
NULL NULL NULL
NULL NULL NULL
NULL ccc 424
Time taken: 0.196 seconds

I have tried to set mapred.skip.map.max.skip.records, but it seems not to
work.

Thanks in advance.
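
A query-time workaround (just a sketch, using the table tb above and a
hypothetical second table tb_clean) is to keep only the rows that parsed
cleanly:

hive> create table tb_clean (id int, pref string, zip string);
hive> insert overwrite table tb_clean select id, pref, zip from tb where id is not null and pref is not null and zip is not null;

With the sample data above this keeps only the '32,aaa,422' row, since the
blank line, the ':'-delimited line and the non-integer id all turn into NULLs.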

Regards,
Xie

-- 
Best Regards
Xie Xianshan
--
Xie Xianshan
Dept.IV of Technology and Development
Nanjing Fujitsu Nanda Software Tech. Co., Ltd.(FNST)
No. 6 Wenzhu Road, Nanjing, China
PostCode: 210012
PHONE: +86+25-86630566-8522
FUJITSU INTERNAL: 7998-8522
MAIL: xi...@cn.fujitsu.com




Re: Setting up stats database [SOLVED]

2011-08-18 Thread bharath vissapragada
Hi,

I solved this by placing the jar in ${java_home}/jre/lib and
${java_home}/jre/lib/ext. This is the workaround whenever JDBC drivers won't
work; the same thing worked here too (I hope it works with your Postgres too).
I am still wondering why Hive didn't recognize it in the classpath.
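
One guess at why, judging from the stack trace in wd's log: the stats
publisher opens its JDBC connection from inside the map task rather than from
the client, so a jar that only sits on the client classpath may not be visible
there. If that is the case, an alternative to copying jars into the JRE
directories might be Hive's aux-jars mechanism, which also ships the jars to
the launched jobs (sketched here with a placeholder path, untested in this
setup):

hive --auxpath /path/to/mysql-connector-java.jar

or the equivalent HIVE_AUX_JARS_PATH environment variable.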

Also, there is some parsing problem in my connection string; it is getting
terminated at the ';' in
jdbc:mysql://ip:3306/TempStatsStore&amp;user=name&amp;password=pwd. I got it
to work by adding two properties, stats.username and stats.password, just like
the metastore db user and password, and replacing
conn = DriverManager.getConnection(connectionString) with
conn = DriverManager.getConnection(connectionString, uname, pwd), reading them
from the conf variable inside the JDBCStatsPublisher class.
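
A rough sketch in Java of the change described (not the actual Hive code;
stats.username and stats.password are this thread's custom properties, and the
class and method names here are made up for illustration):

import java.sql.Connection;
import java.sql.DriverManager;
import org.apache.hadoop.conf.Configuration;

public class StatsConnectSketch {
  // Open the stats DB connection with explicit credentials read from the
  // configuration instead of embedding them in the JDBC URL.
  public static Connection connect(Configuration conf, String connectionString)
      throws Exception {
    String uname = conf.get("stats.username");
    String pwd = conf.get("stats.password");
    // was: DriverManager.getConnection(connectionString)
    return DriverManager.getConnection(connectionString, uname, pwd);
  }
}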

Is this worth filing a JIRA, or am I the only one facing this problem?

Thanks


