[jira] Commented: (HIVE-1387) Make PERCENTILE work with double data type

2010-06-24 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882450#action_12882450
 ] 

John Sichi commented on HIVE-1387:
--

+1.  Will commit if tests pass.
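For context on the quoted issue below, the limitation being fixed is roughly this (hypothetical table; today's percentile() accepts integer-typed columns only):

{code}
-- works today: percentile() over an integer column
SELECT percentile(CAST(price AS BIGINT), 0.5) FROM sales;
-- fails today when price is DOUBLE, which this issue addresses
-- SELECT percentile(price, 0.5) FROM sales;
{code}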


> Make PERCENTILE work with double data type
> --
>
> Key: HIVE-1387
> URL: https://issues.apache.org/jira/browse/HIVE-1387
> Project: Hadoop Hive
>  Issue Type: Improvement
>Affects Versions: 0.6.0
>Reporter: Vaibhav Aggarwal
>Assignee: Mayank Lahiri
> Fix For: 0.6.0
>
> Attachments: HIVE-1387.2.patch, HIVE-1387.3.patch, 
> median_approx_quality.png, patch-1387-1.patch
>
>
> The PERCENTILE UDAF does not work with double datatype.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1135) Use Anakia for version controlled documentation

2010-06-24 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882447#action_12882447
 ] 

John Sichi commented on HIVE-1135:
--

I should be able to get to this one early next week.

> Use Anakia for version controlled documentation
> ---
>
> Key: HIVE-1135
> URL: https://issues.apache.org/jira/browse/HIVE-1135
> Project: Hadoop Hive
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 0.5.0
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Fix For: 0.6.0
>
> Attachments: hive-1135-3-patch.txt, hive-1135-4-patch.txt, 
> hive-1135-5-patch.txt, hive-1135-6-patch.txt, hive-1335-1.patch.txt, 
> hive-1335-2.patch.txt, jdom-1.1.jar, jdom-1.1.LICENSE, wtf.png
>
>
> Currently the Hive Language Manual and many other critical pieces of 
> documentation are on the Hive wiki. 
> Right now we count on the author of a patch to follow up and add wiki 
> entries. While we do a decent job with this, new features can be missed, or 
> users running older/newer branches cannot locate relevant documentation for 
> their branch. 
> ...an example of a perception I do not think we want to give off...
> http://dev.hubspot.com/bid/30170/Who-Loves-the-Magic-Undocumented-Hive-Mapjoin-This-Guy
> We should generate our documentation the way hadoop & hbase do, inline 
> using Forrest. I would like to take the lead on this, but we need a lot of 
> consensus on doing this properly. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1435) Upgraded naming scheme causes JDO exceptions

2010-06-24 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882446#action_12882446
 ] 

John Sichi commented on HIVE-1435:
--

Committed.  Thanks Paul!


> Upgraded naming scheme causes JDO exceptions
> 
>
> Key: HIVE-1435
> URL: https://issues.apache.org/jira/browse/HIVE-1435
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.6.0, 0.7.0
>Reporter: Paul Yang
>Assignee: Paul Yang
> Fix For: 0.6.0, 0.7.0
>
> Attachments: HIVE-1435.1.patch
>
>
> We recently upgraded from Datanucleus 1.0 to 2.0, which changed some of the 
> defaults for how field names get mapped to datastore identifiers. Because of 
> this change, connecting to an existing database would throw exceptions such 
> as:
> 2010-06-24 17:59:09,854 ERROR exec.DDLTask 
> (SessionState.java:printError(277)) - FAILED: Error in metadata: 
> javax.jdo.JDODataStoreException: Insert of object 
> "org.apache.hadoop.hive.metastore.model.mstoragedescrip...@4ccd21c" using 
> statement "INSERT INTO `SDS` 
> (`SD_ID`,`NUM_BUCKETS`,`INPUT_FORMAT`,`OUTPUT_FORMAT`,`LOCATION`,`SERDE_ID`,`ISCOMPRESSED`)
>  VALUES (?,?,?,?,?,?,?)" failed : Unknown column 'ISCOMPRESSED' in 'field 
> list'
> NestedThrowables:
> com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Unknown column 
> 'ISCOMPRESSED' in 'field list'
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> javax.jdo.JDODataStoreException: Insert of object 
> "org.apache.hadoop.hive.metastore.model.mstoragedescrip...@4ccd21c" using 
> statement "INSERT INTO `SDS` 
> (`SD_ID`,`NUM_BUCKETS`,`INPUT_FORMAT`,`OUTPUT_FORMAT`,`LOCATION`,`SERDE_ID`,`ISCOMPRESSED`)
>  VALUES (?,?,?,?,?,?,?)" failed : Unknown column 'ISCOMPRESSED' in 'field 
> list'
> NestedThrowables:
> com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Unknown column 
> 'ISCOMPRESSED' in 'field list'
> at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:325)
> at 
> org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:2012)
> at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:144)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:107)
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:633)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:506)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:384)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:138)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:302)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1435) Upgraded naming scheme causes JDO exceptions

2010-06-24 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-1435:
-

  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

> Upgraded naming scheme causes JDO exceptions
> 
>
> Key: HIVE-1435
> URL: https://issues.apache.org/jira/browse/HIVE-1435
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.6.0, 0.7.0
>Reporter: Paul Yang
>Assignee: Paul Yang
> Fix For: 0.6.0, 0.7.0
>
> Attachments: HIVE-1435.1.patch
>
>
> We recently upgraded from Datanucleus 1.0 to 2.0, which changed some of the 
> defaults for how field names get mapped to datastore identifiers. Because of 
> this change, connecting to an existing database would throw exceptions such 
> as:
> 2010-06-24 17:59:09,854 ERROR exec.DDLTask 
> (SessionState.java:printError(277)) - FAILED: Error in metadata: 
> javax.jdo.JDODataStoreException: Insert of object 
> "org.apache.hadoop.hive.metastore.model.mstoragedescrip...@4ccd21c" using 
> statement "INSERT INTO `SDS` 
> (`SD_ID`,`NUM_BUCKETS`,`INPUT_FORMAT`,`OUTPUT_FORMAT`,`LOCATION`,`SERDE_ID`,`ISCOMPRESSED`)
>  VALUES (?,?,?,?,?,?,?)" failed : Unknown column 'ISCOMPRESSED' in 'field 
> list'
> NestedThrowables:
> com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Unknown column 
> 'ISCOMPRESSED' in 'field list'
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> javax.jdo.JDODataStoreException: Insert of object 
> "org.apache.hadoop.hive.metastore.model.mstoragedescrip...@4ccd21c" using 
> statement "INSERT INTO `SDS` 
> (`SD_ID`,`NUM_BUCKETS`,`INPUT_FORMAT`,`OUTPUT_FORMAT`,`LOCATION`,`SERDE_ID`,`ISCOMPRESSED`)
>  VALUES (?,?,?,?,?,?,?)" failed : Unknown column 'ISCOMPRESSED' in 'field 
> list'
> NestedThrowables:
> com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Unknown column 
> 'ISCOMPRESSED' in 'field list'
> at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:325)
> at 
> org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:2012)
> at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:144)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:107)
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:633)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:506)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:384)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:138)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:302)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1435) Upgraded naming scheme causes JDO exceptions

2010-06-24 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-1435:
-

Fix Version/s: 0.6.0
   0.7.0

> Upgraded naming scheme causes JDO exceptions
> 
>
> Key: HIVE-1435
> URL: https://issues.apache.org/jira/browse/HIVE-1435
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.6.0, 0.7.0
>Reporter: Paul Yang
>Assignee: Paul Yang
> Fix For: 0.6.0, 0.7.0
>
> Attachments: HIVE-1435.1.patch
>
>
> We recently upgraded from Datanucleus 1.0 to 2.0, which changed some of the 
> defaults for how field names get mapped to datastore identifiers. Because of 
> this change, connecting to an existing database would throw exceptions such 
> as:
> 2010-06-24 17:59:09,854 ERROR exec.DDLTask 
> (SessionState.java:printError(277)) - FAILED: Error in metadata: 
> javax.jdo.JDODataStoreException: Insert of object 
> "org.apache.hadoop.hive.metastore.model.mstoragedescrip...@4ccd21c" using 
> statement "INSERT INTO `SDS` 
> (`SD_ID`,`NUM_BUCKETS`,`INPUT_FORMAT`,`OUTPUT_FORMAT`,`LOCATION`,`SERDE_ID`,`ISCOMPRESSED`)
>  VALUES (?,?,?,?,?,?,?)" failed : Unknown column 'ISCOMPRESSED' in 'field 
> list'
> NestedThrowables:
> com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Unknown column 
> 'ISCOMPRESSED' in 'field list'
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> javax.jdo.JDODataStoreException: Insert of object 
> "org.apache.hadoop.hive.metastore.model.mstoragedescrip...@4ccd21c" using 
> statement "INSERT INTO `SDS` 
> (`SD_ID`,`NUM_BUCKETS`,`INPUT_FORMAT`,`OUTPUT_FORMAT`,`LOCATION`,`SERDE_ID`,`ISCOMPRESSED`)
>  VALUES (?,?,?,?,?,?,?)" failed : Unknown column 'ISCOMPRESSED' in 'field 
> list'
> NestedThrowables:
> com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Unknown column 
> 'ISCOMPRESSED' in 'field list'
> at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:325)
> at 
> org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:2012)
> at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:144)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:107)
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:633)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:506)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:384)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:138)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:302)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1428) ALTER TABLE ADD PARTITION fails with a remote Thrift metastore

2010-06-24 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated HIVE-1428:
-

Attachment: HIVE-1428.patch

The attached patch addresses the issue by throwing a NoSuchObjectException from 
the ObjectStore get_partition() method when no partitions match the arguments. 
This requires a change to the Thrift IDL for hive_metastore, so the patch also 
includes the compiler-generated files. I have tested that "ALTER TABLE ADD 
PARTITION ..." works both when the Hive client uses Thrift to talk to a remote 
metastore server and when it connects directly to the db. I am not sure how to 
test this in the unit test framework and would need some guidance in that area.
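For what it's worth, a test in Hive's .q framework might look roughly like the sketch below (table and partition names are hypothetical; whether such a test would exercise the remote Thrift path is exactly the open question):

{code}
-- hypothetical clientpositive test: add and list a partition
CREATE TABLE alter_part_test (value STRING)
PARTITIONED BY (datestamp STRING, srcid STRING);

ALTER TABLE alter_part_test
ADD PARTITION (datestamp = '20091101', srcid = '10');

SHOW PARTITIONS alter_part_test;

DROP TABLE alter_part_test;
{code}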

> ALTER TABLE ADD PARTITION fails with a remote Thrift metastore
> --
>
> Key: HIVE-1428
> URL: https://issues.apache.org/jira/browse/HIVE-1428
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.6.0, 0.7.0
>Reporter: Paul Yang
> Attachments: HIVE-1428.patch
>
>
> If the hive cli is configured to use a remote metastore, ALTER TABLE ... ADD 
> PARTITION commands will fail with an error similar to the following:
> {code}
> [prade...@chargesize:~/dev/howl]hive --auxpath ult-serde.jar -e "ALTER TABLE 
> mytable add partition(datestamp = '20091101', srcid = '10',action) location 
> '/user/pradeepk/mytable/20091101/10';"
> 10/06/16 17:08:59 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found 
> in the classpath. Usage of hadoop-site.xml is deprecated. Instead use 
> core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of 
> core-default.xml, mapred-default.xml and hdfs-default.xml respectively
> Hive history 
> file=/tmp/pradeepk/hive_job_log_pradeepk_201006161709_1934304805.txt
> FAILED: Error in metadata: org.apache.thrift.TApplicationException: 
> get_partition failed: unknown result
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask
> [prade...@chargesize:~/dev/howl]
> {code}
> This is due to a check that tries to retrieve the partition to see if it 
> exists. If it does not, an attempt is made to pass a null value from the 
> metastore. Since thrift does not support null return values, an exception is 
> thrown.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1436) specify partition metadata at insertion time

2010-06-24 Thread Namit Jain (JIRA)
specify partition metadata at insertion time


 Key: HIVE-1436
 URL: https://issues.apache.org/jira/browse/HIVE-1436
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: He Yongqiang
 Fix For: 0.7.0


Currently, the partition metadata is inferred from the table being inserted 
into.

So it is not possible to convert an old partition to RCFile unless the table is 
first converted to RCFile and the old partition is then re-inserted.

But if, for some reason, the table cannot be converted to RCFile (say an old 
process is still loading data into that table), there is no way to convert an 
older partition from SequenceFile to RCFile.

There should be a way to do so, something like:

insert overwrite table T partition (..) 
select ..
stored as RCFile;
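Concretely, the proposed (not yet implemented) syntax would let a single old partition be rewritten in place; table and partition names here are hypothetical:

{code}
-- proposed syntax: convert just one partition to RCFile, leaving the
-- table's own format (and its other partitions) untouched
INSERT OVERWRITE TABLE clicks PARTITION (ds = '2010-01-01')
SELECT key, value FROM clicks WHERE ds = '2010-01-01'
STORED AS RCFILE;
{code}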

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1435) Upgraded naming scheme causes JDO exceptions

2010-06-24 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882439#action_12882439
 ] 

John Sichi commented on HIVE-1435:
--

+1.

> Upgraded naming scheme causes JDO exceptions
> 
>
> Key: HIVE-1435
> URL: https://issues.apache.org/jira/browse/HIVE-1435
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.6.0, 0.7.0
>Reporter: Paul Yang
>Assignee: Paul Yang
> Attachments: HIVE-1435.1.patch
>
>
> We recently upgraded from Datanucleus 1.0 to 2.0, which changed some of the 
> defaults for how field names get mapped to datastore identifiers. Because of 
> this change, connecting to an existing database would throw exceptions such 
> as:
> 2010-06-24 17:59:09,854 ERROR exec.DDLTask 
> (SessionState.java:printError(277)) - FAILED: Error in metadata: 
> javax.jdo.JDODataStoreException: Insert of object 
> "org.apache.hadoop.hive.metastore.model.mstoragedescrip...@4ccd21c" using 
> statement "INSERT INTO `SDS` 
> (`SD_ID`,`NUM_BUCKETS`,`INPUT_FORMAT`,`OUTPUT_FORMAT`,`LOCATION`,`SERDE_ID`,`ISCOMPRESSED`)
>  VALUES (?,?,?,?,?,?,?)" failed : Unknown column 'ISCOMPRESSED' in 'field 
> list'
> NestedThrowables:
> com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Unknown column 
> 'ISCOMPRESSED' in 'field list'
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> javax.jdo.JDODataStoreException: Insert of object 
> "org.apache.hadoop.hive.metastore.model.mstoragedescrip...@4ccd21c" using 
> statement "INSERT INTO `SDS` 
> (`SD_ID`,`NUM_BUCKETS`,`INPUT_FORMAT`,`OUTPUT_FORMAT`,`LOCATION`,`SERDE_ID`,`ISCOMPRESSED`)
>  VALUES (?,?,?,?,?,?,?)" failed : Unknown column 'ISCOMPRESSED' in 'field 
> list'
> NestedThrowables:
> com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Unknown column 
> 'ISCOMPRESSED' in 'field list'
> at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:325)
> at 
> org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:2012)
> at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:144)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:107)
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:633)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:506)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:384)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:138)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:302)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



RE: JDO upgrade issue with HIVE-1176

2010-06-24 Thread Paul Yang
I reproduced this by connecting to a database (with schemas created prior to 
the upgrade) with these properties set:


<property>
  <name>datanucleus.autoCreateSchema</name>
  <value>false</value>
</property>

<property>
  <name>datanucleus.fixedDatastore</name>
  <value>true</value>
</property>


The upgraded datanucleus will then try to insert into non-existent columns, 
resulting in exceptions. This problem doesn't show up with a fresh db, or if 
auto-create is enabled. See

https://issues.apache.org/jira/browse/HIVE-1435

From: Arvind Prabhakar [mailto:arv...@cloudera.com]
Sent: Thursday, June 24, 2010 7:35 PM
To: hive-u...@hadoop.apache.org
Cc: hive-dev@hadoop.apache.org
Subject: Re: JDO upgrade issue with HIVE-1176

John,

Can you describe the problem in more detail and perhaps give us an example that 
can be reproduced?

Arvind
On Thu, Jun 24, 2010 at 7:01 PM, John Sichi <jsi...@facebook.com> wrote:
Hi all,

Yesterday I committed Arvind's patch for HIVE-1176, which includes an upgrade 
from datanucleus 1.x to 2.x.

The patch works fine against a clean checkout, but just now Paul Yang and I 
noticed a couple of problems introduced due to a change in the way column names 
are generated by datanucleus when no name is specified in the JDO mapping 
(which is the case for some of ours such as "isCompressed").  This is a 
heads-up for people who happen to pull from latest trunk.

The problems only occur when running against an existing metastore, for example 
if you run trunk/build/dist/bin/hive against a new build in an existing sandbox 
(where a Derby embedded metastore had previously been created), or if you 
deploy against an existing production metastore DB.

In a developer sandbox, the default configuration tries to auto-update the 
schema to add the new column names, and hits an error due to the way the Derby 
ALTER TABLE statement is generated.  If you hit this, a workaround is to delete 
your trunk/metastore_db directory so that a fresh schema will be recreated 
instead.  Or just move to a fresh checkout.

Paul is taking a look at the column name generation to see if we can get it to 
match the datanucleus 1.x behavior.

JVS



Re: JDO upgrade issue with HIVE-1176

2010-06-24 Thread John Sichi
Paul explained one symptom in HIVE-1435.

For the other, the way to repro it is to start from a clean checkout from 
before HIVE-1176 was committed, build there, and run build/dist/bin/hive to 
init the metastore.  Then svn update to the tip of trunk, build again, and run 
the hive CLI again; this time you will hit a startup error when it tries to 
modify the existing Derby database.

I think Paul's HIVE-1435 patch will resolve both.

JVS


On Jun 24, 2010, at 7:35 PM, "Arvind Prabhakar" wrote:

> John,
>
> Can you describe the problem in more detail and perhaps give us an example
> that can be reproduced?
>
> Arvind
>
> On Thu, Jun 24, 2010 at 7:01 PM, John Sichi wrote:
>
>> Hi all,
>>
>> Yesterday I committed Arvind's patch for HIVE-1176, which includes an
>> upgrade from datanucleus 1.x to 2.x.
>>
>> The patch works fine against a clean checkout, but just now Paul Yang and I
>> noticed a couple of problems introduced due to a change in the way column
>> names are generated by datanucleus when no name is specified in the JDO
>> mapping (which is the case for some of ours such as "isCompressed").  This
>> is a heads-up for people who happen to pull from latest trunk.
>>
>> The problems only occur when running against an existing metastore, for
>> example if you run trunk/build/dist/bin/hive against a new build in an
>> existing sandbox (where a Derby embedded metastore had previously been
>> created), or if you deploy against an existing production metastore DB.
>>
>> In a developer sandbox, the default configuration tries to auto-update the
>> schema to add the new column names, and hits an error due to the way the
>> Derby ALTER TABLE statement is generated.  If you hit this, a workaround is
>> to delete your trunk/metastore_db directory so that a fresh schema will be
>> recreated instead.  Or just move to a fresh checkout.
>>
>> Paul is taking a look at the column name generation to see if we can get it
>> to match the datanucleus 1.x behavior.
>>
>> JVS


[jira] Updated: (HIVE-1435) Upgraded naming scheme causes JDO exceptions

2010-06-24 Thread Paul Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Yang updated HIVE-1435:


Status: Patch Available  (was: Open)

> Upgraded naming scheme causes JDO exceptions
> 
>
> Key: HIVE-1435
> URL: https://issues.apache.org/jira/browse/HIVE-1435
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.6.0, 0.7.0
>Reporter: Paul Yang
>Assignee: Paul Yang
> Attachments: HIVE-1435.1.patch
>
>
> We recently upgraded from Datanucleus 1.0 to 2.0, which changed some of the 
> defaults for how field names get mapped to datastore identifiers. Because of 
> this change, connecting to an existing database would throw exceptions such 
> as:
> 2010-06-24 17:59:09,854 ERROR exec.DDLTask 
> (SessionState.java:printError(277)) - FAILED: Error in metadata: 
> javax.jdo.JDODataStoreException: Insert of object 
> "org.apache.hadoop.hive.metastore.model.mstoragedescrip...@4ccd21c" using 
> statement "INSERT INTO `SDS` 
> (`SD_ID`,`NUM_BUCKETS`,`INPUT_FORMAT`,`OUTPUT_FORMAT`,`LOCATION`,`SERDE_ID`,`ISCOMPRESSED`)
>  VALUES (?,?,?,?,?,?,?)" failed : Unknown column 'ISCOMPRESSED' in 'field 
> list'
> NestedThrowables:
> com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Unknown column 
> 'ISCOMPRESSED' in 'field list'
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> javax.jdo.JDODataStoreException: Insert of object 
> "org.apache.hadoop.hive.metastore.model.mstoragedescrip...@4ccd21c" using 
> statement "INSERT INTO `SDS` 
> (`SD_ID`,`NUM_BUCKETS`,`INPUT_FORMAT`,`OUTPUT_FORMAT`,`LOCATION`,`SERDE_ID`,`ISCOMPRESSED`)
>  VALUES (?,?,?,?,?,?,?)" failed : Unknown column 'ISCOMPRESSED' in 'field 
> list'
> NestedThrowables:
> com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Unknown column 
> 'ISCOMPRESSED' in 'field list'
> at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:325)
> at 
> org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:2012)
> at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:144)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:107)
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:633)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:506)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:384)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:138)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:302)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1435) Upgraded naming scheme causes JDO exceptions

2010-06-24 Thread Paul Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882433#action_12882433
 ] 

Paul Yang commented on HIVE-1435:
-

This patch was tested with a metastore database created prior to the 
datanucleus upgrade. In our configuration, we have

{code}

<property>
  <name>datanucleus.autoCreateSchema</name>
  <value>false</value>
</property>
<property>
  <name>datanucleus.fixedDatastore</name>
  <value>true</value>
</property>
{code}

so that no automatic schema changes will occur. The following commands were run 
without errors: create table, add partition, create view, drop table, and drop 
view.
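A minimal HiveQL sketch of that sequence (names hypothetical):

{code}
CREATE TABLE smoke_test (key INT, value STRING) PARTITIONED BY (ds STRING);
ALTER TABLE smoke_test ADD PARTITION (ds = '2010-06-24');
CREATE VIEW smoke_view AS SELECT key FROM smoke_test;
DROP VIEW smoke_view;
DROP TABLE smoke_test;
{code}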

> Upgraded naming scheme causes JDO exceptions
> 
>
> Key: HIVE-1435
> URL: https://issues.apache.org/jira/browse/HIVE-1435
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.6.0, 0.7.0
>Reporter: Paul Yang
>Assignee: Paul Yang
> Attachments: HIVE-1435.1.patch
>
>
> We recently upgraded from Datanucleus 1.0 to 2.0, which changed some of the 
> defaults for how field names get mapped to datastore identifiers. Because of 
> this change, connecting to an existing database would throw exceptions such 
> as:
> 2010-06-24 17:59:09,854 ERROR exec.DDLTask 
> (SessionState.java:printError(277)) - FAILED: Error in metadata: 
> javax.jdo.JDODataStoreException: Insert of object 
> "org.apache.hadoop.hive.metastore.model.mstoragedescrip...@4ccd21c" using 
> statement "INSERT INTO `SDS` 
> (`SD_ID`,`NUM_BUCKETS`,`INPUT_FORMAT`,`OUTPUT_FORMAT`,`LOCATION`,`SERDE_ID`,`ISCOMPRESSED`)
>  VALUES (?,?,?,?,?,?,?)" failed : Unknown column 'ISCOMPRESSED' in 'field 
> list'
> NestedThrowables:
> com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Unknown column 
> 'ISCOMPRESSED' in 'field list'
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> javax.jdo.JDODataStoreException: Insert of object 
> "org.apache.hadoop.hive.metastore.model.mstoragedescrip...@4ccd21c" using 
> statement "INSERT INTO `SDS` 
> (`SD_ID`,`NUM_BUCKETS`,`INPUT_FORMAT`,`OUTPUT_FORMAT`,`LOCATION`,`SERDE_ID`,`ISCOMPRESSED`)
>  VALUES (?,?,?,?,?,?,?)" failed : Unknown column 'ISCOMPRESSED' in 'field 
> list'
> NestedThrowables:
> com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Unknown column 
> 'ISCOMPRESSED' in 'field list'
> at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:325)
> at 
> org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:2012)
> at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:144)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:107)
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:633)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:506)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:384)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:138)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:302)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1435) Upgraded naming scheme causes JDO exceptions

2010-06-24 Thread Paul Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Yang updated HIVE-1435:


Attachment: HIVE-1435.1.patch

This reverts the identifier factory to the one used by datanucleus 1.1.

> Upgraded naming scheme causes JDO exceptions
> 
>
> Key: HIVE-1435
> URL: https://issues.apache.org/jira/browse/HIVE-1435
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.6.0, 0.7.0
>Reporter: Paul Yang
>Assignee: Paul Yang
> Attachments: HIVE-1435.1.patch
>
>
> We recently upgraded from Datanucleus 1.0 to 2.0, which changed some of the 
> defaults for how field names get mapped to datastore identifiers. Because of 
> this change, connecting to an existing database would throw exceptions such 
> as:
> 2010-06-24 17:59:09,854 ERROR exec.DDLTask 
> (SessionState.java:printError(277)) - FAILED: Error in metadata: 
> javax.jdo.JDODataStoreException: Insert of object 
> "org.apache.hadoop.hive.metastore.model.mstoragedescrip...@4ccd21c" using 
> statement "INSERT INTO `SDS` 
> (`SD_ID`,`NUM_BUCKETS`,`INPUT_FORMAT`,`OUTPUT_FORMAT`,`LOCATION`,`SERDE_ID`,`ISCOMPRESSED`)
>  VALUES (?,?,?,?,?,?,?)" failed : Unknown column 'ISCOMPRESSED' in 'field 
> list'
> NestedThrowables:
> com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Unknown column 
> 'ISCOMPRESSED' in 'field list'
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> javax.jdo.JDODataStoreException: Insert of object 
> "org.apache.hadoop.hive.metastore.model.mstoragedescrip...@4ccd21c" using 
> statement "INSERT INTO `SDS` 
> (`SD_ID`,`NUM_BUCKETS`,`INPUT_FORMAT`,`OUTPUT_FORMAT`,`LOCATION`,`SERDE_ID`,`ISCOMPRESSED`)
>  VALUES (?,?,?,?,?,?,?)" failed : Unknown column 'ISCOMPRESSED' in 'field 
> list'
> NestedThrowables:
> com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Unknown column 
> 'ISCOMPRESSED' in 'field list'
> at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:325)
> at 
> org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:2012)
> at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:144)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:107)
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:633)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:506)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:384)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:138)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:302)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1435) Upgraded naming scheme causes JDO exceptions

2010-06-24 Thread Paul Yang (JIRA)
Upgraded naming scheme causes JDO exceptions


 Key: HIVE-1435
 URL: https://issues.apache.org/jira/browse/HIVE-1435
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.6.0, 0.7.0
Reporter: Paul Yang
Assignee: Paul Yang


We recently upgraded from Datanucleus 1.0 to 2.0, which changed some of the 
defaults for how field names get mapped to datastore identifiers. Because of 
this change, connecting to an existing database would throw exceptions such as:

{code}
2010-06-24 17:59:09,854 ERROR exec.DDLTask (SessionState.java:printError(277)) 
- FAILED: Error in metadata: javax.jdo.JDODataStoreException: Insert of object 
"org.apache.hadoop.hive.metastore.model.mstoragedescrip...@4ccd21c" using 
statement "INSERT INTO `SDS` 
(`SD_ID`,`NUM_BUCKETS`,`INPUT_FORMAT`,`OUTPUT_FORMAT`,`LOCATION`,`SERDE_ID`,`ISCOMPRESSED`)
 VALUES (?,?,?,?,?,?,?)" failed : Unknown column 'ISCOMPRESSED' in 'field list'
NestedThrowables:
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Unknown column 
'ISCOMPRESSED' in 'field list'
org.apache.hadoop.hive.ql.metadata.HiveException: 
javax.jdo.JDODataStoreException: Insert of object 
"org.apache.hadoop.hive.metastore.model.mstoragedescrip...@4ccd21c" using 
statement "INSERT INTO `SDS` 
(`SD_ID`,`NUM_BUCKETS`,`INPUT_FORMAT`,`OUTPUT_FORMAT`,`LOCATION`,`SERDE_ID`,`ISCOMPRESSED`)
 VALUES (?,?,?,?,?,?,?)" failed : Unknown column 'ISCOMPRESSED' in 'field list'
NestedThrowables:
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Unknown column 
'ISCOMPRESSED' in 'field list'
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:325)
at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:2012)
at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:144)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:107)
at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:633)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:506)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:384)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:138)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:302)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
{code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1435) Upgraded naming scheme causes JDO exceptions

2010-06-24 Thread Paul Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Yang updated HIVE-1435:


Description: 
We recently upgraded from Datanucleus 1.0 to 2.0, which changed some of the 
defaults for how field names get mapped to datastore identifiers. Because of 
this change, connecting to an existing database would throw exceptions such as:

2010-06-24 17:59:09,854 ERROR exec.DDLTask (SessionState.java:printError(277)) 
- FAILED: Error in metadata: javax.jdo.JDODataStoreException: Insert of object 
"org.apache.hadoop.hive.metastore.model.mstoragedescrip...@4ccd21c" using 
statement "INSERT INTO `SDS` 
(`SD_ID`,`NUM_BUCKETS`,`INPUT_FORMAT`,`OUTPUT_FORMAT`,`LOCATION`,`SERDE_ID`,`ISCOMPRESSED`)
 VALUES (?,?,?,?,?,?,?)" failed : Unknown column 'ISCOMPRESSED' in 'field list'
NestedThrowables:
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Unknown column 
'ISCOMPRESSED' in 'field list'
org.apache.hadoop.hive.ql.metadata.HiveException: 
javax.jdo.JDODataStoreException: Insert of object 
"org.apache.hadoop.hive.metastore.model.mstoragedescrip...@4ccd21c" using 
statement "INSERT INTO `SDS` 
(`SD_ID`,`NUM_BUCKETS`,`INPUT_FORMAT`,`OUTPUT_FORMAT`,`LOCATION`,`SERDE_ID`,`ISCOMPRESSED`)
 VALUES (?,?,?,?,?,?,?)" failed : Unknown column 'ISCOMPRESSED' in 'field list'
NestedThrowables:
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Unknown column 
'ISCOMPRESSED' in 'field list'
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:325)
at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:2012)
at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:144)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:107)
at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:633)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:506)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:384)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:138)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:302)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)


  was:
We recently upgraded from Datanucleus 1.0 to 2.0, which changed some of the 
defaults for how field names get mapped to datastore identifiers. Because of 
this change, connecting to an existing database would throw exceptions such as:

{code}
2010-06-24 17:59:09,854 ERROR exec.DDLTask (SessionState.java:printError(277)) 
- FAILED: Error in metadata: javax.jdo.JDODataStoreException: Insert of object 
"org.apache.hadoop.hive.metastore.model.mstoragedescrip...@4ccd21c" using 
statement "INSERT INTO `SDS` 
(`SD_ID`,`NUM_BUCKETS`,`INPUT_FORMAT`,`OUTPUT_FORMAT`,`LOCATION`,`SERDE_ID`,`ISCOMPRESSED`)
 VALUES (?,?,?,?,?,?,?)" failed : Unknown column 'ISCOMPRESSED' in 'field list'
NestedThrowables:
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Unknown column 
'ISCOMPRESSED' in 'field list'
org.apache.hadoop.hive.ql.metadata.HiveException: 
javax.jdo.JDODataStoreException: Insert of object 
"org.apache.hadoop.hive.metastore.model.mstoragedescrip...@4ccd21c" using 
statement "INSERT INTO `SDS` 
(`SD_ID`,`NUM_BUCKETS`,`INPUT_FORMAT`,`OUTPUT_FORMAT`,`LOCATION`,`SERDE_ID`,`ISCOMPRESSED`)
 VALUES (?,?,?,?,?,?,?)" failed : Unknown column 'ISCOMPRESSED' in 'field list'
NestedThrowables:
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Unknown column 
'ISCOMPRESSED' in 'field list'
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:325)
at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:2012)
at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:144)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:107)
at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:633)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:506)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:384)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:138)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:302)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
{code}

Re: JDO upgrade issue with HIVE-1176

2010-06-24 Thread Arvind Prabhakar
John,

Can you describe the problem in more detail and perhaps give us an example
that can be reproduced?

Arvind

On Thu, Jun 24, 2010 at 7:01 PM, John Sichi  wrote:

> Hi all,
>
> Yesterday I committed Arvind's patch for HIVE-1176, which includes an
> upgrade from datanucleus 1.x to 2.x.
>
> The patch works fine against a clean checkout, but just now Paul Yang and I
> noticed a couple of problems introduced due to a change in the way column
> names are generated by datanucleus when no name is specified in the JDO
> mapping (which is the case for some of ours such as "isCompressed").  This
> is a heads-up for people who happen to pull from latest trunk.
>
> The problems only occur when running against an existing metastore, for
> example if you run trunk/build/dist/bin/hive against a new build in an
> existing sandbox (where a Derby embedded metastore had previously been
> created), or if you deploy against an existing production metastore DB.
>
> In a developer sandbox, the default configuration tries to auto-update the
> schema to add the new column names, and hits an error due to the way the
> Derby ALTER TABLE statement is generated.  If you hit this, a workaround is
> to delete your trunk/metastore_db directory so that a fresh schema will be
> recreated instead.  Or just move to a fresh checkout.
>
> Paul is taking a look at the column name generation to see if we can get it
> to match the datanucleus 1.x behavior.
>
> JVS
>
>


[jira] Created: (HIVE-1434) Cassandra Storage Handler

2010-06-24 Thread Edward Capriolo (JIRA)
Cassandra Storage Handler
-

 Key: HIVE-1434
 URL: https://issues.apache.org/jira/browse/HIVE-1434
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Edward Capriolo
Assignee: Edward Capriolo


Add a cassandra storage handler.
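For context, a storage handler binds a Hive table to an external store through STORED BY; the existing HBase handler is the model a Cassandra handler would presumably follow (the example below uses the real HBase handler, since no Cassandra class or property names exist yet):

{code}
-- existing HBase storage handler, shown for comparison
CREATE TABLE hbase_backed (key INT, value STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:val");
{code}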

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



JDO upgrade issue with HIVE-1176

2010-06-24 Thread John Sichi
Hi all,

Yesterday I committed Arvind's patch for HIVE-1176, which includes an upgrade 
from datanucleus 1.x to 2.x.

The patch works fine against a clean checkout, but just now Paul Yang and I 
noticed a couple of problems introduced due to a change in the way column names 
are generated by datanucleus when no name is specified in the JDO mapping 
(which is the case for some of ours such as "isCompressed").  This is a 
heads-up for people who happen to pull from latest trunk.

The problems only occur when running against an existing metastore, for example 
if you run trunk/build/dist/bin/hive against a new build in an existing sandbox 
(where a Derby embedded metastore had previously been created), or if you 
deploy against an existing production metastore DB.

In a developer sandbox, the default configuration tries to auto-update the 
schema to add the new column names, and hits an error due to the way the Derby 
ALTER TABLE statement is generated.  If you hit this, a workaround is to delete 
your trunk/metastore_db directory so that a fresh schema will be recreated 
instead.  Or just move to a fresh checkout.

Paul is taking a look at the column name generation to see if we can get it to 
match the datanucleus 1.x behavior.

JVS



[jira] Commented: (HIVE-1395) Table aliases are ambiguous

2010-06-24 Thread Ted Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882414#action_12882414
 ] 

Ted Xu commented on HIVE-1395:
--

+1 to disallowing it in strict mode. Simply disallowing it altogether may be 
too aggressive and cause more confusion. 
I'm pretty sure the ambiguity is caused by predicate pushdown; we should fix 
that rather than avoid it. 

> Table aliases are ambiguous
> ---
>
> Key: HIVE-1395
> URL: https://issues.apache.org/jira/browse/HIVE-1395
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Adam Kramer
>
> Consider this query:
> SELECT a.num FROM (
>   SELECT a.num AS num, b.num AS num2
>   FROM foo a LEFT OUTER JOIN bar b ON a.num=b.num
> ) a
> WHERE a.num2 IS NULL;
> ...in this case, the table alias 'a' is ambiguous. It could be the outer 
> table (i.e., the subquery result), or it could be the inner table (foo).
> In the above case, Hive silently parses the outer reference to a as the inner 
> reference. The result, then, is akin to:
> SELECT foo.num FROM foo WHERE bar.num IS NULL. This is bad.
> The bigger problem, however, is that Hive even lets people use the same table 
> alias at multiple points in the query. We should simply throw an exception 
> during the parse stage if there is any ambiguity in which table is which, 
> just like we do if the column names are ambiguous.
> Or, if for some reason we need people to be able to use 'a' to refer to 
> multiple tables or subqueries, it would be excellent if the exact parsing 
> structure were made clear and added to the wiki. In that case, I will file a 
> separate bug JIRA to complain about how it should be different. :)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1387) Make PERCENTILE work with double data type

2010-06-24 Thread Mayank Lahiri (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Lahiri updated HIVE-1387:


Attachment: HIVE-1387.3.patch

(1) Moved NumericHistogram into a top-level class

(2) Beefed up Javadocs and Hive descriptions for percentile_approx() and 
histogram_numeric()

(3) Fixed typos
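For reference, calling the two UDAFs on a DOUBLE column might look like this (table and column names hypothetical):

{code}
-- approximate median and a 10-bin histogram over a DOUBLE column
SELECT percentile_approx(price, 0.5),
       histogram_numeric(price, 10)
FROM sales;
{code}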

> Make PERCENTILE work with double data type
> --
>
> Key: HIVE-1387
> URL: https://issues.apache.org/jira/browse/HIVE-1387
> Project: Hadoop Hive
>  Issue Type: Improvement
>Affects Versions: 0.6.0
>Reporter: Vaibhav Aggarwal
>Assignee: Mayank Lahiri
> Fix For: 0.6.0
>
> Attachments: HIVE-1387.2.patch, HIVE-1387.3.patch, 
> median_approx_quality.png, patch-1387-1.patch
>
>
> The PERCENTILE UDAF does not work with double datatype.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1405) hive command line option -i to run an init file before other SQL commands

2010-06-24 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1405:
-

  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

Committed. Thanks John
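The init file described in the quoted issue below would contain ordinary HiveQL statements; a hypothetical .hiverc:

{code}
-- hypothetical .hiverc, run at CLI startup before other commands
ADD JAR /usr/local/hive/aux/my_udfs.jar;
CREATE TEMPORARY FUNCTION my_lower AS 'com.example.udf.MyLower';
SET hive.exec.compress.output=true;
{code}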

> hive command line option -i to run an init file before other SQL commands
> -
>
> Key: HIVE-1405
> URL: https://issues.apache.org/jira/browse/HIVE-1405
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Clients
>Affects Versions: 0.5.0
>Reporter: Jonathan Chang
>Assignee: John Sichi
> Fix For: 0.7.0
>
> Attachments: HIVE-1405.1.patch
>
>
> When deploying hive, it would be nice to have a .hiverc file containing 
> statements that would be automatically run whenever hive is launched.  This 
> way, we can automatically add JARs, create temporary functions, set flags, 
> etc. for all users quickly. 
> This should ideally be set up like .bashrc and the like with a global version 
> and a user-local version.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1135) Use Anakia for version controlled documentation

2010-06-24 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882349#action_12882349
 ] 

Edward Capriolo commented on HIVE-1135:
---

Bump: I will fix the formatting later. Can we commit this? We do not really 
need any unit tests here.

> Use Anakia for version controlled documentation
> ---
>
> Key: HIVE-1135
> URL: https://issues.apache.org/jira/browse/HIVE-1135
> Project: Hadoop Hive
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 0.5.0
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Fix For: 0.6.0
>
> Attachments: hive-1135-3-patch.txt, hive-1135-4-patch.txt, 
> hive-1135-5-patch.txt, hive-1135-6-patch.txt, hive-1335-1.patch.txt, 
> hive-1335-2.patch.txt, jdom-1.1.jar, jdom-1.1.LICENSE, wtf.png
>
>
> Currently the Hive Language Manual and many other critical pieces of 
> documentation are on the Hive wiki. 
> Right now we count on the author of a patch to follow up and add wiki 
> entries. While we do a decent job with this, new features can be missed, or 
> users running older/newer branches cannot locate relevant documentation for 
> their branch. 
> ...an example of a perception I do not think we want to give off...
> http://dev.hubspot.com/bid/30170/Who-Loves-the-Magic-Undocumented-Hive-Mapjoin-This-Guy
> We should generate our documentation the way hadoop & hbase do, inline 
> using Forrest. I would like to take the lead on this, but we need a lot of 
> consensus on doing this properly. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1387) Make PERCENTILE work with double data type

2010-06-24 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882324#action_12882324
 ] 

HBase Review Board commented on HIVE-1387:
--

Message from: "John Sichi" 

---
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/233/
---

Review request for Hive Developers.


Summary
---

review by jvs


This addresses bug HIVE-1387.
http://issues.apache.org/jira/browse/HIVE-1387


Diffs
-

  
http://svn.apache.org/repos/asf/hadoop/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java
 957296 
  
http://svn.apache.org/repos/asf/hadoop/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFHistogramNumeric.java
 957296 
  
http://svn.apache.org/repos/asf/hadoop/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hadoop/hive/trunk/ql/src/test/queries/clientpositive/udaf_percentile_approx.q
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hadoop/hive/trunk/ql/src/test/results/clientpositive/udaf_percentile_approx.q.out
 PRE-CREATION 

Diff: http://review.hbase.org/r/233/diff


Testing
---


Thanks,

John




> Make PERCENTILE work with double data type
> --
>
> Key: HIVE-1387
> URL: https://issues.apache.org/jira/browse/HIVE-1387
> Project: Hadoop Hive
>  Issue Type: Improvement
>Affects Versions: 0.6.0
>Reporter: Vaibhav Aggarwal
>Assignee: Mayank Lahiri
> Fix For: 0.6.0
>
> Attachments: HIVE-1387.2.patch, median_approx_quality.png, 
> patch-1387-1.patch
>
>
> The PERCENTILE UDAF does not work with double datatype.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1229) replace dependencies on HBase deprecated API

2010-06-24 Thread Basab Maulik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Basab Maulik updated HIVE-1229:
---

Attachment: HIVE-1229.1.patch

Reattaching first patch with correct name.

> replace dependencies on HBase deprecated API
> 
>
> Key: HIVE-1229
> URL: https://issues.apache.org/jira/browse/HIVE-1229
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: HBase Handler
>Affects Versions: 0.6.0
>Reporter: John Sichi
>Assignee: Basab Maulik
> Attachments: HIVE-1229.1.patch, HIVE-1229.2.patch
>
>
> Some of these dependencies are on the old Hadoop mapred packages; others are 
> HBase-specific.  The former have to wait until the rest of Hive moves over to 
> the new Hadoop mapreduce package, but the HBase-specific ones don't have to 
> wait.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1405) hive command line option -i to run an init file before other SQL commands

2010-06-24 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882249#action_12882249
 ] 

Namit Jain commented on HIVE-1405:
--

+1

Filed https://issues.apache.org/jira/browse/HIVE-1433 as a followup

> hive command line option -i to run an init file before other SQL commands
> -
>
> Key: HIVE-1405
> URL: https://issues.apache.org/jira/browse/HIVE-1405
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Clients
>Affects Versions: 0.5.0
>Reporter: Jonathan Chang
>Assignee: John Sichi
> Fix For: 0.7.0
>
> Attachments: HIVE-1405.1.patch
>
>
> When deploying hive, it would be nice to have a .hiverc file containing 
> statements that would be automatically run whenever hive is launched.  This 
> way, we can automatically add JARs, create temporary functions, set flags, 
> etc. for all users quickly. 
> This should ideally be set up like .bashrc and the like with a global version 
> and a user-local version.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1395) Table aliases are ambiguous

2010-06-24 Thread Adam Kramer (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882246#action_12882246
 ] 

Adam Kramer commented on HIVE-1395:
---

Right--this is reported as a bug because Hive is crossing scope levels. The 
first use of a.num in the above query should look into the CLOSEST scope, which 
is the subquery labeled a. What Hive is doing here is looking into the 
NON-CLOSEST scope, returning foo.num when it should return (subquery).num. That 
is the bug.

Whether we should allow duplicate aliases at ALL in Hive, since they are 
confusing, is a broader question. I vote either for disallowing them entirely, 
or disallowing them in strict mode.
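One unambiguous rewrite of the query quoted below is simply to rename the outer alias (sketch):

{code}
SELECT outer_a.num
FROM (
  SELECT a.num AS num, b.num AS num2
  FROM foo a LEFT OUTER JOIN bar b ON a.num = b.num
) outer_a
WHERE outer_a.num2 IS NULL;
{code}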

> Table aliases are ambiguous
> ---
>
> Key: HIVE-1395
> URL: https://issues.apache.org/jira/browse/HIVE-1395
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Adam Kramer
>
> Consider this query:
> SELECT a.num FROM (
>   SELECT a.num AS num, b.num AS num2
>   FROM foo a LEFT OUTER JOIN bar b ON a.num=b.num
> ) a
> WHERE a.num2 IS NULL;
> ...in this case, the table alias 'a' is ambiguous. It could be the outer 
> table (i.e., the subquery result), or it could be the inner table (foo).
> In the above case, Hive silently parses the outer reference to a as the inner 
> reference. The result, then, is akin to:
> SELECT foo.num FROM foo WHERE bar.num IS NULL. This is bad.
> The bigger problem, however, is that Hive even lets people use the same table 
> alias at multiple points in the query. We should simply throw an exception 
> during the parse stage if there is any ambiguity in which table is which, 
> just like we do if the column names are ambiguous.
> Or, if for some reason we need people to be able to use 'a' to refer to 
> multiple tables or subqueries, it would be excellent if the exact parsing 
> structure were made clear and added to the wiki. In that case, I will file a 
> separate bug JIRA to complain about how it should be different. :)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1433) move tables created in QTestUtil to an init file

2010-06-24 Thread Namit Jain (JIRA)
move tables created in QTestUtil to an init file
---

 Key: HIVE-1433
 URL: https://issues.apache.org/jira/browse/HIVE-1433
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Testing Infrastructure
Reporter: Namit Jain
Assignee: John Sichi
 Fix For: 0.7.0


Followup for https://issues.apache.org/jira/browse/HIVE-1405

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1342) Predicate push down get error result when sub-queries have the same alias name

2010-06-24 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882236#action_12882236
 ] 

John Sichi commented on HIVE-1342:
--

Commentary on duplicate aliases in HIVE-1395.


> Predicate push down get error result when sub-queries have the same alias 
> name 
> ---
>
> Key: HIVE-1342
> URL: https://issues.apache.org/jira/browse/HIVE-1342
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Ted Xu
>Priority: Critical
> Fix For: 0.6.0
>
> Attachments: cmd.hql, explain, ppd_same_alias_1.patch
>
>
> The query is over-optimized by PPD when sub-queries have the same alias name; 
> see the query:
> ---
> create table if not exists dm_fact_buyer_prd_info_d (
>   category_id string
>   ,gmv_trade_num  int
>   ,user_id int
>   )
> PARTITIONED BY (ds int);
> set hive.optimize.ppd=true;
> set hive.map.aggr=true;
> explain select category_id1,category_id2,assoc_idx
> from (
>   select 
>   category_id1
>   , category_id2
>   , count(distinct user_id) as assoc_idx
>   from (
>   select 
>   t1.category_id as category_id1
>   , t2.category_id as category_id2
>   , t1.user_id
>   from (
>   select category_id, user_id
>   from dm_fact_buyer_prd_info_d
>   group by category_id, user_id ) t1
>   join (
>   select category_id, user_id
>   from dm_fact_buyer_prd_info_d
>   group by category_id, user_id ) t2 on 
> t1.user_id=t2.user_id 
>   ) t1
>   group by category_id1, category_id2 ) t_o
>   where category_id1 <> category_id2
>   and assoc_idx > 2;
> -
> The query above will fail when executed, throwing the exception: "can not cast 
> UDFOpNotEqual(Text, IntWritable) to UDFOpNotEqual(Text, Text)". 
> Running EXPLAIN on the query, the execution plan looks really weird (only 
> Stage-1 shown; see the highlighted predicate):
> ---
> Stage: Stage-1
> Map Reduce
>   Alias -> Map Operator Tree:
> t_o:t1:t1:dm_fact_buyer_prd_info_d 
>   TableScan
> alias: dm_fact_buyer_prd_info_d
> Filter Operator
>   predicate:
>   expr: *(category_id <> user_id)*
>   type: boolean
>   Select Operator
> expressions:
>   expr: category_id
>   type: string
>   expr: user_id
>   type: bigint
> outputColumnNames: category_id, user_id
> Group By Operator
>   keys:
> expr: category_id
> type: string
> expr: user_id
> type: bigint
>   mode: hash
>   outputColumnNames: _col0, _col1
>   Reduce Output Operator
> key expressions:
>   expr: _col0
>   type: string
>   expr: _col1
>   type: bigint
> sort order: ++
> Map-reduce partition columns:
>   expr: _col0
>   type: string
>   expr: _col1
>   type: bigint
> tag: -1
>   Reduce Operator Tree:
> Group By Operator
>   keys:
> expr: KEY._col0
> type: string
> expr: KEY._col1
> type: bigint
>   mode: mergepartial
>   outputColumnNames: _col0, _col1
>   Select Operator
> expressions:
>   expr: _col0
>   type: string
>   expr: _col1
>   type: bigint
> outputColumnNames: _col0, _col1
> File Output Operator
>   compressed: true
>   GlobalTableId: 0
>   table:
>   input format: 
> org.apache.hadoop.mapred.SequenceFileInputFormat
>   output format: 
> org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>  --
> If disabling predi

[jira] Commented: (HIVE-1395) Table aliases are ambiguous

2010-06-24 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882235#action_12882235
 ] 

John Sichi commented on HIVE-1395:
--

In the SQL standard, aliases are allowed to be reused at different levels of 
query nesting; conflicts are only illegal in the same FROM clause.  This is the 
same rule as for column names, actually (they can be reused at different levels 
of the query nesting, but not at the same level).  There are corresponding 
rules for resolving references when correlated subqueries are used (looking up 
into the closest scope).

I'm fine with making the rules stricter for Hive, since reusing a table alias 
is very confusing; just pointing this out.
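
A quick sketch of the standard's rule, using hypothetical tables t and u, each
with a column c:

  -- allowed by the standard: alias x reused at different levels of nesting
  SELECT x.c FROM (SELECT x.c FROM t x) x;

  -- illegal: the same alias twice in one FROM clause, so x.c is ambiguous
  SELECT x.c FROM t x JOIN u x ON x.c = x.c;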


> Table aliases are ambiguous
> ---
>
> Key: HIVE-1395
> URL: https://issues.apache.org/jira/browse/HIVE-1395
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Adam Kramer
>
> Consider this query:
> SELECT a.num FROM (
>   SELECT a.num AS num, b.num AS num2
>   FROM foo a LEFT OUTER JOIN bar b ON a.num=b.num
> ) a
> WHERE a.num2 IS NULL;
> ...in this case, the table alias 'a' is ambiguous. It could be the outer 
> table (i.e., the subquery result), or it could be the inner table (foo).
> In the above case, Hive silently parses the outer reference to a as the inner 
> reference. The result, then, is akin to:
> SELECT foo.num FROM foo WHERE bar.num IS NULL. This is bad.
> The bigger problem, however, is that Hive even lets people use the same table 
> alias at multiple points in the query. We should simply throw an exception 
> during the parse stage if there is any ambiguity in which table is which, 
> just like we do if the column names are ambiguous.
> Or, if for some reason we need people to be able to use 'a' to refer to 
> multiple tables or subqueries, it would be excellent if the exact parsing 
> structure were made clear and added to the wiki. In that case, I will file a 
> separate bug JIRA to complain about how it should be different. :)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Build failed in Hudson: Hive-trunk-h0.19 #481

2010-06-24 Thread Apache Hudson Server
See 

Changes:

[namit] HIVE-1430. Dont run serialize plan by default (Ning Zhang via namit)

[jvs] HIVE-1176. 'create if not exists' fails for a table name with
'select' in it.
(Arvind Prabhakar via jvs)

[athusoo] HIVE-1271. Make matching of type information case insensitive. 
(Arvind Prabhakar via Ashish Thusoo)

[jvs] HIVE-1359. Unit test should be shim-aware
(Ning Zhang via jvs)

--
[...truncated 12969 lines...]
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table src
[junit] POSTHOOK: Output: defa...@src
[junit] OK
[junit] Loading data to table src1
[junit] POSTHOOK: Output: defa...@src1
[junit] OK
[junit] Loading data to table src_sequencefile
[junit] POSTHOOK: Output: defa...@src_sequencefile
[junit] OK
[junit] Loading data to table src_thrift
[junit] POSTHOOK: Output: defa...@src_thrift
[junit] OK
[junit] Loading data to table src_json
[junit] POSTHOOK: Output: defa...@src_json
[junit] OK
[junit] diff 

 

[junit] Done query: unknown_function4.q
[junit] Begin query: unknown_table1.q
[junit] Loading data to table srcpart partition (ds=2008-04-08, hr=11)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=11
[junit] OK
[junit] Loading data to table srcpart partition (ds=2008-04-08, hr=12)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=12
[junit] OK
[junit] Loading data to table srcpart partition (ds=2008-04-09, hr=11)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=11
[junit] OK
[junit] Loading data to table srcpart partition (ds=2008-04-09, hr=12)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=12
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table src
[junit] POSTHOOK: Output: defa...@src
[junit] OK
[junit] Loading data to table src1
[junit] POSTHOOK: Output: defa...@src1
[junit] OK
[junit] Loading data to table src_sequencefile
[junit] POSTHOOK: Output: defa...@src_sequencefile
[junit] OK
[junit] Loading data to table src_thrift
[junit] POSTHOOK: Output: defa...@src_thrift
[junit] OK
[junit] Loading data to table src_json
[junit] POSTHOOK: Output: defa...@src_json
[junit] OK
[junit] diff 

 

[junit] Done query: unknown_table1.q
[junit] Begin query: unknown_table2.q
[junit] Loading data to table srcpart partition (ds=2008-04-08, hr=11)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=11
[junit] OK
[junit] Loading data to table srcpart partition (ds=2008-04-08, hr=12)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=12
[junit] OK
[junit] Loading data to table srcpart partition (ds=2008-04-09, hr=11)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=11
[junit] OK
[junit] Loading data to table srcpart partition (ds=2008-04-09, hr=12)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=12
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] 

[jira] Commented: (HIVE-1342) Predicate push down get error result when sub-queries have the same alias name

2010-06-24 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882203#action_12882203
 ] 

He Yongqiang commented on HIVE-1342:


It may be better to throw an error message when we see a duplicate alias name.
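
Until such a check exists, a workaround sketch (untested) for the query in the
report below: give the middle subquery a unique alias so the predicate is not
pushed into the wrong scope. Disabling pushdown with set hive.optimize.ppd=false
is the blunt alternative.

  explain select category_id1, category_id2, assoc_idx
  from (
    select category_id1, category_id2,
           count(distinct user_id) as assoc_idx
    from (
      select t1.category_id as category_id1,
             t2.category_id as category_id2,
             t1.user_id
      from (select category_id, user_id
            from dm_fact_buyer_prd_info_d
            group by category_id, user_id) t1
      join (select category_id, user_id
            from dm_fact_buyer_prd_info_d
            group by category_id, user_id) t2
        on t1.user_id = t2.user_id
    ) joined  -- was 't1', colliding with the alias of its own child subquery
    group by category_id1, category_id2
  ) t_o
  where category_id1 <> category_id2
    and assoc_idx > 2;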

> Predicate push down get error result when sub-queries have the same alias 
> name 
> ---
>
> Key: HIVE-1342
> URL: https://issues.apache.org/jira/browse/HIVE-1342
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Ted Xu
>Priority: Critical
> Fix For: 0.6.0
>
> Attachments: cmd.hql, explain, ppd_same_alias_1.patch
>
>
> The query is over-optimized by PPD when sub-queries have the same alias name; 
> see the query:
> ---
> create table if not exists dm_fact_buyer_prd_info_d (
>   category_id string
>   ,gmv_trade_num  int
>   ,user_id int
>   )
> PARTITIONED BY (ds int);
> set hive.optimize.ppd=true;
> set hive.map.aggr=true;
> explain select category_id1,category_id2,assoc_idx
> from (
>   select 
>   category_id1
>   , category_id2
>   , count(distinct user_id) as assoc_idx
>   from (
>   select 
>   t1.category_id as category_id1
>   , t2.category_id as category_id2
>   , t1.user_id
>   from (
>   select category_id, user_id
>   from dm_fact_buyer_prd_info_d
>   group by category_id, user_id ) t1
>   join (
>   select category_id, user_id
>   from dm_fact_buyer_prd_info_d
>   group by category_id, user_id ) t2 on 
> t1.user_id=t2.user_id 
>   ) t1
>   group by category_id1, category_id2 ) t_o
>   where category_id1 <> category_id2
>   and assoc_idx > 2;
> -
> The query above will fail when executed, throwing the exception: "can not cast 
> UDFOpNotEqual(Text, IntWritable) to UDFOpNotEqual(Text, Text)". 
> Running EXPLAIN on the query, the execution plan looks really weird (only 
> Stage-1 shown; see the highlighted predicate):
> ---
> Stage: Stage-1
> Map Reduce
>   Alias -> Map Operator Tree:
> t_o:t1:t1:dm_fact_buyer_prd_info_d 
>   TableScan
> alias: dm_fact_buyer_prd_info_d
> Filter Operator
>   predicate:
>   expr: *(category_id <> user_id)*
>   type: boolean
>   Select Operator
> expressions:
>   expr: category_id
>   type: string
>   expr: user_id
>   type: bigint
> outputColumnNames: category_id, user_id
> Group By Operator
>   keys:
> expr: category_id
> type: string
> expr: user_id
> type: bigint
>   mode: hash
>   outputColumnNames: _col0, _col1
>   Reduce Output Operator
> key expressions:
>   expr: _col0
>   type: string
>   expr: _col1
>   type: bigint
> sort order: ++
> Map-reduce partition columns:
>   expr: _col0
>   type: string
>   expr: _col1
>   type: bigint
> tag: -1
>   Reduce Operator Tree:
> Group By Operator
>   keys:
> expr: KEY._col0
> type: string
> expr: KEY._col1
> type: bigint
>   mode: mergepartial
>   outputColumnNames: _col0, _col1
>   Select Operator
> expressions:
>   expr: _col0
>   type: string
>   expr: _col1
>   type: bigint
> outputColumnNames: _col0, _col1
> File Output Operator
>   compressed: true
>   GlobalTableId: 0
>   table:
>   input format: 
> org.apache.hadoop.mapred.SequenceFileInputFormat
>   output format: 
> org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>  ---

[jira] Updated: (HIVE-1405) hive command line option -i to run an init file before other SQL commands

2010-06-24 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-1405:
-

Status: Patch Available  (was: Open)

> hive command line option -i to run an init file before other SQL commands
> -
>
> Key: HIVE-1405
> URL: https://issues.apache.org/jira/browse/HIVE-1405
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Clients
>Affects Versions: 0.5.0
>Reporter: Jonathan Chang
>Assignee: John Sichi
> Fix For: 0.7.0
>
> Attachments: HIVE-1405.1.patch
>
>
> When deploying hive, it would be nice to have a .hiverc file containing 
> statements that would be automatically run whenever hive is launched.  This 
> way, we can automatically add JARs, create temporary functions, set flags, 
> etc. for all users quickly. 
> This should ideally be set up like .bashrc and the like with a global version 
> and a user-local version.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1405) hive command line option -i to run an init file before other SQL commands

2010-06-24 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882186#action_12882186
 ] 

John Sichi commented on HIVE-1405:
--

I was thinking about that, but some of the information (e.g. path) is 
environment-specific.  Also, can we do anything like this in a followup JIRA so 
that this can go in as is?

> hive command line option -i to run an init file before other SQL commands
> -
>
> Key: HIVE-1405
> URL: https://issues.apache.org/jira/browse/HIVE-1405
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Clients
>Affects Versions: 0.5.0
>Reporter: Jonathan Chang
>Assignee: John Sichi
> Fix For: 0.7.0
>
> Attachments: HIVE-1405.1.patch
>
>
> When deploying hive, it would be nice to have a .hiverc file containing 
> statements that would be automatically run whenever hive is launched.  This 
> way, we can automatically add JARs, create temporary functions, set flags, 
> etc. for all users quickly. 
> This should ideally be set up like .bashrc and the like with a global version 
> and a user-local version.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1405) hive command line option -i to run an init file before other SQL commands

2010-06-24 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882180#action_12882180
 ] 

Namit Jain commented on HIVE-1405:
--

Do you want to move the tables created in QTestUtil (src, srcpart, etc.) into 
the new init file?
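
Presumably something like the following, as a sketch (the file name is
hypothetical; schemas are abbreviated from the standard test tables, and the
data paths are illustrative):

  -- q_test_init.hql (hypothetical name)
  CREATE TABLE src (key STRING, value STRING);
  LOAD DATA LOCAL INPATH '../data/files/kv1.txt' OVERWRITE INTO TABLE src;

  CREATE TABLE srcpart (key STRING, value STRING)
    PARTITIONED BY (ds STRING, hr STRING);
  LOAD DATA LOCAL INPATH '../data/files/kv1.txt'
    OVERWRITE INTO TABLE srcpart PARTITION (ds='2008-04-08', hr='11');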

> hive command line option -i to run an init file before other SQL commands
> -
>
> Key: HIVE-1405
> URL: https://issues.apache.org/jira/browse/HIVE-1405
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Clients
>Affects Versions: 0.5.0
>Reporter: Jonathan Chang
>Assignee: John Sichi
> Fix For: 0.7.0
>
> Attachments: HIVE-1405.1.patch
>
>
> When deploying hive, it would be nice to have a .hiverc file containing 
> statements that would be automatically run whenever hive is launched.  This 
> way, we can automatically add JARs, create temporary functions, set flags, 
> etc. for all users quickly. 
> This should ideally be set up like .bashrc and the like with a global version 
> and a user-local version.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1405) hive command line option -i to run an init file before other SQL commands

2010-06-24 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1405:
-

Status: Open  (was: Patch Available)

> hive command line option -i to run an init file before other SQL commands
> -
>
> Key: HIVE-1405
> URL: https://issues.apache.org/jira/browse/HIVE-1405
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Clients
>Affects Versions: 0.5.0
>Reporter: Jonathan Chang
>Assignee: John Sichi
> Fix For: 0.7.0
>
> Attachments: HIVE-1405.1.patch
>
>
> When deploying hive, it would be nice to have a .hiverc file containing 
> statements that would be automatically run whenever hive is launched.  This 
> way, we can automatically add JARs, create temporary functions, set flags, 
> etc. for all users quickly. 
> This should ideally be set up like .bashrc and the like with a global version 
> and a user-local version.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Build failed in Hudson: Hive-trunk-h0.17 #479

2010-06-24 Thread Apache Hudson Server
See 

Changes:

[namit] HIVE-1430. Dont run serialize plan by default (Ning Zhang via namit)

[jvs] HIVE-1176. 'create if not exists' fails for a table name with
'select' in it.
(Arvind Prabhakar via jvs)

[athusoo] HIVE-1271. Make matching of type information case insensitive. 
(Arvind Prabhakar via Ashish Thusoo)

[jvs] HIVE-1359. Unit test should be shim-aware
(Ning Zhang via jvs)

--
[...truncated 11162 lines...]
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table src
[junit] POSTHOOK: Output: defa...@src
[junit] OK
[junit] Loading data to table src1
[junit] POSTHOOK: Output: defa...@src1
[junit] OK
[junit] Loading data to table src_sequencefile
[junit] POSTHOOK: Output: defa...@src_sequencefile
[junit] OK
[junit] Loading data to table src_thrift
[junit] POSTHOOK: Output: defa...@src_thrift
[junit] OK
[junit] Loading data to table src_json
[junit] POSTHOOK: Output: defa...@src_json
[junit] OK
[junit] diff 

 

[junit] Done query: unknown_function4.q
[junit] Begin query: unknown_table1.q
[junit] Loading data to table srcpart partition (ds=2008-04-08, hr=11)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=11
[junit] OK
[junit] Loading data to table srcpart partition (ds=2008-04-08, hr=12)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=12
[junit] OK
[junit] Loading data to table srcpart partition (ds=2008-04-09, hr=11)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=11
[junit] OK
[junit] Loading data to table srcpart partition (ds=2008-04-09, hr=12)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=12
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table src
[junit] POSTHOOK: Output: defa...@src
[junit] OK
[junit] Loading data to table src1
[junit] POSTHOOK: Output: defa...@src1
[junit] OK
[junit] Loading data to table src_sequencefile
[junit] POSTHOOK: Output: defa...@src_sequencefile
[junit] OK
[junit] Loading data to table src_thrift
[junit] POSTHOOK: Output: defa...@src_thrift
[junit] OK
[junit] Loading data to table src_json
[junit] POSTHOOK: Output: defa...@src_json
[junit] OK
[junit] diff 

 

[junit] Done query: unknown_table1.q
[junit] Begin query: unknown_table2.q
[junit] Loading data to table srcpart partition (ds=2008-04-08, hr=11)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=11
[junit] OK
[junit] Loading data to table srcpart partition (ds=2008-04-08, hr=12)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=12
[junit] OK
[junit] Loading data to table srcpart partition (ds=2008-04-09, hr=11)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=11
[junit] OK
[junit] Loading data to table srcpart partition (ds=2008-04-09, hr=12)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=12
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit]