[jira] Commented: (HIVE-287) count distinct on multiple columns does not work

2010-07-06 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885788#action_12885788
 ] 

HBase Review Board commented on HIVE-287:
-

Message from: "Carl Steinbach" 

---
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/275/
---

(Updated 2010-07-06 19:24:53.267440)


Review request for Hive Developers and John Sichi.


Changes
---

Updated 'Bugs' field in review request.


Summary
---

This is the patch revision 5 for HIVE-287. Please see the Jira comments for 
full description.


This addresses bug HIVE-287.
http://issues.apache.org/jira/browse/HIVE-287


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 4109103 
  ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g 7e6e63e 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 252b89d 
  
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/AbstractGenericUDAFResolver.java
 PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFAverage.java 
0ef4734 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFBridge.java 
916eb33 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCount.java 
0054664 
  
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFHistogramNumeric.java
 8d22ef1 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFMax.java 6785687 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFMin.java 051f3a1 
  
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFParameterInfo.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFResolver.java 
9888b52 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFResolver2.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFSum.java ce97afd 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFVariance.java 
26dc84c 
  
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/SimpleGenericUDAFParameterInfo.java
 PRE-CREATION 
  ql/src/test/queries/clientnegative/invalid_avg_syntax.q PRE-CREATION 
  ql/src/test/queries/clientnegative/invalid_max_syntax.q PRE-CREATION 
  ql/src/test/queries/clientnegative/invalid_min_syntax.q PRE-CREATION 
  ql/src/test/queries/clientnegative/invalid_std_syntax.q PRE-CREATION 
  ql/src/test/queries/clientnegative/invalid_stddev_samp_syntax.q PRE-CREATION 
  ql/src/test/queries/clientnegative/invalid_sum_syntax.q PRE-CREATION 
  ql/src/test/queries/clientnegative/invalid_var_samp_syntax.q PRE-CREATION 
  ql/src/test/queries/clientnegative/invalid_variance_syntax.q PRE-CREATION 
  ql/src/test/queries/clientpositive/udf_count.q 2d1510f 
  ql/src/test/results/clientnegative/invalid_avg_syntax.q.out PRE-CREATION 
  ql/src/test/results/clientnegative/invalid_max_syntax.q.out PRE-CREATION 
  ql/src/test/results/clientnegative/invalid_min_syntax.q.out PRE-CREATION 
  ql/src/test/results/clientnegative/invalid_std_syntax.q.out PRE-CREATION 
  ql/src/test/results/clientnegative/invalid_stddev_samp_syntax.q.out 
PRE-CREATION 
  ql/src/test/results/clientnegative/invalid_sum_syntax.q.out PRE-CREATION 
  ql/src/test/results/clientnegative/invalid_var_samp_syntax.q.out PRE-CREATION 
  ql/src/test/results/clientnegative/invalid_variance_syntax.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/udf_count.q.out 2adffd8 

Diff: http://review.hbase.org/r/275/diff


Testing
---

Ran all tests on trunk. 


Thanks,

Arvind




> count distinct on multiple columns does not work
> 
>
> Key: HIVE-287
> URL: https://issues.apache.org/jira/browse/HIVE-287
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Arvind Prabhakar
> Attachments: HIVE-287-1.patch, HIVE-287-2.patch, HIVE-287-3.patch, 
> HIVE-287-4.patch, HIVE-287-5-branch-0.6.patch, HIVE-287-5-trunk.patch
>
>
> The following query does not work:
> select count(distinct col1, col2) from Tbl
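Until this patch is committed, one commonly used workaround is to count the
distinct pairs through a subquery; a minimal sketch (Tbl and its columns as in
the report above):

{code}
-- fails on current trunk (this issue):
SELECT count(DISTINCT col1, col2) FROM Tbl;

-- workaround: count distinct (col1, col2) pairs via a subquery
SELECT count(1)
FROM (SELECT DISTINCT col1, col2 FROM Tbl) t;
{code}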

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Review Request: Fix for HIVE-287 (patch rev 5)

2010-07-06 Thread Carl Steinbach

---
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/275/
---

(Updated 2010-07-06 19:24:53.267440)


Review request for Hive Developers and John Sichi.


Changes
---

Updated 'Bugs' field in review request.


Summary
---

This is the patch revision 5 for HIVE-287. Please see the Jira comments for 
full description.


This addresses bug HIVE-287.
http://issues.apache.org/jira/browse/HIVE-287


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 4109103 
  ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g 7e6e63e 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 252b89d 
  
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/AbstractGenericUDAFResolver.java
 PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFAverage.java 
0ef4734 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFBridge.java 
916eb33 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCount.java 
0054664 
  
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFHistogramNumeric.java
 8d22ef1 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFMax.java 6785687 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFMin.java 051f3a1 
  
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFParameterInfo.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFResolver.java 
9888b52 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFResolver2.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFSum.java ce97afd 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFVariance.java 
26dc84c 
  
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/SimpleGenericUDAFParameterInfo.java
 PRE-CREATION 
  ql/src/test/queries/clientnegative/invalid_avg_syntax.q PRE-CREATION 
  ql/src/test/queries/clientnegative/invalid_max_syntax.q PRE-CREATION 
  ql/src/test/queries/clientnegative/invalid_min_syntax.q PRE-CREATION 
  ql/src/test/queries/clientnegative/invalid_std_syntax.q PRE-CREATION 
  ql/src/test/queries/clientnegative/invalid_stddev_samp_syntax.q PRE-CREATION 
  ql/src/test/queries/clientnegative/invalid_sum_syntax.q PRE-CREATION 
  ql/src/test/queries/clientnegative/invalid_var_samp_syntax.q PRE-CREATION 
  ql/src/test/queries/clientnegative/invalid_variance_syntax.q PRE-CREATION 
  ql/src/test/queries/clientpositive/udf_count.q 2d1510f 
  ql/src/test/results/clientnegative/invalid_avg_syntax.q.out PRE-CREATION 
  ql/src/test/results/clientnegative/invalid_max_syntax.q.out PRE-CREATION 
  ql/src/test/results/clientnegative/invalid_min_syntax.q.out PRE-CREATION 
  ql/src/test/results/clientnegative/invalid_std_syntax.q.out PRE-CREATION 
  ql/src/test/results/clientnegative/invalid_stddev_samp_syntax.q.out 
PRE-CREATION 
  ql/src/test/results/clientnegative/invalid_sum_syntax.q.out PRE-CREATION 
  ql/src/test/results/clientnegative/invalid_var_samp_syntax.q.out PRE-CREATION 
  ql/src/test/results/clientnegative/invalid_variance_syntax.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/udf_count.q.out 2adffd8 

Diff: http://review.hbase.org/r/275/diff


Testing
---

Ran all tests on trunk. 


Thanks,

Arvind



[jira] Commented: (HIVE-1432) Create a test case for case sensitive comparison done during field comparison

2010-07-06 Thread Arvind Prabhakar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885773#action_12885773
 ] 

Arvind Prabhakar commented on HIVE-1432:


Review posted:

http://review.hbase.org/r/276/

> Create a test case for case sensitive comparison done during field comparison
> -
>
> Key: HIVE-1432
> URL: https://issues.apache.org/jira/browse/HIVE-1432
> Project: Hadoop Hive
>  Issue Type: Task
>  Components: Query Processor
>Reporter: Arvind Prabhakar
>Assignee: Arvind Prabhakar
> Fix For: 0.7.0
>
> Attachments: HIVE-1432.patch
>
>
> See HIVE-1271. This jira tracks the creation of a test case to test this fix 
> specifically.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Review Request: Fix for HIVE-1432

2010-07-06 Thread Arvind Prabhakar

---
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/276/
---

Review request for Hive Developers.


Summary
---

This issue was to track the addition of test cases for exercising the fix for 
HIVE-1271. 


Diffs
-

  data/files/key_value_struct.txt PRE-CREATION 
  data/scripts/replace_tab_with_ctrlB PRE-CREATION 
  ql/src/test/queries/clientpositive/struct_equivalence_test.q PRE-CREATION 
  ql/src/test/results/clientpositive/struct_equivalence_test.q.out PRE-CREATION 

Diff: http://review.hbase.org/r/276/diff


Testing
---

The newly created test was run on a trunk version prior to the commit of 
HIVE-1271 and failed as expected. The test passes on the current 
trunk.


Thanks,

Arvind



[jira] Created: (HIVE-1451) Creating a table stores the full address of namenode in the metadata. This leads to problems when the namenode address changes.

2010-07-06 Thread Arvind Prabhakar (JIRA)
Creating a table stores the full address of namenode in the metadata. This 
leads to problems when the namenode address changes.
---

 Key: HIVE-1451
 URL: https://issues.apache.org/jira/browse/HIVE-1451
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Metastore, Query Processor
Affects Versions: 0.5.0
 Environment: Any
Reporter: Arvind Prabhakar
Assignee: Arvind Prabhakar


Here is an excerpt from table metadata for an arbitrary table {{table1}}:

{noformat}
hive> describe extended table1;
OK
...
Detailed Table Information  ...
location:hdfs://localhost:9000/user/arvind/hive/warehouse/table1, 
...
{noformat}

As can be seen, the full address of the namenode is captured in the location 
information for the table. This information is later used to run queries on 
the table, making it impossible to change the namenode location once the 
table has been created. For example, a query against the above table will fail 
if the namenode is migrated from port 9000 to 8020:

{noformat}
hive> select * from table1;
OK
Failed with exception java.io.IOException:java.net.ConnectException: Call to 
localhost/127.0.0.1:9000
failed on connection exception: java.net.ConnectException: Connection refused
Time taken: 10.78 seconds
hive> 
{noformat}

It should be possible to change the namenode location regardless of when the 
tables were created. Likewise, query execution should work with the namenode 
configured at that point in time, rather than requiring the configuration to 
match what it was when the tables were created.
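
Until the metastore stores a namenode-independent location, one manual
mitigation is to rewrite the stored location after a migration; a sketch,
assuming a Hive build that supports ALTER TABLE ... SET LOCATION:

{code}
-- repoint the table at the namenode's new address (9000 -> 8020);
-- a partitioned table would need the same treatment per partition:
ALTER TABLE table1 SET LOCATION
  'hdfs://localhost:8020/user/arvind/hive/warehouse/table1';
{code}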

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-287) count distinct on multiple columns does not work

2010-07-06 Thread Arvind Prabhakar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885766#action_12885766
 ] 

Arvind Prabhakar commented on HIVE-287:
---

Review board review posted:

http://review.hbase.org/r/275/


> count distinct on multiple columns does not work
> 
>
> Key: HIVE-287
> URL: https://issues.apache.org/jira/browse/HIVE-287
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Arvind Prabhakar
> Attachments: HIVE-287-1.patch, HIVE-287-2.patch, HIVE-287-3.patch, 
> HIVE-287-4.patch, HIVE-287-5-branch-0.6.patch, HIVE-287-5-trunk.patch
>
>
> The following query does not work:
> select count(distinct col1, col2) from Tbl

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-287) count distinct on multiple columns does not work

2010-07-06 Thread Arvind Prabhakar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvind Prabhakar updated HIVE-287:
--

Status: Patch Available  (was: Open)

Uploaded patch for trunk and branch-0.6. Ran all the tests on trunk and did 
spot testing on branch-0.6.

*Changes from previous patch:*
* Modified the implementation of {{AbstractGenericUDAFResolver}} to raise an 
exception when invoked with the {{UDAF(STAR)}} syntax.
* Added negative test cases to assert that the current UDAFs in the code, 
other than {{COUNT}}, do not accept the {{UDAF(STAR)}} syntax.
* Added {{EXPLAIN}} directives for the queries run in the {{udf_count.q}} test 
file.

Will attempt to post the patch on review board as well.
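
To make the intended behavior concrete, a hedged sketch of what the negative
tests above exercise (src is Hive's standard test table; the exact error text
is illustrative, not quoted from the patch):

{code}
-- count(*) remains legal:
SELECT count(*) FROM src;

-- after this patch, other UDAFs reject the (*) form during semantic analysis:
SELECT sum(*) FROM src;  -- expected to fail
SELECT avg(*) FROM src;  -- expected to fail
{code}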


> count distinct on multiple columns does not work
> 
>
> Key: HIVE-287
> URL: https://issues.apache.org/jira/browse/HIVE-287
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Arvind Prabhakar
> Attachments: HIVE-287-1.patch, HIVE-287-2.patch, HIVE-287-3.patch, 
> HIVE-287-4.patch, HIVE-287-5-branch-0.6.patch, HIVE-287-5-trunk.patch
>
>
> The following query does not work:
> select count(distinct col1, col2) from Tbl

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Review Request: Fix for HIVE-287 (patch rev 5)

2010-07-06 Thread Arvind Prabhakar

---
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/275/
---

Review request for Hive Developers and John Sichi.


Summary
---

This is the patch revision 5 for HIVE-287. Please see the Jira comments for 
full description.


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 4109103 
  ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g 7e6e63e 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 252b89d 
  
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/AbstractGenericUDAFResolver.java
 PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFAverage.java 
0ef4734 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFBridge.java 
916eb33 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCount.java 
0054664 
  
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFHistogramNumeric.java
 8d22ef1 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFMax.java 6785687 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFMin.java 051f3a1 
  
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFParameterInfo.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFResolver.java 
9888b52 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFResolver2.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFSum.java ce97afd 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFVariance.java 
26dc84c 
  
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/SimpleGenericUDAFParameterInfo.java
 PRE-CREATION 
  ql/src/test/queries/clientnegative/invalid_avg_syntax.q PRE-CREATION 
  ql/src/test/queries/clientnegative/invalid_max_syntax.q PRE-CREATION 
  ql/src/test/queries/clientnegative/invalid_min_syntax.q PRE-CREATION 
  ql/src/test/queries/clientnegative/invalid_std_syntax.q PRE-CREATION 
  ql/src/test/queries/clientnegative/invalid_stddev_samp_syntax.q PRE-CREATION 
  ql/src/test/queries/clientnegative/invalid_sum_syntax.q PRE-CREATION 
  ql/src/test/queries/clientnegative/invalid_var_samp_syntax.q PRE-CREATION 
  ql/src/test/queries/clientnegative/invalid_variance_syntax.q PRE-CREATION 
  ql/src/test/queries/clientpositive/udf_count.q 2d1510f 
  ql/src/test/results/clientnegative/invalid_avg_syntax.q.out PRE-CREATION 
  ql/src/test/results/clientnegative/invalid_max_syntax.q.out PRE-CREATION 
  ql/src/test/results/clientnegative/invalid_min_syntax.q.out PRE-CREATION 
  ql/src/test/results/clientnegative/invalid_std_syntax.q.out PRE-CREATION 
  ql/src/test/results/clientnegative/invalid_stddev_samp_syntax.q.out 
PRE-CREATION 
  ql/src/test/results/clientnegative/invalid_sum_syntax.q.out PRE-CREATION 
  ql/src/test/results/clientnegative/invalid_var_samp_syntax.q.out PRE-CREATION 
  ql/src/test/results/clientnegative/invalid_variance_syntax.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/udf_count.q.out 2adffd8 

Diff: http://review.hbase.org/r/275/diff


Testing
---

Ran all tests on trunk. 


Thanks,

Arvind



[jira] Updated: (HIVE-287) count distinct on multiple columns does not work

2010-07-06 Thread Arvind Prabhakar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvind Prabhakar updated HIVE-287:
--

Attachment: HIVE-287-5-trunk.patch
HIVE-287-5-branch-0.6.patch

> count distinct on multiple columns does not work
> 
>
> Key: HIVE-287
> URL: https://issues.apache.org/jira/browse/HIVE-287
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Arvind Prabhakar
> Attachments: HIVE-287-1.patch, HIVE-287-2.patch, HIVE-287-3.patch, 
> HIVE-287-4.patch, HIVE-287-5-branch-0.6.patch, HIVE-287-5-trunk.patch
>
>
> The following query does not work:
> select count(distinct col1, col2) from Tbl

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Reminder: Hive Contributors Meeting is Tomorrow

2010-07-06 Thread John Sichi
Here's the photo if you want to link it in the notes:

http://www.meetup.com/Hive-Contributors-Group/photos/978296/

If you move your head back and forth real fast you can almost recognize people. 
 :)

JVS



[jira] Updated: (HIVE-1307) More generic and efficient merge method

2010-07-06 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1307:
-

Fix Version/s: 0.7.0
   (was: 0.6.0)

> More generic and efficient merge method
> ---
>
> Key: HIVE-1307
> URL: https://issues.apache.org/jira/browse/HIVE-1307
> Project: Hadoop Hive
>  Issue Type: New Feature
>Affects Versions: 0.6.0
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.7.0
>
> Attachments: HIVE-1307.0.patch
>
>
> Currently, if hive.merge.mapfiles/mapredfiles=true, a new mapreduce job is 
> created to read the input files and output to one reducer for merging. This MR 
> job is created at compile time, one MR job per partition. In the dynamic 
> partition case, multiple partitions could be created at execution time, so 
> generating the merge MR job at compile time is impossible. 
> We should generalize the merge framework to allow multiple partitions; most 
> of the time a map-only job should be sufficient if we use 
> CombineHiveInputFormat. 
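
For context, a sketch of the settings and the dynamic partition insert
discussed above (table and column names are hypothetical; the dynamic
partition properties are assumed to be enabled as shown):

{code}
-- enable merging of small output files:
set hive.merge.mapfiles=true;
set hive.merge.mapredfiles=true;

-- dynamic partition insert: the partitions, and hence the merge work,
-- are only known at execution time:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE target PARTITION (ds)
SELECT key, value, ds FROM source;
{code}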

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1399) Nested UDAFs cause Hive Internal Error (NullPointerException)

2010-07-06 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1399:
-

Fix Version/s: 0.7.0
   (was: 0.6.0)

> Nested UDAFs cause Hive Internal Error (NullPointerException)
> -
>
> Key: HIVE-1399
> URL: https://issues.apache.org/jira/browse/HIVE-1399
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Mayank Lahiri
> Fix For: 0.7.0
>
>
> This query does not make "real-world" sense, and I'm guessing it's not even 
> supported by HQL/SQL, but I'm pretty sure that it shouldn't be causing an 
> internal error with a NullPointerException. "normal" just has one column 
> called "val". I'm running on trunk, svn updated 5 minutes ago, ant clean 
> package.
> SELECT percentile(val, percentile(val, 0.5)) FROM normal;
> FAILED: Hive Internal Error: java.lang.NullPointerException(null)
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc.newInstance(ExprNodeGenericFuncDesc.java:153)
>   at 
> org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.getXpathOrFuncExprNodeDesc(TypeCheckProcFactory.java:587)
>   at 
> org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:708)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:88)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:128)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:102)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:6241)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapGroupByOperator(SemanticAnalyzer.java:2301)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapAggr1MR(SemanticAnalyzer.java:2860)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:5002)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:5524)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:6055)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:126)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:304)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:377)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:138)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:303)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> I've also recreated this error with a GenericUDAF I'm writing, and also with 
> the following:
> SELECT percentile(val, percentile()) FROM normal;   
> SELECT avg(variance(dob_year)) FROM somedata; // this makes no sense, but 
> still a NullPointerException

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1019) java.io.FileNotFoundException: HIVE_PLAN (No such file or directory)

2010-07-06 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1019:
-

Fix Version/s: 0.7.0
   (was: 0.6.0)

> java.io.FileNotFoundException: HIVE_PLAN (No such file or directory)
> 
>
> Key: HIVE-1019
> URL: https://issues.apache.org/jira/browse/HIVE-1019
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Bennie Schut
>Assignee: Bennie Schut
>Priority: Minor
> Fix For: 0.7.0
>
> Attachments: HIVE-1019-1.patch, HIVE-1019-2.patch, HIVE-1019-3.patch, 
> HIVE-1019-4.patch, HIVE-1019-5.patch, HIVE-1019-6.patch, HIVE-1019-7.patch, 
> HIVE-1019-8.patch, HIVE-1019.patch, stacktrace2.txt
>
>
> I keep getting errors like this:
> java.io.FileNotFoundException: HIVE_PLAN (No such file or directory)
> and :
> java.io.IOException: cannot find dir = 
> hdfs://victoria.ebuddy.com:9000/tmp/hive-dwh/801467596/10002 in 
> partToPartitionInfo!
> when running multiple threads with roughly similar queries.
> I have a patch for this which works for me.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1375) dynamic partitions should not create some of the partitions if the query fails

2010-07-06 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1375:
-

Fix Version/s: 0.7.0
   (was: 0.6.0)

> dynamic partitions should not create some of the partitions if the query fails
> --
>
> Key: HIVE-1375
> URL: https://issues.apache.org/jira/browse/HIVE-1375
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Ning Zhang
> Fix For: 0.7.0
>
>
> Currently, if a bad row exists whose value cannot be used as a partitioning 
> column, the query fails - but some of the partitions may already have been created

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1342) Predicate push down get error result when sub-queries have the same alias name

2010-07-06 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1342:
-

Fix Version/s: 0.7.0
   (was: 0.6.0)

> Predicate push down get error result when sub-queries have the same alias 
> name 
> ---
>
> Key: HIVE-1342
> URL: https://issues.apache.org/jira/browse/HIVE-1342
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Ted Xu
>Assignee: Ted Xu
>Priority: Critical
> Fix For: 0.7.0
>
> Attachments: cmd.hql, explain, ppd_same_alias_1.patch, 
> ppd_same_alias_2.patch
>
>
> The query is over-optimized by PPD when sub-queries have the same alias name; 
> see the query below:
> ---
> create table if not exists dm_fact_buyer_prd_info_d (
>   category_id string
>   ,gmv_trade_num  int
>   ,user_idint
>   )
> PARTITIONED BY (ds int);
> set hive.optimize.ppd=true;
> set hive.map.aggr=true;
> explain select category_id1,category_id2,assoc_idx
> from (
>   select 
>   category_id1
>   , category_id2
>   , count(distinct user_id) as assoc_idx
>   from (
>   select 
>   t1.category_id as category_id1
>   , t2.category_id as category_id2
>   , t1.user_id
>   from (
>   select category_id, user_id
>   from dm_fact_buyer_prd_info_d
>   group by category_id, user_id ) t1
>   join (
>   select category_id, user_id
>   from dm_fact_buyer_prd_info_d
>   group by category_id, user_id ) t2 on 
> t1.user_id=t2.user_id 
>   ) t1
>   group by category_id1, category_id2 ) t_o
>   where category_id1 <> category_id2
>   and assoc_idx > 2;
> -
> The query above will fail when executed, throwing the exception: "can not cast 
> UDFOpNotEqual(Text, IntWritable) to UDFOpNotEqual(Text, Text)". 
> I ran EXPLAIN on the query and the execution plan looks really weird (only 
> Stage-1 shown; see the highlighted predicate):
> ---
> Stage: Stage-1
> Map Reduce
>   Alias -> Map Operator Tree:
> t_o:t1:t1:dm_fact_buyer_prd_info_d 
>   TableScan
> alias: dm_fact_buyer_prd_info_d
> Filter Operator
>   predicate:
>   expr: *(category_id <> user_id)*
>   type: boolean
>   Select Operator
> expressions:
>   expr: category_id
>   type: string
>   expr: user_id
>   type: bigint
> outputColumnNames: category_id, user_id
> Group By Operator
>   keys:
> expr: category_id
> type: string
> expr: user_id
> type: bigint
>   mode: hash
>   outputColumnNames: _col0, _col1
>   Reduce Output Operator
> key expressions:
>   expr: _col0
>   type: string
>   expr: _col1
>   type: bigint
> sort order: ++
> Map-reduce partition columns:
>   expr: _col0
>   type: string
>   expr: _col1
>   type: bigint
> tag: -1
>   Reduce Operator Tree:
> Group By Operator
>   keys:
> expr: KEY._col0
> type: string
> expr: KEY._col1
> type: bigint
>   mode: mergepartial
>   outputColumnNames: _col0, _col1
>   Select Operator
> expressions:
>   expr: _col0
>   type: string
>   expr: _col1
>   type: bigint
> outputColumnNames: _col0, _col1
> File Output Operator
>   compressed: true
>   GlobalTableId: 0
>   table:
>   input format: 
> org.apache.hadoop.mapred.SequenceFileInputFormat
>   output format: 
> org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>  
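
Until the patch is applied, the blunt session-level workaround for this class
of PPD bug is to disable predicate pushdown, using the same setting the report
toggles on:

{code}
set hive.optimize.ppd=false;
-- re-run the query; predicates are then evaluated where written
-- instead of being pushed (incorrectly) below the join
{code}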

[jira] Updated: (HIVE-1211) Tapping logs from child processes

2010-07-06 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1211:
-

Fix Version/s: 0.7.0
   (was: 0.6.0)

> Tapping logs from child processes
> -
>
> Key: HIVE-1211
> URL: https://issues.apache.org/jira/browse/HIVE-1211
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Logging
>Reporter: bc Wong
>Assignee: bc Wong
> Fix For: 0.7.0
>
> Attachments: HIVE-1211-2.patch, HIVE-1211.1.patch
>
>
> Stdout/stderr from child processes (e.g. {{MapRedTask}}) are redirected to 
> the parent's stdout/stderr. There is little one can do to sort out which 
> log is from which query.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1395) Table aliases are ambiguous

2010-07-06 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1395:
-

Fix Version/s: (was: 0.6.0)

> Table aliases are ambiguous
> ---
>
> Key: HIVE-1395
> URL: https://issues.apache.org/jira/browse/HIVE-1395
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Adam Kramer
> Fix For: 0.7.0
>
>
> Consider this query:
> SELECT a.num FROM (
>   SELECT a.num AS num, b.num AS num2
>   FROM foo a LEFT OUTER JOIN bar b ON a.num=b.num
> ) a
> WHERE a.num2 IS NULL;
> ...in this case, the table alias 'a' is ambiguous. It could be the outer 
> table (i.e., the subquery result), or it could be the inner table (foo).
> In the above case, Hive silently parses the outer reference to a as the inner 
> reference. The result, then, is akin to:
> SELECT foo.num FROM foo WHERE bar.num IS NULL. This is bad.
> The bigger problem, however, is that Hive even lets people use the same table 
> alias at multiple points in the query. We should simply throw an exception 
> during the parse stage if there is any ambiguity in which table is which, 
> just like we do if the column names are ambiguous.
> Or, if for some reason we need people to be able to use 'a' to refer to 
> multiple tables or subqueries, it would be excellent if the exact parsing 
> structure were made clear and added to the wiki. In that case, I will file a 
> separate bug JIRA to complain about how it should be different. :)
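
For reference, the ambiguity in the quoted query disappears once the outer
alias is distinct; a sketch of the disambiguated form:

{code}
SELECT t.num
FROM (
  SELECT a.num AS num, b.num AS num2
  FROM foo a LEFT OUTER JOIN bar b ON a.num = b.num
) t
WHERE t.num2 IS NULL;
{code}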

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1056) Predicate push down does not work with UDTF's

2010-07-06 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1056:
-

Fix Version/s: 0.7.0
   (was: 0.6.0)

> Predicate push down does not work with UDTF's
> -
>
> Key: HIVE-1056
> URL: https://issues.apache.org/jira/browse/HIVE-1056
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.5.0, 0.6.0
>Reporter: Paul Yang
>Assignee: Paul Yang
> Fix For: 0.7.0
>
> Attachments: HIVE-1056.1.patch
>
>
> Predicate push down does not work with UDTF's in lateral views
> {code}
> hive> SELECT * FROM src LATERAL VIEW explode(array(1,2,3)) myTable AS k WHERE 
> k=1;
> FAILED: Unknown exception: null
> hive>
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1363) 'SHOW TABLE EXTENDED LIKE' command does not strip single/double quotes

2010-07-06 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1363:
-

Fix Version/s: 0.7.0
   (was: 0.6.0)
  Description: 
{code}
hive> SHOW TABLE EXTENDED LIKE pokes;
OK
tableName:pokes
owner:carl
location:hdfs://localhost/user/hive/warehouse/pokes
inputformat:org.apache.hadoop.mapred.TextInputFormat
outputformat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
columns:struct columns { i32 num}
partitioned:false
partitionColumns:
totalNumberFiles:0
totalFileSize:0
maxFileSize:0
minFileSize:0
lastAccessTime:0
lastUpdateTime:1274517075221

hive> SHOW TABLE EXTENDED LIKE "p*";
FAILED: Error in metadata: MetaException(message:Got exception: 
javax.jdo.JDOUserException ')' expected at character 54 in "database.name == 
dbName && ( tableName.matches("(?i)"p.*""))")
FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.DDLTask

hive> SHOW TABLE EXTENDED LIKE 'p*';
OK

hive> SHOW TABLE EXTENDED LIKE `p*`;
OK
tableName:pokes
owner:carl
location:hdfs://localhost/user/hive/warehouse/pokes
inputformat:org.apache.hadoop.mapred.TextInputFormat
outputformat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
columns:struct columns { i32 num}
partitioned:false
partitionColumns:
totalNumberFiles:0
totalFileSize:0
maxFileSize:0
minFileSize:0
lastAccessTime:0
lastUpdateTime:1274517075221

{code}

  was:

{code}
hive> SHOW TABLE EXTENDED LIKE pokes;
OK
tableName:pokes
owner:carl
location:hdfs://localhost/user/hive/warehouse/pokes
inputformat:org.apache.hadoop.mapred.TextInputFormat
outputformat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
columns:struct columns { i32 num}
partitioned:false
partitionColumns:
totalNumberFiles:0
totalFileSize:0
maxFileSize:0
minFileSize:0
lastAccessTime:0
lastUpdateTime:1274517075221

hive> SHOW TABLE EXTENDED LIKE "p*";
FAILED: Error in metadata: MetaException(message:Got exception: 
javax.jdo.JDOUserException ')' expected at character 54 in "database.name == 
dbName && ( tableName.matches("(?i)"p.*""))")
FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.DDLTask

hive> SHOW TABLE EXTENDED LIKE 'p*';
OK

hive> SHOW TABLE EXTENDED LIKE `p*`;
OK
tableName:pokes
owner:carl
location:hdfs://localhost/user/hive/warehouse/pokes
inputformat:org.apache.hadoop.mapred.TextInputFormat
outputformat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
columns:struct columns { i32 num}
partitioned:false
partitionColumns:
totalNumberFiles:0
totalFileSize:0
maxFileSize:0
minFileSize:0
lastAccessTime:0
lastUpdateTime:1274517075221

{code}


> 'SHOW TABLE EXTENDED LIKE' command does not strip single/double quotes
> --
>
> Key: HIVE-1363
> URL: https://issues.apache.org/jira/browse/HIVE-1363
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.5.0
>Reporter: Carl Steinbach
>Assignee: Carl Steinbach
> Fix For: 0.7.0
>
>
> {code}
> hive> SHOW TABLE EXTENDED LIKE pokes;
> OK
> tableName:pokes
> owner:carl
> location:hdfs://localhost/user/hive/warehouse/pokes
> inputformat:org.apache.hadoop.mapred.TextInputFormat
> outputformat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> columns:struct columns { i32 num}
> partitioned:false
> partitionColumns:
> totalNumberFiles:0
> totalFileSize:0
> maxFileSize:0
> minFileSize:0
> lastAccessTime:0
> lastUpdateTime:1274517075221
> hive> SHOW TABLE EXTENDED LIKE "p*";
> FAILED: Error in metadata: MetaException(message:Got exception: 
> javax.jdo.JDOUserException ')' expected at character 54 in "database.name == 
> dbName && ( tableName.matches("(?i)"p.*""))")
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask
> hive> SHOW TABLE EXTENDED LIKE 'p*';
> OK
> hive> SHOW TABLE EXTENDED LIKE `p*`;
> OK
> tableName:pokes
> owner:carl
> location:hdfs://localhost/user/hive/warehouse/pokes
> inputformat:org.apache.hadoop.mapred.TextInputFormat
> outputformat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> columns:struct columns { i32 num}
> partitioned:false
> partitionColumns:
> totalNumberFiles:0
> totalFileSize:0
> maxFileSize:0
> minFileSize:0
> lastAccessTime:0
> lastUpdateTime:1274517075221
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1369) LazySimpleSerDe should be able to read classes that support some form of toString()

2010-07-06 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1369:
-

Fix Version/s: 0.7.0
   (was: 0.6.0)

> LazySimpleSerDe should be able to read classes that support some form of 
> toString()
> ---
>
> Key: HIVE-1369
> URL: https://issues.apache.org/jira/browse/HIVE-1369
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Affects Versions: 0.5.0
>Reporter: Alex Kozlov
>Assignee: Alex Kozlov
>Priority: Minor
> Fix For: 0.7.0
>
> Attachments: HIVE-1369.patch, HIVE-1369.svn.patch
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Currently LazySimpleSerDe is able to deserialize only BytesWritable or Text 
> objects. It should be pretty easy to extend the class to read any object 
> that implements the toString() method.
> Ideas or concerns?
> Alex K

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1419) Policy on deserialization errors

2010-07-06 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1419:
-

Fix Version/s: 0.7.0
   (was: 0.6.0)
   (was: 0.5.1)

> Policy on deserialization errors
> 
>
> Key: HIVE-1419
> URL: https://issues.apache.org/jira/browse/HIVE-1419
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Affects Versions: 0.5.0
>Reporter: Vladimir Klimontovich
>Assignee: Vladimir Klimontovich
>Priority: Minor
> Fix For: 0.7.0
>
> Attachments: corrupted_records_0.5.patch, 
> corrupted_records_0.5_ver2.patch, corrupted_records_trunk.patch, 
> corrupted_records_trunk_ver2.patch
>
>
> When the deserializer throws an exception, the whole map task fails (see 
> MapOperator.java). This is not always convenient behavior, especially on 
> huge datasets where a few corrupted lines can be normal. 
> Proposed solution:
> 1) Keep a counter of corrupted records.
> 2) When the counter exceeds a limit (configurable via the 
> hive.max.deserializer.errors property, 0 by default), throw an exception. 
> Otherwise just log the exception at WARN level.
> Patches for the 0.5 branch and trunk are attached.
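
If such a patch is applied, usage would presumably look like the following
(property name taken from the proposal above; the table name is hypothetical):

{code}
-- tolerate up to 100 corrupted records before failing:
set hive.max.deserializer.errors=100;
SELECT count(1) FROM noisy_table;
{code}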

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1432) Create a test case for case sensitive comparison done during field comparison

2010-07-06 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1432:
-

Fix Version/s: 0.7.0
   (was: 0.6.0)

> Create a test case for case sensitive comparison done during field comparison
> -
>
> Key: HIVE-1432
> URL: https://issues.apache.org/jira/browse/HIVE-1432
> Project: Hadoop Hive
>  Issue Type: Task
>  Components: Query Processor
>Reporter: Arvind Prabhakar
>Assignee: Arvind Prabhakar
> Fix For: 0.7.0
>
> Attachments: HIVE-1432.patch
>
>
> See HIVE-1271. This jira tracks the creation of a test case to test this fix 
> specifically.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1411) DataNucleus barfs if JAR appears more than once in CLASSPATH

2010-07-06 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1411:
-

Fix Version/s: 0.7.0

> DataNucleus barfs if JAR appears more than once in CLASSPATH
> 
>
> Key: HIVE-1411
> URL: https://issues.apache.org/jira/browse/HIVE-1411
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.4.0, 0.4.1, 0.5.0
>Reporter: Carl Steinbach
>Assignee: Carl Steinbach
> Fix For: 0.6.0, 0.7.0
>
> Attachments: HIVE-1411.patch.txt
>
>
> DataNucleus barfs when more than one JAR with the same name appears on the 
> CLASSPATH:
> {code}
> 2010-03-06 12:33:25,565 ERROR exec.DDLTask 
> (SessionState.java:printError(279)) - FAILED: Error in metadata: 
> javax.jdo.JDOFatalInter 
> nalException: Unexpected exception caught. 
> NestedThrowables: 
> java.lang.reflect.InvocationTargetException 
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> javax.jdo.JDOFatalInternalException: Unexpected exception caught. 
> NestedThrowables: 
> java.lang.reflect.InvocationTargetException 
> at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:258) 
> at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:879) 
> at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:103) 
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:379) 
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:285) 
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123) 
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181) 
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287) 
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597) 
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156) 
> Caused by: javax.jdo.JDOFatalInternalException: Unexpected exception caught. 
> NestedThrowables: 
> java.lang.reflect.InvocationTargetException 
> at 
> javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1186)
> at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:803) 
> at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:698) 
> at org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:164) 
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:181)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:125) 
> at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:104) 
> at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62) 
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) 
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:130)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:146)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:118)
>  
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.(HiveMetaStore.java:100)
>  
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:74)
>  
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:783) 
> at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:794) 
> at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:252) 
> ... 12 more 
> Caused by: java.lang.reflect.InvocationTargetException 
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597) 
> at javax.jdo.JDOHelper$16.run(JDOHelper.java:1956) 
> at java.security.AccessController.doPrivileged(Native Method) 
> at javax.jdo.JDOHelper.invoke(JDOHelper.java:1951) 
> at 
> javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1159)
> ... 28 more 
> Caused by: org.datanucleus.exceptions.NucleusException: Plugin (Bundle) 
> "org.eclipse.jdt.core" is already registered. Ensure you do 
> nt have multiple JAR versions of the same plugin in the classpath. The URL 
> "file:/Users/hadop/hadoop-0.20.1+152/build/ivy/lib/Hadoo 
> p/common/core-3.1.1.jar" is already registered, and you are trying to 
> register an identical plugin located at URL "file:/Users/hado 
> p/hadoop-0.20.1+152/lib/core-3.1.1.jar." 
> at 
> org.datanucleus.plugin.NonM

[jira] Updated: (HIVE-1401) Web Interface can only browse default

2010-07-06 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1401:
-

Fix Version/s: 0.7.0

> Web Interface can only browse default
> 
>
> Key: HIVE-1401
> URL: https://issues.apache.org/jira/browse/HIVE-1401
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Web UI
>Affects Versions: 0.5.0
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Fix For: 0.6.0, 0.7.0
>
> Attachments: HIVE-1401-1-patch.txt
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1364) Increase the maximum length of SERDEPROPERTIES values (currently 767 characters)

2010-07-06 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1364:
-

Fix Version/s: 0.7.0

> Increase the maximum length of SERDEPROPERTIES values (currently 767 
> characters)
> 
>
> Key: HIVE-1364
> URL: https://issues.apache.org/jira/browse/HIVE-1364
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.5.0
>Reporter: Carl Steinbach
>Assignee: Carl Steinbach
> Fix For: 0.6.0, 0.7.0
>
> Attachments: HIVE-1364.2.patch.txt, HIVE-1364.patch
>
>
> The value component of a SERDEPROPERTIES key/value pair is currently limited
> to a maximum length of 767 characters. I believe that the motivation for 
> limiting the length to 
> 767 characters is that this value is the maximum allowed length of an index in
> a MySQL database running on the InnoDB engine: 
> http://bugs.mysql.com/bug.php?id=13315
> * The Metastore O/R mapping currently limits many fields (including 
> SERDEPROPERTIES.PARAM_VALUE) to a maximum length of 767 characters, despite 
> the fact that these fields are not indexed.
> * The maximum length of a VARCHAR value in MySQL 5.0.3 and later is 65,535.
> * We can expect many users to hit the 767 character limit on 
> SERDEPROPERTIES.PARAM_VALUE when using the hbase.columns.mapping 
> serdeproperty to map a table that has many columns.
> I propose increasing the maximum allowed length of 
> SERDEPROPERTIES.PARAM_VALUE to 8192.
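
For reference, the kind of table definition that hits this limit is an
HBase-backed table whose hbase.columns.mapping enumerates every column; a
sketch (table and columns are hypothetical, and truncated here for brevity):

{code}
CREATE TABLE wide_hbase_table (key string, c1 string, c2 string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  -- one entry per column; with hundreds of columns this value
  -- easily exceeds 767 characters:
  "hbase.columns.mapping" = ":key,cf:c1,cf:c2"
);
{code}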

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1229) replace dependencies on HBase deprecated API

2010-07-06 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885689#action_12885689
 ] 

HBase Review Board commented on HIVE-1229:
--

Message from: bkm.had...@gmail.com

---
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/239/#review309
---



http://svn.apache.org/repos/asf/hadoop/hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java


I have added a HashMap to HBaseSerDe to cache the String to byte[] 
mapping. The code looks it up on the serialization path. The map is also 
passed to LazyHBaseRow and LazyHBaseCellMap for lookup during deserialization. 
In addition, I have moved some function calls to serdeParams and saved their 
return values as instance variables to reduce the per-row calls.



http://svn.apache.org/repos/asf/hadoop/hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSplit.java


I have left this in the 3rd patch. Thanks for explaining this. I don't 
think we have a test case which exposes this, and I'm not sure whether the 
serde instance and the record reader instance could get out of sync, but it is 
a good idea to leave it in.

The failing tests were due to an improperly initialized Scan instance.



http://svn.apache.org/repos/asf/hadoop/hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableOutputFormat.java


This OutputFormat is from the deprecated mapred package - this is needed to 
keep it compatible with the storage handler which needs it to be compatible 
with this -- see the getOutputFormat() method.



http://svn.apache.org/repos/asf/hadoop/hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseCellMap.java


Actually, parse() should also be called only once. I have added the missing 
call to set parsed to true. In addition the cached values are now passed in 
from the serde to lazy row to the lazy cell map in the deserialization path.



http://svn.apache.org/repos/asf/hadoop/hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseRow.java


Done, reverted these changes.


- bkm





> replace dependencies on HBase deprecated API
> 
>
> Key: HIVE-1229
> URL: https://issues.apache.org/jira/browse/HIVE-1229
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: HBase Handler
>Affects Versions: 0.6.0
>Reporter: John Sichi
>Assignee: Basab Maulik
> Fix For: 0.7.0
>
> Attachments: HIVE-1229.1.patch, HIVE-1229.2.patch, HIVE-1229.3.patch
>
>
> Some of these dependencies are on the old Hadoop mapred packages; others are 
> HBase-specific.  The former have to wait until the rest of Hive moves over to 
> the new Hadoop mapreduce package, but the HBase-specific ones don't have to 
> wait.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Review Request: HIVE-1229: take two

2010-07-06 Thread bkm . hadoop

---
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/239/#review309
---



http://svn.apache.org/repos/asf/hadoop/hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java


I have added a HashMap to HBaseSerDe to cache the String to byte[] 
mapping. The code looks it up on the serialization path. The map is also 
passed to LazyHBaseRow and LazyHBaseCellMap for lookup during deserialization. 
In addition, I have moved some function calls to serdeParams and saved their 
return values as instance variables to reduce the per-row calls.



http://svn.apache.org/repos/asf/hadoop/hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSplit.java


I have left this in the 3rd patch. Thanks for explaining this. I don't 
think we have a test case which exposes this, and I'm not sure whether the 
serde instance and the record reader instance could get out of sync, but it is 
a good idea to leave it in.

The failing tests were due to an improperly initialized Scan instance.



http://svn.apache.org/repos/asf/hadoop/hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableOutputFormat.java


This OutputFormat is from the deprecated mapred package - this is needed to 
keep it compatible with the storage handler which needs it to be compatible 
with this -- see the getOutputFormat() method.



http://svn.apache.org/repos/asf/hadoop/hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseCellMap.java


Actually, parse() should also be called only once. I have added the missing 
call to set parsed to true. In addition the cached values are now passed in 
from the serde to lazy row to the lazy cell map in the deserialization path.



http://svn.apache.org/repos/asf/hadoop/hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseRow.java


Done, reverted these changes.


- bkm


On 2010-06-28 12:13:44, John Sichi wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> http://review.hbase.org/r/239/
> ---
> 
> (Updated 2010-06-28 12:13:44)
> 
> 
> Review request for Hive Developers.
> 
> 
> Summary
> ---
> 
> review by JVS (please ignore the previous one I created a few minutes ago 
> with the bad patch by accident)
> 
> 
> This addresses bug HIVE-1229.
> http://issues.apache.org/jira/browse/HIVE-1229
> 
> 
> Diffs
> -
> 
>   
> http://svn.apache.org/repos/asf/hadoop/hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java
>  957296 
>   
> http://svn.apache.org/repos/asf/hadoop/hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSplit.java
>  957296 
>   
> http://svn.apache.org/repos/asf/hadoop/hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java
>  957296 
>   
> http://svn.apache.org/repos/asf/hadoop/hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableInputFormat.java
>  957296 
>   
> http://svn.apache.org/repos/asf/hadoop/hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableOutputFormat.java
>  957296 
>   
> http://svn.apache.org/repos/asf/hadoop/hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHFileOutputFormat.java
>  957296 
>   
> http://svn.apache.org/repos/asf/hadoop/hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseCellMap.java
>  957296 
>   
> http://svn.apache.org/repos/asf/hadoop/hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseRow.java
>  957296 
>   
> http://svn.apache.org/repos/asf/hadoop/hive/trunk/hbase-handler/src/test/org/apache/hadoop/hive/hbase/HBaseTestSetup.java
>  957296 
>   
> http://svn.apache.org/repos/asf/hadoop/hive/trunk/hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseSerDe.java
>  957296 
>   
> http://svn.apache.org/repos/asf/hadoop/hive/trunk/hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestLazyHBaseObject.java
>  957296 
> 
> Diff: http://review.hbase.org/r/239/diff
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> John
> 
>



[jira] Updated: (HIVE-1416) Dynamic partition inserts left empty files uncleaned in hadoop 0.17 local mode

2010-07-06 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-1416:
-

Fix Version/s: (was: 0.6.0)

> Dynamic partition inserts left empty files uncleaned in hadoop 0.17 local mode
> --
>
> Key: HIVE-1416
> URL: https://issues.apache.org/jira/browse/HIVE-1416
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.6.0, 0.7.0
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.7.0
>
> Attachments: HIVE-1416.2.patch, HIVE-1416.patch, junit-noframes.html
>
>
> Hive parses the file name generated by tasks to figure out the task ID in 
> order to generate files for empty buckets. Different Hadoop versions and 
> execution modes have different ways of naming the output files of 
> mappers/reducers. We need to move the parsing code to shims. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1442) HOLD_DDLTIME does not change partition metadata

2010-07-06 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-1442:
-

Affects Version/s: 0.7.0
   (was: 0.6.0)

> HOLD_DDLTIME does not change partition metadata
> ---
>
> Key: HIVE-1442
> URL: https://issues.apache.org/jira/browse/HIVE-1442
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.7.0
>Reporter: Namit Jain
>Assignee: He Yongqiang
> Fix For: 0.7.0
>
>
> create table T1 (key string, value string) partitioned by(ds string) stored 
> as sequencefile;
> desc extended T1;
> insert overwrite table T1 partition (ds='1') select key, value from src;
> insert overwrite table T1 partition (ds='2') select key, value from src;
> desc extended T1 partition (ds='1');
> desc extended T1 partition (ds='2');
> alter table T1 set fileformat rcfile;
> insert overwrite table T1 partition (ds='1')
> select /*+ HOLD_DDLTIME*/ key, value from src;
> insert overwrite table T1 partition (ds='2')
> select key, value from src;
> desc extended T1 partition (ds='1');
> desc extended T1 partition (ds='2');
> drop table T1;
> T1/ds=1 is left as sequencefile and is corrupted after the insert, since 
> HOLD_DDLTIME prevents the partition metadata (including the file format) 
> from being updated to rcfile.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1394) do not update transient_lastDdlTime if the partition is modified by a housekeeping operation

2010-07-06 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885680#action_12885680
 ] 

John Sichi commented on HIVE-1394:
--

Note HIVE-1442, which prevents this feature from being used.  Also, someone 
needs to document it.


> do not update transient_lastDdlTime if the partition is modified by a 
> housekeeping operation
> 
>
> Key: HIVE-1394
> URL: https://issues.apache.org/jira/browse/HIVE-1394
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Ning Zhang
> Fix For: 0.7.0
>
> Attachments: HIVE-1394.2.patch, HIVE-1394.patch
>
>
> Currently, purging looks at the HDFS timestamp to see when the files were 
> last modified.
> It should look at the metastore property instead - these are Facebook-specific 
> utilities, which do not require any changes to Hive.
> However, in some cases the operation might be performed by a housekeeping 
> job, which should not modify the timestamp.
> Since Hive has no way of knowing the origin of the query, it might be a good 
> idea to add a new hint which specifies that the operation is a cleanup 
> operation, so the timestamp in the metastore is not touched in that scenario.
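
As a rough illustration of the proposed behavior (the class, method, and flag below are invented for the sketch, not actual Hive code):

{code}
import java.util.Map;

// Sketch only: refresh transient_lastDdlTime for real user modifications,
// but leave it untouched when the write comes from a housekeeping job.
public class DdlTimeUpdater {
  public static void touchUnlessHousekeeping(Map<String, String> partitionParams,
                                             boolean isHousekeepingOp) {
    if (isHousekeepingOp) {
      return;  // keep the old timestamp so purge tools see the true last DDL time
    }
    partitionParams.put("transient_lastDdlTime",
        Long.toString(System.currentTimeMillis() / 1000L));
  }
}
{code}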

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1438) sentences() UDF for natural language tokenization

2010-07-06 Thread Mayank Lahiri (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885679#action_12885679
 ] 

Mayank Lahiri commented on HIVE-1438:
-

Patch available for code review. Implements the UDF as described.

> sentences() UDF for natural language tokenization
> -
>
> Key: HIVE-1438
> URL: https://issues.apache.org/jira/browse/HIVE-1438
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.7.0
>Reporter: Mayank Lahiri
>Assignee: Mayank Lahiri
> Fix For: 0.7.0
>
> Attachments: HIVE-1438.1.patch
>
>
> Create a generic UDF that tokenizes free-form natural language text into 
> sentences and words for more advanced processing, while stripping unnecessary 
> punctuation and remaining fully internationalization-aware. Fortunately, most 
> of this functionality is already built into Java in the form of the i18n 
> BreakIterator class, so this UDF will just connect it to Hive. For example:
> > SELECT sentences("Hello there! This is a UDF.") FROM somedata LIMIT 1;
> [ ["Hello", "there"], ["This", "is", "a", "UDF"] ]
> or
> > SELECT sentences("Je m'apelle hive!!!", "fr") FROM somedata LIMIT 1;
> [["Je","m'apelle","hive"]]
> Notice how punctuation is maintained only where appropriate. Breaking at 
> sentences (and thus the nested array return type) is important for tasks like 
> counting the frequency of n-grams in text, which should not cross sentence 
> boundaries.
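
For reference, a self-contained sketch of the BreakIterator behavior the UDF builds on; the GenericUDF plumbing (ObjectInspectors and so on) is omitted, and the punctuation filter is a guess at the intended semantics, not the patch's actual logic.

{code}
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SentencesSketch {
  // Splits text into sentences, then each sentence into words, dropping
  // tokens that do not start with a letter or digit (bare punctuation).
  public static List<List<String>> sentences(String text, Locale locale) {
    List<List<String>> result = new ArrayList<List<String>>();
    BreakIterator sentIter = BreakIterator.getSentenceInstance(locale);
    sentIter.setText(text);
    int sentStart = sentIter.first();
    for (int sentEnd = sentIter.next(); sentEnd != BreakIterator.DONE;
         sentStart = sentEnd, sentEnd = sentIter.next()) {
      String sentence = text.substring(sentStart, sentEnd);
      List<String> words = new ArrayList<String>();
      BreakIterator wordIter = BreakIterator.getWordInstance(locale);
      wordIter.setText(sentence);
      int wordStart = wordIter.first();
      for (int wordEnd = wordIter.next(); wordEnd != BreakIterator.DONE;
           wordStart = wordEnd, wordEnd = wordIter.next()) {
        String token = sentence.substring(wordStart, wordEnd).trim();
        if (token.length() > 0 && Character.isLetterOrDigit(token.charAt(0))) {
          words.add(token);
        }
      }
      result.add(words);
    }
    return result;
  }

  public static void main(String[] args) {
    // Prints [[Hello, there], [This, is, a, UDF]]
    System.out.println(sentences("Hello there! This is a UDF.", Locale.ENGLISH));
  }
}
{code}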

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1438) sentences() UDF for natural language tokenization

2010-07-06 Thread Mayank Lahiri (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Lahiri updated HIVE-1438:


Attachment: HIVE-1438.1.patch

> sentences() UDF for natural language tokenization
> -
>
> Key: HIVE-1438
> URL: https://issues.apache.org/jira/browse/HIVE-1438
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.7.0
>Reporter: Mayank Lahiri
>Assignee: Mayank Lahiri
> Fix For: 0.7.0
>
> Attachments: HIVE-1438.1.patch
>
>
> Create a generic UDF that tokenizes free-form natural language text into 
> sentences and words for more advanced processing, while stripping unnecessary 
> punctuation and remaining fully internationalization-aware. Fortunately, most 
> of this functionality is already built into Java in the form of the i18n 
> BreakIterator class, so this UDF will just connect it to Hive. For example:
> > SELECT sentences("Hello there! This is a UDF.") FROM somedata LIMIT 1;
> [ ["Hello", "there"], ["This", "is", "a", "UDF"] ]
> or
> > SELECT sentences("Je m'apelle hive!!!", "fr") FROM somedata LIMIT 1;
> [["Je","m'apelle","hive"]]
> Notice how punctuation is maintained only where appropriate. Breaking at 
> sentences (and thus the nested array return type) is important for tasks like 
> counting the frequency of n-grams in text, which should not cross sentence 
> boundaries.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1442) HOLD_DDLTIME does not change partition metadata

2010-07-06 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-1442:
-

Fix Version/s: 0.7.0
Affects Version/s: 0.6.0

> HOLD_DDLTIME does not change partition metadata
> ---
>
> Key: HIVE-1442
> URL: https://issues.apache.org/jira/browse/HIVE-1442
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.7.0
>Reporter: Namit Jain
>Assignee: He Yongqiang
> Fix For: 0.7.0
>
>
> create table T1 (key string, value string) partitioned by(ds string) stored 
> as sequencefile;
> desc extended T1;
> insert overwrite table T1 partition (ds='1') select key, value from src;
> insert overwrite table T1 partition (ds='2') select key, value from src;
> desc extended T1 partition (ds='1');
> desc extended T1 partition (ds='2');
> alter table T1 set fileformat rcfile;
> insert overwrite table T1 partition (ds='1')
> select /*+ HOLD_DDLTIME*/ key, value from src;
> insert overwrite table T1 partition (ds='2')
> select key, value from src;
> desc extended T1 partition (ds='1');
> desc extended T1 partition (ds='2');
> drop table T1;
> T1/ds=1 is left as sequencefile and is corrupted after the insert, since 
> HOLD_DDLTIME prevents the partition metadata (including the file format) 
> from being updated to rcfile.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1438) sentences() UDF for natural language tokenization

2010-07-06 Thread Mayank Lahiri (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Lahiri updated HIVE-1438:


Status: Patch Available  (was: Open)

> sentences() UDF for natural language tokenization
> -
>
> Key: HIVE-1438
> URL: https://issues.apache.org/jira/browse/HIVE-1438
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.7.0
>Reporter: Mayank Lahiri
>Assignee: Mayank Lahiri
> Fix For: 0.7.0
>
> Attachments: HIVE-1438.1.patch
>
>
> Create a generic UDF that tokenizes free-form natural language text into 
> sentences and words for more advanced processing, while stripping unnecessary 
> punctuation and remaining fully internationalization-aware. Fortunately, most 
> of this functionality is already built into Java in the form of the i18n 
> BreakIterator class, so this UDF will just connect it to Hive. For example:
> > SELECT sentences("Hello there! This is a UDF.") FROM somedata LIMIT 1;
> [ ["Hello", "there"], ["This", "is", "a", "UDF"] ]
> or
> > SELECT sentences("Je m'apelle hive!!!", "fr") FROM somedata LIMIT 1;
> [["Je","m'apelle","hive"]]
> Notice how punctuation is maintained only where appropriate. Breaking at 
> sentences (and thus the nested array return type) is important for tasks like 
> counting the frequency of n-grams in text, which should not cross sentence 
> boundaries.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1413) bring a table/partition offline

2010-07-06 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-1413:
-

Fix Version/s: 0.7.0
   (was: 0.6.0)

> bring a table/partition offline
> ---
>
> Key: HIVE-1413
> URL: https://issues.apache.org/jira/browse/HIVE-1413
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Metastore, Query Processor
>Reporter: Namit Jain
>Assignee: Siying Dong
> Fix For: 0.7.0
>
>
> There should be a way to bring a table/partition offline.
> While offline, no read/write operations should be allowed on that table.
> This would be very useful for housekeeping operations.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Reminder: Hive Contributors Meeting is Tomorrow

2010-07-06 Thread Carl Steinbach
Hi Ed,

I updated the event details with dial-in and web conference information.
Please see the following link for more details:

http://www.meetup.com/Hive-Contributors-Group/calendar/13743288/

Thanks.

Carl

On Mon, Jul 5, 2010 at 6:39 PM, Edward Capriolo wrote:

> On Mon, Jul 5, 2010 at 4:21 PM, Carl Steinbach  wrote:
> > Hi Everyone,
> >
> > The next installment of the monthly Hive Contributors Meeting is
> convening
> > tomorrow from 3-5pm at Cloudera's offices in Palo Alto. If you are
> planning
> > to attend and have not already done so, please officially sign up at
> >
> > http://www.meetup.com/Hive-Contributors-Group
> >
> > (If you're not already a member of the Hive Contributors Group you'll
> have
> > to join.)
> >
> > The proposed agenda is:
> >
> > 3-3:15 Introductions
> >
> > 3:15 - 4:15 Share what we're working on
> >- HOwl overview and update from Olga Natkovich
> >- Beeswax demo from bc Wong
> >
> > 4:15 - 5:00 0.6 Release Management Discussion
> >
> > Meeting Location:
> > Cloudera
> > 820 Portage Ave.
> > Palo Alto, CA 94306
> > http://www.google.com/maps?q=210+Portage+Ave,+Palo+Alto,+CA+94306
> >
> > Thanks.
> >
> > Carl
> >
>
> Please find a way to do an audio, video, or go-to-meeting option for those
> on the east coast.
>
> Thank you,
> Edward
>


[jira] Updated: (HIVE-1229) replace dependencies on HBase deprecated API

2010-07-06 Thread Basab Maulik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Basab Maulik updated HIVE-1229:
---

Attachment: HIVE-1229.3.patch

This patch fixes the issue that was causing the hbase_joins.q tests to fail; I 
needed to update the HBase Scan instance properly.

I have also made the changes suggested on Review Board, where I will post the 
line-item details.

Thanks for the feedback/comments.

ant test -Dtestcase=TestHBaseSerDe
ant test -Dtestcase=TestLazyHBaseObject
ant test -Dtestcase=TestHBaseCliDriver
ant test -Dtestcase=TestHBaseMinimrCliDriver

run successfully.
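
For readers following along, the flavor of change involved looks roughly like the sketch below. This is illustrative only; the exact deprecated signatures being replaced vary by HBase release.

{code}
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class ScanSetupExample {
  public static Scan newStyleScan() {
    Scan scan = new Scan();
    // Post-deprecation client API: family and qualifier are passed separately,
    // replacing the old single "family:qualifier" byte[] style of addressing.
    scan.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("col"));
    return scan;
  }
}
{code}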

> replace dependencies on HBase deprecated API
> 
>
> Key: HIVE-1229
> URL: https://issues.apache.org/jira/browse/HIVE-1229
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: HBase Handler
>Affects Versions: 0.6.0
>Reporter: John Sichi
>Assignee: Basab Maulik
> Fix For: 0.7.0
>
> Attachments: HIVE-1229.1.patch, HIVE-1229.2.patch, HIVE-1229.3.patch
>
>
> Some of these dependencies are on the old Hadoop mapred packages; others are 
> HBase-specific.  The former have to wait until the rest of Hive moves over to 
> the new Hadoop mapreduce package, but the HBase-specific ones don't have to 
> wait.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1425) hive.task.progress should be added to conf/hive-default.xml

2010-07-06 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885641#action_12885641
 ] 

John Sichi commented on HIVE-1425:
--

Also (unrelated) fix the description for hive.exec.pre.hooks ("Pre Execute Hook 
for Tests") since we are using them for more than just tests now, and add 
hive.exec.post.hooks (currently missing from hive-default.xml).



> hive.task.progress should be added to conf/hive-default.xml
> ---
>
> Key: HIVE-1425
> URL: https://issues.apache.org/jira/browse/HIVE-1425
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Configuration
>Affects Versions: 0.6.0
>Reporter: John Sichi
>Assignee: John Sichi
> Fix For: 0.6.0, 0.7.0
>
>
> It is defined in HiveConf and referenced in hive-default.xml by 
> hive.mapjoin.maxsize, but is not itself defined in hive-default.xml.
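
The missing entry would follow the usual hive-default.xml property shape, something like the sketch below; the default value and description wording here are illustrative, not the committed text (HiveConf remains the authoritative source).

{code}
<property>
  <name>hive.task.progress</name>
  <value>false</value>
  <description>Whether Hive should periodically update task progress counters
  during execution. (Illustrative description; see HiveConf for the
  authoritative default and wording.)</description>
</property>
{code}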

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1293) Concurrency Model for Hive

2010-07-06 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885631#action_12885631
 ] 

He Yongqiang commented on HIVE-1293:


I am going to commit this patch in the next few days. Please post your comments 
if you have any, so we can address them before this patch goes in.

> Concurrency Model for Hive
> -
>
> Key: HIVE-1293
> URL: https://issues.apache.org/jira/browse/HIVE-1293
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Fix For: 0.7.0
>
> Attachments: hive.1293.1.patch, hive_leases.txt
>
>
> Concurrency model for Hive:
> Currently, Hive does not provide a good concurrency model. The only 
> guarantee provided in the case of concurrent readers and writers is that a
> reader will not see partial data from the old version (before the write) 
> mixed with partial data from the new version (after the write).
> This has come across as a big problem, especially for background processes 
> performing maintenance operations.
> The following possible solutions come to mind.
> 1. Locks: Acquire read/write locks - they can be acquired at the beginning of 
> the query, or the write locks can be delayed until the move
> task (when the directory is actually moved). Care needs to be taken to avoid 
> deadlocks.
> 2. Versioning: The writer can create a new version if the current version is 
> being read. Note that this is not equivalent to snapshots:
> the old version can only be accessed by the current readers, and will be 
> deleted when all of them have finished.
> Comments?
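
As a toy model of option 1, sketched with in-process JDK locks; the real patch needs a distributed lock manager (e.g. ZooKeeper) shared across clients, plus deadlock-safe acquisition ordering, so this only shows the read-shared/write-exclusive semantics.

{code}
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Per-table/partition read/write locking, modeled with JDK locks.
public class TableLockModel {
  private final ReadWriteLock lock = new ReentrantReadWriteLock();

  public void runRead(Runnable query) {
    lock.readLock().lock();      // shared: concurrent readers are fine
    try {
      query.run();
    } finally {
      lock.readLock().unlock();
    }
  }

  public void runWrite(Runnable query) {
    lock.writeLock().lock();     // exclusive: blocks readers and other writers
    try {
      query.run();
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{code}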

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Build failed in Hudson: Hive-trunk-h0.19 #493

2010-07-06 Thread Apache Hudson Server
See 

Changes:

[heyongqiang] HIVE-1447. Speed up reflection method calls(Zheng via He 
Yongqiang)

--
[...truncated 13862 lines...]
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table src
[junit] POSTHOOK: Output: defa...@src
[junit] OK
[junit] Loading data to table src1
[junit] POSTHOOK: Output: defa...@src1
[junit] OK
[junit] Loading data to table src_sequencefile
[junit] POSTHOOK: Output: defa...@src_sequencefile
[junit] OK
[junit] Loading data to table src_thrift
[junit] POSTHOOK: Output: defa...@src_thrift
[junit] OK
[junit] Loading data to table src_json
[junit] POSTHOOK: Output: defa...@src_json
[junit] OK
[junit] diff 

 

[junit] Done query: unknown_function4.q
[junit] Begin query: unknown_table1.q
[junit] Loading data to table srcpart partition (ds=2008-04-08, hr=11)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=11
[junit] OK
[junit] Loading data to table srcpart partition (ds=2008-04-08, hr=12)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=12
[junit] OK
[junit] Loading data to table srcpart partition (ds=2008-04-09, hr=11)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=11
[junit] OK
[junit] Loading data to table srcpart partition (ds=2008-04-09, hr=12)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=12
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table src
[junit] POSTHOOK: Output: defa...@src
[junit] OK
[junit] Loading data to table src1
[junit] POSTHOOK: Output: defa...@src1
[junit] OK
[junit] Loading data to table src_sequencefile
[junit] POSTHOOK: Output: defa...@src_sequencefile
[junit] OK
[junit] Loading data to table src_thrift
[junit] POSTHOOK: Output: defa...@src_thrift
[junit] OK
[junit] diff 

 

[junit] Loading data to table src_json
[junit] POSTHOOK: Output: defa...@src_json
[junit] OK
[junit] Done query: unknown_table1.q
[junit] Begin query: unknown_table2.q
[junit] Loading data to table srcpart partition (ds=2008-04-08, hr=11)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=11
[junit] OK
[junit] Loading data to table srcpart partition (ds=2008-04-08, hr=12)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=12
[junit] OK
[junit] Loading data to table srcpart partition (ds=2008-04-09, hr=11)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=11
[junit] OK
[junit] Loading data to table srcpart partition (ds=2008-04-09, hr=12)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=12
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Outp

Build failed in Hudson: Hive-trunk-h0.18 #493

2010-07-06 Thread Apache Hudson Server
See 

Changes:

[heyongqiang] HIVE-1447. Speed up reflection method calls(Zheng via He 
Yongqiang)

--
[...truncated 13851 lines...]
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table src
[junit] POSTHOOK: Output: defa...@src
[junit] OK
[junit] Loading data to table src1
[junit] POSTHOOK: Output: defa...@src1
[junit] OK
[junit] Loading data to table src_sequencefile
[junit] POSTHOOK: Output: defa...@src_sequencefile
[junit] OK
[junit] Loading data to table src_thrift
[junit] POSTHOOK: Output: defa...@src_thrift
[junit] OK
[junit] Loading data to table src_json
[junit] POSTHOOK: Output: defa...@src_json
[junit] OK
[junit] diff 

 

[junit] Done query: unknown_function4.q
[junit] Begin query: unknown_table1.q
[junit] Loading data to table srcpart partition (ds=2008-04-08, hr=11)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=11
[junit] OK
[junit] Loading data to table srcpart partition (ds=2008-04-08, hr=12)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=12
[junit] OK
[junit] Loading data to table srcpart partition (ds=2008-04-09, hr=11)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=11
[junit] OK
[junit] Loading data to table srcpart partition (ds=2008-04-09, hr=12)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=12
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table src
[junit] POSTHOOK: Output: defa...@src
[junit] OK
[junit] Loading data to table src1
[junit] POSTHOOK: Output: defa...@src1
[junit] OK
[junit] Loading data to table src_sequencefile
[junit] POSTHOOK: Output: defa...@src_sequencefile
[junit] OK
[junit] Loading data to table src_thrift
[junit] POSTHOOK: Output: defa...@src_thrift
[junit] OK
[junit] Loading data to table src_json
[junit] POSTHOOK: Output: defa...@src_json
[junit] OK
[junit] diff 

 

[junit] Done query: unknown_table1.q
[junit] Begin query: unknown_table2.q
[junit] Loading data to table srcpart partition (ds=2008-04-08, hr=11)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=11
[junit] OK
[junit] Loading data to table srcpart partition (ds=2008-04-08, hr=12)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=12
[junit] OK
[junit] Loading data to table srcpart partition (ds=2008-04-09, hr=11)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=11
[junit] OK
[junit] Loading data to table srcpart partition (ds=2008-04-09, hr=12)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=12
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Outp

[jira] Updated: (HIVE-847) support show databases

2010-07-06 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-847:


Fix Version/s: 0.7.0
Affects Version/s: (was: 0.5.0)

> support show databases
> --
>
> Key: HIVE-847
> URL: https://issues.apache.org/jira/browse/HIVE-847
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Metastore, Query Processor
>Reporter: Namit Jain
>Assignee: Carl Steinbach
> Fix For: 0.6.0, 0.7.0
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-675) add database/schema support to Hive QL

2010-07-06 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-675:


Fix Version/s: 0.7.0
Affects Version/s: (was: 0.5.0)

> add database/schema support to Hive QL
> ---
>
> Key: HIVE-675
> URL: https://issues.apache.org/jira/browse/HIVE-675
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Metastore, Query Processor
>Reporter: Prasad Chakka
>Assignee: Carl Steinbach
> Fix For: 0.6.0, 0.7.0
>
> Attachments: hive-675-2009-9-16.patch, hive-675-2009-9-19.patch, 
> hive-675-2009-9-21.patch, hive-675-2009-9-23.patch, hive-675-2009-9-7.patch, 
> hive-675-2009-9-8.patch
>
>
> Currently all Hive tables reside in a single namespace (default). Hive should 
> support multiple namespaces (databases or schemas) so that users can create 
> tables in their own namespaces. These namespaces can have different 
> warehouse directories (with a default naming scheme) and possibly different 
> properties.
> There is already some support for this in the metastore, but the Hive query 
> parser should have this feature as well.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Hudson build is back to normal : Hive-trunk-h0.17 #491

2010-07-06 Thread Apache Hudson Server
See 




[jira] Created: (HIVE-1450) always catch exception when invoking executeUpdate in JDBC

2010-07-06 Thread Alexey Diomin (JIRA)
always catch exception when invoking executeUpdate in JDBC


 Key: HIVE-1450
 URL: https://issues.apache.org/jira/browse/HIVE-1450
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Drivers
Reporter: Alexey Diomin
 Fix For: 0.5.1


The request executes in Hive, but an exception is always returned. 
The error is in ./jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveStatement.java:

{code}
  public int executeUpdate(String sql) throws SQLException {
try {
  client.execute(sql);
} catch (Exception ex) {
  throw new SQLException(ex.toString());
}
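    // Bug: even when execute(sql) succeeds, control falls through to this throw.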
throw new SQLException("Method not supported");
  }
{code}

executeQuery works correctly.
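
One possible shape of a fix, as a sketch against the same HiveStatement class (not the committed change): since Hive provides no meaningful update counts, return 0 once execution succeeds instead of unconditionally throwing.

{code}
  public int executeUpdate(String sql) throws SQLException {
    try {
      client.execute(sql);
    } catch (Exception ex) {
      throw new SQLException(ex.toString());
    }
    return 0;  // statement ran; Hive has no update count to report
  }
{code}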

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.