[jira] Updated: (HIVE-1674) count(*) returns wrong result when a mapper returns empty results

2010-10-06 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1674:
---

   Resolution: Fixed
Fix Version/s: 0.7.0
   Status: Resolved  (was: Patch Available)

I just committed! Thanks Ning!

 count(*) returns wrong result when a mapper returns empty results
 -

 Key: HIVE-1674
 URL: https://issues.apache.org/jira/browse/HIVE-1674
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Ning Zhang
Assignee: Ning Zhang
 Fix For: 0.7.0

 Attachments: HIVE-1674.patch


 select count(*) from src where false; will return # of mappers rather than 0. 
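The failure mode can be sketched outside Hive: count(*) is computed by summing per-mapper partial counts, so a mapper whose input is entirely filtered out must contribute 0; if each empty mapper instead emits a spurious count, the final result equals the number of mappers. This is an illustrative standalone sketch, not Hive's actual code:

```java
import java.util.stream.LongStream;

public class CountStarSketch {
    // The reducer side of count(*): sum the per-mapper partial counts.
    static long reduce(long[] partials) {
        return LongStream.of(partials).sum();
    }

    public static void main(String[] args) {
        // Three mappers whose input is entirely filtered out by "where false".
        long[] correct = {0, 0, 0}; // each empty mapper contributes 0
        long[] buggy   = {1, 1, 1}; // a spurious count per empty mapper
        System.out.println(reduce(correct)); // 0
        System.out.println(reduce(buggy));   // 3 == number of mappers
    }
}
```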

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1376) Simple UDAFs with more than 1 parameter crash on empty row query

2010-10-06 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918716#action_12918716
 ] 

He Yongqiang commented on HIVE-1376:


will take a look. 

 Simple UDAFs with more than 1 parameter crash on empty row query 
 -

 Key: HIVE-1376
 URL: https://issues.apache.org/jira/browse/HIVE-1376
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.6.0
Reporter: Mayank Lahiri
Assignee: Ning Zhang
 Attachments: HIVE-1376.2.patch, HIVE-1376.patch


 Simple UDAFs with more than 1 parameter crash when the query returns no rows. 
 Currently, this only seems to affect the percentile() UDAF where the second 
 parameter is the percentile to be computed (of type double). I've also 
 verified the bug by adding a dummy parameter to ExampleMin in contrib. 
 On an empty query, Hive seems to be trying to resolve an iterate() method 
 with signature {null,null} instead of {null,double}. You can reproduce this 
 bug using:
 CREATE TABLE pct_test ( val INT );
 SELECT percentile(val, 0.5) FROM pct_test;
 which produces a lot of errors like: 
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to 
 execute method public boolean 
 org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator.iterate(org.apache.hadoop.io.LongWritable,double)
   on object 
 org.apache.hadoop.hive.ql.udf.udafpercentile$percentilelongevalua...@11d13272 
 of class org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator 
 with arguments {null, null} of size 2
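The resolution failure can be reproduced outside Hive: with no rows, every argument the evaluator would receive is null, so a resolver that keys on runtime argument classes has nothing to match against the declared double parameter. A minimal standalone sketch (the resolver logic and names are illustrative, not Hive's actual method resolver):

```java
public class NullArgResolveSketch {
    // A resolver that derives the call signature from runtime argument
    // classes, as a simple UDAF method resolver might.
    static String signatureOf(Object... args) {
        StringBuilder sb = new StringBuilder("{");
        for (int i = 0; i < args.length; i++) {
            if (i > 0) sb.append(",");
            sb.append(args[i] == null ? "null" : args[i].getClass().getSimpleName());
        }
        return sb.append("}").toString();
    }

    public static void main(String[] args) {
        // A row with data: both argument types are recoverable.
        System.out.println(signatureOf(1L, 0.5));    // {Long,Double}
        // An empty-row query: both arguments arrive as null.
        System.out.println(signatureOf(null, null)); // {null,null}
    }
}
```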

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1641) add map joined table to distributed cache

2010-10-06 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918764#action_12918764
 ] 

He Yongqiang commented on HIVE-1641:


There are 2 patches with the same name. Can you delete the older one? And when 
uploading a patch, please rename it to 
hive-<jira number>.<patch number or date>.patch.


 add map joined table to distributed cache
 -

 Key: HIVE-1641
 URL: https://issues.apache.org/jira/browse/HIVE-1641
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.7.0
Reporter: Namit Jain
Assignee: Liyin Tang
 Fix For: 0.7.0

 Attachments: Hive-1641.patch, Hive-1641.patch


 Currently, the mappers directly read the map-joined table from HDFS, which 
 makes it difficult to scale.
 We end up getting lots of timeouts once the number of mappers is beyond a 
 few thousand, due to 
 concurrent mappers.
 It would be a good idea to put the mapped file into the distributed cache and 
 read from there instead.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1376) Simple UDAFs with more than 1 parameter crash on empty row query

2010-10-06 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918767#action_12918767
 ] 

He Yongqiang commented on HIVE-1376:


The patch looks good. Is there the same problem in other UDAFs? If so, should 
we fix them one by one, or fix them in the group-by operator?

 Simple UDAFs with more than 1 parameter crash on empty row query 
 -

 Key: HIVE-1376
 URL: https://issues.apache.org/jira/browse/HIVE-1376
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.6.0
Reporter: Mayank Lahiri
Assignee: Ning Zhang
 Attachments: HIVE-1376.2.patch, HIVE-1376.patch


 Simple UDAFs with more than 1 parameter crash when the query returns no rows. 
 Currently, this only seems to affect the percentile() UDAF where the second 
 parameter is the percentile to be computed (of type double). I've also 
 verified the bug by adding a dummy parameter to ExampleMin in contrib. 
 On an empty query, Hive seems to be trying to resolve an iterate() method 
 with signature {null,null} instead of {null,double}. You can reproduce this 
 bug using:
 CREATE TABLE pct_test ( val INT );
 SELECT percentile(val, 0.5) FROM pct_test;
 which produces a lot of errors like: 
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to 
 execute method public boolean 
 org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator.iterate(org.apache.hadoop.io.LongWritable,double)
   on object 
 org.apache.hadoop.hive.ql.udf.udafpercentile$percentilelongevalua...@11d13272 
 of class org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator 
 with arguments {null, null} of size 2

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1376) Simple UDAFs with more than 1 parameter crash on empty row query

2010-10-06 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918771#action_12918771
 ] 

He Yongqiang commented on HIVE-1376:


Sorry, I did not see the previous comments. John and Zheng have already discussed 
this problem. I will start running tests.

 Simple UDAFs with more than 1 parameter crash on empty row query 
 -

 Key: HIVE-1376
 URL: https://issues.apache.org/jira/browse/HIVE-1376
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.6.0
Reporter: Mayank Lahiri
Assignee: Ning Zhang
 Attachments: HIVE-1376.2.patch, HIVE-1376.patch


 Simple UDAFs with more than 1 parameter crash when the query returns no rows. 
 Currently, this only seems to affect the percentile() UDAF where the second 
 parameter is the percentile to be computed (of type double). I've also 
 verified the bug by adding a dummy parameter to ExampleMin in contrib. 
 On an empty query, Hive seems to be trying to resolve an iterate() method 
 with signature {null,null} instead of {null,double}. You can reproduce this 
 bug using:
 CREATE TABLE pct_test ( val INT );
 SELECT percentile(val, 0.5) FROM pct_test;
 which produces a lot of errors like: 
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to 
 execute method public boolean 
 org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator.iterate(org.apache.hadoop.io.LongWritable,double)
   on object 
 org.apache.hadoop.hive.ql.udf.udafpercentile$percentilelongevalua...@11d13272 
 of class org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator 
 with arguments {null, null} of size 2

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1674) count(*) returns wrong result when a mapper returns empty results

2010-10-05 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918318#action_12918318
 ] 

He Yongqiang commented on HIVE-1674:


+1. running test.

 count(*) returns wrong result when a mapper returns empty results
 -

 Key: HIVE-1674
 URL: https://issues.apache.org/jira/browse/HIVE-1674
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-1674.patch


 select count(*) from src where false; will return # of mappers rather than 0. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1658) Fix describe [extended] column formatting

2010-10-04 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917725#action_12917725
 ] 

He Yongqiang commented on HIVE-1658:


+1. Looks good. Can you do the final patch?

 Fix describe [extended] column formatting
 -

 Key: HIVE-1658
 URL: https://issues.apache.org/jira/browse/HIVE-1658
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Paul Yang
Assignee: Thiruvel Thirumoolan
 Attachments: HIVE-1658-PrelimPatch.patch


 When displaying the column schema, the formatting should be 
 name<TAB>type<TAB>comment<NEWLINE>
 to be in line with the previous formatting style for backward compatibility.
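A sketch of the intended layout, using a hypothetical helper rather than the actual patch:

```java
public class DescribeFormatSketch {
    // Emit one column per line as name<TAB>type<TAB>comment<NEWLINE>,
    // matching the legacy describe output.
    static String formatColumn(String name, String type, String comment) {
        return name + "\t" + type + "\t" + (comment == null ? "" : comment) + "\n";
    }

    public static void main(String[] args) {
        System.out.print(formatColumn("key", "string", "from deserializer"));
        System.out.print(formatColumn("value", "string", null));
    }
}
```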

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1658) Fix describe [extended] column formatting

2010-10-04 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917771#action_12917771
 ] 

He Yongqiang commented on HIVE-1658:


One more thing: if the time information (create time, last access time, etc.) is 
0, can you put a string like "unknown" in the output of desc format?

 Fix describe [extended] column formatting
 -

 Key: HIVE-1658
 URL: https://issues.apache.org/jira/browse/HIVE-1658
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Paul Yang
Assignee: Thiruvel Thirumoolan
 Attachments: HIVE-1658-PrelimPatch.patch


 When displaying the column schema, the formatting should be 
 name<TAB>type<TAB>comment<NEWLINE>
 to be in line with the previous formatting style for backward compatibility.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1674) count(*) returns wrong result when a mapper returns empty results

2010-10-04 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917856#action_12917856
 ] 

He Yongqiang commented on HIVE-1674:


will take a look.

 count(*) returns wrong result when a mapper returns empty results
 -

 Key: HIVE-1674
 URL: https://issues.apache.org/jira/browse/HIVE-1674
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-1674.patch


 select count(*) from src where false; will return # of mappers rather than 0. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1676) show table extended like does not work well with wildcards

2010-09-30 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916583#action_12916583
 ] 

He Yongqiang commented on HIVE-1676:


Needs to use backquotes (``), not quotes.

 show table extended like does not work well with wildcards
 --

 Key: HIVE-1676
 URL: https://issues.apache.org/jira/browse/HIVE-1676
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Pradeep Kamath
Priority: Minor

 As evident from the output below, though there are tables that match the 
 wildcard, the output from "show table extended like" does not contain the 
 matches.
 {noformat}
 bin/hive -e "show tables 'foo*'"
 Hive history 
 file=/tmp/pradeepk/hive_job_log_pradeepk_201009301037_568707409.txt
 OK
 foo
 foo2
 Time taken: 3.417 seconds
 bin/hive -e "show table extended like 'foo*'"
 Hive history 
 file=/tmp/pradeepk/hive_job_log_pradeepk_201009301037_410056681.txt
 OK
 Time taken: 2.948 seconds
 {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1673) Create table bug causes the row format property lost when serde is specified.

2010-09-30 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916590#action_12916590
 ] 

He Yongqiang commented on HIVE-1673:


Just tried again; the same tests succeeded on my box. Can you post your diff 
for those test cases?

 Create table bug causes the row format property lost when serde is specified.
 -

 Key: HIVE-1673
 URL: https://issues.apache.org/jira/browse/HIVE-1673
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: He Yongqiang
Assignee: He Yongqiang
 Fix For: 0.7.0

 Attachments: hive-1673.1.patch


 An example:
 create table src_rc_serde_yongqiang(key string, value string) ROW FORMAT  
 DELIMITED FIELDS TERMINATED BY '\\0' stored as rcfile; 
 will lose the row format information.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1647) Incorrect initialization of thread local variable inside IOContext ( implementation is not threadsafe )

2010-09-30 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916676#action_12916676
 ] 

He Yongqiang commented on HIVE-1647:


+1 running tests

 Incorrect initialization of thread local variable inside IOContext ( 
 implementation is not threadsafe ) 
 

 Key: HIVE-1647
 URL: https://issues.apache.org/jira/browse/HIVE-1647
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Server Infrastructure
Affects Versions: 0.6.0, 0.7.0
Reporter: Raman Grover
Assignee: Liyin Tang
 Fix For: 0.7.0

 Attachments: HIVE-1647.patch

   Original Estimate: 0.17h
  Remaining Estimate: 0.17h

 Bug in org.apache.hadoop.hive.ql.io.IOContext
 in relation to initialization of thread local variable.
  
 public class IOContext {
  
   private static ThreadLocal<IOContext> threadLocal = new 
 ThreadLocal<IOContext>() { };
  
   static {
 if (threadLocal.get() == null) {
   threadLocal.set(new IOContext());
 }
   }
  
 In a multi-threaded environment, the thread that loads the class first 
 for the JVM (assuming threads share the classloader)
 initializes itself correctly by executing the code in the static 
 block. Once the class is loaded, 
 any subsequent threads will have their respective thread-local variable as 
 null. Since IOContext
 is set during initialization of HiveRecordReader, in a scenario where 
 multiple threads acquire
 an instance of HiveRecordReader, this results in an NPE for all but the 
 first thread that loaded the class in the VM.
 
 Is the above scenario of multiple threads initializing HiveRecordReader a 
 typical one? Or could we just provide the following fix:
  
   private static ThreadLocal<IOContext> threadLocal = new 
 ThreadLocal<IOContext>() {
 protected synchronized IOContext initialValue() {
   return new IOContext();
 }  
   };
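The issue is easy to reproduce standalone. The sketch below contrasts the static-block seeding, which only helps the class-loading thread, with the lazy initialValue() override; note that the synchronized modifier in the proposed fix should not be needed, since initialValue() runs once per thread:

```java
public class ThreadLocalInitSketch {
    // Broken pattern: the static block runs once, on the thread that loads
    // the class, so only that thread's slot is ever seeded.
    private static final ThreadLocal<Object> broken = new ThreadLocal<Object>();
    static {
        if (broken.get() == null) {
            broken.set(new Object());
        }
    }

    // Fixed pattern: initialValue() runs lazily, once in each thread.
    private static final ThreadLocal<Object> fixed = new ThreadLocal<Object>() {
        @Override
        protected Object initialValue() {
            return new Object();
        }
    };

    public static void main(String[] args) throws InterruptedException {
        Thread t = new Thread(() -> {
            System.out.println("broken seeded: " + (broken.get() != null)); // false
            System.out.println("fixed seeded: " + (fixed.get() != null));   // true
        });
        t.start();
        t.join();
    }
}
```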

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HIVE-1624) Patch to allows scripts in S3 location

2010-09-30 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang resolved HIVE-1624.


Fix Version/s: 0.7.0
   Resolution: Fixed

I just committed! Thanks Vaibhav Aggarwal!

 Patch to allows scripts in S3 location
 --

 Key: HIVE-1624
 URL: https://issues.apache.org/jira/browse/HIVE-1624
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Vaibhav Aggarwal
Assignee: Vaibhav Aggarwal
 Fix For: 0.7.0

 Attachments: HIVE-1624-2.patch, HIVE-1624-3.patch, HIVE-1624-4.patch, 
 HIVE-1624-5.patch, HIVE-1624.patch


 I want to submit a patch which allows users to run scripts located in S3.
 This patch enables Hive to download the hive scripts located in S3 buckets 
 and execute them. This saves users the effort of copying scripts to HDFS 
 before executing them.
 Thanks
 Vaibhav

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1647) Incorrect initialization of thread local variable inside IOContext ( implementation is not threadsafe )

2010-09-30 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916712#action_12916712
 ] 

He Yongqiang commented on HIVE-1647:


The tests are still running, but there are a lot of diffs. Can you take a look? 
Examples: join_map_ppr.q, input_part10.q.

You can use 'ant test -Dtestcase=TestCliDriver 
-Dqfile=join_map_ppr.q,input_part10.q' to reproduce.

 Incorrect initialization of thread local variable inside IOContext ( 
 implementation is not threadsafe ) 
 

 Key: HIVE-1647
 URL: https://issues.apache.org/jira/browse/HIVE-1647
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Server Infrastructure
Affects Versions: 0.6.0, 0.7.0
Reporter: Raman Grover
Assignee: Liyin Tang
 Fix For: 0.7.0

 Attachments: HIVE-1647.patch

   Original Estimate: 0.17h
  Remaining Estimate: 0.17h

 Bug in org.apache.hadoop.hive.ql.io.IOContext
 in relation to initialization of thread local variable.
  
 public class IOContext {
  
   private static ThreadLocal<IOContext> threadLocal = new 
 ThreadLocal<IOContext>() { };
  
   static {
 if (threadLocal.get() == null) {
   threadLocal.set(new IOContext());
 }
   }
  
 In a multi-threaded environment, the thread that loads the class first 
 for the JVM (assuming threads share the classloader)
 initializes itself correctly by executing the code in the static 
 block. Once the class is loaded, 
 any subsequent threads will have their respective thread-local variable as 
 null. Since IOContext
 is set during initialization of HiveRecordReader, in a scenario where 
 multiple threads acquire
 an instance of HiveRecordReader, this results in an NPE for all but the 
 first thread that loaded the class in the VM.
 
 Is the above scenario of multiple threads initializing HiveRecordReader a 
 typical one? Or could we just provide the following fix:
  
   private static ThreadLocal<IOContext> threadLocal = new 
 ThreadLocal<IOContext>() {
 protected synchronized IOContext initialValue() {
   return new IOContext();
 }  
   };

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1677) revert changes made by HIVE-558

2010-09-30 Thread He Yongqiang (JIRA)
revert changes made by HIVE-558
---

 Key: HIVE-1677
 URL: https://issues.apache.org/jira/browse/HIVE-1677
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang


There is another jira (https://issues.apache.org/jira/browse/HIVE-1658) going 
on to do a better fix for HIVE-558.

If HIVE-1658 cannot be patched in a timely fashion, we can revert 
HIVE-558 for now, so that it will not be a blocker for releasing etc. 

This is just the bottom line. Please feel free to close this jira.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HIVE-1677) revert changes made by HIVE-558

2010-09-30 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang resolved HIVE-1677.


Resolution: Invalid

Had an offline discussion with Namit: reverting would need a lot of changes in 
the log files, which is not a good way to go. Let's first do a simple fix in 
HIVE-1658, and then do the pretty describe in another diff.

 revert changes made by HIVE-558
 ---

 Key: HIVE-1677
 URL: https://issues.apache.org/jira/browse/HIVE-1677
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang

 There is another jira (https://issues.apache.org/jira/browse/HIVE-1658) going 
 on to do a better fix for HIVE-558.
 If HIVE-1658 cannot be patched in a timely fashion, we can revert 
 HIVE-558 for now, so that it will not be a blocker for releasing etc. 
 This is just the bottom line. Please feel free to close this jira.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1665) drop operations may cause file leak

2010-09-29 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1665:
---

Attachment: hive-1665.1.patch

 drop operations may cause file leak
 ---

 Key: HIVE-1665
 URL: https://issues.apache.org/jira/browse/HIVE-1665
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang
 Attachments: hive-1665.1.patch


 Right now when doing a drop, Hive first drops the metadata and then drops the 
 actual files. If the file system is down at that time, the files will remain 
 undeleted. 
 Had an offline discussion about this:
 to fix this, add a new scratch-dir setting to the Hive conf. 
 When doing a drop operation:
 1) move the data to the scratch directory
 2) drop the metadata
 3.1) if 2) failed, roll back 1) and report an error
 3.2) if 2) succeeded, drop the data from the scratch directory
 4) if 3.2) fails, we are OK because we assume the scratch dir will be emptied 
 manually.
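The drop protocol above can be sketched with local paths standing in for HDFS; the class and method names are illustrative, not Hive's actual API:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class SafeDropSketch {
    // Steps 1-4 of the protocol, with local paths standing in for HDFS.
    static void drop(Path tableDir, Path scratchDir, Runnable dropMetadata) throws IOException {
        Path staged = scratchDir.resolve(tableDir.getFileName());
        Files.move(tableDir, staged);        // 1) move data to the scratch dir
        try {
            dropMetadata.run();              // 2) drop metadata
        } catch (RuntimeException e) {
            Files.move(staged, tableDir);    // 3.1) roll back 1) and report
            throw e;
        }
        try {
            Files.delete(staged);            // 3.2) drop data from scratch dir
        } catch (IOException ignored) {
            // 4) fine: the scratch dir is assumed to be emptied manually
        }
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("drop-demo");
        Path table = Files.createDirectory(base.resolve("t"));
        Path scratch = Files.createDirectory(base.resolve("scratch"));
        drop(table, scratch, () -> { /* metadata drop succeeds */ });
        System.out.println(Files.exists(table)); // false: data is gone
    }
}
```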

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1673) Create table bug causes the row format property lost when serde is specified.

2010-09-29 Thread He Yongqiang (JIRA)
Create table bug causes the row format property lost when serde is specified.
-

 Key: HIVE-1673
 URL: https://issues.apache.org/jira/browse/HIVE-1673
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: He Yongqiang


An example:

create table src_rc_serde_yongqiang(key string, value string) ROW FORMAT  
DELIMITED FIELDS TERMINATED BY '\\0' stored as rcfile; 
will lose the row format information.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1673) Create table bug causes the row format property lost when serde is specified.

2010-09-29 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1673:
---

   Status: Patch Available  (was: Open)
Affects Version/s: 0.7.0
Fix Version/s: 0.7.0

 Create table bug causes the row format property lost when serde is specified.
 -

 Key: HIVE-1673
 URL: https://issues.apache.org/jira/browse/HIVE-1673
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: He Yongqiang
Assignee: He Yongqiang
 Fix For: 0.7.0

 Attachments: hive-1673.1.patch


 An example:
 create table src_rc_serde_yongqiang(key string, value string) ROW FORMAT  
 DELIMITED FIELDS TERMINATED BY '\\0' stored as rcfile; 
 will lose the row format information.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HIVE-1673) Create table bug causes the row format property lost when serde is specified.

2010-09-29 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang reassigned HIVE-1673:
--

Assignee: He Yongqiang

 Create table bug causes the row format property lost when serde is specified.
 -

 Key: HIVE-1673
 URL: https://issues.apache.org/jira/browse/HIVE-1673
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: He Yongqiang
Assignee: He Yongqiang
 Fix For: 0.7.0

 Attachments: hive-1673.1.patch


 An example:
 create table src_rc_serde_yongqiang(key string, value string) ROW FORMAT  
 DELIMITED FIELDS TERMINATED BY '\\0' stored as rcfile; 
 will lose the row format information.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1673) Create table bug causes the row format property lost when serde is specified.

2010-09-29 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1673:
---

Attachment: hive-1673.1.patch

 Create table bug causes the row format property lost when serde is specified.
 -

 Key: HIVE-1673
 URL: https://issues.apache.org/jira/browse/HIVE-1673
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: He Yongqiang
Assignee: He Yongqiang
 Fix For: 0.7.0

 Attachments: hive-1673.1.patch


 An example:
 create table src_rc_serde_yongqiang(key string, value string) ROW FORMAT  
 DELIMITED FIELDS TERMINATED BY '\\0' stored as rcfile; 
 will lose the row format information.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1665) drop operations may cause file leak

2010-09-29 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916329#action_12916329
 ] 

He Yongqiang commented on HIVE-1665:


If 2) fails and rolling back 1) also fails, then the data is in the scratch 
(trash) dir and the table's metadata is still there.
But 2) failing and the rollback of 1) also failing will rarely happen. The main 
concern here is dealing with HDFS being down and with housekeeping operations.

For 'mark-then-delete', I think the main problem is that there is no 
administration daemon process or helper script for it. 

 drop operations may cause file leak
 ---

 Key: HIVE-1665
 URL: https://issues.apache.org/jira/browse/HIVE-1665
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang
 Attachments: hive-1665.1.patch


 Right now when doing a drop, Hive first drops the metadata and then drops the 
 actual files. If the file system is down at that time, the files will remain 
 undeleted. 
 Had an offline discussion about this:
 to fix this, add a new scratch-dir setting to the Hive conf. 
 When doing a drop operation:
 1) move the data to the scratch directory
 2) drop the metadata
 3.1) if 2) failed, roll back 1) and report an error
 3.2) if 2) succeeded, drop the data from the scratch directory
 4) if 3.2) fails, we are OK because we assume the scratch dir will be emptied 
 manually.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1624) Patch to allows scripts in S3 location

2010-09-28 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915942#action_12915942
 ] 

He Yongqiang commented on HIVE-1624:


Mostly looks good. 

In your testcase, can you put the new script file in
new Path(System.getProperty("test.data.dir", ".") + file name)?

By moving fetchFilesNotInLocalFilesystem to SessionState, you can keep 
getScriptProgName() etc. in SemanticAnalyzer by changing 
fetchFilesNotInLocalFilesystem's arguments to pass in the command etc. I am 
also OK with the current way.

 Patch to allows scripts in S3 location
 --

 Key: HIVE-1624
 URL: https://issues.apache.org/jira/browse/HIVE-1624
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Vaibhav Aggarwal
Assignee: Vaibhav Aggarwal
 Attachments: HIVE-1624-2.patch, HIVE-1624-3.patch, HIVE-1624-4.patch, 
 HIVE-1624.patch


 I want to submit a patch which allows users to run scripts located in S3.
 This patch enables Hive to download the hive scripts located in S3 buckets 
 and execute them. This saves users the effort of copying scripts to HDFS 
 before executing them.
 Thanks
 Vaibhav

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1624) Patch to allows scripts in S3 location

2010-09-27 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915391#action_12915391
 ] 

He Yongqiang commented on HIVE-1624:


Great. Some nitpicks; sorry for not posting them in the previous comment. 
1) It seems there is still one logging line: 
getConsole().printInfo("Testing " + value);
2) Also, can you add one JUnit test for DosToUnix?
3) Do you think it may be better to move fetchFilesNotInLocalFilesystem to 
SessionState?
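For item 2), a JUnit test would assert the CRLF-to-LF conversion. A dependency-free sketch of the behavior under test follows; the conversion logic here is illustrative, not the actual DosToUnix code:

```java
public class DosToUnixSketch {
    // Illustrative line-ending normalization like the DosToUnix utility:
    // replace every CRLF pair with a bare LF.
    static String dosToUnix(String s) {
        return s.replace("\r\n", "\n");
    }

    public static void main(String[] args) {
        System.out.println(dosToUnix("#!/bin/sh\r\necho hi\r\n").equals("#!/bin/sh\necho hi\n"));
        System.out.println(dosToUnix("already\nunix\n").equals("already\nunix\n"));
    }
}
```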

 Patch to allows scripts in S3 location
 --

 Key: HIVE-1624
 URL: https://issues.apache.org/jira/browse/HIVE-1624
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Vaibhav Aggarwal
Assignee: Vaibhav Aggarwal
 Attachments: HIVE-1624-2.patch, HIVE-1624-3.patch, HIVE-1624.patch


 I want to submit a patch which allows users to run scripts located in S3.
 This patch enables Hive to download the hive scripts located in S3 buckets 
 and execute them. This saves users the effort of copying scripts to HDFS 
 before executing them.
 Thanks
 Vaibhav

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1361) table/partition level statistics

2010-09-27 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1361:
---

Status: Resolved  (was: Patch Available)
Resolution: Fixed

I just committed! Thanks Ning and Ahmed!

 table/partition level statistics
 

 Key: HIVE-1361
 URL: https://issues.apache.org/jira/browse/HIVE-1361
 Project: Hadoop Hive
  Issue Type: Sub-task
  Components: Query Processor
Reporter: Ning Zhang
Assignee: Ahmed M Aly
 Fix For: 0.7.0

 Attachments: HIVE-1361.2.patch, HIVE-1361.2_java_only.patch, 
 HIVE-1361.3.patch, HIVE-1361.4.java_only.patch, HIVE-1361.4.patch, 
 HIVE-1361.5.java_only.patch, HIVE-1361.5.patch, HIVE-1361.java_only.patch, 
 HIVE-1361.patch, stats0.patch


 As a first step, we gather table-level stats for non-partitioned tables and 
 partition-level stats for partitioned tables. Future work could extend the 
 table-level stats to partitioned tables as well. 
 There are 3 major milestones in this subtask: 
  1) extend the insert statement to gather table/partition level stats 
 on-the-fly.
  2) extend metastore API to support storing and retrieving stats for a 
 particular table/partition. 
  3) add an ANALYZE TABLE [PARTITION] statement in Hive QL to gather stats for 
 existing tables/partitions. 
 The proposed stats are:
 Partition-level stats: 
   - number of rows
   - total size in bytes
   - number of files
   - max, min, average row sizes
   - max, min, average file sizes
 Table-level stats in addition to partition level stats:
   - number of partitions

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HIVE-1663) ql/src/java/org/apache/hadoop/hive/ql/parse/SamplePruner.java is empty

2010-09-27 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang resolved HIVE-1663.


Fix Version/s: 0.7.0
   Resolution: Fixed

fixed. 

 ql/src/java/org/apache/hadoop/hive/ql/parse/SamplePruner.java is empty
 --

 Key: HIVE-1663
 URL: https://issues.apache.org/jira/browse/HIVE-1663
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang
 Fix For: 0.7.0


 we should remove this empty file

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1663) ql/src/java/org/apache/hadoop/hive/ql/parse/SamplePruner.java is empty

2010-09-27 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12915435#action_12915435
 ] 

He Yongqiang commented on HIVE-1663:


Sorry that I committed this myself. I could not generate a patch for it (the 
file is empty). 

 ql/src/java/org/apache/hadoop/hive/ql/parse/SamplePruner.java is empty
 --

 Key: HIVE-1663
 URL: https://issues.apache.org/jira/browse/HIVE-1663
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang
 Fix For: 0.7.0


 we should remove this empty file

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1670) MapJoin throws EOFExeption when the mapjoined table has 0 column selected

2010-09-25 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914762#action_12914762
 ] 

He Yongqiang commented on HIVE-1670:


Is this the same as https://issues.apache.org/jira/browse/HIVE-1452? If yes, we 
can close HIVE-1452 since it is fixed here.

 MapJoin throws EOFExeption when the mapjoined table has 0 column selected
 -

 Key: HIVE-1670
 URL: https://issues.apache.org/jira/browse/HIVE-1670
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-1670.patch


 select /*+mapjoin(b) */ sum(a.key) from src a join src b on (a.key=b.key); 
 throws EOFException

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1361) table/partition level statistics

2010-09-24 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914728#action_12914728
 ] 

He Yongqiang commented on HIVE-1361:


+1 running tests.

 table/partition level statistics
 

 Key: HIVE-1361
 URL: https://issues.apache.org/jira/browse/HIVE-1361
 Project: Hadoop Hive
  Issue Type: Sub-task
  Components: Query Processor
Reporter: Ning Zhang
Assignee: Ahmed M Aly
 Fix For: 0.7.0

 Attachments: HIVE-1361.2.patch, HIVE-1361.2_java_only.patch, 
 HIVE-1361.3.patch, HIVE-1361.4.java_only.patch, HIVE-1361.4.patch, 
 HIVE-1361.5.java_only.patch, HIVE-1361.5.patch, HIVE-1361.java_only.patch, 
 HIVE-1361.patch, stats0.patch


 At the first step, we gather table-level stats for non-partitioned table and 
 partition-level stats for partitioned table. Future work could extend the 
 table level stats to partitioned table as well. 
 There are 3 major milestones in this subtask: 
  1) extend the insert statement to gather table/partition level stats 
 on-the-fly.
  2) extend metastore API to support storing and retrieving stats for a 
 particular table/partition. 
  3) add an ANALYZE TABLE [PARTITION] statement in Hive QL to gather stats for 
 existing tables/partitions. 
 The proposed stats are:
 Partition-level stats: 
   - number of rows
   - total size in bytes
   - number of files
   - max, min, average row sizes
   - max, min, average file sizes
 Table-level stats in addition to partition level stats:
   - number of partitions

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1669) non-deterministic display of storage parameter in test

2010-09-24 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914727#action_12914727
 ] 

He Yongqiang commented on HIVE-1669:


Ning, can you post a fix for this after I commit the statistics jira (HIVE-1361)?

 non-deterministic display of storage parameter in test
 --

 Key: HIVE-1669
 URL: https://issues.apache.org/jira/browse/HIVE-1669
 Project: Hadoop Hive
  Issue Type: Test
Reporter: Ning Zhang

 With the change to beautify the 'desc extended table', the storage parameters 
 are displayed in non-deterministic manner (since its implementation is 
 HashMap). 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1661) Default values for parameters

2010-09-23 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1661:
---

Status: Resolved  (was: Patch Available)
Resolution: Fixed

committed! Thanks Siying!

 Default values for parameters
 -

 Key: HIVE-1661
 URL: https://issues.apache.org/jira/browse/HIVE-1661
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Siying Dong
 Fix For: 0.7.0

 Attachments: HIVE-1661.1.patch, HIVE-1661.2.patch


 It would be good to have a default value for some hive parameters:
 say RETENTION to be 30 days.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1624) Patch to allows scripts in S3 location

2010-09-23 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914176#action_12914176
 ] 

He Yongqiang commented on HIVE-1624:


Should I modify it to be hdfs://anything || s3://anything like path?
Yes. That would be a great start. We can add more schemes if needed in the future.

Also, please make sure that if a program (neither hdfs nor s3) cannot be found 
locally, the query does not fail in the semantic analyzer. Otherwise, it may 
break a lot of existing queries.
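
A minimal sketch of the scheme-based check discussed here. The class name, the method name, and the exact scheme list are assumptions for illustration, not the patch's actual code:

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Arrays;
import java.util.List;

public class ScriptPathCheck {
    // Schemes for which Hive would attempt a download; hypothetical list.
    private static final List<String> DOWNLOADABLE_SCHEMES =
        Arrays.asList("hdfs", "s3", "s3n");

    /** Returns true if the path looks like a remote script Hive should fetch. */
    public static boolean shouldDownload(String path) {
        try {
            String scheme = new URI(path).getScheme();
            return scheme != null && DOWNLOADABLE_SCHEMES.contains(scheme.toLowerCase());
        } catch (URISyntaxException e) {
            return false; // not a URI at all; treat as a local program name
        }
    }
}
```

A plain program name such as php has no scheme, so the check returns false and the semantic analyzer would leave it alone, which matches the "do not fail existing queries" requirement above.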

 Patch to allows scripts in S3 location
 --

 Key: HIVE-1624
 URL: https://issues.apache.org/jira/browse/HIVE-1624
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Vaibhav Aggarwal
Assignee: Vaibhav Aggarwal
 Attachments: HIVE-1624-2.patch, HIVE-1624.patch


 I want to submit a patch which allows user to run scripts located in S3.
 This patch enables Hive to download the hive scripts located in S3 buckets 
 and execute them. This saves users the effort of copying scripts to HDFS 
 before executing them.
 Thanks
 Vaibhav

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1665) drop operations may cause file leak

2010-09-22 Thread He Yongqiang (JIRA)
drop operations may cause file leak
---

 Key: HIVE-1665
 URL: https://issues.apache.org/jira/browse/HIVE-1665
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang


Right now when doing a drop, Hive first drops the metadata and then drops the 
actual files. If the file system is down at that time, the files are never 
deleted. 

Had an offline discussion about this. To fix it, add a new scratch directory 
setting to the Hive conf, and when doing a drop operation:
1) move the data to the scratch directory
2) drop the metadata
3.1) if 2) failed, roll back 1) and report an error
3.2) if 2) succeeded, drop the data from the scratch directory
4) if 3.2) fails, we are OK, because we assume the scratch directory will be 
emptied manually.
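
The steps above can be sketched as follows. The Fs and Metastore interfaces are illustrative stand-ins, not Hive's actual APIs:

```java
public class SafeDrop {
    // Minimal stand-ins for the file system and metastore; hypothetical interfaces.
    interface Fs {
        void move(String from, String to);   // may throw RuntimeException on failure
        void delete(String path);
    }
    interface Metastore {
        void dropTable(String table);        // may throw RuntimeException on failure
    }

    /**
     * Drop a table without leaking files: the data is parked in a scratch
     * directory first, so a metastore failure can be rolled back (3.1), and
     * a failed final delete (3.2) only leaves garbage in the scratch dir,
     * which is assumed to be emptied manually (4).
     */
    public static void drop(Fs fs, Metastore ms, String table,
                            String dataDir, String scratchDir) {
        fs.move(dataDir, scratchDir);            // 1) park the data
        try {
            ms.dropTable(table);                 // 2) drop the metadata
        } catch (RuntimeException e) {
            fs.move(scratchDir, dataDir);        // 3.1) roll back and report
            throw e;
        }
        try {
            fs.delete(scratchDir);               // 3.2) drop the parked data
        } catch (RuntimeException ignored) {
            // 4) OK: the scratch dir is emptied manually
        }
    }
}
```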




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1624) Patch to allows scripts in S3 location

2010-09-22 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12913878#action_12913878
 ] 

He Yongqiang commented on HIVE-1624:


Looks good basically; some unneeded logging information needs to be removed.

One main problem here is determining when to download the file. We cannot 
simply try to download a file whenever it cannot be found locally: sometimes 
scripts exist in a remote directory that the Hadoop cluster nodes can access 
but the client cannot.

 Patch to allows scripts in S3 location
 --

 Key: HIVE-1624
 URL: https://issues.apache.org/jira/browse/HIVE-1624
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Vaibhav Aggarwal
Assignee: Vaibhav Aggarwal
 Attachments: HIVE-1624-2.patch, HIVE-1624.patch


 I want to submit a patch which allows user to run scripts located in S3.
 This patch enables Hive to download the hive scripts located in S3 buckets 
 and execute them. This saves users the effort of copying scripts to HDFS 
 before executing them.
 Thanks
 Vaibhav

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1624) Patch to allows scripts in S3 location

2010-09-22 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12913895#action_12913895
 ] 

He Yongqiang commented on HIVE-1624:


For 2, it is actually a common case sometimes. For example, a user can invoke 
php without having the php program locally. We can add some simple rules for 
downloading resource files, such as paths starting with the s3 scheme in this 
case.

 Patch to allows scripts in S3 location
 --

 Key: HIVE-1624
 URL: https://issues.apache.org/jira/browse/HIVE-1624
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Vaibhav Aggarwal
Assignee: Vaibhav Aggarwal
 Attachments: HIVE-1624-2.patch, HIVE-1624.patch


 I want to submit a patch which allows user to run scripts located in S3.
 This patch enables Hive to download the hive scripts located in S3 buckets 
 and execute them. This saves users the effort of copying scripts to HDFS 
 before executing them.
 Thanks
 Vaibhav

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1633) CombineHiveInputFormat fails with cannot find dir for emptyFile

2010-09-21 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12913230#action_12913230
 ] 

He Yongqiang commented on HIVE-1633:


Amareshwari, by adding a test case in TestHiveFileFormatUtils you should be 
able to find the underlying problem. Can you then post a patch for it?

 CombineHiveInputFormat fails with cannot find dir for emptyFile
 -

 Key: HIVE-1633
 URL: https://issues.apache.org/jira/browse/HIVE-1633
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Clients
Reporter: Amareshwari Sriramadasu



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1609) Support partition filtering in metastore

2010-09-21 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12913240#action_12913240
 ] 

He Yongqiang commented on HIVE-1609:


[By several partition functions in my previous comment, I mean the existing 
partition functions.] So I just want to make sure the ones added in this jira 
will work fine for the Python client. 

@john, please go ahead and commit this. This is a really good one to have. We 
can fix any problems later if there are any.

 Support partition filtering in metastore
 

 Key: HIVE-1609
 URL: https://issues.apache.org/jira/browse/HIVE-1609
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Metastore
Reporter: Ajay Kidave
Assignee: Ajay Kidave
 Fix For: 0.7.0

 Attachments: hive_1609.patch, hive_1609_2.patch, hive_1609_3.patch


 The metastore needs to have support for returning a list of partitions based 
 on user specified filter conditions. This will be useful for tools which need 
 to do partition pruning. Howl is one such use case. The way partition pruning 
 is done during hive query execution need not be changed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1662) Add file pruning into Hive.

2010-09-21 Thread He Yongqiang (JIRA)
Add file pruning into Hive.
---

 Key: HIVE-1662
 URL: https://issues.apache.org/jira/browse/HIVE-1662
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: He Yongqiang


Hive now supports a filename virtual column. 
If a file name filter is present in a query, Hive should be able to add only 
the files that pass the filter to the input paths.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1663) ql/src/java/org/apache/hadoop/hive/ql/parse/SamplePruner.java is empty

2010-09-21 Thread He Yongqiang (JIRA)
ql/src/java/org/apache/hadoop/hive/ql/parse/SamplePruner.java is empty
--

 Key: HIVE-1663
 URL: https://issues.apache.org/jira/browse/HIVE-1663
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang


we should remove this empty file

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1633) CombineHiveInputFormat fails with cannot find dir for emptyFile

2010-09-20 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12912795#action_12912795
 ] 

He Yongqiang commented on HIVE-1633:


For a given path, CombineHiveInputFormat does a recursive lookup in 
partToPartitionInfo. If no match is found, it looks up the parent dir 
(hdfs://xxx/.../hive_2010-09-07_12-15-00_299_4877141498303008976/-mr-10002/1) 
in partToPartitionInfo. In your case, it seems the parent dir does exist in 
partToPartitionInfo. 
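
A simplified sketch of the recursive parent-directory lookup described above. The real logic lives in HiveFileFormatUtils.getPartitionDescFromPathRecursively; the map here uses plain strings for brevity:

```java
import java.util.Map;

public class PartLookup {
    /**
     * Walk up from `dir` toward the root until a key in the
     * path-to-partition map matches; return null if nothing matches.
     */
    public static String lookup(Map<String, String> pathToPartition, String dir) {
        while (dir != null && !dir.isEmpty()) {
            String part = pathToPartition.get(dir);
            if (part != null) {
                return part;
            }
            int slash = dir.lastIndexOf('/');
            dir = slash > 0 ? dir.substring(0, slash) : null;  // step to the parent dir
        }
        return null;
    }
}
```

With this shape, .../-mr-10002/1/emptyFile misses on the first probe but matches on its parent .../-mr-10002/1, which is the behavior being debugged in this thread.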

 CombineHiveInputFormat fails with cannot find dir for emptyFile
 -

 Key: HIVE-1633
 URL: https://issues.apache.org/jira/browse/HIVE-1633
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Clients
Reporter: Amareshwari Sriramadasu



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1650) TestContribNegativeCliDriver fails

2010-09-17 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12910648#action_12910648
 ] 

He Yongqiang commented on HIVE-1650:


+1, running tests.

 TestContribNegativeCliDriver fails
 --

 Key: HIVE-1650
 URL: https://issues.apache.org/jira/browse/HIVE-1650
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.1650.1.patch




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1650) TestContribNegativeCliDriver fails

2010-09-17 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1650:
---

Status: Resolved  (was: Patch Available)
Resolution: Fixed

I just committed! Thanks Namit!

 TestContribNegativeCliDriver fails
 --

 Key: HIVE-1650
 URL: https://issues.apache.org/jira/browse/HIVE-1650
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.1650.1.patch




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1226) support filter pushdown against non-native tables

2010-09-17 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1226:
---

Status: Resolved  (was: Patch Available)
Resolution: Fixed

I just committed! Thanks John!

 support filter pushdown against non-native tables
 -

 Key: HIVE-1226
 URL: https://issues.apache.org/jira/browse/HIVE-1226
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: HBase Handler, Query Processor
Affects Versions: 0.6.0
Reporter: John Sichi
Assignee: John Sichi
 Fix For: 0.7.0

 Attachments: HIVE-1226.1.patch, HIVE-1226.2.patch, HIVE-1226.3.patch, 
 HIVE-1226.4.patch


 For example, HBase's scan object can take filters.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HIVE-1645) ability to specify parent directory for zookeeper lock manager

2010-09-16 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang resolved HIVE-1645.


Resolution: Fixed

I just committed! Thanks Namit!

 ability to specify parent directory for zookeeper lock manager
 --

 Key: HIVE-1645
 URL: https://issues.apache.org/jira/browse/HIVE-1645
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.1645.1.patch


 For concurrency support, it would be desirable if all the locks were created 
 under a common parent, so that zookeeper can be used
 for different purposes.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1633) CombineHiveInputFormat fails with cannot find dir for emptyFile

2010-09-16 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12910255#action_12910255
 ] 

He Yongqiang commented on HIVE-1633:


Can you search 
hdfs://xxx/.../hive_2010-09-07_12-15-00_299_4877141498303008976/-mr-10002/1 
(replacing xxx with actual file/host names)?

It should appear once in partToPartitionInfo and once more as the parent of 
hdfs://xxx/.../hive_2010-09-07_12-15-00_299_4877141498303008976/-mr-10002/1/emptyFile.


 CombineHiveInputFormat fails with cannot find dir for emptyFile
 -

 Key: HIVE-1633
 URL: https://issues.apache.org/jira/browse/HIVE-1633
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Clients
Reporter: Amareshwari Sriramadasu



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1633) CombineHiveInputFormat fails with cannot find dir for emptyFile

2010-09-16 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12910431#action_12910431
 ] 

He Yongqiang commented on HIVE-1633:


So the 'xxx' part is not the same in 
hdfs://xxx/.../hive_2010-09-07_12-15-00_299_4877141498303008976/-mr-10002/1/ 
and 
hdfs://xxx/.../hive_2010-09-07_12-15-00_299_4877141498303008976/-mr-10002/1/emptyFile
?

 CombineHiveInputFormat fails with cannot find dir for emptyFile
 -

 Key: HIVE-1633
 URL: https://issues.apache.org/jira/browse/HIVE-1633
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Clients
Reporter: Amareshwari Sriramadasu



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1633) CombineHiveInputFormat fails with cannot find dir for emptyFile

2010-09-15 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12909711#action_12909711
 ] 

He Yongqiang commented on HIVE-1633:


@Amareshwari

in your example:
hdfs://xxx/.../hive_2010-09-07_12-15-00_299_4877141498303008976/-mr-10002/1/emptyFile
in partToPartitionInfo:
[xxx..., xxx..., xxx..., ...
 hdfs://xxx/.../hive_2010-09-07_12-15-00_299_4877141498303008976/-mr-10002/1,
hdfs://xxx/.../hive_2010-09-07_12-15-00_299_4877141498303008976/-mr-10002/2]

If I put these into TestHiveFormatUtils, it returns the correct value. Maybe 
there is some mismatch in the 'xxx' part?

 CombineHiveInputFormat fails with cannot find dir for emptyFile
 -

 Key: HIVE-1633
 URL: https://issues.apache.org/jira/browse/HIVE-1633
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Clients
Reporter: Amareshwari Sriramadasu



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1226) support filter pushdown against non-native tables

2010-09-15 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12909878#action_12909878
 ] 

He Yongqiang commented on HIVE-1226:


The patch looks good. 

One question:
In HBaseStorageHandler, it will exit if searchConditions.size() != 1. 
This makes sense if there are two point predicates on the key column (connected 
by 'AND'). What if they are composed to perform a range query (like a < key < b)?

Can you also open another jira for indexing to leverage this change? Indexing 
needs this change to do automatic rewriting of the user's query.

 support filter pushdown against non-native tables
 -

 Key: HIVE-1226
 URL: https://issues.apache.org/jira/browse/HIVE-1226
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: HBase Handler, Query Processor
Affects Versions: 0.6.0
Reporter: John Sichi
Assignee: John Sichi
 Fix For: 0.7.0

 Attachments: HIVE-1226.1.patch, HIVE-1226.2.patch, HIVE-1226.3.patch, 
 HIVE-1226.4.patch


 For example, HBase's scan object can take filters.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1645) ability to specify parent directory for zookeeper lock manager

2010-09-15 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12909992#action_12909992
 ] 

He Yongqiang commented on HIVE-1645:


+1, running tests.

 ability to specify parent directory for zookeeper lock manager
 --

 Key: HIVE-1645
 URL: https://issues.apache.org/jira/browse/HIVE-1645
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.1645.1.patch


 For concurrency support, it would be desirable if all the locks were created 
 under a common parent, so that zookeeper can be used
 for different purposes.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1633) CombineHiveInputFormat fails with cannot find dir for emptyFile

2010-09-13 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12908716#action_12908716
 ] 

He Yongqiang commented on HIVE-1633:


Amareshwari, can you give more details about your example? From it, I cannot 
reproduce the problem.

 CombineHiveInputFormat fails with cannot find dir for emptyFile
 -

 Key: HIVE-1633
 URL: https://issues.apache.org/jira/browse/HIVE-1633
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Clients
Reporter: Amareshwari Sriramadasu



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1624) Patch to allows scripts in S3 location

2010-09-13 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12908729#action_12908729
 ] 

He Yongqiang commented on HIVE-1624:


S3 -> client -> cluster may be better than directly downloading the script 
from S3 to the TaskTracker nodes: there may be thousands of concurrent download 
requests to S3 for a single script. (I agree that the script can be cached on 
the local machine, but right now Hive does not do any cache cleanup.)
The S3 -> client -> cluster route will also be able to use the Hadoop 
distributed cache.

 Patch to allows scripts in S3 location
 --

 Key: HIVE-1624
 URL: https://issues.apache.org/jira/browse/HIVE-1624
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Vaibhav Aggarwal
 Attachments: HIVE-1624.patch


 I want to submit a patch which allows user to run scripts located in S3.
 This patch enables Hive to download the hive scripts located in S3 buckets 
 and execute them. This saves users the effort of copying scripts to HDFS 
 before executing them.
 Thanks
 Vaibhav

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1610) Using CombinedHiveInputFormat causes partToPartitionInfo IOException

2010-09-07 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12906837#action_12906837
 ] 

He Yongqiang commented on HIVE-1610:


Sammy, we cannot fix this issue by just removing the schema check. 
If the input URI's path part is the same as one partition's path but their 
schemas differ, we should still return NULL.

For your case, the main problem is the port, which is contained in the 
PartitionDesc but not in the input path.

Is it possible to just ignore the port? I mean, is there a case where two 
different instances share the same address but use different ports?

 Using CombinedHiveInputFormat causes partToPartitionInfo IOException  
 --

 Key: HIVE-1610
 URL: https://issues.apache.org/jira/browse/HIVE-1610
 Project: Hadoop Hive
  Issue Type: Bug
 Environment: Hadoop 0.20.2
Reporter: Sammy Yu
 Attachments: 
 0002-HIVE-1610.-Added-additional-schema-check-to-doGetPar.patch, 
 0003-HIVE-1610.patch


 I have a relatively complicated hive query using CombinedHiveInputFormat:
 set hive.exec.dynamic.partition.mode=nonstrict;
 set hive.exec.dynamic.partition=true; 
 set hive.exec.max.dynamic.partitions=1000;
 set hive.exec.max.dynamic.partitions.pernode=300;
 set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
 INSERT OVERWRITE TABLE keyword_serp_results_no_dups PARTITION(week) select 
 distinct keywords.keyword, keywords.domain, keywords.url, keywords.rank, 
 keywords.universal_rank, keywords.serp_type, keywords.date_indexed, 
 keywords.search_engine_type, keywords.week from keyword_serp_results keywords 
 JOIN (select domain, keyword, search_engine_type, week, max_date_indexed, 
 min(rank) as best_rank from (select keywords1.domain, keywords1.keyword, 
 keywords1.search_engine_type,  keywords1.week, keywords1.rank, 
 dupkeywords1.max_date_indexed from keyword_serp_results keywords1 JOIN 
 (select domain, keyword, search_engine_type, week, max(date_indexed) as 
 max_date_indexed from keyword_serp_results group by 
 domain,keyword,search_engine_type,week) dupkeywords1 on keywords1.keyword = 
 dupkeywords1.keyword AND  keywords1.domain = dupkeywords1.domain AND 
 keywords1.search_engine_type = dupkeywords1.search_engine_type AND 
 keywords1.week = dupkeywords1.week AND keywords1.date_indexed = 
 dupkeywords1.max_date_indexed) dupkeywords2 group by 
 domain,keyword,search_engine_type,week,max_date_indexed ) dupkeywords3 on 
 keywords.keyword = dupkeywords3.keyword AND  keywords.domain = 
 dupkeywords3.domain AND keywords.search_engine_type = 
 dupkeywords3.search_engine_type AND keywords.week = dupkeywords3.week AND 
 keywords.date_indexed = dupkeywords3.max_date_indexed AND keywords.rank = 
 dupkeywords3.best_rank;
  
 This query use to work fine until I updated to r991183 on trunk and started 
 getting this error:
 java.io.IOException: cannot find dir = 
 hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/tmp/hive-root/hive_2010-09-01_10-57-41_396_1409145025949924904/-mr-10002/00_0
  in 
 partToPartitionInfo: 
 [hdfs://ec2-75-101-174-245.compute-1.amazonaws.com:8020/tmp/hive-root/hive_2010-09-01_10-57-41_396_1409145025949924904/-mr-10002,
 hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=417/week=201035/day=20100829,
 hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=418/week=201035/day=20100829,
 hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=419/week=201035/day=20100829,
 hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=422/week=201035/day=20100829,
 hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=422/week=201035/day=20100831]
 at 
 org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:277)
 at 
 org.apache.hadoop.hive.ql.io.CombineHiveInputFormat$CombineHiveInputSplit.init(CombineHiveInputFormat.java:100)
 at 
 org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:312)
 at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
 at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
 at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
 at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:610)
 at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:120)
 at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:108)
 This query works if I don't change the hive.input.format.
 set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
 I've narrowed down this issue to the commit for HIVE-1510.  If I take out the 
 changeset from r987746, everything 

[jira] Commented: (HIVE-1610) Using CombinedHiveInputFormat causes partToPartitionInfo IOException

2010-09-07 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12907058#action_12907058
 ] 

He Yongqiang commented on HIVE-1610:


Sammy, there are mainly 2 problems. 
1) going over the map is not efficient, and 2) using startWith to do prefix 
match is a bug fixed in HIVE-1510.

Sammy, can you change the logic as follows:

right now, hive generates another pathToPartitionInfo map by removing the 
path's schema information, and put it in a cacheMap. 
We can keep the same logic but change the new pathToPartitionInfo map's value 
to be an array of PartitionDesc. 
And then we can just remove the schema check, and once we get a match, we go 
through the array of PartitionDesc to find the best one.

This also solves another problem: if there are two PartitionDescs whose path 
part is the same but whose schema is different, only one of them is kept in 
the new pathToPartitionInfo map. 

About how to go through the array of PartitionDesc to find the best one:
if the array contains only 1 element, return array.get(0);
1) if the original input does not have any schema information: if the array 
contains more than 1 element, report an error.
2) if the original input contains schema information: a) if the array contains 
an element that is an exact match (same schema, address and port as the 
input), return it; b) otherwise ignore the port but keep the schema and 
address, and go through the array again.

what do you think?
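The matching rules above can be sketched as follows. This is a hedged illustration: `PartitionDescMatcher`, `bestMatch`, and the pared-down `PartitionDesc` are hypothetical names, not the actual Hive API, and "schema" here means the URI scheme.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public class PartitionDescMatcher {

  // Stand-in for Hive's PartitionDesc: only the path matters here.
  static class PartitionDesc {
    final URI path;
    PartitionDesc(String p) { this.path = URI.create(p); }
  }

  // Pick the best descriptor among candidates that share a scheme-less path.
  static PartitionDesc bestMatch(URI input, List<PartitionDesc> candidates) {
    if (candidates.size() == 1) {
      return candidates.get(0);               // only one element: return it
    }
    if (input.getScheme() == null) {
      // Rule 1: a scheme-less input cannot disambiguate multiple candidates.
      throw new IllegalStateException("ambiguous match for " + input);
    }
    // Rule 2a: prefer an exact match (scheme, host and port all equal).
    for (PartitionDesc d : candidates) {
      if (d.path.equals(input)) {
        return d;
      }
    }
    // Rule 2b: ignore the port, but keep scheme and host.
    for (PartitionDesc d : candidates) {
      if (input.getScheme().equals(d.path.getScheme())
          && input.getHost() != null
          && input.getHost().equals(d.path.getHost())
          && input.getPath().equals(d.path.getPath())) {
        return d;
      }
    }
    throw new IllegalStateException("no match for " + input);
  }

  public static void main(String[] args) {
    List<PartitionDesc> cands = new ArrayList<>();
    cands.add(new PartitionDesc("hdfs://nn:8020/tmp/hive/-mr-10002"));
    cands.add(new PartitionDesc("hdfs://nn/tmp/hive/-mr-10002"));
    // Exact match wins even though both candidates share the same path part.
    System.out.println(bestMatch(URI.create("hdfs://nn/tmp/hive/-mr-10002"),
        cands).path);  // prints hdfs://nn/tmp/hive/-mr-10002
  }
}
```

With an array per path, both the port-qualified and port-less descriptors survive, which is exactly the duplicate-key case the single-valued map loses.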

 Using CombinedHiveInputFormat causes partToPartitionInfo IOException  
 --

 Key: HIVE-1610
 URL: https://issues.apache.org/jira/browse/HIVE-1610
 Project: Hadoop Hive
  Issue Type: Bug
 Environment: Hadoop 0.20.2
Reporter: Sammy Yu
 Attachments: 
 0002-HIVE-1610.-Added-additional-schema-check-to-doGetPar.patch, 
 0003-HIVE-1610.patch, 0004-hive.patch


 I have a relatively complicated hive query using CombinedHiveInputFormat:
 set hive.exec.dynamic.partition.mode=nonstrict;
 set hive.exec.dynamic.partition=true; 
 set hive.exec.max.dynamic.partitions=1000;
 set hive.exec.max.dynamic.partitions.pernode=300;
 set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
 INSERT OVERWRITE TABLE keyword_serp_results_no_dups PARTITION(week) select 
 distinct keywords.keyword, keywords.domain, keywords.url, keywords.rank, 
 keywords.universal_rank, keywords.serp_type, keywords.date_indexed, 
 keywords.search_engine_type, keywords.week from keyword_serp_results keywords 
 JOIN (select domain, keyword, search_engine_type, week, max_date_indexed, 
 min(rank) as best_rank from (select keywords1.domain, keywords1.keyword, 
 keywords1.search_engine_type,  keywords1.week, keywords1.rank, 
 dupkeywords1.max_date_indexed from keyword_serp_results keywords1 JOIN 
 (select domain, keyword, search_engine_type, week, max(date_indexed) as 
 max_date_indexed from keyword_serp_results group by 
 domain,keyword,search_engine_type,week) dupkeywords1 on keywords1.keyword = 
 dupkeywords1.keyword AND  keywords1.domain = dupkeywords1.domain AND 
 keywords1.search_engine_type = dupkeywords1.search_engine_type AND 
 keywords1.week = dupkeywords1.week AND keywords1.date_indexed = 
 dupkeywords1.max_date_indexed) dupkeywords2 group by 
 domain,keyword,search_engine_type,week,max_date_indexed ) dupkeywords3 on 
 keywords.keyword = dupkeywords3.keyword AND  keywords.domain = 
 dupkeywords3.domain AND keywords.search_engine_type = 
 dupkeywords3.search_engine_type AND keywords.week = dupkeywords3.week AND 
 keywords.date_indexed = dupkeywords3.max_date_indexed AND keywords.rank = 
 dupkeywords3.best_rank;
  
 This query used to work fine until I updated to r991183 on trunk and started 
 getting this error:
 java.io.IOException: cannot find dir = 
 hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/tmp/hive-root/hive_2010-09-01_10-57-41_396_1409145025949924904/-mr-10002/00_0
  in 
 partToPartitionInfo: 
 [hdfs://ec2-75-101-174-245.compute-1.amazonaws.com:8020/tmp/hive-root/hive_2010-09-01_10-57-41_396_1409145025949924904/-mr-10002,
 hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=417/week=201035/day=20100829,
 hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=418/week=201035/day=20100829,
 hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=419/week=201035/day=20100829,
 hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=422/week=201035/day=20100829,
 hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=422/week=201035/day=20100831]
 at 
 org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:277)
 at 
 

[jira] Commented: (HIVE-1610) Using CombinedHiveInputFormat causes partToPartitionInfo IOException

2010-09-02 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905751#action_12905751
 ] 

He Yongqiang commented on HIVE-1610:


Sammy, the only change in TestHiveFileFormatUtils is to remove the URI schema 
check (a 1-line change). 
You actually added back some lines of code that were removed by HIVE-1510, and 
that is why the testcase fails. 

 Using CombinedHiveInputFormat causes partToPartitionInfo IOException  
 --

 Key: HIVE-1610
 URL: https://issues.apache.org/jira/browse/HIVE-1610
 Project: Hadoop Hive
  Issue Type: Bug
 Environment: Hadoop 0.20.2
Reporter: Sammy Yu
 Attachments: 
 0002-HIVE-1610.-Added-additional-schema-check-to-doGetPar.patch, 
 0003-HIVE-1610.patch


 I have a relatively complicated hive query using CombinedHiveInputFormat:
 set hive.exec.dynamic.partition.mode=nonstrict;
 set hive.exec.dynamic.partition=true; 
 set hive.exec.max.dynamic.partitions=1000;
 set hive.exec.max.dynamic.partitions.pernode=300;
 set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
 INSERT OVERWRITE TABLE keyword_serp_results_no_dups PARTITION(week) select 
 distinct keywords.keyword, keywords.domain, keywords.url, keywords.rank, 
 keywords.universal_rank, keywords.serp_type, keywords.date_indexed, 
 keywords.search_engine_type, keywords.week from keyword_serp_results keywords 
 JOIN (select domain, keyword, search_engine_type, week, max_date_indexed, 
 min(rank) as best_rank from (select keywords1.domain, keywords1.keyword, 
 keywords1.search_engine_type,  keywords1.week, keywords1.rank, 
 dupkeywords1.max_date_indexed from keyword_serp_results keywords1 JOIN 
 (select domain, keyword, search_engine_type, week, max(date_indexed) as 
 max_date_indexed from keyword_serp_results group by 
 domain,keyword,search_engine_type,week) dupkeywords1 on keywords1.keyword = 
 dupkeywords1.keyword AND  keywords1.domain = dupkeywords1.domain AND 
 keywords1.search_engine_type = dupkeywords1.search_engine_type AND 
 keywords1.week = dupkeywords1.week AND keywords1.date_indexed = 
 dupkeywords1.max_date_indexed) dupkeywords2 group by 
 domain,keyword,search_engine_type,week,max_date_indexed ) dupkeywords3 on 
 keywords.keyword = dupkeywords3.keyword AND  keywords.domain = 
 dupkeywords3.domain AND keywords.search_engine_type = 
 dupkeywords3.search_engine_type AND keywords.week = dupkeywords3.week AND 
 keywords.date_indexed = dupkeywords3.max_date_indexed AND keywords.rank = 
 dupkeywords3.best_rank;
  
 This query used to work fine until I updated to r991183 on trunk and started 
 getting this error:
 java.io.IOException: cannot find dir = 
 hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/tmp/hive-root/hive_2010-09-01_10-57-41_396_1409145025949924904/-mr-10002/00_0
  in 
 partToPartitionInfo: 
 [hdfs://ec2-75-101-174-245.compute-1.amazonaws.com:8020/tmp/hive-root/hive_2010-09-01_10-57-41_396_1409145025949924904/-mr-10002,
 hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=417/week=201035/day=20100829,
 hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=418/week=201035/day=20100829,
 hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=419/week=201035/day=20100829,
 hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=422/week=201035/day=20100829,
 hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=422/week=201035/day=20100831]
 at 
 org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:277)
 at 
 org.apache.hadoop.hive.ql.io.CombineHiveInputFormat$CombineHiveInputSplit.init(CombineHiveInputFormat.java:100)
 at 
 org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:312)
 at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
 at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
 at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
 at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:610)
 at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:120)
 at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:108)
 This query works if I don't change the hive.input.format.
 set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
 I've narrowed down this issue to the commit for HIVE-1510.  If I take out the 
 changeset from r987746, everything works as before.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-741) NULL is not handled correctly in join

2010-08-23 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12901503#action_12901503
 ] 

He Yongqiang commented on HIVE-741:
---

+1. The patch looks good to me. 
(Only one minor comment on the name of hasNullElements: should we rename it, 
since this function is used to determine whether all keys are null?)
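For reference, the join semantics at stake can be checked with a minimal sketch in plain Java (not Hive code; `NullJoin` and its helper are hypothetical names). It mimics a hash join on `a.key = b.value` in which NULL keys never match, which is why the query below must return an empty set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NullJoin {
  // Hash join of a.key (column 0) against b.value (column 1).
  public static List<String> join(List<Object[]> a, List<Object[]> b) {
    Map<Object, List<Object[]>> built = new HashMap<>();
    for (Object[] row : a) {
      if (row[0] == null) {
        continue;                 // NULL keys can never match: skip at build
      }
      built.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row);
    }
    List<String> out = new ArrayList<>();
    for (Object[] row : b) {
      Object key = row[1];        // b.value is the join key on the probe side
      if (key == null) {
        continue;                 // likewise skip NULL probe keys
      }
      for (Object[] m : built.getOrDefault(key, Collections.emptyList())) {
        out.add(Arrays.toString(m) + Arrays.toString(row));
      }
    }
    return out;
  }

  public static void main(String[] args) {
    // The input4_cb data from the issue: (NULL, 325) and (18, NULL).
    List<Object[]> t = Arrays.asList(
        new Object[]{null, 325}, new Object[]{18, null});
    System.out.println(join(t, t));  // prints [] : the correct empty result
  }
}
```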



 NULL is not handled correctly in join
 -

 Key: HIVE-741
 URL: https://issues.apache.org/jira/browse/HIVE-741
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Ning Zhang
Assignee: Amareshwari Sriramadasu
 Attachments: patch-741-1.txt, patch-741-2.txt, patch-741-3.txt, 
 patch-741-4.txt, patch-741-5.txt, patch-741.txt, smbjoin_nulls.q.txt


 With the following data in table input4_cb:
 Key    Value
 -----  -----
 NULL   325
 18     NULL
 The following query:
 {code}
 select * from input4_cb a join input4_cb b on a.key = b.value;
 {code}
 returns the following result:
 NULL  325  18  NULL
 The correct result should be empty set.
 When 'null' is replaced by '' it works.




[jira] Commented: (HIVE-741) NULL is not handled correctly in join

2010-08-23 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12901508#action_12901508
 ] 

He Yongqiang commented on HIVE-741:
---

also about Ning's comments:
2) SMBMapJoinOperator.compareKey() is called for each row, so it is critical 
for performance. In your code hasNullElement() could be called 4 times in the 
worst case. If you cache the result it can be called only twice.
Agree. Not sure how much overhead there is; will try to estimate it in 
production. It would be great if you can cache the null-check results, so that 
the check happens only once for each key. 
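A hedged sketch of that caching idea (illustrative names, not the actual SMBMapJoinOperator code): remember the last key object seen on each side, so the all-null scan runs once per key rather than up to four times per row.

```java
import java.util.Arrays;
import java.util.List;

public class CachedNullCheck {
  // Last key object inspected on each side, plus the cached result for it.
  private List<Object> lastLeft, lastRight;
  private boolean lastLeftNull, lastRightNull;

  // The uncached check: true iff every element of the key is null.
  static boolean allNull(List<Object> key) {
    for (Object k : key) {
      if (k != null) {
        return false;
      }
    }
    return true;
  }

  boolean leftAllNull(List<Object> key) {
    if (key != lastLeft) {          // cheap identity guard per row
      lastLeft = key;
      lastLeftNull = allNull(key);
    }
    return lastLeftNull;
  }

  boolean rightAllNull(List<Object> key) {
    if (key != lastRight) {
      lastRight = key;
      lastRightNull = allNull(key);
    }
    return lastRightNull;
  }

  /** compareKey-style rule: keys that are entirely NULL never match. */
  boolean keysCanMatch(List<Object> left, List<Object> right) {
    return !leftAllNull(left) && !rightAllNull(right) && left.equals(right);
  }

  public static void main(String[] args) {
    CachedNullCheck c = new CachedNullCheck();
    List<Object> nullKey = Arrays.<Object>asList(null, null);
    List<Object> key = Arrays.<Object>asList(18, "x");
    System.out.println(c.keysCanMatch(nullKey, nullKey));  // false
    System.out.println(c.keysCanMatch(key, key));          // true
  }
}
```

The identity comparison (`key != lastLeft`) is the cheap part; the cached boolean is only recomputed when a new key object arrives from either side.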

 NULL is not handled correctly in join
 -

 Key: HIVE-741
 URL: https://issues.apache.org/jira/browse/HIVE-741
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Ning Zhang
Assignee: Amareshwari Sriramadasu
 Attachments: patch-741-1.txt, patch-741-2.txt, patch-741-3.txt, 
 patch-741-4.txt, patch-741-5.txt, patch-741.txt, smbjoin_nulls.q.txt


 With the following data in table input4_cb:
 Key    Value
 -----  -----
 NULL   325
 18     NULL
 The following query:
 {code}
 select * from input4_cb a join input4_cb b on a.key = b.value;
 {code}
 returns the following result:
 NULL  325  18  NULL
 The correct result should be empty set.
 When 'null' is replaced by '' it works.




[jira] Updated: (HIVE-1584) wrong log files in contrib client positive

2010-08-23 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1584:
---

Status: Resolved  (was: Patch Available)
Resolution: Fixed

I just committed! Thanks Namit!

 wrong log files in contrib client positive
 --

 Key: HIVE-1584
 URL: https://issues.apache.org/jira/browse/HIVE-1584
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Testing Infrastructure
Reporter: Namit Jain
Assignee: Namit Jain
 Fix For: 0.7.0

 Attachments: hive.1584.1.patch


 TestContribCliDriver still gets some diffs




[jira] Assigned: (HIVE-1452) Mapside join on non partitioned table with partitioned table causes error

2010-08-23 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang reassigned HIVE-1452:
--

Assignee: Thiruvel Thirumoolan

Great! Assigned to Thiruvel. 
I think he is already in the contributor list, and he can just assign jiras to 
himself now.

 Mapside join on non partitioned table with partitioned table causes error
 -

 Key: HIVE-1452
 URL: https://issues.apache.org/jira/browse/HIVE-1452
 Project: Hadoop Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.6.0
Reporter: Viraj Bhat
Assignee: Thiruvel Thirumoolan

 I am running a script which joins two tables, one dynamically partitioned 
 and stored as RCFile and the other stored as a TXT file.
 The TXT file is around 397MB in size and has around 24 million rows.
 {code}
 drop table joinquery;
 create external table joinquery (
   id string,
   type string,
   sec string,
   num string,
   url string,
   cost string,
   listinfo array<map<string,string>>
 ) 
 STORED AS TEXTFILE
 LOCATION '/projects/joinquery';
 CREATE EXTERNAL TABLE idtable20mil(
 id string
 )
 STORED AS TEXTFILE
 LOCATION '/projects/idtable20mil';
 insert overwrite table joinquery
select 
   /*+ MAPJOIN(idtable20mil) */
   rctable.id,
   rctable.type,
   rctable.map['sec'],
   rctable.map['num'],
   rctable.map['url'],
   rctable.map['cost'],
   rctable.listinfo
 from rctable
 JOIN  idtable20mil on (rctable.id = idtable20mil.id)
 where
 rctable.id is not null and
 rctable.part='value' and
 rctable.subpart='value' and
 rctable.pty='100' and
 rctable.uniqid='1000'
 order by id;
 {code}
 Result:
 Possible error:
 Data file split:string,part:string,subpart:string,subsubpart:string> is 
 corrupted.
 Solution:
   Replace file. i.e. by re-running the query that produced the source table / 
 partition.
 -
 If I look at mapper logs.
 {verbatim}
 Caused by: java.io.IOException: java.io.EOFException
   at 
 org.apache.hadoop.hive.ql.exec.persistence.MapJoinObjectValue.readExternal(MapJoinObjectValue.java:109)
   at 
 java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1792)
   at 
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1751)
   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1329)
   at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
   at 
 org.apache.hadoop.hive.ql.util.jdbm.htree.HashBucket.readExternal(HashBucket.java:284)
   at 
 java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1792)
   at 
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1751)
   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1329)
   at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
   at 
 org.apache.hadoop.hive.ql.util.jdbm.helper.Serialization.deserialize(Serialization.java:106)
   at 
 org.apache.hadoop.hive.ql.util.jdbm.helper.DefaultSerializer.deserialize(DefaultSerializer.java:106)
   at 
 org.apache.hadoop.hive.ql.util.jdbm.recman.BaseRecordManager.fetch(BaseRecordManager.java:360)
   at 
 org.apache.hadoop.hive.ql.util.jdbm.recman.BaseRecordManager.fetch(BaseRecordManager.java:332)
   at 
 org.apache.hadoop.hive.ql.util.jdbm.htree.HashDirectory.get(HashDirectory.java:195)
   at org.apache.hadoop.hive.ql.util.jdbm.htree.HTree.get(HTree.java:155)
   at 
 org.apache.hadoop.hive.ql.exec.persistence.HashMapWrapper.get(HashMapWrapper.java:114)
   ... 11 more
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at 
 java.io.ObjectInputStream$BlockDataInputStream.readInt(ObjectInputStream.java:2776)
   at java.io.ObjectInputStream.readInt(ObjectInputStream.java:950)
   at org.apache.hadoop.io.BytesWritable.readFields(BytesWritable.java:153)
   at 
 org.apache.hadoop.hive.ql.exec.persistence.MapJoinObjectValue.readExternal(MapJoinObjectValue.java:98)
 {verbatim}
 I am trying to create a testcase, which can demonstrate this error.




[jira] Created: (HIVE-1581) CompactIndexInputFormat should create split only for files in the index output file.

2010-08-22 Thread He Yongqiang (JIRA)
CompactIndexInputFormat should create split only for files in the index output 
file.


 Key: HIVE-1581
 URL: https://issues.apache.org/jira/browse/HIVE-1581
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: He Yongqiang
Assignee: He Yongqiang
 Attachments: HIVE-1581.1.patch

We can get a list of files from the index file, so no need to create splits 
based on all files in the base table/partition




[jira] Updated: (HIVE-1581) CompactIndexInputFormat should create split only for files in the index output file.

2010-08-22 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1581:
---

Attachment: HIVE-1581.1.patch

 CompactIndexInputFormat should create split only for files in the index 
 output file.
 

 Key: HIVE-1581
 URL: https://issues.apache.org/jira/browse/HIVE-1581
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: He Yongqiang
Assignee: He Yongqiang
 Attachments: HIVE-1581.1.patch


 We can get a list of files from the index file, so no need to create splits 
 based on all files in the base table/partition




[jira] Updated: (HIVE-1581) CompactIndexInputFormat should create split only for files in the index output file.

2010-08-22 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1581:
---

Attachment: (was: HIVE-1581.1.patch)

 CompactIndexInputFormat should create split only for files in the index 
 output file.
 

 Key: HIVE-1581
 URL: https://issues.apache.org/jira/browse/HIVE-1581
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: He Yongqiang
Assignee: He Yongqiang

 We can get a list of files from the index file, so no need to create splits 
 based on all files in the base table/partition




[jira] Created: (HIVE-1582) merge mapfiles task behaves incorrectly for 'inserting overwrite directory...'

2010-08-22 Thread He Yongqiang (JIRA)
merge mapfiles task behaves incorrectly for 'inserting overwrite directory...'
--

 Key: HIVE-1582
 URL: https://issues.apache.org/jira/browse/HIVE-1582
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: He Yongqiang


hive> SET hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
hive> SET hive.exec.compress.output=false;
hive> INSERT OVERWRITE DIRECTORY 'x'
    > SELECT  from  a;
Total MapReduce jobs = 2
Launching Job 1 out of 2
Number of reduce tasks is set to 0 since there's no reduce operator
..
Ended Job = job_201008191557_54169
Ended Job = 450290112, job is filtered out (removed at runtime).
Launching Job 2 out of 2
.

the second job should not get started.




[jira] Updated: (HIVE-1581) CompactIndexInputFormat should create split only for files in the index output file.

2010-08-22 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1581:
---

Attachment: HIVE-1581.1.patch

 CompactIndexInputFormat should create split only for files in the index 
 output file.
 

 Key: HIVE-1581
 URL: https://issues.apache.org/jira/browse/HIVE-1581
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: He Yongqiang
Assignee: He Yongqiang
 Attachments: HIVE-1581.1.patch


 We can get a list of files from the index file, so no need to create splits 
 based on all files in the base table/partition




[jira] Updated: (HIVE-1581) CompactIndexInputFormat should create split only for files in the index output file.

2010-08-22 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1581:
---

Status: Patch Available  (was: Open)

 CompactIndexInputFormat should create split only for files in the index 
 output file.
 

 Key: HIVE-1581
 URL: https://issues.apache.org/jira/browse/HIVE-1581
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: He Yongqiang
Assignee: He Yongqiang
 Attachments: HIVE-1581.1.patch


 We can get a list of files from the index file, so no need to create splits 
 based on all files in the base table/partition




[jira] Commented: (HIVE-1582) merge mapfiles task behaves incorrectly for 'inserting overwrite directory...'

2010-08-22 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12901242#action_12901242
 ] 

He Yongqiang commented on HIVE-1582:


Ended Job = 450290112, job is filtered out (removed at runtime).

the second job seems to be filtered out at runtime

 merge mapfiles task behaves incorrectly for 'inserting overwrite directory...'
 --

 Key: HIVE-1582
 URL: https://issues.apache.org/jira/browse/HIVE-1582
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: He Yongqiang

 hive> SET hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
 hive> SET hive.exec.compress.output=false;
 hive> INSERT OVERWRITE DIRECTORY 'x'
     > SELECT  from  a;
 Total MapReduce jobs = 2
 Launching Job 1 out of 2
 Number of reduce tasks is set to 0 since there's no reduce operator
 ..
 Ended Job = job_201008191557_54169
 Ended Job = 450290112, job is filtered out (removed at runtime).
 Launching Job 2 out of 2
 .
 the second job should not get started.




[jira] Commented: (HIVE-1510) HiveCombineInputFormat should not use prefix matching to find the partitionDesc for a given path

2010-08-20 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900904#action_12900904
 ] 

He Yongqiang commented on HIVE-1510:


even without this patch, the 0.17 test failed on index_compat3.q. Please file a 
separate jira for this issue. 

 HiveCombineInputFormat should not use prefix matching to find the 
 partitionDesc for a given path
 

 Key: HIVE-1510
 URL: https://issues.apache.org/jira/browse/HIVE-1510
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang
 Attachments: hive-1510.1.patch, hive-1510.3.patch, hive-1510.4.patch


 set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
 drop table combine_3_srcpart_seq_rc;
 create table combine_3_srcpart_seq_rc (key int , value string) partitioned by 
 (ds string, hr string) stored as sequencefile;
 insert overwrite table combine_3_srcpart_seq_rc partition (ds='2010-08-03', 
 hr='00') select * from src;
 alter table combine_3_srcpart_seq_rc set fileformat rcfile;
 insert overwrite table combine_3_srcpart_seq_rc partition (ds='2010-08-03', 
 hr='001') select * from src;
 desc extended combine_3_srcpart_seq_rc partition(ds='2010-08-03', hr='00');
 desc extended combine_3_srcpart_seq_rc partition(ds='2010-08-03', hr='001');
 select * from combine_3_srcpart_seq_rc where ds='2010-08-03' order by key;
 drop table combine_3_srcpart_seq_rc;
 will fail.




[jira] Updated: (HIVE-1510) HiveCombineInputFormat should not use prefix matching to find the partitionDesc for a given path

2010-08-19 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1510:
---

Attachment: hive-1510.4.patch

 HiveCombineInputFormat should not use prefix matching to find the 
 partitionDesc for a given path
 

 Key: HIVE-1510
 URL: https://issues.apache.org/jira/browse/HIVE-1510
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang
 Attachments: hive-1510.1.patch, hive-1510.3.patch, hive-1510.4.patch


 set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
 drop table combine_3_srcpart_seq_rc;
 create table combine_3_srcpart_seq_rc (key int , value string) partitioned by 
 (ds string, hr string) stored as sequencefile;
 insert overwrite table combine_3_srcpart_seq_rc partition (ds='2010-08-03', 
 hr='00') select * from src;
 alter table combine_3_srcpart_seq_rc set fileformat rcfile;
 insert overwrite table combine_3_srcpart_seq_rc partition (ds='2010-08-03', 
 hr='001') select * from src;
 desc extended combine_3_srcpart_seq_rc partition(ds='2010-08-03', hr='00');
 desc extended combine_3_srcpart_seq_rc partition(ds='2010-08-03', hr='001');
 select * from combine_3_srcpart_seq_rc where ds='2010-08-03' order by key;
 drop table combine_3_srcpart_seq_rc;
 will fail.




[jira] Created: (HIVE-1567) increase hive.mapjoin.maxsize to 10 million

2010-08-19 Thread He Yongqiang (JIRA)
increase hive.mapjoin.maxsize to 10 million
---

 Key: HIVE-1567
 URL: https://issues.apache.org/jira/browse/HIVE-1567
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: He Yongqiang


I saw that on a very wide table, hive can process 1 million rows in less than 
one minute (selecting all columns).
Setting hive.mapjoin.maxsize to 100k is too restrictive. Let's increase it to 
10 million.




[jira] Commented: (HIVE-1561) smb_mapjoin_8.q returns different results in miniMr mode

2010-08-19 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900384#action_12900384
 ] 

He Yongqiang commented on HIVE-1561:


Amareshwari, did you use BucketizedHiveInputFormat for your query? SMBJoin can 
only work with BucketizedHiveInputFormat.

 smb_mapjoin_8.q returns different results in miniMr mode
 

 Key: HIVE-1561
 URL: https://issues.apache.org/jira/browse/HIVE-1561
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Joydeep Sen Sarma
Assignee: He Yongqiang

 follow on to HIVE-1523:
 ant -Dclustermode=miniMR -Dtestcase=TestCliDriver -Dqfile=smb_mapjoin_8.q test
 POSTHOOK: query: select /*+mapjoin(a)*/ * from smb_bucket4_1 a full outer 
 join smb_bucket4_2 b on a.key = b.key
 official results:
 4 val_356 NULL  NULL
 NULL  NULL  484 val_169
 2000  val_169 NULL  NULL
 NULL  NULL  3000  val_169
 4000  val_125 NULL  NULL
 in minimr mode:
 2000  val_169 NULL  NULL
 4 val_356 NULL  NULL
 2000  val_169 NULL  NULL
 4000  val_125 NULL  NULL
 NULL  NULL  5000  val_125




[jira] Assigned: (HIVE-1564) bucketizedhiveinputformat.q fails in minimr mode

2010-08-19 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang reassigned HIVE-1564:
--

Assignee: He Yongqiang

 bucketizedhiveinputformat.q fails in minimr mode
 

 Key: HIVE-1564
 URL: https://issues.apache.org/jira/browse/HIVE-1564
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Joydeep Sen Sarma
Assignee: He Yongqiang
 Fix For: 0.7.0

 Attachments: hive-1564.1.patch


 followup to HIVE-1523:
 ant -Dtestcase=TestCliDriver -Dqfile=bucketizedhiveinputformat.q 
 -Dclustermode=miniMR  clean-test test 
 [junit] Begin query: bucketizedhiveinputformat.q
 [junit] Exception: null
 [junit] java.lang.AssertionError
 [junit]   at 
 org.apache.hadoop.hive.ql.exec.ExecDriver.showJobFailDebugInfo(ExecDriver.java:788)
 [junit]   at 
 org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:624)
 [junit]   at 
 org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:120)
 ExecDriver.java:788
 // These tasks should have come from the same job.
   
 assert(ti.getJobId() == jobId);




[jira] Updated: (HIVE-1564) bucketizedhiveinputformat.q fails in minimr mode

2010-08-19 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1564:
---

   Status: Patch Available  (was: Open)
Fix Version/s: 0.7.0

 bucketizedhiveinputformat.q fails in minimr mode
 

 Key: HIVE-1564
 URL: https://issues.apache.org/jira/browse/HIVE-1564
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Joydeep Sen Sarma
Assignee: He Yongqiang
 Fix For: 0.7.0

 Attachments: hive-1564.1.patch


 followup to HIVE-1523:
 ant -Dtestcase=TestCliDriver -Dqfile=bucketizedhiveinputformat.q 
 -Dclustermode=miniMR  clean-test test 
 [junit] Begin query: bucketizedhiveinputformat.q
 [junit] Exception: null
 [junit] java.lang.AssertionError
 [junit]   at 
 org.apache.hadoop.hive.ql.exec.ExecDriver.showJobFailDebugInfo(ExecDriver.java:788)
 [junit]   at 
 org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:624)
 [junit]   at 
 org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:120)
 ExecDriver.java:788
 // These tasks should have come from the same job.
   
 assert(ti.getJobId() == jobId);




[jira] Updated: (HIVE-1564) bucketizedhiveinputformat.q fails in minimr mode

2010-08-19 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1564:
---

Attachment: hive-1564.1.patch

 bucketizedhiveinputformat.q fails in minimr mode
 

 Key: HIVE-1564
 URL: https://issues.apache.org/jira/browse/HIVE-1564
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Joydeep Sen Sarma
 Fix For: 0.7.0

 Attachments: hive-1564.1.patch


 followup to HIVE-1523:
 ant -Dtestcase=TestCliDriver -Dqfile=bucketizedhiveinputformat.q 
 -Dclustermode=miniMR  clean-test test 
 [junit] Begin query: bucketizedhiveinputformat.q
 [junit] Exception: null
 [junit] java.lang.AssertionError
 [junit]   at 
 org.apache.hadoop.hive.ql.exec.ExecDriver.showJobFailDebugInfo(ExecDriver.java:788)
 [junit]   at 
 org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:624)
 [junit]   at 
 org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:120)
 ExecDriver.java:788
 // These tasks should have come from the same job.
   
 assert(ti.getJobId() == jobId);

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1561) smb_mapjoin_8.q returns different results in miniMr mode

2010-08-19 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1561:
---

Status: Patch Available  (was: Open)

 smb_mapjoin_8.q returns different results in miniMr mode
 

 Key: HIVE-1561
 URL: https://issues.apache.org/jira/browse/HIVE-1561
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Joydeep Sen Sarma
Assignee: He Yongqiang
 Attachments: hive-1561.1.patch


 follow on to HIVE-1523:
 ant -Dclustermode=miniMR -Dtestcase=TestCliDriver -Dqfile=smb_mapjoin_8.q test
 POSTHOOK: query: select /*+mapjoin(a)*/ * from smb_bucket4_1 a full outer 
 join smb_bucket4_2 b on a.key = b.key
 official results:
 4 val_356 NULL  NULL
 NULL  NULL  484 val_169
 2000  val_169 NULL  NULL
 NULL  NULL  3000  val_169
 4000  val_125 NULL  NULL
 in minimr mode:
 2000  val_169 NULL  NULL
 4 val_356 NULL  NULL
 2000  val_169 NULL  NULL
 4000  val_125 NULL  NULL
 NULL  NULL  5000  val_125

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1561) smb_mapjoin_8.q returns different results in miniMr mode

2010-08-19 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1561:
---

Attachment: hive-1561.1.patch

 smb_mapjoin_8.q returns different results in miniMr mode
 

 Key: HIVE-1561
 URL: https://issues.apache.org/jira/browse/HIVE-1561
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Joydeep Sen Sarma
Assignee: He Yongqiang
 Attachments: hive-1561.1.patch


 follow on to HIVE-1523:
 ant -Dclustermode=miniMR -Dtestcase=TestCliDriver -Dqfile=smb_mapjoin_8.q test
 POSTHOOK: query: select /*+mapjoin(a)*/ * from smb_bucket4_1 a full outer 
 join smb_bucket4_2 b on a.key = b.key
 official results:
 4 val_356 NULL  NULL
 NULL  NULL  484 val_169
 2000  val_169 NULL  NULL
 NULL  NULL  3000  val_169
 4000  val_125 NULL  NULL
 in minimr mode:
 2000  val_169 NULL  NULL
 4 val_356 NULL  NULL
 2000  val_169 NULL  NULL
 4000  val_125 NULL  NULL
 NULL  NULL  5000  val_125

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HIVE-1569) groupby_bigdata.q fails in minimr mode

2010-08-19 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang resolved HIVE-1569.


Resolution: Invalid

local mode and miniMR use different filesystems, so there is no single 
script path that works for both.

 groupby_bigdata.q fails in minimr mode
 --

 Key: HIVE-1569
 URL: https://issues.apache.org/jira/browse/HIVE-1569
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Testing Infrastructure
Reporter: Namit Jain
Assignee: He Yongqiang



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HIVE-1572) skewjoin.q output in minimr differs from local mode

2010-08-19 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang reassigned HIVE-1572:
--

Assignee: He Yongqiang

 skewjoin.q output in minimr differs from local mode
 ---

 Key: HIVE-1572
 URL: https://issues.apache.org/jira/browse/HIVE-1572
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Joydeep Sen Sarma
Assignee: He Yongqiang

 checked in results:
 POSTHOOK: query: SELECT sum(hash(src1.key)), sum(hash(src1.val)), 
 sum(hash(src2.key)) FROM T1 src1 JOIN T2 src2 ON src1.key+1 = src2.key JOIN 
 T2 src3 ON src2.key = src3.key
 370 11003 377
 in minimr mode:
 POSTHOOK: query: SELECT sum(hash(src1.key)), sum(hash(src1.val)), 
 sum(hash(src2.key)) FROM T1 src1 JOIN T2 src2 ON src1.key+1 = src2.key JOIN 
 T2 src3 ON src2.key = src3.key
 150 4707  153
 it seems that the query is deterministic - so filing a bug.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1510) HiveCombineInputFormat should not use prefix matching to find the partitionDesc for a given path

2010-08-18 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1510:
---

Attachment: hive-1510.3.patch

 HiveCombineInputFormat should not use prefix matching to find the 
 partitionDesc for a given path
 

 Key: HIVE-1510
 URL: https://issues.apache.org/jira/browse/HIVE-1510
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang
 Attachments: hive-1510.1.patch, hive-1510.3.patch


 set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
 drop table combine_3_srcpart_seq_rc;
 create table combine_3_srcpart_seq_rc (key int , value string) partitioned by 
 (ds string, hr string) stored as sequencefile;
 insert overwrite table combine_3_srcpart_seq_rc partition (ds=2010-08-03, 
 hr=00) select * from src;
 alter table combine_3_srcpart_seq_rc set fileformat rcfile;
 insert overwrite table combine_3_srcpart_seq_rc partition (ds=2010-08-03, 
 hr=001) select * from src;
 desc extended combine_3_srcpart_seq_rc partition(ds=2010-08-03, hr=00);
 desc extended combine_3_srcpart_seq_rc partition(ds=2010-08-03, hr=001);
 select * from combine_3_srcpart_seq_rc where ds=2010-08-03 order by key;
 drop table combine_3_srcpart_seq_rc;
 will fail.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1510) HiveCombineInputFormat should not use prefix matching to find the partitionDesc for a given path

2010-08-18 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12900063#action_12900063
 ] 

He Yongqiang commented on HIVE-1510:


the IOPrepareCache is cleared in Driver, which should only contain generic 
code irrespective of task type. Can you do it in ExecDriver.execute()? This 
new cache is only used in ExecDriver anyway.

ExecDriver is per map-reduce task; Driver is per query. We should do this at 
query granularity. I think the pathToPartitionDesc is also a per-query map?


Some comments on why you need a new hash map keyed by the paths only would be 
helpful.
Will do that in the next patch.

 HiveCombineInputFormat should not use prefix matching to find the 
 partitionDesc for a given path
 

 Key: HIVE-1510
 URL: https://issues.apache.org/jira/browse/HIVE-1510
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang
 Attachments: hive-1510.1.patch, hive-1510.3.patch


 set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
 drop table combine_3_srcpart_seq_rc;
 create table combine_3_srcpart_seq_rc (key int , value string) partitioned by 
 (ds string, hr string) stored as sequencefile;
 insert overwrite table combine_3_srcpart_seq_rc partition (ds=2010-08-03, 
 hr=00) select * from src;
 alter table combine_3_srcpart_seq_rc set fileformat rcfile;
 insert overwrite table combine_3_srcpart_seq_rc partition (ds=2010-08-03, 
 hr=001) select * from src;
 desc extended combine_3_srcpart_seq_rc partition(ds=2010-08-03, hr=00);
 desc extended combine_3_srcpart_seq_rc partition(ds=2010-08-03, hr=001);
 select * from combine_3_srcpart_seq_rc where ds=2010-08-03 order by key;
 drop table combine_3_srcpart_seq_rc;
 will fail.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HIVE-1561) smb_mapjoin_8.q returns different results in miniMr mode

2010-08-18 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang reassigned HIVE-1561:
--

Assignee: He Yongqiang

 smb_mapjoin_8.q returns different results in miniMr mode
 

 Key: HIVE-1561
 URL: https://issues.apache.org/jira/browse/HIVE-1561
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Joydeep Sen Sarma
Assignee: He Yongqiang

 follow on to HIVE-1523:
 ant -Dclustermode=miniMR -Dtestcase=TestCliDriver -Dqfile=smb_mapjoin_8.q test
 POSTHOOK: query: select /*+mapjoin(a)*/ * from smb_bucket4_1 a full outer 
 join smb_bucket4_2 b on a.key = b.key
 official results:
 4 val_356 NULL  NULL
 NULL  NULL  484 val_169
 2000  val_169 NULL  NULL
 NULL  NULL  3000  val_169
 4000  val_125 NULL  NULL
 in minimr mode:
 2000  val_169 NULL  NULL
 4 val_356 NULL  NULL
 2000  val_169 NULL  NULL
 4000  val_125 NULL  NULL
 NULL  NULL  5000  val_125

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1510) HiveCombineInputFormat should not use prefix matching to find the partitionDesc for a given path

2010-08-18 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12900074#action_12900074
 ] 

He Yongqiang commented on HIVE-1510:


About the additional hashmap: it is used to match a path to its partitionDesc 
by discarding the scheme information from the partitionDesc's path. 

In the long run, we should normalize all input paths so that they contain full 
scheme and authority information. This is a must for Hive to work with 
multiple HDFS clusters.
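A hypothetical sketch of what keying a lookup by the path component only could look like (PathOnlyKey and pathOnlyKey are illustrative names, not Hive's actual code): stripping the scheme and authority lets "hdfs://nn:8020/a/b" and a bare "/a/b" resolve to the same map entry.

```java
import java.net.URI;

public class PathOnlyKey {
    // Build a cache key from the path component only, discarding the
    // scheme (hdfs:, file:, har:) and authority (host:port) if present.
    static String pathOnlyKey(String location) {
        return URI.create(location).getPath();
    }

    public static void main(String[] args) {
        System.out.println(pathOnlyKey("hdfs://namenode:8020/warehouse/t/ds=1"));
        System.out.println(pathOnlyKey("/warehouse/t/ds=1"));
        // Both print /warehouse/t/ds=1
    }
}
```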

 HiveCombineInputFormat should not use prefix matching to find the 
 partitionDesc for a given path
 

 Key: HIVE-1510
 URL: https://issues.apache.org/jira/browse/HIVE-1510
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang
 Attachments: hive-1510.1.patch, hive-1510.3.patch


 set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
 drop table combine_3_srcpart_seq_rc;
 create table combine_3_srcpart_seq_rc (key int , value string) partitioned by 
 (ds string, hr string) stored as sequencefile;
 insert overwrite table combine_3_srcpart_seq_rc partition (ds=2010-08-03, 
 hr=00) select * from src;
 alter table combine_3_srcpart_seq_rc set fileformat rcfile;
 insert overwrite table combine_3_srcpart_seq_rc partition (ds=2010-08-03, 
 hr=001) select * from src;
 desc extended combine_3_srcpart_seq_rc partition(ds=2010-08-03, hr=00);
 desc extended combine_3_srcpart_seq_rc partition(ds=2010-08-03, hr=001);
 select * from combine_3_srcpart_seq_rc where ds=2010-08-03 order by key;
 drop table combine_3_srcpart_seq_rc;
 will fail.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1561) smb_mapjoin_8.q returns different results in miniMr mode

2010-08-18 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12900161#action_12900161
 ] 

He Yongqiang commented on HIVE-1561:


This is the complete result from Hive's smb_mapjoin_8.q.out, and it is correct:
{noformat}
POSTHOOK: query: select /*+mapjoin(a)*/ * from smb_bucket4_1 a full outer join 
smb_bucket4_2 b on a.key = b.key
POSTHOOK: type: QUERY
POSTHOOK: Input: defa...@smb_bucket4_2
POSTHOOK: Input: defa...@smb_bucket4_1
POSTHOOK: Output: 
file:/tmp/jssarma/hive_2010-07-21_12-02-34_137_8141051139723931378/1
POSTHOOK: Lineage: smb_bucket4_1.key SIMPLE 
[(smb_bucket_input)smb_bucket_input.FieldSchema(name:key, type:int, 
comment:from deserializer), ]
POSTHOOK: Lineage: smb_bucket4_1.value SIMPLE 
[(smb_bucket_input)smb_bucket_input.FieldSchema(name:value, type:string, 
comment:from deserializer), ]
POSTHOOK: Lineage: smb_bucket4_2.key SIMPLE 
[(smb_bucket_input)smb_bucket_input.FieldSchema(name:key, type:int, 
comment:from deserializer), ]
POSTHOOK: Lineage: smb_bucket4_2.value SIMPLE 
[(smb_bucket_input)smb_bucket_input.FieldSchema(name:value, type:string, 
comment:from deserializer), ]
4       val_356 NULL    NULL
NULL    NULL    484     val_169
2000    val_169 NULL    NULL
NULL    NULL    3000    val_169
4000    val_125 NULL    NULL
NULL    NULL    5000    val_125
{noformat}




 smb_mapjoin_8.q returns different results in miniMr mode
 

 Key: HIVE-1561
 URL: https://issues.apache.org/jira/browse/HIVE-1561
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Joydeep Sen Sarma
Assignee: He Yongqiang

 follow on to HIVE-1523:
 ant -Dclustermode=miniMR -Dtestcase=TestCliDriver -Dqfile=smb_mapjoin_8.q test
 POSTHOOK: query: select /*+mapjoin(a)*/ * from smb_bucket4_1 a full outer 
 join smb_bucket4_2 b on a.key = b.key
 official results:
 4 val_356 NULL  NULL
 NULL  NULL  484 val_169
 2000  val_169 NULL  NULL
 NULL  NULL  3000  val_169
 4000  val_125 NULL  NULL
 in minimr mode:
 2000  val_169 NULL  NULL
 4 val_356 NULL  NULL
 2000  val_169 NULL  NULL
 4000  val_125 NULL  NULL
 NULL  NULL  5000  val_125

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1203) HiveInputFormat.getInputFormatFromCache swallows cause exception when trowing IOExcpetion

2010-08-17 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12899488#action_12899488
 ] 

He Yongqiang commented on HIVE-1203:


Vladimir, can you update the patch? After that, I will test and commit it.

 HiveInputFormat.getInputFormatFromCache swallows  cause exception when 
 trowing IOExcpetion
 

 Key: HIVE-1203
 URL: https://issues.apache.org/jira/browse/HIVE-1203
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.4.0, 0.4.1, 0.5.0
Reporter: Vladimir Klimontovich
Assignee: Vladimir Klimontovich
 Attachments: 0.4.patch, 0.5.patch, trunk.patch


 To fix this, we simply need to pass the cause as the second parameter to the 
 IOException constructor. Patches for 0.4, 0.5, and trunk are available.
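For illustration, chaining the cause through the two-argument IOException constructor preserves the original stack trace instead of swallowing it (a generic Java sketch under that description, not the actual patch; WrapCause and the message strings are made up):

```java
import java.io.IOException;

public class WrapCause {
    // Wrap a low-level failure in an IOException, keeping the cause chained
    // so callers can still see the original exception and stack trace.
    static IOException wrap(String msg, Throwable cause) {
        return new IOException(msg, cause);
    }

    public static void main(String[] args) {
        IOException e = wrap("cannot load input format class",
                new ClassNotFoundException("SomeInputFormat"));
        System.out.println(e.getCause().getMessage()); // prints SomeInputFormat
    }
}
```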

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1203) HiveInputFormat.getInputFormatFromCache swallows cause exception when trowing IOExcpetion

2010-08-17 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1203:
---

   Status: Open  (was: Patch Available)
Affects Version/s: (was: 0.4.0)
   (was: 0.5.0)
   (was: 0.4.1)
Fix Version/s: 0.7.0

 HiveInputFormat.getInputFormatFromCache swallows  cause exception when 
 trowing IOExcpetion
 

 Key: HIVE-1203
 URL: https://issues.apache.org/jira/browse/HIVE-1203
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Vladimir Klimontovich
Assignee: Vladimir Klimontovich
 Fix For: 0.7.0

 Attachments: 0.4.patch, 0.5.patch, trunk.patch


 To fix this, we simply need to pass the cause as the second parameter to the 
 IOException constructor. Patches for 0.4, 0.5, and trunk are available.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1548) populate inputs and outputs for all statements

2010-08-17 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12899584#action_12899584
 ] 

He Yongqiang commented on HIVE-1548:


running test now.

 populate inputs and outputs for all statements
 --

 Key: HIVE-1548
 URL: https://issues.apache.org/jira/browse/HIVE-1548
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.1548.1.patch


 Currently, they are only populated for queries, not for most DDL statements.
 The pre- and post-execution hooks do not get the correct values.
 It would also be very useful for locking.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1548) populate inputs and outputs for all statements

2010-08-17 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1548:
---

   Status: Resolved  (was: Patch Available)
Fix Version/s: 0.7.0
   Resolution: Fixed

I just committed! Thanks Namit!

 populate inputs and outputs for all statements
 --

 Key: HIVE-1548
 URL: https://issues.apache.org/jira/browse/HIVE-1548
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Fix For: 0.7.0

 Attachments: hive.1548.1.patch


 Currently, they are only populated for queries, not for most DDL statements.
 The pre- and post-execution hooks do not get the correct values.
 It would also be very useful for locking.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-741) NULL is not handled correctly in join

2010-08-16 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12898996#action_12898996
 ] 

He Yongqiang commented on HIVE-741:
---

The change looks good to me. Can you also add one or a few tests for 
sort-merge join?

 NULL is not handled correctly in join
 -

 Key: HIVE-741
 URL: https://issues.apache.org/jira/browse/HIVE-741
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Ning Zhang
Assignee: Amareshwari Sriramadasu
 Attachments: patch-741.txt


 With the following data in table input4_cb:
 Key    Value
 ----   -----
 NULL   325
 18     NULL
 The following query:
 {code}
 select * from input4_cb a join input4_cb b on a.key = b.value;
 {code}
 returns the following result:
 NULL   325   18   NULL
 The correct result should be an empty set.
 When 'null' is replaced by '' it works.
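The expected semantics can be sketched as follows (an illustrative helper, not Hive's join code; NullJoinKey and joinKeyMatches are made-up names): under SQL semantics a NULL join key compares as unknown, so it must never match anything, including another NULL.

```java
public class NullJoinKey {
    // A NULL key on either side never produces a join match.
    static boolean joinKeyMatches(Object a, Object b) {
        if (a == null || b == null) {
            return false; // NULL = anything is unknown, not true
        }
        return a.equals(b);
    }

    public static void main(String[] args) {
        System.out.println(joinKeyMatches(null, null)); // false
        System.out.println(joinKeyMatches(18, 18));     // true
    }
}
```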

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1510) HiveCombineInputFormat should not use prefix matching to find the partitionDesc for a given path

2010-08-16 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12899086#action_12899086
 ] 

He Yongqiang commented on HIVE-1510:


Since the fix for HIVE-1515 depends on Hadoop, can we close this JIRA without 
adding new archive test cases?

 HiveCombineInputFormat should not use prefix matching to find the 
 partitionDesc for a given path
 

 Key: HIVE-1510
 URL: https://issues.apache.org/jira/browse/HIVE-1510
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang
 Attachments: hive-1510.1.patch


 set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
 drop table combine_3_srcpart_seq_rc;
 create table combine_3_srcpart_seq_rc (key int , value string) partitioned by 
 (ds string, hr string) stored as sequencefile;
 insert overwrite table combine_3_srcpart_seq_rc partition (ds=2010-08-03, 
 hr=00) select * from src;
 alter table combine_3_srcpart_seq_rc set fileformat rcfile;
 insert overwrite table combine_3_srcpart_seq_rc partition (ds=2010-08-03, 
 hr=001) select * from src;
 desc extended combine_3_srcpart_seq_rc partition(ds=2010-08-03, hr=00);
 desc extended combine_3_srcpart_seq_rc partition(ds=2010-08-03, hr=001);
 select * from combine_3_srcpart_seq_rc where ds=2010-08-03 order by key;
 drop table combine_3_srcpart_seq_rc;
 will fail.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1543) set abort in ExecMapper when Hive's record reader got an IOException

2010-08-16 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12899098#action_12899098
 ] 

He Yongqiang commented on HIVE-1543:


Let's do it in HiveContextAwareRecordReader. And maybe store the var in 
IOContext?

 set abort in ExecMapper when Hive's record reader got an IOException
 

 Key: HIVE-1543
 URL: https://issues.apache.org/jira/browse/HIVE-1543
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Ning Zhang
Assignee: Ning Zhang
 Fix For: 0.7.0

 Attachments: HIVE-1543.patch


 When the RecordReader gets an IOException, ExecMapper does not know about it 
 and will close the operators as if there were no error. We should catch this 
 exception and avoid writing partial results to HDFS, which would be removed 
 later anyway.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1543) set abort in ExecMapper when Hive's record reader got an IOException

2010-08-16 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12899169#action_12899169
 ] 

He Yongqiang commented on HIVE-1543:


We can do two different patches for trunk and 0.6.

I think BucketizedHiveRecordReader also extends HiveContextAwareRecordReader.

 set abort in ExecMapper when Hive's record reader got an IOException
 

 Key: HIVE-1543
 URL: https://issues.apache.org/jira/browse/HIVE-1543
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Ning Zhang
Assignee: Ning Zhang
 Fix For: 0.7.0

 Attachments: HIVE-1543.patch


 When the RecordReader gets an IOException, ExecMapper does not know about it 
 and will close the operators as if there were no error. We should catch this 
 exception and avoid writing partial results to HDFS, which would be removed 
 later anyway.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1532) Replace globStatus with listStatus inside Hive.java's replaceFiles.

2010-08-14 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1532:
---

Attachment: Hive-1532.1.patch

 Replace globStatus with listStatus inside Hive.java's replaceFiles.
 ---

 Key: HIVE-1532
 URL: https://issues.apache.org/jira/browse/HIVE-1532
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang
 Attachments: Hive-1532.1.patch


 globStatus expects a regular expression, so if there are special characters 
 (like '{' or '[') in the file path, this function will fail.
 We should be able to replace this call with listStatus easily, since we are 
 not passing a regex to replaceFiles(). The only places replaceFiles is called 
 are loadPartition and Table's replaceFiles.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1514) Be able to modify a partition's fileformat and file location information.

2010-08-13 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12898350#action_12898350
 ] 

He Yongqiang commented on HIVE-1514:


I updated the wiki page here :
http://wiki.apache.org/hadoop/Hive/LanguageManual/DDL#Alter_Table.2BAC8-Partition_Location

This only changes the metadata. With this patch, you can point the partition 
at an external location and use a new file format. If the metadata you specify 
is correct, this will work.

 Be able to modify a partition's fileformat and file location information.
 -

 Key: HIVE-1514
 URL: https://issues.apache.org/jira/browse/HIVE-1514
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: He Yongqiang
Assignee: He Yongqiang
 Attachments: hive-1514.1.patch, hive-1514.2.patch, hive-1514.3.patch




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1495) supply correct information to hooks and lineage for index rebuild

2010-08-12 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1495:
---

Attachment: hive-1495.5.patch

Sorry, forgot to update outputs for these two testcases. Will be more careful 
next time.

 supply correct information to hooks and lineage for index rebuild
 -

 Key: HIVE-1495
 URL: https://issues.apache.org/jira/browse/HIVE-1495
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Indexing
Affects Versions: 0.7.0
Reporter: John Sichi
Assignee: He Yongqiang
 Fix For: 0.7.0

 Attachments: hive-1495.1.patch, hive-1495.2.patch, hive-1495.3.patch, 
 hive-1495.4.patch, hive-1495.5.patch


 This is a followup for HIVE-417.  
 Ashish can probably help on how this should work.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1515) archive is not working when multiple partitions inside one table are archived.

2010-08-12 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1515:
---

Attachment: hive-1515.2.patch

Attached a possible fix.

I talked with Namit and Paul this afternoon about this issue. There is 
actually a config setting that can disable the FileSystem cache: 
fs.%s.impl.disable.cache, where %s is the filesystem scheme; for archives it 
is har.

So if you set fs.har.impl.disable.cache to true, archiving will work 
automatically. This should be the clean way to fix this issue.
To do this, you need to apply 
https://issues.apache.org/jira/browse/HADOOP-6231 if your Hadoop does not 
include the code to disable the FileSystem cache.

 archive is not working when multiple partitions inside one table are archived.
 --

 Key: HIVE-1515
 URL: https://issues.apache.org/jira/browse/HIVE-1515
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: He Yongqiang
Assignee: He Yongqiang
 Attachments: hive-1515.1.patch, hive-1515.2.patch


 set hive.exec.compress.output = true;
 set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
 set mapred.min.split.size=256;
 set mapred.min.split.size.per.node=256;
 set mapred.min.split.size.per.rack=256;
 set mapred.max.split.size=256;
 set hive.archive.enabled = true;
 drop table combine_3_srcpart_seq_rc;
 create table combine_3_srcpart_seq_rc (key int , value string) partitioned by 
 (ds string, hr string) stored as sequencefile;
 insert overwrite table combine_3_srcpart_seq_rc partition (ds=2010-08-03, 
 hr=00) select * from src;
 insert overwrite table combine_3_srcpart_seq_rc partition (ds=2010-08-03, 
 hr=001) select * from src;
 ALTER TABLE combine_3_srcpart_seq_rc ARCHIVE PARTITION (ds=2010-08-03, 
 hr=00);
 ALTER TABLE combine_3_srcpart_seq_rc ARCHIVE PARTITION (ds=2010-08-03, 
 hr=001);
 select key, value, ds, hr from combine_3_srcpart_seq_rc where ds=2010-08-03 
 order by key, hr limit 30;
 drop table combine_3_srcpart_seq_rc;
 will fail.
 java.io.IOException: Invalid file name: 
 har:/data/users/heyongqiang/hive-trunk-clean/build/ql/test/data/warehouse/combine_3_srcpart_seq_rc/ds=2010-08-03/hr=001/data.har/data/users/heyongqiang/hive-trunk-clean/build/ql/test/data/warehouse/combine_3_srcpart_seq_rc/ds=2010-08-03/hr=001
  in 
 har:/data/users/heyongqiang/hive-trunk-clean/build/ql/test/data/warehouse/combine_3_srcpart_seq_rc/ds=2010-08-03/hr=00/data.har
 It fails because there are 2 input paths (one for each partition) for the 
 above query:
 1): 
 har:/Users/heyongqiang/Documents/workspace/Hive-Index/build/ql/test/data/warehouse/combine_3_srcpart_seq_rc/ds=2010-08-03/hr=00/data.har/Users/heyongqiang/Documents/workspace/Hive-Index/build/ql/test/data/warehouse/combine_3_srcpart_seq_rc/ds=2010-08-03/hr=00
 2): 
 har:/Users/heyongqiang/Documents/workspace/Hive-Index/build/ql/test/data/warehouse/combine_3_srcpart_seq_rc/ds=2010-08-03/hr=001/data.har/Users/heyongqiang/Documents/workspace/Hive-Index/build/ql/test/data/warehouse/combine_3_srcpart_seq_rc/ds=2010-08-03/hr=001
 But when calling path.getFileSystem() for these 2 input paths, both return 
 the same FileSystem instance, which points to the first caller's path, in 
 this case 
 har:/Users/heyongqiang/Documents/workspace/Hive-Index/build/ql/test/data/warehouse/combine_3_srcpart_seq_rc/ds=2010-08-03/hr=00/data.har
 The reason is that Hadoop's FileSystem keeps a global cache, and when loading 
 a FileSystem instance for a given path, it only uses the path's scheme and 
 username to look up the cache. So when we call Path.getFileSystem for the 
 second har path, it actually returns the filesystem handle for the first 
 path.
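The collision described above can be illustrated with plain java.net.URI (a sketch, not Hadoop's actual Cache.Key class, which per the comment also factors in the user): both har paths share the same scheme and authority, so a cache keyed on those parts returns one entry for both.

```java
import java.net.URI;

public class FsCacheKey {
    // Illustrative cache key built from scheme + authority only; the
    // per-partition path component is ignored, which is the root cause here.
    static String cacheKey(String location) {
        URI u = URI.create(location);
        return u.getScheme() + "://" + u.getAuthority();
    }

    public static void main(String[] args) {
        String p1 = "har:/warehouse/t/ds=1/hr=00/data.har";
        String p2 = "har:/warehouse/t/ds=1/hr=001/data.har";
        // Same key, so the second lookup gets the first partition's FileSystem.
        System.out.println(cacheKey(p1).equals(cacheKey(p2))); // true
    }
}
```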

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1515) archive is not working when multiple partitions inside one table are archived.

2010-08-12 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1515:
---

Assignee: (was: He Yongqiang)

 archive is not working when multiple partitions inside one table are archived.
 --

 Key: HIVE-1515
 URL: https://issues.apache.org/jira/browse/HIVE-1515
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: He Yongqiang
 Attachments: hive-1515.1.patch, hive-1515.2.patch


 set hive.exec.compress.output = true;
 set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
 set mapred.min.split.size=256;
 set mapred.min.split.size.per.node=256;
 set mapred.min.split.size.per.rack=256;
 set mapred.max.split.size=256;
 set hive.archive.enabled = true;
 drop table combine_3_srcpart_seq_rc;
 create table combine_3_srcpart_seq_rc (key int , value string) partitioned by 
 (ds string, hr string) stored as sequencefile;
 insert overwrite table combine_3_srcpart_seq_rc partition (ds=2010-08-03, 
 hr=00) select * from src;
 insert overwrite table combine_3_srcpart_seq_rc partition (ds=2010-08-03, 
 hr=001) select * from src;
 ALTER TABLE combine_3_srcpart_seq_rc ARCHIVE PARTITION (ds=2010-08-03, 
 hr=00);
 ALTER TABLE combine_3_srcpart_seq_rc ARCHIVE PARTITION (ds=2010-08-03, 
 hr=001);
 select key, value, ds, hr from combine_3_srcpart_seq_rc where ds=2010-08-03 
 order by key, hr limit 30;
 drop table combine_3_srcpart_seq_rc;
 will fail.
 java.io.IOException: Invalid file name: 
 har:/data/users/heyongqiang/hive-trunk-clean/build/ql/test/data/warehouse/combine_3_srcpart_seq_rc/ds=2010-08-03/hr=001/data.har/data/users/heyongqiang/hive-trunk-clean/build/ql/test/data/warehouse/combine_3_srcpart_seq_rc/ds=2010-08-03/hr=001
  in 
 har:/data/users/heyongqiang/hive-trunk-clean/build/ql/test/data/warehouse/combine_3_srcpart_seq_rc/ds=2010-08-03/hr=00/data.har
 The reason it fails is because:
 there are 2 input paths (one for each partition) for the above query:
 1): 
 har:/Users/heyongqiang/Documents/workspace/Hive-Index/build/ql/test/data/warehouse/combine_3_srcpart_seq_rc/ds=2010-08-03/hr=00/data.har/Users/heyongqiang/Documents/workspace/Hive-Index/build/ql/test/data/warehouse/combine_3_srcpart_seq_rc/ds=2010-08-03/hr=00
 2): 
 har:/Users/heyongqiang/Documents/workspace/Hive-Index/build/ql/test/data/warehouse/combine_3_srcpart_seq_rc/ds=2010-08-03/hr=001/data.har/Users/heyongqiang/Documents/workspace/Hive-Index/build/ql/test/data/warehouse/combine_3_srcpart_seq_rc/ds=2010-08-03/hr=001
 But when path.getFileSystem() is called for these two input paths, both calls 
 return the same file system instance, the one created for the first caller, in 
 this case
 har:/Users/heyongqiang/Documents/workspace/Hive-Index/build/ql/test/data/warehouse/combine_3_srcpart_seq_rc/ds=2010-08-03/hr=00/data.har
 The reason is that Hadoop's FileSystem keeps a global cache, and when loading a 
 FileSystem instance for a given path it uses only the path's scheme and user 
 name as the cache key. So Path.getFileSystem for the second har path actually 
 returns the file system handle created for the first path.
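The cache behavior described above can be sketched with a toy cache keyed the same way. This is an illustration only, assuming a key of scheme + authority; the class and method names below are made up and are not Hadoop's actual FileSystem.Cache API:

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Toy model of a file system cache keyed only by scheme + authority,
// ignoring the rest of the path (illustrative, not Hadoop's real Cache).
public class FsCacheDemo {
    static final Map<String, String> CACHE = new HashMap<>();

    // Returns a "file system handle" for the URI, creating one on first use.
    static String getFileSystem(URI uri) {
        String key = uri.getScheme() + "://" + Objects.toString(uri.getAuthority(), "");
        return CACHE.computeIfAbsent(key, k -> "fs-for-" + uri.getPath());
    }

    public static void main(String[] args) {
        URI first  = URI.create("har:///warehouse/t/ds=2010-08-03/hr=00/data.har");
        URI second = URI.create("har:///warehouse/t/ds=2010-08-03/hr=001/data.har");
        // Both har paths reduce to the same cache key ("har://"), so the
        // second lookup silently returns the handle built for the first path.
        System.out.println(getFileSystem(first).equals(getFileSystem(second))); // prints "true"
    }
}
```

This is exactly why the second partition's split ends up validated against the first partition's archive in the stack trace above.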

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1535) alter partition should throw exception if the specified partition does not exist.

2010-08-12 Thread He Yongqiang (JIRA)
alter partition should throw exception if the specified partition does not 
exist.
-

 Key: HIVE-1535
 URL: https://issues.apache.org/jira/browse/HIVE-1535
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang







[jira] Assigned: (HIVE-1532) Replace globStatus with listStatus inside Hive.java's replaceFiles.

2010-08-12 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang reassigned HIVE-1532:
--

Assignee: He Yongqiang

 Replace globStatus with listStatus inside Hive.java's replaceFiles.
 ---

 Key: HIVE-1532
 URL: https://issues.apache.org/jira/browse/HIVE-1532
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang

 globStatus expects a regular expression, so if there are special characters 
 (like '{' or '[') in the file path, this function will fail.
 We should be able to replace this call with listStatus easily, since we are not 
 passing a regex to replaceFiles(). The only places replaceFiles is called are 
 loadPartition and Table's replaceFiles.
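The glob-versus-literal mismatch can be demonstrated with the JDK's own glob matcher, which stands in here for Hadoop's globStatus pattern handling; the directory name "hr={00}" is made up for illustration:

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;

public class GlobDemo {
    public static void main(String[] args) {
        // A directory name that happens to contain glob metacharacters.
        String literalName = "hr={00}";

        // Interpreted as a glob, "{00}" is an alternation group, so the
        // pattern matches "hr=00" but NOT the literal name "hr={00}".
        PathMatcher glob = FileSystems.getDefault().getPathMatcher("glob:" + literalName);
        System.out.println(glob.matches(Path.of("hr=00")));      // prints "true"
        System.out.println(glob.matches(Path.of(literalName)));  // prints "false"

        // A plain listing (the listStatus analogue) compares names literally,
        // so a file with braces in its name is found as-is.
        System.out.println(Path.of(literalName).getFileName().toString().equals(literalName)); // prints "true"
    }
}
```

So a path that merely contains '{' or '[' either matches the wrong files or fails outright when fed to a glob API, while a literal listing is unaffected.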




[jira] Assigned: (HIVE-1522) replace columns should prohibit using partition column names.

2010-08-12 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang reassigned HIVE-1522:
--

Assignee: He Yongqiang

 replace columns should prohibit using partition column names.
 -

 Key: HIVE-1522
 URL: https://issues.apache.org/jira/browse/HIVE-1522
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang

 create table src_part_w(key int , value string) partitioned by (ds string, hr 
 int);
 alter table src_part_w  replace columns (key int, ds string, hr int, value 
 string);
 should not be allowed. Once the alter table replace columns ... is done, all 
 commands on this table fail, and there is no way to change the schema back.




[jira] Updated: (HIVE-1535) alter partition should throw exception if the specified partition does not exist.

2010-08-12 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1535:
---

Attachment: hive-1535.1.patch

No negative tests are included because Hive's tests run against the local meta 
store, which already throws an exception if the partition does not exist. So the 
problem does not show up with the local meta store.

 alter partition should throw exception if the specified partition does not 
 exist.
 -

 Key: HIVE-1535
 URL: https://issues.apache.org/jira/browse/HIVE-1535
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang
 Attachments: hive-1535.1.patch







[jira] Updated: (HIVE-1535) alter partition should throw exception if the specified partition does not exist.

2010-08-12 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1535:
---

Status: Patch Available  (was: Open)

 alter partition should throw exception if the specified partition does not 
 exist.
 -

 Key: HIVE-1535
 URL: https://issues.apache.org/jira/browse/HIVE-1535
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang
 Attachments: hive-1535.1.patch







[jira] Updated: (HIVE-1495) supply correct information to hooks and lineage for index rebuild

2010-08-11 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1495:
---

Attachment: hive-1495.3.patch

 supply correct information to hooks and lineage for index rebuild
 -

 Key: HIVE-1495
 URL: https://issues.apache.org/jira/browse/HIVE-1495
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Indexing
Affects Versions: 0.7.0
Reporter: John Sichi
Assignee: He Yongqiang
 Fix For: 0.7.0

 Attachments: hive-1495.1.patch, hive-1495.2.patch, hive-1495.3.patch


 This is a followup for HIVE-417.  
 Ashish can probably help on how this should work.



