[jira] Created: (HIVE-1691) ANALYZE TABLE command should check columns in partition spec

2010-10-05 Thread Ning Zhang (JIRA)
ANALYZE TABLE command should check columns in partition spec
-

 Key: HIVE-1691
 URL: https://issues.apache.org/jira/browse/HIVE-1691
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Ning Zhang
Assignee: Ning Zhang


ANALYZE TABLE PARTITION (col1, col2, ...) should check whether col1, col2, etc. 
are partition columns.
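A minimal sketch of the proposed check (Python, illustrative only; the function name `validate_partition_spec` is hypothetical, not Hive's actual API): every column named in the PARTITION (...) clause must be one of the table's declared partition columns.

```python
def validate_partition_spec(partition_cols, spec_cols):
    """Reject any column in the PARTITION (...) clause that is not
    a declared partition column of the table."""
    declared = set(partition_cols)
    bad = [c for c in spec_cols if c not in declared]
    if bad:
        raise ValueError(
            "Not partition columns: %s (expected a subset of: %s)"
            % (", ".join(bad), ", ".join(partition_cols)))

# A table partitioned by (ds, hr): spec (ds) is valid, (ds, user) is not.
validate_partition_spec(["ds", "hr"], ["ds"])
```

The same check would reject `ANALYZE TABLE t PARTITION (user)` at compile time instead of failing later.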

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1691) ANALYZE TABLE command should check columns in partition spec

2010-10-05 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1691:
-

Attachment: HIVE-1691.patch

 ANALYZE TABLE command should check columns in partition spec
 -

 Key: HIVE-1691
 URL: https://issues.apache.org/jira/browse/HIVE-1691
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-1691.patch


 ANALYZE TABLE PARTITION (col1, col2, ...) should check whether col1, col2, etc. 
 are partition columns.




[jira] Updated: (HIVE-1691) ANALYZE TABLE command should check columns in partition spec

2010-10-05 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1691:
-

Status: Patch Available  (was: Open)

 ANALYZE TABLE command should check columns in partition spec
 -

 Key: HIVE-1691
 URL: https://issues.apache.org/jira/browse/HIVE-1691
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-1691.patch


 ANALYZE TABLE PARTITION (col1, col2, ...) should check whether col1, col2, etc. 
 are partition columns.




[jira] Updated: (HIVE-1674) count(*) returns wrong result when a mapper returns empty results

2010-10-04 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1674:
-

Attachment: HIVE-1674.patch

 count(*) returns wrong result when a mapper returns empty results
 -

 Key: HIVE-1674
 URL: https://issues.apache.org/jira/browse/HIVE-1674
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-1674.patch


 select count(*) from src where false; will return the number of mappers rather than 0. 
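One way such a bug can arise (illustrative Python, not Hive's actual code): each mapper emits a partial-count row even when its filtered input is empty, and the final aggregation counts those rows instead of summing their values, so the result equals the number of mappers.

```python
def mapper_partials(rows):
    # Each mapper emits one partial count, even when it saw no rows.
    return [len(rows)]

def buggy_reduce(partials):
    # Bug: counts the partial rows instead of summing their values.
    return len(partials)

def correct_reduce(partials):
    # Fix: sum the partial counts, so empty mappers contribute 0.
    return sum(partials)

# Three mappers, all filtered to empty input (WHERE false):
partials = [p for m in ([], [], []) for p in mapper_partials(m)]
buggy_reduce(partials)    # equals the number of mappers
correct_reduce(partials)  # 0
```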




[jira] Updated: (HIVE-1674) count(*) returns wrong result when a mapper returns empty results

2010-10-04 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1674:
-

Status: Patch Available  (was: Open)

 count(*) returns wrong result when a mapper returns empty results
 -

 Key: HIVE-1674
 URL: https://issues.apache.org/jira/browse/HIVE-1674
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-1674.patch


 select count(*) from src where false; will return the number of mappers rather than 0. 




[jira] Updated: (HIVE-1376) Simple UDAFs with more than 1 parameter crash on empty row query

2010-10-04 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1376:
-

Status: Patch Available  (was: Open)

 Simple UDAFs with more than 1 parameter crash on empty row query 
 -

 Key: HIVE-1376
 URL: https://issues.apache.org/jira/browse/HIVE-1376
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.6.0
Reporter: Mayank Lahiri
Assignee: Ning Zhang
 Attachments: HIVE-1376.2.patch, HIVE-1376.patch


 Simple UDAFs with more than 1 parameter crash when the query returns no rows. 
 Currently, this only seems to affect the percentile() UDAF where the second 
 parameter is the percentile to be computed (of type double). I've also 
 verified the bug by adding a dummy parameter to ExampleMin in contrib. 
 On an empty query, Hive seems to be trying to resolve an iterate() method 
 with signature {null,null} instead of {null,double}. You can reproduce this 
 bug using:
 CREATE TABLE pct_test ( val INT );
 SELECT percentile(val, 0.5) FROM pct_test;
 which produces a lot of errors like: 
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to 
 execute method public boolean 
 org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator.iterate(org.apache.hadoop.io.LongWritable,double)
   on object 
 org.apache.hadoop.hive.ql.udf.udafpercentile$percentilelongevalua...@11d13272 
 of class org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator 
 with arguments {null, null} of size 2




[jira] Commented: (HIVE-1526) Hive should depend on a release version of Thrift

2010-10-01 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12916798#action_12916798
 ] 

Ning Zhang commented on HIVE-1526:
--

I will take a look. 

 Hive should depend on a release version of Thrift
 -

 Key: HIVE-1526
 URL: https://issues.apache.org/jira/browse/HIVE-1526
 Project: Hadoop Hive
  Issue Type: Task
  Components: Build Infrastructure, Clients
Reporter: Carl Steinbach
Assignee: Todd Lipcon
 Fix For: 0.7.0

 Attachments: HIVE-1526.2.patch.txt, hive-1526.txt, libfb303.jar, 
 libthrift.jar


 Hive should depend on a release version of Thrift, and ideally it should use 
 Ivy to resolve this dependency.
 The Thrift folks are working on adding Thrift artifacts to a maven repository 
 here: https://issues.apache.org/jira/browse/THRIFT-363




[jira] Commented: (HIVE-1611) Add alternative search-provider to Hive site

2010-10-01 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12916927#action_12916927
 ] 

Ning Zhang commented on HIVE-1611:
--

Hi Alex, some questions:
 - Hive doesn't have the file author/src/documentation/skinconf.xml, which is 
included in the patch. How does this work?
 - The patch and its comments suggest it is intended for Hadoop subprojects. Hive 
is transitioning to a TLP independent of Hadoop. Will there be an issue after the 
transition?

 Add alternative search-provider to Hive site
 

 Key: HIVE-1611
 URL: https://issues.apache.org/jira/browse/HIVE-1611
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Alex Baranau
Assignee: Alex Baranau
Priority: Minor
 Attachments: HIVE-1611.patch


 Use search-hadoop.com service to make available search in Hive sources, MLs, 
 wiki, etc.
 This was initially proposed on user mailing list. The search service was 
 already added in site's skin (common for all Hadoop related projects) before 
 so this issue is about enabling it for Hive. The ultimate goal is to use it 
 at all Hadoop's sub-projects' sites.




[jira] Updated: (HIVE-1376) Simple UDAFs with more than 1 parameter crash on empty row query

2010-10-01 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1376:
-

Attachment: HIVE-1376.2.patch

The previous patch failed on several tests, particularly count(*) queries. 
Attaching a new patch for percentile only; a patch for HIVE-1674 will follow 
separately. 

 Simple UDAFs with more than 1 parameter crash on empty row query 
 -

 Key: HIVE-1376
 URL: https://issues.apache.org/jira/browse/HIVE-1376
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.6.0
Reporter: Mayank Lahiri
Assignee: Ning Zhang
 Attachments: HIVE-1376.2.patch, HIVE-1376.patch


 Simple UDAFs with more than 1 parameter crash when the query returns no rows. 
 Currently, this only seems to affect the percentile() UDAF where the second 
 parameter is the percentile to be computed (of type double). I've also 
 verified the bug by adding a dummy parameter to ExampleMin in contrib. 
 On an empty query, Hive seems to be trying to resolve an iterate() method 
 with signature {null,null} instead of {null,double}. You can reproduce this 
 bug using:
 CREATE TABLE pct_test ( val INT );
 SELECT percentile(val, 0.5) FROM pct_test;
 which produces a lot of errors like: 
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to 
 execute method public boolean 
 org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator.iterate(org.apache.hadoop.io.LongWritable,double)
   on object 
 org.apache.hadoop.hive.ql.udf.udafpercentile$percentilelongevalua...@11d13272 
 of class org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator 
 with arguments {null, null} of size 2




[jira] Commented: (HIVE-1684) intermittent failures in create_escape.q

2010-10-01 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12916990#action_12916990
 ] 

Ning Zhang commented on HIVE-1684:
--

This is the same as HIVE-1669, which was introduced by the new desc extended 
feature. It should be addressed by HIVE-1658. 

 intermittent failures in create_escape.q
 

 Key: HIVE-1684
 URL: https://issues.apache.org/jira/browse/HIVE-1684
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: He Yongqiang

 [junit] diff -a -I file: -I pfile: -I hdfs: -I /tmp/ -I invalidscheme: -I 
 lastUpdateTime -I lastAccessTime -I [Oo]wner -I CreateTime -I LastAccessTime 
 -I Location -I transient_lastDdlTime -I last_modified_ -I 
 java.lang.RuntimeException -I at org -I at sun -I at java -I at junit -I 
 Caused by: -I [.][.][.] [0-9]* more 
 /data/users/njain/hive_commit1/hive_commit1/build/ql/test/logs/clientpositive/create_escape.q.out
  
 /data/users/njain/hive_commit1/hive_commit1/ql/src/test/results/clientpositive/create_escape.q.out
 [junit] 48d47
 [junit]  serialization.format\t  
 [junit] 49a49
 [junit]  serialization.format\t  
 Sometimes, I see the above failure. 
 This does not happen always, and needs to be investigated.




[jira] Resolved: (HIVE-1684) intermittent failures in create_escape.q

2010-10-01 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang resolved HIVE-1684.
--

Resolution: Duplicate

duplicate of HIVE-1669.

 intermittent failures in create_escape.q
 

 Key: HIVE-1684
 URL: https://issues.apache.org/jira/browse/HIVE-1684
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: He Yongqiang

 [junit] diff -a -I file: -I pfile: -I hdfs: -I /tmp/ -I invalidscheme: -I 
 lastUpdateTime -I lastAccessTime -I [Oo]wner -I CreateTime -I LastAccessTime 
 -I Location -I transient_lastDdlTime -I last_modified_ -I 
 java.lang.RuntimeException -I at org -I at sun -I at java -I at junit -I 
 Caused by: -I [.][.][.] [0-9]* more 
 /data/users/njain/hive_commit1/hive_commit1/build/ql/test/logs/clientpositive/create_escape.q.out
  
 /data/users/njain/hive_commit1/hive_commit1/ql/src/test/results/clientpositive/create_escape.q.out
 [junit] 48d47
 [junit]  serialization.format\t  
 [junit] 49a49
 [junit]  serialization.format\t  
 Sometimes, I see the above failure. 
 This does not happen always, and needs to be investigated.




[jira] Commented: (HIVE-1611) Add alternative search-provider to Hive site

2010-10-01 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12917001#action_12917001
 ] 

Ning Zhang commented on HIVE-1611:
--

Thanks for the link, Alex. I've talked to Ashish, and he said Hive has just been 
approved as a TLP. There may be some work needed to move the wiki and all 
documentation (I think Edward Capriolo has volunteered to do so?). Let me ask 
Edward and see what he thinks. 

 Add alternative search-provider to Hive site
 

 Key: HIVE-1611
 URL: https://issues.apache.org/jira/browse/HIVE-1611
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Alex Baranau
Assignee: Alex Baranau
Priority: Minor
 Attachments: HIVE-1611.patch


 Use search-hadoop.com service to make available search in Hive sources, MLs, 
 wiki, etc.
 This was initially proposed on user mailing list. The search service was 
 already added in site's skin (common for all Hadoop related projects) before 
 so this issue is about enabling it for Hive. The ultimate goal is to use it 
 at all Hadoop's sub-projects' sites.




[jira] Assigned: (HIVE-1674) count(*) returns wrong result when a mapper returns empty results

2010-09-30 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang reassigned HIVE-1674:


Assignee: Ning Zhang

 count(*) returns wrong result when a mapper returns empty results
 -

 Key: HIVE-1674
 URL: https://issues.apache.org/jira/browse/HIVE-1674
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Ning Zhang
Assignee: Ning Zhang

 select count(*) from src where false; will return the number of mappers rather than 0. 




[jira] Updated: (HIVE-1376) Simple UDAFs with more than 1 parameter crash on empty row query

2010-09-30 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1376:
-

Attachment: HIVE-1376.patch

Attaching a patch for review. This patch also fixes HIVE-1674 (count(*) 
returning wrong results). 

Tests are still running. Will upload a new patch if there are more changes. 

This patch implements option 3) as suggested; SELECT PERCENTILE(col, 0.5) FROM src 
WHERE false; will return a single row with NULL as the value. 

 Simple UDAFs with more than 1 parameter crash on empty row query 
 -

 Key: HIVE-1376
 URL: https://issues.apache.org/jira/browse/HIVE-1376
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.6.0
Reporter: Mayank Lahiri
 Attachments: HIVE-1376.patch


 Simple UDAFs with more than 1 parameter crash when the query returns no rows. 
 Currently, this only seems to affect the percentile() UDAF where the second 
 parameter is the percentile to be computed (of type double). I've also 
 verified the bug by adding a dummy parameter to ExampleMin in contrib. 
 On an empty query, Hive seems to be trying to resolve an iterate() method 
 with signature {null,null} instead of {null,double}. You can reproduce this 
 bug using:
 CREATE TABLE pct_test ( val INT );
 SELECT percentile(val, 0.5) FROM pct_test;
 which produces a lot of errors like: 
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to 
 execute method public boolean 
 org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator.iterate(org.apache.hadoop.io.LongWritable,double)
   on object 
 org.apache.hadoop.hive.ql.udf.udafpercentile$percentilelongevalua...@11d13272 
 of class org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator 
 with arguments {null, null} of size 2




[jira] Updated: (HIVE-1376) Simple UDAFs with more than 1 parameter crash on empty row query

2010-09-30 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1376:
-

  Status: Patch Available  (was: Open)
Assignee: Ning Zhang

 Simple UDAFs with more than 1 parameter crash on empty row query 
 -

 Key: HIVE-1376
 URL: https://issues.apache.org/jira/browse/HIVE-1376
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.6.0
Reporter: Mayank Lahiri
Assignee: Ning Zhang
 Attachments: HIVE-1376.patch


 Simple UDAFs with more than 1 parameter crash when the query returns no rows. 
 Currently, this only seems to affect the percentile() UDAF where the second 
 parameter is the percentile to be computed (of type double). I've also 
 verified the bug by adding a dummy parameter to ExampleMin in contrib. 
 On an empty query, Hive seems to be trying to resolve an iterate() method 
 with signature {null,null} instead of {null,double}. You can reproduce this 
 bug using:
 CREATE TABLE pct_test ( val INT );
 SELECT percentile(val, 0.5) FROM pct_test;
 which produces a lot of errors like: 
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to 
 execute method public boolean 
 org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator.iterate(org.apache.hadoop.io.LongWritable,double)
   on object 
 org.apache.hadoop.hive.ql.udf.udafpercentile$percentilelongevalua...@11d13272 
 of class org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator 
 with arguments {null, null} of size 2




[jira] Commented: (HIVE-1427) Provide metastore schema migration scripts (0.5 - 0.6)

2010-09-30 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12916742#action_12916742
 ] 

Ning Zhang commented on HIVE-1427:
--

Carl, this is the only 0.6 blocker that doesn't have a patch available. Can you 
work on this as a high priority?

 Provide metastore schema migration scripts (0.5 - 0.6)
 ---

 Key: HIVE-1427
 URL: https://issues.apache.org/jira/browse/HIVE-1427
 Project: Hadoop Hive
  Issue Type: Task
  Components: Metastore
Reporter: Carl Steinbach
Assignee: Carl Steinbach
 Fix For: 0.6.0


 At a minimum this ticket covers packaging up example MySQL migration scripts 
 (cumulative across all schema changes from 0.5 to 0.6) and explaining what to 
 do with them in the release notes.
 This is also probably a good point at which to decide and clearly state which 
 Metastore DBs we officially support in production, e.g. do we need to provide 
 migration scripts for Derby?




[jira] Updated: (HIVE-1157) UDFs can't be loaded via add jar when jar is on HDFS

2010-09-30 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1157:
-

Status: Open  (was: Patch Available)

 UDFs can't be loaded via add jar when jar is on HDFS
 --

 Key: HIVE-1157
 URL: https://issues.apache.org/jira/browse/HIVE-1157
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Philip Zeyliger
Priority: Minor
 Attachments: hive-1157.patch.txt, HIVE-1157.patch.v3.txt, 
 HIVE-1157.patch.v4.txt, HIVE-1157.patch.v5.txt, HIVE-1157.v2.patch.txt, 
 output.txt


 As discussed on the mailing list, it would be nice if you could use UDFs that 
 are on jars on HDFS.  The proposed implementation would be for add jar to 
 recognize that the target file is on HDFS, copy it locally, and load it into 
 the classpath.
 {quote}
 Hi folks,
 I have a quick question about UDF support in Hive.  I'm on the 0.5 branch.  
 Can you use a UDF where the jar containing the function is on HDFS, and 
 not on the local filesystem?  Specifically, the following does not seem to 
 work:
 # This is Hive 0.5, from svn
 $bin/hive  
 Hive history file=/tmp/philip/hive_job_log_philip_201002081541_370227273.txt
 hive add jar hdfs://localhost/FooTest.jar;   

 Added hdfs://localhost/FooTest.jar to class path
 hive create temporary function cube as 'com.cloudera.FooTestUDF';
 
 FAILED: Execution Error, return code 1 from 
 org.apache.hadoop.hive.ql.exec.FunctionTask
 Does this work for other people?  I could probably fix it by changing add 
 jar to download remote jars locally, when necessary (to load them into the 
 classpath), or update URLClassLoader (or whatever is underneath there) to 
 read directly from HDFS, which seems a bit more fragile.  But I wanted to 
 make sure that my interpretation of what's going on is right before I have at 
 it.
 Thanks,
 -- Philip
 {quote}
 {quote}
 Yes that's correct. I prefer to download the jars in add jar.
 Zheng
 {quote}
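The fix Zheng prefers, downloading remote jars during "add jar", can be sketched as follows (Python, illustrative; `localize_jar` and the pluggable `fetch` callback are assumptions, not Hive's implementation):

```python
import os
import tempfile
from urllib.parse import urlparse

def localize_jar(uri, fetch):
    """Return a local path for the jar, copying it down first when the
    URI points at a remote filesystem such as HDFS."""
    parsed = urlparse(uri)
    if parsed.scheme in ("", "file"):
        return parsed.path or uri       # already local
    local = os.path.join(tempfile.mkdtemp(), os.path.basename(parsed.path))
    fetch(uri, local)  # e.g. an HDFS client's copy-to-local operation
    return local       # safe to hand to URLClassLoader now
```

With this in place, `add jar hdfs://localhost/FooTest.jar` would first materialize the jar locally, and the classpath mechanism would stay unchanged.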




[jira] Updated: (HIVE-1376) Simple UDAFs with more than 1 parameter crash on empty row query

2010-09-30 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1376:
-

Status: Open  (was: Patch Available)

 Simple UDAFs with more than 1 parameter crash on empty row query 
 -

 Key: HIVE-1376
 URL: https://issues.apache.org/jira/browse/HIVE-1376
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.6.0
Reporter: Mayank Lahiri
Assignee: Ning Zhang
 Attachments: HIVE-1376.patch


 Simple UDAFs with more than 1 parameter crash when the query returns no rows. 
 Currently, this only seems to affect the percentile() UDAF where the second 
 parameter is the percentile to be computed (of type double). I've also 
 verified the bug by adding a dummy parameter to ExampleMin in contrib. 
 On an empty query, Hive seems to be trying to resolve an iterate() method 
 with signature {null,null} instead of {null,double}. You can reproduce this 
 bug using:
 CREATE TABLE pct_test ( val INT );
 SELECT percentile(val, 0.5) FROM pct_test;
 which produces a lot of errors like: 
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to 
 execute method public boolean 
 org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator.iterate(org.apache.hadoop.io.LongWritable,double)
   on object 
 org.apache.hadoop.hive.ql.udf.udafpercentile$percentilelongevalua...@11d13272 
 of class org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator 
 with arguments {null, null} of size 2




[jira] Commented: (HIVE-1665) drop operations may cause file leak

2010-09-29 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12916249#action_12916249
 ] 

Ning Zhang commented on HIVE-1665:
--

What if 2) fails and rolling back 1) also fails? This could happen if the 
CLI is killed at any time between 1) and 2). 

Another option is to use the traditional 'mark-then-delete' trick: mark the 
partition as deleted in the metastore first and then clean up the data. In 
case of any failure, redoing the drop partition will resume the data deletion 
process. It is also easier from the administrator's point of view: you can 
periodically check the metastore for deleted partitions (which are left 
uncommitted) and re-drop the partition. 

 drop operations may cause file leak
 ---

 Key: HIVE-1665
 URL: https://issues.apache.org/jira/browse/HIVE-1665
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang
 Attachments: hive-1665.1.patch


 Right now when doing a drop, Hive first drops the metadata and then drops the 
 actual files. If the file system is down at that time, the files will remain 
 undeleted. 
 Had an offline discussion about this:
 to fix this, add a new conf scratch dir into hive conf. 
 when doing a drop operation:
 1) move data to scratch directory
 2) drop metadata
 3) if 2) failed, roll back 1) and report error 3.1
 if 2) succeeded, drop data from scratch directory 3.2
 4) if 3.2 fails, we are ok because we assume the scratch dir will be emptied 
 manually.
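The mark-then-delete alternative raised in the comments can be sketched as follows (Python, illustrative; the metastore and filesystem objects are hypothetical stand-ins, not Hive's API):

```python
def drop_partition(metastore, fs, part):
    # Mark-then-delete: record the drop in the metastore first, then
    # remove the data. A crash between the two steps leaves a tombstone
    # that a later re-drop (or a periodic sweeper) can finish.
    metastore.mark_deleted(part)
    fs.delete(metastore.location(part))
    metastore.forget(part)

def sweep(metastore, fs):
    # Recover from earlier crashes: finish any half-done drops.
    for part in list(metastore.marked_deleted()):
        fs.delete(metastore.location(part))
        metastore.forget(part)
```

The key property is that a failure after the mark never resurrects the partition; the worst case is data that lingers until the next sweep.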




[jira] Commented: (HIVE-1638) convert commonly used udfs to generic udfs

2010-09-29 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12916256#action_12916256
 ] 

Ning Zhang commented on HIVE-1638:
--

Siying, great work!

Also, can you add an optimization for the case when the parameters are constants 
(e.g., the 2nd parameter of f_c='5015')? The ObjectInspector doesn't carry 
information about whether an input parameter is constant, but I think if you 
check in evaluate() whether the parameter is the same *object* between the 
1st and 2nd rows, you can conclude the parameter is a constant. This can save a 
lot of object construction. 
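The identity trick suggested above can be sketched like this (Python, illustrative; Hive's evaluators are Java, and Python's `is` stands in for Java's `==` reference comparison):

```python
class CachingEvaluator:
    """Caches the converted form of a parameter when the very same
    object arrives on consecutive rows, i.e. it is likely a constant."""
    def __init__(self, convert):
        self.convert = convert
        self._last_obj = None
        self._last_converted = None

    def evaluate(self, param):
        if param is not None and param is self._last_obj:
            return self._last_converted   # same object: skip re-conversion
        self._last_obj = param
        self._last_converted = self.convert(param)
        return self._last_converted
```

For a constant like '5015' the query engine passes the same object on every row, so conversion runs once instead of once per row.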

 convert commonly used udfs to generic udfs
 --

 Key: HIVE-1638
 URL: https://issues.apache.org/jira/browse/HIVE-1638
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Siying Dong
 Attachments: HIVE-1638.1.patch


 Copying a mail from Joy:
 i did a little bit of profiling of a simple hive group by query today. i was 
 surprised to see that one of the most expensive functions were in converting 
 the equals udf (i had some simple string filters) to generic udfs. 
 (primitiveobjectinspectorconverter.textconverter)
 am i correct in thinking that the fix is to simply port some of the most 
 popular udfs (string equality/comparison etc.) to generic udfs?




[jira] Created: (HIVE-1674) count(*) returns wrong result when a mapper returns empty results

2010-09-29 Thread Ning Zhang (JIRA)
count(*) returns wrong result when a mapper returns empty results
-

 Key: HIVE-1674
 URL: https://issues.apache.org/jira/browse/HIVE-1674
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Ning Zhang


select count(*) from src where false; will return the number of mappers rather than 0. 




[jira] Commented: (HIVE-1524) parallel execution failed if mapred.job.name is set

2010-09-28 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12915651#action_12915651
 ] 

Ning Zhang commented on HIVE-1524:
--

Thanks for backporting your changes to 0.6. The code changes look good. Can you 
include the other 2 files (parallel.q and parallel.q.out) in the patch as well? 


 parallel execution failed if mapred.job.name is set
 ---

 Key: HIVE-1524
 URL: https://issues.apache.org/jira/browse/HIVE-1524
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.5.0
Reporter: Ning Zhang
Assignee: Ning Zhang
 Fix For: 0.7.0

 Attachments: HIVE-1524-for-Hive-0.6.patch, HIVE-1524.2.patch, 
 HIVE-1524.patch


 The plan file name was generated based on mapred.job.name. If the user 
 specifies mapred.job.name before the query, two parallel queries will have 
 conflicting plan file names. 
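A sketch of the fix direction (Python, illustrative; the function names and the exact naming scheme are assumptions, not Hive's actual code): derive the plan file name from a fresh per-query ID rather than from the user-settable job name.

```python
import os
import uuid

def plan_file_from_job_name(scratch_dir, job_name):
    # Buggy approach: two queries that set the same mapred.job.name
    # collide on a single plan file.
    return os.path.join(scratch_dir, job_name + ".plan")

def unique_plan_file(scratch_dir):
    # Fix: a fresh per-query ID keeps parallel plans distinct.
    return os.path.join(scratch_dir, "plan-%s.xml" % uuid.uuid4().hex)
```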




[jira] Commented: (HIVE-675) add database/schema support Hive QL

2010-09-28 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12915946#action_12915946
 ] 

Ning Zhang commented on HIVE-675:
-

Hi Carl, 

Branch 0.6 is currently broken when running a unit test. The error is as 
follows:


compile-test:
[javac] /data/users/nzhang/reviews/0.6/branch-0.6/build-common.xml:307: 
warning: 'includeantruntime' was not set, defaulting to 
build.sysclasspath=last; set to false for repeatable builds
[javac] Compiling 4 source files to 
/data/users/nzhang/reviews/0.6/branch-0.6/build/metastore/test/classes
[javac] 
/data/users/nzhang/reviews/0.6/branch-0.6/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStoreRemote.java:76:
 
partitionTester(org.apache.hadoop.hive.metastore.HiveMetaStoreClient,org.apache.hadoop.hive.conf.HiveConf)
 in org.apache.hadoop.hive.metastore.TestHiveMetaStore cannot be applied to 
(org.apache.hadoop.hive.metastore.HiveMetaStoreClient,org.apache.hadoop.hive.conf.HiveConf,boolean)
[javac] TestHiveMetaStore.partitionTester(client, hiveConf, true);
[javac]  ^
[javac] 1 error


It seems this is the last patch that touched TestHiveMetaStore and 
TestHiveMetaStoreRemote. Can you take a look? 

 add database/schema support Hive QL
 ---

 Key: HIVE-675
 URL: https://issues.apache.org/jira/browse/HIVE-675
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Metastore, Query Processor
Reporter: Prasad Chakka
Assignee: Carl Steinbach
 Fix For: 0.6.0, 0.7.0

 Attachments: hive-675-2009-9-16.patch, hive-675-2009-9-19.patch, 
 hive-675-2009-9-21.patch, hive-675-2009-9-23.patch, hive-675-2009-9-7.patch, 
 hive-675-2009-9-8.patch, HIVE-675-2010-08-16.patch.txt, 
 HIVE-675-2010-7-16.patch.txt, HIVE-675-2010-8-4.patch.txt, 
 HIVE-675-backport-v6.1.patch.txt, HIVE-675-backport-v6.2.patch.txt, 
 HIVE-675.10.patch.txt, HIVE-675.11.patch.txt, HIVE-675.12.patch.txt, 
 HIVE-675.13.patch.txt


 Currently all Hive tables reside in a single namespace (default). Hive should 
 support multiple namespaces (databases or schemas) such that users can create 
 tables in their specific namespaces. These namespaces can have different 
 warehouse directories (with a default naming scheme) and possibly different 
 properties.
 There is already some support for this in the metastore, but the Hive query 
 parser should have this feature as well.
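The default naming scheme mentioned above might look like the following (Python, illustrative; the `<db>.db` subdirectory convention is an assumption about one plausible layout, not a specification):

```python
import posixpath

def table_location(warehouse_dir, database, table):
    """Default table path: the 'default' database lives directly in
    the warehouse dir; other databases get a <db>.db subdirectory."""
    if database == "default":
        return posixpath.join(warehouse_dir, table)
    return posixpath.join(warehouse_dir, database + ".db", table)
```

Per-database properties could then override `warehouse_dir` itself, which is the "different warehouse directories" part of the proposal.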




[jira] Commented: (HIVE-1524) parallel execution failed if mapred.job.name is set

2010-09-28 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12915949#action_12915949
 ] 

Ning Zhang commented on HIVE-1524:
--

Branch 0.6 is currently broken, possibly by the HIVE-675 patch. I'll run the 
tests after that one is resolved. 

 parallel execution failed if mapred.job.name is set
 ---

 Key: HIVE-1524
 URL: https://issues.apache.org/jira/browse/HIVE-1524
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.5.0
Reporter: Ning Zhang
Assignee: Ning Zhang
 Fix For: 0.6.0, 0.7.0

 Attachments: HIVE-1524-for-Hive-0.6.patch, HIVE-1524.2.patch, 
 HIVE-1524.patch


 The plan file name was generated based on mapred.job.name. If the user 
 specifies mapred.job.name before the query, two parallel queries will have 
 conflicting plan file names. 
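
To illustrate the failure mode, here is a hypothetical session (the job name
and tables are only examples, not from the issue):

```sql
SET hive.exec.parallel=true;
SET mapred.job.name=my_job;  -- user-specified job name

-- Both branches of this multi-insert can be launched as parallel jobs.
-- Since each job's plan file name was derived from mapred.job.name, the
-- two jobs would try to use the same plan file and clash.
FROM src
INSERT OVERWRITE TABLE dest1 SELECT key,   COUNT(1) GROUP BY key
INSERT OVERWRITE TABLE dest2 SELECT value, COUNT(1) GROUP BY value;
```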

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-675) add database/schema support Hive QL

2010-09-28 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12915990#action_12915990
 ] 

Ning Zhang commented on HIVE-675:
-

That works. Thanks Carl!

 add database/schema support Hive QL
 ---

 Key: HIVE-675
 URL: https://issues.apache.org/jira/browse/HIVE-675
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Metastore, Query Processor
Reporter: Prasad Chakka
Assignee: Carl Steinbach
 Fix For: 0.6.0, 0.7.0

 Attachments: hive-675-2009-9-16.patch, hive-675-2009-9-19.patch, 
 hive-675-2009-9-21.patch, hive-675-2009-9-23.patch, hive-675-2009-9-7.patch, 
 hive-675-2009-9-8.patch, HIVE-675-2010-08-16.patch.txt, 
 HIVE-675-2010-7-16.patch.txt, HIVE-675-2010-8-4.patch.txt, 
 HIVE-675-backport-v6.1.patch.txt, HIVE-675-backport-v6.2.patch.txt, 
 HIVE-675.10.patch.txt, HIVE-675.11.patch.txt, HIVE-675.12.patch.txt, 
 HIVE-675.13.patch.txt


 Currently all Hive tables reside in a single namespace (default). Hive should 
 support multiple namespaces (databases or schemas) so that users can create 
 tables in their own namespaces. These namespaces can have different 
 warehouse directories (with a default naming scheme) and possibly different 
 properties.
 There is already some support for this in the metastore, but the Hive query 
 parser should support it as well.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HIVE-1524) parallel execution failed if mapred.job.name is set

2010-09-28 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang resolved HIVE-1524.
--

Release Note: Committed to branch 0.6. Thanks Yuanjun!
  Resolution: Fixed

 parallel execution failed if mapred.job.name is set
 ---

 Key: HIVE-1524
 URL: https://issues.apache.org/jira/browse/HIVE-1524
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.5.0
Reporter: Ning Zhang
Assignee: Ning Zhang
 Fix For: 0.6.0, 0.7.0

 Attachments: HIVE-1524-for-Hive-0.6.patch, HIVE-1524.2.patch, 
 HIVE-1524.patch


 The plan file name was generated based on mapred.job.name. If the user 
 specifies mapred.job.name before the query, two parallel queries will have 
 conflicting plan file names. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HIVE-1582) merge mapfiles task behaves incorrectly for 'inserting overwrite directory...'

2010-09-27 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang resolved HIVE-1582.
--

Resolution: Not A Problem

Talked to Namit and Yongqiang; this is not a bug. INSERT OVERWRITE to an (HDFS) 
directory should be merged, as before. INSERT OVERWRITE LOCAL DIRECTORY cannot 
be merged, and that is not what happens here. 

 merge mapfiles task behaves incorrectly for 'inserting overwrite directory...'
 --

 Key: HIVE-1582
 URL: https://issues.apache.org/jira/browse/HIVE-1582
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: He Yongqiang

 hive> SET hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
 hive> SET hive.exec.compress.output=false;
 hive> INSERT OVERWRITE DIRECTORY 'x'
     > SELECT  from  a;
 Total MapReduce jobs = 2
 Launching Job 1 out of 2
 Number of reduce tasks is set to 0 since there's no reduce operator
 ..
 Ended Job = job_201008191557_54169
 Ended Job = 450290112, job is filtered out (removed at runtime).
 Launching Job 2 out of 2
 .
 the second job should not get started.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1669) non-deterministic display of storage parameter in test

2010-09-27 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1669:
-

Parent: HIVE-1658
Issue Type: Sub-task  (was: Test)

 non-deterministic display of storage parameter in test
 --

 Key: HIVE-1669
 URL: https://issues.apache.org/jira/browse/HIVE-1669
 Project: Hadoop Hive
  Issue Type: Sub-task
Reporter: Ning Zhang

 With the change to beautify the 'desc extended table' output, the storage 
 parameters are displayed in a non-deterministic order (since the underlying 
 implementation is a HashMap). 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1378) Return value for map, array, and struct needs to return a string

2010-09-27 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1378:
-

Status: Resolved  (was: Patch Available)
Resolution: Fixed

Committed. Thanks Steven!

 Return value for map, array, and struct needs to return a string 
 -

 Key: HIVE-1378
 URL: https://issues.apache.org/jira/browse/HIVE-1378
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Drivers
Reporter: Jerome Boulon
Assignee: Steven Wong
 Fix For: 0.7.0

 Attachments: HIVE-1378.1.patch, HIVE-1378.2.patch, HIVE-1378.3.patch, 
 HIVE-1378.4.patch, HIVE-1378.5.patch, HIVE-1378.6.patch, HIVE-1378.7.patch, 
 HIVE-1378.patch


 In order to be able to select/display any data from JDBC Hive driver, return 
 value for map, array, and struct needs to return a string

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1670) MapJoin throws EOFExeption when the mapjoined table has 0 column selected

2010-09-25 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914764#action_12914764
 ] 

Ning Zhang commented on HIVE-1670:
--

I'm not sure whether this patch fixes that bug. Maybe they can try this patch 
with their query.

 MapJoin throws EOFExeption when the mapjoined table has 0 column selected
 -

 Key: HIVE-1670
 URL: https://issues.apache.org/jira/browse/HIVE-1670
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-1670.patch


 select /*+mapjoin(b) */ sum(a.key) from src a join src b on (a.key=b.key); 
 throws EOFException

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1378) Return value for map, array, and struct needs to return a string

2010-09-24 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914346#action_12914346
 ] 

Ning Zhang commented on HIVE-1378:
--

@john, should we run a survey on the hive-user mailing list to see how many 
people are still using pre-0.20 Hadoop before dropping the support? 

 Return value for map, array, and struct needs to return a string 
 -

 Key: HIVE-1378
 URL: https://issues.apache.org/jira/browse/HIVE-1378
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Drivers
Reporter: Jerome Boulon
Assignee: Steven Wong
 Fix For: 0.7.0

 Attachments: HIVE-1378.1.patch, HIVE-1378.2.patch, HIVE-1378.3.patch, 
 HIVE-1378.4.patch, HIVE-1378.5.patch, HIVE-1378.6.patch, HIVE-1378.patch


 In order to be able to select/display any data from JDBC Hive driver, return 
 value for map, array, and struct needs to return a string

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1659) parse_url_tuple: a UDTF version of parse_url

2010-09-24 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914358#action_12914358
 ] 

Ning Zhang commented on HIVE-1659:
--

Xing, this patch doesn't apply cleanly to the latest trunk. Can you 'svn up' 
and regenerate the patch? You may need to resolve any conflicts after 'svn up'.

 parse_url_tuple:  a UDTF version of parse_url
 -

 Key: HIVE-1659
 URL: https://issues.apache.org/jira/browse/HIVE-1659
 Project: Hadoop Hive
  Issue Type: New Feature
Affects Versions: 0.5.0
Reporter: Ning Zhang
 Attachments: HIVE-1659.patch


 The UDF parse_url takes a URL, parses it, and extracts QUERY/PATH etc. from 
 it. However, it can only extract one atomic value from the URL. If we want to 
 extract multiple pieces of information, we need to call the function many 
 times. It is desirable to parse the URL once, extract all the needed 
 information, and return a tuple from a UDTF. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1659) parse_url_tuple: a UDTF version of parse_url

2010-09-24 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914360#action_12914360
 ] 

Ning Zhang commented on HIVE-1659:
--

Also, when you generate the patch, you need to run 'svn diff' at the root 
directory of the Hive trunk. 

 parse_url_tuple:  a UDTF version of parse_url
 -

 Key: HIVE-1659
 URL: https://issues.apache.org/jira/browse/HIVE-1659
 Project: Hadoop Hive
  Issue Type: New Feature
Affects Versions: 0.5.0
Reporter: Ning Zhang
 Attachments: HIVE-1659.patch


 The UDF parse_url takes a URL, parses it, and extracts QUERY/PATH etc. from 
 it. However, it can only extract one atomic value from the URL. If we want to 
 extract multiple pieces of information, we need to call the function many 
 times. It is desirable to parse the URL once, extract all the needed 
 information, and return a tuple from a UDTF. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1378) Return value for map, array, and struct needs to return a string

2010-09-24 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914598#action_12914598
 ] 

Ning Zhang commented on HIVE-1378:
--

Before we decide to drop support for pre-0.20, we should open a separate JIRA 
listing the things that need cleaning up: e.g., excluding the download and 
build of Hadoop 0.17. 

In the meantime, the changes in this patch needed for pre-0.20 compatibility 
should be minimal. Steven, can you take a look at the code and see how much 
work is required to be compatible with 0.17?

 Return value for map, array, and struct needs to return a string 
 -

 Key: HIVE-1378
 URL: https://issues.apache.org/jira/browse/HIVE-1378
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Drivers
Reporter: Jerome Boulon
Assignee: Steven Wong
 Fix For: 0.7.0

 Attachments: HIVE-1378.1.patch, HIVE-1378.2.patch, HIVE-1378.3.patch, 
 HIVE-1378.4.patch, HIVE-1378.5.patch, HIVE-1378.6.patch, HIVE-1378.patch


 In order to be able to select/display any data from JDBC Hive driver, return 
 value for map, array, and struct needs to return a string

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1659) parse_url_tuple: a UDTF version of parse_url

2010-09-24 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914602#action_12914602
 ] 

Ning Zhang commented on HIVE-1659:
--

Xing, there is a diff in show_functions.q. You need to overwrite the .out file 
so that it includes the new function. The following command will update the 
.out file: 

 ant test -Dtestcase=TestCliDriver -Dqfile=show_functions.q -Doverwrite=true

Can you regenerate the patch after that?

 parse_url_tuple:  a UDTF version of parse_url
 -

 Key: HIVE-1659
 URL: https://issues.apache.org/jira/browse/HIVE-1659
 Project: Hadoop Hive
  Issue Type: New Feature
Affects Versions: 0.5.0
Reporter: Ning Zhang
 Attachments: HIVE-1659.patch, HIVE-1659.patch2


 The UDF parse_url takes a URL, parses it, and extracts QUERY/PATH etc. from 
 it. However, it can only extract one atomic value from the URL. If we want to 
 extract multiple pieces of information, we need to call the function many 
 times. It is desirable to parse the URL once, extract all the needed 
 information, and return a tuple from a UDTF. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1659) parse_url_tuple: a UDTF version of parse_url

2010-09-24 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1659:
-

Status: Resolved  (was: Patch Available)
Resolution: Fixed

Committed. Thanks Xing!

 parse_url_tuple:  a UDTF version of parse_url
 -

 Key: HIVE-1659
 URL: https://issues.apache.org/jira/browse/HIVE-1659
 Project: Hadoop Hive
  Issue Type: New Feature
Affects Versions: 0.5.0
Reporter: Ning Zhang
 Attachments: HIVE-1659.patch, HIVE-1659.patch2, HIVE-1659.patch3


 The UDF parse_url takes a URL, parses it, and extracts QUERY/PATH etc. from 
 it. However, it can only extract one atomic value from the URL. If we want to 
 extract multiple pieces of information, we need to call the function many 
 times. It is desirable to parse the URL once, extract all the needed 
 information, and return a tuple from a UDTF. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1669) non-deterministic display of storage parameter in test

2010-09-24 Thread Ning Zhang (JIRA)
non-deterministic display of storage parameter in test
--

 Key: HIVE-1669
 URL: https://issues.apache.org/jira/browse/HIVE-1669
 Project: Hadoop Hive
  Issue Type: Test
Reporter: Ning Zhang


With the change to beautify the 'desc extended table' output, the storage 
parameters are displayed in a non-deterministic order (since the underlying 
implementation is a HashMap). 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1670) MapJoin throws EOFExeption when the mapjoined table has 0 column selected

2010-09-24 Thread Ning Zhang (JIRA)
MapJoin throws EOFExeption when the mapjoined table has 0 column selected
-

 Key: HIVE-1670
 URL: https://issues.apache.org/jira/browse/HIVE-1670
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Ning Zhang
Assignee: Ning Zhang


select /*+mapjoin(b) */ sum(a.key) from src a join src b on (a.key=b.key); 
throws EOFException

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1670) MapJoin throws EOFExeption when the mapjoined table has 0 column selected

2010-09-24 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1670:
-

Attachment: HIVE-1670.patch

 MapJoin throws EOFExeption when the mapjoined table has 0 column selected
 -

 Key: HIVE-1670
 URL: https://issues.apache.org/jira/browse/HIVE-1670
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-1670.patch


 select /*+mapjoin(b) */ sum(a.key) from src a join src b on (a.key=b.key); 
 throws EOFException

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1670) MapJoin throws EOFExeption when the mapjoined table has 0 column selected

2010-09-24 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1670:
-

Status: Patch Available  (was: Open)

 MapJoin throws EOFExeption when the mapjoined table has 0 column selected
 -

 Key: HIVE-1670
 URL: https://issues.apache.org/jira/browse/HIVE-1670
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-1670.patch


 select /*+mapjoin(b) */ sum(a.key) from src a join src b on (a.key=b.key); 
 throws EOFException

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1378) Return value for map, array, and struct needs to return a string

2010-09-24 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914756#action_12914756
 ] 

Ning Zhang commented on HIVE-1378:
--

+1. testing.

 Return value for map, array, and struct needs to return a string 
 -

 Key: HIVE-1378
 URL: https://issues.apache.org/jira/browse/HIVE-1378
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Drivers
Reporter: Jerome Boulon
Assignee: Steven Wong
 Fix For: 0.7.0

 Attachments: HIVE-1378.1.patch, HIVE-1378.2.patch, HIVE-1378.3.patch, 
 HIVE-1378.4.patch, HIVE-1378.5.patch, HIVE-1378.6.patch, HIVE-1378.7.patch, 
 HIVE-1378.patch


 In order to be able to select/display any data from JDBC Hive driver, return 
 value for map, array, and struct needs to return a string

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1378) Return value for map, array, and struct needs to return a string

2010-09-23 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914282#action_12914282
 ] 

Ning Zhang commented on HIVE-1378:
--

The changes look good. However, there are conflicts when applying the patch to 
the latest trunk. Can you generate a new one against the latest trunk? I'll 
start testing once I get the new patch. 

 Return value for map, array, and struct needs to return a string 
 -

 Key: HIVE-1378
 URL: https://issues.apache.org/jira/browse/HIVE-1378
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Drivers
Reporter: Jerome Boulon
Assignee: Steven Wong
 Fix For: 0.7.0

 Attachments: HIVE-1378.1.patch, HIVE-1378.2.patch, HIVE-1378.3.patch, 
 HIVE-1378.4.patch, HIVE-1378.5.patch, HIVE-1378.patch


 In order to be able to select/display any data from JDBC Hive driver, return 
 value for map, array, and struct needs to return a string

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1378) Return value for map, array, and struct needs to return a string

2010-09-23 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914305#action_12914305
 ] 

Ning Zhang commented on HIVE-1378:
--

OK, this one applied cleanly. I'm starting testing. 

I think 'svn up' may be able to do more merging than 'patch'. I got a conflict 
on eclipse-templates/.classpath (it asked me whether I wanted to reverse-apply 
the patch) and on another file. 

 Return value for map, array, and struct needs to return a string 
 -

 Key: HIVE-1378
 URL: https://issues.apache.org/jira/browse/HIVE-1378
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Drivers
Reporter: Jerome Boulon
Assignee: Steven Wong
 Fix For: 0.7.0

 Attachments: HIVE-1378.1.patch, HIVE-1378.2.patch, HIVE-1378.3.patch, 
 HIVE-1378.4.patch, HIVE-1378.5.patch, HIVE-1378.6.patch, HIVE-1378.patch


 In order to be able to select/display any data from JDBC Hive driver, return 
 value for map, array, and struct needs to return a string

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1378) Return value for map, array, and struct needs to return a string

2010-09-23 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914335#action_12914335
 ] 

Ning Zhang commented on HIVE-1378:
--

Steven, tests passed on Hadoop 0.20, but the build failed to compile on Hadoop 
0.17 (ant clean package -Dhadoop.version=0.17.2.1). Can you take a look?

 Return value for map, array, and struct needs to return a string 
 -

 Key: HIVE-1378
 URL: https://issues.apache.org/jira/browse/HIVE-1378
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Drivers
Reporter: Jerome Boulon
Assignee: Steven Wong
 Fix For: 0.7.0

 Attachments: HIVE-1378.1.patch, HIVE-1378.2.patch, HIVE-1378.3.patch, 
 HIVE-1378.4.patch, HIVE-1378.5.patch, HIVE-1378.6.patch, HIVE-1378.patch


 In order to be able to select/display any data from JDBC Hive driver, return 
 value for map, array, and struct needs to return a string

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1659) parse_url_tuple: a UDTF version of parse_url

2010-09-23 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914337#action_12914337
 ] 

Ning Zhang commented on HIVE-1659:
--

Xing, the patch was not attached. Can you use the 'Attach file' link in the 
left pane?


 parse_url_tuple:  a UDTF version of parse_url
 -

 Key: HIVE-1659
 URL: https://issues.apache.org/jira/browse/HIVE-1659
 Project: Hadoop Hive
  Issue Type: New Feature
Affects Versions: 0.5.0
Reporter: Ning Zhang

 The UDF parse_url takes a URL, parses it, and extracts QUERY/PATH etc. from 
 it. However, it can only extract one atomic value from the URL. If we want to 
 extract multiple pieces of information, we need to call the function many 
 times. It is desirable to parse the URL once, extract all the needed 
 information, and return a tuple from a UDTF. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1658) Fix describe [extended] column formatting

2010-09-22 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12913668#action_12913668
 ] 

Ning Zhang commented on HIVE-1658:
--

+1 on keeping the old format but adding a pretty-printing operator as the child 
of the explain, so that the execution plan for EXPLAIN is an explain operator 
(with the old formatting) followed by an optional pretty operator that takes 
the output and does further formatting. 

 Fix describe [extended] column formatting
 -

 Key: HIVE-1658
 URL: https://issues.apache.org/jira/browse/HIVE-1658
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Paul Yang
Assignee: Thiruvel Thirumoolan

 When displaying the column schema, the formatting should be 
 name<TAB>type<TAB>comment<NEWLINE>
 to be in line with the previous formatting style for backward compatibility.
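
As an illustration of that layout (hypothetical columns and comments; the
separators are literal tab characters, one row per line):

```
key	string	the row key
value	string	the row value
```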

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1361) table/partition level statistics

2010-09-22 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1361:
-

Attachment: (was: HIVE-1361.3.patch)

 table/partition level statistics
 

 Key: HIVE-1361
 URL: https://issues.apache.org/jira/browse/HIVE-1361
 Project: Hadoop Hive
  Issue Type: Sub-task
  Components: Query Processor
Reporter: Ning Zhang
Assignee: Ahmed M Aly
 Fix For: 0.7.0

 Attachments: HIVE-1361.2.patch, HIVE-1361.2_java_only.patch, 
 HIVE-1361.java_only.patch, HIVE-1361.patch, stats0.patch


 As a first step, we gather table-level stats for non-partitioned tables and 
 partition-level stats for partitioned tables. Future work could extend the 
 table-level stats to partitioned tables as well. 
 There are 3 major milestones in this subtask: 
  1) extend the insert statement to gather table/partition level stats 
 on-the-fly.
  2) extend metastore API to support storing and retrieving stats for a 
 particular table/partition. 
  3) add an ANALYZE TABLE [PARTITION] statement in Hive QL to gather stats for 
 existing tables/partitions. 
 The proposed stats are:
 Partition-level stats: 
   - number of rows
   - total size in bytes
   - number of files
   - max, min, average row sizes
   - max, min, average file sizes
 Table-level stats in addition to partition level stats:
   - number of partitions

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1526) Hive should depend on a release version of Thrift

2010-09-22 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12913843#action_12913843
 ] 

Ning Zhang commented on HIVE-1526:
--

The Hive ODBC code depends on Thrift as well. In particular, the Hive client 
and unixODBC libraries have to be linked against the new libthrift.so. Can 
you test whether the ODBC code is compatible with the new Thrift version?

 Hive should depend on a release version of Thrift
 -

 Key: HIVE-1526
 URL: https://issues.apache.org/jira/browse/HIVE-1526
 Project: Hadoop Hive
  Issue Type: Task
  Components: Build Infrastructure
Reporter: Carl Steinbach
Assignee: Todd Lipcon
 Attachments: hive-1526.txt, libfb303.jar, libthrift.jar


 Hive should depend on a release version of Thrift, and ideally it should use 
 Ivy to resolve this dependency.
 The Thrift folks are working on adding Thrift artifacts to a maven repository 
 here: https://issues.apache.org/jira/browse/THRIFT-363

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1361) table/partition level statistics

2010-09-22 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1361:
-

Attachment: HIVE-1361.4.java_only.patch
HIVE-1361.4.patch

Uploading a new patch refreshed against the latest trunk. Also added a negative 
test case (analyze.q) and some trivial cleanup in the Java code (removing 
commented-out content). 

 table/partition level statistics
 

 Key: HIVE-1361
 URL: https://issues.apache.org/jira/browse/HIVE-1361
 Project: Hadoop Hive
  Issue Type: Sub-task
  Components: Query Processor
Reporter: Ning Zhang
Assignee: Ahmed M Aly
 Fix For: 0.7.0

 Attachments: HIVE-1361.2.patch, HIVE-1361.2_java_only.patch, 
 HIVE-1361.3.patch, HIVE-1361.4.java_only.patch, HIVE-1361.4.patch, 
 HIVE-1361.java_only.patch, HIVE-1361.patch, stats0.patch


 As a first step, we gather table-level stats for non-partitioned tables and 
 partition-level stats for partitioned tables. Future work could extend the 
 table-level stats to partitioned tables as well. 
 There are 3 major milestones in this subtask: 
  1) extend the insert statement to gather table/partition level stats 
 on-the-fly.
  2) extend metastore API to support storing and retrieving stats for a 
 particular table/partition. 
  3) add an ANALYZE TABLE [PARTITION] statement in Hive QL to gather stats for 
 existing tables/partitions. 
 The proposed stats are:
 Partition-level stats: 
   - number of rows
   - total size in bytes
   - number of files
   - max, min, average row sizes
   - max, min, average file sizes
 Table-level stats in addition to partition level stats:
   - number of partitions

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1609) Support partition filtering in metastore

2010-09-21 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12912855#action_12912855
 ] 

Ning Zhang commented on HIVE-1609:
--

@namit, the Hive metastore already has an API to get all sub-partitions given 
a partial specification like the one you provided: Hive.getPartitions(Table, 
partialPartSpec).  

 Support partition filtering in metastore
 

 Key: HIVE-1609
 URL: https://issues.apache.org/jira/browse/HIVE-1609
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Metastore
Reporter: Ajay Kidave
Assignee: Ajay Kidave
 Fix For: 0.7.0

 Attachments: hive_1609.patch, hive_1609_2.patch, hive_1609_3.patch


 The metastore needs to have support for returning a list of partitions based 
 on user specified filter conditions. This will be useful for tools which need 
 to do partition pruning. Howl is one such use case. The way partition pruning 
 is done during hive query execution need not be changed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1659) parse_url_tuple: a UDTF version of parse_url

2010-09-21 Thread Ning Zhang (JIRA)
parse_url_tuple:  a UDTF version of parse_url
-

 Key: HIVE-1659
 URL: https://issues.apache.org/jira/browse/HIVE-1659
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Ning Zhang


The UDF parse_url takes a URL, parses it, and extracts QUERY/PATH etc. from it. 
However, it can only extract one atomic value from the URL. If we want to 
extract multiple pieces of information, we need to call the function many 
times. It is desirable to parse the URL once, extract all the needed 
information, and return a tuple from a UDTF. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1659) parse_url_tuple: a UDTF version of parse_url

2010-09-21 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12913081#action_12913081
 ] 

Ning Zhang commented on HIVE-1659:
--

parse_url currently supports 2 signatures: parse_url(fullurl, 
'[QUERY|PATH|HOST|...]') and parse_url(fullurl, 'QUERY', '[ref|sk|...]'). In 
parse_url_tuple, the syntax is consolidated as parse_url_tuple(fullurl, 'HOST', 
'PATH', 'QUERY:ref', 'QUERY:sk',...). 
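
A sketch of the difference (the table and column names are hypothetical; since 
parse_url_tuple is a UDTF, it is typically invoked through LATERAL VIEW):

```sql
-- Old: one parse_url call per extracted piece, so the URL is parsed 3 times
SELECT parse_url(url, 'HOST'),
       parse_url(url, 'PATH'),
       parse_url(url, 'QUERY', 'ref')
FROM access_log;

-- New: a single UDTF call parses the URL once and returns all pieces
SELECT t.host, t.path, t.ref
FROM access_log
LATERAL VIEW parse_url_tuple(url, 'HOST', 'PATH', 'QUERY:ref') t
  AS host, path, ref;
```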

 parse_url_tuple:  a UDTF version of parse_url
 -

 Key: HIVE-1659
 URL: https://issues.apache.org/jira/browse/HIVE-1659
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Ning Zhang

 The UDF parse_url take s a URL, parse it and extract QUERY/PATH etc from it. 
 However it can only extract an atomic value from the URL. If we want to 
 extract multiple piece of information, we need to call the function many 
 times. It is desirable to parse the URL once and extract all needed 
 information and return a tuple in a UDTF. 




[jira] Updated: (HIVE-1361) table/partition level statistics

2010-09-21 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1361:
-

Attachment: HIVE-1361.2.patch
HIVE-1361.2_java_only.patch

Uploading a new patch for review, in two forms: a full version (including the
XML build files) and a Java-only version. This is against the latest trunk.

The major changes from the last patch include:
  1) Make JDBC update/insert/select use PreparedStatement.
  2) In HBase, use HTable.delete(ArrayList<Delete>) to speed up deletes, and
flushCommits() to batch updates.
  3) Refactor StatsTask to put stats into PartitionStatistics and
TableStatistics so that it is easier to add new stats later.
  4) Move WriteEntity creation from StatsTask to compile time.

I'm running tests again after refreshing to the latest trunk.
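The batching idea in 1) and 2) can be illustrated with a small Python/sqlite3 analogue (sqlite3 stands in for the JDBC store; the table name and columns are hypothetical): one parameterized statement is reused for every row, and the rows go in as a single batch instead of per-row round trips.

```python
import sqlite3

# Illustrative intermediate stats table; not Hive's actual schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE partial_stats (task_id TEXT, num_rows INTEGER)")

# One prepared/parameterized statement, executed as a batch -- the same
# idea as JDBC PreparedStatement batching or HBase flushCommits().
rows = [("task_1", 100), ("task_2", 250), ("task_3", 50)]
conn.executemany("INSERT INTO partial_stats VALUES (?, ?)", rows)
conn.commit()

total = conn.execute("SELECT SUM(num_rows) FROM partial_stats").fetchone()[0]
print(total)
```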

 table/partition level statistics
 

 Key: HIVE-1361
 URL: https://issues.apache.org/jira/browse/HIVE-1361
 Project: Hadoop Hive
  Issue Type: Sub-task
  Components: Query Processor
Reporter: Ning Zhang
Assignee: Ahmed M Aly
 Fix For: 0.7.0

 Attachments: HIVE-1361.2.patch, HIVE-1361.2_java_only.patch, 
 HIVE-1361.java_only.patch, HIVE-1361.patch, stats0.patch


 As a first step, we gather table-level stats for non-partitioned tables and
 partition-level stats for partitioned tables. Future work could extend the
 table-level stats to partitioned tables as well.
 There are 3 major milestones in this subtask: 
  1) extend the insert statement to gather table/partition level stats 
 on-the-fly.
  2) extend metastore API to support storing and retrieving stats for a 
 particular table/partition. 
  3) add an ANALYZE TABLE [PARTITION] statement in Hive QL to gather stats for 
 existing tables/partitions. 
 The proposed stats are:
 Partition-level stats: 
   - number of rows
   - total size in bytes
   - number of files
   - max, min, average row sizes
   - max, min, average file sizes
 Table-level stats in addition to partition level stats:
   - number of partitions
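A minimal sketch of how the proposed partition-level stats could be derived from per-file figures (illustrative only; the input shape is an assumption, not Hive's actual bookkeeping):

```python
def partition_stats(files):
    """files: list of (num_rows, size_in_bytes) pairs, one per file."""
    rows = [r for r, _ in files]
    sizes = [s for _, s in files]
    return {
        "numRows": sum(rows),
        "totalSize": sum(sizes),
        "numFiles": len(files),
        "maxFileSize": max(sizes),
        "minFileSize": min(sizes),
        "avgFileSize": sum(sizes) / len(sizes),
    }

stats = partition_stats([(10, 100), (30, 300)])
```

Table-level stats would then add the partition count on top of a roll-up of these.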




[jira] Updated: (HIVE-1361) table/partition level statistics

2010-09-21 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1361:
-

Status: Patch Available  (was: Open)

 table/partition level statistics
 

 Key: HIVE-1361
 URL: https://issues.apache.org/jira/browse/HIVE-1361
 Project: Hadoop Hive
  Issue Type: Sub-task
  Components: Query Processor
Reporter: Ning Zhang
Assignee: Ahmed M Aly
 Fix For: 0.7.0

 Attachments: HIVE-1361.2.patch, HIVE-1361.2_java_only.patch, 
 HIVE-1361.java_only.patch, HIVE-1361.patch, stats0.patch


 As a first step, we gather table-level stats for non-partitioned tables and
 partition-level stats for partitioned tables. Future work could extend the
 table-level stats to partitioned tables as well.
 There are 3 major milestones in this subtask: 
  1) extend the insert statement to gather table/partition level stats 
 on-the-fly.
  2) extend metastore API to support storing and retrieving stats for a 
 particular table/partition. 
  3) add an ANALYZE TABLE [PARTITION] statement in Hive QL to gather stats for 
 existing tables/partitions. 
 The proposed stats are:
 Partition-level stats: 
   - number of rows
   - total size in bytes
   - number of files
   - max, min, average row sizes
   - max, min, average file sizes
 Table-level stats in addition to partition level stats:
   - number of partitions




[jira] Commented: (HIVE-1651) ScriptOperator should not forward any output to downstream operators if an exception is happened

2010-09-21 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913301#action_12913301
 ] 

Ning Zhang commented on HIVE-1651:
--

Discussed with Joydeep offline. The side effects of a failed task should be
cleaned up after the job finishes. _tmp* files are already taken care of in the
current code base. The only side effect that still needs to be handled is the
empty directories created by failed dynamic partition inserts. This issue is
addressed in HIVE-1655.


 ScriptOperator should not forward any output to downstream operators if an 
 exception is happened
 

 Key: HIVE-1651
 URL: https://issues.apache.org/jira/browse/HIVE-1651
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-1651.patch


 ScriptOperator spawns two threads to read the stdout and stderr of the
 script, and then forwards the output from stdout to downstream operators. If
 the script hits any exception (e.g., it got killed), the ScriptOperator gets
 an exception and throws it to upstream operators until MapOperator gets it
 and calls close(abort). Until ScriptOperator.close() is called, the script
 output stream can still forward output to downstream operators. We should
 terminate it immediately.
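The intent of the fix — stop forwarding script output as soon as a failure is known — can be sketched in Python with a shared abort flag (a stand-in for the Java operator; all names here are illustrative, not Hive's actual code):

```python
import subprocess
import sys
import threading

def run_script(cmd, rows, forward):
    """Feed rows to a child process; stop forwarding its output on failure."""
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE, text=True)
    abort = threading.Event()

    def reader():  # the "stdout thread" of ScriptOperator
        for line in proc.stdout:
            if abort.is_set():   # an error was seen: forward nothing more
                break
            forward(line.rstrip("\n"))

    t = threading.Thread(target=reader)
    t.start()
    try:
        for row in rows:
            proc.stdin.write(row + "\n")
        proc.stdin.close()
    except BrokenPipeError:      # the script died mid-stream
        abort.set()
    if proc.wait() != 0:         # mirror close(abort=True)
        abort.set()
    t.join()
    return proc.returncode

out = []
echo = [sys.executable, "-c",
        "import sys\nfor l in sys.stdin: sys.stdout.write(l)"]
rc = run_script(echo, ["a", "b"], out.append)
```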




[jira] Commented: (HIVE-1378) Return value for map, array, and struct needs to return a string

2010-09-21 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913338#action_12913338
 ] 

Ning Zhang commented on HIVE-1378:
--

Steven, there are conflicts when applying to the latest trunk. Can you 
regenerate the patch?

 Return value for map, array, and struct needs to return a string 
 -

 Key: HIVE-1378
 URL: https://issues.apache.org/jira/browse/HIVE-1378
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Drivers
Reporter: Jerome Boulon
Assignee: Steven Wong
 Fix For: 0.7.0

 Attachments: HIVE-1378.1.patch, HIVE-1378.2.patch, HIVE-1378.3.patch, 
 HIVE-1378.patch


 In order to be able to select/display any data from JDBC Hive driver, return 
 value for map, array, and struct needs to return a string




[jira] Commented: (HIVE-1378) Return value for map, array, and struct needs to return a string

2010-09-18 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910899#action_12910899
 ] 

Ning Zhang commented on HIVE-1378:
--

Looks good in general. I've left some minor comments on Cloudera's review
board. I'm not sure if they can be replicated here; if not, I'll copy them
over manually.

 Return value for map, array, and struct needs to return a string 
 -

 Key: HIVE-1378
 URL: https://issues.apache.org/jira/browse/HIVE-1378
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Drivers
Reporter: Jerome Boulon
Assignee: Steven Wong
 Fix For: 0.7.0

 Attachments: HIVE-1378.1.patch, HIVE-1378.patch


 In order to be able to select/display any data from JDBC Hive driver, return 
 value for map, array, and struct needs to return a string




[jira] Created: (HIVE-1651) ScriptOperator should not forward any output to downstream operators if an exception is happened

2010-09-17 Thread Ning Zhang (JIRA)
ScriptOperator should not forward any output to downstream operators if an 
exception is happened


 Key: HIVE-1651
 URL: https://issues.apache.org/jira/browse/HIVE-1651
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Ning Zhang
Assignee: Ning Zhang


ScriptOperator spawns two threads to read the stdout and stderr of the script,
and then forwards the output from stdout to downstream operators. If the script
hits any exception (e.g., it got killed), the ScriptOperator gets an exception
and throws it to upstream operators until MapOperator gets it and calls
close(abort). Until ScriptOperator.close() is called, the script output stream
can still forward output to downstream operators. We should terminate it
immediately.




[jira] Updated: (HIVE-1651) ScriptOperator should not forward any output to downstream operators if an exception is happened

2010-09-17 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1651:
-

Attachment: HIVE-1651.patch

 ScriptOperator should not forward any output to downstream operators if an 
 exception is happened
 

 Key: HIVE-1651
 URL: https://issues.apache.org/jira/browse/HIVE-1651
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-1651.patch


 ScriptOperator spawns two threads to read the stdout and stderr of the
 script, and then forwards the output from stdout to downstream operators. If
 the script hits any exception (e.g., it got killed), the ScriptOperator gets
 an exception and throws it to upstream operators until MapOperator gets it
 and calls close(abort). Until ScriptOperator.close() is called, the script
 output stream can still forward output to downstream operators. We should
 terminate it immediately.




[jira] Updated: (HIVE-1651) ScriptOperator should not forward any output to downstream operators if an exception is happened

2010-09-17 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1651:
-

Status: Patch Available  (was: Open)

 ScriptOperator should not forward any output to downstream operators if an 
 exception is happened
 

 Key: HIVE-1651
 URL: https://issues.apache.org/jira/browse/HIVE-1651
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-1651.patch


 ScriptOperator spawns two threads to read the stdout and stderr of the
 script, and then forwards the output from stdout to downstream operators. If
 the script hits any exception (e.g., it got killed), the ScriptOperator gets
 an exception and throws it to upstream operators until MapOperator gets it
 and calls close(abort). Until ScriptOperator.close() is called, the script
 output stream can still forward output to downstream operators. We should
 terminate it immediately.




[jira] Commented: (HIVE-1378) Return value for map, array, and struct needs to return a string

2010-09-17 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910758#action_12910758
 ] 

Ning Zhang commented on HIVE-1378:
--

Taking a look now. 

 Return value for map, array, and struct needs to return a string 
 -

 Key: HIVE-1378
 URL: https://issues.apache.org/jira/browse/HIVE-1378
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Drivers
Reporter: Jerome Boulon
Assignee: Steven Wong
 Fix For: 0.7.0

 Attachments: HIVE-1378.1.patch, HIVE-1378.patch


 In order to be able to select/display any data from JDBC Hive driver, return 
 value for map, array, and struct needs to return a string




[jira] Created: (HIVE-1655) Adding consistency check at jobClose() when committing dynamic partitions

2010-09-17 Thread Ning Zhang (JIRA)
Adding consistency check at jobClose() when committing dynamic partitions
-

 Key: HIVE-1655
 URL: https://issues.apache.org/jira/browse/HIVE-1655
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Ning Zhang
Assignee: Ning Zhang


In the case of a dynamic partition insert, FileSinkOperator generates a
directory for each new partition, and the files in the directory are named
with a '_tmp*' prefix. When a task succeeds, each file is renamed to drop the
_tmp prefix, which essentially implements the commit semantics. Many kinds of
failures (the process got killed, the machine died, etc.) can leave _tmp files
behind in the DP directory. These _tmp files should be deleted (rolled back)
at a successful jobClose(). After the deletion, we should also delete any
empty directories.
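The commit-by-rename protocol described above can be sketched as follows (hedged: the _tmp naming mirrors the description, but the directory layout and function names are invented for the demo):

```python
import os
import tempfile

def write_committed(part_dir, filename, data):
    """Write to a _tmp-prefixed file, then rename to commit."""
    os.makedirs(part_dir, exist_ok=True)
    tmp_path = os.path.join(part_dir, "_tmp." + filename)
    with open(tmp_path, "w") as f:
        f.write(data)
    final_path = os.path.join(part_dir, filename)
    os.rename(tmp_path, final_path)  # the "commit": drop the _tmp prefix
    return final_path

root = tempfile.mkdtemp()
committed = write_committed(os.path.join(root, "ds=2010-09-17"),
                            "part-0000", "rows")
```

A task that dies before the rename leaves only the _tmp file behind, which is exactly the residue jobClose() has to roll back.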




[jira] Commented: (HIVE-1655) Adding consistency check at jobClose() when committing dynamic partitions

2010-09-17 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910880#action_12910880
 ] 

Ning Zhang commented on HIVE-1655:
--

Actually the _tmp files are taken care of by FSPaths.commit(), called at
FileSinkOperator.close(), and any missed _tmp* files are removed in jobClose()
by Utilities.removeTempOrDuplicateFiles(). The only missing piece is removing
the empty directories at jobClose().
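A sketch of that jobClose()-time rollback in Python: delete leftover _tmp* files, then prune any partition directory left empty (the layout below is invented for the demo; it is not Hive's actual code):

```python
import os
import tempfile

def cleanup_dp_dir(root):
    """Delete leftover _tmp* files, then prune directories left empty."""
    removed = []
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames:
            if name.startswith("_tmp"):
                os.remove(os.path.join(dirpath, name))
                removed.append(name)
        # after rollback, drop partitions that ended up with no files
        if dirpath != root and not os.listdir(dirpath):
            os.rmdir(dirpath)
    return removed

# Demo layout: one partition holding only a leftover _tmp file,
# one holding a committed file.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "ds=1"))
open(os.path.join(root, "ds=1", "_tmp.part-0"), "w").close()
os.makedirs(os.path.join(root, "ds=2"))
open(os.path.join(root, "ds=2", "part-0"), "w").close()
removed = cleanup_dp_dir(root)
```

Walking bottom-up (topdown=False) guarantees a directory is only checked for emptiness after its own _tmp files have been removed.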

 Adding consistency check at jobClose() when committing dynamic partitions
 -

 Key: HIVE-1655
 URL: https://issues.apache.org/jira/browse/HIVE-1655
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Ning Zhang
Assignee: Ning Zhang

 In the case of a dynamic partition insert, FileSinkOperator generates a
 directory for each new partition, and the files in the directory are named
 with a '_tmp*' prefix. When a task succeeds, each file is renamed to drop the
 _tmp prefix, which essentially implements the commit semantics. Many kinds of
 failures (the process got killed, the machine died, etc.) can leave _tmp
 files behind in the DP directory. These _tmp files should be deleted (rolled
 back) at a successful jobClose(). After the deletion, we should also delete
 any empty directories.




[jira] Updated: (HIVE-1655) Adding consistency check at jobClose() when committing dynamic partitions

2010-09-17 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1655:
-

Attachment: HIVE-1655.patch

 Adding consistency check at jobClose() when committing dynamic partitions
 -

 Key: HIVE-1655
 URL: https://issues.apache.org/jira/browse/HIVE-1655
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-1655.patch


 In the case of a dynamic partition insert, FileSinkOperator generates a
 directory for each new partition, and the files in the directory are named
 with a '_tmp*' prefix. When a task succeeds, each file is renamed to drop the
 _tmp prefix, which essentially implements the commit semantics. Many kinds of
 failures (the process got killed, the machine died, etc.) can leave _tmp
 files behind in the DP directory. These _tmp files should be deleted (rolled
 back) at a successful jobClose(). After the deletion, we should also delete
 any empty directories.




[jira] Updated: (HIVE-1655) Adding consistency check at jobClose() when committing dynamic partitions

2010-09-17 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1655:
-

Status: Patch Available  (was: Open)

 Adding consistency check at jobClose() when committing dynamic partitions
 -

 Key: HIVE-1655
 URL: https://issues.apache.org/jira/browse/HIVE-1655
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-1655.patch


 In the case of a dynamic partition insert, FileSinkOperator generates a
 directory for each new partition, and the files in the directory are named
 with a '_tmp*' prefix. When a task succeeds, each file is renamed to drop the
 _tmp prefix, which essentially implements the commit semantics. Many kinds of
 failures (the process got killed, the machine died, etc.) can leave _tmp
 files behind in the DP directory. These _tmp files should be deleted (rolled
 back) at a successful jobClose(). After the deletion, we should also delete
 any empty directories.




[jira] Updated: (HIVE-1361) table/partition level statistics

2010-09-16 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1361:
-

Attachment: HIVE-1361.patch
HIVE-1361.java_only.patch

Uploading a full version (HIVE-1361.patch) and a Java-code-only version
(HIVE-1361.java_only.patch).

This patch is based on Ahmed's previous patch and implements the following
features:
  1) Automatically gather stats (currently the number of rows) whenever an
INSERT OVERWRITE TABLE is issued. Each mapper/reducer pushes its partial stats
to either MySQL/Derby through JDBC, or to HBase. The INSERT OVERWRITE
statement can be anything, including dynamic partition inserts, multi-table
inserts, and inserts into bucketed partitions. A StatsTask is responsible for
aggregating the partial stats at the end of the query and updating the
metastore.
  2) The stats of a table/partition are exposed to the user through 'DESC
EXTENDED' on the table/partition. They are stored as storage parameters
(numRows, numFiles, numPartitions).
  3) A new command 'ANALYZE TABLE [PARTITION (partition spec)] COMPUTE
STATISTICS' scans the table/partition and gathers stats in a similar fashion
to the INSERT OVERWRITE command, except that the plan has only one MR job,
consisting of a TableScanOperator and a StatsTask. The partition spec can be a
full or partial partition spec, similar to what dynamic partition insert uses.
This allows the user to analyze a subset of, or all, partitions of a table.
The resulting stats are stored in the same parameters in the metastore.

Tested locally (unit tests) for JDBC:Derby and HBase, and on a cluster with
JDBC:MySQL.

Will run the full unit tests again. 
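The aggregation step in 1) can be sketched in Python (an in-memory stand-in for the MySQL/Derby/HBase intermediate store; the keys and layout here are hypothetical):

```python
from collections import defaultdict

# Partial stats as each mapper/reducer might publish them, keyed by
# partition (illustrative values).
partial = [
    ("ds=2010-09-16/hr=00", {"numRows": 120}),
    ("ds=2010-09-16/hr=00", {"numRows": 80}),
    ("ds=2010-09-16/hr=01", {"numRows": 50}),
]

def aggregate(partials):
    """StatsTask-style roll-up: sum partial counters per partition."""
    totals = defaultdict(lambda: defaultdict(int))
    for partition, stats in partials:
        for name, value in stats.items():
            totals[partition][name] += value
    return {p: dict(s) for p, s in totals.items()}

result = aggregate(partial)
```

The real StatsTask would then write each partition's totals into the metastore parameters.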

 table/partition level statistics
 

 Key: HIVE-1361
 URL: https://issues.apache.org/jira/browse/HIVE-1361
 Project: Hadoop Hive
  Issue Type: Sub-task
Affects Versions: 0.6.0
Reporter: Ning Zhang
Assignee: Ahmed M Aly
 Attachments: HIVE-1361.java_only.patch, HIVE-1361.patch, stats0.patch


 As a first step, we gather table-level stats for non-partitioned tables and
 partition-level stats for partitioned tables. Future work could extend the
 table-level stats to partitioned tables as well.
 There are 3 major milestones in this subtask: 
  1) extend the insert statement to gather table/partition level stats 
 on-the-fly.
  2) extend metastore API to support storing and retrieving stats for a 
 particular table/partition. 
  3) add an ANALYZE TABLE [PARTITION] statement in Hive QL to gather stats for 
 existing tables/partitions. 
 The proposed stats are:
 Partition-level stats: 
   - number of rows
   - total size in bytes
   - number of files
   - max, min, average row sizes
   - max, min, average file sizes
 Table-level stats in addition to partition level stats:
   - number of partitions




[jira] Created: (HIVE-1648) Automatically gathering stats when reading a table/partition

2010-09-16 Thread Ning Zhang (JIRA)
Automatically gathering stats when reading a table/partition


 Key: HIVE-1648
 URL: https://issues.apache.org/jira/browse/HIVE-1648
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Ning Zhang


HIVE-1361 introduces a new command 'ANALYZE TABLE T COMPUTE STATISTICS' to
gather stats. This requires an additional scan of the data. Stats gathering
can instead be piggy-backed on TableScanOperator whenever a table/partition is
scanned (provided there is no LIMIT operator).




[jira] Updated: (HIVE-1648) Automatically gathering stats when reading a table/partition

2010-09-16 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1648:
-

Parent: HIVE-33
Issue Type: Sub-task  (was: New Feature)

 Automatically gathering stats when reading a table/partition
 

 Key: HIVE-1648
 URL: https://issues.apache.org/jira/browse/HIVE-1648
 Project: Hadoop Hive
  Issue Type: Sub-task
Reporter: Ning Zhang

 HIVE-1361 introduces a new command 'ANALYZE TABLE T COMPUTE STATISTICS' to
 gather stats. This requires an additional scan of the data. Stats gathering
 can instead be piggy-backed on TableScanOperator whenever a table/partition
 is scanned (provided there is no LIMIT operator).




[jira] Updated: (HIVE-1361) table/partition level statistics

2010-09-16 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1361:
-

Status: Patch Available  (was: Open)

 table/partition level statistics
 

 Key: HIVE-1361
 URL: https://issues.apache.org/jira/browse/HIVE-1361
 Project: Hadoop Hive
  Issue Type: Sub-task
Affects Versions: 0.6.0
Reporter: Ning Zhang
Assignee: Ahmed M Aly
 Attachments: HIVE-1361.java_only.patch, HIVE-1361.patch, stats0.patch


 As a first step, we gather table-level stats for non-partitioned tables and
 partition-level stats for partitioned tables. Future work could extend the
 table-level stats to partitioned tables as well.
 There are 3 major milestones in this subtask: 
  1) extend the insert statement to gather table/partition level stats 
 on-the-fly.
  2) extend metastore API to support storing and retrieving stats for a 
 particular table/partition. 
  3) add an ANALYZE TABLE [PARTITION] statement in Hive QL to gather stats for 
 existing tables/partitions. 
 The proposed stats are:
 Partition-level stats: 
   - number of rows
   - total size in bytes
   - number of files
   - max, min, average row sizes
   - max, min, average file sizes
 Table-level stats in addition to partition level stats:
   - number of partitions




[jira] Commented: (HIVE-1378) Return value for map, array, and struct needs to return a string

2010-09-16 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910444#action_12910444
 ] 

Ning Zhang commented on HIVE-1378:
--

Will take a look.

 Return value for map, array, and struct needs to return a string 
 -

 Key: HIVE-1378
 URL: https://issues.apache.org/jira/browse/HIVE-1378
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Drivers
Reporter: Jerome Boulon
Assignee: Steven Wong
 Fix For: 0.7.0

 Attachments: HIVE-1378.patch


 In order to be able to select/display any data from JDBC Hive driver, return 
 value for map, array, and struct needs to return a string




[jira] Commented: (HIVE-1378) Return value for map, array, and struct needs to return a string

2010-09-16 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910455#action_12910455
 ] 

Ning Zhang commented on HIVE-1378:
--

Steven, there are conflicts when applying this patch. Can you regenerate it?

 Return value for map, array, and struct needs to return a string 
 -

 Key: HIVE-1378
 URL: https://issues.apache.org/jira/browse/HIVE-1378
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Drivers
Reporter: Jerome Boulon
Assignee: Steven Wong
 Fix For: 0.7.0

 Attachments: HIVE-1378.patch


 In order to be able to select/display any data from JDBC Hive driver, return 
 value for map, array, and struct needs to return a string




[jira] Created: (HIVE-1639) ExecDriver.addInputPaths() error if partition name contains a comma

2010-09-15 Thread Ning Zhang (JIRA)
ExecDriver.addInputPaths() error if partition name contains a comma
---

 Key: HIVE-1639
 URL: https://issues.apache.org/jira/browse/HIVE-1639
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Ning Zhang
Assignee: Ning Zhang


The ExecDriver.addInputPaths() calls FileInputFormat.addPaths(), which takes a
comma-separated string representing a set of paths. If the path name of an
input file contains a comma, this code throws an exception:
java.lang.IllegalArgumentException: Can not create a Path from an empty string.

Instead of calling FileInputFormat.addPaths(), ExecDriver.addInputPaths()
should iterate over all paths and call FileInputFormat.addInputPath() for each.
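The failure mode is easy to reproduce in miniature (pure Python; this mimics the join-then-split behavior, not the actual Hadoop API):

```python
def add_paths_joined(paths):
    """Broken approach: join into one comma-separated string, then split."""
    return ",".join(paths).split(",")

def add_paths_one_by_one(paths):
    """Fixed approach: hand each path over individually."""
    return list(paths)

# A partition value containing a comma puts a comma into the path itself.
paths = ["/warehouse/t/ds=a,b/file1", "/warehouse/t/ds=c/file2"]
broken = add_paths_joined(paths)      # the comma inside ds=a,b splits the path
fixed = add_paths_one_by_one(paths)
```

The joined form yields three fragments instead of two paths, which is why adding paths one at a time is the safe route.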




[jira] Updated: (HIVE-1639) ExecDriver.addInputPaths() error if partition name contains a comma

2010-09-15 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1639:
-

Attachment: HIVE-1639.patch

 ExecDriver.addInputPaths() error if partition name contains a comma
 ---

 Key: HIVE-1639
 URL: https://issues.apache.org/jira/browse/HIVE-1639
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-1639.patch


 The ExecDriver.addInputPaths() calls FileInputFormat.addPaths(), which takes
 a comma-separated string representing a set of paths. If the path name of an
 input file contains a comma, this code throws an exception:
 java.lang.IllegalArgumentException: Can not create a Path from an empty
 string.
 Instead of calling FileInputFormat.addPaths(), ExecDriver.addInputPaths()
 should iterate over all paths and call FileInputFormat.addInputPath() for each.




[jira] Updated: (HIVE-1639) ExecDriver.addInputPaths() error if partition name contains a comma

2010-09-15 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1639:
-

Status: Patch Available  (was: Open)

 ExecDriver.addInputPaths() error if partition name contains a comma
 ---

 Key: HIVE-1639
 URL: https://issues.apache.org/jira/browse/HIVE-1639
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-1639.patch


 The ExecDriver.addInputPaths() calls FileInputFormat.addPaths(), which takes
 a comma-separated string representing a set of paths. If the path name of an
 input file contains a comma, this code throws an exception:
 java.lang.IllegalArgumentException: Can not create a Path from an empty
 string.
 Instead of calling FileInputFormat.addPaths(), ExecDriver.addInputPaths()
 should iterate over all paths and call FileInputFormat.addInputPath() for each.




[jira] Commented: (HIVE-1570) referencing an added file by it's name in a transform script does not work in hive local mode

2010-09-15 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909943#action_12909943
 ] 

Ning Zhang commented on HIVE-1570:
--

Joy, scriptfile1.q actually fails on TestMinimrCliDriver with the command

ant test -Dhadoop.version=0.20.0 -Dtestcase=TestMinimrCliDriver
-Dminimr.query.files=scriptfile1.q

It gives an NPE at ExecDriver.java:625. This NPE is a different issue, and it
can be fixed by changing 'conf' to 'job'. But even with the NPE gone after
that change, the test still fails. Should we move this test outside
minimr.query.files for now, until this JIRA is fixed?

 referencing an added file by it's name in a transform script does not work in 
 hive local mode
 -

 Key: HIVE-1570
 URL: https://issues.apache.org/jira/browse/HIVE-1570
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Joydeep Sen Sarma
Assignee: Joydeep Sen Sarma

 Yongqiang tried this and it fails in local mode:
 add file ../data/scripts/dumpdata_script.py;
 select count(distinct subq.key) from
 (FROM src MAP src.key USING 'python dumpdata_script.py' AS key WHERE src.key 
 = 10) subq;
 this needs to be fixed because it means we cannot choose local mode 
 automatically in case of transform scripts (since different paths need to be 
 used for cluster vs. local mode execution)




[jira] Updated: (HIVE-1639) ExecDriver.addInputPaths() error if partition name contains a comma

2010-09-15 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1639:
-

Attachment: HIVE-1639.2.patch

Added a test case.

 ExecDriver.addInputPaths() error if partition name contains a comma
 ---

 Key: HIVE-1639
 URL: https://issues.apache.org/jira/browse/HIVE-1639
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-1639.2.patch, HIVE-1639.patch


 The ExecDriver.addInputPaths() calls FileInputFormat.addPaths(), which takes
 a comma-separated string representing a set of paths. If the path name of an
 input file contains a comma, this code throws an exception:
 java.lang.IllegalArgumentException: Can not create a Path from an empty
 string.
 Instead of calling FileInputFormat.addPaths(), ExecDriver.addInputPaths()
 should iterate over all paths and call FileInputFormat.addInputPath() for each.




[jira] Resolved: (HIVE-1629) Patch to fix hashCode method in DoubleWritable class

2010-09-13 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang resolved HIVE-1629.
--

Fix Version/s: 0.7.0
   Resolution: Fixed

Committed. Thanks Vaibhav!

 Patch to fix hashCode method in DoubleWritable class
 

 Key: HIVE-1629
 URL: https://issues.apache.org/jira/browse/HIVE-1629
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Vaibhav Aggarwal
Assignee: Vaibhav Aggarwal
 Fix For: 0.7.0

 Attachments: HIVE-1629.patch


 A patch to fix the hashCode() method of Hive's DoubleWritable class.
 It prevents a HashMap keyed by DoubleWritable from behaving like a LinkedList.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1629) Patch to fix hashCode method in DoubleWritable class

2010-09-13 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12908639#action_12908639
 ] 

Ning Zhang commented on HIVE-1629:
--

Good question, John. I think this patch doesn't affect bucketing, which is 
implemented using ObjectInspectorUtils.hashCode(). Actually the hash function 
used there for Double is the same as the one provided in this patch. But I'll 
double-check with Zheng/Namit tomorrow. 

 Patch to fix hashCode method in DoubleWritable class
 

 Key: HIVE-1629
 URL: https://issues.apache.org/jira/browse/HIVE-1629
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Vaibhav Aggarwal
Assignee: Vaibhav Aggarwal
 Fix For: 0.7.0

 Attachments: HIVE-1629.patch


 A patch to fix the hashCode() method of Hive's DoubleWritable class.
 It prevents a HashMap keyed by DoubleWritable from behaving like a LinkedList.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HIVE-1629) Patch to fix hashCode method in DoubleWritable class

2010-09-12 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang reassigned HIVE-1629:


Assignee: Vaibhav Aggarwal

 Patch to fix hashCode method in DoubleWritable class
 

 Key: HIVE-1629
 URL: https://issues.apache.org/jira/browse/HIVE-1629
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Vaibhav Aggarwal
Assignee: Vaibhav Aggarwal
 Attachments: HIVE-1629.patch


 A patch to fix the hashCode() method of Hive's DoubleWritable class.
 It prevents a HashMap keyed by DoubleWritable from behaving like a LinkedList.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1629) Patch to fix hashCode method in DoubleWritable class

2010-09-12 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12908599#action_12908599
 ] 

Ning Zhang commented on HIVE-1629:
--

+1. Will commit if tests pass.

 Patch to fix hashCode method in DoubleWritable class
 

 Key: HIVE-1629
 URL: https://issues.apache.org/jira/browse/HIVE-1629
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Vaibhav Aggarwal
 Attachments: HIVE-1629.patch


 A patch to fix the hashCode() method of Hive's DoubleWritable class.
 It prevents a HashMap keyed by DoubleWritable from behaving like a LinkedList.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1622) Use CombineHiveInputFormat for the merge job if hive.merge.mapredfiles=true

2010-09-12 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1622:
-

Attachment: HIVE-1622_0.17.patch

Oops, forgot the patch for the hadoop 0.17 logs. 

 Use CombineHiveInputFormat for the merge job if hive.merge.mapredfiles=true
 ---

 Key: HIVE-1622
 URL: https://issues.apache.org/jira/browse/HIVE-1622
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Ning Zhang
Assignee: Ning Zhang
 Fix For: 0.7.0

 Attachments: HIVE-1622.patch, HIVE-1622_0.17.patch


 Currently map-only merge (using CombineHiveInputFormat) is only enabled for 
 merging files generated by mappers. It should be used for files generated by 
 reducers as well.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1629) Patch to fix hashCode method in DoubleWritable class

2010-09-10 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12908194#action_12908194
 ] 

Ning Zhang commented on HIVE-1629:
--

+long v = Double.doubleToLongBits(value);
+return (int) (v ^ (v >>> 32));

won't this return 0 for all long values less than 2^32?

Searching the web, it seems the following 64-bit to 32-bit hash is a good one:

http://www.cris.com/~ttwang/tech/inthash.htm
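For comparison, here is a sketch of both candidates in Java: the plain xor-fold of the high and low 32 bits, and Thomas Wang's 64-to-32-bit shift hash from the page above. Class and method names are illustrative, not Hive's:

```java
// Illustrative sketch; not the patch itself.
public class DoubleHashSketch {

    // Standard fold used by java.lang.Long/Double.hashCode():
    // xor the high 32 bits into the low 32 bits.
    static int hashXorFold(double value) {
        long v = Double.doubleToLongBits(value);
        return (int) (v ^ (v >>> 32));
    }

    // Thomas Wang's hash6432shift, transcribed to Java
    // (>>> is the unsigned right shift).
    static int hash6432shift(double value) {
        long key = Double.doubleToLongBits(value);
        key = (~key) + (key << 18); // key = (key << 18) - key - 1
        key = key ^ (key >>> 31);
        key = key * 21;             // key = (key + (key << 2)) + (key << 4)
        key = key ^ (key >>> 11);
        key = key + (key << 6);
        key = key ^ (key >>> 22);
        return (int) key;
    }
}
```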

 Patch to fix hashCode method in DoubleWritable class
 

 Key: HIVE-1629
 URL: https://issues.apache.org/jira/browse/HIVE-1629
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Vaibhav Aggarwal
 Attachments: HIVE-1629.patch


 A patch to fix the hashCode() method of Hive's DoubleWritable class.
 It prevents a HashMap keyed by DoubleWritable from behaving like a LinkedList.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1622) Use CombineHiveInputFormat for the merge job if hive.merge.mapredfiles=true

2010-09-08 Thread Ning Zhang (JIRA)
Use CombineHiveInputFormat for the merge job if hive.merge.mapredfiles=true
---

 Key: HIVE-1622
 URL: https://issues.apache.org/jira/browse/HIVE-1622
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Ning Zhang
Assignee: Ning Zhang


Currently map-only merge (using CombineHiveInputFormat) is only enabled for 
merging files generated by mappers. It should be used for files generated by 
reducers as well.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1622) Use CombineHiveInputFormat for the merge job if hive.merge.mapredfiles=true

2010-09-08 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1622:
-

Status: Patch Available  (was: Open)

 Use CombineHiveInputFormat for the merge job if hive.merge.mapredfiles=true
 ---

 Key: HIVE-1622
 URL: https://issues.apache.org/jira/browse/HIVE-1622
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-1622.patch


 Currently map-only merge (using CombineHiveInputFormat) is only enabled for 
 merging files generated by mappers. It should be used for files generated by 
 reducers as well.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1614) UDTF json_tuple should return null row when input is not a valid JSON string

2010-09-03 Thread Ning Zhang (JIRA)
UDTF json_tuple should return null row when input is not a valid JSON string


 Key: HIVE-1614
 URL: https://issues.apache.org/jira/browse/HIVE-1614
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Ning Zhang
Assignee: Ning Zhang


If the input column is not a valid JSON string, json_tuple will not return 
anything, but this prevents the downstream operators from accessing the 
left-hand-side table. We should output a NULL row instead, similar to when the 
input column is a NULL value. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1614) UDTF json_tuple should return null row when input is not a valid JSON string

2010-09-03 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1614:
-

Attachment: HIVE-1614.patch

 UDTF json_tuple should return null row when input is not a valid JSON string
 

 Key: HIVE-1614
 URL: https://issues.apache.org/jira/browse/HIVE-1614
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-1614.patch


 If the input column is not a valid JSON string, json_tuple will not return 
 anything, but this prevents the downstream operators from accessing the 
 left-hand-side table. We should output a NULL row instead, similar to when 
 the input column is a NULL value. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1614) UDTF json_tuple should return null row when input is not a valid JSON string

2010-09-03 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1614:
-

   Status: Patch Available  (was: Open)
Affects Version/s: 0.7.0

 UDTF json_tuple should return null row when input is not a valid JSON string
 

 Key: HIVE-1614
 URL: https://issues.apache.org/jira/browse/HIVE-1614
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-1614.patch


 If the input column is not a valid JSON string, json_tuple will not return 
 anything, but this prevents the downstream operators from accessing the 
 left-hand-side table. We should output a NULL row instead, similar to when 
 the input column is a NULL value. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1614) UDTF json_tuple should return null row when input is not a valid JSON string

2010-09-03 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1614:
-

Attachment: HIVE-1614.2.patch

Added a catch for all Throwables in the UDTF.

 UDTF json_tuple should return null row when input is not a valid JSON string
 

 Key: HIVE-1614
 URL: https://issues.apache.org/jira/browse/HIVE-1614
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-1614.2.patch, HIVE-1614.patch


 If the input column is not a valid JSON string, json_tuple will not return 
 anything, but this prevents the downstream operators from accessing the 
 left-hand-side table. We should output a NULL row instead, similar to when 
 the input column is a NULL value. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1467) dynamic partitioning should cluster by partitions

2010-09-02 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12905700#action_12905700
 ] 

Ning Zhang commented on HIVE-1467:
--

As discussed with Joydeep and Ashish, it seems we should use the distribute 
by mechanism rather than cluster by to avoid sorting on the reducer side. 
The difference between them is that distribute by only sets the MapReduce 
partition columns to the dynamic partition columns, while cluster by 
additionally sets the key columns to the dynamic partition columns.

So I think we can support two modes of reducer-side DP with tradeoffs:
  -- distribute by mode: no sorting, but reducers have to keep all files open 
during the DP insert. A good choice when a large amount of data is passed from 
mappers to reducers.
  -- cluster by mode: sorting by the DP columns, but we can close a DP file 
once FileSinkOperator sees a different DP column value. A good choice when the 
total data size is not that large but a large number of DPs are generated.  

 dynamic partitioning should cluster by partitions
 -

 Key: HIVE-1467
 URL: https://issues.apache.org/jira/browse/HIVE-1467
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Joydeep Sen Sarma
Assignee: Namit Jain

 (based on internal discussion with Ning). Dynamic partitioning should offer a 
 mode where it clusters data by partition before writing out to each 
 partition. This will reduce number of files. Details:
 1. always use reducer stage
 2. mapper sends to reducer based on partitioning column. ie. reducer = 
 f(partition-cols)
 3. f() can be made somewhat smart to:
a. spread large partitions across multiple reducers - each mapper can 
 maintain row count seen per partition - and then apply (whenever it sees a 
 new row for a partition): 
* reducer = (row count / 64k) % numReducers 
    Small partitions always go to one reducer; the larger the partition, 
 the more reducers it uses. This prevents one reducer from becoming a 
 bottleneck writing out one partition
b. this still leaves the issue of very large number of splits. (64K rows 
 from 10K mappers is pretty large). for this one can apply one slight 
 modification:
* reducer = (mapper-id/1024 + row-count/64k) % numReducers
ie. - the first 1000 mappers always send the first 64K rows for one 
 partition to the same reducer. the next 1000 send it to the next one. and so 
 on.
 the constants 1024 and 64k are used just as an example. i don't know what the 
 right numbers are. it's also clear that this is a case where we need hadoop 
 to do only partitioning (and no sorting). this will be a useful feature to 
 have in hadoop. that will reduce the overhead due to reducers.
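The reducer-selection function f() described above can be sketched in a few lines of Java, using the example constants 1024 and 64K from the description. The class and method names are hypothetical, not Hive code:

```java
// Hypothetical reducer-selection function for dynamic partitioning.
public class DynamicPartitionRouter {
    static final int MAPPER_GROUP = 1024;          // example constant
    static final long ROWS_PER_BUCKET = 64 * 1024; // example constant

    // reducer = (mapper-id/1024 + row-count/64k) % numReducers
    static int chooseReducer(int mapperId, long rowsSeenForPartition,
                             int numReducers) {
        return (int) ((mapperId / MAPPER_GROUP
                + rowsSeenForPartition / ROWS_PER_BUCKET) % numReducers);
    }
}
```

Small partitions seen by the first 1024 mappers all land on one reducer; each additional 64K rows per partition, or each additional group of 1024 mappers, shifts output to the next reducer.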

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1607) Reinstate and deprecate IMetaStoreClient methods removed in HIVE-675

2010-08-31 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1607:
-

   Status: Resolved  (was: Patch Available)
Fix Version/s: 0.7.0
   Resolution: Fixed

Committed. Thanks Carl!

 Reinstate and deprecate IMetaStoreClient methods removed in HIVE-675
 

 Key: HIVE-1607
 URL: https://issues.apache.org/jira/browse/HIVE-1607
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Carl Steinbach
Assignee: Carl Steinbach
 Fix For: 0.7.0

 Attachments: HIVE-1607.1.patch.txt, HIVE-1607.2.patch.txt


 Several methods were removed from the IMetaStoreClient interface as part of 
 HIVE-675:
 {code}
   /**
* Drop the table.
*
* @param tableName
*  The table to drop
* @param deleteData
*  Should we delete the underlying data
* @throws MetaException
*   Could not drop table properly.
* @throws UnknownTableException
*   The table wasn't found.
* @throws TException
*   A thrift communication error occurred
* @throws NoSuchObjectException
*   The table wasn't found.
*/
   public void dropTable(String tableName, boolean deleteData)
   throws MetaException, UnknownTableException, TException,
   NoSuchObjectException;
   /**
* Get a table object.
*
* @param tableName
*  Name of the table to fetch.
* @return An object representing the table.
* @throws MetaException
*   Could not fetch the table
* @throws TException
*   A thrift communication error occurred
* @throws NoSuchObjectException
*   In case the table wasn't found.
*/
   public Table getTable(String tableName) throws MetaException, TException,
   NoSuchObjectException;
   public boolean tableExists(String databaseName, String tableName) throws 
 MetaException,
   TException, UnknownDBException;
 {code}
 These methods should be reinstated with a deprecation warning.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1598) use SequenceFile rather than TextFile format for hive query results

2010-08-31 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1598:
-

Status: Patch Available  (was: Open)

All 0.17 & 0.20 tests passed.

 use SequenceFile rather than TextFile format for hive query results
 ---

 Key: HIVE-1598
 URL: https://issues.apache.org/jira/browse/HIVE-1598
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-1598.patch


 A Hive query's result is written to a temporary directory first, and then 
 FetchTask takes the files and displays them to the user. Currently the file 
 format used for the result file is TextFile. This can cause incorrect result 
 display if a string-typed column contains newlines, which are used as record 
 delimiters by TextInputFormat. Switching to SequenceFile format solves this 
 problem. 
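The corruption is easy to see with plain line-based parsing, which is what TextInputFormat's newline record delimiter amounts to. This is a sketch with hypothetical names, not Hive code:

```java
// Sketch: a single column value containing '\n' is indistinguishable
// from a record boundary when results are stored as plain text.
public class NewlineResultSketch {
    static String[] readAsTextRecords(String fileContents) {
        // TextInputFormat-style framing: every '\n' ends a record.
        return fileContents.split("\n", -1);
    }
}
```

One logical row whose single column value is "line1\nline2" reads back as two rows; a SequenceFile, which stores each record's length explicitly, would return it as one.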

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1598) use SequenceFile rather than TextFile format for hive query results

2010-08-31 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1598:
-

Attachment: HIVE-1598.patch

This patch only adds support for using SequenceFile for query results. There 
are still questions about whether we should use it for the script operator or 
not. Will open another JIRA if needed.

 use SequenceFile rather than TextFile format for hive query results
 ---

 Key: HIVE-1598
 URL: https://issues.apache.org/jira/browse/HIVE-1598
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-1598.patch


 A Hive query's result is written to a temporary directory first, and then 
 FetchTask takes the files and displays them to the user. Currently the file 
 format used for the result file is TextFile. This can cause incorrect result 
 display if a string-typed column contains newlines, which are used as record 
 delimiters by TextInputFormat. Switching to SequenceFile format solves this 
 problem. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1598) use SequenceFile rather than TextFile format for hive query results

2010-08-31 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1598:
-

Attachment: HIVE-1598.2.patch

Attached the test case and also removed some debugging info. These are the only 
changes. 

 use SequenceFile rather than TextFile format for hive query results
 ---

 Key: HIVE-1598
 URL: https://issues.apache.org/jira/browse/HIVE-1598
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-1598.2.patch, HIVE-1598.patch


 A Hive query's result is written to a temporary directory first, and then 
 FetchTask takes the files and displays them to the user. Currently the file 
 format used for the result file is TextFile. This can cause incorrect result 
 display if a string-typed column contains newlines, which are used as record 
 delimiters by TextInputFormat. Switching to SequenceFile format solves this 
 problem. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1605) regression and improvements in handling NULLs in joins

2010-08-30 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1605:
-

Attachment: HIVE-1605.3.patch

Uploading HIVE-1605.3.patch. Thanks Amareshwari.

 regression and improvements in handling NULLs in joins
 --

 Key: HIVE-1605
 URL: https://issues.apache.org/jira/browse/HIVE-1605
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-1605.2.patch, HIVE-1605.3.patch, HIVE-1605.patch


 There are regressions in sort-merge map join after HIVE-741: there are a lot 
 of OOM exceptions in SMBMapJoinOperator. This is caused by the HashMap 
 maintained for each key to remember whether it is NULL, which takes too much 
 memory when the tables are large. 
 A second issue is in handling NULLs when the join keys span more than one 
 column. This appears in regular MapJoin as well as SMBMapJoin. The code only 
 checks whether all the columns are NULL; it should report no match if any 
 joined value is NULL. 
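The second fix can be sketched as a key-comparison rule. This is illustrative Java with hypothetical names, not the actual MapJoin code: a multi-column key matches only when no component on either side is NULL.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical sketch of SQL-style join-key matching.
public class JoinKeyMatchSketch {
    static boolean keysMatch(List<?> left, List<?> right) {
        if (left.size() != right.size()) {
            return false;
        }
        for (int i = 0; i < left.size(); i++) {
            // SQL NULL never equals anything, including another NULL,
            // so any NULL component means no match (not just all-NULL keys).
            if (left.get(i) == null || right.get(i) == null) {
                return false;
            }
            if (!Objects.equals(left.get(i), right.get(i))) {
                return false;
            }
        }
        return true;
    }
}
```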

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-675) add database/schema support Hive QL

2010-08-30 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12904415#action_12904415
 ] 

Ning Zhang commented on HIVE-675:
-

@Carl, I got compile errors for tableExists() and getTable(). It would be 
nice to bring back those methods (as well as other methods like dropTable) 
without the database argument. Putting a deprecation warning is fine with me. 

 add database/schema support Hive QL
 ---

 Key: HIVE-675
 URL: https://issues.apache.org/jira/browse/HIVE-675
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Metastore, Query Processor
Reporter: Prasad Chakka
Assignee: Carl Steinbach
 Fix For: 0.7.0

 Attachments: hive-675-2009-9-16.patch, hive-675-2009-9-19.patch, 
 hive-675-2009-9-21.patch, hive-675-2009-9-23.patch, hive-675-2009-9-7.patch, 
 hive-675-2009-9-8.patch, HIVE-675-2010-08-16.patch.txt, 
 HIVE-675-2010-7-16.patch.txt, HIVE-675-2010-8-4.patch.txt, 
 HIVE-675.10.patch.txt, HIVE-675.11.patch.txt, HIVE-675.12.patch.txt, 
 HIVE-675.13.patch.txt


 Currently all Hive tables reside in single namespace (default). Hive should 
 support multiple namespaces (databases or schemas) such that users can create 
 tables in their specific namespaces. These name spaces can have different 
 warehouse directories (with a default naming scheme) and possibly different 
 properties.
 There is already some support for this in metastore but Hive query parser 
 should have this feature as well.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1607) Reinstate and deprecate IMetaStoreClient methods removed in HIVE-675

2010-08-30 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12904502#action_12904502
 ] 

Ning Zhang commented on HIVE-1607:
--

I'll take a look.


 Reinstate and deprecate IMetaStoreClient methods removed in HIVE-675
 

 Key: HIVE-1607
 URL: https://issues.apache.org/jira/browse/HIVE-1607
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Carl Steinbach
Assignee: Carl Steinbach
 Attachments: HIVE-1607.1.patch.txt, HIVE-1607.2.patch.txt


 Several methods were removed from the IMetaStoreClient interface as part of 
 HIVE-675:
 {code}
   /**
* Drop the table.
*
* @param tableName
*  The table to drop
* @param deleteData
*  Should we delete the underlying data
* @throws MetaException
*   Could not drop table properly.
* @throws UnknownTableException
*   The table wasn't found.
* @throws TException
*   A thrift communication error occurred
* @throws NoSuchObjectException
*   The table wasn't found.
*/
   public void dropTable(String tableName, boolean deleteData)
   throws MetaException, UnknownTableException, TException,
   NoSuchObjectException;
   /**
* Get a table object.
*
* @param tableName
*  Name of the table to fetch.
* @return An object representing the table.
* @throws MetaException
*   Could not fetch the table
* @throws TException
*   A thrift communication error occurred
* @throws NoSuchObjectException
*   In case the table wasn't found.
*/
   public Table getTable(String tableName) throws MetaException, TException,
   NoSuchObjectException;
   public boolean tableExists(String databaseName, String tableName) throws 
 MetaException,
   TException, UnknownDBException;
 {code}
 These methods should be reinstated with a deprecation warning.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1607) Reinstate and deprecate IMetaStoreClient methods removed in HIVE-675

2010-08-30 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12904512#action_12904512
 ] 

Ning Zhang commented on HIVE-1607:
--

+1.  Will commit if tests pass.

 Reinstate and deprecate IMetaStoreClient methods removed in HIVE-675
 

 Key: HIVE-1607
 URL: https://issues.apache.org/jira/browse/HIVE-1607
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Carl Steinbach
Assignee: Carl Steinbach
 Attachments: HIVE-1607.1.patch.txt, HIVE-1607.2.patch.txt


 Several methods were removed from the IMetaStoreClient interface as part of 
 HIVE-675:
 {code}
   /**
* Drop the table.
*
* @param tableName
*  The table to drop
* @param deleteData
*  Should we delete the underlying data
* @throws MetaException
*   Could not drop table properly.
* @throws UnknownTableException
*   The table wasn't found.
* @throws TException
*   A thrift communication error occurred
* @throws NoSuchObjectException
*   The table wasn't found.
*/
   public void dropTable(String tableName, boolean deleteData)
   throws MetaException, UnknownTableException, TException,
   NoSuchObjectException;
   /**
* Get a table object.
*
* @param tableName
*  Name of the table to fetch.
* @return An object representing the table.
* @throws MetaException
*   Could not fetch the table
* @throws TException
*   A thrift communication error occurred
* @throws NoSuchObjectException
*   In case the table wasn't found.
*/
   public Table getTable(String tableName) throws MetaException, TException,
   NoSuchObjectException;
   public boolean tableExists(String databaseName, String tableName) throws 
 MetaException,
   TException, UnknownDBException;
 {code}
 These methods should be reinstated with a deprecation warning.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1605) regression and improvements in handling NULLs in joins

2010-08-29 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1605:
-

Status: Patch Available  (was: Open)

 regression and improvements in handling NULLs in joins
 --

 Key: HIVE-1605
 URL: https://issues.apache.org/jira/browse/HIVE-1605
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-1605.patch


 There are regressions in sort-merge map join after HIVE-741: there are a lot 
 of OOM exceptions in SMBMapJoinOperator. This is caused by the HashMap 
 maintained for each key to remember whether it is NULL, which takes too much 
 memory when the tables are large. 
 A second issue is in handling NULLs when the join keys span more than one 
 column. This appears in regular MapJoin as well as SMBMapJoin. The code only 
 checks whether all the columns are NULL; it should report no match if any 
 joined value is NULL. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1605) regression and improvements in handling NULLs in joins

2010-08-29 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1605:
-

Attachment: HIVE-1605.patch

Passed all tests except scriptfile1.q in TestMinimrCliDriver on hadoop 0.20. 
This test also fails on trunk. 

 regression and improvements in handling NULLs in joins
 --

 Key: HIVE-1605
 URL: https://issues.apache.org/jira/browse/HIVE-1605
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-1605.patch


 There are regressions in sort-merge map join after HIVE-741: there are a lot 
 of OOM exceptions in SMBMapJoinOperator. This is caused by the HashMap 
 maintained for each key to remember whether it is NULL, which takes too much 
 memory when the tables are large. 
 A second issue is in handling NULLs when the join keys span more than one 
 column. This appears in regular MapJoin as well as SMBMapJoin. The code only 
 checks whether all the columns are NULL; it should report no match if any 
 joined value is NULL. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1605) regression and improvements in handling NULLs in joins

2010-08-29 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1605:
-

Attachment: HIVE-1605.2.patch

Thanks Amareshwari for the review. The attached HIVE-1605.2.patch addresses 
the issues.

 regression and improvements in handling NULLs in joins
 --

 Key: HIVE-1605
 URL: https://issues.apache.org/jira/browse/HIVE-1605
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-1605.2.patch, HIVE-1605.patch


 There are regressions in sort-merge map join after HIVE-741: there are a lot 
 of OOM exceptions in SMBMapJoinOperator. This is caused by the HashMap 
 maintained for each key to remember whether it is NULL, which takes too much 
 memory when the tables are large. 
 A second issue is in handling NULLs when the join keys span more than one 
 column. This appears in regular MapJoin as well as SMBMapJoin. The code only 
 checks whether all the columns are NULL; it should report no match if any 
 joined value is NULL. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


