[jira] Updated: (HIVE-1691) ANALYZE TABLE command should check columns in partition spec

2010-10-05 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1691:
-

Attachment: HIVE-1691.patch

> ANALYZE TABLE command should check columns in partition spec
> -
>
> Key: HIVE-1691
> URL: https://issues.apache.org/jira/browse/HIVE-1691
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-1691.patch
>
>
> ANALYZE TABLE PARTITION (col1, col2, ...) should check whether col1, col2, etc. 
> are partition columns.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1691) ANALYZE TABLE command should check columns in partition spec

2010-10-05 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1691:
-

Status: Patch Available  (was: Open)

> ANALYZE TABLE command should check columns in partition spec
> -
>
> Key: HIVE-1691
> URL: https://issues.apache.org/jira/browse/HIVE-1691
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-1691.patch
>
>
> ANALYZE TABLE PARTITION (col1, col2, ...) should check whether col1, col2, etc. 
> are partition columns.




[jira] Created: (HIVE-1691) ANALYZE TABLE command should check columns in partition spec

2010-10-05 Thread Ning Zhang (JIRA)
ANALYZE TABLE command should check columns in partition spec
-

 Key: HIVE-1691
 URL: https://issues.apache.org/jira/browse/HIVE-1691
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Ning Zhang
Assignee: Ning Zhang


ANALYZE TABLE PARTITION (col1, col2, ...) should check whether col1, col2, etc. 
are partition columns.
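
The requested check can be sketched as follows. This is only an illustration in Python, not the contents of HIVE-1691.patch; the function name check_partition_spec and the case-insensitive comparison are assumptions made for the example.

```python
def check_partition_spec(partition_columns, spec_columns):
    """Raise ValueError if the spec names a column that is not a partition column.

    Hypothetical helper: partition_columns are the table's declared partition
    columns, spec_columns are the names written in PARTITION (...).
    Names are compared case-insensitively (an assumption of this sketch).
    """
    known = {c.lower() for c in partition_columns}
    unknown = [c for c in spec_columns if c.lower() not in known]
    if unknown:
        raise ValueError("not partition columns: " + ", ".join(unknown))
    return True
```

For a table partitioned by (ds, hr), a spec naming ds and hr passes, while a spec naming a regular data column is rejected before any statistics job is started.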




[jira] Updated: (HIVE-1376) Simple UDAFs with more than 1 parameter crash on empty row query

2010-10-04 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1376:
-

Status: Patch Available  (was: Open)

> Simple UDAFs with more than 1 parameter crash on empty row query 
> -
>
> Key: HIVE-1376
> URL: https://issues.apache.org/jira/browse/HIVE-1376
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Mayank Lahiri
>Assignee: Ning Zhang
> Attachments: HIVE-1376.2.patch, HIVE-1376.patch
>
>
> Simple UDAFs with more than 1 parameter crash when the query returns no rows. 
> Currently, this only seems to affect the percentile() UDAF where the second 
> parameter is the percentile to be computed (of type double). I've also 
> verified the bug by adding a dummy parameter to ExampleMin in contrib. 
> On an empty query, Hive seems to be trying to resolve an iterate() method 
> with signature {null,null} instead of {null,double}. You can reproduce this 
> bug using:
> CREATE TABLE pct_test ( val INT );
> SELECT percentile(val, 0.5) FROM pct_test;
> which produces a lot of errors like: 
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to 
> execute method public boolean 
> org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator.iterate(org.apache.hadoop.io.LongWritable,double)
>   on object 
> org.apache.hadoop.hive.ql.udf.udafpercentile$percentilelongevalua...@11d13272 
> of class org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator 
> with arguments {null, null} of size 2




[jira] Updated: (HIVE-1674) count(*) returns wrong result when a mapper returns empty results

2010-10-04 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1674:
-

Attachment: HIVE-1674.patch

> count(*) returns wrong result when a mapper returns empty results
> -
>
> Key: HIVE-1674
> URL: https://issues.apache.org/jira/browse/HIVE-1674
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-1674.patch
>
>
> select count(*) from src where false; will return # of mappers rather than 0. 




[jira] Updated: (HIVE-1674) count(*) returns wrong result when a mapper returns empty results

2010-10-04 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1674:
-

Status: Patch Available  (was: Open)

> count(*) returns wrong result when a mapper returns empty results
> -
>
> Key: HIVE-1674
> URL: https://issues.apache.org/jira/browse/HIVE-1674
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-1674.patch
>
>
> select count(*) from src where false; will return # of mappers rather than 0. 




[jira] Commented: (HIVE-1611) Add alternative search-provider to Hive site

2010-10-01 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917001#action_12917001
 ] 

Ning Zhang commented on HIVE-1611:
--

Thanks for the link, Alex. I've talked to Ashish and he said Hive has just been 
approved as a TLP. Some work may be needed to move the wiki and all the 
documentation (I think Edward Capriolo has volunteered to do so?). Let me 
ask Edward and see what he thinks. 

> Add alternative search-provider to Hive site
> 
>
> Key: HIVE-1611
> URL: https://issues.apache.org/jira/browse/HIVE-1611
> Project: Hadoop Hive
>  Issue Type: Improvement
>Reporter: Alex Baranau
>Assignee: Alex Baranau
>Priority: Minor
> Attachments: HIVE-1611.patch
>
>
> Use search-hadoop.com service to make available search in Hive sources, MLs, 
> wiki, etc.
> This was initially proposed on user mailing list. The search service was 
> already added in site's skin (common for all Hadoop related projects) before 
> so this issue is about enabling it for Hive. The ultimate goal is to use it 
> at all Hadoop's sub-projects' sites.




[jira] Resolved: (HIVE-1684) intermittent failures in create_escape.q

2010-10-01 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang resolved HIVE-1684.
--

Resolution: Duplicate

duplicate of HIVE-1669.

> intermittent failures in create_escape.q
> 
>
> Key: HIVE-1684
> URL: https://issues.apache.org/jira/browse/HIVE-1684
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: He Yongqiang
>
> [junit] diff -a -I file: -I pfile: -I hdfs: -I /tmp/ -I invalidscheme: -I 
> lastUpdateTime -I lastAccessTime -I [Oo]wner -I CreateTime -I LastAccessTime 
> -I Location -I transient_lastDdlTime -I last_modified_ -I 
> java.lang.RuntimeException -I at org -I at sun -I at java -I at junit -I 
> Caused by: -I [.][.][.] [0-9]* more 
> /data/users/njain/hive_commit1/hive_commit1/build/ql/test/logs/clientpositive/create_escape.q.out
>  
> /data/users/njain/hive_commit1/hive_commit1/ql/src/test/results/clientpositive/create_escape.q.out
> [junit] 48d47
> [junit] < serialization.format\t  
> [junit] 49a49
> [junit] > serialization.format\t  
> Sometimes, I see the above failure. 
> This does not happen always, and needs to be investigated.




[jira] Commented: (HIVE-1684) intermittent failures in create_escape.q

2010-10-01 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916990#action_12916990
 ] 

Ning Zhang commented on HIVE-1684:
--

This is the same as HIVE-1669, which was introduced in the new desc extended 
feature. It should be addressed by HIVE-1658. 

> intermittent failures in create_escape.q
> 
>
> Key: HIVE-1684
> URL: https://issues.apache.org/jira/browse/HIVE-1684
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: He Yongqiang
>
> [junit] diff -a -I file: -I pfile: -I hdfs: -I /tmp/ -I invalidscheme: -I 
> lastUpdateTime -I lastAccessTime -I [Oo]wner -I CreateTime -I LastAccessTime 
> -I Location -I transient_lastDdlTime -I last_modified_ -I 
> java.lang.RuntimeException -I at org -I at sun -I at java -I at junit -I 
> Caused by: -I [.][.][.] [0-9]* more 
> /data/users/njain/hive_commit1/hive_commit1/build/ql/test/logs/clientpositive/create_escape.q.out
>  
> /data/users/njain/hive_commit1/hive_commit1/ql/src/test/results/clientpositive/create_escape.q.out
> [junit] 48d47
> [junit] < serialization.format\t  
> [junit] 49a49
> [junit] > serialization.format\t  
> Sometimes, I see the above failure. 
> This does not happen always, and needs to be investigated.




[jira] Updated: (HIVE-1376) Simple UDAFs with more than 1 parameter crash on empty row query

2010-10-01 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1376:
-

Attachment: HIVE-1376.2.patch

The previous patch failed on several tests, particularly count(*) queries. 
Attaching a new patch for percentile only; I will upload a patch for HIVE-1674 
separately. 

> Simple UDAFs with more than 1 parameter crash on empty row query 
> -
>
> Key: HIVE-1376
> URL: https://issues.apache.org/jira/browse/HIVE-1376
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Mayank Lahiri
>Assignee: Ning Zhang
> Attachments: HIVE-1376.2.patch, HIVE-1376.patch
>
>
> Simple UDAFs with more than 1 parameter crash when the query returns no rows. 
> Currently, this only seems to affect the percentile() UDAF where the second 
> parameter is the percentile to be computed (of type double). I've also 
> verified the bug by adding a dummy parameter to ExampleMin in contrib. 
> On an empty query, Hive seems to be trying to resolve an iterate() method 
> with signature {null,null} instead of {null,double}. You can reproduce this 
> bug using:
> CREATE TABLE pct_test ( val INT );
> SELECT percentile(val, 0.5) FROM pct_test;
> which produces a lot of errors like: 
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to 
> execute method public boolean 
> org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator.iterate(org.apache.hadoop.io.LongWritable,double)
>   on object 
> org.apache.hadoop.hive.ql.udf.udafpercentile$percentilelongevalua...@11d13272 
> of class org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator 
> with arguments {null, null} of size 2




[jira] Commented: (HIVE-1611) Add alternative search-provider to Hive site

2010-10-01 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916927#action_12916927
 ] 

Ning Zhang commented on HIVE-1611:
--

Hi Alex, some questions:
 - Hive doesn't have the file author/src/documentation/skinconf.xml, which is 
included in the patch. How does this work?
 - This patch and the comments suggest it is for Hadoop subprojects. Hive 
is transitioning to a TLP independent of Hadoop. Will there be an issue after 
the transition?

> Add alternative search-provider to Hive site
> 
>
> Key: HIVE-1611
> URL: https://issues.apache.org/jira/browse/HIVE-1611
> Project: Hadoop Hive
>  Issue Type: Improvement
>Reporter: Alex Baranau
>Assignee: Alex Baranau
>Priority: Minor
> Attachments: HIVE-1611.patch
>
>
> Use search-hadoop.com service to make available search in Hive sources, MLs, 
> wiki, etc.
> This was initially proposed on user mailing list. The search service was 
> already added in site's skin (common for all Hadoop related projects) before 
> so this issue is about enabling it for Hive. The ultimate goal is to use it 
> at all Hadoop's sub-projects' sites.




[jira] Commented: (HIVE-1526) Hive should depend on a release version of Thrift

2010-09-30 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916798#action_12916798
 ] 

Ning Zhang commented on HIVE-1526:
--

I will take a look. 

> Hive should depend on a release version of Thrift
> -
>
> Key: HIVE-1526
> URL: https://issues.apache.org/jira/browse/HIVE-1526
> Project: Hadoop Hive
>  Issue Type: Task
>  Components: Build Infrastructure, Clients
>Reporter: Carl Steinbach
>Assignee: Todd Lipcon
> Fix For: 0.7.0
>
> Attachments: HIVE-1526.2.patch.txt, hive-1526.txt, libfb303.jar, 
> libthrift.jar
>
>
> Hive should depend on a release version of Thrift, and ideally it should use 
> Ivy to resolve this dependency.
> The Thrift folks are working on adding Thrift artifacts to a maven repository 
> here: https://issues.apache.org/jira/browse/THRIFT-363




[jira] Updated: (HIVE-1376) Simple UDAFs with more than 1 parameter crash on empty row query

2010-09-30 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1376:
-

Status: Open  (was: Patch Available)

> Simple UDAFs with more than 1 parameter crash on empty row query 
> -
>
> Key: HIVE-1376
> URL: https://issues.apache.org/jira/browse/HIVE-1376
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Mayank Lahiri
>Assignee: Ning Zhang
> Attachments: HIVE-1376.patch
>
>
> Simple UDAFs with more than 1 parameter crash when the query returns no rows. 
> Currently, this only seems to affect the percentile() UDAF where the second 
> parameter is the percentile to be computed (of type double). I've also 
> verified the bug by adding a dummy parameter to ExampleMin in contrib. 
> On an empty query, Hive seems to be trying to resolve an iterate() method 
> with signature {null,null} instead of {null,double}. You can reproduce this 
> bug using:
> CREATE TABLE pct_test ( val INT );
> SELECT percentile(val, 0.5) FROM pct_test;
> which produces a lot of errors like: 
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to 
> execute method public boolean 
> org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator.iterate(org.apache.hadoop.io.LongWritable,double)
>   on object 
> org.apache.hadoop.hive.ql.udf.udafpercentile$percentilelongevalua...@11d13272 
> of class org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator 
> with arguments {null, null} of size 2




[jira] Updated: (HIVE-1157) UDFs can't be loaded via "add jar" when jar is on HDFS

2010-09-30 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1157:
-

Status: Open  (was: Patch Available)

> UDFs can't be loaded via "add jar" when jar is on HDFS
> --
>
> Key: HIVE-1157
> URL: https://issues.apache.org/jira/browse/HIVE-1157
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Philip Zeyliger
>Priority: Minor
> Attachments: hive-1157.patch.txt, HIVE-1157.patch.v3.txt, 
> HIVE-1157.patch.v4.txt, HIVE-1157.patch.v5.txt, HIVE-1157.v2.patch.txt, 
> output.txt
>
>
> As discussed on the mailing list, it would be nice if you could use UDFs that 
> are on jars on HDFS.  The proposed implementation would be for "add jar" to 
> recognize that the target file is on HDFS, copy it locally, and load it into 
> the classpath.
> {quote}
> Hi folks,
> I have a quick question about UDF support in Hive.  I'm on the 0.5 branch.  
> Can you use a UDF where the jar which contains the function is on HDFS, and 
> not on the local filesystem.  Specifically, the following does not seem to 
> work:
> # This is Hive 0.5, from svn
> $bin/hive  
> Hive history file=/tmp/philip/hive_job_log_philip_201002081541_370227273.txt
> hive> add jar hdfs://localhost/FooTest.jar;   
>
> Added hdfs://localhost/FooTest.jar to class path
> hive> create temporary function cube as 'com.cloudera.FooTestUDF';
> 
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.FunctionTask
> Does this work for other people?  I could probably fix it by changing "add 
> jar" to download remote jars locally, when necessary (to load them into the 
> classpath), or update URLClassLoader (or whatever is underneath there) to 
> read directly from HDFS, which seems a bit more fragile.  But I wanted to 
> make sure that my interpretation of what's going on is right before I have at 
> it.
> Thanks,
> -- Philip
> {quote}
> {quote}
> Yes that's correct. I prefer to download the jars in "add jar".
> Zheng
> {quote}
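
The "download in add jar" approach preferred above can be sketched roughly as below. This is a Python illustration only, not Hive code: resolve_jar, local_dir and the injected fetch callback are hypothetical names, and the actual copy from HDFS is left to whatever fetch does.

```python
import os
from urllib.parse import urlparse


def resolve_jar(uri, local_dir, fetch):
    """Return a local path for the jar, copying it first when it is remote.

    fetch(uri, local_path) is a hypothetical callback that performs the copy
    (for example from HDFS); it is injected so the sketch stays runnable.
    """
    parsed = urlparse(uri)
    if parsed.scheme == "":
        return uri            # plain local path, nothing to do
    if parsed.scheme == "file":
        return parsed.path    # file: URI, already on the local filesystem
    local_path = os.path.join(local_dir, os.path.basename(parsed.path))
    fetch(uri, local_path)    # remote jar: copy it locally before loading
    return local_path
```

The caller would then add the returned local path to the classpath, so the class loader never has to read from HDFS directly.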




[jira] Commented: (HIVE-1427) Provide metastore schema migration scripts (0.5 -> 0.6)

2010-09-30 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916742#action_12916742
 ] 

Ning Zhang commented on HIVE-1427:
--

Carl, this is the only 0.6 blocker that doesn't have a patch available. Can you 
work on this as a hi-pri?

> Provide metastore schema migration scripts (0.5 -> 0.6)
> ---
>
> Key: HIVE-1427
> URL: https://issues.apache.org/jira/browse/HIVE-1427
> Project: Hadoop Hive
>  Issue Type: Task
>  Components: Metastore
>Reporter: Carl Steinbach
>Assignee: Carl Steinbach
> Fix For: 0.6.0
>
>
> At a minimum this ticket covers packaging up example MySQL migration scripts 
> (cumulative across all schema changes from 0.5 to 0.6) and explaining what to 
> do with them in the release notes.
> This is also probably a good point at which to decide and clearly state which 
> Metastore DBs we officially support in production, e.g. do we need to provide 
> migration scripts for Derby?




[jira] Commented: (HIVE-1526) Hive should depend on a release version of Thrift

2010-09-30 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916741#action_12916741
 ] 

Ning Zhang commented on HIVE-1526:
--

Carl and Todd, is this a blocking issue for 0.6? If not, we can move it to 0.7 
and get the 0.6 release out ASAP.

> Hive should depend on a release version of Thrift
> -
>
> Key: HIVE-1526
> URL: https://issues.apache.org/jira/browse/HIVE-1526
> Project: Hadoop Hive
>  Issue Type: Task
>  Components: Build Infrastructure, Clients
>Reporter: Carl Steinbach
>Assignee: Todd Lipcon
> Fix For: 0.6.0, 0.7.0
>
> Attachments: HIVE-1526.2.patch.txt, hive-1526.txt, libfb303.jar, 
> libthrift.jar
>
>
> Hive should depend on a release version of Thrift, and ideally it should use 
> Ivy to resolve this dependency.
> The Thrift folks are working on adding Thrift artifacts to a maven repository 
> here: https://issues.apache.org/jira/browse/THRIFT-363




[jira] Updated: (HIVE-1376) Simple UDAFs with more than 1 parameter crash on empty row query

2010-09-30 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1376:
-

  Status: Patch Available  (was: Open)
Assignee: Ning Zhang

> Simple UDAFs with more than 1 parameter crash on empty row query 
> -
>
> Key: HIVE-1376
> URL: https://issues.apache.org/jira/browse/HIVE-1376
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Mayank Lahiri
>Assignee: Ning Zhang
> Attachments: HIVE-1376.patch
>
>
> Simple UDAFs with more than 1 parameter crash when the query returns no rows. 
> Currently, this only seems to affect the percentile() UDAF where the second 
> parameter is the percentile to be computed (of type double). I've also 
> verified the bug by adding a dummy parameter to ExampleMin in contrib. 
> On an empty query, Hive seems to be trying to resolve an iterate() method 
> with signature {null,null} instead of {null,double}. You can reproduce this 
> bug using:
> CREATE TABLE pct_test ( val INT );
> SELECT percentile(val, 0.5) FROM pct_test;
> which produces a lot of errors like: 
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to 
> execute method public boolean 
> org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator.iterate(org.apache.hadoop.io.LongWritable,double)
>   on object 
> org.apache.hadoop.hive.ql.udf.udafpercentile$percentilelongevalua...@11d13272 
> of class org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator 
> with arguments {null, null} of size 2




[jira] Updated: (HIVE-1376) Simple UDAFs with more than 1 parameter crash on empty row query

2010-09-30 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1376:
-

Attachment: HIVE-1376.patch

Attaching a patch for review. This patch also fixes HIVE-1674 (count(*) 
returning wrong results). 

Tests are still running. Will upload a new patch if there are more changes. 

This patch implements 3) as suggested, so SELECT PERCENTILE(col, 0.5) FROM src 
WHERE false; will return a single row with NULL as the value. 
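
The intended behaviour on empty input can be sketched like this. It is a Python illustration, not the patch itself; the linear-interpolation definition used here is an assumption of the example and may differ from what UDAFPercentile computes.

```python
def percentile(values, p):
    """Return the p-th percentile (0 <= p <= 1) of values, or None if no rows.

    None stands in for SQL NULL. The interpolation rule is an assumption
    of this sketch, chosen only to make the example concrete.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    data = sorted(v for v in values if v is not None)
    if not data:
        return None  # empty input: a single result holding NULL, no crash
    pos = (len(data) - 1) * p
    lo = int(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)
```

The key point is the early return: the aggregate produces a NULL result for zero input rows instead of failing while resolving iterate().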

> Simple UDAFs with more than 1 parameter crash on empty row query 
> -
>
> Key: HIVE-1376
> URL: https://issues.apache.org/jira/browse/HIVE-1376
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Mayank Lahiri
> Attachments: HIVE-1376.patch
>
>
> Simple UDAFs with more than 1 parameter crash when the query returns no rows. 
> Currently, this only seems to affect the percentile() UDAF where the second 
> parameter is the percentile to be computed (of type double). I've also 
> verified the bug by adding a dummy parameter to ExampleMin in contrib. 
> On an empty query, Hive seems to be trying to resolve an iterate() method 
> with signature {null,null} instead of {null,double}. You can reproduce this 
> bug using:
> CREATE TABLE pct_test ( val INT );
> SELECT percentile(val, 0.5) FROM pct_test;
> which produces a lot of errors like: 
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to 
> execute method public boolean 
> org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator.iterate(org.apache.hadoop.io.LongWritable,double)
>   on object 
> org.apache.hadoop.hive.ql.udf.udafpercentile$percentilelongevalua...@11d13272 
> of class org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator 
> with arguments {null, null} of size 2




[jira] Assigned: (HIVE-1674) count(*) returns wrong result when a mapper returns empty results

2010-09-30 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang reassigned HIVE-1674:


Assignee: Ning Zhang

> count(*) returns wrong result when a mapper returns empty results
> -
>
> Key: HIVE-1674
> URL: https://issues.apache.org/jira/browse/HIVE-1674
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Ning Zhang
>Assignee: Ning Zhang
>
> select count(*) from src where false; will return # of mappers rather than 0. 




[jira] Created: (HIVE-1674) count(*) returns wrong result when a mapper returns empty results

2010-09-29 Thread Ning Zhang (JIRA)
count(*) returns wrong result when a mapper returns empty results
-

 Key: HIVE-1674
 URL: https://issues.apache.org/jira/browse/HIVE-1674
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Ning Zhang


select count(*) from src where false; will return # of mappers rather than 0. 
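
A small model of the symptom, in Python. The issue does not state the root cause, so this only illustrates one way a result equal to the number of mappers could arise: if each mapper emits a partial count and the final stage counts the partial rows instead of summing them. All function names are hypothetical.

```python
def map_side_count(rows, predicate):
    """Partial count produced by one mapper for its split."""
    return sum(1 for row in rows if predicate(row))


def merge_counts(partials):
    """Expected merge: add up the partial counts."""
    return sum(partials)


def merge_counts_wrong(partials):
    """Hypothetical faulty merge: counts partial rows, one per mapper."""
    return len(partials)
```

With three splits and a predicate that is always false, every partial count is 0; the expected merge returns 0 while the faulty one returns 3, the number of mappers.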




[jira] Commented: (HIVE-1638) convert commonly used udfs to generic udfs

2010-09-29 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916256#action_12916256
 ] 

Ning Zhang commented on HIVE-1638:
--

Siying, great work!

Also can you do an optimization for the case when the parameters are constants 
(e.g., the 2nd parameter of f_c='5015'). The objectInspector doesn't have the 
information of whether the input parameter is constant or not, but I think if 
you check in evaluate() whether the parameter is the same *object* between the 
1st and 2nd row, you can conclude the parameter is a constant. This can save a 
lot in object constructions. 
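
The identity check suggested above could look roughly like this. It is a Python sketch, not Hive's GenericUDF API: CachedEquals and the convert callback are hypothetical, and the approach is only safe if the caller never mutates and reuses one parameter object across rows.

```python
class CachedEquals:
    """Equality test that converts the second parameter once if it is constant.

    convert is a hypothetical, possibly expensive conversion (for example
    building a text object). A parameter is treated as constant when the very
    same object is passed again, which is checked by identity, not equality.
    """

    def __init__(self, convert):
        self._convert = convert
        self._has_cache = False
        self._last_raw = None
        self._last_converted = None

    def _converted(self, raw):
        if self._has_cache and raw is self._last_raw:
            return self._last_converted      # same object as last row: reuse
        self._last_raw = raw
        self._last_converted = self._convert(raw)
        self._has_cache = True
        return self._last_converted

    def evaluate(self, column_value, param):
        return column_value == self._converted(param)
```

When the parameter is a literal such as '5015', every row after the first skips the conversion; when it changes per row, the code falls back to converting each time.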

> convert commonly used udfs to generic udfs
> --
>
> Key: HIVE-1638
> URL: https://issues.apache.org/jira/browse/HIVE-1638
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Siying Dong
> Attachments: HIVE-1638.1.patch
>
>
> Copying a mail from Joy:
> i did a little bit of profiling of a simple hive group by query today. i was 
> surprised to see that one of the most expensive functions was in converting 
> the equals udf (i had some simple string filters) to generic udfs. 
> (primitiveobjectinspectorconverter.textconverter)
> am i correct in thinking that the fix is to simply port some of the most 
> popular udfs (string equality/comparison etc.) to generic udfs?




[jira] Commented: (HIVE-1665) drop operations may cause file leak

2010-09-29 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916249#action_12916249
 ] 

Ning Zhang commented on HIVE-1665:
--

What if 2) fails and rolling back 1) also fails? This could happen if the 
CLI got killed at any time between 1) and 2). 

Another option is to use the traditional 'mark-then-delete' trick: mark 
the partition as deleted in the metastore first and then clean up the data. In 
case of any failure, redoing the drop partition will resume the data deletion 
process. It is also easier from the administrator's point of view: you can 
periodically check the metastore for deleted partitions (which are left 
uncommitted) and re-drop them. 
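
A minimal sketch of mark-then-delete, in Python. The metastore is modelled as a plain dict from partition name to state, and delete_data is a hypothetical callback that removes the files; neither reflects Hive's actual metastore API.

```python
def drop_partition(metastore, name, delete_data):
    """Drop a partition by marking it first, then deleting its data.

    metastore maps partition name -> "active" or "deleted" (hypothetical
    model). If delete_data raises, the mark stays behind, so calling this
    function again resumes the drop.
    """
    if name not in metastore:
        return False
    metastore[name] = "deleted"   # step 1: mark in the metastore
    delete_data(name)             # step 2: remove files; may fail
    del metastore[name]           # step 3: forget the partition
    return True


def pending_drops(metastore):
    """Partitions marked deleted whose data cleanup has not completed."""
    return sorted(n for n, state in metastore.items() if state == "deleted")
```

An administrator job could call pending_drops periodically and re-run drop_partition for each name it returns.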

> drop operations may cause file leak
> ---
>
> Key: HIVE-1665
> URL: https://issues.apache.org/jira/browse/HIVE-1665
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Attachments: hive-1665.1.patch
>
>
> Right now when doing a drop, Hive first drops metadata and then drops the 
> actual files. If the file system is down at that time, the files will never 
> be deleted. 
> Had an offline discussion about this:
> to fix this, add a new conf "scratch dir" into hive conf. 
> when doing a drop operation:
> 1) move data to scratch directory
> 2) drop metadata
> 3.1) if 2) failed, roll back 1) and report error
> 3.2) if 2) succeeded, drop data from scratch directory
> 4) if 3.2 fails, we are ok because we assume the scratch dir will be emptied 
> manually.
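
The scratch-directory steps quoted above can be sketched as follows. This is a Python illustration on a local filesystem, not the contents of hive-1665.1.patch; drop_with_scratch and the drop_metadata callback are hypothetical names.

```python
import os
import shutil


def drop_with_scratch(data_dir, scratch_root, drop_metadata):
    """Drop data via a scratch directory, following steps 1) to 4).

    drop_metadata is a hypothetical callback standing in for the metastore
    call; it raises on failure.
    """
    name = os.path.basename(os.path.normpath(data_dir))
    staged = os.path.join(scratch_root, name)
    shutil.move(data_dir, staged)          # 1) move data to scratch directory
    try:
        drop_metadata()                    # 2) drop metadata
    except Exception:
        shutil.move(staged, data_dir)      # 3.1) roll back 1) and report error
        raise
    try:
        shutil.rmtree(staged)              # 3.2) drop data from scratch
    except OSError:
        pass                               # 4) leftovers are cleaned manually
```

Note that this sketch does not handle the case where the rollback in 3.1 itself fails, for example because the process is killed between 1) and 2).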




[jira] Resolved: (HIVE-1524) parallel execution failed if mapred.job.name is set

2010-09-28 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang resolved HIVE-1524.
--

Release Note: Committed to branch 0.6. Thanks Yuanjun!
  Resolution: Fixed

> parallel execution failed if mapred.job.name is set
> ---
>
> Key: HIVE-1524
> URL: https://issues.apache.org/jira/browse/HIVE-1524
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.5.0
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.6.0, 0.7.0
>
> Attachments: HIVE-1524-for-Hive-0.6.patch, HIVE-1524.2.patch, 
> HIVE-1524.patch
>
>
> The plan file name was generated based on mapred.job.name. If the user 
> specifies mapred.job.name before the query, two parallel queries will have 
> conflicting plan file names. 
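
The collision can be illustrated with a short Python sketch. The file-name pattern and both function names are hypothetical, and the random suffix is just one way to avoid the clash; the issue text does not say how the committed patch names the plan file.

```python
import uuid


def plan_file_from_job_name(job_name):
    """Hypothetical scheme: name derived only from mapred.job.name."""
    return "plan_" + job_name + ".xml"


def plan_file_unique(job_name):
    """Same scheme plus a random per-query suffix, so names never clash."""
    return "plan_" + job_name + "_" + uuid.uuid4().hex + ".xml"
```

Two parallel queries that share a user-set job name get the same file from the first function and distinct files from the second.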




[jira] Commented: (HIVE-675) add database/schema support Hive QL

2010-09-28 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915990#action_12915990
 ] 

Ning Zhang commented on HIVE-675:
-

That works. Thanks Carl!

> add database/schema support Hive QL
> ---
>
> Key: HIVE-675
> URL: https://issues.apache.org/jira/browse/HIVE-675
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Metastore, Query Processor
>Reporter: Prasad Chakka
>Assignee: Carl Steinbach
> Fix For: 0.6.0, 0.7.0
>
> Attachments: hive-675-2009-9-16.patch, hive-675-2009-9-19.patch, 
> hive-675-2009-9-21.patch, hive-675-2009-9-23.patch, hive-675-2009-9-7.patch, 
> hive-675-2009-9-8.patch, HIVE-675-2010-08-16.patch.txt, 
> HIVE-675-2010-7-16.patch.txt, HIVE-675-2010-8-4.patch.txt, 
> HIVE-675-backport-v6.1.patch.txt, HIVE-675-backport-v6.2.patch.txt, 
> HIVE-675.10.patch.txt, HIVE-675.11.patch.txt, HIVE-675.12.patch.txt, 
> HIVE-675.13.patch.txt
>
>
> Currently all Hive tables reside in a single namespace (default). Hive should 
> support multiple namespaces (databases or schemas) such that users can create 
> tables in their specific namespaces. These namespaces can have different 
> warehouse directories (with a default naming scheme) and possibly different 
> properties.
> There is already some support for this in the metastore, but the Hive query 
> parser should have this feature as well.




[jira] Commented: (HIVE-1524) parallel execution failed if mapred.job.name is set

2010-09-28 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915949#action_12915949
 ] 

Ning Zhang commented on HIVE-1524:
--

Branch 0.6 is currently broken. It may be caused by the HIVE-675 patch. I'll 
run the tests after that one is resolved. 

> parallel execution failed if mapred.job.name is set
> ---
>
> Key: HIVE-1524
> URL: https://issues.apache.org/jira/browse/HIVE-1524
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.5.0
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.6.0, 0.7.0
>
> Attachments: HIVE-1524-for-Hive-0.6.patch, HIVE-1524.2.patch, 
> HIVE-1524.patch
>
>
> The plan file name was generated based on mapred.job.name. If the user 
> specifies mapred.job.name before the query, two parallel queries will have 
> conflicting plan file names. 
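
The collision described above can be illustrated with a minimal sketch. It is not the HIVE-1524 patch: the class name, the method names, and the "plan." prefix are all hypothetical. It only shows why a name derived solely from mapred.job.name collides, and how a per-query random component would avoid that.

```java
import java.util.UUID;

// Hypothetical illustration only; none of these names come from the Hive code base.
final class PlanFileNameSketch {

    /** Name derived only from the job name: identical for any two queries sharing it. */
    static String fromJobName(String jobName) {
        return "plan." + jobName;
    }

    /** Name with a random per-query component, so a shared job name no longer collides. */
    static String unique(String jobName) {
        return "plan." + jobName + "." + UUID.randomUUID();
    }

    public static void main(String[] args) {
        String jobName = "nightly-report"; // stands in for a user-set mapred.job.name
        // Two parallel queries with the same job name get the same file name.
        System.out.println(fromJobName(jobName).equals(fromJobName(jobName)));
        // With a random suffix the two names differ.
        System.out.println(unique(jobName).equals(unique(jobName)));
    }
}
```

Whether the actual fix uses a random suffix, a query id, or something else is not stated in this thread.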




[jira] Commented: (HIVE-675) add database/schema support Hive QL

2010-09-28 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915946#action_12915946
 ] 

Ning Zhang commented on HIVE-675:
-

Hi Carl, 

Branch 0.6 is currently broken when running a unit test. The error is as 
follows:


compile-test:
[javac] /data/users/nzhang/reviews/0.6/branch-0.6/build-common.xml:307: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds
[javac] Compiling 4 source files to /data/users/nzhang/reviews/0.6/branch-0.6/build/metastore/test/classes
[javac] /data/users/nzhang/reviews/0.6/branch-0.6/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStoreRemote.java:76: partitionTester(org.apache.hadoop.hive.metastore.HiveMetaStoreClient,org.apache.hadoop.hive.conf.HiveConf) in org.apache.hadoop.hive.metastore.TestHiveMetaStore cannot be applied to (org.apache.hadoop.hive.metastore.HiveMetaStoreClient,org.apache.hadoop.hive.conf.HiveConf,boolean)
[javac] TestHiveMetaStore.partitionTester(client, hiveConf, true);
[javac]  ^
[javac] 1 error


It seems the last patch that touches TestHiveMetaStore and 
TestHiveMetaStoreRemote is this patch. Can you take a look? 

> add database/schema support Hive QL
> ---
>
> Key: HIVE-675
> URL: https://issues.apache.org/jira/browse/HIVE-675
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Metastore, Query Processor
>Reporter: Prasad Chakka
>Assignee: Carl Steinbach
> Fix For: 0.6.0, 0.7.0
>
> Attachments: hive-675-2009-9-16.patch, hive-675-2009-9-19.patch, 
> hive-675-2009-9-21.patch, hive-675-2009-9-23.patch, hive-675-2009-9-7.patch, 
> hive-675-2009-9-8.patch, HIVE-675-2010-08-16.patch.txt, 
> HIVE-675-2010-7-16.patch.txt, HIVE-675-2010-8-4.patch.txt, 
> HIVE-675-backport-v6.1.patch.txt, HIVE-675-backport-v6.2.patch.txt, 
> HIVE-675.10.patch.txt, HIVE-675.11.patch.txt, HIVE-675.12.patch.txt, 
> HIVE-675.13.patch.txt
>
>
> Currently all Hive tables reside in a single namespace (default). Hive should 
> support multiple namespaces (databases or schemas) such that users can create 
> tables in their specific namespaces. These namespaces can have different 
> warehouse directories (with a default naming scheme) and possibly different 
> properties.
> There is already some support for this in the metastore, but the Hive query 
> parser should have this feature as well.




[jira] Commented: (HIVE-1524) parallel execution failed if mapred.job.name is set

2010-09-28 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915651#action_12915651
 ] 

Ning Zhang commented on HIVE-1524:
--

Thanks for backporting your changes to 0.6. The code changes look good. Can you 
include the other 2 files (parallel.q and parallel.q.out) in the patch as well? 


> parallel execution failed if mapred.job.name is set
> ---
>
> Key: HIVE-1524
> URL: https://issues.apache.org/jira/browse/HIVE-1524
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.5.0
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.7.0
>
> Attachments: HIVE-1524-for-Hive-0.6.patch, HIVE-1524.2.patch, 
> HIVE-1524.patch
>
>
> The plan file name was generated based on mapred.job.name. If the user 
> specifies mapred.job.name before the query, two parallel queries will have 
> conflicting plan file names. 




[jira] Updated: (HIVE-1378) Return value for map, array, and struct needs to return a string

2010-09-27 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1378:
-

Status: Resolved  (was: Patch Available)
Resolution: Fixed

Committed. Thanks Steven!

> Return value for map, array, and struct needs to return a string 
> -
>
> Key: HIVE-1378
> URL: https://issues.apache.org/jira/browse/HIVE-1378
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Drivers
>Reporter: Jerome Boulon
>Assignee: Steven Wong
> Fix For: 0.7.0
>
> Attachments: HIVE-1378.1.patch, HIVE-1378.2.patch, HIVE-1378.3.patch, 
> HIVE-1378.4.patch, HIVE-1378.5.patch, HIVE-1378.6.patch, HIVE-1378.7.patch, 
> HIVE-1378.patch
>
>
> In order to be able to select/display any data from JDBC Hive driver, return 
> value for map, array, and struct needs to return a string




[jira] Updated: (HIVE-1669) non-deterministic display of storage parameter in test

2010-09-27 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1669:
-

Parent: HIVE-1658
Issue Type: Sub-task  (was: Test)

> non-deterministic display of storage parameter in test
> --
>
> Key: HIVE-1669
> URL: https://issues.apache.org/jira/browse/HIVE-1669
> Project: Hadoop Hive
>  Issue Type: Sub-task
>Reporter: Ning Zhang
>
> With the change to beautify the 'desc extended table', the storage parameters 
> are displayed in a non-deterministic order (since the implementation is a 
> HashMap). 




[jira] Commented: (HIVE-1658) Fix describe [extended] column formatting

2010-09-27 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915437#action_12915437
 ] 

Ning Zhang commented on HIVE-1658:
--

Another issue is that 'desc extended' now displays table/partition parameters 
on separate lines. Since the parameters use an unordered map implementation, 
they will be displayed in a non-deterministic order. It would be great if 
the pretty operator took care of ordering as well.
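
One possible way to get a stable order is sketched below. This is an assumption for illustration, not the formatter code in Hive: the class name, the method name, and the tab-separated output are made up. It copies the parameters into a TreeMap so they are always emitted sorted by key.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper; it does not correspond to any class in the Hive code base.
final class SortedParamsSketch {

    /** Renders parameters one per line, sorted by key, so test output is stable. */
    static String render(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        // A TreeMap iterates in natural key order whatever the source map's order is.
        for (Map.Entry<String, String> e : new TreeMap<>(params).entrySet()) {
            sb.append(e.getKey()).append('\t').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("transient_lastDdlTime", "1285600000"); // example values only
        params.put("numFiles", "4");
        System.out.print(render(params));
    }
}
```

Sorting at display time leaves the stored map untouched, so only the output of the tests changes.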

> Fix describe [extended] column formatting
> -
>
> Key: HIVE-1658
> URL: https://issues.apache.org/jira/browse/HIVE-1658
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Paul Yang
>Assignee: Thiruvel Thirumoolan
>
> When displaying the column schema, the formatting should be 
> nametypecomment
> to be in line with the previous formatting style for backward compatibility.




[jira] Resolved: (HIVE-1582) merge mapfiles task behaves incorrectly for 'inserting overwrite directory...'

2010-09-27 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang resolved HIVE-1582.
--

Resolution: Not A Problem

Talked to Namit and Yongqiang; this is not a bug. INSERT OVERWRITE to an (HDFS) 
directory should be merged as before. INSERT OVERWRITE LOCAL DIRECTORY cannot 
be merged, but that is not the case here. 

> merge mapfiles task behaves incorrectly for 'inserting overwrite directory...'
> --
>
> Key: HIVE-1582
> URL: https://issues.apache.org/jira/browse/HIVE-1582
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: He Yongqiang
>
> hive> 
> > 
> > 
> >  SET hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
> hive>SET hive.exec.compress.output=false;
> hive>INSERT OVERWRITE DIRECTORY 'x'
> >  SELECT  from  a;
> Total MapReduce jobs = 2
> Launching Job 1 out of 2
> Number of reduce tasks is set to 0 since there's no reduce operator
> ..
> Ended Job = job_201008191557_54169
> Ended Job = 450290112, job is filtered out (removed at runtime).
> Launching Job 2 out of 2
> .
> the second job should not get started.




[jira] Commented: (HIVE-1670) MapJoin throws EOFExeption when the mapjoined table has 0 column selected

2010-09-24 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914764#action_12914764
 ] 

Ning Zhang commented on HIVE-1670:
--

Not sure whether this patch fixes that bug. Maybe they can try this patch with 
their query.

> MapJoin throws EOFExeption when the mapjoined table has 0 column selected
> -
>
> Key: HIVE-1670
> URL: https://issues.apache.org/jira/browse/HIVE-1670
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-1670.patch
>
>
> select /*+mapjoin(b) */ sum(a.key) from src a join src b on (a.key=b.key); 
> throws EOFException




[jira] Commented: (HIVE-1378) Return value for map, array, and struct needs to return a string

2010-09-24 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914756#action_12914756
 ] 

Ning Zhang commented on HIVE-1378:
--

+1. testing.

> Return value for map, array, and struct needs to return a string 
> -
>
> Key: HIVE-1378
> URL: https://issues.apache.org/jira/browse/HIVE-1378
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Drivers
>Reporter: Jerome Boulon
>Assignee: Steven Wong
> Fix For: 0.7.0
>
> Attachments: HIVE-1378.1.patch, HIVE-1378.2.patch, HIVE-1378.3.patch, 
> HIVE-1378.4.patch, HIVE-1378.5.patch, HIVE-1378.6.patch, HIVE-1378.7.patch, 
> HIVE-1378.patch
>
>
> In order to be able to select/display any data from JDBC Hive driver, return 
> value for map, array, and struct needs to return a string




[jira] Updated: (HIVE-1670) MapJoin throws EOFExeption when the mapjoined table has 0 column selected

2010-09-24 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1670:
-

Status: Patch Available  (was: Open)

> MapJoin throws EOFExeption when the mapjoined table has 0 column selected
> -
>
> Key: HIVE-1670
> URL: https://issues.apache.org/jira/browse/HIVE-1670
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-1670.patch
>
>
> select /*+mapjoin(b) */ sum(a.key) from src a join src b on (a.key=b.key); 
> throws EOFException




[jira] Updated: (HIVE-1670) MapJoin throws EOFExeption when the mapjoined table has 0 column selected

2010-09-24 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1670:
-

Attachment: HIVE-1670.patch

> MapJoin throws EOFExeption when the mapjoined table has 0 column selected
> -
>
> Key: HIVE-1670
> URL: https://issues.apache.org/jira/browse/HIVE-1670
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-1670.patch
>
>
> select /*+mapjoin(b) */ sum(a.key) from src a join src b on (a.key=b.key); 
> throws EOFException




[jira] Created: (HIVE-1670) MapJoin throws EOFExeption when the mapjoined table has 0 column selected

2010-09-24 Thread Ning Zhang (JIRA)
MapJoin throws EOFExeption when the mapjoined table has 0 column selected
-

 Key: HIVE-1670
 URL: https://issues.apache.org/jira/browse/HIVE-1670
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Ning Zhang
Assignee: Ning Zhang


select /*+mapjoin(b) */ sum(a.key) from src a join src b on (a.key=b.key); 
throws EOFException




[jira] Created: (HIVE-1669) non-deterministic display of storage parameter in test

2010-09-24 Thread Ning Zhang (JIRA)
non-deterministic display of storage parameter in test
--

 Key: HIVE-1669
 URL: https://issues.apache.org/jira/browse/HIVE-1669
 Project: Hadoop Hive
  Issue Type: Test
Reporter: Ning Zhang


With the change to beautify the 'desc extended table', the storage parameters 
are displayed in a non-deterministic order (since the implementation is a 
HashMap). 




[jira] Updated: (HIVE-1659) parse_url_tuple: a UDTF version of parse_url

2010-09-24 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1659:
-

Status: Resolved  (was: Patch Available)
Resolution: Fixed

Committed. Thanks Xing!

> parse_url_tuple:  a UDTF version of parse_url
> -
>
> Key: HIVE-1659
> URL: https://issues.apache.org/jira/browse/HIVE-1659
> Project: Hadoop Hive
>  Issue Type: New Feature
>Affects Versions: 0.5.0
>Reporter: Ning Zhang
> Attachments: HIVE-1659.patch, HIVE-1659.patch2, HIVE-1659.patch3
>
>
> The UDF parse_url takes a URL, parses it, and extracts QUERY/PATH etc. from it. 
> However, it can only extract an atomic value from the URL. If we want to 
> extract multiple pieces of information, we need to call the function many 
> times. It is desirable to parse the URL once, extract all needed 
> information, and return a tuple in a UDTF. 
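
The parse-once idea can be sketched in plain Java as follows. This is not the UDTF from the patch: the class and method names are hypothetical, and the HOST and PROTOCOL part names are assumptions beyond the QUERY/PATH mentioned above.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone sketch; the real UDTF lives inside Hive and is not shown here.
final class ParseUrlTupleSketch {

    /** Parses the URL a single time and returns the requested parts in order. */
    static List<String> parseUrlTuple(String url, String... parts) {
        URI uri = URI.create(url); // the only parse, shared by every requested part
        List<String> out = new ArrayList<>();
        for (String part : parts) {
            switch (part) {
                case "HOST":     out.add(uri.getHost());   break;
                case "PATH":     out.add(uri.getPath());   break;
                case "QUERY":    out.add(uri.getQuery());  break;
                case "PROTOCOL": out.add(uri.getScheme()); break;
                default:         out.add(null);            // unknown part name
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(
            parseUrlTuple("http://example.com/a/b?x=1&y=2", "HOST", "PATH", "QUERY"));
    }
}
```

Calling a single-value function three times would parse the same string three times; here the cost of parsing is paid once per row.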




[jira] Commented: (HIVE-1659) parse_url_tuple: a UDTF version of parse_url

2010-09-24 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914602#action_12914602
 ] 

Ning Zhang commented on HIVE-1659:
--

Xing, there is a diff in show_functions.q. You need to overwrite the .out file 
with the addition of the new function. The following command will update the 
out file. 

 ant test -Dtestcase=TestCliDriver -Dqfile=show_functions.q -Doverwrite=true

Can you regenerate the patch after that?

> parse_url_tuple:  a UDTF version of parse_url
> -
>
> Key: HIVE-1659
> URL: https://issues.apache.org/jira/browse/HIVE-1659
> Project: Hadoop Hive
>  Issue Type: New Feature
>Affects Versions: 0.5.0
>Reporter: Ning Zhang
> Attachments: HIVE-1659.patch, HIVE-1659.patch2
>
>
> The UDF parse_url takes a URL, parses it, and extracts QUERY/PATH etc. from it. 
> However, it can only extract an atomic value from the URL. If we want to 
> extract multiple pieces of information, we need to call the function many 
> times. It is desirable to parse the URL once, extract all needed 
> information, and return a tuple in a UDTF. 




[jira] Commented: (HIVE-1378) Return value for map, array, and struct needs to return a string

2010-09-24 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914598#action_12914598
 ] 

Ning Zhang commented on HIVE-1378:
--

Before we decide to drop support for pre-0.20, we should have a separate JIRA 
listing the things that need to be cleaned up, e.g., no longer downloading and 
building hadoop 0.17. 

In the meantime, the change needed to make the patch pre-0.20 compatible should 
be minimal. Steven, can you take a look at the code and see how much needs to 
be done to be compatible with 0.17?

> Return value for map, array, and struct needs to return a string 
> -
>
> Key: HIVE-1378
> URL: https://issues.apache.org/jira/browse/HIVE-1378
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Drivers
>Reporter: Jerome Boulon
>Assignee: Steven Wong
> Fix For: 0.7.0
>
> Attachments: HIVE-1378.1.patch, HIVE-1378.2.patch, HIVE-1378.3.patch, 
> HIVE-1378.4.patch, HIVE-1378.5.patch, HIVE-1378.6.patch, HIVE-1378.patch
>
>
> In order to be able to select/display any data from JDBC Hive driver, return 
> value for map, array, and struct needs to return a string




[jira] Commented: (HIVE-1659) parse_url_tuple: a UDTF version of parse_url

2010-09-23 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914360#action_12914360
 ] 

Ning Zhang commented on HIVE-1659:
--

Also when you generate the patch, you need to run 'svn diff' at the "root" 
directory of the hive trunk. 

> parse_url_tuple:  a UDTF version of parse_url
> -
>
> Key: HIVE-1659
> URL: https://issues.apache.org/jira/browse/HIVE-1659
> Project: Hadoop Hive
>  Issue Type: New Feature
>Affects Versions: 0.5.0
>Reporter: Ning Zhang
> Attachments: HIVE-1659.patch
>
>
> The UDF parse_url takes a URL, parses it, and extracts QUERY/PATH etc. from it. 
> However, it can only extract an atomic value from the URL. If we want to 
> extract multiple pieces of information, we need to call the function many 
> times. It is desirable to parse the URL once, extract all needed 
> information, and return a tuple in a UDTF. 




[jira] Commented: (HIVE-1659) parse_url_tuple: a UDTF version of parse_url

2010-09-23 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914358#action_12914358
 ] 

Ning Zhang commented on HIVE-1659:
--

Xing, this patch doesn't apply cleanly to the latest trunk. Can you 'svn up' 
and regenerate the patch? You may need to resolve any conflicts after 'svn up'.

> parse_url_tuple:  a UDTF version of parse_url
> -
>
> Key: HIVE-1659
> URL: https://issues.apache.org/jira/browse/HIVE-1659
> Project: Hadoop Hive
>  Issue Type: New Feature
>Affects Versions: 0.5.0
>Reporter: Ning Zhang
> Attachments: HIVE-1659.patch
>
>
> The UDF parse_url takes a URL, parses it, and extracts QUERY/PATH etc. from it. 
> However, it can only extract an atomic value from the URL. If we want to 
> extract multiple pieces of information, we need to call the function many 
> times. It is desirable to parse the URL once, extract all needed 
> information, and return a tuple in a UDTF. 




[jira] Commented: (HIVE-1378) Return value for map, array, and struct needs to return a string

2010-09-23 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914346#action_12914346
 ] 

Ning Zhang commented on HIVE-1378:
--

@john, should we run a survey on hive-user mailing list to see how many people 
are still using pre-0.20 hadoop before dropping the support? 

> Return value for map, array, and struct needs to return a string 
> -
>
> Key: HIVE-1378
> URL: https://issues.apache.org/jira/browse/HIVE-1378
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Drivers
>Reporter: Jerome Boulon
>Assignee: Steven Wong
> Fix For: 0.7.0
>
> Attachments: HIVE-1378.1.patch, HIVE-1378.2.patch, HIVE-1378.3.patch, 
> HIVE-1378.4.patch, HIVE-1378.5.patch, HIVE-1378.6.patch, HIVE-1378.patch
>
>
> In order to be able to select/display any data from JDBC Hive driver, return 
> value for map, array, and struct needs to return a string




[jira] Commented: (HIVE-1659) parse_url_tuple: a UDTF version of parse_url

2010-09-23 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914337#action_12914337
 ] 

Ning Zhang commented on HIVE-1659:
--

Xing, the patch was not attached. Can you use the link "Attach file" in the 
left pane?


> parse_url_tuple:  a UDTF version of parse_url
> -
>
> Key: HIVE-1659
> URL: https://issues.apache.org/jira/browse/HIVE-1659
> Project: Hadoop Hive
>  Issue Type: New Feature
>Affects Versions: 0.5.0
>Reporter: Ning Zhang
>
> The UDF parse_url takes a URL, parses it, and extracts QUERY/PATH etc. from it. 
> However, it can only extract an atomic value from the URL. If we want to 
> extract multiple pieces of information, we need to call the function many 
> times. It is desirable to parse the URL once, extract all needed 
> information, and return a tuple in a UDTF. 




[jira] Commented: (HIVE-1378) Return value for map, array, and struct needs to return a string

2010-09-23 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914335#action_12914335
 ] 

Ning Zhang commented on HIVE-1378:
--

Steven, tests passed for hadoop 0.20, but it failed to compile on hadoop 0.17 
(ant clean package -Dhadoop.version=0.17.2.1). Can you take a look?

> Return value for map, array, and struct needs to return a string 
> -
>
> Key: HIVE-1378
> URL: https://issues.apache.org/jira/browse/HIVE-1378
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Drivers
>Reporter: Jerome Boulon
>Assignee: Steven Wong
> Fix For: 0.7.0
>
> Attachments: HIVE-1378.1.patch, HIVE-1378.2.patch, HIVE-1378.3.patch, 
> HIVE-1378.4.patch, HIVE-1378.5.patch, HIVE-1378.6.patch, HIVE-1378.patch
>
>
> In order to be able to select/display any data from JDBC Hive driver, return 
> value for map, array, and struct needs to return a string




[jira] Commented: (HIVE-1378) Return value for map, array, and struct needs to return a string

2010-09-23 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914305#action_12914305
 ] 

Ning Zhang commented on HIVE-1378:
--

OK. This one applied cleanly. I'm starting testing. 

I think 'svn up' may be able to do more merging than 'patch'. I got the 
conflict on eclipse-templates/.classpath (it asked me whether I want to reverse 
apply) and another file. 

> Return value for map, array, and struct needs to return a string 
> -
>
> Key: HIVE-1378
> URL: https://issues.apache.org/jira/browse/HIVE-1378
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Drivers
>Reporter: Jerome Boulon
>Assignee: Steven Wong
> Fix For: 0.7.0
>
> Attachments: HIVE-1378.1.patch, HIVE-1378.2.patch, HIVE-1378.3.patch, 
> HIVE-1378.4.patch, HIVE-1378.5.patch, HIVE-1378.6.patch, HIVE-1378.patch
>
>
> In order to be able to select/display any data from JDBC Hive driver, return 
> value for map, array, and struct needs to return a string




[jira] Commented: (HIVE-1378) Return value for map, array, and struct needs to return a string

2010-09-23 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914282#action_12914282
 ] 

Ning Zhang commented on HIVE-1378:
--

Changes look good. However, there are conflicts when applying to the latest 
trunk. Can you generate a new one against the latest trunk? I'll start testing 
once I get the new patch. 

> Return value for map, array, and struct needs to return a string 
> -
>
> Key: HIVE-1378
> URL: https://issues.apache.org/jira/browse/HIVE-1378
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Drivers
>Reporter: Jerome Boulon
>Assignee: Steven Wong
> Fix For: 0.7.0
>
> Attachments: HIVE-1378.1.patch, HIVE-1378.2.patch, HIVE-1378.3.patch, 
> HIVE-1378.4.patch, HIVE-1378.5.patch, HIVE-1378.patch
>
>
> In order to be able to select/display any data from JDBC Hive driver, return 
> value for map, array, and struct needs to return a string




[jira] Updated: (HIVE-1361) table/partition level statistics

2010-09-23 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1361:
-

Attachment: HIVE-1361.5.java_only.patch
HIVE-1361.5.patch

Uploading a new set of patches that resolve the conflicts with the latest 
commits. 

> table/partition level statistics
> 
>
> Key: HIVE-1361
> URL: https://issues.apache.org/jira/browse/HIVE-1361
> Project: Hadoop Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Ning Zhang
>Assignee: Ahmed M Aly
> Fix For: 0.7.0
>
> Attachments: HIVE-1361.2.patch, HIVE-1361.2_java_only.patch, 
> HIVE-1361.3.patch, HIVE-1361.4.java_only.patch, HIVE-1361.4.patch, 
> HIVE-1361.5.java_only.patch, HIVE-1361.5.patch, HIVE-1361.java_only.patch, 
> HIVE-1361.patch, stats0.patch
>
>
> As a first step, we gather table-level stats for non-partitioned tables and 
> partition-level stats for partitioned tables. Future work could extend the 
> table-level stats to partitioned tables as well. 
> There are 3 major milestones in this subtask: 
>  1) extend the insert statement to gather table/partition level stats 
> on-the-fly.
>  2) extend metastore API to support storing and retrieving stats for a 
> particular table/partition. 
>  3) add an ANALYZE TABLE [PARTITION] statement in Hive QL to gather stats for 
> existing tables/partitions. 
> The proposed stats are:
> Partition-level stats: 
>   - number of rows
>   - total size in bytes
>   - number of files
>   - max, min, average row sizes
>   - max, min, average file sizes
> Table-level stats in addition to partition level stats:
>   - number of partitions
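
As a rough illustration of the file-level figures in the list above (number of files, total size, max/min/average file size), here is a small sketch. It is not the stats code from the patch; the class name, the method name, and the input array are assumptions, and the file sizes are passed in directly instead of being read from a file system.

```java
import java.util.Arrays;
import java.util.LongSummaryStatistics;

// Hypothetical sketch; the real stats gathering runs inside Hive tasks and the metastore.
final class PartitionStatsSketch {

    /** Summarizes the sizes (in bytes) of the files that make up one partition. */
    static LongSummaryStatistics fileSizeStats(long[] fileSizesInBytes) {
        // One pass over the sizes yields count, sum, min, max and average together.
        return Arrays.stream(fileSizesInBytes).summaryStatistics();
    }

    public static void main(String[] args) {
        LongSummaryStatistics s = fileSizeStats(new long[] {100L, 300L, 200L});
        System.out.println("number of files: " + s.getCount());
        System.out.println("total size in bytes: " + s.getSum());
        System.out.println("min/max/avg file size: "
            + s.getMin() + "/" + s.getMax() + "/" + s.getAverage());
    }
}
```

Row counts and row sizes would need the data to be scanned, which is why the description proposes gathering them during INSERT or through a separate ANALYZE TABLE statement.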




[jira] Updated: (HIVE-1361) table/partition level statistics

2010-09-22 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1361:
-

Attachment: HIVE-1361.4.java_only.patch
HIVE-1361.4.patch

Uploading a new patch refreshed against the latest trunk. Also added a negative 
test case analyze.q and did some trivial cleanup in the Java code (removing 
commented-out content). 

> table/partition level statistics
> 
>
> Key: HIVE-1361
> URL: https://issues.apache.org/jira/browse/HIVE-1361
> Project: Hadoop Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Ning Zhang
>Assignee: Ahmed M Aly
> Fix For: 0.7.0
>
> Attachments: HIVE-1361.2.patch, HIVE-1361.2_java_only.patch, 
> HIVE-1361.3.patch, HIVE-1361.4.java_only.patch, HIVE-1361.4.patch, 
> HIVE-1361.java_only.patch, HIVE-1361.patch, stats0.patch
>
>
> As a first step, we gather table-level stats for non-partitioned tables and 
> partition-level stats for partitioned tables. Future work could extend the 
> table-level stats to partitioned tables as well. 
> There are 3 major milestones in this subtask: 
>  1) extend the insert statement to gather table/partition level stats 
> on-the-fly.
>  2) extend metastore API to support storing and retrieving stats for a 
> particular table/partition. 
>  3) add an ANALYZE TABLE [PARTITION] statement in Hive QL to gather stats for 
> existing tables/partitions. 
> The proposed stats are:
> Partition-level stats: 
>   - number of rows
>   - total size in bytes
>   - number of files
>   - max, min, average row sizes
>   - max, min, average file sizes
> Table-level stats in addition to partition level stats:
>   - number of partitions




[jira] Commented: (HIVE-1526) Hive should depend on a release version of Thrift

2010-09-22 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913843#action_12913843
 ] 

Ning Zhang commented on HIVE-1526:
--

The Hive ODBC code is dependent on Thrift as well. In particular the hive 
client and unixODBC libraries have to be linked with the new libthrift.so. Can 
you test if the ODBC code is compatible with the new thrift version?

> Hive should depend on a release version of Thrift
> -
>
> Key: HIVE-1526
> URL: https://issues.apache.org/jira/browse/HIVE-1526
> Project: Hadoop Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Reporter: Carl Steinbach
>Assignee: Todd Lipcon
> Attachments: hive-1526.txt, libfb303.jar, libthrift.jar
>
>
> Hive should depend on a release version of Thrift, and ideally it should use 
> Ivy to resolve this dependency.
> The Thrift folks are working on adding Thrift artifacts to a maven repository 
> here: https://issues.apache.org/jira/browse/THRIFT-363




[jira] Updated: (HIVE-1361) table/partition level statistics

2010-09-22 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1361:
-

Attachment: HIVE-1361.3.patch

Updated HIVE-1361.3.patch.

> table/partition level statistics
> 
>
> Key: HIVE-1361
> URL: https://issues.apache.org/jira/browse/HIVE-1361
> Project: Hadoop Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Ning Zhang
>Assignee: Ahmed M Aly
> Fix For: 0.7.0
>
> Attachments: HIVE-1361.2.patch, HIVE-1361.2_java_only.patch, 
> HIVE-1361.3.patch, HIVE-1361.java_only.patch, HIVE-1361.patch, stats0.patch
>
>
> As a first step, we gather table-level stats for non-partitioned tables and 
> partition-level stats for partitioned tables. Future work could extend the 
> table-level stats to partitioned tables as well. 
> There are 3 major milestones in this subtask: 
>  1) extend the insert statement to gather table/partition level stats 
> on-the-fly.
>  2) extend metastore API to support storing and retrieving stats for a 
> particular table/partition. 
>  3) add an ANALYZE TABLE [PARTITION] statement in Hive QL to gather stats for 
> existing tables/partitions. 
> The proposed stats are:
> Partition-level stats: 
>   - number of rows
>   - total size in bytes
>   - number of files
>   - max, min, average row sizes
>   - max, min, average file sizes
> Table-level stats in addition to partition level stats:
>   - number of partitions




[jira] Updated: (HIVE-1361) table/partition level statistics

2010-09-22 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1361:
-

Attachment: (was: HIVE-1361.3.patch)

> table/partition level statistics
> 
>
> Key: HIVE-1361
> URL: https://issues.apache.org/jira/browse/HIVE-1361
> Project: Hadoop Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Ning Zhang
>Assignee: Ahmed M Aly
> Fix For: 0.7.0
>
> Attachments: HIVE-1361.2.patch, HIVE-1361.2_java_only.patch, 
> HIVE-1361.java_only.patch, HIVE-1361.patch, stats0.patch
>
>
> As a first step, we gather table-level stats for non-partitioned tables and 
> partition-level stats for partitioned tables. Future work could extend the 
> table-level stats to partitioned tables as well. 
> There are 3 major milestones in this subtask: 
>  1) extend the insert statement to gather table/partition level stats 
> on-the-fly.
>  2) extend metastore API to support storing and retrieving stats for a 
> particular table/partition. 
>  3) add an ANALYZE TABLE [PARTITION] statement in Hive QL to gather stats for 
> existing tables/partitions. 
> The proposed stats are:
> Partition-level stats: 
>   - number of rows
>   - total size in bytes
>   - number of files
>   - max, min, average row sizes
>   - max, min, average file sizes
> Table-level stats in addition to partition level stats:
>   - number of partitions




[jira] Commented: (HIVE-1658) Fix describe [extended] column formatting

2010-09-22 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913668#action_12913668
 ] 

Ning Zhang commented on HIVE-1658:
--

+1 on keeping the old format but adding a "pretty operator" as the child of the 
explain operator, so that the execution plan for EXPLAIN is an explain operator 
(with the old formatting) followed by an optional "pretty operator" that takes 
the output and does further formatting. 

> Fix describe [extended] column formatting
> -
>
> Key: HIVE-1658
> URL: https://issues.apache.org/jira/browse/HIVE-1658
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Paul Yang
>Assignee: Thiruvel Thirumoolan
>
> When displaying the column schema, the formatting should be 
> nametypecomment
> to be in line with the previous formatting style for backward compatibility.




[jira] Updated: (HIVE-1361) table/partition level statistics

2010-09-22 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1361:
-

Attachment: HIVE-1361.3.patch

Uploading HIVE-1361.3.patch, which passes all tests on Hadoop 0.20 and 0.17. 
The only difference from the last patch is the log change in stats2.q.out.

> table/partition level statistics
> 
>
> Key: HIVE-1361
> URL: https://issues.apache.org/jira/browse/HIVE-1361
> Project: Hadoop Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Ning Zhang
>Assignee: Ahmed M Aly
> Fix For: 0.7.0
>
> Attachments: HIVE-1361.2.patch, HIVE-1361.2_java_only.patch, 
> HIVE-1361.3.patch, HIVE-1361.java_only.patch, HIVE-1361.patch, stats0.patch
>
>
> As a first step, we gather table-level stats for non-partitioned tables and 
> partition-level stats for partitioned tables. Future work could extend the 
> table-level stats to partitioned tables as well. 
> There are 3 major milestones in this subtask: 
>  1) extend the insert statement to gather table/partition level stats 
> on-the-fly.
>  2) extend metastore API to support storing and retrieving stats for a 
> particular table/partition. 
>  3) add an ANALYZE TABLE [PARTITION] statement in Hive QL to gather stats for 
> existing tables/partitions. 
> The proposed stats are:
> Partition-level stats: 
>   - number of rows
>   - total size in bytes
>   - number of files
>   - max, min, average row sizes
>   - max, min, average file sizes
> Table-level stats in addition to partition level stats:
>   - number of partitions




[jira] Commented: (HIVE-1378) Return value for map, array, and struct needs to return a string

2010-09-21 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913338#action_12913338
 ] 

Ning Zhang commented on HIVE-1378:
--

Steven, there are conflicts when applying to the latest trunk. Can you 
regenerate the patch?

> Return value for map, array, and struct needs to return a string 
> -
>
> Key: HIVE-1378
> URL: https://issues.apache.org/jira/browse/HIVE-1378
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Drivers
>Reporter: Jerome Boulon
>Assignee: Steven Wong
> Fix For: 0.7.0
>
> Attachments: HIVE-1378.1.patch, HIVE-1378.2.patch, HIVE-1378.3.patch, 
> HIVE-1378.patch
>
>
> To be able to select/display any data from the JDBC Hive driver, the return 
> value for map, array, and struct needs to be a string.




[jira] Commented: (HIVE-1651) ScriptOperator should not forward any output to downstream operators if an exception is happened

2010-09-21 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913301#action_12913301
 ] 

Ning Zhang commented on HIVE-1651:
--

Discussed with Joydeep offline. The side effects of a failed task should be 
cleaned up after the job finishes. _tmp* files are already taken care of in the 
current code base. The only side effect that still needs to be taken care of is 
the empty directories created by failed dynamic partition inserts. This issue 
is addressed in HIVE-1655. 


> ScriptOperator should not forward any output to downstream operators if an 
> exception is happened
> 
>
> Key: HIVE-1651
> URL: https://issues.apache.org/jira/browse/HIVE-1651
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-1651.patch
>
>
> ScriptOperator spawns 2 threads to read the stdout and stderr of the script 
> and then forwards the stdout output to downstream operators. If the script 
> hits an exception (e.g., it gets killed), the ScriptOperator gets an 
> exception and throws it to upstream operators until MapOperator gets it and 
> calls close(abort). Before ScriptOperator.close() is called, the script 
> output stream can still forward output to downstream operators. We should 
> terminate it immediately.




[jira] Updated: (HIVE-1361) table/partition level statistics

2010-09-21 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1361:
-

Status: Patch Available  (was: Open)

> table/partition level statistics
> 
>
> Key: HIVE-1361
> URL: https://issues.apache.org/jira/browse/HIVE-1361
> Project: Hadoop Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Ning Zhang
>Assignee: Ahmed M Aly
> Fix For: 0.7.0
>
> Attachments: HIVE-1361.2.patch, HIVE-1361.2_java_only.patch, 
> HIVE-1361.java_only.patch, HIVE-1361.patch, stats0.patch
>
>
> As a first step, we gather table-level stats for non-partitioned tables and 
> partition-level stats for partitioned tables. Future work could extend the 
> table-level stats to partitioned tables as well. 
> There are 3 major milestones in this subtask: 
>  1) extend the insert statement to gather table/partition level stats 
> on-the-fly.
>  2) extend metastore API to support storing and retrieving stats for a 
> particular table/partition. 
>  3) add an ANALYZE TABLE [PARTITION] statement in Hive QL to gather stats for 
> existing tables/partitions. 
> The proposed stats are:
> Partition-level stats: 
>   - number of rows
>   - total size in bytes
>   - number of files
>   - max, min, average row sizes
>   - max, min, average file sizes
> Table-level stats in addition to partition level stats:
>   - number of partitions




[jira] Updated: (HIVE-1361) table/partition level statistics

2010-09-21 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1361:
-

Attachment: HIVE-1361.2.patch
HIVE-1361.2_java_only.patch

Uploading a new patch (a full version and a Java-only version that includes 
the XML build files) for review. This is against the latest trunk.

The major changes from the last patch include: 
  1) Make JDBC update/insert/select use PreparedStatement. 
  2) In HBase, use HTable.delete(ArrayList) to speed up deletes, and 
flushCommits() to batch updates. 
  3) Refactor StatsTask to put stats into PartitionStatistics and 
TableStatistics so that it is easier to add new stats later. 
  4) Move WriteEntity creation from StatsTask to compile time.

I'm running the tests again after refreshing to the latest trunk.

> table/partition level statistics
> 
>
> Key: HIVE-1361
> URL: https://issues.apache.org/jira/browse/HIVE-1361
> Project: Hadoop Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Ning Zhang
>Assignee: Ahmed M Aly
> Fix For: 0.7.0
>
> Attachments: HIVE-1361.2.patch, HIVE-1361.2_java_only.patch, 
> HIVE-1361.java_only.patch, HIVE-1361.patch, stats0.patch
>
>
> As a first step, we gather table-level stats for non-partitioned tables and 
> partition-level stats for partitioned tables. Future work could extend the 
> table-level stats to partitioned tables as well. 
> There are 3 major milestones in this subtask: 
>  1) extend the insert statement to gather table/partition level stats 
> on-the-fly.
>  2) extend metastore API to support storing and retrieving stats for a 
> particular table/partition. 
>  3) add an ANALYZE TABLE [PARTITION] statement in Hive QL to gather stats for 
> existing tables/partitions. 
> The proposed stats are:
> Partition-level stats: 
>   - number of rows
>   - total size in bytes
>   - number of files
>   - max, min, average row sizes
>   - max, min, average file sizes
> Table-level stats in addition to partition level stats:
>   - number of partitions




[jira] Commented: (HIVE-1659) parse_url_tuple: a UDTF version of parse_url

2010-09-21 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913081#action_12913081
 ] 

Ning Zhang commented on HIVE-1659:
--

parse_url currently supports two signatures: parse_url(fullurl, 
'[QUERY|PATH|HOST|...]') and parse_url(fullurl, 'QUERY', '[ref|sk|...]'). In 
parse_url_tuple, the syntax is consolidated as parse_url_tuple(fullurl, 'HOST', 
'PATH', 'QUERY:ref', 'QUERY:sk',...). 
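
The consolidated syntax can be sketched in Python to show the intended semantics: parse the URL once, then return one value per requested part. This is only an illustration built on the standard library's urllib.parse, not the Hive UDTF itself; it handles just the HOST, PATH, QUERY, and QUERY:<key> parts named above, and returning None for anything else is an assumption of the sketch.

```python
from urllib.parse import parse_qs, urlsplit


def parse_url_tuple(url, *parts):
    """Return a tuple with one entry per requested part of `url`."""
    split = urlsplit(url)            # the URL is parsed exactly once
    query = parse_qs(split.query)    # maps each query key to a list of values
    simple = {
        "HOST": split.hostname,
        "PATH": split.path,
        "QUERY": split.query,
    }
    result = []
    for part in parts:
        if part.startswith("QUERY:"):
            # 'QUERY:ref' asks for the value of the 'ref' query parameter
            values = query.get(part[len("QUERY:"):])
            result.append(values[0] if values else None)
        else:
            result.append(simple.get(part))
    return tuple(result)


host, path, ref, sk = parse_url_tuple(
    "http://example.com/a/b?ref=home&sk=1",
    "HOST", "PATH", "QUERY:ref", "QUERY:sk",
)
```

Folding the query key into the part name ('QUERY:ref') is what lets a single call replace both existing parse_url signatures.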

> parse_url_tuple:  a UDTF version of parse_url
> -
>
> Key: HIVE-1659
> URL: https://issues.apache.org/jira/browse/HIVE-1659
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Ning Zhang
>
> The UDF parse_url takes a URL, parses it, and extracts QUERY/PATH etc. from 
> it. However, it can only extract one atomic value from the URL per call. If 
> we want to extract multiple pieces of information, we need to call the 
> function many times. It is desirable to parse the URL once, extract all the 
> needed information, and return a tuple in a UDTF. 




[jira] Created: (HIVE-1659) parse_url_tuple: a UDTF version of parse_url

2010-09-21 Thread Ning Zhang (JIRA)
parse_url_tuple:  a UDTF version of parse_url
-

 Key: HIVE-1659
 URL: https://issues.apache.org/jira/browse/HIVE-1659
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Ning Zhang


The UDF parse_url takes a URL, parses it, and extracts QUERY/PATH etc. from it. 
However, it can only extract one atomic value from the URL per call. If we want 
to extract multiple pieces of information, we need to call the function many 
times. It is desirable to parse the URL once, extract all the needed 
information, and return a tuple in a UDTF. 




[jira] Commented: (HIVE-1609) Support partition filtering in metastore

2010-09-20 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12912855#action_12912855
 ] 

Ning Zhang commented on HIVE-1609:
--

@namit, the Hive metastore already has an API to get all sub-partitions given 
a partial specification like the one you provided -- Hive.getPartitions(Table, 
partialPartSpec).  
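
What a partial specification selects can be sketched as follows. This is not the Hive API; `match_partial_spec` is a hypothetical helper, and representing each partition as a dict of partition-column values is an assumption made only for illustration.

```python
def match_partial_spec(partitions, partial_spec):
    """Return the partitions whose columns agree with every key in partial_spec.

    Columns that are absent from partial_spec are left unconstrained, so a
    spec that names only the leading partition column selects all of its
    sub-partitions.
    """
    return [
        part for part in partitions
        if all(part.get(col) == val for col, val in partial_spec.items())
    ]


# Hypothetical partitions of a table partitioned by (ds, hr).
partitions = [
    {"ds": "2010-09-20", "hr": "11"},
    {"ds": "2010-09-20", "hr": "12"},
    {"ds": "2010-09-21", "hr": "11"},
]
selected = match_partial_spec(partitions, {"ds": "2010-09-20"})
```

An empty partial specification matches every partition, since there is no constraint to violate.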

> Support partition filtering in metastore
> 
>
> Key: HIVE-1609
> URL: https://issues.apache.org/jira/browse/HIVE-1609
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Metastore
>Reporter: Ajay Kidave
>Assignee: Ajay Kidave
> Fix For: 0.7.0
>
> Attachments: hive_1609.patch, hive_1609_2.patch, hive_1609_3.patch
>
>
> The metastore needs to have support for returning a list of partitions based 
> on user specified filter conditions. This will be useful for tools which need 
> to do partition pruning. Howl is one such use case. The way partition pruning 
> is done during hive query execution need not be changed.




[jira] Commented: (HIVE-1378) Return value for map, array, and struct needs to return a string

2010-09-17 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910899#action_12910899
 ] 

Ning Zhang commented on HIVE-1378:
--

Looks good in general. I've left some minor comments on Cloudera's review 
board. I'm not sure whether they can be replicated here, but if not, I'll copy 
them manually.

> Return value for map, array, and struct needs to return a string 
> -
>
> Key: HIVE-1378
> URL: https://issues.apache.org/jira/browse/HIVE-1378
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Drivers
>Reporter: Jerome Boulon
>Assignee: Steven Wong
> Fix For: 0.7.0
>
> Attachments: HIVE-1378.1.patch, HIVE-1378.patch
>
>
> To be able to select/display any data from the JDBC Hive driver, the return 
> value for map, array, and struct needs to be a string.




[jira] Updated: (HIVE-1655) Adding consistency check at jobClose() when committing dynamic partitions

2010-09-17 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1655:
-

Attachment: HIVE-1655.patch

> Adding consistency check at jobClose() when committing dynamic partitions
> -
>
> Key: HIVE-1655
> URL: https://issues.apache.org/jira/browse/HIVE-1655
> Project: Hadoop Hive
>  Issue Type: Improvement
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-1655.patch
>
>
> For a dynamic partition insert, FileSinkOperator generates a directory for 
> each new partition, and the files in that directory are named '_tmp*'. When 
> a task succeeds, the file is renamed to remove the "_tmp", which essentially 
> implements the "commit" semantics. Many failures (a killed process, a dead 
> machine, etc.) can leave _tmp files behind in the DP directory. These _tmp 
> files should be deleted ("rolled back") at successful jobClose(). After the 
> deletion, we should also delete any empty directories.




[jira] Updated: (HIVE-1655) Adding consistency check at jobClose() when committing dynamic partitions

2010-09-17 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1655:
-

Status: Patch Available  (was: Open)

> Adding consistency check at jobClose() when committing dynamic partitions
> -
>
> Key: HIVE-1655
> URL: https://issues.apache.org/jira/browse/HIVE-1655
> Project: Hadoop Hive
>  Issue Type: Improvement
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-1655.patch
>
>
> For a dynamic partition insert, FileSinkOperator generates a directory for 
> each new partition, and the files in that directory are named '_tmp*'. When 
> a task succeeds, the file is renamed to remove the "_tmp", which essentially 
> implements the "commit" semantics. Many failures (a killed process, a dead 
> machine, etc.) can leave _tmp files behind in the DP directory. These _tmp 
> files should be deleted ("rolled back") at successful jobClose(). After the 
> deletion, we should also delete any empty directories.




[jira] Commented: (HIVE-1655) Adding consistency check at jobClose() when committing dynamic partitions

2010-09-17 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910880#action_12910880
 ] 

Ning Zhang commented on HIVE-1655:
--

Actually, the _tmp files are taken care of by FSPaths.commit(), called at 
FileSinkOperator.close(), and any missed _tmp* files are removed in jobClose() 
-> Utilities.removeTempOrDuplicateFiles(). The only missing piece is removing 
the empty directories at jobClose().

> Adding consistency check at jobClose() when committing dynamic partitions
> -
>
> Key: HIVE-1655
> URL: https://issues.apache.org/jira/browse/HIVE-1655
> Project: Hadoop Hive
>  Issue Type: Improvement
>Reporter: Ning Zhang
>Assignee: Ning Zhang
>
> For a dynamic partition insert, FileSinkOperator generates a directory for 
> each new partition, and the files in that directory are named '_tmp*'. When 
> a task succeeds, the file is renamed to remove the "_tmp", which essentially 
> implements the "commit" semantics. Many failures (a killed process, a dead 
> machine, etc.) can leave _tmp files behind in the DP directory. These _tmp 
> files should be deleted ("rolled back") at successful jobClose(). After the 
> deletion, we should also delete any empty directories.




[jira] Created: (HIVE-1655) Adding consistency check at jobClose() when committing dynamic partitions

2010-09-17 Thread Ning Zhang (JIRA)
Adding consistency check at jobClose() when committing dynamic partitions
-

 Key: HIVE-1655
 URL: https://issues.apache.org/jira/browse/HIVE-1655
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Ning Zhang
Assignee: Ning Zhang


For a dynamic partition insert, FileSinkOperator generates a directory for each 
new partition, and the files in that directory are named '_tmp*'. When a task 
succeeds, the file is renamed to remove the "_tmp", which essentially 
implements the "commit" semantics. Many failures (a killed process, a dead 
machine, etc.) can leave _tmp files behind in the DP directory. These _tmp 
files should be deleted ("rolled back") at successful jobClose(). After the 
deletion, we should also delete any empty directories.




[jira] Commented: (HIVE-1651) ScriptOperator should not forward any output to downstream operators if an exception is happened

2010-09-17 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910834#action_12910834
 ] 

Ning Zhang commented on HIVE-1651:
--

@joydeep, the output file will not be committed if an exception occurs and 
close(abort=true) is called. This bug happens in the short time window after 
the exception occurs and before close(abort) is called. Although the file gets 
deleted, the dynamic partition insert has already created a directory, which 
will later be considered an empty partition. 

> ScriptOperator should not forward any output to downstream operators if an 
> exception is happened
> 
>
> Key: HIVE-1651
> URL: https://issues.apache.org/jira/browse/HIVE-1651
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-1651.patch
>
>
> ScriptOperator spawns 2 threads to read the stdout and stderr of the script 
> and then forwards the stdout output to downstream operators. If the script 
> hits an exception (e.g., it gets killed), the ScriptOperator gets an 
> exception and throws it to upstream operators until MapOperator gets it and 
> calls close(abort). Before ScriptOperator.close() is called, the script 
> output stream can still forward output to downstream operators. We should 
> terminate it immediately.




[jira] Commented: (HIVE-1378) Return value for map, array, and struct needs to return a string

2010-09-17 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910758#action_12910758
 ] 

Ning Zhang commented on HIVE-1378:
--

Taking a look now. 

> Return value for map, array, and struct needs to return a string 
> -
>
> Key: HIVE-1378
> URL: https://issues.apache.org/jira/browse/HIVE-1378
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Drivers
>Reporter: Jerome Boulon
>Assignee: Steven Wong
> Fix For: 0.7.0
>
> Attachments: HIVE-1378.1.patch, HIVE-1378.patch
>
>
> To be able to select/display any data from the JDBC Hive driver, the return 
> value for map, array, and struct needs to be a string.




[jira] Updated: (HIVE-1651) ScriptOperator should not forward any output to downstream operators if an exception is happened

2010-09-17 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1651:
-

Status: Patch Available  (was: Open)

> ScriptOperator should not forward any output to downstream operators if an 
> exception is happened
> 
>
> Key: HIVE-1651
> URL: https://issues.apache.org/jira/browse/HIVE-1651
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-1651.patch
>
>
> ScriptOperator spawns 2 threads to read the stdout and stderr of the script 
> and then forwards the stdout output to downstream operators. If the script 
> hits an exception (e.g., it gets killed), the ScriptOperator gets an 
> exception and throws it to upstream operators until MapOperator gets it and 
> calls close(abort). Before ScriptOperator.close() is called, the script 
> output stream can still forward output to downstream operators. We should 
> terminate it immediately.




[jira] Updated: (HIVE-1651) ScriptOperator should not forward any output to downstream operators if an exception is happened

2010-09-17 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1651:
-

Attachment: HIVE-1651.patch

> ScriptOperator should not forward any output to downstream operators if an 
> exception is happened
> 
>
> Key: HIVE-1651
> URL: https://issues.apache.org/jira/browse/HIVE-1651
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-1651.patch
>
>
> ScriptOperator spawns 2 threads to read the stdout and stderr of the script 
> and then forwards the stdout output to downstream operators. If the script 
> hits an exception (e.g., it gets killed), the ScriptOperator gets an 
> exception and throws it to upstream operators until MapOperator gets it and 
> calls close(abort). Before ScriptOperator.close() is called, the script 
> output stream can still forward output to downstream operators. We should 
> terminate it immediately.




[jira] Created: (HIVE-1651) ScriptOperator should not forward any output to downstream operators if an exception is happened

2010-09-17 Thread Ning Zhang (JIRA)
ScriptOperator should not forward any output to downstream operators if an 
exception is happened


 Key: HIVE-1651
 URL: https://issues.apache.org/jira/browse/HIVE-1651
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Ning Zhang
Assignee: Ning Zhang


ScriptOperator spawns 2 threads to read the stdout and stderr of the script and 
then forwards the stdout output to downstream operators. If the script hits an 
exception (e.g., it gets killed), the ScriptOperator gets an exception and 
throws it to upstream operators until MapOperator gets it and calls 
close(abort). Before ScriptOperator.close() is called, the script output stream 
can still forward output to downstream operators. We should terminate it 
immediately.
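
One way to close that window is a shared abort flag that the forwarding loop checks before every row. The sketch below is an assumption about the general technique, not the contents of the attached patch; `OutputForwarder` and its methods are hypothetical names.

```python
import threading


class OutputForwarder:
    """Forwards script output downstream until abort() is called."""

    def __init__(self, downstream):
        self._downstream = downstream      # callable that receives each row
        self._aborted = threading.Event()  # safe to set from another thread

    def abort(self):
        """Called as soon as the exception is seen, before close(abort)."""
        self._aborted.set()

    def forward_all(self, lines):
        """Forward rows one by one; stop at the first row seen after abort()."""
        forwarded = 0
        for line in lines:
            if self._aborted.is_set():
                break                      # nothing more reaches downstream
            self._downstream(line)
            forwarded += 1
        return forwarded
```

Setting the flag at the point where the exception is caught, rather than in close(), is what stops rows from reaching downstream operators during the interval described above.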




[jira] Commented: (HIVE-1378) Return value for map, array, and struct needs to return a string

2010-09-16 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910455#action_12910455
 ] 

Ning Zhang commented on HIVE-1378:
--

Steven, there are conflicts when applying this patch. Can you regenerate it?

> Return value for map, array, and struct needs to return a string 
> -
>
> Key: HIVE-1378
> URL: https://issues.apache.org/jira/browse/HIVE-1378
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Drivers
>Reporter: Jerome Boulon
>Assignee: Steven Wong
> Fix For: 0.7.0
>
> Attachments: HIVE-1378.patch
>
>
> To be able to select/display any data from the JDBC Hive driver, the return 
> value for map, array, and struct needs to be a string.




[jira] Commented: (HIVE-1378) Return value for map, array, and struct needs to return a string

2010-09-16 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910444#action_12910444
 ] 

Ning Zhang commented on HIVE-1378:
--

Will take a look.

> Return value for map, array, and struct needs to return a string 
> -
>
> Key: HIVE-1378
> URL: https://issues.apache.org/jira/browse/HIVE-1378
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Drivers
>Reporter: Jerome Boulon
>Assignee: Steven Wong
> Fix For: 0.7.0
>
> Attachments: HIVE-1378.patch
>
>
> To be able to select/display any data from the JDBC Hive driver, the return 
> value for map, array, and struct needs to be a string.




[jira] Updated: (HIVE-1361) table/partition level statistics

2010-09-16 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1361:
-

Status: Patch Available  (was: Open)

> table/partition level statistics
> 
>
> Key: HIVE-1361
> URL: https://issues.apache.org/jira/browse/HIVE-1361
> Project: Hadoop Hive
>  Issue Type: Sub-task
>Affects Versions: 0.6.0
>Reporter: Ning Zhang
>Assignee: Ahmed M Aly
> Attachments: HIVE-1361.java_only.patch, HIVE-1361.patch, stats0.patch
>
>
> As a first step, we gather table-level stats for non-partitioned tables and 
> partition-level stats for partitioned tables. Future work could extend the 
> table-level stats to partitioned tables as well. 
> There are 3 major milestones in this subtask: 
>  1) extend the insert statement to gather table/partition level stats 
> on-the-fly.
>  2) extend metastore API to support storing and retrieving stats for a 
> particular table/partition. 
>  3) add an ANALYZE TABLE [PARTITION] statement in Hive QL to gather stats for 
> existing tables/partitions. 
> The proposed stats are:
> Partition-level stats: 
>   - number of rows
>   - total size in bytes
>   - number of files
>   - max, min, average row sizes
>   - max, min, average file sizes
> Table-level stats in addition to partition level stats:
>   - number of partitions




[jira] Updated: (HIVE-1648) Automatically gathering stats when reading a table/partition

2010-09-16 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1648:
-

Parent: HIVE-33
Issue Type: Sub-task  (was: New Feature)

> Automatically gathering stats when reading a table/partition
> 
>
> Key: HIVE-1648
> URL: https://issues.apache.org/jira/browse/HIVE-1648
> Project: Hadoop Hive
>  Issue Type: Sub-task
>Reporter: Ning Zhang
>
> HIVE-1361 introduces a new command 'ANALYZE TABLE T COMPUTE STATISTICS' to 
> gather stats. This requires an additional scan of the data. Stats gathering 
> can be piggy-backed on TableScanOperator whenever a table/partition is 
> scanned (given no LIMIT operator). 




[jira] Created: (HIVE-1648) Automatically gathering stats when reading a table/partition

2010-09-16 Thread Ning Zhang (JIRA)
Automatically gathering stats when reading a table/partition


 Key: HIVE-1648
 URL: https://issues.apache.org/jira/browse/HIVE-1648
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Ning Zhang


HIVE-1361 introduces a new command 'ANALYZE TABLE T COMPUTE STATISTICS' to 
gather stats. This requires an additional scan of the data. Stats gathering can 
be piggy-backed on TableScanOperator whenever a table/partition is scanned 
(given no LIMIT operator). 




[jira] Commented: (HIVE-33) [Hive]: Add ability to compute statistics on hive tables

2010-09-16 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-33?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910358#action_12910358
 ] 

Ning Zhang commented on HIVE-33:


Patches for HIVE-1361 are ready for review. Comments are welcome!

> [Hive]: Add ability to compute statistics on hive tables
> 
>
> Key: HIVE-33
> URL: https://issues.apache.org/jira/browse/HIVE-33
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Ashish Thusoo
>Assignee: Ahmed M Aly
>
> Add commands to collect partition and column level statistics in hive.




[jira] Updated: (HIVE-1361) table/partition level statistics

2010-09-16 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1361:
-

Attachment: HIVE-1361.patch
HIVE-1361.java_only.patch

Uploading a full version (HIVE-1361.patch) and a Java code only version 
(HIVE-1361.java_only.patch). 

This patch is based on Ahmed's previous patch and implements the following 
features:
  1) Automatically gather stats (currently only the number of rows) whenever an 
INSERT OVERWRITE TABLE is issued. Each mapper/reducer pushes its partial stats 
to either MySQL/Derby through JDBC or to HBase. The INSERT OVERWRITE statement 
can be of any kind, including dynamic partition inserts, multi-table inserts, 
and inserts into bucketized partitions. A StatsTask is responsible for 
aggregating the partial stats at the end of the query and updating the 
metastore.
  2) The stats of a table/partition are exposed to the user through 'DESC 
EXTENDED' on the table/partition. They are stored as storage parameters 
(numRows, numFiles, numPartitions). 
  3) Introduce a new command 'ANALYZE TABLE [PARTITION (PARTITION SPEC)] 
COMPUTE STATISTICS' to scan the table/partition and gather stats in a similar 
fashion to the INSERT OVERWRITE command, except that the plan has only 1 MR job 
consisting of a TableScanOperator and a StatsTask. The partition spec can be a 
full partition spec or a partial partition spec, similar to what dynamic 
partition insert uses. This allows the user to analyze a subset of or all 
partitions of a table. The resulting stats are stored in the same parameters in 
the metastore.
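
To illustrate the aggregation step described in 1), here is a minimal sketch 
of how partial row counts published by individual tasks could be summed per 
partition. It is not taken from the patch: the class name, the method name and 
the in-memory arrays are hypothetical stand-ins for the JDBC/HBase store and 
for StatsTask.

```java
import java.util.HashMap;
import java.util.Map;

public class PartialStatsSketch {
    /**
     * Sums partial row counts per partition. partitions[i] and numRows[i]
     * together represent one partial stat published by one task
     * (hypothetical layout, assumed for illustration only).
     */
    static Map<String, Long> aggregate(String[] partitions, long[] numRows) {
        Map<String, Long> totals = new HashMap<>();
        for (int i = 0; i < partitions.length; i++) {
            // merge() adds to the running total, or starts one for a new partition
            totals.merge(partitions[i], numRows[i], Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        String[] partitions = {"ds=2010-09-16", "ds=2010-09-16", "ds=2010-09-15"};
        long[] numRows = {120L, 80L, 50L};
        System.out.println(aggregate(partitions, numRows));
    }
}
```

In this sketch the final totals would be what gets written to the metastore 
as numRows for each partition.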

Tested locally (unit tests) for JDBC:derby, hbase and on a cluster with 
JDBC:MySQL. 

Will run the full unit tests again. 

> table/partition level statistics
> 
>
> Key: HIVE-1361
> URL: https://issues.apache.org/jira/browse/HIVE-1361
> Project: Hadoop Hive
>  Issue Type: Sub-task
>Affects Versions: 0.6.0
>Reporter: Ning Zhang
>Assignee: Ahmed M Aly
> Attachments: HIVE-1361.java_only.patch, HIVE-1361.patch, stats0.patch
>
>
> At the first step, we gather table-level stats for non-partitioned table and 
> partition-level stats for partitioned table. Future work could extend the 
> table level stats to partitioned table as well. 
> There are 3 major milestones in this subtask: 
>  1) extend the insert statement to gather table/partition level stats 
> on-the-fly.
>  2) extend metastore API to support storing and retrieving stats for a 
> particular table/partition. 
>  3) add an ANALYZE TABLE [PARTITION] statement in Hive QL to gather stats for 
> existing tables/partitions. 
> The proposed stats are:
> Partition-level stats: 
>   - number of rows
>   - total size in bytes
>   - number of files
>   - max, min, average row sizes
>   - max, min, average file sizes
> Table-level stats in addition to partition level stats:
>   - number of partitions




[jira] Updated: (HIVE-1639) ExecDriver.addInputPaths() error if partition name contains a comma

2010-09-15 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1639:
-

Attachment: HIVE-1639.2.patch

Added a test case.

> ExecDriver.addInputPaths() error if partition name contains a comma
> ---
>
> Key: HIVE-1639
> URL: https://issues.apache.org/jira/browse/HIVE-1639
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-1639.2.patch, HIVE-1639.patch
>
>
> The ExecDriver.addInputPaths() calls FileInputFormat.addPaths(), which takes 
> a comma-separated string representing a set of paths. If the path name of an 
> input file contains a comma, this code throws an exception: 
> java.lang.IllegalArgumentException: Can not create a Path from an empty 
> string.
> Instead of calling FileInputFormat.addPaths(), ExecDriver.addInputPaths 
> should iterate over all paths and call FileInputFormat.addInputPath. 




[jira] Commented: (HIVE-1570) referencing an added file by it's name in a transform script does not work in hive local mode

2010-09-15 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909943#action_12909943
 ] 

Ning Zhang commented on HIVE-1570:
--

Joy, scriptfile1.q actually failed on TestMinimrCliDriver with the command 

ant test -Dhadoop.version=0.20.0 -Dtestcase=TestMinimrCliDriver 
-Dminimr.query.files=scriptfile1.q

It gives an NPE at ExecDriver.java:625. This NPE is a different issue and can 
be solved by changing 'conf' to 'job'. But even after this change, when the NPE 
is gone, the test still fails. Should we move this test out of 
minimr.query.files for now, until this JIRA is fixed?

> referencing an added file by it's name in a transform script does not work in 
> hive local mode
> -
>
> Key: HIVE-1570
> URL: https://issues.apache.org/jira/browse/HIVE-1570
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Joydeep Sen Sarma
>Assignee: Joydeep Sen Sarma
>
> Yongqiang tried this and it fails in local mode:
> add file ../data/scripts/dumpdata_script.py;
> select count(distinct subq.key) from
> (FROM src MAP src.key USING 'python dumpdata_script.py' AS key WHERE src.key 
> = 10) subq;
> this needs to be fixed because it means we cannot choose local mode 
> automatically in case of transform scripts (since different paths need to be 
> used for cluster vs. local mode execution)




[jira] Updated: (HIVE-1639) ExecDriver.addInputPaths() error if partition name contains a comma

2010-09-15 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1639:
-

Status: Patch Available  (was: Open)

> ExecDriver.addInputPaths() error if partition name contains a comma
> ---
>
> Key: HIVE-1639
> URL: https://issues.apache.org/jira/browse/HIVE-1639
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-1639.patch
>
>
> The ExecDriver.addInputPaths() calls FileInputFormat.addPaths(), which takes 
> a comma-separated string representing a set of paths. If the path name of an 
> input file contains a comma, this code throws an exception: 
> java.lang.IllegalArgumentException: Can not create a Path from an empty 
> string.
> Instead of calling FileInputFormat.addPaths(), ExecDriver.addInputPaths 
> should iterate over all paths and call FileInputFormat.addInputPath. 




[jira] Updated: (HIVE-1639) ExecDriver.addInputPaths() error if partition name contains a comma

2010-09-15 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1639:
-

Attachment: HIVE-1639.patch

> ExecDriver.addInputPaths() error if partition name contains a comma
> ---
>
> Key: HIVE-1639
> URL: https://issues.apache.org/jira/browse/HIVE-1639
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-1639.patch
>
>
> The ExecDriver.addInputPaths() calls FileInputFormat.addPaths(), which takes 
> a comma-separated string representing a set of paths. If the path name of an 
> input file contains a comma, this code throws an exception: 
> java.lang.IllegalArgumentException: Can not create a Path from an empty 
> string.
> Instead of calling FileInputFormat.addPaths(), ExecDriver.addInputPaths 
> should iterate over all paths and call FileInputFormat.addInputPath. 




[jira] Created: (HIVE-1639) ExecDriver.addInputPaths() error if partition name contains a comma

2010-09-15 Thread Ning Zhang (JIRA)
ExecDriver.addInputPaths() error if partition name contains a comma
---

 Key: HIVE-1639
 URL: https://issues.apache.org/jira/browse/HIVE-1639
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Ning Zhang
Assignee: Ning Zhang


The ExecDriver.addInputPaths() calls FileInputFormat.addPaths(), which takes a 
comma-separated string representing a set of paths. If the path name of an input 
file contains a comma, this code throws an exception: 
java.lang.IllegalArgumentException: Can not create a Path from an empty string.

Instead of calling FileInputFormat.addPaths(), ExecDriver.addInputPaths should 
iterate over all paths and call FileInputFormat.addInputPath. 
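
The underlying problem can be shown without Hadoop at all. The sketch below 
is only a simulation: both methods and the sample paths are hypothetical, and 
no real FileInputFormat call is made. It compares registering paths through 
one comma-separated string with registering them one at a time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CommaPathSketch {
    /** Simulates an API that receives all paths as one comma-separated string. */
    static List<String> addAsCommaSeparated(List<String> paths) {
        String joined = String.join(",", paths);
        // The receiver can only split on commas, so a comma inside a name is lost
        return new ArrayList<>(Arrays.asList(joined.split(",")));
    }

    /** Simulates adding each path individually, so commas in names survive. */
    static List<String> addOneByOne(List<String> paths) {
        List<String> registered = new ArrayList<>();
        for (String p : paths) {
            registered.add(p);
        }
        return registered;
    }

    public static void main(String[] args) {
        List<String> paths = Arrays.asList("/warehouse/t/ds=a,b", "/warehouse/t/ds=c");
        System.out.println(addAsCommaSeparated(paths)); // the first path is split in two
        System.out.println(addOneByOne(paths));         // both paths are kept intact
    }
}
```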




[jira] Commented: (HIVE-1629) Patch to fix hashCode method in DoubleWritable class

2010-09-12 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908639#action_12908639
 ] 

Ning Zhang commented on HIVE-1629:
--

Good question John. I think this patch doesn't affect bucketing, which is 
implemented using ObjectInspectorUtils.hashCode(). Actually the hash function 
used there for Double is the same as the one provided in this patch. But I'll 
double check with Zheng/Namit tomorrow. 

> Patch to fix hashCode method in DoubleWritable class
> 
>
> Key: HIVE-1629
> URL: https://issues.apache.org/jira/browse/HIVE-1629
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Vaibhav Aggarwal
>Assignee: Vaibhav Aggarwal
> Fix For: 0.7.0
>
> Attachments: HIVE-1629.patch
>
>
> A patch to fix the hashCode() method of DoubleWritable class of Hive.
> It prevents the HashMap (of type DoubleWritable) from behaving as LinkedList.




[jira] Resolved: (HIVE-1629) Patch to fix hashCode method in DoubleWritable class

2010-09-12 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang resolved HIVE-1629.
--

Fix Version/s: 0.7.0
   Resolution: Fixed

Committed. Thanks Vaibhav!

> Patch to fix hashCode method in DoubleWritable class
> 
>
> Key: HIVE-1629
> URL: https://issues.apache.org/jira/browse/HIVE-1629
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Vaibhav Aggarwal
>Assignee: Vaibhav Aggarwal
> Fix For: 0.7.0
>
> Attachments: HIVE-1629.patch
>
>
> A patch to fix the hashCode() method of DoubleWritable class of Hive.
> It prevents the HashMap (of type DoubleWritable) from behaving as LinkedList.




[jira] Updated: (HIVE-1622) Use CombineHiveInputFormat for the merge job if hive.merge.mapredfiles=true

2010-09-12 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1622:
-

Attachment: HIVE-1622_0.17.patch

Oops, forgot the patch for the Hadoop 0.17 logs. 

> Use CombineHiveInputFormat for the merge job if hive.merge.mapredfiles=true
> ---
>
> Key: HIVE-1622
> URL: https://issues.apache.org/jira/browse/HIVE-1622
> Project: Hadoop Hive
>  Issue Type: Improvement
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.7.0
>
> Attachments: HIVE-1622.patch, HIVE-1622_0.17.patch
>
>
> Currently map-only merge (using CombineHiveInputFormat) is only enabled for 
> merging files generated by mappers. It should be used for files generated at 
> reducers as well.




[jira] Commented: (HIVE-1629) Patch to fix hashCode method in DoubleWritable class

2010-09-12 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908599#action_12908599
 ] 

Ning Zhang commented on HIVE-1629:
--

+1 Will commit if tests pass.

> Patch to fix hashCode method in DoubleWritable class
> 
>
> Key: HIVE-1629
> URL: https://issues.apache.org/jira/browse/HIVE-1629
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Vaibhav Aggarwal
> Attachments: HIVE-1629.patch
>
>
> A patch to fix the hashCode() method of DoubleWritable class of Hive.
> It prevents the HashMap (of type DoubleWritable) from behaving as LinkedList.




[jira] Assigned: (HIVE-1629) Patch to fix hashCode method in DoubleWritable class

2010-09-12 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang reassigned HIVE-1629:


Assignee: Vaibhav Aggarwal

> Patch to fix hashCode method in DoubleWritable class
> 
>
> Key: HIVE-1629
> URL: https://issues.apache.org/jira/browse/HIVE-1629
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Vaibhav Aggarwal
>Assignee: Vaibhav Aggarwal
> Attachments: HIVE-1629.patch
>
>
> A patch to fix the hashCode() method of DoubleWritable class of Hive.
> It prevents the HashMap (of type DoubleWritable) from behaving as LinkedList.




[jira] Commented: (HIVE-1629) Patch to fix hashCode method in DoubleWritable class

2010-09-10 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908194#action_12908194
 ] 

Ning Zhang commented on HIVE-1629:
--

+long v = Double.doubleToLongBits(value);
+return (int) (v ^ (v >>> 32));

Won't this return 0 for all long values less than 2^32?

I searched the web, and the following 64-bit to 32-bit hash seems to be a good one:

http://www.cris.com/~ttwang/tech/inthash.htm
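
One way to check the question above is to run the two quoted lines on a few 
inputs. The sketch below wraps them in a hypothetical helper (the class and 
method names are not from the patch) and compares the result with 
java.lang.Double.hashCode(), which is documented to use the same XOR fold.

```java
public class DoubleHashSketch {
    /** Folds the 64-bit representation of a double into 32 bits by XORing its halves. */
    static int foldHash(double value) {
        long v = Double.doubleToLongBits(value);
        return (int) (v ^ (v >>> 32));
    }

    public static void main(String[] args) {
        // A bit pattern below 2^32 has a zero upper half, so the fold keeps the low 32 bits
        double tiny = Double.longBitsToDouble(5L);
        System.out.println(foldHash(tiny));
        // Compare with the JDK's own hash for the same value
        System.out.println(foldHash(1.5) == Double.valueOf(1.5).hashCode());
    }
}
```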

> Patch to fix hashCode method in DoubleWritable class
> 
>
> Key: HIVE-1629
> URL: https://issues.apache.org/jira/browse/HIVE-1629
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Vaibhav Aggarwal
> Attachments: HIVE-1629.patch
>
>
> A patch to fix the hashCode() method of DoubleWritable class of Hive.
> It prevents the HashMap (of type DoubleWritable) from behaving as LinkedList.




[jira] Updated: (HIVE-1622) Use CombineHiveInputFormat for the merge job if hive.merge.mapredfiles=true

2010-09-08 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1622:
-

Status: Patch Available  (was: Open)

> Use CombineHiveInputFormat for the merge job if hive.merge.mapredfiles=true
> ---
>
> Key: HIVE-1622
> URL: https://issues.apache.org/jira/browse/HIVE-1622
> Project: Hadoop Hive
>  Issue Type: Improvement
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-1622.patch
>
>
> Currently map-only merge (using CombineHiveInputFormat) is only enabled for 
> merging files generated by mappers. It should be used for files generated at 
> reducers as well.




[jira] Updated: (HIVE-1622) Use CombineHiveInputFormat for the merge job if hive.merge.mapredfiles=true

2010-09-08 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1622:
-

Attachment: HIVE-1622.patch

> Use CombineHiveInputFormat for the merge job if hive.merge.mapredfiles=true
> ---
>
> Key: HIVE-1622
> URL: https://issues.apache.org/jira/browse/HIVE-1622
> Project: Hadoop Hive
>  Issue Type: Improvement
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-1622.patch
>
>
> Currently map-only merge (using CombineHiveInputFormat) is only enabled for 
> merging files generated by mappers. It should be used for files generated at 
> reducers as well.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1622) Use CombineHiveInputFormat for the merge job if hive.merge.mapredfiles=true

2010-09-08 Thread Ning Zhang (JIRA)
Use CombineHiveInputFormat for the merge job if hive.merge.mapredfiles=true
---

 Key: HIVE-1622
 URL: https://issues.apache.org/jira/browse/HIVE-1622
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Ning Zhang
Assignee: Ning Zhang


Currently map-only merge (using CombineHiveInputFormat) is only enabled for 
merging files generated by mappers. It should be used for files generated at 
reducers as well.




[jira] Updated: (HIVE-1614) UDTF json_tuple should return null row when input is not a valid JSON string

2010-09-03 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1614:
-

Attachment: HIVE-1614.2.patch

Added a catch for all Throwables in the UDTF.

> UDTF json_tuple should return null row when input is not a valid JSON string
> 
>
> Key: HIVE-1614
> URL: https://issues.apache.org/jira/browse/HIVE-1614
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-1614.2.patch, HIVE-1614.patch
>
>
> If the input column is not a valid JSON string, json_tuple will not return 
> anything, but this prevents the downstream operators from accessing the 
> left-hand side table. We should output a NULL row instead, similar to when 
> the input column is a NULL value. 




[jira] Updated: (HIVE-1614) UDTF json_tuple should return null row when input is not a valid JSON string

2010-09-03 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1614:
-

   Status: Patch Available  (was: Open)
Affects Version/s: 0.7.0

> UDTF json_tuple should return null row when input is not a valid JSON string
> 
>
> Key: HIVE-1614
> URL: https://issues.apache.org/jira/browse/HIVE-1614
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-1614.patch
>
>
> If the input column is not a valid JSON string, json_tuple will not return 
> anything, but this prevents the downstream operators from accessing the 
> left-hand side table. We should output a NULL row instead, similar to when 
> the input column is a NULL value. 




[jira] Updated: (HIVE-1614) UDTF json_tuple should return null row when input is not a valid JSON string

2010-09-03 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1614:
-

Attachment: HIVE-1614.patch

> UDTF json_tuple should return null row when input is not a valid JSON string
> 
>
> Key: HIVE-1614
> URL: https://issues.apache.org/jira/browse/HIVE-1614
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-1614.patch
>
>
> If the input column is not a valid JSON string, json_tuple will not return 
> anything, but this prevents the downstream operators from accessing the 
> left-hand side table. We should output a NULL row instead, similar to when 
> the input column is a NULL value. 




[jira] Created: (HIVE-1614) UDTF json_tuple should return null row when input is not a valid JSON string

2010-09-03 Thread Ning Zhang (JIRA)
UDTF json_tuple should return null row when input is not a valid JSON string


 Key: HIVE-1614
 URL: https://issues.apache.org/jira/browse/HIVE-1614
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Ning Zhang
Assignee: Ning Zhang


If the input column is not a valid JSON string, json_tuple will not return 
anything, but this prevents the downstream operators from accessing the 
left-hand side table. We should output a NULL row instead, similar to when the 
input column is a NULL value. 
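
The effect can be sketched with a toy table function. Everything below is 
hypothetical and is not Hive code: a "k=v" parser stands in for JSON parsing, 
and lateralJoin stands in for the operator that pairs each left-hand row with 
the rows the function emits for it. When the function emits nothing, the 
left-hand row disappears. When it emits a NULL row, the left-hand row survives.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class NullRowSketch {
    /** Toy table function: parses "k=v" into one row [k, v]. */
    static List<String[]> explode(String input, boolean nullRowOnError) {
        try {
            int eq = input.indexOf('=');   // throws NullPointerException on null input
            if (eq < 0) {
                throw new IllegalArgumentException("malformed input: " + input);
            }
            return Collections.singletonList(
                new String[] {input.substring(0, eq), input.substring(eq + 1)});
        } catch (Throwable t) {
            if (nullRowOnError) {
                // Emit one all-NULL row so the left-hand row is still produced
                return Collections.singletonList(new String[] {null, null});
            }
            return new ArrayList<>();      // emit nothing: the left-hand row is dropped
        }
    }

    /** Pairs each left-hand row with the rows the function emits for it. */
    static List<String> lateralJoin(List<String> leftRows, boolean nullRowOnError) {
        List<String> out = new ArrayList<>();
        for (String left : leftRows) {
            for (String[] r : explode(left, nullRowOnError)) {
                out.add(left + " -> " + r[0] + "," + r[1]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("a=1", "bad");
        System.out.println(lateralJoin(rows, false));
        System.out.println(lateralJoin(rows, true));
    }
}
```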




[jira] Commented: (HIVE-1467) dynamic partitioning should cluster by partitions

2010-09-02 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905700#action_12905700
 ] 

Ning Zhang commented on HIVE-1467:
--

As discussed with Joydeep and Ashish, it seems we should use the "distribute 
by" mechanism rather than "cluster by" to avoid sorting on the reducer side. 
The difference between them is that "distribute by" only sets the MapReduce 
partition columns to the dynamic partition columns, while "cluster by" 
additionally sets the "key columns" to the dynamic partition columns as well.

So I think we can offer 2 modes of reducer-side DP, with tradeoffs:
  -- distribute by mode: no sorting, but reducers have to keep all files open 
during the DP insert. A good choice when a large amount of data is passed from 
mappers to reducers.
  -- cluster by mode: sorting by the DP columns, but we can close a DP file 
once FileSinkOperator sees a different DP column value. A good choice when the 
total data size is not that large but a large number of DPs are generated.  
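
The tradeoff between the two modes can be sketched by counting how many 
output files a single reducer would need open at once. This is a simplified 
model with hypothetical names, not FileSinkOperator code: each string is the 
DP column value of one incoming row.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DpOpenFilesSketch {
    /** "distribute by" style: rows arrive unsorted, so every writer stays open. */
    static int maxOpenUnsorted(List<String> dpValues) {
        Set<String> open = new HashSet<>();
        int max = 0;
        for (String v : dpValues) {
            open.add(v);                 // open a writer the first time a value is seen
            max = Math.max(max, open.size());
        }
        return max;
    }

    /** "cluster by" style: rows arrive sorted, so the previous writer can be closed. */
    static int maxOpenSorted(List<String> sortedDpValues) {
        String current = null;
        int open = 0;
        int max = 0;
        for (String v : sortedDpValues) {
            if (!v.equals(current)) {
                if (current != null) {
                    open--;              // value changed: close the previous writer
                }
                open++;                  // open a writer for the new value
                current = v;
            }
            max = Math.max(max, open);
        }
        return max;
    }

    public static void main(String[] args) {
        System.out.println(maxOpenUnsorted(Arrays.asList("a", "b", "a", "c")));
        System.out.println(maxOpenSorted(Arrays.asList("a", "a", "b", "c")));
    }
}
```

The model ignores the cost of the sort itself, which is the price the 
"cluster by" mode pays for keeping only one file open.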

> dynamic partitioning should cluster by partitions
> -
>
> Key: HIVE-1467
> URL: https://issues.apache.org/jira/browse/HIVE-1467
> Project: Hadoop Hive
>  Issue Type: Improvement
>Reporter: Joydeep Sen Sarma
>Assignee: Namit Jain
>
> (based on internal discussion with Ning). Dynamic partitioning should offer a 
> mode where it clusters data by partition before writing out to each 
> partition. This will reduce number of files. Details:
> 1. always use reducer stage
> 2. mapper sends to reducer based on partitioning column. ie. reducer = 
> f(partition-cols)
> 3. f() can be made somewhat smart to:
>a. spread large partitions across multiple reducers - each mapper can 
> maintain row count seen per partition - and then apply (whenever it sees a 
> new row for a partition): 
>* reducer = (row count / 64k) % numReducers 
>Small partitions always go to one reducer. the larger the partition, 
> the more the reducers. this prevents one reducer becoming bottleneck writing 
> out one partition
>b. this still leaves the issue of very large number of splits. (64K rows 
> from 10K mappers is pretty large). for this one can apply one slight 
> modification:
>* reducer = (mapper-id/1024 + row-count/64k) % numReducers
>ie. - the first 1000 mappers always send the first 64K rows for one 
> partition to the same reducer. the next 1000 send it to the next one. and so 
> on.
> the constants 1024 and 64k are used just as an example. i don't know what the 
> right numbers are. it's also clear that this is a case where we need hadoop 
> to do only partitioning (and no sorting). this will be a useful feature to 
> have in hadoop. that will reduce the overhead due to reducers.
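
The second formula quoted above can be written out as a small function. This 
is only a sketch under stated assumptions: integer division is assumed, "64k" 
is taken to mean 65536, and the class and method names are hypothetical. As 
the description says, 1024 and 64k are example constants.

```java
public class DpReducerChoiceSketch {
    static final long MAPPERS_PER_GROUP = 1024;   // example constant from the description
    static final long ROWS_PER_CHUNK = 65536;     // "64k", assumed to mean 2^16

    /** reducer = (mapper-id/1024 + row-count/64k) % numReducers, with integer division. */
    static int reducerFor(long mapperId, long rowCount, int numReducers) {
        return (int) ((mapperId / MAPPERS_PER_GROUP + rowCount / ROWS_PER_CHUNK) % numReducers);
    }

    public static void main(String[] args) {
        // Mappers in the same group send the first chunk of a partition to the same reducer
        System.out.println(reducerFor(0, 0, 10));
        System.out.println(reducerFor(1023, 100, 10));
        // The next group of mappers, or the next chunk of rows, moves to the next reducer
        System.out.println(reducerFor(1024, 0, 10));
        System.out.println(reducerFor(0, 65536, 10));
    }
}
```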




[jira] Updated: (HIVE-1598) use SequenceFile rather than TextFile format for hive query results

2010-08-31 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1598:
-

Attachment: HIVE-1598.2.patch

Attached the test case and also removed some debugging info. These are the only 
changes. 

> use SequenceFile rather than TextFile format for hive query results
> ---
>
> Key: HIVE-1598
> URL: https://issues.apache.org/jira/browse/HIVE-1598
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-1598.2.patch, HIVE-1598.patch
>
>
> A Hive query's result is written to a temporary directory first, and then 
> FetchTask takes the files and displays them to the user. Currently the file 
> format used for the result files is TextFile. This could cause 
> incorrect result display if some string-typed column contains newlines, 
> which are used as record delimiters in TextInputFormat. Switching to 
> SequenceFile format will solve this problem. 
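
The problem described above can be reproduced with plain strings. The sketch 
below does not use TextInputFormat or SequenceFile: the newline-delimited 
round trip stands in for the text format, and a simple length-prefixed 
encoding (a hypothetical stand-in chosen for illustration) stands in for a 
container format that records where each value ends.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NewlineRecordSketch {
    /** Text-style framing: records are joined with '\n' and split on '\n' when read. */
    static List<String> roundTripText(List<String> records) {
        String file = String.join("\n", records);
        return Arrays.asList(file.split("\n", -1));
    }

    /** Length-prefixed framing: each record is stored as "<length>:<value>". */
    static List<String> roundTripLengthPrefixed(List<String> records) {
        StringBuilder file = new StringBuilder();
        for (String r : records) {
            file.append(r.length()).append(':').append(r);
        }
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < file.length()) {
            int colon = file.indexOf(":", pos);
            int len = Integer.parseInt(file.substring(pos, colon));
            // Read exactly len characters, whatever they contain
            out.add(file.substring(colon + 1, colon + 1 + len));
            pos = colon + 1 + len;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("a\nb", "c");
        System.out.println(roundTripText(records).size());           // more records than written
        System.out.println(roundTripLengthPrefixed(records).size()); // same count as written
    }
}
```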




[jira] Updated: (HIVE-1598) use SequenceFile rather than TextFile format for hive query results

2010-08-31 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1598:
-

Attachment: HIVE-1598.patch

This patch only adds support for using SequenceFile for query results. There 
are still questions about whether we should use it for the script operator. 
Will open another JIRA if needed.

> use SequenceFile rather than TextFile format for hive query results
> ---
>
> Key: HIVE-1598
> URL: https://issues.apache.org/jira/browse/HIVE-1598
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-1598.patch
>
>
> A Hive query's result is written to a temporary directory first, and then 
> FetchTask takes the files and displays them to the user. Currently the file 
> format used for the result files is TextFile. This could cause 
> incorrect result display if some string-typed column contains newlines, 
> which are used as record delimiters in TextInputFormat. Switching to 
> SequenceFile format will solve this problem. 




[jira] Updated: (HIVE-1598) use SequenceFile rather than TextFile format for hive query results

2010-08-31 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1598:
-

Status: Patch Available  (was: Open)

all 0.17 & 0.20 tests passed.

> use SequenceFile rather than TextFile format for hive query results
> ---
>
> Key: HIVE-1598
> URL: https://issues.apache.org/jira/browse/HIVE-1598
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-1598.patch
>
>
> A Hive query's result is written to a temporary directory first, and then 
> FetchTask takes the files and displays them to the user. Currently the file 
> format used for the result files is TextFile. This could cause 
> incorrect result display if some string-typed column contains newlines, 
> which are used as record delimiters in TextInputFormat. Switching to 
> SequenceFile format will solve this problem. 



