[jira] Assigned: (HIVE-1501) when generating reentrant INSERT for index rebuild, quote identifiers using backticks

2010-10-04 Thread Russell Melick (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Russell Melick reassigned HIVE-1501:


Assignee: Skye Berghel  (was: Russell Melick)

> when generating reentrant INSERT for index rebuild, quote identifiers using 
> backticks
> -
>
> Key: HIVE-1501
> URL: https://issues.apache.org/jira/browse/HIVE-1501
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Indexing
>Affects Versions: 0.7.0
>Reporter: John Sichi
>Assignee: Skye Berghel
> Fix For: 0.7.0
>
>
> Yongqiang, you mentioned that you weren't able to do this due to SORT BY not 
> accepting them.  The SORT BY is gone now as of HIVE-1494 (and SORT BY needs 
> to be fixed anyway).
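
As a minimal sketch of what quoting identifiers with backticks could look like when a statement is generated as text: the class name, the method names, and the statement shape below are hypothetical and are not taken from Hive's index rebuild code. Rejecting identifiers that already contain a backtick is an assumption made to keep the sketch simple.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class BacktickQuoting {
    // Wraps one identifier in backticks. Embedded backticks are rejected
    // here (an assumption of this sketch, not a statement about Hive).
    static String quote(String identifier) {
        if (identifier.indexOf('`') >= 0) {
            throw new IllegalArgumentException("identifier contains a backtick: " + identifier);
        }
        return "`" + identifier + "`";
    }

    // Builds an illustrative INSERT statement with every identifier quoted.
    static String buildInsert(String targetTable, String sourceTable, List<String> columns) {
        String cols = columns.stream()
                .map(BacktickQuoting::quote)
                .collect(Collectors.joining(", "));
        return "INSERT OVERWRITE TABLE " + quote(targetTable)
                + " SELECT " + cols + " FROM " + quote(sourceTable);
    }

    public static void main(String[] args) {
        // "order" is a keyword, which is the kind of name that needs quoting.
        System.out.println(buildInsert("idx_tbl", "src", Arrays.asList("key", "order")));
    }
}
```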

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1674) count(*) returns wrong result when a mapper returns empty results

2010-10-04 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917856#action_12917856
 ] 

He Yongqiang commented on HIVE-1674:


will take a look.

> count(*) returns wrong result when a mapper returns empty results
> -
>
> Key: HIVE-1674
> URL: https://issues.apache.org/jira/browse/HIVE-1674
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-1674.patch
>
>
> The query "select count(*) from src where false;" returns the number of mappers rather than 0. 
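
The report does not state the root cause. Purely as an illustration, and assuming each mapper emits one partial count, the sketch below shows how a final step that counts the partial rows instead of summing them would produce exactly this symptom. The class and method names are hypothetical.

```java
final class CountMergeSketch {
    // Correct final step: add up the per-mapper partial counts.
    static long sumPartials(long[] partialCounts) {
        long total = 0;
        for (long c : partialCounts) {
            total += c;
        }
        return total;
    }

    // Faulty final step: counts how many partial rows arrived (one per mapper).
    static long countPartialRows(long[] partialCounts) {
        return partialCounts.length;
    }

    public static void main(String[] args) {
        // Three mappers, none of which saw a row matching "where false".
        long[] partials = {0, 0, 0};
        System.out.println(sumPartials(partials));      // prints 0
        System.out.println(countPartialRows(partials)); // prints 3
    }
}
```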




[jira] Commented: (HIVE-1546) Ability to plug custom Semantic Analyzers for Hive Grammar

2010-10-04 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917851#action_12917851
 ] 

Ashutosh Chauhan commented on HIVE-1546:


I did it in junit form because John suggested it that way in his earlier 
comment:

{quote}
* We need a test for loading a variation on the default semantic analyzer 
in order to exercise the pluggable configuration. You can create a subclass of 
the default analyzer (under ql/src/test/org/apache/hadoop/hive/ql/parse) to 
inject some mock behavior change.
{quote}

I also feel a JUnit test is better suited to this kind of behavioral testing of 
code paths (it exercises interface points) than forcing it through the 
string-comparison style of test/queries/*, which are more end-to-end tests for 
Hive. Further, if we add a dummy hook name in data/conf/hive-site.xml, then that 
dummy hook will get loaded and all the subsequent tests will have it too. Do we 
want it that way?
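
To make the kind of behavioral test under discussion concrete, here is a minimal sketch of loading a hook class by name from configuration and checking that the mock behavior runs. Only the property name hive.semantic.analyzer.hook comes from this thread; the AnalyzerHook interface, the MockHook class, and the use of a plain Map as configuration are hypothetical stand-ins, not Hive's API.

```java
import java.util.HashMap;
import java.util.Map;

final class HookLoadingSketch {
    // Hypothetical hook interface, standing in for whatever the patch defines.
    interface AnalyzerHook {
        String preAnalyze(String query);
    }

    // Mock implementation that makes its invocation observable.
    static final class MockHook implements AnalyzerHook {
        @Override
        public String preAnalyze(String query) {
            return "mock:" + query;
        }
    }

    // Instantiates the configured hook by class name, or returns null if unset.
    static AnalyzerHook load(Map<String, String> conf) throws Exception {
        String className = conf.get("hive.semantic.analyzer.hook");
        if (className == null || className.isEmpty()) {
            return null;
        }
        return (AnalyzerHook) Class.forName(className).getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> conf = new HashMap<>();
        conf.put("hive.semantic.analyzer.hook", MockHook.class.getName());
        System.out.println(load(conf).preAnalyze("select 1")); // prints mock:select 1
    }
}
```

Setting the property only inside the test, as above, keeps the mock hook out of every other test, which is the concern raised about putting it in data/conf/hive-site.xml.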

> Ability to plug custom Semantic Analyzers for Hive Grammar
> --
>
> Key: HIVE-1546
> URL: https://issues.apache.org/jira/browse/HIVE-1546
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 0.7.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Fix For: 0.7.0
>
> Attachments: hive-1546-3.patch, hive-1546-4.patch, hive-1546.patch, 
> hive-1546_2.patch, hooks.patch, Howl_Semantic_Analysis.txt
>
>
> It will be useful if Semantic Analysis phase is made pluggable such that 
> other projects can do custom analysis of hive queries before doing metastore 
> operations on them. 




[jira] Updated: (HIVE-1570) referencing an added file by its name in a transform script does not work in hive local mode

2010-10-04 Thread Joydeep Sen Sarma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joydeep Sen Sarma updated HIVE-1570:


Status: Patch Available  (was: Open)

> referencing an added file by its name in a transform script does not work in 
> hive local mode
> -
>
> Key: HIVE-1570
> URL: https://issues.apache.org/jira/browse/HIVE-1570
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Joydeep Sen Sarma
>Assignee: Joydeep Sen Sarma
> Attachments: 1570.1.patch, 1570.2.patch
>
>
> Yongqiang tried this and it fails in local mode:
> add file ../data/scripts/dumpdata_script.py;
> select count(distinct subq.key) from
> (FROM src MAP src.key USING 'python dumpdata_script.py' AS key WHERE src.key 
> = 10) subq;
> this needs to be fixed because it means we cannot choose local mode 
> automatically in case of transform scripts (since different paths need to be 
> used for cluster vs. local mode execution)




[jira] Updated: (HIVE-1570) referencing an added file by its name in a transform script does not work in hive local mode

2010-10-04 Thread Joydeep Sen Sarma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joydeep Sen Sarma updated HIVE-1570:


Attachment: 1570.2.patch

Working patch. No new test is needed. I had to modify some other tests to use 'add 
file'.

> referencing an added file by its name in a transform script does not work in 
> hive local mode
> -
>
> Key: HIVE-1570
> URL: https://issues.apache.org/jira/browse/HIVE-1570
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Joydeep Sen Sarma
>Assignee: Joydeep Sen Sarma
> Attachments: 1570.1.patch, 1570.2.patch
>
>
> Yongqiang tried this and it fails in local mode:
> add file ../data/scripts/dumpdata_script.py;
> select count(distinct subq.key) from
> (FROM src MAP src.key USING 'python dumpdata_script.py' AS key WHERE src.key 
> = 10) subq;
> this needs to be fixed because it means we cannot choose local mode 
> automatically in case of transform scripts (since different paths need to be 
> used for cluster vs. local mode execution)




[jira] Updated: (HIVE-1570) referencing an added file by its name in a transform script does not work in hive local mode

2010-10-04 Thread Joydeep Sen Sarma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joydeep Sen Sarma updated HIVE-1570:


Attachment: 1570.1.patch

Before running a map-reduce job in local mode we:
1. set a new working directory
2. symlink all added files from that working directory

This is pretty much identical to how Hadoop sets up the task execution environment. 
All references to scripts and added files by name alone now resolve 
correctly in local mode.
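
The two steps above can be sketched with the standard java.nio.file API. This is an illustration under stated assumptions, not the code in the patch: the class and method names are hypothetical, the working directory is a temporary directory, and symbolic links are assumed to be supported by the file system.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

final class LocalModeWorkDir {
    // Creates a fresh working directory and links every added file into it
    // under its bare file name, so a command such as
    // "python dumpdata_script.py" resolves when run from that directory.
    static Path prepare(List<Path> addedFiles) throws IOException {
        Path workDir = Files.createTempDirectory("local-mode-");
        for (Path file : addedFiles) {
            Path link = workDir.resolve(file.getFileName());
            Files.createSymbolicLink(link, file.toAbsolutePath());
        }
        return workDir;
    }
}
```

A child process can then be started from that directory with `new ProcessBuilder(command).directory(workDir.toFile())`, which is one way to get relative script names to resolve without changing the parent's own working directory.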

There was some hacky code in SemanticAnalyzer.java to deal with this that 
doesn't work in all cases (when the referenced file is not the first item on the 
command line, or in automatic local mode). I have deleted it.

I duplicated one of the tests so that we get coverage against a real cluster 
(scriptfile1.q executed against minimr) and local mode (scriptfile2.q).

Still running tests.

> referencing an added file by its name in a transform script does not work in 
> hive local mode
> -
>
> Key: HIVE-1570
> URL: https://issues.apache.org/jira/browse/HIVE-1570
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Joydeep Sen Sarma
>Assignee: Joydeep Sen Sarma
> Attachments: 1570.1.patch
>
>
> Yongqiang tried this and it fails in local mode:
> add file ../data/scripts/dumpdata_script.py;
> select count(distinct subq.key) from
> (FROM src MAP src.key USING 'python dumpdata_script.py' AS key WHERE src.key 
> = 10) subq;
> this needs to be fixed because it means we cannot choose local mode 
> automatically in case of transform scripts (since different paths need to be 
> used for cluster vs. local mode execution)




[jira] Updated: (HIVE-1376) Simple UDAFs with more than 1 parameter crash on empty row query

2010-10-04 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1376:
-

Status: Patch Available  (was: Open)

> Simple UDAFs with more than 1 parameter crash on empty row query 
> -
>
> Key: HIVE-1376
> URL: https://issues.apache.org/jira/browse/HIVE-1376
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Mayank Lahiri
>Assignee: Ning Zhang
> Attachments: HIVE-1376.2.patch, HIVE-1376.patch
>
>
> Simple UDAFs with more than 1 parameter crash when the query returns no rows. 
> Currently, this only seems to affect the percentile() UDAF where the second 
> parameter is the percentile to be computed (of type double). I've also 
> verified the bug by adding a dummy parameter to ExampleMin in contrib. 
> On an empty query, Hive seems to be trying to resolve an iterate() method 
> with signature {null,null} instead of {null,double}. You can reproduce this 
> bug using:
> CREATE TABLE pct_test ( val INT );
> SELECT percentile(val, 0.5) FROM pct_test;
> which produces a lot of errors like: 
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to 
> execute method public boolean 
> org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator.iterate(org.apache.hadoop.io.LongWritable,double)
>   on object 
> org.apache.hadoop.hive.ql.udf.udafpercentile$percentilelongevalua...@11d13272 
> of class org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator 
> with arguments {null, null} of size 2
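
The failure mode described above can be reproduced in isolation with plain Java reflection: invoking a method that declares a primitive double parameter with a null argument is rejected before the method body runs. The Evaluator class below is a hypothetical stand-in and is not Hive's UDAFPercentile.

```java
import java.lang.reflect.Method;

final class NullPrimitiveArgument {
    // Stand-in for a simple UDAF evaluator with a primitive second parameter.
    public static final class Evaluator {
        public boolean iterate(Long value, double percentile) {
            return true;
        }
    }

    // Returns true when the reflective call is rejected because an argument
    // cannot be converted to the declared parameter type.
    static boolean rejects(Object... args) throws Exception {
        Method iterate = Evaluator.class.getMethod("iterate", Long.class, double.class);
        try {
            iterate.invoke(new Evaluator(), args);
            return false;
        } catch (IllegalArgumentException e) {
            // null cannot be unwrapped to a primitive double
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(rejects(null, null)); // {null, null}: rejected
        System.out.println(rejects(null, 0.5));  // {null, double}: accepted
    }
}
```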




[jira] Updated: (HIVE-1674) count(*) returns wrong result when a mapper returns empty results

2010-10-04 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1674:
-

Attachment: HIVE-1674.patch

> count(*) returns wrong result when a mapper returns empty results
> -
>
> Key: HIVE-1674
> URL: https://issues.apache.org/jira/browse/HIVE-1674
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-1674.patch
>
>
> The query "select count(*) from src where false;" returns the number of mappers rather than 0. 




[jira] Updated: (HIVE-1674) count(*) returns wrong result when a mapper returns empty results

2010-10-04 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1674:
-

Status: Patch Available  (was: Open)

> count(*) returns wrong result when a mapper returns empty results
> -
>
> Key: HIVE-1674
> URL: https://issues.apache.org/jira/browse/HIVE-1674
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-1674.patch
>
>
> The query "select count(*) from src where false;" returns the number of mappers rather than 0. 




[jira] Updated: (HIVE-1678) NPE in MapJoin

2010-10-04 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1678:
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Committed. Thanks Amareshwari

> NPE in MapJoin 
> ---
>
> Key: HIVE-1678
> URL: https://issues.apache.org/jira/browse/HIVE-1678
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Amareshwari Sriramadasu
>Assignee: Amareshwari Sriramadasu
> Attachments: patch-1678.txt
>
>
> The query with two map joins and a group by fails with following NPE:
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:177)
> at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:457)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:697)
> at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
> at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:457)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:697)
> at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:464)




[jira] Commented: (HIVE-1658) Fix describe [extended] column formatting

2010-10-04 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917771#action_12917771
 ] 

He Yongqiang commented on HIVE-1658:


One more thing: if the time information (create time, last access time, etc.) is 
0, can you print a string like "unknown" in the output of desc format?
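
A minimal sketch of that suggestion, assuming the time fields are stored as seconds since the epoch and that 0 means "never set". The class and method names are hypothetical, and the non-zero format is simply whatever java.util.Date prints.

```java
import java.util.Date;

final class DescTimeSketch {
    // Renders a stored time value, substituting "unknown" for an unset (0) value.
    static String formatTime(long epochSeconds) {
        if (epochSeconds == 0) {
            return "unknown";
        }
        return new Date(epochSeconds * 1000L).toString();
    }

    public static void main(String[] args) {
        System.out.println(formatTime(0));           // prints unknown
        System.out.println(formatTime(1286150400L)); // a date in October 2010
    }
}
```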

> Fix describe [extended] column formatting
> -
>
> Key: HIVE-1658
> URL: https://issues.apache.org/jira/browse/HIVE-1658
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Paul Yang
>Assignee: Thiruvel Thirumoolan
> Attachments: HIVE-1658-PrelimPatch.patch
>
>
> When displaying the column schema, the formatting should be 
> name<TAB>type<TAB>comment
> to be in line with the previous formatting style for backward compatibility.




[jira] Commented: (HIVE-1658) Fix describe [extended] column formatting

2010-10-04 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917725#action_12917725
 ] 

He Yongqiang commented on HIVE-1658:


+1. Looks good. Can you do the final patch?

> Fix describe [extended] column formatting
> -
>
> Key: HIVE-1658
> URL: https://issues.apache.org/jira/browse/HIVE-1658
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Paul Yang
>Assignee: Thiruvel Thirumoolan
> Attachments: HIVE-1658-PrelimPatch.patch
>
>
> When displaying the column schema, the formatting should be 
> name<TAB>type<TAB>comment
> to be in line with the previous formatting style for backward compatibility.




Build failed in Hudson: Hive-trunk-h0.20 #382

2010-10-04 Thread Apache Hudson Server
See 

--
[...truncated 14189 lines...]
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src
[junit] POSTHOOK: Output: defa...@src
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src1
[junit] POSTHOOK: Output: defa...@src1
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src_sequencefile
[junit] POSTHOOK: Output: defa...@src_sequencefile
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src_thrift
[junit] POSTHOOK: Output: defa...@src_thrift
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src_json
[junit] POSTHOOK: Output: defa...@src_json
[junit] OK
[junit] diff 

 

[junit] Done query: unknown_table1.q
[junit] Begin query: unknown_table2.q
[junit] Copying data from 

[junit] Loading data to table srcpart partition (ds=2008-04-08, hr=11)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=11
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcpart partition (ds=2008-04-08, hr=12)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=12
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcpart partition (ds=2008-04-09, hr=11)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=11
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcpart partition (ds=2008-04-09, hr=12)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=12
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src
[junit] POSTHOOK: Output: defa...@src
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src1
[junit] POSTHOOK: Output: defa...@src1
[junit] OK
[junit] Copying data from 

[

[jira] Commented: (HIVE-1678) NPE in MapJoin

2010-10-04 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917684#action_12917684
 ] 

Namit Jain commented on HIVE-1678:
--

Nice catch - Thanks

+1

will commit if the tests pass

> NPE in MapJoin 
> ---
>
> Key: HIVE-1678
> URL: https://issues.apache.org/jira/browse/HIVE-1678
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Amareshwari Sriramadasu
>Assignee: Amareshwari Sriramadasu
> Attachments: patch-1678.txt
>
>
> The query with two map joins and a group by fails with following NPE:
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:177)
> at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:457)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:697)
> at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
> at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:457)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:697)
> at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:464)




[jira] Commented: (HIVE-1546) Ability to plug custom Semantic Analyzers for Hive Grammar

2010-10-04 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917677#action_12917677
 ] 

Namit Jain commented on HIVE-1546:
--

I will take a look in more detail, but overall it looks good. I had the 
following comments:

1. Instead of TestSemanticAnalyzerHookLoading.java, add tests in 
test/queries/clientpositive and test/queries/clientnegative.
2. Do you want to set the value of hive.semantic.analyzer.hook to a dummy value 
in data/conf/hive-site.xml for the unit tests?
Can something meaningful be printed here, which can be used for comparing?


> Ability to plug custom Semantic Analyzers for Hive Grammar
> --
>
> Key: HIVE-1546
> URL: https://issues.apache.org/jira/browse/HIVE-1546
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 0.7.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Fix For: 0.7.0
>
> Attachments: hive-1546-3.patch, hive-1546-4.patch, hive-1546.patch, 
> hive-1546_2.patch, hooks.patch, Howl_Semantic_Analysis.txt
>
>
> It will be useful if Semantic Analysis phase is made pluggable such that 
> other projects can do custom analysis of hive queries before doing metastore 
> operations on them. 




Build failed in Hudson: Hive-trunk-h0.18 #559

2010-10-04 Thread Apache Hudson Server
See 

--
[...truncated 31015 lines...]
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src
[junit] POSTHOOK: Output: defa...@src
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src1
[junit] POSTHOOK: Output: defa...@src1
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src_sequencefile
[junit] POSTHOOK: Output: defa...@src_sequencefile
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src_thrift
[junit] POSTHOOK: Output: defa...@src_thrift
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src_json
[junit] POSTHOOK: Output: defa...@src_json
[junit] OK
[junit] diff 

 

[junit] Done query: unknown_table1.q
[junit] Begin query: unknown_table2.q
[junit] Copying data from 

[junit] Loading data to table srcpart partition (ds=2008-04-08, hr=11)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=11
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcpart partition (ds=2008-04-08, hr=12)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=12
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcpart partition (ds=2008-04-09, hr=11)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=11
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcpart partition (ds=2008-04-09, hr=12)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=12
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src
[junit] POSTHOOK: Output: defa...@src
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src1
[junit] POSTHOOK: Output: defa...@src1
[junit] OK
[junit] Copying data from 

[

Build failed in Hudson: Hive-trunk-h0.19 #559

2010-10-04 Thread Apache Hudson Server
See 

--
[...truncated 12234 lines...]
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src
[junit] POSTHOOK: Output: defa...@src
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src1
[junit] POSTHOOK: Output: defa...@src1
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src_sequencefile
[junit] POSTHOOK: Output: defa...@src_sequencefile
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src_thrift
[junit] POSTHOOK: Output: defa...@src_thrift
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src_json
[junit] POSTHOOK: Output: defa...@src_json
[junit] OK
[junit] diff 

 

[junit] Done query: unknown_table1.q
[junit] Begin query: unknown_table2.q
[junit] Copying data from 

[junit] Loading data to table srcpart partition (ds=2008-04-08, hr=11)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=11
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcpart partition (ds=2008-04-08, hr=12)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=12
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcpart partition (ds=2008-04-09, hr=11)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=11
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcpart partition (ds=2008-04-09, hr=12)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=12
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src
[junit] POSTHOOK: Output: defa...@src
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src1
[junit] POSTHOOK: Output: defa...@src1
[junit] OK
[junit] Copying data from 

[

Build failed in Hudson: Hive-trunk-h0.17 #558

2010-10-04 Thread Apache Hudson Server
See 

--
[...truncated 10843 lines...]
[junit] Copying data from 

[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src
[junit] POSTHOOK: Output: defa...@src
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src1
[junit] POSTHOOK: Output: defa...@src1
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src_sequencefile
[junit] POSTHOOK: Output: defa...@src_sequencefile
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src_thrift
[junit] POSTHOOK: Output: defa...@src_thrift
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src_json
[junit] POSTHOOK: Output: defa...@src_json
[junit] OK
[junit] diff 

 

[junit] Done query: unknown_table1.q
[junit] Begin query: unknown_table2.q
[junit] Copying data from 

[junit] Loading data to table srcpart partition (ds=2008-04-08, hr=11)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=11
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcpart partition (ds=2008-04-08, hr=12)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=12
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcpart partition (ds=2008-04-09, hr=11)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=11
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcpart partition (ds=2008-04-09, hr=12)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=12
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src
[junit] POSTHOOK: Output: defa...@src
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src1
[junit] POSTHOOK: Output: defa...@src1
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src_sequencefile
[junit] POSTHO

[jira] Created: (HIVE-1689) Add GROUP_CONCAT to HiveQL

2010-10-04 Thread Jeff Hammerbacher (JIRA)
Add GROUP_CONCAT to HiveQL
--

 Key: HIVE-1689
 URL: https://issues.apache.org/jira/browse/HIVE-1689
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Jeff Hammerbacher


I often find GROUP_CONCAT to be handy when working with list-type data. See 
http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html#function_group-concat
 for the MySQL syntax.
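
For readers unfamiliar with the function, here is a minimal sketch of the semantics being requested, written in plain Java rather than as a Hive UDAF. It assumes rows of (key, value) pairs, skips NULL values as MySQL does, and omits MySQL options such as DISTINCT and ORDER BY. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

final class GroupConcatSketch {
    // Joins the non-null values of each key with the given separator,
    // keeping keys in first-seen order.
    static Map<String, String> groupConcat(List<String[]> rows, String separator) {
        Map<String, StringJoiner> groups = new LinkedHashMap<>();
        for (String[] row : rows) {
            if (row[1] == null) {
                continue; // NULL values do not contribute to the result
            }
            groups.computeIfAbsent(row[0], k -> new StringJoiner(separator)).add(row[1]);
        }
        Map<String, String> result = new LinkedHashMap<>();
        groups.forEach((key, joined) -> result.put(key, joined.toString()));
        return result;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"a", "1"},
                new String[] {"b", "2"},
                new String[] {"a", "3"});
        System.out.println(groupConcat(rows, ",")); // prints {a=1,3, b=2}
    }
}
```

One simplification to note: a key whose values are all NULL is simply absent from the result here, whereas MySQL returns NULL for such a group.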




[jira] Commented: (HIVE-1545) Add a bunch of UDFs and UDAFs

2010-10-04 Thread Terje Marthinussen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917546#action_12917546
 ] 

Terje Marthinussen commented on HIVE-1545:
--

I was just quickly looking at this and noticed the following:

grep lib com/facebook/hive/udf/*java
com/facebook/hive/udf/UDAFHistogram.java:import com.facebook.hive.udf.lib.Counter;
com/facebook/hive/udf/UDFJaccard.java:import com.facebook.hive.udf.lib.SetOps;

However, no com.facebook.hive.udf.lib package is included.





> Add a bunch of UDFs and UDAFs
> -
>
> Key: HIVE-1545
> URL: https://issues.apache.org/jira/browse/HIVE-1545
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: UDF
>Reporter: Jonathan Chang
>Assignee: Jonathan Chang
>Priority: Minor
> Attachments: udfs.tar.gz
>
>
> Here some UD(A)Fs which can be incorporated into the Hive distribution:
> UDFArgMax - Find the 0-indexed index of the largest argument. e.g., ARGMAX(4, 
> 5, 3) returns 1.
> UDFBucket - Find the bucket in which the first argument belongs. e.g., 
> BUCKET(x, b_1, b_2, b_3, ...), will return the smallest i such that x > b_{i} 
> but <= b_{i+1}. Returns 0 if x is smaller than all the buckets.
> UDFFindInArray - Finds the 1-index of the first element in the array given as 
> the second argument. Returns 0 if not found. Returns NULL if either argument 
> is NULL. E.g., FIND_IN_ARRAY(5, array(1,2,5)) will return 3. FIND_IN_ARRAY(5, 
> array(1,2,3)) will return 0.
> UDFGreatCircleDist - Finds the great circle distance (in km) between two 
> lat/long coordinates (in degrees).
> UDFLDA - Performs LDA inference on a vector given fixed topics.
> UDFNumberRows - Number successive rows starting from 1. Counter resets to 1 
> whenever any of its parameters changes.
> UDFPmax - Finds the maximum of a set of columns. e.g., PMAX(4, 5, 3) returns 
> 5.
> UDFRegexpExtractAll - Like REGEXP_EXTRACT except that it returns all matches 
> in an array.
> UDFUnescape - Returns the string unescaped (using C/Java style unescaping).
> UDFWhich - Given a boolean array, return the indices which are TRUE.
> UDFJaccard
> UDAFCollect - Takes all the values associated with a row and converts them 
> into a list. Make sure to have: set hive.map.aggr = false;
> UDAFCollectMap - Like collect except that it takes tuples and generates a map.
> UDAFEntropy - Compute the entropy of a column.
> UDAFPearson (BROKEN!!!) - Computes the Pearson correlation between two 
> columns.
> UDAFTop - TOP(KEY, VAL) - returns the KEY associated with the largest value 
> of VAL.
> UDAFTopN (BROKEN!!!) - Like TOP except returns a list of the keys associated 
> with the N (passed as the third parameter) largest values of VAL.
> UDAFHistogram
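
The ARGMAX and FIND_IN_ARRAY semantics listed above can be illustrated with a small plain-Java sketch. This is not the code from udfs.tar.gz: the class and method names (UdfSemanticsSketch, argMax, findInArray) are hypothetical, the Hive UDF plumbing is left out, and resolving ties in ARGMAX to the first occurrence is an assumption the description does not settle.

```java
import java.util.List;

// Hypothetical illustration of two of the described UDFs; not the attached implementation.
class UdfSemanticsSketch {

  /** 0-based index of the largest argument. Ties go to the first occurrence (assumption). */
  static int argMax(double... values) {
    if (values.length == 0) {
      throw new IllegalArgumentException("at least one argument is required");
    }
    int best = 0;
    for (int i = 1; i < values.length; i++) {
      if (values[i] > values[best]) {
        best = i;
      }
    }
    return best;
  }

  /** 1-based position of needle in haystack, 0 if absent, null if either argument is null. */
  static Integer findInArray(Integer needle, List<Integer> haystack) {
    if (needle == null || haystack == null) {
      return null;
    }
    // indexOf returns -1 when the element is missing, which maps to the documented 0.
    return haystack.indexOf(needle) + 1;
  }
}
```

With these definitions, argMax(4, 5, 3) gives 1, and findInArray(5, [1, 2, 5]) gives 3 while findInArray(5, [1, 2, 3]) gives 0, matching the examples in the description.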




[jira] Commented: (HIVE-1681) ObjectStore.commitTransaction() does not properly handle transactions that have already been rolled back

2010-10-04 Thread Venkatesh S (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917520#action_12917520
 ] 

Venkatesh S commented on HIVE-1681:
---

The query ran successfully with this patch. Thanks, Carl. I would appreciate it 
if this could be committed quickly.

> ObjectStore.commitTransaction() does not properly handle transactions that 
> have already been rolled back
> 
>
> Key: HIVE-1681
> URL: https://issues.apache.org/jira/browse/HIVE-1681
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.5.0, 0.6.0, 0.7.0
>Reporter: Carl Steinbach
>Assignee: Carl Steinbach
> Attachments: HIVE-1681.1.patch.txt
>
>
> Here's the code for ObjectStore.commitTransaction() and 
> ObjectStore.rollbackTransaction():
> {code}
>   public boolean commitTransaction() {
>     assert (openTrasactionCalls >= 1);
>     if (!currentTransaction.isActive()) {
>       throw new RuntimeException(
>           "Commit is called, but transaction is not active. Either there are"
>           + " mismatching open and close calls or rollback was called in the same trasaction");
>     }
>     openTrasactionCalls--;
>     if ((openTrasactionCalls == 0) && currentTransaction.isActive()) {
>       transactionStatus = TXN_STATUS.COMMITED;
>       currentTransaction.commit();
>     }
>     return true;
>   }
>
>   public void rollbackTransaction() {
>     if (openTrasactionCalls < 1) {
>       return;
>     }
>     openTrasactionCalls = 0;
>     if (currentTransaction.isActive()
>         && transactionStatus != TXN_STATUS.ROLLBACK) {
>       transactionStatus = TXN_STATUS.ROLLBACK;
>       // could already be rolled back
>       currentTransaction.rollback();
>     }
>   }
> {code}
> Now suppose a nested transaction throws an exception which results in the 
> nested pseudo-transaction calling rollbackTransaction(). This causes 
> rollbackTransaction() to roll back the actual transaction, as well as to set 
> openTrasactionCalls = 0 and transactionStatus = TXN_STATUS.ROLLBACK.
> Suppose also that this nested transaction squelches the original exception. 
> In this case the stack will unwind and the caller will eventually try to 
> commit the transaction by calling commitTransaction(), which will see that 
> currentTransaction.isActive() returns FALSE and will throw a 
> RuntimeException. The fix for this problem is that commitTransaction() needs 
> to first check transactionStatus and return immediately if 
> transactionStatus == TXN_STATUS.ROLLBACK.
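
The proposed early return can be sketched in a self-contained form as below. This is a simplified model, not the contents of HIVE-1681.1.patch.txt: NestedTxnSketch, FakeTransaction and TxnStatus are hypothetical stand-ins for ObjectStore, the JDO transaction and TXN_STATUS, and returning false from the early exit (rather than true) is an assumption, since the description only says "return immediately".

```java
// Hypothetical model of the nested-transaction bookkeeping; not Hive's ObjectStore.
class NestedTxnSketch {
  enum TxnStatus { NO_STATE, OPEN, COMMITTED, ROLLBACK }

  /** Minimal stand-in for the underlying datastore transaction. */
  static class FakeTransaction {
    private boolean active;
    void begin() { active = true; }
    void commit() { active = false; }
    void rollback() { active = false; }
    boolean isActive() { return active; }
  }

  private final FakeTransaction currentTransaction = new FakeTransaction();
  private int openTransactionCalls = 0;
  private TxnStatus transactionStatus = TxnStatus.NO_STATE;

  public boolean openTransaction() {
    openTransactionCalls++;
    if (openTransactionCalls == 1) {
      // Only the outermost call starts a real transaction.
      currentTransaction.begin();
      transactionStatus = TxnStatus.OPEN;
    }
    return true;
  }

  public boolean commitTransaction() {
    // Proposed check: a nested call already rolled everything back, so
    // report failure (assumed return value) instead of throwing.
    if (transactionStatus == TxnStatus.ROLLBACK) {
      return false;
    }
    if (openTransactionCalls < 1 || !currentTransaction.isActive()) {
      throw new IllegalStateException("commit called without an active transaction");
    }
    openTransactionCalls--;
    if (openTransactionCalls == 0) {
      transactionStatus = TxnStatus.COMMITTED;
      currentTransaction.commit();
    }
    return true;
  }

  public void rollbackTransaction() {
    if (openTransactionCalls < 1) {
      return;
    }
    openTransactionCalls = 0;
    if (currentTransaction.isActive() && transactionStatus != TxnStatus.ROLLBACK) {
      transactionStatus = TxnStatus.ROLLBACK;
      currentTransaction.rollback();
    }
  }
}
```

In this model, two nested openTransaction() calls followed by an inner rollbackTransaction() leave the outer commitTransaction() returning false rather than throwing, and a later openTransaction() starts cleanly because it resets the status.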




[jira] Updated: (HIVE-1678) NPE in MapJoin

2010-10-04 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-1678:
--

Status: Patch Available  (was: Open)

> NPE in MapJoin 
> ---
>
> Key: HIVE-1678
> URL: https://issues.apache.org/jira/browse/HIVE-1678
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Amareshwari Sriramadasu
>Assignee: Amareshwari Sriramadasu
> Attachments: patch-1678.txt
>
>
> The query with two map joins and a group by fails with the following NPE:
> Caused by: java.lang.NullPointerException
> at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:177)
> at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:457)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:697)
> at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
> at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:457)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:697)
> at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:464)




[jira] Updated: (HIVE-1678) NPE in MapJoin

2010-10-04 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-1678:
--

Attachment: patch-1678.txt

The bug is in plan generation when a MapJoin is followed by another MapJoin, 
which is in turn followed by a ReduceSink. The ReduceSink operator reads its 
input from oldMapJoin instead of the current MapJoin.

The attached patch has a one-line fix in GenMapRedUtils.initMapJoinPlan. It 
also includes a test case.

> NPE in MapJoin 
> ---
>
> Key: HIVE-1678
> URL: https://issues.apache.org/jira/browse/HIVE-1678
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Amareshwari Sriramadasu
>Assignee: Amareshwari Sriramadasu
> Attachments: patch-1678.txt
>
>
> The query with two map joins and a group by fails with the following NPE:
> Caused by: java.lang.NullPointerException
> at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:177)
> at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:457)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:697)
> at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
> at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:457)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:697)
> at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:464)




[jira] Updated: (HIVE-1647) Incorrect initialization of thread local variable inside IOContext ( implementation is not threadsafe )

2010-10-04 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1647:
-

Status: Open  (was: Patch Available)

> Incorrect initialization of thread local variable inside IOContext ( 
> implementation is not threadsafe ) 
> 
>
> Key: HIVE-1647
> URL: https://issues.apache.org/jira/browse/HIVE-1647
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Server Infrastructure
>Affects Versions: 0.6.0, 0.7.0
>Reporter: Raman Grover
>Assignee: Liyin Tang
> Fix For: 0.7.0
>
> Attachments: HIVE-1647.patch
>
>   Original Estimate: 0.17h
>  Remaining Estimate: 0.17h
>
> There is a bug in org.apache.hadoop.hive.ql.io.IOContext in the 
> initialization of its thread-local variable.
>  
> public class IOContext {
>  
>   private static ThreadLocal<IOContext> threadLocal = new ThreadLocal<IOContext>(){ };
>  
>   static {
>     if (threadLocal.get() == null) {
>       threadLocal.set(new IOContext());
>     }
>   }
>  
> In a multi-threaded environment, the thread that loads the class first for 
> the JVM (assuming threads share the classloader) initializes itself 
> correctly by executing the code in the static block. Once the class is 
> loaded, the static block does not run again, so every subsequent thread sees 
> its own thread-local value as null. Since IOContext is set during 
> initialization of HiveRecordReader, a scenario where multiple threads each 
> acquire an instance of HiveRecordReader would result in an NPE for all but 
> the first thread that loads the class in the VM.
>  
> Is the above scenario of multiple threads initializing HiveRecordReader a 
> typical one? Or could we just provide the following fix?
>  
>   private static ThreadLocal<IOContext> threadLocal = new ThreadLocal<IOContext>(){
>     protected synchronized IOContext initialValue() {
>       return new IOContext();
>     }
>   };
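
The effect of overriding initialValue() can be checked with a small standalone sketch. IOContextSketch is a hypothetical stand-in for IOContext, and getFromNewThread() is a demo helper added only for illustration; neither comes from HIVE-1647.patch. The synchronized modifier is dropped here on the assumption that it is not needed, because initialValue() runs at most once per thread on that thread's own copy.

```java
// Hypothetical stand-in for org.apache.hadoop.hive.ql.io.IOContext; only the
// thread-local wiring is shown.
class IOContextSketch {
  private static final ThreadLocal<IOContextSketch> THREAD_LOCAL =
      new ThreadLocal<IOContextSketch>() {
        @Override
        protected IOContextSketch initialValue() {
          // Called lazily the first time each thread calls get().
          return new IOContextSketch();
        }
      };

  static IOContextSketch get() {
    return THREAD_LOCAL.get();
  }

  /** Demo helper: returns the context as seen from a freshly started thread. */
  static IOContextSketch getFromNewThread() {
    final IOContextSketch[] seen = new IOContextSketch[1];
    Thread worker = new Thread(new Runnable() {
      @Override
      public void run() {
        seen[0] = get();
      }
    });
    worker.start();
    try {
      // join() guarantees the write to seen[0] is visible to the caller.
      worker.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return seen[0];
  }
}
```

Unlike the static-block version, every thread that calls get() here receives its own non-null instance, regardless of which thread loaded the class.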
