From: Arvind Prabhakar (JIRA)

[jira] Commented: (HIVE-287) support count(*) and count distinct on multiple columns

2010-07-23 Thread Arvind Prabhakar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891889#action_12891889
 ] 

Arvind Prabhakar commented on HIVE-287:
---

Updated the wiki in all the above places.

> support count(*) and count distinct on multiple columns
> ---
>
> Key: HIVE-287
> URL: https://issues.apache.org/jira/browse/HIVE-287
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Namit Jain
>Assignee: Arvind Prabhakar
> Fix For: 0.6.0, 0.7.0
>
> Attachments: HIVE-287-1.patch, HIVE-287-2.patch, HIVE-287-3.patch, 
> HIVE-287-4.patch, HIVE-287-5-branch-0.6.patch, HIVE-287-5-trunk.patch, 
> HIVE-287-6-branch-0.6.patch, HIVE-287-6-trunk.patch
>
>
> The following query does not work:
> select count(distinct col1, col2) from Tbl
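For reference, here is a minimal Python sketch of what `count(distinct col1, col2)` is expected to compute: the number of distinct `(col1, col2)` tuples. It assumes the usual SQL convention that a row is skipped when any listed column is NULL (modelled here as `None`). The row representation and the `count_distinct` helper are hypothetical and are not Hive code.

```python
from typing import Any, Dict, Iterable, Sequence

Row = Dict[str, Any]  # hypothetical row model: column name -> value


def count_distinct(rows: Iterable[Row], cols: Sequence[str]) -> int:
    """Count distinct tuples over `cols`, skipping rows with a NULL (None) in any of them."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in cols)
        if any(v is None for v in key):
            continue  # assumption: a NULL in any listed column excludes the row
        seen.add(key)
    return len(seen)


tbl = [
    {"col1": 1, "col2": "a"},
    {"col1": 1, "col2": "a"},   # duplicate tuple, counted once
    {"col1": 1, "col2": "b"},
    {"col1": 2, "col2": None},  # NULL in col2, skipped
]
print(count_distinct(tbl, ["col1", "col2"]))  # 2
```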

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HIVE-287) support count(*) and count distinct on multiple columns

2010-07-13 Thread Arvind Prabhakar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888106#action_12888106
 ] 

Arvind Prabhakar commented on HIVE-287:
---

@John: I updated the UDAF documentation on the wiki at 
http://wiki.apache.org/hadoop/Hive/LanguageManual/UDF and also added a short 
blurb regarding the various interface changes on the UDAF tutorial page 
http://wiki.apache.org/hadoop/Hive/GenericUDAFCaseStudy#Writing_the_source. 
Please let me know if there are other places that need to be updated as well.

> support count(*) and count distinct on multiple columns
> ---
>
> Key: HIVE-287
> URL: https://issues.apache.org/jira/browse/HIVE-287
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Namit Jain
>Assignee: Arvind Prabhakar
> Fix For: 0.6.0, 0.7.0
>
> Attachments: HIVE-287-1.patch, HIVE-287-2.patch, HIVE-287-3.patch, 
> HIVE-287-4.patch, HIVE-287-5-branch-0.6.patch, HIVE-287-5-trunk.patch, 
> HIVE-287-6-branch-0.6.patch, HIVE-287-6-trunk.patch
>
>
> The following query does not work:
> select count(distinct col1, col2) from Tbl


[jira] Commented: (HIVE-287) count distinct on multiple columns does not work

2010-07-09 Thread Arvind Prabhakar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886950#action_12886950
 ] 

Arvind Prabhakar commented on HIVE-287:
---

bq. 1. Change the comments for the 2 new fields. It's easy for UDAF writers to 
assume that the UDAF itself needs to handle whether it's distinct or whether 
it's all columns.

I updated the javadocs in various places to make this clear.

bq. 2. Deprecate the old interface, and move all existing GenericUDAF to 
inherit from the new one.

No changes are necessary for this; the previously submitted patch already does 
it. All existing generic UDAFs now extend the abstract class that implements the 
new interface. If you see any problems with that, let me know.
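The arrangement described above, in which existing UDAFs keep their old types-only signature while the framework calls the new interface, can be sketched as an adapter base class. This Python sketch only mirrors the idea. `ParameterInfo`, `ResolverBase`, `get_evaluator_for`, `get_evaluator` and `SumResolver` are hypothetical stand-ins, not Hive's actual classes or signatures. The two flags carry information only: duplicate removal for DISTINCT is still done by the framework, not by the UDAF.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ParameterInfo:
    """Hypothetical analogue of the new parameter specification."""
    parameter_types: Sequence[str]
    is_distinct: bool = False      # informational: the framework removes duplicates
    is_all_columns: bool = False   # True for the UDAF(*) form


class ResolverBase(ABC):
    """Adapter: implements the new entry point on top of the legacy one."""

    def get_evaluator_for(self, info: ParameterInfo) -> str:
        # The new-style call delegates to the legacy, types-only method,
        # so existing UDAFs need no changes.
        return self.get_evaluator(info.parameter_types)

    @abstractmethod
    def get_evaluator(self, parameter_types: Sequence[str]) -> str:
        """Legacy-style resolution from parameter types alone."""


class SumResolver(ResolverBase):  # hypothetical pre-existing UDAF
    def get_evaluator(self, parameter_types: Sequence[str]) -> str:
        return "sum-double" if list(parameter_types) == ["double"] else "sum-long"


print(SumResolver().get_evaluator_for(ParameterInfo(["double"], is_distinct=True)))  # sum-double
```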



> count distinct on multiple columns does not work
> 
>
> Key: HIVE-287
> URL: https://issues.apache.org/jira/browse/HIVE-287
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Arvind Prabhakar
> Attachments: HIVE-287-1.patch, HIVE-287-2.patch, HIVE-287-3.patch, 
> HIVE-287-4.patch, HIVE-287-5-branch-0.6.patch, HIVE-287-5-trunk.patch, 
> HIVE-287-6-branch-0.6.patch, HIVE-287-6-trunk.patch
>
>
> The following query does not work:
> select count(distinct col1, col2) from Tbl


[jira] Updated: (HIVE-287) count distinct on multiple columns does not work

2010-07-09 Thread Arvind Prabhakar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvind Prabhakar updated HIVE-287:
--

Attachment: HIVE-287-6-trunk.patch
HIVE-287-6-branch-0.6.patch

> count distinct on multiple columns does not work
> 
>
> Key: HIVE-287
> URL: https://issues.apache.org/jira/browse/HIVE-287
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Arvind Prabhakar
> Attachments: HIVE-287-1.patch, HIVE-287-2.patch, HIVE-287-3.patch, 
> HIVE-287-4.patch, HIVE-287-5-branch-0.6.patch, HIVE-287-5-trunk.patch, 
> HIVE-287-6-branch-0.6.patch, HIVE-287-6-trunk.patch
>
>
> The following query does not work:
> select count(distinct col1, col2) from Tbl


[jira] Commented: (HIVE-287) count distinct on multiple columns does not work

2010-07-09 Thread Arvind Prabhakar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886804#action_12886804
 ] 

Arvind Prabhakar commented on HIVE-287:
---

I think keeping two different interfaces for UDAFs will lead to confusion in 
the long run. That's why the current patch deprecates the old interface in favor 
of the new one. But if everyone agrees that it is a good idea, I will go with 
that.

Also, can you suggest an alternate name for the new interface?

> count distinct on multiple columns does not work
> 
>
> Key: HIVE-287
> URL: https://issues.apache.org/jira/browse/HIVE-287
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Arvind Prabhakar
> Attachments: HIVE-287-1.patch, HIVE-287-2.patch, HIVE-287-3.patch, 
> HIVE-287-4.patch, HIVE-287-5-branch-0.6.patch, HIVE-287-5-trunk.patch
>
>
> The following query does not work:
> select count(distinct col1, col2) from Tbl


[jira] Commented: (HIVE-287) count distinct on multiple columns does not work

2010-07-09 Thread Arvind Prabhakar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886769#action_12886769
 ] 

Arvind Prabhakar commented on HIVE-287:
---

I vote for a meeting to hash this out face-to-face. I am willing to modify the 
patch provided we are all in agreement about how it should be changed. That 
would be a much better use of everyone's time than going through numerous deltas 
to the patch before settling on the final solution. Please let me know what you 
think.

> count distinct on multiple columns does not work
> 
>
> Key: HIVE-287
> URL: https://issues.apache.org/jira/browse/HIVE-287
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Arvind Prabhakar
> Attachments: HIVE-287-1.patch, HIVE-287-2.patch, HIVE-287-3.patch, 
> HIVE-287-4.patch, HIVE-287-5-branch-0.6.patch, HIVE-287-5-trunk.patch
>
>
> The following query does not work:
> select count(distinct col1, col2) from Tbl


[jira] Commented: (HIVE-287) count distinct on multiple columns does not work

2010-07-08 Thread Arvind Prabhakar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886339#action_12886339
 ] 

Arvind Prabhakar commented on HIVE-287:
---

@Zheng: Welcome to the party.

bq. Why do we put the DISTINCT in the information? DISTINCT is currently done 
by the framework, instead of individual UDAF. This is good because the logic of 
removing duplicates are common for all UDAFs. We do support SUM(DISTINCT val).

Providing the information in the parameter specification is not the same as 
enforcing its interpretation. It is provided primarily so that UDAFs that rely 
on this information can make appropriate decisions. For example, we wanted to 
disallow the invocation {{COUNT(EXPR1, EXPR2 ...)}} in favor of 
{{COUNT(*DISTINCT* EXPR1, EXPR2 ...)}}. Without this information, the count 
UDAF would not be able to enforce the latter syntax.
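To illustrate how a count resolver could use the DISTINCT flag, here is a minimal Python sketch of the validation rules discussed. It accepts `COUNT(*)`, `COUNT(expr)` and `COUNT(DISTINCT expr1, expr2, ...)`, and rejects `COUNT(expr1, expr2)` without DISTINCT. The function name and the error type are hypothetical, and the sketch checks only the shape of the call.

```python
class UDAFArgumentError(ValueError):
    """Hypothetical error raised for an invalid COUNT invocation."""


def validate_count(num_args: int, is_distinct: bool, is_all_columns: bool) -> None:
    """Raise UDAFArgumentError unless the COUNT call has a supported shape."""
    if is_all_columns:
        # COUNT(*): no explicit arguments, and DISTINCT cannot be combined with *.
        if num_args != 0 or is_distinct:
            raise UDAFArgumentError("COUNT(*) takes no other arguments")
        return
    if num_args == 0:
        raise UDAFArgumentError("COUNT requires * or at least one expression")
    if num_args > 1 and not is_distinct:
        # Without the DISTINCT flag this case could not be told apart from
        # COUNT(DISTINCT expr1, expr2, ...), so it could not be rejected.
        raise UDAFArgumentError("multiple arguments require DISTINCT")


validate_count(2, is_distinct=True, is_all_columns=False)  # accepted, no error
```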

bq. Why do we special-case "\*"? It seems to me that "\*" is just a short-cut. Hive 
already supports regex-based multi-column specification, so that we can say 
`abc.*` for all columns with name starting with abc. The compiler should just 
expand * and give all the columns to the UDAF.

If you wish to use \* as a regular expression, you would have to quote it as a 
string: {{COUNT('\*')}}. This is different from the invocation as specified in 
SQL, which treats \* as a terminal symbol. So if it is OK to deviate from the 
standard representation, the user can easily use the quoted string 
representation to achieve an effect similar to {{COUNT(col1, col2 ..)}}. The 
semantics of this should be more like {{COUNT(DISTINCT EXPR1, EXPR2 ...)}} as 
opposed to {{COUNT(\*)}}.
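The regex-based column specification mentioned in the quote can be illustrated with a short sketch that expands a pattern such as `abc.*` against a table's column names. The matching rules used here are assumptions: the whole column name must match, and case is ignored. The helper is hypothetical and is not Hive's implementation.

```python
import re
from typing import List, Sequence


def expand_column_regex(pattern: str, columns: Sequence[str]) -> List[str]:
    """Return the columns whose whole name matches `pattern`, in table order.

    Assumption: full-name, case-insensitive matching.
    """
    compiled = re.compile(pattern, re.IGNORECASE)
    return [c for c in columns if compiled.fullmatch(c)]


cols = ["abc1", "abc_total", "xabc", "other"]
print(expand_column_regex("abc.*", cols))  # ['abc1', 'abc_total']
```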

bq. Since COUNT(\*) is a special-case in the SQL standard (COUNT(\*) is 
different from COUNT(col) even if the table has a single column col), I think 
we should just special-case that and replace that with count(1) at some place.

Are you suggesting that we allow the grammar to express {{COUNT(\*)}} syntax, 
but in the lexical analysis stage turn it into a {{COUNT(1)}}? I can see how 
that may work - but personally I am not a fan of such an approach. 
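The distinction in the quote, and why rewriting `COUNT(*)` as `COUNT(1)` would preserve its meaning, can be shown with a short Python sketch. It uses standard SQL NULL semantics, with NULL modelled as `None`. `COUNT(expr)` counts the rows where `expr` is not NULL. A constant non-NULL expression such as `1` therefore counts every row, exactly like `COUNT(*)`, while `COUNT(col)` skips NULLs even on a single-column table. The helper names are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]


def count_expr(rows: Iterable[Row], expr: Callable[[Row], Any]) -> int:
    """COUNT(expr): number of rows for which expr evaluates to non-NULL."""
    return sum(1 for row in rows if expr(row) is not None)


def count_star(rows: Iterable[Row]) -> int:
    """COUNT(*): number of rows, regardless of NULLs."""
    return sum(1 for _ in rows)


# A single-column table in which one value is NULL.
table: List[Row] = [{"col": 1}, {"col": None}, {"col": 3}]

print(count_star(table))                      # 3
print(count_expr(table, lambda r: 1))         # 3, so COUNT(1) == COUNT(*)
print(count_expr(table, lambda r: r["col"]))  # 2, COUNT(col) skips the NULL
```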

> count distinct on multiple columns does not work
> 
>
> Key: HIVE-287
> URL: https://issues.apache.org/jira/browse/HIVE-287
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Arvind Prabhakar
> Attachments: HIVE-287-1.patch, HIVE-287-2.patch, HIVE-287-3.patch, 
> HIVE-287-4.patch, HIVE-287-5-branch-0.6.patch, HIVE-287-5-trunk.patch
>
>
> The following query does not work:
> select count(distinct col1, col2) from Tbl


[jira] Commented: (HIVE-1453) Build configuration changes introduced regression in launch configurations

2010-07-07 Thread Arvind Prabhakar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886131#action_12886131
 ] 

Arvind Prabhakar commented on HIVE-1453:


Review posted:  http://review.hbase.org/r/280/

> Build configuration changes introduced regression in launch configurations
> --
>
> Key: HIVE-1453
> URL: https://issues.apache.org/jira/browse/HIVE-1453
> Project: Hadoop Hive
>  Issue Type: Bug
> Environment: All Eclipse environments
>Reporter: Arvind Prabhakar
>Assignee: Arvind Prabhakar
> Attachments: HIVE-1453.patch
>
>
> The changes to prepare for branching out 0.6.0 required [changes to build 
> configuration|http://svn.apache.org/viewvc/hadoop/hive/trunk/build.properties?r1=952877&r2=956430]
>  which caused the launch configurations to break as the jars they referred to 
> were renamed automatically. As a result, none of the launch configurations 
> are working at this point.


[jira] Updated: (HIVE-1453) Build configuration changes introduced regression in launch configurations

2010-07-07 Thread Arvind Prabhakar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvind Prabhakar updated HIVE-1453:
---

Status: Patch Available  (was: Open)

Modified the launch configurations to use a parameterized version suffix for 
jars instead of hardcoding a specific jar version.
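The fix amounts to replacing a hardcoded version in each jar reference with a placeholder that is filled in from the build's version. Below is a minimal Python sketch using `string.Template`. The `${version}` placeholder, the jar path and the version strings are hypothetical examples and are not the actual contents of Hive's Eclipse launch configurations.

```python
from string import Template


def resolve_jar_path(template: str, version: str) -> str:
    """Fill the version placeholder in a jar reference.

    Template.substitute raises KeyError if the template contains any
    placeholder other than `version`, so a mistyped name fails loudly.
    """
    return Template(template).substitute(version=version)


# Hypothetical launch-configuration entry with a parameterized suffix.
entry = "build/ql/hive-exec-${version}.jar"
print(resolve_jar_path(entry, "0.7.0"))  # build/ql/hive-exec-0.7.0.jar
```

When the version changes, for example after a branch, only the value passed in changes and the template stays the same.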

> Build configuration changes introduced regression in launch configurations
> --
>
> Key: HIVE-1453
> URL: https://issues.apache.org/jira/browse/HIVE-1453
> Project: Hadoop Hive
>  Issue Type: Bug
> Environment: All Eclipse environments
>Reporter: Arvind Prabhakar
>Assignee: Arvind Prabhakar
> Attachments: HIVE-1453.patch
>
>
> The changes to prepare for branching out 0.6.0 required [changes to build 
> configuration|http://svn.apache.org/viewvc/hadoop/hive/trunk/build.properties?r1=952877&r2=956430]
>  which caused the launch configurations to break as the jars they referred to 
> were renamed automatically. As a result, none of the launch configurations 
> are working at this point.


[jira] Updated: (HIVE-1453) Build configuration changes introduced regression in launch configurations

2010-07-07 Thread Arvind Prabhakar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvind Prabhakar updated HIVE-1453:
---

Attachment: HIVE-1453.patch

> Build configuration changes introduced regression in launch configurations
> --
>
> Key: HIVE-1453
> URL: https://issues.apache.org/jira/browse/HIVE-1453
> Project: Hadoop Hive
>  Issue Type: Bug
> Environment: All Eclipse environments
>Reporter: Arvind Prabhakar
>Assignee: Arvind Prabhakar
> Attachments: HIVE-1453.patch
>
>
> The changes to prepare for branching out 0.6.0 required [changes to build 
> configuration|http://svn.apache.org/viewvc/hadoop/hive/trunk/build.properties?r1=952877&r2=956430]
>  which caused the launch configurations to break as the jars they referred to 
> were renamed automatically. As a result, none of the launch configurations 
> are working at this point.


[jira] Created: (HIVE-1453) Build configuration changes introduced regression in launch configurations

2010-07-07 Thread Arvind Prabhakar (JIRA)

Build configuration changes introduced regression in launch configurations
--

 Key: HIVE-1453
 URL: https://issues.apache.org/jira/browse/HIVE-1453
 Project: Hadoop Hive
  Issue Type: Bug
 Environment: All Eclipse environments
Reporter: Arvind Prabhakar
Assignee: Arvind Prabhakar


The changes made in preparation for branching 0.6.0 required [changes to build 
configuration|http://svn.apache.org/viewvc/hadoop/hive/trunk/build.properties?r1=952877&r2=956430]
 which broke the launch configurations, because the jars they referred to were 
renamed automatically. As a result, none of the launch configurations currently 
work.


[jira] Commented: (HIVE-1432) Create a test case for case sensitive comparison done during field comparison

2010-07-06 Thread Arvind Prabhakar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885773#action_12885773
 ] 

Arvind Prabhakar commented on HIVE-1432:


Review posted:

http://review.hbase.org/r/276/

> Create a test case for case sensitive comparison done during field comparison
> -
>
> Key: HIVE-1432
> URL: https://issues.apache.org/jira/browse/HIVE-1432
> Project: Hadoop Hive
>  Issue Type: Task
>  Components: Query Processor
>Reporter: Arvind Prabhakar
>Assignee: Arvind Prabhakar
> Fix For: 0.7.0
>
> Attachments: HIVE-1432.patch
>
>
> See HIVE-1271. This jira tracks the creation of a test case to test this fix 
> specifically.


[jira] Created: (HIVE-1451) Creating a table stores the full address of namenode in the metadata. This leads to problems when the namenode address changes.

2010-07-06 Thread Arvind Prabhakar (JIRA)

Creating a table stores the full address of namenode in the metadata. This 
leads to problems when the namenode address changes.
---

 Key: HIVE-1451
 URL: https://issues.apache.org/jira/browse/HIVE-1451
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Metastore, Query Processor
Affects Versions: 0.5.0
 Environment: Any
Reporter: Arvind Prabhakar
Assignee: Arvind Prabhakar


Here is an excerpt from table metadata for an arbitrary table {{table1}}:

{noformat}
hive> describe extended table1;
OK
...
Detailed Table Information  ...
location:hdfs://localhost:9000/user/arvind/hive/warehouse/table1, 
...
{noformat}

As can be seen, the full address of the namenode is captured in the table's 
location information. This information is later used to run any query on the 
table, which makes it impossible to change the namenode location once the table 
has been created. For example, a query on the above table will fail if the 
namenode is migrated from port 9000 to port 8020:

{noformat}
hive> select * from table1;
OK
Failed with exception java.io.IOException:java.net.ConnectException: Call to localhost/127.0.0.1:9000 failed on connection exception: java.net.ConnectException: Connection refused
Time taken: 10.78 seconds
hive> 
{noformat}

It should be possible to change the namenode location regardless of when the 
tables were created. Query execution should use the namenode configured at the 
time the query runs, rather than requiring the configuration to be exactly the 
same as when the tables were created.
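One way to meet this requirement is to store only the path component of a table location and qualify it with the currently configured default filesystem when a query runs. In Hadoop configurations of this era, that filesystem is set by the `fs.default.name` property. The Python sketch below illustrates this approach under those assumptions. It is not necessarily how this issue was resolved, and the helper names are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_authority(location: str) -> str:
    """Drop the scheme and authority (host:port), keeping only the path."""
    return urlsplit(location).path


def qualify(path: str, default_fs: str) -> str:
    """Qualify a bare path with the scheme and authority of the configured filesystem."""
    fs = urlsplit(default_fs)
    return urlunsplit((fs.scheme, fs.netloc, path, "", ""))


stored = strip_authority("hdfs://localhost:9000/user/arvind/hive/warehouse/table1")
print(stored)                                    # /user/arvind/hive/warehouse/table1
print(qualify(stored, "hdfs://localhost:8020"))  # hdfs://localhost:8020/user/arvind/hive/warehouse/table1
```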


[jira] Commented: (HIVE-287) count distinct on multiple columns does not work

2010-07-06 Thread Arvind Prabhakar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885766#action_12885766
 ] 

Arvind Prabhakar commented on HIVE-287:
---

Review board review posted:

http://review.hbase.org/r/275/


> count distinct on multiple columns does not work
> 
>
> Key: HIVE-287
> URL: https://issues.apache.org/jira/browse/HIVE-287
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Arvind Prabhakar
> Attachments: HIVE-287-1.patch, HIVE-287-2.patch, HIVE-287-3.patch, 
> HIVE-287-4.patch, HIVE-287-5-branch-0.6.patch, HIVE-287-5-trunk.patch
>
>
> The following query does not work:
> select count(distinct col1, col2) from Tbl


[jira] Updated: (HIVE-287) count distinct on multiple columns does not work

2010-07-06 Thread Arvind Prabhakar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvind Prabhakar updated HIVE-287:
--

Status: Patch Available  (was: Open)

Uploaded patch for trunk and branch-0.6. Ran all the tests on trunk and did 
spot testing on branch-0.6.

*Changes from Previous patch:*
* Modified the implementation of {{AbstractGenericUDAFResolver}} to raise an 
exception when invoked with the {{UDAF(STAR)}} syntax.
* Added negative test cases to assert that the current UDAFs present in the 
code other than {{COUNT}} do not accept the {{UDAF(STAR)}} syntax.
* Added {{EXPLAIN}} directives for the queries run in {{udf_count.q}} test file.
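The first two changes can be sketched as follows. A default resolver rejects the `UDAF(STAR)` form, and only a resolver that explicitly overrides that behavior, like `COUNT`, accepts it. This is a hypothetical Python analogue of the described behavior, not the actual `AbstractGenericUDAFResolver` code, and the class and method names are made up for illustration.

```python
class InvalidInvocation(Exception):
    """Hypothetical error for an unsupported UDAF invocation syntax."""


class DefaultResolver:
    """Default behavior: refuse the UDAF(*) syntax."""

    def resolve(self, arg_types, is_all_columns=False):
        if is_all_columns:
            raise InvalidInvocation(f"{type(self).__name__} does not support UDAF(*)")
        return self.resolve_types(arg_types)

    def resolve_types(self, arg_types):
        return "evaluator(" + ", ".join(arg_types) + ")"


class CountResolver(DefaultResolver):
    """COUNT opts in to UDAF(*) by overriding resolve()."""

    def resolve(self, arg_types, is_all_columns=False):
        if is_all_columns:
            return "count-all-rows"
        return super().resolve(arg_types)


class SumResolver(DefaultResolver):
    pass  # inherits the default rejection of SUM(*)


print(CountResolver().resolve([], is_all_columns=True))  # count-all-rows

# Negative case, in the spirit of the new tests: SUM(*) must be rejected.
try:
    SumResolver().resolve([], is_all_columns=True)
except InvalidInvocation as e:
    print(e)  # SumResolver does not support UDAF(*)
```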

Will attempt to post the patch on review board as well.


> count distinct on multiple columns does not work
> 
>
> Key: HIVE-287
> URL: https://issues.apache.org/jira/browse/HIVE-287
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Arvind Prabhakar
> Attachments: HIVE-287-1.patch, HIVE-287-2.patch, HIVE-287-3.patch, 
> HIVE-287-4.patch, HIVE-287-5-branch-0.6.patch, HIVE-287-5-trunk.patch
>
>
> The following query does not work:
> select count(distinct col1, col2) from Tbl


[jira] Updated: (HIVE-287) count distinct on multiple columns does not work

2010-07-06 Thread Arvind Prabhakar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvind Prabhakar updated HIVE-287:
--

Attachment: HIVE-287-5-trunk.patch
HIVE-287-5-branch-0.6.patch

> count distinct on multiple columns does not work
> 
>
> Key: HIVE-287
> URL: https://issues.apache.org/jira/browse/HIVE-287
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Arvind Prabhakar
> Attachments: HIVE-287-1.patch, HIVE-287-2.patch, HIVE-287-3.patch, 
> HIVE-287-4.patch, HIVE-287-5-branch-0.6.patch, HIVE-287-5-trunk.patch
>
>
> The following query does not work:
> select count(distinct col1, col2) from Tbl


[jira] Commented: (HIVE-1432) Create a test case for case sensitive comparison done during field comparison

2010-07-02 Thread Arvind Prabhakar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884875#action_12884875
 ] 

Arvind Prabhakar commented on HIVE-1432:


Adding a test case to exercise the fix for HIVE-1271. This test was run 
manually on a trunk revision prior to the commit of HIVE-1271 and produced the 
following error:

{noformat}
2010-07-02 17:13:05,085 DEBUG lazy.LazySimpleSerDe (LazySimpleSerDe.java:initialize(212)) - LazySimpleSerDe initialized with: columnNames=[info] columnTypes=[struct] separator=...@dbf2988] nullstring=\N lastColumnTakesRest=false
2010-07-02 17:13:05,089 ERROR ql.Driver (SessionState.java:printError(277)) - FAILED: Error in semantic analysis: line 4:23 Cannot insert into target table because column number/types are different table2: Cannot convert column 0 from struct to struct.
org.apache.hadoop.hive.ql.parse.SemanticException: line 4:23 Cannot insert into target table because column number/types are different table2: Cannot convert column 0 from struct to struct.
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genConversionSelectOperator(SemanticAnalyzer.java:3573)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:3434)
{noformat}

This test passes on the current trunk.
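The behavior this test exercises is the fix for HIVE-1271, which compares field names case-insensitively. It can be sketched as a structural type comparison in which struct field names are compared after lowercasing, while field types and field order must still match. This is a minimal Python sketch. The tuple-based type representation is hypothetical, and the field name simply mirrors the `userId`/`userid` case from HIVE-1271.

```python
from typing import Any

# Hypothetical type model:
#   "int", "string", ...               primitive types
#   ("array", elem_type)               array<elem_type>
#   ("struct", [(name, type), ...])    struct<name:type, ...>
Type = Any


def same_type(a: Type, b: Type) -> bool:
    """Structural equality with case-insensitive struct field names."""
    if isinstance(a, str) or isinstance(b, str):
        return isinstance(a, str) and isinstance(b, str) and a.lower() == b.lower()
    if a[0] != b[0]:
        return False
    if a[0] == "array":
        return same_type(a[1], b[1])
    if a[0] == "struct":
        fa, fb = a[1], b[1]
        return len(fa) == len(fb) and all(
            na.lower() == nb.lower() and same_type(ta, tb)
            for (na, ta), (nb, tb) in zip(fa, fb)
        )
    raise ValueError(f"unknown type constructor: {a[0]!r}")


target = ("array", ("struct", [("userid", "int")]))         # as stored (lower case)
script_output = ("array", ("struct", [("userId", "int")]))  # as declared in the query
print(same_type(script_output, target))  # True
```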

> Create a test case for case sensitive comparison done during field comparison
> -
>
> Key: HIVE-1432
> URL: https://issues.apache.org/jira/browse/HIVE-1432
> Project: Hadoop Hive
>  Issue Type: Task
>  Components: Query Processor
>Reporter: Arvind Prabhakar
>Assignee: Arvind Prabhakar
> Fix For: 0.6.0
>
> Attachments: HIVE-1432.patch
>
>
> See HIVE-1271. This jira tracks the creation of a test case to test this fix 
> specifically.


[jira] Updated: (HIVE-1432) Create a test case for case sensitive comparison done during field comparison

2010-07-02 Thread Arvind Prabhakar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-1432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvind Prabhakar updated HIVE-1432:
---

Status: Patch Available  (was: Open)

Patch available. Please review.

> Create a test case for case sensitive comparison done during field comparison
> -
>
> Key: HIVE-1432
> URL: https://issues.apache.org/jira/browse/HIVE-1432
> Project: Hadoop Hive
>  Issue Type: Task
>  Components: Query Processor
>Reporter: Arvind Prabhakar
>Assignee: Arvind Prabhakar
> Fix For: 0.6.0
>
> Attachments: HIVE-1432.patch
>
>
> See HIVE-1271. This jira tracks the creation of a test case to test this fix 
> specifically.


[jira] Updated: (HIVE-1432) Create a test case for case sensitive comparison done during field comparison

2010-07-02 Thread Arvind Prabhakar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-1432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvind Prabhakar updated HIVE-1432:
---

Attachment: HIVE-1432.patch

> Create a test case for case sensitive comparison done during field comparison
> -
>
> Key: HIVE-1432
> URL: https://issues.apache.org/jira/browse/HIVE-1432
> Project: Hadoop Hive
>  Issue Type: Task
>  Components: Query Processor
>Reporter: Arvind Prabhakar
>Assignee: Arvind Prabhakar
> Fix For: 0.6.0
>
> Attachments: HIVE-1432.patch
>
>
> See HIVE-1271. This jira tracks the creation of a test case to test this fix 
> specifically.


[jira] Commented: (HIVE-287) count distinct on multiple columns does not work

2010-07-02 Thread Arvind Prabhakar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884580#action_12884580
 ] 

Arvind Prabhakar commented on HIVE-287:
---

Thanks for the explanation, John. The SQL BNF that you pointed out is the 
normative SQL specification. I do not think any SQL implementation uses this 
grammar directly, though. The relationship is like that of an interface and its 
implementation: while the interface can be short and precise, an implementation 
may choose to delegate interface methods to other, implementation-specific 
methods. Similarly, most databases use their own SQL grammar that is compliant 
with the SQL standard at specific levels.

More to the point in Hive: my key concern is that by modifying the grammar to 
make an exception for {{COUNT}}, we will be introducing a brittle coupling 
between the parser and the semantic analyzer. Right now the count aggregate 
function is treated like any other function and is thus part of the general 
framework. By making this change, we would be tying it specifically to grammar 
directives.

This is the current function definition in Hive QL grammar (*A*):

{noformat}
-->[ functionName ]-->[ LPAREN ]--+-->[ KW_DISTINCT ]--+--+--+-->[ expression 
]--+--+-->[ RPAREN ]-->
  ||  |  |  
 |  |
  +--->+  |  +--[ COMMA 
]<---+  |
  | 
|
  
+->---+
{noformat}

The patch that I have supplied already on this Jira modifies this definition as 
follows (*B*):

{noformat}
-->[ functionName ]-->[ LPAREN ]--+------------------------>[ STAR ]----------------------+-->[ RPAREN ]-->
                                  |                                                       |
                                  +--+-->[ KW_DISTINCT ]--+--+--+-->[ expression ]--+--+--+
                                     |                    |  |  |                   |  |
                                     +------------------->+  |  +<--[ COMMA ]<------+  |
                                                             |                         |
                                                             +------------------------>+
{noformat}
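Grammar *B* can be read as the production {{functionName LPAREN ( STAR | [KW_DISTINCT] [expression (COMMA expression)*] ) RPAREN}}. The toy Python recognizer below applies that production to a pre-tokenized call, which makes the accepted and rejected shapes concrete. It is an illustrative sketch, not Hive's ANTLR grammar. Each expression is simplified to a single token, and `FunctionCall` and `ParseError` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


class ParseError(ValueError):
    """Hypothetical error for input that does not match the production."""


@dataclass
class FunctionCall:
    name: str
    star: bool = False
    distinct: bool = False
    args: List[str] = field(default_factory=list)


def parse_call(tokens: List[str]) -> FunctionCall:
    """Recognize: name '(' ( '*' | ['DISTINCT'] [expr (',' expr)*] ) ')'."""
    if len(tokens) < 3 or tokens[1] != "(" or tokens[-1] != ")":
        raise ParseError("expected name '(' ... ')'")
    call = FunctionCall(tokens[0])
    inner = tokens[2:-1]
    if inner == ["*"]:
        call.star = True  # STAR branch: nothing else may appear
        return call
    if inner and inner[0].upper() == "DISTINCT":
        call.distinct = True
        inner = inner[1:]
    if "*" in inner:
        # No production matches name(DISTINCT *) or name(*, expr).
        raise ParseError("'*' must be the only argument")
    # expr (',' expr)*: expressions at even positions, commas at odd ones.
    for i, tok in enumerate(inner):
        if (tok == ",") != (i % 2 == 1):
            raise ParseError(f"unexpected token {tok!r}")
    if inner and inner[-1] == ",":
        raise ParseError("trailing comma")
    call.args = inner[0::2]
    return call


print(parse_call(["count", "(", "DISTINCT", "col1", ",", "col2", ")"]))
# FunctionCall(name='count', star=False, distinct=True, args=['col1', 'col2'])
```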

If I were to modify the grammar to make an exception for {{COUNT}}, it would 
likely look something like this (*C*):

{noformat}
--+-->[ KW_COUNT ]-->[ LPAREN ]-->[ STAR ]-->[ RPAREN ]------------------------------------------------+-->
  |                                                                                                    |
  +-->[ functionName ]-->[ LPAREN ]--+--+-->[ KW_DISTINCT ]--+--+-->[ expression ]--+--+-->[ RPAREN ]--+
                                     |  |                    |  |                   |  |
                                     |  +------------------->+  +<--[ COMMA ]<------+  |
                                     |                                                 |
                                     +------------------------------------------------>+
{noformat}

Consider the *C* approach closely: a {{COUNT}} invocation can be matched 
directly by the top branch using the {{KW_COUNT}} token, or it could follow the 
lower branch, where {{functionName}} could match {{COUNT}}. On the semantic 
analyzer side, this makes the matching logic more complex and less intuitive, 
since {{COUNT}} can now be invoked via two branch conditions. For example, one 
invocation would delegate directly to the {{COUNT}} aggregate function, whereas 
the other would use the current resolver mechanism to invoke it.

Instead, approach *B* keeps the grammar consistent with regular function 
invocation. It does not favor any one function over another and simply 
establishes matching rules for the function production. The call is then 
delegated to the semantic analyzer, which in turn matches the appropriate 
handling function based on the name and parameter types using the generic 
resolver mechanism, without regard to which function is being invoked.

The changes supplied in the current patch also allow individual function 
handlers to decide whether they would like to support the {{functionName(STAR)}} 
syntax. Since you feel strongly about not supporting this syntax by default for 
any function, I can modify the {{AbstractGenericUDAFResolver}} class to raise an 
exception if invoked with this syntax. That way, only the functions that choose 
to override that behavior will be able to support it. Also, as you can see in 
the syntax diagram for (B), there is no production that will match 
things like {{functionName(DISTINCT STAR)}} or {{functionName(STAR, EX

[jira] Commented: (HIVE-287) count distinct on multiple columns does not work

2010-06-30 Thread Arvind Prabhakar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884125#action_12884125
 ] 

Arvind Prabhakar commented on HIVE-287:
---

@John - are you suggesting that the grammar be updated to restrict the single 
star argument to the specific function {{COUNT}}? If not in the grammar, where 
else do you think these restrictions should be coded?

In either case, what other subsystems do you think will be impacted by this 
change, and what downstream changes do you suggest to accommodate this?

p.s. I ask these questions to make the best use of both our time and to reduce 
the number of back-and-forths as much as possible.

> count distinct on multiple columns does not work
> 
>
> Key: HIVE-287
> URL: https://issues.apache.org/jira/browse/HIVE-287
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Arvind Prabhakar
> Attachments: HIVE-287-1.patch, HIVE-287-2.patch, HIVE-287-3.patch, 
> HIVE-287-4.patch
>
>
> The following query does not work:
> select count(distinct col1, col2) from Tbl


[jira] Commented: (HIVE-1271) Case sensitiveness of type information specified when using custom reducer causes type mismatch

2010-06-28 Thread Arvind Prabhakar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883214#action_12883214
 ] 

Arvind Prabhakar commented on HIVE-1271:


@Zheng: Please see the second comment. This patch uses the C2 method: comparing 
field names in a case-insensitive manner.

> Case sensitiveness of type information specified when using custom reducer 
> causes type mismatch
> ---
>
> Key: HIVE-1271
> URL: https://issues.apache.org/jira/browse/HIVE-1271
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.5.0
>Reporter: Dilip Joseph
>Assignee: Arvind Prabhakar
> Fix For: 0.6.0
>
> Attachments: HIVE-1271-1.patch, HIVE-1271.patch
>
>
> Type information specified  while using a custom reduce script is converted 
> to lower case, and causes type mismatch during query semantic analysis .  The 
> following REDUCE query where field name =  "userId" failed.
> hive> CREATE TABLE SS (
>> a INT,
>> b INT,
>> vals ARRAY>
>> );
> OK
> hive> FROM (select * from srcTable DISTRIBUTE BY id SORT BY id) s
>> INSERT OVERWRITE TABLE SS
>> REDUCE *
>> USING 'myreduce.py'
>> AS
>> (a INT,
>> b INT,
>> vals ARRAY>
>> )
>> ;
> FAILED: Error in semantic analysis: line 2:27 Cannot insert into
> target table because column number/types are different SS: Cannot
> convert column 2 from array> to
> array>.
> The same query worked fine after changing "userId" to "userid".


[jira] Commented: (HIVE-1271) Case sensitiveness of type information specified when using custom reducer causes type mismatch

2010-06-23 Thread Arvind Prabhakar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882048#action_12882048
 ] 

Arvind Prabhakar commented on HIVE-1271:


@Ashish: I created HIVE-1432 to track the test case creation. I will be 
submitting a patch for that soon. Thanks for pointing this out.

> Case sensitiveness of type information specified when using custom reducer 
> causes type mismatch
> ---
>
> Key: HIVE-1271
> URL: https://issues.apache.org/jira/browse/HIVE-1271
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.5.0
>Reporter: Dilip Joseph
>Assignee: Arvind Prabhakar
> Fix For: 0.6.0
>
> Attachments: HIVE-1271-1.patch, HIVE-1271.patch
>
>
> Type information specified while using a custom reduce script is converted 
> to lower case, which causes a type mismatch during query semantic analysis. 
> The following REDUCE query, where the field name is "userId", failed.
> hive> CREATE TABLE SS (
>> a INT,
>> b INT,
>> vals ARRAY>
>> );
> OK
> hive> FROM (select * from srcTable DISTRIBUTE BY id SORT BY id) s
>> INSERT OVERWRITE TABLE SS
>> REDUCE *
>> USING 'myreduce.py'
>> AS
>> (a INT,
>> b INT,
>> vals ARRAY>
>> )
>> ;
> FAILED: Error in semantic analysis: line 2:27 Cannot insert into
> target table because column number/types are different SS: Cannot
> convert column 2 from array> to
> array>.
> The same query worked fine after changing "userId" to "userid".


[jira] Created: (HIVE-1432) Create a test case for case sensitive comparison done during field comparison

2010-06-23 Thread Arvind Prabhakar (JIRA)

Create a test case for case sensitive comparison done during field comparison
-

 Key: HIVE-1432
 URL: https://issues.apache.org/jira/browse/HIVE-1432
 Project: Hadoop Hive
  Issue Type: Task
  Components: Query Processor
Reporter: Arvind Prabhakar
Assignee: Arvind Prabhakar
 Fix For: 0.6.0


See HIVE-1271. This jira tracks the creation of a test case to test this fix 
specifically.


[jira] Updated: (HIVE-1176) 'create if not exists' fails for a table name with 'select' in it

2010-06-23 Thread Arvind Prabhakar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvind Prabhakar updated HIVE-1176:
---

Attachment: HIVE-1176-6.patch

> 'create if not exists' fails for a table name with 'select' in it
> -
>
> Key: HIVE-1176
> URL: https://issues.apache.org/jira/browse/HIVE-1176
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore, Query Processor
>Reporter: Prasad Chakka
>Assignee: Arvind Prabhakar
> Fix For: 0.6.0
>
> Attachments: HIVE-1176-1.patch, HIVE-1176-2.patch, HIVE-1176-3.patch, 
> HIVE-1176-4.patch, HIVE-1176-5.patch, HIVE-1176-6.patch, 
> HIVE-1176.lib-files.tar.gz, HIVE-1176.patch
>
>
> hive> create table if not exists tmp_select(s string, c string, n int);
> org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got 
> exception: javax.jdo.JDOUserException JDOQL Single-String query should always 
> start with SELECT)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:441)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:423)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:5538)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5192)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:105)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:275)
> at org.apache.hadoop.hive.ql.Driver.runCommand(Driver.java:320)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:312)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> Caused by: MetaException(message:Got exception: javax.jdo.JDOUserException 
> JDOQL Single-String query should always start with SELECT)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.logAndThrowMetaException(MetaStoreUtils.java:612)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTables(HiveMetaStoreClient.java:450)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:439)
> ... 15 more


[jira] Commented: (HIVE-1176) 'create if not exists' fails for a table name with 'select' in it

2010-06-23 Thread Arvind Prabhakar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881956#action_12881956
 ] 

Arvind Prabhakar commented on HIVE-1176:


Yes, that was my intention. Thanks for catching it.


> 'create if not exists' fails for a table name with 'select' in it
> -
>
> Key: HIVE-1176
> URL: https://issues.apache.org/jira/browse/HIVE-1176
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore, Query Processor
>Reporter: Prasad Chakka
>Assignee: Arvind Prabhakar
> Fix For: 0.6.0
>
> Attachments: HIVE-1176-1.patch, HIVE-1176-2.patch, HIVE-1176-3.patch, 
> HIVE-1176-4.patch, HIVE-1176-5.patch, HIVE-1176-6.patch, 
> HIVE-1176.lib-files.tar.gz, HIVE-1176.patch
>
>
> hive> create table if not exists tmp_select(s string, c string, n int);
> org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got 
> exception: javax.jdo.JDOUserException JDOQL Single-String query should always 
> start with SELECT)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:441)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:423)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:5538)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5192)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:105)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:275)
> at org.apache.hadoop.hive.ql.Driver.runCommand(Driver.java:320)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:312)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> Caused by: MetaException(message:Got exception: javax.jdo.JDOUserException 
> JDOQL Single-String query should always start with SELECT)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.logAndThrowMetaException(MetaStoreUtils.java:612)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTables(HiveMetaStoreClient.java:450)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:439)
> ... 15 more


[jira] Commented: (HIVE-1176) 'create if not exists' fails for a table name with 'select' in it

2010-06-23 Thread Arvind Prabhakar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881946#action_12881946
 ] 

Arvind Prabhakar commented on HIVE-1176:


@John: Done. Please see the new patch attachment, HIVE-1176-5.patch.

Since a lot of good points came out of the discussion on this jira, I took the 
liberty of adding them to the Hive wiki for posterity. You can find it 
[here|http://wiki.apache.org/hadoop/Hive/TipsForAddingNewTests]. Please add to 
it any other points that you feel contributors should take into consideration 
while adding new tests.

> 'create if not exists' fails for a table name with 'select' in it
> -
>
> Key: HIVE-1176
> URL: https://issues.apache.org/jira/browse/HIVE-1176
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore, Query Processor
>Reporter: Prasad Chakka
>Assignee: Arvind Prabhakar
> Fix For: 0.6.0
>
> Attachments: HIVE-1176-1.patch, HIVE-1176-2.patch, HIVE-1176-3.patch, 
> HIVE-1176-4.patch, HIVE-1176-5.patch, HIVE-1176.lib-files.tar.gz, 
> HIVE-1176.patch
>
>
> hive> create table if not exists tmp_select(s string, c string, n int);
> org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got 
> exception: javax.jdo.JDOUserException JDOQL Single-String query should always 
> start with SELECT)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:441)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:423)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:5538)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5192)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:105)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:275)
> at org.apache.hadoop.hive.ql.Driver.runCommand(Driver.java:320)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:312)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> Caused by: MetaException(message:Got exception: javax.jdo.JDOUserException 
> JDOQL Single-String query should always start with SELECT)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.logAndThrowMetaException(MetaStoreUtils.java:612)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTables(HiveMetaStoreClient.java:450)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:439)
> ... 15 more


[jira] Updated: (HIVE-1176) 'create if not exists' fails for a table name with 'select' in it

2010-06-23 Thread Arvind Prabhakar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvind Prabhakar updated HIVE-1176:
---

Attachment: HIVE-1176-5.patch

> 'create if not exists' fails for a table name with 'select' in it
> -
>
> Key: HIVE-1176
> URL: https://issues.apache.org/jira/browse/HIVE-1176
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore, Query Processor
>Reporter: Prasad Chakka
>Assignee: Arvind Prabhakar
> Fix For: 0.6.0
>
> Attachments: HIVE-1176-1.patch, HIVE-1176-2.patch, HIVE-1176-3.patch, 
> HIVE-1176-4.patch, HIVE-1176-5.patch, HIVE-1176.lib-files.tar.gz, 
> HIVE-1176.patch
>
>
> hive> create table if not exists tmp_select(s string, c string, n int);
> org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got 
> exception: javax.jdo.JDOUserException JDOQL Single-String query should always 
> start with SELECT)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:441)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:423)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:5538)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5192)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:105)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:275)
> at org.apache.hadoop.hive.ql.Driver.runCommand(Driver.java:320)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:312)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> Caused by: MetaException(message:Got exception: javax.jdo.JDOUserException 
> JDOQL Single-String query should always start with SELECT)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.logAndThrowMetaException(MetaStoreUtils.java:612)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTables(HiveMetaStoreClient.java:450)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:439)
> ... 15 more


[jira] Updated: (HIVE-1176) 'create if not exists' fails for a table name with 'select' in it

2010-06-22 Thread Arvind Prabhakar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvind Prabhakar updated HIVE-1176:
---

Attachment: HIVE-1176-4.patch

> 'create if not exists' fails for a table name with 'select' in it
> -
>
> Key: HIVE-1176
> URL: https://issues.apache.org/jira/browse/HIVE-1176
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore, Query Processor
>Reporter: Prasad Chakka
>Assignee: Arvind Prabhakar
> Fix For: 0.6.0
>
> Attachments: HIVE-1176-1.patch, HIVE-1176-2.patch, HIVE-1176-3.patch, 
> HIVE-1176-4.patch, HIVE-1176.lib-files.tar.gz, HIVE-1176.patch
>
>
> hive> create table if not exists tmp_select(s string, c string, n int);
> org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got 
> exception: javax.jdo.JDOUserException JDOQL Single-String query should always 
> start with SELECT)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:441)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:423)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:5538)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5192)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:105)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:275)
> at org.apache.hadoop.hive.ql.Driver.runCommand(Driver.java:320)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:312)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> Caused by: MetaException(message:Got exception: javax.jdo.JDOUserException 
> JDOQL Single-String query should always start with SELECT)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.logAndThrowMetaException(MetaStoreUtils.java:612)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTables(HiveMetaStoreClient.java:450)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:439)
> ... 15 more


[jira] Commented: (HIVE-1176) 'create if not exists' fails for a table name with 'select' in it

2010-06-22 Thread Arvind Prabhakar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881566#action_12881566
 ] 

Arvind Prabhakar commented on HIVE-1176:


@Paul: Your suggestions are fair. I have incorporated all the changes you 
suggested except for the pre-drop, based on @John's response. Let me know if you 
need any further tweaks to this patch.

> 'create if not exists' fails for a table name with 'select' in it
> -
>
> Key: HIVE-1176
> URL: https://issues.apache.org/jira/browse/HIVE-1176
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore, Query Processor
>Reporter: Prasad Chakka
>Assignee: Arvind Prabhakar
> Fix For: 0.6.0
>
> Attachments: HIVE-1176-1.patch, HIVE-1176-2.patch, HIVE-1176-3.patch, 
> HIVE-1176-4.patch, HIVE-1176.lib-files.tar.gz, HIVE-1176.patch
>
>
> hive> create table if not exists tmp_select(s string, c string, n int);
> org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got 
> exception: javax.jdo.JDOUserException JDOQL Single-String query should always 
> start with SELECT)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:441)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:423)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:5538)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5192)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:105)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:275)
> at org.apache.hadoop.hive.ql.Driver.runCommand(Driver.java:320)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:312)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> Caused by: MetaException(message:Got exception: javax.jdo.JDOUserException 
> JDOQL Single-String query should always start with SELECT)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.logAndThrowMetaException(MetaStoreUtils.java:612)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTables(HiveMetaStoreClient.java:450)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:439)
> ... 15 more


[jira] Commented: (HIVE-1176) 'create if not exists' fails for a table name with 'select' in it

2010-06-22 Thread Arvind Prabhakar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881538#action_12881538
 ] 

Arvind Prabhakar commented on HIVE-1176:


Updated patch with a test case attached. Please use HIVE-1176-3.patch. The 
changed files in this patch are as follows:


#   modified:   build.properties
#   modified:   build.xml
#   new file:   data/files/simple.txt
#   modified:   eclipse-templates/.classpath
#   modified:   ivy/ivysettings.xml
#   deleted:    lib/datanucleus-core-1.1.2.LICENSE
#   deleted:    lib/datanucleus-core-1.1.2.jar
#   deleted:    lib/datanucleus-enhancer-1.1.2.LICENSE
#   deleted:    lib/datanucleus-enhancer-1.1.2.jar
#   deleted:    lib/datanucleus-rdbms-1.1.2.LICENSE
#   deleted:    lib/datanucleus-rdbms-1.1.2.jar
#   deleted:    lib/jdo2-api-2.3-SNAPSHOT.LICENSE
#   deleted:    lib/jdo2-api-2.3-SNAPSHOT.jar
#   modified:   metastore/ivy.xml
#   modified:   metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
#   new file:   ql/src/test/queries/clientpositive/hive_1176.q
#   new file:   ql/src/test/results/clientpositive/hive_1176.q.out


> 'create if not exists' fails for a table name with 'select' in it
> -
>
> Key: HIVE-1176
> URL: https://issues.apache.org/jira/browse/HIVE-1176
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore, Query Processor
>Reporter: Prasad Chakka
>Assignee: Arvind Prabhakar
> Fix For: 0.6.0
>
> Attachments: HIVE-1176-1.patch, HIVE-1176-2.patch, HIVE-1176-3.patch, 
> HIVE-1176.lib-files.tar.gz, HIVE-1176.patch
>
>
> hive> create table if not exists tmp_select(s string, c string, n int);
> org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got 
> exception: javax.jdo.JDOUserException JDOQL Single-String query should always 
> start with SELECT)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:441)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:423)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:5538)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5192)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:105)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:275)
> at org.apache.hadoop.hive.ql.Driver.runCommand(Driver.java:320)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:312)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> Caused by: MetaException(message:Got exception: javax.jdo.JDOUserException 
> JDOQL Single-String query should always start with SELECT)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.logAndThrowMetaException(MetaStoreUtils.java:612)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTables(HiveMetaStoreClient.java:450)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:439)
> ... 15 more


[jira] Updated: (HIVE-1176) 'create if not exists' fails for a table name with 'select' in it

2010-06-22 Thread Arvind Prabhakar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvind Prabhakar updated HIVE-1176:
---

Attachment: HIVE-1176-3.patch

> 'create if not exists' fails for a table name with 'select' in it
> -
>
> Key: HIVE-1176
> URL: https://issues.apache.org/jira/browse/HIVE-1176
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore, Query Processor
>Reporter: Prasad Chakka
>Assignee: Arvind Prabhakar
> Fix For: 0.6.0
>
> Attachments: HIVE-1176-1.patch, HIVE-1176-2.patch, HIVE-1176-3.patch, 
> HIVE-1176.lib-files.tar.gz, HIVE-1176.patch
>
>
> hive> create table if not exists tmp_select(s string, c string, n int);
> org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got 
> exception: javax.jdo.JDOUserException JDOQL Single-String query should always 
> start with SELECT)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:441)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:423)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:5538)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5192)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:105)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:275)
> at org.apache.hadoop.hive.ql.Driver.runCommand(Driver.java:320)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:312)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> Caused by: MetaException(message:Got exception: javax.jdo.JDOUserException 
> JDOQL Single-String query should always start with SELECT)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.logAndThrowMetaException(MetaStoreUtils.java:612)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTables(HiveMetaStoreClient.java:450)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:439)
> ... 15 more


[jira] Commented: (HIVE-1176) 'create if not exists' fails for a table name with 'select' in it

2010-06-22 Thread Arvind Prabhakar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881499#action_12881499
 ] 

Arvind Prabhakar commented on HIVE-1176:


Makes sense. Will add a test case and update the patch soon. Sorry for the 
misunderstanding.

> 'create if not exists' fails for a table name with 'select' in it
> -
>
> Key: HIVE-1176
> URL: https://issues.apache.org/jira/browse/HIVE-1176
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore, Query Processor
>Reporter: Prasad Chakka
>Assignee: Arvind Prabhakar
> Fix For: 0.6.0
>
> Attachments: HIVE-1176-1.patch, HIVE-1176-2.patch, 
> HIVE-1176.lib-files.tar.gz, HIVE-1176.patch
>
>
> hive> create table if not exists tmp_select(s string, c string, n int);
> org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got 
> exception: javax.jdo.JDOUserException JDOQL Single-String query should always 
> start with SELECT)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:441)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:423)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:5538)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5192)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:105)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:275)
> at org.apache.hadoop.hive.ql.Driver.runCommand(Driver.java:320)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:312)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> Caused by: MetaException(message:Got exception: javax.jdo.JDOUserException 
> JDOQL Single-String query should always start with SELECT)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.logAndThrowMetaException(MetaStoreUtils.java:612)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTables(HiveMetaStoreClient.java:450)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:439)
> ... 15 more


[jira] Commented: (HIVE-1176) 'create if not exists' fails for a table name with 'select' in it

2010-06-22 Thread Arvind Prabhakar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881488#action_12881488
 ] 

Arvind Prabhakar commented on HIVE-1176:


Also, regarding the specific change to {{HiveMetaStoreClient.java}}: the tests 
under {{metastore}} validate that the new libraries work correctly.

> 'create if not exists' fails for a table name with 'select' in it
> -
>
> Key: HIVE-1176
> URL: https://issues.apache.org/jira/browse/HIVE-1176
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore, Query Processor
>Reporter: Prasad Chakka
>Assignee: Arvind Prabhakar
> Fix For: 0.6.0
>
> Attachments: HIVE-1176-1.patch, HIVE-1176-2.patch, 
> HIVE-1176.lib-files.tar.gz, HIVE-1176.patch
>
>
> hive> create table if not exists tmp_select(s string, c string, n int);
> org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got 
> exception: javax.jdo.JDOUserException JDOQL Single-String query should always 
> start with SELECT)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:441)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:423)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:5538)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5192)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:105)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:275)
> at org.apache.hadoop.hive.ql.Driver.runCommand(Driver.java:320)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:312)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> Caused by: MetaException(message:Got exception: javax.jdo.JDOUserException 
> JDOQL Single-String query should always start with SELECT)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.logAndThrowMetaException(MetaStoreUtils.java:612)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTables(HiveMetaStoreClient.java:450)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:439)
> ... 15 more

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HIVE-1176) 'create if not exists' fails for a table name with 'select' in it

2010-06-22 Thread Arvind Prabhakar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881486#action_12881486
 ] 

Arvind Prabhakar commented on HIVE-1176:


Sorry, it is not clear to me what unit test I should be writing. Could you give 
an example, perhaps?

From my perspective, any test that uses the metastore exercises this change. 
Together, all the tests form an exhaustive layer that ensures no regression 
seeps into the system. Note that this is not a functionality change, only a 
change of underlying libraries. 

[jira] Commented: (HIVE-1176) 'create if not exists' fails for a table name with 'select' in it

2010-06-22 Thread Arvind Prabhakar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881472#action_12881472
 ] 

Arvind Prabhakar commented on HIVE-1176:


Yes - it appears that the change in behavior can be attributed to the 
difference in major versions.

[jira] Commented: (HIVE-1176) 'create if not exists' fails for a table name with 'select' in it

2010-06-22 Thread Arvind Prabhakar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881381#action_12881381
 ] 

Arvind Prabhakar commented on HIVE-1176:


@Paul: I just tested the patch (HIVE-1176-2.patch) on the latest trunk and it 
seems to apply cleanly. Could you please try again and see if it works? Also, 
could you post the errors that you are seeing? If necessary, I can break the 
patch down into single-file units to make it easier to apply. Just let me know 
either way.


[jira] Commented: (HIVE-1271) Case sensitiveness of type information specified when using custom reducer causes type mismatch

2010-06-22 Thread Arvind Prabhakar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881368#action_12881368
 ] 

Arvind Prabhakar commented on HIVE-1271:


@Ashish: Thanks for looking at the patch. 

bq. why remove the check on Category?

I made all the specialized type infos {{final}}, which ensures that if the 
{{instanceof}} test succeeds, both objects must be of the same category type. 
The check on category is therefore redundant going forward.

bq. Also why drop the default implementation of the equals method for TypeInfo?

I did this for two main reasons. First, it implemented {{equals()}} but not 
{{hashCode()}}, which could lead to unexpected behavior when {{TypeInfo}} 
instances were put in collections. Second, I made both {{equals()}} and 
{{hashCode()}} abstract in order to force any (new) child classes to implement 
both consistently.
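To illustrate the reasoning, here is a minimal sketch using hypothetical, simplified stand-ins for the {{TypeInfo}} hierarchy (the class and field names are illustrative, not Hive's actual code). The base class redeclares {{equals()}} and {{hashCode()}} as abstract, and each concrete subclass is {{final}}, so a successful {{instanceof}} check already implies the same concrete class, and therefore the same category.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical, simplified stand-in for the TypeInfo base class.
abstract class TypeInfo {
    // Redeclared abstract so every concrete subclass must implement both
    // methods, keeping them consistent for hash-based collections.
    @Override
    public abstract boolean equals(Object other);

    @Override
    public abstract int hashCode();
}

// final: no subclass can exist, so "instanceof PrimitiveTypeInfo" implies
// the other object is exactly this class (and hence the same category).
final class PrimitiveTypeInfo extends TypeInfo {
    private final String typeName;

    PrimitiveTypeInfo(String typeName) {
        this.typeName = typeName;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof PrimitiveTypeInfo
            && typeName.equals(((PrimitiveTypeInfo) other).typeName);
    }

    @Override
    public int hashCode() {
        return typeName.hashCode();
    }
}

final class ListTypeInfo extends TypeInfo {
    private final TypeInfo elementType;

    ListTypeInfo(TypeInfo elementType) {
        this.elementType = elementType;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof ListTypeInfo
            && elementType.equals(((ListTypeInfo) other).elementType);
    }

    @Override
    public int hashCode() {
        // Mix in a constant tag so list<T> does not hash like T itself.
        return Objects.hash("list", elementType);
    }
}
```

With both methods implemented together, two separately constructed {{ListTypeInfo}} instances wrapping {{PrimitiveTypeInfo("int")}} compare equal and are found by {{HashSet.contains()}}.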

Let me know if you would like to tweak this change as necessary.

> Case sensitiveness of type information specified when using custom reducer 
> causes type mismatch
> ---
>
> Key: HIVE-1271
> URL: https://issues.apache.org/jira/browse/HIVE-1271
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.5.0
>Reporter: Dilip Joseph
>Assignee: Arvind Prabhakar
> Fix For: 0.6.0
>
> Attachments: HIVE-1271-1.patch, HIVE-1271.patch
>
>
> Type information specified while using a custom reduce script is converted 
> to lower case, and causes type mismatch during query semantic analysis. The 
> following REDUCE query where field name = "userId" failed.
> hive> CREATE TABLE SS (
>> a INT,
>> b INT,
>> vals ARRAY>
>> );
> OK
> hive> FROM (select * from srcTable DISTRIBUTE BY id SORT BY id) s
>> INSERT OVERWRITE TABLE SS
>> REDUCE *
>> USING 'myreduce.py'
>> AS
>> (a INT,
>> b INT,
>> vals ARRAY>
>> )
>> ;
> FAILED: Error in semantic analysis: line 2:27 Cannot insert into
> target table because column number/types are different SS: Cannot
> convert column 2 from array> to
> array>.
> The same query worked fine after changing "userId" to "userid".

[jira] Commented: (HIVE-1271) Case sensitiveness of type information specified when using custom reducer causes type mismatch

2010-06-21 Thread Arvind Prabhakar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881024#action_12881024
 ] 

Arvind Prabhakar commented on HIVE-1271:


Is anyone reviewing this change? Thanks.

[jira] Commented: (HIVE-1176) 'create if not exists' fails for a table name with 'select' in it

2010-06-21 Thread Arvind Prabhakar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881023#action_12881023
 ] 

Arvind Prabhakar commented on HIVE-1176:


@Paul: Any updates on this from your end? Thanks.

[jira] Commented: (HIVE-287) count distinct on multiple columns does not work

2010-06-21 Thread Arvind Prabhakar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881020#action_12881020
 ] 

Arvind Prabhakar commented on HIVE-287:
---

@John: Can you please take a look at the updated patch? Let me know if you have 
any feedback for further tweaking this change as necessary.

> count distinct on multiple columns does not work
> 
>
> Key: HIVE-287
> URL: https://issues.apache.org/jira/browse/HIVE-287
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Arvind Prabhakar
> Attachments: HIVE-287-1.patch, HIVE-287-2.patch, HIVE-287-3.patch, 
> HIVE-287-4.patch
>
>
> The following query does not work:
> select count(distinct col1, col2) from Tbl

[jira] Updated: (HIVE-287) count distinct on multiple columns does not work

2010-06-21 Thread Arvind Prabhakar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvind Prabhakar updated HIVE-287:
--

Attachment: HIVE-287-4.patch

applies cleanly on trunk and branch-0.6

[jira] Commented: (HIVE-287) count distinct on multiple columns does not work

2010-06-17 Thread Arvind Prabhakar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1288#action_1288
 ] 

Arvind Prabhakar commented on HIVE-287:
---

@John: I agree with your assessment above. Regarding count(*), my earlier 
comment was not meant to imply that such a UDAF exists today, only that one 
might exist in the future. More importantly, using an empty parameter list as 
an indicator for * would blur the distinction between a UDAF(*) and a UDAF() 
invocation. This is perhaps one of many ways in which parameter overloading 
could lead to confusion and hard-to-understand code. 

I think introducing the {{GenericUDAFResolver2}} interface is a great idea. I 
also like the idea of using a callback to decouple the invocation from the 
parameter list, but I am concerned that it could lead to redundant method calls 
and object creation. I am not sure whether that would add a significant 
performance penalty in the long run. 

I would like to hear the opinions of others interested in this issue regarding 
this route. If everyone agrees that adding a new interface with a callback for 
parameter discovery is acceptable, I can start working on that patch.
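A minimal sketch of the parameter-object idea, assuming hypothetical names ({{UdafParameterInfo}}, {{UdafResolver2}}, {{CountResolver}}) rather than any committed Hive API, with evaluators represented by plain strings. The resolver receives a single object that it queries for the argument types and the invocation flags, so COUNT(*) and COUNT() stay distinguishable without giving an empty argument list a special meaning.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical parameter object handed to the resolver. The resolver asks it
// for what it needs instead of receiving a growing list of method arguments.
final class UdafParameterInfo {
    private final List<String> parameterTypes; // simplified: type names only
    private final boolean distinct;
    private final boolean allColumns;

    UdafParameterInfo(List<String> parameterTypes, boolean distinct, boolean allColumns) {
        this.parameterTypes = Collections.unmodifiableList(parameterTypes);
        this.distinct = distinct;
        this.allColumns = allColumns;
    }

    List<String> getParameters() { return parameterTypes; }
    boolean isDistinct() { return distinct; }
    boolean isAllColumns() { return allColumns; }
}

// Hypothetical resolver interface built around the parameter object.
interface UdafResolver2 {
    String getEvaluator(UdafParameterInfo info);
}

// Toy COUNT resolver that returns the name of the evaluator it would pick.
final class CountResolver implements UdafResolver2 {
    @Override
    public String getEvaluator(UdafParameterInfo info) {
        if (info.isAllColumns()) {
            return "CountStar"; // COUNT(*)
        }
        if (info.getParameters().isEmpty()) {
            // COUNT() is rejected explicitly instead of being read as COUNT(*).
            throw new IllegalArgumentException("COUNT() requires arguments or *");
        }
        return info.isDistinct() ? "CountDistinct" : "CountColumns";
    }
}
```

The trade-off raised above shows up here as one extra object allocation per resolution, which happens at query compile time rather than per row.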

> count distinct on multiple columns does not work
> 
>
> Key: HIVE-287
> URL: https://issues.apache.org/jira/browse/HIVE-287
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Arvind Prabhakar
> Attachments: HIVE-287-1.patch, HIVE-287-2.patch, HIVE-287-3.patch
>
>
> The following query does not work:
> select count(distinct col1, col2) from Tbl

[jira] Commented: (HIVE-287) count distinct on multiple columns does not work

2010-06-17 Thread Arvind Prabhakar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879983#action_12879983
 ] 

Arvind Prabhakar commented on HIVE-287:
---

@John: Thanks for reviewing this change. I have some follow-up comments and 
suggestions:

bq. isDistinct: this doesn't actually modify the choice of evaluator 
implementation at all, since the actual duplicate elimination takes place 
upstream of the UDAF invocation. So instead of adding this parameter, can we 
instead add a new method supportsDistinct() on GenericUDAFEvaluator? 

While the duplicate elimination may happen upstream, my concern is that this 
does not rule out cases where the information is relevant to the function 
invocation itself. For example, the implementation of {{count}} requires that 
if there is a valid argument list, it must be qualified with {{DISTINCT}}.
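As a rough illustration of why the resolver itself may need the distinct flag, here is a hypothetical validation helper. It assumes the {{count}} rule above means that more than one argument is only accepted together with {{DISTINCT}}; the class name, method name and messages are illustrative only.

```java
// Hypothetical argument check for COUNT, written as a plain static helper.
final class CountArgumentCheck {
    private CountArgumentCheck() {}

    /**
     * Returns null if the invocation is acceptable, otherwise an error message.
     * Assumed rule: several arguments are only meaningful as COUNT(DISTINCT ...).
     */
    static String validate(int argumentCount, boolean distinct, boolean allColumns) {
        if (allColumns) {
            return null; // COUNT(*): no column arguments to check
        }
        if (argumentCount > 1 && !distinct) {
            return "DISTINCT must be specified for multiple arguments";
        }
        return null;
    }
}
```

Without the flag, the resolver would have no way to tell {{count(distinct col1, col2)}} apart from {{count(col1, col2)}}.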

bq. isAllColumns: COUNT is probably the only function which is ever even going 
to care about this one. Couldn't we just use an empty array of TypeInfo to 
indicate all columns?

I had a similar idea, but after some consideration opted for a simpler design. 
I felt that overloading arguments to indicate special cases might lead to 
confusion, and eventually to problems when a use case emerges that invalidates 
this assumption. 

I do agree with your point that it would be good to stay compatible if possible. 
One way to do this would be as follows:

# Revert the {{GenericUDAFResolver}} to its previous state but make the 
interface deprecated in favor of the abstract base class.
# Push the newly introduced method into {{AbstractGenericUDAFResolver}} 
implementation.
# Modify {{FunctionRegistry.getGenericUDAFEvaluator()}} method to test the 
resolver instance to be type compatible with {{AbstractGenericUDAFResolver}} 
and if so, invoke the new method. Otherwise revert to the old mechanism.
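The three steps above can be sketched roughly as follows. All types are hypothetical, simplified stand-ins: {{LegacyResolver}} for the old {{GenericUDAFResolver}} contract, {{AbstractResolver}} for {{AbstractGenericUDAFResolver}}, and {{Registry.resolve}} for the lookup in {{FunctionRegistry.getGenericUDAFEvaluator()}}. Evaluators are represented by plain strings.

```java
// Old resolver contract: only the argument types are passed in.
interface LegacyResolver {
    String getEvaluator(String[] parameterTypes);
}

// Abstract base class carrying the new, richer entry point. By default it
// ignores the extra flags and delegates to the old method.
abstract class AbstractResolver implements LegacyResolver {
    String getEvaluator(String[] parameterTypes, boolean distinct, boolean allColumns) {
        return getEvaluator(parameterTypes);
    }
}

final class Registry {
    private Registry() {}

    static String resolve(LegacyResolver resolver, String[] parameterTypes,
                          boolean distinct, boolean allColumns) {
        if (resolver instanceof AbstractResolver) {
            // New-style resolver: pass along the invocation flags.
            return ((AbstractResolver) resolver)
                .getEvaluator(parameterTypes, distinct, allColumns);
        }
        // Existing resolvers that only implement the interface keep working.
        return resolver.getEvaluator(parameterTypes);
    }
}
```

The {{instanceof}} check is what keeps existing third-party resolvers source- and binary-compatible: they never see the new method.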

What do you think about this approach?


[jira] Commented: (HIVE-1176) 'create if not exists' fails for a table name with 'select' in it

2010-06-16 Thread Arvind Prabhakar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879441#action_12879441
 ] 

Arvind Prabhakar commented on HIVE-1176:


bq. Can you elaborate on what you mean by 'some collections were being fetched 
as semi-populated proxies with missing session context leading to NPEs'? Is 
there something I can do to reproduce this?

@Paul: Here are the steps to reproduce this problem:

# Start out with a clean workspace checkout and apply the updated patch 
HIVE-1176-2.patch. 
# Manually revert the file 
{{metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java}} 
to its previous state.
# Run {{ant package}} from the root of the workspace.
# Run {{ant test}} from within {{metastore}}.

You should see failures like the following:
{code}
[junit] testPartition() failed.
[junit] java.lang.NullPointerException
[junit] at 
org.datanucleus.store.mapped.scostore.AbstractMapStore.validateKeyForWriting(AbstractMapStore.java:333)
[junit] at 
org.datanucleus.store.mapped.scostore.JoinMapStore.put(JoinMapStore.java:252)
[junit] at org.datanucleus.sco.backed.Map.put(Map.java:640)
[junit] at 
org.apache.hadoop.hive.metastore.api.Table.putToParameters(Table.java:359)
[junit] at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_table(HiveMetaStore.java:1281)
[junit] at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.alter_table(HiveMetaStoreClient.java:140)
[junit] at 
org.apache.hadoop.hive.metastore.TestHiveMetaStore.testAlterTable(TestHiveMetaStore.java:728)
[junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
...
{code}

If you look at 
{{src/gen-javabean/org/apache/hadoop/hive/metastore/api/Table.java}}, you will 
notice that the map being updated at the line that throws this exception should 
be a plain {{HashMap}}. The stack trace, however, shows the call going into 
{{org.datanucleus.sco.backed.Map}} and its backing store 
({{org.datanucleus.store.mapped.scostore.AbstractMapStore}}). This happens 
because the datanucleus JDO framework replaces collections with its own 
implementations in order to allow lazy dereferencing and to optimize database 
connections, queries, memory consumption, etc.

Lazy loading of collections (and second-class objects in general) can be 
disabled globally or per entity. Disabling it globally is generally not 
recommended unless extensive testing provides evidence that the change is safe. 
Disabling it per entity is still OK provided the entity object graph is always 
fully dereferenced, but this could lead to excessive memory consumption if the 
entity graph is large. 

My approach to fixing the problem was to *not* change the default behavior in 
the general case. Instead, I felt it was better to work around the problem for 
a remote metastore by explicitly creating a copy. If you have other suggestions 
on how to address this, please let me know.
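A minimal sketch of the explicit-copy idea, assuming a hypothetical, simplified {{TableDesc}} class rather than the Thrift-generated {{Table}}. The client hands callers a detached copy whose parameter map is a plain {{HashMap}}, so later writes no longer go through a framework-managed collection. Only the map is copied here; a real copy would have to cover every collection-valued field.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified table object with a parameters map.
final class TableDesc {
    final String name;
    final Map<String, String> parameters;

    TableDesc(String name, Map<String, String> parameters) {
        this.name = name;
        this.parameters = parameters;
    }

    // Copy the entries into a plain HashMap so the caller no longer depends on
    // whatever Map implementation the persistence layer supplied.
    TableDesc detachedCopy() {
        return new TableDesc(name, new HashMap<>(parameters));
    }
}
```

For a quick check, {{Collections.unmodifiableMap}} can stand in for a map that must not be written to directly (this only simulates the situation; it is not how datanucleus behaves): writes to the copy succeed while the original map is left untouched.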

Also, more information on the lazy dereferencing mechanism used by the 
datanucleus framework can be found [here|http://www.datanucleus.org/plugins/core/sco.html].


[jira] Commented: (HIVE-1176) 'create if not exists' fails for a table name with 'select' in it

2010-06-16 Thread Arvind Prabhakar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879432#action_12879432
 ] 

Arvind Prabhakar commented on HIVE-1176:


The updated patch HIVE-1176-2.patch contains the following changes:

#   modified:   build.properties
#   modified:   build.xml
#   modified:   eclipse-templates/.classpath
#   modified:   ivy/ivysettings.xml
#   deleted:    lib/datanucleus-core-1.1.2.LICENSE
#   deleted:    lib/datanucleus-core-1.1.2.jar
#   deleted:    lib/datanucleus-enhancer-1.1.2.LICENSE
#   deleted:    lib/datanucleus-enhancer-1.1.2.jar
#   deleted:    lib/datanucleus-rdbms-1.1.2.LICENSE
#   deleted:    lib/datanucleus-rdbms-1.1.2.jar
#   deleted:    lib/jdo2-api-2.3-SNAPSHOT.LICENSE
#   deleted:    lib/jdo2-api-2.3-SNAPSHOT.jar
#   modified:   metastore/ivy.xml
#   modified:   metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java

[jira] Updated: (HIVE-1176) 'create if not exists' fails for a table name with 'select' in it

2010-06-16 Thread Arvind Prabhakar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvind Prabhakar updated HIVE-1176:
---

Attachment: HIVE-1176-2.patch

Updated the patch against the latest trunk. This was necessary because HIVE-1373 
added the connection pool libraries to the Eclipse classpath, and those entries 
become outdated once this patch is applied. The updated patch fixes this by 
pointing the Eclipse classpath at the updated libraries. I tested the launch 
configuration in Eclipse to confirm that it works.

> 'create if not exists' fails for a table name with 'select' in it
> -
>
> Key: HIVE-1176
> URL: https://issues.apache.org/jira/browse/HIVE-1176
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore, Query Processor
>Reporter: Prasad Chakka
>Assignee: Arvind Prabhakar
> Fix For: 0.6.0
>
> Attachments: HIVE-1176-1.patch, HIVE-1176-2.patch, 
> HIVE-1176.lib-files.tar.gz, HIVE-1176.patch
>
>
> hive> create table if not exists tmp_select(s string, c string, n int);
> org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got 
> exception: javax.jdo.JDOUserException JDOQL Single-String query should always 
> start with SELECT)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:441)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:423)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:5538)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5192)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:105)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:275)
> at org.apache.hadoop.hive.ql.Driver.runCommand(Driver.java:320)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:312)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> Caused by: MetaException(message:Got exception: javax.jdo.JDOUserException 
> JDOQL Single-String query should always start with SELECT)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.logAndThrowMetaException(MetaStoreUtils.java:612)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTables(HiveMetaStoreClient.java:450)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:439)
> ... 15 more

[jira] Updated: (HIVE-287) count distinct on multiple columns does not work

2010-06-16 Thread Arvind Prabhakar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvind Prabhakar updated HIVE-287:
--

Status: Patch Available  (was: Open)

Submitting the patch regenerated against the latest trunk. The patch file is 
HIVE-287-3.patch.

> count distinct on multiple columns does not work
> 
>
> Key: HIVE-287
> URL: https://issues.apache.org/jira/browse/HIVE-287
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Arvind Prabhakar
> Attachments: HIVE-287-1.patch, HIVE-287-2.patch, HIVE-287-3.patch
>
>
> The following query does not work:
> select count(distinct col1, col2) from Tbl
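For reference, the query above asks for the number of distinct (col1, col2) combinations in the table. The Java sketch below illustrates those semantics on in-memory rows. It assumes, following the usual SQL convention, that a row with a NULL in any of the listed columns is not counted; the class and method names are hypothetical and are not part of Hive.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class CountDistinctSketch {
    /** Counts distinct tuples over the given column indexes, skipping rows with a null in any of them. */
    static long countDistinct(List<Object[]> rows, int... columns) {
        Set<List<Object>> seen = new HashSet<>();
        for (Object[] row : rows) {
            Object[] tuple = new Object[columns.length];
            boolean hasNull = false;
            for (int i = 0; i < columns.length; i++) {
                tuple[i] = row[columns[i]];
                if (tuple[i] == null) {
                    hasNull = true;
                    break;
                }
            }
            if (!hasNull) {
                seen.add(Arrays.asList(tuple)); // List equality is element-wise, so it works as a tuple key
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        List<Object[]> tbl = Arrays.asList(
                new Object[] {"a", 1},
                new Object[] {"a", 1},
                new Object[] {"a", 2},
                new Object[] {"b", 1},
                new Object[] {"b", null});
        // Equivalent of: select count(distinct col1, col2) from Tbl
        System.out.println(countDistinct(tbl, 0, 1)); // prints 3
    }
}
```

Using a {{List}} as the set element is simply a convenient tuple type with value equality; a real engine would of course do this with sorting or hashing across mappers and reducers rather than one in-memory set.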

[jira] Updated: (HIVE-287) count distinct on multiple columns does not work

2010-06-16 Thread Arvind Prabhakar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvind Prabhakar updated HIVE-287:
--

Attachment: HIVE-287-3.patch

> count distinct on multiple columns does not work
> 
>
> Key: HIVE-287
> URL: https://issues.apache.org/jira/browse/HIVE-287
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Arvind Prabhakar
> Attachments: HIVE-287-1.patch, HIVE-287-2.patch, HIVE-287-3.patch
>
>
> The following query does not work:
> select count(distinct col1, col2) from Tbl

[jira] Commented: (HIVE-1139) GroupByOperator sometimes throws OutOfMemory error when there are too many distinct keys

2010-06-09 Thread Arvind Prabhakar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877239#action_12877239
 ] 

Arvind Prabhakar commented on HIVE-1139:


Ashish - no problem, let me explain. The problem this JIRA addresses is that 
{{GroupByOperator}}, and possibly other aggregation operators, use in-memory 
maps to store intermediate keys, which can lead to an {{OutOfMemoryError}} when 
the number of such keys is large. One suggested workaround is to use the 
{{HashMapWrapper}} class, which would ease the memory pressure because it can 
spill the excess data to disk.
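To make the spilling idea concrete, here is a minimal Java sketch of a map that keeps at most a fixed number of entries on the heap and writes the values of any further entries to temporary files using Java serialization. It is not the actual {{HashMapWrapper}} (which is built on the JDBM classes); the class name {{SpillingMap}} and its threshold parameter are hypothetical, and only values are spilled while the key index stays in memory. Extending {{java.util.AbstractMap}} gives callers the standard {{java.util.Map}} interface.

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;

/** Hypothetical sketch: up to `threshold` entries live on the heap; further values go to temp files. */
class SpillingMap<K extends Serializable, V extends Serializable> extends AbstractMap<K, V> {
    private final int threshold;
    private final Map<K, V> inMemory = new HashMap<>();
    private final Map<K, Path> spilled = new HashMap<>(); // key -> temp file holding its serialized value
    private final Path spillDir;

    SpillingMap(int threshold) throws IOException {
        this.threshold = threshold;
        this.spillDir = Files.createTempDirectory("spill"); // cleanup omitted in this sketch
    }

    @Override
    public V put(K key, V value) {
        V previous = get(key);
        // Each key lives in exactly one tier; new keys go to memory only while there is room.
        boolean keepInMemory = inMemory.containsKey(key)
                || (!spilled.containsKey(key) && inMemory.size() < threshold);
        if (keepInMemory) {
            inMemory.put(key, value);
            return previous;
        }
        try {
            Path file = spilled.get(key);
            if (file == null) {
                file = Files.createTempFile(spillDir, "entry", ".ser");
                spilled.put(key, file);
            }
            try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
                out.writeObject(value); // Java serialization: V must be Serializable
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return previous;
    }

    @Override
    public V get(Object key) {
        if (inMemory.containsKey(key)) {
            return inMemory.get(key);
        }
        Path file = spilled.get(key);
        return file == null ? null : readValue(file);
    }

    @Override
    public boolean containsKey(Object key) {
        return inMemory.containsKey(key) || spilled.containsKey(key);
    }

    @Override
    public V remove(Object key) {
        if (inMemory.containsKey(key)) {
            return inMemory.remove(key);
        }
        Path file = spilled.remove(key);
        if (file == null) {
            return null;
        }
        V value = readValue(file);
        try {
            Files.deleteIfExists(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return value;
    }

    @Override
    public int size() {
        return inMemory.size() + spilled.size();
    }

    /** Read-only snapshot of all entries; spilled values are read back from disk. */
    @Override
    public Set<Map.Entry<K, V>> entrySet() {
        Map<K, V> snapshot = new HashMap<>(inMemory);
        for (Map.Entry<K, Path> e : spilled.entrySet()) {
            snapshot.put(e.getKey(), readValue(e.getValue()));
        }
        return Collections.unmodifiableMap(snapshot).entrySet();
    }

    @SuppressWarnings("unchecked")
    private V readValue(Path file) {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (V) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Used as {{new SpillingMap<String, Long>(2)}}, the map behaves like any other {{Map}}: with four distinct keys, two stay on the heap and two are read back from disk on access. Note the {{V extends Serializable}} bound, which is exactly the restriction that becomes a problem for {{Writable}} values.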

The {{HashMapWrapper}}, however, uses Java serialization to write out the excess 
data. This does not work when the data contains non-serializable objects such 
as {{Writable}} types ({{Text}}, etc.). What I have done so far is modify the 
{{HashMapWrapper}} to support the full {{java.util.Map}} interface. However, when 
I tried updating the {{GroupByOperator}} to use it, I ran into this 
serialization problem.
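The failure mode can be reproduced without Hadoop. In this sketch, {{WritableText}} is a hypothetical stand-in for a {{Writable}} such as {{Text}}: like the real type, it knows how to write itself to a {{DataOutput}} but does not implement {{java.io.Serializable}}, so Java serialization of a map holding it throws {{NotSerializableException}}.

```java
import java.io.*;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a Hadoop Writable such as Text: it can write itself, but is not Serializable. */
class WritableText {
    private String value = "";

    WritableText() {}

    WritableText(String value) {
        this.value = value;
    }

    void write(DataOutput out) throws IOException {
        out.writeUTF(value);
    }

    void readFields(DataInput in) throws IOException {
        value = in.readUTF();
    }
}

class JavaSerializationDemo {
    /** Returns true if Java serialization accepts the map, false if it reaches a non-Serializable value. */
    static boolean javaSerializable(Map<String, Object> map) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(map);
            return true;
        } catch (NotSerializableException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Map<String, Object> row = new HashMap<>();
        row.put("count", 42L);                      // Long implements Serializable
        System.out.println(javaSerializable(row));  // prints true
        row.put("key", new WritableText("abc"));    // Writable-style value
        System.out.println(javaSerializable(row));  // prints false
    }
}
```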

That's why I was suggesting that perhaps we should decouple the serialization 
problem from the {{HashMapWrapper}} enhancement and let the latter be checked in 
independently.

> GroupByOperator sometimes throws OutOfMemory error when there are too many 
> distinct keys
> 
>
> Key: HIVE-1139
> URL: https://issues.apache.org/jira/browse/HIVE-1139
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Ning Zhang
>Assignee: Arvind Prabhakar
>
> When a partial aggregation performed on a mapper, a HashMap is created to 
> keep all distinct keys in main memory. This could leads to OOM exception when 
> there are too many distinct keys for a particular mapper. A workaround is to 
> set the map split size smaller so that each mapper takes less number of rows. 
> A better solution is to use the persistent HashMapWrapper (currently used in 
> CommonJoinOperator) to spill overflow rows to disk. 

[jira] Commented: (HIVE-1139) GroupByOperator sometimes throws OutOfMemory error when there are too many distinct keys

2010-06-09 Thread Arvind Prabhakar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877222#action_12877222
 ] 

Arvind Prabhakar commented on HIVE-1139:


If there is interest, I can file a separate JIRA for modifying 
{{HashMapWrapper}} to support the {{java.util.Map}} interface, decoupling that 
work from this JIRA. I think there is a lot of benefit in doing just that. We 
could also make this JIRA depend on the new one as a prerequisite.



> GroupByOperator sometimes throws OutOfMemory error when there are too many 
> distinct keys
> 
>
> Key: HIVE-1139
> URL: https://issues.apache.org/jira/browse/HIVE-1139
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Ning Zhang
>Assignee: Arvind Prabhakar
>
> When a partial aggregation performed on a mapper, a HashMap is created to 
> keep all distinct keys in main memory. This could leads to OOM exception when 
> there are too many distinct keys for a particular mapper. A workaround is to 
> set the map split size smaller so that each mapper takes less number of rows. 
> A better solution is to use the persistent HashMapWrapper (currently used in 
> CommonJoinOperator) to spill overflow rows to disk. 

[jira] Commented: (HIVE-1139) GroupByOperator sometimes throws OutOfMemory error when there are too many distinct keys

2010-06-08 Thread Arvind Prabhakar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12876979#action_12876979
 ] 

Arvind Prabhakar commented on HIVE-1139:


I did some preliminary analysis for this JIRA and converted the 
{{HashMapWrapper}} to implement the {{java.util.Map}} interface. This required 
some changes all the way down to the underlying JDBM classes. 

However, this alone is not sufficient to plug it into the {{GroupByOperator}} 
implementation, because the data stored in the {{HashMap}} is a mix of 
serializable Java objects and {{Writable}}s. Since {{Writable}}s cannot be 
serialized directly with Java serialization, using this approach to fix the 
memory problem requires _an external serialization_ mechanism that can handle 
arbitrary object graphs of mixed types.

A trivial approach would be to implement custom serialization using Java 
reflection, but that would incur the cost of excessive reflection and byte 
handling/marshaling.
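One possible shape for such an external mechanism is sketched below in Java. Each value is written with a one-byte tag recording which mechanism produced it: {{Serializable}} objects go through Java serialization into a length-prefixed blob, while {{Writable}}-style objects write their class name followed by their own fields. {{WritableLike}}, {{TextLike}} and {{MixedSerializer}} are hypothetical names loosely modeled on Hadoop's {{Writable}}, not the real interface. The reader still instantiates {{Writable}}-style objects reflectively, which is the kind of per-value overhead mentioned above.

```java
import java.io.*;

/** Hypothetical Writable-style contract (modeled on, but not the same as, Hadoop's Writable). */
interface WritableLike {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

/** Hypothetical Writable-style string value; it does not implement Serializable. */
class TextLike implements WritableLike {
    String value = "";

    public TextLike() {} // no-arg constructor needed for reflective instantiation

    TextLike(String value) {
        this.value = value;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(value);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        value = in.readUTF();
    }
}

/** Tags each value with the mechanism that wrote it, so both kinds can share one stream. */
class MixedSerializer {
    private static final byte JAVA = 0;
    private static final byte WRITABLE = 1;

    static void writeValue(DataOutputStream out, Object value) throws IOException {
        if (value instanceof WritableLike) {
            out.writeByte(WRITABLE);
            out.writeUTF(value.getClass().getName()); // needed to re-create the object on read
            ((WritableLike) value).write(out);
        } else if (value instanceof Serializable) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(buf)) {
                oos.writeObject(value);
            }
            out.writeByte(JAVA);
            out.writeInt(buf.size()); // length prefix tells the reader where the blob ends
            buf.writeTo(out);
        } else {
            throw new NotSerializableException(value.getClass().getName());
        }
    }

    static Object readValue(DataInputStream in) throws IOException {
        byte tag = in.readByte();
        if (tag == WRITABLE) {
            String className = in.readUTF();
            try {
                // One reflective instantiation per value: the overhead discussed above.
                WritableLike w = (WritableLike) Class.forName(className)
                        .getDeclaredConstructor().newInstance();
                w.readFields(in);
                return w;
            } catch (ReflectiveOperationException e) {
                throw new IOException("cannot instantiate " + className, e);
            }
        }
        byte[] blob = new byte[in.readInt()];
        in.readFully(blob);
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(blob))) {
            return ois.readObject();
        } catch (ClassNotFoundException e) {
            throw new IOException(e);
        }
    }
}
```

Wrapping each Java-serialized value in its own length-prefixed blob avoids interleaving {{ObjectOutputStream}}'s stream header and buffering with the raw {{DataOutput}} writes of the {{Writable}}-style values.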

If you have any other ideas about this, please add them to the comments on 
this issue for consideration.


> GroupByOperator sometimes throws OutOfMemory error when there are too many 
> distinct keys
> 
>
> Key: HIVE-1139
> URL: https://issues.apache.org/jira/browse/HIVE-1139
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Ning Zhang
>Assignee: Arvind Prabhakar
>
> When a partial aggregation performed on a mapper, a HashMap is created to 
> keep all distinct keys in main memory. This could leads to OOM exception when 
> there are too many distinct keys for a particular mapper. A workaround is to 
> set the map split size smaller so that each mapper takes less number of rows. 
> A better solution is to use the persistent HashMapWrapper (currently used in 
> CommonJoinOperator) to spill overflow rows to disk. 

[jira] Commented: (HIVE-1176) 'create if not exists' fails for a table name with 'select' in it

2010-06-06 Thread Arvind Prabhakar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12876127#action_12876127
 ] 

Arvind Prabhakar commented on HIVE-1176:


I think the difference is more likely due to a bug in the Mac OS X version of 
{{sed}}. Specifically, it does not interpret the escaped tab sequence *\t* in 
directives and instead treats it as an unescaped *t*. For example, the command 
to replace the first occurrence of *b* in the string *abc* with a tab character 
fails as shown below:

{code}
$ echo "abc" | /usr/bin/sed "s@b@\t@"
atc
{code}

By contrast, it works as expected with the GNU distribution of {{sed}}:

{code}
$ echo "abc" | /opt/local/bin/sed "s@b@\t@"
a   c
{code}

> 'create if not exists' fails for a table name with 'select' in it
> -
>
> Key: HIVE-1176
> URL: https://issues.apache.org/jira/browse/HIVE-1176
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore, Query Processor
>Reporter: Prasad Chakka
>Assignee: Arvind Prabhakar
> Attachments: HIVE-1176-1.patch, HIVE-1176.lib-files.tar.gz, 
> HIVE-1176.patch
>
>
> hive> create table if not exists tmp_select(s string, c string, n int);
> org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got 
> exception: javax.jdo.JDOUserException JDOQL Single-String query should always 
> start with SELECT)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:441)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:423)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:5538)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5192)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:105)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:275)
> at org.apache.hadoop.hive.ql.Driver.runCommand(Driver.java:320)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:312)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> Caused by: MetaException(message:Got exception: javax.jdo.JDOUserException 
> JDOQL Single-String query should always start with SELECT)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.logAndThrowMetaException(MetaStoreUtils.java:612)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTables(HiveMetaStoreClient.java:450)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:439)
> ... 15 more

[jira] Commented: (HIVE-1176) 'create if not exists' fails for a table name with 'select' in it

2010-06-05 Thread Arvind Prabhakar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12875960#action_12875960
 ] 

Arvind Prabhakar commented on HIVE-1176:


John - I debugged the failures I was seeing for input20 and input33, and they 
turn out to be caused by a subtle difference in how the stream editor {{sed}} 
works on Mac OS X versus typical Linux distributions. After I installed the GNU 
port of {{sed}}, the failures no longer occur.

I don't think this is related to the sporadic failures reported on Hudson.

> 'create if not exists' fails for a table name with 'select' in it
> -
>
> Key: HIVE-1176
> URL: https://issues.apache.org/jira/browse/HIVE-1176
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore, Query Processor
>Reporter: Prasad Chakka
>Assignee: Arvind Prabhakar
> Attachments: HIVE-1176-1.patch, HIVE-1176.lib-files.tar.gz, 
> HIVE-1176.patch
>
>
> hive> create table if not exists tmp_select(s string, c string, n int);
> org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got 
> exception: javax.jdo.JDOUserException JDOQL Single-String query should always 
> start with SELECT)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:441)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:423)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:5538)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5192)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:105)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:275)
> at org.apache.hadoop.hive.ql.Driver.runCommand(Driver.java:320)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:312)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> Caused by: MetaException(message:Got exception: javax.jdo.JDOUserException 
> JDOQL Single-String query should always start with SELECT)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.logAndThrowMetaException(MetaStoreUtils.java:612)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTables(HiveMetaStoreClient.java:450)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:439)
> ... 15 more

[jira] Commented: (HIVE-365) Create Table to support multiple levels of delimiters

2010-06-04 Thread Arvind Prabhakar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12875677#action_12875677
 ] 

Arvind Prabhakar commented on HIVE-365:
---

For the table "nested" as defined above, a row containing the following data:
[ [1,2,3],[10,20,30] ], { {foo:{1:1} }, {bar:{2:2} } }

would be represented as:
1 \003 2 \003 3 \002 10 \003 20 \003 30 \001 foo \003 1 \004 1 \002 bar \003 2 \004 2

Note: spaces added for readability.

> Create Table to support multiple levels of delimiters
> -
>
> Key: HIVE-365
> URL: https://issues.apache.org/jira/browse/HIVE-365
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Zheng Shao
>
> From HIVE-337, the SerDe layer now supports multiple-levels of delimiters, 
> for the purpose of supporting nested map/array/struct.
> Array(the same as List) and struct consume a single level of separator, and 
> Map consumes 2 levels.
> DDL (Create Table) needs to allow users to specify multiple levels of 
> delimiters in order to take the advantage of this new feature.

[jira] Commented: (HIVE-1139) GroupByOperator sometimes throws OutOfMemory error when there are too many distinct keys

2010-06-04 Thread Arvind Prabhakar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12875650#action_12875650
 ] 

Arvind Prabhakar commented on HIVE-1139:


Soundararajan, Ning - Yes, I plan to start working on it next week. I expect it 
to take until at least mid-to-late week to have a patch available. However, if 
that schedule does not work for you, please feel free to take this issue into 
your queue and go ahead. It would be great if you could confirm either way 
first.

Arvind

> GroupByOperator sometimes throws OutOfMemory error when there are too many 
> distinct keys
> 
>
> Key: HIVE-1139
> URL: https://issues.apache.org/jira/browse/HIVE-1139
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Ning Zhang
>Assignee: Arvind Prabhakar
>
> When a partial aggregation performed on a mapper, a HashMap is created to 
> keep all distinct keys in main memory. This could leads to OOM exception when 
> there are too many distinct keys for a particular mapper. A workaround is to 
> set the map split size smaller so that each mapper takes less number of rows. 
> A better solution is to use the persistent HashMapWrapper (currently used in 
> CommonJoinOperator) to spill overflow rows to disk. 

[jira] Updated: (HIVE-802) Bug in DataNucleus prevents Hive from building if inside a dir with '+' in it

2010-05-30 Thread Arvind Prabhakar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvind Prabhakar updated HIVE-802:
--

Status: Patch Available  (was: Open)

A patch has been submitted for HIVE-1176 that addresses this problem by 
updating the DataNucleus plugins as well as the libraries they depend on in 
Hive. Marking this JIRA as Patch Available.

> Bug in DataNucleus prevents Hive from building if inside a dir with '+' in it
> -
>
> Key: HIVE-802
> URL: https://issues.apache.org/jira/browse/HIVE-802
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Build Infrastructure
>Reporter: Todd Lipcon
>Assignee: Arvind Prabhakar
>
> There's a bug in DataNucleus that causes this issue:
> http://www.jpox.org/servlet/jira/browse/NUCCORE-371
> To reproduce, simply put your hive source tree in a directory that contains a 
> '+' character.

[jira] Commented: (HIVE-1176) 'create if not exists' fails for a table name with 'select' in it

2010-05-30 Thread Arvind Prabhakar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12873557#action_12873557
 ] 

Arvind Prabhakar commented on HIVE-1176:


Updated the patch so that it applies cleanly to trunk.

*Changes from Previous Patch:*

* This patch uses ivy to download the updated datanucleus plugins and other 
dependent libraries. There is no need to use the previously supplied tar.gz 
anymore.
* At the time the previous patch was written, the enhancer plugin version was 
2.0.1, which enabled annotation processing by default. Since then, version 
2.0.3 has been released, which [disables this 
behavior|http://www.datanucleus.org/servlet/jira/browse/NUCENHANCER-56]. Hence 
the previously submitted changes to {{javac.args}} in the build.properties file 
are no longer necessary. This patch uses the updated version of the DataNucleus 
enhancer plugin.
* The JDO2 API library used by the DataNucleus plugins is distributed through 
DataNucleus's public Maven repository. This repository has been added to the 
ivy configuration to automate the download.
* The connection pool libraries have been updated to work with the newer 
datanucleus plugins. Library {{commons-dbcp}} has been updated from 1.2.2 to 
1.4, {{commons-pool}} from 1.2 to 1.5.4, and {{datanucleus-connectionpool}} 
from 1.0.2 to 2.0.1.
* As with the previously submitted patch, the {{HiveMetaStoreClient}} 
implementation has been modified to create deep copies of non-primitive objects 
returned from the Thrift server, in order to avoid semi-populated proxies 
causing NPEs downstream.
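The deep-copy idea in the last item can be illustrated with a small Java sketch. {{TableDesc}}, {{ColumnDesc}} and {{ClientCopies}} are hypothetical stand-ins, not the real metastore Thrift types or the actual {{HiveMetaStoreClient}} code: the copy constructor reads and copies every nested non-primitive field, so the object handed to callers is fully populated and shares no mutable state with the object the client received.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a metastore column descriptor. */
class ColumnDesc {
    String name;
    String type;

    ColumnDesc(String name, String type) {
        this.name = name;
        this.type = type;
    }

    ColumnDesc(ColumnDesc other) { // copy constructor
        this(other.name, other.type);
    }
}

/** Hypothetical stand-in for a table object returned by the metastore. */
class TableDesc {
    String tableName;
    final List<ColumnDesc> columns = new ArrayList<>();

    TableDesc(String tableName) {
        this.tableName = tableName;
    }

    /** Deep copy: nested objects are copied as well, so nothing mutable is shared with the source. */
    TableDesc(TableDesc other) {
        this.tableName = other.tableName; // Strings are immutable, so sharing them is safe
        for (ColumnDesc c : other.columns) {
            this.columns.add(new ColumnDesc(c));
        }
    }
}

class ClientCopies {
    /** Returns deep copies of the objects received, instead of the received objects themselves. */
    static List<TableDesc> deepCopyAll(List<TableDesc> received) {
        List<TableDesc> copies = new ArrayList<>(received.size());
        for (TableDesc t : received) {
            copies.add(new TableDesc(t));
        }
        return copies;
    }
}
```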

*Testing Done:*

Built and ran all tests. As before, only two failures were reported: the 
clientpositive tests for input20.q and input33.q. These tests appear to fail on 
trunk as well.


> 'create if not exists' fails for a table name with 'select' in it
> -
>
> Key: HIVE-1176
> URL: https://issues.apache.org/jira/browse/HIVE-1176
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore, Query Processor
>Reporter: Prasad Chakka
>Assignee: Arvind Prabhakar
> Attachments: HIVE-1176-1.patch, HIVE-1176.lib-files.tar.gz, 
> HIVE-1176.patch
>
>
> hive> create table if not exists tmp_select(s string, c string, n int);
> org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got 
> exception: javax.jdo.JDOUserException JDOQL Single-String query should always 
> start with SELECT)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:441)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:423)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:5538)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5192)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:105)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:275)
> at org.apache.hadoop.hive.ql.Driver.runCommand(Driver.java:320)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:312)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> Caused by: MetaException(message:Got exception: javax.jdo.JDOUserException 
> JDOQL Single-String query should always start with SELECT)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.logAndThrowMetaException(MetaStoreUtils.java:612)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTables(HiveMetaStoreClient.java:450)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:439)
> ... 15 more

[jira] Updated: (HIVE-1176) 'create if not exists' fails for a table name with 'select' in it

2010-05-30 Thread Arvind Prabhakar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvind Prabhakar updated HIVE-1176:
---

Attachment: HIVE-1176-1.patch

This patch replaces the previously submitted files (HIVE-1176.lib-files.tar.gz 
and HIVE-1176.patch).

{code}
HIVE-1176-1.patch:

#   modified:   build.properties
#   modified:   build.xml
#   modified:   eclipse-templates/.classpath
#   modified:   ivy/ivysettings.xml
#   deleted:    lib/datanucleus-core-1.1.2.LICENSE
#   deleted:    lib/datanucleus-core-1.1.2.jar
#   deleted:    lib/datanucleus-enhancer-1.1.2.LICENSE
#   deleted:    lib/datanucleus-enhancer-1.1.2.jar
#   deleted:    lib/datanucleus-rdbms-1.1.2.LICENSE
#   deleted:    lib/datanucleus-rdbms-1.1.2.jar
#   deleted:    lib/jdo2-api-2.3-SNAPSHOT.LICENSE
#   deleted:    lib/jdo2-api-2.3-SNAPSHOT.jar
#   modified:   metastore/ivy.xml
#   modified:   metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
{code}

> 'create if not exists' fails for a table name with 'select' in it
> -
>
> Key: HIVE-1176
> URL: https://issues.apache.org/jira/browse/HIVE-1176
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore, Query Processor
>Reporter: Prasad Chakka
>Assignee: Arvind Prabhakar
> Attachments: HIVE-1176-1.patch, HIVE-1176.lib-files.tar.gz, 
> HIVE-1176.patch
>
>
> hive> create table if not exists tmp_select(s string, c string, n int);
> org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got 
> exception: javax.jdo.JDOUserException JDOQL Single-String query should always 
> start with SELECT)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:441)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:423)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:5538)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5192)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:105)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:275)
> at org.apache.hadoop.hive.ql.Driver.runCommand(Driver.java:320)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:312)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> Caused by: MetaException(message:Got exception: javax.jdo.JDOUserException 
> JDOQL Single-String query should always start with SELECT)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.logAndThrowMetaException(MetaStoreUtils.java:612)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTables(HiveMetaStoreClient.java:450)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:439)
> ... 15 more

[jira] Assigned: (HIVE-80) Allow Hive Server to run multiple queries simulteneously

2010-05-27 Thread Arvind Prabhakar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-80?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvind Prabhakar reassigned HIVE-80:


Assignee: Arvind Prabhakar  (was: Neil Conway)

> Allow Hive Server to run multiple queries simulteneously
> 
>
> Key: HIVE-80
> URL: https://issues.apache.org/jira/browse/HIVE-80
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Server Infrastructure
>Reporter: Raghotham Murthy
>Assignee: Arvind Prabhakar
>Priority: Critical
> Attachments: hive_input_format_race-2.patch
>
>
> Can use one driver object per query.

[jira] Commented: (HIVE-1198) When checkstyle is activated for Hive in Eclipse environment, it shows all checkstyle problems as errors.

2010-05-27 Thread Arvind Prabhakar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12872296#action_12872296
 ] 

Arvind Prabhakar commented on HIVE-1198:


Ning - I have attached an updated patch (HIVE-1198-2.patch). The key difference 
in this patch is that it does not activate checkstyle by default. After you 
import the Hive project, if you wish to activate checkstyle, right-click the 
project and select Checkstyle > Activate Checkstyle from the context menu.

So if checkstyle is causing problems in your workbench, you can choose not to 
activate it. The steps to activate the checkstyle plugin in Eclipse are also 
documented in the README.txt file, right below the section on setting up 
Eclipse.

Can you give this patch a try and see if it resolves the problem you were 
facing?


> When checkstyle is activated for Hive in Eclipse environment, it shows all 
> checkstyle problems as errors.
> -
>
> Key: HIVE-1198
> URL: https://issues.apache.org/jira/browse/HIVE-1198
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Build Infrastructure
> Environment: Mac OS X (10.6.2), Eclipse 3.5.1.R35, Checkstyle Plugin 
> 5.1.0.201002232103 (latest eclipse and checkstyle build as of 02/2010)
>Reporter: Arvind Prabhakar
>Assignee: Arvind Prabhakar
>Priority: Minor
> Attachments: HIVE-1198-1.patch, HIVE-1198-2.patch, HIVE-1198.patch
>
>
> As of now, checkstyle plugin reports all problems as errors. This causes an 
> overwhelming number of errors to show up (3000+) which masks real errors that 
> might be there. Since all the checkstyle violations are not going to be fixed 
> in one shot, it is desirable to lower the severity of checkstyle violations 
> to warnings so that the plugin can be kept enabled. This will encourage 
> developers to spot checkstyle violations in the files they touch and 
> potentially fix them as they go along, along with pointing out violations as 
> they code.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HIVE-1198) When checkstyle is activated for Hive in Eclipse environment, it shows all checkstyle problems as errors.

2010-05-27 Thread Arvind Prabhakar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvind Prabhakar updated HIVE-1198:
---

Attachment: HIVE-1198-2.patch

> When checkstyle is activated for Hive in Eclipse environment, it shows all 
> checkstyle problems as errors.
> -
>
> Key: HIVE-1198
> URL: https://issues.apache.org/jira/browse/HIVE-1198
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Build Infrastructure
> Environment: Mac OS X (10.6.2), Eclipse 3.5.1.R35, Checkstyle Plugin 
> 5.1.0.201002232103 (latest eclipse and checkstyle build as of 02/2010)
>Reporter: Arvind Prabhakar
>Assignee: Arvind Prabhakar
>Priority: Minor
> Attachments: HIVE-1198-1.patch, HIVE-1198-2.patch, HIVE-1198.patch
>
>
> As of now, checkstyle plugin reports all problems as errors. This causes an 
> overwhelming number of errors to show up (3000+) which masks real errors that 
> might be there. Since all the checkstyle violations are not going to be fixed 
> in one shot, it is desirable to lower the severity of checkstyle violations 
> to warnings so that the plugin can be kept enabled. This will encourage 
> developers to spot checkstyle violations in the files they touch and 
> potentially fix them as they go along, along with pointing out violations as 
> they code.

[jira] Commented: (HIVE-80) Allow Hive Server to run multiple queries simulteneously

2010-05-27 Thread Arvind Prabhakar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-80?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12872282#action_12872282
 ] 

Arvind Prabhakar commented on HIVE-80:
--

This sounds like a good plan. If Neil is not actively working on this issue, I 
can move this to my queue and start working on it. 

> Allow Hive Server to run multiple queries simulteneously
> 
>
> Key: HIVE-80
> URL: https://issues.apache.org/jira/browse/HIVE-80
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Server Infrastructure
>Reporter: Raghotham Murthy
>Assignee: Neil Conway
>Priority: Critical
> Attachments: hive_input_format_race-2.patch
>
>
> Can use one driver object per query.
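One driver object per query can be sketched as follows. QueryDriver is a hypothetical stand-in, not Hive's actual Driver class; the point is only that each submitted query constructs its own driver, so concurrent queries never share a driver's mutable state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class DriverPerQuerySketch {
  /** Hypothetical stand-in for a query driver that keeps per-query mutable state. */
  static final class QueryDriver {
    private final StringBuilder log = new StringBuilder();  // per-query state, never shared

    String run(String query) {
      log.append("compiled:").append(query);
      return "result of " + query;
    }
  }

  /** Runs every query on a thread pool, creating one driver per query. */
  static List<String> runAll(List<String> queries, int threads) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<String>> futures = new ArrayList<>();
      for (String q : queries) {
        // A fresh driver per query, so concurrent queries share no driver state.
        futures.add(pool.submit(() -> new QueryDriver().run(q)));
      }
      List<String> results = new ArrayList<>();
      for (Future<String> f : futures) {
        results.add(f.get());  // results are collected in submission order
      }
      return results;
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    // prints [result of select 1, result of select 2]
    System.out.println(runAll(List.of("select 1", "select 2"), 2));
  }
}
```

The alternative, a single shared driver guarded by a lock, would serialize queries, which is exactly what this issue wants to avoid.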

[jira] Assigned: (HIVE-802) Bug in DataNucleus prevents Hive from building if inside a dir with '+' in it

2010-05-26 Thread Arvind Prabhakar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvind Prabhakar reassigned HIVE-802:
-

Assignee: Arvind Prabhakar

> Bug in DataNucleus prevents Hive from building if inside a dir with '+' in it
> -
>
> Key: HIVE-802
> URL: https://issues.apache.org/jira/browse/HIVE-802
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Build Infrastructure
>Reporter: Todd Lipcon
>Assignee: Arvind Prabhakar
>
> There's a bug in DataNucleus that causes this issue:
> http://www.jpox.org/servlet/jira/browse/NUCCORE-371
> To reproduce, simply put your hive source tree in a directory that contains a 
> '+' character.
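Whether NUCCORE-371 has exactly this cause is an assumption, but a common way a '+' in a directory name breaks Java code is decoding a file URL with java.net.URLDecoder, which applies form-encoding rules and turns '+' into a space, while java.net.URI leaves '+' in the path intact. The path and the PlusInPathSketch class below are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLDecoder;

public class PlusInPathSketch {
  /** Form decoding, where '+' means a space: corrupts paths that contain '+'. */
  static String formDecode(String url) throws UnsupportedEncodingException {
    return URLDecoder.decode(url, "UTF-8");
  }

  /** URI path decoding only unescapes %XX sequences, so '+' survives. */
  static String uriPath(String url) throws URISyntaxException {
    return new URI(url).getPath();
  }

  public static void main(String[] args) throws Exception {
    String url = "file:/home/dev/hive+trunk/build/metastore.jar";  // hypothetical path
    System.out.println(formDecode(url)); // file:/home/dev/hive trunk/build/metastore.jar
    System.out.println(uriPath(url));    // /home/dev/hive+trunk/build/metastore.jar
  }
}
```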

[jira] Commented: (HIVE-802) Bug in DataNucleus prevents Hive from building if inside a dir with '+' in it

2010-05-26 Thread Arvind Prabhakar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12872029#action_12872029
 ] 

Arvind Prabhakar commented on HIVE-802:
---

The patch submitted for HIVE-1176 would upgrade the DataNucleus plugin to the 
latest stable version, which includes a fix for this issue.

> Bug in DataNucleus prevents Hive from building if inside a dir with '+' in it
> -
>
> Key: HIVE-802
> URL: https://issues.apache.org/jira/browse/HIVE-802
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Build Infrastructure
>Reporter: Todd Lipcon
>
> There's a bug in DataNucleus that causes this issue:
> http://www.jpox.org/servlet/jira/browse/NUCCORE-371
> To reproduce, simply put your hive source tree in a directory that contains a 
> '+' character.

[jira] Commented: (HIVE-1198) When checkstyle is activated for Hive in Eclipse environment, it shows all checkstyle problems as errors.

2010-05-26 Thread Arvind Prabhakar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871968#action_12871968
 ] 

Arvind Prabhakar commented on HIVE-1198:


I just installed a freshly downloaded Eclipse on Ubuntu Desktop 9.10, with Java 
1.6.0_20 and Checkstyle 5.1. The Eclipse build is 20100218-1602 (the latest 
Galileo SR2 build). I was able to import the project in under 45 seconds. 

Since you are using Eclipse 3.6, which has not been released yet, that may be 
why you are seeing this problem. Can you try reproducing this issue with a 
stable release of Eclipse such as Galileo SR2?

> When checkstyle is activated for Hive in Eclipse environment, it shows all 
> checkstyle problems as errors.
> -
>
> Key: HIVE-1198
> URL: https://issues.apache.org/jira/browse/HIVE-1198
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Build Infrastructure
> Environment: Mac OS X (10.6.2), Eclipse 3.5.1.R35, Checkstyle Plugin 
> 5.1.0.201002232103 (latest eclipse and checkstyle build as of 02/2010)
>Reporter: Arvind Prabhakar
>Assignee: Arvind Prabhakar
>Priority: Minor
> Attachments: HIVE-1198-1.patch, HIVE-1198.patch
>
>
> As of now, checkstyle plugin reports all problems as errors. This causes an 
> overwhelming number of errors to show up (3000+) which masks real errors that 
> might be there. Since all the checkstyle violations are not going to be fixed 
> in one shot, it is desirable to lower the severity of checkstyle violations 
> to warnings so that the plugin can be kept enabled. This will encourage 
> developers to spot checkstyle violations in the files they touch and 
> potentially fix them as they go along, along with pointing out violations as 
> they code.

[jira] Commented: (HIVE-1198) When checkstyle is activated for Hive in Eclipse environment, it shows all checkstyle problems as errors.

2010-05-26 Thread Arvind Prabhakar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871952#action_12871952
 ] 

Arvind Prabhakar commented on HIVE-1198:


I will try to reproduce this on a Linux box and note any findings in the 
comments.

> When checkstyle is activated for Hive in Eclipse environment, it shows all 
> checkstyle problems as errors.
> -
>
> Key: HIVE-1198
> URL: https://issues.apache.org/jira/browse/HIVE-1198
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Build Infrastructure
> Environment: Mac OS X (10.6.2), Eclipse 3.5.1.R35, Checkstyle Plugin 
> 5.1.0.201002232103 (latest eclipse and checkstyle build as of 02/2010)
>Reporter: Arvind Prabhakar
>Assignee: Arvind Prabhakar
>Priority: Minor
> Attachments: HIVE-1198-1.patch, HIVE-1198.patch
>
>
> As of now, checkstyle plugin reports all problems as errors. This causes an 
> overwhelming number of errors to show up (3000+) which masks real errors that 
> might be there. Since all the checkstyle violations are not going to be fixed 
> in one shot, it is desirable to lower the severity of checkstyle violations 
> to warnings so that the plugin can be kept enabled. This will encourage 
> developers to spot checkstyle violations in the files they touch and 
> potentially fix them as they go along, along with pointing out violations as 
> they code.

[jira] Commented: (HIVE-1198) When checkstyle is activated for Hive in Eclipse environment, it shows all checkstyle problems as errors.

2010-05-26 Thread Arvind Prabhakar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871882#action_12871882
 ] 

Arvind Prabhakar commented on HIVE-1198:


I do not see any slowdown, Ning. I tested it just now, and the project imports 
and builds in under 40 seconds. 

Did you run the ant package, model-jar and gen-test targets before importing the 
project into Eclipse? Without them, Eclipse will not find the necessary 
classpath entries, and that could lead to a slowdown.

> When checkstyle is activated for Hive in Eclipse environment, it shows all 
> checkstyle problems as errors.
> -
>
> Key: HIVE-1198
> URL: https://issues.apache.org/jira/browse/HIVE-1198
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Build Infrastructure
> Environment: Mac OS X (10.6.2), Eclipse 3.5.1.R35, Checkstyle Plugin 
> 5.1.0.201002232103 (latest eclipse and checkstyle build as of 02/2010)
>Reporter: Arvind Prabhakar
>Assignee: Arvind Prabhakar
>Priority: Minor
> Attachments: HIVE-1198-1.patch, HIVE-1198.patch
>
>
> As of now, checkstyle plugin reports all problems as errors. This causes an 
> overwhelming number of errors to show up (3000+) which masks real errors that 
> might be there. Since all the checkstyle violations are not going to be fixed 
> in one shot, it is desirable to lower the severity of checkstyle violations 
> to warnings so that the plugin can be kept enabled. This will encourage 
> developers to spot checkstyle violations in the files they touch and 
> potentially fix them as they go along, along with pointing out violations as 
> they code.

[jira] Commented: (HIVE-1198) When checkstyle is activated for Hive in Eclipse environment, it shows all checkstyle problems as errors.

2010-05-26 Thread Arvind Prabhakar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871830#action_12871830
 ] 

Arvind Prabhakar commented on HIVE-1198:


Updated the patch so that it applies cleanly to trunk. It would be great to 
have this patch committed, as it really helps in using Eclipse effectively.

> When checkstyle is activated for Hive in Eclipse environment, it shows all 
> checkstyle problems as errors.
> -
>
> Key: HIVE-1198
> URL: https://issues.apache.org/jira/browse/HIVE-1198
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Build Infrastructure
> Environment: Mac OS X (10.6.2), Eclipse 3.5.1.R35, Checkstyle Plugin 
> 5.1.0.201002232103 (latest eclipse and checkstyle build as of 02/2010)
>Reporter: Arvind Prabhakar
>Assignee: Arvind Prabhakar
>Priority: Minor
> Attachments: HIVE-1198-1.patch, HIVE-1198.patch
>
>
> As of now, checkstyle plugin reports all problems as errors. This causes an 
> overwhelming number of errors to show up (3000+) which masks real errors that 
> might be there. Since all the checkstyle violations are not going to be fixed 
> in one shot, it is desirable to lower the severity of checkstyle violations 
> to warnings so that the plugin can be kept enabled. This will encourage 
> developers to spot checkstyle violations in the files they touch and 
> potentially fix them as they go along, along with pointing out violations as 
> they code.

[jira] Updated: (HIVE-1198) When checkstyle is activated for Hive in Eclipse environment, it shows all checkstyle problems as errors.

2010-05-26 Thread Arvind Prabhakar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvind Prabhakar updated HIVE-1198:
---

Attachment: HIVE-1198-1.patch

> When checkstyle is activated for Hive in Eclipse environment, it shows all 
> checkstyle problems as errors.
> -
>
> Key: HIVE-1198
> URL: https://issues.apache.org/jira/browse/HIVE-1198
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Build Infrastructure
> Environment: Mac OS X (10.6.2), Eclipse 3.5.1.R35, Checkstyle Plugin 
> 5.1.0.201002232103 (latest eclipse and checkstyle build as of 02/2010)
>Reporter: Arvind Prabhakar
>Assignee: Arvind Prabhakar
>Priority: Minor
> Attachments: HIVE-1198-1.patch, HIVE-1198.patch
>
>
> As of now, checkstyle plugin reports all problems as errors. This causes an 
> overwhelming number of errors to show up (3000+) which masks real errors that 
> might be there. Since all the checkstyle violations are not going to be fixed 
> in one shot, it is desirable to lower the severity of checkstyle violations 
> to warnings so that the plugin can be kept enabled. This will encourage 
> developers to spot checkstyle violations in the files they touch and 
> potentially fix them as they go along, along with pointing out violations as 
> they code.

[jira] Commented: (HIVE-287) count distinct on multiple columns does not work

2010-05-25 Thread Arvind Prabhakar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871494#action_12871494
 ] 

Arvind Prabhakar commented on HIVE-287:
---

Modified the implementation per review feedback: introduced an abstract base 
class and reverted the resolvers to extend it instead of directly implementing 
the new functionality.

> count distinct on multiple columns does not work
> 
>
> Key: HIVE-287
> URL: https://issues.apache.org/jira/browse/HIVE-287
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Arvind Prabhakar
> Fix For: 0.6.0
>
> Attachments: HIVE-287-1.patch, HIVE-287-2.patch
>
>
> The following query does not work:
> select count(distinct col1, col2) from Tbl

[jira] Updated: (HIVE-287) count distinct on multiple columns does not work

2010-05-25 Thread Arvind Prabhakar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvind Prabhakar updated HIVE-287:
--

Status: Patch Available  (was: Open)

> count distinct on multiple columns does not work
> 
>
> Key: HIVE-287
> URL: https://issues.apache.org/jira/browse/HIVE-287
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Arvind Prabhakar
> Fix For: 0.6.0
>
> Attachments: HIVE-287-1.patch, HIVE-287-2.patch
>
>
> The following query does not work:
> select count(distinct col1, col2) from Tbl

[jira] Updated: (HIVE-287) count distinct on multiple columns does not work

2010-05-25 Thread Arvind Prabhakar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvind Prabhakar updated HIVE-287:
--

Attachment: HIVE-287-2.patch

> count distinct on multiple columns does not work
> 
>
> Key: HIVE-287
> URL: https://issues.apache.org/jira/browse/HIVE-287
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Arvind Prabhakar
> Fix For: 0.6.0
>
> Attachments: HIVE-287-1.patch, HIVE-287-2.patch
>
>
> The following query does not work:
> select count(distinct col1, col2) from Tbl

[jira] Commented: (HIVE-1179) Add UDF array_contains

2010-05-25 Thread Arvind Prabhakar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871331#action_12871331
 ] 

Arvind Prabhakar commented on HIVE-1179:


I took a quick look at the various Operator implementations and found that the 
ones that store evaluated expression results end up creating copies anyway, via 
{{ObjectInspectorUtils.copyToStandardObject()}}. So although the system appears 
to work normally when the UDF reuses its result object, code elsewhere in the 
system is forced to do the defensive copying. 

To clarify, my concern is not about a problem that currently exists, but about 
the potential problems that could arise from not making defensive copies of 
mutable objects. If you are certain that this does not apply to the Hive 
implementation, then the updated patch should be fine to commit.
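Why consumers that buffer values must copy them can be shown with a small, self-contained sketch. The MutableBoolean class and the buffer method are hypothetical; they only mimic, in spirit, the situation that a copy such as copyToStandardObject() guards against, and they are not Hive APIs.

```java
import java.util.ArrayList;
import java.util.List;

public class DefensiveCopySketch {
  /** Hypothetical mutable value, reused by its producer across rows. */
  static final class MutableBoolean {
    private boolean value;
    void set(boolean v) { value = v; }
    boolean get() { return value; }

    /** Returns an independent object holding the current value. */
    MutableBoolean copy() {
      MutableBoolean c = new MutableBoolean();
      c.set(value);
      return c;
    }
  }

  /** Buffers one result per row; copies each result first when copyOnBuffer is true. */
  static List<Boolean> buffer(boolean[] rowResults, boolean copyOnBuffer) {
    MutableBoolean shared = new MutableBoolean();  // producer reuses one instance
    List<MutableBoolean> buffered = new ArrayList<>();
    for (boolean r : rowResults) {
      shared.set(r);
      buffered.add(copyOnBuffer ? shared.copy() : shared);
    }
    List<Boolean> out = new ArrayList<>();
    for (MutableBoolean b : buffered) {
      out.add(b.get());  // read the values only after all rows were produced
    }
    return out;
  }

  public static void main(String[] args) {
    boolean[] rows = {true, false, true, false};
    System.out.println(buffer(rows, false)); // [false, false, false, false]
    System.out.println(buffer(rows, true));  // [true, false, true, false]
  }
}
```

Without the copy, every buffered entry is the same object and shows only the last row's value; the copy restores correctness at the cost of one allocation per buffered row.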

> Add UDF array_contains
> --
>
> Key: HIVE-1179
> URL: https://issues.apache.org/jira/browse/HIVE-1179
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Zheng Shao
>Assignee: Arvind Prabhakar
> Attachments: HIVE-1179-1.patch, HIVE-1179-2.patch, HIVE-1179-3.patch, 
> HIVE-1179.patch
>
>
> Returns true or false, depending on whether an element is in an array.
> {{array_contains(T element, array<T> theArray)}}

[jira] Updated: (HIVE-1179) Add UDF array_contains

2010-05-24 Thread Arvind Prabhakar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvind Prabhakar updated HIVE-1179:
---

Status: Patch Available  (was: Open)

Paul, I have updated the patch. I do not have an example query that produces 
this failure today, since no UDAF that accepts boolean input currently does 
batch processing. My concern was primarily about creating defensive copies to 
guard against inadvertent mutation. 

> Add UDF array_contains
> --
>
> Key: HIVE-1179
> URL: https://issues.apache.org/jira/browse/HIVE-1179
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Zheng Shao
>Assignee: Arvind Prabhakar
> Attachments: HIVE-1179-1.patch, HIVE-1179-2.patch, HIVE-1179-3.patch, 
> HIVE-1179.patch
>
>
> Returns true or false, depending on whether an element is in an array.
> {{array_contains(T element, array<T> theArray)}}

[jira] Updated: (HIVE-1179) Add UDF array_contains

2010-05-24 Thread Arvind Prabhakar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvind Prabhakar updated HIVE-1179:
---

Attachment: HIVE-1179-3.patch

> Add UDF array_contains
> --
>
> Key: HIVE-1179
> URL: https://issues.apache.org/jira/browse/HIVE-1179
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Zheng Shao
>Assignee: Arvind Prabhakar
> Attachments: HIVE-1179-1.patch, HIVE-1179-2.patch, HIVE-1179-3.patch, 
> HIVE-1179.patch
>
>
> Returns true or false, depending on whether an element is in an array.
> {{array_contains(T element, array<T> theArray)}}

[jira] Commented: (HIVE-287) count distinct on multiple columns does not work

2010-05-24 Thread Arvind Prabhakar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12870860#action_12870860
 ] 

Arvind Prabhakar commented on HIVE-287:
---

Thanks for taking a look at this patch, Namit. I have some questions and 
clarifications regarding your feedback:

bq. 1. This should be independent of COUNT, so basically all aggregation 
functions should be supported with DISTINCT.
For example: select avg(distinct c1,c2) from T

Not sure how this relates to the change I made. Even before this change, the 
DISTINCT qualifier was allowed for any function invocation. Can you elaborate 
on what you mean? Specifically, which part of the patch needs to change to 
accommodate this request?

bq. 2. It would be a good idea to maintain some compatibility for the existing 
interface - so, can we add another method to UDAFResolver, which has the new 
API - and a common class which invokes the default implementation, that would 
be better.

Here is how I understand your suggestion: add a new method to the 
GenericUDAFResolver interface while keeping the old method; create an abstract 
base class that implements the new interface method and invokes the old method, 
dropping the isDistinct/isAllColumn arguments; and extend the current resolvers 
so that they override this method. Will this address your concern? If not, can 
you provide a concrete example?
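The arrangement described above can be sketched in plain Java. All names here (LegacyResolver, ExtendedResolver, AbstractResolver, SumResolver, CountResolver) are hypothetical, evaluators are represented as strings to keep the example self-contained, and this is not Hive's actual GenericUDAFResolver API.

```java
import java.util.Arrays;

/** Hypothetical stand-in for the original resolver contract. */
interface LegacyResolver {
  String getEvaluator(String[] parameterTypes);
}

/** Hypothetical extended contract that also receives the DISTINCT and '*' flags. */
interface ExtendedResolver extends LegacyResolver {
  String getEvaluator(String[] parameterTypes, boolean isDistinct, boolean isAllColumns);
}

/** Base class: the new method drops the extra flags and delegates to the old one. */
abstract class AbstractResolver implements ExtendedResolver {
  @Override
  public String getEvaluator(String[] parameterTypes, boolean isDistinct, boolean isAllColumns) {
    return getEvaluator(parameterTypes);
  }
}

/** An existing resolver: implements only the old method and keeps working. */
class SumResolver extends AbstractResolver {
  @Override
  public String getEvaluator(String[] parameterTypes) {
    return "sum" + Arrays.toString(parameterTypes);
  }
}

/** A resolver whose behavior depends on the flags overrides the new method. */
class CountResolver extends AbstractResolver {
  @Override
  public String getEvaluator(String[] parameterTypes) {
    return getEvaluator(parameterTypes, false, false);
  }

  @Override
  public String getEvaluator(String[] parameterTypes, boolean isDistinct, boolean isAllColumns) {
    if (isAllColumns) {
      return "count(*)";
    }
    return (isDistinct ? "count(distinct " : "count(") + String.join(", ", parameterTypes) + ")";
  }
}

public class ResolverSketch {
  public static void main(String[] args) {
    ExtendedResolver sum = new SumResolver();
    ExtendedResolver count = new CountResolver();
    System.out.println(sum.getEvaluator(new String[] {"int"}, true, false));             // sum[int]
    System.out.println(count.getEvaluator(new String[] {"int", "string"}, true, false)); // count(distinct int, string)
    System.out.println(count.getEvaluator(new String[0], false, true));                  // count(*)
  }
}
```

The design choice is that callers always go through the three-argument method, while resolvers that do not care about DISTINCT or * never have to change.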

bq. 3. Follows from 1 - more tests are needed

Are you suggesting more tests for the array_contains UDF, or more tests for 
other UDFs? Please clarify with examples if possible.

> count distinct on multiple columns does not work
> 
>
> Key: HIVE-287
> URL: https://issues.apache.org/jira/browse/HIVE-287
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Arvind Prabhakar
> Fix For: 0.6.0
>
> Attachments: HIVE-287-1.patch
>
>
> The following query does not work:
> select count(distinct col1, col2) from Tbl

[jira] Commented: (HIVE-1179) Add UDF array_contains

2010-05-23 Thread Arvind Prabhakar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12870475#action_12870475
 ] 

Arvind Prabhakar commented on HIVE-1179:


bq. One minor point - can you make result a member variable of 
GenericUDFArrayContains? This will reduce object creation.

While this would reduce object creation, it would cause correctness problems 
when this UDF is used in an aggregate operation. With a member variable for 
{{result}}, every value of the aggregated output would reflect the evaluated 
value of the last row. A similar problem would occur if there is a lag between 
collecting and processing output values. Hence my preference is to keep the 
implementation as is (stateless).

If you would still like to make it a member variable, please let me know and I 
can make that change. 
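The correctness concern can be reproduced with a minimal sketch that contrasts a reused member-variable result with a stateless one. MutableBoolean is a hypothetical holder standing in for a mutable result object, ReusingContains and StatelessContains are illustrative versions of an array_contains-style check, and collectThenRead models a consumer that keeps references and reads them later; none of these are the actual GenericUDFArrayContains code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

/** Hypothetical mutable result holder. */
final class MutableBoolean {
  private boolean value;
  void set(boolean v) { value = v; }
  boolean get() { return value; }
}

/** Reuses one result object across calls (the member-variable approach). */
final class ReusingContains {
  private final MutableBoolean result = new MutableBoolean();

  MutableBoolean evaluate(int element, int[] array) {
    boolean found = false;
    for (int x : array) {
      if (x == element) { found = true; break; }
    }
    result.set(found);
    return result;
  }
}

/** Allocates a fresh result per call (the stateless approach). */
final class StatelessContains {
  MutableBoolean evaluate(int element, int[] array) {
    MutableBoolean result = new MutableBoolean();
    for (int x : array) {
      if (x == element) { result.set(true); break; }
    }
    return result;
  }
}

public class ReuseSketch {
  /** Collects one result per row without copying, then reads them all afterwards. */
  static List<Boolean> collectThenRead(BiFunction<Integer, int[], MutableBoolean> f) {
    int[] array = {1, 2, 3};
    int[] rows = {2, 5};  // the first row matches, the second does not
    List<MutableBoolean> buffered = new ArrayList<>();
    for (int row : rows) {
      buffered.add(f.apply(row, array));  // keeps the reference only
    }
    List<Boolean> seen = new ArrayList<>();
    for (MutableBoolean b : buffered) {
      seen.add(b.get());
    }
    return seen;
  }

  public static void main(String[] args) {
    System.out.println(collectThenRead(new ReusingContains()::evaluate));   // [false, false]
    System.out.println(collectThenRead(new StatelessContains()::evaluate)); // [true, false]
  }
}
```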

> Add UDF array_contains
> --
>
> Key: HIVE-1179
> URL: https://issues.apache.org/jira/browse/HIVE-1179
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Zheng Shao
>Assignee: Arvind Prabhakar
> Attachments: HIVE-1179-1.patch, HIVE-1179-2.patch, HIVE-1179.patch
>
>
> Returns true or false, depending on whether an element is in an array.
> {{array_contains(T element, array<T> theArray)}}

[jira] Updated: (HIVE-1179) Add UDF array_contains

2010-05-23 Thread Arvind Prabhakar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvind Prabhakar updated HIVE-1179:
---

Attachment: HIVE-1179-2.patch

Updated the patch to work with current trunk.

> Add UDF array_contains
> --
>
> Key: HIVE-1179
> URL: https://issues.apache.org/jira/browse/HIVE-1179
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Zheng Shao
>Assignee: Arvind Prabhakar
> Attachments: HIVE-1179-1.patch, HIVE-1179-2.patch, HIVE-1179.patch
>
>
> Returns true or false, depending on whether an element is in an array.
> {{array_contains(T element, array<T> theArray)}}

[jira] Assigned: (HIVE-1198) When checkstyle is activated for Hive in Eclipse environment, it shows all checkstyle problems as errors.

2010-05-20 Thread Arvind Prabhakar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvind Prabhakar reassigned HIVE-1198:
--

Assignee: Arvind Prabhakar

> When checkstyle is activated for Hive in Eclipse environment, it shows all 
> checkstyle problems as errors.
> -
>
> Key: HIVE-1198
> URL: https://issues.apache.org/jira/browse/HIVE-1198
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Build Infrastructure
> Environment: Mac OS X (10.6.2), Eclipse 3.5.1.R35, Checkstyle Plugin 
> 5.1.0.201002232103 (latest eclipse and checkstyle build as of 02/2010)
>Reporter: Arvind Prabhakar
>Assignee: Arvind Prabhakar
>Priority: Minor
> Attachments: HIVE-1198.patch
>
>
> As of now, checkstyle plugin reports all problems as errors. This causes an 
> overwhelming number of errors to show up (3000+) which masks real errors that 
> might be there. Since all the checkstyle violations are not going to be fixed 
> in one shot, it is desirable to lower the severity of checkstyle violations 
> to warnings so that the plugin can be kept enabled. This will encourage 
> developers to spot checkstyle violations in the files they touch and 
> potentially fix them as they go along, along with pointing out violations as 
> they code.

[jira] Updated: (HIVE-287) count distinct on multiple columns does not work

2010-05-20 Thread Arvind Prabhakar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvind Prabhakar updated HIVE-287:
--

   Status: Patch Available  (was: Open)
Fix Version/s: 0.6.0

*Summary*
This patch fixes the {{count()}} aggregate function to be consistent with SQL. 
Specifically:
* Adds support for {{SELECT count(*) FROM table}} queries, which return the 
total number of rows in the table.
* Extends {{count()}} to accept a list of multiple expressions. 
{{count(DISTINCT expr1, expr2, ...)}} returns the number of distinct value 
combinations among rows in which all of the evaluated expressions are non-NULL.

*Details*
* Modified the HiveQL grammar to allow a function invocation with a single * in 
place of the parameter list.
* Propagated the use of * as the parameter, or of the {{DISTINCT}} keyword, 
through the UDF resolver framework so that UDFs whose behavior depends on these 
can use this information.
* Modified the {{count()}} UDAF to handle NULL values with the same semantics as 
standard SQL.
* Added a test case that specifically exercises the newly introduced semantics 
of the count UDAF.

*Testing*
Ran all tests. There were only two failures (input20.q, input33.q), and both 
also fail on the local trunk image without this patch.

Once this patch is committed to trunk, I will update the Hive Wiki with details 
and examples of using the {{count()}} UDAF in its various forms.
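The NULL-handling rules described in the summary can be modeled outside Hive with a small Java sketch. It assumes standard SQL semantics (count(*) counts every row, count(expr) skips rows where expr is NULL, and count(DISTINCT e1, e2, ...) counts distinct value tuples among rows in which no expression is NULL); the CountSemantics class and its methods are hypothetical and are not Hive's count UDAF.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

public class CountSemantics {
  /** count(*): every row counts, NULLs included. */
  static long countStar(List<Object[]> rows) {
    return rows.size();
  }

  /** count(expr): rows where the given column is non-NULL. */
  static long countExpr(List<Object[]> rows, int column) {
    return rows.stream().filter(r -> r[column] != null).count();
  }

  /** count(DISTINCT e1, ..., en): distinct tuples among rows with no NULL value. */
  static long countDistinct(List<Object[]> rows) {
    Set<List<Object>> seen = new HashSet<>();
    for (Object[] row : rows) {
      if (Arrays.stream(row).allMatch(Objects::nonNull)) {
        seen.add(Arrays.asList(row));  // List equality and hashing are element-wise
      }
    }
    return seen.size();
  }

  public static void main(String[] args) {
    // Each array holds the evaluated (col1, col2) values of one row.
    List<Object[]> rows = Arrays.asList(
        new Object[] {1, "a"},
        new Object[] {1, "a"},
        new Object[] {1, "b"},
        new Object[] {null, "b"},
        new Object[] {2, null});
    System.out.println(countStar(rows));     // 5
    System.out.println(countExpr(rows, 0));  // 4
    System.out.println(countDistinct(rows)); // 2
  }
}
```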


> count distinct on multiple columns does not work
> 
>
> Key: HIVE-287
> URL: https://issues.apache.org/jira/browse/HIVE-287
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Arvind Prabhakar
> Fix For: 0.6.0
>
> Attachments: HIVE-287-1.patch
>
>
> The following query does not work:
> select count(distinct col1, col2) from Tbl

[jira] Updated: (HIVE-287) count distinct on multiple columns does not work

2010-05-20 Thread Arvind Prabhakar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvind Prabhakar updated HIVE-287:
--

Attachment: HIVE-287-1.patch

> count distinct on multiple columns does not work
> 
>
> Key: HIVE-287
> URL: https://issues.apache.org/jira/browse/HIVE-287
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Arvind Prabhakar
> Attachments: HIVE-287-1.patch
>
>
> The following query does not work:
> select count(distinct col1, col2) from Tbl

[jira] Assigned: (HIVE-1139) GroupByOperator sometimes throws OutOfMemory error when there are too many distinct keys

2010-05-18 Thread Arvind Prabhakar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-1139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvind Prabhakar reassigned HIVE-1139:
--

Assignee: Arvind Prabhakar  (was: Ning Zhang)

> GroupByOperator sometimes throws OutOfMemory error when there are too many 
> distinct keys
> 
>
> Key: HIVE-1139
> URL: https://issues.apache.org/jira/browse/HIVE-1139
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Ning Zhang
>Assignee: Arvind Prabhakar
>
> When a partial aggregation is performed on a mapper, a HashMap is created to 
> keep all distinct keys in main memory. This could lead to an OOM exception when 
> there are too many distinct keys for a particular mapper. A workaround is to 
> set the map split size smaller so that each mapper processes fewer rows. 
> A better solution is to use the persistent HashMapWrapper (currently used in 
> CommonJoinOperator) to spill overflow rows to disk. 
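The description proposes spilling overflow rows to disk with HashMapWrapper. The sketch below swaps in a simpler technique, for illustration only: it bounds the number of distinct keys held in memory and flushes partial aggregates downstream when the bound is reached. This is correct only because map-side aggregation is partial and the reduce side re-aggregates by key. BoundedPartialAggregator and maxKeys are hypothetical names, not Hive APIs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

/** Hypothetical map-side partial count per key, holding at most maxKeys keys in memory. */
public class BoundedPartialAggregator {
  private final int maxKeys;
  private final Map<String, Long> partialCounts = new HashMap<>();
  private final BiConsumer<String, Long> downstream;

  BoundedPartialAggregator(int maxKeys, BiConsumer<String, Long> downstream) {
    this.maxKeys = maxKeys;
    this.downstream = downstream;
  }

  void add(String key) {
    // Adding a new key would exceed the bound: emit all partial results first.
    if (!partialCounts.containsKey(key) && partialCounts.size() >= maxKeys) {
      flush();
    }
    partialCounts.merge(key, 1L, Long::sum);
  }

  /** Emits every partial count and clears the in-memory table. */
  void flush() {
    partialCounts.forEach(downstream);
    partialCounts.clear();
  }

  public static void main(String[] args) {
    List<String> emitted = new ArrayList<>();
    BoundedPartialAggregator agg =
        new BoundedPartialAggregator(2, (k, v) -> emitted.add(k + "=" + v));
    for (String key : new String[] {"a", "b", "a", "c", "a"}) {
      agg.add(key);
    }
    agg.flush();
    // A key may be emitted more than once; the reducer sums the partial counts
    // per key, giving a=3, b=1, c=1.
    System.out.println(emitted);
  }
}
```

Compared with spilling to disk, early flushing trades more map output for a hard memory bound, and it needs no disk I/O on the mapper.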

[jira] Assigned: (HIVE-287) count distinct on multiple columns does not work

2010-05-17 Thread Arvind Prabhakar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvind Prabhakar reassigned HIVE-287:
-

Assignee: Arvind Prabhakar

> count distinct on multiple columns does not work
> 
>
> Key: HIVE-287
> URL: https://issues.apache.org/jira/browse/HIVE-287
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Arvind Prabhakar
>
> The following query does not work:
> select count(distinct col1, col2) from Tbl

[jira] Updated: (HIVE-1029) typedbytes does not support nulls

2010-05-16 Thread Arvind Prabhakar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-1029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvind Prabhakar updated HIVE-1029:
---

Status: Patch Available  (was: Open)

This patch adds support for NULL types in TypedBytesSerDe.

> typedbytes does not support nulls
> -
>
> Key: HIVE-1029
> URL: https://issues.apache.org/jira/browse/HIVE-1029
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Namit Jain
>Assignee: Arvind Prabhakar
> Fix For: 0.6.0
>
> Attachments: HIVE-1029.patch
>
>
> typedbytes does not support nulls

[jira] Updated: (HIVE-1029) typedbytes does not support nulls

2010-05-16 Thread Arvind Prabhakar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-1029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvind Prabhakar updated HIVE-1029:
---

Attachment: HIVE-1029.patch

> typedbytes does not support nulls
> -
>
> Key: HIVE-1029
> URL: https://issues.apache.org/jira/browse/HIVE-1029
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Namit Jain
>Assignee: Arvind Prabhakar
> Fix For: 0.6.0
>
> Attachments: HIVE-1029.patch
>
>
> typedbytes does not support nulls


[jira] Assigned: (HIVE-1029) typedbytes does not support nulls

2010-05-16 Thread Arvind Prabhakar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-1029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvind Prabhakar reassigned HIVE-1029:
--

Assignee: Arvind Prabhakar

> typedbytes does not support nulls
> -
>
> Key: HIVE-1029
> URL: https://issues.apache.org/jira/browse/HIVE-1029
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Namit Jain
>Assignee: Arvind Prabhakar
> Fix For: 0.6.0
>
>
> typedbytes does not support nulls


[jira] Updated: (HIVE-1345) TypedBytesSerDe fails to create table with multiple columns.

2010-05-14 Thread Arvind Prabhakar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvind Prabhakar updated HIVE-1345:
---

Status: Patch Available  (was: Open)

The problem was due to incorrect parsing of the {{columnTypeProperty}} during 
the initialization of {{TypedBytesSerDe}}. This patch fixes the problem by 
delegating the parsing logic to the standard routine used by other SerDes - 
{{TypeInfoUtils.getTypeInfosFromTypeString()}}.

Also included in this patch is a test case that exercises this change and 
validates that multi-column tables can be created when using this SerDe.
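A simplified illustration of the parsing problem follows. A column-type property such as "string:string" must produce one entry per column, and the separators nested inside complex types such as struct<a:int,b:string> must not split that type. The splitter below is hypothetical and is not TypeInfoUtils.getTypeInfosFromTypeString(), which builds full TypeInfo objects. Treating both ':' and ',' as top-level separators is an assumption made for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical splitter: separators are honored only outside angle brackets,
// so nested struct/map/array types stay intact as a single entry.
class ColumnTypeSplitter {
    static List<String> split(String columnTypes) {
        List<String> result = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0; // nesting level of '<' ... '>'
        for (char c : columnTypes.toCharArray()) {
            if (c == '<') {
                depth++;
            } else if (c == '>') {
                depth--;
            }
            if (depth == 0 && (c == ':' || c == ',')) {
                result.add(current.toString().trim()); // top-level separator ends a column
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            result.add(current.toString().trim());
        }
        return result;
    }
}
```

With this approach, a two-column table declared as (a STRING, b STRING) yields two type entries, matching the two column names.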

> TypedBytesSerDe fails to create table with multiple columns.
> 
>
> Key: HIVE-1345
> URL: https://issues.apache.org/jira/browse/HIVE-1345
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Contrib
>Affects Versions: 0.5.0
> Environment: JDK 6 (1.6.0_17) on Mac OSX 10.6.3, Hadoop 0.20.2, Hive 
> 0.5.0
>Reporter: Arvind Prabhakar
>Assignee: Arvind Prabhakar
> Fix For: 0.6.0
>
> Attachments: HIVE-1345-1.patch
>
>
> Creating a table with more than one column fails when the row format SerDe 
> is TypedBytesSerDe. 
> {code}
> hive> CREATE TABLE test (a STRING, b STRING) ROW FORMAT SERDE 
> 'org.apache.hadoop.hive.contrib.serde2.TypedBytesSerDe';  
> Found class for org.apache.hadoop.hive.contrib.serde2.TypedBytesSerDe 
>   
> FAILED: Error in metadata: java.lang.IndexOutOfBoundsException: Index: 1, 
> Size: 1   
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask
>   
> hive> 
> {code}


[jira] Updated: (HIVE-1345) TypedBytesSerDe fails to create table with multiple columns.

2010-05-14 Thread Arvind Prabhakar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvind Prabhakar updated HIVE-1345:
---

Attachment: HIVE-1345-1.patch

> TypedBytesSerDe fails to create table with multiple columns.
> 
>
> Key: HIVE-1345
> URL: https://issues.apache.org/jira/browse/HIVE-1345
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Contrib
>Affects Versions: 0.5.0
> Environment: JDK 6 (1.6.0_17) on Mac OSX 10.6.3, Hadoop 0.20.2, Hive 
> 0.5.0
>Reporter: Arvind Prabhakar
>Assignee: Arvind Prabhakar
> Fix For: 0.6.0
>
> Attachments: HIVE-1345-1.patch
>
>
> Creating a table with more than one column fails when the row format SerDe 
> is TypedBytesSerDe. 
> {code}
> hive> CREATE TABLE test (a STRING, b STRING) ROW FORMAT SERDE 
> 'org.apache.hadoop.hive.contrib.serde2.TypedBytesSerDe';  
> Found class for org.apache.hadoop.hive.contrib.serde2.TypedBytesSerDe 
>   
> FAILED: Error in metadata: java.lang.IndexOutOfBoundsException: Index: 1, 
> Size: 1   
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask
>   
> hive> 
> {code}


[jira] Created: (HIVE-1345) TypedBytesSerDe fails to create table with multiple columns.

2010-05-14 Thread Arvind Prabhakar (JIRA)

TypedBytesSerDe fails to create table with multiple columns.


 Key: HIVE-1345
 URL: https://issues.apache.org/jira/browse/HIVE-1345
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Contrib
Affects Versions: 0.5.0
 Environment: JDK 6 (1.6.0_17) on Mac OSX 10.6.3, Hadoop 0.20.2, Hive 
0.5.0
Reporter: Arvind Prabhakar
Assignee: Arvind Prabhakar
 Fix For: 0.6.0


Creating a table with more than one column fails when the row format SerDe is 
TypedBytesSerDe. 


{code}
hive> CREATE TABLE test (a STRING, b STRING) ROW FORMAT SERDE 
'org.apache.hadoop.hive.contrib.serde2.TypedBytesSerDe';  
Found class for org.apache.hadoop.hive.contrib.serde2.TypedBytesSerDe   

FAILED: Error in metadata: java.lang.IndexOutOfBoundsException: Index: 1, Size: 
1   
FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.DDLTask  
hive> 
{code}




[jira] Commented: (HIVE-80) Allow Hive Server to run multiple queries simultaneously

2010-05-14 Thread Arvind Prabhakar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-80?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867693#action_12867693
 ] 

Arvind Prabhakar commented on HIVE-80:
--

I started looking at this JIRA with the intention of fixing it. From what I have 
observed, it appears that the {{HiveServer}} *is* multi-thread capable. 
Specifically:

* The {{HiveServer}} uses a {{TThreadPoolServer}}, which is multi-threaded.
* The {{ThriftHiveProcessorFactory}} overrides the {{getProcessor()}} call and 
returns a new instance of {{HiveServerHandler}} on every invocation.
* Every instance of {{HiveServerHandler}} has its own thread-local session 
state and a private driver instance.
* Query execution is thread safe thanks to HIVE-77.

Given the above, I believe that this JIRA should be resolved and closed. 
If you think I missed something in my analysis, can you please point it out?
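The pattern described in this analysis can be modeled in a few lines of plain Java. A pooled server asks a factory for a brand-new handler for each client connection, so each handler's session state is private and concurrent clients share no mutable state. This sketch uses hypothetical stand-in classes, not the real HiveServer or Thrift types. It models per-instance state only and does not model thread-local session storage.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Supplier;

// Hypothetical sketch (not Hive or Thrift code): one handler per connection.
class PerConnectionServerSketch {

    // Stand-in for a per-connection handler with its own session state.
    static final class Handler {
        private final List<String> executed = new ArrayList<>();

        void execute(String query) {
            executed.add(query); // only the thread serving this client touches this list
        }

        List<String> history() {
            return executed;
        }
    }

    // Stand-in for a processor factory: every call returns a new handler.
    static final Supplier<Handler> FACTORY = Handler::new;

    // Serves each client's queries on a pooled thread, one handler per client.
    static List<Handler> serve(List<List<String>> queriesPerClient)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Handler> handlers = new ArrayList<>();
            List<Future<?>> pending = new ArrayList<>();
            for (List<String> queries : queriesPerClient) {
                Handler handler = FACTORY.get();
                handlers.add(handler);
                pending.add(pool.submit(() -> queries.forEach(handler::execute)));
            }
            for (Future<?> f : pending) {
                f.get(); // waits for completion and makes the task's writes visible here
            }
            return handlers;
        } finally {
            pool.shutdown();
        }
    }
}
```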

> Allow Hive Server to run multiple queries simultaneously
> 
>
> Key: HIVE-80
> URL: https://issues.apache.org/jira/browse/HIVE-80
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Server Infrastructure
>Reporter: Raghotham Murthy
>Assignee: Neil Conway
>Priority: Critical
> Attachments: hive_input_format_race-2.patch
>
>
> Can use one driver object per query.


[jira] Updated: (HIVE-1176) 'create if not exists' fails for a table name with 'select' in it

2010-04-19 Thread Arvind Prabhakar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvind Prabhakar updated HIVE-1176:
---

  Status: Patch Available  (was: Open)
Assignee: Arvind Prabhakar

This problem is caused by [a bug in the Datanucleus 
JDOQL|http://www.jpox.org/servlet/jira/browse/NUCCORE-427] implementation, which 
has been fixed in version 2.0.x.

The fix is therefore to upgrade the datanucleus plugins to the latest stable 
release.

*Details:*
- Replaced the old datanucleus plugins (version 1.1.2) with the latest stable 
release.
- Updated the jdo2-api library to the version required by datanucleus, 2.3-ec, 
which is Apache licensed and available from the datanucleus maven repository at 
http://www.datanucleus.org/downloads/maven2/javax/jdo/jdo2-api/2.3-ec/.
- Modified the build files to suppress auto-enhancement of all compiled 
classes, a new feature introduced in the latest version.
- Modified the HiveMetaStoreClient implementation to create deep copies of 
non-primitive objects returned from the Thrift server. Without this 
change, some collections were fetched as semi-populated proxies that were 
missing their session context, which led to NPEs.
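The deep-copy change can be illustrated with a small sketch. Instead of handing callers the objects and collections returned by the RPC layer, the client rebuilds them so that the caller holds fully materialized, independent instances. Column and TableDesc are hypothetical stand-ins for non-primitive metastore objects. Thrift-generated Java classes generally provide a copy constructor and a deepCopy() method for this purpose, but the sketch below does not depend on them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of deep-copying a result before returning it to callers.
class DeepCopySketch {
    static final class Column {
        final String name;
        final String type;

        Column(String name, String type) {
            this.name = name;
            this.type = type;
        }

        Column(Column other) { // copy constructor
            this(other.name, other.type);
        }
    }

    static final class TableDesc {
        final String name;
        final List<Column> columns;

        TableDesc(String name, List<Column> columns) {
            this.name = name;
            this.columns = columns;
        }

        // Copies the table, its collection, and every element of the collection,
        // so nothing in the result refers back to the original instances.
        TableDesc deepCopy() {
            List<Column> copied = new ArrayList<>(columns.size());
            for (Column c : columns) {
                copied.add(new Column(c));
            }
            return new TableDesc(name, copied);
        }
    }
}
```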

*Testing Done:*
Built and ran all tests. Only two failures were reported: the clientpositive 
tests for input20.q and input33.q. These tests appear to be failing on trunk as 
well.

> 'create if not exists' fails for a table name with 'select' in it
> -
>
> Key: HIVE-1176
> URL: https://issues.apache.org/jira/browse/HIVE-1176
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore, Query Processor
>Reporter: Prasad Chakka
>Assignee: Arvind Prabhakar
> Attachments: HIVE-1176.lib-files.tar.gz, HIVE-1176.patch
>
>
> hive> create table if not exists tmp_select(s string, c string, n int);
> org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got 
> exception: javax.jdo.JDOUserException JDOQL Single-String query should always 
> start with SELECT)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:441)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:423)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:5538)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5192)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:105)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:275)
> at org.apache.hadoop.hive.ql.Driver.runCommand(Driver.java:320)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:312)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> Caused by: MetaException(message:Got exception: javax.jdo.JDOUserException 
> JDOQL Single-String query should always start with SELECT)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.logAndThrowMetaException(MetaStoreUtils.java:612)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTables(HiveMetaStoreClient.java:450)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:439)
> ... 15 more


[jira] Updated: (HIVE-1176) 'create if not exists' fails for a table name with 'select' in it

2010-04-19 Thread Arvind Prabhakar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvind Prabhakar updated HIVE-1176:
---

Attachment: HIVE-1176.patch
HIVE-1176.lib-files.tar.gz

> 'create if not exists' fails for a table name with 'select' in it
> -
>
> Key: HIVE-1176
> URL: https://issues.apache.org/jira/browse/HIVE-1176
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore, Query Processor
>Reporter: Prasad Chakka
> Attachments: HIVE-1176.lib-files.tar.gz, HIVE-1176.patch
>
>
> hive> create table if not exists tmp_select(s string, c string, n int);
> org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got 
> exception: javax.jdo.JDOUserException JDOQL Single-String query should always 
> start with SELECT)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:441)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:423)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:5538)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5192)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:105)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:275)
> at org.apache.hadoop.hive.ql.Driver.runCommand(Driver.java:320)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:312)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> Caused by: MetaException(message:Got exception: javax.jdo.JDOUserException 
> JDOQL Single-String query should always start with SELECT)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.logAndThrowMetaException(MetaStoreUtils.java:612)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTables(HiveMetaStoreClient.java:450)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:439)
> ... 15 more


[jira] Commented: (HIVE-1287) Struct datatype should not use field names for type equivalence.

2010-03-29 Thread Arvind Prabhakar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851166#action_12851166
 ] 

Arvind Prabhakar commented on HIVE-1287:


I think I understand your point of view. Let me explain mine:

Right now there is no consistent type checking. What we have is implicit type 
conversion where possible, such as converting a struct to a string but not the 
other way around. In other places this implicit type conversion leads to 
internal errors. In the case of struct-to-struct conversion, however, the check 
is strict about field names. This is not consistent.

My suggestion is to provide type equivalence semantics within the query 
language framework. Doing this will help in the following ways:
- Implicit type conversion would not be allowed; converting to another type 
would require an explicit CAST.
- The query compiler would ensure that the data types are equivalent and 
therefore allow data to flow without invoking a UDF for every row. 
This should improve performance relative to the current approach.
- Type equivalence checks will also be fundamental to building 
higher-level UD*Fs, which would otherwise have to deal with cast semantics. 


> Struct datatype should not use field names for type equivalence.
> 
>
> Key: HIVE-1287
> URL: https://issues.apache.org/jira/browse/HIVE-1287
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
> Environment: Mac OS X (10.6.2) Java SE 6 (1.6.0_17)
>Reporter: Arvind Prabhakar
>
> The field names for {{Struct}} types are currently being matched for testing 
> type equivalence. This is readily seen by running the following example:
> {noformat}
> hive> create table source ( foo struct < x : string > );
> OK
> Time taken: 3.094 seconds
> hive> load data local inpath '/path/to/sample/data.txt' overwrite into table 
> source;
> Copying data from file:/path/to/sample/data.txt
> Loading data to table source
> OK
> Time taken: 0.593 seconds
> hive> create table sink ( bar struct < y : string >);
> OK
> Time taken: 0.11 seconds
> hive> insert overwrite table sink select foo from source;
> FAILED: Error in semantic analysis: line 1:23 Cannot insert into target table 
> because column number/types are different sink: Cannot convert column 0 
> from struct to struct.
> {noformat}
> Since both {{source.foo}} and {{sink.bar}} are similar in definition with 
> only field names being different, data movement between these two should be 
> allowed. 


[jira] Commented: (HIVE-1287) Struct datatype should not use field names for type equivalence.

2010-03-29 Thread Arvind Prabhakar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851097#action_12851097
 ] 

Arvind Prabhakar commented on HIVE-1287:


Thanks for your comment, Zheng.

I can see how the {{CAST}} would work, but I believe that we need stronger 
type-checking semantics. Traditionally, a {{CAST}} is used to bypass compile-time 
checks. While this is a very powerful concept, it can lead to data corruption if 
not used with caution.

An alternative to the {{CAST}} approach would be compile-time type 
checking without regard to field names. This is similar to method 
signatures in, say, Java, where the parameter names do not matter as long as 
the parameter types are specified in the correct order. This can be achieved by 
treating field names as aliases for the data types of those fields.

For example, columns defined as {{struct < a : string >}} and 
{{struct < b : string >}} are type-equivalent because they are both of the type 
{{struct < ? : string >}}. 
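The proposed rule, which treats field names as aliases and compares only the ordered field types, can be sketched as a small model. The class below is hypothetical and is not Hive's StructTypeInfo. Field types are plain type-name strings to keep the sketch small. Exposing the rule as a separate isEquivalentTo() method instead of overriding equals() is a design choice of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: struct<a:string> and struct<b:string> are equivalent
// because both have the shape struct<?:string>; names are ignored.
final class StructuralStructType {
    final List<String> fieldNames;
    final List<String> fieldTypes; // primitive type names, for simplicity

    StructuralStructType(List<String> fieldNames, List<String> fieldTypes) {
        if (fieldNames.size() != fieldTypes.size()) {
            throw new IllegalArgumentException("each field needs exactly one type");
        }
        this.fieldNames = new ArrayList<>(fieldNames);
        this.fieldTypes = new ArrayList<>(fieldTypes);
    }

    // Same number of fields, same types, same order; field names are not compared.
    boolean isEquivalentTo(StructuralStructType other) {
        return fieldTypes.equals(other.fieldTypes);
    }
}
```

Under this rule the source/sink example in this issue would type-check, while struct<a:string> and struct<a:int> would still be rejected.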


> Struct datatype should not use field names for type equivalence.
> 
>
> Key: HIVE-1287
> URL: https://issues.apache.org/jira/browse/HIVE-1287
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
> Environment: Mac OS X (10.6.2) Java SE 6 (1.6.0_17)
>Reporter: Arvind Prabhakar
>
> The field names for {{Struct}} types are currently being matched for testing 
> type equivalence. This is readily seen by running the following example:
> {noformat}
> hive> create table source ( foo struct < x : string > );
> OK
> Time taken: 3.094 seconds
> hive> load data local inpath '/path/to/sample/data.txt' overwrite into table 
> source;
> Copying data from file:/path/to/sample/data.txt
> Loading data to table source
> OK
> Time taken: 0.593 seconds
> hive> create table sink ( bar struct < y : string >);
> OK
> Time taken: 0.11 seconds
> hive> insert overwrite table sink select foo from source;
> FAILED: Error in semantic analysis: line 1:23 Cannot insert into target table 
> because column number/types are different sink: Cannot convert column 0 
> from struct to struct.
> {noformat}
> Since both {{source.foo}} and {{sink.bar}} are similar in definition with 
> only field names being different, data movement between these two should be 
> allowed. 


[jira] Created: (HIVE-1287) Struct datatype should not use field names for type equivalence.

2010-03-29 Thread Arvind Prabhakar (JIRA)

Struct datatype should not use field names for type equivalence.


 Key: HIVE-1287
 URL: https://issues.apache.org/jira/browse/HIVE-1287
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
 Environment: Mac OS X (10.6.2) Java SE 6 (1.6.0_17)
Reporter: Arvind Prabhakar


The field names for {{Struct}} types are currently being matched for testing 
type equivalence. This is readily seen by running the following example:

{noformat}

hive> create table source ( foo struct < x : string > );
OK
Time taken: 3.094 seconds

hive> load data local inpath '/path/to/sample/data.txt' overwrite into table 
source;
Copying data from file:/path/to/sample/data.txt
Loading data to table source
OK
Time taken: 0.593 seconds

hive> create table sink ( bar struct < y : string >);
OK
Time taken: 0.11 seconds

hive> insert overwrite table sink select foo from source;
FAILED: Error in semantic analysis: line 1:23 Cannot insert into target table 
because column number/types are different sink: Cannot convert column 0 
from struct to struct.

{noformat}

Since both {{source.foo}} and {{sink.bar}} are similar in definition, with only 
the field names being different, data movement between these two should be allowed. 




[jira] Updated: (HIVE-1271) Case sensitiveness of type information specified when using custom reducer causes type mismatch

2010-03-28 Thread Arvind Prabhakar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvind Prabhakar updated HIVE-1271:
---

Attachment: HIVE-1271-1.patch

> Case sensitiveness of type information specified when using custom reducer 
> causes type mismatch
> ---
>
> Key: HIVE-1271
> URL: https://issues.apache.org/jira/browse/HIVE-1271
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.5.0
>Reporter: Dilip Joseph
>Assignee: Arvind Prabhakar
> Attachments: HIVE-1271-1.patch, HIVE-1271.patch
>
>
> Type information specified while using a custom reduce script is converted 
> to lower case, and causes a type mismatch during query semantic analysis. The 
> following REDUCE query where field name = "userId" failed.
> hive> CREATE TABLE SS (
>> a INT,
>> b INT,
>> vals ARRAY>
>> );
> OK
> hive> FROM (select * from srcTable DISTRIBUTE BY id SORT BY id) s
>> INSERT OVERWRITE TABLE SS
>> REDUCE *
>> USING 'myreduce.py'
>> AS
>> (a INT,
>> b INT,
>> vals ARRAY>
>> )
>> ;
> FAILED: Error in semantic analysis: line 2:27 Cannot insert into
> target table because column number/types are different SS: Cannot
> convert column 2 from array> to
> array>.
> The same query worked fine after changing "userId" to "userid".


[jira] Commented: (HIVE-1271) Case sensitiveness of type information specified when using custom reducer causes type mismatch

2010-03-28 Thread Arvind Prabhakar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850785#action_12850785
 ] 

Arvind Prabhakar commented on HIVE-1271:


Changes for HIVE-1271 (patch updated)

*Summary:*
The previously submitted patch removed the dependence of {{StructTypeInfo}} on 
field names for equivalence comparison. This patch reverts that change and 
instead addresses type equivalence by comparing field names case-insensitively.

*Details:*
The changes to the {{TypeInfo}} hierarchy made by the previous patch assumed that 
field names should not be considered part of the {{StructTypeInfo}} when testing 
equivalence. This conflicts with the implementation of {{LazyBinarySerDe}} (and 
perhaps others), which relies on field name distinction for caching purposes. This 
update changes the implementation so that field names are used as before, but 
are compared case-insensitively when testing the equivalence of 
two {{StructTypeInfo}}s.
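The case-insensitive comparison described above can be sketched as follows. This is a hypothetical class, not the actual {{StructTypeInfo}} patch. The main point it shows is that when equals() ignores case, hashCode() must be computed from the same canonical (lower-cased) names. Otherwise two equal struct types could land in different hash buckets of any cache keyed on them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Objects;

// Hypothetical sketch: field names still take part in struct equality, but
// "userId" and "userid" compare equal. hashCode() uses the same canonical form.
final class CaseInsensitiveStructType {
    private final List<String> fieldNames;
    private final List<String> fieldTypes;

    CaseInsensitiveStructType(List<String> fieldNames, List<String> fieldTypes) {
        this.fieldNames = new ArrayList<>(fieldNames);
        this.fieldTypes = new ArrayList<>(fieldTypes);
    }

    // Lower-cases names with a fixed locale so the result does not vary by platform.
    private List<String> canonicalNames() {
        List<String> lower = new ArrayList<>(fieldNames.size());
        for (String name : fieldNames) {
            lower.add(name.toLowerCase(Locale.ROOT));
        }
        return lower;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof CaseInsensitiveStructType)) {
            return false;
        }
        CaseInsensitiveStructType other = (CaseInsensitiveStructType) o;
        return canonicalNames().equals(other.canonicalNames())
            && fieldTypes.equals(other.fieldTypes);
    }

    @Override
    public int hashCode() {
        return Objects.hash(canonicalNames(), fieldTypes); // consistent with equals()
    }
}
```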

*Testing Done:*
- Built and tested the use case identified in this issue; it works now.
- Ran the complete set of tests; only the previously reported, unrelated 
failures occurred.


> Case sensitiveness of type information specified when using custom reducer 
> causes type mismatch
> ---
>
> Key: HIVE-1271
> URL: https://issues.apache.org/jira/browse/HIVE-1271
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.5.0
>Reporter: Dilip Joseph
>Assignee: Arvind Prabhakar
> Attachments: HIVE-1271-1.patch, HIVE-1271.patch
>
>
> Type information specified while using a custom reduce script is converted 
> to lower case, and causes a type mismatch during query semantic analysis. The 
> following REDUCE query where field name = "userId" failed.
> hive> CREATE TABLE SS (
>> a INT,
>> b INT,
>> vals ARRAY>
>> );
> OK
> hive> FROM (select * from srcTable DISTRIBUTE BY id SORT BY id) s
>> INSERT OVERWRITE TABLE SS
>> REDUCE *
>> USING 'myreduce.py'
>> AS
>> (a INT,
>> b INT,
>> vals ARRAY>
>> )
>> ;
> FAILED: Error in semantic analysis: line 2:27 Cannot insert into
> target table because column number/types are different SS: Cannot
> convert column 2 from array> to
> array>.
> The same query worked fine after changing "userId" to "userid".


[jira] Updated: (HIVE-1271) Case sensitiveness of type information specified when using custom reducer causes type mismatch

2010-03-26 Thread Arvind Prabhakar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvind Prabhakar updated HIVE-1271:
---

Status: Patch Available  (was: Open)

*Summary*
The implementation of {{equals()}} method of {{StructTypeInfo}} was comparing 
field names as part of the comparison. This is not valid since field namess do 
not contitute the definition of a type. This patch refactors the {{TypedInfo}} 
hierarchy to address this issue.

*Implementation Details*
- Modified {{TypeInfo}} to remove its implementation of the {{equals()}} 
method.
- Made all specialized subclasses {{final}}.
- Modified all subclass implementations of {{equals()}} to skip the category 
comparison.
- Modified the {{StructTypeInfo}} implementation of {{equals()}} to not compare 
field names.
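The refactoring pattern in the list above can be shown in isolation. The abstract base class declares no equals(), every concrete subclass is final, and each equals() relies on instanceof. Because the classes are final, an instanceof check already identifies the exact kind of type, so no separate category comparison is needed and equality stays symmetric. The class names here are hypothetical and do not mirror Hive's actual {{TypeInfo}} code.

```java
// Hypothetical sketch of the refactoring pattern; names are illustrative only.
abstract class TypeDesc {
    // Intentionally no equals()/hashCode() here: each final subclass defines them.
}

final class PrimitiveTypeDesc extends TypeDesc {
    final String typeName;

    PrimitiveTypeDesc(String typeName) {
        this.typeName = typeName;
    }

    @Override
    public boolean equals(Object o) {
        // The class is final, so instanceof means "exactly a PrimitiveTypeDesc".
        return o instanceof PrimitiveTypeDesc
            && typeName.equals(((PrimitiveTypeDesc) o).typeName);
    }

    @Override
    public int hashCode() {
        return typeName.hashCode();
    }
}

final class ListTypeDesc extends TypeDesc {
    final TypeDesc elementType;

    ListTypeDesc(TypeDesc elementType) {
        this.elementType = elementType;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof ListTypeDesc
            && elementType.equals(((ListTypeDesc) o).elementType);
    }

    @Override
    public int hashCode() {
        return 31 * elementType.hashCode() + 1; // combine element hash with a list-specific constant
    }
}
```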

*Testing Done*
- Built and tested the use case identified in this issue. It works now.
- Ran the full set of tests. Of these, two tests (clientpositive for input20.q 
and input33.q) failed for unrelated reasons; these tests are failing on 
trunk as well. All other tests passed.

> Case sensitiveness of type information specified when using custom reducer 
> causes type mismatch
> ---
>
> Key: HIVE-1271
> URL: https://issues.apache.org/jira/browse/HIVE-1271
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.5.0
>Reporter: Dilip Joseph
>Assignee: Arvind Prabhakar
> Attachments: HIVE-1271.patch
>
>
> Type information specified while using a custom reduce script is converted 
> to lower case, and causes a type mismatch during query semantic analysis. The 
> following REDUCE query where field name = "userId" failed.
> hive> CREATE TABLE SS (
>> a INT,
>> b INT,
>> vals ARRAY>
>> );
> OK
> hive> FROM (select * from srcTable DISTRIBUTE BY id SORT BY id) s
>> INSERT OVERWRITE TABLE SS
>> REDUCE *
>> USING 'myreduce.py'
>> AS
>> (a INT,
>> b INT,
>> vals ARRAY>
>> )
>> ;
> FAILED: Error in semantic analysis: line 2:27 Cannot insert into
> target table because column number/types are different SS: Cannot
> convert column 2 from array> to
> array>.
> The same query worked fine after changing "userId" to "userid".


[jira] Updated: (HIVE-1271) Case sensitiveness of type information specified when using custom reducer causes type mismatch

2010-03-26 Thread Arvind Prabhakar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvind Prabhakar updated HIVE-1271:
---

Attachment: HIVE-1271.patch

> Case sensitiveness of type information specified when using custom reducer 
> causes type mismatch
> ---
>
> Key: HIVE-1271
> URL: https://issues.apache.org/jira/browse/HIVE-1271
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.5.0
>Reporter: Dilip Joseph
>Assignee: Arvind Prabhakar
> Attachments: HIVE-1271.patch
>
>
> Type information specified while using a custom reduce script is converted 
> to lower case, and causes a type mismatch during query semantic analysis. The 
> following REDUCE query where field name = "userId" failed.
> hive> CREATE TABLE SS (
>> a INT,
>> b INT,
>> vals ARRAY>
>> );
> OK
> hive> FROM (select * from srcTable DISTRIBUTE BY id SORT BY id) s
>> INSERT OVERWRITE TABLE SS
>> REDUCE *
>> USING 'myreduce.py'
>> AS
>> (a INT,
>> b INT,
>> vals ARRAY>
>> )
>> ;
> FAILED: Error in semantic analysis: line 2:27 Cannot insert into
> target table because column number/types are different SS: Cannot
> convert column 2 from array> to
> array>.
> The same query worked fine after changing "userId" to "userid".

