[jira] [Updated] (HIVE-2664) dummy issue for Phabricator testing, please ignore

2011-12-22 Thread Phabricator (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-2664:
--

Attachment: HIVE-2664.D999.1.patch

jsichi requested code review of "HIVE-2664 [jira] dummy issue for Phabricator 
testing, please ignore".
Reviewers: JIRA

  https://issues.apache.org/jira/browse/HIVE-2664

  blah



TEST PLAN
  EMPTY

REVISION DETAIL
  https://reviews.facebook.net/D999

AFFECTED FILES
  README.txt

MANAGE HERALD DIFFERENTIAL RULES
  https://reviews.facebook.net/herald/view/differential/

WHY DID I GET THIS EMAIL?
  https://reviews.facebook.net/herald/transcript/2097/

Tip: use the X-Herald-Rules header to filter Herald messages in your client.


> dummy issue for Phabricator testing, please ignore
> --
>
> Key: HIVE-2664
> URL: https://issues.apache.org/jira/browse/HIVE-2664
> Project: Hive
>  Issue Type: Improvement
>Reporter: John Sichi
>Assignee: John Sichi
> Attachments: HIVE-2664.D939.1.patch, HIVE-2664.D963.1.patch, 
> HIVE-2664.D969.1.patch, HIVE-2664.D975.1.patch, HIVE-2664.D993.1.patch, 
> HIVE-2664.D999.1.patch, HIVE-2664.final.patch, HIVE-2664.final.patch, 
> HIVE-2664.final.patch, HIVE-2664.final.patch, HIVE-2664.final.patch, 
> HIVE-2664.final.patch, HIVE-2664.final.patch, HIVE-2664.final.patch, 
> HIVE-2664.final.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HIVE-2676) The row count loaded into a table may not be right

2011-12-22 Thread binlijin (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

binlijin updated HIVE-2676:
---

Fix Version/s: 0.8.1

> The row count loaded into a table may not be right
> -
>
> Key: HIVE-2676
> URL: https://issues.apache.org/jira/browse/HIVE-2676
> Project: Hive
>  Issue Type: Improvement
>Reporter: binlijin
>Priority: Minor
>  Labels: patch
> Fix For: 0.8.1
>
> Attachments: HIVE-2676.patch
>
>
> create table tablename as SELECT ***
> At the end, Hive will print a number showing how many rows were loaded into 
> tablename, but sometimes the number is not right.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HIVE-2676) The row count loaded into a table may not be right

2011-12-22 Thread binlijin (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

binlijin updated HIVE-2676:
---

Attachment: HIVE-2676.patch

> The row count loaded into a table may not be right
> -
>
> Key: HIVE-2676
> URL: https://issues.apache.org/jira/browse/HIVE-2676
> Project: Hive
>  Issue Type: Improvement
>Reporter: binlijin
>Priority: Minor
>  Labels: patch
> Attachments: HIVE-2676.patch
>
>
> create table tablename as SELECT ***
> At the end, Hive will print a number showing how many rows were loaded into 
> tablename, but sometimes the number is not right.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HIVE-2676) The row count loaded into a table may not be right

2011-12-22 Thread binlijin (Created) (JIRA)
The row count loaded into a table may not be right
-

 Key: HIVE-2676
 URL: https://issues.apache.org/jira/browse/HIVE-2676
 Project: Hive
  Issue Type: Improvement
Reporter: binlijin
Priority: Minor


create table tablename as SELECT ***
At the end, Hive will print a number showing how many rows were loaded into 
tablename, but sometimes the number is not right.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HIVE-2621) Allow multiple group bys with the same input data and spray keys to be run on the same reducer.

2011-12-22 Thread Phabricator (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-2621:
--

Attachment: HIVE-2621.D567.4.patch

kevinwilfong updated the revision "HIVE-2621 [jira] Allow multiple group bys 
with the same input data and spray keys to be run on the same reducer.".
Reviewers: JIRA

  Addressed Namit's and Yongqiang's comments as follows:

  Removed code for singlemrMultiGroupBy optimization, and all related methods, 
as the new code should produce similar results and can handle more cases, such 
as filters.

  Shared code between getCommonDistinctExprs and getCommonGroupByDestGroups, as 
well as between genCommonGroupByPlanReduceSinkOperator and 
genGroupByPlanReduceSinkOperator.

  Added comments where requested.

  Deduplicated filters in the common filter used before a common group by reduce 
sink.

REVISION DETAIL
  https://reviews.facebook.net/D567

AFFECTED FILES
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
  ql/src/test/results/clientpositive/groupby9.q.out
  ql/src/test/results/clientpositive/groupby7_noskew_multi_single_reducer.q.out
  ql/src/test/results/clientpositive/groupby10.q.out
  ql/src/test/results/clientpositive/parallel.q.out
  ql/src/test/results/clientpositive/groupby_multi_single_reducer.q.out
  ql/src/test/results/clientpositive/multigroupby_singlemr.q.out
  ql/src/test/results/clientpositive/multi_insert.q.out
  ql/src/test/results/clientpositive/groupby8.q.out
  
ql/src/test/results/clientpositive/groupby_complex_types_multi_single_reducer.q.out
  ql/src/test/results/clientpositive/groupby7_map_multi_single_reducer.q.out
  ql/src/test/queries/clientpositive/groupby7_noskew.q
  ql/src/test/queries/clientpositive/groupby10.q
  ql/src/test/queries/clientpositive/groupby_multi_single_reducer.q
  ql/src/test/queries/clientpositive/multigroupby_singlemr.q
  ql/src/test/queries/clientpositive/groupby7_map.q
  ql/src/test/queries/clientpositive/groupby8.q
  ql/src/test/queries/clientpositive/groupby9.q
  ql/src/test/queries/clientpositive/groupby7_noskew_multi_single_reducer.q
  ql/src/test/queries/clientpositive/groupby7_map_multi_single_reducer.q
  
ql/src/test/queries/clientpositive/groupby_complex_types_multi_single_reducer.q
  ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java


> Allow multiple group bys with the same input data and spray keys to be run on 
> the same reducer.
> ---
>
> Key: HIVE-2621
> URL: https://issues.apache.org/jira/browse/HIVE-2621
> Project: Hive
>  Issue Type: New Feature
>Reporter: Kevin Wilfong
>Assignee: Kevin Wilfong
> Attachments: HIVE-2621.1.patch.txt, HIVE-2621.D567.1.patch, 
> HIVE-2621.D567.2.patch, HIVE-2621.D567.3.patch, HIVE-2621.D567.4.patch
>
>
> Currently, when a user runs a query, such as a multi-insert, where each 
> insertion subclause consists of a simple query followed by a group by, the 
> group bys for each clause are run on a separate reducer.  This requires 
> writing the data for each group by clause to an intermediate file, and then 
> reading it back.  This uses a significant amount of the total CPU consumed by 
> the query for an otherwise simple query.
> If the subclauses are grouped by their distinct expressions and group by 
> keys, with all of the group by expressions for a group of subclauses run on a 
> single reducer, this would reduce the amount of reading/writing to 
> intermediate files for some queries.
> To do this, for each group of subclauses, in the mapper we would execute the 
> filters for each subclause 'or'd together (provided each subclause has a 
> filter), followed by a reduce sink.  In the reducer, the child operators would 
> be each subclause's filter followed by the group by and any subsequent 
> operations.
> Note that this would require turning off map aggregation, so we would need to 
> make using this type of plan configurable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HIVE-2621) Allow multiple group bys with the same input data and spray keys to be run on the same reducer.

2011-12-22 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175209#comment-13175209
 ] 

Phabricator commented on HIVE-2621:
---

heyongqiang has commented on the revision "HIVE-2621 [jira] Allow multiple 
group bys with the same input data and spray keys to be run on the same 
reducer.".

  otherwise, looks good to me.

INLINE COMMENTS
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:3015 
Instead of a lot of duplicate code here, can we just pass one dest to 
genGroupByPlanReduceSinkOperator()?

REVISION DETAIL
  https://reviews.facebook.net/D567


> Allow multiple group bys with the same input data and spray keys to be run on 
> the same reducer.
> ---
>
> Key: HIVE-2621
> URL: https://issues.apache.org/jira/browse/HIVE-2621
> Project: Hive
>  Issue Type: New Feature
>Reporter: Kevin Wilfong
>Assignee: Kevin Wilfong
> Attachments: HIVE-2621.1.patch.txt, HIVE-2621.D567.1.patch, 
> HIVE-2621.D567.2.patch, HIVE-2621.D567.3.patch
>
>
> Currently, when a user runs a query, such as a multi-insert, where each 
> insertion subclause consists of a simple query followed by a group by, the 
> group bys for each clause are run on a separate reducer.  This requires 
> writing the data for each group by clause to an intermediate file, and then 
> reading it back.  This uses a significant amount of the total CPU consumed by 
> the query for an otherwise simple query.
> If the subclauses are grouped by their distinct expressions and group by 
> keys, with all of the group by expressions for a group of subclauses run on a 
> single reducer, this would reduce the amount of reading/writing to 
> intermediate files for some queries.
> To do this, for each group of subclauses, in the mapper we would execute the 
> filters for each subclause 'or'd together (provided each subclause has a 
> filter), followed by a reduce sink.  In the reducer, the child operators would 
> be each subclause's filter followed by the group by and any subsequent 
> operations.
> Note that this would require turning off map aggregation, so we would need to 
> make using this type of plan configurable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HIVE-2621) Allow multiple group bys with the same input data and spray keys to be run on the same reducer.

2011-12-22 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175207#comment-13175207
 ] 

Phabricator commented on HIVE-2621:
---

heyongqiang has commented on the revision "HIVE-2621 [jira] Allow multiple 
group bys with the same input data and spray keys to be run on the same 
reducer.".

INLINE COMMENTS
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:3411 can we 
do some simple de-duplication here?

REVISION DETAIL
  https://reviews.facebook.net/D567


> Allow multiple group bys with the same input data and spray keys to be run on 
> the same reducer.
> ---
>
> Key: HIVE-2621
> URL: https://issues.apache.org/jira/browse/HIVE-2621
> Project: Hive
>  Issue Type: New Feature
>Reporter: Kevin Wilfong
>Assignee: Kevin Wilfong
> Attachments: HIVE-2621.1.patch.txt, HIVE-2621.D567.1.patch, 
> HIVE-2621.D567.2.patch, HIVE-2621.D567.3.patch
>
>
> Currently, when a user runs a query, such as a multi-insert, where each 
> insertion subclause consists of a simple query followed by a group by, the 
> group bys for each clause are run on a separate reducer.  This requires 
> writing the data for each group by clause to an intermediate file, and then 
> reading it back.  This uses a significant amount of the total CPU consumed by 
> the query for an otherwise simple query.
> If the subclauses are grouped by their distinct expressions and group by 
> keys, with all of the group by expressions for a group of subclauses run on a 
> single reducer, this would reduce the amount of reading/writing to 
> intermediate files for some queries.
> To do this, for each group of subclauses, in the mapper we would execute the 
> filters for each subclause 'or'd together (provided each subclause has a 
> filter), followed by a reduce sink.  In the reducer, the child operators would 
> be each subclause's filter followed by the group by and any subsequent 
> operations.
> Note that this would require turning off map aggregation, so we would need to 
> make using this type of plan configurable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HIVE-2621) Allow multiple group bys with the same input data and spray keys to be run on the same reducer.

2011-12-22 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175205#comment-13175205
 ] 

Phabricator commented on HIVE-2621:
---

heyongqiang has commented on the revision "HIVE-2621 [jira] Allow multiple 
group bys with the same input data and spray keys to be run on the same 
reducer.".

INLINE COMMENTS
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:6284 can you 
add comments here covering what you just explained to me offline?

REVISION DETAIL
  https://reviews.facebook.net/D567


> Allow multiple group bys with the same input data and spray keys to be run on 
> the same reducer.
> ---
>
> Key: HIVE-2621
> URL: https://issues.apache.org/jira/browse/HIVE-2621
> Project: Hive
>  Issue Type: New Feature
>Reporter: Kevin Wilfong
>Assignee: Kevin Wilfong
> Attachments: HIVE-2621.1.patch.txt, HIVE-2621.D567.1.patch, 
> HIVE-2621.D567.2.patch, HIVE-2621.D567.3.patch
>
>
> Currently, when a user runs a query, such as a multi-insert, where each 
> insertion subclause consists of a simple query followed by a group by, the 
> group bys for each clause are run on a separate reducer.  This requires 
> writing the data for each group by clause to an intermediate file, and then 
> reading it back.  This uses a significant amount of the total CPU consumed by 
> the query for an otherwise simple query.
> If the subclauses are grouped by their distinct expressions and group by 
> keys, with all of the group by expressions for a group of subclauses run on a 
> single reducer, this would reduce the amount of reading/writing to 
> intermediate files for some queries.
> To do this, for each group of subclauses, in the mapper we would execute the 
> filters for each subclause 'or'd together (provided each subclause has a 
> filter), followed by a reduce sink.  In the reducer, the child operators would 
> be each subclause's filter followed by the group by and any subsequent 
> operations.
> Note that this would require turning off map aggregation, so we would need to 
> make using this type of plan configurable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HIVE-2621) Allow multiple group bys with the same input data and spray keys to be run on the same reducer.

2011-12-22 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175201#comment-13175201
 ] 

Phabricator commented on HIVE-2621:
---

heyongqiang has commented on the revision "HIVE-2621 [jira] Allow multiple 
group bys with the same input data and spray keys to be run on the same 
reducer.".

  can you add a testcase which includes a subquery in one group by clause?
   still reviewing


INLINE COMMENTS
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:6290 the 
check here is confusing. It is not very clear which cases should come in, and 
which cases should not. Can you try to reduce the size of the check list by moving 
some up?

  or add some comments here

REVISION DETAIL
  https://reviews.facebook.net/D567


> Allow multiple group bys with the same input data and spray keys to be run on 
> the same reducer.
> ---
>
> Key: HIVE-2621
> URL: https://issues.apache.org/jira/browse/HIVE-2621
> Project: Hive
>  Issue Type: New Feature
>Reporter: Kevin Wilfong
>Assignee: Kevin Wilfong
> Attachments: HIVE-2621.1.patch.txt, HIVE-2621.D567.1.patch, 
> HIVE-2621.D567.2.patch, HIVE-2621.D567.3.patch
>
>
> Currently, when a user runs a query, such as a multi-insert, where each 
> insertion subclause consists of a simple query followed by a group by, the 
> group bys for each clause are run on a separate reducer.  This requires 
> writing the data for each group by clause to an intermediate file, and then 
> reading it back.  This uses a significant amount of the total CPU consumed by 
> the query for an otherwise simple query.
> If the subclauses are grouped by their distinct expressions and group by 
> keys, with all of the group by expressions for a group of subclauses run on a 
> single reducer, this would reduce the amount of reading/writing to 
> intermediate files for some queries.
> To do this, for each group of subclauses, in the mapper we would execute the 
> filters for each subclause 'or'd together (provided each subclause has a 
> filter), followed by a reduce sink.  In the reducer, the child operators would 
> be each subclause's filter followed by the group by and any subsequent 
> operations.
> Note that this would require turning off map aggregation, so we would need to 
> make using this type of plan configurable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HIVE-1877) Add java_method() as a synonym for the reflect() UDF

2011-12-22 Thread Zhenxiao Luo (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhenxiao Luo updated HIVE-1877:
---

Attachment: HIVE-1877.3.patch.txt

Code Review Request for HIVE-1877

> Add java_method() as a synonym for the reflect() UDF
> 
>
> Key: HIVE-1877
> URL: https://issues.apache.org/jira/browse/HIVE-1877
> Project: Hive
>  Issue Type: Improvement
>  Components: UDF
>Affects Versions: 0.7.0
>Reporter: Carl Steinbach
>Assignee: Zhenxiao Luo
> Attachments: HIVE-1877.1.patch.txt, HIVE-1877.2.patch.txt, 
> HIVE-1877.3.patch.txt
>
>
> HIVE-471 added the reflect() UDF which allows people to invoke static Java 
> methods from within HQL
> queries. In my opinion the name is confusing since it describes how the UDF 
> works instead of what
> it does. I propose changing the name of (or providing a synonym for) the UDF 
> to something like
>  'invoke_method' or 'java_method', or something similar. I'm open to 
> suggestions.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HIVE-1877) Add java_method() as a synonym for the reflect() UDF

2011-12-22 Thread Zhenxiao Luo (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhenxiao Luo updated HIVE-1877:
---

Release Note: java_method() is a synonym for reflect()
  Status: Patch Available  (was: Open)

> Add java_method() as a synonym for the reflect() UDF
> 
>
> Key: HIVE-1877
> URL: https://issues.apache.org/jira/browse/HIVE-1877
> Project: Hive
>  Issue Type: Improvement
>  Components: UDF
>Affects Versions: 0.7.0
>Reporter: Carl Steinbach
>Assignee: Zhenxiao Luo
> Attachments: HIVE-1877.1.patch.txt, HIVE-1877.2.patch.txt, 
> HIVE-1877.3.patch.txt
>
>
> HIVE-471 added the reflect() UDF which allows people to invoke static Java 
> methods from within HQL
> queries. In my opinion the name is confusing since it describes how the UDF 
> works instead of what
> it does. I propose changing the name of (or providing a synonym for) the UDF 
> to something like
>  'invoke_method' or 'java_method', or something similar. I'm open to 
> suggestions.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HIVE-2670) A cluster test utility for Hive

2011-12-22 Thread Alan Gates (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-2670:
-

Attachment: hive_cluster_test_2.patch

Here's a second version of the patch, with changes to run tests for HIVE-2616.  
They are included in cmdline.conf and called HCat_sudo.  To run these two new 
tests, it works best to set up a Hive server and point the test harness at it 
by adding:

-Dharness.metastore.host= -Dharness.metastore.port= 
-Dharness.metastore.passwd= -Dharness.metastore.thrift=1

You also need to define the user to sudo to, and your password to use to sudo.  
Add these on the command line as:
-Dharness.sudo.to= -Dharness.sudo.pass=

> A cluster test utility for Hive
> ---
>
> Key: HIVE-2670
> URL: https://issues.apache.org/jira/browse/HIVE-2670
> Project: Hive
>  Issue Type: New Feature
>  Components: Testing Infrastructure
>Reporter: Alan Gates
> Attachments: harness.tar, hive_cluster_test.patch, 
> hive_cluster_test_2.patch
>
>
> Hive has an extensive set of unit tests, but it does not have an 
> infrastructure for testing in a cluster environment.  Pig and HCatalog have 
> been using a test harness for cluster testing for some time.  We have written 
> Hive drivers and tests to run in this harness.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HIVE-2675) JDBC SQL execution exception does not contain cause

2011-12-22 Thread Greg Cottman (Created) (JIRA)
JDBC SQL execution exception does not contain cause
---

 Key: HIVE-2675
 URL: https://issues.apache.org/jira/browse/HIVE-2675
 Project: Hive
  Issue Type: Bug
  Components: JDBC
Affects Versions: 0.8.0
 Environment: Any
Reporter: Greg Cottman


If SQL execution throws an exception in the HiveStatement.executeSQL() method 
then its message is rethrown as a SQLException with a SQLState of "08S01":

  try {
resultSet = null;
client.execute(sql);
  } catch (HiveServerException e) {
throw new SQLException(e.getMessage(), e.getSQLState(), e.getErrorCode());
  } catch (Exception ex) {
throw new SQLException(ex.toString(), "08S01");
  }

In the case of failed DDL, the exception "ex" has a cause - such as a 
java.io.IOException - that contains the actual error text.  The description of 
the actual problem is lost by failing to include "ex" as the cause in the new 
SQLException.
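
For illustration, a minimal sketch of the change being requested, mirroring the snippet above and assuming the SQLException(String, String, Throwable) constructor available since Java 6 (this is not the committed HiveStatement code):

  try {
    resultSet = null;
    client.execute(sql);
  } catch (HiveServerException e) {
    throw new SQLException(e.getMessage(), e.getSQLState(), e.getErrorCode());
  } catch (Exception ex) {
    // Pass ex as the cause so the underlying error (e.g. a java.io.IOException) is not lost.
    throw new SQLException(ex.toString(), "08S01", ex);
  }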

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HIVE-2664) dummy issue for Phabricator testing, please ignore

2011-12-22 Thread John Sichi (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-2664:
-

Attachment: HIVE-2664.final.patch

> dummy issue for Phabricator testing, please ignore
> --
>
> Key: HIVE-2664
> URL: https://issues.apache.org/jira/browse/HIVE-2664
> Project: Hive
>  Issue Type: Improvement
>Reporter: John Sichi
>Assignee: John Sichi
> Attachments: HIVE-2664.D939.1.patch, HIVE-2664.D963.1.patch, 
> HIVE-2664.D969.1.patch, HIVE-2664.D975.1.patch, HIVE-2664.D993.1.patch, 
> HIVE-2664.final.patch, HIVE-2664.final.patch, HIVE-2664.final.patch, 
> HIVE-2664.final.patch, HIVE-2664.final.patch, HIVE-2664.final.patch, 
> HIVE-2664.final.patch, HIVE-2664.final.patch, HIVE-2664.final.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HIVE-2664) dummy issue for Phabricator testing, please ignore

2011-12-22 Thread John Sichi (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-2664:
-

Attachment: HIVE-2664.final.patch

> dummy issue for Phabricator testing, please ignore
> --
>
> Key: HIVE-2664
> URL: https://issues.apache.org/jira/browse/HIVE-2664
> Project: Hive
>  Issue Type: Improvement
>Reporter: John Sichi
>Assignee: John Sichi
> Attachments: HIVE-2664.D939.1.patch, HIVE-2664.D963.1.patch, 
> HIVE-2664.D969.1.patch, HIVE-2664.D975.1.patch, HIVE-2664.D993.1.patch, 
> HIVE-2664.final.patch, HIVE-2664.final.patch, HIVE-2664.final.patch, 
> HIVE-2664.final.patch, HIVE-2664.final.patch, HIVE-2664.final.patch, 
> HIVE-2664.final.patch, HIVE-2664.final.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HIVE-2664) dummy issue for Phabricator testing, please ignore

2011-12-22 Thread John Sichi (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-2664:
-

Attachment: HIVE-2664.final.patch

> dummy issue for Phabricator testing, please ignore
> --
>
> Key: HIVE-2664
> URL: https://issues.apache.org/jira/browse/HIVE-2664
> Project: Hive
>  Issue Type: Improvement
>Reporter: John Sichi
>Assignee: John Sichi
> Attachments: HIVE-2664.D939.1.patch, HIVE-2664.D963.1.patch, 
> HIVE-2664.D969.1.patch, HIVE-2664.D975.1.patch, HIVE-2664.D993.1.patch, 
> HIVE-2664.final.patch, HIVE-2664.final.patch, HIVE-2664.final.patch, 
> HIVE-2664.final.patch, HIVE-2664.final.patch, HIVE-2664.final.patch, 
> HIVE-2664.final.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HIVE-2664) dummy issue for Phabricator testing, please ignore

2011-12-22 Thread John Sichi (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-2664:
-

Attachment: HIVE-2664.final.patch

> dummy issue for Phabricator testing, please ignore
> --
>
> Key: HIVE-2664
> URL: https://issues.apache.org/jira/browse/HIVE-2664
> Project: Hive
>  Issue Type: Improvement
>Reporter: John Sichi
>Assignee: John Sichi
> Attachments: HIVE-2664.D939.1.patch, HIVE-2664.D963.1.patch, 
> HIVE-2664.D969.1.patch, HIVE-2664.D975.1.patch, HIVE-2664.D993.1.patch, 
> HIVE-2664.final.patch, HIVE-2664.final.patch, HIVE-2664.final.patch, 
> HIVE-2664.final.patch, HIVE-2664.final.patch, HIVE-2664.final.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HIVE-2478) Support dry run option in hive

2011-12-22 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175012#comment-13175012
 ] 

Phabricator commented on HIVE-2478:
---

ashutoshc has requested changes to the revision "HIVE-2478 [jira] Support dry 
run option in hive".

INLINE COMMENTS
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java:558 Name the 
property as hive.exec.dryrun. Also, add it in hive-default.xml.template
  ql/src/java/org/apache/hadoop/hive/ql/Driver.java:135-140 There seems to be 
confusion about what these different modes mean. You have not used PLAN anywhere. 
Also, 'off' is confusing; I don't think there is any need for OFF. Allow three 
values, PARSE, ANALYZE & PLAN, and then match these against the user-provided 
value. In HiveConf, the default value returned is the empty string "".
  ql/src/java/org/apache/hadoop/hive/ql/Driver.java:423 Use conf.getVar() 
instead.
  ql/src/java/org/apache/hadoop/hive/ql/Driver.java:452 Without this, does it 
result in an NPE later because the returned schema is an empty object? If so, it 
would be good to note that in a comment.
  ql/src/java/org/apache/hadoop/hive/ql/Driver.java:479-488 I think you want to 
end this phase after sem.validate().
  ql/src/java/org/apache/hadoop/hive/ql/Driver.java:944 This should be 
DryMode.PLAN
  ql/src/test/queries/clientnegative/dryrun_bad_fetch_serde.q:1 Please add some 
comments at the top of the script, explaining which phase should succeed and which 
should fail in these tests. You can also come up with better filenames for 
these tests that explain the intent.

  This applies to all tests.
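
  For illustration only, a sketch of the mode handling suggested above; the
enum and helper names are hypothetical, while the PARSE/ANALYZE/PLAN values and
the hive.exec.dryrun property come from this review comment:

  // Hypothetical sketch, not the patch itself.
  enum DryRunMode { PARSE, ANALYZE, PLAN }

  final class DryRunConfigSketch {
    // An empty or missing property value (HiveConf returns "") means no dry run.
    static DryRunMode fromConf(String value) {
      if (value == null || value.trim().isEmpty()) {
        return null;
      }
      return DryRunMode.valueOf(value.trim().toUpperCase());  // PARSE, ANALYZE or PLAN
    }
  }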

REVISION DETAIL
  https://reviews.facebook.net/D927


> Support dry run option in hive
> --
>
> Key: HIVE-2478
> URL: https://issues.apache.org/jira/browse/HIVE-2478
> Project: Hive
>  Issue Type: Improvement
>  Components: Configuration
>Affects Versions: 0.9.0
>Reporter: kalyan ram
>Assignee: Sushanth Sowmyan
>Priority: Minor
> Attachments: HIVE-2478-1.patch, HIVE-2478-2.patch, HIVE-2478-3.patch, 
> HIVE-2478.D927.1.patch
>
>
> Hive currently doesn't support a dry run option. For some complex queries we 
> just want to verify the query syntax initially before running it. A dry run 
> option where just the parsing is done, without actual execution, would be a 
> good addition.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HIVE-2652) Change arc config to hide generated files from Differential by default

2011-12-22 Thread John Sichi (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi resolved HIVE-2652.
--

Resolution: Fixed

Fixed generically for thrift in arc-jira.

> Change arc config to hide generated files from Differential by default
> --
>
> Key: HIVE-2652
> URL: https://issues.apache.org/jira/browse/HIVE-2652
> Project: Hive
>  Issue Type: Improvement
>  Components: Build Infrastructure
>Reporter: John Sichi
>Assignee: Marek Sapota
>
> I noticed that some files are hidden by default in arc-jira-lib, e.g. 
> events/__init__.php.  Is it possible via project-specific configuration to do 
> the same for Thrift-generated files in Hive, e.g. metastore/src/gen*, 
> service/src/gen*, serde/src/gen*?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HIVE-2600) Enable/Add type-specific compression for rcfile

2011-12-22 Thread He Yongqiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174964#comment-13174964
 ] 

He Yongqiang commented on HIVE-2600:


Krishna, can you update the review board with arc? See 
https://cwiki.apache.org/Hive/phabricatorcodereview.html

I will test and commit.

> Enable/Add type-specific compression for rcfile
> ---
>
> Key: HIVE-2600
> URL: https://issues.apache.org/jira/browse/HIVE-2600
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Processor, Serializers/Deserializers
>Reporter: Krishna Kumar
>Assignee: Krishna Kumar
>Priority: Minor
> Attachments: HIVE-2600.v0.patch, HIVE-2600.v1.patch
>
>
> Enable schema-aware compression codecs which can perform type-specific 
> compression on a per-column basis. I see this as having three parts:
> 1. Add interfaces for the rcfile to communicate column information to the 
> codec
> 2. Add an "uber compressor" which can perform column-specific compression on 
> a per-block basis. Initially, this can be config driven, but we can go for a 
> dynamic implementation later.
> 3. A bunch of type-specific compressors
> This jira is for the first part of the effort.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HIVE-2621) Allow multiple group bys with the same input data and spray keys to be run on the same reducer.

2011-12-22 Thread Kevin Wilfong (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174945#comment-13174945
 ] 

Kevin Wilfong commented on HIVE-2621:
-

There are currently two ways of getting common distincts: the current way 
checks that all distinct expressions in the subqueries are the same.  My new 
code doesn't depend on this; it tries to construct subsets of the subqueries 
such that this is true for each subset.

The advantage of doing it in the form
if (optimizeMultiGroupBy) {
  ...
} else {
  
  for each group:
if (size of group > 1 && etc.) {
  
} else {
  
}
}

is that the block of code inside the optimizeMultiGroupBy if statement can 
produce 2 map-reduce jobs, whereas the new code might produce many.

After looking at it more carefully, I can get rid of the singlemrMultiGroupBy 
if statement and the code within the block because it produces the same result 
that my new code would, except that the new code can handle filters as well.

After removing that code, the only remaining code above the if statement will 
be the poorly named getCommonDistinctExprs (as it only returns the common 
distinct expressions provided a lot of conditions are met including a 
requirement that all the distinct expressions are common), which I should be 
able to modify to use my new code.
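
As an illustration of the grouping idea described above (hypothetical names and a simplified "signature" string; this is not the SemanticAnalyzer code), sub-query clauses can be bucketed by their group-by and distinct key expressions, and each bucket can then share one reduce sink:

  import java.util.ArrayList;
  import java.util.LinkedHashMap;
  import java.util.List;
  import java.util.Map;

  public class GroupByClauseGroupingSketch {
    // clauseToSprayKeys maps a clause name to a canonical string of its
    // group-by + distinct key expressions (an assumed, simplified signature).
    public static Map<String, List<String>> groupClauses(Map<String, String> clauseToSprayKeys) {
      Map<String, List<String>> groups = new LinkedHashMap<String, List<String>>();
      for (Map.Entry<String, String> e : clauseToSprayKeys.entrySet()) {
        List<String> clauses = groups.get(e.getValue());
        if (clauses == null) {
          clauses = new ArrayList<String>();
          groups.put(e.getValue(), clauses);
        }
        clauses.add(e.getKey());
      }
      return groups;  // each value lists clauses that can be run on the same reducer
    }
  }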

> Allow multiple group bys with the same input data and spray keys to be run on 
> the same reducer.
> ---
>
> Key: HIVE-2621
> URL: https://issues.apache.org/jira/browse/HIVE-2621
> Project: Hive
>  Issue Type: New Feature
>Reporter: Kevin Wilfong
>Assignee: Kevin Wilfong
> Attachments: HIVE-2621.1.patch.txt, HIVE-2621.D567.1.patch, 
> HIVE-2621.D567.2.patch, HIVE-2621.D567.3.patch
>
>
> Currently, when a user runs a query, such as a multi-insert, where each 
> insertion subclause consists of a simple query followed by a group by, the 
> group bys for each clause are run on a separate reducer.  This requires 
> writing the data for each group by clause to an intermediate file, and then 
> reading it back.  This uses a significant amount of the total CPU consumed by 
> the query for an otherwise simple query.
> If the subclauses are grouped by their distinct expressions and group by 
> keys, with all of the group by expressions for a group of subclauses run on a 
> single reducer, this would reduce the amount of reading/writing to 
> intermediate files for some queries.
> To do this, for each group of subclauses, in the mapper we would execute the 
> filters for each subclause 'or'd together (provided each subclause has a 
> filter), followed by a reduce sink.  In the reducer, the child operators would 
> be each subclause's filter followed by the group by and any subsequent 
> operations.
> Note that this would require turning off map aggregation, so we would need to 
> make using this type of plan configurable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HIVE-2621) Allow multiple group bys with the same input data and spray keys to be run on the same reducer.

2011-12-22 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174927#comment-13174927
 ] 

Phabricator commented on HIVE-2621:
---

kevinwilfong has commented on the revision "HIVE-2621 [jira] Allow multiple 
group bys with the same input data and spray keys to be run on the same 
reducer.".

INLINE COMMENTS
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:5925 I 
return a list of lists of clause names (which map to subqueries) where the 
queries mapped to by each clause name in a list all have the same distinct and 
group by keys.

  It doesn't return common distincts.

  I'll try to make the comment clearer.

REVISION DETAIL
  https://reviews.facebook.net/D567


> Allow multiple group bys with the same input data and spray keys to be run on 
> the same reducer.
> ---
>
> Key: HIVE-2621
> URL: https://issues.apache.org/jira/browse/HIVE-2621
> Project: Hive
>  Issue Type: New Feature
>Reporter: Kevin Wilfong
>Assignee: Kevin Wilfong
> Attachments: HIVE-2621.1.patch.txt, HIVE-2621.D567.1.patch, 
> HIVE-2621.D567.2.patch, HIVE-2621.D567.3.patch
>
>
> Currently, when a user runs a query, such as a multi-insert, where each 
> insertion subclause consists of a simple query followed by a group by, the 
> group bys for each clause are run on a separate reducer.  This requires 
> writing the data for each group by clause to an intermediate file, and then 
> reading it back.  This uses a significant amount of the total CPU consumed by 
> the query for an otherwise simple query.
> If the subclauses are grouped by their distinct expressions and group by 
> keys, with all of the group by expressions for a group of subclauses run on a 
> single reducer, this would reduce the amount of reading/writing to 
> intermediate files for some queries.
> To do this, for each group of subclauses, in the mapper we would execute the 
> filters for each subclause 'or'd together (provided each subclause has a 
> filter), followed by a reduce sink.  In the reducer, the child operators would 
> be each subclause's filter followed by the group by and any subsequent 
> operations.
> Note that this would require turning off map aggregation, so we would need to 
> make using this type of plan configurable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HIVE-2621) Allow multiple group bys with the same input data and spray keys to be run on the same reducer.

2011-12-22 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174922#comment-13174922
 ] 

Phabricator commented on HIVE-2621:
---

njain has commented on the revision "HIVE-2621 [jira] Allow multiple group bys 
with the same input data and spray keys to be run on the same reducer.".

  otherwise it looks good

INLINE COMMENTS
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:5925 can 
you give an example in the comments ?

  Sorry, but it is not clear to me.

  Do you want to return 2 lists - one for the common distincts ?
  I am missing something: what else do you want to return ?


  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:3469 thanks 
for creating this function and re-using this code

REVISION DETAIL
  https://reviews.facebook.net/D567


> Allow multiple group bys with the same input data and spray keys to be run on 
> the same reducer.
> ---
>
> Key: HIVE-2621
> URL: https://issues.apache.org/jira/browse/HIVE-2621
> Project: Hive
>  Issue Type: New Feature
>Reporter: Kevin Wilfong
>Assignee: Kevin Wilfong
> Attachments: HIVE-2621.1.patch.txt, HIVE-2621.D567.1.patch, 
> HIVE-2621.D567.2.patch, HIVE-2621.D567.3.patch
>
>
> Currently, when a user runs a query, such as a multi-insert, where each 
> insertion subclause consists of a simple query followed by a group by, the 
> group bys for each clause are run on a separate reducer.  This requires 
> writing the data for each group by clause to an intermediate file, and then 
> reading it back.  This uses a significant amount of the total CPU consumed by 
> the query for an otherwise simple query.
> If the subclauses are grouped by their distinct expressions and group by 
> keys, with all of the group by expressions for a group of subclauses run on a 
> single reducer, this would reduce the amount of reading/writing to 
> intermediate files for some queries.
> To do this, for each group of subclauses, in the mapper we would execute the 
> filters for each subclause 'or'd together (provided each subclause has a 
> filter), followed by a reduce sink.  In the reducer, the child operators would 
> be each subclause's filter followed by the group by and any subsequent 
> operations.
> Note that this would require turning off map aggregation, so we would need to 
> make using this type of plan configurable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HIVE-2621) Allow multiple group bys with the same input data and spray keys to be run on the same reducer.

2011-12-22 Thread Namit Jain (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174891#comment-13174891
 ] 

Namit Jain commented on HIVE-2621:
--

Let me take a look at the code again:

But the general flow should be as follows:

if  hive.multigroupby.singlereducer is true (which should always be),
  find common distincts. 
(or the check hive.multigroupby.singlereducer can be done inside find 
common distincts function itself)
  if common distincts == null
 old (current) approach - map side aggr should be used
  else:
 new code path

What do you think ? That way, we are guaranteed that the existing behavior is 
not changed.
This new parameter only affects distincts, and it is very easy to turn 
it off.

I know the code is kind of messy here, but can you spend some time to 
modularize it,
and reuse as much as possible ?



> Allow multiple group bys with the same input data and spray keys to be run on 
> the same reducer.
> ---
>
> Key: HIVE-2621
> URL: https://issues.apache.org/jira/browse/HIVE-2621
> Project: Hive
>  Issue Type: New Feature
>Reporter: Kevin Wilfong
>Assignee: Kevin Wilfong
> Attachments: HIVE-2621.1.patch.txt, HIVE-2621.D567.1.patch, 
> HIVE-2621.D567.2.patch, HIVE-2621.D567.3.patch
>
>
> Currently, when a user runs a query, such as a multi-insert, where each 
> insertion subclause consists of a simple query followed by a group by, the 
> group bys for each clause are run on a separate reducer.  This requires 
> writing the data for each group by clause to an intermediate file, and then 
> reading it back.  This uses a significant amount of the total CPU consumed by 
> the query for an otherwise simple query.
> If the subclauses are grouped by their distinct expressions and group by 
> keys, with all of the group by expressions for a group of subclauses run on a 
> single reducer, this would reduce the amount of reading/writing to 
> intermediate files for some queries.
> To do this, for each group of subclauses, in the mapper we would execute the 
> filters for each subclause 'or'd together (provided each subclause has a 
> filter), followed by a reduce sink.  In the reducer, the child operators would 
> be each subclause's filter followed by the group by and any subsequent 
> operations.
> Note that this would require turning off map aggregation, so we would need to 
> make using this type of plan configurable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




Hive-0.8.0-SNAPSHOT-h0.21 - Build # 135 - Still Failing

2011-12-22 Thread Apache Jenkins Server
Changes for Build #131

Changes for Build #132

Changes for Build #133

Changes for Build #134

Changes for Build #135



No tests ran.

The Apache Jenkins build system has built Hive-0.8.0-SNAPSHOT-h0.21 (build #135)

Status: Still Failing

Check console output at 
https://builds.apache.org/job/Hive-0.8.0-SNAPSHOT-h0.21/135/ to view the 
results.


[jira] [Updated] (HIVE-2504) Warehouse table subdirectories should inherit the group permissions of the warehouse parent directory

2011-12-22 Thread Chinna Rao Lalam (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chinna Rao Lalam updated HIVE-2504:
---

Status: Patch Available  (was: In Progress)

> Warehouse table subdirectories should inherit the group permissions of the 
> warehouse parent directory
> -
>
> Key: HIVE-2504
> URL: https://issues.apache.org/jira/browse/HIVE-2504
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Carl Steinbach
>Assignee: Chinna Rao Lalam
> Attachments: HIVE-2504.patch
>
>
> When the Hive Metastore creates a subdirectory in the Hive warehouse for
> a new table it does so with the default HDFS permissions. Since the default
> dfs.umask value is 022, this means that the new subdirectory will not inherit 
> the
> group write permissions of the hive warehouse directory.
> We should make the umask used by Warehouse.mkdirs() configurable, and set
> it to use a default value of 002.
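
A rough sketch of the idea described above, using the Hadoop FsPermission and FileSystem.mkdirs(Path, FsPermission) APIs; the property name hive.warehouse.subdir.umask is invented for illustration and this is not the attached patch:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.fs.permission.FsPermission;

  public class WarehouseMkdirSketch {
    // Create a warehouse subdirectory with a configurable umask (default 0002)
    // so the group write bit of the warehouse parent directory is preserved.
    public static boolean mkdirsWithUmask(FileSystem fs, Path dir, Configuration conf)
        throws java.io.IOException {
      int umask = Integer.parseInt(conf.get("hive.warehouse.subdir.umask", "0002"), 8);
      FsPermission perm = FsPermission.getDefault().applyUMask(new FsPermission((short) umask));
      return fs.mkdirs(dir, perm);  // yields rwxrwxr-x when the umask is 0002
    }
  }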

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HIVE-2504) Warehouse table subdirectories should inherit the group permissions of the warehouse parent directory

2011-12-22 Thread Chinna Rao Lalam (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174833#comment-13174833
 ] 

Chinna Rao Lalam commented on HIVE-2504:


For directory creation, the umask value 0002 is passed.

> Warehouse table subdirectories should inherit the group permissions of the 
> warehouse parent directory
> -
>
> Key: HIVE-2504
> URL: https://issues.apache.org/jira/browse/HIVE-2504
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Carl Steinbach
>Assignee: Chinna Rao Lalam
> Attachments: HIVE-2504.patch
>
>
> When the Hive Metastore creates a subdirectory in the Hive warehouse for
> a new table it does so with the default HDFS permissions. Since the default
> dfs.umask value is 022, this means that the new subdirectory will not inherit 
> the
> group write permissions of the hive warehouse directory.
> We should make the umask used by Warehouse.mkdirs() configurable, and set
> it to use a default value of 002.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Work started] (HIVE-2504) Warehouse table subdirectories should inherit the group permissions of the warehouse parent directory

2011-12-22 Thread Chinna Rao Lalam (Work started) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-2504 started by Chinna Rao Lalam.

> Warehouse table subdirectories should inherit the group permissions of the 
> warehouse parent directory
> -
>
> Key: HIVE-2504
> URL: https://issues.apache.org/jira/browse/HIVE-2504
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Carl Steinbach
>Assignee: Chinna Rao Lalam
> Attachments: HIVE-2504.patch
>
>
> When the Hive Metastore creates a subdirectory in the Hive warehouse for
> a new table it does so with the default HDFS permissions. Since the default
> dfs.umask value is 022, this means that the new subdirectory will not inherit 
> the
> group write permissions of the hive warehouse directory.
> We should make the umask used by Warehouse.mkdirs() configurable, and set
> it to use a default value of 002.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HIVE-2663) DynamicSerDeTypeList.serialize() method has a null check for the "nullProtocol" that needs to be changed.

2011-12-22 Thread Chinna Rao Lalam (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174831#comment-13174831
 ] 

Chinna Rao Lalam commented on HIVE-2663:


In the case below, the null check is not needed; it makes the code more 
complicated and look buggy:

if (element == null && nullProtocol != null) {

Here, if element == null and nullProtocol == null, the else block will be executed 
with element == null.

So we can remove the nullProtocol != null check.
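
A simplified, stand-in illustration of that change (not the actual DynamicSerDeTypeList code): with the nullProtocol condition removed, a null element can never fall through to the serialization branch.

  public class NullCheckSketch {
    static String serialize(Object element, Object nullProtocol) {
      if (element == null) {        // previously: element == null && nullProtocol != null
        return "<null>";            // stand-in for writing a null via the protocol
      }
      return element.toString();    // stand-in for serializing the element
    }
  }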

> DynamicSerDeTypeList.serialize() method has a null check for the 
> "nullProtocol" that needs to be changed.
> --
>
> Key: HIVE-2663
> URL: https://issues.apache.org/jira/browse/HIVE-2663
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
> Environment: Hadoop 0.20.1, Hive0.9.0 and SUSE Linux Enterprise 
> Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5).
>Reporter: Chinna Rao Lalam
>Assignee: Chinna Rao Lalam
>Priority: Minor
> Attachments: HIVE-2663.patch
>
>
> The DynamicSerDeTypeList.serialize() method has a null check for 
> "nullProtocol":
>  if (element == null && nullProtocol != null) {
> Here, if element == null and nullProtocol == null, the else block will be 
> executed with element == null.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HIVE-2663) DynamicSerDeTypeList.serialize() method has a null check for the "nullProtocol" that needs to be changed.

2011-12-22 Thread Chinna Rao Lalam (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chinna Rao Lalam updated HIVE-2663:
---

Status: Patch Available  (was: Open)

> DynamicSerDeTypeList.serialize() method has a null check for the 
> "nullProtocol" that needs to be changed.
> --
>
> Key: HIVE-2663
> URL: https://issues.apache.org/jira/browse/HIVE-2663
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
> Environment: Hadoop 0.20.1, Hive0.9.0 and SUSE Linux Enterprise 
> Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5).
>Reporter: Chinna Rao Lalam
>Assignee: Chinna Rao Lalam
>Priority: Minor
> Attachments: HIVE-2663.patch
>
>
> The DynamicSerDeTypeList.serialize() method has a null check for 
> "nullProtocol":
>  if (element == null && nullProtocol != null) {
> Here, if element == null and nullProtocol == null, the else block will be 
> executed with element == null.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HIVE-2623) Add Integer type compressors

2011-12-22 Thread Krishna Kumar (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krishna Kumar updated HIVE-2623:


Status: Patch Available  (was: Open)

> Add Integer type compressors
> 
>
> Key: HIVE-2623
> URL: https://issues.apache.org/jira/browse/HIVE-2623
> Project: Hive
>  Issue Type: Sub-task
>  Components: Contrib
>Reporter: Krishna Kumar
>Assignee: Krishna Kumar
>Priority: Minor
> Attachments: HIVE-2623.v0.patch, HIVE-2623.v1.patch, 
> HIVE-2623.v2.patch, data.tar.gz
>
>
> Type-specific compressors for integers.
> Starting with Elias gamma, which prefers small values as in a power-law-like 
> distribution.
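
For reference, a tiny standalone Elias gamma encoder following the textbook definition (illustrative only, not the implementation in the attached patches): a positive integer n is written as floor(log2 n) zero bits followed by the binary form of n, so small values get very short codes.

  public class EliasGammaSketch {
    public static String encode(long n) {
      if (n <= 0) {
        throw new IllegalArgumentException("Elias gamma is defined for positive integers");
      }
      String binary = Long.toBinaryString(n);   // e.g. 9 -> "1001"
      StringBuilder code = new StringBuilder();
      for (int i = 1; i < binary.length(); i++) {
        code.append('0');                       // floor(log2 n) leading zeros
      }
      return code.append(binary).toString();    // 9 -> "0001001"
    }

    public static void main(String[] args) {
      for (long n : new long[] {1, 2, 9, 100}) {
        System.out.println(n + " -> " + encode(n));
      }
    }
  }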

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HIVE-2623) Add Integer type compressors

2011-12-22 Thread Krishna Kumar (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174824#comment-13174824
 ] 

Krishna Kumar commented on HIVE-2623:
-

Compression efficiency stats (unary is optimal for geometric_0.5, eliasgamma 
for steppedpowerlaw):

geometric_0.5 : gzip/bzip2/uber+Unary/uber+EliasGamma

1s  : 32628/31185/25640/28976
2s  : 51155/46444/39476/42789
D1s : 109133/172441/25300/28625
D2s : 85525/99032/25395/28719

steppedpowerlaw : gzip/bzip2/uber+Unary/uber+EliasGamma

1s  : 52082/48239/163440/45798
2s  : 63938/59270/165574/53033
D1s : 123481/205486/172697/37669
D2s : 94362/106407/178985/38004 
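
For reference, a minimal standalone sketch of Elias gamma coding (illustrative
only; this is not the encoder interface used in the attached patches):

  // Elias gamma writes a positive integer n as (bitlength(n) - 1) zero bits
  // followed by the binary form of n, so small values get the shortest codes.
  public class EliasGammaDemo {
    static String encode(int n) {
      if (n < 1) {
        throw new IllegalArgumentException("n must be >= 1");
      }
      String bits = Integer.toBinaryString(n);     // e.g. 4 -> "100"
      StringBuilder out = new StringBuilder();
      for (int i = 1; i < bits.length(); i++) {
        out.append('0');                           // unary length prefix
      }
      return out.append(bits).toString();          // e.g. 4 -> "00100"
    }

    public static void main(String[] args) {
      for (int n : new int[] {1, 2, 4, 9}) {
        System.out.println(n + " -> " + encode(n));
      }
    }
  }

This is consistent with the uber+EliasGamma columns above doing best on the
steppedpowerlaw data sets, where most values are small and get short codes.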

> Add Integer type compressors
> 
>
> Key: HIVE-2623
> URL: https://issues.apache.org/jira/browse/HIVE-2623
> Project: Hive
>  Issue Type: Sub-task
>  Components: Contrib
>Reporter: Krishna Kumar
>Assignee: Krishna Kumar
>Priority: Minor
> Attachments: HIVE-2623.v0.patch, HIVE-2623.v1.patch, 
> HIVE-2623.v2.patch, data.tar.gz
>
>
> Type-specific compressors for integers.
> Starting with Elias gamma, which favors small values, as expected under a 
> power-law-like distribution.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HIVE-2504) Warehouse table subdirectories should inherit the group permissions of the warehouse parent directory

2011-12-22 Thread Chinna Rao Lalam (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chinna Rao Lalam reassigned HIVE-2504:
--

Assignee: Chinna Rao Lalam

> Warehouse table subdirectories should inherit the group permissions of the 
> warehouse parent directory
> -
>
> Key: HIVE-2504
> URL: https://issues.apache.org/jira/browse/HIVE-2504
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Carl Steinbach
>Assignee: Chinna Rao Lalam
> Attachments: HIVE-2504.patch
>
>
> When the Hive Metastore creates a subdirectory in the Hive warehouse for
> a new table, it does so with the default HDFS permissions. Since the default
> dfs.umask value is 022, the new subdirectory will not inherit the group
> write permissions of the Hive warehouse directory.
> We should make the umask used by Warehouse.mkdirs() configurable, and set
> it to use a default value of 002.
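
A minimal illustration of the arithmetic behind this (octal values; this is
not the patch itself, just the usual mode & ~umask rule applied to new
directories):

  // With the default umask 022, a requested 777 directory mode becomes 755,
  // dropping group write; with umask 002 it becomes 775 and keeps it.
  public class UmaskDemo {
    public static void main(String[] args) {
      int requested = 0777;
      System.out.printf("umask 022 -> %o%n", requested & ~022);  // prints 755
      System.out.printf("umask 002 -> %o%n", requested & ~002);  // prints 775
    }
  }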

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HIVE-2623) Add Integer type compressors

2011-12-22 Thread Krishna Kumar (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krishna Kumar updated HIVE-2623:


Attachment: (was: steppedpowerlawTimestamp.gz)

> Add Integer type compressors
> 
>
> Key: HIVE-2623
> URL: https://issues.apache.org/jira/browse/HIVE-2623
> Project: Hive
>  Issue Type: Sub-task
>  Components: Contrib
>Reporter: Krishna Kumar
>Assignee: Krishna Kumar
>Priority: Minor
> Attachments: HIVE-2623.v0.patch, HIVE-2623.v1.patch, 
> HIVE-2623.v2.patch, data.tar.gz
>
>
> Type-specific compressors for integers.
> Starting with Elias gamma, which favors small values, as expected under a 
> power-law-like distribution.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HIVE-2623) Add Integer type compressors

2011-12-22 Thread Krishna Kumar (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krishna Kumar updated HIVE-2623:


Attachment: data.tar.gz

Artificial data sets for the 8 integer compressors.

> Add Integer type compressors
> 
>
> Key: HIVE-2623
> URL: https://issues.apache.org/jira/browse/HIVE-2623
> Project: Hive
>  Issue Type: Sub-task
>  Components: Contrib
>Reporter: Krishna Kumar
>Assignee: Krishna Kumar
>Priority: Minor
> Attachments: HIVE-2623.v0.patch, HIVE-2623.v1.patch, 
> HIVE-2623.v2.patch, data.tar.gz
>
>
> Type-specific compressors for integers.
> Starting with Elias gamma, which favors small values, as expected under a 
> power-law-like distribution.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HIVE-2623) Add Integer type compressors

2011-12-22 Thread Krishna Kumar (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krishna Kumar updated HIVE-2623:


Attachment: (was: steppedpowerlaw1S.gz)

> Add Integer type compressors
> 
>
> Key: HIVE-2623
> URL: https://issues.apache.org/jira/browse/HIVE-2623
> Project: Hive
>  Issue Type: Sub-task
>  Components: Contrib
>Reporter: Krishna Kumar
>Assignee: Krishna Kumar
>Priority: Minor
> Attachments: HIVE-2623.v0.patch, HIVE-2623.v1.patch, 
> HIVE-2623.v2.patch, data.tar.gz
>
>
> Type-specific compressors for integers.
> Starting with Elias gamma, which favors small values, as expected under a 
> power-law-like distribution.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HIVE-2623) Add Integer type compressors

2011-12-22 Thread Krishna Kumar (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krishna Kumar updated HIVE-2623:


Attachment: (was: steppedpowerlaw2S.gz)

> Add Integer type compressors
> 
>
> Key: HIVE-2623
> URL: https://issues.apache.org/jira/browse/HIVE-2623
> Project: Hive
>  Issue Type: Sub-task
>  Components: Contrib
>Reporter: Krishna Kumar
>Assignee: Krishna Kumar
>Priority: Minor
> Attachments: HIVE-2623.v0.patch, HIVE-2623.v1.patch, 
> HIVE-2623.v2.patch, data.tar.gz
>
>
> Type-specific compressors for integers.
> Starting with Elias gamma, which favors small values, as expected under a 
> power-law-like distribution.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HIVE-2623) Add Integer type compressors

2011-12-22 Thread Krishna Kumar (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krishna Kumar updated HIVE-2623:


Attachment: (was: steppedpowerlawIQ.gz)

> Add Integer type compressors
> 
>
> Key: HIVE-2623
> URL: https://issues.apache.org/jira/browse/HIVE-2623
> Project: Hive
>  Issue Type: Sub-task
>  Components: Contrib
>Reporter: Krishna Kumar
>Assignee: Krishna Kumar
>Priority: Minor
> Attachments: HIVE-2623.v0.patch, HIVE-2623.v1.patch, 
> HIVE-2623.v2.patch, data.tar.gz
>
>
> Type-specific compressors for integers.
> Starting with Elias gamma, which favors small values, as expected under a 
> power-law-like distribution.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HIVE-2663) DynamicSerDeTypeList.serialize() method has a null check for the "nullProtocol" that needs to be changed.

2011-12-22 Thread Chinna Rao Lalam (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chinna Rao Lalam updated HIVE-2663:
---

Attachment: HIVE-2663.patch

> DynamicSerDeTypeList.serialize() method has a null check for the 
> "nullProtocol" that needs to be changed.
> --
>
> Key: HIVE-2663
> URL: https://issues.apache.org/jira/browse/HIVE-2663
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
> Environment: Hadoop 0.20.1, Hive0.9.0 and SUSE Linux Enterprise 
> Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5).
>Reporter: Chinna Rao Lalam
>Assignee: Chinna Rao Lalam
>Priority: Minor
> Attachments: HIVE-2663.patch
>
>
> In the DynamicSerDeTypeList.serialize() method there is a null check on 
> "nullProtocol":
>  if (element == null && nullProtocol != null) {
> If element is null and nullProtocol is also null, the else block is executed 
> with element still null.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HIVE-2623) Add Integer type compressors

2011-12-22 Thread Krishna Kumar (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krishna Kumar updated HIVE-2623:


Attachment: HIVE-2623.v2.patch

Added another simple integer coder (unary) and the four variants of compressors 
based on that. 

> Add Integer type compressors
> 
>
> Key: HIVE-2623
> URL: https://issues.apache.org/jira/browse/HIVE-2623
> Project: Hive
>  Issue Type: Sub-task
>  Components: Contrib
>Reporter: Krishna Kumar
>Assignee: Krishna Kumar
>Priority: Minor
> Attachments: HIVE-2623.v0.patch, HIVE-2623.v1.patch, 
> HIVE-2623.v2.patch, steppedpowerlaw1S.gz, steppedpowerlaw2S.gz, 
> steppedpowerlawIQ.gz, steppedpowerlawTimestamp.gz
>
>
> Type-specific compressors for integers.
> Starting with Elias gamma, which favors small values, as expected under a 
> power-law-like distribution.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HIVE-2604) Add UberCompressor Serde/Codec to contrib which allows per-column compression strategies

2011-12-22 Thread Krishna Kumar (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krishna Kumar updated HIVE-2604:


Component/s: (was: Query Processor)
 (was: Serializers/Deserializers)
 Contrib

> Add UberCompressor Serde/Codec to contrib which allows per-column compression 
> strategies
> 
>
> Key: HIVE-2604
> URL: https://issues.apache.org/jira/browse/HIVE-2604
> Project: Hive
>  Issue Type: Sub-task
>  Components: Contrib
>Reporter: Krishna Kumar
>Assignee: Krishna Kumar
> Attachments: HIVE-2604.v0.patch, HIVE-2604.v1.patch, 
> HIVE-2604.v2.patch
>
>
> The strategies supported are:
> 1. using a specified codec on the column
> 2. using a specific codec on the column, which is serialized via a specific serde
> 3. using a specific "TypeSpecificCompressor" instance
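
One way to picture a per-column strategy table (a hypothetical model only; the
property names and classes used by the attached patches are not shown here):

  // Hypothetical descriptor mirroring the three options above: a plain codec,
  // a serde followed by a codec, or a TypeSpecificCompressor instance.
  public class ColumnStrategy {
    enum Kind { CODEC, SERDE_THEN_CODEC, TYPE_SPECIFIC_COMPRESSOR }

    final String column;
    final Kind kind;
    final String serdeClass;               // only used for SERDE_THEN_CODEC
    final String codecOrCompressorClass;

    ColumnStrategy(String column, Kind kind, String serdeClass,
        String codecOrCompressorClass) {
      this.column = column;
      this.kind = kind;
      this.serdeClass = serdeClass;
      this.codecOrCompressorClass = codecOrCompressorClass;
    }

    public static void main(String[] args) {
      ColumnStrategy c = new ColumnStrategy("ts",
          Kind.TYPE_SPECIFIC_COMPRESSOR, null, "SomeIntegerCompressor"); // illustrative name
      System.out.println(c.column + " -> " + c.kind
          + " (" + c.codecOrCompressorClass + ")");
    }
  }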

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HIVE-2623) Add Integer type compressors

2011-12-22 Thread Krishna Kumar (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krishna Kumar updated HIVE-2623:


Component/s: (was: Query Processor)
 (was: Serializers/Deserializers)
 Contrib

> Add Integer type compressors
> 
>
> Key: HIVE-2623
> URL: https://issues.apache.org/jira/browse/HIVE-2623
> Project: Hive
>  Issue Type: Sub-task
>  Components: Contrib
>Reporter: Krishna Kumar
>Assignee: Krishna Kumar
>Priority: Minor
> Attachments: HIVE-2623.v0.patch, HIVE-2623.v1.patch, 
> steppedpowerlaw1S.gz, steppedpowerlaw2S.gz, steppedpowerlawIQ.gz, 
> steppedpowerlawTimestamp.gz
>
>
> Type-specific compressors for integers.
> Starting with Elias gamma, which favors small values, as expected under a 
> power-law-like distribution.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HIVE-2504) Warehouse table subdirectories should inherit the group permissions of the warehouse parent directory

2011-12-22 Thread Chinna Rao Lalam (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chinna Rao Lalam updated HIVE-2504:
---

Attachment: HIVE-2504.patch

> Warehouse table subdirectories should inherit the group permissions of the 
> warehouse parent directory
> -
>
> Key: HIVE-2504
> URL: https://issues.apache.org/jira/browse/HIVE-2504
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Carl Steinbach
> Attachments: HIVE-2504.patch
>
>
> When the Hive Metastore creates a subdirectory in the Hive warehouse for
> a new table, it does so with the default HDFS permissions. Since the default
> dfs.umask value is 022, the new subdirectory will not inherit the group
> write permissions of the Hive warehouse directory.
> We should make the umask used by Warehouse.mkdirs() configurable, and set
> it to use a default value of 002.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HIVE-2659) SHOW FUNCTIONS still returns internal operators

2011-12-22 Thread Priyadarshini (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Priyadarshini updated HIVE-2659:


Status: Patch Available  (was: Open)

> SHOW FUNCTIONS still returns internal operators
> ---
>
> Key: HIVE-2659
> URL: https://issues.apache.org/jira/browse/HIVE-2659
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Reporter: Carl Steinbach
>Assignee: Priyadarshini
> Attachments: HIVE-2659.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HIVE-2659) SHOW FUNCTIONS still returns internal operators

2011-12-22 Thread Priyadarshini (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Priyadarshini updated HIVE-2659:


Attachment: HIVE-2659.patch

> SHOW FUNCTIONS still returns internal operators
> ---
>
> Key: HIVE-2659
> URL: https://issues.apache.org/jira/browse/HIVE-2659
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Reporter: Carl Steinbach
>Assignee: Priyadarshini
> Attachments: HIVE-2659.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




Hive-0.8.0-SNAPSHOT-h0.21 - Build # 134 - Still Failing

2011-12-22 Thread Apache Jenkins Server
Changes for Build #131

Changes for Build #132

Changes for Build #133

Changes for Build #134



No tests ran.

The Apache Jenkins build system has built Hive-0.8.0-SNAPSHOT-h0.21 (build #134)

Status: Still Failing

Check console output at 
https://builds.apache.org/job/Hive-0.8.0-SNAPSHOT-h0.21/134/ to view the 
results.


Re: Does hive REAL enable TestHadoop20SAuthBridge in hive-0.8.0 ? -- [HIVE-2257] patch doesn't work for me

2011-12-22 Thread Bing Li
So in hive-0.8.0, it excludes TestHadoop20SAuthBridge, right?
If so, which test case would cover security in Hive?

If I want to use hadoop-1.0.0, I can ONLY set it as *hadoop.version*,
right?  (I would need to apply the patch from [HIVE-2631])

Thanks
- Bing

2011/12/22 Thomas Weise 

> This was taken out when making Hive compile with Hadoop 0.23.
>
> The test case will be put back as part of
>
> https://issues.apache.org/jira/browse/HIVE-2629
>
>
> On 12/22/11 12:15 AM, "Bing Li"  wrote:
>
> > Hi, All
> > When I ran hive UT, I found that TestHadoop20SAuthBridge wasn't compiled,
> > so TestHadoop20SAuthBridge won't be run by "ant test" command.
> >
> > In src/shims/build.xml, I found the following lines:
> >
> >   (the XML snippet was stripped from the archived message)
> >
> > Then, I commented out the lines in blue, and it could generate the class
> > file of TestHadoop20SAuthBridge.
> > But if I change the security hadoop version to 1.0.0, it failed with:
> >
> > build_shims:
> >  [echo] Project: shims
> >  [echo] Compiling shims against hadoop 1.0.1-SNAPSHOT
> > (/home/libing/Round-1/hive-0.8.0/src/build/hadoopcore/IHC-1.0.1-SNAPSHOT)
> >
> > BUILD FAILED
> > /home/libing/Round-1/hive-0.8.0/src/build.xml:307: The following error
> > occurred while executing this line:
> > /home/libing/Round-1/hive-0.8.0/src/build.xml:325: The following error
> > occurred while executing this line:
> > /home/libing/Round-1/hive-0.8.0/src/shims/build.xml:76: The following
> error
> > occurred while executing this line:
> > /home/libing/Round-1/hive-0.8.0/src/shims/build.xml:66: srcdir
> > "/home/libing/Round-1/hive-0.8.0/src/shims/src/1.0/java" does not exist!
> >
> >
> > Does it mean that if we want to use a hadoop version as
> > hadoop.security.version, we should also keep a matching directory under
> > shims/src/xxx ourselves?
> >
> >
> > Thanks,
> > - Bing
>
>


Re: Does hive REAL enable TestHadoop20SAuthBridge in hive-0.8.0 ? -- [HIVE-2257] patch doesn't work for me

2011-12-22 Thread Thomas Weise
This was taken out when making Hive compile with Hadoop 0.23.

The test case will be put back as part of

https://issues.apache.org/jira/browse/HIVE-2629


On 12/22/11 12:15 AM, "Bing Li"  wrote:

> Hi, All
> When I ran hive UT, I found that TestHadoop20SAuthBridge wasn't compiled,
> so TestHadoop20SAuthBridge won't be run by "ant test" command.
> 
> In src/shims/build.xml, I found the following lines:
> 
>   (the XML snippet was stripped from the archived message)
> 
> Then, I commented out the lines in blue, and it could generate the class file
> of TestHadoop20SAuthBridge.
> But if I change the security hadoop version to 1.0.0, it failed with:
> 
> build_shims:
>  [echo] Project: shims
>  [echo] Compiling shims against hadoop 1.0.1-SNAPSHOT
> (/home/libing/Round-1/hive-0.8.0/src/build/hadoopcore/IHC-1.0.1-SNAPSHOT)
> 
> BUILD FAILED
> /home/libing/Round-1/hive-0.8.0/src/build.xml:307: The following error
> occurred while executing this line:
> /home/libing/Round-1/hive-0.8.0/src/build.xml:325: The following error
> occurred while executing this line:
> /home/libing/Round-1/hive-0.8.0/src/shims/build.xml:76: The following error
> occurred while executing this line:
> /home/libing/Round-1/hive-0.8.0/src/shims/build.xml:66: srcdir
> "/home/libing/Round-1/hive-0.8.0/src/shims/src/1.0/java" does not exist!
> 
> 
> Does it mean that if we want to use a hadoop version as hadoop.security.version,
> we should also keep a matching directory under shims/src/xxx ourselves?
> 
> 
> Thanks,
> - Bing



[jira] [Commented] (HIVE-2621) Allow multiple group bys with the same input data and spray keys to be run on the same reducer.

2011-12-22 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174730#comment-13174730
 ] 

Phabricator commented on HIVE-2621:
---

kevinwilfong has commented on the revision "HIVE-2621 [jira] Allow multiple 
group bys with the same input data and spray keys to be run on the same 
reducer.".

INLINE COMMENTS
  ql/src/test/queries/clientpositive/groupby7_noskew_multi_single_reducer.q:12 
I actually had it in one MR job originally, but the addition of the limits 
caused the addition of an extra job for each insert.  The code to do that is at 
line 6379 of the SemanticAnalyzer.  I assume it does this to ensure all the 
rows go to a single reducer, so the limit can be enforced.

  As for making hive.multigroupby.singlereducer true by default, currently map 
aggregation is set to true by default, and setting this new variable causes the 
map aggregation variable to be ignored for any group by that can go to the same 
reducer as at least one other.

  I'll turn it on by default, I just wanted to make that clear.

  Also, in that case this test isn't needed as I just copied another test and 
turned on hive.multigroupby.singlereducer
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:6211 This 
code looks at all the group bys in the query block and allows for only some of 
the distinct expressions and group by keys to be the same.

  The new code looks for sets of group by subqueries where all of the distinct 
expressions and group by keys are the same.

  I'll go back and see if I reused as much of the definitions of the two 
functions getCommonDistinctExprs and getCommonGroupByKeys as I could have, but 
I don't think this code here could be simplified using my new code.
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:6273 The 
above if block guarantees the job can be performed using only 2 MR jobs 
provided that there are no filter operators, and the distinct keys are the same 
across all subqueries.

  If my new code were used, there is no hard limit on the number of MR jobs 
because it depends on how many variations there are on the group by keys.

  I think it is possible that we could first look at the set of subqueries 
without filters, see how they split on both distinct and group by keys, and for 
any of these groups where two or more have the same distinct keys run this 
other method on them.  However, this could have been done much more easily 
before (without the comparison, just group queries with the same distinct keys 
together), and it wasn't, so I suspect the gains are not that great. I'd much 
rather do this as part of a separate patch once we have queries that would 
benefit from it.

  I'm not sure whether or not the code is broken, but I would like to leave it 
for now, to be fixed, or modified as I described above as part of a separate 
patch.

REVISION DETAIL
  https://reviews.facebook.net/D567


> Allow multiple group bys with the same input data and spray keys to be run on 
> the same reducer.
> ---
>
> Key: HIVE-2621
> URL: https://issues.apache.org/jira/browse/HIVE-2621
> Project: Hive
>  Issue Type: New Feature
>Reporter: Kevin Wilfong
>Assignee: Kevin Wilfong
> Attachments: HIVE-2621.1.patch.txt, HIVE-2621.D567.1.patch, 
> HIVE-2621.D567.2.patch, HIVE-2621.D567.3.patch
>
>
> Currently, when a user runs a query, such as a multi-insert, where each 
> insertion subclause consists of a simple query followed by a group by, the 
> group bys for each clause are run on a separate reducer.  This requires 
> writing the data for each group by clause to an intermediate file, and then 
> reading it back.  This uses a significant amount of the total CPU consumed by 
> the query for an otherwise simple query.
> If the subclauses are grouped by their distinct expressions and group by 
> keys, with all of the group by expressions for a group of subclauses run on a 
> single reducer, this would reduce the amount of reading/writing to 
> intermediate files for some queries.
> To do this, for each group of subclauses, in the mapper we would execute the 
> filters for each subclause 'or'd together (provided each subclause has a 
> filter) followed by a reduce sink.  In the reducer, the child operators would 
> be each subclause's filter followed by the group by and any subsequent 
> operations.
> Note that this would require turning off map aggregation, so we would need to 
> make using this type of plan configurable.
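
A rough sketch of the grouping step described above (plain Java, not the actual
SemanticAnalyzer code; the subclause names and keys are placeholders):

  import java.util.*;

  // Bucket insert subclauses by (distinct expressions, group-by keys); every
  // bucket with two or more members can share a single reducer-side plan.
  public class GroupBySharingDemo {
    static class Subclause {
      final String name;
      final Set<String> distinctExprs;
      final Set<String> groupByKeys;
      Subclause(String name, Set<String> distinctExprs, Set<String> groupByKeys) {
        this.name = name;
        this.distinctExprs = distinctExprs;
        this.groupByKeys = groupByKeys;
      }
    }

    public static void main(String[] args) {
      List<Subclause> subclauses = Arrays.asList(
          new Subclause("insert1", Collections.singleton("c1"), Collections.singleton("k1")),
          new Subclause("insert2", Collections.singleton("c1"), Collections.singleton("k1")),
          new Subclause("insert3", Collections.singleton("c1"), Collections.singleton("k2")));

      Map<List<Set<String>>, List<String>> buckets =
          new LinkedHashMap<List<Set<String>>, List<String>>();
      for (Subclause s : subclauses) {
        List<Set<String>> key = Arrays.asList(s.distinctExprs, s.groupByKeys);
        if (!buckets.containsKey(key)) {
          buckets.put(key, new ArrayList<String>());
        }
        buckets.get(key).add(s.name);
      }
      // insert1 and insert2 share identical keys and can run on one reducer;
      // insert3 has different group-by keys and keeps its own.
      System.out.println(buckets);
    }
  }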

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-2664) dummy issue for Phabricator testing, please ignore

2011-12-22 Thread Phabricator (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-2664:
--

Attachment: HIVE-2664.D993.1.patch

jsichi requested code review of "HIVE-2664 [jira] dummy issue for Phabricator 
testing, please ignore".
Reviewers: JIRA

  https://issues.apache.org/jira/browse/HIVE-2664

  Foo.



TEST PLAN
  EMPTY

REVISION DETAIL
  https://reviews.facebook.net/D993

AFFECTED FILES
  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java
  ql/build.xml

MANAGE HERALD DIFFERENTIAL RULES
  https://reviews.facebook.net/herald/view/differential/

WHY DID I GET THIS EMAIL?
  https://reviews.facebook.net/herald/transcript/2067/

Tip: use the X-Herald-Rules header to filter Herald messages in your client.


> dummy issue for Phabricator testing, please ignore
> --
>
> Key: HIVE-2664
> URL: https://issues.apache.org/jira/browse/HIVE-2664
> Project: Hive
>  Issue Type: Improvement
>Reporter: John Sichi
>Assignee: John Sichi
> Attachments: HIVE-2664.D939.1.patch, HIVE-2664.D963.1.patch, 
> HIVE-2664.D969.1.patch, HIVE-2664.D975.1.patch, HIVE-2664.D993.1.patch, 
> HIVE-2664.final.patch, HIVE-2664.final.patch, HIVE-2664.final.patch, 
> HIVE-2664.final.patch, HIVE-2664.final.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




Does hive REAL enable TestHadoop20SAuthBridge in hive-0.8.0 ? -- [HIVE-2257] patch doesn't work for me

2011-12-22 Thread Bing Li
Hi, All
When I ran hive UT, I found that TestHadoop20SAuthBridge wasn't compiled,
so TestHadoop20SAuthBridge won't be run by "ant test" command.

In src/shims/build.xml, I found the following lines:

  (the XML snippet was stripped from the archived message)

Then, I commented out the lines in blue, and it could generate the class file
of TestHadoop20SAuthBridge.
But if I change the security hadoop version to 1.0.0, it failed with:

build_shims:
 [echo] Project: shims
 [echo] Compiling shims against hadoop 1.0.1-SNAPSHOT
(/home/libing/Round-1/hive-0.8.0/src/build/hadoopcore/IHC-1.0.1-SNAPSHOT)

BUILD FAILED
/home/libing/Round-1/hive-0.8.0/src/build.xml:307: The following error
occurred while executing this line:
/home/libing/Round-1/hive-0.8.0/src/build.xml:325: The following error
occurred while executing this line:
/home/libing/Round-1/hive-0.8.0/src/shims/build.xml:76: The following error
occurred while executing this line:
/home/libing/Round-1/hive-0.8.0/src/shims/build.xml:66: srcdir
"/home/libing/Round-1/hive-0.8.0/src/shims/src/1.0/java" does not exist!


Does it mean that if we want to use a hadoop version as hadoop.security.version, we
should also keep a matching directory under shims/src/xxx ourselves?


Thanks,
- Bing