[jira] [Commented] (HIVE-6056) The AvroSerDe gives out BadSchemaException if a partition is added to the table

2014-04-06 Thread Rushil Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961650#comment-13961650
 ] 

Rushil Gupta commented on HIVE-6056:


I am not blocked on it. I found a workaround through Pig. I wonder if it's 
fixed, though.

> The AvroSerDe gives out BadSchemaException if a partition is added to the 
> table
> ---
>
> Key: HIVE-6056
> URL: https://issues.apache.org/jira/browse/HIVE-6056
> Project: Hive
>  Issue Type: Bug
>  Components: Database/Schema
>Affects Versions: 0.11.0
> Environment: amazon EMR (hadoop Amazon 1.0.3), avro-1.7.5
>Reporter: Rushil Gupta
>
> While creating an external table, if I do not add a partition, I am able to 
> read files using the following format:
> {code}
> CREATE external TABLE event
> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
> STORED AS INPUTFORMAT 
> 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
> OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
> LOCATION 's3n://test-event/input/2013/14/10'
> TBLPROPERTIES ('avro.schema.literal' = '..some schema..');
> {code}
> but if I add a partition based on date
> {code}
> CREATE external TABLE event
> PARTITIONED BY (ds STRING)
> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
> STORED AS INPUTFORMAT 
> 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
> OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
> LOCATION 's3n://test-event/input/'
> TBLPROPERTIES ('avro.schema.literal' = '..some schema..');
> ALTER TABLE event ADD IF NOT EXISTS PARTITION (ds = '2013_12_16') LOCATION 
> '2013/12/16/';
> {code}
> I get the following exception:
> {code}
> java.io.IOException:org.apache.hadoop.hive.serde2.avro.BadSchemaException
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-2621) Allow multiple group bys with the same input data and spray keys to be run on the same reducer.

2014-04-06 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961645#comment-13961645
 ] 

Lefty Leverenz commented on HIVE-2621:
--

I added *hive.multigroupby.singlereducer* to the Configuration Properties 
wikidoc.  It has the same definition as the defunct 
*hive.multigroupby.singlemr* -- is that correct?

bq.  Whether to optimize multi group by query to generate a single M/R  job 
plan. If the multi group by query has common group by keys, it will be 
optimized to generate a single M/R job.

* [Configuration Properties:  hive.multigroupby.singlemr 
|https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.multigroupby.singlemr]
* [Configuration Properties:  hive.multigroupby.singlereducer 
|https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.multigroupby.singlereducer]
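
To illustrate the quoted definition, here is a hedged sketch of the kind of 
query it describes (the table and column names are assumptions, not from this 
jira): a multi-insert whose insert clauses share the same group-by key, which 
the property lets Hive compile into a single M/R job.

{code}
SET hive.multigroupby.singlereducer=true;

-- Both insert clauses group on the same key, so the plan can be compiled
-- into one map/reduce job instead of one job per insert clause.
FROM src
INSERT OVERWRITE TABLE dest1 SELECT key, COUNT(*) GROUP BY key
INSERT OVERWRITE TABLE dest2 SELECT key, MAX(value) GROUP BY key;
{code}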

> Allow multiple group bys with the same input data and spray keys to be run on 
> the same reducer.
> ---
>
> Key: HIVE-2621
> URL: https://issues.apache.org/jira/browse/HIVE-2621
> Project: Hive
>  Issue Type: New Feature
>Reporter: Kevin Wilfong
>Assignee: Kevin Wilfong
> Fix For: 0.9.0
>
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2621.D567.1.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2621.D567.2.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2621.D567.3.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2621.D567.4.patch, HIVE-2621.1.patch.txt
>
>
> Currently, when a user runs a query, such as a multi-insert, where each 
> insertion subclause consists of a simple query followed by a group by, the 
> group bys for each clause are run on a separate reducer.  This requires 
> writing the data for each group by clause to an intermediate file, and then 
> reading it back.  This uses a significant amount of the total CPU consumed by 
> the query for an otherwise simple query.
> If the subclauses are grouped by their distinct expressions and group by 
> keys, with all of the group by expressions for a group of subclauses run on a 
> single reducer, this would reduce the amount of reading/writing to 
> intermediate files for some queries.
> To do this, for each group of subclauses, in the mapper we would execute 
> the filters for each subclause 'or'd together (provided each subclause has a 
> filter) followed by a reduce sink.  In the reducer, the child operators would 
> be each subclause's filter followed by the group by and any subsequent 
> operations.
> Note that this would require turning off map aggregation, so we would need to 
> make using this type of plan configurable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-2621) Allow multiple group bys with the same input data and spray keys to be run on the same reducer.

2014-04-06 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961641#comment-13961641
 ] 

Lefty Leverenz commented on HIVE-2621:
--

This jira removed the configuration property *hive.multigroupby.singlemr* 
(HIVE-2056) and added *hive.multigroupby.singlereducer*.

> Allow multiple group bys with the same input data and spray keys to be run on 
> the same reducer.
> ---
>
> Key: HIVE-2621
> URL: https://issues.apache.org/jira/browse/HIVE-2621
> Project: Hive
>  Issue Type: New Feature
>Reporter: Kevin Wilfong
>Assignee: Kevin Wilfong
> Fix For: 0.9.0
>
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2621.D567.1.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2621.D567.2.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2621.D567.3.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2621.D567.4.patch, HIVE-2621.1.patch.txt
>
>
> Currently, when a user runs a query, such as a multi-insert, where each 
> insertion subclause consists of a simple query followed by a group by, the 
> group bys for each clause are run on a separate reducer.  This requires 
> writing the data for each group by clause to an intermediate file, and then 
> reading it back.  This uses a significant amount of the total CPU consumed by 
> the query for an otherwise simple query.
> If the subclauses are grouped by their distinct expressions and group by 
> keys, with all of the group by expressions for a group of subclauses run on a 
> single reducer, this would reduce the amount of reading/writing to 
> intermediate files for some queries.
> To do this, for each group of subclauses, in the mapper we would execute 
> the filters for each subclause 'or'd together (provided each subclause has a 
> filter) followed by a reduce sink.  In the reducer, the child operators would 
> be each subclause's filter followed by the group by and any subsequent 
> operations.
> Note that this would require turning off map aggregation, so we would need to 
> make using this type of plan configurable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6319) Insert, update, delete functionality needs a compactor

2014-04-06 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961630#comment-13961630
 ] 

Alan Gates commented on HIVE-6319:
--

Responses to Ashutosh's comments on review board.  Will upload new patch 
shortly.

> Insert, update, delete functionality needs a compactor
> --
>
> Key: HIVE-6319
> URL: https://issues.apache.org/jira/browse/HIVE-6319
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Alan Gates
>Assignee: Alan Gates
> Fix For: 0.13.0
>
> Attachments: 6319.wip.patch, HIVE-6319.patch, HIVE-6319.patch, 
> HIVE-6319.patch, HiveCompactorDesign.pdf
>
>
> In order to keep the number of delta files from spiraling out of control we 
> need a compactor to collect these delta files together, and eventually 
> rewrite the base file when the deltas get large enough.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6821) Fix some non-deterministic tests

2014-04-06 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961622#comment-13961622
 ] 

Hive QA commented on HIVE-6821:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12638932/HIVE-6821.3.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 5548 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_dyn_part
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
{noformat}

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2159/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2159/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12638932

> Fix some non-deterministic tests 
> -
>
> Key: HIVE-6821
> URL: https://issues.apache.org/jira/browse/HIVE-6821
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-6821.1.patch, HIVE-6821.2.patch, HIVE-6821.3.patch
>
>
> A bunch of qfile tests look like they need an ORDER BY added to the queries 
> so that the output is repeatable when testing with hadoop1/hadoop2.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6757) Remove deprecated parquet classes from outside of org.apache package

2014-04-06 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961621#comment-13961621
 ] 

Brock Noland commented on HIVE-6757:


+1
Thank you Harish!!

> Remove deprecated parquet classes from outside of org.apache package
> 
>
> Key: HIVE-6757
> URL: https://issues.apache.org/jira/browse/HIVE-6757
> Project: Hive
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>Priority: Blocker
> Fix For: 0.13.0
>
> Attachments: HIVE-6757.2.patch, HIVE-6757.patch, parquet-hive.patch
>
>
> Apache shouldn't release projects with files outside of the org.apache 
> namespace.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6757) Remove deprecated parquet classes from outside of org.apache package

2014-04-06 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961615#comment-13961615
 ] 

Xuefu Zhang commented on HIVE-6757:
---

+1. The patch looks good to me. Thanks to Harish for taking this on.

> Remove deprecated parquet classes from outside of org.apache package
> 
>
> Key: HIVE-6757
> URL: https://issues.apache.org/jira/browse/HIVE-6757
> Project: Hive
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>Priority: Blocker
> Fix For: 0.13.0
>
> Attachments: HIVE-6757.2.patch, HIVE-6757.patch, parquet-hive.patch
>
>
> Apache shouldn't release projects with files outside of the org.apache 
> namespace.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6821) Fix some non-deterministic tests

2014-04-06 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961613#comment-13961613
 ] 

Ashutosh Chauhan commented on HIVE-6821:


+1

> Fix some non-deterministic tests 
> -
>
> Key: HIVE-6821
> URL: https://issues.apache.org/jira/browse/HIVE-6821
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-6821.1.patch, HIVE-6821.2.patch, HIVE-6821.3.patch
>
>
> A bunch of qfile tests look like they need an ORDER BY added to the queries 
> so that the output is repeatable when testing with hadoop1/hadoop2.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6841) Vectorized execution throws NPE for partitioning columns with __HIVE_DEFAULT_PARTITION__

2014-04-06 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961599#comment-13961599
 ] 

Hive QA commented on HIVE-6841:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12638899/HIVE-6841.3.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 5549 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_dyn_part
org.apache.hive.service.cli.thrift.TestThriftBinaryCLIService.testExecuteStatementAsync
{noformat}

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2157/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2157/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12638899

> Vectorized execution throws NPE for partitioning columns with 
> __HIVE_DEFAULT_PARTITION__
> 
>
> Key: HIVE-6841
> URL: https://issues.apache.org/jira/browse/HIVE-6841
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
>Priority: Critical
> Attachments: HIVE-6841.1.patch, HIVE-6841.2.patch, HIVE-6841.3.patch
>
>
> If partitioning columns have __HIVE_DEFAULT_PARTITION__ or null, vectorized 
> execution throws NPE.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6821) Fix some non-deterministic tests

2014-04-06 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-6821:
-

Attachment: HIVE-6821.3.patch

Patch v3: rebases the patch with trunk and also includes list_bucket_dml_2.q, 
as I've seen the results for that test fail on both Mac and Linux.
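
For context, a hedged sketch of the kind of change the issue describes (the 
query and table name are assumptions, not taken from the patch):

{code}
-- Without ORDER BY, row order can differ between hadoop1 and hadoop2 runs,
-- so the golden .q.out file only matches on one of them.
SELECT key, value FROM src LIMIT 10;

-- With ORDER BY, the output order is deterministic on both.
SELECT key, value FROM src ORDER BY key, value LIMIT 10;
{code}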

> Fix some non-deterministic tests 
> -
>
> Key: HIVE-6821
> URL: https://issues.apache.org/jira/browse/HIVE-6821
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-6821.1.patch, HIVE-6821.2.patch, HIVE-6821.3.patch
>
>
> A bunch of qfile tests look like they need an ORDER BY added to the queries 
> so that the output is repeatable when testing with hadoop1/hadoop2.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6848) Importing into an existing table fails

2014-04-06 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961585#comment-13961585
 ] 

Xuefu Zhang commented on HIVE-6848:
---

I was wondering if a test case would help prevent future breakage.
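
For example, a minimal qfile-style sketch (the table names and export path are 
assumptions) that exercises the import-into-existing-table path that failed:

{code}
CREATE TABLE import_src (key INT, value STRING);
EXPORT TABLE import_src TO '/tmp/import_src_export';

-- Importing into a pre-existing, compatible table is the case that failed.
CREATE TABLE import_dst (key INT, value STRING);
IMPORT TABLE import_dst FROM '/tmp/import_src_export';
{code}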

> Importing into an existing table fails
> --
>
> Key: HIVE-6848
> URL: https://issues.apache.org/jira/browse/HIVE-6848
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Arpit Gupta
>Assignee: Harish Butani
> Fix For: 0.13.0
>
> Attachments: HIVE-6848.1.patch
>
>
> This is because ImportSemanticAnalyzer:checkTable doesn't account for the 
> renaming of OutputFormat class and the setting of a default value for 
> Serialization.Format



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6331) HIVE-5279 deprecated UDAF class without explanation/documentation/alternative

2014-04-06 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-6331:
--

   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Patch committed to trunk. Thanks Lars.

> HIVE-5279 deprecated UDAF class without explanation/documentation/alternative
> -
>
> Key: HIVE-6331
> URL: https://issues.apache.org/jira/browse/HIVE-6331
> Project: Hive
>  Issue Type: Bug
>Reporter: Lars Francke
>Assignee: Lars Francke
>Priority: Minor
> Fix For: 0.14.0
>
> Attachments: HIVE-5279.1.patch, HIVE-6331.2.patch, HIVE-6331.3.patch
>
>
> HIVE-5279 added a @Deprecated annotation to the {{UDAF}} class. The comment 
> in that class says {quote}UDAF classes are REQUIRED to inherit from this 
> class.{quote}
> One of these two needs to be updated. Either remove the annotation or 
> document why it was deprecated and what to use instead.
> Unfortunately [~navis] did not leave any documentation about his intentions.
> I'm happy to provide a patch once I know the intentions.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6757) Remove deprecated parquet classes from outside of org.apache package

2014-04-06 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961565#comment-13961565
 ] 

Hive QA commented on HIVE-6757:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12638894/HIVE-6757.2.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 5548 tests executed
*Failed tests:*
{noformat}
org.apache.hive.service.cli.thrift.TestThriftHttpCLIService.testExecuteStatementAsync
{noformat}

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2156/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2156/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12638894

> Remove deprecated parquet classes from outside of org.apache package
> 
>
> Key: HIVE-6757
> URL: https://issues.apache.org/jira/browse/HIVE-6757
> Project: Hive
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>Priority: Blocker
> Fix For: 0.13.0
>
> Attachments: HIVE-6757.2.patch, HIVE-6757.patch, parquet-hive.patch
>
>
> Apache shouldn't release projects with files outside of the org.apache 
> namespace.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6840) Use Unordered Output for Bucket Map Joins on Tez

2014-04-06 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-6840:
-

   Resolution: Fixed
Fix Version/s: 0.13.0
   Status: Resolved  (was: Patch Available)

Committed to branch and trunk. Thanks [~sseth]!

> Use Unordered Output for Bucket Map Joins on Tez
> 
>
> Key: HIVE-6840
> URL: https://issues.apache.org/jira/browse/HIVE-6840
> Project: Hive
>  Issue Type: Bug
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Fix For: 0.13.0
>
> Attachments: HIVE-6840.1.patch, HIVE-6840.2.patch
>
>
> Tez 0.4 adds a placeholder UnorderedOutput. Once Hive is changed to use 0.4, 
> it should be possible to make use of this.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6759) Fix reading partial ORC files while they are being written

2014-04-06 Thread Harish Butani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961560#comment-13961560
 ] 

Harish Butani commented on HIVE-6759:
-

+1 for 0.13

> Fix reading partial ORC files while they are being written
> --
>
> Key: HIVE-6759
> URL: https://issues.apache.org/jira/browse/HIVE-6759
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>Priority: Critical
> Fix For: 0.13.0
>
> Attachments: HIVE-6759.patch
>
>
> HDFS with the hflush ensures the bytes are visible, but doesn't update the 
> file length on the NameNode. Currently the Orc reader will only read up to 
> the length on the NameNode. If the user specified a length from a 
> flush_length file, the Orc reader should trust it to be right.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6834) Dynamic partition optimization bails out after removing file sink operator

2014-04-06 Thread Harish Butani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harish Butani updated HIVE-6834:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to trunk and 0.13
Thanks Prasanth

> Dynamic partition optimization bails out after removing file sink operator
> --
>
> Key: HIVE-6834
> URL: https://issues.apache.org/jira/browse/HIVE-6834
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Prasanth J
>Assignee: Prasanth J
>Priority: Critical
> Fix For: 0.13.0
>
> Attachments: HIVE-6834.1.patch
>
>
> HIVE-6455 introduced a scalable dynamic partitioning optimization that bails 
> out after removing the file sink operator. This causes the union_remove_16.q 
> test to fail by removing all the stages in the plan.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6852) JDBC client connections hang at TSaslTransport

2014-04-06 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961552#comment-13961552
 ] 

Szehon Ho commented on HIVE-6852:
-

Can you check the jstack of the hanging client to see what it's waiting for?
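
For example, something like this (the pid is a placeholder) would show which 
call each client thread is blocked in:

{noformat}
jstack <client-jvm-pid> > client-threads.txt
{noformat}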

> JDBC client connections hang at TSaslTransport
> --
>
> Key: HIVE-6852
> URL: https://issues.apache.org/jira/browse/HIVE-6852
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Reporter: jay vyas
>
> I've noticed that when there is an underlying issue in connecting a client to 
> the JDBC interface of the HiveServer2 to run queries, you get a hang after 
> the thrift portion, at least in certain scenarios: 
> Turning log4j to DEBUG, you can see the following when trying to get a 
> connection using:
> {noformat}
> Connection jdbc = DriverManager.getConnection(
>     "jdbc:hive2://localhost:1/default", "hive", "password");
> {noformat}
> The logs get to here before the hang:
> {noformat}
> 0[main] DEBUG org.apache.thrift.transport.TSaslTransport  - opening 
> transport org.apache.thrift.transport.TSaslClientTransport@219ba640
> 0 [main] DEBUG org.apache.thrift.transport.TSaslTransport  - opening 
> transport org.apache.thrift.transport.TSaslClientTransport@219ba640
> 3[main] DEBUG org.apache.thrift.transport.TSaslClientTransport  - Sending 
> mechanism name PLAIN and initial response of length 14
> 3 [main] DEBUG org.apache.thrift.transport.TSaslClientTransport  - Sending 
> mechanism name PLAIN and initial response of length 14
> 5[main] DEBUG org.apache.thrift.transport.TSaslTransport  - CLIENT: 
> Writing message with status START and payload length 5
> 5 [main] DEBUG org.apache.thrift.transport.TSaslTransport  - CLIENT: Writing 
> message with status START and payload length 5
> 5[main] DEBUG org.apache.thrift.transport.TSaslTransport  - CLIENT: 
> Writing message with status COMPLETE and payload length 14
> 5 [main] DEBUG org.apache.thrift.transport.TSaslTransport  - CLIENT: Writing 
> message with status COMPLETE and payload length 14
> 5[main] DEBUG org.apache.thrift.transport.TSaslTransport  - CLIENT: Start 
> message handled
> 5 [main] DEBUG org.apache.thrift.transport.TSaslTransport  - CLIENT: Start 
> message handled
> 5[main] DEBUG org.apache.thrift.transport.TSaslTransport  - CLIENT: Main 
> negotiation loop complete
> 5 [main] DEBUG org.apache.thrift.transport.TSaslTransport  - CLIENT: Main 
> negotiation loop complete
> 6[main] DEBUG org.apache.thrift.transport.TSaslTransport  - CLIENT: SASL 
> Client receiving last message
> 6 [main] DEBUG org.apache.thrift.transport.TSaslTransport  - CLIENT: SASL 
> Client receiving last message
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (HIVE-6843) INSTR for UTF-8 returns incorrect position

2014-04-06 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho reassigned HIVE-6843:
---

Assignee: Szehon Ho

> INSTR for UTF-8 returns incorrect position
> --
>
> Key: HIVE-6843
> URL: https://issues.apache.org/jira/browse/HIVE-6843
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 0.11.0, 0.12.0
>Reporter: Clif Kranish
>Assignee: Szehon Ho
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-1598) use SequenceFile rather than TextFile format for hive query results

2014-04-06 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961553#comment-13961553
 ] 

Lefty Leverenz commented on HIVE-1598:
--

This adds the *hive.query.result.fileformat* configuration parameter to 
HiveConf.java with a default value of TextFile.

(This comment makes it easy to find the jira that introduced 
*hive.query.result.fileformat*.)

> use SequenceFile rather than TextFile format for hive query results
> ---
>
> Key: HIVE-1598
> URL: https://issues.apache.org/jira/browse/HIVE-1598
> Project: Hive
>  Issue Type: Bug
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.7.0
>
> Attachments: HIVE-1598.2.patch, HIVE-1598.patch
>
>
> A Hive query's result is written to a temporary directory first, and then 
> FetchTask takes the files and displays them to the users. Currently the file 
> format used for the resulting file is TextFile format. This could cause an 
> incorrect result display if some string typed column contains new lines, 
> which are used as record delimiters in TextInputFormat. Switching to 
> SequenceFile format will solve this problem. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6843) INSTR for UTF-8 returns incorrect position

2014-04-06 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961551#comment-13961551
 ] 

Szehon Ho commented on HIVE-6843:
-

Hi, I was going to look at this, but when I tried it, it looks like your 
first P is a Cyrillic P (d0,a0), while the second is an English P (50).  Can you 
verify?  If you make the second a Cyrillic P, then it works.
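
For illustration, a hedged sketch (the string literals are assumptions): the 
Cyrillic letter Р (UTF-8 bytes d0 a0) renders like the Latin letter P (byte 
50) but is a different character, so reporting no match for the Latin one is 
correct.

{code}
SELECT INSTR('Рог', 'P');   -- Latin P: returns 0 (not found)
SELECT INSTR('Рог', 'Р');   -- Cyrillic Р: returns 1
{code}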

> INSTR for UTF-8 returns incorrect position
> --
>
> Key: HIVE-6843
> URL: https://issues.apache.org/jira/browse/HIVE-6843
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 0.11.0, 0.12.0
>Reporter: Clif Kranish
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6468) HS2 out of memory error when curl sends a get request

2014-04-06 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961535#comment-13961535
 ] 

Lefty Leverenz commented on HIVE-6468:
--

Good doc, [~vaibhavgumashta].  I did some light editing (mostly capitalization) 
and fixed a typo.  Were the empty links on "jdbc:hive2://" intentional?  I 
removed them, but I'll put them back if you wanted them there for emphasis.  
Your use of color on a default value is nice -- we might want to use that in 
other wikidocs.

> HS2 out of memory error when curl sends a get request
> -
>
> Key: HIVE-6468
> URL: https://issues.apache.org/jira/browse/HIVE-6468
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.12.0
> Environment: Centos 6.3, hive 12, hadoop-2.2
>Reporter: Abin Shahab
>Assignee: Navis
> Attachments: HIVE-6468.1.patch.txt
>
>
> We see an out of memory error when we run simple beeline calls.
> (The hive.server2.transport.mode is binary)
> curl localhost:1
> Exception in thread "pool-2-thread-8" java.lang.OutOfMemoryError: Java heap 
> space
>   at 
> org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:181)
>   at 
> org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(TSaslServerTransport.java:125)
>   at 
> org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:253)
>   at 
> org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
>   at 
> org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
>   at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:189)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:744)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6850) For FetchOperator, Driver uses the valid transaction list from the previous query

2014-04-06 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961531#comment-13961531
 ] 

Hive QA commented on HIVE-6850:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12638889/HIVE-6850.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 5548 tests executed
*Failed tests:*
{noformat}
org.apache.hive.jdbc.TestJdbcDriver2.testNewConnectionConfiguration
{noformat}

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2154/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2154/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12638889

> For FetchOperator, Driver uses the valid transaction list from the previous 
> query
> -
>
> Key: HIVE-6850
> URL: https://issues.apache.org/jira/browse/HIVE-6850
> Project: Hive
>  Issue Type: Bug
>  Components: Clients
>Reporter: Alan Gates
>Assignee: Owen O'Malley
>Priority: Blocker
> Fix For: 0.13.0
>
> Attachments: HIVE-6850.patch
>
>
> The problem is twofold:
> * FetchTask.initialize, which is called during parsing of the query, converts 
> the HiveConf it is given into a JobConf by copying it.
> * Driver.recordValidTxns, which runs after parsing, adds the valid 
> transactions to the HiveConf.
> Thus fetch operators will use the transactions from the previous command.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6840) Use Unordered Output for Bucket Map Joins on Tez

2014-04-06 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961525#comment-13961525
 ] 

Gunther Hagleitner commented on HIVE-6840:
--

Failure is unrelated.

> Use Unordered Output for Bucket Map Joins on Tez
> 
>
> Key: HIVE-6840
> URL: https://issues.apache.org/jira/browse/HIVE-6840
> Project: Hive
>  Issue Type: Bug
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-6840.1.patch, HIVE-6840.2.patch
>
>
> Tez 0.4 adds a placeholder UnorderedOutput. Once Hive is changed to use 0.4, 
> it should be possible to make use of this.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6837) HiveServer2 thrift/http mode & binary mode proxy user check fails reporting IP null for client

2014-04-06 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961493#comment-13961493
 ] 

Thejas M Nair commented on HIVE-6837:
-

+1

Vaibhav,
I remember you had suggested, in a different jira, that TSetIpAddressProcessor 
also use the SessionManager thread local. I think that makes sense; it would be 
cleaner. We should do that in a followup jira.


> HiveServer2 thrift/http mode & binary mode proxy user check fails reporting 
> IP null for client
> --
>
> Key: HIVE-6837
> URL: https://issues.apache.org/jira/browse/HIVE-6837
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.13.0
>Reporter: Dilli Arumugam
>Assignee: Vaibhav Gumashta
> Fix For: 0.13.0
>
> Attachments: HIVE-6837.1.patch, HIVE-6837.2.patch, HIVE-6837.3.patch, 
> hive.log
>
>
> Hive Server running thrift/http with Kerberos security.
> Kinited user knox attempting to proxy as sam.
> Beeline connection failed reporting error on hive server logs:
> Caused by: org.apache.hadoop.security.authorize.AuthorizationException: 
> Unauthorized connection for super-user: knox from IP null



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6785) query fails when partitioned table's table level serde is ParquetHiveSerDe and partition level serde is of different SerDe

2014-04-06 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961491#comment-13961491
 ] 

Hive QA commented on HIVE-6785:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12638885/HIVE-6785.2.patch.txt

{color:green}SUCCESS:{color} +1 5549 tests passed

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2153/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2153/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12638885

> query fails when partitioned table's table level serde is ParquetHiveSerDe 
> and partition level serde is of different SerDe
> --
>
> Key: HIVE-6785
> URL: https://issues.apache.org/jira/browse/HIVE-6785
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats, Serializers/Deserializers
>Affects Versions: 0.13.0
>Reporter: Tongjie Chen
> Fix For: 0.14.0
>
> Attachments: HIVE-6785.1.patch.txt, HIVE-6785.2.patch.txt
>
>
> When a hive table's SerDe is ParquetHiveSerDe, while some partitions use 
> another SerDe, AND if this table has string column[s], hive generates a 
> confusing error message:
> "Failed with exception java.io.IOException:java.lang.ClassCastException: 
> parquet.hive.serde.primitive.ParquetStringInspector cannot be cast to 
> org.apache.hadoop.hive.serde2.objectinspector.primitive.SettableTimestampObjectInspector"
> This is confusing because timestamp is mentioned even though it is not used 
> by the table. The reason is that when there is a SerDe difference between 
> table and partition, hive tries to convert between the object inspectors of 
> the two SerDes. ParquetHiveSerDe's object inspector for the string type is 
> ParquetStringInspector (newly introduced), which is neither a subclass of 
> WritableStringObjectInspector nor of JavaStringObjectInspector, the types 
> ObjectInspectorConverters expects for a string-category object inspector. 
> There is no break statement in the STRING case statement, hence the following 
> TIMESTAMP case statement is executed, generating the confusing error message.
> See also the following parquet issue:
> https://github.com/Parquet/parquet-mr/issues/324
> The fix is relatively easy: just make ParquetStringInspector a subclass 
> of JavaStringObjectInspector instead of AbstractPrimitiveJavaObjectInspector. 
> But because the constructor of JavaStringObjectInspector is package scope 
> instead of public or protected, we would need to move ParquetStringInspector 
> to the same package as JavaStringObjectInspector.
> Also, ArrayWritableObjectInspector's setStructFieldData needs to accept 
> List data, since the corresponding setStructFieldData and create methods 
> return a list. This is also needed when the table SerDe is ParquetHiveSerDe 
> and the partition SerDe is something else.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6848) Importing into an existing table fails

2014-04-06 Thread Harish Butani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harish Butani updated HIVE-6848:


Fix Version/s: 0.13.0

> Importing into an existing table fails
> --
>
> Key: HIVE-6848
> URL: https://issues.apache.org/jira/browse/HIVE-6848
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Arpit Gupta
>Assignee: Harish Butani
> Fix For: 0.13.0
>
> Attachments: HIVE-6848.1.patch
>
>
> This is because ImportSemanticAnalyzer:checkTable doesn't account for the 
> renaming of OutputFormat class and the setting of a default value for 
> Serialization.Format



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6848) Importing into an existing table fails

2014-04-06 Thread Harish Butani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harish Butani updated HIVE-6848:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to trunk and 0.13
thanks Ashutosh for the review

> Importing into an existing table fails
> --
>
> Key: HIVE-6848
> URL: https://issues.apache.org/jira/browse/HIVE-6848
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Arpit Gupta
>Assignee: Harish Butani
> Attachments: HIVE-6848.1.patch
>
>
> This is because ImportSemanticAnalyzer:checkTable doesn't account for the 
> renaming of OutputFormat class and the setting of a default value for 
> Serialization.Format



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-5687) Streaming support in Hive

2014-04-06 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961469#comment-13961469
 ] 

Owen O'Malley commented on HIVE-5687:
-

Initial comments
* Please create a package-info.java (or package.html) for the entire package 
that has the text from the design document, but without the example.
* I believe the API will go through some iterations and use before it becomes 
stable. We should warn users that it will likely evolve in future versions of 
Hive and won't necessarily be backwards compatible. The package-info.java is 
probably the best place to put the warning.
* The current API requires users to implement a RecordWriter wrapper for each 
SerDe they want to use. In Hive 0.14, I think we need to revisit this and 
switch to just requiring a serde class name and a string to string map of serde 
properties. This way, any of a user's current SerDes can be used to parse the 
byte[] and there can be a generic method for constructing the object using 
reflection.
* The code shouldn't depend on OrcOutputFormat, but instead find the 
OutputFormat of the table/partition and use that. The streaming code should 
only require that it implement AcidOutputFormat.
* The RecordWriter should be passed the HiveConf rather than creating it. That 
will make it easier to write unit tests.
* The StreamingIntegrationTester needs to print the exception's getMessage to 
stderr if the options don't parse correctly. Otherwise, the user doesn't get 
any clue as to which parameter they forgot.
* I don't see how the column reordering can be invoked. The SerDe is using the 
table properties from the table in the MetaStore to define the columns it 
returns, so the two should always be the same. My suggestion is to remove all 
of the column reordering code.
* If you don't remove the column ordering code, you should deserialize and then 
reorder the columns rather than the current strategy of deserialize, reorder, 
serialize, and deserialize.
* Revert the change that adds startMetaStore. It isn't called and thus 
shouldn't be added.
* The method writeImpl(byte[]) doesn't add any value and should just be inlined.
* Why do you use DDL to create partitions rather than the MetaStoreClient API 
that you use everywhere else?

Some style guidelines:
* Please split the lines that are longer than 100 characters and it is even 
better if they are less than 80 characters.
* Ensure your if statements have a space before the parenthesis.
* Remove the commented out code.
* Please remove the private uncalled functions (eg. 
HiveEndPoint.newConnection(String proxyUser, ...) )
* You've defined a lot of exceptions in this API. Is the user expected to 
handle each exception separately? Your throw declarations don't list the exact 
exceptions that are thrown. You'd be better off with different exceptions only 
when the user is expected to be able to handle a specific error. Otherwise, you 
might as well use StreamingException with a descriptive error message for 
everything.


> Streaming support in Hive
> -
>
> Key: HIVE-5687
> URL: https://issues.apache.org/jira/browse/HIVE-5687
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Roshan Naik
>Assignee: Roshan Naik
>  Labels: ACID, Streaming
> Fix For: 0.13.0
>
> Attachments: 5687-api-spec4.pdf, 5687-draft-api-spec.pdf, 
> 5687-draft-api-spec2.pdf, 5687-draft-api-spec3.pdf, 
> HIVE-5687-unit-test-fix.patch, HIVE-5687.patch, HIVE-5687.v2.patch, 
> HIVE-5687.v3.patch, HIVE-5687.v4.patch, HIVE-5687.v5.patch, Hive Streaming 
> Ingest API for v3 patch.pdf, Hive Streaming Ingest API for v4 patch.pdf
>
>
> Implement support for Streaming data into HIVE.
> - Provide a client streaming API 
> - Transaction support: Clients should be able to periodically commit a batch 
> of records atomically
> - Immediate visibility: Records should be immediately visible to queries on 
> commit
> - Should not overload HDFS with too many small files
> Use Cases:
>  - Streaming logs into HIVE via Flume
>  - Streaming results of computations from Storm



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6837) HiveServer2 thrift/http mode & binary mode proxy user check fails reporting IP null for client

2014-04-06 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961466#comment-13961466
 ] 

Hive QA commented on HIVE-6837:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12638909/HIVE-6837.3.patch

{color:green}SUCCESS:{color} +1 5548 tests passed

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2152/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2152/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12638909

> HiveServer2 thrift/http mode & binary mode proxy user check fails reporting 
> IP null for client
> --
>
> Key: HIVE-6837
> URL: https://issues.apache.org/jira/browse/HIVE-6837
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.13.0
>Reporter: Dilli Arumugam
>Assignee: Vaibhav Gumashta
> Fix For: 0.13.0
>
> Attachments: HIVE-6837.1.patch, HIVE-6837.2.patch, HIVE-6837.3.patch, 
> hive.log
>
>
> Hive Server running thrift/http with Kerberos security.
> Kinited user knox attempting to proxy as sam.
> Beeline connection failed reporting error on hive server logs:
> Caused by: org.apache.hadoop.security.authorize.AuthorizationException: 
> Unauthorized connection for super-user: knox from IP null



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6852) JDBC client connections hang at TSaslTransport

2014-04-06 Thread jay vyas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jay vyas updated HIVE-6852:
---

Description: 
I've noticed that when there is an underlying issue in connecting a client to 
the JDBC interface of the HiveServer2 to run queries, you get a hang after the 
thrift portion, at least in certain scenarios: 

Turning log4j to DEBUG, you can see the following when trying to get a 
connection using:

{noformat}
Connection jdbc = DriverManager.getConnection(
    "jdbc:hive2://localhost:1/default", "hive", "password");
{noformat}

The logs get to here before the hang:

{noformat}
0[main] DEBUG org.apache.thrift.transport.TSaslTransport  - opening 
transport org.apache.thrift.transport.TSaslClientTransport@219ba640
0 [main] DEBUG org.apache.thrift.transport.TSaslTransport  - opening transport 
org.apache.thrift.transport.TSaslClientTransport@219ba640
3[main] DEBUG org.apache.thrift.transport.TSaslClientTransport  - Sending 
mechanism name PLAIN and initial response of length 14
3 [main] DEBUG org.apache.thrift.transport.TSaslClientTransport  - Sending 
mechanism name PLAIN and initial response of length 14
5[main] DEBUG org.apache.thrift.transport.TSaslTransport  - CLIENT: Writing 
message with status START and payload length 5
5 [main] DEBUG org.apache.thrift.transport.TSaslTransport  - CLIENT: Writing 
message with status START and payload length 5
5[main] DEBUG org.apache.thrift.transport.TSaslTransport  - CLIENT: Writing 
message with status COMPLETE and payload length 14
5 [main] DEBUG org.apache.thrift.transport.TSaslTransport  - CLIENT: Writing 
message with status COMPLETE and payload length 14
5[main] DEBUG org.apache.thrift.transport.TSaslTransport  - CLIENT: Start 
message handled
5 [main] DEBUG org.apache.thrift.transport.TSaslTransport  - CLIENT: Start 
message handled
5[main] DEBUG org.apache.thrift.transport.TSaslTransport  - CLIENT: Main 
negotiation loop complete
5 [main] DEBUG org.apache.thrift.transport.TSaslTransport  - CLIENT: Main 
negotiation loop complete
6[main] DEBUG org.apache.thrift.transport.TSaslTransport  - CLIENT: SASL 
Client receiving last message
6 [main] DEBUG org.apache.thrift.transport.TSaslTransport  - CLIENT: SASL 
Client receiving last message
{noformat}

  was:
I've noticed that when connecting a client to the JDBC interface of the 
HiveServer2 to run queries, you get a hang after the thrift portion, at least 
in certain scenarios: 

Turning log4j to DEBUG, you can see the following when trying to get a 
connection using:

{noformat}
Connection jdbc = DriverManager.getConnection(
    "jdbc:hive2://localhost:1/default", "hive", "password");
{noformat}

The logs get to here before the hang:

{noformat}
0[main] DEBUG org.apache.thrift.transport.TSaslTransport  - opening 
transport org.apache.thrift.transport.TSaslClientTransport@219ba640
0 [main] DEBUG org.apache.thrift.transport.TSaslTransport  - opening transport 
org.apache.thrift.transport.TSaslClientTransport@219ba640
3[main] DEBUG org.apache.thrift.transport.TSaslClientTransport  - Sending 
mechanism name PLAIN and initial response of length 14
3 [main] DEBUG org.apache.thrift.transport.TSaslClientTransport  - Sending 
mechanism name PLAIN and initial response of length 14
5[main] DEBUG org.apache.thrift.transport.TSaslTransport  - CLIENT: Writing 
message with status START and payload length 5
5 [main] DEBUG org.apache.thrift.transport.TSaslTransport  - CLIENT: Writing 
message with status START and payload length 5
5[main] DEBUG org.apache.thrift.transport.TSaslTransport  - CLIENT: Writing 
message with status COMPLETE and payload length 14
5 [main] DEBUG org.apache.thrift.transport.TSaslTransport  - CLIENT: Writing 
message with status COMPLETE and payload length 14
5[main] DEBUG org.apache.thrift.transport.TSaslTransport  - CLIENT: Start 
message handled
5 [main] DEBUG org.apache.thrift.transport.TSaslTransport  - CLIENT: Start 
message handled
5[main] DEBUG org.apache.thrift.transport.TSaslTransport  - CLIENT: Main 
negotiation loop complete
5 [main] DEBUG org.apache.thrift.transport.TSaslTransport  - CLIENT: Main 
negotiation loop complete
6[main] DEBUG org.apache.thrift.transport.TSaslTransport  - CLIENT: SASL 
Client receiving last message
6 [main] DEBUG org.apache.thrift.transport.TSaslTransport  - CLIENT: SASL 
Client receiving last message
{noformat}


> JDBC client connections hang at TSaslTransport
> --
>
> Key: HIVE-6852
> URL: https://issues.apache.org/jira/browse/HIVE-6852
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Reporter: jay vyas
>
> I've noticed that when there is an underlying issue in connecting a client to 
> the JDBC interface of the HiveServer2 to run queries, you get a hang after the 
> thrift portion, at least in certain scenarios:

[jira] [Commented] (HIVE-1608) use sequencefile as the default for storing intermediate results

2014-04-06 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961446#comment-13961446
 ] 

Hive QA commented on HIVE-1608:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12638875/HIVE-1608.patch

{color:red}ERROR:{color} -1 due to 487 failed/errored test(s), 5548 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alias_casted_column
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_allcolref_in_udf
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ambiguous_col
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_filter
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_groupby
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_part
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_select
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_union
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ansi_sql_arithmetic
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join0
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join10
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join15
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join16
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join18
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join18_multi_distinct
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join20
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join21
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join22
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join23
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join24
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join27
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join28
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join29
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join30
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join31
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join32
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_reordering_values
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_without_localtask
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_smb_mapjoin_14
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_10
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_14
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_15
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_7
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_binarysortable_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket_groupby
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket_map_join_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket_map_join_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_7
org.apache.hadoop.hive.cli.TestCliDriver.testC

[jira] [Resolved] (HIVE-3065) New lines in columns can cause problems even when using sequence files

2014-04-06 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland resolved HIVE-3065.


Resolution: Duplicate

This is a duplicate of HIVE-1608. There is a very easy workaround for this 
bug: setting "hive.query.result.fileformat" to "SequenceFile".
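
For example, in a Hive session (the table and column names are assumptions):

{code}
SET hive.query.result.fileformat=SequenceFile;

-- With SequenceFile result files, embedded newlines in string columns no
-- longer split a fetched row into multiple rows.
SELECT text_col FROM table_with_newlines;
{code}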

> New lines in columns can cause problems even when using sequence files
> --
>
> Key: HIVE-3065
> URL: https://issues.apache.org/jira/browse/HIVE-3065
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.7.1, 0.8.1
>Reporter: Joey Echeverria
>
> When using sequence files as the container format, I'd expect to be able to 
> embed new lines in a column. However, this causes problems when the data is 
> output if the newlines aren't manually stripped or escaped. This tends to 
> show up as each row of output generating two (or more) rows with nulls after 
> the column with a new line and nulls for the "empty" columns on the second 
> row.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6840) Use Unordered Output for Bucket Map Joins on Tez

2014-04-06 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961407#comment-13961407
 ] 

Hive QA commented on HIVE-6840:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12638856/HIVE-6840.2.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 5547 tests executed
*Failed tests:*
{noformat}
org.apache.hive.service.cli.thrift.TestThriftBinaryCLIService.testExecuteStatementAsync
{noformat}

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2149/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2149/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12638856

> Use Unordered Output for Bucket Map Joins on Tez
> 
>
> Key: HIVE-6840
> URL: https://issues.apache.org/jira/browse/HIVE-6840
> Project: Hive
>  Issue Type: Bug
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-6840.1.patch, HIVE-6840.2.patch
>
>
> Tez 0.4 adds a placeholder UnorderedOutput. Once Hive is changed to use 0.4, 
> it should be possible to make use of this.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-6852) JDBC client connections hang at TSaslTransport

2014-04-06 Thread jay vyas (JIRA)
jay vyas created HIVE-6852:
--

 Summary: JDBC client connections hang at TSaslTransport
 Key: HIVE-6852
 URL: https://issues.apache.org/jira/browse/HIVE-6852
 Project: Hive
  Issue Type: Bug
  Components: JDBC
Reporter: jay vyas


I've noticed that, when connecting a client to the HiveServer2 JDBC interface 
to run queries, the client hangs after the Thrift negotiation, at least in 
certain scenarios.

With log4j set to DEBUG, you can see the following when trying to get a 
connection using:

{noformat}
Connection jdbc = DriverManager.getConnection(
    "jdbc:hive2://localhost:1/default", "hive", "password");
{noformat}

The logs get to this point before the hang:

{noformat}
0 [main] DEBUG org.apache.thrift.transport.TSaslTransport  - opening transport org.apache.thrift.transport.TSaslClientTransport@219ba640
3 [main] DEBUG org.apache.thrift.transport.TSaslClientTransport  - Sending mechanism name PLAIN and initial response of length 14
5 [main] DEBUG org.apache.thrift.transport.TSaslTransport  - CLIENT: Writing message with status START and payload length 5
5 [main] DEBUG org.apache.thrift.transport.TSaslTransport  - CLIENT: Writing message with status COMPLETE and payload length 14
5 [main] DEBUG org.apache.thrift.transport.TSaslTransport  - CLIENT: Start message handled
5 [main] DEBUG org.apache.thrift.transport.TSaslTransport  - CLIENT: Main negotiation loop complete
6 [main] DEBUG org.apache.thrift.transport.TSaslTransport  - CLIENT: SASL Client receiving last message
{noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6837) HiveServer2 thrift/http mode & binary mode proxy user check fails reporting IP null for client

2014-04-06 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-6837:
---

Attachment: HIVE-6837.3.patch

> HiveServer2 thrift/http mode & binary mode proxy user check fails reporting 
> IP null for client
> --
>
> Key: HIVE-6837
> URL: https://issues.apache.org/jira/browse/HIVE-6837
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.13.0
>Reporter: Dilli Arumugam
>Assignee: Vaibhav Gumashta
> Fix For: 0.13.0
>
> Attachments: HIVE-6837.1.patch, HIVE-6837.2.patch, HIVE-6837.3.patch, 
> hive.log
>
>
> HiveServer2 running thrift/http with Kerberos security.
> A kinit'ed user knox attempting to proxy as sam.
> The Beeline connection failed, reporting this error in the HiveServer2 logs 
> (a connection sketch follows the description):
> Caused by: org.apache.hadoop.security.authorize.AuthorizationException: 
> Unauthorized connection for super-user: knox from IP null
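
For context, a minimal sketch of the kind of connection that triggers this 
check, assuming the hive.server2.proxy.user URL parameter from HIVE-5155; 
host, port, realm, and principal names are illustrative:

{noformat}
kinit knox@EXAMPLE.COM
beeline -u "jdbc:hive2://hs2-host:10000/default;principal=hive/hs2-host@EXAMPLE.COM;hive.server2.proxy.user=sam"
{noformat}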



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 20061: HiveServer2 thrift/http mode & binary mode proxy user check fails reporting IP null for client

2014-04-06 Thread Vaibhav Gumashta

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20061/
---

(Updated April 6, 2014, 12:10 p.m.)


Review request for hive, dilli dorai, Prasad Mujumdar, and Thejas Nair.


Bugs: HIVE-6837
https://issues.apache.org/jira/browse/HIVE-6837


Repository: hive-git


Description
---

https://issues.apache.org/jira/browse/HIVE-6837


Diffs (updated)
-

  service/src/java/org/apache/hive/service/auth/HiveAuthFactory.java d8f4822 
  service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java 
046e4d9 
  service/src/java/org/apache/hive/service/cli/thrift/ThriftHttpServlet.java 
2bda9a4 

Diff: https://reviews.apache.org/r/20061/diff/


Testing
---

Using beeline on a secure cluster, running HS2 in http/binary modes.


Thanks,

Vaibhav Gumashta



[jira] [Updated] (HIVE-5998) Add vectorized reader for Parquet files

2014-04-06 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-5998:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to trunk r1585290

> Add vectorized reader for Parquet files
> ---
>
> Key: HIVE-5998
> URL: https://issues.apache.org/jira/browse/HIVE-5998
> Project: Hive
>  Issue Type: Sub-task
>  Components: Serializers/Deserializers, Vectorization
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
>Priority: Minor
>  Labels: Parquet, vectorization
> Attachments: HIVE-5998.1.patch, HIVE-5998.10.patch, 
> HIVE-5998.11.patch, HIVE-5998.12.patch, HIVE-5998.13.patch, 
> HIVE-5998.2.patch, HIVE-5998.3.patch, HIVE-5998.4.patch, HIVE-5998.5.patch, 
> HIVE-5998.6.patch, HIVE-5998.7.patch, HIVE-5998.8.patch, HIVE-5998.9.patch
>
>
> HIVE-5783 is adding native Parquet support in Hive. As Parquet is a columnar 
> format, it makes sense to provide a vectorized reader, similar to how RC and 
> ORC formats have, to benefit from vectorized execution engine.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6848) Importing into an existing table fails

2014-04-06 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961389#comment-13961389
 ] 

Hive QA commented on HIVE-6848:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12638820/HIVE-6848.1.patch

{color:green}SUCCESS:{color} +1 5547 tests passed

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2148/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2148/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12638820

> Importing into an existing table fails
> --
>
> Key: HIVE-6848
> URL: https://issues.apache.org/jira/browse/HIVE-6848
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Arpit Gupta
>Assignee: Harish Butani
> Attachments: HIVE-6848.1.patch
>
>
> This is because ImportSemanticAnalyzer:checkTable doesn't account for the 
> renaming of the OutputFormat class and the setting of a default value for 
> Serialization.Format.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6784) parquet-hive should allow column type change

2014-04-06 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961358#comment-13961358
 ] 

Hive QA commented on HIVE-6784:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12638877/HIVE-6784.2.patch.txt

{color:green}SUCCESS:{color} +1 5550 tests passed

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2147/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2147/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12638877

> parquet-hive should allow column type change
> 
>
> Key: HIVE-6784
> URL: https://issues.apache.org/jira/browse/HIVE-6784
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats, Serializers/Deserializers
>Affects Versions: 0.13.0
>Reporter: Tongjie Chen
> Fix For: 0.14.0
>
> Attachments: HIVE-6784.1.patch.txt, HIVE-6784.2.patch.txt
>
>
> see also in the following parquet issue:
> https://github.com/Parquet/parquet-mr/issues/323
> Currently, if we change a Parquet-format Hive table using "alter table 
> parquet_table change c1 c1 bigint" (assuming the original type of c1 is int), 
> queries will fail at runtime with an exception thrown from the SerDe: 
> "org.apache.hadoop.io.IntWritable cannot be cast to 
> org.apache.hadoop.io.LongWritable".
> This differs from Hive's behavior with other file formats, where it will try 
> to perform a cast (yielding null in case of an incompatible type).
> Parquet Hive's RecordReader returns an ArrayWritable (based on the schema 
> stored in the footers of the Parquet files); ParquetHiveSerDe also creates a 
> corresponding ArrayWritableObjectInspector (but using column type info from 
> the metastore). Whenever there is a column type change, the object inspector 
> will throw an exception, since WritableLongObjectInspector cannot inspect an 
> IntWritable, etc.
> Conversion has to happen somewhere if we want to allow type change; the 
> SerDe's deserialize method seems a natural place for it (see the sketch 
> after this description).
> Currently, the serialize method calls createStruct (then createPrimitive) for 
> every record, but it creates a new object regardless, which seems expensive. 
> That could be optimized a bit by just returning the object passed in if it is 
> already of the right type. deserialize also reuses this method; if there is a 
> type change, a new object has to be created, which I think is inevitable.
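
To make the intent concrete, here is a minimal sketch of the widening 
conversion discussed above, assuming only the Hadoop Writable types; the class 
and method names are illustrative, not the actual ParquetHiveSerDe code:

{code}
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Writable;

public class WidenSketch {
  // Return the value a long inspector expects: reuse the object when it is
  // already a LongWritable (the cheap path), widen when the file gave an int.
  static Writable toExpectedLong(Writable value) {
    if (value instanceof LongWritable) {
      return value; // already the right type: no new allocation
    }
    if (value instanceof IntWritable) {
      return new LongWritable(((IntWritable) value).get()); // int -> bigint
    }
    return value; // other cases left to the real SerDe logic
  }

  public static void main(String[] args) {
    System.out.println(toExpectedLong(new IntWritable(42))); // prints 42
  }
}
{code}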



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6738) HiveServer2 secure Thrift/HTTP needs to accept doAs parameter from proxying intermediary

2014-04-06 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961347#comment-13961347
 ] 

Hive QA commented on HIVE-6738:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12637545/HIVE-6738.1.patch

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2146/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2146/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ [[ -n '' ]]
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-Build-2146/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ svn = \s\v\n ]]
+ [[ -n '' ]]
+ [[ -d apache-svn-trunk-source ]]
+ [[ ! -d apache-svn-trunk-source/.svn ]]
+ [[ ! -d apache-svn-trunk-source ]]
+ cd apache-svn-trunk-source
+ svn revert -R .
++ egrep -v '^X|^Performing status on external'
++ awk '{print $2}'
++ svn status --no-ignore
+ rm -rf target datanucleus.log ant/target shims/target shims/0.20/target 
shims/0.20S/target shims/0.23/target shims/aggregator/target 
shims/common/target shims/common-secure/target packaging/target 
hbase-handler/target testutils/target jdbc/target metastore/target 
itests/target itests/hcatalog-unit/target itests/test-serde/target 
itests/qtest/target itests/hive-unit/target itests/custom-serde/target 
itests/util/target hcatalog/target hcatalog/storage-handlers/hbase/target 
hcatalog/server-extensions/target hcatalog/core/target 
hcatalog/webhcat/svr/target hcatalog/webhcat/java-client/target 
hcatalog/hcatalog-pig-adapter/target hwi/target common/target common/src/gen 
contrib/target service/target serde/target beeline/target odbc/target 
cli/target ql/dependency-reduced-pom.xml ql/target
+ svn update

Fetching external item into 'hcatalog/src/test/e2e/harness'
External at revision 1585255.

At revision 1585255.
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12637545

> HiveServer2 secure Thrift/HTTP needs to accept doAs parameter from proxying 
> intermediary
> 
>
> Key: HIVE-6738
> URL: https://issues.apache.org/jira/browse/HIVE-6738
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 0.13.0
>Reporter: Dilli Arumugam
>Assignee: Dilli Arumugam
> Fix For: 0.13.0
>
> Attachments: HIVE-6738.1.patch, HIVE-6738.patch, 
> hive-6738-req-impl-verify-rev1.md, hive-6738-req-impl-verify.md
>
>
> See the already implemented JIRA
> https://issues.apache.org/jira/browse/HIVE-5155
> (Support secure proxy user access to HiveServer2).
> That fix expects the hive.server2.proxy.user parameter to come in the Thrift 
> body. When an intermediary gateway like Apache Knox is authenticating the end 
> client and then proxying the request to HiveServer2, it is not practical for 
> the intermediary to modify the Thrift content.
> An intermediary like Apache Knox should instead be able to assert doAs in a 
> query parameter (illustrated below). This paradigm is already established by 
> other Hadoop ecosystem components like WebHDFS, WebHCat, Oozie, and HBase, 
> and Hive needs to be aligned with them.
> The doAs asserted in the query parameter should override any doAs specified 
> in the Thrift body.
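
A hedged sketch of what such a proxied request could look like (host and port 
are illustrative; cliservice is assumed as the HS2 http path, and the exact 
parameter spelling would follow the ecosystem convention):

{noformat}
POST http://hs2-host:10001/cliservice?doAs=sam
{noformat}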



--
This message was sent by Atlassian JIRA
(v6.2#6252)