[jira] [Commented] (HIVE-5687) Streaming support in Hive

2014-04-10 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965071#comment-13965071
 ] 

Lefty Leverenz commented on HIVE-5687:
--

[~roshan_naik], were my comments on the review board too late?  Most of them 
were trivial edits of the javadocs, but several seemed worth fixing.  (For 
example, some methods' javadocs got an exception name wrong.)  Maybe I should 
have raised issues, but I figured you're the best judge of which changes should 
be made.

> Streaming support in Hive
> -
>
> Key: HIVE-5687
> URL: https://issues.apache.org/jira/browse/HIVE-5687
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Roshan Naik
>Assignee: Roshan Naik
>  Labels: ACID, Streaming
> Fix For: 0.13.0
>
> Attachments: 5687-api-spec4.pdf, 5687-draft-api-spec.pdf, 
> 5687-draft-api-spec2.pdf, 5687-draft-api-spec3.pdf, 
> HIVE-5687-unit-test-fix.patch, HIVE-5687.patch, HIVE-5687.v2.patch, 
> HIVE-5687.v3.patch, HIVE-5687.v4.patch, HIVE-5687.v5.patch, 
> HIVE-5687.v6.patch, HIVE-5687.v7.patch, Hive Streaming Ingest API for v3 
> patch.pdf, Hive Streaming Ingest API for v4 patch.pdf, package.html
>
>
> Implement support for Streaming data into HIVE.
> - Provide a client streaming API 
> - Transaction support: Clients should be able to periodically commit a batch 
> of records atomically
> - Immediate visibility: Records should be immediately visible to queries on 
> commit
> - Should not overload HDFS with too many small files
> Use Cases:
>  - Streaming logs into HIVE via Flume
>  - Streaming results of computations from Storm



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 20197: HiveServer2 concurrency uses incorrect user information in unsecured mode

2014-04-10 Thread Thejas Nair

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20197/#review39993
---



service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java


I think we should throw HiveSQLException here


- Thejas Nair


On April 10, 2014, 4:56 a.m., Vaibhav Gumashta wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/20197/
> ---
> 
> (Updated April 10, 2014, 4:56 a.m.)
> 
> 
> Review request for hive and Thejas Nair.
> 
> 
> Bugs: HIVE-6864
> https://issues.apache.org/jira/browse/HIVE-6864
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> https://issues.apache.org/jira/browse/HIVE-6864
> 
> 
> Diffs
> -
> 
>   service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java 
> 46946ab 
> 
> Diff: https://reviews.apache.org/r/20197/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Vaibhav Gumashta
> 
>



[jira] [Updated] (HIVE-6864) HiveServer2 concurrency uses incorrect user information in unsecured mode

2014-04-10 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-6864:


Priority: Blocker  (was: Major)

> HiveServer2 concurrency uses incorrect user information in unsecured mode
> -
>
> Key: HIVE-6864
> URL: https://issues.apache.org/jira/browse/HIVE-6864
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.13.0
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
>Priority: Blocker
> Fix For: 0.13.0
>
> Attachments: HIVE-6864.1.patch
>
>
> Concurrent queries create table with wrong ownership





[jira] [Commented] (HIVE-6864) HiveServer2 concurrency uses incorrect user information in unsecured mode

2014-04-10 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965088#comment-13965088
 ] 

Thejas M Nair commented on HIVE-6864:
-

Looks good. Just a minor comment about the exception in rb.

Harish, I think we should include this in 0.13. This can result in actions 
being performed as the wrong user.


> HiveServer2 concurrency uses incorrect user information in unsecured mode
> -
>
> Key: HIVE-6864
> URL: https://issues.apache.org/jira/browse/HIVE-6864
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.13.0
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Fix For: 0.13.0
>
> Attachments: HIVE-6864.1.patch
>
>
> Concurrent queries create table with wrong ownership





[jira] [Updated] (HIVE-6883) Dynamic partitioning optimization does not honor sort order or order by

2014-04-10 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-6883:
-

Fix Version/s: 0.13.0

> Dynamic partitioning optimization does not honor sort order or order by
> ---
>
> Key: HIVE-6883
> URL: https://issues.apache.org/jira/browse/HIVE-6883
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Prasanth J
>Assignee: Prasanth J
>Priority: Critical
> Fix For: 0.13.0
>
>
> HIVE-6455 patch does not honor the sort order of the output table or the 
> order by of the select statement. The reason for the former is that 
> numDistributionKey in ReduceSinkDesc is set incorrectly: it doesn't take the 
> sort columns into account, so RSOp sets the sort columns to null in the Key. 
> Since nulls are set in place of the sort columns in the Key, the sort columns 
> in the Value are not sorted. 
> The other issue is that ORDER BY columns are not honored during insertion. 
> For example, in
> {code}
> insert overwrite table over1k_part_orc partition(ds="foo", t) select 
> si,i,b,f,t from over1k_orc where t is null or t=27 order by si;
> {code}
> the select query performs an order by on column 'si' in the first MR job. The 
> following MR job (inserted by HIVE-6455) sorts the input data on the dynamic 
> partition column 't' without taking into account the already sorted 'si' 
> column. This results in out-of-order insertion for the 'si' column.
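The second problem can be illustrated outside Hive with a toy model (plain Python, not Hive code): once the follow-up job re-sorts on the dynamic partition column alone, the earlier ORDER BY on 'si' is lost, whereas sorting on the composite key (t, si) preserves it.

```python
# First job's output: rows (si, t) sorted by 'si', as ORDER BY si produced.
rows = [(si, t) for si in range(10) for t in (1, 27)]

# The follow-up job sees rows in arbitrary order after the shuffle; model
# that with an explicitly scrambled copy.
shuffled = rows[::-1]

def si_ordered_within_t(ordered):
    """Is 'si' ascending within each dynamic partition 't'?"""
    groups = {}
    for si, t in ordered:
        groups.setdefault(t, []).append(si)
    return all(g == sorted(g) for g in groups.values())

# Sorting on the dynamic partition column alone loses the 'si' order...
by_t_only = sorted(shuffled, key=lambda r: r[1])
assert not si_ordered_within_t(by_t_only)

# ...while the composite key (t, si) keeps partitions together AND honors si.
by_t_then_si = sorted(shuffled, key=lambda r: (r[1], r[0]))
assert si_ordered_within_t(by_t_then_si)
```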





[jira] [Commented] (HIVE-6873) DISTINCT clause in aggregates is handled incorrectly by vectorized execution

2014-04-10 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965099#comment-13965099
 ] 

Remus Rusanu commented on HIVE-6873:


[~jnp] From how I read GroupByOptimizer.java@227, I reckon there are some cases 
when the reduce side does expect the mapper to have been doing the correct 
aggregation:
{code}
  // Partial aggregation is not done for distincts on the mapper
  // However, if the data is bucketed/sorted on the distinct key, 
partial aggregation
  // can be performed on the mapper.
{code}

> DISTINCT clause in aggregates is handled incorrectly by vectorized execution
> 
>
> Key: HIVE-6873
> URL: https://issues.apache.org/jira/browse/HIVE-6873
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.13.0, 0.14.0
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-6873.1.patch, HIVE-6873.2.patch
>
>
> The vectorized aggregates ignore the DISTINCT clause, which causes incorrect 
> results. Because GroupByOperatorDesc adds the DISTINCT keys to the overall 
> aggregate keys, the vectorized aggregates do account for the extra key, but 
> they do not process the data correctly for that key. The reduce side then 
> aggregates the input from the vectorized map side into results that are only 
> sometimes correct but mostly incorrect. HIVE-4607 tracks the proper fix; in 
> the meantime I'm filing a bug to disable vectorized execution if DISTINCT is 
> present. The fix is trivial.
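A toy illustration (plain Python, not Hive code) of why ignoring DISTINCT during partial aggregation corrupts results: when the same value reaches two mappers, summing the per-mapper partials double-counts it, which is why map-side partial aggregation is normally skipped for distincts unless the data is bucketed/sorted on the distinct key.

```python
# Data split across two map tasks; the value 5 appears in both splits.
split_a = [5, 5, 3]
split_b = [5, 7]

# Correct SUM(DISTINCT x): deduplicate globally, then sum.
correct = sum(set(split_a + split_b))               # 5 + 3 + 7 = 15

# If each mapper partially aggregates on its own (per-split dedup at best),
# the reducer adds the partials and double-counts the shared value 5.
partials = [sum(set(split_a)), sum(set(split_b))]   # [8, 12]
wrong = sum(partials)                               # 20, not 15

assert correct == 15
assert wrong == 20
```

With the data bucketed/sorted on the distinct key, each key value is confined to a single split, so the double-counting case cannot arise.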





[jira] [Commented] (HIVE-5687) Streaming support in Hive

2014-04-10 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965129#comment-13965129
 ] 

Roshan Naik commented on HIVE-5687:
---

[~leftylev] Yes, looks like it went unnoticed due to the short time frame. For 
some reason I never got a notification of your review. We can get it in via 
another patch... but it appears to be too late to get it into this release.

[~orahive] You can query the data while it is being streamed into Hive. Queries 
will always see a consistent view of the data as this feature relies on the new 
ACID support in Hive. So queries will not see new data that was committed after 
they began executing. 

FLUME-1734 consumes this API to implement a Flume sink that streams data 
continuously into Hive.

> Streaming support in Hive
> -
>
> Key: HIVE-5687
> URL: https://issues.apache.org/jira/browse/HIVE-5687
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Roshan Naik
>Assignee: Roshan Naik
>  Labels: ACID, Streaming
> Fix For: 0.13.0
>
> Attachments: 5687-api-spec4.pdf, 5687-draft-api-spec.pdf, 
> 5687-draft-api-spec2.pdf, 5687-draft-api-spec3.pdf, 
> HIVE-5687-unit-test-fix.patch, HIVE-5687.patch, HIVE-5687.v2.patch, 
> HIVE-5687.v3.patch, HIVE-5687.v4.patch, HIVE-5687.v5.patch, 
> HIVE-5687.v6.patch, HIVE-5687.v7.patch, Hive Streaming Ingest API for v3 
> patch.pdf, Hive Streaming Ingest API for v4 patch.pdf, package.html
>
>





[jira] [Updated] (HIVE-6883) Dynamic partitioning optimization does not honor sort order or order by

2014-04-10 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-6883:
-

Status: Patch Available  (was: Open)

> Dynamic partitioning optimization does not honor sort order or order by
> ---
>
> Key: HIVE-6883
> URL: https://issues.apache.org/jira/browse/HIVE-6883
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Prasanth J
>Assignee: Prasanth J
>Priority: Critical
> Fix For: 0.13.0
>
> Attachments: HIVE-6883.1.patch
>
>





[jira] [Updated] (HIVE-6883) Dynamic partitioning optimization does not honor sort order or order by

2014-04-10 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-6883:
-

Attachment: HIVE-6883.1.patch

The fix is small. I added more tests to verify sort by and order by cases. Also 
replicated the tests for vectorization and tez.

> Dynamic partitioning optimization does not honor sort order or order by
> ---
>
> Key: HIVE-6883
> URL: https://issues.apache.org/jira/browse/HIVE-6883
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Prasanth J
>Assignee: Prasanth J
>Priority: Critical
> Fix For: 0.13.0
>
> Attachments: HIVE-6883.1.patch
>
>





[jira] [Commented] (HIVE-6883) Dynamic partitioning optimization does not honor sort order or order by

2014-04-10 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965141#comment-13965141
 ] 

Prasanth J commented on HIVE-6883:
--

Review Board is flaky.. will upload the patch once it is back..

> Dynamic partitioning optimization does not honor sort order or order by
> ---
>
> Key: HIVE-6883
> URL: https://issues.apache.org/jira/browse/HIVE-6883
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Prasanth J
>Assignee: Prasanth J
>Priority: Critical
> Fix For: 0.13.0
>
> Attachments: HIVE-6883.1.patch
>
>





[jira] [Commented] (HIVE-6883) Dynamic partitioning optimization does not honor sort order or order by

2014-04-10 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965142#comment-13965142
 ] 

Prasanth J commented on HIVE-6883:
--

[~rhbutani] this is a critical issue, as HIVE-6455 fails to honor the sort 
order in the DDL. Will it be possible to include this in 0.13?

> Dynamic partitioning optimization does not honor sort order or order by
> ---
>
> Key: HIVE-6883
> URL: https://issues.apache.org/jira/browse/HIVE-6883
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Prasanth J
>Assignee: Prasanth J
>Priority: Critical
> Fix For: 0.13.0
>
> Attachments: HIVE-6883.1.patch
>
>





[jira] [Commented] (HIVE-5687) Streaming support in Hive

2014-04-10 Thread Lars Francke (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965143#comment-13965143
 ] 

Lars Francke commented on HIVE-5687:


I don't understand why this was rushed. There were only a couple of hours to 
review the final patch.

> Streaming support in Hive
> -
>
> Key: HIVE-5687
> URL: https://issues.apache.org/jira/browse/HIVE-5687
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Roshan Naik
>Assignee: Roshan Naik
>  Labels: ACID, Streaming
> Fix For: 0.13.0
>
> Attachments: 5687-api-spec4.pdf, 5687-draft-api-spec.pdf, 
> 5687-draft-api-spec2.pdf, 5687-draft-api-spec3.pdf, 
> HIVE-5687-unit-test-fix.patch, HIVE-5687.patch, HIVE-5687.v2.patch, 
> HIVE-5687.v3.patch, HIVE-5687.v4.patch, HIVE-5687.v5.patch, 
> HIVE-5687.v6.patch, HIVE-5687.v7.patch, Hive Streaming Ingest API for v3 
> patch.pdf, Hive Streaming Ingest API for v4 patch.pdf, package.html
>
>





[jira] [Commented] (HIVE-5376) Hive does not honor type for partition columns when altering column type

2014-04-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965171#comment-13965171
 ] 

Hive QA commented on HIVE-5376:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12639448/HIVE-5376.2.patch

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 5571 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketmapjoin6
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_leftsemijoin_mr
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority2
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_alter_partition_coltype_invalidtype_conversion
{noformat}

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2200/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2200/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12639448

> Hive does not honor type for partition columns when altering column type
> 
>
> Key: HIVE-5376
> URL: https://issues.apache.org/jira/browse/HIVE-5376
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Reporter: Sergey Shelukhin
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-5376.1.patch, HIVE-5376.2.patch
>
>
> Followup for HIVE-5297. If a partition column of type string is changed to 
> int, the data is not verified. The values for partition columns are all in 
> the metastore db, so it's easy to check and fail the type change.
> alter_partition_coltype.q (or some other test?) checks this behavior right 
> now.





[jira] [Commented] (HIVE-6325) Enable using multiple concurrent sessions in tez

2014-04-10 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965217#comment-13965217
 ] 

Lefty Leverenz commented on HIVE-6325:
--

For the record:  this adds three configuration parameters -- 
*hive.server2.tez.default.queues*, 
*hive.server2.tez.sessions.per.default.queue*, and 
*hive.server2.tez.initialize.default.sessions*.

The previously mentioned parameters no longer exist 
(hive.hs2.num.sessions.default.queues and 
hive.hs2.tez.sessions.per.default.queue).
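For reference, the three parameters would appear in hive-site.xml along these lines (the values shown are illustrative placeholders, not recommended defaults):

```xml
<property>
  <name>hive.server2.tez.default.queues</name>
  <value>queue1,queue2</value>
</property>
<property>
  <name>hive.server2.tez.sessions.per.default.queue</name>
  <value>1</value>
</property>
<property>
  <name>hive.server2.tez.initialize.default.sessions</name>
  <value>false</value>
</property>
```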

> Enable using multiple concurrent sessions in tez
> 
>
> Key: HIVE-6325
> URL: https://issues.apache.org/jira/browse/HIVE-6325
> Project: Hive
>  Issue Type: Improvement
>  Components: Tez
>Affects Versions: 0.13.0
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
> Fix For: 0.13.0, 0.14.0
>
> Attachments: HIVE-6325-branch-0.13.patch, HIVE-6325-trunk.patch, 
> HIVE-6325.1.patch, HIVE-6325.10.patch, HIVE-6325.11.patch, HIVE-6325.2.patch, 
> HIVE-6325.3.patch, HIVE-6325.4.patch, HIVE-6325.5.patch, HIVE-6325.6.patch, 
> HIVE-6325.7.patch, HIVE-6325.8.patch, HIVE-6325.9.patch
>
>
> We would like to enable multiple concurrent sessions in tez via hive server 
> 2. This will enable users to make efficient use of the cluster when it has 
> been partitioned using yarn queues.







[jira] [Commented] (HIVE-6382) PATCHED_BLOB encoding in ORC will corrupt data in some cases

2014-04-10 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965247#comment-13965247
 ] 

Lefty Leverenz commented on HIVE-6382:
--

For the record:  this adds the configuration parameter 
*hive.exec.orc.skip.corrupt.data* to HiveConf.java and 
hive-default.xml.template.

> PATCHED_BLOB encoding in ORC will corrupt data in some cases
> 
>
> Key: HIVE-6382
> URL: https://issues.apache.org/jira/browse/HIVE-6382
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 0.13.0
>Reporter: Prasanth J
>Assignee: Prasanth J
>  Labels: orcfile
> Fix For: 0.13.0
>
> Attachments: HIVE-6382.1.patch, HIVE-6382.2.patch, HIVE-6382.3.patch, 
> HIVE-6382.4.patch, HIVE-6382.5.patch, HIVE-6382.6.patch
>
>
> In PATCHED_BLOB encoding (added in HIVE-4123), gapVsPatchList is an array of 
> longs that stores the gap (g) between the values that are patched and the 
> patch value (p). The maximum gap can be 511, which requires 8 bits to encode, 
> and patch values can take more than 56 bits. When patch values take more than 
> 56 bits, p + g becomes wider than 64 bits and cannot be packed into a single 
> long. This results in data corruption whenever patch values are wider than 
> 56 bits. 
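The arithmetic in the description can be sanity-checked with a trivial sketch (assuming, per the text above, 8 bits for the gap and a 64-bit Java long):

```python
GAP_BITS = 8      # bits used to encode the gap (g), per the description above
LONG_BITS = 64    # width of a Java long

def fits_in_long(patch_bits):
    """Can the (gap, patch) pair be packed into one 64-bit long?"""
    return GAP_BITS + patch_bits <= LONG_BITS

# Patch values up to 56 bits pack fine alongside the 8-bit gap...
assert fits_in_long(56)
# ...but anything wider overflows the long: the corruption case described.
assert not fits_in_long(57)
```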





[jira] [Commented] (HIVE-6876) Logging information should include thread id

2014-04-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965246#comment-13965246
 ] 

Hive QA commented on HIVE-6876:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12639481/HIVE-6876.1.patch

{color:green}SUCCESS:{color} +1 5570 tests passed

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2202/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2202/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12639481

> Logging information should include thread id
> 
>
> Key: HIVE-6876
> URL: https://issues.apache.org/jira/browse/HIVE-6876
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2, Metastore
>Affects Versions: 0.14.0
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
>Priority: Trivial
> Attachments: HIVE-6876.1.patch
>
>
> The multi-threaded nature of hive server and remote metastore makes it 
> difficult to debug issues without enabling thread information. It would be 
> nice to have the thread id in the logs.





[jira] [Issue Comment Deleted] (HIVE-6325) Enable using multiple concurrent sessions in tez

2014-04-10 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-6325:
-

Comment: was deleted

(was: For the record:  this adds three configuration parameters -- 
*hive.server2.tez.default.queues*, 
*hive.server2.tez.sessions.per.default.queue*, and 
*hive.server2.tez.initialize.default.sessions*.

The previously mentioned parameters no longer exist 
(hive.hs2.num.sessions.default.queues and 
hive.hs2.tez.sessions.per.default.queue).)

> Enable using multiple concurrent sessions in tez
> 
>
> Key: HIVE-6325
> URL: https://issues.apache.org/jira/browse/HIVE-6325
> Project: Hive
>  Issue Type: Improvement
>  Components: Tez
>Affects Versions: 0.13.0
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
> Fix For: 0.13.0, 0.14.0
>
> Attachments: HIVE-6325-branch-0.13.patch, HIVE-6325-trunk.patch, 
> HIVE-6325.1.patch, HIVE-6325.10.patch, HIVE-6325.11.patch, HIVE-6325.2.patch, 
> HIVE-6325.3.patch, HIVE-6325.4.patch, HIVE-6325.5.patch, HIVE-6325.6.patch, 
> HIVE-6325.7.patch, HIVE-6325.8.patch, HIVE-6325.9.patch
>
>
> We would like to enable multiple concurrent sessions in tez via hive server 
> 2. This will enable users to make efficient use of the cluster when it has 
> been partitioned using yarn queues.





[jira] [Commented] (HIVE-6450) Potential deadlock caused by unlock exceptions

2014-04-10 Thread Ding Yuan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965251#comment-13965251
 ] 

Ding Yuan commented on HIVE-6450:
-

Ping. Is there anything else I can help with from my side?

> Potential deadlock caused by unlock exceptions
> --
>
> Key: HIVE-6450
> URL: https://issues.apache.org/jira/browse/HIVE-6450
> Project: Hive
>  Issue Type: Bug
>  Components: Locking
>Affects Versions: 0.12.0
>Reporter: Ding Yuan
>
> In the following two code snippets, unlock might fail with LockException. 
> This exception is not handled, so the program might go on without releasing 
> the lock, causing potential deadlock or starvation.
> Line: 197, File: "org/apache/hadoop/hive/ql/lockmgr/EmbeddedLockManager.java"
> {noformat}
> 194:   try {
> 195: unlock(locked, numRetriesForUnLock, sleepTime);
> 196:   } catch (LockException e) {
> 197: LOG.info(e);
> 198:   }
> {noformat}
> Line: 276, File: 
> "org/apache/hadoop/hive/ql/lockmgr/zookeeper/ZooKeeperHiveLockManager.java"
> {noformat}
> 271: try {
> 272:   LOG.info(" about to release lock for " + 
> hiveLock.getHiveLockObject().getName());
> 273:   unlock(hiveLock);
> 274: } catch (LockException e) {
> 275:   // The lock may have been released. Ignore and continue
> 276:   LOG.warn("Error when releasing lock", e);
> 277: }
> {noformat}
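A toy sketch (plain Python, not Hive's lock manager) of the hazard: if the unlock path can throw, logging and moving on leaves the lock held forever, whereas bounded retries plus escalation ensure the lock is either released or the failure surfaces to the caller.

```python
import threading

hive_lock = threading.Lock()
hive_lock.acquire()

class LockException(Exception):
    pass

attempts = {"left": 2}  # the unlock fails transiently twice, then succeeds

def unlock(lock):
    """Toy unlock that raises transiently, modeling the LockException above."""
    if attempts["left"] > 0:
        attempts["left"] -= 1
        raise LockException("transient failure")
    lock.release()

# Logging the exception and moving on (as in the snippets above) would leave
# hive_lock held. Bounded retries avoid the silent-leak path:
for _ in range(5):
    try:
        unlock(hive_lock)
        break
    except LockException:
        continue
else:
    # Retries exhausted: escalate instead of ignoring the failure.
    raise RuntimeError("could not release lock")

assert not hive_lock.locked()   # the lock really was released
```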





Re: Timeline for the Hive 0.13 release?

2014-04-10 Thread Lefty Leverenz
Harish, here are some additions to your list, with links and patch excerpts:


HIVE-5351 (the linked doc jira HIVE-6318 doesn't provide definitions for the 
template file, but documents these configs in the wiki -- Setting Up 
HiveServer2 - SSL Encryption):

+HIVE_SERVER2_USE_SSL("hive.server2.use.SSL", false),

+HIVE_SERVER2_SSL_KEYSTORE_PATH("hive.server2.keystore.path", ""),

+HIVE_SERVER2_SSL_KEYSTORE_PASSWORD("hive.server2.keystore.password",
""),



HIVE-6447 (on your list):

HIVE_CONVERT_JOIN_BUCKET_MAPJOIN_TEZ("hive.convert.join.bucket.mapjoin.tez",
false),

description provided in a jira comment:

{code}
<property>
  <name>hive.convert.join.bucket.mapjoin.tez</name>
  <value>false</value>
  <description>Whether joins can be automatically converted to bucket map
  joins in hive when tez is used as the execution engine.</description>
</property>
{code}


HIVE-6500:

HIVESTATSDBCLASS("hive.stats.dbclass", "fs",

new PatternValidator("jdbc(:.*)", "hbase", "counter", "custom",
"fs")), // StatsSetupConst.StatDB

*Need to add "fs" to template description:*

<property>
  <name>hive.stats.dbclass</name>
  <value>counter</value>
  <description>The storage that stores temporary Hive statistics.
  Currently, jdbc, hbase, counter and custom type are supported.</description>
</property>



HIVE-6466 added a config value (PAM) and a new config 
(hive.server2.authentication.pam.services):

 HIVE_SERVER2_AUTHENTICATION("hive.server2.authentication", "NONE",

-new StringsValidator("NOSASL", "NONE", "LDAP", "KERBEROS",
"CUSTOM")),

+new StringsValidator("NOSASL", "NONE", "LDAP", "KERBEROS", "PAM",
"CUSTOM")),

 ...

+// List of the underlying pam services that should be used when auth
type is PAM

+// A file with the same name must exist in /etc/pam.d

+HIVE_SERVER2_PAM_SERVICES("hive.server2.authentication.pam.services",
null),


It's documented in the wiki by HIVE-6318 in Setting Up HiveServer2 - 
Pluggable Authentication Modules (PAM), and supposedly documented in the 
template file by HIVE-6503, which says committed for 0.13.0 but *doesn't 
show up in branch 13 or trunk.* HIVE-6503.1.patch:

@@ -2165,6 +2165,7 @@
     NONE: no authentication check
     LDAP: LDAP/AD based authentication
     KERBEROS: Kerberos/GSSAPI authentication
+    PAM: Pluggable authentication module
     CUSTOM: Custom authentication provider
       (Use with property hive.server2.custom.authentication.class)
   </description>

@@ -2217,6 +2218,15 @@
 </property>

 <property>
+  <name>hive.server2.authentication.pam.services</name>
+  <value></value>
+  <description>
+    List of the underlying PAM services that should be used when auth type
+    is PAM.
+    A file with the same name must exist in /etc/pam.d.
+  </description>
+</property>
+
+<property>
   <name>hive.server2.enable.doAs</name>
   <value>true</value>
   <description>


HIVE-6681:

+
SERDESUSINGMETASTOREFORSCHEMA("hive.serdes.using.metastore.for.schema","org.apache.hadoop.hive.ql.io.orc.OrcSerde,"

+  +
"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe,org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe,"

+  +
"org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe,org.apache.hadoop.hive.serde2.MetadataTypedColumnsetSerDe,"

+  +
"org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe,org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe,"

+  + "org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe"),




That's it, as far as I can tell.

-- Lefty


On Wed, Apr 9, 2014 at 3:49 PM, Harish Butani wrote:

> Lefty, here is the list I found missing from the template xml file:
>
> HIVE-6447:
>
> HIVE_CONVERT_JOIN_BUCKET_MAPJOIN_TEZ("hive.convert.join.bucket.mapjoin.tez",
> false),
>
> HIVE-6492:
> HIVELIMITTABLESCANPARTITION("hive.limit.query.max.table.partition",
> -1),
>
> HIVE-5843:
> // Transactions
> HIVE_TXN_MANAGER("hive.txn.manager",
> "org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager"),
> // time after which transactions are declared aborted if the client has
> // not sent a heartbeat, in seconds.
> HIVE_TXN_TIMEOUT("hive.txn.timeout", 300),
>
> // Maximum number of transactions that can be fetched in one call to
> // open_txns().
> // Increasing this will decrease the number of delta files created when
> // streaming data into Hive.  But it will also increase the number of
> // open tra

[jira] [Commented] (HIVE-6784) parquet-hive should allow column type change

2014-04-10 Thread Justin Coffey (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965279#comment-13965279
 ] 

Justin Coffey commented on HIVE-6784:
-

-1 on this patch.

Looping on the ArrayWritable in deserialize() will cause a performance penalty 
at read time, and running parseXxxx(obj.toString()) in the event of a type 
mismatch is also painful.

Changing column types is a rare event; we shouldn't write code that causes 
performance penalties just to handle it.  Users should recreate the table with 
the new type and load it from the old table, casting and converting as 
appropriate in their query.

> parquet-hive should allow column type change
> 
>
> Key: HIVE-6784
> URL: https://issues.apache.org/jira/browse/HIVE-6784
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats, Serializers/Deserializers
>Affects Versions: 0.13.0
>Reporter: Tongjie Chen
> Fix For: 0.14.0
>
> Attachments: HIVE-6784.1.patch.txt, HIVE-6784.2.patch.txt
>
>
> see also in the following parquet issue:
> https://github.com/Parquet/parquet-mr/issues/323
> Currently, if we change a parquet format hive table using "alter table 
> parquet_table change c1 c1 bigint" (assuming the original type of c1 is int), 
> it will result in an exception thrown from the SerDe: 
> "org.apache.hadoop.io.IntWritable cannot be cast to 
> org.apache.hadoop.io.LongWritable" at query runtime.
> This is different behavior from hive (using other file formats), where it 
> will try to perform a cast (null value in case of incompatible type).
> Parquet Hive's RecordReader returns an ArrayWritable (based on the schema 
> stored in footers of parquet files); ParquetHiveSerDe also creates a 
> corresponding ArrayWritableObjectInspector (but using column type info from 
> the metastore). Whenever there is a column type change, the object inspector 
> will throw an exception, since WritableLongObjectInspector cannot inspect an 
> IntWritable etc.
> Conversion has to happen somewhere if we want to allow type change. SerDe's 
> deserialize method seems a natural place for it.
> Currently, the serialize method calls createStruct (then createPrimitive) for 
> every record, but it creates a new object regardless, which seems expensive. 
> I think that could be optimized a bit by just returning the object passed in 
> if it is already of the right type. deserialize also reuses this method; if 
> there is a type change, a new object will be created, which I think is 
> inevitable.
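The reuse-if-already-the-right-type idea above can be sketched in plain Java. This is a hypothetical, self-contained illustration (the class and method names are made up, and plain Integer/Long stand in for Hadoop's IntWritable/LongWritable): convert a stored value to the declared column type, reusing the input object in the common no-conversion case and falling back to a Hive-style null for incompatible types.

```java
// Hypothetical sketch of the deserialize-time conversion point discussed
// above; not Hive's actual SerDe code.
class TypeCoercion {
    static Object coerceToLong(Object stored) {
        if (stored instanceof Long) {
            return stored;          // already the right type: no new object
        }
        if (stored instanceof Integer) {
            // int column widened to bigint: a new object is unavoidable here
            return ((Integer) stored).longValue();
        }
        return null;                // incompatible type: Hive-style null
    }
}
```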



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6877) TestOrcRawRecordMerger is deleting test.tmp.dir

2014-04-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965355#comment-13965355
 ] 

Hive QA commented on HIVE-6877:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12639488/HIVE-6877.1.patch

{color:green}SUCCESS:{color} +1 5570 tests passed

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2203/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2203/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12639488

> TestOrcRawRecordMerger is deleting test.tmp.dir
> ---
>
> Key: HIVE-6877
> URL: https://issues.apache.org/jira/browse/HIVE-6877
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-6877.1.patch
>
>
> TestOrcRawRecordMerger seems to be deleting the directory pointed to by 
> test.tmp.dir.  This can cause failures in any tests that run after this test 
> if they need files in the tmp dir, such as conf files, or need to create 
> Hive tables there.
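The usual guard against this class of bug can be sketched as follows. This is an illustrative pattern, not the actual patch: each test works in (and deletes) its own subdirectory under the shared tmp root, so the root and its siblings survive. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Sketch: scope deletions to a per-test subdirectory of the shared tmp root.
class ScopedTmpDir {
    static Path create(Path sharedRoot, String testName) throws IOException {
        // Only this subdirectory is ever removed, so conf files and tables
        // elsewhere under sharedRoot are untouched.
        return Files.createDirectories(sharedRoot.resolve(testName));
    }

    static void cleanup(Path testDir) throws IOException {
        // Delete children before parents (reverse of the walk order).
        try (Stream<Path> walk = Files.walk(testDir)) {
            walk.sorted(Comparator.reverseOrder())
                .forEach(p -> p.toFile().delete());
        }
    }
}
```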



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6648) Permissions are not inherited correctly when tables have multiple partition columns

2014-04-10 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-6648:
---

   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Thank you Szehon! I have committed this to trunk!

> Permissions are not inherited correctly when tables have multiple partition 
> columns
> ---
>
> Key: HIVE-6648
> URL: https://issues.apache.org/jira/browse/HIVE-6648
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.12.0, 0.13.0
>Reporter: Henry Robinson
>Assignee: Szehon Ho
> Fix For: 0.14.0
>
> Attachments: HIVE-6648.patch
>
>
> {{Warehouse.mkdirs()}} always looks at the immediate parent of the path that 
> it creates when determining what permissions to inherit. However, it may have 
> created that parent directory as well, in which case it will have the default 
> permissions and will not have inherited them.
> This is a problem when performing an {{INSERT}} into a table with more than 
> one partition column. E.g., in an empty table:
> {{INSERT INTO TABLE tbl PARTITION(p1=1, p2=2) ... }}
> A new subdirectory /p1=1/p2=2  will be created, and with permission 
> inheritance (per HIVE-2504) enabled, the intention is presumably for both new 
> directories to inherit the root table dir's permissions. However, 
> {{mkdirs()}} will only set the permission of the leaf directory (i.e. 
> /p2=2/), and then only to the permissions of /p1=1/, which was just created.
> {code}
> public boolean mkdirs(Path f) throws MetaException {
> FileSystem fs = null;
> try {
>   fs = getFs(f);
>   LOG.debug("Creating directory if it doesn't exist: " + f);
>   //Check if the directory already exists. We want to change the 
> permission
>   //to that of the parent directory only for newly created directories.
>   if (this.inheritPerms) {
> try {
>   return fs.getFileStatus(f).isDir();
> } catch (FileNotFoundException ignore) {
> }
>   }
>   boolean success = fs.mkdirs(f);
>   if (this.inheritPerms && success) {
> // Set the permission of parent directory.
> // HNR: This is the bug - getParent() may refer to a just-created 
> directory.
> fs.setPermission(f, fs.getFileStatus(f.getParent()).getPermission());
>   }
>   return success;
> } catch (IOException e) {
>   closeFs(fs);
>   MetaStoreUtils.logAndThrowMetaException(e);
> }
> return false;
>   }
> {code}
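A sketch of the per-level fix the description implies: walk up to the deepest existing ancestor, then create each missing directory one at a time, copying the permissions of its (already-corrected) parent, so both /p1=1 and /p1=1/p2=2 end up with the table dir's permissions. For a self-contained illustration this uses java.nio.file on a POSIX filesystem; Hive's Warehouse uses Hadoop's FileSystem API, so treat this as the idea, not the patch, and the class name as hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Set;

class InheritPermsMkdirs {
    static void mkdirsInheriting(Path target) throws IOException {
        // Collect missing ancestors; after the pushes the deque iterates
        // shallowest-first, so each dir's parent exists when we reach it.
        Deque<Path> missing = new ArrayDeque<>();
        Path p = target.toAbsolutePath();
        while (p != null && !Files.exists(p)) {
            missing.push(p);
            p = p.getParent();
        }
        for (Path dir : missing) {
            Set<PosixFilePermission> parentPerms =
                Files.getPosixFilePermissions(dir.getParent());
            Files.createDirectory(dir);
            // Each new level inherits from its parent, not just the leaf.
            Files.setPosixFilePermissions(dir, parentPerms);
        }
    }
}
```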



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-5609) possibly revisit change in unhex() behavior

2014-04-10 Thread Josh Sumali (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965441#comment-13965441
 ] 

Josh Sumali commented on HIVE-5609:
---

Yes, upgrading to Hive 0.12 with this change broke a Tableau query with the 
statement CONCAT(UNHEX('E280A2'), '----0360 ').

> possibly revisit change in unhex() behavior
> ---
>
> Key: HIVE-5609
> URL: https://issues.apache.org/jira/browse/HIVE-5609
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Reporter: Jason Dere
>
> Got one user case where they got bit by the change in HIVE-2482, where 
> unhex() changed to return Binary where it used to return Text. Looking at the 
> MySQL unhex() function, which I'm assuming Hive's unhex() is based on, MySQL 
> actually returns a string value.  I know that we explicitly document the 
> incompatible change in the Jira, but I figure that most users would be 
> using unhex() based on how the MySQL version works.  And because it now 
> returns binary, the resulting value from unhex() is actually much less 
> flexible to use if it's nested within other UDFs, because binary does 
> not implicitly convert to other types.  
> Is anyone open to the idea of reverting unhex() back to its original 
> behavior, and providing a separate UDF to return the value as binary, as 
> [~appodictic] suggested in HIVE-2482?
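For illustration, the two behaviors under discussion can be sketched with hypothetical helpers (not Hive's UDF code): decoding hex to raw bytes (the post-HIVE-2482 Binary result) versus re-reading those bytes as a UTF-8 string (the old Text result, which is what composes cleanly with CONCAT and other string UDFs).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the unhex() behavior difference discussed above.
class UnhexDemo {
    // New behavior: raw bytes, which do not implicitly convert to string.
    static byte[] unhexToBinary(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // Old behavior: the decoded bytes are treated as text, so the result
    // nests cleanly inside other string UDFs.
    static String unhexToString(String hex) {
        return new String(unhexToBinary(hex), StandardCharsets.UTF_8);
    }
}
```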



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6882) Make upgrade script schemaTool friendly

2014-04-10 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-6882:
---

Attachment: HIVE-6882.1.patch

A similar fix is also needed for the Oracle script. 

> Make upgrade script schemaTool friendly
> ---
>
> Key: HIVE-6882
> URL: https://issues.apache.org/jira/browse/HIVE-6882
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.13.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-6882.1.patch, HIVE-6882.patch
>
>
> Current scripts work fine when invoked manually, but fails when invoked via 
> schemaTool.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6319) Insert, update, delete functionality needs a compactor

2014-04-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965461#comment-13965461
 ] 

Hive QA commented on HIVE-6319:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12639499/HIVE-6319.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 5612 tests executed
*Failed tests:*
{noformat}
org.apache.hive.service.cli.thrift.TestThriftBinaryCLIService.testExecuteStatementAsync
{noformat}

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2205/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2205/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12639499

> Insert, update, delete functionality needs a compactor
> --
>
> Key: HIVE-6319
> URL: https://issues.apache.org/jira/browse/HIVE-6319
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Alan Gates
>Assignee: Alan Gates
> Fix For: 0.13.0
>
> Attachments: 6319.wip.patch, HIVE-6319.patch, HIVE-6319.patch, 
> HIVE-6319.patch, HIVE-6319.patch, HIVE-6319.patch, HiveCompactorDesign.pdf
>
>
> In order to keep the number of delta files from spiraling out of control we 
> need a compactor to collect these delta files together, and eventually 
> rewrite the base file when the deltas get large enough.
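The decision the description implies can be sketched as a tiny policy function. This is purely illustrative (the class, thresholds, and action names are made up, not Hive's compactor): merge deltas together once enough accumulate ("minor"), and rewrite the base once the deltas' total size nears the base's ("major").

```java
// Illustrative sketch of a compaction trigger; not Hive's actual compactor.
class CompactionPolicy {
    enum Action { NONE, MINOR, MAJOR }

    static Action decide(long baseBytes, long[] deltaBytes, double majorRatio) {
        if (deltaBytes.length == 0) return Action.NONE;
        long total = 0;
        for (long d : deltaBytes) total += d;
        if (total >= baseBytes * majorRatio) {
            return Action.MAJOR;   // deltas large enough: rewrite the base
        }
        if (deltaBytes.length >= 10) {
            return Action.MINOR;   // too many small deltas: merge them
        }
        return Action.NONE;
    }
}
```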



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6648) Permissions are not inherited correctly when tables have multiple partition columns

2014-04-10 Thread Damien Carol (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Carol updated HIVE-6648:
---

Description: 
{{Warehouse.mkdirs()}} always looks at the immediate parent of the path that it 
creates when determining what permissions to inherit. However, it may have 
created that parent directory as well, in which case it will have the default 
permissions and will not have inherited them.

This is a problem when performing an {{INSERT}} into a table with more than one 
partition column. E.g., in an empty table:

{code}INSERT INTO TABLE tbl PARTITION(p1=1, p2=2) ... {code}

A new subdirectory /p1=1/p2=2  will be created, and with permission inheritance 
(per HIVE-2504) enabled, the intention is presumably for both new directories 
to inherit the root table dir's permissions. However, {{mkdirs()}} will only 
set the permission of the leaf directory (i.e. /p2=2/), and then only to the 
permissions of /p1=1/, which was just created.

{code}
public boolean mkdirs(Path f) throws MetaException {
FileSystem fs = null;
try {
  fs = getFs(f);
  LOG.debug("Creating directory if it doesn't exist: " + f);
  //Check if the directory already exists. We want to change the permission
  //to that of the parent directory only for newly created directories.
  if (this.inheritPerms) {
try {
  return fs.getFileStatus(f).isDir();
} catch (FileNotFoundException ignore) {
}
  }
  boolean success = fs.mkdirs(f);
  if (this.inheritPerms && success) {
// Set the permission of parent directory.
// HNR: This is the bug - getParent() may refer to a just-created 
directory.
fs.setPermission(f, fs.getFileStatus(f.getParent()).getPermission());
  }
  return success;
} catch (IOException e) {
  closeFs(fs);
  MetaStoreUtils.logAndThrowMetaException(e);
}
return false;
  }
{code}

  was:
{{Warehouse.mkdirs()}} always looks at the immediate parent of the path that it 
creates when determining what permissions to inherit. However, it may have 
created that parent directory as well, in which case it will have the default 
permissions and will not have inherited them.

This is a problem when performing an {{INSERT}} into a table with more than one 
partition column. E.g., in an empty table:

{{INSERT INTO TABLE tbl PARTITION(p1=1, p2=2) ... }}

A new subdirectory /p1=1/p2=2  will be created, and with permission inheritance 
(per HIVE-2504) enabled, the intention is presumably for both new directories 
to inherit the root table dir's permissions. However, {{mkdirs()}} will only 
set the permission of the leaf directory (i.e. /p2=2/), and then only to the 
permissions of /p1=1/, which was just created.

{code}
public boolean mkdirs(Path f) throws MetaException {
FileSystem fs = null;
try {
  fs = getFs(f);
  LOG.debug("Creating directory if it doesn't exist: " + f);
  //Check if the directory already exists. We want to change the permission
  //to that of the parent directory only for newly created directories.
  if (this.inheritPerms) {
try {
  return fs.getFileStatus(f).isDir();
} catch (FileNotFoundException ignore) {
}
  }
  boolean success = fs.mkdirs(f);
  if (this.inheritPerms && success) {
// Set the permission of parent directory.
// HNR: This is the bug - getParent() may refer to a just-created 
directory.
fs.setPermission(f, fs.getFileStatus(f.getParent()).getPermission());
  }
  return success;
} catch (IOException e) {
  closeFs(fs);
  MetaStoreUtils.logAndThrowMetaException(e);
}
return false;
  }
{code}


> Permissions are not inherited correctly when tables have multiple partition 
> columns
> ---
>
> Key: HIVE-6648
> URL: https://issues.apache.org/jira/browse/HIVE-6648
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.12.0, 0.13.0
>Reporter: Henry Robinson
>Assignee: Szehon Ho
> Fix For: 0.14.0
>
> Attachments: HIVE-6648.patch
>
>
> {{Warehouse.mkdirs()}} always looks at the immediate parent of the path that 
> it creates when determining what permissions to inherit. However, it may have 
> created that parent directory as well, in which case it will have the default 
> permissions and will not have inherited them.
> This is a problem when performing an {{INSERT}} into a table with more than 
> one partition column. E.g., in an empty table:
> {code}INSERT INTO TABLE tbl PARTITION(p1=1, p2=2) ... {code}
> A new subdirectory /p1=1/p2=2  will be created, and with permission 
> inheritance (per HIVE-2504) enabled, the intention is presumably for both new 
> directories to inherit the root table dir's permission

[jira] [Resolved] (HIVE-6882) Make upgrade script schemaTool friendly

2014-04-10 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan resolved HIVE-6882.


   Resolution: Fixed
Fix Version/s: 0.13.0

Committed to trunk & 0.13

> Make upgrade script schemaTool friendly
> ---
>
> Key: HIVE-6882
> URL: https://issues.apache.org/jira/browse/HIVE-6882
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.13.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Fix For: 0.13.0
>
> Attachments: HIVE-6882.1.patch, HIVE-6882.patch
>
>
> Current scripts work fine when invoked manually, but fails when invoked via 
> schemaTool.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HIVE-6881) Postgres Upgrade script for hive 0.13 is broken

2014-04-10 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan resolved HIVE-6881.


   Resolution: Fixed
Fix Version/s: 0.13.0

Committed to trunk & 0.13

> Postgres Upgrade script for hive 0.13 is broken
> ---
>
> Key: HIVE-6881
> URL: https://issues.apache.org/jira/browse/HIVE-6881
> Project: Hive
>  Issue Type: Bug
>Reporter: Harish Butani
>Assignee: Harish Butani
> Fix For: 0.13.0
>
> Attachments: HIVE-6881.1.patch
>
>
> The script added for HIVE-6757 didn't quote the identifiers



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-5687) Streaming support in Hive

2014-04-10 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965544#comment-13965544
 ] 

Alan Gates commented on HIVE-5687:
--

[~leftylev] sorry, this is my fault.  I meant to file a JIRA to address your 
and Lars' style comments and forgot.  I'll do that shortly.
[~lars_francke] the latest patch differed from the one Owen +1'd only in a 
few small packaging details.  Sorry if that wasn't clear.  I pushed it in 
because I know Harish is anxious to get a release candidate for 0.13 and this 
was one of the last blockers.  

> Streaming support in Hive
> -
>
> Key: HIVE-5687
> URL: https://issues.apache.org/jira/browse/HIVE-5687
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Roshan Naik
>Assignee: Roshan Naik
>  Labels: ACID, Streaming
> Fix For: 0.13.0
>
> Attachments: 5687-api-spec4.pdf, 5687-draft-api-spec.pdf, 
> 5687-draft-api-spec2.pdf, 5687-draft-api-spec3.pdf, 
> HIVE-5687-unit-test-fix.patch, HIVE-5687.patch, HIVE-5687.v2.patch, 
> HIVE-5687.v3.patch, HIVE-5687.v4.patch, HIVE-5687.v5.patch, 
> HIVE-5687.v6.patch, HIVE-5687.v7.patch, Hive Streaming Ingest API for v3 
> patch.pdf, Hive Streaming Ingest API for v4 patch.pdf, package.html
>
>
> Implement support for Streaming data into HIVE.
> - Provide a client streaming API 
> - Transaction support: Clients should be able to periodically commit a batch 
> of records atomically
> - Immediate visibility: Records should be immediately visible to queries on 
> commit
> - Should not overload HDFS with too many small files
> Use Cases:
>  - Streaming logs into HIVE via Flume
>  - Streaming results of computations from Storm



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-6884) HiveLockObject and enclosed HiveLockObjectData override equal() method but didn't do so for hashcode()

2014-04-10 Thread Xuefu Zhang (JIRA)
Xuefu Zhang created HIVE-6884:
-

 Summary: HiveLockObject and enclosed HiveLockObjectData override 
equal() method but didn't do so for hashcode()
 Key: HIVE-6884
 URL: https://issues.apache.org/jira/browse/HIVE-6884
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.12.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang


This breaches the Java contract that equal objects must have equal hash codes, 
and thus may cause unexpected results.
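The contract violation can be illustrated with a minimal stand-in class (hypothetical fields, not the real HiveLockObjectData): without the hashCode() override below, two equal instances may land in different HashMap/HashSet buckets, so lock lookups silently miss.

```java
import java.util.Objects;

// Minimal sketch of the fix: any class that overrides equals() must
// override hashCode() so equal objects hash to the same bucket.
class LockData {
    private final String queryId;
    private final String lockMode;

    LockData(String queryId, String lockMode) {
        this.queryId = queryId;
        this.lockMode = lockMode;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof LockData)) return false;
        LockData other = (LockData) o;
        return Objects.equals(queryId, other.queryId)
            && Objects.equals(lockMode, other.lockMode);
    }

    @Override
    public int hashCode() {
        return Objects.hash(queryId, lockMode);
    }
}
```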



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Timeline for the Hive 0.13 release?

2014-04-10 Thread Vaibhav Gumashta
Hi Lefty,

All the HiveServer2 related configs are already in hive-site.xml.template
and HiveConf.java.

Thanks,
--Vaibhav



[jira] [Commented] (HIVE-6880) TestHWISessionManager fails with -Phadoop-2

2014-04-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965586#comment-13965586
 ] 

Hive QA commented on HIVE-6880:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12639512/HIVE-6880.1.patch

{color:green}SUCCESS:{color} +1 5571 tests passed

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2206/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2206/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12639512

> TestHWISessionManager fails with -Phadoop-2
> ---
>
> Key: HIVE-6880
> URL: https://issues.apache.org/jira/browse/HIVE-6880
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 0.13.0
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-6880.1.patch
>
>
> Looks like dependencies missing for -Phadoop-2
> {noformat}
> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.213 sec <<< 
> FAILURE! - in org.apache.hadoop.hive.hwi.TestHWISessionManager
> warning(junit.framework.TestSuite$1)  Time elapsed: 0.009 sec  <<< FAILURE!
> junit.framework.AssertionFailedError: Exception in constructor: 
> testHiveDriver (java.lang.NoClassDefFoundError: 
> org/apache/hadoop/mapreduce/TaskAttemptContext
>   at java.lang.Class.forName0(Native Method)
>   at java.lang.Class.forName(Class.java:171)
>   at 
> org.apache.hadoop.hive.shims.ShimLoader.createShim(ShimLoader.java:120)
>   at 
> org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:115)
>   at 
> org.apache.hadoop.hive.shims.ShimLoader.getHadoopShims(ShimLoader.java:80)
>   at 
> org.apache.hadoop.hive.conf.HiveConf$ConfVars.(HiveConf.java:248)
>   at org.apache.hadoop.hive.conf.HiveConf.(HiveConf.java:81)
>   at 
> org.apache.hadoop.hive.hwi.TestHWISessionManager.(TestHWISessionManager.java:46)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>   at junit.framework.TestSuite.createTest(TestSuite.java:65)
>   at junit.framework.TestSuite.addTestMethod(TestSuite.java:294)
>   at junit.framework.TestSuite.addTestsFromTestCase(TestSuite.java:150)
>   at junit.framework.TestSuite.(TestSuite.java:129)
>   at 
> org.junit.internal.runners.JUnit38ClassRunner.(JUnit38ClassRunner.java:71)
>   at 
> org.junit.internal.builders.JUnit3Builder.runnerForClass(JUnit3Builder.java:14)
>   at 
> org.junit.runners.model.RunnerBuilder.safeRunnerForClass(RunnerBuilder.java:57)
>   at 
> org.junit.internal.builders.AllDefaultPossibilitiesBuilder.runnerForClass(AllDefaultPossibilitiesBuilder.java:29)
>   at 
> org.junit.runners.model.RunnerBuilder.safeRunnerForClass(RunnerBuilder.java:57)
>   at 
> org.junit.internal.requests.ClassRequest.getRunner(ClassRequest.java:24)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:262)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.hadoop.mapreduce.TaskAttemptContext
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>   ... 28 more
> )
>   at junit.framework.Assert.fail(Assert.java:50)
>   at junit.framework.TestSuite$1.runTest(TestSuite.java:97)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Timeline for the Hive 0.13 release?

2014-04-10 Thread Vaibhav Gumashta
In fact, Harish just pointed out that the SSL configs are not in the hive
template. Thanks for pointing it out, Lefty!


On Thu, Apr 10, 2014 at 1:21 PM, Vaibhav Gumashta  wrote:

> Hi Lefty,
>
> All the HiveServer2 related configs are already in hive-site.xml.template
> and HiveConf.java.
>
> Thanks,
> --Vaibhav
>
>
> On Thu, Apr 10, 2014 at 7:54 AM, Lefty Leverenz 
> wrote:
>
>> Harish, here are some additions to your list, with links and patch
>> excerpts:
>>
>>
>> HIVE-5351  (linked doc
>> jira HIVE-6318  doesn't
>> provide definitions for template file but documents these config in the
>> wiki -- Setting Up HiveServer2 - SSL
>> Encryption<
>> https://cwiki.apache.org/confluence/display/Hive/Setting+up+HiveServer2#SettingUpHiveServer2-SSLEncryption
>> >
>> ):
>>
>> +HIVE_SERVER2_USE_SSL("hive.server2.use.SSL", false),
>>
>> +HIVE_SERVER2_SSL_KEYSTORE_PATH("hive.server2.keystore.path", ""),
>>
>> +HIVE_SERVER2_SSL_KEYSTORE_PASSWORD("hive.server2.keystore.password",
>> ""),
>>
>>
>>
>> HIVE-6447 (on your list):
>>
>>
>>
>> HIVE_CONVERT_JOIN_BUCKET_MAPJOIN_TEZ("hive.convert.join.bucket.mapjoin.tez",
>> false),
>>
>> description provided in jira
>> comment<
>> https://issues.apache.org/jira/browse/HIVE-6447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959765#comment-13959765
>> >
>> :
>>
>> {code}
>> <property>
>>   <name>hive.convert.join.bucket.mapjoin.tez</name>
>>   <value>false</value>
>>   <description>Whether joins can be automatically converted to bucket map
>> joins in hive when tez is used as the execution engine.</description>
>> </property>
>> {code}
>>
>>
>> HIVE-6500 :
>>
>> HIVESTATSDBCLASS("hive.stats.dbclass", "fs",
>>
>> new PatternValidator("jdbc(:.*)", "hbase", "counter", "custom",
>> "fs")), // StatsSetupConst.StatDB
>>
>> *Need to add "fs" to template description:*
>>
>>   <name>hive.stats.dbclass</name>
>>
>>   <value>counter</value>
>>
>>   <description>The storage that stores temporary Hive statistics.
>> Currently, jdbc, hbase, counter and custom type are
>> supported.</description>
>>
>>
>>
>> HIVE-6466  added a
>> config
>> value (PAM) and a new config (hive.server2.authentication.pam.services):
>>
>>  HIVE_SERVER2_AUTHENTICATION("hive.server2.authentication", "NONE",
>>
>> -new StringsValidator("NOSASL", "NONE", "LDAP", "KERBEROS",
>> "CUSTOM")),
>>
>> +new StringsValidator("NOSASL", "NONE", "LDAP", "KERBEROS", "PAM",
>> "CUSTOM")),
>>
>>  ...
>>
>> +// List of the underlying pam services that should be used when auth
>> type is PAM
>>
>> +// A file with the same name must exist in /etc/pam.d
>>
>> +HIVE_SERVER2_PAM_SERVICES("hive.server2.authentication.pam.services",
>> null),
>>
>>
>> It's documented in the wiki by
>> HIVE-6318 in Setting
>> Up HiveServer2 - Pluggable Authentication Modules
>> (PAM)<
>> https://cwiki.apache.org/confluence/display/Hive/Setting+up+HiveServer2#SettingUpHiveServer2-PluggableAuthenticationModules(PAM)
>> >and
>> supposedly documented in the template file by
>> HIVE-6503 , which says
>> committed for 0.13.0 but *doesn't show up in branch 13 or trunk.*
>> HIVE-6503.1.patch<
>> https://issues.apache.org/jira/secure/attachment/12633674/HIVE-6503.1.patch
>> >
>> :
>>
>> @@ -2165,6 +2165,7 @@
>>
>> NONE: no authentication check
>>
>> LDAP: LDAP/AD based authentication
>>
>> KERBEROS: Kerberos/GSSAPI authentication
>>
>> +   PAM: Pluggable authentication module
>>
>> CUSTOM: Custom authentication provider
>>
>> (Use with property
>> hive.server2.custom.authentication.class)
>>
>>
>>
>> @@ -2217,6 +2218,15 @@
>>
>>  </property>
>>
>>  <property>
>> +  <name>hive.server2.authentication.pam.services</name>
>>
>> +  <value></value>
>>
>> +  <description>
>>
>> +    List of the underlying PAM services that should be used when auth
>> type is PAM.
>>
>> +    A file with the same name must exist in /etc/pam.d.
>>
>> +  </description>
>>
>> +</property>
>>
>> +
>>
>> +<property>
>>
>>    <name>hive.server2.enable.doAs</name>
>>
>>    <value>true</value>
>>
>>
>>
>>
>> HIVE-6681 :
>>
>> +
>>
>> SERDESUSINGMETASTOREFORSCHEMA("hive.serdes.using.metastore.for.schema","org.apache.hadoop.hive.ql.io.orc.OrcSerde,"
>>
>> +  +
>>
>> "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe,org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe,"
>>
>> +  +
>>
>> "org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe,org.apache.hadoop.hive.serde2.MetadataTypedColumnsetSerDe,"
>>
>> +  +
>>
>> "org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe,org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe,"
>>
>> +  + "org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe"),
>>
>>
>>
>>
>> That's it, as far as I can tell.
>>
>> -- Lefty
>>
>>
>> On Wed, Apr 9, 2014 at 3:49 PM, Harish B

[jira] [Updated] (HIVE-6503) document pluggable authentication modules (PAM) in template config, wiki

2014-04-10 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-6503:
---

Priority: Major  (was: Blocker)

> document pluggable authentication modules (PAM) in template config, wiki
> 
>
> Key: HIVE-6503
> URL: https://issues.apache.org/jira/browse/HIVE-6503
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.13.0
>Reporter: Thejas M Nair
>Assignee: Vaibhav Gumashta
> Fix For: 0.13.0
>
> Attachments: HIVE-6503.1.patch
>
>
> HIVE-6466 adds support for "PAM" as a supported value for 
> hive.server2.authentication. 
> It also adds a config parameter hive.server2.authentication.pam.services.
> The default template file needs to be updated to document these. The wiki 
> docs should also document the support for pluggable authentication modules.





[jira] [Reopened] (HIVE-6503) document pluggable authentication modules (PAM) in template config, wiki

2014-04-10 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta reopened HIVE-6503:



> document pluggable authentication modules (PAM) in template config, wiki
> 
>
> Key: HIVE-6503
> URL: https://issues.apache.org/jira/browse/HIVE-6503
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.13.0
>Reporter: Thejas M Nair
>Assignee: Vaibhav Gumashta
>Priority: Blocker
> Fix For: 0.13.0
>
> Attachments: HIVE-6503.1.patch
>
>
> HIVE-6466 adds support for "PAM" as a supported value for 
> hive.server2.authentication. 
> It also adds a config parameter hive.server2.authentication.pam.services.
> The default template file needs to be updated to document these. The wiki 
> docs should also document the support for pluggable authentication modules.





[jira] [Commented] (HIVE-6503) document pluggable authentication modules (PAM) in template config, wiki

2014-04-10 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965611#comment-13965611
 ] 

Vaibhav Gumashta commented on HIVE-6503:


Reopened the issue as the patch never got applied.

[~rhbutani] This just modifies the template file. I think there is no point in 
a precommit run for this.

> document pluggable authentication modules (PAM) in template config, wiki
> 
>
> Key: HIVE-6503
> URL: https://issues.apache.org/jira/browse/HIVE-6503
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.13.0
>Reporter: Thejas M Nair
>Assignee: Vaibhav Gumashta
> Fix For: 0.13.0
>
> Attachments: HIVE-6503.1.patch
>
>
> HIVE-6466 adds support for "PAM" as a supported value for 
> hive.server2.authentication. 
> It also adds a config parameter hive.server2.authentication.pam.services.
> The default template file needs to be updated to document these. The wiki 
> docs should also document the support for pluggable authentication modules.





[jira] [Created] (HIVE-6885) Address style and docs feedback in HIVE-5687

2014-04-10 Thread Alan Gates (JIRA)
Alan Gates created HIVE-6885:


 Summary: Address style and docs feedback in HIVE-5687
 Key: HIVE-6885
 URL: https://issues.apache.org/jira/browse/HIVE-6885
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 0.13.0
Reporter: Alan Gates
Assignee: Roshan Naik


There were a number of style and docs feedback given in HIVE-5687 that were not 
addressed before it was committed.  These need to be addressed.





[jira] [Commented] (HIVE-5687) Streaming support in Hive

2014-04-10 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965644#comment-13965644
 ] 

Alan Gates commented on HIVE-5687:
--

File HIVE-6885 to address the style and docs feedback.

> Streaming support in Hive
> -
>
> Key: HIVE-5687
> URL: https://issues.apache.org/jira/browse/HIVE-5687
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Roshan Naik
>Assignee: Roshan Naik
>  Labels: ACID, Streaming
> Fix For: 0.13.0
>
> Attachments: 5687-api-spec4.pdf, 5687-draft-api-spec.pdf, 
> 5687-draft-api-spec2.pdf, 5687-draft-api-spec3.pdf, 
> HIVE-5687-unit-test-fix.patch, HIVE-5687.patch, HIVE-5687.v2.patch, 
> HIVE-5687.v3.patch, HIVE-5687.v4.patch, HIVE-5687.v5.patch, 
> HIVE-5687.v6.patch, HIVE-5687.v7.patch, Hive Streaming Ingest API for v3 
> patch.pdf, Hive Streaming Ingest API for v4 patch.pdf, package.html
>
>
> Implement support for Streaming data into HIVE.
> - Provide a client streaming API 
> - Transaction support: Clients should be able to periodically commit a batch 
> of records atomically
> - Immediate visibility: Records should be immediately visible to queries on 
> commit
> - Should not overload HDFS with too many small files
> Use Cases:
>  - Streaming logs into HIVE via Flume
>  - Streaming results of computations from Storm
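The transaction-batch semantics described above (records buffered on write, then made visible atomically on commit) can be modeled with a small sketch. The class and method names below are hypothetical illustrations, not the actual hcatalog-streaming API:

```java
import java.util.ArrayList;
import java.util.List;

public class StreamingBatchSketch {
    // Hypothetical model of a transaction batch: writes are buffered and become
    // visible in the "table" only when commit() succeeds, all at once.
    static class TxnBatch {
        private final List<String> pending = new ArrayList<>();
        private final List<String> table;
        TxnBatch(List<String> table) { this.table = table; }
        void write(String record) { pending.add(record); }        // buffered, invisible
        void commit() { table.addAll(pending); pending.clear(); } // atomic visibility
        void abort() { pending.clear(); }                         // drop uncommitted writes
    }

    public static void main(String[] args) {
        List<String> visible = new ArrayList<>();
        TxnBatch batch = new TxnBatch(visible);
        batch.write("log line 1");
        batch.write("log line 2");
        System.out.println(visible.size()); // 0: nothing visible before commit
        batch.commit();
        System.out.println(visible.size()); // 2: the whole batch appears at once
    }
}
```

Writing one file (or delta) per committed batch rather than per record is also what keeps the number of small HDFS files bounded.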





[jira] [Commented] (HIVE-6885) Address style and docs feedback in HIVE-5687

2014-04-10 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965643#comment-13965643
 ] 

Alan Gates commented on HIVE-6885:
--

See 
 * 
https://issues.apache.org/jira/browse/HIVE-5687?focusedCommentId=13961469&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13961469
 * 
https://issues.apache.org/jira/browse/HIVE-5687?focusedCommentId=13961928&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13961928
 * Lefty's comments on https://reviews.apache.org/r/19754/

> Address style and docs feedback in HIVE-5687
> 
>
> Key: HIVE-6885
> URL: https://issues.apache.org/jira/browse/HIVE-6885
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 0.13.0
>Reporter: Alan Gates
>Assignee: Roshan Naik
>
> There were a number of style and docs feedback given in HIVE-5687 that were 
> not addressed before it was committed.  These need to be addressed.





[jira] [Commented] (HIVE-5687) Streaming support in Hive

2014-04-10 Thread Lars Francke (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965646#comment-13965646
 ] 

Lars Francke commented on HIVE-5687:


Thanks Alan for the follow-up!

> Streaming support in Hive
> -
>
> Key: HIVE-5687
> URL: https://issues.apache.org/jira/browse/HIVE-5687
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Roshan Naik
>Assignee: Roshan Naik
>  Labels: ACID, Streaming
> Fix For: 0.13.0
>
> Attachments: 5687-api-spec4.pdf, 5687-draft-api-spec.pdf, 
> 5687-draft-api-spec2.pdf, 5687-draft-api-spec3.pdf, 
> HIVE-5687-unit-test-fix.patch, HIVE-5687.patch, HIVE-5687.v2.patch, 
> HIVE-5687.v3.patch, HIVE-5687.v4.patch, HIVE-5687.v5.patch, 
> HIVE-5687.v6.patch, HIVE-5687.v7.patch, Hive Streaming Ingest API for v3 
> patch.pdf, Hive Streaming Ingest API for v4 patch.pdf, package.html
>
>
> Implement support for Streaming data into HIVE.
> - Provide a client streaming API 
> - Transaction support: Clients should be able to periodically commit a batch 
> of records atomically
> - Immediate visibility: Records should be immediately visible to queries on 
> commit
> - Should not overload HDFS with too many small files
> Use Cases:
>  - Streaming logs into HIVE via Flume
>  - Streaming results of computations from Storm





[jira] [Created] (HIVE-6886) Minor fixes in the compactor

2014-04-10 Thread Alan Gates (JIRA)
Alan Gates created HIVE-6886:


 Summary: Minor fixes in the compactor
 Key: HIVE-6886
 URL: https://issues.apache.org/jira/browse/HIVE-6886
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Alan Gates
Assignee: Alan Gates


A number of smaller issues were identified in the review of HIVE-6319 that were 
not addressed due to the push to get it into 0.13.  These need to be addressed.





[jira] [Commented] (HIVE-6886) Minor fixes in the compactor

2014-04-10 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965649#comment-13965649
 ] 

Alan Gates commented on HIVE-6886:
--

See 
https://issues.apache.org/jira/browse/HIVE-6319?focusedCommentId=13964552&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13964552

> Minor fixes in the compactor
> 
>
> Key: HIVE-6886
> URL: https://issues.apache.org/jira/browse/HIVE-6886
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Alan Gates
>Assignee: Alan Gates
>
> A number of smaller issues were identified in the review of HIVE-6319 that 
> were not addressed due to the push to get it into 0.13.  These need to be 
> addressed.





[jira] [Commented] (HIVE-6319) Insert, update, delete functionality needs a compactor

2014-04-10 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965651#comment-13965651
 ] 

Alan Gates commented on HIVE-6319:
--

Filed HIVE-6886 to track feedback from Owen that hasn't been addressed yet.

> Insert, update, delete functionality needs a compactor
> --
>
> Key: HIVE-6319
> URL: https://issues.apache.org/jira/browse/HIVE-6319
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Alan Gates
>Assignee: Alan Gates
> Fix For: 0.13.0
>
> Attachments: 6319.wip.patch, HIVE-6319.patch, HIVE-6319.patch, 
> HIVE-6319.patch, HIVE-6319.patch, HIVE-6319.patch, HiveCompactorDesign.pdf
>
>
> In order to keep the number of delta files from spiraling out of control we 
> need a compactor to collect these delta files together, and eventually 
> rewrite the base file when the deltas get large enough.
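The trigger logic hinted at here ("collect deltas together, rewrite the base when the deltas get large enough") can be sketched as a pair of threshold checks. The constants below are illustrative guesses, not Hive's actual configuration defaults:

```java
import java.util.List;

public class CompactionSketch {
    // Illustrative thresholds only; not Hive's actual configuration defaults
    static final int MINOR_DELTA_COUNT = 10;      // too many small deltas -> minor compaction
    static final double MAJOR_DELTA_RATIO = 0.1;  // deltas large relative to base -> rewrite base

    enum Action { NONE, MINOR, MAJOR }

    static Action choose(long baseBytes, List<Long> deltaBytes) {
        long totalDelta = deltaBytes.stream().mapToLong(Long::longValue).sum();
        if (baseBytes > 0 && (double) totalDelta / baseBytes >= MAJOR_DELTA_RATIO) {
            return Action.MAJOR;  // deltas got large enough: rewrite the base file
        }
        if (deltaBytes.size() >= MINOR_DELTA_COUNT) {
            return Action.MINOR;  // collect many small deltas into one
        }
        return Action.NONE;
    }

    public static void main(String[] args) {
        System.out.println(choose(1_000_000L, List.of(50_000L, 60_000L))); // MAJOR
        System.out.println(choose(1_000_000L,
                List.of(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L)));         // MINOR
    }
}
```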





[jira] [Updated] (HIVE-6816) jar upload path w/o schema is not handled correctly

2014-04-10 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-6816:
---

Attachment: HIVE-6816.01.patch

Updated the patch to correspond to recent changes.

> jar upload path w/o schema is not handled correctly
> ---
>
> Key: HIVE-6816
> URL: https://issues.apache.org/jira/browse/HIVE-6816
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 0.13.0
>
> Attachments: HIVE-6816.01.patch, HIVE-6816.patch
>
>
> {noformat}
> java.io.IOException: java.net.URISyntaxException: Expected scheme name at 
> index 0: 
> :///user/sershe/hive-exec-0.14.0-SNAPSHOT-5a31b3483b29ad46db705a47893898b2b9f5b7ce3c65f0641bbecca2b1201d81.jar
>   at 
> org.apache.tez.client.TezClientUtils.setupDAGCredentials(TezClientUtils.java:304)
>   at org.apache.tez.client.TezSession.submitDAG(TezSession.java:202)
>   at org.apache.tez.client.TezSession.submitDAG(TezSession.java:154)
>   at org.apache.hadoop.hive.ql.exec.tez.TezTask.submit(TezTask.java:294)
>   at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:147)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1473)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1240)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1058)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:885)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:875)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:424)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:793)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:687)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:626)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> Caused by: java.net.URISyntaxException: Expected scheme name at index 0: 
> :///user/sershe/hive-exec-0.14.0-SNAPSHOT-5a31b3483b29ad46db705a47893898b2b9f5b7ce3c65f0641bbecca2b1201d81.jar
>   at java.net.URI$Parser.fail(URI.java:2829)
>   at java.net.URI$Parser.failExpecting(URI.java:2835)
>   at java.net.URI$Parser.parse(URI.java:3027)
>   at java.net.URI.(URI.java:753)
>   at 
> org.apache.hadoop.yarn.util.ConverterUtils.getPathFromYarnURL(ConverterUtils.java:80)
>   at 
> org.apache.tez.client.TezClientUtils.setupDAGCredentials(TezClientUtils.java:296)
>   ... 22 more
> {noformat}
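The root cause above can be reproduced in isolation: a jar path that gets a "://" prefix without a scheme yields a URI whose scheme component is empty, which java.net.URI rejects with exactly this "Expected scheme name at index 0" error. A minimal sketch (the jar path is shortened):

```java
import java.net.URI;
import java.net.URISyntaxException;

public class SchemeDemo {
    // True if the string parses as a URI; ":///path" has an empty scheme and fails
    static boolean isValidUri(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Empty scheme, as produced by prefixing "://" onto a bare path
        System.out.println(isValidUri(":///user/sershe/hive-exec.jar"));     // false
        // Qualifying the path with a filesystem scheme fixes the parse
        System.out.println(isValidUri("hdfs:///user/sershe/hive-exec.jar")); // true
    }
}
```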





[jira] [Created] (HIVE-6887) Add missing params to hive-default.xml.template

2014-04-10 Thread Harish Butani (JIRA)
Harish Butani created HIVE-6887:
---

 Summary: Add missing params to hive-default.xml.template 
 Key: HIVE-6887
 URL: https://issues.apache.org/jira/browse/HIVE-6887
 Project: Hive
  Issue Type: Bug
Reporter: Harish Butani


Add in the ones that were added to HiveConf, but not the template.xml file; for 
0.13 we will not be moving to the HIVE-6037 style of generating the template file. 





[jira] [Updated] (HIVE-6887) Add missing params to hive-default.xml.template

2014-04-10 Thread Harish Butani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harish Butani updated HIVE-6887:


Attachment: HIVE-6887.1.patch

> Add missing params to hive-default.xml.template 
> 
>
> Key: HIVE-6887
> URL: https://issues.apache.org/jira/browse/HIVE-6887
> Project: Hive
>  Issue Type: Bug
>Reporter: Harish Butani
> Attachments: HIVE-6887.1.patch
>
>
> Add in the ones that were added to HiveConf, but not the template.xml file; 
> for 0.13 we will not be moving to the HIVE-6037 style of generating the template 
> file. 





Re: Timeline for the Hive 0.13 release?

2014-04-10 Thread Harish Butani
Lefty, I added in the missing ones and have a patch, jira is HIVE-6887:
- The ones from 6500 and 6466 are already attached.
- I got the comments for 6447 from Vaibhav.
- Applied the patch from HIVE-6503.  I think it got closed as a Duplicate.

Sorry if it looks like I am rushing you. Just trying to help. Please review or 
replace with your patch. Will wait for your response.
We are most likely down to this issue.  Would love to cut the rc today/early 
tomorrow.

regards,
Harish.


[jira] [Commented] (HIVE-6784) parquet-hive should allow column type change

2014-04-10 Thread Tongjie Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965702#comment-13965702
 ] 

Tongjie Chen commented on HIVE-6784:


Not allowing changing the types of columns would be negative for adopting parquet 
--- at least this is a different behavior from other file formats. 

Regarding the performance penalty: 

1) if you look at the current implementation of LazyInteger, LazyLong etc. (from 
LazySimpleSerDe, in package org.apache.hadoop.hive.serde2.lazy), they try to 
parseInt, parseLong for every column (all initially represented as strings; the 
parsing overhead occurs even if the type is expected).  This is how hive 
achieves changing column types ("schema on read");  in other words, a similar 
performance penalty is already there for other SerDes in order to 
achieve "schema on read".

  /**
   * Parses the string argument as if it was an int value and returns the
   * result. Throws NumberFormatException if the string does not represent an
   * int quantity.
   ...
   */
  public static int parseInt(byte[] bytes, int start, int length, int radix) {


  /**
   * Parses the string argument as if it was a long value and returns the
   * result. Throws NumberFormatException if the string does not represent a
   * long quantity.
   */
  public static long parseLong(byte[] bytes, int start, int length, int radix) {


2) In the patch for this jira, the extra overhead is to list the top level 
elements of the ArrayWritable to inspect whether there is a type change or not. A 
new converted object is created ONLY IF the type is changed.

I agree that there would be some overhead to list the elements of the 
ArrayWritable, but it is a tradeoff.

3) The patch in this jira would actually help performance when serializing (at 
write time) the ArrayWritable. The old approach is to create a new object for 
every single writable element; with this patch, it only creates a new 
object when there is a type change.
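A minimal sketch of the schema-on-read parsing described in point 1 (a hypothetical helper, not Hive's LazyInteger itself): the raw column bytes are parsed into the declared type on every read, and an incompatible value becomes null instead of an error:

```java
import java.nio.charset.StandardCharsets;

public class SchemaOnReadSketch {
    // Hypothetical helper: parse raw column bytes as the declared int type.
    // The parse is attempted on every read; a mismatch yields null, not an error.
    static Integer readAsInt(byte[] raw) {
        try {
            return Integer.parseInt(new String(raw, StandardCharsets.UTF_8).trim());
        } catch (NumberFormatException e) {
            return null; // incompatible value becomes NULL, Hive-style
        }
    }

    public static void main(String[] args) {
        System.out.println(readAsInt("42".getBytes(StandardCharsets.UTF_8)));  // 42
        System.out.println(readAsInt("abc".getBytes(StandardCharsets.UTF_8))); // null
    }
}
```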



> parquet-hive should allow column type change
> 
>
> Key: HIVE-6784
> URL: https://issues.apache.org/jira/browse/HIVE-6784
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats, Serializers/Deserializers
>Affects Versions: 0.13.0
>Reporter: Tongjie Chen
> Fix For: 0.14.0
>
> Attachments: HIVE-6784.1.patch.txt, HIVE-6784.2.patch.txt
>
>
> see also in the following parquet issue:
> https://github.com/Parquet/parquet-mr/issues/323
> Currently, if we change parquet format hive table using "alter table 
> parquet_table change c1 c1 bigint " ( assuming original type of c1 is int), 
> it will result in exception thrown from SerDe: 
> "org.apache.hadoop.io.IntWritable cannot be cast to 
> org.apache.hadoop.io.LongWritable" in query runtime.
> This is different behavior from hive (using other file format), where it will 
> try to perform cast (null value in case of incompatible type).
> Parquet Hive's RecordReader returns an ArrayWritable (based on schema stored 
> in footers of parquet files); ParquetHiveSerDe also creates an corresponding 
> ArrayWritableObjectInspector (but using column type info from metastore). 
> Whenever there is column type change, the objector inspector will throw 
> exception, since WritableLongObjectInspector cannot inspect an IntWritable 
> etc...
> Conversion has to happen somewhere if we want to allow type change. SerDe's 
> deserialize method seems a natural place for it.
> Currently, serialize method calls createStruct (then createPrimitive) for 
> every record, but it creates a new object regardless, which seems expensive. 
> I think that could be optimized a bit by just returning the object passed if 
> already of the right type. deserialize also reuse this method, if there is a 
> type change, there will be new object to be created, which I think is 
> inevitable. 
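The reuse-or-convert idea from the last paragraph can be sketched as follows (a hypothetical illustration, not the actual ParquetHiveSerDe code): return the object passed in when it is already of the right type, and allocate a converted object only on a type change:

```java
public class ReuseSketch {
    // Return the object passed in if it is already the right type (no allocation);
    // convert only on a type change. Hypothetical illustration, not ParquetHiveSerDe.
    static Object toLong(Object v) {
        if (v instanceof Long) return v;                            // reuse as-is
        if (v instanceof Integer) return ((Integer) v).longValue(); // widen int -> long
        return null;                                                // incompatible: NULL
    }

    public static void main(String[] args) {
        Long l = 7L;
        System.out.println(toLong(l) == l); // true: same object reused
        System.out.println(toLong(3));      // converted to a Long
        System.out.println(toLong("x"));    // null
    }
}
```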





[jira] [Updated] (HIVE-6755) Zookeeper Lock Manager leaks zookeeper connections.

2014-04-10 Thread Andrey Stepachev (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Stepachev updated HIVE-6755:
---

Status: Patch Available  (was: Reopened)

> Zookeeper Lock Manager leaks zookeeper connections.
> ---
>
> Key: HIVE-6755
> URL: https://issues.apache.org/jira/browse/HIVE-6755
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.12.0
> Environment: cloudera cdh5b2
>Reporter: Andrey Stepachev
>Priority: Critical
> Attachments: HIVE-6755.patch
>
>
> Driver holds an instance of ZkHiveLockManager. In turn, SqlQuery holds it too. 
> So if we have many unclosed queries, we will get many zk sessions.
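The leak pattern described here is the classic case for tying the session lifetime to an explicit close path. A toy model (hypothetical names, not Hive's actual ZooKeeper lock manager API) showing how unclosed per-query managers accumulate sessions, and how closing each one avoids it:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class LeakSketch {
    static final AtomicInteger openSessions = new AtomicInteger();

    // Hypothetical stand-in for a per-query ZooKeeper-backed lock manager
    static class LockManager implements AutoCloseable {
        LockManager() { openSessions.incrementAndGet(); }         // opens a "session"
        @Override public void close() { openSessions.decrementAndGet(); }
    }

    // Run n queries; leak one session per query unless each manager is closed
    static int simulate(int queries, boolean closeEach) {
        openSessions.set(0);
        for (int i = 0; i < queries; i++) {
            LockManager lm = new LockManager();
            if (closeEach) {
                lm.close(); // with try-with-resources this would be automatic
            }
        }
        return openSessions.get();
    }

    public static void main(String[] args) {
        System.out.println(simulate(3, false)); // 3: one leaked session per query
        System.out.println(simulate(3, true));  // 0: sessions released per query
    }
}
```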





[jira] [Updated] (HIVE-6427) Hive Server2 should reopen Metastore client in case of any Thrift exceptions

2014-04-10 Thread Andrey Stepachev (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Stepachev updated HIVE-6427:
---

Status: Patch Available  (was: Open)

> Hive Server2 should reopen Metastore client in case of any Thrift exceptions
> 
>
> Key: HIVE-6427
> URL: https://issues.apache.org/jira/browse/HIVE-6427
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.12.0
> Environment: cloudera cdh5 beta2
>Reporter: Andrey Stepachev
>Priority: Critical
> Attachments: HIVE-6427-2.patch
>
>
> In case of metastore restart hive server doesn't reopen connection to 
> metastore. Any command gives broken pipe or similar exceptions.
> http://paste.ubuntu.com/6926215/
> Any subsequent command doesn't reestablish the connection and tries to use the 
> stale (closed) connection.
> Looks like we shouldn't blindly convert any MetaException to 
> HiveSQLException, but should distinguish between fatal exceptions and logical 
> exceptions.
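A generic reopen-on-failure wrapper illustrating the suggested fix, under assumed interfaces (this is a sketch, not HiveServer2's actual metastore client code): when the cached connection fails with a transport-level error, open a fresh client and retry once instead of reusing the dead connection:

```java
import java.io.IOException;

public class ReconnectSketch {
    interface Client { String run(String cmd) throws IOException; } // assumed interface
    interface ClientFactory { Client open(); }                      // assumed interface

    // Retry once with a freshly opened client when the cached one fails,
    // instead of reusing a dead (e.g. "broken pipe") connection forever.
    static String runWithReopen(ClientFactory factory, Client[] cached, String cmd)
            throws IOException {
        try {
            return cached[0].run(cmd);
        } catch (IOException stale) {
            cached[0] = factory.open(); // reopen the connection
            return cached[0].run(cmd);  // retry once; rethrows if still failing
        }
    }

    public static void main(String[] args) throws IOException {
        Client dead = cmd -> { throw new IOException("broken pipe"); };
        Client live = cmd -> "ok: " + cmd;
        Client[] cached = { dead }; // stale connection left over from a restart
        System.out.println(runWithReopen(() -> live, cached, "show tables")); // ok: show tables
    }
}
```

A real fix would additionally distinguish fatal transport exceptions (worth a reconnect) from logical MetaExceptions (worth surfacing to the user), as the description suggests.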





[jira] [Commented] (HIVE-6816) jar upload path w/o schema is not handled correctly

2014-04-10 Thread Vikram Dixit K (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965766#comment-13965766
 ] 

Vikram Dixit K commented on HIVE-6816:
--

+1 LGTM

> jar upload path w/o schema is not handled correctly
> ---
>
> Key: HIVE-6816
> URL: https://issues.apache.org/jira/browse/HIVE-6816
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 0.13.0
>
> Attachments: HIVE-6816.01.patch, HIVE-6816.patch
>
>
> {noformat}
> java.io.IOException: java.net.URISyntaxException: Expected scheme name at 
> index 0: 
> :///user/sershe/hive-exec-0.14.0-SNAPSHOT-5a31b3483b29ad46db705a47893898b2b9f5b7ce3c65f0641bbecca2b1201d81.jar
>   at 
> org.apache.tez.client.TezClientUtils.setupDAGCredentials(TezClientUtils.java:304)
>   at org.apache.tez.client.TezSession.submitDAG(TezSession.java:202)
>   at org.apache.tez.client.TezSession.submitDAG(TezSession.java:154)
>   at org.apache.hadoop.hive.ql.exec.tez.TezTask.submit(TezTask.java:294)
>   at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:147)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1473)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1240)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1058)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:885)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:875)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:424)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:793)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:687)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:626)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> Caused by: java.net.URISyntaxException: Expected scheme name at index 0: 
> :///user/sershe/hive-exec-0.14.0-SNAPSHOT-5a31b3483b29ad46db705a47893898b2b9f5b7ce3c65f0641bbecca2b1201d81.jar
>   at java.net.URI$Parser.fail(URI.java:2829)
>   at java.net.URI$Parser.failExpecting(URI.java:2835)
>   at java.net.URI$Parser.parse(URI.java:3027)
>   at java.net.URI.(URI.java:753)
>   at 
> org.apache.hadoop.yarn.util.ConverterUtils.getPathFromYarnURL(ConverterUtils.java:80)
>   at 
> org.apache.tez.client.TezClientUtils.setupDAGCredentials(TezClientUtils.java:296)
>   ... 22 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (HIVE-6883) Dynamic partitioning optimization does not honor sort order or order by

2014-04-10 Thread Harish Butani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965758#comment-13965758
 ] 

Harish Butani edited comment on HIVE-6883 at 4/10/14 7:36 PM:
--

I'm going to say, let's keep this off 0.13.
The user can turn off the dynamic partition optimization in the case of sort/order.


was (Author: rhbutani):
I going to say, let's keep this off 0.13
The use can turn of the dyn partition optimization in the case of sort/order

> Dynamic partitioning optimization does not honor sort order or order by
> ---
>
> Key: HIVE-6883
> URL: https://issues.apache.org/jira/browse/HIVE-6883
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Prasanth J
>Assignee: Prasanth J
>Priority: Critical
> Fix For: 0.13.0
>
> Attachments: HIVE-6883.1.patch
>
>
> The HIVE-6455 patch does not honor the sort order of the output table or the 
> order by of the select statement. The reason for the former is that 
> numDistributionKey in ReduceSinkDesc is set wrongly: it doesn't take the sort 
> columns into account, and because of this RSOp sets the sort columns to null 
> in the Key. Since nulls are set in place of the sort columns in the Key, the 
> sort columns in the Value are not sorted.
> The other issue is that ORDER BY columns are not honored during insertion. For 
> example:
> {code}
> insert overwrite table over1k_part_orc partition(ds="foo", t) select 
> si,i,b,f,t from over1k_orc where t is null or t=27 order by si;
> {code}
> The select query performs order by on column 'si' in the first MR job. The 
> following MR job (inserted by HIVE-6455) sorts the input data on the dynamic 
> partition column 't' without taking into account the already sorted 'si' 
> column. This results in out-of-order insertion for the 'si' column.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
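The ordering problem described in HIVE-6883 can be illustrated in plain Java (this is a sketch, not Hive's actual code): rows already ordered by 'si' that are re-sorted only on the dynamic partition column 't' keep their 'si' order only by luck of a stable sort, whereas sorting on (t, si) together preserves both orders explicitly, which is the guarantee the insert pipeline needs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch of the HIVE-6883 ordering problem (not Hive internals).
public class SecondarySortDemo {

    // Sort by the partition column 't' (index 1), then by 'si' (index 0),
    // so the prior 'si' order survives within each 't' partition.
    static List<int[]> sortByTThenSi(List<int[]> rows) {
        List<int[]> out = new ArrayList<>(rows);
        out.sort(Comparator.<int[]>comparingInt(r -> r[1]).thenComparingInt(r -> r[0]));
        return out;
    }

    public static void main(String[] args) {
        // {si, t} pairs, pre-sorted by 'si' as the first MR job would emit them.
        List<int[]> rows = Arrays.asList(
                new int[]{1, 27}, new int[]{2, 27}, new int[]{3, 0},
                new int[]{4, 27}, new int[]{5, 0});
        for (int[] r : sortByTThenSi(rows)) {
            System.out.println("si=" + r[0] + " t=" + r[1]);
        }
        // Prints rows grouped by t (0 first, then 27), with si ascending
        // within each t partition.
    }
}
```

A distributed shuffle that sorts only on 't' gives no stability guarantee, which is why the second MR job must carry the secondary key explicitly.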


[jira] [Commented] (HIVE-5376) Hive does not honor type for partition columns when altering column type

2014-04-10 Thread Harish Butani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965754#comment-13965754
 ] 

Harish Butani commented on HIVE-5376:
-

Removed this from the 0.13 list.
[~hsubramaniyan] please look into the failure when there is a DEFAULT_PARTITION.

> Hive does not honor type for partition columns when altering column type
> 
>
> Key: HIVE-5376
> URL: https://issues.apache.org/jira/browse/HIVE-5376
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Reporter: Sergey Shelukhin
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-5376.1.patch, HIVE-5376.2.patch
>
>
> Followup for HIVE-5297. If partition column of type string is changed to int, 
> the data is not verified. The values for partition columns are all in 
> metastore db, so it's easy to check and fail the type change.
> alter_partition_coltype.q (or some other test?) checks this behavior right 
> now.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6879) Vectorization: IsNull returns incorrect output column.

2014-04-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965761#comment-13965761
 ] 

Hive QA commented on HIVE-6879:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12639519/HIVE-6879.1.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 5571 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketizedhiveinputformat
{noformat}

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2207/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2207/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12639519

> Vectorization: IsNull returns incorrect output column.
> --
>
> Key: HIVE-6879
> URL: https://issues.apache.org/jira/browse/HIVE-6879
> Project: Hive
>  Issue Type: Bug
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: HIVE-6879.1.patch
>
>
> IsNull returns -1 as output column.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6883) Dynamic partitioning optimization does not honor sort order or order by

2014-04-10 Thread Harish Butani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965758#comment-13965758
 ] 

Harish Butani commented on HIVE-6883:
-

I'm going to say, let's keep this off 0.13.
The user can turn off the dynamic partition optimization in the case of sort/order.

> Dynamic partitioning optimization does not honor sort order or order by
> ---
>
> Key: HIVE-6883
> URL: https://issues.apache.org/jira/browse/HIVE-6883
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Prasanth J
>Assignee: Prasanth J
>Priority: Critical
> Fix For: 0.13.0
>
> Attachments: HIVE-6883.1.patch
>
>
> HIVE-6455 patch does not honor sort order of the output table or order by of 
> select statement. The reason for the former is numDistributionKey in 
> ReduceSinkDesc is set wrongly. It doesn't take into account the sort columns, 
> because of this RSOp sets the sort columns to null in Key. Since nulls are 
> set in place of sort columns in Key, the sort columns in Value are not 
> sorted. 
> The other issue is ORDER BY columns are not honored during insertion. For 
> example
> {code}
> insert overwrite table over1k_part_orc partition(ds="foo", t) select 
> si,i,b,f,t from over1k_orc where t is null or t=27 order by si;
> {code}
> the select query performs order by on column 'si' in the first MR job. The 
> following MR job (inserted by HIVE-6455), sorts the input data on dynamic 
> partition column 't' without taking into account the already sorted 'si' 
> column. This results in out of order insertion for 'si' column.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6864) HiveServer2 concurrency uses incorrect user information in unsecured mode

2014-04-10 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-6864:
---

Attachment: HIVE-6864.2.patch

[~thejas] Patch includes feedback. Thanks!

> HiveServer2 concurrency uses incorrect user information in unsecured mode
> -
>
> Key: HIVE-6864
> URL: https://issues.apache.org/jira/browse/HIVE-6864
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.13.0
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
>Priority: Blocker
> Fix For: 0.13.0
>
> Attachments: HIVE-6864.1.patch, HIVE-6864.2.patch
>
>
> Concurrent queries create table with wrong ownership



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6883) Dynamic partitioning optimization does not honor sort order or order by

2014-04-10 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965768#comment-13965768
 ] 

Prasanth J commented on HIVE-6883:
--

Thanks, no problem. I will change the fix version then. And yes, the user can 
turn off this optimization.

> Dynamic partitioning optimization does not honor sort order or order by
> ---
>
> Key: HIVE-6883
> URL: https://issues.apache.org/jira/browse/HIVE-6883
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Prasanth J
>Assignee: Prasanth J
>Priority: Critical
> Attachments: HIVE-6883.1.patch
>
>
> HIVE-6455 patch does not honor sort order of the output table or order by of 
> select statement. The reason for the former is numDistributionKey in 
> ReduceSinkDesc is set wrongly. It doesn't take into account the sort columns, 
> because of this RSOp sets the sort columns to null in Key. Since nulls are 
> set in place of sort columns in Key, the sort columns in Value are not 
> sorted. 
> The other issue is ORDER BY columns are not honored during insertion. For 
> example
> {code}
> insert overwrite table over1k_part_orc partition(ds="foo", t) select 
> si,i,b,f,t from over1k_orc where t is null or t=27 order by si;
> {code}
> the select query performs order by on column 'si' in the first MR job. The 
> following MR job (inserted by HIVE-6455), sorts the input data on dynamic 
> partition column 't' without taking into account the already sorted 'si' 
> column. This results in out of order insertion for 'si' column.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6883) Dynamic partitioning optimization does not honor sort order or order by

2014-04-10 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-6883:
-

Fix Version/s: (was: 0.14.0)

> Dynamic partitioning optimization does not honor sort order or order by
> ---
>
> Key: HIVE-6883
> URL: https://issues.apache.org/jira/browse/HIVE-6883
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Prasanth J
>Assignee: Prasanth J
>Priority: Critical
> Attachments: HIVE-6883.1.patch
>
>
> HIVE-6455 patch does not honor sort order of the output table or order by of 
> select statement. The reason for the former is numDistributionKey in 
> ReduceSinkDesc is set wrongly. It doesn't take into account the sort columns, 
> because of this RSOp sets the sort columns to null in Key. Since nulls are 
> set in place of sort columns in Key, the sort columns in Value are not 
> sorted. 
> The other issue is ORDER BY columns are not honored during insertion. For 
> example
> {code}
> insert overwrite table over1k_part_orc partition(ds="foo", t) select 
> si,i,b,f,t from over1k_orc where t is null or t=27 order by si;
> {code}
> the select query performs order by on column 'si' in the first MR job. The 
> following MR job (inserted by HIVE-6455), sorts the input data on dynamic 
> partition column 't' without taking into account the already sorted 'si' 
> column. This results in out of order insertion for 'si' column.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6883) Dynamic partitioning optimization does not honor sort order or order by

2014-04-10 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-6883:
-

Fix Version/s: (was: 0.13.0)
   0.14.0

> Dynamic partitioning optimization does not honor sort order or order by
> ---
>
> Key: HIVE-6883
> URL: https://issues.apache.org/jira/browse/HIVE-6883
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Prasanth J
>Assignee: Prasanth J
>Priority: Critical
> Attachments: HIVE-6883.1.patch
>
>
> HIVE-6455 patch does not honor sort order of the output table or order by of 
> select statement. The reason for the former is numDistributionKey in 
> ReduceSinkDesc is set wrongly. It doesn't take into account the sort columns, 
> because of this RSOp sets the sort columns to null in Key. Since nulls are 
> set in place of sort columns in Key, the sort columns in Value are not 
> sorted. 
> The other issue is ORDER BY columns are not honored during insertion. For 
> example
> {code}
> insert overwrite table over1k_part_orc partition(ds="foo", t) select 
> si,i,b,f,t from over1k_orc where t is null or t=27 order by si;
> {code}
> the select query performs order by on column 'si' in the first MR job. The 
> following MR job (inserted by HIVE-6455), sorts the input data on dynamic 
> partition column 't' without taking into account the already sorted 'si' 
> column. This results in out of order insertion for 'si' column.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6879) Vectorization: IsNull returns incorrect output column.

2014-04-10 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965770#comment-13965770
 ] 

Jitendra Nath Pandey commented on HIVE-6879:


The failure is unrelated.

[~rhbutani] This should go to hive-0.13 as well. It causes some queries to 
crash.

> Vectorization: IsNull returns incorrect output column.
> --
>
> Key: HIVE-6879
> URL: https://issues.apache.org/jira/browse/HIVE-6879
> Project: Hive
>  Issue Type: Bug
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: HIVE-6879.1.patch
>
>
> IsNull returns -1 as output column.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6876) Logging information should include thread id

2014-04-10 Thread Harish Butani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965773#comment-13965773
 ] 

Harish Butani commented on HIVE-6876:
-

+1 for 0.13

> Logging information should include thread id
> 
>
> Key: HIVE-6876
> URL: https://issues.apache.org/jira/browse/HIVE-6876
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2, Metastore
>Affects Versions: 0.14.0
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
>Priority: Trivial
> Attachments: HIVE-6876.1.patch
>
>
> The multi-threaded nature of hive server and remote metastore makes it 
> difficult to debug issues without enabling thread information. It would be 
> nice to have the thread id in the logs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6816) jar upload path w/o schema is not handled correctly

2014-04-10 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-6816:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to trunk and 0.13. I ran some tests locally; this only affects the Tez 
path. It is an important bug to fix in 0.13.

> jar upload path w/o schema is not handled correctly
> ---
>
> Key: HIVE-6816
> URL: https://issues.apache.org/jira/browse/HIVE-6816
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 0.13.0
>
> Attachments: HIVE-6816.01.patch, HIVE-6816.patch
>
>
> {noformat}
> java.io.IOException: java.net.URISyntaxException: Expected scheme name at 
> index 0: 
> :///user/sershe/hive-exec-0.14.0-SNAPSHOT-5a31b3483b29ad46db705a47893898b2b9f5b7ce3c65f0641bbecca2b1201d81.jar
>   at 
> org.apache.tez.client.TezClientUtils.setupDAGCredentials(TezClientUtils.java:304)
>   at org.apache.tez.client.TezSession.submitDAG(TezSession.java:202)
>   at org.apache.tez.client.TezSession.submitDAG(TezSession.java:154)
>   at org.apache.hadoop.hive.ql.exec.tez.TezTask.submit(TezTask.java:294)
>   at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:147)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1473)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1240)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1058)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:885)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:875)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:424)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:793)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:687)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:626)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> Caused by: java.net.URISyntaxException: Expected scheme name at index 0: 
> :///user/sershe/hive-exec-0.14.0-SNAPSHOT-5a31b3483b29ad46db705a47893898b2b9f5b7ce3c65f0641bbecca2b1201d81.jar
>   at java.net.URI$Parser.fail(URI.java:2829)
>   at java.net.URI$Parser.failExpecting(URI.java:2835)
>   at java.net.URI$Parser.parse(URI.java:3027)
>   at java.net.URI.(URI.java:753)
>   at 
> org.apache.hadoop.yarn.util.ConverterUtils.getPathFromYarnURL(ConverterUtils.java:80)
>   at 
> org.apache.tez.client.TezClientUtils.setupDAGCredentials(TezClientUtils.java:296)
>   ... 22 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)
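The failure in the stack trace above can be reproduced with plain java.net.URI: a path serialized without its scheme, such as ":///user/foo.jar", is not a valid URI, so parsing it fails with exactly that error. The sketch below uses a hypothetical helper (not the actual HIVE-6816 fix) that prepends a default scheme before parsing.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Sketch of the scheme-less path problem behind HIVE-6816 (illustrative only).
public class SchemeLessUriDemo {

    // Hypothetical helper: prepend a default scheme when the path lacks one.
    static String withDefaultScheme(String raw, String defaultScheme) {
        return raw.startsWith(":///") ? defaultScheme + raw.substring(1) : raw;
    }

    public static void main(String[] args) throws URISyntaxException {
        String bad = ":///user/sershe/hive-exec.jar";
        try {
            new URI(bad);  // empty scheme before the first ':' is rejected
        } catch (URISyntaxException e) {
            System.out.println(e.getMessage());
            // "Expected scheme name at index 0: ..." -- the same error as above
        }
        System.out.println(new URI(withDefaultScheme(bad, "hdfs")));
        // hdfs:///user/sershe/hive-exec.jar
    }
}
```

In Hadoop code the usual way to avoid this is to qualify a Path against the default FileSystem before serializing it, so the scheme is never lost in the first place.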


[jira] [Updated] (HIVE-6883) Dynamic partitioning optimization does not honor sort order or order by

2014-04-10 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-6883:
-

Attachment: HIVE-6883.2.patch

The orc_analyze.q test was failing in hadoop-2. Due to an inconsistency between 
hadoop-1 and hadoop-2, I added ORDER BY to the test cases.

> Dynamic partitioning optimization does not honor sort order or order by
> ---
>
> Key: HIVE-6883
> URL: https://issues.apache.org/jira/browse/HIVE-6883
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Prasanth J
>Assignee: Prasanth J
>Priority: Critical
> Attachments: HIVE-6883.1.patch, HIVE-6883.2.patch
>
>
> HIVE-6455 patch does not honor sort order of the output table or order by of 
> select statement. The reason for the former is numDistributionKey in 
> ReduceSinkDesc is set wrongly. It doesn't take into account the sort columns, 
> because of this RSOp sets the sort columns to null in Key. Since nulls are 
> set in place of sort columns in Key, the sort columns in Value are not 
> sorted. 
> The other issue is ORDER BY columns are not honored during insertion. For 
> example
> {code}
> insert overwrite table over1k_part_orc partition(ds="foo", t) select 
> si,i,b,f,t from over1k_orc where t is null or t=27 order by si;
> {code}
> the select query performs order by on column 'si' in the first MR job. The 
> following MR job (inserted by HIVE-6455), sorts the input data on dynamic 
> partition column 't' without taking into account the already sorted 'si' 
> column. This results in out of order insertion for 'si' column.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-6888) Hive leaks MapWork objects via Utilities::gWorkMap

2014-04-10 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-6888:
--

 Summary: Hive leaks MapWork objects via Utilities::gWorkMap
 Key: HIVE-6888
 URL: https://issues.apache.org/jira/browse/HIVE-6888
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Gunther Hagleitner


When running multiple queries with Hive on a single Application Master, we 
found that Hive leaks a large number of MapWork objects, which accumulate in the 
AM.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
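The leak pattern described in HIVE-6888 can be sketched in a few lines of plain Java (names are illustrative, not Hive's actual classes): a process-wide static map used to hand per-query plan objects between components retains every entry until it is explicitly removed, so a long-lived Application Master accumulates them across queries.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal sketch of a static-map leak like the one described in HIVE-6888.
public class StaticPlanCacheDemo {

    static final Map<String, Object> gWorkMap = new ConcurrentHashMap<>();

    static void runQuery(String queryId, boolean cleanUp) {
        gWorkMap.put(queryId, new byte[1024]);  // stand-in for a cached plan object
        // ... the query executes here ...
        if (cleanUp) {
            gWorkMap.remove(queryId);  // the essential fix: drop the entry when done
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 100; i++) runQuery("leaky-" + i, false);
        System.out.println(gWorkMap.size());  // 100: entries retained, the leak
        gWorkMap.clear();
        for (int i = 0; i < 100; i++) runQuery("clean-" + i, true);
        System.out.println(gWorkMap.size());  // 0: each query cleaned up after itself
    }
}
```

In a per-query JVM the map dies with the process, which is why the leak only shows up once many queries share one long-lived Application Master.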


[jira] [Updated] (HIVE-6888) Hive leaks MapWork objects via Utilities::gWorkMap

2014-04-10 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-6888:
---

Attachment: HIVE-6888.patch

> Hive leaks MapWork objects via Utilities::gWorkMap
> --
>
> Key: HIVE-6888
> URL: https://issues.apache.org/jira/browse/HIVE-6888
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Gunther Hagleitner
> Fix For: 0.13.0
>
> Attachments: HIVE-6888.patch
>
>
> When running multiple queries with hive on a single Application Master, we 
> found that hive leaks a large number of MapWork objects which accumulate in 
> the AM



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6888) Hive leaks MapWork objects via Utilities::gWorkMap

2014-04-10 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965821#comment-13965821
 ] 

Sergey Shelukhin commented on HIVE-6888:


Found by [~t3rmin4t0r], and the patch was initially provided by [~hagleitn]. Both 
are on vacation, so I will attach the patch here so that this issue doesn't get 
lost.

> Hive leaks MapWork objects via Utilities::gWorkMap
> --
>
> Key: HIVE-6888
> URL: https://issues.apache.org/jira/browse/HIVE-6888
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Gunther Hagleitner
> Fix For: 0.13.0
>
> Attachments: HIVE-6888.patch
>
>
> When running multiple queries with hive on a single Application Master, we 
> found that hive leaks a large number of MapWork objects which accumulate in 
> the AM



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6888) Hive leaks MapWork objects via Utilities::gWorkMap

2014-04-10 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-6888:
---

Fix Version/s: 0.13.0
   Status: Patch Available  (was: Open)

> Hive leaks MapWork objects via Utilities::gWorkMap
> --
>
> Key: HIVE-6888
> URL: https://issues.apache.org/jira/browse/HIVE-6888
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Gunther Hagleitner
> Fix For: 0.13.0
>
> Attachments: HIVE-6888.patch
>
>
> When running multiple queries with hive on a single Application Master, we 
> found that hive leaks a large number of MapWork objects which accumulate in 
> the AM



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6879) Vectorization: IsNull returns incorrect output column.

2014-04-10 Thread Harish Butani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965867#comment-13965867
 ] 

Harish Butani commented on HIVE-6879:
-

+1 for 0.13

> Vectorization: IsNull returns incorrect output column.
> --
>
> Key: HIVE-6879
> URL: https://issues.apache.org/jira/browse/HIVE-6879
> Project: Hive
>  Issue Type: Bug
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: HIVE-6879.1.patch
>
>
> IsNull returns -1 as output column.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6864) HiveServer2 concurrency uses incorrect user information in unsecured mode

2014-04-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965882#comment-13965882
 ] 

Hive QA commented on HIVE-6864:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12639637/HIVE-6864.2.patch

{color:green}SUCCESS:{color} +1 5571 tests passed

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2208/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2208/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12639637

> HiveServer2 concurrency uses incorrect user information in unsecured mode
> -
>
> Key: HIVE-6864
> URL: https://issues.apache.org/jira/browse/HIVE-6864
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.13.0
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
>Priority: Blocker
> Fix For: 0.13.0
>
> Attachments: HIVE-6864.1.patch, HIVE-6864.2.patch
>
>
> Concurrent queries create table with wrong ownership



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6883) Dynamic partitioning optimization does not honor sort order or order by

2014-04-10 Thread Vikram Dixit K (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965906#comment-13965906
 ] 

Vikram Dixit K commented on HIVE-6883:
--

+1 LGTM

> Dynamic partitioning optimization does not honor sort order or order by
> ---
>
> Key: HIVE-6883
> URL: https://issues.apache.org/jira/browse/HIVE-6883
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Prasanth J
>Assignee: Prasanth J
>Priority: Critical
> Attachments: HIVE-6883.1.patch, HIVE-6883.2.patch
>
>
> HIVE-6455 patch does not honor sort order of the output table or order by of 
> select statement. The reason for the former is numDistributionKey in 
> ReduceSinkDesc is set wrongly. It doesn't take into account the sort columns, 
> because of this RSOp sets the sort columns to null in Key. Since nulls are 
> set in place of sort columns in Key, the sort columns in Value are not 
> sorted. 
> The other issue is ORDER BY columns are not honored during insertion. For 
> example
> {code}
> insert overwrite table over1k_part_orc partition(ds="foo", t) select 
> si,i,b,f,t from over1k_orc where t is null or t=27 order by si;
> {code}
> the select query performs order by on column 'si' in the first MR job. The 
> following MR job (inserted by HIVE-6455), sorts the input data on dynamic 
> partition column 't' without taking into account the already sorted 'si' 
> column. This results in out of order insertion for 'si' column.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6873) DISTINCT clause in aggregates is handled incorrectly by vectorized execution

2014-04-10 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965915#comment-13965915
 ] 

Jitendra Nath Pandey commented on HIVE-6873:


Here is a scenario where we get an incorrect result. It shows up on a sorted, 
bucketed column with hive.map.groupby.sorted=true, and only on group-by queries 
with no keys.

Here are the steps:
{noformat}
hive> Create table T(a int, b int) clustered by (a) sorted by (a) stored as orc;

load the following data:
300  1
300  1
300  1
300  1
300  1

hive> set hive.map.groupby.sorted=true;

hive> select sum(distinct a) from T;    // Incorrect result.
hive> select count(distinct a) from T;  // This is also incorrect.
{noformat}



> DISTINCT clause in aggregates is handled incorrectly by vectorized execution
> 
>
> Key: HIVE-6873
> URL: https://issues.apache.org/jira/browse/HIVE-6873
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.13.0, 0.14.0
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-6873.1.patch, HIVE-6873.2.patch
>
>
> The vectorized aggregates ignore the DISTINCT clause. This causes incorrect 
> results. Because of how GroupByOperatorDesc adds the DISTINCT keys to the 
> overall aggregate keys, the vectorized aggregates do account for the extra 
> key, but they do not process the data correctly for that key. On the reduce 
> side, aggregating the input from the vectorized map side produces results 
> that are only sometimes correct, but mostly incorrect. HIVE-4607 tracks the 
> proper fix; meanwhile I'm filing a bug to disable vectorized execution if 
> DISTINCT is present. The fix is trivial.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6883) Dynamic partitioning optimization does not honor sort order or order by

2014-04-10 Thread Harish Butani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965930#comment-13965930
 ] 

Harish Butani commented on HIVE-6883:
-

+1 for 0.13

> Dynamic partitioning optimization does not honor sort order or order by
> ---
>
> Key: HIVE-6883
> URL: https://issues.apache.org/jira/browse/HIVE-6883
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Prasanth J
>Assignee: Prasanth J
>Priority: Critical
> Attachments: HIVE-6883.1.patch, HIVE-6883.2.patch
>
>
> HIVE-6455 patch does not honor sort order of the output table or order by of 
> select statement. The reason for the former is numDistributionKey in 
> ReduceSinkDesc is set wrongly. It doesn't take into account the sort columns, 
> because of this RSOp sets the sort columns to null in Key. Since nulls are 
> set in place of sort columns in Key, the sort columns in Value are not 
> sorted. 
> The other issue is ORDER BY columns are not honored during insertion. For 
> example
> {code}
> insert overwrite table over1k_part_orc partition(ds="foo", t) select 
> si,i,b,f,t from over1k_orc where t is null or t=27 order by si;
> {code}
> the select query performs order by on column 'si' in the first MR job. The 
> following MR job (inserted by HIVE-6455), sorts the input data on dynamic 
> partition column 't' without taking into account the already sorted 'si' 
> column. This results in out of order insertion for 'si' column.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-6889) Change the default scratch dir permission to 777

2014-04-10 Thread Vaibhav Gumashta (JIRA)
Vaibhav Gumashta created HIVE-6889:
--

 Summary: Change the default scratch dir permission to 777
 Key: HIVE-6889
 URL: https://issues.apache.org/jira/browse/HIVE-6889
 Project: Hive
  Issue Type: Bug
  Components: CLI, HiveServer2
Reporter: Vaibhav Gumashta
Assignee: Vaibhav Gumashta


There could be several conflicts while creating scratch dirs when both the CLI 
and HiveServer2 are used. The jiras created to address these are HIVE-6847, 
HIVE-6627, and HIVE-6626. However, until those are fixed, the default value for 
the scratch dir permission should be 777 instead of 700.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6880) TestHWISessionManager fails with -Phadoop-2

2014-04-10 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-6880:
---

   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Jason!

> TestHWISessionManager fails with -Phadoop-2
> ---
>
> Key: HIVE-6880
> URL: https://issues.apache.org/jira/browse/HIVE-6880
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 0.13.0
>Reporter: Jason Dere
>Assignee: Jason Dere
> Fix For: 0.14.0
>
> Attachments: HIVE-6880.1.patch
>
>
> Looks like dependencies are missing for -Phadoop-2
> {noformat}
> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.213 sec <<< 
> FAILURE! - in org.apache.hadoop.hive.hwi.TestHWISessionManager
> warning(junit.framework.TestSuite$1)  Time elapsed: 0.009 sec  <<< FAILURE!
> junit.framework.AssertionFailedError: Exception in constructor: 
> testHiveDriver (java.lang.NoClassDefFoundError: 
> org/apache/hadoop/mapreduce/TaskAttemptContext
>   at java.lang.Class.forName0(Native Method)
>   at java.lang.Class.forName(Class.java:171)
>   at 
> org.apache.hadoop.hive.shims.ShimLoader.createShim(ShimLoader.java:120)
>   at 
> org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:115)
>   at 
> org.apache.hadoop.hive.shims.ShimLoader.getHadoopShims(ShimLoader.java:80)
>   at 
> org.apache.hadoop.hive.conf.HiveConf$ConfVars.(HiveConf.java:248)
>   at org.apache.hadoop.hive.conf.HiveConf.(HiveConf.java:81)
>   at 
> org.apache.hadoop.hive.hwi.TestHWISessionManager.(TestHWISessionManager.java:46)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>   at junit.framework.TestSuite.createTest(TestSuite.java:65)
>   at junit.framework.TestSuite.addTestMethod(TestSuite.java:294)
>   at junit.framework.TestSuite.addTestsFromTestCase(TestSuite.java:150)
>   at junit.framework.TestSuite.(TestSuite.java:129)
>   at 
> org.junit.internal.runners.JUnit38ClassRunner.(JUnit38ClassRunner.java:71)
>   at 
> org.junit.internal.builders.JUnit3Builder.runnerForClass(JUnit3Builder.java:14)
>   at 
> org.junit.runners.model.RunnerBuilder.safeRunnerForClass(RunnerBuilder.java:57)
>   at 
> org.junit.internal.builders.AllDefaultPossibilitiesBuilder.runnerForClass(AllDefaultPossibilitiesBuilder.java:29)
>   at 
> org.junit.runners.model.RunnerBuilder.safeRunnerForClass(RunnerBuilder.java:57)
>   at 
> org.junit.internal.requests.ClassRequest.getRunner(ClassRequest.java:24)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:262)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.hadoop.mapreduce.TaskAttemptContext
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>   ... 28 more
> )
>   at junit.framework.Assert.fail(Assert.java:50)
>   at junit.framework.TestSuite$1.runTest(TestSuite.java:97)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6877) TestOrcRawRecordMerger is deleting test.tmp.dir

2014-04-10 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-6877:
---

   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Jason!

> TestOrcRawRecordMerger is deleting test.tmp.dir
> ---
>
> Key: HIVE-6877
> URL: https://issues.apache.org/jira/browse/HIVE-6877
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Reporter: Jason Dere
>Assignee: Jason Dere
> Fix For: 0.14.0
>
> Attachments: HIVE-6877.1.patch
>
>
> TestOrcRawRecordMerger seems to be deleting the directory pointed to by 
> test.tmp.dir. This can cause failures in any tests that run after this test 
> if they need to use files in the tmp dir (such as conf files) or create Hive 
> tables.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6319) Insert, update, delete functionality needs a compactor

2014-04-10 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-6319:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

I just committed this. Thanks, Alan!

> Insert, update, delete functionality needs a compactor
> --
>
> Key: HIVE-6319
> URL: https://issues.apache.org/jira/browse/HIVE-6319
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Alan Gates
>Assignee: Alan Gates
> Fix For: 0.13.0
>
> Attachments: 6319.wip.patch, HIVE-6319.patch, HIVE-6319.patch, 
> HIVE-6319.patch, HIVE-6319.patch, HIVE-6319.patch, HiveCompactorDesign.pdf
>
>
> In order to keep the number of delta files from spiraling out of control we 
> need a compactor to collect these delta files together, and eventually 
> rewrite the base file when the deltas get large enough.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6871) Build fixes to allow Windows to run TestCliDriver

2014-04-10 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-6871:
---

   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Jason!

> Build fixes to allow Windows to run TestCliDriver
> -
>
> Key: HIVE-6871
> URL: https://issues.apache.org/jira/browse/HIVE-6871
> Project: Hive
>  Issue Type: Bug
>  Components: Build Infrastructure, Windows
>Reporter: Jason Dere
>Assignee: Jason Dere
> Fix For: 0.14.0
>
> Attachments: HIVE-6871.1.patch
>
>
> Some of the Java properties have been changed or set differently due to the 
> Mavenization of the Hive build, and it looks like this is causing some issues 
> with the Windows unit tests.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6864) HiveServer2 concurrency uses incorrect user information in unsecured mode

2014-04-10 Thread Harish Butani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965945#comment-13965945
 ] 

Harish Butani commented on HIVE-6864:
-

+1 for 0.13

> HiveServer2 concurrency uses incorrect user information in unsecured mode
> -
>
> Key: HIVE-6864
> URL: https://issues.apache.org/jira/browse/HIVE-6864
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.13.0
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
>Priority: Blocker
> Fix For: 0.13.0
>
> Attachments: HIVE-6864.1.patch, HIVE-6864.2.patch
>
>
> Concurrent queries create table with wrong ownership



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6870) Fix maven.repo.local setting in Hive build

2014-04-10 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-6870:
---

   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Jason!

> Fix maven.repo.local setting in Hive build
> --
>
> Key: HIVE-6870
> URL: https://issues.apache.org/jira/browse/HIVE-6870
> Project: Hive
>  Issue Type: Bug
>  Components: Build Infrastructure
>Reporter: Jason Dere
>Assignee: Jason Dere
> Fix For: 0.14.0
>
> Attachments: HIVE-6870.1.patch
>
>
> The pom.xml currently assumes maven.repo.local should be 
> ${user.home}/.m2/repository.  If the user has overridden the local repository 
> through Maven settings, tests which assume the hive-exec JAR is at 
> ${user.home}/.m2/repository will fail because the artifacts will not be 
> installed at that location.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6864) HiveServer2 concurrency uses incorrect user information in unsecured mode

2014-04-10 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-6864:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

Patch committed to 0.13 branch and trunk.


> HiveServer2 concurrency uses incorrect user information in unsecured mode
> -
>
> Key: HIVE-6864
> URL: https://issues.apache.org/jira/browse/HIVE-6864
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.13.0
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
>Priority: Blocker
> Fix For: 0.13.0
>
> Attachments: HIVE-6864.1.patch, HIVE-6864.2.patch
>
>
> Concurrent queries create table with wrong ownership



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6887) Add missing params to hive-default.xml.template

2014-04-10 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965969#comment-13965969
 ] 

Thejas M Nair commented on HIVE-6887:
-

+1 LGTM



> Add missing params to hive-default.xml.template 
> 
>
> Key: HIVE-6887
> URL: https://issues.apache.org/jira/browse/HIVE-6887
> Project: Hive
>  Issue Type: Bug
>Reporter: Harish Butani
> Attachments: HIVE-6887.1.patch
>
>
> Add the ones that were added to HiveConf but not to the template.xml file. 
> For 0.13 we will not be moving to the HIVE-6037 style of generating the 
> template file. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HIVE-6503) document pluggable authentication modules (PAM) in template config, wiki

2014-04-10 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta resolved HIVE-6503.


Resolution: Duplicate

> document pluggable authentication modules (PAM) in template config, wiki
> 
>
> Key: HIVE-6503
> URL: https://issues.apache.org/jira/browse/HIVE-6503
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.13.0
>Reporter: Thejas M Nair
>Assignee: Vaibhav Gumashta
> Fix For: 0.13.0
>
> Attachments: HIVE-6503.1.patch
>
>
> HIVE-6466 adds support for "PAM" as a supported value for 
> hive.server2.authentication. 
> It also adds a config parameter hive.server2.authentication.pam.services.
> The default template file needs to be updated to document these. The wiki 
> docs should also document the support for pluggable authentication modules.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6503) document pluggable authentication modules (PAM) in template config, wiki

2014-04-10 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965975#comment-13965975
 ] 

Vaibhav Gumashta commented on HIVE-6503:


Okay, so this seems to be going in as part of HIVE-6887. I'll close this jira.

> document pluggable authentication modules (PAM) in template config, wiki
> 
>
> Key: HIVE-6503
> URL: https://issues.apache.org/jira/browse/HIVE-6503
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.13.0
>Reporter: Thejas M Nair
>Assignee: Vaibhav Gumashta
> Fix For: 0.13.0
>
> Attachments: HIVE-6503.1.patch
>
>
> HIVE-6466 adds support for "PAM" as a supported value for 
> hive.server2.authentication. 
> It also adds a config parameter hive.server2.authentication.pam.services.
> The default template file needs to be updated to document these. The wiki 
> docs should also document the support for pluggable authentication modules.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6888) Hive leaks MapWork objects via Utilities::gWorkMap

2014-04-10 Thread Vikram Dixit K (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965985#comment-13965985
 ] 

Vikram Dixit K commented on HIVE-6888:
--

+1

> Hive leaks MapWork objects via Utilities::gWorkMap
> --
>
> Key: HIVE-6888
> URL: https://issues.apache.org/jira/browse/HIVE-6888
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Gunther Hagleitner
> Fix For: 0.13.0
>
> Attachments: HIVE-6888.patch
>
>
> When running multiple queries with Hive on a single Application Master, we 
> found that Hive leaks a large number of MapWork objects, which accumulate in 
> the AM.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6847) Improve Hive scratch dir setup

2014-04-10 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-6847:
---

Summary: Improve Hive scratch dir setup  (was: Hive server concurrency has 
issues with regard to temp directories)

> Improve Hive scratch dir setup
> --
>
> Key: HIVE-6847
> URL: https://issues.apache.org/jira/browse/HIVE-6847
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.14.0
>Reporter: Vikram Dixit K
>Assignee: Vaibhav Gumashta
>
> Currently, the hive server creates a scratch directory and changes its 
> permission to 777; however, this is not great with respect to security. We 
> need to create user-specific scratch directories instead. Also refer to the 
> 1st iteration of the HIVE-6782 patch for the approach.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6847) Improve Hive scratch dir setup

2014-04-10 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-6847:
---

Component/s: CLI

> Improve Hive scratch dir setup
> --
>
> Key: HIVE-6847
> URL: https://issues.apache.org/jira/browse/HIVE-6847
> Project: Hive
>  Issue Type: Bug
>  Components: CLI, HiveServer2
>Affects Versions: 0.14.0
>Reporter: Vikram Dixit K
>Assignee: Vaibhav Gumashta
>
> Currently, the hive server creates a scratch directory and changes its 
> permission to 777; however, this is not great with respect to security. We 
> need to create user-specific scratch directories instead. Also refer to the 
> 1st iteration of the HIVE-6782 patch for the approach.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6626) Hive does not expand the DOWNLOADED_RESOURCES_DIR path

2014-04-10 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-6626:
---

Issue Type: Sub-task  (was: Bug)
Parent: HIVE-6847

> Hive does not expand the DOWNLOADED_RESOURCES_DIR path
> --
>
> Key: HIVE-6626
> URL: https://issues.apache.org/jira/browse/HIVE-6626
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 0.13.0
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Fix For: 0.14.0
>
>
> The downloaded scratch dir is specified in HiveConf as:
> {code}
> DOWNLOADED_RESOURCES_DIR("hive.downloaded.resources.dir", 
> System.getProperty("java.io.tmpdir") + File.separator  + 
> "${hive.session.id}_resources"),
> {code}
> However, hive.session.id does not get expanded.
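The missing step is a pass that substitutes ${...} placeholders into the 
configured value before it is used. A minimal sketch of such a substitution 
(hypothetical; this is not Hive's actual implementation, and the class name is 
invented for illustration):

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: expand ${var} placeholders in a config value,
// the substitution step that is missing for ${hive.session.id} above.
public class VarExpander {
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    static String expand(String value, Map<String, String> vars) {
        Matcher m = VAR.matcher(value);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            String sub = vars.get(m.group(1));
            // Leave the placeholder intact if no value is known for it.
            m.appendReplacement(sb,
                Matcher.quoteReplacement(sub != null ? sub : m.group(0)));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String raw = "/tmp/${hive.session.id}_resources";
        System.out.println(expand(raw, Map.of("hive.session.id", "abc123")));
        // prints: /tmp/abc123_resources
    }
}
```

An unexpanded placeholder is passed through unchanged, which matches the 
symptom above: the literal string "${hive.session.id}_resources" ends up in 
the path.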



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6626) Hive does not expand the DOWNLOADED_RESOURCES_DIR path

2014-04-10 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-6626:
---

Summary: Hive does not expand the DOWNLOADED_RESOURCES_DIR path  (was: 
HiveServer2 does not expand the DOWNLOADED_RESOURCES_DIR path)

> Hive does not expand the DOWNLOADED_RESOURCES_DIR path
> --
>
> Key: HIVE-6626
> URL: https://issues.apache.org/jira/browse/HIVE-6626
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Fix For: 0.14.0
>
>
> The downloaded scratch dir is specified in HiveConf as:
> {code}
> DOWNLOADED_RESOURCES_DIR("hive.downloaded.resources.dir", 
> System.getProperty("java.io.tmpdir") + File.separator  + 
> "${hive.session.id}_resources"),
> {code}
> However, hive.session.id does not get expanded.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6627) HiveServer2 should handle scratch dir permissions / errors in a better way

2014-04-10 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-6627:
---

Issue Type: Sub-task  (was: Bug)
Parent: HIVE-6847

> HiveServer2 should handle scratch dir permissions / errors in a better way
> --
>
> Key: HIVE-6627
> URL: https://issues.apache.org/jira/browse/HIVE-6627
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Affects Versions: 0.13.0
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Fix For: 0.14.0
>
>
> Should do the following:
> If the permission is already 777, we don't need to try changing permissions.
> 1. If owner, change the permissions to 777 for all 3 scratch dirs (create 
> them if they don't exist).
> 2. Else throw a meaningful permission-denied error and exit.
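The steps above can be sketched roughly as follows. This is an illustrative 
outline only, written against java.nio rather than Hadoop's FileSystem API, 
and the class and method names are hypothetical:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

// Hypothetical outline of the scratch-dir check described above,
// shown with java.nio instead of Hadoop's FileSystem API.
public class ScratchDirCheck {
    static void ensureScratchDir(Path dir) throws IOException {
        Set<PosixFilePermission> rwxAll =
            PosixFilePermissions.fromString("rwxrwxrwx");
        if (!Files.exists(dir)) {
            // Create missing dirs, then open them up.
            Files.createDirectories(dir);
            Files.setPosixFilePermissions(dir, rwxAll);
            return;
        }
        if (Files.getPosixFilePermissions(dir).equals(rwxAll)) {
            return; // already 777: no need to try changing permissions
        }
        String owner = Files.getOwner(dir).getName();
        if (owner.equals(System.getProperty("user.name"))) {
            // We own the dir, so we may widen its permissions.
            Files.setPosixFilePermissions(dir, rwxAll);
        } else {
            // Meaningful permission-denied error instead of a cryptic
            // failure later in the session.
            throw new IOException("Scratch dir " + dir + " is owned by "
                + owner + " and is not 777; cannot proceed");
        }
    }
}
```

The real implementation would repeat this for all three scratch dirs and use 
Hadoop's FsPermission on HDFS paths; the structure of the checks is the point 
here.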



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6847) Improve / fix bugs in Hive scratch dir setup

2014-04-10 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-6847:
---

Summary: Improve / fix bugs in Hive scratch dir setup  (was: Improve Hive 
scratch dir setup)

> Improve / fix bugs in Hive scratch dir setup
> 
>
> Key: HIVE-6847
> URL: https://issues.apache.org/jira/browse/HIVE-6847
> Project: Hive
>  Issue Type: Bug
>  Components: CLI, HiveServer2
>Affects Versions: 0.14.0
>Reporter: Vikram Dixit K
>Assignee: Vaibhav Gumashta
>
> Currently, the hive server creates a scratch directory and changes its 
> permission to 777; however, this is not great with respect to security. We 
> need to create user-specific scratch directories instead. Also refer to the 
> 1st iteration of the HIVE-6782 patch for the approach.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6889) Change the default scratch dir permission to 777

2014-04-10 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965994#comment-13965994
 ] 

Owen O'Malley commented on HIVE-6889:
-

This seems to introduce a really large set of security holes. Scratch 
directories shouldn't be 777.

> Change the default scratch dir permission to 777
> 
>
> Key: HIVE-6889
> URL: https://issues.apache.org/jira/browse/HIVE-6889
> Project: Hive
>  Issue Type: Bug
>  Components: CLI, HiveServer2
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
>
> There could be several conflicts while creating scratch dirs when both CLI 
> and HiveServer2 are used. The jiras created to address these are HIVE-6847, 
> HIVE-6627, and HIVE-6626. However, until then, the default value for the 
> scratch dir permission should be 777 instead of 700.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HIVE-6889) Change the default scratch dir permission to 777

2014-04-10 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta resolved HIVE-6889.


Resolution: Invalid

> Change the default scratch dir permission to 777
> 
>
> Key: HIVE-6889
> URL: https://issues.apache.org/jira/browse/HIVE-6889
> Project: Hive
>  Issue Type: Bug
>  Components: CLI, HiveServer2
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
>
> There could be several conflicts while creating scratch dirs when both CLI 
> and HiveServer2 are used. The jiras created to address these are HIVE-6847, 
> HIVE-6627, and HIVE-6626. However, until then, the default value for the 
> scratch dir permission should be 777 instead of 700.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6681) Describe table sometimes shows "from deserializer" for column comments

2014-04-10 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965997#comment-13965997
 ] 

Ashutosh Chauhan commented on HIVE-6681:


I think this config is too specific to a particular scenario. It's useful only 
for hive-devs. I don't think it's useful to expose to end users, so I will 
avoid documenting it. 

> Describe table sometimes shows "from deserializer" for column comments
> --
>
> Key: HIVE-6681
> URL: https://issues.apache.org/jira/browse/HIVE-6681
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Serializers/Deserializers
>Affects Versions: 0.11.0, 0.12.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Fix For: 0.13.0
>
> Attachments: HIVE-6681.2.patch, HIVE-6681.3.patch, HIVE-6681.4.patch, 
> HIVE-6681.5.patch, HIVE-6681.6.patch, HIVE-6681.7.patch, HIVE-6681.8.patch, 
> HIVE-6681.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6889) Change the default scratch dir permission to 777

2014-04-10 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965996#comment-13965996
 ] 

Vaibhav Gumashta commented on HIVE-6889:


I agree, I think this is an invalid workaround.

> Change the default scratch dir permission to 777
> 
>
> Key: HIVE-6889
> URL: https://issues.apache.org/jira/browse/HIVE-6889
> Project: Hive
>  Issue Type: Bug
>  Components: CLI, HiveServer2
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
>
> There could be several conflicts while creating scratch dirs when both CLI 
> and HiveServer2 are used. The jiras created to address these are HIVE-6847, 
> HIVE-6627, and HIVE-6626. However, until then, the default value for the 
> scratch dir permission should be 777 instead of 700.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6873) DISTINCT clause in aggregates is handled incorrectly by vectorized execution

2014-04-10 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HIVE-6873:
---

Attachment: HIVE-6873.3.patch

The attached patch fixes the issue by disabling the optimization for 
sorted/bucketed columns if vectorization is on. This change impacts only the 
specific scenario hit in the example above. A test is also added to reproduce 
that scenario, along with another test for the normal distinct case.

> DISTINCT clause in aggregates is handled incorrectly by vectorized execution
> 
>
> Key: HIVE-6873
> URL: https://issues.apache.org/jira/browse/HIVE-6873
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.13.0, 0.14.0
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-6873.1.patch, HIVE-6873.2.patch, HIVE-6873.3.patch
>
>
> The vectorized aggregates ignore the DISTINCT clause. This causes incorrect 
> results. Due to how GroupByOperatorDesc adds the DISTINCT keys to the overall 
> aggregate keys, the vectorized aggregates do account for the extra key, but 
> they do not process the data correctly for that key. On the reduce side, the 
> aggregation of the input from the vectorized map side yields results that are 
> only sometimes correct but mostly incorrect. HIVE-4607 tracks the proper fix; 
> in the meantime I'm filing a bug to disable vectorized execution if DISTINCT 
> is present. The fix is trivial.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 19984: Beeline should accept -i option to Initializing a SQL file

2014-04-10 Thread Xuefu Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19984/#review40101
---



beeline/src/java/org/apache/hive/beeline/BeeLine.java


What's the need for this?



beeline/src/main/resources/BeeLine.properties


Hive CLI seems to honor the init file even for the -e option. See CliDriver.java

From the patch, it seems that the init file is executed even with the -e 
option. Could you verify?


- Xuefu Zhang


On April 8, 2014, 2:07 a.m., Navis Ryu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/19984/
> ---
> 
> (Updated April 8, 2014, 2:07 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-6561
> https://issues.apache.org/jira/browse/HIVE-6561
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Hive CLI has -i option. From Hive CLI help:
> {code}
> ...
>  -i Initialization SQL file
> ...
> {code}
> 
> However, Beeline has no such option:
> {code}
> xzhang@xzlt:~/apa/hive3$ 
> ./packaging/target/apache-hive-0.14.0-SNAPSHOT-bin/apache-hive-0.14.0-SNAPSHOT-bin/bin/beeline
>  -u jdbc:hive2:// -i hive.rc
> ...
> Connected to: Apache Hive (version 0.14.0-SNAPSHOT)
> Driver: Hive JDBC (version 0.14.0-SNAPSHOT)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> -i (No such file or directory)
> Property "url" is required
> Beeline version 0.14.0-SNAPSHOT by Apache Hive
> ...
> {code}
> 
> 
> Diffs
> -
> 
>   beeline/src/java/org/apache/hive/beeline/BeeLine.java 5773109 
>   beeline/src/java/org/apache/hive/beeline/BeeLineOpts.java 44cabdf 
>   beeline/src/java/org/apache/hive/beeline/Commands.java 493f963 
>   beeline/src/main/resources/BeeLine.properties 697c29a 
> 
> Diff: https://reviews.apache.org/r/19984/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Navis Ryu
> 
>



[jira] [Updated] (HIVE-6873) DISTINCT clause in aggregates is handled incorrectly by vectorized execution

2014-04-10 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HIVE-6873:
---

Status: Patch Available  (was: Open)

> DISTINCT clause in aggregates is handled incorrectly by vectorized execution
> 
>
> Key: HIVE-6873
> URL: https://issues.apache.org/jira/browse/HIVE-6873
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.13.0, 0.14.0
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-6873.1.patch, HIVE-6873.2.patch, HIVE-6873.3.patch
>
>
> The vectorized aggregates ignore the DISTINCT clause. This causes incorrect 
> results. Due to how GroupByOperatorDesc adds the DISTINCT keys to the overall 
> aggregate keys, the vectorized aggregates do account for the extra key, but 
> they do not process the data correctly for that key. On the reduce side, the 
> aggregation of the input from the vectorized map side yields results that are 
> only sometimes correct but mostly incorrect. HIVE-4607 tracks the proper fix; 
> in the meantime I'm filing a bug to disable vectorized execution if DISTINCT 
> is present. The fix is trivial.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6873) DISTINCT clause in aggregates is handled incorrectly by vectorized execution

2014-04-10 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HIVE-6873:
---

Status: Open  (was: Patch Available)

> DISTINCT clause in aggregates is handled incorrectly by vectorized execution
> 
>
> Key: HIVE-6873
> URL: https://issues.apache.org/jira/browse/HIVE-6873
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.13.0, 0.14.0
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-6873.1.patch, HIVE-6873.2.patch, HIVE-6873.3.patch
>
>
> The vectorized aggregates ignore the DISTINCT clause. This causes incorrect 
> results. Due to how GroupByOperatorDesc adds the DISTINCT keys to the overall 
> aggregate keys, the vectorized aggregates do account for the extra key, but 
> they do not process the data correctly for that key. On the reduce side, the 
> aggregation of the input from the vectorized map side yields results that are 
> only sometimes correct but mostly incorrect. HIVE-4607 tracks the proper fix; 
> in the meantime I'm filing a bug to disable vectorized execution if DISTINCT 
> is present. The fix is trivial.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6561) Beeline should accept -i option to Initializing a SQL file

2014-04-10 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966001#comment-13966001
 ] 

Xuefu Zhang commented on HIVE-6561:
---

Patch looks good. I had a couple of minor comments on RB.

> Beeline should accept -i option to Initializing a SQL file
> --
>
> Key: HIVE-6561
> URL: https://issues.apache.org/jira/browse/HIVE-6561
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 0.10.0, 0.11.0, 0.12.0
>Reporter: Xuefu Zhang
>Assignee: Navis
> Attachments: HIVE-6561.1.patch.txt, HIVE-6561.2.patch.txt
>
>
> Hive CLI has -i option. From Hive CLI help:
> {code}
> ...
>  -i Initialization SQL file
> ...
> {code}
> However, Beeline has no such option:
> {code}
> xzhang@xzlt:~/apa/hive3$ 
> ./packaging/target/apache-hive-0.14.0-SNAPSHOT-bin/apache-hive-0.14.0-SNAPSHOT-bin/bin/beeline
>  -u jdbc:hive2:// -i hive.rc
> ...
> Connected to: Apache Hive (version 0.14.0-SNAPSHOT)
> Driver: Hive JDBC (version 0.14.0-SNAPSHOT)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> -i (No such file or directory)
> Property "url" is required
> Beeline version 0.14.0-SNAPSHOT by Apache Hive
> ...
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

