[jira] [Commented] (HIVE-7250) Adaptive compression buffer size for wide tables in ORC

2014-06-20 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14038568#comment-14038568
 ] 

Lefty Leverenz commented on HIVE-7250:
--

No user doc?

> Adaptive compression buffer size for wide tables in ORC
> ---
>
> Key: HIVE-7250
> URL: https://issues.apache.org/jira/browse/HIVE-7250
> Project: Hive
>  Issue Type: Improvement
>  Components: File Formats
>Affects Versions: 0.14.0
>Reporter: Prasanth J
>Assignee: Prasanth J
>  Labels: orcfile
> Fix For: 0.14.0
>
> Attachments: HIVE-7250.1.patch, HIVE-7250.2.patch, HIVE-7250.3.patch, 
> HIVE-7250.4.patch, HIVE-7250.5.patch
>
>
> If the input table is wide (on the order of 1000s of columns), the ORC 
> compression buffer size overhead becomes significant, causing OOM issues. To 
> overcome this, the buffer size should be chosen adaptively based on the 
> available memory and the number of columns.
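
A minimal sketch of the idea in the description above, not the logic of the 
committed patch. The class name, the method name, the 4 KB floor, and the 
assumed 5 streams per column with 2 buffers per stream are all hypothetical 
values chosen for illustration. It divides the available memory across every 
buffer a wide table might need, caps the result at the configured buffer size, 
and rounds down to a power of two.

```java
// Hypothetical illustration only; names and constants are assumptions,
// not taken from the HIVE-7250 patch.
final class AdaptiveBufferSketch {
    // Assumed smallest buffer size worth using (4 KB).
    static final int MIN_BUFFER_SIZE = 4 * 1024;
    // Assumed upper bound on the number of streams a single column writes.
    static final int STREAMS_PER_COLUMN = 5;
    // Assumed number of buffers held per stream (e.g. raw and compressed).
    static final int BUFFERS_PER_STREAM = 2;

    /**
     * Picks a compression buffer size so that all buffers for all columns
     * fit in availableMemBytes, never exceeding configuredBufferSize.
     */
    static int estimateBufferSize(long availableMemBytes, int columnCount,
                                  int configuredBufferSize) {
        if (columnCount <= 0 || configuredBufferSize <= 0) {
            throw new IllegalArgumentException("counts and sizes must be positive");
        }
        long totalBuffers =
            (long) columnCount * STREAMS_PER_COLUMN * BUFFERS_PER_STREAM;
        long perBuffer = availableMemBytes / totalBuffers;

        // Never go above what the user configured.
        long capped = Math.min(perBuffer, (long) configuredBufferSize);

        // Never go below the floor (or the configured size, if that is smaller).
        int floor = Math.min(MIN_BUFFER_SIZE, configuredBufferSize);
        if (capped <= floor) {
            return floor;
        }
        // Round down to a power of two; capped fits in an int at this point.
        return Integer.highestOneBit((int) capped);
    }

    public static void main(String[] args) {
        long mem = 512L * 1024 * 1024;   // memory assumed available to the writer
        int configured = 256 * 1024;     // configured buffer size
        for (int cols : new int[] {10, 1000, 20000}) {
            System.out.println(cols + " columns -> "
                + estimateBufferSize(mem, cols, configured) + " bytes");
        }
    }
}
```

With these assumed constants, a narrow table keeps the configured buffer size, 
while a table with thousands of columns is given a smaller buffer per stream so 
that the total stays within the memory budget.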



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7250) Adaptive compression buffer size for wide tables in ORC

2014-06-19 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14038286#comment-14038286
 ] 

Prasanth J commented on HIVE-7250:
--

Patch committed to trunk.



[jira] [Commented] (HIVE-7250) Adaptive compression buffer size for wide tables in ORC

2014-06-19 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14038284#comment-14038284
 ] 

Prasanth J commented on HIVE-7250:
--

The recent patch does not change the outcome of unit tests.




[jira] [Commented] (HIVE-7250) Adaptive compression buffer size for wide tables in ORC

2014-06-19 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14037271#comment-14037271
 ] 

Hive QA commented on HIVE-7250:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12651345/HIVE-7250.4.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 5664 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/514/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/514/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-514/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12651345



[jira] [Commented] (HIVE-7250) Adaptive compression buffer size for wide tables in ORC

2014-06-18 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14036773#comment-14036773
 ] 

Prasanth J commented on HIVE-7250:
--

The qfile test was added just to make sure the wide table creation runs 
without any OOM under the default heap settings. I will add a unit test in the 
next patch that will check the buffer sizes.



[jira] [Commented] (HIVE-7250) Adaptive compression buffer size for wide tables in ORC

2014-06-18 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14036763#comment-14036763
 ] 

Gunther Hagleitner commented on HIVE-7250:
--

+1, although I was asking for some more testing of the logic through unit 
tests on RB.



[jira] [Commented] (HIVE-7250) Adaptive compression buffer size for wide tables in ORC

2014-06-18 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14036726#comment-14036726
 ] 

Gopal V commented on HIVE-7250:
---

LGTM +1 (NB)



[jira] [Commented] (HIVE-7250) Adaptive compression buffer size for wide tables in ORC

2014-06-18 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14034991#comment-14034991
 ] 

Prasanth J commented on HIVE-7250:
--

I was able to load 15K columns of a test dataset similar to the qtest dataset 
in the patch with the default 1GB heap. 20K columns causes an OOM.



[jira] [Commented] (HIVE-7250) Adaptive compression buffer size for wide tables in ORC

2014-06-18 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14034914#comment-14034914
 ] 

Prasanth J commented on HIVE-7250:
--

I tested the current patch with hive 0.11 and hive 0.12 versions for backward 
compatibility.
