[jira] [Commented] (HIVE-7231) Improve ORC padding

2014-08-07 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14088990#comment-14088990
 ] 

Lefty Leverenz commented on HIVE-7231:
--

The wiki now documents *hive.exec.orc.default.block.size*, 
*hive.exec.orc.block.padding.tolerance*, and the changed default for 
*hive.exec.orc.default.stripe.size* in 0.14.0:

* [Configuration Properties -- hive.exec.orc.default.block.size | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.exec.orc.default.block.size]
* [Configuration Properties -- hive.exec.orc.block.padding.tolerance | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.exec.orc.block.padding.tolerance]
* [Configuration Properties -- hive.exec.orc.default.stripe.size | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.exec.orc.default.stripe.size]

 Improve ORC padding
 ---

 Key: HIVE-7231
 URL: https://issues.apache.org/jira/browse/HIVE-7231
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
  Labels: orcfile
 Fix For: 0.14.0

 Attachments: HIVE-7231.1.patch, HIVE-7231.2.patch, HIVE-7231.3.patch, 
 HIVE-7231.4.patch, HIVE-7231.5.patch, HIVE-7231.6.patch, HIVE-7231.7.patch, 
 HIVE-7231.8.patch


 Current ORC padding is not optimal because of fixed stripe sizes within 
 block. The padding overhead will be significant in some cases. Also padding 
 percentage relative to stripe size is not configurable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7231) Improve ORC padding

2014-07-07 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14053398#comment-14053398
 ] 

Lefty Leverenz commented on HIVE-7231:
--

Facepalm!  Now that the patch is committed I've finally noticed that 
hive.exec.orc.block.padding.tolerance is not a percentage but a decimal 
fraction.  For example, with a 64 MB stripe size the default 0.05 gives 3.2 MB 
tolerance (0.05 * 64, not 0.05% of 64).

This is only a tech-writer's quibble which isn't likely to confuse anyone.  
I'll explain it in the wiki and put a request in HIVE-6586 to fix it with 
HIVE-6037.

 Improve ORC padding
 ---

 Key: HIVE-7231
 URL: https://issues.apache.org/jira/browse/HIVE-7231
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
  Labels: TODOC14, orcfile
 Fix For: 0.14.0

 Attachments: HIVE-7231.1.patch, HIVE-7231.2.patch, HIVE-7231.3.patch, 
 HIVE-7231.4.patch, HIVE-7231.5.patch, HIVE-7231.6.patch, HIVE-7231.7.patch, 
 HIVE-7231.8.patch


 Current ORC padding is not optimal because of fixed stripe sizes within 
 block. The padding overhead will be significant in some cases. Also padding 
 percentage relative to stripe size is not configurable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7231) Improve ORC padding

2014-07-06 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14053242#comment-14053242
 ] 

Hive QA commented on HIVE-7231:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12654227/HIVE-7231.8.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 5677 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/683/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/683/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-683/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12654227

 Improve ORC padding
 ---

 Key: HIVE-7231
 URL: https://issues.apache.org/jira/browse/HIVE-7231
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
  Labels: orcfile
 Attachments: HIVE-7231.1.patch, HIVE-7231.2.patch, HIVE-7231.3.patch, 
 HIVE-7231.4.patch, HIVE-7231.5.patch, HIVE-7231.6.patch, HIVE-7231.7.patch, 
 HIVE-7231.8.patch


 Current ORC padding is not optimal because of fixed stripe sizes within 
 block. The padding overhead will be significant in some cases. Also padding 
 percentage relative to stripe size is not configurable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7231) Improve ORC padding

2014-07-06 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14053253#comment-14053253
 ] 

Gopal V commented on HIVE-7231:
---

Test failures unrelated.

 Improve ORC padding
 ---

 Key: HIVE-7231
 URL: https://issues.apache.org/jira/browse/HIVE-7231
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
  Labels: orcfile
 Attachments: HIVE-7231.1.patch, HIVE-7231.2.patch, HIVE-7231.3.patch, 
 HIVE-7231.4.patch, HIVE-7231.5.patch, HIVE-7231.6.patch, HIVE-7231.7.patch, 
 HIVE-7231.8.patch


 Current ORC padding is not optimal because of fixed stripe sizes within 
 block. The padding overhead will be significant in some cases. Also padding 
 percentage relative to stripe size is not configurable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7231) Improve ORC padding

2014-07-02 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050998#comment-14050998
 ] 

Hive QA commented on HIVE-7231:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12653697/HIVE-7231.7.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 5672 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/664/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/664/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-664/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12653697

 Improve ORC padding
 ---

 Key: HIVE-7231
 URL: https://issues.apache.org/jira/browse/HIVE-7231
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
  Labels: orcfile
 Attachments: HIVE-7231.1.patch, HIVE-7231.2.patch, HIVE-7231.3.patch, 
 HIVE-7231.4.patch, HIVE-7231.5.patch, HIVE-7231.6.patch, HIVE-7231.7.patch


 Current ORC padding is not optimal because of fixed stripe sizes within 
 block. The padding overhead will be significant in some cases. Also padding 
 percentage relative to stripe size is not configurable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7231) Improve ORC padding

2014-07-01 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14048713#comment-14048713
 ] 

Hive QA commented on HIVE-7231:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12653314/HIVE-7231.6.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 5671 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hive.minikdc.TestJdbcWithMiniKdcSQLAuthBinary.org.apache.hive.minikdc.TestJdbcWithMiniKdcSQLAuthBinary
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/645/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/645/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-645/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12653314

 Improve ORC padding
 ---

 Key: HIVE-7231
 URL: https://issues.apache.org/jira/browse/HIVE-7231
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
  Labels: orcfile
 Attachments: HIVE-7231.1.patch, HIVE-7231.2.patch, HIVE-7231.3.patch, 
 HIVE-7231.4.patch, HIVE-7231.5.patch, HIVE-7231.6.patch


 Current ORC padding is not optimal because of fixed stripe sizes within 
 block. The padding overhead will be significant in some cases. Also padding 
 percentage relative to stripe size is not configurable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7231) Improve ORC padding

2014-06-30 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14048327#comment-14048327
 ] 

Gopal V commented on HIVE-7231:
---

With rebase  update to docs. it LGTM - +1

 Improve ORC padding
 ---

 Key: HIVE-7231
 URL: https://issues.apache.org/jira/browse/HIVE-7231
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
  Labels: orcfile
 Attachments: HIVE-7231.1.patch, HIVE-7231.2.patch, HIVE-7231.3.patch, 
 HIVE-7231.4.patch, HIVE-7231.5.patch


 Current ORC padding is not optimal because of fixed stripe sizes within 
 block. The padding overhead will be significant in some cases. Also padding 
 percentage relative to stripe size is not configurable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7231) Improve ORC padding

2014-06-30 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14048378#comment-14048378
 ] 

Lefty Leverenz commented on HIVE-7231:
--

Woops, very sorry -- forgot to publish my second review, which requested 
clarification in the description of hive.exec.orc.block.padding.tolerance in 
HiveConf.java:

{code}
+// Define the tolerance for block padding. The total padded length will
+// always be less than the specified percentage.
{code}

My comment:

bq.  Should mention that it's a percentage of stripe size, because block 
padding sounds like percentage of block size.  Could also explain that block 
padding prevents stripes from straddling blocks.

But this isn't a show stopper.

 Improve ORC padding
 ---

 Key: HIVE-7231
 URL: https://issues.apache.org/jira/browse/HIVE-7231
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
  Labels: orcfile
 Attachments: HIVE-7231.1.patch, HIVE-7231.2.patch, HIVE-7231.3.patch, 
 HIVE-7231.4.patch, HIVE-7231.5.patch


 Current ORC padding is not optimal because of fixed stripe sizes within 
 block. The padding overhead will be significant in some cases. Also padding 
 percentage relative to stripe size is not configurable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7231) Improve ORC padding

2014-06-30 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14048404#comment-14048404
 ] 

Gopal V commented on HIVE-7231:
---

[~leftylev]: HIVE-7231.5.patch has that clarified in hive-default.xml and with 
a default case (64Mb stripe, 256Mb block = 3.2Mb padding tolerance per 256Mb 
block).

 Improve ORC padding
 ---

 Key: HIVE-7231
 URL: https://issues.apache.org/jira/browse/HIVE-7231
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
  Labels: orcfile
 Attachments: HIVE-7231.1.patch, HIVE-7231.2.patch, HIVE-7231.3.patch, 
 HIVE-7231.4.patch, HIVE-7231.5.patch


 Current ORC padding is not optimal because of fixed stripe sizes within 
 block. The padding overhead will be significant in some cases. Also padding 
 percentage relative to stripe size is not configurable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7231) Improve ORC padding

2014-06-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14048416#comment-14048416
 ] 

Hive QA commented on HIVE-7231:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12653268/HIVE-7231.5.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/635/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/635/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-635/

Messages:
{noformat}
 This message was trimmed, see log for full details 
[INFO] 
[INFO] --- maven-surefire-plugin:2.16:test (default-test) @ hive-shims-0.20 ---
[INFO] Tests are skipped.
[INFO] 
[INFO] --- maven-jar-plugin:2.2:jar (default-jar) @ hive-shims-0.20 ---
[INFO] Building jar: 
/data/hive-ptest/working/apache-svn-trunk-source/shims/0.20/target/hive-shims-0.20-0.14.0-SNAPSHOT.jar
[INFO] 
[INFO] --- maven-site-plugin:3.3:attach-descriptor (attach-descriptor) @ 
hive-shims-0.20 ---
[INFO] 
[INFO] --- maven-install-plugin:2.4:install (default-install) @ hive-shims-0.20 
---
[INFO] Installing 
/data/hive-ptest/working/apache-svn-trunk-source/shims/0.20/target/hive-shims-0.20-0.14.0-SNAPSHOT.jar
 to 
/data/hive-ptest/working/maven/org/apache/hive/shims/hive-shims-0.20/0.14.0-SNAPSHOT/hive-shims-0.20-0.14.0-SNAPSHOT.jar
[INFO] Installing 
/data/hive-ptest/working/apache-svn-trunk-source/shims/0.20/pom.xml to 
/data/hive-ptest/working/maven/org/apache/hive/shims/hive-shims-0.20/0.14.0-SNAPSHOT/hive-shims-0.20-0.14.0-SNAPSHOT.pom
[INFO] 
[INFO] 
[INFO] Building Hive Shims Secure Common 0.14.0-SNAPSHOT
[INFO] 
[INFO] 
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ 
hive-shims-common-secure ---
[INFO] Deleting 
/data/hive-ptest/working/apache-svn-trunk-source/shims/common-secure (includes 
= [datanucleus.log, derby.log], excludes = [])
[INFO] 
[INFO] --- maven-remote-resources-plugin:1.5:process (default) @ 
hive-shims-common-secure ---
[INFO] 
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ 
hive-shims-common-secure ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory 
/data/hive-ptest/working/apache-svn-trunk-source/shims/common-secure/src/main/resources
[INFO] Copying 3 resources
[INFO] 
[INFO] --- maven-antrun-plugin:1.7:run (define-classpath) @ 
hive-shims-common-secure ---
[INFO] Executing tasks

main:
[INFO] Executed tasks
[INFO] 
[INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ 
hive-shims-common-secure ---
[INFO] Compiling 12 source files to 
/data/hive-ptest/working/apache-svn-trunk-source/shims/common-secure/target/classes
[WARNING] 
/data/hive-ptest/working/apache-svn-trunk-source/shims/common-secure/src/main/java/org/apache/hadoop/hive/shims/HadoopShimsSecure.java:
 
/data/hive-ptest/working/apache-svn-trunk-source/shims/common-secure/src/main/java/org/apache/hadoop/hive/shims/HadoopShimsSecure.java
 uses or overrides a deprecated API.
[WARNING] 
/data/hive-ptest/working/apache-svn-trunk-source/shims/common-secure/src/main/java/org/apache/hadoop/hive/shims/HadoopShimsSecure.java:
 Recompile with -Xlint:deprecation for details.
[WARNING] 
/data/hive-ptest/working/apache-svn-trunk-source/shims/common-secure/src/main/java/org/apache/hadoop/hive/shims/HadoopShimsSecure.java:
 Some input files use unchecked or unsafe operations.
[WARNING] 
/data/hive-ptest/working/apache-svn-trunk-source/shims/common-secure/src/main/java/org/apache/hadoop/hive/shims/HadoopShimsSecure.java:
 Recompile with -Xlint:unchecked for details.
[INFO] 
[INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ 
hive-shims-common-secure ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory 
/data/hive-ptest/working/apache-svn-trunk-source/shims/common-secure/src/test/resources
[INFO] Copying 3 resources
[INFO] 
[INFO] --- maven-antrun-plugin:1.7:run (setup-test-dirs) @ 
hive-shims-common-secure ---
[INFO] Executing tasks

main:
[mkdir] Created dir: 
/data/hive-ptest/working/apache-svn-trunk-source/shims/common-secure/target/tmp
[mkdir] Created dir: 
/data/hive-ptest/working/apache-svn-trunk-source/shims/common-secure/target/warehouse
[mkdir] Created dir: 
/data/hive-ptest/working/apache-svn-trunk-source/shims/common-secure/target/tmp/conf
 [copy] Copying 5 files to 
/data/hive-ptest/working/apache-svn-trunk-source/shims/common-secure/target/tmp/conf
[INFO] Executed tasks
[INFO] 
[INFO] --- 

[jira] [Commented] (HIVE-7231) Improve ORC padding

2014-06-30 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14048417#comment-14048417
 ] 

Lefty Leverenz commented on HIVE-7231:
--

[~gopalv], I must have been looking at the wrong patch.  Thanks.

Is the coded ampersand (amp;) necessary in hive-default.xml.template?  Perhaps 
a simple and would be clearer.  But that's a mini-nit.

+1 for docs

 Improve ORC padding
 ---

 Key: HIVE-7231
 URL: https://issues.apache.org/jira/browse/HIVE-7231
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
  Labels: orcfile
 Attachments: HIVE-7231.1.patch, HIVE-7231.2.patch, HIVE-7231.3.patch, 
 HIVE-7231.4.patch, HIVE-7231.5.patch


 Current ORC padding is not optimal because of fixed stripe sizes within 
 block. The padding overhead will be significant in some cases. Also padding 
 percentage relative to stripe size is not configurable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7231) Improve ORC padding

2014-06-30 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14048431#comment-14048431
 ] 

Gopal V commented on HIVE-7231:
---

Looks like I jumped the gun and rebased my patch to also use tez-0.4.1, which 
is why tests failed.

 Improve ORC padding
 ---

 Key: HIVE-7231
 URL: https://issues.apache.org/jira/browse/HIVE-7231
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
  Labels: orcfile
 Attachments: HIVE-7231.1.patch, HIVE-7231.2.patch, HIVE-7231.3.patch, 
 HIVE-7231.4.patch, HIVE-7231.5.patch


 Current ORC padding is not optimal because of fixed stripe sizes within 
 block. The padding overhead will be significant in some cases. Also padding 
 percentage relative to stripe size is not configurable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7231) Improve ORC padding

2014-06-30 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14048531#comment-14048531
 ] 

Gopal V commented on HIVE-7231:
---

Tests on 1Tb proving that this does cut down on padding, but it progressively 
writes smaller and smaller stripes within a block.

I saw 12MB, 8Mb stripes being written before the 3.2Mb stripe size trigger sets 
in and triggers a pad event.

{code}
Resetting stripe size via (1.0 - 0.00) * (0.663954 * 66945840) = 8964
Resetting stripe size via (1.0 - 0.00) * (0.495154 * 8964) = 22009074
Resetting stripe size via (1.0 - 0.00) * (0.358696 * 22009074) = 7894571
Resetting stripe size via (1.0 - 0.00) * (0.263782 * 7894571) = 2082443
Resetting stripe size via (1.0 - 0.00) * (0.581675 * 2082443) = 1211304
Resetting stripe size via (1.0 - 0.00) * (0.814780 * 1211304) = 986946 
Resetting stripe size via (1.0 - 0.00) * (0.772579 * 986946) = 762494 
{code}

I think I might undo the as a fraction of stripe size bit and make sure that 
the padding amount is a fraction of the HDFS block size for consistent stripe 
sizes as much as possible.

 Improve ORC padding
 ---

 Key: HIVE-7231
 URL: https://issues.apache.org/jira/browse/HIVE-7231
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
  Labels: orcfile
 Attachments: HIVE-7231.1.patch, HIVE-7231.2.patch, HIVE-7231.3.patch, 
 HIVE-7231.4.patch, HIVE-7231.5.patch, HIVE-7231.6.patch


 Current ORC padding is not optimal because of fixed stripe sizes within 
 block. The padding overhead will be significant in some cases. Also padding 
 percentage relative to stripe size is not configurable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7231) Improve ORC padding

2014-06-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047059#comment-14047059
 ] 

Hive QA commented on HIVE-7231:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12653039/HIVE-7231.4.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 5670 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/625/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/625/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-625/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12653039

 Improve ORC padding
 ---

 Key: HIVE-7231
 URL: https://issues.apache.org/jira/browse/HIVE-7231
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
  Labels: orcfile
 Attachments: HIVE-7231.1.patch, HIVE-7231.2.patch, HIVE-7231.3.patch, 
 HIVE-7231.4.patch


 Current ORC padding is not optimal because of fixed stripe sizes within 
 block. The padding overhead will be significant in some cases. Also padding 
 percentage relative to stripe size is not configurable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7231) Improve ORC padding

2014-06-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14039773#comment-14039773
 ] 

Hive QA commented on HIVE-7231:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12651598/HIVE-7231.3.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 5668 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/539/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/539/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-539/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12651598

 Improve ORC padding
 ---

 Key: HIVE-7231
 URL: https://issues.apache.org/jira/browse/HIVE-7231
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
  Labels: orcfile
 Attachments: HIVE-7231.1.patch, HIVE-7231.2.patch, HIVE-7231.3.patch


 Current ORC padding is not optimal because of fixed stripe sizes within 
 block. The padding overhead will be significant in some cases. Also padding 
 percentage relative to stripe size is not configurable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7231) Improve ORC padding

2014-06-17 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034409#comment-14034409
 ] 

Hive QA commented on HIVE-7231:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12650660/HIVE-7231.2.patch

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 5536 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_columnar
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas
org.apache.hadoop.hive.ql.exec.tez.TestTezTask.testSubmit
org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/492/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/492/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-492/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12650660

 Improve ORC padding
 ---

 Key: HIVE-7231
 URL: https://issues.apache.org/jira/browse/HIVE-7231
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
  Labels: orcfile
 Attachments: HIVE-7231.1.patch, HIVE-7231.2.patch


 Current ORC padding is not optimal because of fixed stripe sizes within 
 block. The padding overhead will be significant in some cases. Also padding 
 percentage relative to stripe size is not configurable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7231) Improve ORC padding

2014-06-17 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034434#comment-14034434
 ] 

Gopal V commented on HIVE-7231:
---

The approach results in stray writes across the stripe boundaries.

I think this approach needs to be revisited to disconnect the HDFS block size 
from the ORC stripe size.

The stripe size needs to be a factor of the HDFS block size, but the fraction 
should not remain at 0.5x.

 Improve ORC padding
 ---

 Key: HIVE-7231
 URL: https://issues.apache.org/jira/browse/HIVE-7231
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
  Labels: orcfile
 Attachments: HIVE-7231.1.patch, HIVE-7231.2.patch


 Current ORC padding is not optimal because of fixed stripe sizes within 
 block. The padding overhead will be significant in some cases. Also padding 
 percentage relative to stripe size is not configurable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7231) Improve ORC padding

2014-06-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032109#comment-14032109
 ] 

Hive QA commented on HIVE-7231:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12650431/HIVE-7231.1.patch

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 5536 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_columnar
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_rand_partitionpruner3
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas
org.apache.hadoop.hive.ql.io.orc.TestFileDump.testDictionaryThreshold
org.apache.hadoop.hive.ql.io.orc.TestFileDump.testDump
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/479/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/479/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-479/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12650431

 Improve ORC padding
 ---

 Key: HIVE-7231
 URL: https://issues.apache.org/jira/browse/HIVE-7231
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
  Labels: orcfile
 Attachments: HIVE-7231.1.patch


 Current ORC padding is not optimal because of fixed stripe sizes within 
 block. The padding overhead will be significant in some cases. Also padding 
 percentage relative to stripe size is not configurable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)