[jira] [Commented] (HIVE-15664) LLAP text cache: improve first query perf I

2017-01-25 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15837354#comment-15837354
 ] 

Lefty Leverenz commented on HIVE-15664:
---

Doc note:  This adds two configuration parameters to HiveConf.java 
(*hive.llap.io.encode.enabled*, *hive.llap.io.encode.vector.serde.enabled*), so 
they need to be documented in the wiki.

* [Configuration Properties -- LLAP | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-LLAP]

Added a TODOC2.2 label.

> LLAP text cache: improve first query perf I
> ---
>
> Key: HIVE-15664
> URL: https://issues.apache.org/jira/browse/HIVE-15664
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-15664.04.patch, HIVE-15664.patch
>
>
> 1) Don't use ORC dictionary.
> 2) Use VectorDeserialize.
> 3) Don't parse the columns that are not included (cannot avoid reading them).
> -4) Send VRB to the pipeline and write ORC in parallel (in background)-. 
> HIVE-15672
> Also add an option to disable the encoding pipeline server-side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15664) LLAP text cache: improve first query perf I

2017-01-24 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15837011#comment-15837011
 ] 

Hive QA commented on HIVE-15664:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12849183/HIVE-15664.04.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10996 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[offset_limit_ppd_optimizer]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_part]
 (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_varchar_simple]
 (batchId=153)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] 
(batchId=93)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3159/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3159/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3159/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12849183 - PreCommit-HIVE-Build

> LLAP text cache: improve first query perf I
> ---
>
> Key: HIVE-15664
> URL: https://issues.apache.org/jira/browse/HIVE-15664
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15664.04.patch, HIVE-15664.patch
>
>
> 1) Don't use ORC dictionary.
> 2) Use VectorDeserialize.
> 3) Don't parse the columns that are not included (cannot avoid reading them).
> -4) Send VRB to the pipeline and write ORC in parallel (in background)-. 
> HIVE-15672
> Also add an option to disable the encoding pipeline server-side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15664) LLAP text cache: improve first query perf I

2017-01-24 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15836839#comment-15836839
 ] 

Prasanth Jayachandran commented on HIVE-15664:
--

lgtm, +1. Pending tests

> LLAP text cache: improve first query perf I
> ---
>
> Key: HIVE-15664
> URL: https://issues.apache.org/jira/browse/HIVE-15664
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15664.04.patch, HIVE-15664.patch
>
>
> 1) Don't use ORC dictionary.
> 2) Use VectorDeserialize.
> 3) Don't parse the columns that are not included (cannot avoid reading them).
> -4) Send VRB to the pipeline and write ORC in parallel (in background)-. 
> HIVE-15672
> Also add an option to disable the encoding pipeline server-side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15664) LLAP text cache: improve first query perf I

2017-01-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832842#comment-15832842
 ] 

Hive QA commented on HIVE-15664:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12848464/HIVE-15664.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10974 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestHBaseNegativeCliDriver.testCliDriver[cascade_dbdrop]
 (batchId=226)
org.apache.hadoop.hive.cli.TestHBaseNegativeCliDriver.testCliDriver[generatehfiles_require_family_path]
 (batchId=226)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a]
 (batchId=136)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[offset_limit_ppd_optimizer]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_part]
 (batchId=149)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3087/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3087/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3087/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12848464 - PreCommit-HIVE-Build

> LLAP text cache: improve first query perf I
> ---
>
> Key: HIVE-15664
> URL: https://issues.apache.org/jira/browse/HIVE-15664
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15664.patch
>
>
> 1) Don't use ORC dictionary.
> 2) Use VectorDeserialize.
> 3) Don't parse the columns that are not included (cannot avoid reading them).
> -4) Send VRB to the pipeline and write ORC in parallel (in background)-. 
> HIVE-15672
> Also add an option to disable the encoding pipeline server-side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15664) LLAP text cache: improve first query perf I

2017-01-20 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832381#comment-15832381
 ] 

Sergey Shelukhin commented on HIVE-15664:
-

Yes, why? :)

> LLAP text cache: improve first query perf I
> ---
>
> Key: HIVE-15664
> URL: https://issues.apache.org/jira/browse/HIVE-15664
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15664.patch, HIVE-15664.WIP.patch
>
>
> 1) Don't use ORC dictionary.
> 2) Use VectorDeserialize.
> 3) Don't parse the columns that are not included (cannot avoid reading them).
> -4) Send VRB to the pipeline and write ORC in parallel (in background)-. 
> HIVE-15672
> Also add an option to disable the encoding pipeline server-side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15664) LLAP text cache: improve first query perf I

2017-01-20 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15831527#comment-15831527
 ] 

Matt McCline commented on HIVE-15664:
-

So, looks like you have a sparse column input VRB from the table but you need 
to cache the data ORC style with non sparse so you share columns with the 
destination (write) VRB. 

> LLAP text cache: improve first query perf I
> ---
>
> Key: HIVE-15664
> URL: https://issues.apache.org/jira/browse/HIVE-15664
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15664.patch, HIVE-15664.WIP.patch
>
>
> 1) Don't use ORC dictionary.
> 2) Use VectorDeserialize.
> 3) Don't parse the columns that are not included (cannot avoid reading them).
> -4) Send VRB to the pipeline and write ORC in parallel (in background)-. 
> HIVE-15672
> Also add an option to disable the encoding pipeline server-side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15664) LLAP text cache: improve first query perf I

2017-01-19 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15831194#comment-15831194
 ] 

Sergey Shelukhin commented on HIVE-15664:
-

[~gopalv] [~mmccline] can you take a look? Thanks. It reuses some code from 
VectorMapOperator as part of #2

> LLAP text cache: improve first query perf I
> ---
>
> Key: HIVE-15664
> URL: https://issues.apache.org/jira/browse/HIVE-15664
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15664.patch, HIVE-15664.WIP.patch
>
>
> 1) Don't use ORC dictionary.
> 2) Use VectorDeserialize.
> 3) Don't parse the columns that are not included (cannot avoid reading them).
> -4) Send VRB to the pipeline and write ORC in parallel (in background)-. 
> HIVE-15672
> Also add an option to disable the encoding pipeline server-side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)