[jira] [Commented] (HIVE-12049) Provide an option to write serialized thrift objects in final tasks

2016-04-22 Thread Vaibhav Gumashta (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-12049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254434#comment-15254434
 ] 

Vaibhav Gumashta commented on HIVE-12049:
-

Test failures look unrelated - I'll commit shortly.

> Provide an option to write serialized thrift objects in final tasks
> ---
>
> Key: HIVE-12049
> URL: https://issues.apache.org/jira/browse/HIVE-12049
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, JDBC
>Affects Versions: 2.0.0
>Reporter: Rohit Dholakia
>Assignee: Rohit Dholakia
> Attachments: HIVE-12049.1.patch, HIVE-12049.11.patch, 
> HIVE-12049.12.patch, HIVE-12049.13.patch, HIVE-12049.14.patch, 
> HIVE-12049.15.patch, HIVE-12049.16.patch, HIVE-12049.17.patch, 
> HIVE-12049.18.patch, HIVE-12049.19.patch, HIVE-12049.2.patch, 
> HIVE-12049.25.patch, HIVE-12049.26.patch, HIVE-12049.3.patch, 
> HIVE-12049.4.patch, HIVE-12049.5.patch, HIVE-12049.6.patch, 
> HIVE-12049.7.patch, HIVE-12049.9.patch, new-driver-profiles.png, 
> old-driver-profiles.png
>
>
> For each fetch request to HiveServer2, we pay the penalty of deserializing 
> the row objects and translating them into a different representation suitable 
> for the RPC transfer. In moderate- to high-concurrency scenarios, this can 
> result in significant CPU and memory wastage. By having each task write the 
> appropriate thrift objects to the output files, HiveServer2 can simply stream 
> a batch of rows on the wire without incurring any of the additional cost of 
> deserialization and translation. 
> This can be implemented by writing a new SerDe, which the FileSinkOperator 
> can use to write thrift formatted row batches to the output file. Using the 
> pluggable property of the {{hive.query.result.fileformat}}, we can set it to 
> use SequenceFile and write a batch of thrift formatted rows as a value blob. 
> The FetchTask can now simply read the blob and send it over the wire. On the 
> client side, the *DBC driver can read the blob and since it is already 
> formatted in the way it expects, it can continue building the ResultSet the 
> way it does in the current implementation.
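
The flow described above (encode a batch once in the task, forward the blob from the server untouched, decode it in the client) can be illustrated with a minimal sketch. It uses plain java.io encoding as a stand-in for Thrift, and the class and method names (RowBatchBlobSketch, encodeBatch, fetch, decodeBatch) are hypothetical; none of them come from Hive or from the attached patches.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration only: not Hive's SerDe and not Thrift encoding. */
final class RowBatchBlobSketch {

    /** Task side: encode a batch of (non-null) string rows once, as a single value blob. */
    static byte[] encodeBatch(List<String[]> rows) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeInt(rows.size());
            for (String[] row : rows) {
                out.writeInt(row.length);
                for (String col : row) {
                    out.writeUTF(col);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    /** Server side: hand the stored blob to the wire as-is, with no per-row work. */
    static byte[] fetch(byte[] storedBlob) {
        return storedBlob;
    }

    /** Client side: decode the blob back into rows. */
    static List<String[]> decodeBatch(byte[] blob) {
        List<String[]> rows = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(blob))) {
            int rowCount = in.readInt();
            for (int r = 0; r < rowCount; r++) {
                String[] row = new String[in.readInt()];
                for (int c = 0; c < row.length; c++) {
                    row[c] = in.readUTF();
                }
                rows.add(row);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"1", "alice"}, new String[] {"2", "bob"});
        byte[] onWire = fetch(encodeBatch(rows));
        for (String[] row : decodeBatch(onWire)) {
            System.out.println(String.join(",", row));
        }
    }
}
```

The point of the sketch is that fetch does no deserialization or translation; all encoding cost is paid once where the rows are produced.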



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-12049) Provide an option to write serialized thrift objects in final tasks

2016-04-22 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-12049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253676#comment-15253676
 ] 

Hive QA commented on HIVE-12049:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12800138/HIVE-12049.26.patch

{color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 9998 tests executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.concurrencyFalse
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.testLockTimeout
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.testUpdate
org.apache.hive.beeline.TestSchemaTool.testSchemaInit
{noformat}

Test results: 
http://ec2-54-177-240-2.us-west-1.compute.amazonaws.com/job/PreCommit-HIVE-MASTER-Build/36/testReport
Console output: 
http://ec2-54-177-240-2.us-west-1.compute.amazonaws.com/job/PreCommit-HIVE-MASTER-Build/36/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-36/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12800138 - PreCommit-HIVE-MASTER-Build


[jira] [Commented] (HIVE-12049) Provide an option to write serialized thrift objects in final tasks

2016-04-19 Thread Lefty Leverenz (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-12049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15249076#comment-15249076
 ] 

Lefty Leverenz commented on HIVE-12049:
---

+1 for typo fixes and descriptions of configuration parameters

> Provide an option to write serialized thrift objects in final tasks
> ---
>
> Key: HIVE-12049
> URL: https://issues.apache.org/jira/browse/HIVE-12049
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, JDBC
>Affects Versions: 2.0.0
>Reporter: Rohit Dholakia
>Assignee: Rohit Dholakia
> Attachments: HIVE-12049.1.patch, HIVE-12049.11.patch, 
> HIVE-12049.12.patch, HIVE-12049.13.patch, HIVE-12049.14.patch, 
> HIVE-12049.15.patch, HIVE-12049.16.patch, HIVE-12049.17.patch, 
> HIVE-12049.18.patch, HIVE-12049.19.patch, HIVE-12049.2.patch, 
> HIVE-12049.25.patch, HIVE-12049.3.patch, HIVE-12049.4.patch, 
> HIVE-12049.5.patch, HIVE-12049.6.patch, HIVE-12049.7.patch, 
> HIVE-12049.9.patch, new-driver-profiles.png, old-driver-profiles.png

[jira] [Commented] (HIVE-12049) Provide an option to write serialized thrift objects in final tasks

2016-04-19 Thread Vaibhav Gumashta (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-12049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15248516#comment-15248516
 ] 

Vaibhav Gumashta commented on HIVE-12049:
-

+1 pending tests. 

> Provide an option to write serialized thrift objects in final tasks
> ---
>
> Key: HIVE-12049
> URL: https://issues.apache.org/jira/browse/HIVE-12049
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Reporter: Rohit Dholakia
>Assignee: Rohit Dholakia
> Attachments: HIVE-12049.1.patch, HIVE-12049.11.patch, 
> HIVE-12049.12.patch, HIVE-12049.13.patch, HIVE-12049.14.patch, 
> HIVE-12049.15.patch, HIVE-12049.16.patch, HIVE-12049.17.patch, 
> HIVE-12049.18.patch, HIVE-12049.19.patch, HIVE-12049.2.patch, 
> HIVE-12049.25.patch, HIVE-12049.3.patch, HIVE-12049.4.patch, 
> HIVE-12049.5.patch, HIVE-12049.6.patch, HIVE-12049.7.patch, 
> HIVE-12049.9.patch, new-driver-profiles.png, old-driver-profiles.png

[jira] [Commented] (HIVE-12049) Provide an option to write serialized thrift objects in final tasks

2016-04-09 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-12049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15233834#comment-15233834
 ] 

Hive QA commented on HIVE-12049:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12797566/HIVE-12049.18.patch

{color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 37 failed/errored test(s), 9967 tests executed
*Failed tests:*
{noformat}
TestMiniTezCliDriver-auto_sortmerge_join_13.q-tez_self_join.q-orc_vectorization_ppd.q-and-12-more - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_bucket_map_join_tez2
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_cte_3
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_cte_5
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_cte_mat_1
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_cte_mat_2
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_cte_mat_3
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_cte_mat_4
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_cte_mat_5
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_dynamic_partition_pruning_2
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_hybridgrace_hashjoin_1
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_hybridgrace_hashjoin_2
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_llap_nullscan
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_llapdecider
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_mrr
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_dml
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_dynpart_hashjoin_1
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_dynpart_hashjoin_2
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_join_hash
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_join_tests
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_joins_explain
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_smb_main
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_union
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_union2
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_union_multiinsert
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_vector_dynpart_hashjoin_1
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_vector_dynpart_hashjoin_2
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vectorized_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskSchedulerService.testForcedLocalityPreemption
org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskSchedulerService.testSimpleLocalAllocation
org.apache.hadoop.hive.metastore.TestRemoteHiveMetaStore.testSimpleTable
org.apache.hadoop.hive.ql.security.TestAuthorizationPreEventListener.testListener
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationProviderWithACL.testSimplePrivileges
org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler
org.apache.hive.jdbc.TestSSL.testSSLFetchHttp
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7528/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7528/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-7528/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 37 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12797566 - PreCommit-HIVE-TRUNK-Build

> Provide an option to write serialized thrift objects in final tasks
> ---
>
> Key: HIVE-12049
> URL: https://issues.apache.org/jira/browse/HIVE-12049
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Reporter: Rohit Dholakia
>Assignee: Rohit Dholakia
> Attachments: HIVE-12049.1.patch, HIVE-12049.11.patch, 
> HIVE-12049.12.patch, HIVE-12049.13.patch, HIVE-12049.14.patch, 
> HIVE-12049.15.patch, HIVE-12049.16.patch, HIVE-12049.17

[jira] [Commented] (HIVE-12049) Provide an option to write serialized thrift objects in final tasks

2016-04-08 Thread Lefty Leverenz (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-12049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15231953#comment-15231953
 ] 

Lefty Leverenz commented on HIVE-12049:
---

I left some editorial comments on the review board.

> Provide an option to write serialized thrift objects in final tasks
> ---
>
> Key: HIVE-12049
> URL: https://issues.apache.org/jira/browse/HIVE-12049
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Reporter: Rohit Dholakia
>Assignee: Rohit Dholakia
> Attachments: HIVE-12049.1.patch, HIVE-12049.11.patch, 
> HIVE-12049.12.patch, HIVE-12049.13.patch, HIVE-12049.14.patch, 
> HIVE-12049.15.patch, HIVE-12049.16.patch, HIVE-12049.17.patch, 
> HIVE-12049.18.patch, HIVE-12049.2.patch, HIVE-12049.3.patch, 
> HIVE-12049.4.patch, HIVE-12049.5.patch, HIVE-12049.6.patch, 
> HIVE-12049.7.patch, HIVE-12049.9.patch, new-driver-profiles.png, 
> old-driver-profiles.png

[jira] [Commented] (HIVE-12049) Provide an option to write serialized thrift objects in final tasks

2016-04-03 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-12049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15223445#comment-15223445
 ] 

Hive QA commented on HIVE-12049:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12796613/HIVE-12049.17.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9959 tests executed
*Failed tests:*
{noformat}
TestMiniTezCliDriver-dynpart_sort_optimization2.q-cte_mat_1.q-tez_bmj_schema_evolution.q-and-12-more - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7457/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7457/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-7457/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12796613 - PreCommit-HIVE-TRUNK-Build

> Provide an option to write serialized thrift objects in final tasks
> ---
>
> Key: HIVE-12049
> URL: https://issues.apache.org/jira/browse/HIVE-12049
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Reporter: Rohit Dholakia
>Assignee: Rohit Dholakia
> Attachments: HIVE-12049.1.patch, HIVE-12049.11.patch, 
> HIVE-12049.12.patch, HIVE-12049.13.patch, HIVE-12049.14.patch, 
> HIVE-12049.15.patch, HIVE-12049.16.patch, HIVE-12049.17.patch, 
> HIVE-12049.2.patch, HIVE-12049.3.patch, HIVE-12049.4.patch, 
> HIVE-12049.5.patch, HIVE-12049.6.patch, HIVE-12049.7.patch, 
> HIVE-12049.9.patch, new-driver-profiles.png, old-driver-profiles.png

[jira] [Commented] (HIVE-12049) Provide an option to write serialized thrift objects in final tasks

2016-03-28 Thread Gopal V (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-12049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15214796#comment-15214796
 ] 

Gopal V commented on HIVE-12049:


MaxRows = 1000

!old-driver-profiles.png!

The hot codepath with the new driver is 

{code}
 Stacks at 2016-03-28 01:10:19 PM (uptime 7m 58 sec)

 faeb41dd-3869-40cc-860b-748f505d5565 eab06890-8bb8-478f-877a-9282f5b4d64e HiveServer2-Handler-Pool: Thread-788 [RUNNABLE]
*** java.util.concurrent.ConcurrentHashMap.putAll(Map) ConcurrentHashMap.java:1084
*** java.util.concurrent.ConcurrentHashMap.<init>(Map) ConcurrentHashMap.java:852
*** org.apache.hadoop.conf.Configuration.<init>(Configuration) Configuration.java:713
*** org.apache.hadoop.hive.conf.HiveConf.<init>(HiveConf) HiveConf.java:3460
*** org.apache.hive.service.cli.operation.SQLOperation.getConfigForOperation() SQLOperation.java:529
*** org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(FetchOrientation, long) SQLOperation.java:360
*** org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationHandle, FetchOrientation, long) OperationManager.java:280
org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(OperationHandle, FetchOrientation, long, FetchType) HiveSessionImpl.java:786
org.apache.hive.service.cli.CLIService.fetchResults(OperationHandle, FetchOrientation, long, FetchType) CLIService.java:452
org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(TFetchResultsReq) ThriftCLIService.java:743
org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService$Iface, TCLIService$FetchResults_args) TCLIService.java:1557
org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults.getResult(Object, TBase) TCLIService.java:1542
org.apache.thrift.ProcessFunction.process(int, TProtocol, TProtocol, Object) ProcessFunction.java:39
org.apache.thrift.TBaseProcessor.process(TProtocol, TProtocol) TBaseProcessor.java:39
org.apache.hive.service.auth.TSetIpAddressProcessor.process(TProtocol, TProtocol) TSetIpAddressProcessor.java:56
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run() TThreadPoolServer.java:286
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) ThreadPoolExecutor.java:1142
java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:617
java.lang.Thread.run() Thread.java:745
{code}

> Provide an option to write serialized thrift objects in final tasks
> ---
>
> Key: HIVE-12049
> URL: https://issues.apache.org/jira/browse/HIVE-12049
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Reporter: Rohit Dholakia
>Assignee: Rohit Dholakia
> Attachments: HIVE-12049.1.patch, HIVE-12049.11.patch, 
> HIVE-12049.12.patch, HIVE-12049.13.patch, HIVE-12049.14.patch, 
> HIVE-12049.2.patch, HIVE-12049.3.patch, HIVE-12049.4.patch, 
> HIVE-12049.5.patch, HIVE-12049.6.patch, HIVE-12049.7.patch, 
> HIVE-12049.9.patch, new-driver-profiles.png, old-driver-profiles.png

[jira] [Commented] (HIVE-12049) Provide an option to write serialized thrift objects in final tasks

2016-03-28 Thread Thejas M Nair (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-12049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15214643#comment-15214643
 ] 

Thejas M Nair commented on HIVE-12049:
--

[~gopalv]
Thanks for profiling it!
What is it like without the optimization?
What is the JDBC fetchRowSize being used?

> Provide an option to write serialized thrift objects in final tasks
> ---
>
> Key: HIVE-12049
> URL: https://issues.apache.org/jira/browse/HIVE-12049
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Reporter: Rohit Dholakia
>Assignee: Rohit Dholakia
> Attachments: HIVE-12049.1.patch, HIVE-12049.11.patch, 
> HIVE-12049.12.patch, HIVE-12049.13.patch, HIVE-12049.14.patch, 
> HIVE-12049.2.patch, HIVE-12049.3.patch, HIVE-12049.4.patch, 
> HIVE-12049.5.patch, HIVE-12049.6.patch, HIVE-12049.7.patch, 
> HIVE-12049.9.patch, new-driver-profiles.png

[jira] [Commented] (HIVE-12049) Provide an option to write serialized thrift objects in final tasks

2016-03-28 Thread Gopal V (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-12049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15214598#comment-15214598
 ] 

Gopal V commented on HIVE-12049:


After profiling the patch, most of the CPU time in fetchResults now comes 
from the Session acquire and release.

!new-driver-profiles.png!

> Provide an option to write serialized thrift objects in final tasks
> ---
>
> Key: HIVE-12049
> URL: https://issues.apache.org/jira/browse/HIVE-12049
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Reporter: Rohit Dholakia
>Assignee: Rohit Dholakia
> Attachments: HIVE-12049.1.patch, HIVE-12049.11.patch, 
> HIVE-12049.12.patch, HIVE-12049.13.patch, HIVE-12049.14.patch, 
> HIVE-12049.2.patch, HIVE-12049.3.patch, HIVE-12049.4.patch, 
> HIVE-12049.5.patch, HIVE-12049.6.patch, HIVE-12049.7.patch, HIVE-12049.9.patch

[jira] [Commented] (HIVE-12049) Provide an option to write serialized thrift objects in final tasks

2016-03-28 Thread Thejas M Nair (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-12049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15214528#comment-15214528
 ] 

Thejas M Nair commented on HIVE-12049:
--

The "not in list of params that are allowed to be modified at runtime" error 
occurs because SQL standard authorization or Ranger is enabled, and it allows 
modifying only the configs in a whitelist.
[~gopalv] A workaround is to add the parameter to the value of 
hive.security.authorization.sqlstd.confwhitelist.append in HS2.

[~rohitdholakia] We should add the 
hive.server2.thrift.resulset.serialize.in.tasks parameter to the default 
whitelist. It should be added to the sqlStdAuthSafeVarNames array in 
HiveConf.java.
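
To make the whitelist behaviour concrete, here is a minimal sketch of a check of this kind. It assumes the whitelist is a regular expression and that the appended value is joined to it as an extra alternative; the class ConfWhitelistSketch and its methods are hypothetical and are not Hive's actual implementation. Only the error text is taken from the exception reported in this thread.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch of a runtime-config whitelist check; not Hive code. */
final class ConfWhitelistSketch {
    private final Pattern allowed;

    /**
     * @param baseRegex   regex describing the default modifiable parameters
     * @param appendRegex extra regex to allow in addition (may be null or empty),
     *                    standing in for the "...confwhitelist.append" setting
     */
    ConfWhitelistSketch(String baseRegex, String appendRegex) {
        String regex = (appendRegex == null || appendRegex.isEmpty())
                ? baseRegex
                : baseRegex + "|" + appendRegex;
        this.allowed = Pattern.compile(regex);
    }

    /** True if the whole parameter name matches the combined whitelist. */
    boolean isModifiable(String name) {
        return allowed.matcher(name).matches();
    }

    /** Rejects parameters outside the whitelist, mirroring the reported error text. */
    void verifyModifiable(String name) {
        if (!isModifiable(name)) {
            throw new IllegalArgumentException("Cannot modify " + name
                    + " at runtime. It is not in list of params that are allowed"
                    + " to be modified at runtime");
        }
    }

    public static void main(String[] args) {
        String param = "hive.server2.thrift.resulset.serialize.in.tasks";
        ConfWhitelistSketch base = new ConfWhitelistSketch("hive\\.exec\\..*", null);
        ConfWhitelistSketch extended = new ConfWhitelistSketch(
                "hive\\.exec\\..*",
                "hive\\.server2\\.thrift\\.resulset\\.serialize\\.in\\.tasks");
        System.out.println("default whitelist allows it:  " + base.isModifiable(param));
        System.out.println("appended whitelist allows it: " + extended.isModifiable(param));
    }
}
```

Under these assumptions, appending the parameter name (with dots escaped) is enough for a session-level set to pass, while adding it to the default list removes the need for any per-deployment change.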


> Provide an option to write serialized thrift objects in final tasks
> ---
>
> Key: HIVE-12049
> URL: https://issues.apache.org/jira/browse/HIVE-12049
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Reporter: Rohit Dholakia
>Assignee: Rohit Dholakia
> Attachments: HIVE-12049.1.patch, HIVE-12049.11.patch, 
> HIVE-12049.12.patch, HIVE-12049.13.patch, HIVE-12049.14.patch, 
> HIVE-12049.2.patch, HIVE-12049.3.patch, HIVE-12049.4.patch, 
> HIVE-12049.5.patch, HIVE-12049.6.patch, HIVE-12049.7.patch, HIVE-12049.9.patch
>
>
> For each fetch request to HiveServer2, we pay the penalty of deserializing 
> the row objects and translating them into a different representation suitable 
> for the RPC transfer. In a moderate to high concurrency scenarios, this can 
> result in significant CPU and memory wastage. By having each task write the 
> appropriate thrift objects to the output files, HiveServer2 can simply stream 
> a batch of rows on the wire without incurring any of the additional cost of 
> deserialization and translation. 
> This can be implemented by writing a new SerDe, which the FileSinkOperator 
> can use to write thrift formatted row batches to the output file. Using the 
> pluggable property of the {{hive.query.result.fileformat}}, we can set it to 
> use SequenceFile and write a batch of thrift formatted rows as a value blob. 
> The FetchTask can now simply read the blob and send it over the wire. On the 
> client side, the *DBC driver can read the blob and since it is already 
> formatted in the way it expects, it can continue building the ResultSet the 
> way it does in the current implementation.
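
The data flow in the description above can be sketched in Java as follows. This is only an illustration under stated assumptions: the proposal writes Thrift-formatted row batches through a new SerDe into a SequenceFile, whereas this sketch swaps in plain java.io encoding and an in-memory byte array so that it runs without Hive, Hadoop, or Thrift. The class RowBatchBlobSketch and its method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the task / server / client roles in the proposal.
class RowBatchBlobSketch {

    // Task side: encode a whole batch of rows once, in the format the client expects.
    static byte[] writeBatch(List<String[]> rows) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(rows.size());
            for (String[] row : rows) {
                out.writeInt(row.length);
                for (String col : row) {
                    out.writeUTF(col);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Server side: the fetch step forwards the blob as-is, with no per-row
    // deserialization or translation.
    static byte[] fetch(byte[] blobFromResultFile) {
        return blobFromResultFile;
    }

    // Client side: decode the blob into rows and continue building the result set.
    static List<String[]> readBatch(byte[] blob) {
        List<String[]> rows = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(blob))) {
            int rowCount = in.readInt();
            for (int r = 0; r < rowCount; r++) {
                String[] row = new String[in.readInt()];
                for (int c = 0; c < row.length; c++) {
                    row[c] = in.readUTF();
                }
                rows.add(row);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"1", "alice"});
        rows.add(new String[] {"2", "bob"});
        byte[] blob = writeBatch(rows);
        List<String[]> decoded = readBatch(fetch(blob));
        System.out.println(decoded.size() + " rows, first = " + String.join(",", decoded.get(0)));
    }
}
```

The point of the sketch is that the server-side step does no work per row: all encoding cost is paid once in the task, and all decoding cost is paid in the client.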




[jira] [Commented] (HIVE-12049) Provide an option to write serialized thrift objects in final tasks

2016-03-28 Thread Gopal V (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-12049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15213909#comment-15213909
 ] 

Gopal V commented on HIVE-12049:


My long-running tests failed because I made a simple typo in the HiveConf: 
{{hive.server2.thrift.resulset.serialize.in.tasks}} is missing a 't'.

But even when I do set it up in JMeter, I get:

{code}
org.apache.hive.service.cli.HiveSQLException: 
java.lang.IllegalArgumentException: Cannot modify 
hive.server2.thrift.resulset.serialize.in.tasks at runtime. It is not in list 
of params that are allowed to be modified at runtime
at 
org.apache.hive.service.cli.session.HiveSessionImpl.configureSession(HiveSessionImpl.java:253)
 ~[hive-service-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
{code}

> Provide an option to write serialized thrift objects in final tasks
> ---
>
> Key: HIVE-12049
> URL: https://issues.apache.org/jira/browse/HIVE-12049
> Project: Hive
> Issue Type: Sub-task
> Components: HiveServer2
> Reporter: Rohit Dholakia
> Assignee: Rohit Dholakia
> Attachments: HIVE-12049.1.patch, HIVE-12049.11.patch, 
> HIVE-12049.12.patch, HIVE-12049.13.patch, HIVE-12049.14.patch, 
> HIVE-12049.2.patch, HIVE-12049.3.patch, HIVE-12049.4.patch, 
> HIVE-12049.5.patch, HIVE-12049.6.patch, HIVE-12049.7.patch, HIVE-12049.9.patch

[jira] [Commented] (HIVE-12049) Provide an option to write serialized thrift objects in final tasks

2016-03-24 Thread Vaibhav Gumashta (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-12049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15211011#comment-15211011
 ] 

Vaibhav Gumashta commented on HIVE-12049:
-

[~rohitdholakia] You might want to look at the following failures, which look 
related:
{code}
org.apache.hive.beeline.TestBeeLineWithArgs.testEmbeddedBeelineOutputs
org.apache.hive.beeline.TestBeeLineWithArgs.testQueryProgress
org.apache.hive.beeline.TestBeeLineWithArgs.testQueryProgressParallel
org.apache.hive.jdbc.TestJdbcDriver2.testGetQueryLog
{code}


> Provide an option to write serialized thrift objects in final tasks
> ---
>
> Key: HIVE-12049
> URL: https://issues.apache.org/jira/browse/HIVE-12049
> Project: Hive
> Issue Type: Sub-task
> Components: HiveServer2
> Reporter: Rohit Dholakia
> Assignee: Rohit Dholakia
> Attachments: HIVE-12049.1.patch, HIVE-12049.11.patch, 
> HIVE-12049.12.patch, HIVE-12049.13.patch, HIVE-12049.14.patch, 
> HIVE-12049.2.patch, HIVE-12049.3.patch, HIVE-12049.4.patch, 
> HIVE-12049.5.patch, HIVE-12049.6.patch, HIVE-12049.7.patch, HIVE-12049.9.patch

[jira] [Commented] (HIVE-12049) Provide an option to write serialized thrift objects in final tasks

2016-03-22 Thread Ashutosh Chauhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-12049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15207370#comment-15207370
 ] 

Ashutosh Chauhan commented on HIVE-12049:
-

Compiler related changes look good to me.

> Provide an option to write serialized thrift objects in final tasks
> ---
>
> Key: HIVE-12049
> URL: https://issues.apache.org/jira/browse/HIVE-12049
> Project: Hive
> Issue Type: Sub-task
> Components: HiveServer2
> Reporter: Rohit Dholakia
> Assignee: Rohit Dholakia
> Attachments: HIVE-12049.1.patch, HIVE-12049.11.patch, 
> HIVE-12049.12.patch, HIVE-12049.13.patch, HIVE-12049.14.patch, 
> HIVE-12049.2.patch, HIVE-12049.3.patch, HIVE-12049.4.patch, 
> HIVE-12049.5.patch, HIVE-12049.6.patch, HIVE-12049.7.patch, HIVE-12049.9.patch

[jira] [Commented] (HIVE-12049) Provide an option to write serialized thrift objects in final tasks

2016-03-19 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-12049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15199351#comment-15199351
 ] 

Hive QA commented on HIVE-12049:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12793673/HIVE-12049.14.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 9832 tests executed
*Failed tests:*
{noformat}
TestSparkCliDriver-groupby3_map.q-sample2.q-auto_join14.q-and-12-more - did not 
produce a TEST-*.xml file
TestSparkCliDriver-groupby_map_ppr_multi_distinct.q-table_access_keys_stats.q-groupby4_noskew.q-and-12-more
 - did not produce a TEST-*.xml file
TestSparkCliDriver-join_rc.q-insert1.q-vectorized_rcfile_columnar.q-and-12-more 
- did not produce a TEST-*.xml file
TestSparkCliDriver-ppd_join4.q-join9.q-ppd_join3.q-and-12-more - did not 
produce a TEST-*.xml file
org.apache.hive.beeline.TestBeeLineWithArgs.testEmbeddedBeelineOutputs
org.apache.hive.beeline.TestBeeLineWithArgs.testQueryProgress
org.apache.hive.beeline.TestBeeLineWithArgs.testQueryProgressParallel
org.apache.hive.jdbc.TestJdbcDriver2.testGetQueryLog
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7289/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7289/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-7289/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12793673 - PreCommit-HIVE-TRUNK-Build

> Provide an option to write serialized thrift objects in final tasks
> ---
>
> Key: HIVE-12049
> URL: https://issues.apache.org/jira/browse/HIVE-12049
> Project: Hive
> Issue Type: Sub-task
> Components: HiveServer2
> Reporter: Rohit Dholakia
> Assignee: Rohit Dholakia
> Attachments: HIVE-12049.1.patch, HIVE-12049.11.patch, 
> HIVE-12049.12.patch, HIVE-12049.13.patch, HIVE-12049.14.patch, 
> HIVE-12049.2.patch, HIVE-12049.3.patch, HIVE-12049.4.patch, 
> HIVE-12049.5.patch, HIVE-12049.6.patch, HIVE-12049.7.patch, HIVE-12049.9.patch

[jira] [Commented] (HIVE-12049) Provide an option to write serialized thrift objects in final tasks

2016-03-09 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-12049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15187591#comment-15187591
 ] 

Hive QA commented on HIVE-12049:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12791890/HIVE-12049.12.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 31 failed/errored test(s), 9786 tests 
executed
*Failed tests:*
{noformat}
TestSparkCliDriver-groupby3_map.q-sample2.q-auto_join14.q-and-12-more - did not 
produce a TEST-*.xml file
TestSparkCliDriver-groupby_map_ppr_multi_distinct.q-table_access_keys_stats.q-groupby4_noskew.q-and-12-more
 - did not produce a TEST-*.xml file
TestSparkCliDriver-join_rc.q-insert1.q-vectorized_rcfile_columnar.q-and-12-more 
- did not produce a TEST-*.xml file
TestSparkCliDriver-ppd_join4.q-join9.q-ppd_join3.q-and-12-more - did not 
produce a TEST-*.xml file
TestSparkCliDriver-timestamp_lazy.q-bucketsortoptimize_insert_4.q-date_udf.q-and-12-more
 - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_columnStatsUpdateForStatsOptimizer_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_columnStatsUpdateForStatsOptimizer_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_into1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_into2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_metadata_only_queries
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_metadata_only_queries_with_filters
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_only_null
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_truncate_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_26
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_insert_into1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_insert_into2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_metadata_only_queries
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_metadata_only_queries_with_filters
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_stats_only_null
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_insert_into1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_metadata_only_queries
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_metadata_only_queries_with_filters
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_stats_only_null
org.apache.hive.beeline.TestBeeLineWithArgs.testEmbeddedBeelineOutputs
org.apache.hive.beeline.TestBeeLineWithArgs.testQueryProgress
org.apache.hive.beeline.TestBeeLineWithArgs.testQueryProgressParallel
org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler
org.apache.hive.jdbc.TestJdbcDriver2.testExplainStmt
org.apache.hive.jdbc.TestJdbcDriver2.testGetQueryLog
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7204/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7204/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-7204/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 31 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12791890 - PreCommit-HIVE-TRUNK-Build

> Provide an option to write serialized thrift objects in final tasks
> ---
>
> Key: HIVE-12049
> URL: https://issues.apache.org/jira/browse/HIVE-12049
> Project: Hive
> Issue Type: Sub-task
> Components: HiveServer2
> Reporter: Rohit Dholakia
> Assignee: Rohit Dholakia
> Attachments: HIVE-12049.1.patch, HIVE-12049.11.patch, 
> HIVE-12049.12.patch, HIVE-12049.2.patch, HIVE-12049.3.patch, 
> HIVE-12049.4.patch, HIVE-12049.5.patch, HIVE-12049.6.patch, 
> HIVE-12049.7.patch, HIVE-12049.9.patch

[jira] [Commented] (HIVE-12049) Provide an option to write serialized thrift objects in final tasks

2016-03-05 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-12049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15182022#comment-15182022
 ] 

Hive QA commented on HIVE-12049:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12791362/HIVE-12049.11.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 30 failed/errored test(s), 9781 tests 
executed
*Failed tests:*
{noformat}
TestSparkCliDriver-groupby3_map.q-sample2.q-auto_join14.q-and-12-more - did not 
produce a TEST-*.xml file
TestSparkCliDriver-groupby_map_ppr_multi_distinct.q-table_access_keys_stats.q-groupby4_noskew.q-and-12-more
 - did not produce a TEST-*.xml file
TestSparkCliDriver-join_rc.q-insert1.q-vectorized_rcfile_columnar.q-and-12-more 
- did not produce a TEST-*.xml file
TestSparkCliDriver-ppd_join4.q-join9.q-ppd_join3.q-and-12-more - did not 
produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_columnStatsUpdateForStatsOptimizer_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_columnStatsUpdateForStatsOptimizer_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_into1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_into2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_metadata_only_queries
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_metadata_only_queries_with_filters
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_only_null
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_truncate_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_26
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_insert_into1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_insert_into2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_metadata_only_queries
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_metadata_only_queries_with_filters
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_stats_only_null
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_insert_into1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_insert_into2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_metadata_only_queries
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_metadata_only_queries_with_filters
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_stats_only_null
org.apache.hive.beeline.TestBeeLineWithArgs.testEmbeddedBeelineOutputs
org.apache.hive.beeline.TestBeeLineWithArgs.testQueryProgress
org.apache.hive.beeline.TestBeeLineWithArgs.testQueryProgressParallel
org.apache.hive.jdbc.TestJdbcDriver2.testExplainStmt
org.apache.hive.jdbc.TestJdbcDriver2.testGetQueryLog
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7174/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7174/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-7174/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 30 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12791362 - PreCommit-HIVE-TRUNK-Build

> Provide an option to write serialized thrift objects in final tasks
> ---
>
> Key: HIVE-12049
> URL: https://issues.apache.org/jira/browse/HIVE-12049
> Project: Hive
> Issue Type: Sub-task
> Components: HiveServer2
> Reporter: Rohit Dholakia
> Assignee: Rohit Dholakia
> Attachments: HIVE-12049.1.patch, HIVE-12049.11.patch, 
> HIVE-12049.2.patch, HIVE-12049.3.patch, HIVE-12049.4.patch, 
> HIVE-12049.5.patch, HIVE-12049.6.patch, HIVE-12049.7.patch, HIVE-12049.9.patch

[jira] [Commented] (HIVE-12049) Provide an option to write serialized thrift objects in final tasks

2016-02-26 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-12049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15168948#comment-15168948
 ] 

Hive QA commented on HIVE-12049:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12789882/HIVE-12049.9.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7098/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7098/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-7098/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-7098/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ cd apache-github-source-source
+ git fetch origin
From https://github.com/apache/hive
   e44198f..5c07894  master -> origin/master
+ git reset --hard HEAD
HEAD is now at e44198f HIVE-12857 : LLAP: modify the decider to allow using 
LLAP with whitelisted UDFs (Sergey Shelukhin, reviewed by Gunther Hagleitner)
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is behind 'origin/master' by 3 commits, and can be fast-forwarded.
+ git reset --hard origin/master
HEAD is now at 5c07894 HIVE-13122: LLAP: simple Model/View separation for UI 
(Gopal V)
+ git merge --ff-only origin/master
Already up-to-date.
+ git gc
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12789882 - PreCommit-HIVE-TRUNK-Build

> Provide an option to write serialized thrift objects in final tasks
> ---
>
> Key: HIVE-12049
> URL: https://issues.apache.org/jira/browse/HIVE-12049
> Project: Hive
> Issue Type: Sub-task
> Components: HiveServer2
> Reporter: Rohit Dholakia
> Assignee: Rohit Dholakia
> Attachments: HIVE-12049.1.patch, HIVE-12049.2.patch, 
> HIVE-12049.3.patch, HIVE-12049.4.patch, HIVE-12049.5.patch, 
> HIVE-12049.6.patch, HIVE-12049.7.patch, HIVE-12049.9.patch

[jira] [Commented] (HIVE-12049) Provide an option to write serialized thrift objects in final tasks

2016-02-22 Thread Vaibhav Gumashta (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-12049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15157086#comment-15157086
 ] 

Vaibhav Gumashta commented on HIVE-12049:
-

Thanks for the iterations, [~rohitdholakia]. I've posted some comments on the 
latest patch on RB.

> Provide an option to write serialized thrift objects in final tasks
> ---
>
> Key: HIVE-12049
> URL: https://issues.apache.org/jira/browse/HIVE-12049
> Project: Hive
> Issue Type: Sub-task
> Components: HiveServer2
> Reporter: Rohit Dholakia
> Assignee: Rohit Dholakia
> Attachments: HIVE-12049.1.patch, HIVE-12049.2.patch, 
> HIVE-12049.3.patch, HIVE-12049.4.patch, HIVE-12049.5.patch, 
> HIVE-12049.6.patch, HIVE-12049.7.patch

[jira] [Commented] (HIVE-12049) Provide an option to write serialized thrift objects in final tasks

2016-02-19 Thread Rohit Dholakia (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-12049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15154602#comment-15154602
 ] 

Rohit Dholakia commented on HIVE-12049:
---

Uploaded a new version:

* Stray comments removed.
* Fixes in several places so that CLI queries don't use the new ThriftJDBC 
SerDe.


> Provide an option to write serialized thrift objects in final tasks
> ---
>
> Key: HIVE-12049
> URL: https://issues.apache.org/jira/browse/HIVE-12049
> Project: Hive
> Issue Type: Sub-task
> Components: HiveServer2
> Reporter: Rohit Dholakia
> Assignee: Rohit Dholakia
> Attachments: HIVE-12049.1.patch, HIVE-12049.2.patch, 
> HIVE-12049.3.patch, HIVE-12049.4.patch, HIVE-12049.5.patch, 
> HIVE-12049.6.patch, HIVE-12049.7.patch

[jira] [Commented] (HIVE-12049) Provide an option to write serialized thrift objects in final tasks

2016-02-16 Thread Rohit Dholakia (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-12049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15149452#comment-15149452
 ] 

Rohit Dholakia commented on HIVE-12049:
---

Uploaded a new version of the end-to-end patch. It has some bug fixes and some 
changes to the FileSinkOperator and ThriftJDBCSerDe.

> Provide an option to write serialized thrift objects in final tasks
> ---
>
> Key: HIVE-12049
> URL: https://issues.apache.org/jira/browse/HIVE-12049
> Project: Hive
> Issue Type: Sub-task
> Components: HiveServer2
> Reporter: Rohit Dholakia
> Assignee: Rohit Dholakia
> Attachments: HIVE-12049.1.patch, HIVE-12049.2.patch, 
> HIVE-12049.3.patch, HIVE-12049.4.patch, HIVE-12049.5.patch, HIVE-12049.6.patch




[jira] [Commented] (HIVE-12049) Provide an option to write serialized thrift objects in final tasks

2016-01-17 Thread Vaibhav Gumashta (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-12049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15104177#comment-15104177
 ] 

Vaibhav Gumashta commented on HIVE-12049:
-

Patch v2 has changes for FileSinkOp. 

> Provide an option to write serialized thrift objects in final tasks
> ---
>
> Key: HIVE-12049
> URL: https://issues.apache.org/jira/browse/HIVE-12049
> Project: Hive
> Issue Type: Sub-task
> Components: HiveServer2
> Reporter: Rohit Dholakia
> Assignee: Rohit Dholakia
> Attachments: HIVE-12049.1.patch, HIVE-12049.2.patch



