[jira] [Commented] (HIVE-12049) Provide an option to write serialized thrift objects in final tasks
[ https://issues.apache.org/jira/browse/HIVE-12049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254434#comment-15254434 ] Vaibhav Gumashta commented on HIVE-12049: - Test failures look unrelated - I'll commit shortly. > Provide an option to write serialized thrift objects in final tasks > --- > > Key: HIVE-12049 > URL: https://issues.apache.org/jira/browse/HIVE-12049 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2, JDBC >Affects Versions: 2.0.0 >Reporter: Rohit Dholakia >Assignee: Rohit Dholakia > Attachments: HIVE-12049.1.patch, HIVE-12049.11.patch, > HIVE-12049.12.patch, HIVE-12049.13.patch, HIVE-12049.14.patch, > HIVE-12049.15.patch, HIVE-12049.16.patch, HIVE-12049.17.patch, > HIVE-12049.18.patch, HIVE-12049.19.patch, HIVE-12049.2.patch, > HIVE-12049.25.patch, HIVE-12049.26.patch, HIVE-12049.3.patch, > HIVE-12049.4.patch, HIVE-12049.5.patch, HIVE-12049.6.patch, > HIVE-12049.7.patch, HIVE-12049.9.patch, new-driver-profiles.png, > old-driver-profiles.png > > > For each fetch request to HiveServer2, we pay the penalty of deserializing > the row objects and translating them into a different representation suitable > for the RPC transfer. In a moderate to high concurrency scenarios, this can > result in significant CPU and memory wastage. By having each task write the > appropriate thrift objects to the output files, HiveServer2 can simply stream > a batch of rows on the wire without incurring any of the additional cost of > deserialization and translation. > This can be implemented by writing a new SerDe, which the FileSinkOperator > can use to write thrift formatted row batches to the output file. Using the > pluggable property of the {{hive.query.result.fileformat}}, we can set it to > use SequenceFile and write a batch of thrift formatted rows as a value blob. > The FetchTask can now simply read the blob and send it over the wire. On the > client side, the *DBC driver can read the blob and since it is already > formatted in the way it expects, it can continue building the ResultSet the > way it does in the current implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12049) Provide an option to write serialized thrift objects in final tasks
[ https://issues.apache.org/jira/browse/HIVE-12049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253676#comment-15253676 ] Hive QA commented on HIVE-12049: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12800138/HIVE-12049.26.patch {color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 9998 tests executed *Failed tests:* {noformat} TestHWISessionManager - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3 org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.concurrencyFalse org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.testLockTimeout org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.testUpdate org.apache.hive.beeline.TestSchemaTool.testSchemaInit {noformat} Test results: http://ec2-54-177-240-2.us-west-1.compute.amazonaws.com/job/PreCommit-HIVE-MASTER-Build/36/testReport Console output: http://ec2-54-177-240-2.us-west-1.compute.amazonaws.com/job/PreCommit-HIVE-MASTER-Build/36/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-36/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12800138 - PreCommit-HIVE-MASTER-Build > Provide an option to write serialized thrift objects in final tasks > --- > > Key: HIVE-12049 > URL: https://issues.apache.org/jira/browse/HIVE-12049 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2, JDBC >Affects Versions: 2.0.0 >Reporter: Rohit Dholakia >Assignee: Rohit Dholakia > Attachments: HIVE-12049.1.patch, HIVE-12049.11.patch, > HIVE-12049.12.patch, HIVE-12049.13.patch, HIVE-12049.14.patch, > HIVE-12049.15.patch, HIVE-12049.16.patch, HIVE-12049.17.patch, > HIVE-12049.18.patch, HIVE-12049.19.patch, HIVE-12049.2.patch, > HIVE-12049.25.patch, HIVE-12049.26.patch, HIVE-12049.3.patch, > HIVE-12049.4.patch, HIVE-12049.5.patch, HIVE-12049.6.patch, > HIVE-12049.7.patch, HIVE-12049.9.patch, new-driver-profiles.png, > old-driver-profiles.png > > > For each fetch request to HiveServer2, we pay the penalty of deserializing > the row objects and translating them into a different representation suitable > for the RPC transfer. In a moderate to high concurrency scenarios, this can > result in significant CPU and memory wastage. By having each task write the > appropriate thrift objects to the output files, HiveServer2 can simply stream > a batch of rows on the wire without incurring any of the additional cost of > deserialization and translation. > This can be implemented by writing a new SerDe, which the FileSinkOperator > can use to write thrift formatted row batches to the output file. Using the > pluggable property of the {{hive.query.result.fileformat}}, we can set it to > use SequenceFile and write a batch of thrift formatted rows as a value blob. > The FetchTask can now simply read the blob and send it over the wire. On the > client side, the *DBC driver can read the blob and since it is already > formatted in the way it expects, it can continue building the ResultSet the > way it does in the current implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12049) Provide an option to write serialized thrift objects in final tasks
[ https://issues.apache.org/jira/browse/HIVE-12049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15249076#comment-15249076 ] Lefty Leverenz commented on HIVE-12049: --- +1 for typo fixes and descriptions of configuration parameters > Provide an option to write serialized thrift objects in final tasks > --- > > Key: HIVE-12049 > URL: https://issues.apache.org/jira/browse/HIVE-12049 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2, JDBC >Affects Versions: 2.0.0 >Reporter: Rohit Dholakia >Assignee: Rohit Dholakia > Attachments: HIVE-12049.1.patch, HIVE-12049.11.patch, > HIVE-12049.12.patch, HIVE-12049.13.patch, HIVE-12049.14.patch, > HIVE-12049.15.patch, HIVE-12049.16.patch, HIVE-12049.17.patch, > HIVE-12049.18.patch, HIVE-12049.19.patch, HIVE-12049.2.patch, > HIVE-12049.25.patch, HIVE-12049.3.patch, HIVE-12049.4.patch, > HIVE-12049.5.patch, HIVE-12049.6.patch, HIVE-12049.7.patch, > HIVE-12049.9.patch, new-driver-profiles.png, old-driver-profiles.png > > > For each fetch request to HiveServer2, we pay the penalty of deserializing > the row objects and translating them into a different representation suitable > for the RPC transfer. In a moderate to high concurrency scenarios, this can > result in significant CPU and memory wastage. By having each task write the > appropriate thrift objects to the output files, HiveServer2 can simply stream > a batch of rows on the wire without incurring any of the additional cost of > deserialization and translation. > This can be implemented by writing a new SerDe, which the FileSinkOperator > can use to write thrift formatted row batches to the output file. Using the > pluggable property of the {{hive.query.result.fileformat}}, we can set it to > use SequenceFile and write a batch of thrift formatted rows as a value blob. > The FetchTask can now simply read the blob and send it over the wire. On the > client side, the *DBC driver can read the blob and since it is already > formatted in the way it expects, it can continue building the ResultSet the > way it does in the current implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12049) Provide an option to write serialized thrift objects in final tasks
[ https://issues.apache.org/jira/browse/HIVE-12049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15248516#comment-15248516 ] Vaibhav Gumashta commented on HIVE-12049: - +1 pending tests. > Provide an option to write serialized thrift objects in final tasks > --- > > Key: HIVE-12049 > URL: https://issues.apache.org/jira/browse/HIVE-12049 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2 >Reporter: Rohit Dholakia >Assignee: Rohit Dholakia > Attachments: HIVE-12049.1.patch, HIVE-12049.11.patch, > HIVE-12049.12.patch, HIVE-12049.13.patch, HIVE-12049.14.patch, > HIVE-12049.15.patch, HIVE-12049.16.patch, HIVE-12049.17.patch, > HIVE-12049.18.patch, HIVE-12049.19.patch, HIVE-12049.2.patch, > HIVE-12049.25.patch, HIVE-12049.3.patch, HIVE-12049.4.patch, > HIVE-12049.5.patch, HIVE-12049.6.patch, HIVE-12049.7.patch, > HIVE-12049.9.patch, new-driver-profiles.png, old-driver-profiles.png > > > For each fetch request to HiveServer2, we pay the penalty of deserializing > the row objects and translating them into a different representation suitable > for the RPC transfer. In a moderate to high concurrency scenarios, this can > result in significant CPU and memory wastage. By having each task write the > appropriate thrift objects to the output files, HiveServer2 can simply stream > a batch of rows on the wire without incurring any of the additional cost of > deserialization and translation. > This can be implemented by writing a new SerDe, which the FileSinkOperator > can use to write thrift formatted row batches to the output file. Using the > pluggable property of the {{hive.query.result.fileformat}}, we can set it to > use SequenceFile and write a batch of thrift formatted rows as a value blob. > The FetchTask can now simply read the blob and send it over the wire. On the > client side, the *DBC driver can read the blob and since it is already > formatted in the way it expects, it can continue building the ResultSet the > way it does in the current implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12049) Provide an option to write serialized thrift objects in final tasks
[ https://issues.apache.org/jira/browse/HIVE-12049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15233834#comment-15233834 ] Hive QA commented on HIVE-12049: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12797566/HIVE-12049.18.patch {color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 37 failed/errored test(s), 9967 tests executed *Failed tests:* {noformat} TestMiniTezCliDriver-auto_sortmerge_join_13.q-tez_self_join.q-orc_vectorization_ppd.q-and-12-more - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_bucket_map_join_tez2 org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_cte_3 org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_cte_5 org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_cte_mat_1 org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_cte_mat_2 org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_cte_mat_3 org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_cte_mat_4 org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_cte_mat_5 org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_dynamic_partition_pruning org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_dynamic_partition_pruning_2 org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_hybridgrace_hashjoin_1 org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_hybridgrace_hashjoin_2 org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_llap_nullscan org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_llapdecider org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_mrr org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_dml org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_dynpart_hashjoin_1 org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_dynpart_hashjoin_2 org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_join_hash org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_join_tests org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_joins_explain org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_smb_main org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_union org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_union2 org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_union_multiinsert org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_vector_dynpart_hashjoin_1 org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_vector_dynpart_hashjoin_2 org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vectorized_dynamic_partition_pruning org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3 org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskSchedulerService.testForcedLocalityPreemption org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskSchedulerService.testSimpleLocalAllocation org.apache.hadoop.hive.metastore.TestRemoteHiveMetaStore.testSimpleTable org.apache.hadoop.hive.ql.security.TestAuthorizationPreEventListener.testListener org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationProviderWithACL.testSimplePrivileges org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler org.apache.hive.jdbc.TestSSL.testSSLFetchHttp {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7528/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7528/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-7528/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 37 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12797566 - PreCommit-HIVE-TRUNK-Build > Provide an option to write serialized thrift objects in final tasks > --- > > Key: HIVE-12049 > URL: https://issues.apache.org/jira/browse/HIVE-12049 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2 >Reporter: Rohit Dholakia >Assignee: Rohit Dholakia > Attachments: HIVE-12049.1.patch, HIVE-12049.11.patch, > HIVE-12049.12.patch, HIVE-12049.13.patch, HIVE-12049.14.patch, > HIVE-12049.15.patch, HIVE-12049.16.patch, HIVE-12049.17
[jira] [Commented] (HIVE-12049) Provide an option to write serialized thrift objects in final tasks
[ https://issues.apache.org/jira/browse/HIVE-12049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15231953#comment-15231953 ] Lefty Leverenz commented on HIVE-12049: --- I left some editorial comments on the review board. > Provide an option to write serialized thrift objects in final tasks > --- > > Key: HIVE-12049 > URL: https://issues.apache.org/jira/browse/HIVE-12049 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2 >Reporter: Rohit Dholakia >Assignee: Rohit Dholakia > Attachments: HIVE-12049.1.patch, HIVE-12049.11.patch, > HIVE-12049.12.patch, HIVE-12049.13.patch, HIVE-12049.14.patch, > HIVE-12049.15.patch, HIVE-12049.16.patch, HIVE-12049.17.patch, > HIVE-12049.18.patch, HIVE-12049.2.patch, HIVE-12049.3.patch, > HIVE-12049.4.patch, HIVE-12049.5.patch, HIVE-12049.6.patch, > HIVE-12049.7.patch, HIVE-12049.9.patch, new-driver-profiles.png, > old-driver-profiles.png > > > For each fetch request to HiveServer2, we pay the penalty of deserializing > the row objects and translating them into a different representation suitable > for the RPC transfer. In a moderate to high concurrency scenarios, this can > result in significant CPU and memory wastage. By having each task write the > appropriate thrift objects to the output files, HiveServer2 can simply stream > a batch of rows on the wire without incurring any of the additional cost of > deserialization and translation. > This can be implemented by writing a new SerDe, which the FileSinkOperator > can use to write thrift formatted row batches to the output file. Using the > pluggable property of the {{hive.query.result.fileformat}}, we can set it to > use SequenceFile and write a batch of thrift formatted rows as a value blob. > The FetchTask can now simply read the blob and send it over the wire. On the > client side, the *DBC driver can read the blob and since it is already > formatted in the way it expects, it can continue building the ResultSet the > way it does in the current implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12049) Provide an option to write serialized thrift objects in final tasks
[ https://issues.apache.org/jira/browse/HIVE-12049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15223445#comment-15223445 ] Hive QA commented on HIVE-12049: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12796613/HIVE-12049.17.patch {color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9959 tests executed *Failed tests:* {noformat} TestMiniTezCliDriver-dynpart_sort_optimization2.q-cte_mat_1.q-tez_bmj_schema_evolution.q-and-12-more - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3 {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7457/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7457/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-7457/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12796613 - PreCommit-HIVE-TRUNK-Build > Provide an option to write serialized thrift objects in final tasks > --- > > Key: HIVE-12049 > URL: https://issues.apache.org/jira/browse/HIVE-12049 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2 >Reporter: Rohit Dholakia >Assignee: Rohit Dholakia > Attachments: HIVE-12049.1.patch, HIVE-12049.11.patch, > HIVE-12049.12.patch, HIVE-12049.13.patch, HIVE-12049.14.patch, > HIVE-12049.15.patch, HIVE-12049.16.patch, HIVE-12049.17.patch, > HIVE-12049.2.patch, HIVE-12049.3.patch, HIVE-12049.4.patch, > HIVE-12049.5.patch, HIVE-12049.6.patch, HIVE-12049.7.patch, > HIVE-12049.9.patch, new-driver-profiles.png, old-driver-profiles.png > > > For each fetch request to HiveServer2, we pay the penalty of deserializing > the row objects and translating them into a different representation suitable > for the RPC transfer. In a moderate to high concurrency scenarios, this can > result in significant CPU and memory wastage. By having each task write the > appropriate thrift objects to the output files, HiveServer2 can simply stream > a batch of rows on the wire without incurring any of the additional cost of > deserialization and translation. > This can be implemented by writing a new SerDe, which the FileSinkOperator > can use to write thrift formatted row batches to the output file. Using the > pluggable property of the {{hive.query.result.fileformat}}, we can set it to > use SequenceFile and write a batch of thrift formatted rows as a value blob. > The FetchTask can now simply read the blob and send it over the wire. On the > client side, the *DBC driver can read the blob and since it is already > formatted in the way it expects, it can continue building the ResultSet the > way it does in the current implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12049) Provide an option to write serialized thrift objects in final tasks
[ https://issues.apache.org/jira/browse/HIVE-12049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15214796#comment-15214796 ] Gopal V commented on HIVE-12049: MaxRows = 1000 !old-driver-profiles.png! The hot codepath with the new driver is {code} Stacks at 2016-03-28 01:10:19 PM (uptime 7m 58 sec) faeb41dd-3869-40cc-860b-748f505d5565 eab06890-8bb8-478f-877a-9282f5b4d64e HiveServer2-Handler-Pool: Thread-788 [RUNNABLE] *** java.util.concurrent.ConcurrentHashMap.putAll(Map) ConcurrentHashMap.java:1084 *** java.util.concurrent.ConcurrentHashMap.(Map) ConcurrentHashMap.java:852 *** org.apache.hadoop.conf.Configuration.(Configuration) Configuration.java:713 *** org.apache.hadoop.hive.conf.HiveConf.(HiveConf) HiveConf.java:3460 *** org.apache.hive.service.cli.operation.SQLOperation.getConfigForOperation() SQLOperation.java:529 *** org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(FetchOrientation, long) SQLOperation.java:360 *** org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationHandle, FetchOrientation, long) OperationManager.java:280 org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(OperationHandle, FetchOrientation, long, FetchType) HiveSessionImpl.java:786 org.apache.hive.service.cli.CLIService.fetchResults(OperationHandle, FetchOrientation, long, FetchType) CLIService.java:452 org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(TFetchResultsReq) ThriftCLIService.java:743 org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService$Iface, TCLIService$FetchResults_args) TCLIService.java:1557 org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults.getResult(Object, TBase) TCLIService.java:1542 org.apache.thrift.ProcessFunction.process(int, TProtocol, TProtocol, Object) ProcessFunction.java:39 org.apache.thrift.TBaseProcessor.process(TProtocol, TProtocol) TBaseProcessor.java:39 org.apache.hive.service.auth.TSetIpAddressProcessor.process(TProtocol, TProtocol) TSetIpAddressProcessor.java:56 org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run() TThreadPoolServer.java:286 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) ThreadPoolExecutor.java:1142 java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:617 java.lang.Thread.run() Thread.java:745 {code} > Provide an option to write serialized thrift objects in final tasks > --- > > Key: HIVE-12049 > URL: https://issues.apache.org/jira/browse/HIVE-12049 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2 >Reporter: Rohit Dholakia >Assignee: Rohit Dholakia > Attachments: HIVE-12049.1.patch, HIVE-12049.11.patch, > HIVE-12049.12.patch, HIVE-12049.13.patch, HIVE-12049.14.patch, > HIVE-12049.2.patch, HIVE-12049.3.patch, HIVE-12049.4.patch, > HIVE-12049.5.patch, HIVE-12049.6.patch, HIVE-12049.7.patch, > HIVE-12049.9.patch, new-driver-profiles.png, old-driver-profiles.png > > > For each fetch request to HiveServer2, we pay the penalty of deserializing > the row objects and translating them into a different representation suitable > for the RPC transfer. In a moderate to high concurrency scenarios, this can > result in significant CPU and memory wastage. By having each task write the > appropriate thrift objects to the output files, HiveServer2 can simply stream > a batch of rows on the wire without incurring any of the additional cost of > deserialization and translation. > This can be implemented by writing a new SerDe, which the FileSinkOperator > can use to write thrift formatted row batches to the output file. Using the > pluggable property of the {{hive.query.result.fileformat}}, we can set it to > use SequenceFile and write a batch of thrift formatted rows as a value blob. > The FetchTask can now simply read the blob and send it over the wire. On the > client side, the *DBC driver can read the blob and since it is already > formatted in the way it expects, it can continue building the ResultSet the > way it does in the current implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12049) Provide an option to write serialized thrift objects in final tasks
[ https://issues.apache.org/jira/browse/HIVE-12049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15214643#comment-15214643 ] Thejas M Nair commented on HIVE-12049: -- [~gopalv] Thanks for profiling it! What is it like without the optimization ? What is the JDBC fetchRowSize being used ? > Provide an option to write serialized thrift objects in final tasks > --- > > Key: HIVE-12049 > URL: https://issues.apache.org/jira/browse/HIVE-12049 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2 >Reporter: Rohit Dholakia >Assignee: Rohit Dholakia > Attachments: HIVE-12049.1.patch, HIVE-12049.11.patch, > HIVE-12049.12.patch, HIVE-12049.13.patch, HIVE-12049.14.patch, > HIVE-12049.2.patch, HIVE-12049.3.patch, HIVE-12049.4.patch, > HIVE-12049.5.patch, HIVE-12049.6.patch, HIVE-12049.7.patch, > HIVE-12049.9.patch, new-driver-profiles.png > > > For each fetch request to HiveServer2, we pay the penalty of deserializing > the row objects and translating them into a different representation suitable > for the RPC transfer. In a moderate to high concurrency scenarios, this can > result in significant CPU and memory wastage. By having each task write the > appropriate thrift objects to the output files, HiveServer2 can simply stream > a batch of rows on the wire without incurring any of the additional cost of > deserialization and translation. > This can be implemented by writing a new SerDe, which the FileSinkOperator > can use to write thrift formatted row batches to the output file. Using the > pluggable property of the {{hive.query.result.fileformat}}, we can set it to > use SequenceFile and write a batch of thrift formatted rows as a value blob. > The FetchTask can now simply read the blob and send it over the wire. On the > client side, the *DBC driver can read the blob and since it is already > formatted in the way it expects, it can continue building the ResultSet the > way it does in the current implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12049) Provide an option to write serialized thrift objects in final tasks
[ https://issues.apache.org/jira/browse/HIVE-12049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15214598#comment-15214598 ] Gopal V commented on HIVE-12049: Profiling the patch, most of the CPU in the fetchResults is now from the Session acquire and release. !new-driver-profiles.png! > Provide an option to write serialized thrift objects in final tasks > --- > > Key: HIVE-12049 > URL: https://issues.apache.org/jira/browse/HIVE-12049 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2 >Reporter: Rohit Dholakia >Assignee: Rohit Dholakia > Attachments: HIVE-12049.1.patch, HIVE-12049.11.patch, > HIVE-12049.12.patch, HIVE-12049.13.patch, HIVE-12049.14.patch, > HIVE-12049.2.patch, HIVE-12049.3.patch, HIVE-12049.4.patch, > HIVE-12049.5.patch, HIVE-12049.6.patch, HIVE-12049.7.patch, HIVE-12049.9.patch > > > For each fetch request to HiveServer2, we pay the penalty of deserializing > the row objects and translating them into a different representation suitable > for the RPC transfer. In a moderate to high concurrency scenarios, this can > result in significant CPU and memory wastage. By having each task write the > appropriate thrift objects to the output files, HiveServer2 can simply stream > a batch of rows on the wire without incurring any of the additional cost of > deserialization and translation. > This can be implemented by writing a new SerDe, which the FileSinkOperator > can use to write thrift formatted row batches to the output file. Using the > pluggable property of the {{hive.query.result.fileformat}}, we can set it to > use SequenceFile and write a batch of thrift formatted rows as a value blob. > The FetchTask can now simply read the blob and send it over the wire. On the > client side, the *DBC driver can read the blob and since it is already > formatted in the way it expects, it can continue building the ResultSet the > way it does in the current implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12049) Provide an option to write serialized thrift objects in final tasks
[ https://issues.apache.org/jira/browse/HIVE-12049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15214528#comment-15214528 ] Thejas M Nair commented on HIVE-12049: -- The "not in list of params that are allowed to be modified at runtime" is happening because SQL std auth or Ranger is enabled, and it allows modifying configs only in a whitelist. [~gopalv] A workaround is to add the parameter as value of hive.security.authorization.sqlstd.confwhitelist.append in HS2. [~rohitdholakia] We should add hive.server2.thrift.resulset.serialize.in.tasks parameter to the default whiltelist. It should be added to sqlStdAuthSafeVarNames array in HiveConf.java. > Provide an option to write serialized thrift objects in final tasks > --- > > Key: HIVE-12049 > URL: https://issues.apache.org/jira/browse/HIVE-12049 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2 >Reporter: Rohit Dholakia >Assignee: Rohit Dholakia > Attachments: HIVE-12049.1.patch, HIVE-12049.11.patch, > HIVE-12049.12.patch, HIVE-12049.13.patch, HIVE-12049.14.patch, > HIVE-12049.2.patch, HIVE-12049.3.patch, HIVE-12049.4.patch, > HIVE-12049.5.patch, HIVE-12049.6.patch, HIVE-12049.7.patch, HIVE-12049.9.patch > > > For each fetch request to HiveServer2, we pay the penalty of deserializing > the row objects and translating them into a different representation suitable > for the RPC transfer. In a moderate to high concurrency scenarios, this can > result in significant CPU and memory wastage. By having each task write the > appropriate thrift objects to the output files, HiveServer2 can simply stream > a batch of rows on the wire without incurring any of the additional cost of > deserialization and translation. > This can be implemented by writing a new SerDe, which the FileSinkOperator > can use to write thrift formatted row batches to the output file. Using the > pluggable property of the {{hive.query.result.fileformat}}, we can set it to > use SequenceFile and write a batch of thrift formatted rows as a value blob. > The FetchTask can now simply read the blob and send it over the wire. On the > client side, the *DBC driver can read the blob and since it is already > formatted in the way it expects, it can continue building the ResultSet the > way it does in the current implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12049) Provide an option to write serialized thrift objects in final tasks
[ https://issues.apache.org/jira/browse/HIVE-12049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15213909#comment-15213909 ] Gopal V commented on HIVE-12049: My long running tests failed due to me making a simple typo in the HiveConf - {{hive.server2.thrift.resulset.serialize.in.tasks}} is missing a 't'. But even when I do set it up on Jmeter, I get {code} org.apache.hive.service.cli.HiveSQLException: java.lang.IllegalArgumentException: Cannot modify hive.server2.thrift.resulset.serialize.in.tasks at runtime. It is not in list of params that are allowed to be modified at runtime at org.apache.hive.service.cli.session.HiveSessionImpl.configureSession(HiveSessionImpl.java:253) ~[hive-service-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] {code} > Provide an option to write serialized thrift objects in final tasks > --- > > Key: HIVE-12049 > URL: https://issues.apache.org/jira/browse/HIVE-12049 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2 >Reporter: Rohit Dholakia >Assignee: Rohit Dholakia > Attachments: HIVE-12049.1.patch, HIVE-12049.11.patch, > HIVE-12049.12.patch, HIVE-12049.13.patch, HIVE-12049.14.patch, > HIVE-12049.2.patch, HIVE-12049.3.patch, HIVE-12049.4.patch, > HIVE-12049.5.patch, HIVE-12049.6.patch, HIVE-12049.7.patch, HIVE-12049.9.patch > > > For each fetch request to HiveServer2, we pay the penalty of deserializing > the row objects and translating them into a different representation suitable > for the RPC transfer. In a moderate to high concurrency scenarios, this can > result in significant CPU and memory wastage. By having each task write the > appropriate thrift objects to the output files, HiveServer2 can simply stream > a batch of rows on the wire without incurring any of the additional cost of > deserialization and translation. > This can be implemented by writing a new SerDe, which the FileSinkOperator > can use to write thrift formatted row batches to the output file. Using the > pluggable property of the {{hive.query.result.fileformat}}, we can set it to > use SequenceFile and write a batch of thrift formatted rows as a value blob. > The FetchTask can now simply read the blob and send it over the wire. On the > client side, the *DBC driver can read the blob and since it is already > formatted in the way it expects, it can continue building the ResultSet the > way it does in the current implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12049) Provide an option to write serialized thrift objects in final tasks
[ https://issues.apache.org/jira/browse/HIVE-12049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15211011#comment-15211011 ] Vaibhav Gumashta commented on HIVE-12049: - [~rohitdholakia] You might want to look at the following failures, which look related: {code} org.apache.hive.beeline.TestBeeLineWithArgs.testEmbeddedBeelineOutputs org.apache.hive.beeline.TestBeeLineWithArgs.testQueryProgress org.apache.hive.beeline.TestBeeLineWithArgs.testQueryProgressParallel org.apache.hive.jdbc.TestJdbcDriver2.testGetQueryLog {code} > Provide an option to write serialized thrift objects in final tasks > --- > > Key: HIVE-12049 > URL: https://issues.apache.org/jira/browse/HIVE-12049 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2 >Reporter: Rohit Dholakia >Assignee: Rohit Dholakia > Attachments: HIVE-12049.1.patch, HIVE-12049.11.patch, > HIVE-12049.12.patch, HIVE-12049.13.patch, HIVE-12049.14.patch, > HIVE-12049.2.patch, HIVE-12049.3.patch, HIVE-12049.4.patch, > HIVE-12049.5.patch, HIVE-12049.6.patch, HIVE-12049.7.patch, HIVE-12049.9.patch > > > For each fetch request to HiveServer2, we pay the penalty of deserializing > the row objects and translating them into a different representation suitable > for the RPC transfer. In a moderate to high concurrency scenarios, this can > result in significant CPU and memory wastage. By having each task write the > appropriate thrift objects to the output files, HiveServer2 can simply stream > a batch of rows on the wire without incurring any of the additional cost of > deserialization and translation. > This can be implemented by writing a new SerDe, which the FileSinkOperator > can use to write thrift formatted row batches to the output file. Using the > pluggable property of the {{hive.query.result.fileformat}}, we can set it to > use SequenceFile and write a batch of thrift formatted rows as a value blob. > The FetchTask can now simply read the blob and send it over the wire. On the > client side, the *DBC driver can read the blob and since it is already > formatted in the way it expects, it can continue building the ResultSet the > way it does in the current implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12049) Provide an option to write serialized thrift objects in final tasks
[ https://issues.apache.org/jira/browse/HIVE-12049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15207370#comment-15207370 ] Ashutosh Chauhan commented on HIVE-12049: - Compiler related changes look good to me. > Provide an option to write serialized thrift objects in final tasks > --- > > Key: HIVE-12049 > URL: https://issues.apache.org/jira/browse/HIVE-12049 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2 >Reporter: Rohit Dholakia >Assignee: Rohit Dholakia > Attachments: HIVE-12049.1.patch, HIVE-12049.11.patch, > HIVE-12049.12.patch, HIVE-12049.13.patch, HIVE-12049.14.patch, > HIVE-12049.2.patch, HIVE-12049.3.patch, HIVE-12049.4.patch, > HIVE-12049.5.patch, HIVE-12049.6.patch, HIVE-12049.7.patch, HIVE-12049.9.patch > > > For each fetch request to HiveServer2, we pay the penalty of deserializing > the row objects and translating them into a different representation suitable > for the RPC transfer. In a moderate to high concurrency scenarios, this can > result in significant CPU and memory wastage. By having each task write the > appropriate thrift objects to the output files, HiveServer2 can simply stream > a batch of rows on the wire without incurring any of the additional cost of > deserialization and translation. > This can be implemented by writing a new SerDe, which the FileSinkOperator > can use to write thrift formatted row batches to the output file. Using the > pluggable property of the {{hive.query.result.fileformat}}, we can set it to > use SequenceFile and write a batch of thrift formatted rows as a value blob. > The FetchTask can now simply read the blob and send it over the wire. On the > client side, the *DBC driver can read the blob and since it is already > formatted in the way it expects, it can continue building the ResultSet the > way it does in the current implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12049) Provide an option to write serialized thrift objects in final tasks
[ https://issues.apache.org/jira/browse/HIVE-12049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15199351#comment-15199351 ] Hive QA commented on HIVE-12049: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12793673/HIVE-12049.14.patch {color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 9832 tests executed *Failed tests:* {noformat} TestSparkCliDriver-groupby3_map.q-sample2.q-auto_join14.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-groupby_map_ppr_multi_distinct.q-table_access_keys_stats.q-groupby4_noskew.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-join_rc.q-insert1.q-vectorized_rcfile_columnar.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-ppd_join4.q-join9.q-ppd_join3.q-and-12-more - did not produce a TEST-*.xml file org.apache.hive.beeline.TestBeeLineWithArgs.testEmbeddedBeelineOutputs org.apache.hive.beeline.TestBeeLineWithArgs.testQueryProgress org.apache.hive.beeline.TestBeeLineWithArgs.testQueryProgressParallel org.apache.hive.jdbc.TestJdbcDriver2.testGetQueryLog {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7289/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7289/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-7289/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 8 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12793673 - PreCommit-HIVE-TRUNK-Build > Provide an option to write serialized thrift objects in final tasks > --- > > Key: HIVE-12049 > URL: https://issues.apache.org/jira/browse/HIVE-12049 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2 >Reporter: Rohit Dholakia >Assignee: Rohit Dholakia > Attachments: HIVE-12049.1.patch, HIVE-12049.11.patch, > HIVE-12049.12.patch, HIVE-12049.13.patch, HIVE-12049.14.patch, > HIVE-12049.2.patch, HIVE-12049.3.patch, HIVE-12049.4.patch, > HIVE-12049.5.patch, HIVE-12049.6.patch, HIVE-12049.7.patch, HIVE-12049.9.patch > > > For each fetch request to HiveServer2, we pay the penalty of deserializing > the row objects and translating them into a different representation suitable > for the RPC transfer. In a moderate to high concurrency scenarios, this can > result in significant CPU and memory wastage. By having each task write the > appropriate thrift objects to the output files, HiveServer2 can simply stream > a batch of rows on the wire without incurring any of the additional cost of > deserialization and translation. > This can be implemented by writing a new SerDe, which the FileSinkOperator > can use to write thrift formatted row batches to the output file. Using the > pluggable property of the {{hive.query.result.fileformat}}, we can set it to > use SequenceFile and write a batch of thrift formatted rows as a value blob. > The FetchTask can now simply read the blob and send it over the wire. On the > client side, the *DBC driver can read the blob and since it is already > formatted in the way it expects, it can continue building the ResultSet the > way it does in the current implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12049) Provide an option to write serialized thrift objects in final tasks
[ https://issues.apache.org/jira/browse/HIVE-12049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15187591#comment-15187591 ] Hive QA commented on HIVE-12049: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12791890/HIVE-12049.12.patch {color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 31 failed/errored test(s), 9786 tests executed *Failed tests:* {noformat} TestSparkCliDriver-groupby3_map.q-sample2.q-auto_join14.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-groupby_map_ppr_multi_distinct.q-table_access_keys_stats.q-groupby4_noskew.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-join_rc.q-insert1.q-vectorized_rcfile_columnar.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-ppd_join4.q-join9.q-ppd_join3.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-timestamp_lazy.q-bucketsortoptimize_insert_4.q-date_udf.q-and-12-more - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_columnStatsUpdateForStatsOptimizer_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_columnStatsUpdateForStatsOptimizer_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_into1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_into2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_metadata_only_queries org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_metadata_only_queries_with_filters org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_only_null org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_truncate_table org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_26 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_insert_into1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_insert_into2 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_metadata_only_queries org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_metadata_only_queries_with_filters org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_stats_only_null org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_insert_into1 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_metadata_only_queries org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_metadata_only_queries_with_filters org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_stats_only_null org.apache.hive.beeline.TestBeeLineWithArgs.testEmbeddedBeelineOutputs org.apache.hive.beeline.TestBeeLineWithArgs.testQueryProgress org.apache.hive.beeline.TestBeeLineWithArgs.testQueryProgressParallel org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler org.apache.hive.jdbc.TestJdbcDriver2.testExplainStmt org.apache.hive.jdbc.TestJdbcDriver2.testGetQueryLog org.apache.hive.jdbc.TestSSL.testSSLVersion {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7204/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7204/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-7204/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 31 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12791890 - PreCommit-HIVE-TRUNK-Build > Provide an option to write serialized thrift objects in final tasks > --- > > Key: HIVE-12049 > URL: https://issues.apache.org/jira/browse/HIVE-12049 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2 >Reporter: Rohit Dholakia >Assignee: Rohit Dholakia > Attachments: HIVE-12049.1.patch, HIVE-12049.11.patch, > HIVE-12049.12.patch, HIVE-12049.2.patch, HIVE-12049.3.patch, > HIVE-12049.4.patch, HIVE-12049.5.patch, HIVE-12049.6.patch, > HIVE-12049.7.patch, HIVE-12049.9.patch > > > For each fetch request to HiveServer2, we pay the penalty of deserializing > the row objects and translating them into a different representation suitable > for the RPC transfer. In a moderate to high concurrency scenarios, this can > result in significant CPU and memory wastage. By having each task write the > appropriate thrift objects to the output files, HiveServer2 can simply stream > a batch of rows on the wi
[jira] [Commented] (HIVE-12049) Provide an option to write serialized thrift objects in final tasks
[ https://issues.apache.org/jira/browse/HIVE-12049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15182022#comment-15182022 ] Hive QA commented on HIVE-12049: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12791362/HIVE-12049.11.patch {color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 30 failed/errored test(s), 9781 tests executed *Failed tests:* {noformat} TestSparkCliDriver-groupby3_map.q-sample2.q-auto_join14.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-groupby_map_ppr_multi_distinct.q-table_access_keys_stats.q-groupby4_noskew.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-join_rc.q-insert1.q-vectorized_rcfile_columnar.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-ppd_join4.q-join9.q-ppd_join3.q-and-12-more - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_columnStatsUpdateForStatsOptimizer_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_columnStatsUpdateForStatsOptimizer_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_into1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_into2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_metadata_only_queries org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_metadata_only_queries_with_filters org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_only_null org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_truncate_table org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_26 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_insert_into1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_insert_into2 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_metadata_only_queries org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_metadata_only_queries_with_filters org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_stats_only_null org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_insert_into1 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_insert_into2 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_metadata_only_queries org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_metadata_only_queries_with_filters org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_stats_only_null org.apache.hive.beeline.TestBeeLineWithArgs.testEmbeddedBeelineOutputs org.apache.hive.beeline.TestBeeLineWithArgs.testQueryProgress org.apache.hive.beeline.TestBeeLineWithArgs.testQueryProgressParallel org.apache.hive.jdbc.TestJdbcDriver2.testExplainStmt org.apache.hive.jdbc.TestJdbcDriver2.testGetQueryLog org.apache.hive.jdbc.TestSSL.testSSLVersion {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7174/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7174/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-7174/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 30 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12791362 - PreCommit-HIVE-TRUNK-Build > Provide an option to write serialized thrift objects in final tasks > --- > > Key: HIVE-12049 > URL: https://issues.apache.org/jira/browse/HIVE-12049 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2 >Reporter: Rohit Dholakia >Assignee: Rohit Dholakia > Attachments: HIVE-12049.1.patch, HIVE-12049.11.patch, > HIVE-12049.2.patch, HIVE-12049.3.patch, HIVE-12049.4.patch, > HIVE-12049.5.patch, HIVE-12049.6.patch, HIVE-12049.7.patch, HIVE-12049.9.patch > > > For each fetch request to HiveServer2, we pay the penalty of deserializing > the row objects and translating them into a different representation suitable > for the RPC transfer. In a moderate to high concurrency scenarios, this can > result in significant CPU and memory wastage. By having each task write the > appropriate thrift objects to the output files, HiveServer2 can simply stream > a batch of rows on the wire without incurring any of the additional cost of > deserialization and translation. > This can be implemented by writing a new SerDe, which the FileSinkOperator > can use to write thrift f
[jira] [Commented] (HIVE-12049) Provide an option to write serialized thrift objects in final tasks
[ https://issues.apache.org/jira/browse/HIVE-12049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15168948#comment-15168948 ] Hive QA commented on HIVE-12049: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12789882/HIVE-12049.9.patch {color:red}ERROR:{color} -1 due to build exiting with an error Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7098/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7098/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-7098/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]] + export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-7098/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ git = \s\v\n ]] + [[ git = \g\i\t ]] + [[ -z master ]] + [[ -d apache-github-source-source ]] + [[ ! -d apache-github-source-source/.git ]] + [[ ! -d apache-github-source-source ]] + cd apache-github-source-source + git fetch origin >From https://github.com/apache/hive e44198f..5c07894 master -> origin/master + git reset --hard HEAD HEAD is now at e44198f HIVE-12857 : LLAP: modify the decider to allow using LLAP with whitelisted UDFs (Sergey Shelukhin, reviewed by Gunther Hagleitner) + git clean -f -d + git checkout master Already on 'master' Your branch is behind 'origin/master' by 3 commits, and can be fast-forwarded. + git reset --hard origin/master HEAD is now at 5c07894 HIVE-13122: LLAP: simple Model/View separation for UI (Gopal V) + git merge --ff-only origin/master Already up-to-date. + git gc + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. ATTACHMENT ID: 12789882 - PreCommit-HIVE-TRUNK-Build > Provide an option to write serialized thrift objects in final tasks > --- > > Key: HIVE-12049 > URL: https://issues.apache.org/jira/browse/HIVE-12049 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2 >Reporter: Rohit Dholakia >Assignee: Rohit Dholakia > Attachments: HIVE-12049.1.patch, HIVE-12049.2.patch, > HIVE-12049.3.patch, HIVE-12049.4.patch, HIVE-12049.5.patch, > HIVE-12049.6.patch, HIVE-12049.7.patch, HIVE-12049.9.patch > > > For each fetch request to HiveServer2, we pay the penalty of deserializing > the row objects and translating them into a different representation suitable > for the RPC transfer. In a moderate to high concurrency scenarios, this can > result in significant CPU and memory wastage. By having each task write the > appropriate thrift objects to the output files, HiveServer2 can simply stream > a batch of rows on the wire without incurring any of the additional cost of > deserialization and translation. > This can be implemented by writing a new SerDe, which the FileSinkOperator > can use to write thrift formatted row batches to the output file. Using the > pluggable property of the {{hive.query.result.fileformat}}, we can set it to > use SequenceFile and write a batch of thrift formatted rows as a value blob. > The FetchTask can now simply read the blob and send it over the wire. On the > client side, the *DBC driver can re
[jira] [Commented] (HIVE-12049) Provide an option to write serialized thrift objects in final tasks
[ https://issues.apache.org/jira/browse/HIVE-12049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15157086#comment-15157086 ] Vaibhav Gumashta commented on HIVE-12049: - Thanks for the iterations [~rohitdholakia]. I've posted some comments on RB on the latest patch there. > Provide an option to write serialized thrift objects in final tasks > --- > > Key: HIVE-12049 > URL: https://issues.apache.org/jira/browse/HIVE-12049 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2 >Reporter: Rohit Dholakia >Assignee: Rohit Dholakia > Attachments: HIVE-12049.1.patch, HIVE-12049.2.patch, > HIVE-12049.3.patch, HIVE-12049.4.patch, HIVE-12049.5.patch, > HIVE-12049.6.patch, HIVE-12049.7.patch > > > For each fetch request to HiveServer2, we pay the penalty of deserializing > the row objects and translating them into a different representation suitable > for the RPC transfer. In a moderate to high concurrency scenarios, this can > result in significant CPU and memory wastage. By having each task write the > appropriate thrift objects to the output files, HiveServer2 can simply stream > a batch of rows on the wire without incurring any of the additional cost of > deserialization and translation. > This can be implemented by writing a new SerDe, which the FileSinkOperator > can use to write thrift formatted row batches to the output file. Using the > pluggable property of the {{hive.query.result.fileformat}}, we can set it to > use SequenceFile and write a batch of thrift formatted rows as a value blob. > The FetchTask can now simply read the blob and send it over the wire. On the > client side, the *DBC driver can read the blob and since it is already > formatted in the way it expects, it can continue building the ResultSet the > way it does in the current implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12049) Provide an option to write serialized thrift objects in final tasks
[ https://issues.apache.org/jira/browse/HIVE-12049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15154602#comment-15154602 ] Rohit Dholakia commented on HIVE-12049: --- uploaded a new version. * stray comments removed. * A fix at several places so that CLI queries don't use the new ThriftJDBC SerDe. > Provide an option to write serialized thrift objects in final tasks > --- > > Key: HIVE-12049 > URL: https://issues.apache.org/jira/browse/HIVE-12049 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2 >Reporter: Rohit Dholakia >Assignee: Rohit Dholakia > Attachments: HIVE-12049.1.patch, HIVE-12049.2.patch, > HIVE-12049.3.patch, HIVE-12049.4.patch, HIVE-12049.5.patch, > HIVE-12049.6.patch, HIVE-12049.7.patch > > > For each fetch request to HiveServer2, we pay the penalty of deserializing > the row objects and translating them into a different representation suitable > for the RPC transfer. In a moderate to high concurrency scenarios, this can > result in significant CPU and memory wastage. By having each task write the > appropriate thrift objects to the output files, HiveServer2 can simply stream > a batch of rows on the wire without incurring any of the additional cost of > deserialization and translation. > This can be implemented by writing a new SerDe, which the FileSinkOperator > can use to write thrift formatted row batches to the output file. Using the > pluggable property of the {{hive.query.result.fileformat}}, we can set it to > use SequenceFile and write a batch of thrift formatted rows as a value blob. > The FetchTask can now simply read the blob and send it over the wire. On the > client side, the *DBC driver can read the blob and since it is already > formatted in the way it expects, it can continue building the ResultSet the > way it does in the current implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12049) Provide an option to write serialized thrift objects in final tasks
[ https://issues.apache.org/jira/browse/HIVE-12049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15149452#comment-15149452 ] Rohit Dholakia commented on HIVE-12049: --- uploaded a new version of end to end patch. has some bug fixes and some changes to the FileSinkOperator and ThriftJDBCSerDe. > Provide an option to write serialized thrift objects in final tasks > --- > > Key: HIVE-12049 > URL: https://issues.apache.org/jira/browse/HIVE-12049 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2 >Reporter: Rohit Dholakia >Assignee: Rohit Dholakia > Attachments: HIVE-12049.1.patch, HIVE-12049.2.patch, > HIVE-12049.3.patch, HIVE-12049.4.patch, HIVE-12049.5.patch, HIVE-12049.6.patch > > > For each fetch request to HiveServer2, we pay the penalty of deserializing > the row objects and translating them into a different representation suitable > for the RPC transfer. In a moderate to high concurrency scenarios, this can > result in significant CPU and memory wastage. By having each task write the > appropriate thrift objects to the output files, HiveServer2 can simply stream > a batch of rows on the wire without incurring any of the additional cost of > deserialization and translation. > This can be implemented by writing a new SerDe, which the FileSinkOperator > can use to write thrift formatted row batches to the output file. Using the > pluggable property of the {{hive.query.result.fileformat}}, we can set it to > use SequenceFile and write a batch of thrift formatted rows as a value blob. > The FetchTask can now simply read the blob and send it over the wire. On the > client side, the *DBC driver can read the blob and since it is already > formatted in the way it expects, it can continue building the ResultSet the > way it does in the current implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12049) Provide an option to write serialized thrift objects in final tasks
[ https://issues.apache.org/jira/browse/HIVE-12049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15104177#comment-15104177 ] Vaibhav Gumashta commented on HIVE-12049: - Patch v2 has changes for FileSinkOp. > Provide an option to write serialized thrift objects in final tasks > --- > > Key: HIVE-12049 > URL: https://issues.apache.org/jira/browse/HIVE-12049 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2 >Reporter: Rohit Dholakia >Assignee: Rohit Dholakia > Attachments: HIVE-12049.1.patch, HIVE-12049.2.patch > > > For each fetch request to HiveServer2, we pay the penalty of deserializing > the row objects and translating them into a different representation suitable > for the RPC transfer. In a moderate to high concurrency scenarios, this can > result in significant CPU and memory wastage. By having each task write the > appropriate thrift objects to the output files, HiveServer2 can simply stream > a batch of rows on the wire without incurring any of the additional cost of > deserialization and translation. > This can be implemented by writing a new SerDe, which the FileSinkOperator > can use to write thrift formatted row batches to the output file. Using the > pluggable property of the {{hive.query.result.fileformat}}, we can set it to > use SequenceFile and write a batch of thrift formatted rows as a value blob. > The FetchTask can now simply read the blob and send it over the wire. On the > client side, the *DBC driver can read the blob and since it is already > formatted in the way it expects, it can continue building the ResultSet the > way it does in the current implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)