[jira] [Commented] (HIVE-12643) For self describing InputFormat don't replicate schema information in partitions
[ https://issues.apache.org/jira/browse/HIVE-12643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15297281#comment-15297281 ] Matt McCline commented on HIVE-12643: - LGTM +1 > For self describing InputFormat don't replicate schema information in > partitions > > > Key: HIVE-12643 > URL: https://issues.apache.org/jira/browse/HIVE-12643 > Project: Hive > Issue Type: Bug > Components: Query Planning >Affects Versions: 2.0.0 >Reporter: Ashutosh Chauhan >Assignee: Ashutosh Chauhan > Attachments: HIVE-12643.1.patch, HIVE-12643.2.patch, > HIVE-12643.3.patch, HIVE-12643.3.patch, HIVE-12643.patch > > > Since self describing Input Formats don't use individual partition schemas > for schema resolution, there is no need to send that info to tasks. > Doing this should cut down plan size. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12643) For self describing InputFormat don't replicate schema information in partitions
[ https://issues.apache.org/jira/browse/HIVE-12643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15288149#comment-15288149 ] Nita Dembla commented on HIVE-12643: I've tested a slightly modified version of the patch. Original changes to following files were rejected - ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java - ql/src/java/org/apache/hadoop/hive/ql/exec/MapOperator.java And following file needed modifications - ql/src/java/org/apache/hadoop/hive/ql/plan/PartitionDesc.java > For self describing InputFormat don't replicate schema information in > partitions > > > Key: HIVE-12643 > URL: https://issues.apache.org/jira/browse/HIVE-12643 > Project: Hive > Issue Type: Bug > Components: Query Planning >Affects Versions: 2.0.0 >Reporter: Ashutosh Chauhan >Assignee: Ashutosh Chauhan > Attachments: HIVE-12643.1.patch, HIVE-12643.2.patch, > HIVE-12643.3.patch, HIVE-12643.3.patch, HIVE-12643.patch > > > Since self describing Input Formats don't use individual partition schemas > for schema resolution, there is no need to send that info to tasks. > Doing this should cut down plan size. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12643) For self describing InputFormat don't replicate schema information in partitions
[ https://issues.apache.org/jira/browse/HIVE-12643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15288034#comment-15288034 ] Ashutosh Chauhan commented on HIVE-12643: - [~mmccline] Can you also please take a look at this one? [~ndembla] tried it out and reported this gives us a speed-up in query compile time. > For self describing InputFormat don't replicate schema information in > partitions > > > Key: HIVE-12643 > URL: https://issues.apache.org/jira/browse/HIVE-12643 > Project: Hive > Issue Type: Bug > Components: Query Planning >Affects Versions: 2.0.0 >Reporter: Ashutosh Chauhan >Assignee: Ashutosh Chauhan > Attachments: HIVE-12643.1.patch, HIVE-12643.2.patch, > HIVE-12643.3.patch, HIVE-12643.3.patch, HIVE-12643.patch > > > Since self describing Input Formats don't use individual partition schemas > for schema resolution, there is no need to send that info to tasks. > Doing this should cut down plan size. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12643) For self describing InputFormat don't replicate schema information in partitions
[ https://issues.apache.org/jira/browse/HIVE-12643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15284340#comment-15284340 ] Hive QA commented on HIVE-12643: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12804105/HIVE-12643.3.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 76 failed/errored test(s), 9814 tests executed *Failed tests:* {noformat} TestHWISessionManager - did not produce a TEST-*.xml file TestMiniLlapCliDriver - did not produce a TEST-*.xml file TestMiniTezCliDriver-auto_join1.q-schema_evol_text_vec_mapwork_part_all_complex.q-vector_complex_join.q-and-12-more - did not produce a TEST-*.xml file TestMiniTezCliDriver-auto_join30.q-vector_decimal_10_0.q-acid_globallimit.q-and-12-more - did not produce a TEST-*.xml file TestMiniTezCliDriver-auto_sortmerge_join_16.q-skewjoin.q-vectorization_div0.q-and-12-more - did not produce a TEST-*.xml file TestMiniTezCliDriver-auto_sortmerge_join_7.q-orc_merge9.q-tez_union_dynamic_partition.q-and-12-more - did not produce a TEST-*.xml file TestMiniTezCliDriver-cte_4.q-vector_non_string_partition.q-delete_where_non_partitioned.q-and-12-more - did not produce a TEST-*.xml file TestMiniTezCliDriver-join1.q-mapjoin_decimal.q-union5.q-and-12-more - did not produce a TEST-*.xml file TestMiniTezCliDriver-load_dyn_part2.q-selectDistinctStar.q-vector_decimal_5.q-and-12-more - did not produce a TEST-*.xml file TestMiniTezCliDriver-schema_evol_text_nonvec_mapwork_table.q-vector_decimal_trailing.q-subquery_in.q-and-12-more - did not produce a TEST-*.xml file TestMiniTezCliDriver-smb_cache.q-transform_ppr2.q-vector_outer_join0.q-and-5-more - did not produce a TEST-*.xml file TestMiniTezCliDriver-update_orig_table.q-union2.q-bucket4.q-and-12-more - did not produce a TEST-*.xml file TestMiniTezCliDriver-vector_distinct_2.q-tez_joins_explain.q-cte_mat_1.q-and-12-more - did not produce a TEST-*.xml file TestMiniTezCliDriver-vector_interval_2.q-schema_evol_text_nonvec_mapwork_part_all_primitive.q-tez_fsstat.q-and-12-more - did not produce a TEST-*.xml file TestMiniTezCliDriver-vectorized_parquet.q-insert_values_non_partitioned.q-schema_evol_orc_nonvec_mapwork_part.q-and-12-more - did not produce a TEST-*.xml file TestPTFRowContainer - did not produce a TEST-*.xml file TestSparkCliDriver-groupby10.q-groupby4_noskew.q-union5.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-join9.q-join_casesensitive.q-filter_join_breaktask.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-join_cond_pushdown_3.q-groupby7.q-auto_join17.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-join_cond_pushdown_unqual4.q-bucketmapjoin12.q-avro_decimal_native.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-multi_insert.q-join5.q-groupby6.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-stats13.q-stats2.q-ppd_gby_join.q-and-12-more - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_mult_tables_compact org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ivyDownload org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join11 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join3 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join_stats2 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucket_map_join_tez2 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin1 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_custom_input_output_format org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby2 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby2_map org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby3_noskew_multi_distinct org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby6_map_skew org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby7_map_multi_single_reducer org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby7_map_skew org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_input14 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join26 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join39 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_limit_pushdown org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_load_dyn_part10 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_load_dyn_part11 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_load_dyn_part12 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_merge2 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_nullgroup2
[jira] [Commented] (HIVE-12643) For self describing InputFormat don't replicate schema information in partitions
[ https://issues.apache.org/jira/browse/HIVE-12643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15054367#comment-15054367 ] Hive QA commented on HIVE-12643: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12777237/HIVE-12643.2.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 17 failed/errored test(s), 9896 tests executed *Failed tests:* {noformat} TestHWISessionManager - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_order2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union9 org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic org.apache.hadoop.hive.cli.TestMiniTezCliDriver.org.apache.hadoop.hive.cli.TestMiniTezCliDriver org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_dynamic_partition_pruning org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testGetPartitionSpecs_WithAndWithoutPartitionGrouping org.apache.hive.jdbc.TestSSL.testSSLVersion org.apache.hive.spark.client.TestSparkClient.testAddJarsAndFiles org.apache.hive.spark.client.TestSparkClient.testCounters org.apache.hive.spark.client.TestSparkClient.testErrorJob org.apache.hive.spark.client.TestSparkClient.testJobSubmission org.apache.hive.spark.client.TestSparkClient.testMetricsCollection org.apache.hive.spark.client.TestSparkClient.testRemoteClient org.apache.hive.spark.client.TestSparkClient.testSimpleSparkJob org.apache.hive.spark.client.TestSparkClient.testSyncRpc {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6331/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6331/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6331/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 17 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12777237 - PreCommit-HIVE-TRUNK-Build > For self describing InputFormat don't replicate schema information in > partitions > > > Key: HIVE-12643 > URL: https://issues.apache.org/jira/browse/HIVE-12643 > Project: Hive > Issue Type: Bug > Components: Query Planning >Affects Versions: 2.0.0 >Reporter: Ashutosh Chauhan >Assignee: Ashutosh Chauhan > Attachments: HIVE-12643.1.patch, HIVE-12643.2.patch, HIVE-12643.patch > > > Since self describing Input Formats don't use individual partition schemas > for schema resolution, there is no need to send that info to tasks. > Doing this should cut down plan size. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12643) For self describing InputFormat don't replicate schema information in partitions
[ https://issues.apache.org/jira/browse/HIVE-12643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15053813#comment-15053813 ] Hive QA commented on HIVE-12643: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12777180/HIVE-12643.1.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 9894 tests executed *Failed tests:* {noformat} TestHWISessionManager - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_udf_max org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_order2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_quotedid_tblproperty org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union9 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_partition_diff_num_cols org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_dynamic_partition_pruning org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vectorized_dynamic_partition_pruning org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_mergejoin org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_partition_diff_num_cols org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testGetPartitionSpecs_WithAndWithoutPartitionGrouping org.apache.hive.jdbc.TestSSL.testSSLVersion org.apache.hive.jdbc.miniHS2.TestHs2Metrics.testMetrics {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6322/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6322/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6322/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 15 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12777180 - PreCommit-HIVE-TRUNK-Build > For self describing InputFormat don't replicate schema information in > partitions > > > Key: HIVE-12643 > URL: https://issues.apache.org/jira/browse/HIVE-12643 > Project: Hive > Issue Type: Bug > Components: Query Planning >Affects Versions: 2.0.0 >Reporter: Ashutosh Chauhan >Assignee: Ashutosh Chauhan > Attachments: HIVE-12643.1.patch, HIVE-12643.patch > > > Since self describing Input Formats don't use individual partition schemas > for schema resolution, there is no need to send that info to tasks. > Doing this should cut down plan size. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12643) For self describing InputFormat don't replicate schema information in partitions
[ https://issues.apache.org/jira/browse/HIVE-12643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052200#comment-15052200 ] Hive QA commented on HIVE-12643: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12776691/HIVE-12643.patch {color:red}ERROR:{color} -1 due to build exiting with an error Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6313/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6313/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6313/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Tests exited with: ExecutionException: java.util.concurrent.ExecutionException: org.apache.hive.ptest.execution.ssh.SSHExecutionException: RSyncResult [localFile=/data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-6313/succeeded/TestCliDriver-cp_mj_rc.q-udf_stddev_pop.q-mapreduce2.q-and-12-more, remoteFile=/home/hiveptest/50.16.94.163-hiveptest-2/logs/, getExitCode()=12, getException()=null, getUser()=hiveptest, getHost()=50.16.94.163, getInstance()=2]: 'ssh_exchange_identification: Connection closed by remote host rsync: connection unexpectedly closed (0 bytes received so far) [receiver] rsync error: error in rsync protocol data stream (code 12) at io.c(600) [receiver=3.0.6] ssh_exchange_identification: Connection closed by remote host rsync: connection unexpectedly closed (0 bytes received so far) [receiver] rsync error: error in rsync protocol data stream (code 12) at io.c(600) [receiver=3.0.6] ssh_exchange_identification: Connection closed by remote host rsync: connection unexpectedly closed (0 bytes received so far) [receiver] rsync error: error in rsync protocol data stream (code 12) at io.c(600) [receiver=3.0.6] ssh_exchange_identification: Connection closed by remote host rsync: connection unexpectedly closed (0 bytes received so far) [receiver] rsync error: error in rsync protocol data stream (code 12) at io.c(600) [receiver=3.0.6] ssh_exchange_identification: Connection closed by remote host rsync: connection unexpectedly closed (0 bytes received so far) [receiver] rsync error: error in rsync protocol data stream (code 12) at io.c(600) [receiver=3.0.6] ' {noformat} This message is automatically generated. ATTACHMENT ID: 12776691 - PreCommit-HIVE-TRUNK-Build > For self describing InputFormat don't replicate schema information in > partitions > > > Key: HIVE-12643 > URL: https://issues.apache.org/jira/browse/HIVE-12643 > Project: Hive > Issue Type: Bug > Components: Query Planning >Affects Versions: 2.0.0 >Reporter: Ashutosh Chauhan >Assignee: Ashutosh Chauhan > Attachments: HIVE-12643.patch > > > Since self describing Input Formats don't use individual partition schemas > for schema resolution, there is no need to send that info to tasks. > Doing this should cut down plan size. -- This message was sent by Atlassian JIRA (v6.3.4#6332)