[jira] [Commented] (HIVE-14128) Parallelize jobClose phases
[ https://issues.apache.org/jira/browse/HIVE-14128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16631686#comment-16631686 ] t oo commented on HIVE-14128: - [~rajesh.balamohan] can this be merged to master? > Parallelize jobClose phases > --- > > Key: HIVE-14128 > URL: https://issues.apache.org/jira/browse/HIVE-14128 > Project: Hive > Issue Type: Improvement > Components: Query Processor >Affects Versions: 1.2.0, 2.0.0, 2.1.0 >Reporter: Ashutosh Chauhan >Assignee: Rajesh Balamohan >Priority: Major > Attachments: HIVE-14128.1.patch, HIVE-14128.master.2.patch, > HIVE-14128.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-14128) Parallelize jobClose phases
[ https://issues.apache.org/jira/browse/HIVE-14128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15407617#comment-15407617 ] Rajesh Balamohan commented on HIVE-14128: - Will revise the patch and upload it. Couple of tests failed locally. > Parallelize jobClose phases > --- > > Key: HIVE-14128 > URL: https://issues.apache.org/jira/browse/HIVE-14128 > Project: Hive > Issue Type: Improvement > Components: Query Processor >Affects Versions: 1.2.0, 2.0.0, 2.1.0 >Reporter: Ashutosh Chauhan >Assignee: Rajesh Balamohan > Attachments: HIVE-14128.1.patch, HIVE-14128.master.2.patch, > HIVE-14128.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14128) Parallelize jobClose phases
[ https://issues.apache.org/jira/browse/HIVE-14128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15407184#comment-15407184 ] Ashutosh Chauhan commented on HIVE-14128: - To trigger ut runs you need to name patch as HIVE-14128.xx.patch > Parallelize jobClose phases > --- > > Key: HIVE-14128 > URL: https://issues.apache.org/jira/browse/HIVE-14128 > Project: Hive > Issue Type: Improvement > Components: Query Processor >Affects Versions: 1.2.0, 2.0.0, 2.1.0 >Reporter: Ashutosh Chauhan >Assignee: Rajesh Balamohan > Attachments: HIVE-14128.1.patch, HIVE-14128.master.2.patch, > HIVE-14128.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14128) Parallelize jobClose phases
[ https://issues.apache.org/jira/browse/HIVE-14128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15407177#comment-15407177 ] Rajesh Balamohan commented on HIVE-14128: - Some systems might not prefer go via HDFS path. Had offline discussion with [~ashutoshc]. Assigning it in my name to submit the revised patch. > Parallelize jobClose phases > --- > > Key: HIVE-14128 > URL: https://issues.apache.org/jira/browse/HIVE-14128 > Project: Hive > Issue Type: Improvement > Components: Query Processor >Affects Versions: 1.2.0, 2.0.0, 2.1.0 >Reporter: Ashutosh Chauhan >Assignee: Ashutosh Chauhan > Attachments: HIVE-14128.1.patch, HIVE-14128.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14128) Parallelize jobClose phases
[ https://issues.apache.org/jira/browse/HIVE-14128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15402391#comment-15402391 ] Ashutosh Chauhan commented on HIVE-14128: - I think approach on HIVE-14270 is better than this. So, I have abandoned this in favor of that. Once we have HIVE-14270 I think this won't be necessary. > Parallelize jobClose phases > --- > > Key: HIVE-14128 > URL: https://issues.apache.org/jira/browse/HIVE-14128 > Project: Hive > Issue Type: Improvement > Components: Query Processor >Affects Versions: 1.2.0, 2.0.0, 2.1.0 >Reporter: Ashutosh Chauhan >Assignee: Ashutosh Chauhan > Attachments: HIVE-14128.1.patch, HIVE-14128.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14128) Parallelize jobClose phases
[ https://issues.apache.org/jira/browse/HIVE-14128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15401900#comment-15401900 ] Rajesh Balamohan commented on HIVE-14128: - [~ashutoshc] - In non-partitioned case, there can be multiple part files within the temp directory. When this is moved in HDFS, it would be simpler. But in some file systems like S3, it would turn out to be expensive still. E.g lineitem is a non-partitioned dataset in TPC-H. Simple insert overwrite would have the following move at the end of the job. Please note that this internally has 300+ part files. So it rename would turn out to be expensive here. {noformat} 2016-08-01T04:40:00,154 INFO [JobClose-Thread-0] exec.FileSinkOperator: Moving tmp dir: s3a://bucket/lineitem/.hive-staging_hive_2016-08-01_04-31-26_432_5317262787271448273-1/_tmp.-ext-1 to: s3a://bucket/lineitem/.hive-staging_hive_2016-08-01_04-31-26_432_5317262787271448273-1/-ext-1 {noformat} Should we consider a file by file move in such cases? > Parallelize jobClose phases > --- > > Key: HIVE-14128 > URL: https://issues.apache.org/jira/browse/HIVE-14128 > Project: Hive > Issue Type: Improvement > Components: Query Processor >Affects Versions: 1.2.0, 2.0.0, 2.1.0 >Reporter: Ashutosh Chauhan >Assignee: Ashutosh Chauhan > Attachments: HIVE-14128.1.patch, HIVE-14128.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14128) Parallelize jobClose phases
[ https://issues.apache.org/jira/browse/HIVE-14128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15369388#comment-15369388 ] Hive QA commented on HIVE-14128: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12816951/HIVE-14128.1.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 76 failed/errored test(s), 10296 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_autoColumnStats_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_autoColumnStats_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_autoColumnStats_6 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_autoColumnStats_8 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_merge org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_optimization org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_escape1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_escape2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_into_with_schema org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_5 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_6 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_7 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_8 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_load_dyn_part1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_load_dyn_part10 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_load_dyn_part11 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_load_dyn_part12 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_load_dyn_part13 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_load_dyn_part2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_load_dyn_part6 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_load_dyn_part8 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lock3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lock4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_merge4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_merge_dynamic_partition org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_merge_dynamic_partition2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_merge_dynamic_partition4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_merge_dynamic_partition5 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_merge1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_merge10 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_merge2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_rcfile_merge2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_dynamic_partition_pruning org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_complex_all org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_complex_join org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vectorized_dynamic_partition_pruning org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_auto_sortmerge_join_16 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_num_buckets org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_orc_merge1 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_orc_merge2 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_orc_merge_diff_fs org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_acid_globallimit org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_16 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynamic_partition_pruning org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_load_dyn_part1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_load_dyn_part2 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_load_dyn_part3 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge10 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge2 org.apache.hadoop.hive.cli.TestMiniTezCliD