[jira] [Commented] (HIVE-13354) Add ability to specify Compaction options per table and per request
[ https://issues.apache.org/jira/browse/HIVE-13354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426089#comment-15426089 ]

Lefty Leverenz commented on HIVE-13354:
---
Thanks for the docs, [~wzheng], well done! I'm removing the TODOC2.1 label.

Here are the specific links:
* [Hive Transactions -- Table Properties | https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-TableProperties]
* [DDL -- Create Table | https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTable]
* [DDL -- Alter Table/Partition Compact | https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterTable/PartitionCompact]

> Add ability to specify Compaction options per table and per request
> ---
>
> Key: HIVE-13354
> URL: https://issues.apache.org/jira/browse/HIVE-13354
> Project: Hive
> Issue Type: Improvement
> Affects Versions: 1.3.0, 2.0.0
> Reporter: Eugene Koifman
> Assignee: Wei Zheng
> Fix For: 1.3.0, 2.1.0
> Attachments: HIVE-13354.1.patch, HIVE-13354.1.withoutSchemaChange.patch, HIVE-13354.2.patch, HIVE-13354.3.patch, HIVE-13354.branch-1.patch
>
> Currently there are a few options that determine when automatic compaction is triggered. They are specified once for the warehouse.
> This doesn't make sense - some tables may be more important and need to be compacted more often.
> We should allow specifying these on a per-table basis.
> Also, compaction is an MR job launched from within the metastore. There is currently no way to control job parameters (like memory, for example) except to specify them in hive-site.xml for the metastore, which means they are site-wide.
> We should add a way to specify these per table (perhaps even per compaction if launched via ALTER TABLE).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HIVE-13354) Add ability to specify Compaction options per table and per request
[ https://issues.apache.org/jira/browse/HIVE-13354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15421768#comment-15421768 ]

Wei Zheng commented on HIVE-13354:
---
[~leftylev] Wiki has been updated:
https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL
[jira] [Commented] (HIVE-13354) Add ability to specify Compaction options per table and per request
[ https://issues.apache.org/jira/browse/HIVE-13354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15304482#comment-15304482 ]

Wei Zheng commented on HIVE-13354:
---
Test failures unrelated.

||Test Name||Duration||Age||
|org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_mapjoin_memcheck|3.7 sec|1|
|org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskSchedulerService.testDelayedLocalityNodeCommErrorImmediateAllocation|10 sec|1|
|org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_load_dyn_part2|1 min 9 sec|1|
|org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_load_dyn_part6|1.6 sec|1|
|org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan|0.82 sec|1|
|org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_temp_table_gb1|0.65 sec|1|
|org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_sortmerge_join_13|0.82 sec|1|
|org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join28|0.87 sec|1|
|org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_leftsemijoin_mr|0.89 sec|1|
|org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_16|2.3 sec|1|
|org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorization_decimal_date|0.59 sec|1|
|org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join5|0.64 sec|1|
|org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join7|0.83 sec|1|
|org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vector_char_4|0.94 sec|1|
|org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_17|0.97 sec|1|
|org.apache.hadoop.hive.ql.TestTxnCommands.testSimpleAcidInsert|2 min 41 sec|1|
|org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_partitioned|21 sec|2|
|org.apache.hadoop.hive.llap.daemon.impl.comparator.TestShortestJobFirstComparator.testWaitQueueComparatorWithinDagPriority|5 sec|18|
|org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_constprog_partitioner|3.2 sec|22|
|org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static|1 min 34 sec|38|
|org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic|1 min 25 sec|38|
|org.apache.hive.minikdc.TestHiveAuthFactory.testStartTokenManagerForMemoryTokenStore|1.6 sec|38|
|org.apache.hive.minikdc.TestHiveAuthFactory.testStartTokenManagerForDBTokenStore|0.25 sec|38|
|org.apache.hive.minikdc.TestMiniHiveKdc.testLogin|1 min 30 sec|38|
|org.apache.hadoop.hive.llap.tez.TestConverters.testFragmentSpecToTaskSpec|43 ms|58|
|org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_selectindate|13 sec|90|
|org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avrocountemptytbl|11 sec|90|
|org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_order_null|41 sec|90|
|org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_join_with_different_encryption_keys|1 min 33 sec|90|
|org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3|10 sec|90|
|org.apache.hadoop.hive.cli.TestMinimrCliDriver.org.apache.hadoop.hive.cli.TestMinimrCliDriver|1 min 2 sec|90|
[jira] [Commented] (HIVE-13354) Add ability to specify Compaction options per table and per request
[ https://issues.apache.org/jira/browse/HIVE-13354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15303240#comment-15303240 ]

Eugene Koifman commented on HIVE-13354:
---
+1 pending tests
[jira] [Commented] (HIVE-13354) Add ability to specify Compaction options per table and per request
[ https://issues.apache.org/jira/browse/HIVE-13354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15301014#comment-15301014 ]

Eugene Koifman commented on HIVE-13354:
---
Couple of nits:

1. It seems like 'compactor.mapreduce.map.memory.mb' in
{quote}
executeStatementOnDriver("CREATE TABLE " + tblName2 + "(a INT, b STRING) " +
  " CLUSTERED BY(a) INTO 2 BUCKETS STORED AS ORC TBLPROPERTIES (" +
  "'transactional'='true'," +
  "'compactor.mapreduce.map.memory.mb'='2048'," + // 2048 MB memory for compaction map job
  "'compactorthreshold.hive.compactor.delta.num.threshold'='4'," + // minor compaction if more than 4 delta dirs
  "'compactorthreshold.hive.compactor.delta.pct.threshold'='0.5'" + // major compaction if more than 50%
  ")", driver);
{quote}
is never tested. Is it possible?
2. Perhaps props like "compactor." should have symbolic constants, if not Enums, somewhere.

Otherwise looks good.
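The inline comments in the quoted CREATE TABLE describe two triggers: a minor compaction once there are more than 4 delta directories, and a major compaction once the deltas exceed 50% of the base. A minimal sketch of that decision follows. It is not Hive's implementation: the class and method names are hypothetical, and measuring the percentage as delta bytes over base bytes, checking the major condition first, and skipping the percentage check when there is no base are all assumptions made for illustration.

```java
/** Hypothetical illustration of the two thresholds; not Hive's Initiator code. */
final class CompactionDecisionSketch {

    enum Decision { NONE, MINOR, MAJOR }

    /**
     * @param deltaDirCount     number of delta directories currently present
     * @param deltaBytes        total size of the deltas (assumed unit: bytes)
     * @param baseBytes         size of the base (assumed unit: bytes)
     * @param deltaNumThreshold e.g. compactorthreshold.hive.compactor.delta.num.threshold
     * @param deltaPctThreshold e.g. compactorthreshold.hive.compactor.delta.pct.threshold
     */
    static Decision decide(int deltaDirCount, long deltaBytes, long baseBytes,
                           int deltaNumThreshold, float deltaPctThreshold) {
        // Assumption: the percentage check only applies when a base exists.
        if (baseBytes > 0
                && (float) deltaBytes / (float) baseBytes > deltaPctThreshold) {
            return Decision.MAJOR;
        }
        // "More than N delta dirs" is read as a strict comparison.
        if (deltaDirCount > deltaNumThreshold) {
            return Decision.MINOR;
        }
        return Decision.NONE;
    }
}
```

With the table-level values from the quote (4 and 0.5), five small deltas against a large base would yield MINOR, while two deltas totalling 60% of the base would yield MAJOR.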
[jira] [Commented] (HIVE-13354) Add ability to specify Compaction options per table and per request
[ https://issues.apache.org/jira/browse/HIVE-13354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15299524#comment-15299524 ]

Wei Zheng commented on HIVE-13354:
---
The only failure that looks related is TestTxnCommands.testSimpleAcidInsert, but it doesn't fail locally for me.

[~ekoifman] Can you take a look at patch 2?
[jira] [Commented] (HIVE-13354) Add ability to specify Compaction options per table and per request
[ https://issues.apache.org/jira/browse/HIVE-13354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15299500#comment-15299500 ]

Hive QA commented on HIVE-13354:
---
Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12805815/HIVE-13354.2.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 80 failed/errored test(s), 9902 tests executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
TestMiniTezCliDriver-auto_sortmerge_join_16.q-skewjoin.q-vectorization_div0.q-and-12-more - did not produce a TEST-*.xml file
TestMiniTezCliDriver-join1.q-mapjoin_decimal.q-union5.q-and-12-more - did not produce a TEST-*.xml file
TestMiniTezCliDriver-load_dyn_part2.q-selectDistinctStar.q-vector_decimal_5.q-and-12-more - did not produce a TEST-*.xml file
TestMiniTezCliDriver-schema_evol_text_nonvec_mapwork_table.q-vector_decimal_trailing.q-subquery_in.q-and-12-more - did not produce a TEST-*.xml file
TestMiniTezCliDriver-tez_union_group_by.q-vector_auto_smb_mapjoin_14.q-union_fast_stats.q-and-12-more - did not produce a TEST-*.xml file
TestMiniTezCliDriver-update_orig_table.q-union2.q-bucket4.q-and-12-more - did not produce a TEST-*.xml file
TestSparkCliDriver-auto_join30.q-join2.q-input17.q-and-12-more - did not produce a TEST-*.xml file
TestSparkCliDriver-groupby2.q-custom_input_output_format.q-join41.q-and-12-more - did not produce a TEST-*.xml file
TestSparkCliDriver-groupby3_map.q-skewjoinopt8.q-union_remove_1.q-and-12-more - did not produce a TEST-*.xml file
TestSparkCliDriver-groupby_complex_types.q-groupby_map_ppr_multi_distinct.q-vectorization_16.q-and-12-more - did not produce a TEST-*.xml file
TestSparkCliDriver-groupby_grouping_id2.q-vectorization_13.q-auto_sortmerge_join_13.q-and-12-more - did not produce a TEST-*.xml file
TestSparkCliDriver-join_cond_pushdown_unqual4.q-bucketmapjoin12.q-avro_decimal_native.q-and-12-more - did not produce a TEST-*.xml file
TestSparkCliDriver-parallel_join1.q-escape_distributeby1.q-auto_sortmerge_join_7.q-and-12-more - did not produce a TEST-*.xml file
TestSparkCliDriver-ppd_transform.q-union_remove_7.q-date_udf.q-and-12-more - did not produce a TEST-*.xml file
TestSparkCliDriver-script_pipe.q-stats12.q-auto_join24.q-and-12-more - did not produce a TEST-*.xml file
TestSparkCliDriver-skewjoinopt15.q-join39.q-avro_joins_native.q-and-12-more - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ivyDownload
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_constprog_partitioner
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.org.apache.hadoop.hive.cli.TestMiniTezCliDriver
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_7
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_metadata_only_queries_with_filters
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge9
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_schema_evol_text_vec_mapwork_table
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_union_dynamic_partition
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_update_where_partitioned
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_complex_all
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_decimal_round_2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_elt
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_left_outer_join
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_orderby_5
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_context
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_nested_mapjoin
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join21
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join28
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_sortmerge_join_10
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_cbo_simple_select
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby6
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby7_map
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_input_part2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join5
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_leftsemijoin
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_mapjoin_subquery2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_multi_insert
[jira] [Commented] (HIVE-13354) Add ability to specify Compaction options per table and per request
[ https://issues.apache.org/jira/browse/HIVE-13354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15297708#comment-15297708 ]

Wei Zheng commented on HIVE-13354:
---
Thanks [~ekoifman] for the review.

1. I moved the setConf later to make it clearer.
2. You're right. "ready for cleaning" was due to the SQL failure in CompactionTxnHandler. After fixing the mismatched "?"s, I got a "succeeded" response.
3. "size4" is due to the serialization scheme of jobConf (4 being the length of 8192). The complete output of job.get("hive.compactor.table.props") is this:
{code}
11:9:totalSize4:207617:orc.compress.size4:819253:compactorthreshold.hive.compactor.delta.pct.threshold3:0.57:numRows1:711:rawDataSize1:021:COLUMN_STATS_ACCURATE22:{"BASIC_STATS":"true"}53:compactorthreshold.hive.compactor.delta.num.threshold1:48:numFiles1:421:transient_lastDdlTime10:146403755713:transactional4:true33:compactor.mapreduce.map.memory.mb4:2048
{code}
4. Deprecated the old compact() signature.
5. Fixed the mismatched number of value entries in the insert statement.
6. Removed cc_tblproperties from purgeCompactionHistory().
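The serialized value quoted in the comment above reads as an entry count followed by length-prefixed keys and values (for example "17:orc.compress.size" then "4:8192"), which is why a substring search sees "orc.compress.size4:8192". The sketch below decodes a string of that shape. It is inferred only from the quoted output, not taken from Hive's source; the class name is hypothetical, and treating each length as a count of Java chars is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical decoder for "count:len:keylen:value..." strings; inferred from the quoted output. */
final class TablePropsDecoderSketch {
    private final String s;
    private int pos = 0;

    private TablePropsDecoderSketch(String s) {
        this.s = s;
    }

    /** Decodes the whole string into an insertion-ordered map. */
    static Map<String, String> decode(String serialized) {
        TablePropsDecoderSketch d = new TablePropsDecoderSketch(serialized);
        int entries = d.readInt();            // leading "11:" in the quoted output
        Map<String, String> out = new LinkedHashMap<>();
        for (int i = 0; i < entries; i++) {
            String key = d.readString();      // e.g. "17:orc.compress.size"
            String value = d.readString();    // e.g. "4:8192"
            out.put(key, value);
        }
        return out;
    }

    /** Reads digits up to the next ':' and advances past it. */
    private int readInt() {
        int colon = s.indexOf(':', pos);
        if (colon < 0) {
            throw new IllegalArgumentException("missing ':' after offset " + pos);
        }
        int n = Integer.parseInt(s.substring(pos, colon));
        pos = colon + 1;
        return n;
    }

    /** Reads a length prefix, then exactly that many characters. */
    private String readString() {
        int len = readInt();
        String v = s.substring(pos, pos + len);
        pos += len;
        return v;
    }
}
```

Decoding the quoted string this way yields 11 entries, including orc.compress.size = 8192 and compactor.mapreduce.map.memory.mb = 2048. Because values are length-prefixed, a value such as {"BASIC_STATS":"true"} can contain colons without confusing the reader.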
[jira] [Commented] (HIVE-13354) Add ability to specify Compaction options per table and per request
[ https://issues.apache.org/jira/browse/HIVE-13354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15295273#comment-15295273 ]

Hive QA commented on HIVE-13354:
---
Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12804101/HIVE-13354.1.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: http://ec2-54-177-240-2.us-west-1.compute.amazonaws.com/job/PreCommit-HIVE-MASTER-Build/351/testReport
Console output: http://ec2-54-177-240-2.us-west-1.compute.amazonaws.com/job/PreCommit-HIVE-MASTER-Build/351/console
Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-351/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-maven-3.0.5/bin:/usr/lib64/qt-3.3/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-maven-3.0.5/bin:/usr/lib64/qt-3.3/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-MASTER-Build-351/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 2c3ebf8 HIVE-13699: Make JavaDataModel#get thread safe for parallel compilation (Peter Slawski via Ashutosh Chauhan)
+ git clean -f -d
Removing common/src/java/org/apache/hadoop/hive/common/UgiFactory.java
Removing common/src/java/org/apache/hadoop/hive/conf/HiveConf.java.orig
Removing llap-server/src/java/org/apache/hadoop/hive/llap/security/LlapUgiFactoryFactory.java
+ git checkout master
Already on 'master'
+ git reset --hard origin/master
HEAD is now at 2c3ebf8 HIVE-13699: Make JavaDataModel#get thread safe for parallel compilation (Peter Slawski via Ashutosh Chauhan)
+ git merge --ff-only origin/master
Already up-to-date.
+ git gc
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12804101 - PreCommit-HIVE-MASTER-Build
[jira] [Commented] (HIVE-13354) Add ability to specify Compaction options per table and per request
[ https://issues.apache.org/jira/browse/HIVE-13354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292324#comment-15292324 ]

Eugene Koifman commented on HIVE-13354:
---
{quote}
// intentionally set this high so that ttp1 will not trigger major compaction later on
conf.setFloatVar(HiveConf.ConfVars.HIVE_COMPACTOR_DELTA_PCT_THRESHOLD, 0.8f);
{quote}
Could this be moved to where it's used? It's confusing at its current location.

{quote}
runWorker(conf); // compact ttp2
runWorker(conf); // compact ttp1
runCleaner(conf);
rsp = txnHandler.showCompact(new ShowCompactRequest());
Assert.assertEquals(2, rsp.getCompacts().size());
Assert.assertEquals("ttp2", rsp.getCompacts().get(0).getTablename());
Assert.assertEquals("ready for cleaning", rsp.getCompacts().get(0).getState());
Assert.assertEquals("ttp1", rsp.getCompacts().get(1).getTablename());
Assert.assertEquals("ready for cleaning", rsp.getCompacts().get(1).getState());
{quote}
The "ready for cleaning" seems suspicious after a successful runCleaner()... Also, perhaps TxnStore.CLEANING_RESPONSE would be better.

{quote}
// ttp1 has 0.8 for DELTA_PCT_THRESHOLD (from hive conf), whereas ttp2 has 0.5 (from tblproperties)
// so only ttp2 will trigger major compaction for the newly inserted row (actual pct: 0.66)
{quote}
This seems wrong. ttp2 had 5 rows which were major compacted into a base. Now 2 more rows are added. 2/5 = 40%. Perhaps compaction is triggered because in this case ORC headers make up 99% of the file size.

bq. Assert.assertEquals("ready for cleaning", rsp.getCompacts().get(2).getState());
I would've expected this state to be TxnStore.SUCCEEDED_RESPONSE after runCleaner(). Why isn't it?

bq. Assert.assertTrue(job.get("hive.compactor.table.props").contains("orc.compress.size4:8192"));
Why "size4"?

{quote}
void compact(String dbname, String tableName, String partitionName, CompactionType type,
    Map<String, String> tblproperties) throws TException;
{quote}
This is a public API change, so we should probably deprecate the method with the old signature.

{quote}
pStmt = dbConn.prepareStatement("insert into COMPLETED_COMPACTIONS(CC_ID, CC_DATABASE, CC_TABLE, CC_PARTITION, CC_STATE, CC_TYPE, CC_TBLPROPERTIES, CC_WORKER_ID, CC_START, CC_END, CC_RUN_AS, CC_HIGHEST_TXN_ID, CC_META_INFO, CC_HADOOP_JOB_ID) VALUES(?,?,?,?,?, ?,?,?,?,?, ?,?,?)");
{quote}
A new column is added here but the number of "?" is the same. How does this work?

{quote}
rs = stmt.executeQuery("select cc_id, cc_database, cc_table, cc_partition, cc_state, " +
  "cc_tblproperties from COMPLETED_COMPACTIONS order by cc_database, cc_table, " +
  "cc_partition, cc_id desc");
{quote}
Why do you need to know cc_tblproperties in order to delete the entry from history?

etc.
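The quoted INSERT into COMPLETED_COMPACTIONS names 14 columns but supplies 13 "?" placeholders, which is the mismatch questioned above. One way to make that class of error hard to introduce is to derive the placeholder list from the column list. The following is a minimal sketch of that idea, not the fix applied in the patch; the class and method names are hypothetical.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical helper: generates one "?" per column so the counts cannot drift apart. */
final class InsertSqlSketch {

    /** Builds "insert into T(c1, c2, ...) VALUES(?,?,...)" from a column list. */
    static String insertSql(String table, List<String> columns) {
        String cols = String.join(", ", columns);
        String marks = String.join(",", Collections.nCopies(columns.size(), "?"));
        return "insert into " + table + "(" + cols + ") VALUES(" + marks + ")";
    }

    /** Counts '?' characters; assumes no '?' appears inside string literals. */
    static int countPlaceholders(String sql) {
        int n = 0;
        for (int i = 0; i < sql.length(); i++) {
            if (sql.charAt(i) == '?') {
                n++;
            }
        }
        return n;
    }
}
```

Calling insertSql with the 14 column names from the quote produces a statement with 14 placeholders, while countPlaceholders on the quoted statement returns 13.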
[jira] [Commented] (HIVE-13354) Add ability to specify Compaction options per table and per request
[ https://issues.apache.org/jira/browse/HIVE-13354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267792#comment-15267792 ]

Wei Zheng commented on HIVE-13354:
---
New usages after this improvement:

- Allow new tblproperties on DDL.
  - Specify compactor MR job properties, e.g.
    CREATE TABLE t1 ... TBLPROPERTIES ('compactor.mapreduce.map.memory.mb'='1024');
  - Specify compactor thresholds for triggering compaction (currently, hive.compactor.delta.num.threshold and hive.compactor.delta.pct.threshold), e.g.
    CREATE TABLE t1 ... TBLPROPERTIES ('compactorthreshold.hive.compactor.delta.num.threshold'='5');
- Allow tblproperties on ALTER TABLE .. COMPACT.
  - Specify compactor MR job properties or other Hive properties, e.g.
    ALTER TABLE t1 ... COMPACT ... WITH OVERWRITE TBLPROPERTIES ('compactor.mapreduce.map.memory.mb'='1024', 'tblprops.orc.compress.size'='8192');

> Add ability to specify Compaction options per table and per request
> ---
>
> Key: HIVE-13354
> URL: https://issues.apache.org/jira/browse/HIVE-13354
> Project: Hive
> Issue Type: Improvement
> Affects Versions: 1.3.0, 2.0.0
> Reporter: Eugene Koifman
> Assignee: Eugene Koifman
> Labels: TODOC2.1
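The usage notes above rely on three key prefixes: "compactor." for MR job properties, "compactorthreshold." for trigger thresholds, and "tblprops." for other properties passed on ALTER TABLE ... COMPACT. The sketch below shows one way such keys could be separated by prefix, using named constants in the spirit of the review suggestion elsewhere in this thread. The class, constant, and method names are hypothetical, and stripping the prefix before use is an assumption about how the properties are consumed, not something the comment states.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical prefix routing for per-table compaction properties. */
final class CompactionPropsSketch {
    // Prefixes taken from the examples in the comment; the constant names are invented.
    static final String JOB_PREFIX = "compactor.";
    static final String THRESHOLD_PREFIX = "compactorthreshold.";
    static final String TBLPROPS_PREFIX = "tblprops.";

    /** Returns the entries whose key starts with prefix, with the prefix removed. */
    static Map<String, String> withPrefixStripped(Map<String, String> props, String prefix) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : props.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                out.put(e.getKey().substring(prefix.length()), e.getValue());
            }
        }
        return out;
    }
}
```

Under these assumptions, 'compactor.mapreduce.map.memory.mb'='1024' would surface as mapreduce.map.memory.mb=1024 for the job, and 'compactorthreshold.hive.compactor.delta.num.threshold'='5' as hive.compactor.delta.num.threshold=5. Note that "compactor." is not a prefix of "compactorthreshold.", so the two groups do not overlap.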
[jira] [Commented] (HIVE-13354) Add ability to specify Compaction options per table and per request
[ https://issues.apache.org/jira/browse/HIVE-13354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15261172#comment-15261172 ]

Wei Zheng commented on HIVE-13354:
---
Schema changes in this ticket will depend on schema changes made in HIVE-13395

> Add ability to specify Compaction options per table and per request
> ---
>
> Key: HIVE-13354
> URL: https://issues.apache.org/jira/browse/HIVE-13354
> Project: Hive
> Issue Type: Improvement
> Affects Versions: 1.3.0, 2.0.0
> Reporter: Eugene Koifman
> Assignee: Eugene Koifman
> Labels: TODOC2.1