[jira] [Commented] (HIVE-13354) Add ability to specify Compaction options per table and per request

2016-08-18 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426089#comment-15426089
 ] 

Lefty Leverenz commented on HIVE-13354:
---

Thanks for the docs, [~wzheng], well done!  I'm removing the TODOC2.1 label.

Here are the specific links:

* [Hive Transactions -- Table Properties | 
https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-TableProperties]
* [DDL -- Create Table | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTable]
* [DDL -- Alter Table/Partition Compact | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterTable/PartitionCompact]

> Add ability to specify Compaction options per table and per request
> ---
>
> Key: HIVE-13354
> URL: https://issues.apache.org/jira/browse/HIVE-13354
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Eugene Koifman
>Assignee: Wei Zheng
> Fix For: 1.3.0, 2.1.0
>
> Attachments: HIVE-13354.1.patch, 
> HIVE-13354.1.withoutSchemaChange.patch, HIVE-13354.2.patch, 
> HIVE-13354.3.patch, HIVE-13354.branch-1.patch
>
>
> Currently the are a few options that determine when automatic compaction is 
> triggered.  They are specified once for the warehouse.
> This doesn't make sense - some table may be more important and need to be 
> compacted more often.
> We should allow specifying these on per table basis.
> Also, compaction is an MR job launched from within the metastore.  There is 
> currently no way to control job parameters (like memory, for example) except 
> to specify it in hive-site.xml for metastore which means they are site wide.
> Should add a way to specify these per table (perhaps even per compaction if 
> launched via ALTER TABLE)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13354) Add ability to specify Compaction options per table and per request

2016-08-15 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15421768#comment-15421768
 ] 

Wei Zheng commented on HIVE-13354:
--

[~leftylev] Wiki has been updated:
https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL

> Add ability to specify Compaction options per table and per request
> ---
>
> Key: HIVE-13354
> URL: https://issues.apache.org/jira/browse/HIVE-13354
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Eugene Koifman
>Assignee: Wei Zheng
>  Labels: TODOC2.1
> Fix For: 1.3.0, 2.1.0
>
> Attachments: HIVE-13354.1.patch, 
> HIVE-13354.1.withoutSchemaChange.patch, HIVE-13354.2.patch, 
> HIVE-13354.3.patch, HIVE-13354.branch-1.patch
>
>
> Currently the are a few options that determine when automatic compaction is 
> triggered.  They are specified once for the warehouse.
> This doesn't make sense - some table may be more important and need to be 
> compacted more often.
> We should allow specifying these on per table basis.
> Also, compaction is an MR job launched from within the metastore.  There is 
> currently no way to control job parameters (like memory, for example) except 
> to specify it in hive-site.xml for metastore which means they are site wide.
> Should add a way to specify these per table (perhaps even per compaction if 
> launched via ALTER TABLE)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13354) Add ability to specify Compaction options per table and per request

2016-05-27 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15304482#comment-15304482
 ] 

Wei Zheng commented on HIVE-13354:
--

Test failures unrelated.

Test Name
Duration
Age
 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_mapjoin_memcheck
3.7 sec 1
 
org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskSchedulerService.testDelayedLocalityNodeCommErrorImmediateAllocation
10 sec  1
 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_load_dyn_part2 
1 min 9 sec 1
 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_load_dyn_part6 
1.6 sec 1
 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan  
0.82 sec1
 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_temp_table_gb1 
0.65 sec1
 
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_sortmerge_join_13
 0.82 sec1
 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join28 0.87 
sec1
 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_leftsemijoin_mr
0.89 sec1
 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_16
2.3 sec 1
 
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorization_decimal_date
 0.59 sec1
 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join5 0.64 
sec1
 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join7 0.83 
sec1
 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vector_char_4  
0.94 sec1
 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_17 
0.97 sec1
 org.apache.hadoop.hive.ql.TestTxnCommands.testSimpleAcidInsert 2 min 41 sec
1
 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_partitioned  
21 sec  2
 
org.apache.hadoop.hive.llap.daemon.impl.comparator.TestShortestJobFirstComparator.testWaitQueueComparatorWithinDagPriority
 5 sec   18
 
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_constprog_partitioner
3.2 sec 22
 
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static
 1 min 34 sec38
 
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic
1 min 25 sec38
 
org.apache.hive.minikdc.TestHiveAuthFactory.testStartTokenManagerForMemoryTokenStore
   1.6 sec 38
 
org.apache.hive.minikdc.TestHiveAuthFactory.testStartTokenManagerForDBTokenStore
   0.25 sec38
 org.apache.hive.minikdc.TestMiniHiveKdc.testLogin  1 min 30 sec38
 org.apache.hadoop.hive.llap.tez.TestConverters.testFragmentSpecToTaskSpec  
43 ms   58
 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_selectindate13 sec  
90
 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avrocountemptytbl   
11 sec  90
 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_order_null  41 sec  
90
 
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_join_with_different_encryption_keys
 1 min 33 sec90
 
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
10 sec  90
 
org.apache.hadoop.hive.cli.TestMinimrCliDriver.org.apache.hadoop.hive.cli.TestMinimrCliDriver
  1 min 2 sec 90

> Add ability to specify Compaction options per table and per request
> ---
>
> Key: HIVE-13354
> URL: https://issues.apache.org/jira/browse/HIVE-13354
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Eugene Koifman
>Assignee: Wei Zheng
>  Labels: TODOC2.1
> Attachments: HIVE-13354.1.patch, 
> HIVE-13354.1.withoutSchemaChange.patch, HIVE-13354.2.patch, HIVE-13354.3.patch
>
>
> Currently the are a few options that determine when automatic compaction is 
> triggered.  They are specified once for the warehouse.
> This doesn't make sense - some table may be more important and need to be 
> compacted more often.
> We should allow specifying these on per table basis.
> Also, compaction is an MR job launched from within the metastore.  There is 
> currently no way to control job parameters (like memory, for example) except 
> to specify it in hive-site.xml for metastore which means they are site wide.
> Should add a way to specify these per table (perhaps even per compaction if 
> launched via ALTER TABLE)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13354) Add ability to specify Compaction options per table and per request

2016-05-26 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15303240#comment-15303240
 ] 

Eugene Koifman commented on HIVE-13354:
---

+1 pending tests


> Add ability to specify Compaction options per table and per request
> ---
>
> Key: HIVE-13354
> URL: https://issues.apache.org/jira/browse/HIVE-13354
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Eugene Koifman
>Assignee: Wei Zheng
>  Labels: TODOC2.1
> Attachments: HIVE-13354.1.patch, 
> HIVE-13354.1.withoutSchemaChange.patch, HIVE-13354.2.patch, HIVE-13354.3.patch
>
>
> Currently the are a few options that determine when automatic compaction is 
> triggered.  They are specified once for the warehouse.
> This doesn't make sense - some table may be more important and need to be 
> compacted more often.
> We should allow specifying these on per table basis.
> Also, compaction is an MR job launched from within the metastore.  There is 
> currently no way to control job parameters (like memory, for example) except 
> to specify it in hive-site.xml for metastore which means they are site wide.
> Should add a way to specify these per table (perhaps even per compaction if 
> launched via ALTER TABLE)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13354) Add ability to specify Compaction options per table and per request

2016-05-25 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15301014#comment-15301014
 ] 

Eugene Koifman commented on HIVE-13354:
---

Couple of nits
1. it seem like 'compactor.mapreduce.map.memory.mb' in
{quote}
872 executeStatementOnDriver("CREATE TABLE " + tblName2 + "(a INT, b 
STRING) " +
873 " CLUSTERED BY(a) INTO 2 BUCKETS STORED AS ORC TBLPROPERTIES (" 
+
874 "'transactional'='true'," +
875 "'compactor.mapreduce.map.memory.mb'='2048'," + // 2048 MB 
memory for compaction map job
876 "'compactorthreshold.hive.compactor.delta.num.threshold'='4'," 
+  // minor compaction if more than 4 delta dirs
877 "'compactorthreshold.hive.compactor.delta.pct.threshold'='0.5'" 
+ // major compaction if more than 50%
878 ")", driver);
{quote}
is never tested.  Is it possible?

2. perhaps props like "compactor." should have symbolic constants if not Enums 
somewhere

otherwise looks good

> Add ability to specify Compaction options per table and per request
> ---
>
> Key: HIVE-13354
> URL: https://issues.apache.org/jira/browse/HIVE-13354
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Eugene Koifman
>Assignee: Wei Zheng
>  Labels: TODOC2.1
> Attachments: HIVE-13354.1.patch, 
> HIVE-13354.1.withoutSchemaChange.patch, HIVE-13354.2.patch
>
>
> Currently the are a few options that determine when automatic compaction is 
> triggered.  They are specified once for the warehouse.
> This doesn't make sense - some table may be more important and need to be 
> compacted more often.
> We should allow specifying these on per table basis.
> Also, compaction is an MR job launched from within the metastore.  There is 
> currently no way to control job parameters (like memory, for example) except 
> to specify it in hive-site.xml for metastore which means they are site wide.
> Should add a way to specify these per table (perhaps even per compaction if 
> launched via ALTER TABLE)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13354) Add ability to specify Compaction options per table and per request

2016-05-25 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15299524#comment-15299524
 ] 

Wei Zheng commented on HIVE-13354:
--

The only failure that looks related is TestTxnCommands.testSimpleAcidInsert. 
But this one doesn't fail local to me.

[~ekoifman] Can you take a look at patch 2?

> Add ability to specify Compaction options per table and per request
> ---
>
> Key: HIVE-13354
> URL: https://issues.apache.org/jira/browse/HIVE-13354
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Eugene Koifman
>Assignee: Wei Zheng
>  Labels: TODOC2.1
> Attachments: HIVE-13354.1.patch, 
> HIVE-13354.1.withoutSchemaChange.patch, HIVE-13354.2.patch
>
>
> Currently the are a few options that determine when automatic compaction is 
> triggered.  They are specified once for the warehouse.
> This doesn't make sense - some table may be more important and need to be 
> compacted more often.
> We should allow specifying these on per table basis.
> Also, compaction is an MR job launched from within the metastore.  There is 
> currently no way to control job parameters (like memory, for example) except 
> to specify it in hive-site.xml for metastore which means they are site wide.
> Should add a way to specify these per table (perhaps even per compaction if 
> launched via ALTER TABLE)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13354) Add ability to specify Compaction options per table and per request

2016-05-24 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15299500#comment-15299500
 ] 

Hive QA commented on HIVE-13354:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12805815/HIVE-13354.2.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 80 failed/errored test(s), 9902 tests 
executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
TestMiniTezCliDriver-auto_sortmerge_join_16.q-skewjoin.q-vectorization_div0.q-and-12-more
 - did not produce a TEST-*.xml file
TestMiniTezCliDriver-join1.q-mapjoin_decimal.q-union5.q-and-12-more - did not 
produce a TEST-*.xml file
TestMiniTezCliDriver-load_dyn_part2.q-selectDistinctStar.q-vector_decimal_5.q-and-12-more
 - did not produce a TEST-*.xml file
TestMiniTezCliDriver-schema_evol_text_nonvec_mapwork_table.q-vector_decimal_trailing.q-subquery_in.q-and-12-more
 - did not produce a TEST-*.xml file
TestMiniTezCliDriver-tez_union_group_by.q-vector_auto_smb_mapjoin_14.q-union_fast_stats.q-and-12-more
 - did not produce a TEST-*.xml file
TestMiniTezCliDriver-update_orig_table.q-union2.q-bucket4.q-and-12-more - did 
not produce a TEST-*.xml file
TestSparkCliDriver-auto_join30.q-join2.q-input17.q-and-12-more - did not 
produce a TEST-*.xml file
TestSparkCliDriver-groupby2.q-custom_input_output_format.q-join41.q-and-12-more 
- did not produce a TEST-*.xml file
TestSparkCliDriver-groupby3_map.q-skewjoinopt8.q-union_remove_1.q-and-12-more - 
did not produce a TEST-*.xml file
TestSparkCliDriver-groupby_complex_types.q-groupby_map_ppr_multi_distinct.q-vectorization_16.q-and-12-more
 - did not produce a TEST-*.xml file
TestSparkCliDriver-groupby_grouping_id2.q-vectorization_13.q-auto_sortmerge_join_13.q-and-12-more
 - did not produce a TEST-*.xml file
TestSparkCliDriver-join_cond_pushdown_unqual4.q-bucketmapjoin12.q-avro_decimal_native.q-and-12-more
 - did not produce a TEST-*.xml file
TestSparkCliDriver-parallel_join1.q-escape_distributeby1.q-auto_sortmerge_join_7.q-and-12-more
 - did not produce a TEST-*.xml file
TestSparkCliDriver-ppd_transform.q-union_remove_7.q-date_udf.q-and-12-more - 
did not produce a TEST-*.xml file
TestSparkCliDriver-script_pipe.q-stats12.q-auto_join24.q-and-12-more - did not 
produce a TEST-*.xml file
TestSparkCliDriver-skewjoinopt15.q-join39.q-avro_joins_native.q-and-12-more - 
did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ivyDownload
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_constprog_partitioner
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.org.apache.hadoop.hive.cli.TestMiniTezCliDriver
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_7
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_metadata_only_queries_with_filters
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge9
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_schema_evol_text_vec_mapwork_table
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_union_dynamic_partition
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_update_where_partitioned
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_complex_all
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_decimal_round_2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_elt
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_left_outer_join
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_orderby_5
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_context
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_nested_mapjoin
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join21
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join28
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_sortmerge_join_10
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_cbo_simple_select
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby6
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby7_map
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_input_part2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join5
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_leftsemijoin
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_mapjoin_subquery2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_multi_insert

[jira] [Commented] (HIVE-13354) Add ability to specify Compaction options per table and per request

2016-05-23 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15297708#comment-15297708
 ] 

Wei Zheng commented on HIVE-13354:
--

Thanks [~ekoifman] for the review.
1. I moved the setConf later to make it clearer.
2. You're right. "ready for cleaning" is due to the SQL failure in 
CompactionTxnHandler. After fixing the unmatching "?"s, I got "succeeded" 
response.
3. "size4" is due to the serialization scheme of jobConf (4 being the length of 
8192). The complete output of job.get("hive.compactor.table.props") is this:
{code}
11:9:totalSize4:207617:orc.compress.size4:819253:compactorthreshold.hive.compactor.delta.pct.threshold3:0.57:numRows1:711:rawDataSize1:021:COLUMN_STATS_ACCURATE22:{"BASIC_STATS":"true"}53:compactorthreshold.hive.compactor.delta.num.threshold1:48:numFiles1:421:transient_lastDdlTime10:146403755713:transactional4:true33:compactor.mapreduce.map.memory.mb4:2048
{code}
4. Deprecated the old compact() signature.
5. Fixed unmatching number of value entries in insert statement.
6. Removed cc_tblproperties from purgeCompactionHistory().

> Add ability to specify Compaction options per table and per request
> ---
>
> Key: HIVE-13354
> URL: https://issues.apache.org/jira/browse/HIVE-13354
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Eugene Koifman
>Assignee: Wei Zheng
>  Labels: TODOC2.1
> Attachments: HIVE-13354.1.patch, 
> HIVE-13354.1.withoutSchemaChange.patch, HIVE-13354.2.patch
>
>
> Currently the are a few options that determine when automatic compaction is 
> triggered.  They are specified once for the warehouse.
> This doesn't make sense - some table may be more important and need to be 
> compacted more often.
> We should allow specifying these on per table basis.
> Also, compaction is an MR job launched from within the metastore.  There is 
> currently no way to control job parameters (like memory, for example) except 
> to specify it in hive-site.xml for metastore which means they are site wide.
> Should add a way to specify these per table (perhaps even per compaction if 
> launched via ALTER TABLE)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13354) Add ability to specify Compaction options per table and per request

2016-05-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15295273#comment-15295273
 ] 

Hive QA commented on HIVE-13354:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12804101/HIVE-13354.1.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: 
http://ec2-54-177-240-2.us-west-1.compute.amazonaws.com/job/PreCommit-HIVE-MASTER-Build/351/testReport
Console output: 
http://ec2-54-177-240-2.us-west-1.compute.amazonaws.com/job/PreCommit-HIVE-MASTER-Build/351/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-351/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-maven-3.0.5/bin:/usr/lib64/qt-3.3/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-maven-3.0.5/bin:/usr/lib64/qt-3.3/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-MASTER-Build-351/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 2c3ebf8 HIVE-13699: Make JavaDataModel#get thread safe for 
parallel compilation (Peter Slawski via Ashutosh Chauhan)
+ git clean -f -d
Removing common/src/java/org/apache/hadoop/hive/common/UgiFactory.java
Removing common/src/java/org/apache/hadoop/hive/conf/HiveConf.java.orig
Removing 
llap-server/src/java/org/apache/hadoop/hive/llap/security/LlapUgiFactoryFactory.java
+ git checkout master
Already on 'master'
+ git reset --hard origin/master
HEAD is now at 2c3ebf8 HIVE-13699: Make JavaDataModel#get thread safe for 
parallel compilation (Peter Slawski via Ashutosh Chauhan)
+ git merge --ff-only origin/master
Already up-to-date.
+ git gc
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12804101 - PreCommit-HIVE-MASTER-Build

> Add ability to specify Compaction options per table and per request
> ---
>
> Key: HIVE-13354
> URL: https://issues.apache.org/jira/browse/HIVE-13354
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Eugene Koifman
>Assignee: Wei Zheng
>  Labels: TODOC2.1
> Attachments: HIVE-13354.1.patch, 
> HIVE-13354.1.withoutSchemaChange.patch
>
>
> Currently the are a few options that determine when automatic compaction is 
> triggered.  They are specified once for the warehouse.
> This doesn't make sense - some table may be more important and need to be 
> compacted more often.
> We should allow specifying these on per table basis.
> Also, compaction is an MR job launched from within the metastore.  There is 
> currently no way to control job parameters (like memory, for example) except 
> to specify it in hive-site.xml for metastore which means they are site wide.
> Should add a way to specify these per table (perhaps even per compaction if 
> launched via ALTER TABLE)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13354) Add ability to specify Compaction options per table and per request

2016-05-19 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292324#comment-15292324
 ] 

Eugene Koifman commented on HIVE-13354:
---

{quote} // intentionally set this high so that ttp1 will not trigger major 
compaction later on
   conf.setFloatVar(HiveConf.ConfVars.HIVE_COMPACTOR_DELTA_PCT_THRESHOLD, 
0.8f);
{quote}
could this be moved to where it's used - it's confusing at its current location

{quote}
   runWorker(conf);  // compact ttp2
runWorker(conf);  // compact ttp1
runCleaner(conf);
rsp = txnHandler.showCompact(new ShowCompactRequest());
Assert.assertEquals(2, rsp.getCompacts().size());
Assert.assertEquals("ttp2", 
rsp.getCompacts().get(0).getTablename());
Assert.assertEquals("ready for cleaning", 
rsp.getCompacts().get(0).getState());
Assert.assertEquals("ttp1", 
rsp.getCompacts().get(1).getTablename());
Assert.assertEquals("ready for cleaning", 
rsp.getCompacts().get(1).getState());
{quote}
The "ready for cleaning" seems suspicious after successful runCleaner()...  
Also, perhaps TxnStrore.CLEANING_RESPONSE would be better

{quote}
   // ttp1 has 0.8 for DELTA_PCT_THRESHOLD (from hive conf), whereas 
ttp2 has 0.5 (from tblproperties)
// so only ttp2 will trigger major compaction for the newly 
inserted row (actual pct: 0.66)
{quote}
this seems wrong.ttp2 had 5 rows which were Major compacted into a base.  
Now 2 more rows are added.  2/5 = 40%
Perhaps compaction is triggered because in this case ORC headers make up 99% of 
the file size.

bq. 949 Assert.assertEquals("ready for cleaning", 
rsp.getCompacts().get(2).getState());
I would've expected this state to be TxnStore.SUCCEEDED_RESPONSE after 
runCleaner().  Why isn't it?

bq. 973 
Assert.assertTrue(job.get("hive.compactor.table.props").contains("orc.compress.size4:8192"));
Why "size4"?

{quote}
void compact(String dbname, String tableName, String partitionName, 
CompactionType type,
1440   Map tblproperties) throws TException;
1440
{quote}
This is public API change so should probably deprecate the method with old 
signature

{quote}
348 pStmt = dbConn.prepareStatement("insert into COMPLETED_COMPACTIONS(CC_ID, 
CC_DATABASE, CC_TABLE, CC_PARTITION, CC_STATE, CC_TYPE, CC_TBLPROPERTIES, 
CC_WORKER_ID, CC_START, CC_END, CC_RUN_AS, CC_HIGHEST_TXN_ID, CC_META_INFO, 
CC_HADOOP_JOB_ID) VALUES(?,?,?,?,?, ?,?,?,?,?, ?,?,?)");
{quote}
A new column is added here but the number of "?" is the same.  How does this 
work?

{quote}
714 rs = stmt.executeQuery("select cc_id, cc_database, cc_table, 
cc_partition, cc_state, " +
715 "cc_tblproperties from COMPLETED_COMPACTIONS order by 
cc_database, cc_table, " +
716 "cc_partition, cc_id desc");
{quote}
Why do you need to know cc_tblproperties in order to delete the entry from 
history?

etc


> Add ability to specify Compaction options per table and per request
> ---
>
> Key: HIVE-13354
> URL: https://issues.apache.org/jira/browse/HIVE-13354
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Eugene Koifman
>Assignee: Wei Zheng
>  Labels: TODOC2.1
> Attachments: HIVE-13354.1.patch, 
> HIVE-13354.1.withoutSchemaChange.patch
>
>
> Currently the are a few options that determine when automatic compaction is 
> triggered.  They are specified once for the warehouse.
> This doesn't make sense - some table may be more important and need to be 
> compacted more often.
> We should allow specifying these on per table basis.
> Also, compaction is an MR job launched from within the metastore.  There is 
> currently no way to control job parameters (like memory, for example) except 
> to specify it in hive-site.xml for metastore which means they are site wide.
> Should add a way to specify these per table (perhaps even per compaction if 
> launched via ALTER TABLE)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13354) Add ability to specify Compaction options per table and per request

2016-05-02 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267792#comment-15267792
 ] 

Wei Zheng commented on HIVE-13354:
--

New usages after this improvement.

- Allow new tblproperties on DDL.
- Specify compactor MR job properties.
  e.g. CREATE TABLE t1 ... TBLPROPERTIES 
('compactor.mapreduce.map.memory.mb'='1024');
- Specify compactor thresholds for triggering compaction (currently, 
hive.compactor.delta.num.threshold and hive.compactor.delta.pct.threshold).
  e.g. CREATE TABLE t1 ... TBLPROPERTIES 
('compactorthreshold.hive.compactor.delta.num.threshold'='5');

- Allow tblproperties on ALTER TABLE .. COMPACT.
- Speficy compactor MR job properties or other hive properties.
  ALTER TABLE t1 ... COMPACT ... WITH OVERWRITE TBLPROPERTIES 
('compactor.mapreduce.map.memory.mb'='1024', 
'tblprops.orc.compress.size'='8192');

> Add ability to specify Compaction options per table and per request
> ---
>
> Key: HIVE-13354
> URL: https://issues.apache.org/jira/browse/HIVE-13354
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>  Labels: TODOC2.1
>
> Currently the are a few options that determine when automatic compaction is 
> triggered.  They are specified once for the warehouse.
> This doesn't make sense - some table may be more important and need to be 
> compacted more often.
> We should allow specifying these on per table basis.
> Also, compaction is an MR job launched from within the metastore.  There is 
> currently no way to control job parameters (like memory, for example) except 
> to specify it in hive-site.xml for metastore which means they are site wide.
> Should add a way to specify these per table (perhaps even per compaction if 
> launched via ALTER TABLE)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13354) Add ability to specify Compaction options per table and per request

2016-04-27 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15261172#comment-15261172
 ] 

Wei Zheng commented on HIVE-13354:
--

Schema changes in this ticket will depend on schema changes made in HIVE-13395

> Add ability to specify Compaction options per table and per request
> ---
>
> Key: HIVE-13354
> URL: https://issues.apache.org/jira/browse/HIVE-13354
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>  Labels: TODOC2.1
>
> Currently the are a few options that determine when automatic compaction is 
> triggered.  They are specified once for the warehouse.
> This doesn't make sense - some table may be more important and need to be 
> compacted more often.
> We should allow specifying these on per table basis.
> Also, compaction is an MR job launched from within the metastore.  There is 
> currently no way to control job parameters (like memory, for example) except 
> to specify it in hive-site.xml for metastore which means they are site wide.
> Should add a way to specify these per table (perhaps even per compaction if 
> launched via ALTER TABLE)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)