[jira] [Commented] (HIVE-11777) implement an option to have single ETL strategy for multiple directories

2015-09-22 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14903759#comment-14903759
 ] 

Sergey Shelukhin commented on HIVE-11777:
-

No :)

> implement an option to have single ETL strategy for multiple directories
> 
>
> Key: HIVE-11777
> URL: https://issues.apache.org/jira/browse/HIVE-11777
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: hbase-metastore-branch
>
> Attachments: HIVE-11777.patch
>
>
> In case of metastore footer PPD we don't want to call PPD call with all 
> attendant SARG, MS and HBase overhead for each directory. If we wait for some 
> time (10ms? some fraction of inputs?) we can do one call without losing 
> overall perf. 
> For now make it time based.
> NO PRECOMMIT TESTS



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11777) implement an option to have single ETL strategy for multiple directories

2015-09-22 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14903760#comment-14903760
 ] 

Sergey Shelukhin commented on HIVE-11777:
-

[~gopalv] this is the method to not make metastore call per split. Thoughts? 

> implement an option to have single ETL strategy for multiple directories
> 
>
> Key: HIVE-11777
> URL: https://issues.apache.org/jira/browse/HIVE-11777
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: hbase-metastore-branch
>
> Attachments: HIVE-11777.patch
>
>
> In case of metastore footer PPD we don't want to call PPD call with all 
> attendant SARG, MS and HBase overhead for each directory. If we wait for some 
> time (10ms? some fraction of inputs?) we can do one call without losing 
> overall perf. 
> For now make it time based.
> NO PRECOMMIT TESTS



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11777) implement an option to have single ETL strategy for multiple directories

2015-10-03 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14942194#comment-14942194
 ] 

Lefty Leverenz commented on HIVE-11777:
---

[~sershe], please don't forget to fix the "geneation" typo.

> implement an option to have single ETL strategy for multiple directories
> 
>
> Key: HIVE-11777
> URL: https://issues.apache.org/jira/browse/HIVE-11777
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11777.01.patch, HIVE-11777.patch
>
>
> In case of metastore footer PPD we don't want to call PPD call with all 
> attendant SARG, MS and HBase overhead for each directory. If we wait for some 
> time (10ms? some fraction of inputs?) we can do one call without losing 
> overall perf. 
> For now make it time based.
> NO PRECOMMIT TESTS



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11777) implement an option to have single ETL strategy for multiple directories

2015-10-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960110#comment-14960110
 ] 

Hive QA commented on HIVE-11777:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12766697/HIVE-11777.02.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 421 failed/errored test(s), 9694 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_vectorization
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_vectorization_partition
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_vectorization_project
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_delete
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_delete_own_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_update
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_update_own_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_char_serde
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_date_serde
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_decimal_join2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_all_non_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_all_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_orig_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_tmp_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_where_no_match
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_where_non_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_where_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_whole_partition
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_optimization2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_optimization_acid
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_implicit_cast_during_insert
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_acid_dynamic_partition
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_acid_not_bucketed
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_into_with_schema
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_nonacid_from_acid
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_orig_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_update_delete
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_values_acid_not_bucketed
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_values_dynamic_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_values_non_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_values_orig_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_values_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_values_tmp_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_llap_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_create
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_createas1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_dictionary_threshold
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_diff_part_cols
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_diff_part_cols2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_empty_strings
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_ends_with_nulls
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_file_dump
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_int_type_promotion
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_llap
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_merge1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_merge3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_merge4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_merge5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_merge6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_merge7
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_merge_incompat1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_merge_incompat2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_ppd_boolean
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_ppd_char
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_ppd_date
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_ppd_decimal
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_ppd_timestamp
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_ppd_v

[jira] [Commented] (HIVE-11777) implement an option to have single ETL strategy for multiple directories

2015-10-17 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14962108#comment-14962108
 ] 

Hive QA commented on HIVE-11777:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12767119/HIVE-11777.03.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 9697 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_explode
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udtf_explode
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5695/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5695/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5695/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12767119 - PreCommit-HIVE-TRUNK-Build

> implement an option to have single ETL strategy for multiple directories
> 
>
> Key: HIVE-11777
> URL: https://issues.apache.org/jira/browse/HIVE-11777
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11777.01.patch, HIVE-11777.02.patch, 
> HIVE-11777.03.patch, HIVE-11777.patch
>
>
> In case of metastore footer PPD we don't want to call PPD call with all 
> attendant SARG, MS and HBase overhead for each directory. If we wait for some 
> time (10ms? some fraction of inputs?) we can do one call without losing 
> overall perf. 
> For now make it time based.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11777) implement an option to have single ETL strategy for multiple directories

2015-10-21 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14967512#comment-14967512
 ] 

Sergey Shelukhin commented on HIVE-11777:
-

Will do before commit

> implement an option to have single ETL strategy for multiple directories
> 
>
> Key: HIVE-11777
> URL: https://issues.apache.org/jira/browse/HIVE-11777
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11777.01.patch, HIVE-11777.02.patch, 
> HIVE-11777.03.patch, HIVE-11777.patch
>
>
> In case of metastore footer PPD we don't want to call PPD call with all 
> attendant SARG, MS and HBase overhead for each directory. If we wait for some 
> time (10ms? some fraction of inputs?) we can do one call without losing 
> overall perf. 
> For now make it time based.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11777) implement an option to have single ETL strategy for multiple directories

2015-10-21 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14967511#comment-14967511
 ] 

Sergey Shelukhin commented on HIVE-11777:
-

Will do before commit

> implement an option to have single ETL strategy for multiple directories
> 
>
> Key: HIVE-11777
> URL: https://issues.apache.org/jira/browse/HIVE-11777
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11777.01.patch, HIVE-11777.02.patch, 
> HIVE-11777.03.patch, HIVE-11777.patch
>
>
> In case of metastore footer PPD we don't want to call PPD call with all 
> attendant SARG, MS and HBase overhead for each directory. If we wait for some 
> time (10ms? some fraction of inputs?) we can do one call without losing 
> overall perf. 
> For now make it time based.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11777) implement an option to have single ETL strategy for multiple directories

2015-11-03 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14986920#comment-14986920
 ] 

Lefty Leverenz commented on HIVE-11777:
---

+1 for the parameter description of hive.orc.splits.directory.batch.ms.

> implement an option to have single ETL strategy for multiple directories
> 
>
> Key: HIVE-11777
> URL: https://issues.apache.org/jira/browse/HIVE-11777
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11777.01.patch, HIVE-11777.02.patch, 
> HIVE-11777.03.patch, HIVE-11777.04.patch, HIVE-11777.patch
>
>
> In case of metastore footer PPD we don't want to call PPD call with all 
> attendant SARG, MS and HBase overhead for each directory. If we wait for some 
> time (10ms? some fraction of inputs?) we can do one call without losing 
> overall perf. 
> For now make it time based.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11777) implement an option to have single ETL strategy for multiple directories

2015-11-03 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987014#comment-14987014
 ] 

Hive QA commented on HIVE-11777:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12770198/HIVE-11777.04.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 425 failed/errored test(s), 9745 tests 
executed
*Failed tests:*
{noformat}
TestMiniTezCliDriver-unionDistinct_1.q-insert_values_non_partitioned.q-insert_update_delete.q-and-12-more
 - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_vectorization
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_vectorization_partition
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_vectorization_project
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_delete
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_delete_own_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_update
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_update_own_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_char_serde
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_date_serde
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_decimal_join2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_all_non_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_all_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_orig_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_tmp_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_where_no_match
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_where_non_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_where_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_whole_partition
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_optimization2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_optimization_acid
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_implicit_cast_during_insert
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_acid_dynamic_partition
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_acid_not_bucketed
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_into_with_schema
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_nonacid_from_acid
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_orig_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_update_delete
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_values_acid_not_bucketed
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_values_dynamic_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_values_non_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_values_orig_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_values_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_values_tmp_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_llap_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_create
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_createas1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_dictionary_threshold
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_diff_part_cols
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_diff_part_cols2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_empty_strings
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_ends_with_nulls
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_file_dump
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_int_type_promotion
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_llap
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_merge1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_merge3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_merge4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_merge5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_merge6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_merge7
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_merge_incompat1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_merge_incompat2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_ppd_boolean
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_ppd_char
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_ppd_date
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_ppd_de

[jira] [Commented] (HIVE-11777) implement an option to have single ETL strategy for multiple directories

2015-11-04 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14989082#comment-14989082
 ] 

Hive QA commented on HIVE-11777:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12770394/HIVE-11777.05.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 9768 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_tez_union_with_udf
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_queries
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_union_with_udf
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.hwi.TestHWISessionManager.testHiveDriver
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5911/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5911/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5911/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12770394 - PreCommit-HIVE-TRUNK-Build

> implement an option to have single ETL strategy for multiple directories
> 
>
> Key: HIVE-11777
> URL: https://issues.apache.org/jira/browse/HIVE-11777
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11777.01.patch, HIVE-11777.02.patch, 
> HIVE-11777.03.patch, HIVE-11777.04.patch, HIVE-11777.05.patch, 
> HIVE-11777.patch
>
>
> In case of metastore footer PPD we don't want to call PPD call with all 
> attendant SARG, MS and HBase overhead for each directory. If we wait for some 
> time (10ms? some fraction of inputs?) we can do one call without losing 
> overall perf. 
> For now make it time based.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11777) implement an option to have single ETL strategy for multiple directories

2015-11-10 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14999183#comment-14999183
 ] 

Prasanth Jayachandran commented on HIVE-11777:
--

Can you add test cases for combining logic? Other than that patch looks good to 
me +1 

> implement an option to have single ETL strategy for multiple directories
> 
>
> Key: HIVE-11777
> URL: https://issues.apache.org/jira/browse/HIVE-11777
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11777.01.patch, HIVE-11777.02.patch, 
> HIVE-11777.03.patch, HIVE-11777.04.patch, HIVE-11777.05.patch, 
> HIVE-11777.patch
>
>
> In case of metastore footer PPD we don't want to call PPD call with all 
> attendant SARG, MS and HBase overhead for each directory. If we wait for some 
> time (10ms? some fraction of inputs?) we can do one call without losing 
> overall perf. 
> For now make it time based.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11777) implement an option to have single ETL strategy for multiple directories

2015-11-12 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15002689#comment-15002689
 ] 

Sergey Shelukhin commented on HIVE-11777:
-

Hmm. I'll see if it's easy to add unit tests.

> implement an option to have single ETL strategy for multiple directories
> 
>
> Key: HIVE-11777
> URL: https://issues.apache.org/jira/browse/HIVE-11777
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11777.01.patch, HIVE-11777.02.patch, 
> HIVE-11777.03.patch, HIVE-11777.04.patch, HIVE-11777.05.patch, 
> HIVE-11777.patch
>
>
> In case of metastore footer PPD we don't want to call PPD call with all 
> attendant SARG, MS and HBase overhead for each directory. If we wait for some 
> time (10ms? some fraction of inputs?) we can do one call without losing 
> overall perf. 
> For now make it time based.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11777) implement an option to have single ETL strategy for multiple directories

2015-11-17 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15010280#comment-15010280
 ] 

Hive QA commented on HIVE-11777:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12772644/HIVE-11777.06.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6061/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6061/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6061/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-6061/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ cd apache-github-source-source
+ git fetch origin
>From https://github.com/apache/hive
   255b2bd..7f65e36  master -> origin/master
+ git reset --hard HEAD
HEAD is now at 255b2bd Bug: HIVE-12384.1 - Union Operator may produce incorrect 
result on TEZ; missed test config update (Laljo John Pullokkaran reviewed by 
Sergey Shelukhin, Ashutosh Chauhan)
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is behind 'origin/master' by 1 commit, and can be fast-forwarded.
+ git reset --hard origin/master
HEAD is now at 7f65e36 HIVE-12054. Create vectorized ORC write method. (omalley 
reviewed by prasanthj)
+ git merge --ff-only origin/master
Already up-to-date.
+ git gc
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12772644 - PreCommit-HIVE-TRUNK-Build

> implement an option to have single ETL strategy for multiple directories
> 
>
> Key: HIVE-11777
> URL: https://issues.apache.org/jira/browse/HIVE-11777
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 2.0.0
>
> Attachments: HIVE-11777.01.patch, HIVE-11777.02.patch, 
> HIVE-11777.03.patch, HIVE-11777.04.patch, HIVE-11777.05.patch, 
> HIVE-11777.06.patch, HIVE-11777.patch
>
>
> In case of metastore footer PPD we don't want to call PPD call with all 
> attendant SARG, MS and HBase overhead for each directory. If we wait for some 
> time (10ms? some fraction of inputs?) we can do one call without losing 
> overall perf. 
> For now make it time based.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11777) implement an option to have single ETL strategy for multiple directories

2015-11-19 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15015350#comment-15015350
 ] 

Lefty Leverenz commented on HIVE-11777:
---

Doc note:  This adds configuration parameter 
*hive.orc.splits.directory.batch.ms* to HiveConf.java, so it needs to be 
documented in the ORC section of Configuration Properties.

* [Configuration Properties -- ORC File Format | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-ORCFileFormat]

> implement an option to have single ETL strategy for multiple directories
> 
>
> Key: HIVE-11777
> URL: https://issues.apache.org/jira/browse/HIVE-11777
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>  Labels: TODOC2.0
> Fix For: 2.0.0
>
> Attachments: HIVE-11777.01.patch, HIVE-11777.02.patch, 
> HIVE-11777.03.patch, HIVE-11777.04.patch, HIVE-11777.05.patch, 
> HIVE-11777.06.patch, HIVE-11777.patch
>
>
> In case of metastore footer PPD we don't want to call PPD call with all 
> attendant SARG, MS and HBase overhead for each directory. If we wait for some 
> time (10ms? some fraction of inputs?) we can do one call without losing 
> overall perf. 
> For now make it time based.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11777) implement an option to have single ETL strategy for multiple directories

2015-09-09 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14737973#comment-14737973
 ] 

Sergey Shelukhin commented on HIVE-11777:
-

parking it here cause I need to run

> implement an option to have single ETL strategy for multiple directories
> 
>
> Key: HIVE-11777
> URL: https://issues.apache.org/jira/browse/HIVE-11777
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11777.WIP.patch
>
>
> In case of metastore footer PPD we don't want to call PPD call with all 
> attendant SARG, MS and HBase overhead for each directory. If we wait for some 
> time (10ms? some fraction of inputs?) we can do one call without losing 
> overall perf. 
> For now make it time based.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11777) implement an option to have single ETL strategy for multiple directories

2015-09-16 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791553#comment-14791553
 ] 

Lefty Leverenz commented on HIVE-11777:
---

Typo alert:  "geneation" presumably should be "generation" in the patch's 
description of *hive.orc.splits.strategy.batch.ms*.

Also, the description could be clearer.  "0 means don't" should be "0 means 
don't wait" (or maybe "0 means don't batch").  And is it just me, or is the 
phrase "to batch split strategies" confusing?

> implement an option to have single ETL strategy for multiple directories
> 
>
> Key: HIVE-11777
> URL: https://issues.apache.org/jira/browse/HIVE-11777
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11777.patch
>
>
> In case of metastore footer PPD we don't want to call PPD call with all 
> attendant SARG, MS and HBase overhead for each directory. If we wait for some 
> time (10ms? some fraction of inputs?) we can do one call without losing 
> overall perf. 
> For now make it time based.
> NO PRECOMMIT TESTS



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11777) implement an option to have single ETL strategy for multiple directories

2015-09-16 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791559#comment-14791559
 ] 

Lefty Leverenz commented on HIVE-11777:
---

Pun intended?

> implement an option to have single ETL strategy for multiple directories
> 
>
> Key: HIVE-11777
> URL: https://issues.apache.org/jira/browse/HIVE-11777
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11777.patch
>
>
> In case of metastore footer PPD we don't want to call PPD call with all 
> attendant SARG, MS and HBase overhead for each directory. If we wait for some 
> time (10ms? some fraction of inputs?) we can do one call without losing 
> overall perf. 
> For now make it time based.
> NO PRECOMMIT TESTS



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)