[jira] [Commented] (HIVE-15148) disallow loading data into bucketed tables (by default)

Hive QA (JIRA) Thu, 10 Nov 2016 16:40:25 -0800

    [ https://issues.apache.org/jira/browse/HIVE-15148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15655666#comment-15655666 ]


Hive QA commented on HIVE-15148:
--------------------------------



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12838433/HIVE-15148.patch

{color:green}SUCCESS:{color} +1 due to 84 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 30 failed/errored test(s), 10637 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucket_map_join_1] (batchId=59)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucket_map_join_2] (batchId=52)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_orig_table] (batchId=55)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table] (batchId=51)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=56)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=59)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_orig_table] (batchId=147)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[join_acid_non_acid] (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats] (batchId=145)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[alter_view_failure6] (batchId=83)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[bucket_mapjoin_wrong_table_metadata_1] (batchId=83)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[bucket_mapjoin_wrong_table_metadata_2] (batchId=83)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[compare_double_bigint] (batchId=83)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[compare_string_bigint] (batchId=83)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[exim_11_nonpart_noncompat_sorting] (batchId=83)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[input4] (batchId=83)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[input_part0_neg] (batchId=83)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[strict_join] (batchId=83)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[strict_orderby] (batchId=83)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[strict_pruning] (batchId=83)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[bucket_map_join_1] (batchId=120)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[bucket_map_join_2] (batchId=116)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[sample2] (batchId=95)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[sample4] (batchId=99)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[sample6] (batchId=120)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[sample7] (batchId=120)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2069/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2069/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2069/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 30 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12838433 - PreCommit-HIVE-Build

> disallow loading data into bucketed tables (by default)
> -------------------------------------------------------
>
>                 Key: HIVE-15148
>                 URL: https://issues.apache.org/jira/browse/HIVE-15148
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>         Attachments: HIVE-15148.patch
>
>
> A few q file tests still use the following, allowed, pattern:
> {noformat}
> CREATE TABLE bucket_small (key string, value string) partitioned by (ds string) CLUSTERED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE;
> load data local inpath '../../data/files/smallsrcsortbucket1outof4.txt' INTO TABLE bucket_small partition(ds='2008-04-08');
> load data local inpath '../../data/files/smallsrcsortbucket2outof4.txt' INTO TABLE bucket_small partition(ds='2008-04-08');
> {noformat}
> This relies on the user to load the correct number of files with correctly 
> hashed data and the correct order of file names; if there's some discrepancy 
> in any of the above, the queries will fail or may produce incorrect results 
> if some bucket-based optimizations kick in.
> Additionally, even if the user does everything correctly, as far as I know 
> some code derives the bucket number from the file name, which won't work in 
> this case (as opposed to getting buckets based on the order of files, which 
> will work here but won't work as per HIVE-14970... sigh).
> Hive enforces bucketing in other cases (the check cannot even be disabled 
> these days), so I suggest that we either prohibit the above outright, or at 
> least add a safety config setting that would disallow it by default.
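
For illustration only, a minimal sketch of the route on which Hive does the bucketing itself: load the raw file into an unbucketed staging table, then populate the bucketed table with INSERT ... SELECT so that rows are hashed into bucket files by Hive rather than by the user. The staging table name bucket_small_staging is hypothetical and does not appear in the issue or the patch; the file path and target table are taken from the description above.

```sql
-- Hypothetical staging table (name is an assumption): same columns, no CLUSTERED BY.
CREATE TABLE bucket_small_staging (key string, value string) STORED AS TEXTFILE;

-- LOAD DATA only moves or copies the file; nothing is re-hashed here,
-- which is harmless for a table that is not bucketed.
LOAD DATA LOCAL INPATH '../../data/files/smallsrcsortbucket1outof4.txt'
INTO TABLE bucket_small_staging;

-- The insert runs as a query, so Hive distributes rows by hash of key
-- into the 2 buckets declared on bucket_small.
INSERT INTO TABLE bucket_small PARTITION (ds='2008-04-08')
SELECT key, value FROM bucket_small_staging;
```

With this pattern the number, naming, and contents of the bucket files no longer depend on what the user supplied.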



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
