[ https://issues.apache.org/jira/browse/HIVE-22661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17008951#comment-17008951 ]
Hive QA commented on HIVE-22661:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12990009/HIVE-22661.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 17853 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/20079/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/20079/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-20079/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12990009 - PreCommit-HIVE-Build

> Compaction fails on non bucketed table with data loaded inpath
> --------------------------------------------------------------
>
>                 Key: HIVE-22661
>                 URL: https://issues.apache.org/jira/browse/HIVE-22661
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Ádám Szita
>            Assignee: Ádám Szita
>            Priority: Major
>         Attachments: HIVE-22661.0.patch, HIVE-22661.1.patch,
> HIVE-22661.2.patch
>
>
> Compaction cannot handle situations where:
> * data was ingested with {{LOAD DATA INPATH}}
> * this ingest method is run multiple times, and
> ** a different number of files is created in the delta directories each time
> This results in file/dir structures such as:
> {code:java}
> /warehouse/tablespace/managed/hive/comp3/delta_0000001_0000001_0000
> /warehouse/tablespace/managed/hive/comp3/delta_0000001_0000001_0000/000000_0
> /warehouse/tablespace/managed/hive/comp3/delta_0000001_0000001_0000/000001_0
> /warehouse/tablespace/managed/hive/comp3/delta_0000002_0000002_0000
> /warehouse/tablespace/managed/hive/comp3/delta_0000002_0000002_0000/000000_0
> /warehouse/tablespace/managed/hive/comp3/delta_0000002_0000002_0000/000001_0
> /warehouse/tablespace/managed/hive/comp3/delta_0000002_0000002_0000/000002_0
> {code}
> Although the table is not bucketed, the bucket is calculated from the (raw)
> files' names. Compaction in the above case will fail because delta1-1 has no
> data for 'bucket' 2.
> Steps to repro using small dataset:
> {code:java}
> set tez.grouping.min-size=8;
> set tez.grouping.max-size=8;
> set mapreduce.input.fileinputformat.split.minsize=8;
> set mapreduce.input.fileinputformat.split.minsize=8;
> create external table comp0 (a string);
> insert into comp0 values ("qwertyuiopasdfghjklzxcvbnm");
> insert into comp0 values ("qwertyuiopasdfghjklzxcvbnm");
> create external table comp1 stored as orc as select * from comp0;
> insert into comp0 values ("qwertyuiopasdfghjklzxcvbnm");
> create external table comp2 stored as orc as select * from comp0;
> create table comp3 (a string);
> load data inpath '/warehouse/tablespace/external/hive/comp1' into table comp3;
> load data inpath '/warehouse/tablespace/external/hive/comp2' into table
> comp3;{code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
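
The mismatch described in the quoted issue (a 'bucket' derived from raw file names, with delta1-1 lacking 'bucket' 2) can be illustrated with a small standalone sketch. This is not Hive's code: the helper names {{bucket_from_name}}, {{buckets_per_delta}} and {{missing_buckets}} are hypothetical, and the assumption that the bucket is the leading number of a name like {{000002_0}} is taken only from the description above.

```python
import re
from collections import defaultdict

# Assumption (from the issue text only): a raw file named "000002_0"
# is treated as belonging to 'bucket' 2. This is NOT Hive's actual parser.
RAW_FILE = re.compile(r"^(\d+)_\d+$")

# File listing copied from the issue description.
PATHS = [
    "/warehouse/tablespace/managed/hive/comp3/delta_0000001_0000001_0000",
    "/warehouse/tablespace/managed/hive/comp3/delta_0000001_0000001_0000/000000_0",
    "/warehouse/tablespace/managed/hive/comp3/delta_0000001_0000001_0000/000001_0",
    "/warehouse/tablespace/managed/hive/comp3/delta_0000002_0000002_0000",
    "/warehouse/tablespace/managed/hive/comp3/delta_0000002_0000002_0000/000000_0",
    "/warehouse/tablespace/managed/hive/comp3/delta_0000002_0000002_0000/000001_0",
    "/warehouse/tablespace/managed/hive/comp3/delta_0000002_0000002_0000/000002_0",
]


def bucket_from_name(file_name):
    """Return the 'bucket' id implied by a raw file name, or None."""
    match = RAW_FILE.match(file_name)
    return int(match.group(1)) if match else None


def buckets_per_delta(paths):
    """Map each delta directory to the set of 'bucket' ids found in it."""
    result = defaultdict(set)
    for path in paths:
        parts = path.rstrip("/").split("/")
        if len(parts) < 2:
            continue
        delta, name = parts[-2], parts[-1]
        bucket = bucket_from_name(name)
        # Directory entries themselves do not match RAW_FILE and are skipped.
        if delta.startswith("delta_") and bucket is not None:
            result[delta].add(bucket)
    return dict(result)


def missing_buckets(paths):
    """List, per delta, the 'bucket' ids present elsewhere but absent here."""
    per_delta = buckets_per_delta(paths)
    all_buckets = set().union(*per_delta.values()) if per_delta else set()
    return {
        delta: sorted(all_buckets - found)
        for delta, found in per_delta.items()
        if all_buckets - found
    }


if __name__ == "__main__":
    print(missing_buckets(PATHS))
```

Running it on the listing from the issue reports that only the first delta directory is missing 'bucket' 2, which is the gap the description says compaction fails on.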