[jira] [Commented] (HIVE-10980) Merge of dynamic partitions loads all data to default partition
    [ https://issues.apache.org/jira/browse/HIVE-10980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14742145#comment-14742145 ]

Ashutosh Chauhan commented on HIVE-10980:
-----------------------------------------

+1

> Merge of dynamic partitions loads all data to default partition
> ---------------------------------------------------------------
>
>                 Key: HIVE-10980
>                 URL: https://issues.apache.org/jira/browse/HIVE-10980
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 0.14.0
>         Environment: HDP 2.2.4 (also reproduced on apache hive built from trunk)
>            Reporter: Illya Yalovyy
>            Assignee: Illya Yalovyy
>         Attachments: HIVE-10980.patch
>
>
> Conditions that lead to the issue:
> 1. Execution engine set to MapReduce
> 2. Partition columns have different types
> 3. Both static and dynamic partitions are used in the query
> 4. Dynamically generated partitions require merge
> Result: Final data is loaded to "__HIVE_DEFAULT_PARTITION__".
> Steps to reproduce:
> set hive.exec.dynamic.partition=true;
> set hive.exec.dynamic.partition.mode=strict;
> set hive.optimize.sort.dynamic.partition=false;
> set hive.merge.mapfiles=true;
> set hive.merge.mapredfiles=true;
> set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
> set hive.execution.engine=mr;
> create external table sdp (
>   dataint bigint,
>   hour int,
>   req string,
>   cid string,
>   caid string
> )
> row format delimited
> fields terminated by ',';
> load data local inpath '../../data/files/dynpartdata1.txt' into table sdp;
> load data local inpath '../../data/files/dynpartdata2.txt' into table sdp;
> ...
> load data local inpath '../../data/files/dynpartdataN.txt' into table sdp;
> create table tdp (cid string, caid string)
> partitioned by (dataint bigint, hour int, req string);
> insert overwrite table tdp partition (dataint=20150316, hour=16, req)
> select cid, caid, req from sdp where dataint=20150316 and hour=16;
> select * from tdp order by caid;
> show partitions tdp;
> Example of the input file:
> 20150316,16,reqA,clusterIdA,cacheId1
> 20150316,16,reqB,clusterIdB,cacheId2
> 20150316,16,reqA,clusterIdC,cacheId3
> 20150316,16,reqD,clusterIdD,cacheId4
> 20150316,16,reqA,clusterIdA,cacheId5
> Actual result:
> clusterIdA  cacheId1  20150316  16  __HIVE_DEFAULT_PARTITION__
> clusterIdA  cacheId1  20150316  16  __HIVE_DEFAULT_PARTITION__
> clusterIdB  cacheId2  20150316  16  __HIVE_DEFAULT_PARTITION__
> clusterIdC  cacheId3  20150316  16  __HIVE_DEFAULT_PARTITION__
> clusterIdD  cacheId4  20150316  16  __HIVE_DEFAULT_PARTITION__
> clusterIdA  cacheId5  20150316  16  __HIVE_DEFAULT_PARTITION__
> clusterIdD  cacheId8  20150316  16  __HIVE_DEFAULT_PARTITION__
> clusterIdB  cacheId9  20150316  16  __HIVE_DEFAULT_PARTITION__
>
> dataint=20150316/hour=16/req=__HIVE_DEFAULT_PARTITION__

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
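The ticket lists only the actual result. As a rough sketch of what the reproduction would presumably produce once the bug is fixed, the Python snippet below groups the five sample input rows by their `req` value and prints the partition path each group should land in. It does not call Hive. The names `SAMPLE_ROWS` and `expected_partitions` are hypothetical, and the `key=value/...` path layout is assumed from the `show partitions` output quoted in the ticket.

```python
from collections import defaultdict

# Sample rows copied from the ticket's "Example of the input file".
# Column order follows the sdp table: dataint, hour, req, cid, caid.
SAMPLE_ROWS = """\
20150316,16,reqA,clusterIdA,cacheId1
20150316,16,reqB,clusterIdB,cacheId2
20150316,16,reqA,clusterIdC,cacheId3
20150316,16,reqD,clusterIdD,cacheId4
20150316,16,reqA,clusterIdA,cacheId5
"""


def expected_partitions(text, dataint="20150316", hour="16"):
    """Group (cid, caid) pairs by the partition path they should land in.

    Hypothetical helper: it only mirrors the WHERE clause and the
    partition spec of the INSERT statement in the reproduction steps.
    """
    partitions = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        row_dataint, row_hour, req, cid, caid = line.split(",")
        # Same filter as: where dataint=20150316 and hour=16
        if row_dataint == dataint and row_hour == hour:
            path = f"dataint={row_dataint}/hour={row_hour}/req={req}"
            partitions[path].append((cid, caid))
    return dict(partitions)


if __name__ == "__main__":
    for path, rows in sorted(expected_partitions(SAMPLE_ROWS).items()):
        print(path, rows)
```

Under these assumptions the sample rows map to three partitions (`req=reqA`, `req=reqB`, `req=reqD`) rather than the single `req=__HIVE_DEFAULT_PARTITION__` partition reported above.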
[jira] [Commented] (HIVE-10980) Merge of dynamic partitions loads all data to default partition
    [ https://issues.apache.org/jira/browse/HIVE-10980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14741553#comment-14741553 ]

Illya Yalovyy commented on HIVE-10980:
--------------------------------------

Thank you! The patch is re-submitted.
[jira] [Commented] (HIVE-10980) Merge of dynamic partitions loads all data to default partition
    [ https://issues.apache.org/jira/browse/HIVE-10980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14741524#comment-14741524 ]

Lefty Leverenz commented on HIVE-10980:
---------------------------------------

[~yalovyyi], you can re-run tests by cancelling the patch (using a button on the top line) and then resubmitting it (using the Submit Patch button that will appear in the same place as the Cancel Patch button).
[jira] [Commented] (HIVE-10980) Merge of dynamic partitions loads all data to default partition
    [ https://issues.apache.org/jira/browse/HIVE-10980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14741476#comment-14741476 ]

Illya Yalovyy commented on HIVE-10980:
--------------------------------------

[~gopalv], I have reviewed the failed tests:

org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation - was failing for many builds before my patch.
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_mult_tables_compact - is failing for other patches as well, and I wasn't able to reproduce this failure locally.

What is the best way to re-run tests?
[jira] [Commented] (HIVE-10980) Merge of dynamic partitions loads all data to default partition
    [ https://issues.apache.org/jira/browse/HIVE-10980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14740210#comment-14740210 ]

Hive QA commented on HIVE-10980:
--------------------------------

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12755080/HIVE-10980.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9437 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_mult_tables_compact
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5232/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5232/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5232/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12755080 - PreCommit-HIVE-TRUNK-Build
[jira] [Commented] (HIVE-10980) Merge of dynamic partitions loads all data to default partition
    [ https://issues.apache.org/jira/browse/HIVE-10980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739561#comment-14739561 ]

Illya Yalovyy commented on HIVE-10980:
--------------------------------------

Patch is submitted for review: https://reviews.apache.org/r/38268/
[jira] [Commented] (HIVE-10980) Merge of dynamic partitions loads all data to default partition
    [ https://issues.apache.org/jira/browse/HIVE-10980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14583655#comment-14583655 ]

Illya Yalovyy commented on HIVE-10980:
--------------------------------------

I have a patch for this. Will upload it soon.
[jira] [Commented] (HIVE-10980) Merge of dynamic partitions loads all data to default partition
    [ https://issues.apache.org/jira/browse/HIVE-10980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14582092#comment-14582092 ]

Illya Yalovyy commented on HIVE-10980:
--------------------------------------

Good point. I observed this behavior on MapReduce. I'll update the ticket.

> Merge of dynamic partitions loads all data to default partition
> ---------------------------------------------------------------
>
>                 Key: HIVE-10980
>                 URL: https://issues.apache.org/jira/browse/HIVE-10980
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 0.14.0
>         Environment: HDP 2.2.4 (also reproduced on apache hive built from trunk)
>            Reporter: Illya Yalovyy
>
> Conditions that lead to the issue:
> 1. Partition columns have different types
> 2. Both static and dynamic partitions are used in the query
> 3. Dynamically generated partitions require merge
> Result: Final data is loaded to "__HIVE_DEFAULT_PARTITION__".
> Steps to reproduce:
> set hive.exec.dynamic.partition=true;
> set hive.exec.dynamic.partition.mode=strict;
> set hive.optimize.sort.dynamic.partition=false;
> set hive.merge.mapfiles=true;
> set hive.merge.mapredfiles=true;
> set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
> create external table sdp (
>   dataint bigint,
>   hour int,
>   req string,
>   cid string,
>   caid string
> )
> row format delimited
> fields terminated by ',';
> load data local inpath '../../data/files/dynpartdata1.txt' into table sdp;
> load data local inpath '../../data/files/dynpartdata2.txt' into table sdp;
> ...
> load data local inpath '../../data/files/dynpartdataN.txt' into table sdp;
> create table tdp (cid string, caid string)
> partitioned by (dataint bigint, hour int, req string);
> insert overwrite table tdp partition (dataint=20150316, hour=16, req)
> select cid, caid, req from sdp where dataint=20150316 and hour=16;
> select * from tdp order by caid;
> show partitions tdp;
> Example of the input file:
> 20150316,16,reqA,clusterIdA,cacheId1
> 20150316,16,reqB,clusterIdB,cacheId2
> 20150316,16,reqA,clusterIdC,cacheId3
> 20150316,16,reqD,clusterIdD,cacheId4
> 20150316,16,reqA,clusterIdA,cacheId5
> Actual result:
> clusterIdA  cacheId1  20150316  16  __HIVE_DEFAULT_PARTITION__
> clusterIdA  cacheId1  20150316  16  __HIVE_DEFAULT_PARTITION__
> clusterIdB  cacheId2  20150316  16  __HIVE_DEFAULT_PARTITION__
> clusterIdC  cacheId3  20150316  16  __HIVE_DEFAULT_PARTITION__
> clusterIdD  cacheId4  20150316  16  __HIVE_DEFAULT_PARTITION__
> clusterIdA  cacheId5  20150316  16  __HIVE_DEFAULT_PARTITION__
> clusterIdD  cacheId8  20150316  16  __HIVE_DEFAULT_PARTITION__
> clusterIdB  cacheId9  20150316  16  __HIVE_DEFAULT_PARTITION__
>
> dataint=20150316/hour=16/req=__HIVE_DEFAULT_PARTITION__
[jira] [Commented] (HIVE-10980) Merge of dynamic partitions loads all data to default partition
    [ https://issues.apache.org/jira/browse/HIVE-10980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14581540#comment-14581540 ]

Gopal V commented on HIVE-10980:
--------------------------------

Are you using MapReduce or Tez?