[jira] [Commented] (HIVE-2117) insert overwrite ignoring partition location
[ https://issues.apache.org/jira/browse/HIVE-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038354#comment-13038354 ]

Hudson commented on HIVE-2117:
------------------------------

Integrated in Hive-trunk-h0.21 #745 (See [https://builds.apache.org/hudson/job/Hive-trunk-h0.21/745/])
HIVE-2117. Insert overwrite ignoring partition location (Patrick Hunt via cws)

cws : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1126726
Files :
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/TestLocationQueries.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/TestMTQueries.java
* /hive/trunk/ql/src/test/results/clientpositive/alter5.q.out
* /hive/trunk/ql/src/test/queries/clientpositive/alter5.q
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/BaseTestQueries.java

> insert overwrite ignoring partition location
> --------------------------------------------
>
>                 Key: HIVE-2117
>                 URL: https://issues.apache.org/jira/browse/HIVE-2117
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore, Query Processor
>    Affects Versions: 0.7.1, 0.8.0
>            Reporter: Patrick Hunt
>            Assignee: Patrick Hunt
>            Priority: Blocker
>         Attachments: HIVE-2117_br07.patch, HIVE-2117_br07.patch, HIVE-2117_trunk.patch, data.txt
>
>
> The following code works differently in 0.5.0 vs 0.7.0.
> In 0.5.0 the partition location is respected.
> However, in 0.7.0, while the initial partition is created with the specified
> location "/parta", the "insert overwrite ..." results in the partition being
> written to "/dt=a" (note that the table location is the same in both cases).
> {code}
> create table foo_stg (bar INT, car INT);
> load data local inpath 'data.txt' into table foo_stg;
>
> create table foo4 (bar INT, car INT) partitioned by (dt STRING) LOCATION '/user/hive/warehouse/foo4';
> alter table foo4 add partition (dt='a') location '/user/hive/warehouse/foo4/parta';
>
> from foo_stg fs insert overwrite table foo4 partition (dt='a') select *;
> {code}
> From what I can tell, HIVE-1707 introduced this via a change to
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Path, String, Map, boolean, boolean),
> specifically:
> {code}
> +      Path partPath = new Path(tbl.getDataLocation().getPath(),
> +          Warehouse.makePartPath(partSpec));
> +
> +      Path newPartPath = new Path(loadPath.toUri().getScheme(), loadPath
> +          .toUri().getAuthority(), partPath.toUri().getPath());
> {code}
> Reading the description on HIVE-1707, it seems that this may have been done
> purposefully. However, given that the partition location is explicitly specified
> for the partition in question, it seems like that should be honored (esp. given
> that the table location has not changed).
> This difference in behavior is causing a regression in existing production
> Hive-based code. I'd like to take a stab at addressing this; any suggestions?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
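To make the reported behavior concrete, here is a minimal sketch of the path computation quoted from Hive.loadPartition above, written against java.net.URI rather than the Hadoop Path class so that it runs standalone. The class name, the method names, the simplified makePartPath helper (the real Warehouse.makePartPath may also escape special characters), and the "nn:8020" host are all assumptions for illustration, not Hive code. The point it shows is that the target is derived only from the table location and the partition spec, so a location set with "alter table ... add partition ... location" never enters the calculation.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: mimics the two quoted statements without Hadoop on the classpath.
final class LegacyPartitionPathSketch {

    // Hypothetical, simplified stand-in for Warehouse.makePartPath:
    // joins key=value pairs with '/', with no escaping.
    static String makePartPath(Map<String, String> partSpec) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : partSpec.entrySet()) {
            if (sb.length() > 0) {
                sb.append('/');
            }
            sb.append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    // Target = scheme and authority of the load path + table path + partition spec.
    // Note that no per-partition location is consulted anywhere.
    static URI legacyTarget(URI tableLocation, URI loadPath, Map<String, String> partSpec) {
        String partPath = tableLocation.getPath() + "/" + makePartPath(partSpec);
        try {
            return new URI(loadPath.getScheme(), loadPath.getAuthority(), partPath, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> spec = new LinkedHashMap<>();
        spec.put("dt", "a");
        URI table = URI.create("hdfs://nn:8020/user/hive/warehouse/foo4");
        URI load = URI.create("hdfs://nn:8020/tmp/hive-scratch/10000");
        // The partition was registered at .../foo4/parta, but the result ends in /dt=a.
        System.out.println(legacyTarget(table, load, spec));
    }
}
```

Under these assumptions the computed target ends in "/dt=a" regardless of where the partition was registered, which matches the behavior described in the issue.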
[jira] [Commented] (HIVE-2117) insert overwrite ignoring partition location
[ https://issues.apache.org/jira/browse/HIVE-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037096#comment-13037096 ]

Carl Steinbach commented on HIVE-2117:
--------------------------------------

+1. Will commit if tests pass.
[jira] [Commented] (HIVE-2117) insert overwrite ignoring partition location
[ https://issues.apache.org/jira/browse/HIVE-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036914#comment-13036914 ]

jirapos...@reviews.apache.org commented on HIVE-2117:
-----------------------------------------------------

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/773/
-----------------------------------------------------------

Review request for hive and Carl Steinbach.

Summary
-------

This change resolves a regression introduced by HIVE-1707, specifically that the partition location (set via alter table partition location) is not being respected. I addressed this by using the user-specified location (as was done originally), except in the case of cross-filesystem moves (which was the concern in HIVE-1707).

This addresses bug HIVE-2117.
    https://issues.apache.org/jira/browse/HIVE-2117

Diffs
-----

  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java bcacd35
  ql/src/test/org/apache/hadoop/hive/ql/BaseTestQueries.java PRE-CREATION
  ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java 06a0447
  ql/src/test/org/apache/hadoop/hive/ql/TestLocationQueries.java PRE-CREATION
  ql/src/test/org/apache/hadoop/hive/ql/TestMTQueries.java 8c7c0b8
  ql/src/test/queries/clientpositive/alter5.q PRE-CREATION
  ql/src/test/results/clientpositive/alter5.q.out PRE-CREATION

Diff: https://reviews.apache.org/r/773/diff

Testing
-------

I added a new test which verifies the partition location explicitly, as the existing tests ignore this detail. The test fails without my fix and passes with it applied.

Thanks,

Patrick
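The approach described in the review summary for this message (honor the user-specified partition location, except for cross-filesystem moves) could be sketched roughly as follows. This is not the committed patch: the class and method names are hypothetical, java.net.URI stands in for the Hadoop Path and FileSystem classes, and treating "same scheme and authority" as "same filesystem" is a simplifying assumption made only for this example.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Objects;

// Illustrative only: one possible reading of "use the partition's own location
// unless the load path lives on a different filesystem".
final class PartitionTargetSketch {

    // Assumption: two URIs are on the same filesystem when scheme and authority match.
    static boolean sameFileSystem(URI a, URI b) {
        return Objects.equals(a.getScheme(), b.getScheme())
            && Objects.equals(a.getAuthority(), b.getAuthority());
    }

    // partitionLocation is the location recorded for the existing partition
    // (for example, the one given in "alter table ... add partition ... location").
    static URI resolveTarget(URI partitionLocation, URI loadPath) {
        if (sameFileSystem(partitionLocation, loadPath)) {
            // Same filesystem: honor the user-specified location as-is.
            return partitionLocation;
        }
        try {
            // Cross-filesystem move: keep the partition's path, but re-root it
            // on the filesystem that holds the data being loaded.
            return new URI(loadPath.getScheme(), loadPath.getAuthority(),
                    partitionLocation.getPath(), null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI part = URI.create("hdfs://nn:8020/user/hive/warehouse/foo4/parta");
        System.out.println(resolveTarget(part, URI.create("hdfs://nn:8020/tmp/scratch")));
        System.out.println(resolveTarget(part, URI.create("hdfs://other:8020/tmp/scratch")));
    }
}
```

In both branches the path component still ends in "/parta" rather than "/dt=a", which is the property the new location test is described as checking.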
[jira] [Commented] (HIVE-2117) insert overwrite ignoring partition location
[ https://issues.apache.org/jira/browse/HIVE-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036912#comment-13036912 ]

Patrick Hunt commented on HIVE-2117:
------------------------------------

I posted reviews up on reviewboard:
trunk: https://reviews.apache.org/r/773/
branch-0.7: https://reviews.apache.org/r/772/
[jira] [Commented] (HIVE-2117) insert overwrite ignoring partition location
[ https://issues.apache.org/jira/browse/HIVE-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036908#comment-13036908 ]

jirapos...@reviews.apache.org commented on HIVE-2117:
-----------------------------------------------------

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/772/
-----------------------------------------------------------

Review request for hive and Carl Steinbach.

Summary
-------

This change resolves a regression introduced by HIVE-1707, specifically that the partition location (set via alter table partition location) is not being respected. I addressed this by using the user-specified location (as was done originally), except in the case of cross-filesystem moves (which was the concern in HIVE-1707).

This addresses bug HIVE-2117.
    https://issues.apache.org/jira/browse/HIVE-2117

Diffs
-----

  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 916b235
  ql/src/test/org/apache/hadoop/hive/ql/BaseTestQueries.java PRE-CREATION
  ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java 4685471
  ql/src/test/org/apache/hadoop/hive/ql/TestLocationQueries.java PRE-CREATION
  ql/src/test/org/apache/hadoop/hive/ql/TestMTQueries.java 8c7c0b8
  ql/src/test/queries/clientpositive/alter5.q PRE-CREATION
  ql/src/test/results/clientpositive/alter5.q.out PRE-CREATION

Diff: https://reviews.apache.org/r/772/diff

Testing
-------

I added a new test which verifies the partition location explicitly, as the existing tests ignore this detail. The test fails without my fix and passes with it applied.

Thanks,

Patrick