[jira] [Updated] (HIVE-2117) insert overwrite ignoring partition location
[ https://issues.apache.org/jira/browse/HIVE-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-2117:
---------------------------------
    Affects Version/s: 0.7.1

Backported to branch-0.7

insert overwrite ignoring partition location
Key: HIVE-2117
URL: https://issues.apache.org/jira/browse/HIVE-2117
Project: Hive
Issue Type: Bug
Components: Metastore, Query Processor
Affects Versions: 0.7.1, 0.8.0
Reporter: Patrick Hunt
Assignee: Patrick Hunt
Priority: Blocker
Attachments: HIVE-2117_br07.patch, HIVE-2117_br07.patch, HIVE-2117_trunk.patch, data.txt

The following code works differently in 0.5.0 vs 0.7.0. In 0.5.0 the partition location is respected. In 0.7.0, however, while the initial partition is created with the specified location path/parta, the insert overwrite ... results in the partition being written to path/dt=a (note that path is the same in both cases).

{code}
create table foo_stg (bar INT, car INT);
load data local inpath 'data.txt' into table foo_stg;
create table foo4 (bar INT, car INT) partitioned by (dt STRING) LOCATION '/user/hive/warehouse/foo4';
alter table foo4 add partition (dt='a') location '/user/hive/warehouse/foo4/parta';
from foo_stg fs insert overwrite table foo4 partition (dt='a') select *;
{code}

From what I can tell, HIVE-1707 introduced this via a change to org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Path, String, Map<String, String>, boolean, boolean), specifically:

{code}
+      Path partPath = new Path(tbl.getDataLocation().getPath(),
+          Warehouse.makePartPath(partSpec));
+
+      Path newPartPath = new Path(loadPath.toUri().getScheme(), loadPath
+          .toUri().getAuthority(), partPath.toUri().getPath());
{code}

Reading the description on HIVE-1707, it seems that this may have been done purposefully. However, given that the partition location is explicitly specified for the partition in question, it seems like that should be honored (especially given that the table location has not changed).

This difference in behavior is causing a regression in existing production Hive-based code. I'd like to take a stab at addressing this; any suggestions?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
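To make the path computation in the report easier to follow, here is a minimal, self-contained sketch in plain Java. It is not the Hive code and not the attached patch: the class PartitionPathSketch and its helpers (makePartPath, qualify, derivedFromTable, honoringPartition) are hypothetical names, makePartPath is a simplified stand-in for Warehouse.makePartPath that skips escaping, and the hdfs://namenode:8020 staging URI is made up. It only contrasts deriving the destination from the table location (the behavior described for 0.7.0) with preferring an explicitly specified partition location (the behavior the report asks for).

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Illustrative sketch only; none of these helpers are Hive APIs. */
final class PartitionPathSketch {

    /** Simplified stand-in for Warehouse.makePartPath: "k1=v1/k2=v2", no escaping. */
    static String makePartPath(Map<String, String> partSpec) {
        StringJoiner joiner = new StringJoiner("/");
        for (Map.Entry<String, String> entry : partSpec.entrySet()) {
            joiner.add(entry.getKey() + "=" + entry.getValue());
        }
        return joiner.toString();
    }

    /** Reuses the scheme and authority of the load path, as the quoted diff does. */
    static String qualify(URI loadPath, String path) {
        try {
            return new URI(loadPath.getScheme(), loadPath.getAuthority(), path, null, null)
                    .toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Bad path: " + path, e);
        }
    }

    /** Behavior described for 0.7.0: destination is always derived from the table location. */
    static String derivedFromTable(URI loadPath, String tableLocation,
                                   Map<String, String> partSpec) {
        return qualify(loadPath, tableLocation + "/" + makePartPath(partSpec));
    }

    /** Behavior the report asks for: an explicit partition location, when present, wins. */
    static String honoringPartition(URI loadPath, String tableLocation,
                                    Map<String, String> partSpec, String partitionLocation) {
        if (partitionLocation != null) {
            return qualify(loadPath, partitionLocation);
        }
        return derivedFromTable(loadPath, tableLocation, partSpec);
    }

    public static void main(String[] args) {
        // Hypothetical staging URI; host and directory are invented for the example.
        URI loadPath = URI.create("hdfs://namenode:8020/tmp/hive-staging/out");
        Map<String, String> partSpec = new LinkedHashMap<>();
        partSpec.put("dt", "a");
        String table = "/user/hive/warehouse/foo4";

        // prints hdfs://namenode:8020/user/hive/warehouse/foo4/dt=a
        System.out.println(derivedFromTable(loadPath, table, partSpec));
        // prints hdfs://namenode:8020/user/hive/warehouse/foo4/parta
        System.out.println(honoringPartition(loadPath, table, partSpec, table + "/parta"));
    }
}
```

With the table and partition from the reproduction script, the first call yields the dt=a directory under the table location and the second yields the parta directory given in the alter table statement.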
[jira] [Updated] (HIVE-2117) insert overwrite ignoring partition location
Carl Steinbach updated HIVE-2117:
---------------------------------
    Resolution: Fixed
    Hadoop Flags: [Reviewed]
    Status: Resolved (was: Patch Available)

Committed to trunk. Thanks Patrick!

insert overwrite ignoring partition location
Key: HIVE-2117
URL: https://issues.apache.org/jira/browse/HIVE-2117
Project: Hive
Issue Type: Bug
Components: Metastore, Query Processor
Affects Versions: 0.8.0
Reporter: Patrick Hunt
Assignee: Patrick Hunt
Priority: Blocker
Attachments: HIVE-2117_br07.patch, HIVE-2117_br07.patch, HIVE-2117_trunk.patch, data.txt
[jira] [Updated] (HIVE-2117) insert overwrite ignoring partition location
Carl Steinbach updated HIVE-2117:
---------------------------------
    Affects Version/s: (was: 0.7.1)

insert overwrite ignoring partition location
Key: HIVE-2117
URL: https://issues.apache.org/jira/browse/HIVE-2117
Project: Hive
Issue Type: Bug
Components: Metastore, Query Processor
Affects Versions: 0.8.0
Reporter: Patrick Hunt
Assignee: Patrick Hunt
Priority: Blocker
Attachments: HIVE-2117_br07.patch, HIVE-2117_br07.patch, HIVE-2117_trunk.patch, data.txt
[jira] [Updated] (HIVE-2117) insert overwrite ignoring partition location
Patrick Hunt updated HIVE-2117:
-------------------------------
    Status: Patch Available (was: Open)

insert overwrite ignoring partition location
Key: HIVE-2117
URL: https://issues.apache.org/jira/browse/HIVE-2117
Project: Hive
Issue Type: Bug
Affects Versions: 0.7.0, 0.8.0
Reporter: Patrick Hunt
Assignee: Patrick Hunt
Priority: Blocker
Attachments: HIVE-2117_br07.patch, HIVE-2117_br07.patch, HIVE-2117_trunk.patch, data.txt
[jira] [Updated] (HIVE-2117) insert overwrite ignoring partition location
Patrick Hunt updated HIVE-2117:
-------------------------------
    Attachment: HIVE-2117_trunk.patch
                HIVE-2117_br07.patch

Updated patch files for branch 0.7 and trunk. This fixes the problem -- I've also added a new test which verifies the location used for the partition. I verified this failed before my patch and passes after applying my patch.

insert overwrite ignoring partition location
Key: HIVE-2117
URL: https://issues.apache.org/jira/browse/HIVE-2117
Project: Hive
Issue Type: Bug
Affects Versions: 0.7.0, 0.8.0
Reporter: Patrick Hunt
Assignee: Patrick Hunt
Priority: Blocker
Attachments: HIVE-2117_br07.patch, HIVE-2117_br07.patch, HIVE-2117_trunk.patch, data.txt
[jira] [Updated] (HIVE-2117) insert overwrite ignoring partition location
Patrick Hunt updated HIVE-2117:
-------------------------------
    Attachment: HIVE-2117_br07.patch

This patch is a work in progress, it resolves this jira in branch 0.7 while maintaining compatibility with the requirements from HIVE-1707. All unit tests are passing with this patch applied, also fixes the example I provided in the description (managed and external table).

I'm still working on two aspects of this JIRA; 1) creating a patch for trunk, and 2) adding unit tests to verify this behavior.

insert overwrite ignoring partition location
Key: HIVE-2117
URL: https://issues.apache.org/jira/browse/HIVE-2117
Project: Hive
Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Patrick Hunt
Priority: Critical
Attachments: HIVE-2117_br07.patch, data.txt
[jira] [Updated] (HIVE-2117) insert overwrite ignoring partition location
Patrick Hunt updated HIVE-2117:
-------------------------------
    Priority: Blocker (was: Critical)
    Affects Version/s: 0.8.0

insert overwrite ignoring partition location
Key: HIVE-2117
URL: https://issues.apache.org/jira/browse/HIVE-2117
Project: Hive
Issue Type: Bug
Affects Versions: 0.7.0, 0.8.0
Reporter: Patrick Hunt
Priority: Blocker
Attachments: HIVE-2117_br07.patch, data.txt
[jira] [Updated] (HIVE-2117) insert overwrite ignoring partition location
Patrick Hunt updated HIVE-2117:
-------------------------------
    Attachment: data.txt

single row of data that is used to reproduce the issue

insert overwrite ignoring partition location
Key: HIVE-2117
URL: https://issues.apache.org/jira/browse/HIVE-2117
Project: Hive
Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Patrick Hunt
Priority: Critical
Attachments: data.txt