[jira] [Updated] (HIVE-2117) insert overwrite ignoring partition location

2011-06-13 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-2117:
-

Affects Version/s: 0.7.1

Backported to branch-0.7

 insert overwrite ignoring partition location
 

 Key: HIVE-2117
 URL: https://issues.apache.org/jira/browse/HIVE-2117
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Query Processor
Affects Versions: 0.7.1, 0.8.0
Reporter: Patrick Hunt
Assignee: Patrick Hunt
Priority: Blocker
 Attachments: HIVE-2117_br07.patch, HIVE-2117_br07.patch, 
 HIVE-2117_trunk.patch, data.txt


 The following code works differently in 0.5.0 vs 0.7.0.
 In 0.5.0 the partition location is respected. 
 However, in 0.7.0, while the initial partition is created with the specified 
 location path/parta, the insert overwrite ... results in the partition being 
 written to path/dt=a (note that path is the same in both cases).
 {code}
 create table foo_stg (bar INT, car INT); 
 load data local inpath 'data.txt' into table foo_stg;
  
 create table foo4 (bar INT, car INT) partitioned by (dt STRING) LOCATION 
 '/user/hive/warehouse/foo4'; 
 alter table foo4 add partition (dt='a') location 
 '/user/hive/warehouse/foo4/parta';
  
 from foo_stg fs insert overwrite table foo4 partition (dt='a') select *;
 {code}
 From what I can tell HIVE-1707 introduced this via a change to
 org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Path, String, 
 Map<String, String>, boolean, boolean)
 specifically:
 {code}
 +  Path partPath = new Path(tbl.getDataLocation().getPath(),
 +  Warehouse.makePartPath(partSpec));
 +
 +  Path newPartPath = new Path(loadPath.toUri().getScheme(), loadPath
 +  .toUri().getAuthority(), partPath.toUri().getPath());
 {code}
 Reading the description on HIVE-1707 it seems that this may have been done 
 purposefully. However, given that the partition location is explicitly 
 specified for the partition in question, it seems like that should be honored 
 (especially given that the table location has not changed).
 This difference in behavior is causing a regression in existing production 
 Hive-based code. I'd like to take a stab at addressing this. Any suggestions?
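
For illustration, here is a minimal Java sketch of the two behaviors described above. It is not Hive's code and not the attached patch: the class and method names are hypothetical, paths are plain strings rather than Hadoop Path objects, the hdfs://namenode:8020 authority and scratch path are made up, and partition values are not escaped the way Warehouse.makePartPath would escape them. It only shows the idea that a location registered for the partition should win over the one derived from the table location, while the scheme and authority are still taken from the load path as in the snippet quoted above.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only; every name in this class is hypothetical. */
class PartitionPathSketch {

    /** Builds "k1=v1/k2=v2" from a partition spec (no escaping in this sketch). */
    public static String makePartName(Map<String, String> partSpec) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : partSpec.entrySet()) {
            if (sb.length() > 0) {
                sb.append('/');
            }
            sb.append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    /** Reported behavior: the directory is always derived from the table location. */
    public static String derivedFromTable(String tablePath, Map<String, String> partSpec) {
        return tablePath + "/" + makePartName(partSpec);
    }

    /** Expected behavior: a location registered for the partition takes precedence. */
    public static String honoringPartition(String tablePath, String partitionPath,
                                           Map<String, String> partSpec) {
        return partitionPath != null ? partitionPath : derivedFromTable(tablePath, partSpec);
    }

    /** Scheme and authority come from the load path; only the path component changes. */
    public static URI qualify(URI loadPath, String path) {
        try {
            return new URI(loadPath.getScheme(), loadPath.getAuthority(), path, null, null);
        } catch (URISyntaxException ex) {
            throw new IllegalArgumentException(ex);
        }
    }

    public static void main(String[] args) {
        Map<String, String> spec = new LinkedHashMap<>();
        spec.put("dt", "a");
        String table = "/user/hive/warehouse/foo4";
        String parta = "/user/hive/warehouse/foo4/parta";
        URI load = URI.create("hdfs://namenode:8020/tmp/hive-scratch/out"); // made-up load path

        System.out.println(qualify(load, derivedFromTable(table, spec)));
        System.out.println(qualify(load, honoringPartition(table, parta, spec)));
    }
}
```

With the tables from the example, the first call corresponds to the path/dt=a result and the second to the path/parta location given in the alter table statement.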

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HIVE-2117) insert overwrite ignoring partition location

2011-05-24 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-2117:
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks Patrick!

 insert overwrite ignoring partition location
 

 Key: HIVE-2117
 URL: https://issues.apache.org/jira/browse/HIVE-2117
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Query Processor
Affects Versions: 0.8.0
Reporter: Patrick Hunt
Assignee: Patrick Hunt
Priority: Blocker
 Attachments: HIVE-2117_br07.patch, HIVE-2117_br07.patch, 
 HIVE-2117_trunk.patch, data.txt




[jira] [Updated] (HIVE-2117) insert overwrite ignoring partition location

2011-05-24 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-2117:
-

Affects Version/s: (was: 0.7.1)

 insert overwrite ignoring partition location
 

 Key: HIVE-2117
 URL: https://issues.apache.org/jira/browse/HIVE-2117
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Query Processor
Affects Versions: 0.8.0
Reporter: Patrick Hunt
Assignee: Patrick Hunt
Priority: Blocker
 Attachments: HIVE-2117_br07.patch, HIVE-2117_br07.patch, 
 HIVE-2117_trunk.patch, data.txt




[jira] [Updated] (HIVE-2117) insert overwrite ignoring partition location

2011-05-20 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated HIVE-2117:
---

Status: Patch Available  (was: Open)

 insert overwrite ignoring partition location
 

 Key: HIVE-2117
 URL: https://issues.apache.org/jira/browse/HIVE-2117
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.7.0, 0.8.0
Reporter: Patrick Hunt
Assignee: Patrick Hunt
Priority: Blocker
 Attachments: HIVE-2117_br07.patch, HIVE-2117_br07.patch, 
 HIVE-2117_trunk.patch, data.txt




[jira] [Updated] (HIVE-2117) insert overwrite ignoring partition location

2011-05-19 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated HIVE-2117:
---

Attachment: HIVE-2117_trunk.patch
HIVE-2117_br07.patch

Updated patch files for branch 0.7 and trunk.

This fixes the problem -- I've also added a new test which verifies the 
location used for the partition. I verified this failed before my patch and 
passes after applying my patch.

 insert overwrite ignoring partition location
 

 Key: HIVE-2117
 URL: https://issues.apache.org/jira/browse/HIVE-2117
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.7.0, 0.8.0
Reporter: Patrick Hunt
Assignee: Patrick Hunt
Priority: Blocker
 Attachments: HIVE-2117_br07.patch, HIVE-2117_br07.patch, 
 HIVE-2117_trunk.patch, data.txt




[jira] [Updated] (HIVE-2117) insert overwrite ignoring partition location

2011-04-21 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated HIVE-2117:
---

Attachment: HIVE-2117_br07.patch

This patch is a work in progress. It resolves this JIRA in branch 0.7 while 
maintaining compatibility with the requirements from HIVE-1707. All unit tests 
pass with this patch applied, and it also fixes the example I provided in the 
description (for both managed and external tables).

I'm still working on two aspects of this JIRA: 1) creating a patch for trunk, 
and 2) adding unit tests to verify this behavior.


 insert overwrite ignoring partition location
 

 Key: HIVE-2117
 URL: https://issues.apache.org/jira/browse/HIVE-2117
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Patrick Hunt
Priority: Critical
 Attachments: HIVE-2117_br07.patch, data.txt




[jira] [Updated] (HIVE-2117) insert overwrite ignoring partition location

2011-04-21 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated HIVE-2117:
---

 Priority: Blocker  (was: Critical)
Affects Version/s: 0.8.0

 insert overwrite ignoring partition location
 

 Key: HIVE-2117
 URL: https://issues.apache.org/jira/browse/HIVE-2117
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.7.0, 0.8.0
Reporter: Patrick Hunt
Priority: Blocker
 Attachments: HIVE-2117_br07.patch, data.txt




[jira] [Updated] (HIVE-2117) insert overwrite ignoring partition location

2011-04-18 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated HIVE-2117:
---

Attachment: data.txt

A single row of data that is used to reproduce the issue.

 insert overwrite ignoring partition location
 

 Key: HIVE-2117
 URL: https://issues.apache.org/jira/browse/HIVE-2117
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Patrick Hunt
Priority: Critical
 Attachments: data.txt

