Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3

2016-08-11 Thread Lefty Leverenz


> On Aug. 10, 2016, 5:31 a.m., Lefty Leverenz wrote:
> > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java, lines 3091-3092
> > 
> >
> > Tiny nit:  Either make "It" lowercase or move the parenthetical 
> > sentence after the first sentence, with a final period like this:
> > 
> > "Enable the use of scratch directories directly on blob storage 
> > systems. (It may cause performance penalties.)"

Looks good now.  +1 for the parameter descriptions.


- Lefty


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/#review145307
---



Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3

2016-08-10 Thread Sergio Pena

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/
---

(Updated Aug. 10, 2016, 9:08 p.m.)


Review request for hive.


Changes
---

Changes on this patch:
- Use getTempDirForPath() for the statistics temp file and GenMapRedUtils temp 
file.


Bugs: HIVE-14270
https://issues.apache.org/jira/browse/HIVE-14270


Repository: hive-git


Description
---

This patch will create a temporary directory for Hive intermediate data on HDFS 
when S3 tables are used.


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/common/BlobStorageUtils.java 
PRE-CREATION 
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 
9f5f619359701b948f57d599a5bdc2ecbdff280a 
  common/src/test/org/apache/hadoop/hive/common/TestBlobStorageUtils.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/Context.java 
89893eba9fd2316b9a393f06edefa837bb815faf 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 
5bd78862e1064d7f64a5d764571015a8df1101e8 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 
a01a7bdbfec962b6617e98091cdb1325c5b0e84f 
  ql/src/test/org/apache/hadoop/hive/ql/exec/TestContext.java PRE-CREATION 

Diff: https://reviews.apache.org/r/50359/diff/


Testing
---

NO PATCH

** NON-PARTITIONED TABLE
- create table dummy (id int);   3.651s
- insert into table s3dummy values (1);   39.231s
- insert overwrite table s3dummy values (1);   42.569s
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from dummy;   30.136s

** EXTERNAL TABLE
- create table s3dummy_ext like s3dummy location 's3a://spena-bucket/user/hive/warehouse/s3dummy';   9.297s
- insert into table s3dummy_ext values (1);   45.855s

WITH PATCH

** NON-PARTITIONED TABLE
- create table s3dummy (id int) location 's3a://spena-bucket/user/hive/warehouse/s3dummy';   3.945s
- insert into table s3dummy values (1);   15.025s
- insert overwrite table s3dummy values (1);   25.149s
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from dummy;   19.158s
- from dummy insert overwrite table s3dummy select *;   25.469s
- from dummy insert into table s3dummy select *;   14.501s

** EXTERNAL TABLE
- create table s3dummy_ext like s3dummy location 's3a://spena-bucket/user/hive/warehouse/s3dummy';   4.827s
- insert into table s3dummy_ext values (1);   16.070s

** PARTITIONED TABLE
- create table s3dummypart (id int) partitioned by (part int) location 's3a://spena-bucket/user/hive/warehouse/s3dummypart';   3.176s
- alter table s3dummypart add partition (part=1);   3.229s
- alter table s3dummypart add partition (part=2);   3.124s
- insert into table s3dummypart partition (part=1) values (1);   14.876s
- insert overwrite table s3dummypart partition (part=1) values (1);   27.594s
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummypart' select * from dummypart;   22.298s
- from dummypart insert overwrite table s3dummypart partition (part=1) select id;   29.001s
- from dummypart insert into table s3dummypart partition (part=1) select id;   14.869s

** DYNAMIC PARTITIONS
- insert into table s3dummypart partition (part) select id, 1 from dummypart;   15.185s
- insert into table s3dummypart partition (part) select id, 1 from dummypart;   18.820s
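
For the statements that appear in both the NO PATCH and WITH PATCH lists, the
ratio of the two timings gives a rough idea of the improvement. The sketch
below only does that division; the class and method names are made up for
illustration, and since each number is a single run of a one-row insert, the
ratios are indicative rather than a benchmark.

```java
import java.util.Locale;

final class InsertTimingSketch {
    /** Ratio of the unpatched time to the patched time; above 1 means the patched run was faster. */
    static double speedup(double noPatchSeconds, double withPatchSeconds) {
        if (noPatchSeconds <= 0 || withPatchSeconds <= 0) {
            throw new IllegalArgumentException("timings must be positive");
        }
        return noPatchSeconds / withPatchSeconds;
    }

    public static void main(String[] args) {
        // Only statements measured in both lists are compared.
        String[] statements = {
            "insert into table s3dummy",
            "insert overwrite table s3dummy",
            "insert overwrite directory",
            "insert into table s3dummy_ext"
        };
        // {NO PATCH seconds, WITH PATCH seconds}, copied from the lists above.
        double[][] seconds = {
            {39.231, 15.025},
            {42.569, 25.149},
            {30.136, 19.158},
            {45.855, 16.070}
        };
        for (int i = 0; i < statements.length; i++) {
            System.out.printf(Locale.ROOT, "%-32s %.2fx%n",
                statements[i], speedup(seconds[i][0], seconds[i][1]));
        }
    }
}
```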


Thanks,

Sergio Pena



Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3

2016-08-10 Thread Sergio Pena


> On Aug. 10, 2016, 6:41 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java, lines 
> > 1807-1814
> > 
> >
> > Not able to follow : )
> > Are you doing this only to avoid copying .hive-staging dir? If so, you 
> > can use filter while copying to eliminate that, no?

Thinking more about this, I think you were right from the beginning. I can use 
'getTempDirForPath(fileSinkDesc.getDestPath())', as it will use the same 
.hive-staging directory that is used in 'dest'.
I did some tests and it is working fine.
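
To make the idea concrete, here is a minimal sketch of how a
getTempDirForPath-style helper could pick a scratch location. It is not the
code from the patch: the class ScratchDirSketch, its fields, the flag name
useBlobStoreAsScratchDir and the use of java.net.URI instead of Hadoop's Path
are all assumptions made for the example, and real Hive staging directory
names are more elaborate than a plain ".hive-staging" suffix.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

final class ScratchDirSketch {
    private final Set<String> blobSchemes;          // e.g. parsed from "s3,s3a,s3n" (assumed value)
    private final boolean useBlobStoreAsScratchDir; // hypothetical flag: allow scratch dirs on blob storage
    private final URI hdfsScratchDir;               // hypothetical HDFS scratch location

    ScratchDirSketch(String commaSeparatedSchemes, boolean useBlobStoreAsScratchDir, URI hdfsScratchDir) {
        this.blobSchemes = Arrays.stream(commaSeparatedSchemes.split(","))
            .map(s -> s.trim().toLowerCase(Locale.ROOT))
            .filter(s -> !s.isEmpty())
            .collect(Collectors.toSet());
        this.useBlobStoreAsScratchDir = useBlobStoreAsScratchDir;
        this.hdfsScratchDir = hdfsScratchDir;
    }

    /** True when the path's scheme is one of the configured blob storage schemes. */
    boolean isBlobStoragePath(URI path) {
        String scheme = path.getScheme();
        return scheme != null && blobSchemes.contains(scheme.toLowerCase(Locale.ROOT));
    }

    /** Chooses where intermediate data for the given destination should be written. */
    URI tempDirFor(URI destination) {
        if (isBlobStoragePath(destination) && !useBlobStoreAsScratchDir) {
            // Blob storage destination: keep intermediate data on HDFS.
            return hdfsScratchDir;
        }
        // Otherwise stage next to the destination, on the same filesystem.
        String base = destination.toString();
        if (base.endsWith("/")) {
            base = base.substring(0, base.length() - 1);
        }
        return URI.create(base + "/.hive-staging");
    }
}
```

With the flag disabled, an s3a:// destination maps to the HDFS scratch
location, while an hdfs:// destination keeps its staging directory next to
the target.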


- Sergio


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/#review145381
---


On Aug. 9, 2016, 7:53 p.m., Sergio Pena wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50359/
> ---
> 
> (Updated Aug. 9, 2016, 7:53 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-14270
> https://issues.apache.org/jira/browse/HIVE-14270
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> This patch will create a temporary directory for Hive intermediate data on 
> HDFS when S3 tables are used.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/common/BlobStorageUtils.java 
> PRE-CREATION 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 
> 9f5f619359701b948f57d599a5bdc2ecbdff280a 
>   common/src/test/org/apache/hadoop/hive/common/TestBlobStorageUtils.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/Context.java 
> 89893eba9fd2316b9a393f06edefa837bb815faf 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 
> 5bd78862e1064d7f64a5d764571015a8df1101e8 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 
> a01a7bdbfec962b6617e98091cdb1325c5b0e84f 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/TestContext.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/50359/diff/
> 
> 
> Testing
> ---
> 
> NO PATCH
> 
> ** NON-PARTITIONED TABLE
> - create table dummy (id int);   3.651s
> - insert into table s3dummy values (1);   39.231s
> - insert overwrite table s3dummy values (1);   42.569s
> - insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from dummy;   30.136s
> 
> ** EXTERNAL TABLE
> - create table s3dummy_ext like s3dummy location 's3a://spena-bucket/user/hive/warehouse/s3dummy';   9.297s
> - insert into table s3dummy_ext values (1);   45.855s
> 
> WITH PATCH
> 
> ** NON-PARTITIONED TABLE
> - create table s3dummy (id int) location 's3a://spena-bucket/user/hive/warehouse/s3dummy';   3.945s
> - insert into table s3dummy values (1);   15.025s
> - insert overwrite table s3dummy values (1);   25.149s
> - insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from dummy;   19.158s
> - from dummy insert overwrite table s3dummy select *;   25.469s
> - from dummy insert into table s3dummy select *;   14.501s
> 
> ** EXTERNAL TABLE
> - create table s3dummy_ext like s3dummy location 's3a://spena-bucket/user/hive/warehouse/s3dummy';   4.827s
> - insert into table s3dummy_ext values (1);   16.070s
> 
> ** PARTITIONED TABLE
> - create table s3dummypart (id int) partitioned by (part int) location 's3a://spena-bucket/user/hive/warehouse/s3dummypart';   3.176s
> - alter table s3dummypart add partition (part=1);   3.229s
> - alter table s3dummypart add partition (part=2);   3.124s
> - insert into table s3dummypart partition (part=1) values (1);   14.876s
> - insert overwrite table s3dummypart partition (part=1) values (1);   27.594s
> - insert overwrite directory 's3a://spena-bucket/dirs/s3dummypart' select * from dummypart;   22.298s
> - from dummypart insert overwrite table s3dummypart partition (part=1) select id;

Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3

2016-08-10 Thread Ashutosh Chauhan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/#review145381
---




ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java (lines 1807 
- 1814)


Not able to follow : )
Are you doing this only to avoid copying .hive-staging dir? If so, you can 
use filter while copying to eliminate that, no?



ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java (lines 7020 - 
7024)


Yeah, let's use ctx.getTempDirForPath() here.


- Ashutosh Chauhan



Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3

2016-08-09 Thread Lefty Leverenz

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/#review145307
---




common/src/java/org/apache/hadoop/hive/conf/HiveConf.java (lines 3091 - 3092)


Tiny nit:  Either make "It" lowercase or move the parenthetical sentence 
after the first sentence, with a final period like this:

"Enable the use of scratch directories directly on blob storage systems. 
(It may cause performance penalties.)"


- Lefty Leverenz



Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3

2016-08-09 Thread Lefty Leverenz


> On July 30, 2016, 8:44 a.m., Lefty Leverenz wrote:
> > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java, lines 3066-3067
> > 
> >
> > Typo:  Commad-separated --> Comma-separated
> > 
> > Redundancy:  "... supported blobstore schemes that Hive officially 
> > supports" (omit "supported")
> > 
> > Nit:  A period could be added at the end.
> 
> Lefty Leverenz wrote:
> Looks good now, thanks Sergio.

Aarrgh, forgot to publish that.

Adding a trivial comment for the new config.


- Lefty


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/#review144255
---



Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3

2016-08-09 Thread Lefty Leverenz


> On July 30, 2016, 8:44 a.m., Lefty Leverenz wrote:
> > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java, lines 3066-3067
> > 
> >
> > Typo:  Commad-separated --> Comma-separated
> > 
> > Redundancy:  "... supported blobstore schemes that Hive officially 
> > supports" (omit "supported")
> > 
> > Nit:  A period could be added at the end.

Looks good now, thanks Sergio.


- Lefty


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/#review144255
---



Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3

2016-08-09 Thread Sergio Pena


> On Aug. 9, 2016, 9:49 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java, lines 
> > 1807-1814
> > 
> >
> > Why not use the newly added Context::getTempDirForPath(Path path) here?

Yeah, sorry. This is a little confusing.

The thing is that 'tmpDir' is based on 'dest' (tmpDir = 
baseCtx.getExternalTmpPath(dest)), where 'dest' is an HDFS temporary directory 
(not S3). This is the directory that caused .hive-staging to be created on S3 
at the end, when the HDFS temp dir was copied to S3 (INSERT OVERWRITE).

I found out that FileSinkDesc has a 'getDestPath' that returns the S3 path. 
So the condition is: if 'getDestPath' is on S3, use 'getMRTmpPath'; otherwise 
continue using the temporary path based on 'dest' (HDFS temp path).

That part of the code was a little confusing regarding the names 'dest', 
'getDestPath', and 'getFinalDirName'. I was trying to understand this code, but 
I could not figure out the idea behind 'getFinalDirName' and 'getDestPath', so 
I ended up writing that condition. Also, the comments that were already there 
mentioned that the temp file should be in the same filesystem as the 
destination (in the case of non-blobstore directories).
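
The condition described above can be sketched roughly as follows. This is a
minimal, self-contained illustration and not the patch code: the class
TmpDirChoiceSketch, the helper isOnBlobStorage, the fixed scheme set (s3, s3a,
s3n) and the use of java.net.URI in place of Hadoop's Path are assumptions
made for the example. The two candidate paths stand in for the results of
'getMRTmpPath' and of the temp path derived from 'dest'.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class TmpDirChoiceSketch {
    // Assumption: the schemes treated as blob storage in this example.
    private static final Set<String> BLOB_SCHEMES =
        new HashSet<>(Arrays.asList("s3", "s3a", "s3n"));

    /** True when the path's scheme is in the assumed blob storage set. */
    static boolean isOnBlobStorage(URI path) {
        String scheme = path.getScheme();
        return scheme != null && BLOB_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT));
    }

    /**
     * finalDestPath   stands in for fileSinkDesc.getDestPath()
     * mrTmpPath       stands in for the MR temp path (on HDFS)
     * tmpBasedOnDest  stands in for the temp path derived from 'dest'
     */
    static URI chooseTmpDir(URI finalDestPath, URI mrTmpPath, URI tmpBasedOnDest) {
        // Final destination on S3: use the MR temp path so that no
        // .hive-staging directory ends up being copied to S3.
        // Otherwise: keep the temp path on the destination's filesystem.
        return isOnBlobStorage(finalDestPath) ? mrTmpPath : tmpBasedOnDest;
    }
}
```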


> On Aug. 9, 2016, 9:49 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java, lines 
> > 7020-7024
> > 
> >
> > Why not use the newly introduced ctx.getTempDirForPath(dest_path); here?

This part was causing 72 tests to fail because of the different scratch 
directory name. Also, I wasn't sure why the stats temp was in the same location 
as 'queryTmpdir', so I added the condition too in case it has issues with 
encrypted zones. I like your line better, but I wasn't sure about it, so I 
ended up adding this condition.

I can use 'ctx.getTempDirForPath' instead. What do you think?


> On Aug. 9, 2016, 9:49 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java, line 6763
> > 
> >
> > Surprised that we weren't using getExternalTmpPathRelTo() here. Did we 
> > miss this when we introduced this method for the encryption support work?

Mmm, I'm surprised too. Maybe we missed it.


- Sergio


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/#review145269
---



Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3

2016-08-09 Thread Ashutosh Chauhan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/#review145269
---




ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java (lines 1807 
- 1814)


Why not use the newly added Context::getTempDirForPath(Path path) here?



ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java (line 6763)


Surprised that we weren't using getExternalTmpPathRelTo() here. Did we miss 
this when we introduced this method for the encryption support work?



ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java (lines 7020 - 
7024)


Why not use the newly introduced tx.getTempDirForPath(dest_path); here?


- Ashutosh Chauhan


On Aug. 9, 2016, 7:53 p.m., Sergio Pena wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50359/
> ---
> 
> (Updated Aug. 9, 2016, 7:53 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-14270
> https://issues.apache.org/jira/browse/HIVE-14270
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> This patch will create a temporary directory for Hive intermediate data on 
> HDFS when S3 tables are used.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/common/BlobStorageUtils.java 
> PRE-CREATION 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 
> 9f5f619359701b948f57d599a5bdc2ecbdff280a 
>   common/src/test/org/apache/hadoop/hive/common/TestBlobStorageUtils.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/Context.java 
> 89893eba9fd2316b9a393f06edefa837bb815faf 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 
> 5bd78862e1064d7f64a5d764571015a8df1101e8 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 
> a01a7bdbfec962b6617e98091cdb1325c5b0e84f 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/TestContext.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/50359/diff/
> 
> 
> Testing
> ---
> 
> NO PATCH
> ** NON-PARTITIONED TABLE
> 
> - create table dummy (id int);
>3.651s
> - insert into table s3dummy values (1);   
>   39.231s
> - insert overwrite table s3dummy values (1);  
>   42.569s
> - insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from 
> dummy; 30.136s
> 
> EXTERNAL TABLE
> 
> - create table s3dummy_ext like s3dummy location 
> 's3a://spena-bucket/user/hive/warehouse/s3dummy';   9.297s
> - insert into table s3dummy_ext values (1);   
>   45.855s
> 
> WITH PATCH
> 
> ** NON-PARTITIONED TABLE
> - create table s3dummy (id int) location 
> 's3a://spena-bucket/user/hive/warehouse/s3dummy';   3.945s
> - insert into table s3dummy values (1);   
>   15.025s
> - insert overwrite table s3dummy values (1);  
>   25.149s 
> - insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from 
> dummy; 19.158s  
> - from dummy insert overwrite table s3dummy select *; 
>   25.469s  
> - from dummy insert into table s3dummy select *;  
>   14.501s
> 
> ** EXTERNAL TABLE
> - create table s3dummy_ext like s3dummy location 
> 's3a://spena-bucket/user/hive/warehouse/s3dummy';   4.827s
> - insert into table s3dummy_ext values (1);   
>   16.070s
> 
> ** PARTITIONED TABLE
> - create table s3dummypart (id int) partitioned by (part int)
>   location 's3a://spena-bucket/user/hive/warehouse/s3dummypart';  
>3.176s
> - alter table s3dummypart add partition (part=1); 
>3.229s
> - alter table s3dummypart add partition (part=2); 
>3.124s
> - insert into table s3dummypart partition (part=1) values (1);
>   14.876s
> - insert overwrite table s3dummypart partition (part=1) values (1);   
>   27.594s 
> - insert overwrite directory 's3a://spena-bucket/dirs/s3dummypart' select * 
> from dummypart; 22.298s  
> - from dummypart insert overwrite table 

Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3

2016-08-09 Thread Sergio Pena

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/
---

(Updated Aug. 9, 2016, 7:53 p.m.)


Review request for hive.


Changes
---

- Added a new flag variable that allows users to use the table's blob storage 
location as the scratch directory.
- Other minor fixes to allow tests to pass.
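
The choice described in this update (HDFS scratch space for blob storage 
tables unless the new flag is enabled) can be sketched roughly as below. This 
is only an illustration under stated assumptions: the class name 
ScratchDirChooser, the method chooseScratchParent and its parameters are 
hypothetical, locations are plain strings instead of Hadoop Path objects, and 
the actual logic in Context.java may differ.

```java
import java.net.URI;

// Hypothetical sketch; not the code from the patch.
final class ScratchDirChooser {
    private ScratchDirChooser() {}

    /**
     * Returns the location under which temporary data would be written.
     *
     * @param destination           final table or directory location
     * @param supportedSchemes      comma-separated blob storage schemes, e.g. "s3,s3a,s3n"
     * @param useBlobStoreAsScratch assumed flag: true keeps scratch data on the blob store
     * @param hdfsScratchDir        assumed HDFS scratch directory of the session
     */
    static String chooseScratchParent(String destination, String supportedSchemes,
                                      boolean useBlobStoreAsScratch, String hdfsScratchDir) {
        String scheme = URI.create(destination).getScheme();
        boolean onBlobStore = false;
        if (scheme != null) {
            for (String candidate : supportedSchemes.split(",")) {
                if (candidate.trim().equalsIgnoreCase(scheme)) {
                    onBlobStore = true;
                    break;
                }
            }
        }
        // Blob storage destination and flag disabled: stage on HDFS instead.
        if (onBlobStore && !useBlobStoreAsScratch) {
            return hdfsScratchDir;
        }
        // Otherwise stage relative to the destination, as before.
        return destination;
    }

    public static void main(String[] args) {
        String table = "s3a://spena-bucket/user/hive/warehouse/s3dummy";
        System.out.println(chooseScratchParent(table, "s3,s3a,s3n", false, "hdfs://namenode/tmp/hive"));
        System.out.println(chooseScratchParent(table, "s3,s3a,s3n", true, "hdfs://namenode/tmp/hive"));
    }
}
```

With the flag disabled, the S3 table stages its data under the HDFS scratch 
directory; with the flag enabled, it stages next to the table location.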


Bugs: HIVE-14270
https://issues.apache.org/jira/browse/HIVE-14270


Repository: hive-git


Description
---

This patch will create a temporary directory for Hive intermediate data on HDFS 
when S3 tables are used.


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/common/BlobStorageUtils.java 
PRE-CREATION 
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 
9f5f619359701b948f57d599a5bdc2ecbdff280a 
  common/src/test/org/apache/hadoop/hive/common/TestBlobStorageUtils.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/Context.java 
89893eba9fd2316b9a393f06edefa837bb815faf 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 
5bd78862e1064d7f64a5d764571015a8df1101e8 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 
a01a7bdbfec962b6617e98091cdb1325c5b0e84f 
  ql/src/test/org/apache/hadoop/hive/ql/exec/TestContext.java PRE-CREATION 

Diff: https://reviews.apache.org/r/50359/diff/


Testing
---

NO PATCH
** NON-PARTITIONED TABLE

- create table dummy (id int);    3.651s
- insert into table s3dummy values (1);    39.231s
- insert overwrite table s3dummy values (1);    42.569s
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from dummy;    30.136s

EXTERNAL TABLE

- create table s3dummy_ext like s3dummy location 's3a://spena-bucket/user/hive/warehouse/s3dummy';    9.297s
- insert into table s3dummy_ext values (1);    45.855s

WITH PATCH

** NON-PARTITIONED TABLE
- create table s3dummy (id int) location 's3a://spena-bucket/user/hive/warehouse/s3dummy';    3.945s
- insert into table s3dummy values (1);    15.025s
- insert overwrite table s3dummy values (1);    25.149s
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from dummy;    19.158s
- from dummy insert overwrite table s3dummy select *;    25.469s
- from dummy insert into table s3dummy select *;    14.501s

** EXTERNAL TABLE
- create table s3dummy_ext like s3dummy location 's3a://spena-bucket/user/hive/warehouse/s3dummy';    4.827s
- insert into table s3dummy_ext values (1);    16.070s

** PARTITIONED TABLE
- create table s3dummypart (id int) partitioned by (part int) location 's3a://spena-bucket/user/hive/warehouse/s3dummypart';    3.176s
- alter table s3dummypart add partition (part=1);    3.229s
- alter table s3dummypart add partition (part=2);    3.124s
- insert into table s3dummypart partition (part=1) values (1);    14.876s
- insert overwrite table s3dummypart partition (part=1) values (1);    27.594s
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummypart' select * from dummypart;    22.298s
- from dummypart insert overwrite table s3dummypart partition (part=1) select id;    29.001s
- from dummypart insert into table s3dummypart partition (part=1) select id;    14.869s

** DYNAMIC PARTITIONS
- insert into table s3dummypart partition (part) select id, 1 from dummypart;    15.185s
- insert into table s3dummypart partition (part) select id, 1 from dummypart;    18.820s


Thanks,

Sergio Pena



Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3

2016-08-04 Thread Sergio Pena

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/
---

(Updated Aug. 4, 2016, 4:29 p.m.)


Review request for hive.


Changes
---

Addressed minor comments.

Removed the code that was duplicating the rename() to S3. Instead, it gets HDFS 
scratch directories for the required temporary files.


Bugs: HIVE-14270
https://issues.apache.org/jira/browse/HIVE-14270


Repository: hive-git


Description
---

This patch will create a temporary directory for Hive intermediate data on HDFS 
when S3 tables are used.


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/common/BlobStorageUtils.java 
PRE-CREATION 
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 
9f5f619359701b948f57d599a5bdc2ecbdff280a 
  common/src/test/org/apache/hadoop/hive/common/TestBlobStorageUtils.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/Context.java 
89893eba9fd2316b9a393f06edefa837bb815faf 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 
5bd78862e1064d7f64a5d764571015a8df1101e8 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 
a01a7bdbfec962b6617e98091cdb1325c5b0e84f 
  ql/src/test/org/apache/hadoop/hive/ql/exec/TestContext.java PRE-CREATION 

Diff: https://reviews.apache.org/r/50359/diff/


Testing
---

NO PATCH
** NON-PARTITIONED TABLE

- create table dummy (id int);    3.651s
- insert into table s3dummy values (1);    39.231s
- insert overwrite table s3dummy values (1);    42.569s
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from dummy;    30.136s

EXTERNAL TABLE

- create table s3dummy_ext like s3dummy location 's3a://spena-bucket/user/hive/warehouse/s3dummy';    9.297s
- insert into table s3dummy_ext values (1);    45.855s

WITH PATCH

** NON-PARTITIONED TABLE
- create table s3dummy (id int) location 's3a://spena-bucket/user/hive/warehouse/s3dummy';    3.945s
- insert into table s3dummy values (1);    15.025s
- insert overwrite table s3dummy values (1);    25.149s
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from dummy;    19.158s
- from dummy insert overwrite table s3dummy select *;    25.469s
- from dummy insert into table s3dummy select *;    14.501s

** EXTERNAL TABLE
- create table s3dummy_ext like s3dummy location 's3a://spena-bucket/user/hive/warehouse/s3dummy';    4.827s
- insert into table s3dummy_ext values (1);    16.070s

** PARTITIONED TABLE
- create table s3dummypart (id int) partitioned by (part int) location 's3a://spena-bucket/user/hive/warehouse/s3dummypart';    3.176s
- alter table s3dummypart add partition (part=1);    3.229s
- alter table s3dummypart add partition (part=2);    3.124s
- insert into table s3dummypart partition (part=1) values (1);    14.876s
- insert overwrite table s3dummypart partition (part=1) values (1);    27.594s
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummypart' select * from dummypart;    22.298s
- from dummypart insert overwrite table s3dummypart partition (part=1) select id;    29.001s
- from dummypart insert into table s3dummypart partition (part=1) select id;    14.869s

** DYNAMIC PARTITIONS
- insert into table s3dummypart partition (part) select id, 1 from dummypart;    15.185s
- insert into table s3dummypart partition (part) select id, 1 from dummypart;    18.820s


Thanks,

Sergio Pena



Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3

2016-07-30 Thread Lefty Leverenz

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/#review144255
---




common/src/java/org/apache/hadoop/hive/conf/HiveConf.java (lines 3066 - 3067)


Typo:  Commad-separated --> Comma-separated

Redundancy:  "... supported blobstore schemes that Hive officially 
supports" (omit "supported")

Nit:  A period could be added at the end.


- Lefty Leverenz


On July 28, 2016, 8:11 p.m., Sergio Pena wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50359/
> ---
> 
> (Updated July 28, 2016, 8:11 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-14270
> https://issues.apache.org/jira/browse/HIVE-14270
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> This patch will create a temporary directory for Hive intermediate data on 
> HDFS when S3 tables are used.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/common/BlobStorageUtils.java 
> PRE-CREATION 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 
> e92466f172c81fce20fe951df58f6561d28dc215 
>   common/src/test/org/apache/hadoop/hive/common/TestBlobStorageUtils.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/Context.java 
> ec5d693d28a40925c44f844a05ebf3f5c10173c9 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 
> 9d927bd1a519f79bc7fa88c3b7e5c6cc2ef0637f 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 
> 2671cb1cf2ef74f9d6628f8cdf3f5ac99283dbd8 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/TestContext.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/50359/diff/
> 
> 
> Testing
> ---
> 
> NO PATCH
> ** NON-PARTITIONED TABLE
> 
> - create table dummy (id int);
>3.651s
> - insert into table s3dummy values (1);   
>   39.231s
> - insert overwrite table s3dummy values (1);  
>   42.569s
> - insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from 
> dummy; 30.136s
> 
> EXTERNAL TABLE
> 
> - create table s3dummy_ext like s3dummy location 
> 's3a://spena-bucket/user/hive/warehouse/s3dummy';   9.297s
> - insert into table s3dummy_ext values (1);   
>   45.855s
> 
> WITH PATCH
> 
> ** NON-PARTITIONED TABLE
> - create table s3dummy (id int) location 
> 's3a://spena-bucket/user/hive/warehouse/s3dummy';   3.945s
> - insert into table s3dummy values (1);   
>   15.025s
> - insert overwrite table s3dummy values (1);  
>   25.149s 
> - insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from 
> dummy; 19.158s  
> - from dummy insert overwrite table s3dummy select *; 
>   25.469s  
> - from dummy insert into table s3dummy select *;  
>   14.501s
> 
> ** EXTERNAL TABLE
> - create table s3dummy_ext like s3dummy location 
> 's3a://spena-bucket/user/hive/warehouse/s3dummy';   4.827s
> - insert into table s3dummy_ext values (1);   
>   16.070s
> 
> ** PARTITIONED TABLE
> - create table s3dummypart (id int) partitioned by (part int)
>   location 's3a://spena-bucket/user/hive/warehouse/s3dummypart';  
>3.176s
> - alter table s3dummypart add partition (part=1); 
>3.229s
> - alter table s3dummypart add partition (part=2); 
>3.124s
> - insert into table s3dummypart partition (part=1) values (1);
>   14.876s
> - insert overwrite table s3dummypart partition (part=1) values (1);   
>   27.594s 
> - insert overwrite directory 's3a://spena-bucket/dirs/s3dummypart' select * 
> from dummypart; 22.298s  
> - from dummypart insert overwrite table s3dummypart partition (part=1) select 
> id;   29.001s  
> - from dummypart insert into table s3dummypart partition (part=1) select id;  
>   14.869s
> 
> ** DYNAMIC PARTITIONS
> - insert into table s3dummypart partition (part) select id, 1 from dummypart; 
>   15.185s
> - insert into table s3dummypart 

Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3

2016-07-29 Thread Sergio Pena


> On July 29, 2016, 11:33 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java, lines 1840-1841
> > 
> >
> > I don't follow this. The comment doesn't seem to match the code. 
> > FileSystem.rename() should automatically do copy+delete for S3. So why 
> > do we need to do that explicitly?
> > Per your comment, you want to delete the temp dir, but that should already 
> > be handled in Context::clear().
> > Per your code, you are deleting preexisting files in the target dir, but as 
> > I said, that should already be handled in fs.rename().

Yes, FileSystem.rename() handles the copy+delete for S3. However, in the 
INSERT OVERWRITE case, the temporary directory that contains 00_0 also 
contains a .hive-staging directory, which is copied to S3 as well. This 
.hive-staging directory would normally be deleted automatically on HDFS by the 
deleteOnExit() call, but the deleteOnExit flag is not carried over when the 
directory is copied to S3, so the data is left behind on S3.

I thought about pointing statsTmpLoc to a different location instead, but then 
I found another place where a temporary directory is also created under 
.hive-staging. So, instead of fixing these two places, I thought it might be 
better to handle the copy explicitly. Other developers may use 
getExtTmpPathRelTo() again in the future and add more temporary data to 
.hive-staging, so I just wanted to prevent copying unwanted files to S3.
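
As a rough illustration of the idea of copying explicitly while leaving 
staging data behind, here is a small sketch. It is not the code from the 
patch: it uses java.nio.file on a local filesystem as a stand-in for the 
Hadoop FileSystem API, and the class StagingAwareCopy, the method 
copySkippingStaging and the ".hive-staging" prefix constant are assumptions 
made for the example.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.StandardCopyOption;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical sketch on local files; the real patch works with Hadoop paths.
final class StagingAwareCopy {
    // Assumed prefix of the staging directories mentioned in the discussion.
    static final String STAGING_PREFIX = ".hive-staging";

    private StagingAwareCopy() {}

    /** Copies the tree under source into target, skipping staging subdirectories. */
    static void copySkippingStaging(Path source, Path target) throws IOException {
        Files.walkFileTree(source, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs)
                    throws IOException {
                // Never descend into a staging directory below the source root.
                if (!dir.equals(source)
                        && dir.getFileName().toString().startsWith(STAGING_PREFIX)) {
                    return FileVisitResult.SKIP_SUBTREE;
                }
                Files.createDirectories(target.resolve(source.relativize(dir)));
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                    throws IOException {
                Files.copy(file, target.resolve(source.relativize(file)),
                        StandardCopyOption.REPLACE_EXISTING);
                return FileVisitResult.CONTINUE;
            }
        });
    }
}
```

The point of the sketch is only that the data files reach the destination 
while anything under a staging directory stays on the source side, where the 
usual cleanup can remove it.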

What do you think?


> On July 29, 2016, 11:33 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java, line 1807
> > 
> >
> > I am not sure if this change is really needed. But if it is, won't 
> > we need an equivalent in loadPartition() & loadDynamicPartitions()?

Thanks.
I'll take a look at this.


- Sergio


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/#review144220
---


On July 28, 2016, 8:11 p.m., Sergio Pena wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50359/
> ---
> 
> (Updated July 28, 2016, 8:11 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-14270
> https://issues.apache.org/jira/browse/HIVE-14270
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> This patch will create a temporary directory for Hive intermediate data on 
> HDFS when S3 tables are used.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/common/BlobStorageUtils.java 
> PRE-CREATION 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 
> e92466f172c81fce20fe951df58f6561d28dc215 
>   common/src/test/org/apache/hadoop/hive/common/TestBlobStorageUtils.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/Context.java 
> ec5d693d28a40925c44f844a05ebf3f5c10173c9 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 
> 9d927bd1a519f79bc7fa88c3b7e5c6cc2ef0637f 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 
> 2671cb1cf2ef74f9d6628f8cdf3f5ac99283dbd8 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/TestContext.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/50359/diff/
> 
> 
> Testing
> ---
> 
> NO PATCH
> ** NON-PARTITIONED TABLE
> 
> - create table dummy (id int);
>3.651s
> - insert into table s3dummy values (1);   
>   39.231s
> - insert overwrite table s3dummy values (1);  
>   42.569s
> - insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from 
> dummy; 30.136s
> 
> EXTERNAL TABLE
> 
> - create table s3dummy_ext like s3dummy location 
> 's3a://spena-bucket/user/hive/warehouse/s3dummy';   9.297s
> - insert into table s3dummy_ext values (1);   
>   45.855s
> 
> WITH PATCH
> 
> ** NON-PARTITIONED TABLE
> - create table s3dummy (id int) location 
> 's3a://spena-bucket/user/hive/warehouse/s3dummy';   3.945s
> - insert into table s3dummy values (1);   
>   15.025s
> - insert overwrite table s3dummy values (1);  
>   25.149s 
> - insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from 
> dummy; 19.158s  
> - from dummy insert overwrite table s3dummy select *; 
>   

Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3

2016-07-29 Thread Ashutosh Chauhan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/#review144220
---




ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java (line 1803)


I am not sure if this change is really needed. But if it is, won't we 
need an equivalent in loadPartition() & loadDynamicPartitions()?



ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java (lines 1832 - 1833)


I don't follow this. The comment doesn't seem to match the code. 
FileSystem.rename() should automatically do copy+delete for S3. So why do 
we need to do that explicitly?
Per your comment, you want to delete the temp dir, but that should already be 
handled in Context::clear().
Per your code, you are deleting preexisting files in the target dir, but as I 
said, that should already be handled in fs.rename().



ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java (line 7019)


This will go to a temporary S3 location. You may want to move this to HDFS too.


- Ashutosh Chauhan


On July 28, 2016, 8:11 p.m., Sergio Pena wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50359/
> ---
> 
> (Updated July 28, 2016, 8:11 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-14270
> https://issues.apache.org/jira/browse/HIVE-14270
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> This patch will create a temporary directory for Hive intermediate data on 
> HDFS when S3 tables are used.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/common/BlobStorageUtils.java 
> PRE-CREATION 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 
> e92466f172c81fce20fe951df58f6561d28dc215 
>   common/src/test/org/apache/hadoop/hive/common/TestBlobStorageUtils.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/Context.java 
> ec5d693d28a40925c44f844a05ebf3f5c10173c9 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 
> 9d927bd1a519f79bc7fa88c3b7e5c6cc2ef0637f 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 
> 2671cb1cf2ef74f9d6628f8cdf3f5ac99283dbd8 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/TestContext.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/50359/diff/
> 
> 
> Testing
> ---
> 
> NO PATCH
> ** NON-PARTITIONED TABLE
> 
> - create table dummy (id int);
>3.651s
> - insert into table s3dummy values (1);   
>   39.231s
> - insert overwrite table s3dummy values (1);  
>   42.569s
> - insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from 
> dummy; 30.136s
> 
> EXTERNAL TABLE
> 
> - create table s3dummy_ext like s3dummy location 
> 's3a://spena-bucket/user/hive/warehouse/s3dummy';   9.297s
> - insert into table s3dummy_ext values (1);   
>   45.855s
> 
> WITH PATCH
> 
> ** NON-PARTITIONED TABLE
> - create table s3dummy (id int) location 
> 's3a://spena-bucket/user/hive/warehouse/s3dummy';   3.945s
> - insert into table s3dummy values (1);   
>   15.025s
> - insert overwrite table s3dummy values (1);  
>   25.149s 
> - insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from 
> dummy; 19.158s  
> - from dummy insert overwrite table s3dummy select *; 
>   25.469s  
> - from dummy insert into table s3dummy select *;  
>   14.501s
> 
> ** EXTERNAL TABLE
> - create table s3dummy_ext like s3dummy location 
> 's3a://spena-bucket/user/hive/warehouse/s3dummy';   4.827s
> - insert into table s3dummy_ext values (1);   
>   16.070s
> 
> ** PARTITIONED TABLE
> - create table s3dummypart (id int) partitioned by (part int)
>   location 's3a://spena-bucket/user/hive/warehouse/s3dummypart';  
>3.176s
> - alter table s3dummypart add partition (part=1); 
>3.229s
> - alter table s3dummypart add partition (part=2); 
>3.124s
> - insert into table s3dummypart partition (part=1) values (1);
>   

Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3

2016-07-28 Thread Sergio Pena


> On July 28, 2016, 6:45 p.m., Reuben Kuhnert wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java, line 3225
> > 
> >
> > The code in both branches of the 'if/else' is identical except for the 
> > 'destination path'. Maybe factor that out?

It looks the same, but there might be a slight issue that needs to be tested 
if we refactor this part.
I recall we have had several issues with this code, so it is better to leave 
it this way and fix it in another JIRA.


- Sergio


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/#review143975
---


On July 28, 2016, 8:11 p.m., Sergio Pena wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50359/
> ---
> 
> (Updated July 28, 2016, 8:11 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-14270
> https://issues.apache.org/jira/browse/HIVE-14270
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> This patch will create a temporary directory for Hive intermediate data on 
> HDFS when S3 tables are used.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/common/BlobStorageUtils.java 
> PRE-CREATION 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 
> e92466f172c81fce20fe951df58f6561d28dc215 
>   common/src/test/org/apache/hadoop/hive/common/TestBlobStorageUtils.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/Context.java 
> ec5d693d28a40925c44f844a05ebf3f5c10173c9 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 
> 9d927bd1a519f79bc7fa88c3b7e5c6cc2ef0637f 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 
> 2671cb1cf2ef74f9d6628f8cdf3f5ac99283dbd8 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/TestContext.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/50359/diff/
> 
> 
> Testing
> ---
> 
> NO PATCH
> ** NON-PARTITIONED TABLE
> 
> - create table dummy (id int);
>3.651s
> - insert into table s3dummy values (1);   
>   39.231s
> - insert overwrite table s3dummy values (1);  
>   42.569s
> - insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from 
> dummy; 30.136s
> 
> EXTERNAL TABLE
> 
> - create table s3dummy_ext like s3dummy location 
> 's3a://spena-bucket/user/hive/warehouse/s3dummy';   9.297s
> - insert into table s3dummy_ext values (1);   
>   45.855s
> 
> WITH PATCH
> 
> ** NON-PARTITIONED TABLE
> - create table s3dummy (id int) location 
> 's3a://spena-bucket/user/hive/warehouse/s3dummy';   3.945s
> - insert into table s3dummy values (1);   
>   15.025s
> - insert overwrite table s3dummy values (1);  
>   25.149s 
> - insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from 
> dummy; 19.158s  
> - from dummy insert overwrite table s3dummy select *; 
>   25.469s  
> - from dummy insert into table s3dummy select *;  
>   14.501s
> 
> ** EXTERNAL TABLE
> - create table s3dummy_ext like s3dummy location 
> 's3a://spena-bucket/user/hive/warehouse/s3dummy';   4.827s
> - insert into table s3dummy_ext values (1);   
>   16.070s
> 
> ** PARTITIONED TABLE
> - create table s3dummypart (id int) partitioned by (part int)
>   location 's3a://spena-bucket/user/hive/warehouse/s3dummypart';  
>3.176s
> - alter table s3dummypart add partition (part=1); 
>3.229s
> - alter table s3dummypart add partition (part=2); 
>3.124s
> - insert into table s3dummypart partition (part=1) values (1);
>   14.876s
> - insert overwrite table s3dummypart partition (part=1) values (1);   
>   27.594s 
> - insert overwrite directory 's3a://spena-bucket/dirs/s3dummypart' select * 
> from dummypart; 22.298s  
> - from dummypart insert overwrite table s3dummypart partition (part=1) select 
> id;   29.001s  
> - from dummypart insert into table s3dummypart partition (part=1) select id;  
>

Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3

2016-07-28 Thread Sergio Pena

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/
---

(Updated July 28, 2016, 8:11 p.m.)


Review request for hive.


Changes
---

This patch adds a new configuration variable that contains supported blobstore 
schemes.

HIVE_BLOBSTORE_SUPPORTED_SCHEMES("hive.blobstore.supported.schemes", 
"s3,s3a,s3n",
"Commad-separated list of supported blobstore schemes that Hive 
officially supports");
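
To illustrate how a comma-separated scheme list like this could be consulted, 
here is a minimal sketch. It is not the BlobStorageUtils class from the patch: 
the class BlobSchemeCheck and its methods are hypothetical, locations are 
plain strings, and only java.net.URI is used to extract the scheme.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper; the real BlobStorageUtils may look different.
final class BlobSchemeCheck {
    private BlobSchemeCheck() {}

    /** True if scheme appears in the comma-separated list (case-insensitive). */
    static boolean isBlobStorageScheme(String scheme, String supportedSchemes) {
        if (scheme == null || supportedSchemes == null) {
            return false;
        }
        String wanted = scheme.trim().toLowerCase(Locale.ROOT);
        if (wanted.isEmpty()) {
            return false;
        }
        return Arrays.stream(supportedSchemes.split(","))
                .map(s -> s.trim().toLowerCase(Locale.ROOT))
                .anyMatch(wanted::equals);
    }

    /** True if the location's URI scheme is one of the supported schemes. */
    static boolean isBlobStoragePath(String location, String supportedSchemes) {
        return isBlobStorageScheme(URI.create(location).getScheme(), supportedSchemes);
    }

    public static void main(String[] args) {
        String supported = "s3,s3a,s3n"; // default value quoted above
        System.out.println(isBlobStoragePath(
                "s3a://spena-bucket/user/hive/warehouse/s3dummy", supported));
        System.out.println(isBlobStoragePath("hdfs://namenode/tmp/hive", supported));
    }
}
```

Keeping the list in configuration means another blob store scheme could be 
added without a code change, which appears to be the motivation for the new 
variable.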


Bugs: HIVE-14270
https://issues.apache.org/jira/browse/HIVE-14270


Repository: hive-git


Description
---

This patch will create a temporary directory for Hive intermediate data on HDFS 
when S3 tables are used.


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/common/BlobStorageUtils.java 
PRE-CREATION 
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 
e92466f172c81fce20fe951df58f6561d28dc215 
  common/src/test/org/apache/hadoop/hive/common/TestBlobStorageUtils.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/Context.java 
ec5d693d28a40925c44f844a05ebf3f5c10173c9 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 
9d927bd1a519f79bc7fa88c3b7e5c6cc2ef0637f 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 
2671cb1cf2ef74f9d6628f8cdf3f5ac99283dbd8 
  ql/src/test/org/apache/hadoop/hive/ql/exec/TestContext.java PRE-CREATION 

Diff: https://reviews.apache.org/r/50359/diff/


Testing
---

NO PATCH
** NON-PARTITIONED TABLE

- create table dummy (id int);    3.651s
- insert into table s3dummy values (1);    39.231s
- insert overwrite table s3dummy values (1);    42.569s
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from dummy;    30.136s

EXTERNAL TABLE

- create table s3dummy_ext like s3dummy location 's3a://spena-bucket/user/hive/warehouse/s3dummy';    9.297s
- insert into table s3dummy_ext values (1);    45.855s

WITH PATCH

** NON-PARTITIONED TABLE
- create table s3dummy (id int) location 's3a://spena-bucket/user/hive/warehouse/s3dummy';    3.945s
- insert into table s3dummy values (1);    15.025s
- insert overwrite table s3dummy values (1);    25.149s
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from dummy;    19.158s
- from dummy insert overwrite table s3dummy select *;    25.469s
- from dummy insert into table s3dummy select *;    14.501s

** EXTERNAL TABLE
- create table s3dummy_ext like s3dummy location 's3a://spena-bucket/user/hive/warehouse/s3dummy';    4.827s
- insert into table s3dummy_ext values (1);    16.070s

** PARTITIONED TABLE
- create table s3dummypart (id int) partitioned by (part int) location 's3a://spena-bucket/user/hive/warehouse/s3dummypart';    3.176s
- alter table s3dummypart add partition (part=1);    3.229s
- alter table s3dummypart add partition (part=2);    3.124s
- insert into table s3dummypart partition (part=1) values (1);    14.876s
- insert overwrite table s3dummypart partition (part=1) values (1);    27.594s
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummypart' select * from dummypart;    22.298s
- from dummypart insert overwrite table s3dummypart partition (part=1) select id;    29.001s
- from dummypart insert into table s3dummypart partition (part=1) select id;    14.869s

** DYNAMIC PARTITIONS
- insert into table s3dummypart partition (part) select id, 1 from dummypart;    15.185s
- insert into table s3dummypart partition (part) select id, 1 from dummypart;    18.820s


Thanks,

Sergio Pena



Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3

2016-07-28 Thread Reuben Kuhnert

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/#review143975
---




ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java (line 3217)


The code in both branches of the 'if/else' is identical except for the 
'destination path'. Maybe factor that out?


- Reuben Kuhnert


On July 27, 2016, 10:56 p.m., Sergio Pena wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50359/
> ---
> 
> (Updated July 27, 2016, 10:56 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-14270
> https://issues.apache.org/jira/browse/HIVE-14270
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> This patch will create a temporary directory for Hive intermediate data on 
> HDFS when S3 tables are used.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/common/BlobStorageUtils.java 
> PRE-CREATION 
>   common/src/test/org/apache/hadoop/hive/common/TestBlobStorageUtils.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/Context.java 
> ec5d693d28a40925c44f844a05ebf3f5c10173c9 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 
> 9d927bd1a519f79bc7fa88c3b7e5c6cc2ef0637f 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 
> 2671cb1cf2ef74f9d6628f8cdf3f5ac99283dbd8 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/TestContext.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/50359/diff/
> 
> 
> Testing
> ---
> 
> NO PATCH
> 
> ** NON-PARTITIONED TABLE
> - create table dummy (id int);   3.651s
> - insert into table s3dummy values (1);   39.231s
> - insert overwrite table s3dummy values (1);   42.569s
> - insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from dummy;   30.136s
> 
> ** EXTERNAL TABLE
> - create table s3dummy_ext like s3dummy location 's3a://spena-bucket/user/hive/warehouse/s3dummy';   9.297s
> - insert into table s3dummy_ext values (1);   45.855s
> 
> WITH PATCH
> 
> ** NON-PARTITIONED TABLE
> - create table s3dummy (id int) location 's3a://spena-bucket/user/hive/warehouse/s3dummy';   3.945s
> - insert into table s3dummy values (1);   15.025s
> - insert overwrite table s3dummy values (1);   25.149s
> - insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from dummy;   19.158s
> - from dummy insert overwrite table s3dummy select *;   25.469s
> - from dummy insert into table s3dummy select *;   14.501s
> 
> ** EXTERNAL TABLE
> - create table s3dummy_ext like s3dummy location 's3a://spena-bucket/user/hive/warehouse/s3dummy';   4.827s
> - insert into table s3dummy_ext values (1);   16.070s
> 
> ** PARTITIONED TABLE
> - create table s3dummypart (id int) partitioned by (part int) location 's3a://spena-bucket/user/hive/warehouse/s3dummypart';   3.176s
> - alter table s3dummypart add partition (part=1);   3.229s
> - alter table s3dummypart add partition (part=2);   3.124s
> - insert into table s3dummypart partition (part=1) values (1);   14.876s
> - insert overwrite table s3dummypart partition (part=1) values (1);   27.594s
> - insert overwrite directory 's3a://spena-bucket/dirs/s3dummypart' select * from dummypart;   22.298s
> - from dummypart insert overwrite table s3dummypart partition (part=1) select id;   29.001s
> - from dummypart insert into table s3dummypart partition (part=1) select id;   14.869s
> 
> ** DYNAMIC PARTITIONS
> - insert into table s3dummypart partition (part) select id, 1 from dummypart;   15.185s
> - insert into table s3dummypart partition (part) select id, 1 from dummypart;   18.820s
> 
> 
> Thanks,
> 
> Sergio Pena
> 
>



Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3

2016-07-27 Thread Sergio Pena

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/
---

(Updated July 27, 2016, 10:56 p.m.)


Review request for hive.


Bugs: HIVE-14270
https://issues.apache.org/jira/browse/HIVE-14270


Repository: hive-git


Description
---

This patch will create a temporary directory for Hive intermediate data on HDFS 
when S3 tables are used.


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/common/BlobStorageUtils.java 
PRE-CREATION 
  common/src/test/org/apache/hadoop/hive/common/TestBlobStorageUtils.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/Context.java 
ec5d693d28a40925c44f844a05ebf3f5c10173c9 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 
9d927bd1a519f79bc7fa88c3b7e5c6cc2ef0637f 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 
2671cb1cf2ef74f9d6628f8cdf3f5ac99283dbd8 
  ql/src/test/org/apache/hadoop/hive/ql/exec/TestContext.java PRE-CREATION 

Diff: https://reviews.apache.org/r/50359/diff/


Testing
---

NO PATCH

** NON-PARTITIONED TABLE
- create table dummy (id int);   3.651s
- insert into table s3dummy values (1);   39.231s
- insert overwrite table s3dummy values (1);   42.569s
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from dummy;   30.136s

** EXTERNAL TABLE
- create table s3dummy_ext like s3dummy location 's3a://spena-bucket/user/hive/warehouse/s3dummy';   9.297s
- insert into table s3dummy_ext values (1);   45.855s

WITH PATCH

** NON-PARTITIONED TABLE
- create table s3dummy (id int) location 's3a://spena-bucket/user/hive/warehouse/s3dummy';   3.945s
- insert into table s3dummy values (1);   15.025s
- insert overwrite table s3dummy values (1);   25.149s
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from dummy;   19.158s
- from dummy insert overwrite table s3dummy select *;   25.469s
- from dummy insert into table s3dummy select *;   14.501s

** EXTERNAL TABLE
- create table s3dummy_ext like s3dummy location 's3a://spena-bucket/user/hive/warehouse/s3dummy';   4.827s
- insert into table s3dummy_ext values (1);   16.070s

** PARTITIONED TABLE
- create table s3dummypart (id int) partitioned by (part int) location 's3a://spena-bucket/user/hive/warehouse/s3dummypart';   3.176s
- alter table s3dummypart add partition (part=1);   3.229s
- alter table s3dummypart add partition (part=2);   3.124s
- insert into table s3dummypart partition (part=1) values (1);   14.876s
- insert overwrite table s3dummypart partition (part=1) values (1);   27.594s
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummypart' select * from dummypart;   22.298s
- from dummypart insert overwrite table s3dummypart partition (part=1) select id;   29.001s
- from dummypart insert into table s3dummypart partition (part=1) select id;   14.869s

** DYNAMIC PARTITIONS
- insert into table s3dummypart partition (part) select id, 1 from dummypart;   15.185s
- insert into table s3dummypart partition (part) select id, 1 from dummypart;   18.820s


Thanks,

Sergio Pena



Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3

2016-07-26 Thread Sergio Pena

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/
---

(Updated July 26, 2016, 10:24 p.m.)


Review request for hive.


Changes
---

Changes in this patch:
- add isBlobStorageFileSystem tests
- fix JUnit imports


Bugs: HIVE-14270
https://issues.apache.org/jira/browse/HIVE-14270


Repository: hive-git


Description
---

This patch will create a temporary directory for Hive intermediate data on HDFS 
when S3 tables are used.


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/common/BlobStorageUtils.java 
PRE-CREATION 
  common/src/test/org/apache/hadoop/hive/common/TestBlobStorageUtils.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/Context.java 
ec5d693d28a40925c44f844a05ebf3f5c10173c9 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 
9d927bd1a519f79bc7fa88c3b7e5c6cc2ef0637f 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 
2671cb1cf2ef74f9d6628f8cdf3f5ac99283dbd8 
  ql/src/test/org/apache/hadoop/hive/ql/exec/TestContext.java PRE-CREATION 

Diff: https://reviews.apache.org/r/50359/diff/


Testing
---

NO PATCH

** NON-PARTITIONED TABLE
- create table dummy (id int);   3.651s
- insert into table s3dummy values (1);   39.231s
- insert overwrite table s3dummy values (1);   42.569s
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from dummy;   30.136s

** EXTERNAL TABLE
- create table s3dummy_ext like s3dummy location 's3a://spena-bucket/user/hive/warehouse/s3dummy';   9.297s
- insert into table s3dummy_ext values (1);   45.855s

WITH PATCH

** NON-PARTITIONED TABLE
- create table s3dummy (id int) location 's3a://spena-bucket/user/hive/warehouse/s3dummy';   3.945s
- insert into table s3dummy values (1);   15.025s
- insert overwrite table s3dummy values (1);   25.149s
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from dummy;   19.158s
- from dummy insert overwrite table s3dummy select *;   25.469s
- from dummy insert into table s3dummy select *;   14.501s

** EXTERNAL TABLE
- create table s3dummy_ext like s3dummy location 's3a://spena-bucket/user/hive/warehouse/s3dummy';   4.827s
- insert into table s3dummy_ext values (1);   16.070s

** PARTITIONED TABLE
- create table s3dummypart (id int) partitioned by (part int) location 's3a://spena-bucket/user/hive/warehouse/s3dummypart';   3.176s
- alter table s3dummypart add partition (part=1);   3.229s
- alter table s3dummypart add partition (part=2);   3.124s
- insert into table s3dummypart partition (part=1) values (1);   14.876s
- insert overwrite table s3dummypart partition (part=1) values (1);   27.594s
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummypart' select * from dummypart;   22.298s
- from dummypart insert overwrite table s3dummypart partition (part=1) select id;   29.001s
- from dummypart insert into table s3dummypart partition (part=1) select id;   14.869s

** DYNAMIC PARTITIONS
- insert into table s3dummypart partition (part) select id, 1 from dummypart;   15.185s
- insert into table s3dummypart partition (part) select id, 1 from dummypart;   18.820s


Thanks,

Sergio Pena



Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3

2016-07-26 Thread Sergio Pena


> On July 22, 2016, 10:05 p.m., Thomas Poepping wrote:
> > common/src/java/org/apache/hadoop/hive/common/ObjectStoreUtils.java, lines 
> > 44-46
> > 
> >
> > I second @Steve Loughran's comment that we should pull this from a config 
> > file. Maybe another config value for hive-site.xml, a comma-separated 
> > list of objectstore schemes? They need not all be S3-related, right?

Wouldn't it be better if HDFS had a method that returns all the blobstore 
schemes it supports? 
I think such a method would also help non-Hive components see what Hadoop 
supports, depending on the version.


> On July 22, 2016, 10:05 p.m., Thomas Poepping wrote:
> > We have multiple things to remember:
> >  - this needs to be extensible; not all objectstores are S3
> >  - this needs to happen in the background; we can't put an "if path is 
> > S3" check in front of every place we compute a tmpPath. That doesn't scale 
> > (from a programmer's point of view, not a functionality point of view)

Agreed. At some point we'd like to support the same blobstores Hadoop currently 
supports.


- Sergio


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/#review143280
---



Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3

2016-07-26 Thread Sergio Pena

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/
---

(Updated July 26, 2016, 10:05 p.m.)


Review request for hive.


Changes
---

Changes added in this patch:
- create a helper method on Context to get the temporary directory depending on 
the filesystem
- add more tests
- fix issue where staging directories were copied to S3
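
As a rough illustration of the helper described in the first item, the sketch below picks a temporary directory from the destination's URI scheme. It is not the code from Context.java: the class name, the scheme list, and the staging suffix are all assumptions made for the example.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch; not the actual helper added to Context.java.
final class TempDirSketch {

    // Assumed scheme list, for illustration only.
    private static final Set<String> ASSUMED_BLOB_SCHEMES =
        new HashSet<>(Arrays.asList("s3", "s3a", "s3n"));

    // Assumed staging directory suffix, for illustration only.
    private static final String ASSUMED_STAGING_SUFFIX = "/_staging";

    // True when the location's scheme is in the assumed blob storage list.
    static boolean isOnBlobStorage(URI location) {
        String scheme = location.getScheme();
        return scheme != null
            && ASSUMED_BLOB_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT));
    }

    // Blob storage destinations get the HDFS scratch directory;
    // everything else stages next to the table location.
    static String tempDirFor(URI tableLocation, String hdfsScratchDir) {
        if (isOnBlobStorage(tableLocation)) {
            return hdfsScratchDir;
        }
        return tableLocation.toString() + ASSUMED_STAGING_SUFFIX;
    }
}
```

Callers ask the helper for a temporary directory and never test the scheme themselves, which keeps the filesystem check in one place.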


Bugs: HIVE-14270
https://issues.apache.org/jira/browse/HIVE-14270


Repository: hive-git


Description
---

This patch will create a temporary directory for Hive intermediate data on HDFS 
when S3 tables are used.


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/common/ObjectStorageUtils.java 
PRE-CREATION 
  common/src/test/org/apache/hadoop/hive/common/TestObjectStorageUtils.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/Context.java 
ec5d693d28a40925c44f844a05ebf3f5c10173c9 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 
9d927bd1a519f79bc7fa88c3b7e5c6cc2ef0637f 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 
2671cb1cf2ef74f9d6628f8cdf3f5ac99283dbd8 
  ql/src/test/org/apache/hadoop/hive/ql/exec/TestContext.java PRE-CREATION 

Diff: https://reviews.apache.org/r/50359/diff/


Testing
---

NO PATCH

** NON-PARTITIONED TABLE
- create table dummy (id int);   3.651s
- insert into table s3dummy values (1);   39.231s
- insert overwrite table s3dummy values (1);   42.569s
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from dummy;   30.136s

** EXTERNAL TABLE
- create table s3dummy_ext like s3dummy location 's3a://spena-bucket/user/hive/warehouse/s3dummy';   9.297s
- insert into table s3dummy_ext values (1);   45.855s

WITH PATCH

** NON-PARTITIONED TABLE
- create table s3dummy (id int) location 's3a://spena-bucket/user/hive/warehouse/s3dummy';   3.945s
- insert into table s3dummy values (1);   15.025s
- insert overwrite table s3dummy values (1);   25.149s
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from dummy;   19.158s
- from dummy insert overwrite table s3dummy select *;   25.469s
- from dummy insert into table s3dummy select *;   14.501s

** EXTERNAL TABLE
- create table s3dummy_ext like s3dummy location 's3a://spena-bucket/user/hive/warehouse/s3dummy';   4.827s
- insert into table s3dummy_ext values (1);   16.070s

** PARTITIONED TABLE
- create table s3dummypart (id int) partitioned by (part int) location 's3a://spena-bucket/user/hive/warehouse/s3dummypart';   3.176s
- alter table s3dummypart add partition (part=1);   3.229s
- alter table s3dummypart add partition (part=2);   3.124s
- insert into table s3dummypart partition (part=1) values (1);   14.876s
- insert overwrite table s3dummypart partition (part=1) values (1);   27.594s
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummypart' select * from dummypart;   22.298s
- from dummypart insert overwrite table s3dummypart partition (part=1) select id;   29.001s
- from dummypart insert into table s3dummypart partition (part=1) select id;   14.869s

** DYNAMIC PARTITIONS
- insert into table s3dummypart partition (part) select id, 1 from dummypart;   15.185s
- insert into table s3dummypart partition (part) select id, 1 from dummypart;   18.820s


Thanks,

Sergio Pena



Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3

2016-07-22 Thread Thomas Poepping

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/#review143280
---




common/src/java/org/apache/hadoop/hive/common/ObjectStoreUtils.java (lines 44 - 
46)


I second @Steve Loughran's comment that we should pull this from a config 
file. Maybe another config value for hive-site.xml, a comma-separated 
list of objectstore schemes? They need not all be S3-related, right?
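
To make the suggestion concrete, here is a small sketch of parsing a comma-separated scheme list and checking a path against it. It assumes the value has already been read from configuration as a plain string; the class and method names are hypothetical and do not come from the patch.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch; names are illustrative and not part of the patch.
final class BlobSchemeConfig {

    // Turns "s3, s3a ,S3N" into {"s3", "s3a", "s3n"}, ignoring empty entries.
    static Set<String> parseSchemes(String csv) {
        return Arrays.stream(csv.split(","))
            .map(s -> s.trim().toLowerCase(Locale.ROOT))
            .filter(s -> !s.isEmpty())
            .collect(Collectors.toSet());
    }

    // True when the path's scheme is one of the configured schemes.
    // Paths without a scheme (for example "/tmp/x") are never blob storage.
    static boolean isBlobStoragePath(URI path, Set<String> schemes) {
        String scheme = path.getScheme();
        return scheme != null && schemes.contains(scheme.toLowerCase(Locale.ROOT));
    }
}
```

Because the list is data rather than code, supporting another object store would only require changing the configured value.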



common/src/test/org/apache/hadoop/hive/common/TestObjectStoreUtils.java (lines 
26 - 27)


suggest we use either junit.framework OR org.junit.



common/src/test/org/apache/hadoop/hive/common/TestObjectStoreUtils.java (lines 
30 - 47)


Could we have a second test method that tests your 
isObjectStoreFileSystem() function?

You can mock the FileSystem objects with Mockito.



ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java (lines 6646 - 
6654)


as suggested on the Jira issue, is there a way we could move this logic to 
a helper function, to avoid having to change it in multiple places, or 
newcomers to this section of the code potentially forgetting to check this?


We have multiple things to remember:
 - this needs to be extensible; not all objectstores are S3
 - this needs to happen in the background; we can't put an "if path is S3" 
check in front of every place we compute a tmpPath. That doesn't scale (from a 
programmer's point of view, not a functionality point of view)

- Thomas Poepping



Review Request 50359: HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3

2016-07-22 Thread Sergio Pena

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/
---

Review request for hive.


Bugs: HIVE-14270
https://issues.apache.org/jira/browse/HIVE-14270


Repository: hive-git


Description
---

This patch will create a temporary directory for Hive intermediate data on HDFS 
when S3 tables are used.


Diffs
-

  common/src/java/org/apache/hadoop/hive/common/ObjectStoreUtils.java 
PRE-CREATION 
  common/src/test/org/apache/hadoop/hive/common/TestObjectStoreUtils.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 698efdc 

Diff: https://reviews.apache.org/r/50359/diff/


Testing
---

NO PATCH

** NON-PARTITIONED TABLE
- create table dummy (id int);   3.651s
- insert into table s3dummy values (1);   39.231s
- insert overwrite table s3dummy values (1);   42.569s
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from dummy;   30.136s

** EXTERNAL TABLE
- create table s3dummy_ext like s3dummy location 's3a://spena-bucket/user/hive/warehouse/s3dummy';   9.297s
- insert into table s3dummy_ext values (1);   45.855s

WITH PATCH

** NON-PARTITIONED TABLE
- create table s3dummy (id int) location 's3a://spena-bucket/user/hive/warehouse/s3dummy';   3.945s
- insert into table s3dummy values (1);   15.025s
- insert overwrite table s3dummy values (1);   25.149s
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from dummy;   19.158s
- from dummy insert overwrite table s3dummy select *;   25.469s
- from dummy insert into table s3dummy select *;   14.501s

** EXTERNAL TABLE
- create table s3dummy_ext like s3dummy location 's3a://spena-bucket/user/hive/warehouse/s3dummy';   4.827s
- insert into table s3dummy_ext values (1);   16.070s

** PARTITIONED TABLE
- create table s3dummypart (id int) partitioned by (part int) location 's3a://spena-bucket/user/hive/warehouse/s3dummypart';   3.176s
- alter table s3dummypart add partition (part=1);   3.229s
- alter table s3dummypart add partition (part=2);   3.124s
- insert into table s3dummypart partition (part=1) values (1);   14.876s
- insert overwrite table s3dummypart partition (part=1) values (1);   27.594s
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummypart' select * from dummypart;   22.298s
- from dummypart insert overwrite table s3dummypart partition (part=1) select id;   29.001s
- from dummypart insert into table s3dummypart partition (part=1) select id;   14.869s

** DYNAMIC PARTITIONS
- insert into table s3dummypart partition (part) select id, 1 from dummypart;   15.185s
- insert into table s3dummypart partition (part) select id, 1 from dummypart;   18.820s


Thanks,

Sergio Pena