Re: Permission to edit comments and close my Jira

2016-08-09 Thread Lefty Leverenz
Marta, I believe you need committer status to resolve a Jira issue, so I
did it for you (see HIVE-14387).

For editing your own comments, perhaps someone else can tell you what
permission is needed.

-- Lefty


On Mon, Aug 8, 2016 at 10:32 AM, Marta Kuczora wrote:

> Hello,
>
> I noticed that I cannot edit my comments on a Hive Jira.
> I am also not able to resolve my Jiras. I have the HIVE-14387 issue which
> could be closed, but I am not able to set its status.
>
> Could one of the Hive Jira admins please give me permission to edit my
> comments and close my issues? My username is kuczoram.
>
> Thanks and regards,
> Marta
>


Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3

2016-08-09 Thread Lefty Leverenz

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/#review145307
---




common/src/java/org/apache/hadoop/hive/conf/HiveConf.java (lines 3091 - 3092)


Tiny nit:  Either make "It" lowercase or move the parenthetical sentence 
after the first sentence, with a final period like this:

"Enable the use of scratch directories directly on blob storage systems. 
(It may cause performance penalties.)"


- Lefty Leverenz


On Aug. 9, 2016, 7:53 p.m., Sergio Pena wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50359/
> ---
> 
> (Updated Aug. 9, 2016, 7:53 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-14270
> https://issues.apache.org/jira/browse/HIVE-14270
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> This patch will create a temporary directory for Hive intermediate data on 
> HDFS when S3 tables are used.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/common/BlobStorageUtils.java 
> PRE-CREATION 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 
> 9f5f619359701b948f57d599a5bdc2ecbdff280a 
>   common/src/test/org/apache/hadoop/hive/common/TestBlobStorageUtils.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/Context.java 
> 89893eba9fd2316b9a393f06edefa837bb815faf 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 
> 5bd78862e1064d7f64a5d764571015a8df1101e8 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 
> a01a7bdbfec962b6617e98091cdb1325c5b0e84f 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/TestContext.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/50359/diff/
> 
> 
> Testing
> ---
> 
> NO PATCH
> ** NON-PARTITIONED TABLE
> 
> - create table dummy (id int);
>3.651s
> - insert into table s3dummy values (1);   
>   39.231s
> - insert overwrite table s3dummy values (1);  
>   42.569s
> - insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from 
> dummy; 30.136s
> 
> EXTERNAL TABLE
> 
> - create table s3dummy_ext like s3dummy location 
> 's3a://spena-bucket/user/hive/warehouse/s3dummy';   9.297s
> - insert into table s3dummy_ext values (1);   
>   45.855s
> 
> WITH PATCH
> 
> ** NON-PARTITIONED TABLE
> - create table s3dummy (id int) location 
> 's3a://spena-bucket/user/hive/warehouse/s3dummy';   3.945s
> - insert into table s3dummy values (1);   
>   15.025s
> - insert overwrite table s3dummy values (1);  
>   25.149s 
> - insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from 
> dummy; 19.158s  
> - from dummy insert overwrite table s3dummy select *; 
>   25.469s  
> - from dummy insert into table s3dummy select *;  
>   14.501s
> 
> ** EXTERNAL TABLE
> - create table s3dummy_ext like s3dummy location 
> 's3a://spena-bucket/user/hive/warehouse/s3dummy';   4.827s
> - insert into table s3dummy_ext values (1);   
>   16.070s
> 
> ** PARTITIONED TABLE
> - create table s3dummypart (id int) partitioned by (part int)
>   location 's3a://spena-bucket/user/hive/warehouse/s3dummypart';  
>3.176s
> - alter table s3dummypart add partition (part=1); 
>3.229s
> - alter table s3dummypart add partition (part=2); 
>3.124s
> - insert into table s3dummypart partition (part=1) values (1);
>   14.876s
> - insert overwrite table s3dummypart partition (part=1) values (1);   
>   27.594s 
> - insert overwrite directory 's3a://spena-bucket/dirs/s3dummypart' select * 
> from dummypart; 22.298s  
> - from dummypart insert overwrite table s3dummypart partition (part=1) select 
> id;   29.001s  
> - from dummypart insert into table s3dummypart partition (part=1) select id;  
>   14.869s
> 
> ** DYNAMIC PARTITIONS
> - insert into table s3dummypart partition (part) select id, 1 from dummypart; 
> 

Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3

2016-08-09 Thread Lefty Leverenz


> On July 30, 2016, 8:44 a.m., Lefty Leverenz wrote:
> > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java, lines 3066-3067
> > 
> >
> > Typo:  Commad-separated --> Comma-separated
> > 
> > Redundancy:  "... supported blobstore schemes that Hive officially 
> > supports" (omit "supported")
> > 
> > Nit:  A period could be added at the end.
> 
> Lefty Leverenz wrote:
> Looks good now, thanks Sergio.

Aarrgh, forgot to publish that.

Adding a trivial comment for the new config.


- Lefty


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/#review144255
---


On Aug. 9, 2016, 7:53 p.m., Sergio Pena wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50359/
> ---
> 
> (Updated Aug. 9, 2016, 7:53 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-14270
> https://issues.apache.org/jira/browse/HIVE-14270
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> This patch will create a temporary directory for Hive intermediate data on 
> HDFS when S3 tables are used.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/common/BlobStorageUtils.java 
> PRE-CREATION 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 
> 9f5f619359701b948f57d599a5bdc2ecbdff280a 
>   common/src/test/org/apache/hadoop/hive/common/TestBlobStorageUtils.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/Context.java 
> 89893eba9fd2316b9a393f06edefa837bb815faf 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 
> 5bd78862e1064d7f64a5d764571015a8df1101e8 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 
> a01a7bdbfec962b6617e98091cdb1325c5b0e84f 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/TestContext.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/50359/diff/
> 
> 
> Testing
> ---
> 
> NO PATCH
> ** NON-PARTITIONED TABLE
> 
> - create table dummy (id int);
>3.651s
> - insert into table s3dummy values (1);   
>   39.231s
> - insert overwrite table s3dummy values (1);  
>   42.569s
> - insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from 
> dummy; 30.136s
> 
> EXTERNAL TABLE
> 
> - create table s3dummy_ext like s3dummy location 
> 's3a://spena-bucket/user/hive/warehouse/s3dummy';   9.297s
> - insert into table s3dummy_ext values (1);   
>   45.855s
> 
> WITH PATCH
> 
> ** NON-PARTITIONED TABLE
> - create table s3dummy (id int) location 
> 's3a://spena-bucket/user/hive/warehouse/s3dummy';   3.945s
> - insert into table s3dummy values (1);   
>   15.025s
> - insert overwrite table s3dummy values (1);  
>   25.149s 
> - insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from 
> dummy; 19.158s  
> - from dummy insert overwrite table s3dummy select *; 
>   25.469s  
> - from dummy insert into table s3dummy select *;  
>   14.501s
> 
> ** EXTERNAL TABLE
> - create table s3dummy_ext like s3dummy location 
> 's3a://spena-bucket/user/hive/warehouse/s3dummy';   4.827s
> - insert into table s3dummy_ext values (1);   
>   16.070s
> 
> ** PARTITIONED TABLE
> - create table s3dummypart (id int) partitioned by (part int)
>   location 's3a://spena-bucket/user/hive/warehouse/s3dummypart';  
>3.176s
> - alter table s3dummypart add partition (part=1); 
>3.229s
> - alter table s3dummypart add partition (part=2); 
>3.124s
> - insert into table s3dummypart partition (part=1) values (1);
>   14.876s
> - insert overwrite table s3dummypart partition (part=1) values (1);   
>   27.594s 
> - insert overwrite directory 's3a://spena-bucket/dirs/s3dummypart' select * 
> from dummypart; 22.298s  
> - from dummypart insert overwrite table s3dummypart partition (part=1) select 
> id;   29.001s  
> - from dummypart insert into table s3dummypar

Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3

2016-08-09 Thread Lefty Leverenz


> On July 30, 2016, 8:44 a.m., Lefty Leverenz wrote:
> > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java, lines 3066-3067
> > 
> >
> > Typo:  Commad-separated --> Comma-separated
> > 
> > Redundancy:  "... supported blobstore schemes that Hive officially 
> > supports" (omit "supported")
> > 
> > Nit:  A period could be added at the end.

Looks good now, thanks Sergio.


- Lefty


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/#review144255
---


On Aug. 9, 2016, 7:53 p.m., Sergio Pena wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50359/
> ---
> 
> (Updated Aug. 9, 2016, 7:53 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-14270
> https://issues.apache.org/jira/browse/HIVE-14270
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> This patch will create a temporary directory for Hive intermediate data on 
> HDFS when S3 tables are used.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/common/BlobStorageUtils.java 
> PRE-CREATION 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 
> 9f5f619359701b948f57d599a5bdc2ecbdff280a 
>   common/src/test/org/apache/hadoop/hive/common/TestBlobStorageUtils.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/Context.java 
> 89893eba9fd2316b9a393f06edefa837bb815faf 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 
> 5bd78862e1064d7f64a5d764571015a8df1101e8 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 
> a01a7bdbfec962b6617e98091cdb1325c5b0e84f 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/TestContext.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/50359/diff/
> 
> 
> Testing
> ---
> 
> NO PATCH
> ** NON-PARTITIONED TABLE
> 
> - create table dummy (id int);
>3.651s
> - insert into table s3dummy values (1);   
>   39.231s
> - insert overwrite table s3dummy values (1);  
>   42.569s
> - insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from 
> dummy; 30.136s
> 
> EXTERNAL TABLE
> 
> - create table s3dummy_ext like s3dummy location 
> 's3a://spena-bucket/user/hive/warehouse/s3dummy';   9.297s
> - insert into table s3dummy_ext values (1);   
>   45.855s
> 
> WITH PATCH
> 
> ** NON-PARTITIONED TABLE
> - create table s3dummy (id int) location 
> 's3a://spena-bucket/user/hive/warehouse/s3dummy';   3.945s
> - insert into table s3dummy values (1);   
>   15.025s
> - insert overwrite table s3dummy values (1);  
>   25.149s 
> - insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from 
> dummy; 19.158s  
> - from dummy insert overwrite table s3dummy select *; 
>   25.469s  
> - from dummy insert into table s3dummy select *;  
>   14.501s
> 
> ** EXTERNAL TABLE
> - create table s3dummy_ext like s3dummy location 
> 's3a://spena-bucket/user/hive/warehouse/s3dummy';   4.827s
> - insert into table s3dummy_ext values (1);   
>   16.070s
> 
> ** PARTITIONED TABLE
> - create table s3dummypart (id int) partitioned by (part int)
>   location 's3a://spena-bucket/user/hive/warehouse/s3dummypart';  
>3.176s
> - alter table s3dummypart add partition (part=1); 
>3.229s
> - alter table s3dummypart add partition (part=2); 
>3.124s
> - insert into table s3dummypart partition (part=1) values (1);
>   14.876s
> - insert overwrite table s3dummypart partition (part=1) values (1);   
>   27.594s 
> - insert overwrite directory 's3a://spena-bucket/dirs/s3dummypart' select * 
> from dummypart; 22.298s  
> - from dummypart insert overwrite table s3dummypart partition (part=1) select 
> id;   29.001s  
> - from dummypart insert into table s3dummypart partition (part=1) select id;  
>   14.869s
> 
> ** DYNAMIC PARTITIONS
> - insert int

Review Request 50942: HIVE-14376: Schema evolution tests takes a long time

2016-08-09 Thread j . prasanth . j

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50942/
---

Review request for hive and Siddharth Seth.


Bugs: HIVE-14376
https://issues.apache.org/jira/browse/HIVE-14376


Repository: hive-git


Description
---

HIVE-14376: Schema evolution tests takes a long time


Diffs
-

  itests/src/test/resources/testconfiguration.properties 
ac249ed2f13e834f429ebc17b55f3fc6b44ad724 
  ql/src/test/queries/clientpositive/schema_evol_orc_acid_mapwork_part.q 
768d77d24977228858076981a7092de2a7b3799a 
  ql/src/test/queries/clientpositive/schema_evol_orc_acid_mapwork_table.q 
09f4f22be5a1b46a3d55a487278416d6c89b2d0c 
  ql/src/test/queries/clientpositive/schema_evol_orc_acidvec_mapwork_part.q 
38afb9d922447c251e606a328267e9133f75ca3a 
  ql/src/test/queries/clientpositive/schema_evol_orc_acidvec_mapwork_table.q 
63de008a07c1a75d4ef51712596772599b53989b 
  ql/src/test/queries/clientpositive/schema_evol_orc_nonvec_fetchwork_part.q 
8f336366ef751a0971b6b233f5e438ea942a 
  ql/src/test/queries/clientpositive/schema_evol_orc_nonvec_fetchwork_table.q 
0328d0acbda234c67bca62ae59028189c16fac5b 
  ql/src/test/queries/clientpositive/schema_evol_orc_nonvec_mapwork_part.q 
859dc6572d58d7abf0ec5e3ea0ef0c2e1063806b 
  
ql/src/test/queries/clientpositive/schema_evol_orc_nonvec_mapwork_part_all_complex.q
 27cea8d98ab569b1e220a9a8e982ae06abaa9a1d 
  
ql/src/test/queries/clientpositive/schema_evol_orc_nonvec_mapwork_part_all_primitive.q
 899b4bbd1d978bad8bb37caa7d927101cb70e9af 
  ql/src/test/queries/clientpositive/schema_evol_orc_nonvec_mapwork_table.q 
88c7cf6f7ad5cb5fe224703d4b1997ab2c3437fb 
  ql/src/test/queries/clientpositive/schema_evol_orc_vec_mapwork_part.q 
180ce2142d742039dbc0bbcaaabb9be0153b2bba 
  
ql/src/test/queries/clientpositive/schema_evol_orc_vec_mapwork_part_all_complex.q
 f8a8fa691b33778b2a8feaa403a3801cd4c7c134 
  
ql/src/test/queries/clientpositive/schema_evol_orc_vec_mapwork_part_all_primitive.q
 3769485f5ad2f85e74f99328626a1907f460575a 
  ql/src/test/queries/clientpositive/schema_evol_orc_vec_mapwork_table.q 
98b70f900049936c137f4794ec07ef44c995662c 
  ql/src/test/queries/clientpositive/schema_evol_text_nonvec_mapwork_part.q 
978e76db030af4a13d25890d7c9237afc8c6734d 
  
ql/src/test/queries/clientpositive/schema_evol_text_nonvec_mapwork_part_all_complex.q
 c1e8af60994d4249e45d2d6f9ef8f310479d13bc 
  
ql/src/test/queries/clientpositive/schema_evol_text_nonvec_mapwork_part_all_primitive.q
 4ed92c718701535b96cf49636ea8549c0d28c0b3 
  ql/src/test/queries/clientpositive/schema_evol_text_nonvec_mapwork_table.q 
9fa020a8e707ff4a19a4d57a038b87386624b3c1 
  ql/src/test/queries/clientpositive/schema_evol_text_vec_mapwork_part.q 
c21bf8fa9ca18d7b6efd6c48f8d99db28db62282 
  
ql/src/test/queries/clientpositive/schema_evol_text_vec_mapwork_part_all_complex.q
 a91454478597d3d61cb51f0256cba9dbfaed0ab7 
  
ql/src/test/queries/clientpositive/schema_evol_text_vec_mapwork_part_all_primitive.q
 30a1c08f152265d4b2f8e740987e823835326ae7 
  ql/src/test/queries/clientpositive/schema_evol_text_vec_mapwork_table.q 
b20f7e848b48a4717a55ad6e6da6c28a87cb9221 
  ql/src/test/queries/clientpositive/schema_evol_text_vecrow_mapwork_part.q 
c54ed91b65e6cbbfbad3ce6a05c3780734c594a3 
  
ql/src/test/queries/clientpositive/schema_evol_text_vecrow_mapwork_part_all_complex.q
 7737abfe4942f35c9a224ef477540d3a979e3bc9 
  
ql/src/test/queries/clientpositive/schema_evol_text_vecrow_mapwork_part_all_primitive.q
 62e1405339709df4a2e7e3e1b232ffcff9496103 
  ql/src/test/queries/clientpositive/schema_evol_text_vecrow_mapwork_table.q 
88716231bbf110d79a74209eaf1937410216b9ff 
  
ql/src/test/results/clientpositive/llap/schema_evol_orc_acid_mapwork_part.q.out 
fafad50b082eb25314c49324500cb04a82142b78 
  
ql/src/test/results/clientpositive/llap/schema_evol_orc_acid_mapwork_table.q.out
 e69e9bddd86dfe23caf64b726cf2761815ffbc4f 
  
ql/src/test/results/clientpositive/llap/schema_evol_orc_acidvec_mapwork_part.q.out
 abe001d1eef1a359e68b60141083dbe349cd6ba0 
  
ql/src/test/results/clientpositive/llap/schema_evol_orc_acidvec_mapwork_table.q.out
 8ce8794d02a1b87e77851a302c39d5a9eebd640e 
  
ql/src/test/results/clientpositive/llap/schema_evol_orc_nonvec_fetchwork_part.q.out
 d1634a9607d6e6f3ca60d26c17a870a420b3347e 
  
ql/src/test/results/clientpositive/llap/schema_evol_orc_nonvec_fetchwork_table.q.out
 b569a94dd5ff289257e59e7bd87394ae4f5567b6 
  
ql/src/test/results/clientpositive/llap/schema_evol_orc_nonvec_mapwork_part.q.out
 127d5a9d1dc515d60ec9b280faf18a3d2afc989c 
  
ql/src/test/results/clientpositive/llap/schema_evol_orc_nonvec_mapwork_part_all_complex.q.out
 9f47c1cec52504a7d8dab798cb376464ad293ad3 
  
ql/src/test/results/clientpositive/llap/schema_evol_orc_nonvec_mapwork_part_all_primitive.q.out
 5e7507ebf284a12dfbf4bb1d57769119bca10e18 
  
ql/src/test/results/clientpositive/llap/schema_evol_orc_non

Re: Hive unit testing in other projects

2016-08-09 Thread Matt Burgess
I am interested in this as well; we have Hive processors (in Apache
NiFi) but have been using Derby to test the processor logic (not
necessarily the interaction with Hive). With an embedded/small Hive it
would be an integration test, but it would still help a great deal to iron
out bugs.

Regards,
Matt

On Tue, Aug 9, 2016 at 7:37 PM, Chris Teoh  wrote:
> Hi folks,
>
> I'm working on a Sqoop patch that imports to Hive and am wondering what's
> the easiest way to incorporate a unit test that uses a mini Hive server
> without requiring a full dependency on the Hive project?
>
> Kind Regards
> Chris
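
For illustration, a minimal sketch of one common approach - an embedded HiveServer2
reached through JDBC ("jdbc:hive2://" with no host runs HS2 in-process) - assuming
hive-jdbc and its dependencies are on the test classpath and the working directory is
writable for the local metastore and warehouse; the test class and table names here
are made up:

{code:java}
// Illustrative only: a small JUnit 4 test against an embedded HiveServer2.
// "jdbc:hive2://" with no host starts HiveServer2 inside the test JVM.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

import org.junit.Assert;
import org.junit.Test;

public class EmbeddedHiveSmokeTest {

  @Test
  public void createAndListTable() throws Exception {
    // Explicit driver registration; newer hive-jdbc jars also auto-register.
    Class.forName("org.apache.hive.jdbc.HiveDriver");
    try (Connection conn = DriverManager.getConnection("jdbc:hive2://", "", "");
         Statement stmt = conn.createStatement()) {
      stmt.execute("CREATE TABLE IF NOT EXISTS smoke_test (id INT)");
      boolean found = false;
      try (ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
        while (rs.next()) {
          found |= "smoke_test".equalsIgnoreCase(rs.getString(1));
        }
      }
      Assert.assertTrue("expected smoke_test to be listed", found);
    }
  }
}
{code}

A heavier alternative is to reuse Hive's own mini-cluster test harness, but that
pulls in many more dependencies.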


filter push down help

2016-08-09 Thread Zhu Li
Hi,

I am working on a connector where I need to push filters down to readers of
Hive tables, so I need to convert the filter predicate into an
ExprNodeGenericFuncDesc first. I could code this myself, but I guess
there must be methods in Hive that already do this. However,
it is difficult for me to find out where the conversion of a filter
predicate into an ExprNodeGenericFuncDesc happens in Hive. Could anyone
help?

Thanks!
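
For illustration, a minimal sketch of building a simple "column = constant" predicate
as an ExprNodeGenericFuncDesc by hand; the ExprNode/UDF class names are from the Hive
code base, but the column, table alias, and wrapper class here are made up:

{code:java}
// Illustrative sketch: hand-building the predicate "t.id = 42" as an
// ExprNodeGenericFuncDesc. The column name, table alias, and wrapper class
// are made up; the ExprNode/UDF classes are the real Hive ones.
import java.util.Arrays;
import java.util.List;

import org.apache.hadoop.hive.ql.plan.ExprNodeColumnDesc;
import org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc;
import org.apache.hadoop.hive.ql.plan.ExprNodeDesc;
import org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPEqual;
import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;

public class FilterExprExample {

  public static ExprNodeGenericFuncDesc idEquals(int value) throws Exception {
    // WHERE id = <value>
    ExprNodeDesc column =
        new ExprNodeColumnDesc(TypeInfoFactory.intTypeInfo, "id", "t", false);
    ExprNodeDesc constant = new ExprNodeConstantDesc(value);
    List<ExprNodeDesc> children = Arrays.asList(column, constant);
    // newInstance() derives the boolean return type from the "=" UDF
    return ExprNodeGenericFuncDesc.newInstance(new GenericUDFOPEqual(), children);
  }
}
{code}

Inside Hive itself this conversion happens during query compilation and predicate
pushdown, and, as far as I recall, readers typically receive the expression in
serialized form via the table scan's filter-expression property in the job
configuration.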


[jira] [Created] (HIVE-14509) AvroSerde mutates tinyint and smallint columns when specifying native columns

2016-08-09 Thread Mark Wagner (JIRA)
Mark Wagner created HIVE-14509:
--

 Summary: AvroSerde mutates tinyint and smallint columns when 
specifying native columns
 Key: HIVE-14509
 URL: https://issues.apache.org/jira/browse/HIVE-14509
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 2.2.0
Reporter: Mark Wagner


tinyint and smallint go in, int comes out:

{noformat}
string1 string  
int1int 
tinyint1int 
smallint1   int 
bigint1 bigint  
boolean1boolean 
float1  float   
double1 double  
list1   array   
map1map 
struct1 struct

enum1   string  
nullableint int 
bytes1  binary  
fixed1  binary
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Hive unit testing in other projects

2016-08-09 Thread Chris Teoh
Hi folks,

I'm working on a Sqoop patch that imports to Hive and am wondering what's
the easiest way to incorporate a unit test that uses a mini Hive server
without requiring a full dependency on the Hive project?

Kind Regards
Chris


Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3

2016-08-09 Thread Sergio Pena


> On Aug. 9, 2016, 9:49 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java, lines 
> > 1807-1814
> > 
> >
> > Why not use newly added Context::getTempDirForPath(Path path) here.

Yeah, sorry. This is a little confusing. 

The thing is that 'tmpDir' is based on 'dest' (tmpDir = 
baseCtx.getExternalTmpPath(dest)), where 'dest' is an HDFS temporary directory 
(not S3). This is the directory that caused the .hive-staging directory to be 
created on S3 at the end, when the HDFS temp dir was copied to S3 (INSERT OVERWRITE).

I found out that FileSinkDesc has a 'getDestPath' that returns the S3 path. 
So the condition is: if 'getDestPath' is on S3, then use 'getMRTmpPath'; otherwise 
continue using the temporary path based on 'dest' (the HDFS temp path).

That part of the code was a little confusing regarding the names 'dest', 
'getDestPath', and 'getFinalDirName'. I was trying to understand this code, but I 
could not figure out the idea behind 'getFinalDirName' and 'getDestPath', so 
I ended up writing that condition. Also, the comments that were already there 
mentioned that the temp file should be on the same filesystem as the 
destination (in the case of non-blobstore directories).
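
A rough sketch of the condition described above, not the actual patch; 
getMRTmpPath()/getExternalTmpPath() are the Context methods mentioned in the 
thread, while the exact BlobStorageUtils helper signature is an assumption:

{code:java}
// Sketch of the choice described above, not the actual patch. Context's
// getMRTmpPath()/getExternalTmpPath() are existing methods; the exact
// BlobStorageUtils.isBlobStoragePath() signature is an assumption here.
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.common.BlobStorageUtils;
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.ql.Context;

public class TmpDirChoiceSketch {

  static Path chooseTmpDir(HiveConf conf, Context ctx, Path dest, Path fileSinkDest) {
    if (BlobStorageUtils.isBlobStoragePath(conf, fileSinkDest)) {
      // Final destination (FileSinkDesc.getDestPath()) is on S3/blob storage:
      // keep the intermediate data on HDFS instead.
      return ctx.getMRTmpPath();
    }
    // Otherwise keep the temp dir on the same filesystem as 'dest', as before.
    return ctx.getExternalTmpPath(dest);
  }
}
{code}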


> On Aug. 9, 2016, 9:49 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java, lines 
> > 7020-7024
> > 
> >
> > Why not use newly introduced tx.getTempDirForPath(dest_path); here?

This part was causing 72 tests to fail due to the different scratch directory 
name. Also, I wasn't sure why the stats temp dir was in the same location as 
'queryTmpdir', so I added the condition there too, in case it has issues with 
encrypted zones. I like your line better, but I wasn't sure about it, and I ended 
up writing this condition.

I can try to do this better with 'ctx.getTempDirForPath'. What do you think?


> On Aug. 9, 2016, 9:49 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java, line 6763
> > 
> >
> > Surprised that we weren't using getExternalTmpPathRelTo() here; did we 
> > miss this when we introduced this method for the encryption support work?

Mmm, I'm surprised too. Maybe we missed it.


- Sergio


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/#review145269
---


On Aug. 9, 2016, 7:53 p.m., Sergio Pena wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50359/
> ---
> 
> (Updated Aug. 9, 2016, 7:53 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-14270
> https://issues.apache.org/jira/browse/HIVE-14270
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> This patch will create a temporary directory for Hive intermediate data on 
> HDFS when S3 tables are used.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/common/BlobStorageUtils.java 
> PRE-CREATION 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 
> 9f5f619359701b948f57d599a5bdc2ecbdff280a 
>   common/src/test/org/apache/hadoop/hive/common/TestBlobStorageUtils.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/Context.java 
> 89893eba9fd2316b9a393f06edefa837bb815faf 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 
> 5bd78862e1064d7f64a5d764571015a8df1101e8 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 
> a01a7bdbfec962b6617e98091cdb1325c5b0e84f 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/TestContext.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/50359/diff/
> 
> 
> Testing
> ---
> 
> NO PATCH
> ** NON-PARTITIONED TABLE
> 
> - create table dummy (id int);
>3.651s
> - insert into table s3dummy values (1);   
>   39.231s
> - insert overwrite table s3dummy values (1);  
>   42.569s
> - insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from 
> dummy; 30.136s
> 
> EXTERNAL TABLE
> 
> - create table s3dummy_ext like s3dummy location 
> 's3a://spena-bucket/user/hive/warehouse/s3dummy';   9.297s
> - insert into table s3dummy_ext values (1);   
>   45.855s
> 
> WITH PATCH
> 
> ** NON-PARTITIONED TABLE
> - create table s3dummy (id int) location 
> 's3a://spena-bucket/user/hive/warehouse/s3dummy';   3

Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3

2016-08-09 Thread Ashutosh Chauhan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/#review145269
---




ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java (lines 1807 
- 1814)


Why not use newly added Context::getTempDirForPath(Path path) here.



ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java (line 6763)


Surprised that we weren't using getExternalTmpPathRelTo() here; did we miss 
this when we introduced this method for the encryption support work?



ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java (lines 7020 - 
7024)


Why not use newly introduced tx.getTempDirForPath(dest_path); here?


- Ashutosh Chauhan


On Aug. 9, 2016, 7:53 p.m., Sergio Pena wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50359/
> ---
> 
> (Updated Aug. 9, 2016, 7:53 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-14270
> https://issues.apache.org/jira/browse/HIVE-14270
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> This patch will create a temporary directory for Hive intermediate data on 
> HDFS when S3 tables are used.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/common/BlobStorageUtils.java 
> PRE-CREATION 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 
> 9f5f619359701b948f57d599a5bdc2ecbdff280a 
>   common/src/test/org/apache/hadoop/hive/common/TestBlobStorageUtils.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/Context.java 
> 89893eba9fd2316b9a393f06edefa837bb815faf 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 
> 5bd78862e1064d7f64a5d764571015a8df1101e8 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 
> a01a7bdbfec962b6617e98091cdb1325c5b0e84f 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/TestContext.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/50359/diff/
> 
> 
> Testing
> ---
> 
> NO PATCH
> ** NON-PARTITIONED TABLE
> 
> - create table dummy (id int);
>3.651s
> - insert into table s3dummy values (1);   
>   39.231s
> - insert overwrite table s3dummy values (1);  
>   42.569s
> - insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from 
> dummy; 30.136s
> 
> EXTERNAL TABLE
> 
> - create table s3dummy_ext like s3dummy location 
> 's3a://spena-bucket/user/hive/warehouse/s3dummy';   9.297s
> - insert into table s3dummy_ext values (1);   
>   45.855s
> 
> WITH PATCH
> 
> ** NON-PARTITIONED TABLE
> - create table s3dummy (id int) location 
> 's3a://spena-bucket/user/hive/warehouse/s3dummy';   3.945s
> - insert into table s3dummy values (1);   
>   15.025s
> - insert overwrite table s3dummy values (1);  
>   25.149s 
> - insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from 
> dummy; 19.158s  
> - from dummy insert overwrite table s3dummy select *; 
>   25.469s  
> - from dummy insert into table s3dummy select *;  
>   14.501s
> 
> ** EXTERNAL TABLE
> - create table s3dummy_ext like s3dummy location 
> 's3a://spena-bucket/user/hive/warehouse/s3dummy';   4.827s
> - insert into table s3dummy_ext values (1);   
>   16.070s
> 
> ** PARTITIONED TABLE
> - create table s3dummypart (id int) partitioned by (part int)
>   location 's3a://spena-bucket/user/hive/warehouse/s3dummypart';  
>3.176s
> - alter table s3dummypart add partition (part=1); 
>3.229s
> - alter table s3dummypart add partition (part=2); 
>3.124s
> - insert into table s3dummypart partition (part=1) values (1);
>   14.876s
> - insert overwrite table s3dummypart partition (part=1) values (1);   
>   27.594s 
> - insert overwrite directory 's3a://spena-bucket/dirs/s3dummypart' select * 
> from dummypart; 22.298s  
> - from dummypart insert overwrite table 

[jira] [Created] (HIVE-14508) Explore surefire parallel testing options

2016-08-09 Thread Prasanth Jayachandran (JIRA)
Prasanth Jayachandran created HIVE-14508:


 Summary: Explore surefire parallel testing options
 Key: HIVE-14508
 URL: https://issues.apache.org/jira/browse/HIVE-14508
 Project: Hive
  Issue Type: Sub-task
  Components: Test
Affects Versions: 2.2.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran


Maven Surefire seems to have some options for running tests in parallel. It will 
be worthwhile to explore them to improve the overall test runtime. 




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14507) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_orc_llap_counters failure

2016-08-09 Thread Hari Sankar Sivarama Subramaniyan (JIRA)
Hari Sankar Sivarama Subramaniyan created HIVE-14507:


 Summary: 
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_orc_llap_counters
 failure
 Key: HIVE-14507
 URL: https://issues.apache.org/jira/browse/HIVE-14507
 Project: Hive
  Issue Type: Sub-task
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan


Fails locally.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14506) TestQueryLifeTimeHook hang

2016-08-09 Thread Hari Sankar Sivarama Subramaniyan (JIRA)
Hari Sankar Sivarama Subramaniyan created HIVE-14506:


 Summary: TestQueryLifeTimeHook hang
 Key: HIVE-14506
 URL: https://issues.apache.org/jira/browse/HIVE-14506
 Project: Hive
  Issue Type: Sub-task
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan


The test hangs locally.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14505) Analyze org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching failure

2016-08-09 Thread Hari Sankar Sivarama Subramaniyan (JIRA)
Hari Sankar Sivarama Subramaniyan created HIVE-14505:


 Summary:  Analyze 
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching failure
 Key: HIVE-14505
 URL: https://issues.apache.org/jira/browse/HIVE-14505
 Project: Hive
  Issue Type: Sub-task
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan


Flaky test failure. Fails ~50% of the time locally.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Review Request 50934: HIVE-14233 Improve vectorization for ACID by eliminating row-by-row stitching

2016-08-09 Thread Saket Saurabh

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50934/
---

Review request for hive and Eugene Koifman.


Repository: hive-git


Description
---

https://issues.apache.org/jira/browse/HIVE-14233


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 
334cb31c5406f500c122a11eccef25b92d357cd4 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java 
e46ca51eff9c230147166e9428d7f462d2f9e772 
  
ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java
 PRE-CREATION 
  ql/src/test/queries/clientpositive/acid_vectorization.q 
832909bdb1bc79e01163373beed03eaaffcefd3d 
  ql/src/test/results/clientpositive/acid_vectorization.q.out 
1792979156ec361c85882ac8b6968e93d42b5f31 

Diff: https://reviews.apache.org/r/50934/diff/


Testing
---

This JIRA proposes to improve vectorization for ACID by eliminating row-by-row 
stitching when reading back ACID files. In the current implementation, a 
vectorized row batch is created by populating the batch one row at a time, 
before the vectorized batch is passed up along the operator pipeline. This 
row-by-row stitching limitation was because of the fact that the ACID 
insert/update/delete events from various delta files needed to be merged 
together before the actual version of a given row was found out. HIVE-14035 has 
enabled us to break away from that limitation by splitting ACID update events 
into a combination of delete+insert. In fact, it has now enabled us to create 
splits on delta files.
Building on top of HIVE-14035, this JIRA proposes to solve this earlier 
bottleneck in the vectorized code path for ACID by now directly reading row 
batches from the underlying ORC files and avoiding any stitching altogether. 
Once a row batch is read from the split (which may be on a base/delta file), 
the deleted rows will be found by cross-referencing them against a data 
structure that will just keep track of deleted events (found in the 
deleted_delta files). This will lead to a large performance gain when reading 
ACID files in vectorized fashion, while enabling further optimizations in 
future that can be done on top of that.
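
A conceptual sketch of the delete cross-referencing idea described above (this is 
not the VectorizedOrcAcidRowBatchReader code; the RecordIdentifier-keyed set and 
the per-row id array are simplifying assumptions):

{code:java}
// Conceptual sketch only of "cross-reference rows against delete events";
// this is not the VectorizedOrcAcidRowBatchReader code. Assumes the batch
// arrives with selectedInUse == false and that one RecordIdentifier per row
// is available; a real reader would use a more compact delete structure.
import java.util.Set;

import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
import org.apache.hadoop.hive.ql.io.RecordIdentifier;

public class DeleteFilterSketch {

  /** Drops deleted rows from a batch by rewriting the selected-rows vector. */
  static void applyDeletes(VectorizedRowBatch batch,
                           RecordIdentifier[] rowIds,       // one id per row in the batch
                           Set<RecordIdentifier> deleted) {  // built from delete_delta files
    int newSize = 0;
    for (int i = 0; i < batch.size; i++) {
      if (!deleted.contains(rowIds[i])) {
        batch.selected[newSize++] = i;   // keep this row
      }
    }
    batch.size = newSize;
    batch.selectedInUse = true;
  }
}
{code}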


Thanks,

Saket Saurabh



[jira] [Created] (HIVE-14504) tez_join_hash.q test is slow

2016-08-09 Thread Prasanth Jayachandran (JIRA)
Prasanth Jayachandran created HIVE-14504:


 Summary: tez_join_hash.q test is slow
 Key: HIVE-14504
 URL: https://issues.apache.org/jira/browse/HIVE-14504
 Project: Hive
  Issue Type: Sub-task
  Components: Test
Affects Versions: 2.2.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran


tez_join_hash.q also explicitly sets the execution engine to mr, which slows down 
the entire test. The test takes around 7 mins. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14503) Remove explicit order by in qfiles and replace them with SORT_QUERY_RESULTS

2016-08-09 Thread Prasanth Jayachandran (JIRA)
Prasanth Jayachandran created HIVE-14503:


 Summary: Remove explicit order by in qfiles and replace them with 
SORT_QUERY_RESULTS
 Key: HIVE-14503
 URL: https://issues.apache.org/jira/browse/HIVE-14503
 Project: Hive
  Issue Type: Sub-task
  Components: Test
Affects Versions: 2.2.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran


Identify qfiles with explicit order by and replace them with SORT_QUERY_RESULTS



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14502) Convert MiniTez tests to MiniLlap tests

2016-08-09 Thread Prasanth Jayachandran (JIRA)
Prasanth Jayachandran created HIVE-14502:


 Summary: Convert MiniTez tests to MiniLlap tests
 Key: HIVE-14502
 URL: https://issues.apache.org/jira/browse/HIVE-14502
 Project: Hive
  Issue Type: Sub-task
  Components: Test
Affects Versions: 2.2.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran


LLAP shares most of the codepath with Tez. MiniLlapCliDriver is much faster 
than MiniTezCliDriver because of threaded executors and caching. 
MiniTezCliDriver tests take around 3hr 15mins to run around 400 tests. To cut 
down this test time significantly, it makes sense to move the mini Tez tests to 
mini LLAP tests.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14501) MiniTez test for union_type_chk.q is slow

2016-08-09 Thread Prasanth Jayachandran (JIRA)
Prasanth Jayachandran created HIVE-14501:


 Summary: MiniTez test for union_type_chk.q is slow
 Key: HIVE-14501
 URL: https://issues.apache.org/jira/browse/HIVE-14501
 Project: Hive
  Issue Type: Sub-task
  Components: Test
Affects Versions: 2.2.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran


union_type_chk.q runs on minimr and minitez, but the test itself explicitly sets 
the execution engine to mr. It takes around 10 mins to run this test. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3

2016-08-09 Thread Sergio Pena

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/
---

(Updated Aug. 9, 2016, 7:53 p.m.)


Review request for hive.


Changes
---

- Added new flag variable that allows users to use the table blobstorage 
location as scratch directory.
- Other minor fixes to allow tests to pass.


Bugs: HIVE-14270
https://issues.apache.org/jira/browse/HIVE-14270


Repository: hive-git


Description
---

This patch will create a temporary directory for Hive intermediate data on HDFS 
when S3 tables are used.


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/common/BlobStorageUtils.java 
PRE-CREATION 
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 
9f5f619359701b948f57d599a5bdc2ecbdff280a 
  common/src/test/org/apache/hadoop/hive/common/TestBlobStorageUtils.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/Context.java 
89893eba9fd2316b9a393f06edefa837bb815faf 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 
5bd78862e1064d7f64a5d764571015a8df1101e8 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 
a01a7bdbfec962b6617e98091cdb1325c5b0e84f 
  ql/src/test/org/apache/hadoop/hive/ql/exec/TestContext.java PRE-CREATION 

Diff: https://reviews.apache.org/r/50359/diff/


Testing
---

NO PATCH
** NON-PARTITIONED TABLE

- create table dummy (id int);  
 3.651s
- insert into table s3dummy values (1); 
39.231s
- insert overwrite table s3dummy values (1);
42.569s
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from 
dummy; 30.136s

EXTERNAL TABLE

- create table s3dummy_ext like s3dummy location 
's3a://spena-bucket/user/hive/warehouse/s3dummy';   9.297s
- insert into table s3dummy_ext values (1); 
45.855s

WITH PATCH

** NON-PARTITIONED TABLE
- create table s3dummy (id int) location 
's3a://spena-bucket/user/hive/warehouse/s3dummy';   3.945s
- insert into table s3dummy values (1); 
15.025s
- insert overwrite table s3dummy values (1);
25.149s 
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from 
dummy; 19.158s  
- from dummy insert overwrite table s3dummy select *;   
25.469s  
- from dummy insert into table s3dummy select *;
14.501s

** EXTERNAL TABLE
- create table s3dummy_ext like s3dummy location 
's3a://spena-bucket/user/hive/warehouse/s3dummy';   4.827s
- insert into table s3dummy_ext values (1); 
16.070s

** PARTITIONED TABLE
- create table s3dummypart (id int) partitioned by (part int)
  location 's3a://spena-bucket/user/hive/warehouse/s3dummypart';
 3.176s
- alter table s3dummypart add partition (part=1);   
 3.229s
- alter table s3dummypart add partition (part=2);   
 3.124s
- insert into table s3dummypart partition (part=1) values (1);  
14.876s
- insert overwrite table s3dummypart partition (part=1) values (1); 
27.594s 
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummypart' select * 
from dummypart; 22.298s  
- from dummypart insert overwrite table s3dummypart partition (part=1) select 
id;   29.001s  
- from dummypart insert into table s3dummypart partition (part=1) select id;
14.869s

** DYNAMIC PARTITIONS
- insert into table s3dummypart partition (part) select id, 1 from dummypart;   
15.185s
- insert into table s3dummypart partition (part) select id, 1 from dummypart;   
18.820s


Thanks,

Sergio Pena



Re: Review Request 50906: HIVE-14444 Upgrade qtest execution framework to junit4 - migrate most of them

2016-08-09 Thread Peter Vary


> On Aug. 9, 2016, 12:14 a.m., Peter Vary wrote:
> > Hi Zoltan,
> > 
> > Thanks for the patch; I can see that you were working on it even on the 
> > weekend.
> > 
> > Please help me to understand the components a little more, so I could help 
> > with the review.
> > As I can see there are 3 levels of the classes for every given test:
> > - Configuration
> > - Adapter
> > - Driver
> > 
> > I have tried to identify the functionality of the given elements, and come 
> > up with the following:
> > - Configuration - The queries to run, the configuration of the clusters, 
> > and the initial data
> > - Adapter - The actual methods for implementing the test, like class, 
> > method level setup, and test execution
> > - Driver - These contain very little code, and they look very similar, so 
> > there is a lot of code duplication - wouldn't it be a good idea to merge it 
> > with the Adapter class? Also it is a little strange that the Configuration 
> > has to have a reference to the Adapter. If you decide to merge the Adapter 
> > and the Driver, then the reference is not needed anymore.
> > 
> > Thanks,
> > Peter
> 
> Zoltan Haindrich wrote:
> Hello,
> 
> Yeah... I wanted to take advantage of the "empty queue" on the ptest 
> executor ;)
> By the way, I think that all Hive precommit jobs which end up on ubuntu-3 
> will fail with some weird JDK issue...
> 
> I think you are getting it right... those classes which have "Driver" in 
> their names are the successors of the old vm files: I don't want to touch 
> them until we have all of them on board.
> There is some redundancy even between the Driver classes... CliDriver and 
> some others are very similar - it will be easy to drop some of them.
> Merging with the adapter would possibly remove the common parent - and that 
> would possibly break the adapter factory.
> The positive side of the current design is that all configurations are in 
> one place... even the core executor selection is in CliConfigs - so you have 
> to look at just one place if you have to modify it.
> 
> About more refactoring work: ReviewBoard can pick up changes in renamed 
> files (which is great) - but if I add more refactors to this patch, it will 
> look like "20 files removed", "30 files added" - which is not really 
> review-friendly ;) It has already lost track of the changes to PerfCliDriver 
> and QTestGenTask.
> 
> I would like to continue this with a cleanup refactor, after AccumuloCli 
> and BeelineCli are on board. 
> 
> regards,
> Zoltan

Hi Zoltan,

If I were in BP, I would offer you a beer to discuss this over :).
Unfortunately that is not an option now, so we have to do it the hard way.

What final design do you have in mind? I think we should discuss these 
changes in light of that, and should not focus on partial solutions.
For example - correct me if I'm wrong - the Adapter model is most useful if 
there is an existing interface we have to adhere to. So the final design does not 
require an adapter, since the interfaces are used only by the tests, and we 
could change them if needed.

I think we should plan for the following changes, and keep everything else as 
simple as possible:
1. Adding new queries - this happens very often (maybe too often in my opinion, 
but let’s not discuss it here :) )
2. Changing how to handle the specific test case results 
(ordering/filtering/regexp) - QTestUtil, HBaseQTestUtil, QFileClient for BeeLine
3. Adding new tests, to test newly integrated components - as was the case for 
BeeLine/Spark/Tez

Only in the 3rd case should we touch the Driver and the Adapter, but then we 
would change both of them. To me this means that they are tightly coupled, and 
it might be a good idea to merge them.

What do you think?

Thanks,
Peter


- Peter


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50906/#review145152
---


On Aug. 8, 2016, 8:47 p.m., Zoltan Haindrich wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50906/
> ---
> 
> (Updated Aug. 8, 2016, 8:47 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-1
> https://issues.apache.org/jira/browse/HIVE-1
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> This patch focuses on removing the qtestgen task without introducing 
> regressions in existing testing services - it's not the nicest patch... but it 
> opens the possibility to continue refactoring these classes inside a pure 
> Java environment.
> 
> This patch contains a bunch of helper classes to provide tests for the JUnit 4 
> executor.
> 
> I've mimicked the old generated test cases' behaviour:
> 
> * every te

[jira] [Created] (HIVE-14500) Support masking of columns for materialized views

2016-08-09 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-14500:
--

 Summary: Support masking of columns for materialized views
 Key: HIVE-14500
 URL: https://issues.apache.org/jira/browse/HIVE-14500
 Project: Hive
  Issue Type: Sub-task
  Components: Materialized views
Affects Versions: 2.2.0
Reporter: Jesus Camacho Rodriguez


Verify that column masking works for materialized views and provide the 
necessary extensions. Add test cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14499) Add HMS metrics for materialized views

2016-08-09 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-14499:
--

 Summary: Add HMS metrics for materialized views
 Key: HIVE-14499
 URL: https://issues.apache.org/jira/browse/HIVE-14499
 Project: Hive
  Issue Type: Sub-task
  Components: Materialized views
Affects Versions: 2.2.0
Reporter: Jesus Camacho Rodriguez


Related to HIVE-10761.

We should be able to show some metrics related to materialized views, such as 
the number of materialized views, size of the materialized views, number of 
accesses, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14498) Timeout for query rewriting using materialized views

2016-08-09 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-14498:
--

 Summary: Timeout for query rewriting using materialized views
 Key: HIVE-14498
 URL: https://issues.apache.org/jira/browse/HIVE-14498
 Project: Hive
  Issue Type: Sub-task
  Components: Materialized views
Affects Versions: 2.2.0
Reporter: Jesus Camacho Rodriguez


Once we have query rewriting in place (HIVE-14496), one of the main issues is 
data freshness in the materialized views.

Since we will not support view maintenance at first, we could include a 
HiveConf property to configure a max freshness period (_n timeunits_). If a 
query comes in and the materialized view was populated (by create, refresh, 
etc.) longer than _n_ ago, then we should not use it for rewriting the 
query.

Optionally, we could print a warning for the user indicating that the 
materialized view was not used because it was not fresh.
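
A tiny sketch of the freshness rule described above; the method and parameter 
names are made up and only illustrate the check, not the actual HiveConf wiring:

{code:java}
// Made-up names; only illustrates the freshness check itself.
public final class MaterializedViewFreshness {

  /** True if the view was (re)populated recently enough to be used for rewriting. */
  static boolean usableForRewrite(long lastPopulatedMillis,
                                  long maxFreshnessMillis,
                                  long nowMillis) {
    return nowMillis - lastPopulatedMillis <= maxFreshnessMillis;
  }
}
{code}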





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14497) Fine control for using materialized views in rewriting

2016-08-09 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-14497:
--

 Summary: Fine control for using materialized views in rewriting
 Key: HIVE-14497
 URL: https://issues.apache.org/jira/browse/HIVE-14497
 Project: Hive
  Issue Type: Sub-task
  Components: Materialized views
Affects Versions: 2.2.0
Reporter: Jesus Camacho Rodriguez


Follow-up of HIVE-14495. Since the number of materialized views in the system 
might grow very large, and query rewriting using materialized views might be 
very expensive, we need to include a mechanism to enable/disable materialized 
views for query rewriting.

Thus, we should extend the CREATE MATERIALIZED VIEW statement as follows:
{code:sql}
CREATE MATERIALIZED VIEW [IF NOT EXISTS] [db_name.]materialized_view_name
  [BUILD DEFERRED]
  [ENABLE REWRITE] -- NEW!
  [COMMENT materialized_view_comment]
  [
   [ROW FORMAT row_format] 
   [STORED AS file_format]
 | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]
  ]
  [LOCATION hdfs_path]
  [TBLPROPERTIES (property_name=property_value, ...)]
  AS select_statement;
{code}

Further, we should extend the ALTER statement in case we want to change the 
behavior of the materialized view after we have created it.
{code:sql}
ALTER MATERIALIZED VIEW [db_name.]materialized_view_name DISABLE REWRITE;
{code}
{code:sql}
ALTER MATERIALIZED VIEW [db_name.]materialized_view_name ENABLE REWRITE;
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 50873: HIVE-14453 refactor physical writing of ORC data and metadata to FS from the logical writers

2016-08-09 Thread Sergey Shelukhin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50873/
---

(Updated Aug. 9, 2016, 6:24 p.m.)


Review request for hive and Prasanth_J.


Repository: hive-git


Description
---

see jira; mostly moved code


Diffs (updated)
-

  
llap-server/src/java/org/apache/hadoop/hive/llap/io/decode/OrcEncodedDataConsumer.java
 94e4750 
  orc/src/java/org/apache/orc/impl/OrcTail.java b5f85fb 
  orc/src/java/org/apache/orc/impl/PhysicalFsWriter.java PRE-CREATION 
  orc/src/java/org/apache/orc/impl/PhysicalWriter.java PRE-CREATION 
  orc/src/java/org/apache/orc/impl/ReaderImpl.java d6df7d7 
  orc/src/java/org/apache/orc/impl/RecordReaderUtils.java 1067957 
  orc/src/java/org/apache/orc/impl/WriterImpl.java b2966e0 
  orc/src/test/org/apache/orc/impl/TestOrcWideTable.java 289a86e 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java 8e52907 
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInputOutputFormat.java 
6648829 

Diff: https://reviews.apache.org/r/50873/diff/


Testing
---


Thanks,

Sergey Shelukhin



[jira] [Created] (HIVE-14496) Enable Calcite rewriting with materialized views

2016-08-09 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-14496:
--

 Summary: Enable Calcite rewriting with materialized views
 Key: HIVE-14496
 URL: https://issues.apache.org/jira/browse/HIVE-14496
 Project: Hive
  Issue Type: Sub-task
  Components: Materialized views
Affects Versions: 2.2.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


Calcite already supports query rewriting using materialized views. We will use 
it to support this feature in Hive.

In order to do that, we need to register the existing materialized views with 
the Calcite view service and enable the materialized view rewriting rules. 

We should include a HiveConf flag to completely disable query rewriting using 
materialized views if necessary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 50896: HIVE-14404: Allow delimiterfordsv to use multiple-character delimiters

2016-08-09 Thread Szehon Ho

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50896/#review145227
---



I'm ambivalent; I would rather have pursued making the change in Super CSV, which 
would be better in the long run.  But I do see that it might not move very fast (did 
you guys try contacting them?).   The patch itself looks mostly fine though.

My only question is: does it need to be a 2nd version of the format?  That is, 
is there anything that is actually backward incompatible other than adding a 
new flag?  Thanks.

- Szehon Ho


On Aug. 8, 2016, 3:13 p.m., Marta Kuczora wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50896/
> ---
> 
> (Updated Aug. 8, 2016, 3:13 p.m.)
> 
> 
> Review request for hive, Naveen Gangam, Sergio Pena, Szehon Ho, and Xuefu 
> Zhang.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Introduced a new outputformat (dsv2) which supports multiple characters as 
> delimiter.
> For generating the dsv, csv2 and tsv2 outputformats, the Super CSV library is 
> used. This library doesn’t support multiple characters as delimiter. Since 
> the same logic is used for generating csv2, tsv2 and dsv outputformats, I 
> decided not to change this logic, rather introduce a new outputformat (dsv2) 
> which supports multiple characters as delimiter. 
> The new dsv2 outputformat has the same escaping logic as the dsv outputformat 
> if the quoting is not disabled.
> Extended the TestBeeLineWithArgs tests with new test steps which are using 
> multiple characters as delimiter.
> 
> Main changes in the code:
>  - Changed the SeparatedValuesOutputFormat class to be an abstract class and 
> created two new child classes to separate the logic for single-character and 
> multi-character delimiters: SingleCharSeparatedValuesOutputFormat and 
> MultiCharSeparatedValuesOutputFormat
> 
>  - Kept the methods which are used by both children in the 
> SeparatedValuesOutputFormat and moved the methods specific to the 
> single-character case to the SingleCharSeparatedValuesOutputFormat class.
> 
>  - Didn’t change the logic which was in the SeparatedValuesOutputFormat, only 
> moved some parts to the child class.
> 
>  - Implemented the value escaping and concatenation with the delimiter string 
> in the MultiCharSeparatedValuesOutputFormat.
> 
> 
> Diffs
> -
> 
>   beeline/src/java/org/apache/hive/beeline/BeeLine.java e0fa032 
>   beeline/src/java/org/apache/hive/beeline/BeeLineOpts.java e6e24b1 
>   
> beeline/src/java/org/apache/hive/beeline/MultiCharSeparatedValuesOutputFormat.java
>  PRE-CREATION 
>   beeline/src/java/org/apache/hive/beeline/SeparatedValuesOutputFormat.java 
> 66d9fd0 
>   
> beeline/src/java/org/apache/hive/beeline/SingleCharSeparatedValuesOutputFormat.java
>  PRE-CREATION 
>   
> itests/hive-unit/src/test/java/org/apache/hive/beeline/TestBeeLineWithArgs.java
>  892c733 
> 
> Diff: https://reviews.apache.org/r/50896/diff/
> 
> 
> Testing
> ---
> 
> - Tested manually in BeeLine.
> - Extended the TestBeeLineWithArgs tests with new test steps which are using 
> multiple characters as delimiter.
> 
> 
> Thanks,
> 
> Marta Kuczora
> 
>
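
Regarding the multi-character delimiter handling that Marta's summary above 
describes, a minimal illustrative sketch of escaping and joining fields with a 
multi-character delimiter; this is not the patch code, and the quoting rules here 
are simplified:

{code:java}
// Illustrative only; not the dsv2 patch code. Quotes a field when it contains
// the delimiter or a double quote, escaping embedded quotes CSV-style.
public class MultiCharDelimiterExample {

  static String formatRow(String[] fields, String delimiter) {
    StringBuilder row = new StringBuilder();
    for (int i = 0; i < fields.length; i++) {
      if (i > 0) {
        row.append(delimiter);
      }
      String field = fields[i];
      if (field.contains(delimiter) || field.contains("\"")) {
        row.append('"').append(field.replace("\"", "\"\"")).append('"');
      } else {
        row.append(field);
      }
    }
    return row.toString();
  }

  public static void main(String[] args) {
    // prints: 1||"a||b"||plain
    System.out.println(formatRow(new String[] {"1", "a||b", "plain"}, "||"));
  }
}
{code}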



[jira] [Created] (HIVE-14495) Add SHOW MATERIALIZED VIEWS statement

2016-08-09 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-14495:
--

 Summary: Add SHOW MATERIALIZED VIEWS statement
 Key: HIVE-14495
 URL: https://issues.apache.org/jira/browse/HIVE-14495
 Project: Hive
  Issue Type: Sub-task
  Components: Materialized views
Affects Versions: 2.2.0
Reporter: Jesus Camacho Rodriguez


In the spirit of {{SHOW TABLES}}, we should support the following statement:

{code:sql}
SHOW MATERIALIZED VIEWS [IN database_name] ['identifier_with_wildcards'];
{code}

In contrast to {{SHOW TABLES}}, this command would only list the materialized 
views.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14494) Add support for BUILD DEFERRED

2016-08-09 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-14494:
--

 Summary: Add support for BUILD DEFERRED
 Key: HIVE-14494
 URL: https://issues.apache.org/jira/browse/HIVE-14494
 Project: Hive
  Issue Type: Sub-task
  Components: Materialized views
Affects Versions: 2.2.0
Reporter: Jesus Camacho Rodriguez


This is an important feature, as it allows declaring materialized views without 
materializing them until they are used for the first time or a REBUILD 
statement is executed. The extension to the CREATE MATERIALIZED VIEW syntax 
should be as follows:

{code:sql}
CREATE MATERIALIZED VIEW [IF NOT EXISTS] [db_name.]table_name
  [BUILD DEFERRED] -- NEW!
  [COMMENT table_comment]
  [
   [ROW FORMAT row_format] 
   [STORED AS file_format]
 | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]
  ]
  [LOCATION hdfs_path]
  [TBLPROPERTIES (property_name=property_value, ...)]
  AS select_statement;
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14493) Partitioning support for materialized views

2016-08-09 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-14493:
--

 Summary: Partitioning support for materialized views
 Key: HIVE-14493
 URL: https://issues.apache.org/jira/browse/HIVE-14493
 Project: Hive
  Issue Type: Sub-task
  Components: Materialized views
Affects Versions: 2.2.0
Reporter: Jesus Camacho Rodriguez


We should support defining a partitioning specification for materialized views, 
so that the results of the materialized view evaluation are stored according to 
the partitioning spec. 

The syntax should be extended as follows:

{code:sql}
CREATE MATERIALIZED VIEW [IF NOT EXISTS] [db_name.]table_name
  [COMMENT table_comment]
  [PARTITIONED ON (col_name, ...)] -- NEW!
  [
   [ROW FORMAT row_format] 
   [STORED AS file_format]
 | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]
  ]
  [LOCATION hdfs_path]
  [TBLPROPERTIES (property_name=property_value, ...)]
  AS select_statement;
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14492) Optimize query in CREATE MATERIALIZED VIEW statement in Calcite

2016-08-09 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-14492:
--

 Summary: Optimize query in CREATE MATERIALIZED VIEW statement in 
Calcite
 Key: HIVE-14492
 URL: https://issues.apache.org/jira/browse/HIVE-14492
 Project: Hive
  Issue Type: Sub-task
  Components: Materialized views
Affects Versions: 2.2.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


When we create a materialized view, the query specified in the statement should 
be optimized by Calcite, as this might make a huge performance difference when 
we materialize the results.

Further, Calcite optimization should be triggered too when we execute {{ALTER 
MATERIALIZED VIEW ... REBUILD}}, as the changes in the original tables might 
lead to a new optimized plan.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14491) Implement access control for materialized views

2016-08-09 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-14491:
--

 Summary: Implement access control for materialized views
 Key: HIVE-14491
 URL: https://issues.apache.org/jira/browse/HIVE-14491
 Project: Hive
  Issue Type: Sub-task
  Components: Materialized views
Affects Versions: 2.2.0
Reporter: Jesus Camacho Rodriguez
Assignee: Alan Gates


We need to control the permissions granted to users and roles on materialized 
views: for instance, SELECT permission on the materialized view, or requiring 
ownership of the materialized view in order to drop it.

Further, materialized views should appear in the {{SHOW GRANT}} statement.
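
For illustration, the kind of statements that should work once this is in place, assuming the usual Hive GRANT/SHOW GRANT syntax (user and view names are made up):

{code:sql}
-- grant read access on the materialized view
GRANT SELECT ON TABLE mv_daily_totals TO USER analyst1;

-- the grant should then be listed here
SHOW GRANT USER analyst1 ON TABLE mv_daily_totals;
{code}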

Add tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14489) Add tests for DESCRIBE [EXTENDED|FORMATTED] statement on materialized views

2016-08-09 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-14489:
--

 Summary: Add tests for DESCRIBE [EXTENDED|FORMATTED] statement on 
materialized views
 Key: HIVE-14489
 URL: https://issues.apache.org/jira/browse/HIVE-14489
 Project: Hive
  Issue Type: Sub-task
  Components: Materialized views
Affects Versions: 2.2.0
Reporter: Jesus Camacho Rodriguez
Assignee: Alan Gates






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14490) Block ACID for materialized views

2016-08-09 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-14490:
--

 Summary: Block ACID for materialized views
 Key: HIVE-14490
 URL: https://issues.apache.org/jira/browse/HIVE-14490
 Project: Hive
  Issue Type: Sub-task
  Components: Materialized views
Affects Versions: 2.2.0
Reporter: Jesus Camacho Rodriguez
Assignee: Alan Gates


We should not be able to load or insert records into a materialized view, nor 
update or delete records from it. Add tests.
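
For illustration, the kind of statements that should be rejected on a materialized view (the view name is made up):

{code:sql}
-- all of these should fail with an error when the target is a materialized view
INSERT INTO mv_daily_totals VALUES ('2016-08-09', 100);
UPDATE mv_daily_totals SET total = 0 WHERE sold_date = '2016-08-09';
DELETE FROM mv_daily_totals WHERE sold_date = '2016-08-09';
LOAD DATA LOCAL INPATH '/tmp/data' INTO TABLE mv_daily_totals;
{code}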



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14488) Add DROP MATERIALIZED VIEW statement

2016-08-09 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-14488:
--

 Summary: Add DROP MATERIALIZED VIEW statement
 Key: HIVE-14488
 URL: https://issues.apache.org/jira/browse/HIVE-14488
 Project: Hive
  Issue Type: Sub-task
  Components: Materialized views
Affects Versions: 2.2.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


Support for dropping existing materialized views. The statement is the 
following:

{code:sql}
DROP MATERIALIZED VIEW [db_name.]table_name;
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14487) Add REBUILD statement for materialized views

2016-08-09 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-14487:
--

 Summary: Add REBUILD statement for materialized views
 Key: HIVE-14487
 URL: https://issues.apache.org/jira/browse/HIVE-14487
 Project: Hive
  Issue Type: Sub-task
  Components: Materialized views
Affects Versions: 2.2.0
Reporter: Jesus Camacho Rodriguez
Assignee: Alan Gates


Support for rebuilding existing materialized views. The statement is the 
following:

{code:sql}
ALTER MATERIALIZED VIEW [db_name.]table_name REBUILD;
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14486) Add CREATE MATERIALIZED VIEW statement

2016-08-09 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-14486:
--

 Summary: Add CREATE MATERIALIZED VIEW statement
 Key: HIVE-14486
 URL: https://issues.apache.org/jira/browse/HIVE-14486
 Project: Hive
  Issue Type: Sub-task
  Components: Materialized views
Affects Versions: 2.2.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


Support for creating materialized views. The statement is the following:

{code:sql}
CREATE MATERIALIZED VIEW [IF NOT EXISTS] [db_name.]table_name
  [COMMENT table_comment]
  [
   [ROW FORMAT row_format] 
   [STORED AS file_format]
 | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]
  ]
  [LOCATION hdfs_path]
  [TBLPROPERTIES (property_name=property_value, ...)]
  AS select_statement;
{code}

Thus, important features such as support for a custom StorageHandler and a 
custom location will be included from the start.
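
For illustration, a possible end-to-end example instantiating the syntax above (names, storage format, location, and properties are made up):

{code:sql}
CREATE MATERIALIZED VIEW IF NOT EXISTS sales_db.mv_daily_totals
  COMMENT 'Daily sales totals per store'
  STORED AS ORC
  LOCATION '/user/hive/warehouse/sales_db.db/mv_daily_totals'
  TBLPROPERTIES ('mv.owner'='analytics')
AS
SELECT store_id, sold_date, SUM(amount) AS total
FROM sales_db.sales
GROUP BY store_id, sold_date;
{code}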



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14485) Create 'materialized view' table type

2016-08-09 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-14485:
--

 Summary: Create 'materialized view' table type
 Key: HIVE-14485
 URL: https://issues.apache.org/jira/browse/HIVE-14485
 Project: Hive
  Issue Type: Sub-task
  Components: Materialized views
Affects Versions: 2.2.0
Reporter: Jesus Camacho Rodriguez
Assignee: Alan Gates


Materialized views will be internally represented as a Table. Thus, we need to 
introduce a new Table type _materialized view_. Recognizing tables as 
materialized views is important for follow-up tasks such as query rewriting or 
materialized view maintenance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14484) Extensions for initial materialized views implementation

2016-08-09 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-14484:
--

 Summary: Extensions for initial materialized views implementation
 Key: HIVE-14484
 URL: https://issues.apache.org/jira/browse/HIVE-14484
 Project: Hive
  Issue Type: Bug
  Components: Materialized views
Affects Versions: 2.2.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


Follow-up of HIVE-14249.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14483) java.lang.ArrayIndexOutOfBoundsException org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays

2016-08-09 Thread Sergey Zadoroshnyak (JIRA)
Sergey Zadoroshnyak created HIVE-14483:
--

 Summary:  java.lang.ArrayIndexOutOfBoundsException 
org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays
 Key: HIVE-14483
 URL: https://issues.apache.org/jira/browse/HIVE-14483
 Project: Hive
  Issue Type: Bug
  Components: ORC
Affects Versions: 2.1.0
Reporter: Sergey Zadoroshnyak
Assignee: Owen O'Malley
Priority: Critical
 Fix For: 2.2.0


Error message:

Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024
at 
org.apache.orc.impl.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:369)
at 
org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1231)
at 
org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:1268)
at 
org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:1368)
at 
org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1212)
at 
org.apache.orc.impl.TreeReaderFactory$ListTreeReader.nextVector(TreeReaderFactory.java:1902)
at 
org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:1737)
at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1045)
at 
org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.ensureBatch(RecordReaderImpl.java:77)
at 
org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.hasNext(RecordReaderImpl.java:89)
at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:230)
at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:205)
at 
org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
... 22 more


How to reproduce?
Configure a StringTreeReader that contains a StringDirectTreeReader as the 
TreeReader (DIRECT or DIRECT_V2 column encoding) and use batchSize = 1026.

Invoke nextVector(ColumnVector previousVector, boolean[] isNull, final int 
batchSize). Here scratchlcv is a LongColumnVector whose long[] vector has 
length 1024, and the call executes 
BytesColumnVectorUtil.readOrcByteArrays(stream, lengths, scratchlcv, result, 
batchSize); as a result, commonReadByteArrays(stream, lengths, scratchlcv, 
result, (int) batchSize) throws the ArrayIndexOutOfBoundsException.

If we use StringDictionaryTreeReader instead, there is no exception, because 
scratchlcv.ensureSize((int) batchSize, false) is called before 
reader.nextVector(scratchlcv, scratchlcv.vector, batchSize).

These changes were made for Hive 2.1.0 by commit 
https://github.com/apache/hive/commit/0ac424f0a17b341efe299da167791112e4a953e9#diff-a1cec556fb2db4b69a1a4127a6908177R1467
for task https://issues.apache.org/jira/browse/HIVE-12159 by Owen O'Malley.

How to fix?
Add only one line:

scratchlcv.ensureSize((int) batchSize, false);

in method 
org.apache.orc.impl.TreeReaderFactory#BytesColumnVectorUtil#commonReadByteArrays(InStream stream, IntegerReader lengths, LongColumnVector scratchlcv, BytesColumnVector result, final int batchSize)
before the invocation of lengths.nextVector(scratchlcv, scratchlcv.vector, batchSize).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 50906: HIVE-14444 Upgrade qtest execution framework to junit4 - migrate most of them

2016-08-09 Thread Zoltan Haindrich


> On Aug. 9, 2016, 12:14 a.m., Peter Vary wrote:
> > Hi Zoltan,
> > 
> > Thanks for the patch; I can see that you were working on it even on the 
> > weekend.
> > 
> > Please help me to understand the components a little more, so I could help 
> > with the review.
> > As I can see there are 3 levels of the classes for every given test:
> > - Configuration
> > - Adapter
> > - Driver
> > 
> > I have tried to identify the functionality of the given elements, and come 
> > up with the following:
> > - Configuration - The queries to run, the configuration of the clusters, 
> > and the initial data
> > - Adapter - The actual methods for implementing the test, like class, 
> > method level setup, and test execution
> > - Driver - These contain very little code, and they look very similar, so 
> > there is a lot of code duplication - wouldn't it be a good idea to merge it 
> > with the Adapter class? Also, it is a little strange that the Configuration 
> > has to have a reference to the Adapter. If you decide to merge the Adapter 
> > and the Driver, then the reference is not needed anymore.
> > 
> > Thanks,
> > Peter

Hello,

Yeah... I wanted to take advantage of the "empty queue" on the ptest executor ;)
By the way, I think that all hive precommit jobs which end up on ubuntu-3 will 
fail with some weird jdk issue...

I think you are getting it right... those classes which have "Driver" in their 
names are the successors of the old vm files: I don't want to touch them until 
we have all of them on board.
There is some redundancy even between the Driver classes... CliDriver and some 
others are very similar - it will be easy to drop some of them.
Merging with the adapter would possibly remove the common parent - and that 
would possibly break the adapter factory.
The positive side of the current design is that all configurations are in one 
place... even the core executor selection is in CliConfigs - so you have to 
look at just one place if you have to modify it.

About more refactoring work: reviewboard can pick up changes in renamed files 
(which is great) - but if I add more refactors to this patch, it will look like 
"20 files removed", "30 files added" - which is not really review friendly ;) 
It has already lost track of the changes to PerfCliDriver and QTestGenTask.

I would like to continue this with a cleanup refactor after AccumuloCli and 
BeelineCli are on board.

regards,
Zoltan


- Zoltan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50906/#review145152
---


On Aug. 8, 2016, 8:47 p.m., Zoltan Haindrich wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50906/
> ---
> 
> (Updated Aug. 8, 2016, 8:47 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-14444
> https://issues.apache.org/jira/browse/HIVE-14444
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> this patch focuses on removing the qtestgen task without introducing 
> regressions in existing testing services - it's not the nicest patch... but 
> it opens the possibility to continue refactoring these classes inside a pure 
> java environment
> 
> this patch contains a bunch of helper classes to provide Tests for the junit4 
> executor.
> 
> I've mimicked the old generated test cases' behaviour:
> 
> * every test has a config entry in CliConfigs
> * every config has a Core executor - these can be seen as the older vm files
> * every test has an instance in the appropriate module - this class is a 
> small, copy-pasted, parameterized junit4 test
> 
> 
> Diffs
> -
> 
>   ant/src/org/apache/hadoop/hive/ant/QTestGenTask.java 
> f372d7cb937d91c10d3dff0e4c51e0849c5e3c9b 
>   ant/src/org/apache/hadoop/hive/ant/antlib.xml 
> 8f663482348448be9aadcf535c42a8c11d8955b1 
>   hbase-handler/src/test/templates/TestHBaseCliDriver.vm 
> f513f0374b1e6798677e98b4d071d5969d204bfa 
>   hbase-handler/src/test/templates/TestHBaseNegativeCliDriver.vm 
> 043bd87a4f617de7fff89e38e654770cd9b84d8f 
>   itests/qtest-accumulo/pom.xml 339c59919295b66c9d3c64a53ea78bd944e3a903 
>   itests/qtest-spark/pom.xml 3bc9e24893a42bfcaab66d4cb8930dd5586c1db5 
>   
> itests/qtest-spark/src/test/java/org/apache/hadoop/hive/cli/TestMiniSparkOnYarnCliDriver.java
>  PRE-CREATION 
>   
> itests/qtest-spark/src/test/java/org/apache/hadoop/hive/cli/TestSparkCliDriver.java
>  PRE-CREATION 
>   
> itests/qtest-spark/src/test/java/org/apache/hadoop/hive/cli/TestSparkNegativeCliDriver.java
>  PRE-CREATION 
>   itests/qtest/pom.xml 17968e69559a16a1971f6028ea3073ab693a6678 
>   
> itests/qtest/src/test/java/org/apache/hadoop/hive/cli/ContribNegativeCliDriver.java
>  PRE-CREATION 
>   
> itests/qtest/src/test/java/org/apache/hadoop/hive/cli/Disabled