[jira] [Created] (HIVE-14524) BaseSemanticAnalyzer may leak HMS connection

2016-08-11 Thread Chao Sun (JIRA)
Chao Sun created HIVE-14524:
---

 Summary: BaseSemanticAnalyzer may leak HMS connection
 Key: HIVE-14524
 URL: https://issues.apache.org/jira/browse/HIVE-14524
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 2.2.0
Reporter: Chao Sun
Assignee: Chao Sun


Currently {{BaseSemanticAnalyzer}} keeps a copy of the thread-local {{Hive}} 
object used to connect to HMS. However, in some cases Hive may overwrite the 
existing {{Hive}} object:

{{Hive#getInternal}}:
{code}
  private static Hive getInternal(HiveConf c, boolean needsRefresh,
      boolean isFastCheck, boolean doRegisterAllFns) throws HiveException {
    Hive db = hiveDB.get();
    if (db == null || !db.isCurrentUserOwner() || needsRefresh
        || (c != null && db.metaStoreClient != null
            && !isCompatible(db, c, isFastCheck))) {
      return create(c, false, db, doRegisterAllFns);
    }
    if (c != null) {
      db.conf = c;
    }
    return db;
  }
{code}

*This poses a potential problem*: if one first instantiates a 
{{BaseSemanticAnalyzer}} object with the current {{Hive}} object (call it A), 
and A is later overwritten by B via the code above, then 
{{BaseSemanticAnalyzer}} may keep using A to contact HMS, which leaks 
connections.

This can be reproduced by the following steps:
1. open a session
2. execute some simple query such as {{desc formatted src}}
3. change a metastore property (I know, this is not a perfect example...), for 
instance: {{set hive.txn.timeout=500}}
4. run another command such as {{desc formatted src}} again

Notice that in step 4), since a metastore property has changed, 
{{isCompatible}} returns false and a new {{Hive}} object is created. As a 
result, you'll observe in the HS2 log that a connection has been leaked.
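The failure mode can be sketched in plain Java (illustrative names only, not 
Hive's actual classes): a thread-local cache silently replaces its client when 
the config changes, but a holder that captured the old reference keeps using 
it, and the old connection is never closed.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class StaleClientLeak {
    static final AtomicInteger OPEN_CONNECTIONS = new AtomicInteger();

    static class Client {
        final int configVersion;
        Client(int v) { configVersion = v; OPEN_CONNECTIONS.incrementAndGet(); }
        void close() { OPEN_CONNECTIONS.decrementAndGet(); }
    }

    static final ThreadLocal<Client> CACHE = new ThreadLocal<>();

    // Mirrors the shape of Hive#getInternal: replaces the cached client when
    // the config is incompatible, without closing the old one or notifying
    // anyone who still holds a reference to it.
    static Client get(int configVersion) {
        Client c = CACHE.get();
        if (c == null || c.configVersion != configVersion) {
            c = new Client(configVersion);  // old client dropped, never closed
            CACHE.set(c);
        }
        return c;
    }

    // Mirrors BaseSemanticAnalyzer: captures the client once at construction.
    static class Analyzer {
        final Client db = get(1);
    }

    public static void main(String[] args) {
        Analyzer a = new Analyzer();  // caches the client for config v1
        get(2);                       // config changed: new client, v1 leaked
        System.out.println("open connections: " + OPEN_CONNECTIONS.get()); // 2, not 1
    }
}
```

The stale {{Analyzer.db}} reference is exactly why the old connection can 
neither be reused nor reclaimed.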




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 50934: HIVE-14233 Improve vectorization for ACID by eliminating row-by-row stitching

2016-08-11 Thread Saket Saurabh

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50934/
---

(Updated Aug. 11, 2016, 4:36 p.m.)


Review request for hive and Eugene Koifman.


Repository: hive-git


Description
---

https://issues.apache.org/jira/browse/HIVE-14233
This JIRA proposes to improve vectorization for ACID by eliminating row-by-row 
stitching when reading back ACID files. In the current implementation, a 
vectorized row batch is created by populating the batch one row at a time 
before it is passed up the operator pipeline. This row-by-row stitching was 
required because the ACID insert/update/delete events from various delta files 
needed to be merged before the actual version of a given row could be 
determined. HIVE-14035 has enabled us to break away from that limitation by 
splitting ACID update events into a combination of delete+insert; in fact, it 
now even enables us to create splits on delta files.
Building on top of HIVE-14035, this JIRA proposes to remove this bottleneck in 
the vectorized code path for ACID by reading row batches directly from the 
underlying ORC files and avoiding stitching altogether. Once a row batch is 
read from a split (which may be on a base/delta file), deleted rows are found 
by cross-referencing the batch against a data structure that tracks only the 
delete events (found in the deleted_delta files). This yields a large 
performance gain when reading ACID files in vectorized fashion, while enabling 
further optimizations on top of it in the future.
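The cross-referencing step can be sketched as follows. This is a simplified 
model with hypothetical names, not the actual VectorizedOrcAcidRowBatchReader; 
it keys deletes by a single row id, whereas real ACID keys also involve the 
transaction id and bucket. Surviving rows are marked via a selected-index 
array, the same mechanism Hive's VectorizedRowBatch uses.

```java
import java.util.Set;
import java.util.TreeSet;

public class DeleteFilterSketch {
    static class RowBatch {
        long[] rowIds;          // ACID row identifiers read from the file
        int[] selected;         // indices of rows that survive filtering
        int size;               // number of selected rows
        boolean selectedInUse;  // whether `selected` must be consulted
    }

    // Cross-reference the batch against the delete-event set, keeping only
    // rows that were not deleted; O(n log m) per batch with a tree set.
    static void applyDeletes(RowBatch batch, Set<Long> deletedRowIds) {
        int n = 0;
        batch.selected = new int[batch.rowIds.length];
        for (int i = 0; i < batch.rowIds.length; i++) {
            if (!deletedRowIds.contains(batch.rowIds[i])) {
                batch.selected[n++] = i;  // keep this row
            }
        }
        batch.size = n;
        batch.selectedInUse = true;
    }

    public static void main(String[] args) {
        RowBatch batch = new RowBatch();
        batch.rowIds = new long[] {10, 11, 12, 13};
        // Delete events as collected from the deleted_delta files.
        Set<Long> deletes = new TreeSet<>(Set.of(11L, 13L));
        applyDeletes(batch, deletes);
        System.out.println("surviving rows: " + batch.size);  // 2
    }
}
```

Because only delete events are materialized, the memory footprint scales with 
the number of deletes rather than the number of rows read.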


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 
334cb31c5406f500c122a11eccef25b92d357cd4 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java 
e46ca51eff9c230147166e9428d7f462d2f9e772 
  
ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java
 PRE-CREATION 
  ql/src/test/queries/clientpositive/acid_vectorization.q 
832909bdb1bc79e01163373beed03eaaffcefd3d 
  ql/src/test/results/clientpositive/acid_vectorization.q.out 
1792979156ec361c85882ac8b6968e93d42b5f31 

Diff: https://reviews.apache.org/r/50934/diff/


Testing
---


Thanks,

Saket Saurabh



[jira] [Created] (HIVE-14523) ACID performance improvement patches

2016-08-11 Thread Saket Saurabh (JIRA)
Saket Saurabh created HIVE-14523:


 Summary: ACID performance improvement patches
 Key: HIVE-14523
 URL: https://issues.apache.org/jira/browse/HIVE-14523
 Project: Hive
  Issue Type: Test
Affects Versions: 2.2.0
Reporter: Saket Saurabh
Assignee: Saket Saurabh
Priority: Trivial
 Attachments: HIVE-14035_14199_14233.01.patch

This is a trivial non-functional JIRA that combines the features introduced in 
HIVE-14035, HIVE-14199, and HIVE-14233 into a single patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Review Request 51006: CBO: Return path - Fix for converting GroupBy operator with no map side group by

2016-08-11 Thread Vineet Garg

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51006/
---

Review request for hive and Ashutosh Chauhan.


Bugs: HIVE-14396
https://issues.apache.org/jira/browse/HIVE-14396


Repository: hive-git


Description
---

This patch fixes the following issues:
1. UDAF info collection was looking for the wrong expression for a UDAF's 
parameter from HiveAggregate.
2. Converting HiveAggregate to a GroupBy operator created wrong expressions 
for UDAF arguments based on the underlying Reduce operator.


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveCalciteUtil.java 
774fc59 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/HiveGBOpConvUtil.java
 25fe059 
  ql/src/test/queries/clientpositive/count.q 41ffaf2 
  ql/src/test/queries/clientpositive/groupby_ppr_multi_distinct.q 74bd2fd 
  ql/src/test/results/clientpositive/count.q.out c950c5b 
  ql/src/test/results/clientpositive/groupby_ppr_multi_distinct.q.out 33d1ed0 
  ql/src/test/results/clientpositive/spark/count.q.out b1ad662 
  ql/src/test/results/clientpositive/spark/groupby_ppr_multi_distinct.q.out 
5251241 
  ql/src/test/results/clientpositive/tez/count.q.out 9fc2c75 

Diff: https://reviews.apache.org/r/51006/diff/


Testing
---

Added new tests and Pre-commit testing


Thanks,

Vineet Garg



[jira] [Created] (HIVE-14521) codahale metrics exceptions

2016-08-11 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-14521:
---

 Summary: codahale metrics exceptions
 Key: HIVE-14521
 URL: https://issues.apache.org/jira/browse/HIVE-14521
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin


On some random setup, I see bazillions of errors like this in the HS2 log, GBs 
of logs' worth:
{noformat}
2016-08-08 04:52:18,619 WARN  [HiveServer2-Handler-Pool: Thread-101]: 
log.PerfLogger (PerfLogger.java:beginMetrics(226)) - Error recording metrics
java.io.IOException: Scope named api_Driver.run is not closed, cannot be opened.
at 
org.apache.hadoop.hive.common.metrics.metrics2.CodahaleMetrics$CodahaleMetricsScope.open(CodahaleMetrics.java:133)
at 
org.apache.hadoop.hive.common.metrics.metrics2.CodahaleMetrics.startStoredScope(CodahaleMetrics.java:220)
at 
org.apache.hadoop.hive.ql.log.PerfLogger.beginMetrics(PerfLogger.java:223)
at 
org.apache.hadoop.hive.ql.log.PerfLogger.PerfLogBegin(PerfLogger.java:143)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:378)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:320)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1214)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1208)
at 
org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:146)
at 
org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:226)
at 
org.apache.hive.service.cli.operation.Operation.run(Operation.java:276)
at 
org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:468)
at 
org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:456)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
{noformat}

I suspect that either, just like the metastore deadline, this needs better 
error handling for when whatever the metrics surround fails; or it is simply 
not thread-safe.
But I actually haven't looked at the code yet.
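The message suggests a scope that was opened but never closed, e.g. because 
the surrounded operation threw. A minimal sketch of that failure mode, with a 
try/finally guard as one possible fix (illustrative names, not the real 
CodahaleMetrics API, and deliberately ignoring the separate thread-safety 
question):

```java
import java.util.HashMap;
import java.util.Map;

public class ScopeSketch {
    static final Map<String, Boolean> OPEN_SCOPES = new HashMap<>();

    static void open(String name) {
        if (Boolean.TRUE.equals(OPEN_SCOPES.get(name))) {
            throw new IllegalStateException(
                "Scope named " + name + " is not closed, cannot be opened.");
        }
        OPEN_SCOPES.put(name, true);
    }

    static void close(String name) { OPEN_SCOPES.put(name, false); }

    // Guarded form: close in finally, so a failure inside the timed section
    // cannot wedge the scope for every subsequent open() of the same name.
    static void timed(String name, Runnable work) {
        open(name);
        try {
            work.run();
        } finally {
            close(name);
        }
    }

    public static void main(String[] args) {
        try {
            timed("api_Driver.run", () -> { throw new RuntimeException("compile failed"); });
        } catch (RuntimeException expected) { /* scope was still closed */ }
        timed("api_Driver.run", () -> {});  // no "not closed" error this time
        System.out.println("ok");           // prints "ok"
    }
}
```

Without the finally, the first failure would leave the scope open and every 
later open() would log the error seen above.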



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14520) We should set a timeout for the blocking calls in TestMsgBusConnection

2016-08-11 Thread Hari Sankar Sivarama Subramaniyan (JIRA)
Hari Sankar Sivarama Subramaniyan created HIVE-14520:


 Summary: We should set a timeout for the blocking calls in 
TestMsgBusConnection
 Key: HIVE-14520
 URL: https://issues.apache.org/jira/browse/HIVE-14520
 Project: Hive
  Issue Type: Bug
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan


consumer.receive() is a blocking call; if the expected message never arrives, 
it will block forever. We need to set a timeout, at the bare minimum, to force 
the test to fail in case of failure rather than hang until the run times out.
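The same hazard can be shown with a plain JDK BlockingQueue standing in for 
the JMS consumer (JMS itself offers MessageConsumer.receive(long timeout) for 
the bounded variant); this is a generic sketch, not the TestMsgBusConnection 
code:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class ReceiveTimeoutSketch {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> broker = new LinkedBlockingQueue<>();

        // broker.take();  // would block forever: nothing is ever published

        // Bounded wait: give up after 100 ms and fail loudly instead of
        // hanging the whole test run.
        String msg = broker.poll(100, TimeUnit.MILLISECONDS);
        if (msg == null) {
            System.out.println("no message within timeout: fail the test");
        }
    }
}
```

The assertion on a null result gives a clear failure message, whereas the 
unbounded call only fails when the test harness kills the build.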



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14519) Multi insert query bug

2016-08-11 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-14519:
---

 Summary: Multi insert query bug
 Key: HIVE-14519
 URL: https://issues.apache.org/jira/browse/HIVE-14519
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


When running multi-insert queries, if one of the inserts returns no results, 
another insert may not return the right result.
For example:
After the following query, there is no value in /tmp/emp/dir3/00_0
{noformat}
From (select * from src) a
insert overwrite directory '/tmp/emp/dir1/'
select key, value
insert overwrite directory '/tmp/emp/dir2/'
select 'header'
where 1=2
insert overwrite directory '/tmp/emp/dir3/'
select key, value 
where key = 100;
{noformat}

The where clause in the second insert should not affect the third insert.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14518) Support 'having' translation for Druid GroupBy queries

2016-08-11 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-14518:
--

 Summary: Support 'having' translation for Druid GroupBy queries
 Key: HIVE-14518
 URL: https://issues.apache.org/jira/browse/HIVE-14518
 Project: Hive
  Issue Type: Sub-task
  Components: Druid integration
Affects Versions: 2.2.0
Reporter: Jesus Camacho Rodriguez


Currently, when there is a filter on top of an aggregate, e.g. on the result 
of an aggregation function, we do not push that filter to Druid.

However, Druid supports a 'having' clause for GroupBy queries, so we should 
translate the Filter on top of the Aggregate and push it to Druid.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 50896: HIVE-14404: Allow delimiterfordsv to use multiple-character delimiters

2016-08-11 Thread Marta Kuczora

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50896/
---

(Updated Aug. 11, 2016, 3:50 p.m.)


Review request for hive, Naveen Gangam, Sergio Pena, Szehon Ho, and Xuefu Zhang.


Bugs: HIVE-14404
https://issues.apache.org/jira/browse/HIVE-14404


Repository: hive-git


Description
---

Introduced a new outputformat (dsv2) which supports multiple characters as the 
delimiter.
The dsv, csv2 and tsv2 outputformats are generated with the Super CSV library, 
which doesn't support multi-character delimiters. Since the same logic is used 
for generating the csv2, tsv2 and dsv outputformats, I decided not to change 
this logic but rather to introduce a new outputformat (dsv2) that supports 
multi-character delimiters.
The new dsv2 outputformat has the same escaping logic as the dsv outputformat 
if quoting is not disabled.
Extended the TestBeeLineWithArgs tests with new test steps which use multiple 
characters as the delimiter.

Main changes in the code:
 - Changed the SeparatedValuesOutputFormat class to be an abstract class and 
created two new child classes to separate the logic for single-character and 
multi-character delimiters: SingleCharSeparatedValuesOutputFormat and 
MultiCharSeparatedValuesOutputFormat

 - Kept the methods which are used by both children in the 
SeparatedValuesOutputFormat and moved the methods specific to the 
single-character case to the SingleCharSeparatedValuesOutputFormat class.

 - Didn’t change the logic which was in the SeparatedValuesOutputFormat, only 
moved some parts to the child class.

 - Implemented the value escaping and concatenation with the delimiter string 
in the MultiCharSeparatedValuesOutputFormat.
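The escaping-plus-concatenation logic for a multi-character delimiter can be 
sketched like this (a hypothetical helper, not the actual 
MultiCharSeparatedValuesOutputFormat code): a field is quoted when it contains 
the delimiter or the quote character, embedded quotes are doubled, and fields 
are joined with the full delimiter string.

```java
public class MultiCharDsvSketch {
    // Quote the field if needed, doubling any embedded quote characters.
    static String escape(String value, String delim, char quote) {
        if (value.contains(delim) || value.indexOf(quote) >= 0) {
            String doubled = value.replace(
                String.valueOf(quote), "" + quote + quote);
            return quote + doubled + quote;
        }
        return value;
    }

    // Join the escaped fields with the multi-character delimiter string.
    static String joinRow(String[] fields, String delim, char quote) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) sb.append(delim);
            sb.append(escape(fields[i], delim, quote));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String row = joinRow(new String[] {"a", "b||c", "d\"e"}, "||", '"');
        System.out.println(row);  // a||"b||c"||"d""e"
    }
}
```

This is the part Super CSV cannot do, since its preferences take the delimiter 
as a single char.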


Diffs
-

  beeline/src/java/org/apache/hive/beeline/BeeLine.java e0fa032 
  beeline/src/java/org/apache/hive/beeline/BeeLineOpts.java e6e24b1 
  
beeline/src/java/org/apache/hive/beeline/MultiCharSeparatedValuesOutputFormat.java
 PRE-CREATION 
  beeline/src/java/org/apache/hive/beeline/SeparatedValuesOutputFormat.java 
66d9fd0 
  
beeline/src/java/org/apache/hive/beeline/SingleCharSeparatedValuesOutputFormat.java
 PRE-CREATION 
  
itests/hive-unit/src/test/java/org/apache/hive/beeline/TestBeeLineWithArgs.java 
892c733 

Diff: https://reviews.apache.org/r/50896/diff/


Testing
---

- Tested manually in BeeLine.
- Extended the TestBeeLineWithArgs tests with new test steps which use 
multiple characters as the delimiter.


Thanks,

Marta Kuczora



Re: Review Request 50982: HIVE-14345:Beeline result table has erroneous characters

2016-08-11 Thread Peter Vary

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50982/#review145509
---



Hi Miklos,

Thanks for the patch. This extra column was hurting my eyes :)

One important note:
- Please review your patch to make sure it adheres to the coding conventions 
described here: 
https://cwiki.apache.org/confluence/display/Hive/HowToContribute#HowToContribute-CodingConventions

What I have seen specifically:
- Spaces on empty lines
- Padding of lines
- Missing space before '{'
- Missing spaces between method call parameters
- And in the conditional statement, the extra space made it less readable.

About filling in the review request:
- Please also fill in the branch and bug information for this Apache review, 
to make it easier to search for.

Thanks,
Peter


beeline/src/test/org/apache/hive/beeline/TestTableOutputFormat.java (lines 71 - 
88)


I would remove these empty functions



beeline/src/test/org/apache/hive/beeline/TestTableOutputFormat.java (line 91)


nit: Maybe it would be good to have a little more explanation here of what we 
are testing - like the extra column...



beeline/src/test/org/apache/hive/beeline/TestTableOutputFormat.java (line 95)


nit: I would remove this line



beeline/src/test/org/apache/hive/beeline/TestTableOutputFormat.java (line 99)


nit: I think, you should remove this line from the final patch


- Peter Vary


On Aug. 11, 2016, 2:19 p.m., Miklos Csanady wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50982/
> ---
> 
> (Updated Aug. 11, 2016, 2:19 p.m.)
> 
> 
> Review request for hive, Peter Vary and Sergio Pena.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Fixed output table formatting header and footer lines.
> 
> 
> Diffs
> -
> 
>   beeline/src/java/org/apache/hive/beeline/TableOutputFormat.java 2753568 
>   beeline/src/test/org/apache/hive/beeline/TestTableOutputFormat.java 
> PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/50982/diff/
> 
> 
> Testing
> ---
> 
> See attached Unit testClass.
> After building with patch, the bug eliminated.
> 
> 
> Thanks,
> 
> Miklos Csanady
> 
>



Review Request 50982: HIVE-14345:Beeline result table has erroneous characters

2016-08-11 Thread Miklos Csanady

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50982/
---

Review request for hive, Peter Vary and Sergio Pena.


Repository: hive-git


Description
---

Fixed output table formatting header and footer lines.


Diffs
-

  beeline/src/java/org/apache/hive/beeline/TableOutputFormat.java 2753568 
  beeline/src/test/org/apache/hive/beeline/TestTableOutputFormat.java 
PRE-CREATION 

Diff: https://reviews.apache.org/r/50982/diff/


Testing
---

See the attached unit test class.
After building with the patch, the bug is eliminated.


Thanks,

Miklos Csanady



change job name but keep the stage info

2016-08-11 Thread Markovitz, Dudu
Hi guys

I'm looking for a way to generate a common id for all jobs generated from the 
same query.
I'm aware of 2 possible options (described below), both of which are somewhat 
problematic.

Are you aware of a way to achieve this in current/future versions?

Thanks

Dudu




1.
Setting the job name:

set mapred.job.name=demo 1;
select count(*) from (select 1) t;

ID:   application_1469828525963_122782
User: dmarkovitz
Name: demo 1


The downside:

* I'm losing the stage information

2.
Adding a comment before the query:

-- demo 2
select count(*) from (select 1) t

ID:   application_1469828525963_122812
User: dmarkovitz
Name: -- demo 2 select count(*) from (select 1) t(Stage-1)


The downsides:

* The current behavior of determining the job name is not guaranteed

* It requires adding extra text to every query

* The job name contains undesired text (the prefix of the query)





Re: Review Request 50896: HIVE-14404: Allow delimiterfordsv to use multiple-character delimiters

2016-08-11 Thread Marta Kuczora


> On Aug. 9, 2016, 5 a.m., Peter Vary wrote:
> > Hi Marta,
> > 
> > Thanks for the patch, it is nice, and clean.
> > It might be a good idea to have the inputs checked, so if the user provides 
> > a multicharacter separator with a dsv format, then instead of using the 
> > first character of the string, an error might be printed.
> > 
> > Otherwise looks good.
> > 
> > Thanks,
> > Peter

Hi Peter,

thanks a lot for the review.
Sure, I can change the patch to check the input and print an error if the 
format is dsv and the separator contains multiple characters.

Regards,
Marta


> On Aug. 9, 2016, 5 a.m., Peter Vary wrote:
> > beeline/src/java/org/apache/hive/beeline/BeeLineOpts.java, line 106
> > 
> >
> > nit Would not be better to change the type of the DEFAULT_DELIMITER_DSV 
> > to String?

If we change the type of DEFAULT_DELIMITER_DSV to String, then we would need 
to do the conversion for the single-character delimiter case. What I think 
would be better is to create a new default for dsv2, something like "String 
DEFAULT_DELIMITER_DSV2".


- Marta


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50896/#review145167
---


On Aug. 8, 2016, 3:13 p.m., Marta Kuczora wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50896/
> ---
> 
> (Updated Aug. 8, 2016, 3:13 p.m.)
> 
> 
> Review request for hive, Naveen Gangam, Sergio Pena, Szehon Ho, and Xuefu 
> Zhang.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Introduced a new outputformat (dsv2) which supports multiple characters as 
> delimiter.
> For generating the dsv, csv2 and tsv2 outputformats, the Super CSV library is 
> used. This library doesn’t support multiple characters as delimiter. Since 
> the same logic is used for generating csv2, tsv2 and dsv outputformats, I 
> decided not to change this logic, rather introduce a new outputformat (dsv2) 
> which supports multiple characters as delimiter. 
> The new dsv2 outputformat has the same escaping logic as the dsv outputformat 
> if the quoting is not disabled.
> Extended the TestBeeLineWithArgs tests with new test steps which are using 
> multiple characters as delimiter.
> 
> Main changes in the code:
>  - Changed the SeparatedValuesOutputFormat class to be an abstract class and 
> created two new child classes to separate the logic for single-character and 
> multi-character delimiters: SingleCharSeparatedValuesOutputFormat and 
> MultiCharSeparatedValuesOutputFormat
> 
>  - Kept the methods which are used by both children in the 
> SeparatedValuesOutputFormat and moved the methods specific to the 
> single-character case to the SingleCharSeparatedValuesOutputFormat class.
> 
>  - Didn’t change the logic which was in the SeparatedValuesOutputFormat, only 
> moved some parts to the child class.
> 
>  - Implemented the value escaping and concatenation with the delimiter string 
> in the MultiCharSeparatedValuesOutputFormat.
> 
> 
> Diffs
> -
> 
>   beeline/src/java/org/apache/hive/beeline/BeeLine.java e0fa032 
>   beeline/src/java/org/apache/hive/beeline/BeeLineOpts.java e6e24b1 
>   
> beeline/src/java/org/apache/hive/beeline/MultiCharSeparatedValuesOutputFormat.java
>  PRE-CREATION 
>   beeline/src/java/org/apache/hive/beeline/SeparatedValuesOutputFormat.java 
> 66d9fd0 
>   
> beeline/src/java/org/apache/hive/beeline/SingleCharSeparatedValuesOutputFormat.java
>  PRE-CREATION 
>   
> itests/hive-unit/src/test/java/org/apache/hive/beeline/TestBeeLineWithArgs.java
>  892c733 
> 
> Diff: https://reviews.apache.org/r/50896/diff/
> 
> 
> Testing
> ---
> 
> - Tested manually in BeeLine.
> - Extended the TestBeeLineWithArgs tests with new test steps which are using 
> multiple characters as delimiter.
> 
> 
> Thanks,
> 
> Marta Kuczora
> 
>



Re: Review Request 50896: HIVE-14404: Allow delimiterfordsv to use multiple-character delimiters

2016-08-11 Thread Marta Kuczora


> On Aug. 9, 2016, 5:35 p.m., Szehon Ho wrote:
> > I'm ambivalent; I would rather have pursued the change to make it in 
> > SuperCSV, to be better in the long run.  But I do see it might not move 
> > very fast (did you guys try contacting them?).   The patch itself looks 
> > mostly fine though.
> > 
> > My only question is, does it need to be a 2nd version of the format?  That 
> > is, is there anything that is actually backward incompatible other than 
> > adding a new flag?  Thanks.

No, I didn’t contacted the SuperCSV team. I asked around what would be the 
better way to go and I mostly got the answer to fix the issue in Hive, so I 
went in that direction. But I can try to contact them if fixing the issue there 
would be preferable.

Actually, the issue can be fixed without introducing a 2nd dsv output format 
as well. In that case I saw two possible solutions:
- Separate the logic for the dsv output format from the csv2 and tsv2 formats 
and always use the new logic, which supports string delimiters, for the dsv 
format. In this case the Super CSV library wouldn't be used for the dsv 
format.

- Change the logic in the SeparatedValuesOutputFormat class to use the Super 
CSV library, just like now, if the delimiter is a single character, and use 
the new logic if the delimiter is a string.  This solution would leave the 
single-character case unchanged and would support multi-character delimiters 
for the same dsv format, but the code in the SeparatedValuesOutputFormat class 
would not be that nice.

I was thinking about which solution to choose. The reason why I introduced the 
new format is that I didn't want to change the logic for the single-character 
delimiter case, because I think using Super CSV makes the code a lot cleaner. 
But I also wanted to keep the code clean, so I chose to separate the single- 
and multi-character delimiter cases and introduce a new format.
If it is preferable to have only one dsv format, I can change the patch easily.


- Marta


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50896/#review145227
---


On Aug. 8, 2016, 3:13 p.m., Marta Kuczora wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50896/
> ---
> 
> (Updated Aug. 8, 2016, 3:13 p.m.)
> 
> 
> Review request for hive, Naveen Gangam, Sergio Pena, Szehon Ho, and Xuefu 
> Zhang.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Introduced a new outputformat (dsv2) which supports multiple characters as 
> delimiter.
> For generating the dsv, csv2 and tsv2 outputformats, the Super CSV library is 
> used. This library doesn’t support multiple characters as delimiter. Since 
> the same logic is used for generating csv2, tsv2 and dsv outputformats, I 
> decided not to change this logic, rather introduce a new outputformat (dsv2) 
> which supports multiple characters as delimiter. 
> The new dsv2 outputformat has the same escaping logic as the dsv outputformat 
> if the quoting is not disabled.
> Extended the TestBeeLineWithArgs tests with new test steps which are using 
> multiple characters as delimiter.
> 
> Main changes in the code:
>  - Changed the SeparatedValuesOutputFormat class to be an abstract class and 
> created two new child classes to separate the logic for single-character and 
> multi-character delimiters: SingleCharSeparatedValuesOutputFormat and 
> MultiCharSeparatedValuesOutputFormat
> 
>  - Kept the methods which are used by both children in the 
> SeparatedValuesOutputFormat and moved the methods specific to the 
> single-character case to the SingleCharSeparatedValuesOutputFormat class.
> 
>  - Didn’t change the logic which was in the SeparatedValuesOutputFormat, only 
> moved some parts to the child class.
> 
>  - Implemented the value escaping and concatenation with the delimiter string 
> in the MultiCharSeparatedValuesOutputFormat.
> 
> 
> Diffs
> -
> 
>   beeline/src/java/org/apache/hive/beeline/BeeLine.java e0fa032 
>   beeline/src/java/org/apache/hive/beeline/BeeLineOpts.java e6e24b1 
>   
> beeline/src/java/org/apache/hive/beeline/MultiCharSeparatedValuesOutputFormat.java
>  PRE-CREATION 
>   beeline/src/java/org/apache/hive/beeline/SeparatedValuesOutputFormat.java 
> 66d9fd0 
>   
> beeline/src/java/org/apache/hive/beeline/SingleCharSeparatedValuesOutputFormat.java
>  PRE-CREATION 
>   
> itests/hive-unit/src/test/java/org/apache/hive/beeline/TestBeeLineWithArgs.java
>  892c733 
> 
> Diff: https://reviews.apache.org/r/50896/diff/
> 
> 
> Testing
> ---
> 
> - Tested manually in BeeLine.
> - Extended the TestBeeLineWithArgs tests with new test steps which are using 
> multiple characters as delimiter.
> 
> 
> Thanks,
> 
> Marta Kuczora
> 

Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3

2016-08-11 Thread Lefty Leverenz


> On Aug. 10, 2016, 5:31 a.m., Lefty Leverenz wrote:
> > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java, lines 3091-3092
> > 
> >
> > Tiny nit:  Either make "It" lowercase or move the parenthetical 
> > sentence after the first sentence, with a final period like this:
> > 
> > "Enable the use of scratch directories directly on blob storage 
> > systems. (It may cause performance penalties.)"

Looks good now.  +1 for the parameter descriptions.


- Lefty


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/#review145307
---


On Aug. 10, 2016, 9:08 p.m., Sergio Pena wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50359/
> ---
> 
> (Updated Aug. 10, 2016, 9:08 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-14270
> https://issues.apache.org/jira/browse/HIVE-14270
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> This patch will create a temporary directory for Hive intermediate data on 
> HDFS when S3 tables are used.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/common/BlobStorageUtils.java 
> PRE-CREATION 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 
> 9f5f619359701b948f57d599a5bdc2ecbdff280a 
>   common/src/test/org/apache/hadoop/hive/common/TestBlobStorageUtils.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/Context.java 
> 89893eba9fd2316b9a393f06edefa837bb815faf 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 
> 5bd78862e1064d7f64a5d764571015a8df1101e8 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 
> a01a7bdbfec962b6617e98091cdb1325c5b0e84f 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/TestContext.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/50359/diff/
> 
> 
> Testing
> ---
> 
> NO PATCH
> ** NON-PARTITIONED TABLE
> 
> - create table dummy (id int);  3.651s
> - insert into table s3dummy values (1);  39.231s
> - insert overwrite table s3dummy values (1);  42.569s
> - insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from dummy;  30.136s
> 
> ** EXTERNAL TABLE
> 
> - create table s3dummy_ext like s3dummy location 's3a://spena-bucket/user/hive/warehouse/s3dummy';  9.297s
> - insert into table s3dummy_ext values (1);  45.855s
> 
> WITH PATCH
> 
> ** NON-PARTITIONED TABLE
> - create table s3dummy (id int) location 's3a://spena-bucket/user/hive/warehouse/s3dummy';  3.945s
> - insert into table s3dummy values (1);  15.025s
> - insert overwrite table s3dummy values (1);  25.149s
> - insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from dummy;  19.158s
> - from dummy insert overwrite table s3dummy select *;  25.469s
> - from dummy insert into table s3dummy select *;  14.501s
> 
> ** EXTERNAL TABLE
> - create table s3dummy_ext like s3dummy location 's3a://spena-bucket/user/hive/warehouse/s3dummy';  4.827s
> - insert into table s3dummy_ext values (1);  16.070s
> 
> ** PARTITIONED TABLE
> - create table s3dummypart (id int) partitioned by (part int) location 's3a://spena-bucket/user/hive/warehouse/s3dummypart';  3.176s
> - alter table s3dummypart add partition (part=1);  3.229s
> - alter table s3dummypart add partition (part=2);  3.124s
> - insert into table s3dummypart partition (part=1) values (1);  14.876s
> - insert overwrite table s3dummypart partition (part=1) values (1);  27.594s
> - insert overwrite directory 's3a://spena-bucket/dirs/s3dummypart' select * from dummypart;  22.298s
> - from dummypart insert overwrite table s3dummypart partition (part=1) select id;  29.001s
> - from dummypart insert into table s3dummypart partition (part=1) select id;