[jira] [Commented] (HIVE-15780) HiveConf evaluation and management

2017-03-16 Thread anishek (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15927606#comment-15927606
 ] 

anishek commented on HIVE-15780:


Something along similar lines is reported in HIVE-16220.

> HiveConf evaluation and management
> --
>
> Key: HIVE-15780
> URL: https://issues.apache.org/jira/browse/HIVE-15780
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: anishek
>Assignee: anishek
>
> This is a placeholder Jira to see if the HiveConf class can be rewired to give 
> better performance and security.
> * Most of the variables in Hive can be changed by the user. Only a few 
> variables are part of the "hive.conf.restricted.list" configuration, 
> with recent additions via HIVE-15713. The list should be extended to include 
> more admin / cluster-wide variables.
> * By default, any new variable that is created is available for the end user 
> to change. Given that this is a huge list and a lot of them are cluster-wide 
> configurations, it would be better if every variable in HiveConf were part of 
> the system / restricted list by default, with an additional parameter at 
> object creation to indicate whether the user can change it.
> * Since this class extends the Hadoop Configuration class, any _get*_ method 
> called on an instance of HiveConf is evaluated for variable substitution etc. 
> This is not required except for a few variables like 
> "hive.downloaded.resources.dir", hence the class could just keep a map of 
> most of the properties, which should help in faster configuration lookup.
> * For every session we create a new instance of HiveConf. If we can segregate 
> the confs into system level / user level etc., we could use some sort of 
> composite pattern to represent HiveConf such that a single instance of the 
> system configuration is shared across sessions.
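
The last two bullets could be sketched roughly as follows. This is only an illustration under stated assumptions, not HiveConf's actual design: the classes `SystemConf` and `SessionConf` and their methods are hypothetical, and lookups are plain map reads with no variable substitution.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical system-level layer: built once, immutable, shared by all sessions.
final class SystemConf {
  final Map<String, String> values;
  final Set<String> restricted; // keys that end users may not override

  SystemConf(Map<String, String> values, Set<String> restricted) {
    this.values = Collections.unmodifiableMap(new HashMap<>(values));
    this.restricted = Collections.unmodifiableSet(new HashSet<>(restricted));
  }
}

// Hypothetical per-session layer: holds only the user's overrides and
// falls back to the shared system layer for everything else.
final class SessionConf {
  private final SystemConf system;
  private final Map<String, String> overrides = new HashMap<>();

  SessionConf(SystemConf system) {
    this.system = system;
  }

  // Plain map lookup: session override first, then the shared system value.
  String get(String key) {
    String v = overrides.get(key);
    return v != null ? v : system.values.get(key);
  }

  // Rejects attempts to change a restricted (admin / cluster-wide) key.
  void set(String key, String value) {
    if (system.restricted.contains(key)) {
      throw new IllegalArgumentException("Cannot modify restricted key: " + key);
    }
    overrides.put(key, value);
  }
}
```

With this shape, creating a session allocates only an empty override map, while the system-level values exist once per server.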



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-16230) Enable CBO in presence of hints

2017-03-16 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan reassigned HIVE-16230:
---


> Enable CBO in presence of hints
> ---
>
> Key: HIVE-16230
> URL: https://issues.apache.org/jira/browse/HIVE-16230
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO, Logical Optimizer
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
>






[jira] [Updated] (HIVE-16230) Enable CBO in presence of hints

2017-03-16 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-16230:

Attachment: HIVE-16230.patch

> Enable CBO in presence of hints
> ---
>
> Key: HIVE-16230
> URL: https://issues.apache.org/jira/browse/HIVE-16230
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO, Logical Optimizer
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-16230.patch
>
>






[jira] [Updated] (HIVE-16230) Enable CBO in presence of hints

2017-03-16 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-16230:

Status: Patch Available  (was: Open)

> Enable CBO in presence of hints
> ---
>
> Key: HIVE-16230
> URL: https://issues.apache.org/jira/browse/HIVE-16230
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO, Logical Optimizer
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-16230.patch
>
>






[jira] [Commented] (HIVE-15616) Improve contents of qfile test output

2017-03-16 Thread Barna Zsombor Klara (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15927623#comment-15927623
 ] 

Barna Zsombor Klara commented on HIVE-15616:


Ahh, sorry about that, I forgot to test the new qfile generation. I will update 
the patch shortly.

> Improve contents of qfile test output
> -
>
> Key: HIVE-15616
> URL: https://issues.apache.org/jira/browse/HIVE-15616
> Project: Hive
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 2.1.1
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-15616.1.patch, HIVE-15616.2.patch, 
> HIVE-15616.3.patch, HIVE-15616.4.patch, HIVE-15616.patch
>
>
> The current output of the failed qtests has a less than ideal signal to noise 
> ratio.
> We have duplicated stack traces and messages between the error message/stack 
> trace/error out.
> For diff errors the actual difference is missing from the error message and 
> can be found only in the standard out.
> I would like to simplify this output by removing duplications, moving 
> relevant information to the top.





[jira] [Commented] (HIVE-16168) llap log links should use the NM nodeId port instead of web port

2017-03-16 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15927656#comment-15927656
 ] 

Lefty Leverenz commented on HIVE-16168:
---

Doc question:  Should the new config, *hive.llap.daemon.nm.address*, be 
documented in the wiki?  Its description says "Published to the llap registry. 
Should never be set by users" but perhaps users will need to know about the 
port address.  Then again if the TODO gets done before 2.2.0 is released, this 
won't matter (TODO Move the following 2 properties out of Configuration to a 
constant).

> llap log links should use the NM nodeId port instead of web port
> 
>
> Key: HIVE-16168
> URL: https://issues.apache.org/jira/browse/HIVE-16168
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Fix For: 2.2.0
>
> Attachments: HIVE-16168.01.patch
>
>






[jira] [Commented] (HIVE-16210) Use jvm temporary tmp dir by default

2017-03-16 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15927657#comment-15927657
 ] 

Lefty Leverenz commented on HIVE-16210:
---

[~ashutoshc], the commit to master has a typo in the JIRA number (HIVE-16120 
should be 16210).  Please update errata.txt for commit 
7f05f0cfbbe56ed9f06ece5b040cc00940a4cad2.  (The patch also went to branch 
hive-14535 with the same commit number.)  Thanks.

Example of updating errata.txt:  HIVE-11876.

> Use jvm temporary tmp dir by default
> 
>
> Key: HIVE-16210
> URL: https://issues.apache.org/jira/browse/HIVE-16210
> Project: Hive
>  Issue Type: Improvement
>  Components: Druid integration
>Reporter: slim bouguerra
>Assignee: slim bouguerra
> Fix For: 2.2.0
>
> Attachments: HIVE-16210.patch
>
>
> Instead of using "/tmp" by default, it makes more sense to use the JVM 
> default tmp dir. Using "/tmp" can have dramatic consequences if the indexed 
> files are huge. For instance, applications run by containers can be 
> provisioned with a dedicated tmp dir. 
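
A minimal sketch of the idea, assuming a helper that falls back to the JVM temporary directory when nothing is configured. The class and method names are hypothetical and are not taken from the patch; `java.io.tmpdir` is the standard JVM system property.

```java
import java.io.File;

final class TmpDirDefault {
  // Returns the explicitly configured directory if there is one; otherwise
  // the JVM temporary directory (the java.io.tmpdir system property).
  static File resolveWorkingDir(String configured) {
    if (configured != null && !configured.isEmpty()) {
      return new File(configured);
    }
    return new File(System.getProperty("java.io.tmpdir"));
  }
}
```

A process started with `-Djava.io.tmpdir=<dir>` then picks up its dedicated directory without any Hive-side setting.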





[jira] [Updated] (HIVE-16080) Add parquet to possible values for hive.default.fileformat and hive.default.fileformat.managed

2017-03-16 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-16080:
--
Labels: TODOC2.2  (was: )

> Add parquet to possible values for hive.default.fileformat and 
> hive.default.fileformat.managed
> --
>
> Key: HIVE-16080
> URL: https://issues.apache.org/jira/browse/HIVE-16080
> Project: Hive
>  Issue Type: Bug
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-16080.patch
>
>






[jira] [Commented] (HIVE-16080) Add parquet to possible values for hive.default.fileformat and hive.default.fileformat.managed

2017-03-16 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15927676#comment-15927676
 ] 

Lefty Leverenz commented on HIVE-16080:
---

Doc note:  The descriptions of *hive.default.fileformat* and 
*hive.default.fileformat.managed* need to be updated in the wiki, with version 
information.

* [Configuration Properties -- hive.default.fileformat | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.default.fileformat]
* [Configuration Properties -- hive.default.fileformat.managed | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.default.fileformat.managed]

Added a TODOC2.2 label.

> Add parquet to possible values for hive.default.fileformat and 
> hive.default.fileformat.managed
> --
>
> Key: HIVE-16080
> URL: https://issues.apache.org/jira/browse/HIVE-16080
> Project: Hive
>  Issue Type: Bug
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-16080.patch
>
>






[jira] [Commented] (HIVE-13091) Beeline logs the username and password to the history file for connect commands

2017-03-16 Thread Peter Vary (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15927684#comment-15927684
 ] 

Peter Vary commented on HIVE-13091:
---

If somebody picks this up, keep in mind that the url could contain other 
passwords as well. See: HIVE-15931, and https://reviews.apache.org/r/56763/

> Beeline logs the username and password to the history file for connect 
> commands
> ---
>
> Key: HIVE-13091
> URL: https://issues.apache.org/jira/browse/HIVE-13091
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.0
>Reporter: Ravi Prakash
>Assignee: Ravi Prakash
>  Labels: incompatible
> Attachments: HIVE-13901.01.patch, HIVE-13901.02.patch
>
>
> [~farisa] and [~tthompso] found that the beeline client also logs the 
> username and password from a connect command into the beeline history file 
> (Usually found at ~/.beeline/history). We should not be logging the password 
> anywhere.
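
One possible shape for a fix is to redact a line before it is written to the history file. The sketch below is an assumption-laden illustration, not Beeline's code: `HistoryRedactor` is a hypothetical class, it assumes the `!connect <url> <user> <password>` form, and it also masks URL parameters whose key ends in "password", since the URL itself may carry secrets.

```java
import java.util.regex.Pattern;

final class HistoryRedactor {
  // Matches key=value pairs whose key ends in "password" (case-insensitive),
  // for example password=... or trustStorePassword=...
  private static final Pattern URL_SECRET =
      Pattern.compile("(?i)(\\w*password)=[^;&#\\s]*");

  // Returns the line as it should be stored in the history file.
  static String redact(String line) {
    String[] parts = line.trim().split("\\s+");
    if (parts.length < 2 || !parts[0].equalsIgnoreCase("!connect")) {
      return line; // not a connect command: store unchanged
    }
    // Mask secrets embedded in the JDBC URL.
    parts[1] = URL_SECRET.matcher(parts[1]).replaceAll("$1=****");
    // Mask the positional password argument; the user name is kept.
    if (parts.length >= 4) {
      parts[3] = "****";
    }
    return String.join(" ", parts);
  }
}
```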





[jira] [Commented] (HIVE-14557) Nullpointer When both SkewJoin and Mapjoin Enabled

2017-03-16 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15927693#comment-15927693
 ] 

Rui Li commented on HIVE-14557:
---

Change looks good to me. +1 pending test

> Nullpointer When both SkewJoin  and Mapjoin Enabled
> ---
>
> Key: HIVE-14557
> URL: https://issues.apache.org/jira/browse/HIVE-14557
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer
>Affects Versions: 1.1.0, 2.1.0
>Reporter: Nemon Lou
> Attachments: HIVE-14557.patch
>
>
> The following sql failed with return code 2 on mr.
> {noformat}
> create table a(id int,id1 int);
> create table b(id int,id1 int);
> create table c(id int,id1 int);
> set hive.optimize.skewjoin=true;
> select a.id,b.id,c.id1 from a,b,c where a.id=b.id and a.id1=c.id1;
> {noformat}
> Error log as follows:
> {noformat}
> 2016-08-17 21:13:42,081 INFO [main] 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper: 
> Id =0
>   
> Id =21
>   
> Id =28
>   
> Id =16
>   
>   <\Children>
>   Id = 28 null<\Parent>
> <\FS>
>   <\Children>
>   Id = 21 nullId = 33 
> Id =33
>   null
>   <\Children>
>   <\Parent>
> <\HASHTABLEDUMMY><\Parent>
> <\MAPJOIN>
>   <\Children>
>   Id = 0 null<\Parent>
> <\TS>
>   <\Children>
>   <\Parent>
> <\MAP>
> 2016-08-17 21:13:42,084 INFO [main] 
> org.apache.hadoop.hive.ql.exec.TableScanOperator: Initializing operator TS[21]
> 2016-08-17 21:13:42,084 INFO [main] 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper: Initializing dummy operator
> 2016-08-17 21:13:42,086 INFO [main] 
> org.apache.hadoop.hive.ql.exec.MapOperator: DESERIALIZE_ERRORS:0, 
> RECORDS_IN:0, 
> 2016-08-17 21:13:42,087 ERROR [main] 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper: Hit error while closing 
> operators - failing tree
> 2016-08-17 21:13:42,088 WARN [main] org.apache.hadoop.mapred.YarnChild: 
> Exception running child : java.lang.RuntimeException: Hive Runtime Error 
> while closing operators
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:207)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.MapJoinOperator.closeOp(MapJoinOperator.java:474)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:682)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:696)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:696)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:189)
>   ... 8 more
> {noformat}





[jira] [Updated] (HIVE-15708) Upgrade calcite version to 1.12

2017-03-16 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-15708:

Attachment: HIVE-15708.18.patch

Reloaded the same patch (17), renamed as 18, to trigger a lab run.

> Upgrade calcite version to 1.12
> ---
>
> Key: HIVE-15708
> URL: https://issues.apache.org/jira/browse/HIVE-15708
> Project: Hive
>  Issue Type: Task
>  Components: CBO, Logical Optimizer
>Affects Versions: 2.2.0
>Reporter: Ashutosh Chauhan
>Assignee: Remus Rusanu
> Attachments: HIVE-15708.01.patch, HIVE-15708.02.patch, 
> HIVE-15708.03.patch, HIVE-15708.04.patch, HIVE-15708.05.patch, 
> HIVE-15708.06.patch, HIVE-15708.07.patch, HIVE-15708.08.patch, 
> HIVE-15708.09.patch, HIVE-15708.10.patch, HIVE-15708.11.patch, 
> HIVE-15708.12.patch, HIVE-15708.13.patch, HIVE-15708.14.patch, 
> HIVE-15708.15.patch, HIVE-15708.15.patch, HIVE-15708.16.patch, 
> HIVE-15708.17.patch, HIVE-15708.18.patch
>
>
> Currently we are on 1.10. Need to upgrade calcite version to 1.11





[jira] [Commented] (HIVE-14919) Improve the performance of Hive on Spark 2.0.0

2017-03-16 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15927700#comment-15927700
 ] 

Rui Li commented on HIVE-14919:
---

I mean we can set Xms, not Xmx.

> Improve the performance of Hive on Spark 2.0.0
> --
>
> Key: HIVE-14919
> URL: https://issues.apache.org/jira/browse/HIVE-14919
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
>
> In HIVE-14029, we updated the Spark dependency to 2.0.0. We used Intel 
> BigBench[1] to benchmark Spark 2.0 against Spark 1.6 over a 1 TB data set. 
> We see performance improvements of about 5.4% in general and 45% 
> in the best case. However, some queries don't show significant performance 
> improvements. This JIRA is the umbrella ticket addressing those performance 
> issues.
> [1] https://github.com/intel-hadoop/Big-Data-Benchmark-for-Big-Bench





[jira] [Assigned] (HIVE-16231) Parquet timestamp may be stored differently since HIVE-12767

2017-03-16 Thread Barna Zsombor Klara (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barna Zsombor Klara reassigned HIVE-16231:
--


> Parquet timestamp may be stored differently since HIVE-12767
> 
>
> Key: HIVE-16231
> URL: https://issues.apache.org/jira/browse/HIVE-16231
> Project: Hive
>  Issue Type: Bug
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
>Priority: Critical
>
> If the parquet table is missing its timezone property then the timestamp will 
> be stored with an adjustment instead of without it. This will cause a 
> regression with other applications like Impala or Spark.





[jira] [Updated] (HIVE-16231) Parquet timestamp may be stored differently since HIVE-12767

2017-03-16 Thread Barna Zsombor Klara (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barna Zsombor Klara updated HIVE-16231:
---
Fix Version/s: 2.2.0

> Parquet timestamp may be stored differently since HIVE-12767
> 
>
> Key: HIVE-16231
> URL: https://issues.apache.org/jira/browse/HIVE-16231
> Project: Hive
>  Issue Type: Bug
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
>Priority: Critical
> Fix For: 2.2.0
>
>
> If the parquet table is missing its timezone property then the timestamp will 
> be stored with an adjustment instead of without it. This will cause a 
> regression with other applications like Impala or Spark.





[jira] [Work started] (HIVE-16231) Parquet timestamp may be stored differently since HIVE-12767

2017-03-16 Thread Barna Zsombor Klara (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-16231 started by Barna Zsombor Klara.
--
> Parquet timestamp may be stored differently since HIVE-12767
> 
>
> Key: HIVE-16231
> URL: https://issues.apache.org/jira/browse/HIVE-16231
> Project: Hive
>  Issue Type: Bug
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
>Priority: Critical
> Fix For: 2.2.0
>
>
> If the parquet table is missing its timezone property then the timestamp will 
> be stored with an adjustment instead of without it. This will cause a 
> regression with other applications like Impala or Spark.





[jira] [Commented] (HIVE-16007) When the query does not compile the LogRunnable never stops

2017-03-16 Thread Peter Vary (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15927782#comment-15927782
 ] 

Peter Vary commented on HIVE-16007:
---

[~anishek], [~thejas]: Any thoughts?

Thanks,
Peter

> When the query does not compile the LogRunnable never stops
> ---
>
> Key: HIVE-16007
> URL: https://issues.apache.org/jira/browse/HIVE-16007
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 2.2.0
>Reporter: Peter Vary
>Assignee: Peter Vary
> Attachments: HIVE-16007.2.patch, HIVE-16007.3.patch, HIVE-16007.patch
>
>
> When a SQL command that does not compile is issued, the LogRunnable thread 
> is never closed.
> The issue can be easily detected when running beeline with showWarnings=true.
> {code}
> $ ./beeline -u "jdbc:hive2://localhost:1 pvary pvary" --showWarnings=true
> [..]
> Connecting to jdbc:hive2://localhost:1
> Connected to: Apache Hive (version 2.2.0-SNAPSHOT)
> Driver: Hive JDBC (version 2.2.0-SNAPSHOT)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> Beeline version 2.2.0-SNAPSHOT by Apache Hive
> 0: jdbc:hive2://localhost:1> selekt;
> Warning: java.sql.SQLException: Method getQueryLog() failed. Because the 
> stmtHandle in HiveStatement is null and the statement execution might fail. 
> (state=,code=0)
> [..]
> Warning: java.sql.SQLException: Can't getQueryLog after statement has been 
> closed (state=,code=0)
> [..]
> {code}





[jira] [Commented] (HIVE-15849) hplsql should add enterGlobalScope func to UDF

2017-03-16 Thread Fei Hui (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15927991#comment-15927991
 ] 

Fei Hui commented on HIVE-15849:


[~Ferd] could you please help to commit it to master? Thanks

> hplsql should add enterGlobalScope func to UDF
> --
>
> Key: HIVE-15849
> URL: https://issues.apache.org/jira/browse/HIVE-15849
> Project: Hive
>  Issue Type: Bug
>  Components: hpl/sql
>Affects Versions: 2.2.0
>Reporter: Fei Hui
>Assignee: Fei Hui
> Attachments: HIVE-15849.1.patch, HIVE-15849.patch
>
>
> code in Udf.java
> {code:title=Udf.java|borderStyle=solid}
> if (exec == null) {
>   exec = new Exec();
>   String query = queryOI.getPrimitiveJavaObject(arguments[0].get());
>   String[] args = { "-e", query, "-trace" };
>   try {
>     exec.setUdfRun(true);
>     exec.init(args);
>   } catch (Exception e) {
>     throw new HiveException(e.getMessage());
>   }
> }
> if (arguments.length > 1) {
>   setParameters(arguments);
> }
> Var result = exec.run();
> if (result != null) {
>   return result.toString();
> }
> {code}
> Here are my thoughts:
> {quote}
> We should add 'exec.enterGlobalScope();' between 'exec = new Exec();' and 
> 'setParameters(arguments);'.
> If we do not call exec.enterGlobalScope(), setParameters(arguments) 
> is useless: the vars are not added to a scope, yet exec.run() uses the vars 
> that we set. The vars are the parameters passed to the UDF, [, :1, :2, ...n], 
> which are described in Udf.java.
> {quote}
> Before adding this call, the result is as follows. We get the wrong result, 
> because the output contains an empty string:
> {quote}
> Starting pre-SQL statement
> Starting pre-SQL statement
> Starting pre-SQL statement
> Starting pre-SQL statement
> Starting pre-SQL statement
> Starting query
> Query executed successfully (2.30 sec)
> Ln:8 SELECT completed successfully
> Ln:8 Standalone SELECT executed: 1 columns in the result set
> Hello, !
> Hello, !
> {quote}
> After adding this call, we get the right result:
> {quote}
> Starting pre-SQL statement
> Starting pre-SQL statement
> Starting pre-SQL statement
> Starting pre-SQL statement
> Starting pre-SQL statement
> Starting query
> Query executed successfully (2.35 sec)
> Ln:8 SELECT completed successfully
> Ln:8 Standalone SELECT executed: 1 columns in the result set
> Hello, fei!
> Hello, fei!
> {quote}
> tests come from http://www.hplsql.org/udf
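
The effect described in this issue can be reproduced with a toy model. The class below is a hypothetical stand-in and NOT the real HPL/SQL `Exec`: it assumes that variables are stored in the innermost scope and are silently dropped when no scope has been entered, which matches the "Hello, !" versus "Hello, fei!" outputs quoted above.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Toy interpreter with a scope stack; an illustration only.
final class ToyExec {
  private final Deque<Map<String, String>> scopes = new ArrayDeque<>();

  // Pushes the outermost scope so that variables have somewhere to live.
  void enterGlobalScope() {
    scopes.push(new HashMap<>());
  }

  // Stores the variable in the current scope; without a scope it is lost.
  void setVariable(String name, String value) {
    Map<String, String> current = scopes.peek();
    if (current != null) {
      current.put(name, value);
    }
  }

  // Substitutes the :1 parameter into the template, or "" if it is unset.
  String run(String template) {
    Map<String, String> current = scopes.peek();
    String value = current == null ? null : current.get(":1");
    return template.replace(":1", value == null ? "" : value);
  }
}
```

Calling `setVariable(":1", "fei")` before `enterGlobalScope()` yields "Hello, !" from `run("Hello, :1!")`, while entering the scope first yields "Hello, fei!".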





[jira] [Updated] (HIVE-16183) Fix potential thread safety issues with static variables

2017-03-16 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-16183:
---
Attachment: HIVE-16183.4.patch

> Fix potential thread safety issues with static variables
> 
>
> Key: HIVE-16183
> URL: https://issues.apache.org/jira/browse/HIVE-16183
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-16183.1.patch, HIVE-16183.2.patch, 
> HIVE-16183.3.patch, HIVE-16183.4.patch, HIVE-16183.4.patch, HIVE-16183.patch
>
>
> Many concurrency issues (HIVE-12768, HIVE-16175, HIVE-16060) have been found 
> with respect to the use of class static variables. Given that HS2 supports 
> concurrent compilation and task execution, and that some backend engines 
> (such as Spark) run multiple tasks in a single JVM, the traditional 
> assumption (or mindset) of single-threaded execution needs to be abandoned.
> The purpose of this JIRA is to do a global scan of static variables in the 
> Hive code base and correct potential thread-safety issues. However, it's not 
> meant to be exhaustive.
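
As a generic illustration of the kind of problem such a scan looks for (not code from Hive or from the attached patches), consider a mutable helper object shared through a static field. `SimpleDateFormat` is a well-known non-thread-safe JDK class; the sketch below, with the hypothetical class name `SafeFormatter`, gives each thread its own instance instead of sharing one.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

final class SafeFormatter {
  // Unsafe pattern: one mutable formatter shared by every thread.
  // private static final SimpleDateFormat SHARED = new SimpleDateFormat("yyyy-MM-dd");

  // Safer pattern: each thread lazily creates and reuses its own formatter.
  private static final ThreadLocal<SimpleDateFormat> PER_THREAD =
      ThreadLocal.withInitial(() -> {
        SimpleDateFormat f = new SimpleDateFormat("yyyy-MM-dd", Locale.ROOT);
        f.setTimeZone(TimeZone.getTimeZone("UTC"));
        return f;
      });

  // Formats the given epoch milliseconds as a UTC calendar day.
  static String formatDay(long epochMillis) {
    return PER_THREAD.get().format(new Date(epochMillis));
  }
}
```

Other options for the same problem include making the field non-static, using an immutable or concurrent type, or synchronizing access; which one fits depends on the call site.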



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16196) UDFJson having thread-safety issues

2017-03-16 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15928006#comment-15928006
 ] 

Xuefu Zhang commented on HIVE-16196:


Test result didn't get posted somehow. Here it goes:
{code}
https://builds.apache.org/job/PreCommit-HIVE-Build/4168/testReport/
{code}
The test failure doesn't seem related to the patch here.

> UDFJson having thread-safety issues
> ---
>
> Key: HIVE-16196
> URL: https://issues.apache.org/jira/browse/HIVE-16196
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 1.1.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-16196.patch
>
>
> As a follow-up to HIVE-16183, there seem to be some concurrency issues in 
> UDFJson.java, especially around static class variables.





[jira] [Comment Edited] (HIVE-16196) UDFJson having thread-safety issues

2017-03-16 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15928006#comment-15928006
 ] 

Xuefu Zhang edited comment on HIVE-16196 at 3/16/17 1:21 PM:
-

Test result didn't get posted somehow. Here it goes:

https://builds.apache.org/job/PreCommit-HIVE-Build/4168/testReport/

The test failure doesn't seem related to the patch here.


was (Author: xuefuz):
Test result didn't get posted somehow. Here it goes:
{code}
https://builds.apache.org/job/PreCommit-HIVE-Build/4168/testReport/
{code}
The test failure doesn't seem related to the patch here.

> UDFJson having thread-safety issues
> ---
>
> Key: HIVE-16196
> URL: https://issues.apache.org/jira/browse/HIVE-16196
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 1.1.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-16196.patch
>
>
> As a follow-up to HIVE-16183, there seem to be some concurrency issues in 
> UDFJson.java, especially around static class variables.





[jira] [Commented] (HIVE-15978) Support regr_* functions

2017-03-16 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15928197#comment-15928197
 ] 

Zoltan Haindrich commented on HIVE-15978:
-

I will be the HiveQA today! :)

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12858977/HIVE-15978.3.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 10419 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver
 (batchId=97)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4171/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4171/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4171/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

> Support regr_* functions
> 
>
> Key: HIVE-15978
> URL: https://issues.apache.org/jira/browse/HIVE-15978
> Project: Hive
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Carter Shanklin
>Assignee: Zoltan Haindrich
> Attachments: HIVE-15978.1.patch, HIVE-15978.2.patch, 
> HIVE-15978.2.patch, HIVE-15978.3.patch
>
>
> Support the standard regr_* functions, regr_slope, regr_intercept, regr_r2, 
> regr_sxx, regr_syy, regr_sxy, regr_avgx, regr_avgy, regr_count. SQL reference 
> section 10.9
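
For reference, a few of these functions can be sketched directly from their standard definitions. The class below is a hypothetical illustration, not the implementation in the attached patches: it takes arrays of already-paired, non-null values (SQL skips pairs in which either value is NULL) and uses NaN where SQL would return NULL.

```java
final class RegrStats {
  final long count;        // regr_count: number of (y, x) pairs
  final double slope;      // regr_slope(y, x) = sxy / sxx
  final double intercept;  // regr_intercept(y, x) = avg(y) - slope * avg(x)
  final double r2;         // regr_r2(y, x) = sxy^2 / (sxx * syy)

  // y is the dependent variable and x the independent one, as in regr_slope(y, x).
  RegrStats(double[] y, double[] x) {
    if (x.length != y.length || x.length == 0) {
      throw new IllegalArgumentException("need equally sized, non-empty inputs");
    }
    int n = x.length;
    double sumX = 0, sumY = 0;
    for (int i = 0; i < n; i++) {
      sumX += x[i];
      sumY += y[i];
    }
    double avgX = sumX / n, avgY = sumY / n;
    // Sums of squared deviations and cross deviations (regr_sxx, regr_syy, regr_sxy).
    double sxx = 0, syy = 0, sxy = 0;
    for (int i = 0; i < n; i++) {
      double dx = x[i] - avgX, dy = y[i] - avgY;
      sxx += dx * dx;
      syy += dy * dy;
      sxy += dx * dy;
    }
    count = n;
    slope = sxx == 0 ? Double.NaN : sxy / sxx;
    intercept = avgY - slope * avgX;
    // By definition r2 is undefined when x is constant and 1 when y is constant.
    r2 = sxx == 0 ? Double.NaN : (syy == 0 ? 1.0 : (sxy * sxy) / (sxx * syy));
  }
}
```

For the points (1, 3), (2, 5), (3, 7), which lie on y = 2x + 1, this gives a slope of 2, an intercept of 1 and an r2 of 1.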





[jira] [Commented] (HIVE-16166) HS2 may still waste up to 15% of memory on duplicate strings

2017-03-16 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-16166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15928202#comment-15928202
 ] 

Sergio Peña commented on HIVE-16166:


[~mi...@cloudera.com] HiveQA already finished, but it could not post the 
results on the JIRA. 
Here are the results:

https://builds.apache.org/view/H-L/view/Hive/job/PreCommit-HIVE-Build/4164/

Test Result (10 failures / +10)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[dynamic_partition_pruning_2]
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_partitioned]
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_dynamic_partition_pruning]
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorization_part]
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynamic_partition_pruning]
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorization_part_varchar]
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynamic_semijoin_reduction]
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[special_character_in_tabnames_1]
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[metadataonly1]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[vector_join_part_col_char]

> HS2 may still waste up to 15% of memory on duplicate strings
> 
>
> Key: HIVE-16166
> URL: https://issues.apache.org/jira/browse/HIVE-16166
> Project: Hive
>  Issue Type: Improvement
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
> Attachments: ch_2_excerpt.txt, HIVE-16166.01.patch
>
>
> A heap dump obtained from one of our users shows that 15% of memory is wasted 
> on duplicate strings, despite the recent optimizations that I made. The 
> problematic strings just come from different sources this time. See the 
> excerpt from the jxray (www.jxray.com) analysis attached.
> Adding String.intern() calls in the appropriate places reduces the overhead 
> of duplicate strings with this workload to ~6%. The remaining duplicates come 
> mostly from JDK internal and MapReduce data structures, and thus are more 
> difficult to fix.
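
The mechanism behind this fix can be shown in isolation. The sketch below is not taken from the patch, and the class `InternDemo` is hypothetical: it only demonstrates that equal strings built at run time are distinct heap objects until `String.intern()` maps them to one canonical instance.

```java
final class InternDemo {
  // Builds a new String object at run time, so it is not a pooled literal.
  static String fresh(String s) {
    return new String(s.toCharArray());
  }

  // True when both references point to the very same object on the heap.
  static boolean sameObject(String a, String b) {
    return a == b;
  }
}
```

Two `fresh("default")` results are equal but are different objects, so each costs memory; after `intern()` both references point to the same object and the duplicates can be garbage collected.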





[jira] [Commented] (HIVE-15867) Add blobstore tests for import/export

2017-03-16 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-15867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15928248#comment-15928248
 ] 

Sergio Peña commented on HIVE-15867:


[~juanrh] Seems the tests weren't executed. Could you re-submit the patch to 
trigger HiveQA?

> Add blobstore tests for import/export
> -
>
> Key: HIVE-15867
> URL: https://issues.apache.org/jira/browse/HIVE-15867
> Project: Hive
>  Issue Type: Bug
>Reporter: Thomas Poepping
>Assignee: Juan Rodríguez Hortalá
> Attachments: HIVE-15867-v3.patch
>
>
> This patch covers ten separate tests testing import and export operations 
> running against blobstore filesystems:
> * Import addpartition
> ** blobstore -> file
> ** file -> blobstore
> ** blobstore -> blobstore
> ** blobstore -> hdfs
> * import/export
> ** blobstore -> file
> ** file -> blobstore
> ** blobstore -> blobstore (partitioned and non-partitioned)
> ** blobstore -> HDFS (partitioned and non-partitioned)





[jira] [Updated] (HIVE-16049) upgrade to jetty 9

2017-03-16 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-16049:

Attachment: HIVE-16049.2.patch

> upgrade to jetty 9
> --
>
> Key: HIVE-16049
> URL: https://issues.apache.org/jira/browse/HIVE-16049
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sean Busbey
>Assignee: Aihua Xu
> Attachments: HIVE-16049.0.patch, HIVE-16049.1.patch, 
> HIVE-16049.2.patch
>
>
> Jetty 7 has been deprecated for a couple of years now. Hadoop and HBase have 
> both updated to Jetty 9 for their next major releases, which will complicate 
> classpath concerns.
> Proactively update to Jetty 9 in the few places we use a web server.





[jira] [Commented] (HIVE-16049) upgrade to jetty 9

2017-03-16 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15928251#comment-15928251
 ] 

Aihua Xu commented on HIVE-16049:
-

patch-2: fix the build of hcatalog/svr project which should fix the test 
failure TestWebHCatE2e.

> upgrade to jetty 9
> --
>
> Key: HIVE-16049
> URL: https://issues.apache.org/jira/browse/HIVE-16049
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sean Busbey
>Assignee: Aihua Xu
> Attachments: HIVE-16049.0.patch, HIVE-16049.1.patch, 
> HIVE-16049.2.patch
>
>
> Jetty 7 has been deprecated for a couple of years now. Hadoop and HBase have 
> both updated to Jetty 9 for their next major releases, which will complicate 
> classpath concerns.
> Proactively update to Jetty 9 in the few places we use a web server.





[jira] [Commented] (HIVE-16205) Improving type safety in Objectstore

2017-03-16 Thread Vihang Karajgaonkar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15928253#comment-15928253
 ] 

Vihang Karajgaonkar commented on HIVE-16205:


[~spena] Can you please review? Thanks!

> Improving type safety in Objectstore
> 
>
> Key: HIVE-16205
> URL: https://issues.apache.org/jira/browse/HIVE-16205
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
> Attachments: HIVE-16205.01.patch, HIVE-16205.02.patch, 
> HIVE-16205.03.patch
>
>
> Modify the queries in ObjectStore for better type safety





[jira] [Updated] (HIVE-15166) Provide beeline option to set the jline history max size

2017-03-16 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-15166:

   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Pushed to master. Thanks Eric for the contribution.

> Provide beeline option to set the jline history max size
> 
>
> Key: HIVE-15166
> URL: https://issues.apache.org/jira/browse/HIVE-15166
> Project: Hive
>  Issue Type: Improvement
>  Components: Beeline
>Affects Versions: 2.1.0
>Reporter: Eric Lin
>Assignee: Eric Lin
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-15166.2.patch, HIVE-15166.3.patch, HIVE-15166.patch
>
>
> Currently Beeline does not provide an option to limit the max size for 
> beeline history file, in the case that each query is very big, it will flood 
> the history file and slow down beeline on start up and shutdown.





[jira] [Updated] (HIVE-16024) MSCK Repair Requires nonstrict hive.mapred.mode

2017-03-16 Thread Barna Zsombor Klara (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barna Zsombor Klara updated HIVE-16024:
---
Attachment: HIVE-16024.06.patch

Updated the patch based on review board comments.

> MSCK Repair Requires nonstrict hive.mapred.mode
> ---
>
> Key: HIVE-16024
> URL: https://issues.apache.org/jira/browse/HIVE-16024
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.2.0
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
> Attachments: HIVE-16024.01.patch, HIVE-16024.02.patch, 
> HIVE-16024.03.patch, HIVE-16024.04.patch, HIVE-16024.05.patch, 
> HIVE-16024.06.patch
>
>
> MSCK repair fails when hive.mapred.mode is set to strict
> HIVE-13788 modified the way we read up partitions for a table to improve 
> performance. Unfortunately it is using PartitionPruner to load the partitions 
> which in turn is checking hive.mapred.mode.
> The previous code did not check hive.mapred.mode.





[jira] [Updated] (HIVE-15867) Add blobstore tests for import/export

2017-03-16 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-15867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Rodríguez Hortalá updated HIVE-15867:
--
Status: Open  (was: Patch Available)

re-submitting the patch to trigger HiveQA

> Add blobstore tests for import/export
> -
>
> Key: HIVE-15867
> URL: https://issues.apache.org/jira/browse/HIVE-15867
> Project: Hive
>  Issue Type: Bug
>Reporter: Thomas Poepping
>Assignee: Juan Rodríguez Hortalá
> Attachments: HIVE-15867-v3.patch
>
>
> This patch covers ten separate tests testing import and export operations 
> running against blobstore filesystems:
> * Import addpartition
> ** blobstore -> file
> ** file -> blobstore
> ** blobstore -> blobstore
> ** blobstore -> hdfs
> * import/export
> ** blobstore -> file
> ** file -> blobstore
> ** blobstore -> blobstore (partitioned and non-partitioned)
> ** blobstore -> HDFS (partitioned and non-partitioned)





[jira] [Updated] (HIVE-15867) Add blobstore tests for import/export

2017-03-16 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-15867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Rodríguez Hortalá updated HIVE-15867:
--
   Labels: patch test  (was: )
Affects Version/s: 2.1.1
 Target Version/s: 2.1.1
   Status: Patch Available  (was: Open)

> Add blobstore tests for import/export
> -
>
> Key: HIVE-15867
> URL: https://issues.apache.org/jira/browse/HIVE-15867
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.1
>Reporter: Thomas Poepping
>Assignee: Juan Rodríguez Hortalá
>  Labels: patch, test
> Attachments: HIVE-15867-v3.patch
>
>
> This patch covers ten separate tests testing import and export operations 
> running against blobstore filesystems:
> * Import addpartition
> ** blobstore -> file
> ** file -> blobstore
> ** blobstore -> blobstore
> ** blobstore -> hdfs
> * import/export
> ** blobstore -> file
> ** file -> blobstore
> ** blobstore -> blobstore (partitioned and non-partitioned)
> ** blobstore -> HDFS (partitioned and non-partitioned)





[jira] [Commented] (HIVE-16146) If possible find a better way to filter the TestBeeLineDriver output

2017-03-16 Thread Peter Vary (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15928369#comment-15928369
 ] 

Peter Vary commented on HIVE-16146:
---

Here are my thoughts after some digging: Even though we might be able to create 
better filtering than the current one in QTestUtil, the main goal could be to 
generate the same output from the same query scripts. So if we decide to move a 
few tests from the CLI Driver to the BeeLine driver then we should be able to 
do it without changing the golden files.

The CLI output contains 3 types of data:
- Query results - columns separated by a single '\t' character
- PREHOOK logs
- POSTHOOK logs

I have yet to find anything else, but there might be something which I missed :)
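
The three kinds of CLI output listed above could be told apart with something like the following sketch. It assumes that hook lines begin with the literal prefixes "PREHOOK:" and "POSTHOOK:" and that everything else is a tab-separated result row; the class, enum and sample lines are hypothetical and are not taken from QTestUtil.

```java
import java.util.Arrays;

// Hypothetical classifier for lines of CLI test output; not QTestUtil code.
class CliOutputSketch {

  enum Kind { PREHOOK, POSTHOOK, RESULT }

  // Assumption: hook log lines start with these literal prefixes.
  static Kind classify(String line) {
    if (line.startsWith("PREHOOK:")) {
      return Kind.PREHOOK;
    }
    if (line.startsWith("POSTHOOK:")) {
      return Kind.POSTHOOK;
    }
    return Kind.RESULT;
  }

  // Splits a result row on single tab characters, keeping empty columns.
  static String[] columns(String resultLine) {
    return resultLine.split("\t", -1);
  }

  public static void main(String[] args) {
    String[] sample = {
      "PREHOOK: query: select key, value from src",
      "POSTHOOK: query: select key, value from src",
      "238\tval_238"
    };
    for (String line : sample) {
      Kind kind = classify(line);
      String detail = kind == Kind.RESULT ? Arrays.toString(columns(line)) : line;
      System.out.println(kind + " -> " + detail);
    }
  }
}
```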

> If possible find a better way to filter the TestBeeLineDriver output
> 
>
> Key: HIVE-16146
> URL: https://issues.apache.org/jira/browse/HIVE-16146
> Project: Hive
>  Issue Type: Improvement
>  Components: Testing Infrastructure
>Affects Versions: 2.2.0
>Reporter: Peter Vary
>Assignee: Peter Vary
>
> Currently we apply a blacklist to filter the output of the BeeLine Qtest runs.
> It might be a good idea to go through the possibilities and find a better 
> way, if possible.
> I think our main goal could be for the TestBeeLineDriver test output to match 
> the TestCliDriver output of the same query file. Or, if that is not possible, 
> then at least a similar one.
> CC: [~vihangk1]





[jira] [Updated] (HIVE-16146) If possible find a better way to filter the TestBeeLineDriver output

2017-03-16 Thread Peter Vary (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-16146:
--
Attachment: HIVE-16146.patch

Patch for producing the same golden files from BeeLine tests as from CLI tests

> If possible find a better way to filter the TestBeeLineDriver output
> 
>
> Key: HIVE-16146
> URL: https://issues.apache.org/jira/browse/HIVE-16146
> Project: Hive
>  Issue Type: Improvement
>  Components: Testing Infrastructure
>Affects Versions: 2.2.0
>Reporter: Peter Vary
>Assignee: Peter Vary
> Attachments: HIVE-16146.patch
>
>
> Currently we apply a blacklist to filter the output of the BeeLine Qtest runs.
> It might be a good idea to go through the possibilities and find a better 
> way, if possible.
> I think our main goal could be for the TestBeeLineDriver test output to match 
> the TestCliDriver output of the same query file. Or, if that is not possible, 
> then at least a similar one.
> CC: [~vihangk1]





[jira] [Updated] (HIVE-16146) If possible find a better way to filter the TestBeeLineDriver output

2017-03-16 Thread Peter Vary (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-16146:
--
Status: Patch Available  (was: Open)

> If possible find a better way to filter the TestBeeLineDriver output
> 
>
> Key: HIVE-16146
> URL: https://issues.apache.org/jira/browse/HIVE-16146
> Project: Hive
>  Issue Type: Improvement
>  Components: Testing Infrastructure
>Affects Versions: 2.2.0
>Reporter: Peter Vary
>Assignee: Peter Vary
> Attachments: HIVE-16146.patch
>
>
> Currently we apply a blacklist to filter the output of the BeeLine Qtest runs.
> It might be a good idea to go through the possibilities and find a better 
> way, if possible.
> I think our main goal could be for the TestBeeLineDriver test output to match 
> the TestCliDriver output of the same query file. Or, if that is not possible, 
> then at least a similar one.
> CC: [~vihangk1]





[jira] [Updated] (HIVE-15867) Add blobstore tests for import/export

2017-03-16 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-15867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Rodríguez Hortalá updated HIVE-15867:
--
Attachment: HIVE-15867-v3.patch

re-attaching patch to trigger Hive QA

> Add blobstore tests for import/export
> -
>
> Key: HIVE-15867
> URL: https://issues.apache.org/jira/browse/HIVE-15867
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.1
>Reporter: Thomas Poepping
>Assignee: Juan Rodríguez Hortalá
>  Labels: patch, test
> Attachments: HIVE-15867-v3.patch, HIVE-15867-v3.patch
>
>
> This patch covers ten separate tests testing import and export operations 
> running against blobstore filesystems:
> * Import addpartition
> ** blobstore -> file
> ** file -> blobstore
> ** blobstore -> blobstore
> ** blobstore -> hdfs
> * import/export
> ** blobstore -> file
> ** file -> blobstore
> ** blobstore -> blobstore (partitioned and non-partitioned)
> ** blobstore -> HDFS (partitioned and non-partitioned)





[jira] [Updated] (HIVE-15867) Add blobstore tests for import/export

2017-03-16 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-15867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Rodríguez Hortalá updated HIVE-15867:
--
Attachment: (was: HIVE-15867-v3.patch)

> Add blobstore tests for import/export
> -
>
> Key: HIVE-15867
> URL: https://issues.apache.org/jira/browse/HIVE-15867
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.1
>Reporter: Thomas Poepping
>Assignee: Juan Rodríguez Hortalá
>  Labels: patch, test
> Attachments: HIVE-15867-v3.patch
>
>
> This patch covers ten separate tests testing import and export operations 
> running against blobstore filesystems:
> * Import addpartition
> ** blobstore -> file
> ** file -> blobstore
> ** blobstore -> blobstore
> ** blobstore -> hdfs
> * import/export
> ** blobstore -> file
> ** file -> blobstore
> ** blobstore -> blobstore (partitioned and non-partitioned)
> ** blobstore -> HDFS (partitioned and non-partitioned)





[jira] [Updated] (HIVE-15766) DBNotificationlistener leaks JDOPersistenceManager

2017-03-16 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-15766:

Attachment: HIVE-15766.5.patch

> DBNotificationlistener leaks JDOPersistenceManager
> --
>
> Key: HIVE-15766
> URL: https://issues.apache.org/jira/browse/HIVE-15766
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Attachments: HIVE-15766.1.patch, HIVE-15766.2.patch, 
> HIVE-15766.3.patch, HIVE-15766.4.patch, HIVE-15766.4.patch, HIVE-15766.5.patch
>
>






[jira] [Updated] (HIVE-16146) If possible find a better way to filter the TestBeeLineDriver output

2017-03-16 Thread Peter Vary (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-16146:
--
Attachment: HIVE-16146.02.patch

findbugs, checkstyle errors corrected

> If possible find a better way to filter the TestBeeLineDriver output
> 
>
> Key: HIVE-16146
> URL: https://issues.apache.org/jira/browse/HIVE-16146
> Project: Hive
>  Issue Type: Improvement
>  Components: Testing Infrastructure
>Affects Versions: 2.2.0
>Reporter: Peter Vary
>Assignee: Peter Vary
> Attachments: HIVE-16146.02.patch, HIVE-16146.patch
>
>
> Currently we apply a blacklist to filter the output of the BeeLine Qtest runs.
> It might be a good idea to go through the possibilities and find a better 
> way, if possible.
> I think our main goal could be for the TestBeeLineDriver test output to match 
> the TestCliDriver output of the same query file. Or, if that is not possible, 
> then at least a similar one.
> CC: [~vihangk1]





[jira] [Updated] (HIVE-15983) Support the named columns join

2017-03-16 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15983:
---
Affects Version/s: 2.1.0

> Support the named columns join
> --
>
> Key: HIVE-15983
> URL: https://issues.apache.org/jira/browse/HIVE-15983
> Project: Hive
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Carter Shanklin
>Assignee: Pengcheng Xiong
> Fix For: 2.2.0
>
> Attachments: HIVE-15983.01.patch, HIVE-15983.02.patch, 
> HIVE-15983.03.patch, HIVE-15983.04.patch
>
>
> The named columns join is a common shortcut allowing joins on identically 
> named keys. Example: select * from t1 join t2 using c1 is equivalent to 
> select * from t1 join t2 on t1.c1 = t2.c1. SQL standard reference: Section 7.7





[jira] [Updated] (HIVE-15983) Support the named columns join

2017-03-16 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15983:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

pushed to master. thanks [~ashutoshc] for the review

> Support the named columns join
> --
>
> Key: HIVE-15983
> URL: https://issues.apache.org/jira/browse/HIVE-15983
> Project: Hive
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Carter Shanklin
>Assignee: Pengcheng Xiong
> Fix For: 2.2.0
>
> Attachments: HIVE-15983.01.patch, HIVE-15983.02.patch, 
> HIVE-15983.03.patch, HIVE-15983.04.patch
>
>
> The named columns join is a common shortcut allowing joins on identically 
> named keys. Example: select * from t1 join t2 using c1 is equivalent to 
> select * from t1 join t2 on t1.c1 = t2.c1. SQL standard reference: Section 7.7





[jira] [Updated] (HIVE-15983) Support the named columns join

2017-03-16 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15983:
---
Fix Version/s: 2.2.0

> Support the named columns join
> --
>
> Key: HIVE-15983
> URL: https://issues.apache.org/jira/browse/HIVE-15983
> Project: Hive
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Carter Shanklin
>Assignee: Pengcheng Xiong
> Fix For: 2.2.0
>
> Attachments: HIVE-15983.01.patch, HIVE-15983.02.patch, 
> HIVE-15983.03.patch, HIVE-15983.04.patch
>
>
> The named columns join is a common shortcut allowing joins on identically 
> named keys. Example: select * from t1 join t2 using c1 is equivalent to 
> select * from t1 join t2 on t1.c1 = t2.c1. SQL standard reference: Section 7.7





[jira] [Updated] (HIVE-16183) Fix potential thread safety issues with static variables

2017-03-16 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-16183:
---
Attachment: HIVE-16183.4.patch

> Fix potential thread safety issues with static variables
> 
>
> Key: HIVE-16183
> URL: https://issues.apache.org/jira/browse/HIVE-16183
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-16183.1.patch, HIVE-16183.2.patch, 
> HIVE-16183.3.patch, HIVE-16183.4.patch, HIVE-16183.4.patch, 
> HIVE-16183.4.patch, HIVE-16183.patch
>
>
> Many concurrency issues (HIVE-12768, HIVE-16175, HIVE-16060) have been found 
> with respect to class static variable usage. Given that HS2 supports 
> concurrent compilation and task execution, and that some backend engines 
> (such as Spark) run multiple tasks in a single JVM, the traditional 
> assumption (or mindset) of single-threaded execution needs to be abandoned.
> The purpose of this JIRA is to do a global scan of static variables in the Hive 
> code base and correct potential thread-safety issues. However, it's not 
> meant to be exhaustive.
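
To make the concern concrete, here is a minimal sketch of one common remedy: replacing a shared static HashMap with a ConcurrentHashMap updated through an atomic merge. This is a generic Java pattern, not code from the attached patches; the class name, the "compile" key and the thread counts are all hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical example of thread-safe static state; not taken from Hive.
class StaticStateSketch {

  // A plain static HashMap here could lose updates or corrupt itself when
  // several threads write at once. ConcurrentHashMap avoids that.
  private static final ConcurrentHashMap<String, Integer> COUNTS = new ConcurrentHashMap<>();

  // merge() performs the read-modify-write atomically per key.
  static void record(String key) {
    COUNTS.merge(key, 1, Integer::sum);
  }

  // Starts the given number of threads, each recording perThread events,
  // waits for all of them, and returns the final count.
  static int runConcurrently(int threads, int perThread) {
    COUNTS.clear();
    Thread[] workers = new Thread[threads];
    for (int t = 0; t < threads; t++) {
      workers[t] = new Thread(() -> {
        for (int i = 0; i < perThread; i++) {
          record("compile");
        }
      });
      workers[t].start();
    }
    for (Thread w : workers) {
      try {
        w.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      }
    }
    return COUNTS.getOrDefault("compile", 0);
  }

  public static void main(String[] args) {
    // No updates are lost, so the total is threads * perThread.
    System.out.println(runConcurrently(8, 10_000));
  }
}
```

Other options, such as making the state an instance field or using a ThreadLocal, may fit better when the data does not actually need to be shared across queries.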





[jira] [Assigned] (HIVE-16232) Support stats computation for column in QuotedIdentifier

2017-03-16 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong reassigned HIVE-16232:
--


> Support stats computation for column in QuotedIdentifier 
> -
>
> Key: HIVE-16232
> URL: https://issues.apache.org/jira/browse/HIVE-16232
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>
> Right now, if a column contains double quotes ``, we cannot compute its stats.





[jira] [Updated] (HIVE-16232) Support stats computation for column in QuotedIdentifier

2017-03-16 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16232:
---
Issue Type: Sub-task  (was: Bug)
Parent: HIVE-11160

> Support stats computation for column in QuotedIdentifier 
> -
>
> Key: HIVE-16232
> URL: https://issues.apache.org/jira/browse/HIVE-16232
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>
> Right now, if a column contains double quotes ``, we cannot compute its stats.





[jira] [Updated] (HIVE-15947) Enhance Templeton service job operations reliability

2017-03-16 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-15947:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

+1ed on RB.

The precommit tests failed to publish results. There is one unrelated failure: 
org.apache.hive.service.server.TestHS2HttpServer.testContextRootUrlRewrite. 
All other tests pass. Link: 
https://builds.apache.org/job/PreCommit-HIVE-Build/4180/

Patch pushed to master. Thanks Subramanyam, Kiran!

> Enhance Templeton service job operations reliability
> 
>
> Key: HIVE-15947
> URL: https://issues.apache.org/jira/browse/HIVE-15947
> Project: Hive
>  Issue Type: Bug
>Reporter: Subramanyam Pattipaka
>Assignee: Subramanyam Pattipaka
> Fix For: 2.2.0
>
> Attachments: HIVE-15947.10.patch, HIVE-15947.2.patch, 
> HIVE-15947.3.patch, HIVE-15947.4.patch, HIVE-15947.6.patch, 
> HIVE-15947.7.patch, HIVE-15947.8.patch, HIVE-15947.9.patch, HIVE-15947.patch
>
>
> Currently the Templeton service doesn't restrict the number of job operation 
> requests. It simply accepts and tries to run all operations. If a large number 
> of concurrent job submit requests arrive, the time to submit job 
> operations can increase significantly. Templeton uses HDFS to store the staging 
> file for a job. If HDFS storage can't respond to a large number of requests and 
> throttles, then job submission can take a very long time, on the order of 
> minutes.
> This behavior may not be suitable for all applications. Client 
> applications may be looking for a predictable and low response time for a 
> successful request, or for a throttle response telling the client to wait for 
> some time before re-requesting the job operation.
> In this JIRA, I am trying to address following job operations 
> 1) Submit new Job
> 2) Get Job Status
> 3) List jobs
> These three operations have different complexity due to variance in their use of 
> cluster resources like YARN/HDFS.
> The idea is to introduce a new config, templeton.job.submit.exec.max-procs, 
> which controls the maximum number of concurrent active job submissions within 
> Templeton, and to use this config to achieve better response times. If a new job 
> submission request sees that there are already 
> templeton.job.submit.exec.max-procs jobs being submitted concurrently, then 
> the request will fail with HTTP error 503 with the reason 
>“Too many concurrent job submission requests received. Please wait for 
> some time before retrying.”
>  
> The client is expected to catch this response and retry after waiting for 
> some time. The default value for the config 
> templeton.job.submit.exec.max-procs is set to ‘0’. This means that by default job 
> submission requests are always accepted. The behavior needs to be enabled 
> based on requirements.
> We can have similar behavior for the Status and List operations with the configs 
> templeton.job.status.exec.max-procs and templeton.list.job.exec.max-procs 
> respectively.
> Once a job operation has started, it can take a long time. The 
> client that requested the job operation may not be willing to wait for an 
> indefinite amount of time. This work introduces the configurations
> templeton.exec.job.submit.timeout
> templeton.exec.job.status.timeout
> templeton.exec.job.list.timeout
> to specify the maximum amount of time a job operation can execute. If a timeout 
> happens, then list and status job requests return to the client with the message
> "List job request got timed out. Please retry the operation after waiting for 
> some time."
> If a submit job request times out, then 
>   i) The job submit request thread which receives the timeout will check whether 
> a valid job id was generated for the job request.
>   ii) If one was generated, it issues a kill job request on the cancel thread 
> pool. It doesn't wait for the operation to complete and returns to the client 
> with a timeout message. 
> Side effects of enabling timeouts for submit operations:
> 1) The job may remain active for some time after the client 
> gets its response, and a list operation from the client could potentially show 
> the newly created job before it gets killed.
> 2) We make a best effort to kill the job, with no guarantees. This means there is 
> a possibility of a duplicate job being created. One possible reason for this 
> could be a case where the job is created and then the operation times out, but 
> the kill request fails due to resource manager unavailability. When the resource 
> manager restarts, it will restart the job which got created.
> Fixing this scenario is not part of the scope of this JIRA. The job operation 
> functionality should be enabled only if the above side effects are acceptable.
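
The max-procs throttling described above can be pictured with the following minimal sketch. It is not the Templeton implementation: it assumes a counting semaphore is an acceptable stand-in for the limit, and the class name, method names and integer status codes are hypothetical. The two behaviours taken from the description are that a limit of 0 disables throttling and that a request over the limit is rejected with 503 rather than queued.

```java
import java.util.concurrent.Semaphore;

// Hypothetical throttle for job submissions; not the actual Templeton code.
class SubmitThrottleSketch {
  static final int HTTP_OK = 200;
  static final int HTTP_SERVICE_UNAVAILABLE = 503;

  // null means "unlimited", mirroring a max-procs value of 0.
  private final Semaphore permits;

  SubmitThrottleSketch(int maxProcs) {
    this.permits = maxProcs > 0 ? new Semaphore(maxProcs) : null;
  }

  // Runs the submission if a slot is free; otherwise rejects immediately.
  int handle(Runnable submitJob) {
    if (permits == null) {
      submitJob.run();
      return HTTP_OK;
    }
    if (!permits.tryAcquire()) {
      // The caller is expected to wait for some time and retry.
      return HTTP_SERVICE_UNAVAILABLE;
    }
    try {
      submitJob.run();
      return HTTP_OK;
    } finally {
      permits.release();
    }
  }

  public static void main(String[] args) {
    SubmitThrottleSketch throttle = new SubmitThrottleSketch(1);
    int[] inner = new int[1];
    // The nested call arrives while the only slot is still in use.
    int outer = throttle.handle(() -> inner[0] = throttle.handle(() -> { }));
    System.out.println("outer=" + outer + " inner=" + inner[0]);
  }
}
```

Using tryAcquire() instead of acquire() is what gives the client a prompt, predictable answer: the request either starts right away or fails fast.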





[jira] [Updated] (HIVE-15947) Enhance Templeton service job operations reliability

2017-03-16 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-15947:
--
Issue Type: Improvement  (was: Bug)

> Enhance Templeton service job operations reliability
> 
>
> Key: HIVE-15947
> URL: https://issues.apache.org/jira/browse/HIVE-15947
> Project: Hive
>  Issue Type: Improvement
>Reporter: Subramanyam Pattipaka
>Assignee: Subramanyam Pattipaka
> Fix For: 2.2.0
>
> Attachments: HIVE-15947.10.patch, HIVE-15947.2.patch, 
> HIVE-15947.3.patch, HIVE-15947.4.patch, HIVE-15947.6.patch, 
> HIVE-15947.7.patch, HIVE-15947.8.patch, HIVE-15947.9.patch, HIVE-15947.patch
>
>
> Currently the Templeton service doesn't restrict the number of job operation 
> requests. It simply accepts and tries to run all operations. If a large number 
> of concurrent job submit requests arrive, the time to submit job 
> operations can increase significantly. Templeton uses HDFS to store the staging 
> file for a job. If HDFS storage can't respond to a large number of requests and 
> throttles, then job submission can take a very long time, on the order of 
> minutes.
> This behavior may not be suitable for all applications. Client 
> applications may be looking for a predictable and low response time for a 
> successful request, or for a throttle response telling the client to wait for 
> some time before re-requesting the job operation.
> In this JIRA, I am trying to address following job operations 
> 1) Submit new Job
> 2) Get Job Status
> 3) List jobs
> These three operations have different complexity due to variance in their use of 
> cluster resources like YARN/HDFS.
> The idea is to introduce a new config, templeton.job.submit.exec.max-procs, 
> which controls the maximum number of concurrent active job submissions within 
> Templeton, and to use this config to achieve better response times. If a new job 
> submission request sees that there are already 
> templeton.job.submit.exec.max-procs jobs being submitted concurrently, then 
> the request will fail with HTTP error 503 with the reason 
>“Too many concurrent job submission requests received. Please wait for 
> some time before retrying.”
>  
> The client is expected to catch this response and retry after waiting for 
> some time. The default value for the config 
> templeton.job.submit.exec.max-procs is set to ‘0’. This means that by default job 
> submission requests are always accepted. The behavior needs to be enabled 
> based on requirements.
> We can have similar behavior for the Status and List operations with the configs 
> templeton.job.status.exec.max-procs and templeton.list.job.exec.max-procs 
> respectively.
> Once a job operation has started, it can take a long time. The 
> client that requested the job operation may not be willing to wait for an 
> indefinite amount of time. This work introduces the configurations
> templeton.exec.job.submit.timeout
> templeton.exec.job.status.timeout
> templeton.exec.job.list.timeout
> to specify the maximum amount of time a job operation can execute. If a timeout 
> happens, then list and status job requests return to the client with the message
> "List job request got timed out. Please retry the operation after waiting for 
> some time."
> If a submit job request times out, then 
>   i) The job submit request thread which receives the timeout will check whether 
> a valid job id was generated for the job request.
>   ii) If one was generated, it issues a kill job request on the cancel thread 
> pool. It doesn't wait for the operation to complete and returns to the client 
> with a timeout message. 
> Side effects of enabling timeouts for submit operations:
> 1) The job may remain active for some time after the client 
> gets its response, and a list operation from the client could potentially show 
> the newly created job before it gets killed.
> 2) We make a best effort to kill the job, with no guarantees. This means there is 
> a possibility of a duplicate job being created. One possible reason for this 
> could be a case where the job is created and then the operation times out, but 
> the kill request fails due to resource manager unavailability. When the resource 
> manager restarts, it will restart the job which got created.
> Fixing this scenario is not part of the scope of this JIRA. The job operation 
> functionality should be enabled only if the above side effects are acceptable.
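
The submit-timeout handling in steps i) and ii) above might look roughly like the sketch below. It is an illustration under stated assumptions, not the patch itself: the class, the TIMED_OUT marker, the "job_0001" id and the use of an AtomicReference to publish the job id are all hypothetical, and the kill request is represented by a plain callback.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

// Hypothetical sketch of "time out, then best-effort kill"; not Templeton code.
class SubmitTimeoutSketch {
  static final String TIMED_OUT = "TIMED_OUT";

  // Waits up to timeoutMillis for the submission. On timeout, if a job id was
  // already published, hands a kill request to cancelPool without waiting for it.
  static String submitWithTimeout(Callable<String> submit, AtomicReference<String> jobId,
      long timeoutMillis, Consumer<String> killJob, ExecutorService cancelPool)
      throws Exception {
    ExecutorService worker = Executors.newSingleThreadExecutor();
    try {
      Future<String> pending = worker.submit(submit);
      try {
        return pending.get(timeoutMillis, TimeUnit.MILLISECONDS);
      } catch (TimeoutException e) {
        pending.cancel(true);
        String id = jobId.get();
        if (id != null) {
          cancelPool.execute(() -> killJob.accept(id)); // best effort, no guarantee
        }
        return TIMED_OUT;
      }
    } finally {
      worker.shutdownNow();
    }
  }

  // Simulates a submission that obtains a job id and then hangs.
  static String demo() {
    AtomicReference<String> jobId = new AtomicReference<>();
    List<String> killed = new CopyOnWriteArrayList<>();
    ExecutorService cancelPool = Executors.newSingleThreadExecutor();
    try {
      String outcome = submitWithTimeout(() -> {
        jobId.set("job_0001");
        Thread.sleep(10_000);
        return jobId.get();
      }, jobId, 500, killed::add, cancelPool);
      cancelPool.shutdown();
      cancelPool.awaitTermination(5, TimeUnit.SECONDS); // demo only: wait to observe the kill
      return outcome + " killed=" + killed;
    } catch (Exception e) {
      throw new IllegalStateException(e);
    } finally {
      cancelPool.shutdownNow();
    }
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

Because the kill runs asynchronously and can itself fail, the sketch has the same side effects the description lists: the job may be visible for a while, and it may survive if the kill does not go through.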





[jira] [Updated] (HIVE-14016) Vectorization: Add support for Grouping Sets

2017-03-16 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-14016:

Status: In Progress  (was: Patch Available)

> Vectorization: Add support for Grouping Sets
> 
>
> Key: HIVE-14016
> URL: https://issues.apache.org/jira/browse/HIVE-14016
> Project: Hive
>  Issue Type: Improvement
>  Components: Vectorization
>Reporter: Gopal V
>Assignee: Matt McCline
> Attachments: HIVE-14016.01.patch, HIVE-14016.02.patch, 
> HIVE-14016.03.patch, HIVE-14016.04.patch, HIVE-14016.05.patch, 
> HIVE-14016.06.patch, HIVE-14016.07.patch, HIVE-14016.091.patch, 
> HIVE-14016.09.patch
>
>
> Rollup and Cube queries are not vectorized today due to the lack of 
> grouping-sets support inside vector group by.
> The cube and rollup operators can be shimmed onto the end of the pipeline by 
> converting a single row writer into a multiple row writer.
> The corresponding non-vec loop is as follows
> {code}
>   if (groupingSetsPresent) {
>     Object[] newKeysArray = newKeys.getKeyArray();
>     Object[] cloneNewKeysArray = new Object[newKeysArray.length];
>     for (int keyPos = 0; keyPos < groupingSetsPosition; keyPos++) {
>       cloneNewKeysArray[keyPos] = newKeysArray[keyPos];
>     }
>     for (int groupingSetPos = 0; groupingSetPos < groupingSets.size(); groupingSetPos++) {
>       for (int keyPos = 0; keyPos < groupingSetsPosition; keyPos++) {
>         newKeysArray[keyPos] = null;
>       }
>       FastBitSet bitset = groupingSetsBitSet[groupingSetPos];
>       // Some keys need to be left to null corresponding to that grouping set.
>       for (int keyPos = bitset.nextSetBit(0); keyPos >= 0;
>           keyPos = bitset.nextSetBit(keyPos+1)) {
>         newKeysArray[keyPos] = cloneNewKeysArray[keyPos];
>       }
>       newKeysArray[groupingSetsPosition] = newKeysGroupingSets[groupingSetPos];
>       processKey(row, rowInspector);
>     }
>   }
> {code}
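
The row-mode loop quoted above can be tried outside Hive with a small standalone sketch. This is an illustration only, not the Hive implementation: `GroupingSetExpander` and `expand` are hypothetical names, `java.util.BitSet` stands in for Hive's `FastBitSet`, and the grouping-set index is simply appended as the last key instead of a precomputed grouping-set key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Hypothetical helper, not part of Hive: turns one key row into one row per grouping set.
final class GroupingSetExpander {

  // keys: the group-by key values of a single input row.
  // groupingSets: one BitSet per grouping set; a set bit means the key at that position is kept.
  // Returns one row per grouping set, with the grouping-set index appended as the last element.
  static List<Object[]> expand(Object[] keys, List<BitSet> groupingSets) {
    List<Object[]> out = new ArrayList<>();
    for (int setPos = 0; setPos < groupingSets.size(); setPos++) {
      Object[] row = new Object[keys.length + 1]; // every key starts out as null
      BitSet bits = groupingSets.get(setPos);
      // Copy back only the keys that belong to this grouping set.
      for (int keyPos = bits.nextSetBit(0); keyPos >= 0 && keyPos < keys.length;
          keyPos = bits.nextSetBit(keyPos + 1)) {
        row[keyPos] = keys[keyPos];
      }
      row[keys.length] = setPos;
      out.add(row);
    }
    return out;
  }

  public static void main(String[] args) {
    // ROLLUP(a, b) corresponds to the grouping sets {a, b}, {a}, {}.
    BitSet ab = new BitSet();
    ab.set(0);
    ab.set(1);
    BitSet a = new BitSet();
    a.set(0);
    BitSet none = new BitSet();
    for (Object[] row : expand(new Object[] {"x", "y"}, Arrays.asList(ab, a, none))) {
      System.out.println(Arrays.toString(row));
    }
  }
}
```

Each input row becomes as many output rows as there are grouping sets, which is the "single row writer into a multiple row writer" conversion the description refers to.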





[jira] [Updated] (HIVE-14016) Vectorization: Add support for Grouping Sets

2017-03-16 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-14016:

Attachment: HIVE-14016.091.patch

> Vectorization: Add support for Grouping Sets
> 
>
> Key: HIVE-14016
> URL: https://issues.apache.org/jira/browse/HIVE-14016
> Project: Hive
>  Issue Type: Improvement
>  Components: Vectorization
>Reporter: Gopal V
>Assignee: Matt McCline
> Attachments: HIVE-14016.01.patch, HIVE-14016.02.patch, 
> HIVE-14016.03.patch, HIVE-14016.04.patch, HIVE-14016.05.patch, 
> HIVE-14016.06.patch, HIVE-14016.07.patch, HIVE-14016.091.patch, 
> HIVE-14016.09.patch
>
>
> Rollup and Cube queries are not vectorized today due to the lack of 
> grouping-sets support inside vector group by.
> The cube and rollup operators can be shimmed onto the end of the pipeline by 
> converting a single row writer into a multiple row writer.
> The corresponding non-vec loop is as follows
> {code}
>   if (groupingSetsPresent) {
>     Object[] newKeysArray = newKeys.getKeyArray();
>     Object[] cloneNewKeysArray = new Object[newKeysArray.length];
>     for (int keyPos = 0; keyPos < groupingSetsPosition; keyPos++) {
>       cloneNewKeysArray[keyPos] = newKeysArray[keyPos];
>     }
>     for (int groupingSetPos = 0; groupingSetPos < groupingSets.size(); groupingSetPos++) {
>       for (int keyPos = 0; keyPos < groupingSetsPosition; keyPos++) {
>         newKeysArray[keyPos] = null;
>       }
>       FastBitSet bitset = groupingSetsBitSet[groupingSetPos];
>       // Some keys need to be left to null corresponding to that grouping set.
>       for (int keyPos = bitset.nextSetBit(0); keyPos >= 0;
>           keyPos = bitset.nextSetBit(keyPos+1)) {
>         newKeysArray[keyPos] = cloneNewKeysArray[keyPos];
>       }
>       newKeysArray[groupingSetsPosition] = newKeysGroupingSets[groupingSetPos];
>       processKey(row, rowInspector);
>     }
>   }
> {code}





[jira] [Updated] (HIVE-14016) Vectorization: Add support for Grouping Sets

2017-03-16 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-14016:

Status: Patch Available  (was: In Progress)

> Vectorization: Add support for Grouping Sets
> 
>
> Key: HIVE-14016
> URL: https://issues.apache.org/jira/browse/HIVE-14016
> Project: Hive
>  Issue Type: Improvement
>  Components: Vectorization
>Reporter: Gopal V
>Assignee: Matt McCline
> Attachments: HIVE-14016.01.patch, HIVE-14016.02.patch, 
> HIVE-14016.03.patch, HIVE-14016.04.patch, HIVE-14016.05.patch, 
> HIVE-14016.06.patch, HIVE-14016.07.patch, HIVE-14016.091.patch, 
> HIVE-14016.09.patch
>
>
> Rollup and Cube queries are not vectorized today due to the lack of 
> grouping-sets support inside vector group by.
> The cube and rollup operators can be shimmed onto the end of the pipeline by 
> converting a single row writer into a multiple row writer.
> The corresponding non-vec loop is as follows
> {code}
>   if (groupingSetsPresent) {
>     Object[] newKeysArray = newKeys.getKeyArray();
>     Object[] cloneNewKeysArray = new Object[newKeysArray.length];
>     for (int keyPos = 0; keyPos < groupingSetsPosition; keyPos++) {
>       cloneNewKeysArray[keyPos] = newKeysArray[keyPos];
>     }
>     for (int groupingSetPos = 0; groupingSetPos < groupingSets.size(); groupingSetPos++) {
>       for (int keyPos = 0; keyPos < groupingSetsPosition; keyPos++) {
>         newKeysArray[keyPos] = null;
>       }
>       FastBitSet bitset = groupingSetsBitSet[groupingSetPos];
>       // Some keys need to be left to null corresponding to that grouping set.
>       for (int keyPos = bitset.nextSetBit(0); keyPos >= 0;
>           keyPos = bitset.nextSetBit(keyPos+1)) {
>         newKeysArray[keyPos] = cloneNewKeysArray[keyPos];
>       }
>       newKeysArray[groupingSetsPosition] = newKeysGroupingSets[groupingSetPos];
>       processKey(row, rowInspector);
>     }
>   }
> {code}





[jira] [Commented] (HIVE-11609) Capability to add a filter to hbase scan via composite key doesn't work

2017-03-16 Thread Swarnim Kulkarni (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15928510#comment-15928510
 ] 

Swarnim Kulkarni commented on HIVE-11609:
-

Can I get one of the hive committers to take a quick look at this one?

> Capability to add a filter to hbase scan via composite key doesn't work
> ---
>
> Key: HIVE-11609
> URL: https://issues.apache.org/jira/browse/HIVE-11609
> Project: Hive
>  Issue Type: Bug
>  Components: HBase Handler
>Reporter: Swarnim Kulkarni
>Assignee: Swarnim Kulkarni
> Attachments: HIVE-11609.1.patch.txt, HIVE-11609.2.patch.txt, 
> HIVE-11609.3.patch.txt, HIVE-11609.4.patch.txt, HIVE-11609.5.patch, 
> HIVE-11609.6.patch.txt
>
>
> It seems like the capability to add a filter to an hbase scan, which was added 
> as part of HIVE-6411, doesn't work. This is primarily because in 
> HiveHBaseInputFormat the filter is added in getSplits instead of 
> getRecordReader. This works fine for start and stop keys but not for a filter, 
> because a filter is respected only when an actual scan is performed. This is 
> also related to the initial refactoring that was done as part of HIVE-3420.





[jira] [Commented] (HIVE-16168) llap log links should use the NM nodeId port instead of web port

2017-03-16 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15928514#comment-15928514
 ] 

Siddharth Seth commented on HIVE-16168:
---

[~leftylev] - this should not be documented. The two parameters can be 
considered private to Hive.

> llap log links should use the NM nodeId port instead of web port
> 
>
> Key: HIVE-16168
> URL: https://issues.apache.org/jira/browse/HIVE-16168
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Fix For: 2.2.0
>
> Attachments: HIVE-16168.01.patch
>
>






[jira] [Commented] (HIVE-14796) MetastoreEventListener - OnGrant() and OnRevoke() Events required for capturing the event on grant and revoke operation on the table in hive.

2017-03-16 Thread Tanping Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15928521#comment-15928521
 ] 

Tanping Wang commented on HIVE-14796:
-

[~ashutoshc]  Is this something that the Hive Open Source community is 
interested in accepting?  We would like to hear your feedback before we start 
working on this.

> MetastoreEventListener - OnGrant() and OnRevoke() Events required for 
> capturing the event on grant and revoke operation on the table in hive. 
> --
>
> Key: HIVE-14796
> URL: https://issues.apache.org/jira/browse/HIVE-14796
> Project: Hive
>  Issue Type: New Feature
>  Components: Authorization
>Affects Versions: 1.2.1
> Environment: RHEL6 and RHEL7
>Reporter: Rahul Dhote
>
> When privileges are granted or revoked on a table, OnGrant and OnRevoke 
> methods are required inside the MetastoreEventListener to capture the events.
> They would be useful for performing certain operations on these basic 
> authorization events. 
> Ex:
>   /**
>    * @param grantEvent grant event
>    * @throws MetaException
>    */
>   public void onGrant(GrantEvent grantEvent) throws MetaException {
>   }
>   /**
>    * @param revokeEvent revoke event
>    * @throws MetaException
>    */
>   public void onRevoke(RevokeEvent revokeEvent) throws MetaException {
>   }





[jira] [Updated] (HIVE-15007) Hive 1.2.2 release planning

2017-03-16 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-15007:

Attachment: HIVE-15007-branch-1.2.patch

> Hive 1.2.2 release planning
> ---
>
> Key: HIVE-15007
> URL: https://issues.apache.org/jira/browse/HIVE-15007
> Project: Hive
>  Issue Type: Task
>Affects Versions: 1.2.1
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Attachments: HIVE-15007-branch-1.2.patch, 
> HIVE-15007-branch-1.2.patch, HIVE-15007-branch-1.2.patch, 
> HIVE-15007-branch-1.2.patch, HIVE-15007-branch-1.2.patch, 
> HIVE-15007-branch-1.2.patch, HIVE-15007.branch-1.2.patch
>
>
> Discussed triggering unit test runs for the 1.2.2 release with [~spena]; 
> creating a patch that triggers precommits looks like a good way to do it.





[jira] [Commented] (HIVE-16196) UDFJson having thread-safety issues

2017-03-16 Thread Chao Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15928578#comment-15928578
 ] 

Chao Sun commented on HIVE-16196:
-

+1

> UDFJson having thread-safety issues
> ---
>
> Key: HIVE-16196
> URL: https://issues.apache.org/jira/browse/HIVE-16196
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 1.1.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-16196.patch
>
>
> Follow-up for HIVE-16183: there seem to be some concurrency issues in 
> UDFJson.java, especially around static class variables.
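
The issue text does not say which static variables are affected or how the patch addresses them. As a general illustration of the hazard only, the sketch below shows a static cache that is shared by all threads and is made safe by using `ConcurrentHashMap`. `PathCache`, `lookup`, and `size` are hypothetical names and do not come from UDFJson.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical example, not Hive code: a static cache shared by every thread that calls lookup().
final class PathCache {

  // A plain static HashMap here could be corrupted by concurrent writers;
  // ConcurrentHashMap keeps the shared cache safe to use from many threads.
  private static final Map<String, String[]> CACHE = new ConcurrentHashMap<>();

  // Splits a dotted path such as "a.b.c" once and reuses the parsed result afterwards.
  static String[] lookup(String path) {
    return CACHE.computeIfAbsent(path, p -> p.split("\\."));
  }

  static int size() {
    return CACHE.size();
  }

  public static void main(String[] args) throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(8);
    for (int i = 0; i < 1000; i++) {
      final String path = "store.item" + (i % 50) + ".price";
      pool.execute(() -> lookup(path));
    }
    pool.shutdown();
    pool.awaitTermination(10, TimeUnit.SECONDS);
    System.out.println("distinct paths cached: " + size());
  }
}
```

Another common option is to avoid static mutable state altogether and keep such caches per instance or per thread.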





[jira] [Commented] (HIVE-16160) OutOfMemoryError: GC overhead limit exceeded on Hiveserver2

2017-03-16 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15928586#comment-15928586
 ] 

Sushanth Sowmyan commented on HIVE-16160:
-

Looks like something is up with the Hive QA automation bot that it didn't 
comment after the tests ran. But, based on the results, I'm going to go ahead 
and commit.

> OutOfMemoryError: GC overhead limit exceeded on Hiveserver2 
> 
>
> Key: HIVE-16160
> URL: https://issues.apache.org/jira/browse/HIVE-16160
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Kavan Suresh
>Assignee: Sushanth Sowmyan
>Priority: Critical
> Attachments: HIVE-16160.patch, Screen Shot 2017-03-15 at 9.10.15 
> AM.png
>
>
> Hs2 process killed by OOM:
> {code:java}
>  ERROR [HiveServer2-Handler-Pool: Thread-1361771]: metastore.RetryingHMSHandler (RetryingHMSHandler.java:invokeInternal(203)) - java.lang.OutOfMemoryError: GC overhead limit exceeded
>     at java.lang.String.toLowerCase(String.java:2647)
>     at org.datanucleus.Configuration.getInternalNameForProperty(Configuration.java:188)
>     at org.datanucleus.ExecutionContextImpl.getProperty(ExecutionContextImpl.java:1012)
>     at org.datanucleus.state.AbstractStateManager.updateLevel2CacheForFields(AbstractStateManager.java:979)
>     at org.datanucleus.state.AbstractStateManager.loadFieldsInFetchPlan(AbstractStateManager.java:1097)
>     at org.datanucleus.ExecutionContextImpl.performDetachAllOnTxnEndPreparation(ExecutionContextImpl.java:4544)
>     at org.datanucleus.ExecutionContextImpl.preCommit(ExecutionContextImpl.java:4199)
>     at org.datanucleus.ExecutionContextImpl.transactionPreCommit(ExecutionContextImpl.java:770)
>     at org.datanucleus.TransactionImpl.internalPreCommit(TransactionImpl.java:385)
>     at org.datanucleus.TransactionImpl.commit(TransactionImpl.java:275)
>     at org.datanucleus.api.jdo.JDOTransaction.commit(JDOTransaction.java:107)
>     at org.apache.hadoop.hive.metastore.ObjectStore.commitTransaction(ObjectStore.java:570)
>     at org.apache.hadoop.hive.metastore.ObjectStore.getTable(ObjectStore.java:1033)
>     at sun.reflect.GeneratedMethodAccessor32.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:103)
>     at com.sun.proxy.$Proxy7.getTable(Unknown Source)
>     at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table_core(HiveMetaStore.java:1915)
>     at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table(HiveMetaStore.java:1887)
>     at sun.reflect.GeneratedMethodAccessor36.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
>     at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:105)
>     at com.sun.proxy.$Proxy12.get_table(Unknown Source)
>     at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:1271)
>     at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.getTable(SessionHiveMetaStoreClient.java:131)
>     at sun.reflect.GeneratedMethodAccessor37.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:178)
> {code}





[jira] [Updated] (HIVE-16211) MERGE statement failing with ClassCastException

2017-03-16 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-16211:
--
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

I've committed this to master

> MERGE statement failing with ClassCastException
> ---
>
> Key: HIVE-16211
> URL: https://issues.apache.org/jira/browse/HIVE-16211
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
> Fix For: 2.2.0
>
> Attachments: HIVE-16211.1.patch, HIVE-16211.2.patch, 
> HIVE-16211.3.patch
>
>
> Issuing a merge statement gives this error,
> hive> 2017-03-14T18:34:02,945 ERROR [17d1c728-8865-47f5-a6fd-2b156d183d0f main] ql.Driver: FAILED: ClassCastException org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc cannot be cast to org.apache.hadoop.hive.ql.plan.ExprNodeColumnDesc
> java.lang.ClassCastException: org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc cannot be cast to org.apache.hadoop.hive.ql.plan.ExprNodeColumnDesc
>   at org.apache.hadoop.hive.ql.optimizer.DynamicPartitionPruningOptimization.generateSemiJoinOperatorPlan(DynamicPartitionPruningOptimization.java:410)
>   at org.apache.hadoop.hive.ql.optimizer.DynamicPartitionPruningOptimization.process(DynamicPartitionPruningOptimization.java:226)
>   at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
>   at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
>   at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
>   at org.apache.hadoop.hive.ql.lib.ForwardWalker.walk(ForwardWalker.java:74)
>   at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
>   at org.apache.hadoop.hive.ql.parse.TezCompiler.runDynamicPartitionPruning(TezCompiler.java:359)
>   at org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeOperatorPlan(TezCompiler.java:91)
>   at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:138)
>   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11159)
>   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10708)
>   at org.apache.hadoop.hive.ql.parse.UpdateDeleteSemanticAnalyzer.analyzeInternal(UpdateDeleteSemanticAnalyzer.java:70)
>   at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:257)
>   at org.apache.hadoop.hive.ql.parse.UpdateDeleteSemanticAnalyzer.analyzeMerge(UpdateDeleteSemanticAnalyzer.java:729)
>   at org.apache.hadoop.hive.ql.parse.UpdateDeleteSemanticAnalyzer.analyzeInternal(UpdateDeleteSemanticAnalyzer.java:84)
>   at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:257)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:455)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:336)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1197)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1290)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1123)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:)
>   at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
>   at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:148)





[jira] [Commented] (HIVE-14016) Vectorization: Add support for Grouping Sets

2017-03-16 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15928604#comment-15928604
 ] 

Jesus Camacho Rodriguez commented on HIVE-14016:


+1

> Vectorization: Add support for Grouping Sets
> 
>
> Key: HIVE-14016
> URL: https://issues.apache.org/jira/browse/HIVE-14016
> Project: Hive
>  Issue Type: Improvement
>  Components: Vectorization
>Reporter: Gopal V
>Assignee: Matt McCline
> Attachments: HIVE-14016.01.patch, HIVE-14016.02.patch, 
> HIVE-14016.03.patch, HIVE-14016.04.patch, HIVE-14016.05.patch, 
> HIVE-14016.06.patch, HIVE-14016.07.patch, HIVE-14016.091.patch, 
> HIVE-14016.09.patch
>
>
> Rollup and Cube queries are not vectorized today due to the lack of 
> grouping-sets support inside vector group by.
> The cube and rollup operators can be shimmed onto the end of the pipeline by 
> converting a single row writer into a multiple row writer.
> The corresponding non-vec loop is as follows
> {code}
>   if (groupingSetsPresent) {
>     Object[] newKeysArray = newKeys.getKeyArray();
>     Object[] cloneNewKeysArray = new Object[newKeysArray.length];
>     for (int keyPos = 0; keyPos < groupingSetsPosition; keyPos++) {
>       cloneNewKeysArray[keyPos] = newKeysArray[keyPos];
>     }
>     for (int groupingSetPos = 0; groupingSetPos < groupingSets.size(); groupingSetPos++) {
>       for (int keyPos = 0; keyPos < groupingSetsPosition; keyPos++) {
>         newKeysArray[keyPos] = null;
>       }
>       FastBitSet bitset = groupingSetsBitSet[groupingSetPos];
>       // Some keys need to be left to null corresponding to that grouping set.
>       for (int keyPos = bitset.nextSetBit(0); keyPos >= 0;
>           keyPos = bitset.nextSetBit(keyPos+1)) {
>         newKeysArray[keyPos] = cloneNewKeysArray[keyPos];
>       }
>       newKeysArray[groupingSetsPosition] = newKeysGroupingSets[groupingSetPos];
>       processKey(row, rowInspector);
>     }
>   }
> {code}





[jira] [Commented] (HIVE-15784) Vectorization: Turn on text vectorization by default

2017-03-16 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15928614#comment-15928614
 ] 

Sergey Shelukhin commented on HIVE-15784:
-

Hmm, I thought the plan was to do it the other way around. I will commit 
HIVE-16222 then

> Vectorization: Turn on text vectorization by default
> 
>
> Key: HIVE-15784
> URL: https://issues.apache.org/jira/browse/HIVE-15784
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15784.01.patch, HIVE-15784.02.patch, 
> HIVE-15784.03.patch
>
>
> *Turn ON text vectorization related variables* 
> hive.vectorized.use.vector.serde.deserialize and 
> hive.vectorized.use.row.serde.deserialize by default.





[jira] [Commented] (HIVE-15784) Vectorization: Turn on text vectorization by default

2017-03-16 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15928615#comment-15928615
 ] 

Sergey Shelukhin commented on HIVE-15784:
-

Actually that is still pending tests... so I will commit at some point.

> Vectorization: Turn on text vectorization by default
> 
>
> Key: HIVE-15784
> URL: https://issues.apache.org/jira/browse/HIVE-15784
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15784.01.patch, HIVE-15784.02.patch, 
> HIVE-15784.03.patch
>
>
> *Turn ON text vectorization related variables* 
> hive.vectorized.use.vector.serde.deserialize and 
> hive.vectorized.use.row.serde.deserialize by default.





[jira] [Updated] (HIVE-16160) OutOfMemoryError: GC overhead limit exceeded on Hiveserver2

2017-03-16 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-16160:

   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Committed to master. Thanks, [~daijy]

> OutOfMemoryError: GC overhead limit exceeded on Hiveserver2 
> 
>
> Key: HIVE-16160
> URL: https://issues.apache.org/jira/browse/HIVE-16160
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Kavan Suresh
>Assignee: Sushanth Sowmyan
>Priority: Critical
> Fix For: 2.2.0
>
> Attachments: HIVE-16160.patch, Screen Shot 2017-03-15 at 9.10.15 
> AM.png
>
>
> Hs2 process killed by OOM:
> {code:java}
>  ERROR [HiveServer2-Handler-Pool: Thread-1361771]: metastore.RetryingHMSHandler (RetryingHMSHandler.java:invokeInternal(203)) - java.lang.OutOfMemoryError: GC overhead limit exceeded
>     at java.lang.String.toLowerCase(String.java:2647)
>     at org.datanucleus.Configuration.getInternalNameForProperty(Configuration.java:188)
>     at org.datanucleus.ExecutionContextImpl.getProperty(ExecutionContextImpl.java:1012)
>     at org.datanucleus.state.AbstractStateManager.updateLevel2CacheForFields(AbstractStateManager.java:979)
>     at org.datanucleus.state.AbstractStateManager.loadFieldsInFetchPlan(AbstractStateManager.java:1097)
>     at org.datanucleus.ExecutionContextImpl.performDetachAllOnTxnEndPreparation(ExecutionContextImpl.java:4544)
>     at org.datanucleus.ExecutionContextImpl.preCommit(ExecutionContextImpl.java:4199)
>     at org.datanucleus.ExecutionContextImpl.transactionPreCommit(ExecutionContextImpl.java:770)
>     at org.datanucleus.TransactionImpl.internalPreCommit(TransactionImpl.java:385)
>     at org.datanucleus.TransactionImpl.commit(TransactionImpl.java:275)
>     at org.datanucleus.api.jdo.JDOTransaction.commit(JDOTransaction.java:107)
>     at org.apache.hadoop.hive.metastore.ObjectStore.commitTransaction(ObjectStore.java:570)
>     at org.apache.hadoop.hive.metastore.ObjectStore.getTable(ObjectStore.java:1033)
>     at sun.reflect.GeneratedMethodAccessor32.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:103)
>     at com.sun.proxy.$Proxy7.getTable(Unknown Source)
>     at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table_core(HiveMetaStore.java:1915)
>     at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table(HiveMetaStore.java:1887)
>     at sun.reflect.GeneratedMethodAccessor36.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
>     at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:105)
>     at com.sun.proxy.$Proxy12.get_table(Unknown Source)
>     at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:1271)
>     at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.getTable(SessionHiveMetaStoreClient.java:131)
>     at sun.reflect.GeneratedMethodAccessor37.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:178)
> {code}





[jira] [Commented] (HIVE-15978) Support regr_* functions

2017-03-16 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15928622#comment-15928622
 ] 

Ashutosh Chauhan commented on HIVE-15978:
-

+1

> Support regr_* functions
> 
>
> Key: HIVE-15978
> URL: https://issues.apache.org/jira/browse/HIVE-15978
> Project: Hive
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Carter Shanklin
>Assignee: Zoltan Haindrich
> Attachments: HIVE-15978.1.patch, HIVE-15978.2.patch, 
> HIVE-15978.2.patch, HIVE-15978.3.patch
>
>
> Support the standard regr_* functions, regr_slope, regr_intercept, regr_r2, 
> regr_sxx, regr_syy, regr_sxy, regr_avgx, regr_avgy, regr_count. SQL reference 
> section 10.9
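
For reference, the regr_* family can be expressed in terms of a few sums over the (y, x) pairs. The sketch below is a minimal standalone illustration, not the Hive implementation: `Regr` and its members are hypothetical names, NULL handling is omitted, and degenerate inputs (no rows, or zero variance in x or y) are not special-cased, so they yield NaN here.

```java
// Hypothetical helper, not Hive code: regr_* statistics over paired samples.
final class Regr {
  final long count;   // regr_count
  final double avgX;  // regr_avgx
  final double avgY;  // regr_avgy
  final double sxx;   // regr_sxx: sum of squared deviations of x
  final double syy;   // regr_syy: sum of squared deviations of y
  final double sxy;   // regr_sxy: sum of products of deviations

  // y is the dependent variable and x the independent one, as in regr_*(y, x).
  Regr(double[] y, double[] x) {
    if (y.length != x.length) {
      throw new IllegalArgumentException("y and x must have the same length");
    }
    count = x.length;
    double sumX = 0, sumY = 0;
    for (int i = 0; i < x.length; i++) {
      sumX += x[i];
      sumY += y[i];
    }
    avgX = sumX / count;
    avgY = sumY / count;
    double xx = 0, yy = 0, xy = 0;
    for (int i = 0; i < x.length; i++) {
      double dx = x[i] - avgX;
      double dy = y[i] - avgY;
      xx += dx * dx;
      yy += dy * dy;
      xy += dx * dy;
    }
    sxx = xx;
    syy = yy;
    sxy = xy;
  }

  // regr_slope: slope of the least-squares line y = slope * x + intercept.
  double slope() {
    return sxy / sxx;
  }

  // regr_intercept: y-intercept of the same line.
  double intercept() {
    return avgY - slope() * avgX;
  }

  // regr_r2: coefficient of determination.
  double r2() {
    return (sxy * sxy) / (sxx * syy);
  }

  public static void main(String[] args) {
    Regr r = new Regr(new double[] {3, 5, 7, 9}, new double[] {1, 2, 3, 4});
    System.out.println(r.slope() + " " + r.intercept() + " " + r.r2());
  }
}
```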





[jira] [Updated] (HIVE-15978) Support regr_* functions

2017-03-16 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-15978:

   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

pushed to master, thank you Ashutosh for the review!

> Support regr_* functions
> 
>
> Key: HIVE-15978
> URL: https://issues.apache.org/jira/browse/HIVE-15978
> Project: Hive
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Carter Shanklin
>Assignee: Zoltan Haindrich
> Fix For: 2.2.0
>
> Attachments: HIVE-15978.1.patch, HIVE-15978.2.patch, 
> HIVE-15978.2.patch, HIVE-15978.3.patch
>
>
> Support the standard regr_* functions, regr_slope, regr_intercept, regr_r2, 
> regr_sxx, regr_syy, regr_sxy, regr_avgx, regr_avgy, regr_count. SQL reference 
> section 10.9





[jira] [Commented] (HIVE-15784) Vectorization: Turn on text vectorization by default

2017-03-16 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15928627#comment-15928627
 ] 

Matt McCline commented on HIVE-15784:
-

TestCliDriver.orc_int_type_promotion.q triggers this stack trace:

{code}
Caused by: java.lang.IllegalStateException
    at com.google.common.base.Preconditions.checkState(Preconditions.java:133) ~[guava-14.0.1.jar:?]
    at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.createAndInitPartitionContext(VectorMapOperator.java:394) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.internalSetChildren(VectorMapOperator.java:559) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.setChildren(VectorMapOperator.java:474) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:106) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
    at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source) ~[?:?]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_91]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_91]
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) ~[hadoop-common-2.7.2.jar:?]
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:78) ~[hadoop-common-2.7.2.jar:?]
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136) ~[hadoop-common-2.7.2.jar:?]
    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38) ~[hadoop-mapreduce-client-core-2.7.2.jar:?]
    at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source) ~[?:?]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_91]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_91]
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) ~[hadoop-common-2.7.2.jar:?]
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:78) ~[hadoop-common-2.7.2.jar:?]
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136) ~[hadoop-common-2.7.2.jar:?]
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) ~[hadoop-mapreduce-client-core-2.7.2.jar:?]
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) ~[hadoop-mapreduce-client-core-2.7.2.jar:?]
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243) ~[hadoop-mapreduce-client-common-2.7.2.jar:?]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_91]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_91]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_91]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_91]
    at java.lang.Thread.run(Thread.java:745) ~[?:1.8.0_91]
{code}

State check is:
{code}
Preconditions.checkState(Utilities.isSchemaEvolutionEnabled(hconf, isAcid));
{code}

> Vectorization: Turn on text vectorization by default
> 
>
> Key: HIVE-15784
> URL: https://issues.apache.org/jira/browse/HIVE-15784
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15784.01.patch, HIVE-15784.02.patch, 
> HIVE-15784.03.patch
>
>
> *Turn ON text vectorization related variables* 
> hive.vectorized.use.vector.serde.deserialize and 
> hive.vectorized.use.row.serde.deserialize by default.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16166) HS2 may still waste up to 15% of memory on duplicate strings

2017-03-16 Thread Misha Dmitriev (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15928635#comment-15928635
 ] 

Misha Dmitriev commented on HIVE-16166:
---

@spena I ran all these tests locally, and they passed for me. So this looks 
like flakiness in the Hive build, which is not uncommon.

> HS2 may still waste up to 15% of memory on duplicate strings
> 
>
> Key: HIVE-16166
> URL: https://issues.apache.org/jira/browse/HIVE-16166
> Project: Hive
>  Issue Type: Improvement
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
> Attachments: ch_2_excerpt.txt, HIVE-16166.01.patch
>
>
> A heap dump obtained from one of our users shows that 15% of memory is wasted 
> on duplicate strings, despite the recent optimizations that I made. The 
> problematic strings just come from different sources this time. See the 
> excerpt from the jxray (www.jxray.com) analysis attached.
> Adding String.intern() calls in the appropriate places reduces the overhead 
> of duplicate strings with this workload to ~6%. The remaining duplicates come 
> mostly from JDK internal and MapReduce data structures, and thus are more 
> difficult to fix.
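
A minimal sketch of the deduplication idea described above, not the actual HIVE-16166 patch. The class and method names (InternDemo, internAll, distinctObjects) are hypothetical; only String.intern() itself comes from the description. It shows that equal strings created separately are distinct heap objects until they are passed through intern().

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not code from the HIVE-16166 patch.
class InternDemo {
    // Returns a new list in which every string is replaced by its canonical interned copy.
    static List<String> internAll(List<String> values) {
        List<String> result = new ArrayList<>(values.size());
        for (String v : values) {
            result.add(v == null ? null : v.intern());
        }
        return result;
    }

    // Counts distinct String objects by identity (==), not by equals().
    static int distinctObjects(List<String> values) {
        Map<String, Boolean> seen = new IdentityHashMap<>();
        for (String v : values) {
            seen.put(v, Boolean.TRUE);
        }
        return seen.size();
    }
}
```

With three separately allocated copies of the same text, distinctObjects reports three objects before interning and one afterwards. Whether interning pays off in a real workload depends on how often the same values repeat, which is what the heap dump analysis above is meant to establish.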





[jira] [Comment Edited] (HIVE-16166) HS2 may still waste up to 15% of memory on duplicate strings

2017-03-16 Thread Misha Dmitriev (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15928635#comment-15928635
 ] 

Misha Dmitriev edited comment on HIVE-16166 at 3/16/17 6:53 PM:


[~spena] I ran all these tests locally, and they passed for me. So this looks 
like flakiness in the Hive build, which is not uncommon.


was (Author: mi...@cloudera.com):
@spena I ran all these tests locally, and they passed for me. So this looks 
like flakiness in the Hive build, which is not uncommon.

> HS2 may still waste up to 15% of memory on duplicate strings
> 
>
> Key: HIVE-16166
> URL: https://issues.apache.org/jira/browse/HIVE-16166
> Project: Hive
>  Issue Type: Improvement
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
> Attachments: ch_2_excerpt.txt, HIVE-16166.01.patch
>
>
> A heap dump obtained from one of our users shows that 15% of memory is wasted 
> on duplicate strings, despite the recent optimizations that I made. The 
> problematic strings just come from different sources this time. See the 
> excerpt from the jxray (www.jxray.com) analysis attached.
> Adding String.intern() calls in the appropriate places reduces the overhead 
> of duplicate strings with this workload to ~6%. The remaining duplicates come 
> mostly from JDK internal and MapReduce data structures, and thus are more 
> difficult to fix.





[jira] [Commented] (HIVE-7142) Hive multi serialization encoding support

2017-03-16 Thread Nitin Kak (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15928650#comment-15928650
 ] 

Nitin Kak commented on HIVE-7142:
-

What encoding should I use for 16 bit little endian, UTF-16LE? Is that a valid 
value?

> Hive multi serialization encoding support
> -
>
> Key: HIVE-7142
> URL: https://issues.apache.org/jira/browse/HIVE-7142
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Chengxiang Li
>Assignee: Chengxiang Li
> Fix For: 0.14.0
>
> Attachments: HIVE-7142.1.patch.txt, HIVE-7142.2.patch, 
> HIVE-7142.3.patch, HIVE-7142.4.patch
>
>
> Currently Hive only supports serializing data into UTF-8 bytes and 
> deserializing from UTF-8 bytes, but real-world users may want to load 
> differently encoded data into Hive directly. This JIRA is dedicated to supporting 
> serialization/deserialization of all kinds of encoded data in the SerDe layer. 
> Users only need to configure the serialization encoding at the table level by 
> setting it through a SerDe parameter, for example:
> {code:sql}
> CREATE TABLE person(id INT, name STRING, desc STRING) ROW FORMAT SERDE 
> 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH 
> SERDEPROPERTIES("serialization.encoding"='GBK');
> {code}
> or
> {code:sql}
> ALTER TABLE person SET SERDEPROPERTIES ('serialization.encoding'='GBK'); 
> {code}
> LIMITATIONS: Only LazySimpleSerDe supports the "serialization.encoding" property 
> in this patch.
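
A small sketch for checking a candidate encoding name such as UTF-16LE before putting it into "serialization.encoding". It assumes the SerDe resolves the property through the JVM's java.nio.charset.Charset lookup, which this thread does not confirm. The class and method names (EncodingCheck, isUsable, roundTrip) are hypothetical.

```java
import java.nio.charset.Charset;

// Hypothetical helper; assumes the encoding name is resolved by the JVM's Charset lookup.
class EncodingCheck {
    // True if this JVM knows the charset name; false for unknown or syntactically illegal names.
    static boolean isUsable(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalArgumentException e) {
            // Thrown for null or illegal names (IllegalCharsetNameException is a subclass).
            return false;
        }
    }

    // Encodes the text with the named charset and decodes it again.
    static String roundTrip(String text, String charsetName) {
        Charset cs = Charset.forName(charsetName);
        byte[] bytes = text.getBytes(cs);
        return new String(bytes, cs);
    }
}
```

UTF-16LE is one of the charsets every Java platform is required to provide, so isUsable("UTF-16LE") holds on any JVM. GBK is an extended charset and may be missing from a trimmed-down runtime, so it is worth checking on the machines that run the SerDe.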





[jira] [Commented] (HIVE-16216) update trunk/content/people.mdtext

2017-03-16 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15928683#comment-15928683
 ] 

Jason Dere commented on HIVE-16216:
---

+1

> update trunk/content/people.mdtext
> --
>
> Key: HIVE-16216
> URL: https://issues.apache.org/jira/browse/HIVE-16216
> Project: Hive
>  Issue Type: New Feature
>  Components: Documentation
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-16216.01.patch
>
>






[jira] [Updated] (HIVE-15931) JDBC: Improve logging when using ZooKeeper and anonymize passwords before logging

2017-03-16 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-15931:

Attachment: HIVE-15931.4.patch

> JDBC: Improve logging when using ZooKeeper and anonymize passwords before 
> logging
> -
>
> Key: HIVE-15931
> URL: https://issues.apache.org/jira/browse/HIVE-15931
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 2.2.0
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Attachments: HIVE-15931.1.patch, HIVE-15931.2.patch, 
> HIVE-15931.3.patch, HIVE-15931.4.patch
>
>






[jira] [Assigned] (HIVE-16234) Add support for quarter in trunc udf

2017-03-16 Thread Deepesh Khandelwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepesh Khandelwal reassigned HIVE-16234:
-


> Add support for quarter in trunc udf
> 
>
> Key: HIVE-16234
> URL: https://issues.apache.org/jira/browse/HIVE-16234
> Project: Hive
>  Issue Type: Improvement
>  Components: UDF
>Reporter: Deepesh Khandelwal
>Assignee: Deepesh Khandelwal
>
> Hive has a Date function trunc(string date, string format) that returns date 
> truncated to the unit specified by the format. Supported formats: 
> MONTH/MON/MM, YEAR/YYYY/YY.
> Goal here is to extend support to QUARTER/Q.
> Example:
> SELECT trunc('2017-03-15', 'Q');
> '2017-01-01'
> SELECT trunc('2017-12-31', 'Q');
> '2017-10-01'
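
The quarter truncation in the example above can be sketched in plain Java as follows. This is only an illustration of the arithmetic, not the code in the attached patch; the class and method names (QuarterTrunc, truncToQuarter) are hypothetical, and it assumes the input is an ISO yyyy-MM-dd date string.

```java
import java.time.LocalDate;

// Hypothetical sketch of the 'Q' truncation rule; not the HIVE-16234 patch.
class QuarterTrunc {
    // Returns the first day of the quarter containing the given yyyy-MM-dd date.
    static String truncToQuarter(String date) {
        LocalDate d = LocalDate.parse(date);
        // Months 1-3 map to 1, 4-6 to 4, 7-9 to 7, 10-12 to 10.
        int firstMonthOfQuarter = ((d.getMonthValue() - 1) / 3) * 3 + 1;
        return LocalDate.of(d.getYear(), firstMonthOfQuarter, 1).toString();
    }
}
```

For the two dates in the description, this yields 2017-01-01 and 2017-10-01 respectively.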





[jira] [Updated] (HIVE-15784) Vectorization: Turn on text vectorization by default

2017-03-16 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-15784:

Status: In Progress  (was: Patch Available)

> Vectorization: Turn on text vectorization by default
> 
>
> Key: HIVE-15784
> URL: https://issues.apache.org/jira/browse/HIVE-15784
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15784.01.patch, HIVE-15784.02.patch, 
> HIVE-15784.03.patch, HIVE-15784.04.patch
>
>
> *Turn ON text vectorization related variables* 
> hive.vectorized.use.vector.serde.deserialize and 
> hive.vectorized.use.row.serde.deserialize by default.





[jira] [Updated] (HIVE-15784) Vectorization: Turn on text vectorization by default

2017-03-16 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-15784:

Attachment: HIVE-15784.04.patch

> Vectorization: Turn on text vectorization by default
> 
>
> Key: HIVE-15784
> URL: https://issues.apache.org/jira/browse/HIVE-15784
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15784.01.patch, HIVE-15784.02.patch, 
> HIVE-15784.03.patch, HIVE-15784.04.patch
>
>
> *Turn ON text vectorization related variables* 
> hive.vectorized.use.vector.serde.deserialize and 
> hive.vectorized.use.row.serde.deserialize by default.





[jira] [Updated] (HIVE-15784) Vectorization: Turn on text vectorization by default

2017-03-16 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-15784:

Status: Patch Available  (was: In Progress)

> Vectorization: Turn on text vectorization by default
> 
>
> Key: HIVE-15784
> URL: https://issues.apache.org/jira/browse/HIVE-15784
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15784.01.patch, HIVE-15784.02.patch, 
> HIVE-15784.03.patch, HIVE-15784.04.patch
>
>
> *Turn ON text vectorization related variables* 
> hive.vectorized.use.vector.serde.deserialize and 
> hive.vectorized.use.row.serde.deserialize by default.





[jira] [Updated] (HIVE-16234) Add support for quarter in trunc udf

2017-03-16 Thread Deepesh Khandelwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepesh Khandelwal updated HIVE-16234:
--
Attachment: HIVE-16234.1.patch

Attaching patch for review. Also created a review board, see 
https://reviews.apache.org/r/57701/
cc. [~ashutoshc] [~jdere]

> Add support for quarter in trunc udf
> 
>
> Key: HIVE-16234
> URL: https://issues.apache.org/jira/browse/HIVE-16234
> Project: Hive
>  Issue Type: Improvement
>  Components: UDF
>Reporter: Deepesh Khandelwal
>Assignee: Deepesh Khandelwal
> Attachments: HIVE-16234.1.patch
>
>
> Hive has a Date function trunc(string date, string format) that returns date 
> truncated to the unit specified by the format. Supported formats: 
> MONTH/MON/MM, YEAR/YYYY/YY.
> Goal here is to extend support to QUARTER/Q.
> Example:
> SELECT trunc('2017-03-15', 'Q');
> '2017-01-01'
> SELECT trunc('2017-12-31', 'Q');
> '2017-10-01'





[jira] [Updated] (HIVE-16234) Add support for quarter in trunc udf

2017-03-16 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-16234:

Status: Patch Available  (was: Open)

> Add support for quarter in trunc udf
> 
>
> Key: HIVE-16234
> URL: https://issues.apache.org/jira/browse/HIVE-16234
> Project: Hive
>  Issue Type: Improvement
>  Components: UDF
>Reporter: Deepesh Khandelwal
>Assignee: Deepesh Khandelwal
> Attachments: HIVE-16234.1.patch
>
>
> Hive has a Date function trunc(string date, string format) that returns date 
> truncated to the unit specified by the format. Supported formats: 
> MONTH/MON/MM, YEAR/YYYY/YY.
> Goal here is to extend support to QUARTER/Q.
> Example:
> SELECT trunc('2017-03-15', 'Q');
> '2017-01-01'
> SELECT trunc('2017-12-31', 'Q');
> '2017-10-01'





[jira] [Updated] (HIVE-16196) UDFJson having thread-safety issues

2017-03-16 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-16196:
---
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Patch committed to master. Thanks to Chao for the review.

> UDFJson having thread-safety issues
> ---
>
> Key: HIVE-16196
> URL: https://issues.apache.org/jira/browse/HIVE-16196
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 1.1.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Fix For: 2.2.0
>
> Attachments: HIVE-16196.patch
>
>
> Followup for HIVE-16183, there seems to be some concurrency issues in 
> UDFJson.java, especially around static class variables.
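
To illustrate why static class variables matter here: a static field is shared by every thread that uses the class, so a plain HashMap cache in a static field can be corrupted by concurrent writers. The sketch below shows one common way to make such a cache safe. It is not the committed HIVE-16196 fix, and the names (PathCache, split, size) are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; not the HIVE-16196 patch.
class PathCache {
    // Static, so shared across all threads; a plain HashMap here would be unsafe.
    private static final Map<String, String[]> CACHE = new ConcurrentHashMap<>();

    // Splits a dotted path once and reuses the result on later calls.
    // Callers must treat the returned array as read-only because it is shared.
    static String[] split(String path) {
        return CACHE.computeIfAbsent(path, p -> p.split("\\."));
    }

    static int size() {
        return CACHE.size();
    }
}
```

An alternative design is to drop the static state and keep the cache per instance, which avoids sharing entirely at the cost of less reuse.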



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16166) HS2 may still waste up to 15% of memory on duplicate strings

2017-03-16 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-16166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15928899#comment-15928899
 ] 

Sergio Peña commented on HIVE-16166:


I was looking into HIVE-15058 to see if those flaky tests were already 
reported, but they're not. I will retry HiveQA to see if those flaky 
tests go away. I know these might not be related to the patch, but since it seems 
we have green tests, I would like to give it another shot just to see a clean test 
result.

> HS2 may still waste up to 15% of memory on duplicate strings
> 
>
> Key: HIVE-16166
> URL: https://issues.apache.org/jira/browse/HIVE-16166
> Project: Hive
>  Issue Type: Improvement
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
> Attachments: ch_2_excerpt.txt, HIVE-16166.01.patch
>
>
> A heap dump obtained from one of our users shows that 15% of memory is wasted 
> on duplicate strings, despite the recent optimizations that I made. The 
> problematic strings just come from different sources this time. See the 
> excerpt from the jxray (www.jxray.com) analysis attached.
> Adding String.intern() calls in the appropriate places reduces the overhead 
> of duplicate strings with this workload to ~6%. The remaining duplicates come 
> mostly from JDK internal and MapReduce data structures, and thus are more 
> difficult to fix.





[jira] [Commented] (HIVE-16166) HS2 may still waste up to 15% of memory on duplicate strings

2017-03-16 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-16166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15928900#comment-15928900
 ] 

Sergio Peña commented on HIVE-16166:


I just triggered the Jenkins job (#4197).
See https://builds.apache.org/view/H-L/view/Hive/job/PreCommit-HIVE-Build/ 

> HS2 may still waste up to 15% of memory on duplicate strings
> 
>
> Key: HIVE-16166
> URL: https://issues.apache.org/jira/browse/HIVE-16166
> Project: Hive
>  Issue Type: Improvement
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
> Attachments: ch_2_excerpt.txt, HIVE-16166.01.patch
>
>
> A heap dump obtained from one of our users shows that 15% of memory is wasted 
> on duplicate strings, despite the recent optimizations that I made. The 
> problematic strings just come from different sources this time. See the 
> excerpt from the jxray (www.jxray.com) analysis attached.
> Adding String.intern() calls in the appropriate places reduces the overhead 
> of duplicate strings with this workload to ~6%. The remaining duplicates come 
> mostly from JDK internal and MapReduce data structures, and thus are more 
> difficult to fix.





[jira] [Commented] (HIVE-16107) JDBC: HttpClient should retry one more time on NoHttpResponseException

2017-03-16 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15928938#comment-15928938
 ] 

Daniel Dai commented on HIVE-16107:
---

LGTM. +1.

> JDBC: HttpClient should retry one more time on NoHttpResponseException
> --
>
> Key: HIVE-16107
> URL: https://issues.apache.org/jira/browse/HIVE-16107
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, JDBC
>Affects Versions: 2.0.1, 2.1.1
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Attachments: HIVE-16107.1.patch
>
>
> Hive's JDBC client in HTTP transport mode doesn't retry on 
> NoHttpResponseException. We've seen the exception being thrown to the JDBC 
> end user when used with Knox as the proxy, when Knox upgraded its jetty 
> version, which has a smaller value for jetty connector idletimeout, and as a 
> result closes the HTTP connection on server side. The next jdbc query on the 
> client, throws a NoHttpResponseException. However, subsequent queries 
> reconnect, but the JDBC driver should ideally handle this by retrying.
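
The retry-once behaviour asked for above can be sketched generically as below. This is not the attached patch. To keep the example free of the HttpClient jar, NoResponseException is a hypothetical unchecked stand-in for org.apache.http.NoHttpResponseException (which is an IOException in the real library), and RetryOnce and callWithOneRetry are hypothetical names.

```java
import java.util.function.Supplier;

// Hypothetical sketch of "retry one more time"; not the HIVE-16107 patch.
class RetryOnce {
    // Stand-in for org.apache.http.NoHttpResponseException so the sketch has no dependencies.
    static class NoResponseException extends RuntimeException {
        NoResponseException(String message) {
            super(message);
        }
    }

    // Runs the request; if the server gave no response, runs it exactly one more time.
    static <T> T callWithOneRetry(Supplier<T> request) {
        try {
            return request.get();
        } catch (NoResponseException first) {
            // The server side likely closed an idle connection; a second attempt
            // gets a fresh connection. A second failure propagates to the caller.
            return request.get();
        }
    }
}
```

Limiting the retry to a single attempt and to this one exception type keeps genuine server failures visible to the end user instead of masking them.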



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16188) beeline should block the connection if given invalid database name.

2017-03-16 Thread Vihang Karajgaonkar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15928951#comment-15928951
 ] 

Vihang Karajgaonkar commented on HIVE-16188:


I looked into the code again, and I think the client side is a better place to handle this 
validation than the HiveServer2 side. In addition to this check in 
HiveSession, I think we should also add this check in the 
HiveDriver:parseURLforPropertyInfo() method. Since we know that the connection 
request will fail, there is no need to even attempt making a connection. What do 
you think [~stakiar]?

> beeline should block the connection if given invalid database name.
> ---
>
> Key: HIVE-16188
> URL: https://issues.apache.org/jira/browse/HIVE-16188
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Pavas Garg
>Assignee: Sahil Takiar
>Priority: Minor
> Attachments: HIVE-16188.1.patch, HIVE-16188.2.patch
>
>
> When using beeline shell to connect to HS2 or impalaD as below -
> Connection to HS2 using beeline tool on port 1 -
> beeline -u 
> "jdbc:hive2://HS2-host-name:1/default;principal=hive/hs2-host-n...@domain.example.com"
> Connection to ImpalaD using beeline tool on port 21050 -
> beeline -u 
> "jdbc:hive2://impalad-host-name.com:21050/XXX;principal=impala/impalad-host-name@domain.example.com"
>  
> Providing an invalid database name such as XXX - the connection is still made.
> Ideally, the connection should not succeed.
> Even so, the beeline tool does not allow you to move forward unless you 
> provide a valid DB name, like
> Use ;





[jira] [Commented] (HIVE-12274) Increase width of columns used for general configuration in the metastore.

2017-03-16 Thread Mass Dosage (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15928971#comment-15928971
 ] 

Mass Dosage commented on HIVE-12274:


[~ngangam] Past experience shows that the Hive test suite is very flaky. 
Normally, when we reach out to one of the committers, they take a look and say "oh 
yes, that's fine, that's got nothing to do with your patch, you can ignore 
those". Hopefully that's the case here.

> Increase width of columns used for general configuration in the metastore.
> --
>
> Key: HIVE-12274
> URL: https://issues.apache.org/jira/browse/HIVE-12274
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 2.0.0
>Reporter: Elliot West
>Assignee: Naveen Gangam
>  Labels: metastore
> Attachments: HIVE-12274.2.patch, HIVE-12274.3.patch, 
> HIVE-12274.4.patch, HIVE-12274.example.ddl.hql, HIVE-12274.patch
>
>
> h2. Overview
> This issue is very similar in principle to HIVE-1364. We are hitting a limit 
> when processing JSON data that has a large nested schema. The struct 
> definition is truncated when inserted into the metastore database column 
> {{COLUMNS_V2.TYPE_NAME}} as it is greater than 4000 characters in length.
> Given that the purpose of these columns is to hold very loosely defined 
> configuration values it seems rather limiting to impose such a relatively low 
> length bound. One can imagine that valid use cases will arise where 
> reasonable parameter/property values exceed the current limit. 
> h2. Context
> These limitations were put in by the [patch 
> attributed|https://github.com/apache/hive/commit/c21a526b0a752df2a51d20a2729cc8493c228799]
>  to HIVE-1364 which mentions the _"max length on Oracle 9i/10g/11g"_ as the 
> reason. However, nowadays the limit can be increased because:
> * Oracle DB's {{varchar2}} supports 32767 bytes now, by setting the 
> configuration parameter {{MAX_STRING_SIZE}} to {{EXTENDED}}. 
> ([source|http://docs.oracle.com/database/121/SQLRF/sql_elements001.htm#SQLRF55623])
> * Postgres supports a max of 1GB for {{character}} datatype. 
> ([source|http://www.postgresql.org/docs/8.3/static/datatype-character.html])
> * MySQL can support up to 65535 bytes for the entire row. So long as the 
> {{PARAM_KEY}} value + {{PARAM_VALUE}} is less than 65535, we should be good. 
> ([source|http://dev.mysql.com/doc/refman/5.0/en/char.html])
> * SQL Server's {{varchar}} max length is 8000 and can go beyond using 
> "varchar(max)" with the same limitation as MySQL being 65535 bytes for the 
> entire row. ([source|http://dev.mysql.com/doc/refman/5.0/en/char.html])
> * Derby's {{varchar}} can be up to 32672 bytes. 
> ([source|https://db.apache.org/derby/docs/10.7/ref/rrefsqlj41207.html])
> h2. Proposal
> Can these columns not use CLOB-like types, as used for example by 
> {{TBLS.VIEW_EXPANDED_TEXT}}? It would seem that suitable type equivalents 
> exist for all targeted database platforms:
> * MySQL: {{mediumtext}}
> * Postgres: {{text}}
> * Oracle: {{CLOB}}
> * Derby: {{LONG VARCHAR}}
> I'd suggest that the candidates for type change are:
> * {{COLUMNS_V2.TYPE_NAME}}
> * {{TABLE_PARAMS.PARAM_VALUE}}
> * {{SERDE_PARAMS.PARAM_VALUE}}
> * {{SD_PARAMS.PARAM_VALUE}}
> After updating the maximum length the metastore database needs to be 
> configured and restarted with the new settings. Altering {{MAX_STRING_SIZE}} 
> will update database objects and possibly invalidate them, as follows:
> * Tables with virtual columns will be updated with new data type metadata for 
> virtual columns of {{VARCHAR2(4000)}}, 4000-byte {{NVARCHAR2}}, or 
> {{RAW(2000)}} type.
> * Functional indexes will become unusable if a change to their associated 
> virtual columns causes the index key to exceed index key length limits. 
> Attempts to rebuild such indexes will fail with {{ORA-01450: maximum key 
> length exceeded}}.
> * Views will be invalidated if they contain {{VARCHAR2(4000)}}, 4000-byte 
> {{NVARCHAR2}}, or {{RAW(2000)}} typed expression columns.
> * Materialized views will be updated with new metadata {{VARCHAR2(4000)}}, 
> 4000-byte {{NVARCHAR2}}, and {{RAW(2000)}} typed expression columns
> * So the limitation could be raised to 32672 bytes, with the caveat that 
> MySQL and SQL Server limit the row length to 65535 bytes, so that should also 
> be validated to provide consistency.
> Finally, will this limitation persist in the work resulting from HIVE-9452?





[jira] [Commented] (HIVE-16189) Table column stats might be invalidated in a failed table rename

2017-03-16 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15928985#comment-15928985
 ] 

Pengcheng Xiong commented on HIVE-16189:


I see your point. Thanks for the detailed explanation in RB, and the test case 
proves this. Just one more point: could you add another test case that covers 
renaming an encrypted partitioned table where the rename fails? Thanks.

> Table column stats might be invalidated in a failed table rename
> 
>
> Key: HIVE-16189
> URL: https://issues.apache.org/jira/browse/HIVE-16189
> Project: Hive
>  Issue Type: Bug
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Attachments: HIVE-16189.1.patch, HIVE-16189.2.patch, HIVE-16189.patch
>
>
> If the table rename does not succeed due to its failure in moving the data to 
> the new renamed table folder, the changes in TAB_COL_STATS are not rolled 
> back which leads to invalid column stats.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-12274) Increase width of columns used for general configuration in the metastore.

2017-03-16 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15928988#comment-15928988
 ] 

Sergey Shelukhin commented on HIVE-12274:
-

Recently the test runs were pretty clean... The test logs link above has the 
error:
{noformat}
testTableFilter(org.apache.hadoop.hive.metastore.TestSetUGIOnOnlyServer)  Time 
elapsed: 20.433 sec  <<< ERROR!
MetaException(message:Exception thrown when executing query : SELECT 
A0.TBL_NAME FROM TBLS A0 LEFT OUTER JOIN DBS B0 ON A0.DB_ID = B0.DB_ID INNER 
JOIN TABLE_PARAMS C0 ON A0.TBL_ID = C0.TBL_ID WHERE C0.PARAM_KEY = 
'test_param_2' AND B0."NAME" = ? AND C0.PARAM_VALUE = ?)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_names_by_filter_result$get_table_names_by_filter_resultStandardScheme.read(ThriftHiveMetastore.java:57344)
{noformat}
due to (in hive.log) {noformat}
java.sql.SQLSyntaxErrorException: Comparisons between 'CLOB (UCS_BASIC)' and 
'CLOB (UCS_BASIC)' are not supported. Types must be comparable. String types 
must also have matching collation. If collation does not match, a possible 
solution is to cast operands to force them to the default collation (e.g. 
SELECT tablename FROM sys.systables WHERE CAST(tablename AS VARCHAR(128)) = 
'T1')
{noformat}

Apparently, one can filter by table params in Hive! I never knew that.

> Increase width of columns used for general configuration in the metastore.
> --
>
> Key: HIVE-12274
> URL: https://issues.apache.org/jira/browse/HIVE-12274
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 2.0.0
>Reporter: Elliot West
>Assignee: Naveen Gangam
>  Labels: metastore
> Attachments: HIVE-12274.2.patch, HIVE-12274.3.patch, 
> HIVE-12274.4.patch, HIVE-12274.example.ddl.hql, HIVE-12274.patch
>
>
> h2. Overview
> This issue is very similar in principle to HIVE-1364. We are hitting a limit 
> when processing JSON data that has a large nested schema. The struct 
> definition is truncated when inserted into the metastore database column 
> {{COLUMNS_V2.TYPE_NAME}} as it is greater than 4000 characters in length.
> Given that the purpose of these columns is to hold very loosely defined 
> configuration values it seems rather limiting to impose such a relatively low 
> length bound. One can imagine that valid use cases will arise where 
> reasonable parameter/property values exceed the current limit. 
> h2. Context
> These limitations were put in by the [patch 
> attributed|https://github.com/apache/hive/commit/c21a526b0a752df2a51d20a2729cc8493c228799]
>  to HIVE-1364 which mentions the _"max length on Oracle 9i/10g/11g"_ as the 
> reason. However, nowadays the limit can be increased because:
> * Oracle DB's {{varchar2}} supports 32767 bytes now, by setting the 
> configuration parameter {{MAX_STRING_SIZE}} to {{EXTENDED}}. 
> ([source|http://docs.oracle.com/database/121/SQLRF/sql_elements001.htm#SQLRF55623])
> * Postgres supports a max of 1GB for {{character}} datatype. 
> ([source|http://www.postgresql.org/docs/8.3/static/datatype-character.html])
> * MySQL can support up to 65535 bytes for the entire row. So long as the 
> {{PARAM_KEY}} value + {{PARAM_VALUE}} is less than 65535, we should be good. 
> ([source|http://dev.mysql.com/doc/refman/5.0/en/char.html])
> * SQL Server's {{varchar}} max length is 8000 and can go beyond using 
> "varchar(max)" with the same limitation as MySQL being 65535 bytes for the 
> entire row. ([source|http://dev.mysql.com/doc/refman/5.0/en/char.html])
> * Derby's {{varchar}} can be up to 32672 bytes. 
> ([source|https://db.apache.org/derby/docs/10.7/ref/rrefsqlj41207.html])
> h2. Proposal
> Can these columns not use CLOB-like types, as used for example by 
> {{TBLS.VIEW_EXPANDED_TEXT}}? It would seem that suitable type equivalents 
> exist for all targeted database platforms:
> * MySQL: {{mediumtext}}
> * Postgres: {{text}}
> * Oracle: {{CLOB}}
> * Derby: {{LONG VARCHAR}}
> I'd suggest that the candidates for type change are:
> * {{COLUMNS_V2.TYPE_NAME}}
> * {{TABLE_PARAMS.PARAM_VALUE}}
> * {{SERDE_PARAMS.PARAM_VALUE}}
> * {{SD_PARAMS.PARAM_VALUE}}
> After updating the maximum length the metastore database needs to be 
> configured and restarted with the new settings. Altering {{MAX_STRING_SIZE}} 
> will update database objects and possibly invalidate them, as follows:
> * Tables with virtual columns will be updated with new data type metadata for 
> virtual columns of {{VARCHAR2(4000)}}, 4000-byte {{NVARCHAR2}}, or 
> {{RAW(2000)}} type.
> * Functional indexes will become unusable if a change to their associated 
> virtual columns causes the index key to exceed index key length limits. 
> Attempts to rebuild such indexes will fail with {{ORA-01

[jira] [Updated] (HIVE-16232) Support stats computation for column in QuotedIdentifier

2017-03-16 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16232:
---
Status: Patch Available  (was: Open)

> Support stats computation for column in QuotedIdentifier 
> -
>
> Key: HIVE-16232
> URL: https://issues.apache.org/jira/browse/HIVE-16232
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16232.01.patch
>
>
> right now if a column contains double quotes ``, we can not compute its stats.





[jira] [Updated] (HIVE-16232) Support stats computation for column in QuotedIdentifier

2017-03-16 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16232:
---
Attachment: HIVE-16232.01.patch

> Support stats computation for column in QuotedIdentifier 
> -
>
> Key: HIVE-16232
> URL: https://issues.apache.org/jira/browse/HIVE-16232
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16232.01.patch
>
>
> right now if a column contains double quotes ``, we can not compute its stats.





[jira] [Updated] (HIVE-16178) corr/covar_samp UDAF standard compliance

2017-03-16 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-16178:

Attachment: HIVE-16178.1.patch

patch#1

* corr(x,y) => corr(y,x); changes the inner parameter order
* covar_samp - the result for a singleton set is now null
* corr doesn't return NaN anymore
* slope_intercept doesn't return NaN

I've checked the qtest on livesql.oracle.com, and the results are ok.

> corr/covar_samp UDAF standard compliance
> 
>
> Key: HIVE-16178
> URL: https://issues.apache.org/jira/browse/HIVE-16178
> Project: Hive
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Minor
> Attachments: HIVE-16178.1.patch
>
>
> h3. corr
> the standard defines corner cases when it should return null - but the 
> current result is NaN.
> If N * SUMX2 equals SUMX * SUMX , then the result is the null value.
> and
> If N * SUMY2 equals SUMY * SUMY , then the result is the null value.
> h3. covar_samp
> returns 0 instead of null when N is 1
> `If N is 1 (one), then the result is the null value.`
> h3. check (x,y) vs (y,x) args in docs
> the standard uses (y,x) order, and some of the function names also contain 
> X and Y, so the order does matter. Currently at least corr uses (x,y) order, 
> which is okay because it's symmetric, but it would be great to have the same 
> order everywhere (check others)
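
The corner cases quoted above can be sketched in a few lines of Java. This is only an illustration of the rules as stated in the description, not the code in the attached patch: the class name CorrSketch, the method names, and the choice to return null for an empty input are all assumptions.

```java
// Sketch only: CorrSketch and its methods are hypothetical names,
// not Hive's UDAF implementation.
final class CorrSketch {

    /** corr(y, x): null in the corner cases quoted from the standard. */
    static Double corr(double[] y, double[] x) {
        checkSameLength(y, x);
        int n = x.length;
        if (n == 0) {
            return null; // assumption: an empty input also yields null
        }
        double sumX = 0, sumY = 0, sumX2 = 0, sumY2 = 0, sumXY = 0;
        for (int i = 0; i < n; i++) {
            sumX += x[i];
            sumY += y[i];
            sumX2 += x[i] * x[i];
            sumY2 += y[i] * y[i];
            sumXY += x[i] * y[i];
        }
        double varX = n * sumX2 - sumX * sumX;
        double varY = n * sumY2 - sumY * sumY;
        // "If N * SUMX2 equals SUMX * SUMX, then the result is the null value"
        // (and likewise for Y). Returning null here avoids the 0/0 = NaN case.
        if (varX == 0.0 || varY == 0.0) {
            return null;
        }
        return (n * sumXY - sumX * sumY) / Math.sqrt(varX * varY);
    }

    /** covar_samp(y, x): null when N is 1, as the standard requires. */
    static Double covarSamp(double[] y, double[] x) {
        checkSameLength(y, x);
        int n = x.length;
        if (n <= 1) {
            return null; // N == 1 per the standard; N == 0 is an assumption
        }
        double sumX = 0, sumY = 0, sumXY = 0;
        for (int i = 0; i < n; i++) {
            sumX += x[i];
            sumY += y[i];
            sumXY += x[i] * y[i];
        }
        return (sumXY - sumX * sumY / n) / (n - 1);
    }

    private static void checkSameLength(double[] y, double[] x) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("x and y must have the same length");
        }
    }

    public static void main(String[] args) {
        double[] ramp = {1, 2, 3};
        double[] constant = {5, 5, 5};
        System.out.println(corr(ramp, ramp));
        System.out.println(corr(ramp, constant));
        System.out.println(covarSamp(new double[] {4}, new double[] {7}));
    }
}
```

The exact comparison against zero follows the wording of the standard; with non-integer data, rounding may make the two products differ slightly, so a real implementation might need a different test.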





[jira] [Updated] (HIVE-15849) hplsql should add enterGlobalScope func to UDF

2017-03-16 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-15849:
--
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

> hplsql should add enterGlobalScope func to UDF
> --
>
> Key: HIVE-15849
> URL: https://issues.apache.org/jira/browse/HIVE-15849
> Project: Hive
>  Issue Type: Bug
>  Components: hpl/sql
>Affects Versions: 2.2.0
>Reporter: Fei Hui
>Assignee: Fei Hui
> Fix For: 2.2.0
>
> Attachments: HIVE-15849.1.patch, HIVE-15849.patch
>
>
> code in Udf.java
> {code:title=Udf.java|borderStyle=solid}
> if (exec == null) {
>   exec = new Exec();
>   String query = queryOI.getPrimitiveJavaObject(arguments[0].get());
>   String[] args = { "-e", query, "-trace" };
>   try {
>     exec.setUdfRun(true);
>     exec.init(args);
>   } catch (Exception e) {
>     throw new HiveException(e.getMessage());
>   }
> }
> if (arguments.length > 1) {
>   setParameters(arguments);
> }
> Var result = exec.run();
> if (result != null) {
>   return result.toString();
> }
> {code}
> Here are my thoughts
> {quote}
> We should add 'exec.enterGlobalScope();' between 'exec = new Exec();' and 
> 'setParameters(arguments);'.
> If we do not call exec.enterGlobalScope(), setParameters(arguments) is 
> useless: the vars are not added to any scope, but exec.run() will use the 
> vars that we set. The vars are the parameters passed to the UDF, 
> [, :1, :2, ...n], which are described in Udf.java.
> {quote}
> Before adding this call, the result is as follows. We get the wrong result, 
> because the output contains an empty string.
> {quote}
> Starting pre-SQL statement
> Starting pre-SQL statement
> Starting pre-SQL statement
> Starting pre-SQL statement
> Starting pre-SQL statement
> Starting query
> Query executed successfully (2.30 sec)
> Ln:8 SELECT completed successfully
> Ln:8 Standalone SELECT executed: 1 columns in the result set
> Hello, !
> Hello, !
> {quote}
> After adding this call, we get the right result.
> {quote}
> Starting pre-SQL statement
> Starting pre-SQL statement
> Starting pre-SQL statement
> Starting pre-SQL statement
> Starting pre-SQL statement
> Starting query
> Query executed successfully (2.35 sec)
> Ln:8 SELECT completed successfully
> Ln:8 Standalone SELECT executed: 1 columns in the result set
> Hello, fei!
> Hello, fei!
> {quote}
> tests come from http://www.hplsql.org/udf
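
The failure mode described above can be modelled with a toy scope stack. This is a minimal sketch, assuming that parameters are stored in whatever scope is current and are dropped when no scope exists; ToyExec, setVariable and greet are hypothetical names and do not reflect the real HPL/SQL Exec class.

```java
// Toy model only: ToyExec is a hypothetical stand-in, not HPL/SQL's Exec.
final class ToyExec {
    // Stack of variable scopes; empty until enterGlobalScope() is called.
    private final java.util.ArrayDeque<java.util.Map<String, String>> scopes =
        new java.util.ArrayDeque<>();

    void enterGlobalScope() {
        scopes.push(new java.util.HashMap<>());
    }

    /** Stores the variable in the current scope, if there is one. */
    void setVariable(String name, String value) {
        java.util.Map<String, String> current = scopes.peek();
        if (current != null) {
            current.put(name, value);
        }
        // With no scope on the stack the value is silently lost.
    }

    /** Evaluates the equivalent of 'Hello, ' || :1 || '!'. */
    String run() {
        String name = "";
        for (java.util.Map<String, String> scope : scopes) {
            if (scope.containsKey(":1")) {
                name = scope.get(":1");
                break;
            }
        }
        return "Hello, " + name + "!";
    }

    /** Runs the toy UDF with or without the enterGlobalScope() call. */
    static String greet(boolean enterScope) {
        ToyExec exec = new ToyExec();
        if (enterScope) {
            exec.enterGlobalScope();
        }
        exec.setVariable(":1", "fei");
        return exec.run();
    }

    public static void main(String[] args) {
        System.out.println(greet(false)); // parameter is lost
        System.out.println(greet(true));  // parameter is visible
    }
}
```

In this model the two calls reproduce the "Hello, !" and "Hello, fei!" lines from the before and after logs.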





[jira] [Commented] (HIVE-15849) hplsql should add enterGlobalScope func to UDF

2017-03-16 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15929023#comment-15929023
 ] 

Alan Gates commented on HIVE-15849:
---

Patch 1 committed to master.  Thanks Fei for the patch, and for your patience 
in us taking a while to get this in.

> hplsql should add enterGlobalScope func to UDF
> --
>
> Key: HIVE-15849
> URL: https://issues.apache.org/jira/browse/HIVE-15849
> Project: Hive
>  Issue Type: Bug
>  Components: hpl/sql
>Affects Versions: 2.2.0
>Reporter: Fei Hui
>Assignee: Fei Hui
> Fix For: 2.2.0
>
> Attachments: HIVE-15849.1.patch, HIVE-15849.patch
>
>
> code in Udf.java
> {code:title=Udf.java|borderStyle=solid}
> if (exec == null) {
>   exec = new Exec();
>   String query = queryOI.getPrimitiveJavaObject(arguments[0].get());
>   String[] args = { "-e", query, "-trace" };
>   try {
>     exec.setUdfRun(true);
>     exec.init(args);
>   } catch (Exception e) {
>     throw new HiveException(e.getMessage());
>   }
> }
> if (arguments.length > 1) {
>   setParameters(arguments);
> }
> Var result = exec.run();
> if (result != null) {
>   return result.toString();
> }
> {code}
> Here are my thoughts
> {quote}
> We should add 'exec.enterGlobalScope();' between 'exec = new Exec();' and 
> 'setParameters(arguments);'.
> If we do not call exec.enterGlobalScope(), setParameters(arguments) is 
> useless: the vars are not added to any scope, but exec.run() will use the 
> vars that we set. The vars are the parameters passed to the UDF, 
> [, :1, :2, ...n], which are described in Udf.java.
> {quote}
> Before adding this call, the result is as follows. We get the wrong result, 
> because the output contains an empty string.
> {quote}
> Starting pre-SQL statement
> Starting pre-SQL statement
> Starting pre-SQL statement
> Starting pre-SQL statement
> Starting pre-SQL statement
> Starting query
> Query executed successfully (2.30 sec)
> Ln:8 SELECT completed successfully
> Ln:8 Standalone SELECT executed: 1 columns in the result set
> Hello, !
> Hello, !
> {quote}
> After adding this call, we get the right result.
> {quote}
> Starting pre-SQL statement
> Starting pre-SQL statement
> Starting pre-SQL statement
> Starting pre-SQL statement
> Starting pre-SQL statement
> Starting query
> Query executed successfully (2.35 sec)
> Ln:8 SELECT completed successfully
> Ln:8 Standalone SELECT executed: 1 columns in the result set
> Hello, fei!
> Hello, fei!
> {quote}
> tests come from http://www.hplsql.org/udf





[jira] [Commented] (HIVE-15691) Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink

2017-03-16 Thread Kalyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15929039#comment-15929039
 ] 

Kalyan commented on HIVE-15691:
---

Hi [~ekoifman], [~roshan_naik]

Could you please take a look at the above patch?

Thanks


> Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink
> -
>
> Key: HIVE-15691
> URL: https://issues.apache.org/jira/browse/HIVE-15691
> Project: Hive
>  Issue Type: New Feature
>  Components: HCatalog, Transactions
>Reporter: Kalyan
>Assignee: Kalyan
> Attachments: HIVE-15691.1.patch, HIVE-15691.2.patch, 
> HIVE-15691.patch, HIVE-15691-updated.patch
>
>
> Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink.
> It is similar to StrictJsonWriter available in hive.
> Flume has a dependent change waiting on this commit:
> FLUME-3036 : Create a RegexSerializer for Hive Sink.
> A patch is available for Flume; please see the link below:
> https://github.com/kalyanhadooptraining/flume/commit/1c651e81395404321f9964c8d9d2af6f4a2aaef9





[jira] [Commented] (HIVE-6590) Hive does not work properly with boolean partition columns (wrong results and inserts to incorrect HDFS path)

2017-03-16 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15929043#comment-15929043
 ] 

Zoltan Haindrich commented on HIVE-6590:


[~ashutoshc] Could you please take a look at my last two comments?

> Hive does not work properly with boolean partition columns (wrong results and 
> inserts to incorrect HDFS path)
> -
>
> Key: HIVE-6590
> URL: https://issues.apache.org/jira/browse/HIVE-6590
> Project: Hive
>  Issue Type: Bug
>  Components: Database/Schema, Metastore
>Affects Versions: 0.10.0
>Reporter: Lenni Kuff
>Assignee: Zoltan Haindrich
> Attachments: HIVE-6590.1.patch, HIVE-6590.2.patch, HIVE-6590.3.patch, 
> HIVE-6590.4.patch
>
>
> Hive does not work properly with boolean partition columns. Queries return 
> wrong results and also insert to incorrect HDFS paths.
> {code}
> create table bool_table(int_col int) partitioned by(bool_col boolean);
> -- This works, creating 3 unique partitions!
> ALTER TABLE bool_table ADD PARTITION (bool_col=FALSE);
> ALTER TABLE bool_table ADD PARTITION (bool_col=false);
> ALTER TABLE bool_table ADD PARTITION (bool_col=False);
> {code}
> The first problem is that Hive cannot filter on a bool partition key column. 
> "select * from bool_table" returns the correct results, but if you apply a 
> filter on the bool partition key column Hive won't return any results.
> The second problem is that Hive seems to just call "toString()" on the 
> boolean literal value. This means you can end up with multiple partitions 
> (FALSE, false, FaLSE, etc) mapping to the literal value 'FALSE'. For example, 
> you can add three partitions in Hive for the same logical value "false" by 
> running:
> ALTER TABLE bool_table ADD PARTITION (bool_col=FALSE) -> 
> /test-warehouse/bool_table/bool_col=FALSE/
> ALTER TABLE bool_table ADD PARTITION (bool_col=false) -> 
> /test-warehouse/bool_table/bool_col=false/
> ALTER TABLE bool_table ADD PARTITION (bool_col=False) -> 
> /test-warehouse/bool_table/bool_col=False/
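
One way to avoid the three distinct directories is to canonicalize the boolean literal before it is turned into a path component. The sketch below only illustrates that idea and is not taken from any of the attached patches: BoolPartitionSketch, canonicalBoolean and partitionPath are hypothetical names, and lowercase is an arbitrary choice of canonical form.

```java
// Sketch only: hypothetical helper, not the fix in the HIVE-6590 patches.
final class BoolPartitionSketch {

    /** Maps FALSE, false, False, ... to one canonical spelling. */
    static String canonicalBoolean(String literal) {
        String trimmed = literal.trim();
        if (trimmed.equalsIgnoreCase("true")) {
            return "true";
        }
        if (trimmed.equalsIgnoreCase("false")) {
            return "false";
        }
        throw new IllegalArgumentException("not a boolean literal: " + literal);
    }

    /** Builds the partition directory from the canonical value. */
    static String partitionPath(String tableDir, String column, String literal) {
        return tableDir + "/" + column + "=" + canonicalBoolean(literal) + "/";
    }

    public static void main(String[] args) {
        String table = "/test-warehouse/bool_table";
        // All three spellings resolve to the same directory.
        System.out.println(partitionPath(table, "bool_col", "FALSE"));
        System.out.println(partitionPath(table, "bool_col", "false"));
        System.out.println(partitionPath(table, "bool_col", "False"));
    }
}
```

Comparing partition values by their canonical form would also let a filter such as bool_col=false match the stored partition, whatever spelling was used when it was added.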





[jira] [Updated] (HIVE-16164) Provide mechanism for passing HMS notification ID between transactional and non-transactional listeners.

2017-03-16 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-16164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-16164:
---
Attachment: HIVE-16164.3.patch

@mohit [~akolb] I ended up adding a new 'parameters' field to the 
ListenerEvent. I couldn't use an integer variable for the eventId because two 
or more transactions would update the same variable on the ListenerEvent. 

Also, there are some places where the ListenerEvent is reused (such as 
create_database_core), but others cannot reuse it (such as 
exchange_partition). The exchange case was a little trickier because it fires 
one DROP event per partition deleted, so I had to track the eventId for each 
DROP_PARTITION event, and then call the non-transactional listeners with the 
correct eventId for each partition.

Could you review the patch?

> Provide mechanism for passing HMS notification ID between transactional and 
> non-transactional listeners.
> 
>
> Key: HIVE-16164
> URL: https://issues.apache.org/jira/browse/HIVE-16164
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Attachments: HIVE-16164.1.patch, HIVE-16164.2.patch, 
> HIVE-16164.3.patch
>
>
> The HMS DB notification listener currently stores an event ID on the HMS 
> backend DB so that external applications (such as backup apps) can request 
> incremental notifications based on the last event ID requested.
> The HMS DB notification and backup applications are asynchronous. However, 
> there are times when applications may need to be in sync with the latest 
> HMS event in order to process an action. These applications will provide a 
> listener implementation that is called by the HMS after an HMS transaction 
> has happened.
> The problem is that the listener running after the transaction (or in the 
> non-transactional context) may need the DB event ID in order to sync all 
> events that happened before that event ID, but this ID is never passed to 
> the non-transactional listeners.
> We can pass this event information through the EnvironmentContext found on 
> each ListenerEvent implementation (such as CreateTableEvent), and send the 
> EnvironmentContext to the non-transactional listeners to get the event ID.
> The DbNotificationListener already knows the event ID after calling 
> ObjectStore.addNotificationEvent(). We just need to set this event ID on the 
> EnvironmentContext from each of the event notifications and make sure that 
> this EnvironmentContext is sent to the non-transactional listeners.
> Here's the code example when creating a table on {{create_table_core}}:
> {noformat}
>   ms.createTable(tbl);
>   if (transactionalListeners.size() > 0) {
>     CreateTableEvent createTableEvent = new CreateTableEvent(tbl, true, this);
>     createTableEvent.setEnvironmentContext(envContext);
>     for (MetaStoreEventListener transactionalListener : transactionalListeners) {
>       transactionalListener.onCreateTable(createTableEvent); // <- Here the notification ID is generated
>     }
>   }
>   success = ms.commitTransaction();
> } finally {
>   if (!success) {
>     ms.rollbackTransaction();
>     if (madeDir) {
>       wh.deleteDir(tblPath, true);
>     }
>   }
>   for (MetaStoreEventListener listener : listeners) {
>     CreateTableEvent createTableEvent =
>         new CreateTableEvent(tbl, success, this);
>     createTableEvent.setEnvironmentContext(envContext);
>     listener.onCreateTable(createTableEvent); // <- Here we would like to consume notification ID
>   }
> {noformat}
> We could use a specific key name that will be used on the EnvironmentContext, 
> such as DB_NOTIFICATION_EVENT_ID.
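
The hand-off described above can be sketched with a per-event string map. This is a toy model under stated assumptions, not the HMS API: EventIdSketch, ToyEvent and the listener method names are hypothetical, and only the key name DB_NOTIFICATION_EVENT_ID comes from the description.

```java
// Toy model only: none of these types are the real metastore classes.
final class EventIdSketch {
    static final String DB_NOTIFICATION_EVENT_ID = "DB_NOTIFICATION_EVENT_ID";

    /** Stand-in for an event object that carries its own parameters. */
    static final class ToyEvent {
        private final java.util.Map<String, String> parameters =
            new java.util.HashMap<>();

        void putParameter(String key, String value) {
            parameters.put(key, value);
        }

        String getParameter(String key) {
            return parameters.get(key);
        }
    }

    // Stand-in for the ID sequence kept in the backend DB.
    private static final java.util.concurrent.atomic.AtomicLong NEXT_ID =
        new java.util.concurrent.atomic.AtomicLong();

    /** Transactional listener: generates the ID and records it on the event. */
    static void onEventTransactional(ToyEvent event) {
        long id = NEXT_ID.incrementAndGet();
        event.putParameter(DB_NOTIFICATION_EVENT_ID, Long.toString(id));
    }

    /** Non-transactional listener: reads the ID, or null if none was set. */
    static Long readEventId(ToyEvent event) {
        String raw = event.getParameter(DB_NOTIFICATION_EVENT_ID);
        return raw == null ? null : Long.valueOf(raw);
    }

    /** Runs one event through both listeners and returns the ID seen last. */
    static Long roundTrip() {
        ToyEvent event = new ToyEvent();
        onEventTransactional(event);
        return readEventId(event);
    }

    public static void main(String[] args) {
        System.out.println(roundTrip());
        System.out.println(roundTrip());
    }
}
```

Because the ID travels on the event object itself rather than in a shared field, two concurrent transactions each see their own value, provided the same event instance is handed to both kinds of listener.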





[jira] [Commented] (HIVE-16189) Table column stats might be invalidated in a failed table rename

2017-03-16 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15929059#comment-15929059
 ] 

Chaoyu Tang commented on HIVE-16189:


Currently the partition column stats are dropped regardless of whether the 
table rename succeeds or fails (see HIVE-16147). I am going to address that 
issue, together with some other issues related to partitioned-table stats, in 
HIVE-16147. Does that make sense? Thanks.

> Table column stats might be invalidated in a failed table rename
> 
>
> Key: HIVE-16189
> URL: https://issues.apache.org/jira/browse/HIVE-16189
> Project: Hive
>  Issue Type: Bug
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Attachments: HIVE-16189.1.patch, HIVE-16189.2.patch, HIVE-16189.patch
>
>
> If the table rename does not succeed due to its failure in moving the data to 
> the new renamed table folder, the changes in TAB_COL_STATS are not rolled 
> back which leads to invalid column stats.





[jira] [Updated] (HIVE-16189) Table column stats might be invalidated in a failed table rename

2017-03-16 Thread Chaoyu Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaoyu Tang updated HIVE-16189:
---
Attachment: HIVE-16189.2.patch

For unknown reasons, the precommit build tests were not run for the new patch. 
Reattaching the patch to trigger the tests.

> Table column stats might be invalidated in a failed table rename
> 
>
> Key: HIVE-16189
> URL: https://issues.apache.org/jira/browse/HIVE-16189
> Project: Hive
>  Issue Type: Bug
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Attachments: HIVE-16189.1.patch, HIVE-16189.2.patch, 
> HIVE-16189.2.patch, HIVE-16189.patch
>
>
> If the table rename does not succeed due to its failure in moving the data to 
> the new renamed table folder, the changes in TAB_COL_STATS are not rolled 
> back which leads to invalid column stats.





[jira] [Assigned] (HIVE-16236) BuddyAllocator fragmentation - short-term fix

2017-03-16 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-16236:
---


> BuddyAllocator fragmentation - short-term fix
> -
>
> Key: HIVE-16236
> URL: https://issues.apache.org/jira/browse/HIVE-16236
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>






[jira] [Updated] (HIVE-16236) BuddyAllocator fragmentation - short-term fix

2017-03-16 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-16236:

Description: The existing hacky code is not hacky enough.

> BuddyAllocator fragmentation - short-term fix
> -
>
> Key: HIVE-16236
> URL: https://issues.apache.org/jira/browse/HIVE-16236
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>
> The existing hacky code is not hacky enough.





[jira] [Assigned] (HIVE-16237) BuddyAllocator fragmentation - long-term fix

2017-03-16 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-16237:
---


> BuddyAllocator fragmentation - long-term fix
> 
>
> Key: HIVE-16237
> URL: https://issues.apache.org/jira/browse/HIVE-16237
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>
> The existing hacky code is not hacky enough.





[jira] [Updated] (HIVE-16237) BuddyAllocator fragmentation - long-term fix

2017-03-16 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-16237:

Description: 
We were initially planning to write a different allocator with compaction, but 
now we are stuck with BuddyAllocator.
There are two approaches - compaction on top of buddy allocator (should be 
fairly simple actually, aside from indirection... luckily for the latter, we 
can take the buffer out in most places and we can reuse the existing 
locking-to-prevent-eviction mechanism to also prevent reallocation); or writing 
a better allocator. Probably the former is preferred, need to think about the 
latter.

  was:The existing hacky code is not hacky enough.


> BuddyAllocator fragmentation - long-term fix
> 
>
> Key: HIVE-16237
> URL: https://issues.apache.org/jira/browse/HIVE-16237
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>
> We were initially planning to write a different allocator with compaction, 
> but now we are stuck with BuddyAllocator.
> There are two approaches - compaction on top of buddy allocator (should be 
> fairly simple actually, aside from indirection... luckily for the latter, we 
> can take the buffer out in most places and we can reuse the existing 
> locking-to-prevent-eviction mechanism to also prevent reallocation); or 
> writing a better allocator. Probably the former is preferred, need to think 
> about the latter.
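
For context, the fragmentation that compaction would address can be reproduced with a toy buddy allocator. The sketch below is a generic textbook-style model, not Hive's BuddyAllocator: BuddySketch and its methods are hypothetical names, sizes are in abstract units, and there is no locking or eviction.

```java
// Toy buddy allocator; hypothetical, not Hive's BuddyAllocator.
final class BuddySketch {
    private final int maxOrder;
    // freeLists.get(o) holds the offsets of free blocks of size 2^o units.
    private final java.util.List<java.util.TreeSet<Integer>> freeLists =
        new java.util.ArrayList<>();

    BuddySketch(int maxOrder) {
        this.maxOrder = maxOrder;
        for (int o = 0; o <= maxOrder; o++) {
            freeLists.add(new java.util.TreeSet<>());
        }
        freeLists.get(maxOrder).add(0); // one free block covering the arena
    }

    /** Allocates 2^order units; returns the offset, or -1 on failure. */
    int allocate(int order) {
        int o = order;
        while (o <= maxOrder && freeLists.get(o).isEmpty()) {
            o++;
        }
        if (o > maxOrder) {
            return -1;
        }
        int offset = freeLists.get(o).pollFirst();
        while (o > order) {
            o--;
            freeLists.get(o).add(offset + (1 << o)); // upper half stays free
        }
        return offset;
    }

    /** Frees a block and merges it with its buddy while the buddy is free. */
    void release(int offset, int order) {
        int o = order;
        while (o < maxOrder) {
            int buddy = offset ^ (1 << o);
            if (!freeLists.get(o).remove(Integer.valueOf(buddy))) {
                break;
            }
            offset = Math.min(offset, buddy);
            o++;
        }
        freeLists.get(o).add(offset);
    }

    int freeUnits() {
        int total = 0;
        for (int o = 0; o <= maxOrder; o++) {
            total += freeLists.get(o).size() << o;
        }
        return total;
    }

    /** Returns {free units, failed allocation, allocation after one more free}. */
    static int[] demo() {
        BuddySketch arena = new BuddySketch(3); // 8 units
        for (int i = 0; i < 8; i++) {
            arena.allocate(0);
        }
        for (int offset = 0; offset < 8; offset += 2) {
            arena.release(offset, 0); // free every other unit
        }
        int freeBefore = arena.freeUnits();
        int failed = arena.allocate(1); // no two adjacent buddies are free
        arena.release(1, 0);            // lets units 0 and 1 merge
        int succeeded = arena.allocate(1);
        return new int[] {freeBefore, failed, succeeded};
    }
}
```

In the demo half of the arena is free, yet a two-unit request fails because no free unit has a free buddy. Compaction would move live blocks so that such pairs can merge, which is why the description mentions needing indirection for buffers that move.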





[jira] [Updated] (HIVE-16236) BuddyAllocator fragmentation - short-term fix

2017-03-16 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-16236:

Attachment: HIVE-16236.patch

> BuddyAllocator fragmentation - short-term fix
> -
>
> Key: HIVE-16236
> URL: https://issues.apache.org/jira/browse/HIVE-16236
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-16236.patch
>
>
> The existing hacky code is not hacky enough.





[jira] [Updated] (HIVE-15766) DBNotificationlistener leaks JDOPersistenceManager

2017-03-16 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-15766:

Status: Open  (was: Patch Available)

> DBNotificationlistener leaks JDOPersistenceManager
> --
>
> Key: HIVE-15766
> URL: https://issues.apache.org/jira/browse/HIVE-15766
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Attachments: HIVE-15766.1.patch, HIVE-15766.2.patch, 
> HIVE-15766.3.patch, HIVE-15766.4.patch, HIVE-15766.4.patch, HIVE-15766.5.patch
>
>






[jira] [Commented] (HIVE-16189) Table column stats might be invalidated in a failed table rename

2017-03-16 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15929097#comment-15929097
 ] 

Pengcheng Xiong commented on HIVE-16189:


+1. Thanks a lot for the detailed analysis.

> Table column stats might be invalidated in a failed table rename
> 
>
> Key: HIVE-16189
> URL: https://issues.apache.org/jira/browse/HIVE-16189
> Project: Hive
>  Issue Type: Bug
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Attachments: HIVE-16189.1.patch, HIVE-16189.2.patch, 
> HIVE-16189.2.patch, HIVE-16189.patch
>
>
> If the table rename does not succeed due to its failure in moving the data to 
> the new renamed table folder, the changes in TAB_COL_STATS are not rolled 
> back which leads to invalid column stats.





[jira] [Commented] (HIVE-16236) BuddyAllocator fragmentation - short-term fix

2017-03-16 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15929110#comment-15929110
 ] 

Sergey Shelukhin commented on HIVE-16236:
-

[~gopalv] [~prasanth_j] can you take a look? Thanks.

> BuddyAllocator fragmentation - short-term fix
> -
>
> Key: HIVE-16236
> URL: https://issues.apache.org/jira/browse/HIVE-16236
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-16236.patch
>
>
> The existing hacky code is not hacky enough.




