[jira] [Updated] (HIVE-14455) upgrade httpclient, httpcore to match updated hadoop dependency

2016-11-22 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-14455:
-
Attachment: HIVE-14455.1.patch

Attaching the same file to kick off ptest again.

> upgrade httpclient, httpcore to match updated hadoop dependency
> ---
>
> Key: HIVE-14455
> URL: https://issues.apache.org/jira/browse/HIVE-14455
> Project: Hive
>  Issue Type: Bug
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Attachments: HIVE-14455.1.patch, HIVE-14455.1.patch
>
>
> Hive has shipped a newer version of httpclient and httpcore than Hadoop 2.x 
> since 1.2.0 (HIVE-9709), in order to make use of newer APIs in httpclient 4.4.
> There was a security issue in the older versions of httpclient and httpcore 
> that Hadoop was using, and as a result Hadoop moved to httpclient 4.5.2 and 
> httpcore 4.4.4 (HADOOP-12767).
> Because Hadoop was using the older versions of these libraries and they often 
> end up earlier in the classpath, we have had a bunch of difficulties in 
> different environments with class/method-not-found errors. 
> Now that Hadoop's dependencies in the versions with the security fix are 
> newer and have the API that Hive needs, we can be on the same version. For 
> older versions of Hadoop this version update doesn't matter, as the 
> difference is already there.
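> For anyone debugging the class/method-not-found errors above, a small generic 
> diagnostic (a sketch, not part of the patch) to see which copy of httpclient 
> actually wins on the classpath:
> {code}
> import org.apache.http.impl.client.HttpClientBuilder;
> import org.apache.http.util.VersionInfo;
>
> public class HttpClientDiag {
>   public static void main(String[] args) {
>     // Jar the class was actually loaded from (shows whether hadoop's older
>     // copy shadows hive's newer one on the classpath).
>     System.out.println(HttpClientBuilder.class.getProtectionDomain()
>         .getCodeSource().getLocation());
>     // Version reported by the httpclient package metadata (may be null if
>     // the metadata is missing).
>     VersionInfo vi = VersionInfo.loadVersionInfo("org.apache.http.client",
>         HttpClientDiag.class.getClassLoader());
>     System.out.println(vi != null ? vi.getRelease() : "unknown");
>   }
> }
> {code}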



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15257) remove useless cleanupReaders in OrcEncodedDataReader

2016-11-22 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-15257:
-
Assignee: Fei Hui

> remove useless cleanupReaders in OrcEncodedDataReader
> -
>
> Key: HIVE-15257
> URL: https://issues.apache.org/jira/browse/HIVE-15257
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.1.0
>Reporter: Fei Hui
>Assignee: Fei Hui
> Attachments: HIVE-15257.1.patch
>
>
> processStop() already calls cleanupReaders(), so the cleanupReaders() call 
> inside the if (processStop()) {} block is useless.
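> A minimal sketch of the pattern being removed (a hypothetical simplification 
> of OrcEncodedDataReader; names follow the description above):
> {code}
> class OrcReadSketch {
>   private volatile boolean isStopped;
>
>   private boolean processStop() {
>     if (!isStopped) return false;
>     cleanupReaders();    // readers are already cleaned up here...
>     return true;
>   }
>
>   void performDataRead() {
>     if (processStop()) {
>       cleanupReaders();  // ...so this second call is redundant and can be removed
>       return;
>     }
>     // continue reading encoded ORC data
>   }
>
>   private void cleanupReaders() { /* release readers */ }
> }
> {code}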



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15135) Add an llap mode which fails if queries cannot run in llap

2016-11-22 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15686051#comment-15686051
 ] 

Lefty Leverenz commented on HIVE-15135:
---

Doc note:  This adds the new value "only" to *hive.llap.execution.mode*, so the 
wiki needs to be updated.  The LLAP design doc should also explain all of the 
values.

* [Configuration Properties -- hive.llap.execution.mode | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.llap.execution.mode]
* [LLAP | https://cwiki.apache.org/confluence/display/Hive/LLAP]

Added a TODOC2.2 label.

> Add an llap mode which fails if queries cannot run in llap
> --
>
> Key: HIVE-15135
> URL: https://issues.apache.org/jira/browse/HIVE-15135
> Project: Hive
>  Issue Type: Task
>  Components: llap
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-15135.01.patch, HIVE-15135.02.patch, 
> HIVE-15135.03.patch, HIVE-15135.04.patch
>
>
> The ALL mode currently ends up launching new containers for queries which 
> cannot run in llap.
> There should be a mode in which such queries fail instead of running outside 
> llap.
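> A minimal sketch (hypothetical, not the actual patch) of how such a mode 
> could behave compared to "all":
> {code}
> enum LlapMode { NONE, MAP, ALL, ONLY }
>
> class LlapModeCheck {
>   // "all" falls back to launching a regular container when work can't run in
>   // llap; "only" should fail fast instead of running outside llap.
>   static void check(LlapMode mode, boolean canRunInLlap) {
>     if (!canRunInLlap && mode == LlapMode.ONLY) {
>       throw new IllegalStateException("Query cannot run in llap with mode 'only'");
>     }
>   }
> }
> {code}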



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15257) remove useless cleanupReaders in OrcEncodedDataReader

2016-11-22 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-15257:
-
Affects Version/s: 2.2.0
   Status: Patch Available  (was: Open)

> remove useless cleanupReaders in OrcEncodedDataReader
> -
>
> Key: HIVE-15257
> URL: https://issues.apache.org/jira/browse/HIVE-15257
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.1.0, 2.2.0
>Reporter: Fei Hui
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-15257.1.patch
>
>
> processStop() already calls cleanupReaders(), so the cleanupReaders() call 
> inside the if (processStop()) {} block is useless.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15257) remove useless cleanupReaders in OrcEncodedDataReader

2016-11-22 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15686052#comment-15686052
 ] 

Prasanth Jayachandran commented on HIVE-15257:
--

lgtm, +1. Pending tests

> remove useless cleanupReaders in OrcEncodedDataReader
> -
>
> Key: HIVE-15257
> URL: https://issues.apache.org/jira/browse/HIVE-15257
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.1.0, 2.2.0
>Reporter: Fei Hui
>Assignee: Fei Hui
> Attachments: HIVE-15257.1.patch
>
>
> processStop() already calls cleanupReaders(), so the cleanupReaders() call 
> inside the if (processStop()) {} block is useless.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15257) remove useless cleanupReaders in OrcEncodedDataReader

2016-11-22 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-15257:
-
Assignee: Fei Hui  (was: Prasanth Jayachandran)

> remove useless cleanupReaders in OrcEncodedDataReader
> -
>
> Key: HIVE-15257
> URL: https://issues.apache.org/jira/browse/HIVE-15257
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.1.0, 2.2.0
>Reporter: Fei Hui
>Assignee: Fei Hui
> Attachments: HIVE-15257.1.patch
>
>
> processStop() already calls cleanupReaders(), so the cleanupReaders() call 
> inside the if (processStop()) {} block is useless.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-15257) remove useless cleanupReaders in OrcEncodedDataReader

2016-11-22 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran reassigned HIVE-15257:


Assignee: Prasanth Jayachandran  (was: Fei Hui)

> remove useless cleanupReaders in OrcEncodedDataReader
> -
>
> Key: HIVE-15257
> URL: https://issues.apache.org/jira/browse/HIVE-15257
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.1.0, 2.2.0
>Reporter: Fei Hui
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-15257.1.patch
>
>
> processStop() already calls cleanupReaders(), so the cleanupReaders() call 
> inside the if (processStop()) {} block is useless.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15259) The deserialization time of HOS20 is longer than what in HOS16

2016-11-22 Thread liyunzhang_intel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyunzhang_intel updated HIVE-15259:

Attachment: Deserialization_HOS20.PNG
Deserialization_HOS16.PNG

[~xuefuz], [~lirui] and [~Ferd]: please help take a look at this problem.
I guess the problem is that before, we linked spark-assembly.jar into 
$HIVE_HOME/lib/, while with the latest code we need to copy all the jars from 
$SPARK_HOME/jars/ to $HIVE_HOME/lib/.

HOS20 log
{code}
2016-11-22T00:50:41,710  INFO [stderr-redir-1] client.SparkClientImpl: 16/11/22 
00:50:41 INFO yarn.Client: Uploading resource 
file:/tmp/spark-fe2aeecc-12a7-427f-9a5d-cf6e7335bf46/__spark_libs__3968994973591034858.zip
 -> 
hdfs://bdpe42:8020/user/root/.sparkStaging/application_1479702875308_0033/__spark_libs__3968994973591034858.zip
2016-11-22T00:50:42,376  INFO [stderr-redir-1] client.SparkClientImpl: 16/11/22 
00:50:42 INFO yarn.Client: Uploading resource 
file:/home/apache-hive-2.2.0-SNAPSHOT-bin/lib/hive-exec-2.2.0-SNAPSHOT.jar -> 
hdfs://bdpe42:8020/user/root/.sparkStaging/application_1479702875308_0033/hive-exec-2.2.0-SNAPSHOT.jar
2016-11-22T00:50:42,542  INFO [stderr-redir-1] client.SparkClientImpl: 16/11/22 
00:50:42 INFO yarn.Client: Uploading resource 
file:/tmp/spark-fe2aeecc-12a7-427f-9a5d-cf6e7335bf46/__spark_conf__7123360802254473357.zip
 -> 
hdfs://bdpe42:8020/user/root/.sparkStaging/application_1479702875308_0033/__spark_conf__.zip
{code}


HOS16 log
{code}
yarn.Client: Uploading resource 
file:/home/spark16/spark-1.6.2-bin-hadoop2-without-hive/lib/spark-assembly-1.6.2-hadoop2.6.0.jar
 -> 
hdfs://bdpe42:8020/user/root/.sparkStaging/application_1479702875308_0034/spark-assembly-1.6.2-hadoop2.6.0.jar
2016-11-22T00:55:30,145  INFO [stderr-redir-1] client.SparkClientImpl: 16/11/22 
00:55:30 INFO yarn.Client: Uploading resource 
file:/home/spark16/spark16-apache-hive-2.2.0-SNAPSHOT-bin/lib/hive-exec-2.2.0-SNAPSHOT.jar
 -> 
hdfs://bdpe42:8020/user/root/.sparkStaging/application_1479702875308_0034/hive-exec-2.2.0-SNAPSHOT.jar
2016-11-22T00:55:30,325  INFO [stderr-redir-1] client.SparkClientImpl: 16/11/22 
00:55:30 INFO yarn.Client: Uploading resource 
file:/tmp/spark-35202f4e-8054-47a1-ae50-0c2468f374f6/__spark_conf__611310910041274518.zip
 -> 
hdfs://bdpe42:8020/user/root/.sparkStaging/application_1479702875308_0034/__spark_conf__611310910041274518.zip
{code}

Above are log snippets from running HOS20 and HOS16.
In HOS20, it uploads 
/tmp/spark-fe2aeecc-12a7-427f-9a5d-cf6e7335bf46/__spark_libs__3968994973591034858.zip
 to hdfs, while
in HOS16, it uploads 
/home/spark16/spark-1.6.2-bin-hadoop2-without-hive/lib/spark-assembly-1.6.2-hadoop2.6.0.jar
 to hdfs.

It is *not* clear *which* jars spark puts into 
__spark_libs__3968994973591034858.zip in HOS20. I will investigate it.



> The deserialization time of HOS20 is longer than what in  HOS16
> ---
>
> Key: HIVE-15259
> URL: https://issues.apache.org/jira/browse/HIVE-15259
> Project: Hive
>  Issue Type: Improvement
>Reporter: liyunzhang_intel
> Attachments: Deserialization_HOS16.PNG, Deserialization_HOS20.PNG
>
>
> Deployed Hive on Spark with Spark 1.6 and with Spark 2.0.
> Running a query, with the latest code (Spark 2.0) the deserialization time of 
> a task is 4 sec, while with Spark 1.6 it is 1 sec. The details are in the 
> attached pictures.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15148) disallow loading data into bucketed tables (by default)

2016-11-22 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-15148:
--
Labels: TODOC2.2  (was: )

> disallow loading data into bucketed tables (by default)
> ---
>
> Key: HIVE-15148
> URL: https://issues.apache.org/jira/browse/HIVE-15148
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-15148.01.patch, HIVE-15148.02.patch, 
> HIVE-15148.03.patch, HIVE-15148.04.patch, HIVE-15148.patch
>
>
> A few q file tests still use the following, allowed, pattern:
> {noformat}
> CREATE TABLE bucket_small (key string, value string) partitioned by (ds 
> string) CLUSTERED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE;
> load data local inpath '../../data/files/smallsrcsortbucket1outof4.txt' INTO 
> TABLE bucket_small partition(ds='2008-04-08');
> load data local inpath '../../data/files/smallsrcsortbucket2outof4.txt' INTO 
> TABLE bucket_small partition(ds='2008-04-08');
> {noformat}
> This relies on the user to load the correct number of files with correctly 
> hashed data and the correct order of file names; if there's some discrepancy 
> in any of the above, the queries will fail or may produce incorrect results 
> if some bucket-based optimizations kick in.
> Additionally, even if the user does everything correctly, as far as I know 
> some code derives bucket number from file name, which won't work in this case 
> (as opposed to getting buckets based on the order of files, which will work 
> here but won't work as per  HIVE-14970... sigh).
> Hive enforces bucketing in other cases (the check cannot even be disabled 
> these days), so I suggest that we either prohibit the above outright, or at 
> least add a safety config setting that would disallow it by default.
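> A minimal sketch (hypothetical, not the actual HiveConf/SemanticAnalyzer 
> code) of such a safety check:
> {code}
> class BucketingCheckSketch {
>   static void checkLoadIntoBucketedTable(boolean strictBucketingChecks,
>       boolean targetIsBucketed) {
>     if (strictBucketingChecks && targetIsBucketed) {
>       // Hive cannot verify that the loaded files are hashed and named
>       // correctly, so fail fast unless the user disables the check.
>       throw new IllegalArgumentException(
>           "Loading data into bucketed tables is disallowed by default; "
>           + "disable the strict bucketing check to allow it at your own risk.");
>     }
>   }
> }
> {code}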



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15148) disallow loading data into bucketed tables (by default)

2016-11-22 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15686088#comment-15686088
 ] 

Lefty Leverenz commented on HIVE-15148:
---

Doc note:  This adds *hive.strict.checks.bucketing* to HiveConf.java and 
changes the description of *hive.strict.checks.cartesian.product* (which isn't 
documented in the wiki yet -- see HIVE-12727).

* [Configuration Properties -- Query and DDL Execution | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-QueryandDDLExecution]

Added a TODOC2.2 label.

> disallow loading data into bucketed tables (by default)
> ---
>
> Key: HIVE-15148
> URL: https://issues.apache.org/jira/browse/HIVE-15148
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-15148.01.patch, HIVE-15148.02.patch, 
> HIVE-15148.03.patch, HIVE-15148.04.patch, HIVE-15148.patch
>
>
> A few q file tests still use the following, allowed, pattern:
> {noformat}
> CREATE TABLE bucket_small (key string, value string) partitioned by (ds 
> string) CLUSTERED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE;
> load data local inpath '../../data/files/smallsrcsortbucket1outof4.txt' INTO 
> TABLE bucket_small partition(ds='2008-04-08');
> load data local inpath '../../data/files/smallsrcsortbucket2outof4.txt' INTO 
> TABLE bucket_small partition(ds='2008-04-08');
> {noformat}
> This relies on the user to load the correct number of files with correctly 
> hashed data and the correct order of file names; if there's some discrepancy 
> in any of the above, the queries will fail or may produce incorrect results 
> if some bucket-based optimizations kick in.
> Additionally, even if the user does everything correctly, as far as I know 
> some code derives bucket number from file name, which won't work in this case 
> (as opposed to getting buckets based on the order of files, which will work 
> here but won't work as per  HIVE-14970... sigh).
> Hive enforces bucketing in other cases (the check cannot even be disabled 
> these days), so I suggest that we either prohibit the above outright, or at 
> least add a safety config setting that would disallow it by default.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12727) refactor Hive strict checks to be more granular, allow order by no limit and no partition filter by default for now

2016-11-22 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15686092#comment-15686092
 ] 

Lefty Leverenz commented on HIVE-12727:
---

HIVE-15148 changes the description of *hive.strict.checks.cartesian.product* in 
release 2.2.0.

> refactor Hive strict checks to be more granular, allow order by no limit and 
> no partition filter by default for now
> ---
>
> Key: HIVE-12727
> URL: https://issues.apache.org/jira/browse/HIVE-12727
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Blocker
>  Labels: TODOC2.0
> Fix For: 2.0.0
>
> Attachments: HIVE-12727.01.patch, HIVE-12727.02.patch, 
> HIVE-12727.03.patch, HIVE-12727.04.patch, HIVE-12727.05.patch, 
> HIVE-12727.06.patch, HIVE-12727.07.patch, HIVE-12727.patch
>
>
> Making strict mode the default recently appears to have broken many normal 
> queries, such as some TPCDS benchmark queries, e.g. Q85:
> Response message: org.apache.hive.service.cli.HiveSQLException: Error while 
> compiling statement: FAILED: SemanticException [Error 10041]: No partition 
> predicate found for Alias "web_sales" Table "web_returns"
> We should remove this restriction from strict mode, or change the default 
> back to non-strict. Perhaps make a 3-value parameter, nonstrict, semistrict, 
> and strict, for backward compat for people who are relying on strict already.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14982) Remove some reserved keywords in Hive 2.2

2016-11-22 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-14982:
--
Labels: TODOC2.2  (was: )

> Remove some reserved keywords in Hive 2.2
> -
>
> Key: HIVE-14982
> URL: https://issues.apache.org/jira/browse/HIVE-14982
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.0
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-14982.01.patch
>
>
> It seems that CACHE, DAYOFWEEK, and VIEWS are reserved keywords in master. 
> This conflicts with the SQL:2011 standard.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15260) Auto delete old log files of hive services

2016-11-22 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15686121#comment-15686121
 ] 

Thejas M Nair commented on HIVE-15260:
--

The default settings in common/src/main/resources/hive-log4j2.properties need 
to be updated.


> Auto delete old log files of hive services
> --
>
> Key: HIVE-15260
> URL: https://issues.apache.org/jira/browse/HIVE-15260
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Metastore
>Reporter: Thejas M Nair
>
> Hive log4j settings rotate the log files by date, but they don't delete the 
> old log files.
> It would be good to delete the old log files so that the space used doesn't 
> keep increasing forever.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15260) Auto delete old log files of hive services

2016-11-22 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15686125#comment-15686125
 ] 

Prasanth Jayachandran commented on HIVE-15260:
--

Per 
https://github.com/apache/hive/blob/master/common/src/main/resources/hive-log4j2.properties#L51,
 old log files already get deleted after 30 days.
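For reference, a plain-Java sketch of the retention behavior in question 
(log4j2's Delete action expresses this declaratively; the directory, file 
pattern, and 30-day threshold below are assumptions for illustration):
{code}
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.BasicFileAttributes;
import java.time.Instant;
import java.time.temporal.ChronoUnit;

public class OldLogCleaner {
  public static void main(String[] args) throws IOException {
    Path logDir = Paths.get("/var/log/hive");                   // assumed log dir
    Instant cutoff = Instant.now().minus(30, ChronoUnit.DAYS);  // 30-day retention
    try (DirectoryStream<Path> logs = Files.newDirectoryStream(logDir, "hive.log.*")) {
      for (Path p : logs) {
        BasicFileAttributes a = Files.readAttributes(p, BasicFileAttributes.class);
        if (a.lastModifiedTime().toInstant().isBefore(cutoff)) {
          Files.delete(p);  // rotated log older than the retention window
        }
      }
    }
  }
}
{code}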

> Auto delete old log files of hive services
> --
>
> Key: HIVE-15260
> URL: https://issues.apache.org/jira/browse/HIVE-15260
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Metastore
>Reporter: Thejas M Nair
>
> Hive log4j settings rotate the log files by date, but they don't delete the 
> old log files.
> It would be good to delete the old log files so that the space used doesn't 
> keep increasing forever.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14982) Remove some reserved keywords in Hive 2.2

2016-11-22 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15686127#comment-15686127
 ] 

Lefty Leverenz commented on HIVE-14982:
---

Doc note:  The table of keywords needs to be updated in the DDL doc.  DAYOFWEEK 
and VIEWS were added in 2.2.0 so they just need to be deleted from the 2.2.0 
row, but CACHE was added in 2.1.0 so a "removed" line needs to be added to the 
2.2.0 row.

* [DDL -- Reserved Keywords | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-ReservedKeywords]

Added a TODOC2.2 label.

> Remove some reserved keywords in Hive 2.2
> -
>
> Key: HIVE-14982
> URL: https://issues.apache.org/jira/browse/HIVE-14982
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.0
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-14982.01.patch
>
>
> It seems that CACHE, DAYOFWEEK, and VIEWS are reserved keywords in master. 
> This conflicts with the SQL:2011 standard.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-14816) Remove Ant jars from Hive installation

2016-11-22 Thread anishek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek reassigned HIVE-14816:
--

Assignee: anishek

> Remove Ant jars from Hive installation
> --
>
> Key: HIVE-14816
> URL: https://issues.apache.org/jira/browse/HIVE-14816
> Project: Hive
>  Issue Type: Bug
>  Components: Build Infrastructure
>Affects Versions: 1.2.1
>Reporter: Vipin Rathor
>Assignee: anishek
>Priority: Minor
>
> Apache Ant jars are build-time jars for the Hive project and are not actually 
> required to be packaged with the Hive libraries. The jars in question are 
> ant-launcher-1.9.1.jar and ant-1.9.1.jar, which get installed in $HIVE_LIB. 
> We'll need to remove these superfluous jar files.
> Thanks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-15260) Auto delete old log files of hive services

2016-11-22 Thread anishek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek reassigned HIVE-15260:
--

Assignee: anishek

> Auto delete old log files of hive services
> --
>
> Key: HIVE-15260
> URL: https://issues.apache.org/jira/browse/HIVE-15260
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Metastore
>Reporter: Thejas M Nair
>Assignee: anishek
>
> Hive log4j settings rotate the log files by date, but they don't delete the 
> old log files.
> It would be good to delete the old log files so that the space used doesn't 
> keep increasing forever.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15258) Enable CBO on queries involving interval literals

2016-11-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15686155#comment-15686155
 ] 

Hive QA commented on HIVE-15258:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12839953/HIVE-15258.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 17 failed/errored test(s), 10716 tests 
executed
*Failed tests:*
{noformat}
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=122)

[table_access_keys_stats.q,bucketmapjoin11.q,auto_join4.q,join34.q,nullgroup.q,mergejoins_mixed.q,sort.q,join_nullsafe.q,stats8.q,auto_join28.q,join17.q,union17.q,skewjoinopt11.q,groupby1_map.q,load_dyn_part11.q]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[interval_comparison] 
(batchId=68)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[interval_udf] 
(batchId=23)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=133)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[current_date_timestamp]
 (batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[join_acid_non_acid]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats]
 (batchId=145)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_interval_1]
 (batchId=138)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_interval_arithmetic]
 (batchId=136)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=91)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join_without_localtask]
 (batchId=92)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_sortmerge_join_3]
 (batchId=92)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[groupby6_noskew] 
(batchId=92)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[mapjoin_test_outer] 
(batchId=92)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[merge2] (batchId=92)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[smb_mapjoin_11] 
(batchId=92)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_9] 
(batchId=92)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2237/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2237/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2237/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 17 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12839953 - PreCommit-HIVE-Build

> Enable CBO on queries involving interval literals
> -
>
> Key: HIVE-15258
> URL: https://issues.apache.org/jira/browse/HIVE-15258
> Project: Hive
>  Issue Type: Bug
>  Components: CBO, Query Planning
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-15258.patch
>
>
> Currently, such queries fail and fall back to the non-CBO path.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15237) Propagate Spark job failure to Hive

2016-11-22 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-15237:
--
Status: Patch Available  (was: Open)

> Propagate Spark job failure to Hive
> ---
>
> Key: HIVE-15237
> URL: https://issues.apache.org/jira/browse/HIVE-15237
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 2.1.0
>Reporter: Xuefu Zhang
>Assignee: Rui Li
> Attachments: HIVE-15237.2.patch, HIVE-15237.patch
>
>
> If a Spark job fails for some reason, Hive doesn't get any additional error 
> message, which makes it very hard for the user to figure out why. Here is an 
> example:
> {code}
> Status: Running (Hive on Spark job[0])
> Job Progress Format
> CurrentTime StageId_StageAttemptId: 
> SucceededTasksCount(+RunningTasksCount-FailedTasksCount)/TotalTasksCount 
> [StageCost]
> 2016-11-17 21:32:53,134   Stage-0_0: 0/23 Stage-1_0: 0/28 
> 2016-11-17 21:32:55,156   Stage-0_0: 0(+1)/23 Stage-1_0: 0/28 
> 2016-11-17 21:32:57,167   Stage-0_0: 0(+3)/23 Stage-1_0: 0/28 
> 2016-11-17 21:33:00,216   Stage-0_0: 0(+3)/23 Stage-1_0: 0/28 
> 2016-11-17 21:33:03,251   Stage-0_0: 0(+3)/23 Stage-1_0: 0/28 
> 2016-11-17 21:33:06,286   Stage-0_0: 0(+4)/23 Stage-1_0: 0/28 
> 2016-11-17 21:33:09,308   Stage-0_0: 0(+2,-3)/23  Stage-1_0: 0/28 
> 2016-11-17 21:33:12,332   Stage-0_0: 0(+2,-3)/23  Stage-1_0: 0/28 
> 2016-11-17 21:33:13,338   Stage-0_0: 0(+21,-3)/23 Stage-1_0: 0/28 
> 2016-11-17 21:33:15,349   Stage-0_0: 0(+21,-5)/23 Stage-1_0: 0/28 
> 2016-11-17 21:33:16,358   Stage-0_0: 0(+18,-8)/23 Stage-1_0: 0/28 
> 2016-11-17 21:33:19,373   Stage-0_0: 0(+21,-8)/23 Stage-1_0: 0/28 
> 2016-11-17 21:33:22,400   Stage-0_0: 0(+18,-14)/23Stage-1_0: 0/28 
> 2016-11-17 21:33:23,404   Stage-0_0: 0(+15,-20)/23Stage-1_0: 0/28 
> 2016-11-17 21:33:24,408   Stage-0_0: 0(+12,-23)/23Stage-1_0: 0/28 
> 2016-11-17 21:33:25,417   Stage-0_0: 0(+9,-26)/23 Stage-1_0: 0/28 
> 2016-11-17 21:33:26,420   Stage-0_0: 0(+12,-26)/23Stage-1_0: 0/28 
> 2016-11-17 21:33:28,427   Stage-0_0: 0(+9,-29)/23 Stage-1_0: 0/28 
> 2016-11-17 21:33:29,432   Stage-0_0: 0(+12,-29)/23Stage-1_0: 0/28 
> 2016-11-17 21:33:31,444   Stage-0_0: 0(+18,-29)/23Stage-1_0: 0/28 
> 2016-11-17 21:33:34,464   Stage-0_0: 0(+18,-29)/23Stage-1_0: 0/28 
> Status: Failed
> FAILED: Execution Error, return code 3 from 
> org.apache.hadoop.hive.ql.exec.spark.SparkTask
> {code}
> It would be better if we could propagate the Spark error to Hive.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15237) Propagate Spark job failure to Hive

2016-11-22 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-15237:
--
Attachment: HIVE-15237.2.patch

Thanks [~xuefuz] for the patch. I made some modifications based on your work: 
I try to parse the root cause from the Throwable, then print the root cause to 
the console and leave the detailed message to the log.
Please try it and let me know what you think. Thanks.
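A minimal sketch of the approach (simplified; see the patch for the actual 
logic):
{code}
// Walk the Throwable chain to its root cause; print that to the console and
// leave the full stack trace in the log.
static Throwable rootCause(Throwable t) {
  Throwable cur = t;
  while (cur.getCause() != null && cur.getCause() != cur) {
    cur = cur.getCause();
  }
  return cur;
}
// e.g. console: "Spark job failed: " + rootCause(e).getMessage()
//      log:     LOG.error("Spark job failed", e)   // full details
{code}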

> Propagate Spark job failure to Hive
> ---
>
> Key: HIVE-15237
> URL: https://issues.apache.org/jira/browse/HIVE-15237
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 2.1.0
>Reporter: Xuefu Zhang
>Assignee: Rui Li
> Attachments: HIVE-15237.2.patch, HIVE-15237.patch
>
>
> If a Spark job fails for some reason, Hive doesn't get any additional error 
> message, which makes it very hard for the user to figure out why. Here is an 
> example:
> {code}
> Status: Running (Hive on Spark job[0])
> Job Progress Format
> CurrentTime StageId_StageAttemptId: 
> SucceededTasksCount(+RunningTasksCount-FailedTasksCount)/TotalTasksCount 
> [StageCost]
> 2016-11-17 21:32:53,134   Stage-0_0: 0/23 Stage-1_0: 0/28 
> 2016-11-17 21:32:55,156   Stage-0_0: 0(+1)/23 Stage-1_0: 0/28 
> 2016-11-17 21:32:57,167   Stage-0_0: 0(+3)/23 Stage-1_0: 0/28 
> 2016-11-17 21:33:00,216   Stage-0_0: 0(+3)/23 Stage-1_0: 0/28 
> 2016-11-17 21:33:03,251   Stage-0_0: 0(+3)/23 Stage-1_0: 0/28 
> 2016-11-17 21:33:06,286   Stage-0_0: 0(+4)/23 Stage-1_0: 0/28 
> 2016-11-17 21:33:09,308   Stage-0_0: 0(+2,-3)/23  Stage-1_0: 0/28 
> 2016-11-17 21:33:12,332   Stage-0_0: 0(+2,-3)/23  Stage-1_0: 0/28 
> 2016-11-17 21:33:13,338   Stage-0_0: 0(+21,-3)/23 Stage-1_0: 0/28 
> 2016-11-17 21:33:15,349   Stage-0_0: 0(+21,-5)/23 Stage-1_0: 0/28 
> 2016-11-17 21:33:16,358   Stage-0_0: 0(+18,-8)/23 Stage-1_0: 0/28 
> 2016-11-17 21:33:19,373   Stage-0_0: 0(+21,-8)/23 Stage-1_0: 0/28 
> 2016-11-17 21:33:22,400   Stage-0_0: 0(+18,-14)/23Stage-1_0: 0/28 
> 2016-11-17 21:33:23,404   Stage-0_0: 0(+15,-20)/23Stage-1_0: 0/28 
> 2016-11-17 21:33:24,408   Stage-0_0: 0(+12,-23)/23Stage-1_0: 0/28 
> 2016-11-17 21:33:25,417   Stage-0_0: 0(+9,-26)/23 Stage-1_0: 0/28 
> 2016-11-17 21:33:26,420   Stage-0_0: 0(+12,-26)/23Stage-1_0: 0/28 
> 2016-11-17 21:33:28,427   Stage-0_0: 0(+9,-29)/23 Stage-1_0: 0/28 
> 2016-11-17 21:33:29,432   Stage-0_0: 0(+12,-29)/23Stage-1_0: 0/28 
> 2016-11-17 21:33:31,444   Stage-0_0: 0(+18,-29)/23Stage-1_0: 0/28 
> 2016-11-17 21:33:34,464   Stage-0_0: 0(+18,-29)/23Stage-1_0: 0/28 
> Status: Failed
> FAILED: Execution Error, return code 3 from 
> org.apache.hadoop.hive.ql.exec.spark.SparkTask
> {code}
> It would be better if we could propagate the Spark error to Hive.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15247) Pass the purge option for drop table to storage handlers

2016-11-22 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15686254#comment-15686254
 ] 

Jesus Camacho Rodriguez commented on HIVE-15247:


+1

> Pass the purge option for drop table to storage handlers
> 
>
> Key: HIVE-15247
> URL: https://issues.apache.org/jira/browse/HIVE-15247
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1, 2.0.0, 2.1.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-15247.patch
>
>
> This gives storage handler more control on how to handle drop table.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15259) The deserialization time of HOS20 is longer than what in HOS16

2016-11-22 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15686282#comment-15686282
 ] 

Rui Li commented on HIVE-15259:
---

With Spark 2.0, you don't have to copy all the jars to the Hive lib. Please 
refer to our wiki: 
https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started.

The log you posted is from Spark. I guess it uploads the required jars to 
prepare the classpath for the containers. I'm not sure whether that's related 
to the deserialization time. And on the Hive side we don't have much control 
over it - we basically just specify that hive-exec is needed; the rest is up 
to Spark.

> The deserialization time of HOS20 is longer than what in  HOS16
> ---
>
> Key: HIVE-15259
> URL: https://issues.apache.org/jira/browse/HIVE-15259
> Project: Hive
>  Issue Type: Improvement
>Reporter: liyunzhang_intel
> Attachments: Deserialization_HOS16.PNG, Deserialization_HOS20.PNG
>
>
> Deployed Hive on Spark with Spark 1.6 and with Spark 2.0.
> Running a query, with the latest code (Spark 2.0) the deserialization time of 
> a task is 4 sec, while with Spark 1.6 it is 1 sec. The details are in the 
> attached pictures.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14455) upgrade httpclient, httpcore to match updated hadoop dependency

2016-11-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15686300#comment-15686300
 ] 

Hive QA commented on HIVE-14455:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12839958/HIVE-14455.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10701 tests 
executed
*Failed tests:*
{noformat}
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=120)

[groupby3_map.q,union11.q,union26.q,mapreduce1.q,mapjoin_addjar.q,bucket_map_join_spark1.q,udf_example_add.q,multi_insert_with_join.q,sample7.q,auto_join_nulls.q,ppd_outer_join4.q,load_dyn_part8.q,sample6.q,bucket_map_join_1.q,auto_sortmerge_join_9.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=122)

[table_access_keys_stats.q,bucketmapjoin11.q,auto_join4.q,join34.q,nullgroup.q,mergejoins_mixed.q,sort.q,join_nullsafe.q,stats8.q,auto_join28.q,join17.q,union17.q,skewjoinopt11.q,groupby1_map.q,load_dyn_part11.q]
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=133)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype]
 (batchId=148)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[join_acid_non_acid]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats]
 (batchId=145)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=91)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2238/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2238/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2238/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12839958 - PreCommit-HIVE-Build

> upgrade httpclient, httpcore to match updated hadoop dependency
> ---
>
> Key: HIVE-14455
> URL: https://issues.apache.org/jira/browse/HIVE-14455
> Project: Hive
>  Issue Type: Bug
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Attachments: HIVE-14455.1.patch, HIVE-14455.1.patch
>
>
> Hive has shipped a newer version of httpclient and httpcore than Hadoop 2.x 
> since 1.2.0 (HIVE-9709), in order to make use of newer APIs in httpclient 4.4.
> There was a security issue in the older versions of httpclient and httpcore 
> that Hadoop was using, and as a result Hadoop moved to httpclient 4.5.2 and 
> httpcore 4.4.4 (HADOOP-12767).
> Because Hadoop was using the older versions of these libraries and they often 
> end up earlier in the classpath, we have had a bunch of difficulties in 
> different environments with class/method-not-found errors. 
> Now that Hadoop's dependencies in the versions with the security fix are 
> newer and have the API that Hive needs, we can be on the same version. For 
> older versions of Hadoop this version update doesn't matter, as the 
> difference is already there.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15168) Flaky test: TestSparkClient.testJobSubmission (still flaky)

2016-11-22 Thread Barna Zsombor Klara (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15686364#comment-15686364
 ] 

Barna Zsombor Klara commented on HIVE-15168:


I'm not sure these failures are related.
These are known flaky tests:
explainanalyze_2 - https://issues.apache.org/jira/browse/HIVE-15084
transform_ppr2 - https://issues.apache.org/jira/browse/HIVE-15201
orc_ppd_schema_evol_3a - https://issues.apache.org/jira/browse/HIVE-14936
These are failing rather consistently:
union_fast_stats - https://issues.apache.org/jira/browse/HIVE-15115
join_acid_non_acid - https://issues.apache.org/jira/browse/HIVE-15116

auto_sortmerge_join_2 is not identified as flaky, but the failure is in MR; I 
don't think my changes to the SparkClient could have caused it. And it did not 
fail on the first run.
{code}
< FAILED: Execution Error, return code 3 from 
org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask
< ATTEMPT: Execute BackupTask: org.apache.hadoop.hive.ql.exec.mr.MapRedTask
{code}

This leaves TestSparkCliDriver, which I cannot repro locally.
From the test report it seems to me that the error is happening during the 
test setup:
{code}
java.lang.AssertionError: Failed during createSources processLine with code=3
{code}
But the hive log has a different failure:
{code}
2016-11-21T08:36:57,751  INFO [stderr-redir-1] client.SparkClientImpl: 16/11/21 
08:36:57 WARN TaskSetManager: Lost task 0.0 in stage 46.0 (TID 53, 
10.234.144.78): java.io.IOException: Failed to create local dir in 
/tmp/spark-8a7bd913-fca5-4990-ad09-c9eff4dacae0/executor-e85f9833-b0ab-47ed-bb95-56dc9ef64177/blockmgr-044ca916-76f1-402f-80ed-c4ec5fd4d544/2b.
{code}
I'm not sure if either can be related to the refactoring in the SparkClientImpl.

> Flaky test: TestSparkClient.testJobSubmission (still flaky)
> ---
>
> Key: HIVE-15168
> URL: https://issues.apache.org/jira/browse/HIVE-15168
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
> Fix For: 2.2.0
>
> Attachments: HIVE-15168.patch
>
>
> [HIVE-14910|https://issues.apache.org/jira/browse/HIVE-14910] already 
> addressed one source of flakiness, but sadly not all of it, it seems.
> In JobHandleImpl the listeners are registered after the job has been 
> submitted.
> This may end up in a race condition.
> {code}
>   // Link the RPC and the promise so that events from one are propagated to the other as
>   // needed.
>   rpc.addListener(new GenericFutureListener<io.netty.util.concurrent.Future<Void>>() {
>     @Override
>     public void operationComplete(io.netty.util.concurrent.Future<Void> f) {
>       if (f.isSuccess()) {
>         handle.changeState(JobHandle.State.QUEUED);
>       } else if (!promise.isDone()) {
>         promise.setFailure(f.cause());
>       }
>     }
>   });
>   promise.addListener(new GenericFutureListener<Promise<T>>() {
>     @Override
>     public void operationComplete(Promise<T> p) {
>       if (jobId != null) {
>         jobs.remove(jobId);
>       }
>       if (p.isCancelled() && !rpc.isDone()) {
>         rpc.cancel(true);
>       }
>     }
>   });
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15237) Propagate Spark job failure to Hive

2016-11-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15686449#comment-15686449
 ] 

Hive QA commented on HIVE-15237:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12839982/HIVE-15237.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10701 tests 
executed
*Failed tests:*
{noformat}
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=102)

[skewjoinopt3.q,smb_mapjoin_4.q,timestamp_comparison.q,union_remove_10.q,mapreduce2.q,bucketmapjoin_negative.q,udf_in_file.q,auto_join12.q,skewjoin.q,vector_left_outer_join.q,semijoin.q,skewjoinopt9.q,smb_mapjoin_3.q,stats10.q,nullgroup4.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=112)

[escape_distributeby1.q,join9.q,groupby2.q,groupby4_map.q,udf_max.q,vectorization_pushdown.q,cbo_gby_empty.q,join_cond_pushdown_unqual3.q,vectorization_short_regress.q,join8.q,stats5.q,sample10.q,cross_product_check_1.q,auto_join_stats.q,input_part2.q]
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=133)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[join_acid_non_acid]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats]
 (batchId=145)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_4] 
(batchId=91)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2240/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2240/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2240/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12839982 - PreCommit-HIVE-Build

> Propagate Spark job failure to Hive
> ---
>
> Key: HIVE-15237
> URL: https://issues.apache.org/jira/browse/HIVE-15237
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 2.1.0
>Reporter: Xuefu Zhang
>Assignee: Rui Li
> Attachments: HIVE-15237.2.patch, HIVE-15237.patch
>
>
> If a Spark job fails for some reason, Hive doesn't get any additional error 
> message, which makes it very hard for the user to figure out why. Here is an 
> example:
> {code}
> Status: Running (Hive on Spark job[0])
> Job Progress Format
> CurrentTime StageId_StageAttemptId: 
> SucceededTasksCount(+RunningTasksCount-FailedTasksCount)/TotalTasksCount 
> [StageCost]
> 2016-11-17 21:32:53,134   Stage-0_0: 0/23 Stage-1_0: 0/28 
> 2016-11-17 21:32:55,156   Stage-0_0: 0(+1)/23 Stage-1_0: 0/28 
> 2016-11-17 21:32:57,167   Stage-0_0: 0(+3)/23 Stage-1_0: 0/28 
> 2016-11-17 21:33:00,216   Stage-0_0: 0(+3)/23 Stage-1_0: 0/28 
> 2016-11-17 21:33:03,251   Stage-0_0: 0(+3)/23 Stage-1_0: 0/28 
> 2016-11-17 21:33:06,286   Stage-0_0: 0(+4)/23 Stage-1_0: 0/28 
> 2016-11-17 21:33:09,308   Stage-0_0: 0(+2,-3)/23  Stage-1_0: 0/28 
> 2016-11-17 21:33:12,332   Stage-0_0: 0(+2,-3)/23  Stage-1_0: 0/28 
> 2016-11-17 21:33:13,338   Stage-0_0: 0(+21,-3)/23 Stage-1_0: 0/28 
> 2016-11-17 21:33:15,349   Stage-0_0: 0(+21,-5)/23 Stage-1_0: 0/28 
> 2016-11-17 21:33:16,358   Stage-0_0: 0(+18,-8)/23 Stage-1_0: 0/28 
> 2016-11-17 21:33:19,373   Stage-0_0: 0(+21,-8)/23 Stage-1_0: 0/28 
> 2016-11-17 21:33:22,400   Stage-0_0: 0(+18,-14)/23Stage-1_0: 0/28 
> 2016-11-17 21:33:23,404   Stage-0_0: 0(+15,-20)/23Stage-1_0: 0/28 
> 2016-11-17 21:33:24,408   Stage-0_0: 0(+12,-23)/23Stage-1_0: 0/28 
> 2016-11-17 21:33:25,417   Stage-0_0: 0(+9,-26)/23 Stage-1_0: 0/28 
> 2016-11-17 21:33:26,420   Stage-0_0: 0(+12,-26)/23Stage-1_0: 0/28 
> 2016-11-17 21:33:28,427   Stage-0_0: 0(+9,-29)/23 Stage-1_0: 0/28 
> 2016-11-17 21:33:29,432   Stage-0_0: 0(+12,-29)/23Stage-1_0: 0/28 
> 2016-11-17 21:33:31,444   Stage-0_0: 0(+18,-29)/23Stage-1_0: 0/28 
> 2016-11-17 21:33:34,464   Stage-0_0: 0(+18,-29)/23Stage-1_0: 0/28 
> Status: Failed
> FAILED: Execution Error, return code 3 from 
> org.apache.hadoop.hive.ql.exec.spark.SparkTask
> {code}
> It would be better if we could propagate the Spark error to Hive.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15240) Updating/Altering stats in metastore can be expensive in S3

2016-11-22 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-15240:

Status: Open  (was: Patch Available)

> Updating/Altering stats in metastore can be expensive in S3
> ---
>
> Key: HIVE-15240
> URL: https://issues.apache.org/jira/browse/HIVE-15240
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-15240.1.patch, HIVE-15240.2.patch
>
>
> https://github.com/apache/hive/blob/master/metastore/src/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java#L630
> https://github.com/apache/hive/blob/master/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java#L367
> If there are 100 partitions, it iterates over every partition to determine 
> its location, which takes up a good amount of time.
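> A sketch of the per-partition pattern in question (real Hive/Hadoop types, 
> but the loop body is illustrative only):
> {code}
> import java.io.IOException;
> import java.util.List;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.hive.metastore.api.Partition;
>
> class PartitionLocationScan {
>   // Each partition's location triggers a FileSystem operation; against S3
>   // that is a remote round trip, so cost grows linearly with partition count.
>   static void scan(List<Partition> parts, Configuration conf) throws IOException {
>     for (Partition p : parts) {
>       Path loc = new Path(p.getSd().getLocation());
>       FileSystem fs = loc.getFileSystem(conf);  // per-partition lookup
>       fs.exists(loc);                           // stand-in for the per-partition work
>     }
>   }
> }
> {code}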



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15261) Exception in thread "main" java.lang.IllegalArgumentException: Unrecognized Hadoop major version number: 3.0.0-alpha1

2016-11-22 Thread R (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15686569#comment-15686569
 ] 

R commented on HIVE-15261:
--

I tried to run schematool -dbType mysql -initSchema and got the below error:

{noformat}
Exception in thread "main" java.lang.IllegalArgumentException: Unrecognized 
Hadoop major version number: 3.0.0-alpha1
        at org.apache.hadoop.hive.shims.ShimLoader.getMajorVersion(ShimLoader.java:165)
        at org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:132)
        at org.apache.hadoop.hive.shims.ShimLoader.getHadoopShims(ShimLoader.java:93)
        at org.apache.hive.beeline.HiveSchemaTool.<init>(HiveSchemaTool.java:81)
        at org.apache.hive.beeline.HiveSchemaTool.<init>(HiveSchemaTool.java:68)
        at org.apache.hive.beeline.HiveSchemaTool.main(HiveSchemaTool.java:480)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
{noformat}
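For context, a simplified reconstruction (not the exact Hive source) of why 
the version string is rejected: ShimLoader.getMajorVersion only recognizes the 
Hadoop major versions it has shims for, so "3.0.0-alpha1" falls through to the 
error above.
{code}
static String getMajorVersion(String version) {   // e.g. "3.0.0-alpha1"
  String[] parts = version.split("\\.");
  switch (Integer.parseInt(parts[0])) {
    case 2:
      return "0.23";  // Hadoop 2.x is served by the 0.23-line shims
    default:
      throw new IllegalArgumentException(
          "Unrecognized Hadoop major version number: " + version);
  }
}
{code}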


> Exception in thread "main" java.lang.IllegalArgumentException: Unrecognized 
> Hadoop major version number: 3.0.0-alpha1
> -
>
> Key: HIVE-15261
> URL: https://issues.apache.org/jira/browse/HIVE-15261
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.0.1
>Reporter: R
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15168) Flaky test: TestSparkClient.testJobSubmission (still flaky)

2016-11-22 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15686633#comment-15686633
 ] 

Rui Li commented on HIVE-15168:
---

[~zsombor.klara], not sure if I correctly understand your explanation about the 
race condition, but looking at the doc of addListener,
{code}
/**
 * Adds the specified listener to this future.  The
 * specified listener is notified when this future is
 * {@linkplain #isDone() done}.  If this future is already
 * completed, the specified listener is notified immediately.
 */
Future<V> addListener(GenericFutureListener<? extends Future<? super V>> 
listener);
{code}
It seems the listener will be notified even if it's registered after the 
future has completed?
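A quick standalone check (a Netty 4 sketch, not Hive code) that a listener 
added after a promise completes still fires immediately:
{code}
import io.netty.util.concurrent.DefaultEventExecutor;
import io.netty.util.concurrent.EventExecutor;
import io.netty.util.concurrent.Promise;

public class LateListenerDemo {
  public static void main(String[] args) {
    EventExecutor executor = new DefaultEventExecutor();
    Promise<String> promise = executor.newPromise();
    promise.setSuccess("done");               // complete the future first...
    promise.addListener(f ->                  // ...then register the listener
        System.out.println("notified, result = " + f.getNow()));
    executor.shutdownGracefully();
  }
}
{code}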

> Flaky test: TestSparkClient.testJobSubmission (still flaky)
> ---
>
> Key: HIVE-15168
> URL: https://issues.apache.org/jira/browse/HIVE-15168
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
> Fix For: 2.2.0
>
> Attachments: HIVE-15168.patch
>
>
> [HIVE-14910|https://issues.apache.org/jira/browse/HIVE-14910] already 
> addressed one source of flakiness, but sadly not all of it, it seems.
> In JobHandleImpl the listeners are registered after the job has been 
> submitted.
> This may end up in a race condition.
> {code}
>   // Link the RPC and the promise so that events from one are propagated to the other as
>   // needed.
>   rpc.addListener(new GenericFutureListener<io.netty.util.concurrent.Future<Void>>() {
>     @Override
>     public void operationComplete(io.netty.util.concurrent.Future<Void> f) {
>       if (f.isSuccess()) {
>         handle.changeState(JobHandle.State.QUEUED);
>       } else if (!promise.isDone()) {
>         promise.setFailure(f.cause());
>       }
>     }
>   });
>   promise.addListener(new GenericFutureListener<Promise<T>>() {
>     @Override
>     public void operationComplete(Promise<T> p) {
>       if (jobId != null) {
>         jobs.remove(jobId);
>       }
>       if (p.isCancelled() && !rpc.isDone()) {
>         rpc.cancel(true);
>       }
>     }
>   });
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15168) Flaky test: TestSparkClient.testJobSubmission (still flaky)

2016-11-22 Thread Barna Zsombor Klara (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15686732#comment-15686732
 ] 

Barna Zsombor Klara commented on HIVE-15168:


That is indeed a very good point. Maybe I misunderstood the race condition 
here. I will cancel the patch until I can investigate a bit further. You are 
right that according to the javadoc the listeners should get notified even if 
they are registered after the future has completed.

> Flaky test: TestSparkClient.testJobSubmission (still flaky)
> ---
>
> Key: HIVE-15168
> URL: https://issues.apache.org/jira/browse/HIVE-15168
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
> Fix For: 2.2.0
>
> Attachments: HIVE-15168.patch
>
>
> [HIVE-14910|https://issues.apache.org/jira/browse/HIVE-14910] already 
> addressed one source of flakiness, but sadly not all of it, it seems.
> In JobHandleImpl the listeners are registered after the job has been 
> submitted.
> This may end up in a race condition.
> {code}
>   // Link the RPC and the promise so that events from one are propagated to the other as
>   // needed.
>   rpc.addListener(new GenericFutureListener<io.netty.util.concurrent.Future<Void>>() {
>     @Override
>     public void operationComplete(io.netty.util.concurrent.Future<Void> f) {
>       if (f.isSuccess()) {
>         handle.changeState(JobHandle.State.QUEUED);
>       } else if (!promise.isDone()) {
>         promise.setFailure(f.cause());
>       }
>     }
>   });
>   promise.addListener(new GenericFutureListener<Promise<T>>() {
>     @Override
>     public void operationComplete(Promise<T> p) {
>       if (jobId != null) {
>         jobs.remove(jobId);
>       }
>       if (p.isCancelled() && !rpc.isDone()) {
>         rpc.cancel(true);
>       }
>     }
>   });
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15168) Flaky test: TestSparkClient.testJobSubmission (still flaky)

2016-11-22 Thread Barna Zsombor Klara (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barna Zsombor Klara updated HIVE-15168:
---
Status: Open  (was: Patch Available)

> Flaky test: TestSparkClient.testJobSubmission (still flaky)
> ---
>
> Key: HIVE-15168
> URL: https://issues.apache.org/jira/browse/HIVE-15168
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
> Fix For: 2.2.0
>
> Attachments: HIVE-15168.patch
>
>
> [HIVE-14910|https://issues.apache.org/jira/browse/HIVE-14910] already 
> addressed one source of flakiness, but sadly not all of it, it seems.
> In JobHandleImpl the listeners are registered after the job has been 
> submitted.
> This may end up in a race condition.
> {code}
>   // Link the RPC and the promise so that events from one are propagated to the other as
>   // needed.
>   rpc.addListener(new GenericFutureListener<io.netty.util.concurrent.Future<Void>>() {
>     @Override
>     public void operationComplete(io.netty.util.concurrent.Future<Void> f) {
>       if (f.isSuccess()) {
>         handle.changeState(JobHandle.State.QUEUED);
>       } else if (!promise.isDone()) {
>         promise.setFailure(f.cause());
>       }
>     }
>   });
>   promise.addListener(new GenericFutureListener<Promise<T>>() {
>     @Override
>     public void operationComplete(Promise<T> p) {
>       if (jobId != null) {
>         jobs.remove(jobId);
>       }
>       if (p.isCancelled() && !rpc.isDone()) {
>         rpc.cancel(true);
>       }
>     }
>   });
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14582) Add trunc(numeric) udf

2016-11-22 Thread Chinna Rao Lalam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chinna Rao Lalam updated HIVE-14582:

Attachment: HIVE-14582.3.patch

> Add trunc(numeric) udf
> --
>
> Key: HIVE-14582
> URL: https://issues.apache.org/jira/browse/HIVE-14582
> Project: Hive
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Ashutosh Chauhan
>Assignee: Chinna Rao Lalam
> Attachments: HIVE-14582.1.patch, HIVE-14582.2.patch, 
> HIVE-14582.3.patch, HIVE-14582.patch
>
>
> https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions200.htm
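> A minimal sketch of the numeric TRUNC semantics from that page (illustrative 
> only, not the Hive GenericUDF implementation):
> {code}
> import java.math.BigDecimal;
> import java.math.RoundingMode;
>
> class TruncSketch {
>   // Truncates toward zero at d digits after the decimal point, e.g.
>   // trunc(15.79, 1) = 15.7, trunc(15.79, -1) = 10, trunc(15.79, 0) = 15
>   static BigDecimal trunc(BigDecimal n, int d) {
>     return n.setScale(d, RoundingMode.DOWN);
>   }
> }
> {code}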



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14582) Add trunc(numeric) udf

2016-11-22 Thread Chinna Rao Lalam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15686781#comment-15686781
 ] 

Chinna Rao Lalam commented on HIVE-14582:
-

Created RB request: https://reviews.apache.org/r/53983/

> Add trunc(numeric) udf
> --
>
> Key: HIVE-14582
> URL: https://issues.apache.org/jira/browse/HIVE-14582
> Project: Hive
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Ashutosh Chauhan
>Assignee: Chinna Rao Lalam
> Attachments: HIVE-14582.1.patch, HIVE-14582.2.patch, 
> HIVE-14582.3.patch, HIVE-14582.patch
>
>
> https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions200.htm



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15262) Drill 1.8 UI doesn't display Hive join query results

2016-11-22 Thread Gopal Nagar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal Nagar updated HIVE-15262:
---
Attachment: Drill_log.txt

> Drill 1.8 UI doesn't display Hive join query results
> 
>
> Key: HIVE-15262
> URL: https://issues.apache.org/jira/browse/HIVE-15262
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Web UI
>Reporter: Gopal Nagar
> Attachments: Drill_log.txt
>
>
> Hi All,
> I am using Apache Drill 1.8.0 on AWS EMR and joining two Hive tables. Below 
> is a sample query. It works fine in the Drill CLI but gives the error below 
> after running for a few minutes. If I try a simple select query (select 
> t1.col from hive.table t1) it works fine in both the Drill CLI and the UI. 
> The problem is only with the join query.
> If I cancel the join query from the background, it displays results in the 
> UI. This is a very strange situation.
> Drill has been installed on an AWS node which has 32 GB RAM and 80 GB 
> storage. I didn't specify memory separately for Drill. I am trying to join 
> two tables that have 4607818 and 14273378 rows respectively. Please find the 
> attached drillbit.log file as well.
> My only confusion here is: the Drill CLI works fine with this data and shows 
> the output, so why does the UI throw an error instead of displaying it? While 
> running the query from the UI, if I cancel it from the background, the UI 
> immediately displays the result.
> I believe I am missing some configuration here (something that holds the 
> output in a buffer and displays the result after cancellation) and need your 
> help.
> Join Query 
> --
> select t1.col FROM hive.table1 as t1 join hive.table2 as t2 on t1.col = 
> t2.col limit 1000;
> Error
> ---
> Query Failed: An Error Occurred 
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: 
> RpcException: Data not accepted downstream. Fragment 1:4 [Error Id: 
> 0b5ed2db-3653-4e3a-9c92-d0a6cd69b66e on 
> ip-172-31-16-222.us-west-2.compute.internal:31010]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11072) Add data validation between Hive metastore upgrades tests

2016-11-22 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-11072:

   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Pushed to master. Thanks Naveen for the work and Chaoyu for reviewing.

> Add data validation between Hive metastore upgrades tests
> -
>
> Key: HIVE-11072
> URL: https://issues.apache.org/jira/browse/HIVE-11072
> Project: Hive
>  Issue Type: New Feature
>  Components: Tests
>Reporter: Sergio Peña
>Assignee: Naveen Gangam
> Fix For: 2.2.0
>
> Attachments: HIVE-11072.1.patch, HIVE-11072.2.patch, 
> HIVE-11072.3.patch, HIVE-11072.4.patch, HIVE-11072.5.patch, 
> HIVE-11072.to-be-committed.patch
>
>
> An existing Hive metastore upgrade test runs on Hive jenkins. However,
> these scripts test only the database schema upgrade, not data validation
> between upgrades.
> We should validate data between metastore version upgrades. With data
> validation, we can ensure that data won't be damaged or corrupted when
> upgrading the Hive metastore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14582) Add trunc(numeric) udf

2016-11-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15686916#comment-15686916
 ] 

Hive QA commented on HIVE-14582:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12840024/HIVE-14582.3.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 10732 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=134)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[join_acid_non_acid]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats]
 (batchId=146)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=92)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2241/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2241/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2241/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12840024 - PreCommit-HIVE-Build

> Add trunc(numeric) udf
> --
>
> Key: HIVE-14582
> URL: https://issues.apache.org/jira/browse/HIVE-14582
> Project: Hive
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Ashutosh Chauhan
>Assignee: Chinna Rao Lalam
> Attachments: HIVE-14582.1.patch, HIVE-14582.2.patch, 
> HIVE-14582.3.patch, HIVE-14582.patch
>
>
> https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions200.htm



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15263) Detect the values for incorrect NULL values

2016-11-22 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-15263:

Attachment: HIVE-15263.1.patch

> Detect the values for incorrect NULL values
> ---
>
> Key: HIVE-15263
> URL: https://issues.apache.org/jira/browse/HIVE-15263
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Affects Versions: 2.2.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-15263.1.patch
>
>
> We have seen incorrect NULL values for SD_ID in TBLS for Hive tables.
> That column is nullable since it will be NULL for Hive views.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-15199) INSERT INTO data on S3 is replacing the old rows with the new ones

2016-11-22 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15684283#comment-15684283
 ] 

Steve Loughran edited comment on HIVE-15199 at 11/22/16 3:19 PM:
-

You are right, I am wrong: serves me right for commenting without staring at
the code; I got confused by the naming.

You should be invoking

{code}
RemoteIterator<LocatedFileStatus> listFiles(Path f, boolean recursive)
{code}
with recursive = true.

The default implementation is a standard recursive treewalk; on object stores
we can do an O(1) listing of all child files, irrespective of directory depth
and width. For anything other than a flat directory, this is a significant speedup


was (Author: ste...@apache.org):
You are right, I am wrong: serves me right for commenting without staring at
the code; I got confused by the naming.

You should be invoking

{code}
RemoteIterator<LocatedFileStatus> listFiles(Path f, boolean recursive)
{Code}
with recursive = true.

The default implementation is a standard recursive treewalk; on object stores
we can do an O(1) listing of all child files, irrespective of directory depth
and width. For anything other than a flat directory, this is a significant speedup
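
For reference, a minimal sketch of the recursive listing described above (the bucket path is hypothetical; {{listFiles}} is the standard Hadoop {{FileSystem}} API):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

public class RecursiveListing {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path root = new Path("s3a://example-bucket/warehouse/t1"); // hypothetical path
    FileSystem fs = root.getFileSystem(conf);
    // recursive = true: on s3a this is one flat object listing, not a treewalk
    RemoteIterator<LocatedFileStatus> it = fs.listFiles(root, true);
    while (it.hasNext()) {
      LocatedFileStatus status = it.next();
      System.out.println(status.getPath() + " (" + status.getLen() + " bytes)");
    }
  }
}
{code}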

> INSERT INTO data on S3 is replacing the old rows with the new ones
> --
>
> Key: HIVE-15199
> URL: https://issues.apache.org/jira/browse/HIVE-15199
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Sergio Peña
>Assignee: Sergio Peña
>Priority: Critical
> Attachments: HIVE-15199.1.patch, HIVE-15199.2.patch, 
> HIVE-15199.3.patch, HIVE-15199.4.patch, HIVE-15199.5.patch, 
> HIVE-15199.6.patch, HIVE-15199.7.patch, HIVE-15199.8.patch
>
>
> Any INSERT INTO statement run on an S3 table when the scratch directory is
> saved on S3 deletes the old rows of the table.
> {noformat}
> hive> set hive.blobstore.use.blobstore.as.scratchdir=true;
> hive> create table t1 (id int, name string) location 's3a://spena-bucket/t1';
> hive> insert into table t1 values (1,'name1');
> hive> select * from t1;
> 1   name1
> hive> insert into table t1 values (2,'name2');
> hive> select * from t1;
> 2   name2
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15263) Detect the values for incorrect NULL values

2016-11-22 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-15263:

Status: Patch Available  (was: Open)

patch-1: check the values of SD_ID in TBLS; Hive tables should have a valid
SD_ID.

Also fixed the closing of the BufferedReader and Beeline objects, which could
cause leaks. And handled a possibly NULL columnDescriptor (related logic and
the schema suggest it can be null) by adding handling for that case.
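
As a rough sketch of the kind of check involved, assuming a JDBC connection to the metastore database (connection details are hypothetical; the real check lives in the patch):

{code}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class NullSdIdCheck {
  public static void main(String[] args) throws SQLException {
    // Hypothetical metastore DB connection; real values come from configuration.
    try (Connection conn = DriverManager.getConnection(
            "jdbc:mysql://metastore-host/hive", "hive", "password");
         Statement st = conn.createStatement();
         // Views legitimately have NULL SD_ID, so they are excluded.
         ResultSet rs = st.executeQuery(
            "SELECT TBL_ID, TBL_NAME FROM TBLS "
          + "WHERE SD_ID IS NULL AND TBL_TYPE <> 'VIRTUAL_VIEW'")) {
      while (rs.next()) {
        System.out.printf("Table %s (id=%d) has an incorrect NULL SD_ID%n",
            rs.getString("TBL_NAME"), rs.getLong("TBL_ID"));
      }
    }
  }
}
{code}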

> Detect the values for incorrect NULL values
> ---
>
> Key: HIVE-15263
> URL: https://issues.apache.org/jira/browse/HIVE-15263
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Affects Versions: 2.2.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-15263.1.patch
>
>
> We have seen incorrect NULL values for SD_ID in TBLS for Hive tables.
> That column is nullable since it will be NULL for Hive views.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15073) Schematool should detect malformed URIs

2016-11-22 Thread Yongzhi Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongzhi Chen updated HIVE-15073:

   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Committed to master.
Thanks Aihua for reviewing the code.

> Schematool should detect malformed URIs
> ---
>
> Key: HIVE-15073
> URL: https://issues.apache.org/jira/browse/HIVE-15073
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
> Fix For: 2.2.0
>
> Attachments: HIVE-15073.1.patch, HIVE-15073.2.patch
>
>
> For various (mostly unknown) causes, HMS DB tables sometimes have invalid
> entries, for example a URI missing its scheme in the SDS table's LOCATION
> column or DBS's DB_LOCATION_URI column. These malformed URIs lead to
> hard-to-analyze errors in HIVE and SENTRY. Schematool needs to provide a
> command to detect these malformed URIs, give a warning, and provide an option
> to fix them.
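
A minimal sketch of the scheme check the description calls for (treating "no scheme" as the malformation criterion is an assumption on my part):

{code}
import java.net.URI;
import java.net.URISyntaxException;

public class LocationUriCheck {
  // A location URI is treated as malformed if it cannot be parsed
  // or carries no scheme (hdfs://, s3a://, file://, ...).
  static boolean isMalformed(String location) {
    try {
      return new URI(location).getScheme() == null;
    } catch (URISyntaxException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println(isMalformed("hdfs://nn:8020/warehouse/t1")); // false
    System.out.println(isMalformed("/warehouse/t1"));               // true: no scheme
  }
}
{code}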



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15263) Detect the values for incorrect NULL values

2016-11-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15687044#comment-15687044
 ] 

Hive QA commented on HIVE-15263:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12840038/HIVE-15263.1.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2242/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2242/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2242/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2016-11-22 15:32:39.566
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-2242/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2016-11-22 15:32:39.569
+ cd apache-github-source-source
+ git fetch origin
From https://github.com/apache/hive
   893b255..a6c4004  master -> origin/master
+ git reset --hard HEAD
HEAD is now at 893b255 HIVE-15211: Provide support for complex expressions in 
ON clauses for INNER joins (Jesus Camacho Rodriguez, reviewed by Ashutosh 
Chauhan)
+ git clean -f -d
Removing ql/src/test/queries/clientpositive/udf_trunc_number.q
Removing ql/src/test/results/clientpositive/udf_trunc_number.q.out
+ git checkout master
Already on 'master'
Your branch is behind 'origin/master' by 2 commits, and can be fast-forwarded.
  (use "git pull" to update your local branch)
+ git reset --hard origin/master
HEAD is now at a6c4004 HIVE-15073: Schematool should detect malformed URIs 
(Yongzhi Chen, reviewed by Aihua Xu)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2016-11-22 15:32:40.872
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
error: a/beeline/src/java/org/apache/hive/beeline/HiveSchemaTool.java: No such 
file or directory
error: 
a/itests/hive-unit/src/test/java/org/apache/hive/beeline/TestSchemaTool.java: 
No such file or directory
error: a/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java: 
No such file or directory
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12840038 - PreCommit-HIVE-Build

> Detect the values for incorrect NULL values
> ---
>
> Key: HIVE-15263
> URL: https://issues.apache.org/jira/browse/HIVE-15263
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Affects Versions: 2.2.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-15263.1.patch
>
>
> We have seen incorrect NULL values for SD_ID in TBLS for Hive tables.
> That column is nullable since it will be NULL for Hive views.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15199) INSERT INTO data on S3 is replacing the old rows with the new ones

2016-11-22 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15687108#comment-15687108
 ] 

Steve Loughran commented on HIVE-15199:
---

I do think I'd rather fix this in s3, because this adds 2 GET calls and a
LIST before each rename, on top of the calls that take place in the rename
itself. And of course, when Hadoop 2.8 or derivatives make s3a's rename behave
like HDFS's, the check will be superfluous. Similarly, once you have a
consistent FS view (s3guard, etc.), you are less likely to see a mismatch
between listing and stat-ing. If you do, it means something else is writing to
the same dir, putting you in trouble.

Would it be possible to set this up so it is easy to turn off in the future?
For example: create a JIRA on stripping the exists check out.
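
One possible shape for such a switch, sketched with an invented config key (this is not an actual Hive property):

{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class GuardedRename {
  // Invented key: once listings are consistent (S3Guard etc.) the guard can be disabled.
  static final String SKIP_EXISTS_CHECK = "hive.blobstore.rename.skip.exists.check";

  static boolean safeRename(FileSystem fs, Configuration conf,
                            Path src, Path dst) throws IOException {
    boolean skipCheck = conf.getBoolean(SKIP_EXISTS_CHECK, false);
    if (!skipCheck && fs.exists(dst)) {
      return false; // caller picks a new destination name and retries
    }
    return fs.rename(src, dst);
  }
}
{code}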

> INSERT INTO data on S3 is replacing the old rows with the new ones
> --
>
> Key: HIVE-15199
> URL: https://issues.apache.org/jira/browse/HIVE-15199
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Sergio Peña
>Assignee: Sergio Peña
>Priority: Critical
> Attachments: HIVE-15199.1.patch, HIVE-15199.2.patch, 
> HIVE-15199.3.patch, HIVE-15199.4.patch, HIVE-15199.5.patch, 
> HIVE-15199.6.patch, HIVE-15199.7.patch, HIVE-15199.8.patch
>
>
> Any INSERT INTO statement run on an S3 table when the scratch directory is
> saved on S3 deletes the old rows of the table.
> {noformat}
> hive> set hive.blobstore.use.blobstore.as.scratchdir=true;
> hive> create table t1 (id int, name string) location 's3a://spena-bucket/t1';
> hive> insert into table t1 values (1,'name1');
> hive> select * from t1;
> 1   name1
> hive> insert into table t1 values (2,'name2');
> hive> select * from t1;
> 2   name2
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15199) INSERT INTO data on S3 is replacing the old rows with the new ones

2016-11-22 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-15199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15687154#comment-15687154
 ] 

Sergio Peña commented on HIVE-15199:


I moved the listFiles() to the beginning of the method to avoid calling it for
each new rename. However, we still have the 2 GET calls on each rename (exists
&& rename), so I added a validation to do this on S3 only and leave only the
rename call on HDFS.

As you mentioned, when S3Guard is released it will help us a lot with
consistency, and we could remove the exists() call. I think the listFiles() is
still beneficial so Hive can figure out the next filename to use when renaming
the file.

The code change is pretty easy, so I can create a Jira to remove it in the
future.
{noformat}
+  boolean isBlobStoragePath = BlobStorageUtils.isBlobStoragePath(conf, 
destDirPath);
+  while ((isBlobStoragePath && destFs.exists(destFilePath)) || 
!destFs.rename(sourcePath, destFilePath)) {
+destFilePath = createCopyFilePath(destDirPath, name, type, ++counter);
+  }
{noformat}

Is S3Guard going to be released in Hadoop 2.8?
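
For context, a hypothetical sketch of the copy-file naming the snippet above relies on (createCopyFilePath here is a stand-in of my own, not Hive's actual helper):

{code}
import org.apache.hadoop.fs.Path;

public class CopyFilePathSketch {
  // Append an incrementing "_copy_N" suffix until a free destination name is found.
  static Path createCopyFilePath(Path destDir, String name, String type, int counter) {
    String suffix = (counter == 0) ? "" : "_copy_" + counter;
    String fileName = type.isEmpty() ? name + suffix : name + suffix + "." + type;
    return new Path(destDir, fileName);
  }

  public static void main(String[] args) {
    Path dir = new Path("s3a://example-bucket/t1"); // hypothetical
    System.out.println(createCopyFilePath(dir, "000000_0", "", 0)); // 000000_0
    System.out.println(createCopyFilePath(dir, "000000_0", "", 1)); // 000000_0_copy_1
  }
}
{code}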

> INSERT INTO data on S3 is replacing the old rows with the new ones
> --
>
> Key: HIVE-15199
> URL: https://issues.apache.org/jira/browse/HIVE-15199
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Sergio Peña
>Assignee: Sergio Peña
>Priority: Critical
> Attachments: HIVE-15199.1.patch, HIVE-15199.2.patch, 
> HIVE-15199.3.patch, HIVE-15199.4.patch, HIVE-15199.5.patch, 
> HIVE-15199.6.patch, HIVE-15199.7.patch, HIVE-15199.8.patch
>
>
> Any INSERT INTO statement run on an S3 table when the scratch directory is
> saved on S3 deletes the old rows of the table.
> {noformat}
> hive> set hive.blobstore.use.blobstore.as.scratchdir=true;
> hive> create table t1 (id int, name string) location 's3a://spena-bucket/t1';
> hive> insert into table t1 values (1,'name1');
> hive> select * from t1;
> 1   name1
> hive> insert into table t1 values (2,'name2');
> hive> select * from t1;
> 2   name2
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14582) Add trunc(numeric) udf

2016-11-22 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15687180#comment-15687180
 ] 

Ashutosh Chauhan commented on HIVE-14582:
-

Patch looks good. Any reason you restricted the scale argument to be constant?
Your evaluate method supports it, but you throw an exception for non-constants
in initialize(). The date version already supports it, so we should consider
supporting it for numerics too. It will enable queries like:
create table t1(c double, d int); select trunc(c,d) from t1;
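
As a reference point, a small sketch of the expected truncation semantics using BigDecimal (this mirrors the Oracle behaviour linked in the issue; it is not the UDF implementation):

{code}
import java.math.BigDecimal;
import java.math.RoundingMode;

public class TruncSketch {
  static BigDecimal trunc(BigDecimal value, int scale) {
    // RoundingMode.DOWN drops digits beyond the scale instead of rounding.
    return value.setScale(scale, RoundingMode.DOWN);
  }

  public static void main(String[] args) {
    System.out.println(trunc(new BigDecimal("1234.567"), 2));  // 1234.56
    System.out.println(trunc(new BigDecimal("1234.567"), 0));  // 1234
    System.out.println(trunc(new BigDecimal("1234.567"), -2)); // 1.2E+3 (i.e. 1200)
  }
}
{code}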

> Add trunc(numeric) udf
> --
>
> Key: HIVE-14582
> URL: https://issues.apache.org/jira/browse/HIVE-14582
> Project: Hive
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Ashutosh Chauhan
>Assignee: Chinna Rao Lalam
> Attachments: HIVE-14582.1.patch, HIVE-14582.2.patch, 
> HIVE-14582.3.patch, HIVE-14582.patch
>
>
> https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions200.htm



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14145) Too small length of column 'PARAM_VALUE' in table 'SERDE_PARAMS'

2016-11-22 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-14145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15687181#comment-15687181
 ] 

Sergio Peña commented on HIVE-14145:


[~ngangam] Let's wait until you commit your changes to fix issues like this. We 
can then test this issue to make sure it is fixed.

> Too small length of column 'PARAM_VALUE' in table 'SERDE_PARAMS'
> 
>
> Key: HIVE-14145
> URL: https://issues.apache.org/jira/browse/HIVE-14145
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Oleksiy Sayankin
>Assignee: Oleksiy Sayankin
> Attachments: HIVE-14145.1.patch, HIVE-14145.2.patch
>
>
> Customer has following table
> {code}
> create external table hive_hbase_test(
> HBASE_KEY string,
> ENTITY_NAME string,
> ENTITY_ID string,
> CLAIM_HEADER_ID string,
> CLAIM_LINE_ID string,
> MEDICAL_CLAIM_SOURCE_SYSTEM string,
> UNIQUE_MEMBER_ID string,
> MEMBER_SOURCE_SYSTEM string,
> SUBSCRIBER_ID string,
> COVERAGE_CLASS_CODE string,
> SERVICING_PROVIDER_ID string,
> PROVIDER_SOURCE_SYSTEM string,
> SERVICING_PROVIDER_SPECIALTY string,
> SERVICING_STANDARD_PROVIDER_SPECIALTY string,
> SERVICING_PROVIDER_TYPE_CODE string,
> REFERRING_PROVIDER_ID string,
> ADMITTING_PROVIDER_ID string,
> ATTENDING_PROVIDER_ID string,
> OPERATING_PROVIDER_ID string,
> BILLING_PROVIDER_ID string,
> ORDERING_PROVIDER_ID string,
> HEALTH_PLAN_SOURCE_ID string,
> HEALTH_PLAN_PAYER_NAME string,
> BUSINESS_UNIT string,
> OPERATING_UNIT string,
> PRODUCT string,
> MARKET string,
> DEPARTMENT string,
> IPA string,
> SUPPLEMENTAL_DATA_TYPE string,
> PSEUDO_CLAIM_FLAG string,
> CLAIM_STATUS string,
> CLAIM_LINE_STATUS string,
> CLAIM_DENIED_FLAG string,
> SERVICE_LINE_DENIED_FLAG string,
> DENIED_REASON_CODE string,
> SERVICE_LINE_DENIED_REASON_CODE string,
> DAYS_DENIED int,
> DIAGNOSIS_DATE timestamp,
> SERVICE_DATE TIMESTAMP,
> SERVICE_FROM_DATE TIMESTAMP,
> SERVICE_TO_DATE TIMESTAMP,
> ADMIT_DATE TIMESTAMP,
> ADMIT_TYPE string,
> ADMIT_SOURCE_TYPE string,
> DISCHARGE_DATE TIMESTAMP,
> DISCHARGE_STATUS_CODE string,
> SERVICE_LINE_TYPE_OF_SERVICE string,
> TYPE_OF_BILL_CODE string,
> INPATIENT_FLAG string,
> PLACE_OF_SERVICE_CODE string,
> FACILITY_CODE string,
> AUTHORIZATION_NUMBER string,
> CLAIM_REFERRAL_NUMBER string,
> CLAIM_TYPE string,
> CLAIM_ADJUSTMENT_TYPE string,
> ICD_DIAGNOSIS_CODE_1 string,
> PRESENT_ON_ADMISSION_FLAG_1 string,
> ICD_DIAGNOSIS_CODE_2 string,
> PRESENT_ON_ADMISSION_FLAG_2 string,
> ICD_DIAGNOSIS_CODE_3 string,
> PRESENT_ON_ADMISSION_FLAG_3 string,
> ICD_DIAGNOSIS_CODE_4 string,
> PRESENT_ON_ADMISSION_FLAG_4 string,
> ICD_DIAGNOSIS_CODE_5 string,
> PRESENT_ON_ADMISSION_FLAG_5 string,
> ICD_DIAGNOSIS_CODE_6 string,
> PRESENT_ON_ADMISSION_FLAG_6 string,
> ICD_DIAGNOSIS_CODE_7 string,
> PRESENT_ON_ADMISSION_FLAG_7 string,
> ICD_DIAGNOSIS_CODE_8 string,
> PRESENT_ON_ADMISSION_FLAG_8 string,
> ICD_DIAGNOSIS_CODE_9 string,
> PRESENT_ON_ADMISSION_FLAG_9 string,
> ICD_DIAGNOSIS_CODE_10 string,
> PRESENT_ON_ADMISSION_FLAG_10 string,
> ICD_DIAGNOSIS_CODE_11 string,
> PRESENT_ON_ADMISSION_FLAG_11 string,
> ICD_DIAGNOSIS_CODE_12 string,
> PRESENT_ON_ADMISSION_FLAG_12 string,
> ICD_DIAGNOSIS_CODE_13 string,
> PRESENT_ON_ADMISSION_FLAG_13 string,
> ICD_DIAGNOSIS_CODE_14 string,
> PRESENT_ON_ADMISSION_FLAG_14 string,
> ICD_DIAGNOSIS_CODE_15 string,
> PRESENT_ON_ADMISSION_FLAG_15 string,
> ICD_DIAGNOSIS_CODE_16 string,
> PRESENT_ON_ADMISSION_FLAG_16 string,
> ICD_DIAGNOSIS_CODE_17 string,
> PRESENT_ON_ADMISSION_FLAG_17 string,
> ICD_DIAGNOSIS_CODE_18 string,
> PRESENT_ON_ADMISSION_FLAG_18 string,
> ICD_DIAGNOSIS_CODE_19 string,
> PRESENT_ON_ADMISSION_FLAG_19 string,
> ICD_DIAGNOSIS_CODE_20 string,
> PRESENT_ON_ADMISSION_FLAG_20 string,
> ICD_DIAGNOSIS_CODE_21 string,
> PRESENT_ON_ADMISSION_FLAG_21 string,
> ICD_DIAGNOSIS_CODE_22 string,
> PRESENT_ON_ADMISSION_FLAG_22 string,
> ICD_DIAGNOSIS_CODE_23 string,
> PRESENT_ON_ADMISSION_FLAG_23 string,
> ICD_DIAGNOSIS_CODE_24 string,
> PRESENT_ON_ADMISSION_FLAG_24 string,
> ICD_DIAGNOSIS_CODE_25 string,
> PRESENT_ON_ADMISSION_FLAG_25 string,
> QUANTITY_OF_SERVICES decimal(10,2),
> REVENUE_CODE string,
> PROCEDURE_CODE string,
> PROCEDURE_CODE_MODIFIER_1 string,
> PROCEDURE_CODE_MODIFIER_2 string,
> PROCEDURE_CODE_MODIFIER_3 string,
> PROCEDURE_CODE_MODIFIER_4 string,
> ICD_VERSION_CODE_TYPE string,
> ICD_PROCEDURE_CODE_1 string,
> ICD_PROCEDURE_CODE_2 string,
> ICD_PROCEDURE_CODE_3 string,
> ICD_PROCEDURE_CODE_4 string,
> ICD_PROCEDURE_CODE_5 string,
> ICD_PROCEDURE_CODE_6 string,
> ICD_PROCEDURE_CODE_7 string,
> ICD_PROCEDURE_CODE_8 string,
> ICD_PROCEDURE_CODE_9 string,
> ICD_PROCEDURE_CODE_10 string,
> ICD_PROCEDURE_CODE_11 string,
> ICD_PROCEDURE_CODE_12 string,
> ICD_PROCEDURE_CODE_13 string,
> ICD_PROCEDURE_CODE_14

[jira] [Commented] (HIVE-15260) Auto delete old log files of hive services

2016-11-22 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15687206#comment-15687206
 ] 

Thejas M Nair commented on HIVE-15260:
--

Thanks for pointing that out [~prasanth_j]!
So it's only the hive1 line, with log4j v1, that has this issue.
As people migrate to hive2, this becomes less of an issue.


> Auto delete old log files of hive services
> --
>
> Key: HIVE-15260
> URL: https://issues.apache.org/jira/browse/HIVE-15260
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Metastore
>Reporter: Thejas M Nair
>Assignee: anishek
>
> Hive log4j settings rotate the old log files by date, but they don't delete 
> the old log files.
> It would be good to delete the old log files so that the space used doesn't 
> keep increasing forever.
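
For the hive1/log4j v1 line, a cron-style cleanup along these lines could serve as a stopgap (log directory, file prefix, and retention window are all assumptions):

{code}
import java.io.File;
import java.util.concurrent.TimeUnit;

public class OldLogCleaner {
  public static void main(String[] args) {
    File logDir = new File("/var/log/hive");   // assumed log directory
    long cutoff = System.currentTimeMillis() - TimeUnit.DAYS.toMillis(30);
    // Date-rotated files look like hive.log.2016-11-22 under log4j v1 defaults.
    File[] rotated = logDir.listFiles((dir, name) -> name.startsWith("hive.log."));
    if (rotated == null) {
      return;
    }
    for (File f : rotated) {
      if (f.lastModified() < cutoff && f.delete()) {
        System.out.println("deleted " + f.getName());
      }
    }
  }
}
{code}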



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15260) Auto delete old log files of hive services

2016-11-22 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15687211#comment-15687211
 ] 

Thejas M Nair commented on HIVE-15260:
--

[~anagarwal] any changes for this would be applicable only to branch-1 of hive.


> Auto delete old log files of hive services
> --
>
> Key: HIVE-15260
> URL: https://issues.apache.org/jira/browse/HIVE-15260
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Metastore
>Reporter: Thejas M Nair
>Assignee: anishek
>
> Hive log4j settings rotate the old log files by date, but they don't delete 
> the old log files.
> It would be good to delete the old log files so that the space used doesn't 
> keep increasing for ever.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15202) Concurrent compactions for the same partition may generate malformed folder structure

2016-11-22 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15687348#comment-15687348
 ] 

Eugene Koifman commented on HIVE-15202:
---

That would not work. COMPACTION_QUEUE has CQ_STATE, so it's valid to have
multiple entries for the same db/table/partition (in fact, in some cases even
db/table/partition/cq_state need not be unique)

> Concurrent compactions for the same partition may generate malformed folder 
> structure
> -
>
> Key: HIVE-15202
> URL: https://issues.apache.org/jira/browse/HIVE-15202
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Rui Li
>Assignee: Eugene Koifman
>
> If two compactions run concurrently on a single partition, it may generate 
> folder structure like this: (nested base dir)
> {noformat}
> drwxr-xr-x   - root supergroup  0 2016-11-14 22:23 
> /user/hive/warehouse/test/z=1/base_007/base_007
> -rw-r--r--   3 root supergroup201 2016-11-14 21:46 
> /user/hive/warehouse/test/z=1/base_007/bucket_0
> -rw-r--r--   3 root supergroup611 2016-11-14 21:46 
> /user/hive/warehouse/test/z=1/base_007/bucket_1
> -rw-r--r--   3 root supergroup614 2016-11-14 21:46 
> /user/hive/warehouse/test/z=1/base_007/bucket_2
> -rw-r--r--   3 root supergroup621 2016-11-14 21:46 
> /user/hive/warehouse/test/z=1/base_007/bucket_3
> -rw-r--r--   3 root supergroup621 2016-11-14 21:46 
> /user/hive/warehouse/test/z=1/base_007/bucket_4
> -rw-r--r--   3 root supergroup201 2016-11-14 21:46 
> /user/hive/warehouse/test/z=1/base_007/bucket_5
> -rw-r--r--   3 root supergroup201 2016-11-14 21:46 
> /user/hive/warehouse/test/z=1/base_007/bucket_6
> -rw-r--r--   3 root supergroup201 2016-11-14 21:46 
> /user/hive/warehouse/test/z=1/base_007/bucket_7
> -rw-r--r--   3 root supergroup201 2016-11-14 21:46 
> /user/hive/warehouse/test/z=1/base_007/bucket_8
> -rw-r--r--   3 root supergroup201 2016-11-14 21:46 
> /user/hive/warehouse/test/z=1/base_007/bucket_9
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15168) Flaky test: TestSparkClient.testJobSubmission (still flaky)

2016-11-22 Thread Barna Zsombor Klara (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15687491#comment-15687491
 ] 

Barna Zsombor Klara commented on HIVE-15168:


I had another look at this and you are correct: the listeners are being called
even if the task has completed. Unfortunately that isn't enough for us,
because in JobHandleImpl the state change will not reset the state of a job to
queued if it is already cancelled/succeeded/failed. And if we never set the
state to queued, then the onJobQueued method will never be called on the
JobHandleListener.
{code}
/**
 * Changes the state of this job handle, making sure that illegal state transitions are ignored.
 * Fires events appropriately.
 *
 * As a rule, state transitions can only occur if the new state is "higher" than the current
 * state (i.e., has a higher ordinal number) and is not a "final" state. "Final" states are
 * CANCELLED, FAILED and SUCCEEDED, defined here in the code as having an ordinal number higher
 * than the CANCELLED enum constant.
 */
boolean changeState(State newState) {
  synchronized (listeners) {
    if (newState.ordinal() > state.ordinal() && state.ordinal() < State.CANCELLED.ordinal()) {
      state = newState;
      for (Listener listener : listeners) {
        fireStateChange(newState, listener);
      }
      return true;
    }
    return false;
  }
}
{code}

I think that this code is correct and should not be changed. Once a job has 
transitioned to a terminal state we should not revert it to queued. But it also 
means that we must ensure that the state changes happen sequentially.
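
A minimal illustration of that sequencing point, using a plain CompletableFuture rather than the actual RPC/JobHandle classes: wiring the callback before completion can happen guarantees it fires exactly once, even for an instantly finishing job.

{code}
import java.util.concurrent.CompletableFuture;

public class ListenerOrdering {
  public static void main(String[] args) {
    CompletableFuture<String> job = new CompletableFuture<>();

    // Register the listener first ...
    job.whenComplete((result, err) ->
        System.out.println(err == null
            ? "observed completion of " + result
            : "observed failure: " + err));

    // ... then let the job finish. The callback cannot be missed,
    // unlike a listener registered after a fast submission completes.
    job.complete("job-1");
  }
}
{code}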

> Flaky test: TestSparkClient.testJobSubmission (still flaky)
> ---
>
> Key: HIVE-15168
> URL: https://issues.apache.org/jira/browse/HIVE-15168
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
> Fix For: 2.2.0
>
> Attachments: HIVE-15168.patch
>
>
> [HIVE-14910|https://issues.apache.org/jira/browse/HIVE-14910] already
> addressed one source of flakiness but sadly not all of it, it seems.
> In JobHandleImpl the listeners are registered after the job has been
> submitted.
> This may end up in a race condition.
> {code}
> // Link the RPC and the promise so that events from one are propagated to
> // the other as needed.
> rpc.addListener(new GenericFutureListener<io.netty.util.concurrent.Future<Void>>() {
>   @Override
>   public void operationComplete(io.netty.util.concurrent.Future<Void> f) {
>     if (f.isSuccess()) {
>       handle.changeState(JobHandle.State.QUEUED);
>     } else if (!promise.isDone()) {
>       promise.setFailure(f.cause());
>     }
>   }
> });
> promise.addListener(new GenericFutureListener<Promise<T>>() {
>   @Override
>   public void operationComplete(Promise<T> p) {
>     if (jobId != null) {
>       jobs.remove(jobId);
>     }
>     if (p.isCancelled() && !rpc.isDone()) {
>       rpc.cancel(true);
>     }
>   }
> });
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15266) Edit test output of negative blobstore tests to match HIVE-15226

2016-11-22 Thread Thomas Poepping (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Poepping updated HIVE-15266:
---
Attachment: HIVE-15266.1.patch

Attached initial patch

> Edit test output of negative blobstore tests to match HIVE-15226
> 
>
> Key: HIVE-15266
> URL: https://issues.apache.org/jira/browse/HIVE-15266
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 2.2.0
>Reporter: Thomas Poepping
>Assignee: Thomas Poepping
> Attachments: HIVE-15266.1.patch
>
>
> In HIVE-15226 ( https://issues.apache.org/jira/browse/HIVE-15226 ), blobstore 
> tests were changed to print a different masking pattern for the blobstore 
> path. In that patch, test output was replaced for the clientpositive test ( 
> insert_into.q ), but not for the clientnegative test ( select_dropped_table.q 
> ), causing the negative tests to fail.
> This patch is the result of -Dtest.output.overwrite=true with the 
> clientnegative tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15266) Edit test output of negative blobstore tests to match HIVE-15226

2016-11-22 Thread Thomas Poepping (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Poepping updated HIVE-15266:
---
Status: Patch Available  (was: Open)

> Edit test output of negative blobstore tests to match HIVE-15226
> 
>
> Key: HIVE-15266
> URL: https://issues.apache.org/jira/browse/HIVE-15266
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 2.2.0
>Reporter: Thomas Poepping
>Assignee: Thomas Poepping
> Attachments: HIVE-15266.1.patch
>
>
> In HIVE-15226 ( https://issues.apache.org/jira/browse/HIVE-15226 ), blobstore 
> tests were changed to print a different masking pattern for the blobstore 
> path. In that patch, test output was replaced for the clientpositive test ( 
> insert_into.q ), but not for the clientnegative test ( select_dropped_table.q 
> ), causing the negative tests to fail.
> This patch is the result of -Dtest.output.overwrite=true with the 
> clientnegative tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15266) Edit test output of negative blobstore tests to match HIVE-15226

2016-11-22 Thread Thomas Poepping (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15687679#comment-15687679
 ] 

Thomas Poepping commented on HIVE-15266:


[~spena] [~mohitsabharwal] [~stakiar] Can we take a look? Trivial change

> Edit test output of negative blobstore tests to match HIVE-15226
> 
>
> Key: HIVE-15266
> URL: https://issues.apache.org/jira/browse/HIVE-15266
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 2.2.0
>Reporter: Thomas Poepping
>Assignee: Thomas Poepping
> Attachments: HIVE-15266.1.patch
>
>
> In HIVE-15226 ( https://issues.apache.org/jira/browse/HIVE-15226 ), blobstore 
> tests were changed to print a different masking pattern for the blobstore 
> path. In that patch, test output was replaced for the clientpositive test ( 
> insert_into.q ), but not for the clientnegative test ( select_dropped_table.q 
> ), causing the negative tests to fail.
> This patch is the result of -Dtest.output.overwrite=true with the 
> clientnegative tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15266) Edit test output of negative blobstore tests to match HIVE-15226

2016-11-22 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15687686#comment-15687686
 ] 

Sahil Takiar commented on HIVE-15266:
-

+1 but need a committer to take a look.

> Edit test output of negative blobstore tests to match HIVE-15226
> 
>
> Key: HIVE-15266
> URL: https://issues.apache.org/jira/browse/HIVE-15266
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 2.2.0
>Reporter: Thomas Poepping
>Assignee: Thomas Poepping
> Attachments: HIVE-15266.1.patch
>
>
> In HIVE-15226 ( https://issues.apache.org/jira/browse/HIVE-15226 ), blobstore 
> tests were changed to print a different masking pattern for the blobstore 
> path. In that patch, test output was replaced for the clientpositive test ( 
> insert_into.q ), but not for the clientnegative test ( select_dropped_table.q 
> ), causing the negative tests to fail.
> This patch is the result of -Dtest.output.overwrite=true with the 
> clientnegative tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15266) Edit test output of negative blobstore tests to match HIVE-15226

2016-11-22 Thread Illya Yalovyy (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15687762#comment-15687762
 ] 

Illya Yalovyy commented on HIVE-15266:
--

Thank you for fixing that!

> Edit test output of negative blobstore tests to match HIVE-15226
> 
>
> Key: HIVE-15266
> URL: https://issues.apache.org/jira/browse/HIVE-15266
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 2.2.0
>Reporter: Thomas Poepping
>Assignee: Thomas Poepping
> Attachments: HIVE-15266.1.patch
>
>
> In HIVE-15226 ( https://issues.apache.org/jira/browse/HIVE-15226 ), blobstore 
> tests were changed to print a different masking pattern for the blobstore 
> path. In that patch, test output was replaced for the clientpositive test ( 
> insert_into.q ), but not for the clientnegative test ( select_dropped_table.q 
> ), causing the negative tests to fail.
> This patch is the result of -Dtest.output.overwrite=true with the 
> clientnegative tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15199) INSERT INTO data on S3 is replacing the old rows with the new ones

2016-11-22 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15687764#comment-15687764
 ] 

Gopal V commented on HIVE-15199:


The HIVE-14535 branch might be interesting to compare, since it prevents 
multiple queries from using the same path.

> INSERT INTO data on S3 is replacing the old rows with the new ones
> --
>
> Key: HIVE-15199
> URL: https://issues.apache.org/jira/browse/HIVE-15199
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Sergio Peña
>Assignee: Sergio Peña
>Priority: Critical
> Attachments: HIVE-15199.1.patch, HIVE-15199.2.patch, 
> HIVE-15199.3.patch, HIVE-15199.4.patch, HIVE-15199.5.patch, 
> HIVE-15199.6.patch, HIVE-15199.7.patch, HIVE-15199.8.patch
>
>
> Any INSERT INTO statement run on an S3 table when the scratch directory is
> saved on S3 deletes the old rows of the table.
> {noformat}
> hive> set hive.blobstore.use.blobstore.as.scratchdir=true;
> hive> create table t1 (id int, name string) location 's3a://spena-bucket/t1';
> hive> insert into table t1 values (1,'name1');
> hive> select * from t1;
> 1   name1
> hive> insert into table t1 values (2,'name2');
> hive> select * from t1;
> 2   name2
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15266) Edit test output of negative blobstore tests to match HIVE-15226

2016-11-22 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-15266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15687748#comment-15687748
 ] 

Sergio Peña commented on HIVE-15266:


Looks good.
+1

> Edit test output of negative blobstore tests to match HIVE-15226
> 
>
> Key: HIVE-15266
> URL: https://issues.apache.org/jira/browse/HIVE-15266
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 2.2.0
>Reporter: Thomas Poepping
>Assignee: Thomas Poepping
> Attachments: HIVE-15266.1.patch
>
>
> In HIVE-15226 ( https://issues.apache.org/jira/browse/HIVE-15226 ), blobstore 
> tests were changed to print a different masking pattern for the blobstore 
> path. In that patch, test output was replaced for the clientpositive test ( 
> insert_into.q ), but not for the clientnegative test ( select_dropped_table.q 
> ), causing the negative tests to fail.
> This patch is the result of -Dtest.output.overwrite=true with the 
> clientnegative tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15121) Last MR job in Hive should be able to write to a different scratch directory

2016-11-22 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-15121:

Attachment: HIVE-15121.3.patch

> Last MR job in Hive should be able to write to a different scratch directory
> 
>
> Key: HIVE-15121
> URL: https://issues.apache.org/jira/browse/HIVE-15121
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-15121.1.patch, HIVE-15121.2.patch, 
> HIVE-15121.3.patch, HIVE-15121.WIP.1.patch, HIVE-15121.WIP.2.patch, 
> HIVE-15121.WIP.patch, HIVE-15121.patch
>
>
> Hive should be able to configure all intermediate MR jobs to write to HDFS, 
> but the final MR job to write to S3.
> This will be useful for implementing parallel renames on S3. The idea is that 
> for a multi-job query, all intermediate MR jobs write to HDFS, and then the
> final job writes to S3. Writing to HDFS should be faster than writing to S3, 
> so it makes more sense to write intermediate data to HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15266) Edit test output of negative blobstore tests to match HIVE-15226

2016-11-22 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-15266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-15266:
---
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

> Edit test output of negative blobstore tests to match HIVE-15226
> 
>
> Key: HIVE-15266
> URL: https://issues.apache.org/jira/browse/HIVE-15266
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 2.2.0
>Reporter: Thomas Poepping
>Assignee: Thomas Poepping
> Fix For: 2.2.0
>
> Attachments: HIVE-15266.1.patch
>
>
> In HIVE-15226 ( https://issues.apache.org/jira/browse/HIVE-15226 ), blobstore 
> tests were changed to print a different masking pattern for the blobstore 
> path. In that patch, test output was replaced for the clientpositive test ( 
> insert_into.q ), but not for the clientnegative test ( select_dropped_table.q 
> ), causing the negative tests to fail.
> This patch is the result of -Dtest.output.overwrite=true with the 
> clientnegative tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15121) Last MR job in Hive should be able to write to a different scratch directory

2016-11-22 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15687818#comment-15687818
 ] 

Sergio Peña commented on HIVE-15121:


LGTM
+1

> Last MR job in Hive should be able to write to a different scratch directory
> 
>
> Key: HIVE-15121
> URL: https://issues.apache.org/jira/browse/HIVE-15121
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-15121.1.patch, HIVE-15121.2.patch, 
> HIVE-15121.3.patch, HIVE-15121.WIP.1.patch, HIVE-15121.WIP.2.patch, 
> HIVE-15121.WIP.patch, HIVE-15121.patch
>
>
> Hive should be able to configure all intermediate MR jobs to write to HDFS, 
> but the final MR job to write to S3.
> This will be useful for implementing parallel renames on S3. The idea is that 
> for a multi-job query, all intermediate MR jobs write to HDFS, and then the
> final job writes to S3. Writing to HDFS should be faster than writing to S3, 
> so it makes more sense to write intermediate data to HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15121) Last MR job in Hive should be able to write to a different scratch directory

2016-11-22 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-15121:

Description: 
Hive should be able to configure all intermediate MR jobs to write to HDFS, but 
the final MR job to write to S3.

This will be useful for implementing parallel renames on S3. The idea is that 
for a multi-job query, all intermediate MR jobs write to HDFS, and then the 
final job writes to S3. Writing to HDFS should be faster than writing to S3, so 
it makes more sense to write intermediate data to HDFS.

The advantage is that any copying of data that needs to be done from the 
scratch directory to the final table directory can be done server-side, within 
the blobstore. The MoveTask simply renames data from the scratch directory to 
the final table location, which should translate to a server-side COPY request. 
This way HiveServer2 doesn't have to actually copy any data, it just tells the 
blobstore to do all the work.

  was:
Hive should be able to configure all intermediate MR jobs to write to HDFS, but 
the final MR job to write to S3.

This will be useful for implementing parallel renames on S3. The idea is that 
for a multi-job query, all intermediate MR jobs write to HDFS, and then the
final job writes to S3. Writing to HDFS should be faster than writing to S3, so 
it makes more sense to write intermediate data to HDFS.


> Last MR job in Hive should be able to write to a different scratch directory
> 
>
> Key: HIVE-15121
> URL: https://issues.apache.org/jira/browse/HIVE-15121
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-15121.1.patch, HIVE-15121.2.patch, 
> HIVE-15121.3.patch, HIVE-15121.WIP.1.patch, HIVE-15121.WIP.2.patch, 
> HIVE-15121.WIP.patch, HIVE-15121.patch
>
>
> Hive should be able to configure all intermediate MR jobs to write to HDFS, 
> but the final MR job to write to S3.
> This will be useful for implementing parallel renames on S3. The idea is that 
> for a multi-job query, all intermediate MR jobs write to HDFS, and then the 
> final job writes to S3. Writing to HDFS should be faster than writing to S3, 
> so it makes more sense to write intermediate data to HDFS.
> The advantage is that any copying of data that needs to be done from the 
> scratch directory to the final table directory can be done server-side, 
> within the blobstore. The MoveTask simply renames data from the scratch 
> directory to the final table location, which should translate to a 
> server-side COPY request. This way HiveServer2 doesn't have to actually copy 
> any data, it just tells the blobstore to do all the work.
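
A toy sketch of the scratch-directory selection this describes (names are illustrative only, not the patch itself):

{code}
import org.apache.hadoop.fs.Path;

public class ScratchDirSelector {
  // Intermediate jobs spill to HDFS; only the final job targets the blobstore,
  // so the closing MoveTask rename can stay a server-side copy within S3.
  static Path scratchFor(int jobIndex, int totalJobs,
                         Path hdfsScratch, Path blobScratch) {
    return (jobIndex == totalJobs - 1) ? blobScratch : hdfsScratch;
  }

  public static void main(String[] args) {
    Path hdfs = new Path("hdfs://nn:8020/tmp/hive-scratch");
    Path s3 = new Path("s3a://example-bucket/tmp/hive-scratch"); // hypothetical
    for (int i = 0; i < 3; i++) {
      System.out.println("job " + i + " writes to " + scratchFor(i, 3, hdfs, s3));
    }
  }
}
{code}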



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15266) Edit test output of negative blobstore tests to match HIVE-15226

2016-11-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15687858#comment-15687858
 ] 

Hive QA commented on HIVE-15266:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12840093/HIVE-15266.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10732 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=133)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[join_acid_non_acid]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats]
 (batchId=145)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=91)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_5] 
(batchId=90)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2243/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2243/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2243/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12840093 - PreCommit-HIVE-Build

> Edit test output of negative blobstore tests to match HIVE-15226
> 
>
> Key: HIVE-15266
> URL: https://issues.apache.org/jira/browse/HIVE-15266
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 2.2.0
>Reporter: Thomas Poepping
>Assignee: Thomas Poepping
> Fix For: 2.2.0
>
> Attachments: HIVE-15266.1.patch
>
>
> In HIVE-15226 ( https://issues.apache.org/jira/browse/HIVE-15226 ), blobstore 
> tests were changed to print a different masking pattern for the blobstore 
> path. In that patch, test output was replaced for the clientpositive test ( 
> insert_into.q ), but not for the clientnegative test ( select_dropped_table.q 
> ), causing the negative tests to fail.
> This patch is the result of -Dtest.output.overwrite=true with the 
> clientnegative tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15181) buildQueryWithINClause didn't properly handle multiples of ConfVars.METASTORE_DIRECT_SQL_MAX_ELEMENTS_IN_CLAUSE

2016-11-22 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15687917#comment-15687917
 ] 

Wei Zheng commented on HIVE-15181:
--

Thanks Eugene for the comment. I will handle the second part, which is to
modify TxnUtils.needNewQuery(), in a separate ticket: HIVE-15267

> buildQueryWithINClause didn't properly handle multiples of 
> ConfVars.METASTORE_DIRECT_SQL_MAX_ELEMENTS_IN_CLAUSE
> ---
>
> Key: HIVE-15181
> URL: https://issues.apache.org/jira/browse/HIVE-15181
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Transactions
>Affects Versions: 1.2.1, 2.1.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
>Priority: Critical
> Attachments: HIVE-15181.1.patch, HIVE-15181.2.patch
>
>
> Until the bug is fixed, the issue can be worked around with the settings
> below (making sure the second setting is always at least 1000 times the
> first setting):
> set hive.direct.sql.max.query.length=1;
> set hive.direct.sql.max.elements.in.clause=1000;
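
To illustrate what these knobs control, a simplified batching sketch (the real logic lives in TxnUtils.buildQueryWithINClause and also respects the maximum query length):

{code}
import java.util.ArrayList;
import java.util.List;

public class InClauseBatcher {
  // Split a long id list into IN clauses of at most maxElements entries each.
  static List<String> buildInClauses(String column, List<Long> ids, int maxElements) {
    List<String> clauses = new ArrayList<>();
    for (int i = 0; i < ids.size(); i += maxElements) {
      List<Long> batch = ids.subList(i, Math.min(i + maxElements, ids.size()));
      StringBuilder sb = new StringBuilder(column).append(" IN (");
      for (int j = 0; j < batch.size(); j++) {
        if (j > 0) {
          sb.append(",");
        }
        sb.append(batch.get(j));
      }
      clauses.add(sb.append(")").toString());
    }
    return clauses;
  }

  public static void main(String[] args) {
    List<Long> ids = new ArrayList<>();
    for (long i = 1; i <= 7; i++) {
      ids.add(i);
    }
    // With at most 3 elements per clause, 7 ids yield three predicates.
    buildInClauses("TXN_ID", ids, 3).forEach(System.out::println);
  }
}
{code}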



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15181) buildQueryWithINClause didn't properly handle multiples of ConfVars.METASTORE_DIRECT_SQL_MAX_ELEMENTS_IN_CLAUSE

2016-11-22 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-15181:
-
Status: Open  (was: Patch Available)

> buildQueryWithINClause didn't properly handle multiples of 
> ConfVars.METASTORE_DIRECT_SQL_MAX_ELEMENTS_IN_CLAUSE
> ---
>
> Key: HIVE-15181
> URL: https://issues.apache.org/jira/browse/HIVE-15181
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Transactions
>Affects Versions: 2.1.0, 1.2.1
>Reporter: Wei Zheng
>Assignee: Wei Zheng
>Priority: Critical
> Attachments: HIVE-15181.1.patch, HIVE-15181.2.patch
>
>
> Until the bug is fixed, the issue can be worked around with the settings
> below (making sure the second setting is always at least 1000 times the
> first setting):
> set hive.direct.sql.max.query.length=1;
> set hive.direct.sql.max.elements.in.clause=1000;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15181) buildQueryWithINClause didn't properly handle multiples of ConfVars.METASTORE_DIRECT_SQL_MAX_ELEMENTS_IN_CLAUSE

2016-11-22 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-15181:
-
Attachment: HIVE-15181.3.patch

[~ekoifman] Can you take a look at patch 3 please?

> buildQueryWithINClause didn't properly handle multiples of 
> ConfVars.METASTORE_DIRECT_SQL_MAX_ELEMENTS_IN_CLAUSE
> ---
>
> Key: HIVE-15181
> URL: https://issues.apache.org/jira/browse/HIVE-15181
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Transactions
>Affects Versions: 1.2.1, 2.1.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
>Priority: Critical
> Attachments: HIVE-15181.1.patch, HIVE-15181.2.patch, 
> HIVE-15181.3.patch
>
>
> Until the bug is fixed, the issue can be worked around with the settings
> below (making sure the second setting is always at least 1000 times the
> first setting):
> set hive.direct.sql.max.query.length=1;
> set hive.direct.sql.max.elements.in.clause=1000;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15181) buildQueryWithINClause didn't properly handle multiples of ConfVars.METASTORE_DIRECT_SQL_MAX_ELEMENTS_IN_CLAUSE

2016-11-22 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-15181:
-
Status: Patch Available  (was: Open)

> buildQueryWithINClause didn't properly handle multiples of 
> ConfVars.METASTORE_DIRECT_SQL_MAX_ELEMENTS_IN_CLAUSE
> ---
>
> Key: HIVE-15181
> URL: https://issues.apache.org/jira/browse/HIVE-15181
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Transactions
>Affects Versions: 2.1.0, 1.2.1
>Reporter: Wei Zheng
>Assignee: Wei Zheng
>Priority: Critical
> Attachments: HIVE-15181.1.patch, HIVE-15181.2.patch, 
> HIVE-15181.3.patch
>
>
> Until the bug is fixed, the issue can be worked around with the settings
> below (making sure the second setting is always at least 1000 times the
> first setting):
> set hive.direct.sql.max.query.length=1;
> set hive.direct.sql.max.elements.in.clause=1000;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15199) INSERT INTO data on S3 is replacing the old rows with the new ones

2016-11-22 Thread Illya Yalovyy (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15687962#comment-15687962
 ] 

Illya Yalovyy commented on HIVE-15199:
--

[~spena] Thank you for the CR link. I have added some comments.

> INSERT INTO data on S3 is replacing the old rows with the new ones
> --
>
> Key: HIVE-15199
> URL: https://issues.apache.org/jira/browse/HIVE-15199
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Sergio Peña
>Assignee: Sergio Peña
>Priority: Critical
> Attachments: HIVE-15199.1.patch, HIVE-15199.2.patch, 
> HIVE-15199.3.patch, HIVE-15199.4.patch, HIVE-15199.5.patch, 
> HIVE-15199.6.patch, HIVE-15199.7.patch, HIVE-15199.8.patch
>
>
> Any INSERT INTO statement run on an S3 table when the scratch directory is
> saved on S3 deletes the old rows of the table.
> {noformat}
> hive> set hive.blobstore.use.blobstore.as.scratchdir=true;
> hive> create table t1 (id int, name string) location 's3a://spena-bucket/t1';
> hive> insert into table t1 values (1,'name1');
> hive> select * from t1;
> 1   name1
> hive> insert into table t1 values (2,'name2');
> hive> select * from t1;
> 2   name2
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15199) INSERT INTO data on S3 is replacing the old rows with the new ones

2016-11-22 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15687963#comment-15687963
 ] 

Sahil Takiar commented on HIVE-15199:
-

[~yalovyyi] what other problems with S3 do you think this would help with? In
what way would this help with eventual consistency? I agree that having the
query-id in the file name is good, but I also agree with Sergio that we need
to be careful and see what other parts of the code rely on the file name. We
should also be wary of any external clients that rely on these file names.

> INSERT INTO data on S3 is replacing the old rows with the new ones
> --
>
> Key: HIVE-15199
> URL: https://issues.apache.org/jira/browse/HIVE-15199
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Sergio Peña
>Assignee: Sergio Peña
>Priority: Critical
> Attachments: HIVE-15199.1.patch, HIVE-15199.2.patch, 
> HIVE-15199.3.patch, HIVE-15199.4.patch, HIVE-15199.5.patch, 
> HIVE-15199.6.patch, HIVE-15199.7.patch, HIVE-15199.8.patch
>
>
> Any INSERT INTO statement run on an S3 table when the scratch directory is
> saved on S3 deletes the old rows of the table.
> {noformat}
> hive> set hive.blobstore.use.blobstore.as.scratchdir=true;
> hive> create table t1 (id int, name string) location 's3a://spena-bucket/t1';
> hive> insert into table t1 values (1,'name1');
> hive> select * from t1;
> 1   name1
> hive> insert into table t1 values (2,'name2');
> hive> select * from t1;
> 2   name2
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15199) INSERT INTO data on S3 is replacing the old rows with the new ones

2016-11-22 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15687975#comment-15687975
 ] 

Yongzhi Chen commented on HIVE-15199:
-

The 8th patch LGTM  +1

> INSERT INTO data on S3 is replacing the old rows with the new ones
> --
>
> Key: HIVE-15199
> URL: https://issues.apache.org/jira/browse/HIVE-15199
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Sergio Peña
>Assignee: Sergio Peña
>Priority: Critical
> Attachments: HIVE-15199.1.patch, HIVE-15199.2.patch, 
> HIVE-15199.3.patch, HIVE-15199.4.patch, HIVE-15199.5.patch, 
> HIVE-15199.6.patch, HIVE-15199.7.patch, HIVE-15199.8.patch
>
>
> Any INSERT INTO statement run on an S3 table while the scratch directory is 
> saved on S3 deletes the old rows of the table.
> {noformat}
> hive> set hive.blobstore.use.blobstore.as.scratchdir=true;
> hive> create table t1 (id int, name string) location 's3a://spena-bucket/t1';
> hive> insert into table t1 values (1,'name1');
> hive> select * from t1;
> 1   name1
> hive> insert into table t1 values (2,'name2');
> hive> select * from t1;
> 2   name2
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15180) Extend JSONMessageFactory to store additional information about metadata objects on different table events

2016-11-22 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-15180:

Attachment: HIVE-15180.7.patch

> Extend JSONMessageFactory to store additional information about metadata 
> objects on different table events
> --
>
> Key: HIVE-15180
> URL: https://issues.apache.org/jira/browse/HIVE-15180
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Attachments: HIVE-15180.1.patch, HIVE-15180.2.patch, 
> HIVE-15180.3.patch, HIVE-15180.3.patch, HIVE-15180.4.patch, 
> HIVE-15180.5.patch, HIVE-15180.6.patch, HIVE-15180.6.patch, HIVE-15180.7.patch
>
>
> We want the {{NOTIFICATION_LOG}} table to capture additional information 
> about the metadata objects when {{DbNotificationListener}} captures different 
> events for a table (create/drop/alter) and a partition (create/alter/drop). 
> We'll use the messages field to add json objects for table and partitions for 
> create and alter events. The drop events' messages remain unchanged. These 
> messages can then be used to replay these events on the destination in the 
> event of replication, in a way that puts the destination in a state 
> consistent with one of the past states of the source.
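
As a rough illustration of the enriched message shape being proposed, a sketch 
in Java; the field names are guesses based on this description, not the exact 
JSON the patch emits:

{code}
public class NotificationMessageShape {
  // Hypothetical CREATE_TABLE message: alongside the usual identifiers,
  // the JSON now embeds the serialized table object for replay.
  static final String EXAMPLE = "{"
      + "\"eventType\":\"CREATE_TABLE\","
      + "\"db\":\"default\","
      + "\"table\":\"t1\","
      + "\"timestamp\":1479859200,"
      + "\"tableObjJson\":\"<table serialized as JSON>\""
      + "}";
}
{code}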



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15121) Last MR job in Hive should be able to write to a different scratch directory

2016-11-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15688037#comment-15688037
 ] 

Hive QA commented on HIVE-15121:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12840100/HIVE-15121.3.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10717 tests 
executed
*Failed tests:*
{noformat}
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=114)

[join39.q,bucketsortoptimize_insert_7.q,vector_distinct_2.q,join11.q,union13.q,dynamic_rdd_cache.q,auto_sortmerge_join_16.q,windowing.q,union_remove_3.q,skewjoinopt7.q,stats7.q,annotate_stats_join.q,multi_insert_lateral_view.q,ptf_streaming.q,join_1to1.q]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_2] 
(batchId=43)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a]
 (batchId=133)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=133)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[join_acid_non_acid]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats]
 (batchId=145)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=91)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2244/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2244/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2244/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12840100 - PreCommit-HIVE-Build

> Last MR job in Hive should be able to write to a different scratch directory
> 
>
> Key: HIVE-15121
> URL: https://issues.apache.org/jira/browse/HIVE-15121
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-15121.1.patch, HIVE-15121.2.patch, 
> HIVE-15121.3.patch, HIVE-15121.WIP.1.patch, HIVE-15121.WIP.2.patch, 
> HIVE-15121.WIP.patch, HIVE-15121.patch
>
>
> Hive should be able to configure all intermediate MR jobs to write to HDFS, 
> but the final MR job to write to S3.
> This will be useful for implementing parallel renames on S3. The idea is that 
> for a multi-job query, all intermediate MR jobs write to HDFS, and then the 
> final job writes to S3. Writing to HDFS should be faster than writing to S3, 
> so it makes more sense to write intermediate data to HDFS.
> The advantage is that any copying of data that needs to be done from the 
> scratch directory to the final table directory can be done server-side, 
> within the blobstore. The MoveTask simply renames data from the scratch 
> directory to the final table location, which should translate to a 
> server-side COPY request. This way HiveServer2 doesn't have to actually copy 
> any data, it just tells the blobstore to do all the work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15263) Detect the values for incorrect NULL values

2016-11-22 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-15263:

Attachment: (was: HIVE-15263.1.patch)

> Detect the values for incorrect NULL values
> ---
>
> Key: HIVE-15263
> URL: https://issues.apache.org/jira/browse/HIVE-15263
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Affects Versions: 2.2.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>
> We have seen incorrect NULL values for SD_ID in TBLS for regular hive tables. 
> That column is nullable by design, since SD_ID will be NULL for hive views. 
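
For illustration, a detection pass could be a direct query against the backing 
database; the column and type names below follow the standard metastore schema, 
and the patch's actual check may differ:

{code}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class NullSdIdCheck {
  public static void main(String[] args) throws Exception {
    // JDBC URL of the metastore backing database, passed as the first argument.
    try (Connection conn = DriverManager.getConnection(args[0]);
         Statement stmt = conn.createStatement();
         ResultSet rs = stmt.executeQuery(
             "SELECT TBL_ID, TBL_NAME FROM TBLS "
             + "WHERE SD_ID IS NULL AND TBL_TYPE <> 'VIRTUAL_VIEW'")) {
      while (rs.next()) {
        // Non-view tables should never have a NULL SD_ID.
        System.out.println("suspect table: " + rs.getLong(1) + " " + rs.getString(2));
      }
    }
  }
}
{code}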



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15263) Detect the values for incorrect NULL values

2016-11-22 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-15263:

Status: In Progress  (was: Patch Available)

> Detect the values for incorrect NULL values
> ---
>
> Key: HIVE-15263
> URL: https://issues.apache.org/jira/browse/HIVE-15263
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Affects Versions: 2.2.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>
> We have seen incorrect NULL values for SD_ID in TBLS for regular hive tables. 
> That column is nullable by design, since SD_ID will be NULL for hive views. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15263) Detect the values for incorrect NULL values

2016-11-22 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-15263:

Attachment: HIVE-15263.1.patch

> Detect the values for incorrect NULL values
> ---
>
> Key: HIVE-15263
> URL: https://issues.apache.org/jira/browse/HIVE-15263
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Affects Versions: 2.2.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-15263.1.patch
>
>
> We have seen incorrect NULL values for SD_ID in TBLS for regular hive tables. 
> That column is nullable by design, since SD_ID will be NULL for hive views. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15263) Detect the values for incorrect NULL values

2016-11-22 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-15263:

Status: Patch Available  (was: In Progress)

> Detect the values for incorrect NULL values
> ---
>
> Key: HIVE-15263
> URL: https://issues.apache.org/jira/browse/HIVE-15263
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Affects Versions: 2.2.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-15263.1.patch
>
>
> We have seen incorrect NULL values for SD_ID in TBLS for regular hive tables. 
> That column is nullable by design, since SD_ID will be NULL for hive views. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15240) Updating/Altering stats in metastore can be expensive in S3

2016-11-22 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-15240:

Attachment: HIVE-15240.3.patch

> Updating/Altering stats in metastore can be expensive in S3
> ---
>
> Key: HIVE-15240
> URL: https://issues.apache.org/jira/browse/HIVE-15240
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-15240.1.patch, HIVE-15240.2.patch, 
> HIVE-15240.3.patch
>
>
> https://github.com/apache/hive/blob/master/metastore/src/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java#L630
> https://github.com/apache/hive/blob/master/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java#L367
> If there are 100 partitions, it iterates over every partition to determine 
> its location, taking up a significant amount of time.
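
A minimal sketch of the batching idea using the public metastore client API; 
illustrative only, since the patch may restructure HiveAlterHandler differently:

{code}
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.metastore.IMetaStoreClient;
import org.apache.hadoop.hive.metastore.api.Partition;

public class PartitionLocationBatch {
  // Fetch all partitions in one call and build a values -> location map,
  // instead of resolving each partition's location with a separate lookup.
  static Map<String, Path> locations(IMetaStoreClient client,
      String db, String table) throws Exception {
    Map<String, Path> byValues = new HashMap<>();
    for (Partition p : client.listPartitions(db, table, (short) -1)) {
      byValues.put(String.join("/", p.getValues()),
          new Path(p.getSd().getLocation()));
    }
    return byValues;
  }
}
{code}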



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15199) INSERT INTO data on S3 is replacing the old rows with the new ones

2016-11-22 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-15199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-15199:
---
Attachment: HIVE-15199.9.patch

Addressed issues from [~yalovyyi] on the RB.

Also, I'll keep the if (!exists || !rename) condition on S3 and avoid using 
listFiles(), to prevent OOM issues with concurrent HS2 requests. We can design 
a better-performing approach in a different JIRA.
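
For readers following the thread, a minimal sketch of the keep-both-checks 
rename loop mentioned above, written against the Hadoop FileSystem API; it 
illustrates the approach under discussion, not the actual patch:

{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class MoveSketch {
  // If the destination already exists, or the rename fails, retry under a
  // _copy_N name instead of clobbering the file that is already there.
  static Path moveWithoutClobber(FileSystem fs, Path src, Path destDir)
      throws IOException {
    String name = src.getName();
    Path dest = new Path(destDir, name);
    for (int counter = 1; fs.exists(dest) || !fs.rename(src, dest); counter++) {
      dest = new Path(destDir, name + "_copy_" + counter);
    }
    return dest;
  }
}
{code}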

> INSERT INTO data on S3 is replacing the old rows with the new ones
> --
>
> Key: HIVE-15199
> URL: https://issues.apache.org/jira/browse/HIVE-15199
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Sergio Peña
>Assignee: Sergio Peña
>Priority: Critical
> Attachments: HIVE-15199.1.patch, HIVE-15199.2.patch, 
> HIVE-15199.3.patch, HIVE-15199.4.patch, HIVE-15199.5.patch, 
> HIVE-15199.6.patch, HIVE-15199.7.patch, HIVE-15199.8.patch, HIVE-15199.9.patch
>
>
> Any INSERT INTO statement run on an S3 table while the scratch directory is 
> saved on S3 deletes the old rows of the table.
> {noformat}
> hive> set hive.blobstore.use.blobstore.as.scratchdir=true;
> hive> create table t1 (id int, name string) location 's3a://spena-bucket/t1';
> hive> insert into table t1 values (1,'name1');
> hive> select * from t1;
> 1   name1
> hive> insert into table t1 values (2,'name2');
> hive> select * from t1;
> 2   name2
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-6689) Provide an option to not display partition columns separately in describe table output

2016-11-22 Thread Sergey Tsoy (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15688169#comment-15688169
 ] 

Sergey Tsoy commented on HIVE-6689:
---

This flag does not seem to work with "DESCRIBE FORMATTED". Is it expected?

> Provide an option to not display partition columns separately in describe 
> table output 
> ---
>
> Key: HIVE-6689
> URL: https://issues.apache.org/jira/browse/HIVE-6689
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.11.0, 0.12.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Fix For: 0.13.0
>
> Attachments: HIVE-6689.1.patch, HIVE-6689.2.patch, HIVE-6689.patch
>
>
> In older versions of Hive, partition columns were not displayed separately; 
> in newer versions they are. This has resulted in a backward-incompatible 
> change for upgrade scenarios. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15236) timestamp and date comparison should happen in timestamp

2016-11-22 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-15236:

   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Added a unit test in TestFunctionRegistry. Pushed to master. 
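
For anyone wondering why comparing in string is wrong, a tiny self-contained 
example with plain JDK types:

{code}
import java.sql.Date;
import java.sql.Timestamp;

public class DateVsTimestamp {
  public static void main(String[] args) {
    Date d = Date.valueOf("2016-11-22");
    Timestamp ts = Timestamp.valueOf("2016-11-22 00:00:00");
    // As strings: "2016-11-22" vs "2016-11-22 00:00:00.0" -> not equal.
    System.out.println(d.toString().equals(ts.toString())); // false
    // As points in time: both are midnight of the same day -> equal.
    System.out.println(d.getTime() == ts.getTime()); // true
  }
}
{code}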

> timestamp and date comparison should happen in timestamp
> 
>
> Key: HIVE-15236
> URL: https://issues.apache.org/jira/browse/HIVE-15236
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Fix For: 2.2.0
>
> Attachments: HIVE-15236.patch
>
>
> Currently the comparison happens in string, which produces incorrect results.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15121) Last MR job in Hive should be able to write to a different scratch directory

2016-11-22 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15688185#comment-15688185
 ] 

Sahil Takiar commented on HIVE-15121:
-

[~spena] test failures look unrelated, and the tests are failing on other 
patches too.

> Last MR job in Hive should be able to write to a different scratch directory
> 
>
> Key: HIVE-15121
> URL: https://issues.apache.org/jira/browse/HIVE-15121
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-15121.1.patch, HIVE-15121.2.patch, 
> HIVE-15121.3.patch, HIVE-15121.WIP.1.patch, HIVE-15121.WIP.2.patch, 
> HIVE-15121.WIP.patch, HIVE-15121.patch
>
>
> Hive should be able to configure all intermediate MR jobs to write to HDFS, 
> but the final MR job to write to S3.
> This will be useful for implementing parallel renames on S3. The idea is that 
> for a multi-job query, all intermediate MR jobs write to HDFS, and then the 
> final job writes to S3. Writing to HDFS should be faster than writing to S3, 
> so it makes more sense to write intermediate data to HDFS.
> The advantage is that any copying of data that needs to be done from the 
> scratch directory to the final table directory can be done server-side, 
> within the blobstore. The MoveTask simply renames data from the scratch 
> directory to the final table location, which should translate to a 
> server-side COPY request. This way HiveServer2 doesn't have to actually copy 
> any data, it just tells the blobstore to do all the work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-14923) DATE and TIMESTAMP comparisons do not work

2016-11-22 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan resolved HIVE-14923.
-
   Resolution: Fixed
 Assignee: Ashutosh Chauhan
Fix Version/s: 2.2.0

Fixed via HIVE-15236

> DATE and TIMESTAMP comparisons do not work
> --
>
> Key: HIVE-14923
> URL: https://issues.apache.org/jira/browse/HIVE-14923
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Ashutosh Chauhan
>Priority: Critical
> Fix For: 2.2.0
>
>
> When comparing a DATE type with a TIMESTAMP type, the planner promotes both 
> sides to string. But since the DATE value does not include the hh:mm:ss[.n] 
> portion, the comparison produces wrong results.
> Thanks to Jason Dere for observing this.
> Here is a portion of an EXPLAIN output:
> {code}
>   Map Join Operator
> condition map:
>  Inner Join 0 to 1
> keys:
>   0 UDFToString(some_timestamp) (type: string)
>   1 UDFToString(some_date) (type: string)
> {code}
> Workaround is to cast the DATE to a TIMESTAMP.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (HIVE-15199) INSERT INTO data on S3 is replacing the old rows with the new ones

2016-11-22 Thread Yongzhi Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongzhi Chen updated HIVE-15199:

Comment: was deleted

(was: The 8th patch LGTM  +1)

> INSERT INTO data on S3 is replacing the old rows with the new ones
> --
>
> Key: HIVE-15199
> URL: https://issues.apache.org/jira/browse/HIVE-15199
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Sergio Peña
>Assignee: Sergio Peña
>Priority: Critical
> Attachments: HIVE-15199.1.patch, HIVE-15199.2.patch, 
> HIVE-15199.3.patch, HIVE-15199.4.patch, HIVE-15199.5.patch, 
> HIVE-15199.6.patch, HIVE-15199.7.patch, HIVE-15199.8.patch, HIVE-15199.9.patch
>
>
> Any INSERT INTO statement run on an S3 table while the scratch directory is 
> saved on S3 deletes the old rows of the table.
> {noformat}
> hive> set hive.blobstore.use.blobstore.as.scratchdir=true;
> hive> create table t1 (id int, name string) location 's3a://spena-bucket/t1';
> hive> insert into table t1 values (1,'name1');
> hive> select * from t1;
> 1   name1
> hive> insert into table t1 values (2,'name2');
> hive> select * from t1;
> 2   name2
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15181) buildQueryWithINClause didn't properly handle multiples of ConfVars.METASTORE_DIRECT_SQL_MAX_ELEMENTS_IN_CLAUSE

2016-11-22 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15688218#comment-15688218
 ] 

Eugene Koifman commented on HIVE-15181:
---

A nit:  It seems better to throw IllegalArgumentException rather than 
SQLException, otherwise LGTM

> buildQueryWithINClause didn't properly handle multiples of 
> ConfVars.METASTORE_DIRECT_SQL_MAX_ELEMENTS_IN_CLAUSE
> ---
>
> Key: HIVE-15181
> URL: https://issues.apache.org/jira/browse/HIVE-15181
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Transactions
>Affects Versions: 1.2.1, 2.1.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
>Priority: Critical
> Attachments: HIVE-15181.1.patch, HIVE-15181.2.patch, 
> HIVE-15181.3.patch
>
>
> Until the bug is fixed, we can work around the issue by using the settings 
> below (making sure the second setting is always at least 1000 times the 
> first):
> set hive.direct.sql.max.query.length=1;
> set hive.direct.sql.max.elements.in.clause=1000;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15121) Last MR job in Hive should be able to write to a different scratch directory

2016-11-22 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-15121:
---
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Thanks [~stakiar] for your contribution. I committed this to master.

> Last MR job in Hive should be able to write to a different scratch directory
> 
>
> Key: HIVE-15121
> URL: https://issues.apache.org/jira/browse/HIVE-15121
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Fix For: 2.2.0
>
> Attachments: HIVE-15121.1.patch, HIVE-15121.2.patch, 
> HIVE-15121.3.patch, HIVE-15121.WIP.1.patch, HIVE-15121.WIP.2.patch, 
> HIVE-15121.WIP.patch, HIVE-15121.patch
>
>
> Hive should be able to configure all intermediate MR jobs to write to HDFS, 
> but the final MR job to write to S3.
> This will be useful for implementing parallel renames on S3. The idea is that 
> for a multi-job query, all intermediate MR jobs write to HDFS, and then the 
> final job writes to S3. Writing to HDFS should be faster than writing to S3, 
> so it makes more sense to write intermediate data to HDFS.
> The advantage is that any copying of data that needs to be done from the 
> scratch directory to the final table directory can be done server-side, 
> within the blobstore. The MoveTask simply renames data from the scratch 
> directory to the final table location, which should translate to a 
> server-side COPY request. This way HiveServer2 doesn't have to actually copy 
> any data, it just tells the blobstore to do all the work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15181) buildQueryWithINClause didn't properly handle multiples of ConfVars.METASTORE_DIRECT_SQL_MAX_ELEMENTS_IN_CLAUSE

2016-11-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15688252#comment-15688252
 ] 

Hive QA commented on HIVE-15181:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12840121/HIVE-15181.3.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10713 tests 
executed
*Failed tests:*
{noformat}
TestMiniSparkOnYarnCliDriver - did not produce a TEST-*.xml file (likely timed 
out) (batchId=158)

[scriptfile1.q,vector_outer_join5.q,file_with_header_footer.q,bucket4.q,input16_cc.q,bucket5.q,infer_bucket_sort_merge.q,constprog_partitioner.q,orc_merge2.q,reduce_deduplicate.q,schemeAuthority2.q,load_fs2.q,orc_merge8.q,orc_merge_incompat2.q,infer_bucket_sort_bucketed_table.q,vector_outer_join4.q,disable_merge_for_bucketing.q,vector_inner_join.q,orc_merge7.q]
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=133)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[join_acid_non_acid]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats]
 (batchId=145)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_1] 
(batchId=90)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=91)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2245/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2245/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2245/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12840121 - PreCommit-HIVE-Build

> buildQueryWithINClause didn't properly handle multiples of 
> ConfVars.METASTORE_DIRECT_SQL_MAX_ELEMENTS_IN_CLAUSE
> ---
>
> Key: HIVE-15181
> URL: https://issues.apache.org/jira/browse/HIVE-15181
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Transactions
>Affects Versions: 1.2.1, 2.1.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
>Priority: Critical
> Attachments: HIVE-15181.1.patch, HIVE-15181.2.patch, 
> HIVE-15181.3.patch
>
>
> Until the bug is fixed, we can work around the issue by using the settings 
> below (making sure the second setting is always at least 1000 times the 
> first):
> set hive.direct.sql.max.query.length=1;
> set hive.direct.sql.max.elements.in.clause=1000;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15199) INSERT INTO data on S3 is replacing the old rows with the new ones

2016-11-22 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15688250#comment-15688250
 ] 

Yongzhi Chen commented on HIVE-15199:
-

+1 for 9th patch pending tests

> INSERT INTO data on S3 is replacing the old rows with the new ones
> --
>
> Key: HIVE-15199
> URL: https://issues.apache.org/jira/browse/HIVE-15199
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Sergio Peña
>Assignee: Sergio Peña
>Priority: Critical
> Attachments: HIVE-15199.1.patch, HIVE-15199.2.patch, 
> HIVE-15199.3.patch, HIVE-15199.4.patch, HIVE-15199.5.patch, 
> HIVE-15199.6.patch, HIVE-15199.7.patch, HIVE-15199.8.patch, HIVE-15199.9.patch
>
>
> Any INSERT INTO statement run on an S3 table while the scratch directory is 
> saved on S3 deletes the old rows of the table.
> {noformat}
> hive> set hive.blobstore.use.blobstore.as.scratchdir=true;
> hive> create table t1 (id int, name string) location 's3a://spena-bucket/t1';
> hive> insert into table t1 values (1,'name1');
> hive> select * from t1;
> 1   name1
> hive> insert into table t1 values (2,'name2');
> hive> select * from t1;
> 2   name2
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15247) Pass the purge option for drop table to storage handlers

2016-11-22 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-15247:

   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Pushed to master.

> Pass the purge option for drop table to storage handlers
> 
>
> Key: HIVE-15247
> URL: https://issues.apache.org/jira/browse/HIVE-15247
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1, 2.0.0, 2.1.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Fix For: 2.2.0
>
> Attachments: HIVE-15247.patch
>
>
> This gives storage handler more control on how to handle drop table.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-14646) poison metastore APIs to make sure we can fail old clients for backward compat

2016-11-22 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin resolved HIVE-14646.
-
   Resolution: Fixed
Fix Version/s: hive-14535

Committed to branch

> poison metastore APIs to make sure we can fail old clients for backward compat
> --
>
> Key: HIVE-14646
> URL: https://issues.apache.org/jira/browse/HIVE-14646
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: hive-14535
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15234) Semijoin cardinality estimation can be improved

2016-11-22 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15688271#comment-15688271
 ] 

Ashutosh Chauhan commented on HIVE-15234:
-

[~jcamachorodriguez] Can you take a look?

> Semijoin cardinality estimation can be improved
> ---
>
> Key: HIVE-15234
> URL: https://issues.apache.org/jira/browse/HIVE-15234
> Project: Hive
>  Issue Type: Bug
>  Components: CBO, Logical Optimizer
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-15234.1.patch, HIVE-15234.2.patch, HIVE-15234.patch
>
>
> Currently Calcite optimization rules rely on (Hive)SemiJoin to represent the 
> semi join node, whereas stats estimation uses the {{leftSemiJoin}} field of 
> Join. As a result, the semi-join-specific stats calculation logic is never 
> hit: at plan generation time a HiveSemiJoin is created and the leftSemiJoin 
> field of Join is never set.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15121) Last MR job in Hive should be able to write to a different scratch directory

2016-11-22 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-15121:
--
Labels: TODOC2.2  (was: )

> Last MR job in Hive should be able to write to a different scratch directory
> 
>
> Key: HIVE-15121
> URL: https://issues.apache.org/jira/browse/HIVE-15121
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-15121.1.patch, HIVE-15121.2.patch, 
> HIVE-15121.3.patch, HIVE-15121.WIP.1.patch, HIVE-15121.WIP.2.patch, 
> HIVE-15121.WIP.patch, HIVE-15121.patch
>
>
> Hive should be able to configure all intermediate MR jobs to write to HDFS, 
> but the final MR job to write to S3.
> This will be useful for implementing parallel renames on S3. The idea is that 
> for a multi-job query, all intermediate MR jobs write to HDFS, and then the 
> final job writes to S3. Writing to HDFS should be faster than writing to S3, 
> so it makes more sense to write intermediate data to HDFS.
> The advantage is that any copying of data that needs to be done from the 
> scratch directory to the final table directory can be done server-side, 
> within the blobstore. The MoveTask simply renames data from the scratch 
> directory to the final table location, which should translate to a 
> server-side COPY request. This way HiveServer2 doesn't have to actually copy 
> any data, it just tells the blobstore to do all the work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15121) Last MR job in Hive should be able to write to a different scratch directory

2016-11-22 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15688365#comment-15688365
 ] 

Lefty Leverenz commented on HIVE-15121:
---

Doc note:  This adds *hive.blobstore.optimizations.enabled* to HiveConf.java, 
so it needs to be documented in the wiki for release 2.2.0.  I recommend using 
the description in patch 2 (revision 3 on the Review Board) instead of 
referring back here for details:

{quote}
This parameter enables a number of optimizations when running on blobstores:
(1) If hive.blobstore.use.blobstore.as.scratchdir is false, force the last Hive 
job to write to the blobstore. This is a performance optimization that forces 
the final FileSinkOperator to write to the blobstore. The advantage is that any 
copying of data that needs to be done from the scratch directory to the final 
table directory can be done server-side, within the blobstore. The MoveTask 
simply renames data from the scratch directory to the final table location, 
which should translate to a server-side COPY request. This way HiveServer2 
doesn't have to actually copy any data, it just tells the blobstore to do all 
the work.
{quote}

I'm not sure if *hive.blobstore.optimizations.enabled* belongs in the general 
query execution section or a new blobstore section, along with the two 
parameters created by HIVE-14270.

* [Hive Configuration Properties | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties]

Added a TODOC2.2 label.
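
A compact sketch of the decision this flag drives; the method and parameter 
names below are illustrative, not Hive's actual code:

{code}
import org.apache.hadoop.fs.Path;

public class ScratchDirChoice {
  // With optimizations on and the blobstore NOT used as the general scratch
  // dir, only the final job's sink writes to the blobstore, so the MoveTask
  // rename becomes a server-side COPY.
  static Path finalJobScratchDir(boolean optimizationsEnabled,
      boolean blobstoreAsScratchDir, Path hdfsScratch, Path blobstoreScratch) {
    if (optimizationsEnabled && !blobstoreAsScratchDir) {
      return blobstoreScratch;
    }
    return hdfsScratch;
  }
}
{code}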

> Last MR job in Hive should be able to write to a different scratch directory
> 
>
> Key: HIVE-15121
> URL: https://issues.apache.org/jira/browse/HIVE-15121
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-15121.1.patch, HIVE-15121.2.patch, 
> HIVE-15121.3.patch, HIVE-15121.WIP.1.patch, HIVE-15121.WIP.2.patch, 
> HIVE-15121.WIP.patch, HIVE-15121.patch
>
>
> Hive should be able to configure all intermediate MR jobs to write to HDFS, 
> but the final MR job to write to S3.
> This will be useful for implementing parallel renames on S3. The idea is that 
> for a multi-job query, all intermediate MR jobs write to HDFS, and then the 
> final job writes to S3. Writing to HDFS should be faster than writing to S3, 
> so it makes more sense to write intermediate data to HDFS.
> The advantage is that any copying of data that needs to be done from the 
> scratch directory to the final table directory can be done server-side, 
> within the blobstore. The MoveTask simply renames data from the scratch 
> directory to the final table location, which should translate to a 
> server-side COPY request. This way HiveServer2 doesn't have to actually copy 
> any data, it just tells the blobstore to do all the work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15181) buildQueryWithINClause didn't properly handle multiples of ConfVars.METASTORE_DIRECT_SQL_MAX_ELEMENTS_IN_CLAUSE

2016-11-22 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-15181:
-
Status: Open  (was: Patch Available)

> buildQueryWithINClause didn't properly handle multiples of 
> ConfVars.METASTORE_DIRECT_SQL_MAX_ELEMENTS_IN_CLAUSE
> ---
>
> Key: HIVE-15181
> URL: https://issues.apache.org/jira/browse/HIVE-15181
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Transactions
>Affects Versions: 2.1.0, 1.2.1
>Reporter: Wei Zheng
>Assignee: Wei Zheng
>Priority: Critical
> Attachments: HIVE-15181.1.patch, HIVE-15181.2.patch, 
> HIVE-15181.3.patch
>
>
> Until the bug is fixed, we can work around the issue by using the settings 
> below (making sure the second setting is always at least 1000 times the 
> first):
> set hive.direct.sql.max.query.length=1;
> set hive.direct.sql.max.elements.in.clause=1000;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15181) buildQueryWithINClause didn't properly handle multiples of ConfVars.METASTORE_DIRECT_SQL_MAX_ELEMENTS_IN_CLAUSE

2016-11-22 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-15181:
-
Status: Patch Available  (was: Open)

> buildQueryWithINClause didn't properly handle multiples of 
> ConfVars.METASTORE_DIRECT_SQL_MAX_ELEMENTS_IN_CLAUSE
> ---
>
> Key: HIVE-15181
> URL: https://issues.apache.org/jira/browse/HIVE-15181
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Transactions
>Affects Versions: 2.1.0, 1.2.1
>Reporter: Wei Zheng
>Assignee: Wei Zheng
>Priority: Critical
> Attachments: HIVE-15181.1.patch, HIVE-15181.2.patch, 
> HIVE-15181.3.patch, HIVE-15181.4.patch
>
>
> Until the bug is fixed, we can work around the issue by using the settings 
> below (making sure the second setting is always at least 1000 times the 
> first):
> set hive.direct.sql.max.query.length=1;
> set hive.direct.sql.max.elements.in.clause=1000;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15181) buildQueryWithINClause didn't properly handle multiples of ConfVars.METASTORE_DIRECT_SQL_MAX_ELEMENTS_IN_CLAUSE

2016-11-22 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-15181:
-
Attachment: HIVE-15181.4.patch

Patch 4 replaces SQLException with IllegalArgumentException.
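
A sketch of the style of check being discussed, enforcing the workaround ratio 
quoted in the description; the exact invariant the patch validates may well 
differ:

{code}
public class InClauseGuard {
  // Illustrative guard only: require the IN-clause element limit to be at
  // least 1000 times the query-length limit (in KB), per the workaround.
  static void validate(int maxQueryLengthKb, int maxElementsInClause) {
    if (maxElementsInClause < 1000 * maxQueryLengthKb) {
      throw new IllegalArgumentException(
          "hive.direct.sql.max.elements.in.clause=" + maxElementsInClause
          + " must be at least 1000 * hive.direct.sql.max.query.length="
          + maxQueryLengthKb);
    }
  }
}
{code}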

> buildQueryWithINClause didn't properly handle multiples of 
> ConfVars.METASTORE_DIRECT_SQL_MAX_ELEMENTS_IN_CLAUSE
> ---
>
> Key: HIVE-15181
> URL: https://issues.apache.org/jira/browse/HIVE-15181
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Transactions
>Affects Versions: 1.2.1, 2.1.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
>Priority: Critical
> Attachments: HIVE-15181.1.patch, HIVE-15181.2.patch, 
> HIVE-15181.3.patch, HIVE-15181.4.patch
>
>
> Until the bug is fixed, we can work around the issue by using the settings 
> below (making sure the second setting is always at least 1000 times the 
> first):
> set hive.direct.sql.max.query.length=1;
> set hive.direct.sql.max.elements.in.clause=1000;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15180) Extend JSONMessageFactory to store additional information about metadata objects on different table events

2016-11-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15688396#comment-15688396
 ] 

Hive QA commented on HIVE-15180:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12840131/HIVE-15180.7.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10717 tests 
executed
*Failed tests:*
{noformat}
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=98)

[avro_joins.q,skewjoinopt16.q,auto_join14.q,vectorization_14.q,auto_join26.q,stats1.q,cbo_stats.q,auto_sortmerge_join_6.q,union22.q,union_remove_24.q,union_view.q,smb_mapjoin_22.q,stats15.q,ptf_matchpath.q,transform_ppr1.q]
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=133)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[join_acid_non_acid]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats]
 (batchId=145)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_1] 
(batchId=90)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_3] 
(batchId=90)
org.apache.hive.hcatalog.api.TestHCatClientNotification.createTable 
(batchId=217)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2247/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2247/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2247/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12840131 - PreCommit-HIVE-Build

> Extend JSONMessageFactory to store additional information about metadata 
> objects on different table events
> --
>
> Key: HIVE-15180
> URL: https://issues.apache.org/jira/browse/HIVE-15180
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Attachments: HIVE-15180.1.patch, HIVE-15180.2.patch, 
> HIVE-15180.3.patch, HIVE-15180.3.patch, HIVE-15180.4.patch, 
> HIVE-15180.5.patch, HIVE-15180.6.patch, HIVE-15180.6.patch, HIVE-15180.7.patch
>
>
> We want the {{NOTIFICATION_LOG}} table to capture additional information 
> about the metadata objects when {{DbNotificationListener}} captures different 
> events for a table (create/drop/alter) and a partition (create/alter/drop). 
> We'll use the messages field to add json objects for table and partitions for 
> create and alter events. The drop events' messages remain unchanged. These 
> messages can then be used to replay these events on the destination in the 
> event of replication, in a way that puts the destination in a state 
> consistent with one of the past states of the source.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15181) buildQueryWithINClause didn't properly handle multiples of ConfVars.METASTORE_DIRECT_SQL_MAX_ELEMENTS_IN_CLAUSE

2016-11-22 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15688431#comment-15688431
 ] 

Eugene Koifman commented on HIVE-15181:
---

+1 pending tests

> buildQueryWithINClause didn't properly handle multiples of 
> ConfVars.METASTORE_DIRECT_SQL_MAX_ELEMENTS_IN_CLAUSE
> ---
>
> Key: HIVE-15181
> URL: https://issues.apache.org/jira/browse/HIVE-15181
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Transactions
>Affects Versions: 1.2.1, 2.1.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
>Priority: Critical
> Attachments: HIVE-15181.1.patch, HIVE-15181.2.patch, 
> HIVE-15181.3.patch, HIVE-15181.4.patch
>
>
> Until the bug is fixed, we can work around the issue by using the settings 
> below (making sure the second setting is always at least 1000 times the 
> first):
> set hive.direct.sql.max.query.length=1;
> set hive.direct.sql.max.elements.in.clause=1000;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13931) Add support for HikariCP connection pooling

2016-11-22 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-13931:
--
Labels: TODOC2.2  (was: )

> Add support for HikariCP  connection pooling
> 
>
> Key: HIVE-13931
> URL: https://issues.apache.org/jira/browse/HIVE-13931
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Sushanth Sowmyan
>Assignee: Prasanth Jayachandran
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-13931.2.patch, HIVE-13931.3.patch, 
> HIVE-13931.4.patch, HIVE-13931.patch
>
>
> Currently, we use BoneCP as our primary connection pooling mechanism 
> (overridable by users). However, BoneCP is no longer being actively 
> developed, and is considered deprecated, replaced by HikariCP.
> Thus, we should add support for HikariCP, and try to replace our primary 
> usage of BoneCP with it.
> Note: BoneCP is still the default for now; the version of HikariCP being 
> used requires a Java 8 runtime.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13931) Add support for HikariCP connection pooling

2016-11-22 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15688465#comment-15688465
 ] 

Lefty Leverenz commented on HIVE-13931:
---

Doc note:  This adds a new value for *datanucleus.connectionPoolingType* in 
HiveConf.java, so the wiki description needs to be updated for release 2.2.0.  
The parameter could also be documented in the Metastore Administration doc 
(although it would go in a table that omits most of the datanucleus parameters 
and has a disclaimer about being obsolete).

* [Configuration Properties -- datanucleus.connectionPoolingType | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-datanucleus.connectionPoolingType]
* [Metastore Administration -- Additional Configuration Parameters | 
https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin#AdminManualMetastoreAdmin-AdditionalConfigurationParameters]

Added a TODOC2.2 label.
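
For reference, opting in would look roughly like the following; the exact 
accepted spelling of the value is defined by the patch:

{code}
import org.apache.hadoop.hive.conf.HiveConf;

public class PoolChoice {
  public static void main(String[] args) {
    HiveConf conf = new HiveConf();
    // BoneCP remains the default; HikariCP requires a Java 8 runtime.
    conf.set("datanucleus.connectionPoolingType", "HikariCP");
    System.out.println(conf.get("datanucleus.connectionPoolingType"));
  }
}
{code}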

> Add support for HikariCP  connection pooling
> 
>
> Key: HIVE-13931
> URL: https://issues.apache.org/jira/browse/HIVE-13931
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Sushanth Sowmyan
>Assignee: Prasanth Jayachandran
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-13931.2.patch, HIVE-13931.3.patch, 
> HIVE-13931.4.patch, HIVE-13931.patch
>
>
> Currently, we use BoneCP as our primary connection pooling mechanism 
> (overridable by users). However, BoneCP is no longer being actively 
> developed, and is considered deprecated, replaced by HikariCP.
> Thus, we should add support for HikariCP, and try to replace our primary 
> usage of BoneCP with it.
> Note: BoneCP is still the default for now; the version of HikariCP being 
> used requires a Java 8 runtime.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15263) Detect the values for incorrect NULL values

2016-11-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15688507#comment-15688507
 ] 

Hive QA commented on HIVE-15263:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12840132/HIVE-15263.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10733 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=133)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[join_acid_non_acid]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats]
 (batchId=145)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_1] 
(batchId=90)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_4] 
(batchId=91)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2248/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2248/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2248/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12840132 - PreCommit-HIVE-Build

> Detect the values for incorrect NULL values
> ---
>
> Key: HIVE-15263
> URL: https://issues.apache.org/jira/browse/HIVE-15263
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Affects Versions: 2.2.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-15263.1.patch
>
>
> We have seen incorrect NULL values for SD_ID in TBLS for regular hive tables. 
> That column is nullable by design, since SD_ID will be NULL for hive views. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15250) Reuse partitions info generated in MoveTask to its subscribers (StatsTask)

2016-11-22 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-15250:

Attachment: HIVE-15250.2.patch

> Reuse partitions info generated in MoveTask to its subscribers (StatsTask)
>
> -
>
> Key: HIVE-15250
> URL: https://issues.apache.org/jira/browse/HIVE-15250
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-15250.1.patch, HIVE-15250.2.patch
>
>
> When dynamic partitions are enabled, {{StatsTask}} loads partition 
> information by querying the metastore. In cases like {{insert overwrite 
> table}}, this can be an expensive operation depending on the number of 
> partitions involved (e.g., in TPC-DS, populating the web_returns table would 
> incur 2184 DB calls just for this function).
> It would be good to pass on the partition information generated in MoveTask 
> to its subscribers to reduce the number of DB calls.
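
A minimal sketch of the handoff idea; the field and method names are 
hypothetical, since the real plumbing would go through the task/work objects:

{code}
import java.util.List;
import org.apache.hadoop.hive.metastore.api.Partition;

public class MoveTaskHandoff {
  // MoveTask publishes the partitions it just loaded; StatsTask reuses them
  // instead of re-listing every partition from the metastore.
  private List<Partition> loadedPartitions; // hypothetical field

  void onPartitionsLoaded(List<Partition> parts) {
    this.loadedPartitions = parts; // set by MoveTask after the load
  }

  List<Partition> partitionsForStats() {
    return loadedPartitions; // consumed by StatsTask
  }
}
{code}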



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15250) Reuse partitions info generated in MoveTask to its subscribers (StatsTask)

2016-11-22 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-15250:

Status: Patch Available  (was: Open)

> Reuse partitions info generated in MoveTask to its subscribers (StatsTask)
>
> -
>
> Key: HIVE-15250
> URL: https://issues.apache.org/jira/browse/HIVE-15250
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-15250.1.patch, HIVE-15250.2.patch
>
>
> When dynamic partitions are enabled, {{StatsTask}} loads partition 
> information by querying the metastore. In cases like {{insert overwrite 
> table}}, this can be an expensive operation depending on the number of 
> partitions involved (e.g., in TPC-DS, populating the web_returns table would 
> incur 2184 DB calls just for this function).
> It would be good to pass on the partition information generated in MoveTask 
> to its subscribers to reduce the number of DB calls.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15240) Updating/Altering stats in metastore can be expensive in S3

2016-11-22 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-15240:

Status: Patch Available  (was: Open)

> Updating/Altering stats in metastore can be expensive in S3
> ---
>
> Key: HIVE-15240
> URL: https://issues.apache.org/jira/browse/HIVE-15240
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-15240.1.patch, HIVE-15240.2.patch, 
> HIVE-15240.3.patch
>
>
> https://github.com/apache/hive/blob/master/metastore/src/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java#L630
> https://github.com/apache/hive/blob/master/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java#L367
> If there are 100 partitions, it iterates over every partition to determine 
> its location, taking up a significant amount of time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

