[jira] [Updated] (HIVE-10821) Beeline-CLI: Implement CLI source command using Beeline functionality

2015-05-29 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-10821:

Attachment: HIVE-10821.2-beeline-cli.patch

Hi [~xuefuz], do you have any further comments about this jira? Thank you!

> Beeline-CLI: Implement CLI source command using Beeline functionality
> -
>
> Key: HIVE-10821
> URL: https://issues.apache.org/jira/browse/HIVE-10821
> Project: Hive
>  Issue Type: Sub-task
>  Components: CLI
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
> Attachments: HIVE-10821.1-beeline-cli.patch, 
> HIVE-10821.1-beeline-cli.patch, HIVE-10821.2-beeline-cli.patch, 
> HIVE-10821.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10821) Beeline-CLI: Implement CLI source command using Beeline functionality

2015-05-29 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-10821:

Release Note: 
Examples:
#cat /root/workspace/test.sql 
create table test2(a string, b string);
#0: jdbc:hive2://> source /root/workspace/test.sql
#0: jdbc:hive2://> create table test2(a string, b string);

  was:
Examples:
{noformat}
#cat /root/workspace/test.sql 
create table test2(a string, b string);
#0: jdbc:hive2://> source /root/workspace/test.sql
#0: jdbc:hive2://> create table test2(a string, b string);
{noformat}


> Beeline-CLI: Implement CLI source command using Beeline functionality
> -
>
> Key: HIVE-10821
> URL: https://issues.apache.org/jira/browse/HIVE-10821
> Project: Hive
>  Issue Type: Sub-task
>  Components: CLI
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
> Attachments: HIVE-10821.1-beeline-cli.patch, 
> HIVE-10821.1-beeline-cli.patch, HIVE-10821.2-beeline-cli.patch, 
> HIVE-10821.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7567) support automatic calculating reduce task number [Spark Branch]

2015-05-29 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-7567:
-
Labels: TODOC-SPARK TODOC15  (was: TODOC-SPARK)

> support automatic calculating reduce task number [Spark Branch]
> ---
>
> Key: HIVE-7567
> URL: https://issues.apache.org/jira/browse/HIVE-7567
> Project: Hive
>  Issue Type: Task
>  Components: Spark
>Reporter: Chengxiang Li
>Assignee: Chengxiang Li
>  Labels: TODOC-SPARK, TODOC15
> Fix For: 1.1.0
>
> Attachments: HIVE-7567.1-spark.patch, HIVE-7567.2-spark.patch, 
> HIVE-7567.3-spark.patch, HIVE-7567.4-spark.patch, HIVE-7567.5-spark.patch, 
> HIVE-7567.6-spark.patch
>
>
> Hive has its own mechanism to calculate the reduce task number; we need to 
> implement it for Spark jobs.
> NO PRECOMMIT TESTS. This is for spark-branch only.
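
For illustration, a minimal sketch of the MR-style heuristic being referred to 
(input size divided by a per-reducer byte target, capped by a maximum); the 
class name and defaults are assumptions, not the patch's actual code:
{code}
// Illustrative sketch of Hive's MR-style reducer estimation, which the
// Spark branch needs an equivalent of. Names here are assumptions.
public final class ReducerEstimator {
    private ReducerEstimator() {}

    /**
     * @param totalInputBytes estimated input size of the shuffle stage
     * @param bytesPerReducer target data per reducer (hive.exec.reducers.bytes.per.reducer)
     * @param maxReducers     upper bound (hive.exec.reducers.max)
     */
    static int estimateReducers(long totalInputBytes, long bytesPerReducer, int maxReducers) {
        if (totalInputBytes <= 0) {
            return 1;
        }
        long reducers = (totalInputBytes + bytesPerReducer - 1) / bytesPerReducer; // ceiling
        return (int) Math.min(reducers, maxReducers);
    }
}
{code}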



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7567) support automatic calculating reduce task number [Spark Branch]

2015-05-29 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564344#comment-14564344
 ] 

Lefty Leverenz commented on HIVE-7567:
--

Adding TODOC15 (which actually means TODOC1.1.0, but let's not proliferate 
labels).

> support automatic calculating reduce task number [Spark Branch]
> ---
>
> Key: HIVE-7567
> URL: https://issues.apache.org/jira/browse/HIVE-7567
> Project: Hive
>  Issue Type: Task
>  Components: Spark
>Reporter: Chengxiang Li
>Assignee: Chengxiang Li
>  Labels: TODOC-SPARK, TODOC15
> Fix For: 1.1.0
>
> Attachments: HIVE-7567.1-spark.patch, HIVE-7567.2-spark.patch, 
> HIVE-7567.3-spark.patch, HIVE-7567.4-spark.patch, HIVE-7567.5-spark.patch, 
> HIVE-7567.6-spark.patch
>
>
> Hive has its own mechanism to calculate the reduce task number; we need to 
> implement it for Spark jobs.
> NO PRECOMMIT TESTS. This is for spark-branch only.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10832) ColumnStatsTask failure when processing large amount of partitions

2015-05-29 Thread Chao Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564345#comment-14564345
 ] 

Chao Sun commented on HIVE-10832:
-

Hi [~ashutoshc], the patch does work, but it needs a heavy rebase since 
there have been a lot of changes since then.
Also, "Column stats" is still "PARTIAL" after I run
{code}
analyze table catalog_returns partition(cr_returned_date_sk) compute statistics 
for columns
{code}
I also tried putting all the column names after "for columns", and the result is 
still the same.
Do you know whether this command really works? Thanks.
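
One plausible mitigation for the timeout below (an assumption about the 
general approach, not necessarily what the patch does) is to chunk the 
per-partition statistics so no single metastore RPC carries thousands of 
entries; the batch size and persist() callback here are illustrative:
{code}
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: persist column statistics in bounded batches so a
// single set_aggr_stats_for-style RPC does not time out on tables with many
// partitions. BATCH_SIZE and the persist() callback are assumptions.
public final class BatchedStatsPersister {
    private static final int BATCH_SIZE = 1000;

    public static <T> void persistInBatches(List<T> perPartitionStats,
                                            Consumer<List<T>> persist) {
        for (int from = 0; from < perPartitionStats.size(); from += BATCH_SIZE) {
            int to = Math.min(from + BATCH_SIZE, perPartitionStats.size());
            persist.accept(perPartitionStats.subList(from, to)); // one bounded RPC per batch
        }
    }
}
{code}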

> ColumnStatsTask failure when processing large amount of partitions
> --
>
> Key: HIVE-10832
> URL: https://issues.apache.org/jira/browse/HIVE-10832
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Affects Versions: 1.1.0
>Reporter: Chao Sun
>
> We are trying to populate column stats for a TPC-DS 4TB dataset, and every 
> time we run:
> {code}
> analyze table catalog_sales partition(cs_sold_date_sk) compute statistics for 
> columns;
> {code}
> it fails with:
> {noformat}
> 2015-05-26 12:14:53,128 WARN 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient: MetaStoreClient 
> lost connection. Attempting to reconnect.
> org.apache.thrift.transport.TTransportException: 
> java.net.SocketTimeoutException: Read timed out
> at 
> org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
> at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
> at 
> org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
> at 
> org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
> at 
> org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
> at 
> org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_set_aggr_stats_for(ThriftHiveMetastore.java:2974)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.set_aggr_stats_for(ThriftHiveMetastore.java:2961)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.setPartitionColumnStatistics(HiveMetaStoreClient.java:1376)
> at sun.reflect.GeneratedMethodAccessor44.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:91)
> at com.sun.proxy.$Proxy10.setPartitionColumnStatistics(Unknown Source)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.setPartitionColumnStatistics(Hive.java:2921)
> at 
> org.apache.hadoop.hive.ql.exec.ColumnStatsTask.persistPartitionStats(ColumnStatsTask.java:349)
> at org.apache.hadoop.hive.ql.exec.ColumnStatsTask.execute(ColumnStatsTask.java)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1638)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1397)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1181)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1047)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1042)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:145)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:70)
> at 
> org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:197)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
> at 
> org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:209)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.net.SocketTimeoutException: Read timed out
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStr

[jira] [Updated] (HIVE-9487) Make Remote Spark Context secure [Spark Branch]

2015-05-29 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-9487:
-
Labels: TODOC-SPARK TODOC15  (was: TODOC-SPARK)

> Make Remote Spark Context secure [Spark Branch]
> ---
>
> Key: HIVE-9487
> URL: https://issues.apache.org/jira/browse/HIVE-9487
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Marcelo Vanzin
>Assignee: Marcelo Vanzin
>  Labels: TODOC-SPARK, TODOC15
> Fix For: 1.1.0
>
> Attachments: HIVE-9487.1-spark.patch, HIVE-9487.2-spark.patch
>
>
> The RSC currently uses an ad-hoc, insecure authentication mechanism. We 
> should instead use a proper auth mechanism and add encryption to the mix.
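
For reference, a minimal sketch of what "a proper auth mechanism" plus 
encryption could look like using the standard javax.security.sasl API with 
DIGEST-MD5; the protocol/server names and callback handler are placeholders, 
and this is not the actual RSC implementation:
{code}
import java.util.HashMap;
import java.util.Map;
import javax.security.auth.callback.CallbackHandler;
import javax.security.sasl.Sasl;
import javax.security.sasl.SaslClient;
import javax.security.sasl.SaslException;

// Sketch of SASL-based authentication with encryption (QOP auth-conf), one
// standard way to replace an ad-hoc handshake. Protocol and server names are
// placeholders; this is not the actual remote Spark context code.
public final class SaslHandshakeSketch {
    public static SaslClient newClient(CallbackHandler secretHandler) throws SaslException {
        Map<String, String> props = new HashMap<>();
        props.put(Sasl.QOP, "auth-conf"); // authentication + confidentiality (encryption)
        return Sasl.createSaslClient(
                new String[] {"DIGEST-MD5"}, // mechanism
                null,                        // authorization id
                "rsc",                       // protocol (placeholder)
                "rsc-server",                // server name (placeholder)
                props,
                secretHandler);              // supplies the shared secret
        // The client then exchanges evaluateChallenge(...) messages with the
        // server until isComplete(); afterwards wrap()/unwrap() encrypt traffic.
    }
}
{code}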



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9487) Make Remote Spark Context secure [Spark Branch]

2015-05-29 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564359#comment-14564359
 ] 

Lefty Leverenz commented on HIVE-9487:
--

Adding TODOC15 (which means TODOC1.1.0).

> Make Remote Spark Context secure [Spark Branch]
> ---
>
> Key: HIVE-9487
> URL: https://issues.apache.org/jira/browse/HIVE-9487
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Marcelo Vanzin
>Assignee: Marcelo Vanzin
>  Labels: TODOC-SPARK, TODOC15
> Fix For: 1.1.0
>
> Attachments: HIVE-9487.1-spark.patch, HIVE-9487.2-spark.patch
>
>
> The RSC currently uses an ad-hoc, insecure authentication mechanism. We 
> should instead use a proper auth mechanism and add encryption to the mix.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8054) Disable hive.optimize.union.remove when hive.execution.engine=spark [Spark Branch]

2015-05-29 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-8054:
-
Labels: Spark-M1 TODOC-SPARK TODOC15  (was: Spark-M1 TODOC-SPARK)

> Disable hive.optimize.union.remove when hive.execution.engine=spark [Spark 
> Branch]
> --
>
> Key: HIVE-8054
> URL: https://issues.apache.org/jira/browse/HIVE-8054
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Na Yang
>  Labels: Spark-M1, TODOC-SPARK, TODOC15
> Fix For: 1.1.0
>
> Attachments: HIVE-8054-spark.patch, HIVE-8054.2-spark.patch, 
> HIVE-8054.3-spark.patch
>
>
> Option hive.optimize.union.remove, introduced in HIVE-3276, removes union 
> operators from the operator graph in certain cases as an optimization to 
> reduce the number of MR jobs. While it makes sense for MR, this optimization 
> is actually harmful to an execution engine such as Spark, which natively 
> supports union without requiring additional jobs. This is because removing 
> the union operator creates disjoint operator graphs, each generating its own 
> job, so the optimization actually requires more jobs to run the query, not to 
> mention the additional complexity of handling linked FS descriptors.
> I propose that we disable this optimization when the execution engine is 
> Spark.
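
A minimal sketch of the proposed guard, using plain Configuration string keys 
rather than the real HiveConf.ConfVars constants (whose exact names may 
differ):
{code}
import org.apache.hadoop.conf.Configuration;

// Sketch of the proposed behavior: treat hive.optimize.union.remove as off
// whenever the execution engine is Spark. String keys stand in for the
// actual HiveConf.ConfVars constants.
public final class UnionRemoveGuard {
    public static boolean unionRemoveEnabled(Configuration conf) {
        boolean requested = conf.getBoolean("hive.optimize.union.remove", false);
        boolean onSpark = "spark".equalsIgnoreCase(conf.get("hive.execution.engine", "mr"));
        return requested && !onSpark; // disable the optimization on Spark
    }
}
{code}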



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8054) Disable hive.optimize.union.remove when hive.execution.engine=spark [Spark Branch]

2015-05-29 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564368#comment-14564368
 ] 

Lefty Leverenz commented on HIVE-8054:
--

Adding TODOC15 (which means TODOC1.1.0).

> Disable hive.optimize.union.remove when hive.execution.engine=spark [Spark 
> Branch]
> --
>
> Key: HIVE-8054
> URL: https://issues.apache.org/jira/browse/HIVE-8054
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Na Yang
>  Labels: Spark-M1, TODOC-SPARK, TODOC15
> Fix For: 1.1.0
>
> Attachments: HIVE-8054-spark.patch, HIVE-8054.2-spark.patch, 
> HIVE-8054.3-spark.patch
>
>
> Option hive.optimize.union.remove, introduced in HIVE-3276, removes union 
> operators from the operator graph in certain cases as an optimization to 
> reduce the number of MR jobs. While it makes sense for MR, this optimization 
> is actually harmful to an execution engine such as Spark, which natively 
> supports union without requiring additional jobs. This is because removing 
> the union operator creates disjoint operator graphs, each generating its own 
> job, so the optimization actually requires more jobs to run the query, not to 
> mention the additional complexity of handling linked FS descriptors.
> I propose that we disable this optimization when the execution engine is 
> Spark.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10659) Beeline command which contains semi-colon as a non-command terminator will fail

2015-05-29 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-10659:
-
Labels:   (was: TODOC1.2.1)

> Beeline command which contains semi-colon as a non-command terminator will 
> fail
> ---
>
> Key: HIVE-10659
> URL: https://issues.apache.org/jira/browse/HIVE-10659
> Project: Hive
>  Issue Type: Bug
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Fix For: 1.2.1
>
> Attachments: HIVE-10659.1.patch
>
>
> Consider a scenario where Beeline is used to connect to a MySQL server. The 
> commands executed via Beeline can include stored procedures. For example, the 
> following command, which creates a stored procedure, is valid:
> {code}
> CREATE PROCEDURE RM_TLBS_LINKID() BEGIN IF EXISTS (SELECT * FROM 
> `INFORMATION_SCHEMA`.`COLUMNS` WHERE `TABLE_NAME` = 'TBLS' AND `COLUMN_NAME` 
> = 'LINK_TARGET_ID') THEN ALTER TABLE `TBLS` DROP FOREIGN KEY `TBLS_FK3` ; 
> ALTER TABLE `TBLS` DROP KEY `TBLS_N51` ; ALTER TABLE `TBLS` DROP COLUMN 
> `LINK_TARGET_ID` ; END IF; END
> {code}
> MySQL stored procedures use the semicolon ( ; ) as the statement terminator. 
> Since this coincides with Beeline's only available command terminator, the 
> semicolon, Beeline is not able to execute the above command successfully; 
> instead, it tries to execute the following partial command:
> {code}
> CREATE PROCEDURE RM_TLBS_LINKID() BEGIN IF EXISTS (SELECT * FROM 
> `INFORMATION_SCHEMA`.`COLUMNS` WHERE `TABLE_NAME` = 'TBLS' AND `COLUMN_NAME` 
> = 'LINK_TARGET_ID') THEN ALTER TABLE `TBLS` DROP FOREIGN KEY `TBLS_FK3` ; 
> {code}
> This situation can actually happen within Hive when the Hive SchemaTool is 
> used to upgrade a MySQL metastore db and the scripts used for the upgrade 
> process contain stored procedures (such as the one introduced initially by 
> HIVE-7018). As of now, we cannot have any stored procedure in the MySQL 
> metastore db upgrade scripts because schemaTool uses Beeline to connect to 
> MySQL, and Beeline fails to execute any "create procedure" command or 
> similar command containing a semicolon. This is a serious limitation; it 
> needs to be fixed by giving the end user an option to make Beeline use the 
> newline character instead of the semicolon as the command delimiter.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8528) Add remote Spark client to Hive [Spark Branch]

2015-05-29 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-8528:
-
Labels:   (was: TODOC-SPARK)

> Add remote Spark client to Hive [Spark Branch]
> --
>
> Key: HIVE-8528
> URL: https://issues.apache.org/jira/browse/HIVE-8528
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Marcelo Vanzin
>Assignee: Marcelo Vanzin
> Fix For: 1.1.0
>
> Attachments: HIVE-8528.1-spark-client.patch, HIVE-8528.1-spark.patch, 
> HIVE-8528.2-spark.patch, HIVE-8528.2-spark.patch, HIVE-8528.3-spark.patch
>
>
> For the time being, at least, we've decided to build the Spark client (see 
> SPARK-3215) inside Hive. This task tracks merging the ongoing work into the 
> Spark branch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7439) Spark job monitoring and error reporting [Spark Branch]

2015-05-29 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-7439:
-
Labels: Spark-M3 TODOC-SPARK TODOC15  (was: Spark-M3 TODOC-SPARK)

> Spark job monitoring and error reporting [Spark Branch]
> ---
>
> Key: HIVE-7439
> URL: https://issues.apache.org/jira/browse/HIVE-7439
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Chengxiang Li
>  Labels: Spark-M3, TODOC-SPARK, TODOC15
> Fix For: 1.1.0
>
> Attachments: HIVE-7439.1-spark.patch, HIVE-7439.2-spark.patch, 
> HIVE-7439.2-spark.patch, HIVE-7439.3-spark.patch, HIVE-7439.3-spark.patch, 
> hive on spark job status.PNG
>
>
> After Hive submits a job to the Spark cluster, we need to report the job 
> progress, such as the percentage done, to the user. This is especially 
> important for long-running queries. Moreover, if there is an error during job 
> submission or execution, it's also crucial for Hive to fetch the error log 
> and/or stack trace and feed it back to the user.
> Please refer to the design doc on the wiki for more information.
> CLEAR LIBRARY CACHE
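
As a rough illustration of the kind of progress reporting described above 
(not the patch itself), the percentage can be derived from completed vs. 
total task counts:
{code}
// Illustrative only: the kind of "percent done" figure a job monitor would
// report, derived from Spark task counters. The parameter names are assumed.
public final class ProgressSketch {
    static int percentDone(int completedTasks, int totalTasks) {
        if (totalTasks <= 0) {
            return 0; // plan not known yet; report no progress
        }
        return Math.min(100, (completedTasks * 100) / totalTasks);
    }
}
{code}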



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7439) Spark job monitoring and error reporting [Spark Branch]

2015-05-29 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564397#comment-14564397
 ] 

Lefty Leverenz commented on HIVE-7439:
--

Adding TODOC15 (which means TODOC1.1.0).

> Spark job monitoring and error reporting [Spark Branch]
> ---
>
> Key: HIVE-7439
> URL: https://issues.apache.org/jira/browse/HIVE-7439
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Chengxiang Li
>  Labels: Spark-M3, TODOC-SPARK, TODOC15
> Fix For: 1.1.0
>
> Attachments: HIVE-7439.1-spark.patch, HIVE-7439.2-spark.patch, 
> HIVE-7439.2-spark.patch, HIVE-7439.3-spark.patch, HIVE-7439.3-spark.patch, 
> hive on spark job status.PNG
>
>
> After Hive submits a job to the Spark cluster, we need to report the job 
> progress, such as the percentage done, to the user. This is especially 
> important for long-running queries. Moreover, if there is an error during job 
> submission or execution, it's also crucial for Hive to fetch the error log 
> and/or stack trace and feed it back to the user.
> Please refer to the design doc on the wiki for more information.
> CLEAR LIBRARY CACHE



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8640) Support hints of SMBJoin [Spark Branch]

2015-05-29 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-8640:
-
Labels: TODOC-SPARK TODOC15  (was: TODOC-SPARK)

> Support hints of SMBJoin [Spark Branch]
> ---
>
> Key: HIVE-8640
> URL: https://issues.apache.org/jira/browse/HIVE-8640
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Affects Versions: spark-branch
>Reporter: Szehon Ho
>Assignee: Szehon Ho
>  Labels: TODOC-SPARK, TODOC15
> Fix For: 1.1.0
>
> Attachments: HIVE-8640.1-spark.patch, HIVE-8640.2-spark.patch, 
> HIVE-8640.3-spark.patch
>
>
> HIVE-8202 supports conversion of a join to an SMB join automatically, which 
> relies on the configuration property 
> "hive.auto.convert.sortmerge.join.bigtable.selection.policy".  
> This task is to support conversion based on map-join hints instead of this 
> policy.  As hints are deprecated, this would not be the primary policy, in 
> line with MapReduce, but it can be available as a backup to achieve feature 
> parity.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8640) Support hints of SMBJoin [Spark Branch]

2015-05-29 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564402#comment-14564402
 ] 

Lefty Leverenz commented on HIVE-8640:
--

Adding TODOC15 (which means TODOC1.1.0).

> Support hints of SMBJoin [Spark Branch]
> ---
>
> Key: HIVE-8640
> URL: https://issues.apache.org/jira/browse/HIVE-8640
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Affects Versions: spark-branch
>Reporter: Szehon Ho
>Assignee: Szehon Ho
>  Labels: TODOC-SPARK, TODOC15
> Fix For: 1.1.0
>
> Attachments: HIVE-8640.1-spark.patch, HIVE-8640.2-spark.patch, 
> HIVE-8640.3-spark.patch
>
>
> HIVE-8202 supports conversion of a join to an SMB join automatically, which 
> relies on the configuration property 
> "hive.auto.convert.sortmerge.join.bigtable.selection.policy".  
> This task is to support conversion based on map-join hints instead of this 
> policy.  As hints are deprecated, this would not be the primary policy, in 
> line with MapReduce, but it can be available as a backup to achieve feature 
> parity.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10659) Beeline command which contains semi-colon as a non-command terminator will fail

2015-05-29 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564406#comment-14564406
 ] 

Hari Sankar Sivarama Subramaniyan commented on HIVE-10659:
--

[~sushanth] [~leftylev] Thanks for the previous comments. Sorry for the 
misleading title here; my bad, I should have mentioned that the fix proposed 
here is intended to fix the delimiter issue only when Beeline is accessed via 
schematool. That is, this change is internal, is used only by the Hive 
schematool, and need not be documented. For the general solution to the issue 
mentioned in this JIRA, i.e. for Beeline to support this feature directly, we 
need to introduce a command similar to the DELIMITER command in the MySQL 
client. This will be covered under HIVE-10865 and should hopefully make it 
into the 1.3 release.

Thanks
Hari
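
To make the proposed direction concrete, a hedged sketch of how a MySQL-style 
DELIMITER command could work in a script runner; the directive name and 
splitting rules are assumptions, since the real design belongs to HIVE-10865:
{code}
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a MySQL-style DELIMITER directive for a SQL script
// runner: "DELIMITER //" switches the statement terminator, so stored
// procedure bodies containing ';' pass through intact. Not the HIVE-10865
// design, just an illustration of the idea.
public final class DelimiterAwareSplitter {
    public static List<String> split(List<String> scriptLines) {
        List<String> statements = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        String delimiter = ";";
        for (String line : scriptLines) {
            String trimmed = line.trim();
            if (trimmed.toUpperCase().startsWith("DELIMITER ")) {
                delimiter = trimmed.substring("DELIMITER ".length()).trim();
                continue; // the directive itself is not a statement
            }
            current.append(line).append('\n');
            if (trimmed.endsWith(delimiter)) {
                String stmt = current.toString().trim();
                statements.add(stmt.substring(0, stmt.length() - delimiter.length()).trim());
                current.setLength(0);
            }
        }
        return statements;
    }
}
{code}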

> Beeline command which contains semi-colon as a non-command terminator will 
> fail
> ---
>
> Key: HIVE-10659
> URL: https://issues.apache.org/jira/browse/HIVE-10659
> Project: Hive
>  Issue Type: Bug
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Fix For: 1.2.1
>
> Attachments: HIVE-10659.1.patch
>
>
> Consider a scenario where Beeline is used to connect to a MySQL server. The 
> commands executed via Beeline can include stored procedures. For example, the 
> following command, which creates a stored procedure, is valid:
> {code}
> CREATE PROCEDURE RM_TLBS_LINKID() BEGIN IF EXISTS (SELECT * FROM 
> `INFORMATION_SCHEMA`.`COLUMNS` WHERE `TABLE_NAME` = 'TBLS' AND `COLUMN_NAME` 
> = 'LINK_TARGET_ID') THEN ALTER TABLE `TBLS` DROP FOREIGN KEY `TBLS_FK3` ; 
> ALTER TABLE `TBLS` DROP KEY `TBLS_N51` ; ALTER TABLE `TBLS` DROP COLUMN 
> `LINK_TARGET_ID` ; END IF; END
> {code}
> MySQL stored procedures use the semicolon ( ; ) as the statement terminator. 
> Since this coincides with Beeline's only available command terminator, the 
> semicolon, Beeline is not able to execute the above command successfully; 
> instead, it tries to execute the following partial command:
> {code}
> CREATE PROCEDURE RM_TLBS_LINKID() BEGIN IF EXISTS (SELECT * FROM 
> `INFORMATION_SCHEMA`.`COLUMNS` WHERE `TABLE_NAME` = 'TBLS' AND `COLUMN_NAME` 
> = 'LINK_TARGET_ID') THEN ALTER TABLE `TBLS` DROP FOREIGN KEY `TBLS_FK3` ; 
> {code}
> This situation can actually happen within Hive when the Hive SchemaTool is 
> used to upgrade a MySQL metastore db and the scripts used for the upgrade 
> process contain stored procedures (such as the one introduced initially by 
> HIVE-7018). As of now, we cannot have any stored procedure in the MySQL 
> metastore db upgrade scripts because schemaTool uses Beeline to connect to 
> MySQL, and Beeline fails to execute any "create procedure" command or 
> similar command containing a semicolon. This is a serious limitation; it 
> needs to be fixed by giving the end user an option to make Beeline use the 
> newline character instead of the semicolon as the command delimiter.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10855) Make HIVE-10568 work with Spark [Spark Branch]

2015-05-29 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564407#comment-14564407
 ] 

Rui Li commented on HIVE-10855:
---

Hi [~xuefuz], to make this work, we first need to merge HIVE-10568 to the 
spark branch. I tried applying the patch but there are some conflicts. Shall 
we do a merge from master to spark first, or should I just merge HIVE-10568 
and probably some other commits to spark here?

> Make HIVE-10568 work with Spark [Spark Branch]
> --
>
> Key: HIVE-10855
> URL: https://issues.apache.org/jira/browse/HIVE-10855
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Rui Li
>
> HIVE-10568 only works with Tez. It's good to make it also work for Spark.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8911) Enable mapjoin hints [Spark Branch]

2015-05-29 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-8911:
-
Labels: TODOC-SPARK TODOC15  (was: TODOC-SPARK)

> Enable mapjoin hints [Spark Branch]
> ---
>
> Key: HIVE-8911
> URL: https://issues.apache.org/jira/browse/HIVE-8911
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Affects Versions: spark-branch
>Reporter: Szehon Ho
>Assignee: Chao Sun
>  Labels: TODOC-SPARK, TODOC15
> Fix For: 1.1.0
>
> Attachments: HIVE-8911.1-spark.patch, HIVE-8911.2-spark.patch, 
> HIVE-8911.3-spark.patch, HIVE-8911.4-spark.patch, HIVE-8911.5-spark.patch, 
> HIVE-8911.6-spark.patch
>
>
> Currently the big table selection in a mapjoin is based on stats.
> We should also enable the big-table selection based on hints.  See class 
> MapJoinProcessor.  This is a logical-optimizer class, so we should be able to 
> re-use this without too many changes to hook up with SparkMapJoinResolver.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8911) Enable mapjoin hints [Spark Branch]

2015-05-29 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564408#comment-14564408
 ] 

Lefty Leverenz commented on HIVE-8911:
--

Adding TODOC15 (which means TODOC1.1.0).

> Enable mapjoin hints [Spark Branch]
> ---
>
> Key: HIVE-8911
> URL: https://issues.apache.org/jira/browse/HIVE-8911
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Affects Versions: spark-branch
>Reporter: Szehon Ho
>Assignee: Chao Sun
>  Labels: TODOC-SPARK, TODOC15
> Fix For: 1.1.0
>
> Attachments: HIVE-8911.1-spark.patch, HIVE-8911.2-spark.patch, 
> HIVE-8911.3-spark.patch, HIVE-8911.4-spark.patch, HIVE-8911.5-spark.patch, 
> HIVE-8911.6-spark.patch
>
>
> Currently the big table selection in a mapjoin is based on stats.
> We should also enable the big-table selection based on hints.  See class 
> MapJoinProcessor.  This is a logical-optimizer class, so we should be able to 
> re-use this without too many changes to hook up with SparkMapJoinResolver.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10855) Make HIVE-10568 work with Spark [Spark Branch]

2015-05-29 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564410#comment-14564410
 ] 

Rui Li commented on HIVE-10855:
---

Never mind, I just saw your merge :)

> Make HIVE-10568 work with Spark [Spark Branch]
> --
>
> Key: HIVE-10855
> URL: https://issues.apache.org/jira/browse/HIVE-10855
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Rui Li
>
> HIVE-10568 only works with Tez. It's good to make it also work for Spark.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10659) Beeline command which contains semi-colon as a non-command terminator will fail

2015-05-29 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564413#comment-14564413
 ] 

Lefty Leverenz commented on HIVE-10659:
---

Cool.  Thanks Hari.

> Beeline command which contains semi-colon as a non-command terminator will 
> fail
> ---
>
> Key: HIVE-10659
> URL: https://issues.apache.org/jira/browse/HIVE-10659
> Project: Hive
>  Issue Type: Bug
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Fix For: 1.2.1
>
> Attachments: HIVE-10659.1.patch
>
>
> Consider a scenario where Beeline is used to connect to a MySQL server. The 
> commands executed via Beeline can include stored procedures. For example, the 
> following command, which creates a stored procedure, is valid:
> {code}
> CREATE PROCEDURE RM_TLBS_LINKID() BEGIN IF EXISTS (SELECT * FROM 
> `INFORMATION_SCHEMA`.`COLUMNS` WHERE `TABLE_NAME` = 'TBLS' AND `COLUMN_NAME` 
> = 'LINK_TARGET_ID') THEN ALTER TABLE `TBLS` DROP FOREIGN KEY `TBLS_FK3` ; 
> ALTER TABLE `TBLS` DROP KEY `TBLS_N51` ; ALTER TABLE `TBLS` DROP COLUMN 
> `LINK_TARGET_ID` ; END IF; END
> {code}
> MySQL stored procedures use the semicolon ( ; ) as the statement terminator. 
> Since this coincides with Beeline's only available command terminator, the 
> semicolon, Beeline is not able to execute the above command successfully; 
> instead, it tries to execute the following partial command:
> {code}
> CREATE PROCEDURE RM_TLBS_LINKID() BEGIN IF EXISTS (SELECT * FROM 
> `INFORMATION_SCHEMA`.`COLUMNS` WHERE `TABLE_NAME` = 'TBLS' AND `COLUMN_NAME` 
> = 'LINK_TARGET_ID') THEN ALTER TABLE `TBLS` DROP FOREIGN KEY `TBLS_FK3` ; 
> {code}
> This situation can actually happen within Hive when the Hive SchemaTool is 
> used to upgrade a MySQL metastore db and the scripts used for the upgrade 
> process contain stored procedures (such as the one introduced initially by 
> HIVE-7018). As of now, we cannot have any stored procedure in the MySQL 
> metastore db upgrade scripts because schemaTool uses Beeline to connect to 
> MySQL, and Beeline fails to execute any "create procedure" command or 
> similar command containing a semicolon. This is a serious limitation; it 
> needs to be fixed by giving the end user an option to make Beeline use the 
> newline character instead of the semicolon as the command delimiter.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10845) TezJobMonitor uses killedTaskCount instead of killedTaskAttemptCount

2015-05-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564416#comment-14564416
 ] 

Hive QA commented on HIVE-10845:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12735938/HIVE-10845.2.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 8978 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_fold_case
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx_cbo_2
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4087/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4087/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4087/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12735938 - PreCommit-HIVE-TRUNK-Build

> TezJobMonitor uses killedTaskCount instead of killedTaskAttemptCount
> 
>
> Key: HIVE-10845
> URL: https://issues.apache.org/jira/browse/HIVE-10845
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Fix For: 1.2.1
>
> Attachments: HIVE-10845.1.patch, HIVE-10845.2.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10815) Let HiveMetaStoreClient Choose MetaStore Randomly

2015-05-29 Thread Nemon Lou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564437#comment-14564437
 ] 

Nemon Lou commented on HIVE-10815:
--

The failed test cases seem unrelated.
What this patch does: make the order of metastore URIs random at the creation 
phase of HiveMetaStoreClient.
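
A minimal sketch of that idea (illustrative, not the patch's exact code):
{code}
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch of the patch's idea: shuffle the configured metastore URIs once at
// client creation so different clients spread their first-connect attempts
// across metastores. Illustrative, not the patch's exact code.
public final class MetastoreUriOrder {
    public static List<URI> randomized(List<URI> configuredUris) {
        List<URI> uris = new ArrayList<>(configuredUris);
        Collections.shuffle(uris); // random order per client instance
        return uris;               // fall back through the list on connection failure
    }
}
{code}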



> Let HiveMetaStoreClient Choose MetaStore Randomly
> -
>
> Key: HIVE-10815
> URL: https://issues.apache.org/jira/browse/HIVE-10815
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2, Metastore
>Affects Versions: 1.2.0
>Reporter: Nemon Lou
>Assignee: Nemon Lou
> Attachments: HIVE-10815.patch
>
>
> Currently HiveMetaStoreClient uses a fixed order to choose MetaStore URIs 
> when multiple metastores are configured.
>  Choosing a MetaStore randomly will be good for load balancing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9370) SparkJobMonitor timeout as sortByKey would launch extra Spark job before original job get submitted [Spark Branch]

2015-05-29 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564442#comment-14564442
 ] 

Lefty Leverenz commented on HIVE-9370:
--

Again, should this be documented?  ([~chengxiang li], [~xuefuz])

> SparkJobMonitor timeout as sortByKey would launch extra Spark job before 
> original job get submitted [Spark Branch]
> --
>
> Key: HIVE-9370
> URL: https://issues.apache.org/jira/browse/HIVE-9370
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: yuyun.chen
>Assignee: Chengxiang Li
> Fix For: 1.1.0
>
> Attachments: HIVE-9370.1-spark.patch
>
>
> We enabled Hive on Spark and ran BigBench Query 8, then got the following 
> exception:
> 2015-01-14 11:43:46,057 INFO  [main]: impl.RemoteSparkJobStatus 
> (RemoteSparkJobStatus.java:getSparkJobInfo(143)) - Job hasn't been submitted 
> after 30s. Aborting it.
> 2015-01-14 11:43:46,061 INFO  [main]: impl.RemoteSparkJobStatus 
> (RemoteSparkJobStatus.java:getSparkJobInfo(143)) - Job hasn't been submitted 
> after 30s. Aborting it.
> 2015-01-14 11:43:46,061 ERROR [main]: status.SparkJobMonitor 
> (SessionState.java:printError(839)) - Status: Failed
> 2015-01-14 11:43:46,062 INFO  [main]: log.PerfLogger 
> (PerfLogger.java:PerfLogEnd(148)) - </PERFLOG start=1421206996052 
> end=1421207026062 duration=30010 
> from=org.apache.hadoop.hive.ql.exec.spark.status.SparkJobMonitor>
> 2015-01-14 11:43:46,071 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(436)) - 15/01/14 11:43:46 INFO RemoteDriver: Failed 
> to run job 0a9a7782-0e0b-4561-8468-959a6d8df0a3
> 2015-01-14 11:43:46,071 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(436)) - java.lang.InterruptedException
> 2015-01-14 11:43:46,071 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(436)) -at java.lang.Object.wait(Native 
> Method)
> 2015-01-14 11:43:46,071 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(436)) -at 
> java.lang.Object.wait(Object.java:503)
> 2015-01-14 11:43:46,071 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(436)) -at 
> org.apache.spark.scheduler.JobWaiter.awaitResult(JobWaiter.scala:73)
> 2015-01-14 11:43:46,071 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(436)) -at 
> org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:514)
> 2015-01-14 11:43:46,071 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(436)) -at 
> org.apache.spark.SparkContext.runJob(SparkContext.scala:1282)
> 2015-01-14 11:43:46,072 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(436)) -at 
> org.apache.spark.SparkContext.runJob(SparkContext.scala:1300)
> 2015-01-14 11:43:46,072 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(436)) -at 
> org.apache.spark.SparkContext.runJob(SparkContext.scala:1314)
> 2015-01-14 11:43:46,072 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(436)) -at 
> org.apache.spark.SparkContext.runJob(SparkContext.scala:1328)
> 2015-01-14 11:43:46,072 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(436)) -at 
> org.apache.spark.rdd.RDD.collect(RDD.scala:780)
> 2015-01-14 11:43:46,072 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(436)) -at 
> org.apache.spark.RangePartitioner$.sketch(Partitioner.scala:262)
> 2015-01-14 11:43:46,072 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(436)) -at 
> org.apache.spark.RangePartitioner.<init>(Partitioner.scala:124)
> 2015-01-14 11:43:46,072 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(436)) -at 
> org.apache.spark.rdd.OrderedRDDFunctions.sortByKey(OrderedRDDFunctions.scala:63)
> 2015-01-14 11:43:46,073 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(436)) -at 
> org.apache.spark.api.java.JavaPairRDD.sortByKey(JavaPairRDD.scala:894)
> 2015-01-14 11:43:46,073 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(436)) -at 
> org.apache.spark.api.java.JavaPairRDD.sortByKey(JavaPairRDD.scala:864)
> 2015-01-14 11:43:46,073 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(436)) -at 
> org.apache.hadoop.hive.ql.exec.spark.SortByShuffler.shuffle(SortByShuffler.java:48)
> 2015-01-14 11:43:46,073 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(436)) -at 
> org.apache.hadoop.hive.ql.exec.spark.ShuffleTran.transform(ShuffleTran.j

[jira] [Updated] (HIVE-8993) Make sure Spark + HS2 work [Spark Branch]

2015-05-29 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-8993:
-
Labels: TODOC-SPARK TODOC15  (was: TODOC-SPARK)

> Make sure Spark + HS2 work [Spark Branch]
> -
>
> Key: HIVE-8993
> URL: https://issues.apache.org/jira/browse/HIVE-8993
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Chengxiang Li
>  Labels: TODOC-SPARK, TODOC15
> Fix For: 1.1.0
>
> Attachments: HIVE-8993.1-spark.patch, HIVE-8993.2-spark.patch, 
> HIVE-8993.3-spark.patch
>
>
> We haven't formally tested this combination yet.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8993) Make sure Spark + HS2 work [Spark Branch]

2015-05-29 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564445#comment-14564445
 ] 

Lefty Leverenz commented on HIVE-8993:
--

Adding TODOC15 (which means TODOC1.1.0).

> Make sure Spark + HS2 work [Spark Branch]
> -
>
> Key: HIVE-8993
> URL: https://issues.apache.org/jira/browse/HIVE-8993
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Chengxiang Li
>  Labels: TODOC-SPARK, TODOC15
> Fix For: 1.1.0
>
> Attachments: HIVE-8993.1-spark.patch, HIVE-8993.2-spark.patch, 
> HIVE-8993.3-spark.patch
>
>
> We haven't formally tested this combination yet.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8043) Support merging small files [Spark Branch]

2015-05-29 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564449#comment-14564449
 ] 

Lefty Leverenz commented on HIVE-8043:
--

Does this need documentation?

> Support merging small files [Spark Branch]
> --
>
> Key: HIVE-8043
> URL: https://issues.apache.org/jira/browse/HIVE-8043
> Project: Hive
>  Issue Type: Task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Rui Li
>  Labels: Spark-M1
> Fix For: 1.1.0
>
> Attachments: HIVE-8043.1-spark.patch, HIVE-8043.2-spark.patch, 
> HIVE-8043.3-spark.patch
>
>
> Hive currently supports merging small files with MR as the execution engine. 
> There are options available for this, such as 
> {code}
> hive.merge.mapfiles
> hive.merge.mapredfiles
> {code}
> The option hive.merge.sparkfiles was already introduced in HIVE-7810. To make 
> it work, we might need a little more research and design.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8868) SparkSession and SparkClient mapping[Spark Branch]

2015-05-29 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-8868:
-
Labels: Spark-M3 TODOC-SPARK TODOC15  (was: Spark-M3 TODOC-SPARK)

> SparkSession and SparkClient mapping[Spark Branch]
> --
>
> Key: HIVE-8868
> URL: https://issues.apache.org/jira/browse/HIVE-8868
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Chengxiang Li
>Assignee: Rui Li
>  Labels: Spark-M3, TODOC-SPARK, TODOC15
> Fix For: 1.1.0
>
> Attachments: HIVE-8868.1-spark.patch, HIVE-8868.2-spark.patch, 
> HIVE-8868.2-spark.patch
>
>
> There should be a separate Spark context for each user session. Currently we 
> share a singleton local Spark context across all user sessions with local 
> Spark, and create a remote Spark context for each Spark job with a Spark 
> cluster.
> To bind one Spark context to each user session, we may construct the Spark 
> client on session open. One thing to note: is SparkSession::conf consistent 
> with Context::getConf?
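
A hedged sketch of the one-client-per-session idea described above; the 
SparkClient type stands in for the real spark-client API, and the lifecycle 
hooks are assumptions:
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of binding one Spark client/context to each user session, created
// on session open and torn down on close. SparkClient stands in for the real
// spark-client API; the lifecycle hooks are assumptions.
public final class SessionSparkClients {
    interface SparkClient extends AutoCloseable {}

    interface SparkClientFactory {
        SparkClient create();
    }

    private final Map<String, SparkClient> clients = new ConcurrentHashMap<>();

    public SparkClient onSessionOpen(String sessionId, SparkClientFactory factory) {
        // one client per session, created lazily on first open
        return clients.computeIfAbsent(sessionId, id -> factory.create());
    }

    public void onSessionClose(String sessionId) throws Exception {
        SparkClient client = clients.remove(sessionId);
        if (client != null) {
            client.close(); // release the per-session Spark context
        }
    }
}
{code}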



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8868) SparkSession and SparkClient mapping[Spark Branch]

2015-05-29 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564456#comment-14564456
 ] 

Lefty Leverenz commented on HIVE-8868:
--

Adding TODOC15 (which means TODOC1.1.0).

> SparkSession and SparkClient mapping[Spark Branch]
> --
>
> Key: HIVE-8868
> URL: https://issues.apache.org/jira/browse/HIVE-8868
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Chengxiang Li
>Assignee: Rui Li
>  Labels: Spark-M3, TODOC-SPARK, TODOC15
> Fix For: 1.1.0
>
> Attachments: HIVE-8868.1-spark.patch, HIVE-8868.2-spark.patch, 
> HIVE-8868.2-spark.patch
>
>
> There should be a separate Spark context for each user session. Currently we 
> share a singleton local Spark context across all user sessions with local 
> Spark, and create a remote Spark context for each Spark job with a Spark 
> cluster.
> To bind one Spark context to each user session, we may construct the Spark 
> client on session open. One thing to note: is SparkSession::conf consistent 
> with Context::getConf?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-4239) Remove lock on compilation stage

2015-05-29 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564517#comment-14564517
 ] 

Carl Steinbach commented on HIVE-4239:
--

A couple comments on the patch:
* Would you mind changing the name of the new configuration property to 
'hive.driver.parallel.compilation'? The only reference to this config property 
is located in the Driver class, and both Driver and SessionState predate 
HiveServer2 by several years. Driver and SessionState were used by HiveServer1 
and continue to be used by the HiveCli, and I know that there are at least a 
couple third-party libraries out there that attempt to support concurrency by 
scheduling queries across a pool of Driver objects. In other words, this 
property changes the behavior of a class which is not part of HiveServer2, and 
also has the potential to change the behavior of other user-facing interfaces 
built on top of Driver which are also not part of HiveServer2. I also hope that 
at some point the Driver and SessionState classes will go away completely, at 
which point we can deprecate and remove this property, but only if it 
references "driver".
* I think the docstring for the new property could use some wordsmithing: 
"Whether to enable parallel compilation on HiveServer2. _Disable as a 
workaround for future bugs._" The last sentence isn't going to inspire much 
confidence in users about either the quality of the product or the development 
team that produced it. 
* In order to enable this feature by default I think we need to be pretty 
confident that parallel compilation works. The additional parallel test 
coverage included in this patch is a great start, but I think it falls well 
short of being comprehensive. I mentioned in an earlier comment that code 
already exists for running qfile tests in parallel on top of HiveServer2. Why 
not re-enable this?
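
For readers following along, a hedged sketch of the mechanism under 
discussion: a process-wide compile lock that is bypassed when the proposed 
property is on. The property name follows the suggestion above; the code is 
illustrative, not the patch:
{code}
// Sketch only: a process-wide compile lock skipped when parallel compilation
// is enabled. The flag corresponds to the suggested
// hive.driver.parallel.compilation property; this is not the patch's code.
public abstract class DriverSketch {
    private static final Object GLOBAL_COMPILE_LOCK = new Object();
    private final boolean parallelCompilation;

    protected DriverSketch(boolean parallelCompilation) {
        this.parallelCompilation = parallelCompilation;
    }

    public final void compile(String query) {
        if (parallelCompilation) {
            doCompile(query);               // compile concurrently
        } else {
            synchronized (GLOBAL_COMPILE_LOCK) {
                doCompile(query);           // legacy: one compilation at a time
            }
        }
    }

    protected abstract void doCompile(String query);
}
{code}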


> Remove lock on compilation stage
> 
>
> Key: HIVE-4239
> URL: https://issues.apache.org/jira/browse/HIVE-4239
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Query Processor
>Reporter: Carl Steinbach
>Assignee: Sergey Shelukhin
> Attachments: HIVE-4239.01.patch, HIVE-4239.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10563) MiniTezCliDriver tests ordering issues

2015-05-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564528#comment-14564528
 ] 

Hive QA commented on HIVE-10563:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12735829/HIVE-10563.7.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 8977 tests executed
*Failed tests:*
{noformat}
TestCustomAuthentication - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_fold_case
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx_cbo_2
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4088/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4088/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4088/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12735829 - PreCommit-HIVE-TRUNK-Build

> MiniTezCliDriver tests ordering issues
> --
>
> Key: HIVE-10563
> URL: https://issues.apache.org/jira/browse/HIVE-10563
> Project: Hive
>  Issue Type: Bug
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-10563.1.patch, HIVE-10563.2.patch, 
> HIVE-10563.3.patch, HIVE-10563.4.patch, HIVE-10563.5.patch, 
> HIVE-10563.6.patch, HIVE-10563.7.patch
>
>
> There are a bunch of tests related to TestMiniTezCliDriver which give 
> ordering issues when run on CentOS/Windows/OS X.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-5317) Implement insert, update, and delete in Hive with full ACID support

2015-05-29 Thread Saurabh Santhosh (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564551#comment-14564551
 ] 

Saurabh Santhosh commented on HIVE-5317:


Is the merge functionality (MERGE INTO tbl USING src ON … WHEN MATCHED THEN ... 
WHEN NOT MATCHED THEN ...) available as well?
Or is there another JIRA ticket for it?

> Implement insert, update, and delete in Hive with full ACID support
> ---
>
> Key: HIVE-5317
> URL: https://issues.apache.org/jira/browse/HIVE-5317
> Project: Hive
>  Issue Type: New Feature
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Fix For: 0.14.0
>
> Attachments: InsertUpdatesinHive.pdf
>
>
> Many customers want to be able to insert, update and delete rows from Hive 
> tables with full ACID support. The use cases are varied, but the forms of 
> query that should be supported are:
> * INSERT INTO tbl SELECT …
> * INSERT INTO tbl VALUES ...
> * UPDATE tbl SET … WHERE …
> * DELETE FROM tbl WHERE …
> * MERGE INTO tbl USING src ON … WHEN MATCHED THEN ... WHEN NOT MATCHED THEN 
> ...
> * SET TRANSACTION LEVEL …
> * BEGIN/END TRANSACTION
> Use Cases
> * Once an hour, a set of inserts and updates (up to 500k rows) for various 
> dimension tables (eg. customer, inventory, stores) needs to be processed. The 
> dimension tables have primary keys and are typically bucketed and sorted on 
> those keys.
> * Once a day a small set (up to 100k rows) of records need to be deleted for 
> regulatory compliance.
> Once an hour a log of transactions is exported from an RDBMS and the fact 
> tables need to be updated (up to 1m rows) to reflect the new data. The 
> transactions are a combination of inserts, updates, and deletes. The table is 
> partitioned and bucketed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8043) Support merging small files [Spark Branch]

2015-05-29 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564569#comment-14564569
 ] 

Rui Li commented on HIVE-8043:
--

[~leftylev] - I think that's already handled in HIVE-7810

> Support merging small files [Spark Branch]
> --
>
> Key: HIVE-8043
> URL: https://issues.apache.org/jira/browse/HIVE-8043
> Project: Hive
>  Issue Type: Task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Rui Li
>  Labels: Spark-M1
> Fix For: 1.1.0
>
> Attachments: HIVE-8043.1-spark.patch, HIVE-8043.2-spark.patch, 
> HIVE-8043.3-spark.patch
>
>
> Hive currently supports merging small files with MR as the execution engine. 
> There are options available for this, such as 
> {code}
> hive.merge.mapfiles
> hive.merge.mapredfiles
> {code}
> The option hive.merge.sparkfiles was already introduced in HIVE-7810. To make 
> it work, we might need a little more research and design.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10835) Concurrency issues in JDBC driver

2015-05-29 Thread Chaoyu Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaoyu Tang updated HIVE-10835:
---
Attachment: (was: HIVE-10835.1.patch)

> Concurrency issues in JDBC driver
> -
>
> Key: HIVE-10835
> URL: https://issues.apache.org/jira/browse/HIVE-10835
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 1.2.0
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Attachments: HIVE-10835.1.patch, HIVE-10835.patch
>
>
> Though the JDBC specification states that "Each Connection object can create 
> multiple Statement objects that may be used concurrently by the program", 
> that does not work in the current Hive JDBC driver. In addition, race 
> conditions exist between DatabaseMetaData, Statement, and ResultSet whenever 
> they make RPC calls to HS2 using the same Thrift transport, which happens 
> within a connection.
> So we need a connection-level lock to serialize all these RPC calls within a 
> connection.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10835) Concurrency issues in JDBC driver

2015-05-29 Thread Chaoyu Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaoyu Tang updated HIVE-10835:
---
Attachment: HIVE-10835.2.patch

Updated the patch:
1. Added a unit test, TestJdbcWithMiniHS2#testConcurrentStatements (300 tasks 
using 100 threads; takes around 1.5-2 minutes on my local machine).
2. Used the client object itself as the lock.
[~thejas], [~xuefuz] and [~szehon], could you review it? Thanks.
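
A minimal sketch of point 2, serializing the per-connection Thrift RPCs by 
locking on the shared client object; the client type and RPC method are 
placeholders rather than the exact Hive JDBC internals:
{code}
// Sketch of point 2: every Statement/DatabaseMetaData/ResultSet call that
// goes over the connection's shared Thrift transport synchronizes on the
// shared client object. ThriftClient and execute() are placeholders, not the
// exact Hive JDBC internals.
public final class LockedRpc {
    interface ThriftClient {
        String execute(String request) throws Exception;
    }

    private final ThriftClient client; // one per connection, shared by statements

    LockedRpc(ThriftClient client) {
        this.client = client;
    }

    String call(String request) throws Exception {
        synchronized (client) {        // connection-level lock: one RPC at a time
            return client.execute(request);
        }
    }
}
{code}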

> Concurrency issues in JDBC driver
> -
>
> Key: HIVE-10835
> URL: https://issues.apache.org/jira/browse/HIVE-10835
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 1.2.0
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Attachments: HIVE-10835.1.patch, HIVE-10835.2.patch, HIVE-10835.patch
>
>
> Though the JDBC specification states that "Each Connection object can create 
> multiple Statement objects that may be used concurrently by the program", 
> that does not work in the current Hive JDBC driver. In addition, race 
> conditions exist between DatabaseMetaData, Statement, and ResultSet whenever 
> they make RPC calls to HS2 using the same Thrift transport, which happens 
> within a connection.
> So we need a connection-level lock to serialize all these RPC calls within a 
> connection.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10563) MiniTezCliDriver tests ordering issues

2015-05-29 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564621#comment-14564621
 ] 

Hari Sankar Sivarama Subramaniyan commented on HIVE-10563:
--

The test failures look unrelated; the changes don't touch the fold_case.* or 
ql_rewrite_gbtoidx_cbo_2.* files.

Thanks
Hari

> MiniTezCliDriver tests ordering issues
> --
>
> Key: HIVE-10563
> URL: https://issues.apache.org/jira/browse/HIVE-10563
> Project: Hive
>  Issue Type: Bug
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-10563.1.patch, HIVE-10563.2.patch, 
> HIVE-10563.3.patch, HIVE-10563.4.patch, HIVE-10563.5.patch, 
> HIVE-10563.6.patch, HIVE-10563.7.patch
>
>
> There are a bunch of tests related to TestMiniTezCliDriver which give 
> ordering issues when run on CentOS/Windows/OS X.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10835) Concurrency issues in JDBC driver

2015-05-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564686#comment-14564686
 ] 

Hive QA commented on HIVE-10835:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12735879/HIVE-10835.1.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 8978 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_fold_case
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx_cbo_2
org.apache.hive.jdbc.TestSSL.testSSLFetchHttp
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4089/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4089/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4089/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12735879 - PreCommit-HIVE-TRUNK-Build

> Concurrency issues in JDBC driver
> -
>
> Key: HIVE-10835
> URL: https://issues.apache.org/jira/browse/HIVE-10835
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 1.2.0
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Attachments: HIVE-10835.1.patch, HIVE-10835.2.patch, HIVE-10835.patch
>
>
> Though the JDBC specification states that "Each Connection object can create 
> multiple Statement objects that may be used concurrently by the program", 
> that does not work in the current Hive JDBC driver. In addition, race 
> conditions also exist between DatabaseMetaData, Statement, and ResultSet as 
> long as they make RPC calls to HS2 over the same Thrift transport, which 
> happens within a connection.
> So we need a connection-level lock to serialize all these RPC calls in a 
> connection.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10855) Make HIVE-10568 work with Spark [Spark Branch]

2015-05-29 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-10855:
--
Attachment: HIVE-10855.1-spark.patch

Should work out of the box. Let the tests run.

> Make HIVE-10568 work with Spark [Spark Branch]
> --
>
> Key: HIVE-10855
> URL: https://issues.apache.org/jira/browse/HIVE-10855
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Rui Li
> Attachments: HIVE-10855.1-spark.patch
>
>
> HIVE-10568 only works with Tez. It's good to make it also work for Spark.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10821) Beeline-CLI: Implement CLI source command using Beeline functionality

2015-05-29 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564806#comment-14564806
 ] 

Xuefu Zhang commented on HIVE-10821:


[~Ferd], thanks for working on this. If you're introducing the functionality of 
sourcing a script file in Beeline, I think we should follow Beeline's command 
style. That is, all commands need to start with "!", for instance, !sql, 
!connect, and now !source. A command without "!" is treated as if it started 
with "!sql". Thus, if you introduce a command like "source script.sql", it 
should be translated to "!sql source script.sql", which isn't intuitive. I'd 
suggest we instead introduce "!source script.sql" as the way to execute a 
script file in interactive mode. Then, for Hive CLI's beeline implementation, 
Hive CLI's "source script.sql" will be translated to "!source script.sql" via 
your translation method. Thoughts?
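
For illustration, a minimal sketch of such a translation step (the class, 
method name, and exact rule are hypothetical, not actual Beeline code):
{code}
class SourceCommandTranslator {
  // Hypothetical sketch: translate Hive CLI's "source" command into Beeline's
  // "!"-prefixed command style, as suggested above.
  static String translate(String line) {
    String trimmed = line.trim();
    if (trimmed.toLowerCase().startsWith("source ")) {
      return "!" + trimmed; // "source script.sql" -> "!source script.sql"
    }
    return trimmed; // anything else falls through to the default "!sql" handling
  }
}
{code}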

> Beeline-CLI: Implement CLI source command using Beeline functionality
> -
>
> Key: HIVE-10821
> URL: https://issues.apache.org/jira/browse/HIVE-10821
> Project: Hive
>  Issue Type: Sub-task
>  Components: CLI
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
> Attachments: HIVE-10821.1-beeline-cli.patch, 
> HIVE-10821.1-beeline-cli.patch, HIVE-10821.2-beeline-cli.patch, 
> HIVE-10821.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-10821) Beeline-CLI: Implement CLI source command using Beeline functionality

2015-05-29 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564806#comment-14564806
 ] 

Xuefu Zhang edited comment on HIVE-10821 at 5/29/15 1:36 PM:
-

[~Ferd], thanks for working on this. If you're introducing the functionality of 
sourcing a script file in Beeline, I think we should follow Beeline's command 
style. That is, all commands need to start with {noformat}!{noformat}, for 
instance, !sql, !connect, and now !source. A command without 
{noformat}!{noformat} is treated as if it started with "!sql". Thus, if you 
execute a command like "source script.sql" in beeline, it should be equivalent 
to "!sql source script.sql", which isn't intuitive. I'd suggest we instead 
introduce "!source script.sql" as a general way to execute a script file in 
Beeline interactive mode. Then, for Hive CLI's beeline implementation, Hive 
CLI's "source script.sql" will be translated to "!source script.sql" via your 
translation method. Thoughts?


was (Author: xuefuz):
[~Ferd], thanks for working on this. If you're introducing the functionality of 
sourcing a script file in Beeline, I think we should follow Beeline's command 
style. That is, all commands need to start with "!", for instance, !sql, 
!connect, and now !source. A command without "!" is treated as if it started 
with "!sql". Thus, if you introduce a command like "source script.sql", it 
should be translated to "!sql source script.sql", which isn't intuitive. I'd 
suggest we instead introduce "!source script.sql" as the way to execute a 
script file in interactive mode. Then, for Hive CLI's beeline implementation, 
Hive CLI's "source script.sql" will be translated to "!source script.sql" via 
your translation method. Thoughts?

> Beeline-CLI: Implement CLI source command using Beeline functionality
> -
>
> Key: HIVE-10821
> URL: https://issues.apache.org/jira/browse/HIVE-10821
> Project: Hive
>  Issue Type: Sub-task
>  Components: CLI
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
> Attachments: HIVE-10821.1-beeline-cli.patch, 
> HIVE-10821.1-beeline-cli.patch, HIVE-10821.2-beeline-cli.patch, 
> HIVE-10821.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-10821) Beeline-CLI: Implement CLI source command using Beeline functionality

2015-05-29 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564806#comment-14564806
 ] 

Xuefu Zhang edited comment on HIVE-10821 at 5/29/15 1:37 PM:
-

[~Ferd], thanks for working on this. If you're introducing the functionality of 
sourcing a script file in Beeline, I think we should follow Beeline's command 
style. That is, all commands need to start with !, for instance, !sql, 
!connect, and now !source. A command without ! is treated as if it started 
with "!sql". Thus, if you execute a command like "source script.sql" in 
beeline, it should be equivalent to "!sql source script.sql", which isn't 
intuitive. I'd suggest we instead introduce "!source script.sql" as a general 
way to execute a script file in Beeline interactive mode. Then, for Hive CLI's 
beeline implementation, Hive CLI's "source script.sql" will be translated to 
"!source script.sql" via your translation method. Thoughts?


was (Author: xuefuz):
[~Ferd], thanks for working on this. If you're introducing the functionality of 
sourcing a script file in Beeline, I think we should follow Beeline's command 
style. That is, all commands need to start with {noformat}!{noformat}, for 
instance, !sql, !connect, and now !source. A command without 
{noformat}!{noformat} is treated as if it started with "!sql". Thus, if you 
execute a command like "source script.sql" in beeline, it should be equivalent 
to "!sql source script.sql", which isn't intuitive. I'd suggest we instead 
introduce "!source script.sql" as a general way to execute a script file in 
Beeline interactive mode. Then, for Hive CLI's beeline implementation, Hive 
CLI's "source script.sql" will be translated to "!source script.sql" via your 
translation method. Thoughts?

> Beeline-CLI: Implement CLI source command using Beeline functionality
> -
>
> Key: HIVE-10821
> URL: https://issues.apache.org/jira/browse/HIVE-10821
> Project: Hive
>  Issue Type: Sub-task
>  Components: CLI
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
> Attachments: HIVE-10821.1-beeline-cli.patch, 
> HIVE-10821.1-beeline-cli.patch, HIVE-10821.2-beeline-cli.patch, 
> HIVE-10821.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10788) Change sort_array to support non-primitive types

2015-05-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564855#comment-14564855
 ] 

Hive QA commented on HIVE-10788:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12735904/HIVE-10788.3.patch

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 8980 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_fold_case
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx_cbo_2
org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testNegativeCliDriver_minimr_broken_pipe
org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler
org.apache.hive.jdbc.TestSSL.testSSLFetchHttp
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4090/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4090/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4090/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12735904 - PreCommit-HIVE-TRUNK-Build

> Change sort_array to support non-primitive types
> 
>
> Key: HIVE-10788
> URL: https://issues.apache.org/jira/browse/HIVE-10788
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-10788.1.patch, HIVE-10788.2.patch, 
> HIVE-10788.3.patch
>
>
> Currently {{sort_array}} only support primitive types. As we already support 
> comparison between non-primitive types, it makes sense to remove this 
> restriction.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-10866) Throw error when client try to insert into bucketed table

2015-05-29 Thread Yongzhi Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongzhi Chen reassigned HIVE-10866:
---

Assignee: Yongzhi Chen

> Throw error when client try to insert into bucketed table
> -
>
> Key: HIVE-10866
> URL: https://issues.apache.org/jira/browse/HIVE-10866
> Project: Hive
>  Issue Type: Improvement
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
>
> Currently, Hive does not support appends (insert into) to bucketed tables; see 
> the open JIRA HIVE-3608. When inserting into such a table, the data will be 
> "corrupted" and no longer fit for bucketmapjoin. 
> We need to find a way to prevent clients from inserting into such tables.
> Reproduce:
> {noformat}
> CREATE TABLE IF NOT EXISTS buckettestoutput1( 
> data string 
> )CLUSTERED BY(data) 
> INTO 2 BUCKETS 
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> CREATE TABLE IF NOT EXISTS buckettestoutput2( 
> data string 
> )CLUSTERED BY(data) 
> INTO 2 BUCKETS 
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> set hive.enforce.bucketing = true; 
> set hive.enforce.sorting=true;
> insert into table buckettestoutput1 select code from sample_07 where 
> total_emp < 134354250 limit 10;
> After this first insert, I did:
> set hive.auto.convert.sortmerge.join=true; 
> set hive.optimize.bucketmapjoin = true; 
> set hive.optimize.bucketmapjoin.sortedmerge = true; 
> set hive.auto.convert.sortmerge.join.noconditionaltask=true;
> 0: jdbc:hive2://localhost:1> select * from buckettestoutput1 a join 
> buckettestoutput2 b on (a.data=b.data);
> +---+---+
> | data  | data  |
> +---+---+
> +---+---+
> So select works fine. 
> Second insert:
> 0: jdbc:hive2://localhost:1> insert into table buckettestoutput1 select 
> code from sample_07 where total_emp >= 134354250 limit 10;
> No rows affected (61.235 seconds)
> Then select:
> 0: jdbc:hive2://localhost:1> select * from buckettestoutput1 a join 
> buckettestoutput2 b on (a.data=b.data);
> Error: Error while compiling statement: FAILED: SemanticException [Error 
> 10141]: Bucketed table metadata is not correct. Fix the metadata or don't use 
> bucketed mapjoin, by setting hive.enforce.bucketmapjoin to false. The number 
> of buckets for table buckettestoutput1 is 2, whereas the number of files is 4 
> (state=42000,code=10141)
> 0: jdbc:hive2://localhost:1>
> {noformat}
> Inserting into an empty table or partition is fine, but after inserting into a 
> non-empty one (the second insert in the reproduction above), bucketmapjoin 
> throws an error. We should not let the second insert succeed. 
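
For illustration, a minimal sketch of the kind of guard being requested (class 
and method names and the exact check are hypothetical; this is not a patch):
{code}
class BucketedInsertGuard {
  // Hypothetical pre-insert check: reject INSERT INTO a bucketed table that
  // already contains data, since appending would break the bucketing layout.
  static void checkAppend(int numBuckets, boolean isOverwrite, boolean targetHasData) {
    if (numBuckets > 0 && !isOverwrite && targetHasData) {
      throw new UnsupportedOperationException(
          "INSERT INTO a non-empty bucketed table would break bucketing; use INSERT OVERWRITE instead");
    }
  }
}
{code}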



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10855) Make HIVE-10568 work with Spark [Spark Branch]

2015-05-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564896#comment-14564896
 ] 

Hive QA commented on HIVE-10855:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12736130/HIVE-10855.1-spark.patch

{color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 7919 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.initializationError
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_mapjoin_memcheck
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx_cbo_2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join32
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_sortmerge_join_1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_sortmerge_join_2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_sortmerge_join_3
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_sortmerge_join_7
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_sortmerge_join_8
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin11
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin5
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketsortoptimize_insert_2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_limit_pushdown
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vector_count_distinct
org.apache.hive.jdbc.TestSSL.testSSLConnectionWithProperty
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/870/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/870/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-870/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 15 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12736130 - PreCommit-HIVE-SPARK-Build

> Make HIVE-10568 work with Spark [Spark Branch]
> --
>
> Key: HIVE-10855
> URL: https://issues.apache.org/jira/browse/HIVE-10855
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Rui Li
> Attachments: HIVE-10855.1-spark.patch
>
>
> HIVE-10568 only works with Tez. It's good to make it also work for Spark.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10728) deprecate unix_timestamp(void) and make it deterministic

2015-05-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564998#comment-14564998
 ] 

Hive QA commented on HIVE-10728:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12735980/HIVE-10728.02.patch

{color:red}ERROR:{color} -1 due to 40 failed/errored test(s), 8978 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_fold_case
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_unix_timestamp
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_auto_sortmerge_join_16
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket4
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket5
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket6
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucketizedhiveinputformat
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucketmapjoin6
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucketmapjoin7
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_constprog_partitioner
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_disable_merge_for_bucketing
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_empty_dir_in_table
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_external_table_with_space_in_location_path
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap_auto
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_bucketed_table
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_map_operators
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_merge
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_num_buckets
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_reducers_power_two
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_leftsemijoin_mr
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_list_bucket_dml_10
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_parallel_orderby
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_ql_rewrite_gbtoidx_cbo_1
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_quotedid_smb
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_reduce_deduplicate
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_remote_script
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_schemeAuthority
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_schemeAuthority2
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_scriptfile1
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_smb_mapjoin_8
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_stats_counter
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_stats_counter_partitioned
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_truncate_column_buckets
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_uber_reduce
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx_cbo_2
org.apache.hadoop.hive.thrift.TestHadoop20SAuthBridge.testSaslWithHiveMetaStore
org.apache.hive.jdbc.TestSSL.testSSLFetchHttp
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4092/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4092/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4092/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 40 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12735980 - PreCommit-HIVE-TRUNK-Build

> deprecate unix_timestamp(void) and make it deterministic
> 
>
> Key: HIVE-10728
> URL: https://issues.apache.org/jira/browse/HIVE-10728
> Proje

[jira] [Updated] (HIVE-7193) Hive should support additional LDAP authentication parameters

2015-05-29 Thread Naveen Gangam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam updated HIVE-7193:

Attachment: HIVE-7193.2.patch

Incorporated suggestions from Chao's review. 

> Hive should support additional LDAP authentication parameters
> -
>
> Key: HIVE-7193
> URL: https://issues.apache.org/jira/browse/HIVE-7193
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.10.0
>Reporter: Mala Chikka Kempanna
>Assignee: Naveen Gangam
> Attachments: HIVE-7193.2.patch, HIVE-7193.patch, 
> LDAPAuthentication_Design_Doc.docx, LDAPAuthentication_Design_Doc_V2.docx
>
>
> Currently Hive has only the following authentication parameters for LDAP 
> authentication for HiveServer2:
> <property>
>   <name>hive.server2.authentication</name>
>   <value>LDAP</value>
> </property>
> <property>
>   <name>hive.server2.authentication.ldap.url</name>
>   <value>ldap://our_ldap_address</value>
> </property>
> We need to include other LDAP properties as part of Hive LDAP authentication, 
> like the ones below:
> a group search base -> dc=domain,dc=com 
> a group search filter -> member={0} 
> a user search base -> dc=domain,dc=com 
> a user search filter -> sAMAccountName={0} 
> a list of valid user groups -> group1,group2,group3 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10788) Change sort_array to support non-primitive types

2015-05-29 Thread Chao Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14565034#comment-14565034
 ] 

Chao Sun commented on HIVE-10788:
-

I don't think the failed tests above are related.

> Change sort_array to support non-primitive types
> 
>
> Key: HIVE-10788
> URL: https://issues.apache.org/jira/browse/HIVE-10788
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-10788.1.patch, HIVE-10788.2.patch, 
> HIVE-10788.3.patch
>
>
> Currently {{sort_array}} only support primitive types. As we already support 
> comparison between non-primitive types, it makes sense to remove this 
> restriction.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10863) Merge trunk to Spark branch 5/28/2015 [Spark Branch]

2015-05-29 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HIVE-10863:
---
Attachment: HIVE-10863.1-spark.patch

> Merge trunk to Spark branch 5/28/2015 [Spark Branch]
> 
>
> Key: HIVE-10863
> URL: https://issues.apache.org/jira/browse/HIVE-10863
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Deepesh Khandelwal
> Attachments: HIVE-10863.0-spark.patch, HIVE-10863.1-spark.patch, 
> mj.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10863) Merge trunk to Spark branch 5/28/2015 [Spark Branch]

2015-05-29 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14565035#comment-14565035
 ] 

Jimmy Xiang commented on HIVE-10863:


I resolved the mj.patch conflicts and pushed it into the spark branch. Attached 
the dummy patch again to trigger the tests.

> Merge trunk to Spark branch 5/28/2015 [Spark Branch]
> 
>
> Key: HIVE-10863
> URL: https://issues.apache.org/jira/browse/HIVE-10863
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Deepesh Khandelwal
> Attachments: HIVE-10863.0-spark.patch, HIVE-10863.1-spark.patch, 
> mj.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10835) Concurrency issues in JDBC driver

2015-05-29 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14565082#comment-14565082
 ] 

Sergey Shelukhin commented on HIVE-10835:
-

Note that races also potentially exist during execution surrounding 
SessionState, as per the recent comments in HIVE-4239.

> Concurrency issues in JDBC driver
> -
>
> Key: HIVE-10835
> URL: https://issues.apache.org/jira/browse/HIVE-10835
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 1.2.0
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Attachments: HIVE-10835.1.patch, HIVE-10835.2.patch, HIVE-10835.patch
>
>
> Though the JDBC specification states that "Each Connection object can create 
> multiple Statement objects that may be used concurrently by the program", 
> that does not work in the current Hive JDBC driver. In addition, race 
> conditions also exist between DatabaseMetaData, Statement, and ResultSet as 
> long as they make RPC calls to HS2 over the same Thrift transport, which 
> happens within a connection.
> So we need a connection-level lock to serialize all these RPC calls in a 
> connection.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10834) Support First_value()/last_value() over x preceding and y preceding windowing

2015-05-29 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14565094#comment-14565094
 ] 

Aihua Xu commented on HIVE-10834:
-

[~ashutoshc] Can you help review this change? Thanks. 

> Support First_value()/last_value() over x preceding and y preceding windowing
> -
>
> Key: HIVE-10834
> URL: https://issues.apache.org/jira/browse/HIVE-10834
> Project: Hive
>  Issue Type: Sub-task
>  Components: PTF-Windowing
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-10834.patch
>
>
> Currently the following query
> {noformat}
> select ts, f, first_value(f) over (partition by ts order by t rows between 2 
> preceding and 1 preceding) from over10k limit 100;
> {noformat}
> throws exception:
> {noformat}
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row (tag=0) 
> {"key":{"reducesinkkey0":"2013-03-01 
> 09:11:58.703071","reducesinkkey1":-3},"value":{"_col3":0.83}}
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:256)
> at 
> org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
> at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:449)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row (tag=0) {"key":{"reducesinkkey0":"2013-03-01 
> 09:11:58.703071","reducesinkkey1":-3},"value":{"_col3":0.83}}
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244)
> ... 3 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Internal Error: 
> cannot generate all output rows for a Partition
> at 
> org.apache.hadoop.hive.ql.udf.ptf.WindowingTableFunction.finishPartition(WindowingTableFunction.java:519)
> at 
> org.apache.hadoop.hive.ql.exec.PTFOperator$PTFInvocation.finishPartition(PTFOperator.java:337)
> at 
> org.apache.hadoop.hive.ql.exec.PTFOperator.process(PTFOperator.java:114)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
> at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:235)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10550) Dynamic RDD caching optimization for HoS.[Spark Branch]

2015-05-29 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-10550:
---
Fix Version/s: spark-branch

> Dynamic RDD caching optimization for HoS.[Spark Branch]
> ---
>
> Key: HIVE-10550
> URL: https://issues.apache.org/jira/browse/HIVE-10550
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Chengxiang Li
>Assignee: Chengxiang Li
> Fix For: spark-branch
>
> Attachments: HIVE-10550.1-spark.patch, HIVE-10550.1.patch, 
> HIVE-10550.2-spark.patch, HIVE-10550.3-spark.patch, HIVE-10550.4-spark.patch, 
> HIVE-10550.5-spark.patch, HIVE-10550.6-spark.patch
>
>
> A Hive query may scan the same table multiple times, as in a self-join or 
> self-union, or even share the same subquery; [TPC-DS 
> Q39|https://github.com/hortonworks/hive-testbench/blob/hive14/sample-queries-tpcds/query39.sql]
>  is an example. As you may know, Spark supports caching RDD data, which means 
> Spark puts the computed RDD data in memory and reads it from memory directly 
> the next time; this avoids the computation cost of that RDD (and all the cost 
> of its dependencies) at the cost of more memory usage. By analyzing the query 
> context, we should be able to determine which parts of the query can be 
> shared, so that we can reuse the cached RDD in the generated Spark job.
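
For illustration, a minimal sketch of the underlying Spark mechanism using 
Spark's Java API (a toy standalone job, not the Hive-on-Spark implementation; 
the input path argument is an assumption):
{code}
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

// Toy example: a shared input RDD is cached so that two downstream actions
// trigger only one scan of the input.
public class RddCacheSketch {
  public static void main(String[] args) {
    JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("cache-sketch"));
    JavaRDD<String> shared = sc.textFile(args[0]).cache(); // computed once, kept in memory
    long a = shared.filter(l -> l.contains("x")).count();  // first action computes and caches
    long b = shared.filter(l -> l.contains("y")).count();  // second action reads the cache
    System.out.println(a + " " + b);
    sc.stop();
  }
}
{code}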



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10838) Allow the Hive metastore client to bind to a specific address when connecting to the server

2015-05-29 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HIVE-10838:

Issue Type: Bug  (was: Task)

> Allow the Hive metastore client to bind to a specific address when connecting 
> to the server
> ---
>
> Key: HIVE-10838
> URL: https://issues.apache.org/jira/browse/HIVE-10838
> Project: Hive
>  Issue Type: Bug
>Reporter: HeeSoo Kim
>Assignee: HeeSoo Kim
>
> +*In a cluster with Kerberos authentication*+
> When a Hive metastore client (e.g. HS2, oozie) has been configured with a 
> logical hostname (e.g. hiveserver/hiveserver_logical_hostn...@example.com), 
> it still uses its physical hostname to try to connect to the hive metastore.
> For example, we specify, in hive-site.xml:
> {noformat}
> <property>
>   <name>hive.server2.authentication.kerberos.principal</name>
>   <value>hiveserver/hiveserver_logical_hostn...@example.com</value>
> </property>
> {noformat}
> When the client tried to get a delegation token from the metastore, an 
> exception occurred:
> {noformat}
> 2015-05-21 23:17:59,554 ERROR metadata.Hive 
> (Hive.java:getDelegationToken(2638)) - MetaException(message:Unauthorized 
> connection for super-user: hiveserver/hiveserver_logical_hostn...@example.com 
> from IP 10.250.16.43)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_delegation_token_result$get_delegation_token_resultStandardScheme.read(ThriftHiveMetastore.java)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_delegation_token_result$get_delegation_token_resultStandardScheme.read(ThriftHiveMetastore.java)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_delegation_token_result.read(ThriftHiveMetastore.java)
> at 
> org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_delegation_token(ThriftHiveMetastore.java:3293)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_delegation_token(ThriftHiveMetastore.java:3279)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDelegationToken(HiveMetaStoreClient.java:1559)
> {noformat}
> We need to set the bind address that the Hive metastore client uses when it 
> connects to the Hive metastore, based on the hostname of the Kerberos principal.
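
For illustration, a minimal sketch of the underlying idea with a plain socket 
(hostnames are placeholders, and the real client goes through Thrift 
transports; 9083 is the default metastore port):
{code}
import java.net.InetSocketAddress;
import java.net.Socket;

// Toy example: bind the client socket to a chosen local address before
// connecting, instead of letting the OS pick the physical interface.
public class BindBeforeConnect {
  public static void main(String[] args) throws Exception {
    Socket s = new Socket();
    // Bind to the address the logical hostname resolves to (port 0 = any free local port).
    s.bind(new InetSocketAddress("hiveserver-logical-hostname", 0));
    // Then connect to the metastore.
    s.connect(new InetSocketAddress("metastore-host", 9083));
    System.out.println("connected from " + s.getLocalAddress());
    s.close();
  }
}
{code}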



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10838) Allow the Hive metastore client to bind to a specific address when connecting to the server

2015-05-29 Thread HeeSoo Kim (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HeeSoo Kim updated HIVE-10838:
--
Description: 
+*In a cluster with Kerberos authentication*+
When a Hive metastore client (e.g. HS2, oozie) has been configured with a 
logical hostname (e.g. hiveserver/hiveserver_logical_hostn...@example.com), it 
still uses its physical hostname to try to connect to the hive metastore.

For example, we specify, in hive-site.xml:
{noformat}
<property>
  <name>hive.server2.authentication.kerberos.principal</name>
  <value>hiveserver/hiveserver_logical_hostn...@example.com</value>
</property>
{noformat}

When the client tried to get a delegation token from the metastore, an 
exception occurred:
{noformat}
2015-05-21 23:17:59,554 ERROR metadata.Hive 
(Hive.java:getDelegationToken(2638)) - MetaException(message:Unauthorized 
connection for super-user: hiveserver/hiveserver_logical_hostn...@example.com 
from IP 10.250.16.43)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_delegation_token_result$get_delegation_token_resultStandardScheme.read(ThriftHiveMetastore.java)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_delegation_token_result$get_delegation_token_resultStandardScheme.read(ThriftHiveMetastore.java)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_delegation_token_result.read(ThriftHiveMetastore.java)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_delegation_token(ThriftHiveMetastore.java:3293)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_delegation_token(ThriftHiveMetastore.java:3279)
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDelegationToken(HiveMetaStoreClient.java:1559)
{noformat}

We need to set the bind address that the Hive metastore client uses when it 
connects to the Hive metastore, based on the logical hostname of the Kerberos 
principal.


  was:
+*In a cluster with Kerberos authentication*+
When a Hive metastore client (e.g. HS2, oozie) has been configured with a 
logical hostname (e.g. hiveserver/hiveserver_logical_hostn...@example.com), it 
still uses its physical hostname to try to connect to the hive metastore.

For example, we specify, in hive-site.xml:
{noformat}
<property>
  <name>hive.server2.authentication.kerberos.principal</name>
  <value>hiveserver/hiveserver_logical_hostn...@example.com</value>
</property>
{noformat}

When the client tried to get a delegation token from the metastore, an 
exception occurred:
{noformat}
2015-05-21 23:17:59,554 ERROR metadata.Hive 
(Hive.java:getDelegationToken(2638)) - MetaException(message:Unauthorized 
connection for super-user: hiveserver/hiveserver_logical_hostn...@example.com 
from IP 10.250.16.43)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_delegation_token_result$get_delegation_token_resultStandardScheme.read(ThriftHiveMetastore.java)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_delegation_token_result$get_delegation_token_resultStandardScheme.read(ThriftHiveMetastore.java)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_delegation_token_result.read(ThriftHiveMetastore.java)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_delegation_token(ThriftHiveMetastore.java:3293)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_delegation_token(ThriftHiveMetastore.java:3279)
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDelegationToken(HiveMetaStoreClient.java:1559)
{noformat}

We need to set the bind address that the Hive metastore client uses when it 
connects to the Hive metastore, based on the hostname of the Kerberos principal.



> Allow the Hive metastore client to bind to a specific address when connecting 
> to the server
> ---
>
> Key: HIVE-10838
> URL: https://issues.apache.org/jira/browse/HIVE-10838
> Project: Hive
>  Issue Type: Bug
>Reporter: HeeSoo Kim
>Assignee: HeeSoo Kim
>
> +*In a cluster with Kerberos authentication*+
> When a Hive metastore client (e.g. HS2, oozie) has been configured with a 
> logical hostname (e.g. hiveserver/hiveserver_logical_hostn...@example.com), 
> it still uses its physical hostname to try to connect to the hive metastore.
> For example, we specify, in hive-site.xml:
> {noformat}
> <property>
>   <name>hive.server2.authentication.kerberos.principal</name>
>   <value>hiveserver/hiveserver_logical_hostn...@example.com</value>
> </property>
> {noformat}
> When the client tried to get a delegation token from the metastore, an 
> exception occurred:
> {noformat}
> 2015-05-21 23:17:59,554 ERROR metadata.Hive 
> (Hive.java:getDelegationToken(2638)) - MetaException(message:U

[jira] [Commented] (HIVE-10835) Concurrency issues in JDBC driver

2015-05-29 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14565124#comment-14565124
 ] 

Chaoyu Tang commented on HIVE-10835:


Thanks [~sershe] for the information; I noticed that you are already working 
on HIVE-4239. 
This JIRA mainly addresses the concurrency issue on the client side (JDBC 
driver), while HIVE-4239, as I understand it, focuses on the server side, so I 
think they can be considered separate issues. We are working together to 
address the races in the system, but from different areas. Could you take a 
look at the patch and comment? Thanks.

> Concurrency issues in JDBC driver
> -
>
> Key: HIVE-10835
> URL: https://issues.apache.org/jira/browse/HIVE-10835
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 1.2.0
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Attachments: HIVE-10835.1.patch, HIVE-10835.2.patch, HIVE-10835.patch
>
>
> Though the JDBC specification states that "Each Connection object can create 
> multiple Statement objects that may be used concurrently by the program", 
> that does not work in the current Hive JDBC driver. In addition, race 
> conditions also exist between DatabaseMetaData, Statement, and ResultSet as 
> long as they make RPC calls to HS2 over the same Thrift transport, which 
> happens within a connection.
> So we need a connection-level lock to serialize all these RPC calls in a 
> connection.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10863) Merge trunk to Spark branch 5/28/2015 [Spark Branch]

2015-05-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14565162#comment-14565162
 ] 

Hive QA commented on HIVE-10863:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12736177/HIVE-10863.1-spark.patch

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 7948 tests executed
*Failed tests:*
{noformat}
TestCliDriver-infer_bucket_sort_multi_insert.q-insert_values_tmp_table.q-union_remove_11.q-and-12-more
 - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.initializationError
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx_cbo_2
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/871/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/871/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-871/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12736177 - PreCommit-HIVE-SPARK-Build

> Merge trunk to Spark branch 5/28/2015 [Spark Branch]
> 
>
> Key: HIVE-10863
> URL: https://issues.apache.org/jira/browse/HIVE-10863
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Deepesh Khandelwal
> Attachments: HIVE-10863.0-spark.patch, HIVE-10863.1-spark.patch, 
> mj.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10853) Create ExplainTask in ATS hook through ExplainWork

2015-05-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14565165#comment-14565165
 ] 

Hive QA commented on HIVE-10853:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12735956/HIVE-10853.01.patch

{color:red}ERROR:{color} -1 due to 45 failed/errored test(s), 8978 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_fold_case
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_histogram_numeric
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_auto_sortmerge_join_16
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket4
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket5
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket6
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucketizedhiveinputformat
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucketmapjoin6
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucketmapjoin7
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_constprog_partitioner
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_disable_merge_for_bucketing
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_empty_dir_in_table
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_external_table_with_space_in_location_path
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_file_with_header_footer
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_import_exported_table
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap_auto
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_bucketed_table
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_map_operators
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_merge
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_num_buckets
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_reducers_power_two
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_input16_cc
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_leftsemijoin_mr
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_list_bucket_dml_10
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_load_fs2
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_load_hdfs_file_with_space_in_the_name
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_parallel_orderby
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_ql_rewrite_gbtoidx_cbo_1
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_quotedid_smb
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_reduce_deduplicate
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_remote_script
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_schemeAuthority
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_schemeAuthority2
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_scriptfile1
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_smb_mapjoin_8
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_stats_counter
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_stats_counter_partitioned
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_temp_table_external
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_truncate_column_buckets
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_uber_reduce
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx_cbo_2
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchAbort
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4093/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4093/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4093/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.p

[jira] [Updated] (HIVE-6867) Bucketized Table feature fails in some cases

2015-05-29 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-6867:
--
Attachment: HIVE-6867.05.patch

Addressed [~jpullokkaran]'s and [~xuefuz]'s comments.

> Bucketized Table feature fails in some cases
> 
>
> Key: HIVE-6867
> URL: https://issues.apache.org/jira/browse/HIVE-6867
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.12.0
>Reporter: Laljo John Pullokkaran
>Assignee: Pengcheng Xiong
> Attachments: HIVE-6867.01.patch, HIVE-6867.02.patch, 
> HIVE-6867.03.patch, HIVE-6867.04.patch, HIVE-6867.05.patch
>
>
> The bucketized table feature fails in some cases: if src & destination are 
> bucketed on the same key, and the actual data in the src is not bucketed 
> (because the data was loaded using LOAD DATA LOCAL INPATH), then the data 
> won't be bucketed while writing to the destination.
> Example
> --
> CREATE TABLE P1(key STRING, val STRING)
> CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH '/Users/jp/apache-hive1/data/files/P1.txt' INTO TABLE 
> P1;
> -- perform an insert to make sure there are 2 files
> INSERT OVERWRITE TABLE P1 select key, val from P1;
> --
> This is not a regression; this has never worked.
> It was only discovered due to Hadoop2 changes.
> In Hadoop1, in local mode, the number of reducers is always 1, regardless of 
> what is requested by the app. Hadoop2 now honors the requested number of 
> reducers in local mode (by spawning threads).
> The long-term solution seems to be to prevent LOAD DATA for bucketed tables.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10858) WebHCat specific resources should be added to HADOOP_CLASSPATH first

2015-05-29 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14565169#comment-14565169
 ] 

Thejas M Nair commented on HIVE-10858:
--

+1

> WebHCat specific resources should be added to HADOOP_CLASSPATH first
> 
>
> Key: HIVE-10858
> URL: https://issues.apache.org/jira/browse/HIVE-10858
> Project: Hive
>  Issue Type: Bug
>  Components: WebHCat
>Affects Versions: 1.2.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-10858.patch
>
>
> When submitting jobs via WebHCat, the user may specify additional jars to be 
> included with the job. Sqoop jobs are one such example, where the user may 
> need to supply a jar with JDBC classes for a given database. If a different 
> version of the same jar is already present in HADOOP_CLASSPATH, we need to 
> make sure the user-specified jar is used.
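
For illustration, a minimal sketch of the ordering fix (the class and helper 
are hypothetical, not the actual WebHCat code):
{code}
class ClasspathOrdering {
  // Hypothetical helper: prepend the job-specific jars so they shadow
  // same-named jars that are already present on HADOOP_CLASSPATH.
  static String build(String userJars, String existingClasspath) {
    if (userJars == null || userJars.isEmpty()) {
      return existingClasspath;
    }
    return userJars + java.io.File.pathSeparator + existingClasspath;
  }
}
{code}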



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10853) Create ExplainTask in ATS hook through ExplainWork

2015-05-29 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-10853:
---
Attachment: HIVE-10853.02.patch

Addressed [~hagleitn]'s comment.

> Create ExplainTask in ATS hook through ExplainWork
> --
>
> Key: HIVE-10853
> URL: https://issues.apache.org/jira/browse/HIVE-10853
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Pengcheng Xiong
> Attachments: HIVE-10853.01.patch, HIVE-10853.02.patch
>
>
> Right now ExplainTask is created directly. That's fragile and can lead to 
> stuff like: HIVE-10829



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10862) TestHiveAuthorizerShowFilters tests fail when run in sequence

2015-05-29 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-10862:
-
Attachment: HIVE-10862.2.patch

2.patch - added more comments to the util func


> TestHiveAuthorizerShowFilters tests fail when run in sequence
> -
>
> Key: HIVE-10862
> URL: https://issues.apache.org/jira/browse/HIVE-10862
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Attachments: HIVE-10862.1.patch, HIVE-10862.2.patch
>
>
> The tests fail when there are left over tables or databases from other tests.
> Fails with error like - 
> java.lang.AssertionError: All tables should be passed as arguments 
> expected:<[testhiveauthorizershowfilterstable1, 
> testhiveauthorizershowfilterstable2]> but 
> was:<[testhiveauthorizercheckinvocationtable, 
> testhiveauthorizercheckinvocationtable_acid, 
> testhiveauthorizershowfilterstable1, testhiveauthorizershowfilterstable2]>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10868) Update release note for 1.2.0 and 1.1.0

2015-05-29 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-10868:
---
Attachment: HIVE-10868.patch

> Update release note for 1.2.0 and 1.1.0
> ---
>
> Key: HIVE-10868
> URL: https://issues.apache.org/jira/browse/HIVE-10868
> Project: Hive
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 1.2.0, 1.1.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-10868.patch
>
>
> It was recently found that Hive's release notes don't contain all of the fixed 
> JIRAs. This happened due to incorrect or missing fix versions in JIRAs. A 
> large chunk of such JIRAs have fix versions that didn't get updated when a 
> feature branch was merged to trunk (master). This JIRA is to fix such JIRAs 
> related to the Hive on Spark work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10862) TestHiveAuthorizerShowFilters tests fail when run in sequence

2015-05-29 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14565187#comment-14565187
 ] 

Thejas M Nair commented on HIVE-10862:
--

Note that the util function can also help in cases where tests are executed in 
parallel; in such cases, cleaning up the tables and DBs might not be sufficient.
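
For illustration, one way such a util function might tolerate foreign tables (a 
sketch under the assumption that each test names its tables with a unique 
prefix; not the actual patch):
{code}
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: restrict the assertion to the tables this test created,
// so leftovers from other (possibly parallel) tests don't break the comparison.
class TableFilterSketch {
  static List<String> filterToOwnTables(List<String> allTables, String testPrefix) {
    return allTables.stream()
        .filter(t -> t.startsWith(testPrefix)) // ignore other tests' tables
        .sorted()
        .collect(Collectors.toList());
  }
}
{code}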


> TestHiveAuthorizerShowFilters tests fail when run in sequence
> -
>
> Key: HIVE-10862
> URL: https://issues.apache.org/jira/browse/HIVE-10862
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Attachments: HIVE-10862.1.patch, HIVE-10862.2.patch
>
>
> The tests fail when there are left over tables or databases from other tests.
> Fails with error like - 
> java.lang.AssertionError: All tables should be passed as arguments 
> expected:<[testhiveauthorizershowfilterstable1, 
> testhiveauthorizershowfilterstable2]> but 
> was:<[testhiveauthorizercheckinvocationtable, 
> testhiveauthorizercheckinvocationtable_acid, 
> testhiveauthorizershowfilterstable1, testhiveauthorizershowfilterstable2]>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10868) Update release note for 1.2.0 and 1.1.0

2015-05-29 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14565188#comment-14565188
 ] 

Xuefu Zhang commented on HIVE-10868:


[~thejas]/[~sushanth], please review. Once I get a +1, I will commit it to 
master.

> Update release note for 1.2.0 and 1.1.0
> ---
>
> Key: HIVE-10868
> URL: https://issues.apache.org/jira/browse/HIVE-10868
> Project: Hive
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 1.2.0, 1.1.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-10868.patch
>
>
> It was recently found that Hive's release notes don't contain all of the fixed 
> JIRAs. This happened due to incorrect or missing fix versions in JIRAs. A 
> large chunk of such JIRAs have fix versions that didn't get updated when a 
> feature branch was merged to trunk (master). This JIRA is to fix such JIRAs 
> related to the Hive on Spark work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9863) Querying parquet tables fails with IllegalStateException [Spark Branch]

2015-05-29 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-9863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14565190#comment-14565190
 ] 

Sergio Peña commented on HIVE-9863:
---

[~xuefuz]
I ran the same tests using the Hive CLI + Spark this time, and it works fine. 
There is no error or exception.

{noformat}
hive> desc formatted parquet;
...
# Storage Information
SerDe Library:  
org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe  
InputFormat:
org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
OutputFormat:   
org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
...

hive> select count(*) from parquet;
Query ID = sergio_20150529133253_5fd9da28-d73b-4137-a04a-3975108dbba7
...
Starting Spark Job = 513800e8-2d6a-47af-830c-d18099e52bc3
2015-05-29 13:32:54,261 Stage-3_0: 1/1 Finished Stage-4_0: 1/1 Finished
Status: Finished successfully in 1.01 seconds
OK
500
Time taken: 1.198 seconds, Fetched: 1 row(s)
{noformat}

> Querying parquet tables fails with IllegalStateException [Spark Branch]
> ---
>
> Key: HIVE-9863
> URL: https://issues.apache.org/jira/browse/HIVE-9863
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>
> This does not necessarily happen only in the Spark branch; queries such as select count(*)
> from table_name fail with the error:
> {code}
> hive> select * from content limit 2;
> OK
> Failed with exception java.io.IOException:java.lang.IllegalStateException: 
> All the offsets listed in the split should be found in the file. expected: 
> [4, 4] found: [BlockMetaData{69644, 881917418 [ColumnMetaData{GZIP [guid] 
> BINARY  [PLAIN, BIT_PACKED], 4}, ColumnMetaData{GZIP [collection_name] BINARY 
>  [PLAIN_DICTIONARY, BIT_PACKED], 389571}, ColumnMetaData{GZIP [doc_type] 
> BINARY  [PLAIN_DICTIONARY, BIT_PACKED], 389790}, ColumnMetaData{GZIP [stage] 
> INT64  [PLAIN_DICTIONARY, BIT_PACKED], 389887}, ColumnMetaData{GZIP 
> [meta_timestamp] INT64  [RLE, PLAIN_DICTIONARY, BIT_PACKED], 397673}, 
> ColumnMetaData{GZIP [doc_timestamp] INT64  [RLE, PLAIN_DICTIONARY, 
> BIT_PACKED], 422161}, ColumnMetaData{GZIP [meta_size] INT32  [RLE, 
> PLAIN_DICTIONARY, BIT_PACKED], 460215}, ColumnMetaData{GZIP [content_size] 
> INT32  [RLE, PLAIN_DICTIONARY, BIT_PACKED], 521728}, ColumnMetaData{GZIP 
> [source] BINARY  [RLE, PLAIN, BIT_PACKED], 683740}, ColumnMetaData{GZIP 
> [delete_flag] BOOLEAN  [RLE, PLAIN, BIT_PACKED], 683787}, ColumnMetaData{GZIP 
> [meta] BINARY  [RLE, PLAIN, BIT_PACKED], 683834}, ColumnMetaData{GZIP 
> [content] BINARY  [RLE, PLAIN, BIT_PACKED], 6992365}]}] out of: [4, 
> 129785482, 260224757] in range 0, 134217728
> Time taken: 0.253 seconds
> hive> 
> {code}
> I can reproduce the problem in either local or yarn-cluster mode. It seems
> to happen with MR as well. Thus, I suspect this is a Parquet problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10862) TestHiveAuthorizerShowFilters tests fail when run in sequence

2015-05-29 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14565189#comment-14565189
 ] 

Thejas M Nair commented on HIVE-10862:
--

On second thought, we need a better solution than this if tests are being
executed in parallel; the same Derby location can't be used in parallel.


> TestHiveAuthorizerShowFilters tests fail when run in sequence
> -
>
> Key: HIVE-10862
> URL: https://issues.apache.org/jira/browse/HIVE-10862
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Attachments: HIVE-10862.1.patch
>
>
> The tests fail when there are left over tables or databases from other tests.
> Fails with error like - 
> java.lang.AssertionError: All tables should be passed as arguments 
> expected:<[testhiveauthorizershowfilterstable1, 
> testhiveauthorizershowfilterstable2]> but 
> was:<[testhiveauthorizercheckinvocationtable, 
> testhiveauthorizercheckinvocationtable_acid, 
> testhiveauthorizershowfilterstable1, testhiveauthorizershowfilterstable2]>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10862) TestHiveAuthorizerShowFilters tests fail when run in sequence

2015-05-29 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-10862:
-
Attachment: (was: HIVE-10862.2.patch)

> TestHiveAuthorizerShowFilters tests fail when run in sequence
> -
>
> Key: HIVE-10862
> URL: https://issues.apache.org/jira/browse/HIVE-10862
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Attachments: HIVE-10862.1.patch
>
>
> The tests fail when there are left over tables or databases from other tests.
> Fails with error like - 
> java.lang.AssertionError: All tables should be passed as arguments 
> expected:<[testhiveauthorizershowfilterstable1, 
> testhiveauthorizershowfilterstable2]> but 
> was:<[testhiveauthorizercheckinvocationtable, 
> testhiveauthorizercheckinvocationtable_acid, 
> testhiveauthorizershowfilterstable1, testhiveauthorizershowfilterstable2]>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10862) TestHiveAuthorizerShowFilters tests fail when run in sequence

2015-05-29 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-10862:
-
Attachment: HIVE-10862.2.patch

> TestHiveAuthorizerShowFilters tests fail when run in sequence
> -
>
> Key: HIVE-10862
> URL: https://issues.apache.org/jira/browse/HIVE-10862
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Attachments: HIVE-10862.1.patch, HIVE-10862.2.patch
>
>
> The tests fail when there are left over tables or databases from other tests.
> Fails with error like - 
> java.lang.AssertionError: All tables should be passed as arguments 
> expected:<[testhiveauthorizershowfilterstable1, 
> testhiveauthorizershowfilterstable2]> but 
> was:<[testhiveauthorizercheckinvocationtable, 
> testhiveauthorizercheckinvocationtable_acid, 
> testhiveauthorizershowfilterstable1, testhiveauthorizershowfilterstable2]>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10811) RelFieldTrimmer throws NoSuchElementException in some cases

2015-05-29 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14565197#comment-14565197
 ] 

Ashutosh Chauhan commented on HIVE-10811:
-

[~jcamachorodriguez] / [~jpullokkaran] This patch was committed with a reported
failure of TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx_cbo_2 in the
HiveQA run. This test has been failing consistently on all builds since this commit.

> RelFieldTrimmer throws NoSuchElementException in some cases
> ---
>
> Key: HIVE-10811
> URL: https://issues.apache.org/jira/browse/HIVE-10811
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Fix For: 1.2.1
>
> Attachments: HIVE-10811.01.patch, HIVE-10811.02.patch, 
> HIVE-10811.patch
>
>
> RelFieldTrimmer runs into NoSuchElementException in some cases.
> Stack trace:
> {noformat}
> Exception in thread "main" java.lang.AssertionError: Internal error: While 
> invoking method 'public org.apache.calcite.sql2rel.RelFieldTrimmer$TrimResult 
> org.apache.calcite.sql2rel.RelFieldTrimmer.trimFields(org.apache.calcite.rel.core.Sort,org.apache.calcite.util.ImmutableBitSet,java.util.Set)'
>   at org.apache.calcite.util.Util.newInternal(Util.java:743)
>   at org.apache.calcite.util.ReflectUtil$2.invoke(ReflectUtil.java:543)
>   at 
> org.apache.calcite.sql2rel.RelFieldTrimmer.dispatchTrimFields(RelFieldTrimmer.java:269)
>   at 
> org.apache.calcite.sql2rel.RelFieldTrimmer.trim(RelFieldTrimmer.java:175)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyPreJoinOrderingTransforms(CalcitePlanner.java:947)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:820)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:768)
>   at org.apache.calcite.tools.Frameworks$1.apply(Frameworks.java:109)
>   at 
> org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:730)
>   at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:145)
>   at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:105)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:607)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:244)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10048)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:207)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1122)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> Caused by: java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.calcite.util.ReflectUtil$2.invoke(ReflectUtil.java:536)
>   ... 32 more
> Caused by: java.lang.AssertionError: Internal error: While invoking method 
> 'public org.apache.calcite.sql2rel.RelFieldTrimmer$TrimResult 
> org.apache.calcite.sql2rel.RelFieldTrimmer.trimFields(org.apa

[jira] [Commented] (HIVE-9863) Querying parquet tables fails with IllegalStateException [Spark Branch]

2015-05-29 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14565196#comment-14565196
 ] 

Xuefu Zhang commented on HIVE-9863:
---

Okay. Let me try to reproduce it with the old release and will update.

> Querying parquet tables fails with IllegalStateException [Spark Branch]
> ---
>
> Key: HIVE-9863
> URL: https://issues.apache.org/jira/browse/HIVE-9863
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>
> This does not necessarily happen only in the Spark branch; queries such as select count(*)
> from table_name fail with the error:
> {code}
> hive> select * from content limit 2;
> OK
> Failed with exception java.io.IOException:java.lang.IllegalStateException: 
> All the offsets listed in the split should be found in the file. expected: 
> [4, 4] found: [BlockMetaData{69644, 881917418 [ColumnMetaData{GZIP [guid] 
> BINARY  [PLAIN, BIT_PACKED], 4}, ColumnMetaData{GZIP [collection_name] BINARY 
>  [PLAIN_DICTIONARY, BIT_PACKED], 389571}, ColumnMetaData{GZIP [doc_type] 
> BINARY  [PLAIN_DICTIONARY, BIT_PACKED], 389790}, ColumnMetaData{GZIP [stage] 
> INT64  [PLAIN_DICTIONARY, BIT_PACKED], 389887}, ColumnMetaData{GZIP 
> [meta_timestamp] INT64  [RLE, PLAIN_DICTIONARY, BIT_PACKED], 397673}, 
> ColumnMetaData{GZIP [doc_timestamp] INT64  [RLE, PLAIN_DICTIONARY, 
> BIT_PACKED], 422161}, ColumnMetaData{GZIP [meta_size] INT32  [RLE, 
> PLAIN_DICTIONARY, BIT_PACKED], 460215}, ColumnMetaData{GZIP [content_size] 
> INT32  [RLE, PLAIN_DICTIONARY, BIT_PACKED], 521728}, ColumnMetaData{GZIP 
> [source] BINARY  [RLE, PLAIN, BIT_PACKED], 683740}, ColumnMetaData{GZIP 
> [delete_flag] BOOLEAN  [RLE, PLAIN, BIT_PACKED], 683787}, ColumnMetaData{GZIP 
> [meta] BINARY  [RLE, PLAIN, BIT_PACKED], 683834}, ColumnMetaData{GZIP 
> [content] BINARY  [RLE, PLAIN, BIT_PACKED], 6992365}]}] out of: [4, 
> 129785482, 260224757] in range 0, 134217728
> Time taken: 0.253 seconds
> hive> 
> {code}
> I can reproduce the problem in either local or yarn-cluster mode. It seems
> to happen with MR as well. Thus, I suspect this is a Parquet problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10283) HIVE-4240 may be causing issue with bucketed tables

2015-05-29 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14565200#comment-14565200
 ] 

Yongzhi Chen commented on HIVE-10283:
-

[~xuefuz] && [~szehon], could you find someone who knows this part well to work on
the issue? Currently, in the upstream master code, the number of buckets is not
respected even with insert overwrite (insert overwrite only creates 1 bucket
file while the table definition says 2).
Reproduce:
{noformat}
create table buckettest (data string) partitioned by (state string) clustered 
by (data) into 2 buckets;
set hive.enforce.bucketing = true;
insert overwrite table buckettest partition(state='MA') select code from jsmall 
limit 10;
set hive.auto.convert.sortmerge.join=true;
set hive.optimize.bucketmapjoin = true; 
set hive.optimize.bucketmapjoin.sortedmerge = true;
0: jdbc:hive2://localhost:1> select * from buckettest a join 
buckettestoutput2 b on (a.data=b.data);
select * from buckettest a join buckettestoutpu 
t2 b on (a.data=b.data);
Error: Error while compiling statement: FAILED: SemanticException [Error 
10141]: Bucketed table metadata is not correct. Fix the metadata or don't use 
bucketed mapjoin, by setting hive.enforce.bucketmapjoin to false. The number of 
buckets for table buckettest partition state=MA is 2, whereas the number of 
files is 1 (state=42000,code=10141)
{noformat}



> HIVE-4240 may be causing issue with bucketed tables 
> 
>
> Key: HIVE-10283
> URL: https://issues.apache.org/jira/browse/HIVE-10283
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Ryan P
>
> I suspect that by removing the reducer, HIVE-4240 may be causing issues.
> Because of this, inserts will not consolidate 'buckets' into single files,
> which is problematic when attempting to use bucketmapjoin.
> CREATE TABLE IF NOT EXISTS buckettestinput( 
> data string 
> ) 
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ','; 
> CREATE TABLE IF NOT EXISTS buckettestoutput1( 
> data string 
> )CLUSTERED BY(data) 
> INTO 2 BUCKETS 
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ','; 
> CREATE TABLE IF NOT EXISTS buckettestoutput2( 
> data string 
> )CLUSTERED BY(data) 
> INTO 2 BUCKETS 
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ','; 
> Then I inserted the following data into the "buckettestinput" table 
> firstinsert1 
> firstinsert2 
> firstinsert3 
> firstinsert4 
> firstinsert5 
> firstinsert6 
> firstinsert7 
> firstinsert8 
> secondinsert1 
> secondinsert2 
> secondinsert3 
> secondinsert4 
> secondinsert5 
> secondinsert6 
> secondinsert7 
> secondinsert8 
> set hive.enforce.bucketing = true; 
> set hive.enforce.sorting=true; 
> insert into table buckettestoutput1 
> select * from buckettestinput where data like 'first%' 
> SELECT * 
> FROM buckettestoutput1 TABLESAMPLE(BUCKET 1 OUT OF 1 ON data) s; 
> insert into table buckettestoutput1 
> select * from buckettestinput where data like 'second%' 
> Check the results of the table sample query.
> For the sort merge bucket map join:
> set hive.auto.convert.sortmerge.join=true; 
> set hive.optimize.bucketmapjoin = true; 
> set hive.optimize.bucketmapjoin.sortedmerge = true; 
> set hive.auto.convert.sortmerge.join.noconditionaltask=true; 
> select * from buckettestoutput1 a join buckettestoutput2 b on (a.data=b.data) 
> hive> select * from buckettestoutput1 a join buckettestoutput2 b on 
> (a.data=b.data); 
> FAILED: SemanticException [Error 10141]: Bucketed table metadata is not 
> correct. Fix the metadata or don't use bucketed mapjoin, by setting 
> hive.enforce.bucketmapjoin to false. The number of buckets for table 
> buckettestoutput1 is 2, whereas the number of files is 4 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10811) RelFieldTrimmer throws NoSuchElementException in some cases

2015-05-29 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14565203#comment-14565203
 ] 

Laljo John Pullokkaran commented on HIVE-10811:
---

[~jcamachorodriguez] Did you miss a q file update?

> RelFieldTrimmer throws NoSuchElementException in some cases
> ---
>
> Key: HIVE-10811
> URL: https://issues.apache.org/jira/browse/HIVE-10811
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Fix For: 1.2.1
>
> Attachments: HIVE-10811.01.patch, HIVE-10811.02.patch, 
> HIVE-10811.patch
>
>
> RelFieldTrimmer runs into NoSuchElementException in some cases.
> Stack trace:
> {noformat}
> Exception in thread "main" java.lang.AssertionError: Internal error: While 
> invoking method 'public org.apache.calcite.sql2rel.RelFieldTrimmer$TrimResult 
> org.apache.calcite.sql2rel.RelFieldTrimmer.trimFields(org.apache.calcite.rel.core.Sort,org.apache.calcite.util.ImmutableBitSet,java.util.Set)'
>   at org.apache.calcite.util.Util.newInternal(Util.java:743)
>   at org.apache.calcite.util.ReflectUtil$2.invoke(ReflectUtil.java:543)
>   at 
> org.apache.calcite.sql2rel.RelFieldTrimmer.dispatchTrimFields(RelFieldTrimmer.java:269)
>   at 
> org.apache.calcite.sql2rel.RelFieldTrimmer.trim(RelFieldTrimmer.java:175)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyPreJoinOrderingTransforms(CalcitePlanner.java:947)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:820)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:768)
>   at org.apache.calcite.tools.Frameworks$1.apply(Frameworks.java:109)
>   at 
> org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:730)
>   at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:145)
>   at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:105)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:607)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:244)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10048)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:207)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1122)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> Caused by: java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.calcite.util.ReflectUtil$2.invoke(ReflectUtil.java:536)
>   ... 32 more
> Caused by: java.lang.AssertionError: Internal error: While invoking method 
> 'public org.apache.calcite.sql2rel.RelFieldTrimmer$TrimResult 
> org.apache.calcite.sql2rel.RelFieldTrimmer.trimFields(org.apache.calcite.rel.core.Sort,org.apache.calcite.util.ImmutableBitSet,java.util.Set)'
>   at org.apache.calcite.util.Util.newInternal(Util.java:743)
>   at org.apache.ca

[jira] [Updated] (HIVE-10807) Invalidate basic stats for insert queries if autogather=false

2015-05-29 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-10807:

Attachment: HIVE-10807.5.patch

Golden files updated.

> Invalidate basic stats for insert queries if autogather=false
> -
>
> Key: HIVE-10807
> URL: https://issues.apache.org/jira/browse/HIVE-10807
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Affects Versions: 1.2.0
>Reporter: Gopal V
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-10807.2.patch, HIVE-10807.3.patch, 
> HIVE-10807.4.patch, HIVE-10807.5.patch, HIVE-10807.patch
>
>
> Setting stats.autogather=false leads to incorrect basic stats in the case of insert
> statements.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10866) Throw error when client try to insert into bucketed table

2015-05-29 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14565205#comment-14565205
 ] 

Yongzhi Chen commented on HIVE-10866:
-

Bucket enforcement seems totally broken. Even insert overwrite does not work
properly now.

> Throw error when client try to insert into bucketed table
> -
>
> Key: HIVE-10866
> URL: https://issues.apache.org/jira/browse/HIVE-10866
> Project: Hive
>  Issue Type: Improvement
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
>
> Currently, Hive does not support appends (insert into) to bucketed tables; see
> the open jira HIVE-3608. When inserting into such a table, the data will be
> "corrupted" and no longer fit for bucketmapjoin.
> We need to find a way to prevent clients from inserting into such tables.
> Reproduce:
> {noformat}
> CREATE TABLE IF NOT EXISTS buckettestoutput1( 
> data string 
> )CLUSTERED BY(data) 
> INTO 2 BUCKETS 
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> CREATE TABLE IF NOT EXISTS buckettestoutput2( 
> data string 
> )CLUSTERED BY(data) 
> INTO 2 BUCKETS 
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> set hive.enforce.bucketing = true; 
> set hive.enforce.sorting=true;
> insert into table buckettestoutput1 select code from sample_07 where 
> total_emp < 134354250 limit 10;
> After this first insert, I did:
> set hive.auto.convert.sortmerge.join=true; 
> set hive.optimize.bucketmapjoin = true; 
> set hive.optimize.bucketmapjoin.sortedmerge = true; 
> set hive.auto.convert.sortmerge.join.noconditionaltask=true;
> 0: jdbc:hive2://localhost:1> select * from buckettestoutput1 a join 
> buckettestoutput2 b on (a.data=b.data);
> +---+---+
> | data  | data  |
> +---+---+
> +---+---+
> So select works fine. 
> Second insert:
> 0: jdbc:hive2://localhost:1> insert into table buckettestoutput1 select 
> code from sample_07 where total_emp >= 134354250 limit 10;
> No rows affected (61.235 seconds)
> Then select:
> 0: jdbc:hive2://localhost:1> select * from buckettestoutput1 a join 
> buckettestoutput2 b on (a.data=b.data);
> Error: Error while compiling statement: FAILED: SemanticException [Error 
> 10141]: Bucketed table metadata is not correct. Fix the metadata or don't use 
> bucketed mapjoin, by setting hive.enforce.bucketmapjoin to false. The number 
> of buckets for table buckettestoutput1 is 2, whereas the number of files is 4 
> (state=42000,code=10141)
> 0: jdbc:hive2://localhost:1>
> {noformat}
> Insert into an empty table or partition is fine, but after an insert into a
> non-empty one (the second insert in the reproduce), the bucketmapjoin will
> throw an error. We should not let the second insert succeed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10811) RelFieldTrimmer throws NoSuchElementException in some cases

2015-05-29 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14565206#comment-14565206
 ] 

Ashutosh Chauhan commented on HIVE-10811:
-

Nope, an actual bug. If you click on the Hive QA run report for this issue (link in
the previous Hive QA comment), you will find
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4044/testReport/org.apache.hadoop.hive.cli/TestMinimrCliDriver/testCliDriver_ql_rewrite_gbtoidx_cbo_2/

> RelFieldTrimmer throws NoSuchElementException in some cases
> ---
>
> Key: HIVE-10811
> URL: https://issues.apache.org/jira/browse/HIVE-10811
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Fix For: 1.2.1
>
> Attachments: HIVE-10811.01.patch, HIVE-10811.02.patch, 
> HIVE-10811.patch
>
>
> RelFieldTrimmer runs into NoSuchElementException in some cases.
> Stack trace:
> {noformat}
> Exception in thread "main" java.lang.AssertionError: Internal error: While 
> invoking method 'public org.apache.calcite.sql2rel.RelFieldTrimmer$TrimResult 
> org.apache.calcite.sql2rel.RelFieldTrimmer.trimFields(org.apache.calcite.rel.core.Sort,org.apache.calcite.util.ImmutableBitSet,java.util.Set)'
>   at org.apache.calcite.util.Util.newInternal(Util.java:743)
>   at org.apache.calcite.util.ReflectUtil$2.invoke(ReflectUtil.java:543)
>   at 
> org.apache.calcite.sql2rel.RelFieldTrimmer.dispatchTrimFields(RelFieldTrimmer.java:269)
>   at 
> org.apache.calcite.sql2rel.RelFieldTrimmer.trim(RelFieldTrimmer.java:175)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyPreJoinOrderingTransforms(CalcitePlanner.java:947)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:820)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:768)
>   at org.apache.calcite.tools.Frameworks$1.apply(Frameworks.java:109)
>   at 
> org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:730)
>   at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:145)
>   at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:105)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:607)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:244)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10048)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:207)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1122)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> Caused by: java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.calcite.util.ReflectUtil$2.invoke(ReflectUtil.java:536)
>   ... 32 more
> Caused by: java.lang.AssertionError: Internal error: While invoking method 
> 'public org.apache.calcite.sql2rel.RelFieldTrimmer

[jira] [Commented] (HIVE-10811) RelFieldTrimmer throws NoSuchElementException in some cases

2015-05-29 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14565217#comment-14565217
 ] 

Laljo John Pullokkaran commented on HIVE-10811:
---

[~jcamachorodriguez] Let me know if you want me to look at this.

> RelFieldTrimmer throws NoSuchElementException in some cases
> ---
>
> Key: HIVE-10811
> URL: https://issues.apache.org/jira/browse/HIVE-10811
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Fix For: 1.2.1
>
> Attachments: HIVE-10811.01.patch, HIVE-10811.02.patch, 
> HIVE-10811.patch
>
>
> RelFieldTrimmer runs into NoSuchElementException in some cases.
> Stack trace:
> {noformat}
> Exception in thread "main" java.lang.AssertionError: Internal error: While 
> invoking method 'public org.apache.calcite.sql2rel.RelFieldTrimmer$TrimResult 
> org.apache.calcite.sql2rel.RelFieldTrimmer.trimFields(org.apache.calcite.rel.core.Sort,org.apache.calcite.util.ImmutableBitSet,java.util.Set)'
>   at org.apache.calcite.util.Util.newInternal(Util.java:743)
>   at org.apache.calcite.util.ReflectUtil$2.invoke(ReflectUtil.java:543)
>   at 
> org.apache.calcite.sql2rel.RelFieldTrimmer.dispatchTrimFields(RelFieldTrimmer.java:269)
>   at 
> org.apache.calcite.sql2rel.RelFieldTrimmer.trim(RelFieldTrimmer.java:175)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyPreJoinOrderingTransforms(CalcitePlanner.java:947)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:820)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:768)
>   at org.apache.calcite.tools.Frameworks$1.apply(Frameworks.java:109)
>   at 
> org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:730)
>   at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:145)
>   at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:105)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:607)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:244)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10048)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:207)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1122)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> Caused by: java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.calcite.util.ReflectUtil$2.invoke(ReflectUtil.java:536)
>   ... 32 more
> Caused by: java.lang.AssertionError: Internal error: While invoking method 
> 'public org.apache.calcite.sql2rel.RelFieldTrimmer$TrimResult 
> org.apache.calcite.sql2rel.RelFieldTrimmer.trimFields(org.apache.calcite.rel.core.Sort,org.apache.calcite.util.ImmutableBitSet,java.util.Set)'
>   at org.apache.calcite.util.Util.newInternal(Util.java:743)
>   at

[jira] [Updated] (HIVE-10869) fold_case.q failing on trunk

2015-05-29 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-10869:

Attachment: HIVE-10869.patch

Simple golden file update.

> fold_case.q failing on trunk
> 
>
> Key: HIVE-10869
> URL: https://issues.apache.org/jira/browse/HIVE-10869
> Project: Hive
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 1.3.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-10869.patch
>
>
> Race condition of commits between HIVE-10716 & HIVE-10812



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10834) Support First_value()/last_value() over x preceding and y preceding windowing

2015-05-29 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14565235#comment-14565235
 ] 

Ashutosh Chauhan commented on HIVE-10834:
-

+1

> Support First_value()/last_value() over x preceding and y preceding windowing
> -
>
> Key: HIVE-10834
> URL: https://issues.apache.org/jira/browse/HIVE-10834
> Project: Hive
>  Issue Type: Sub-task
>  Components: PTF-Windowing
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-10834.patch
>
>
> Currently the following query
> {noformat}
> select ts, f, first_value(f) over (partition by ts order by t rows between 2 
> preceding and 1 preceding) from over10k limit 100;
> {noformat}
> throws an exception:
> {noformat}
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row (tag=0) 
> {"key":{"reducesinkkey0":"2013-03-01 
> 09:11:58.703071","reducesinkkey1":-3},"value":{"_col3":0.83}}
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:256)
> at 
> org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
> at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:449)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row (tag=0) {"key":{"reducesinkkey0":"2013-03-01 
> 09:11:58.703071","reducesinkkey1":-3},"value":{"_col3":0.83}}
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244)
> ... 3 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Internal Error: 
> cannot generate all output rows for a Partition
> at 
> org.apache.hadoop.hive.ql.udf.ptf.WindowingTableFunction.finishPartition(WindowingTableFunction.java:519)
> at 
> org.apache.hadoop.hive.ql.exec.PTFOperator$PTFInvocation.finishPartition(PTFOperator.java:337)
> at 
> org.apache.hadoop.hive.ql.exec.PTFOperator.process(PTFOperator.java:114)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
> at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:235)
> {noformat}
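
For reference, the intended frame semantics are easy to state outside Hive. A
plain-Java sketch (illustrative only, not Hive's WindowingTableFunction): for
row i, "rows between 2 preceding and 1 preceding" is the index range [i-2, i-1],
so first_value is the value at max(0, i-2), and NULL when the frame is empty.
{code}
// Illustrative sketch of the frame semantics, not Hive's implementation.
import java.util.ArrayList;
import java.util.List;

public class FirstValuePreceding {
  static List<Double> firstValue(double[] f) {
    List<Double> out = new ArrayList<Double>();
    for (int i = 0; i < f.length; i++) {
      int start = i - 2, end = i - 1;   // frame bounds, inclusive
      if (end < 0) {
        out.add(null);                  // empty frame for the first row
      } else {
        out.add(f[Math.max(0, start)]); // first row inside the frame
      }
    }
    return out;
  }

  public static void main(String[] args) {
    // prints [null, 0.83, 0.83, 0.27]
    System.out.println(firstValue(new double[] {0.83, 0.27, 0.50, 0.91}));
  }
}
{code}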



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-6867) Bucketized Table feature fails in some cases

2015-05-29 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14565242#comment-14565242
 ] 

Xuefu Zhang commented on HIVE-6867:
---

[~pxiong], looking at the latest patch, I don't think this is adequate to
prevent users from getting corrupted data via the previously mentioned non-interactive
clients. Even if you log a warning message, it doesn't help, because data
corruption may have already happened.

I think we should disallow users from doing "insert into" a bucketed table, as I
articulated on RB. This will cover all types of clients.
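
The check itself can be small. A sketch of the idea (hypothetical method name
and placement, assuming a semantic-analysis hook; this is not the committed patch):
{code}
// Illustrative sketch, not the actual patch: reject INSERT INTO (append)
// on a bucketed destination during semantic analysis, while still allowing
// INSERT OVERWRITE, which rewrites all bucket files.
// Table is org.apache.hadoop.hive.ql.metadata.Table;
// SemanticException is org.apache.hadoop.hive.ql.parse.SemanticException.
private void checkBucketedAppend(Table destTable, boolean isOverwrite)
    throws SemanticException {
  if (!isOverwrite && destTable.getNumBuckets() > 0) {
    throw new SemanticException("INSERT INTO is not supported for bucketed table "
        + destTable.getTableName()
        + "; use INSERT OVERWRITE so bucket files stay consistent");
  }
}
{code}
Since such a check would run at compile time, it would cover interactive and
non-interactive clients alike.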

> Bucketized Table feature fails in some cases
> 
>
> Key: HIVE-6867
> URL: https://issues.apache.org/jira/browse/HIVE-6867
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.12.0
>Reporter: Laljo John Pullokkaran
>Assignee: Pengcheng Xiong
> Attachments: HIVE-6867.01.patch, HIVE-6867.02.patch, 
> HIVE-6867.03.patch, HIVE-6867.04.patch, HIVE-6867.05.patch
>
>
> Bucketized Table feature fails in some cases. If the src & destination are
> bucketed on the same key, and if the actual data in the src is not bucketed (because
> the data got loaded using LOAD DATA LOCAL INPATH), then the data won't be
> bucketed while writing to the destination.
> Example
> --
> CREATE TABLE P1(key STRING, val STRING)
> CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH '/Users/jp/apache-hive1/data/files/P1.txt' INTO TABLE 
> P1;
> -- perform an insert to make sure there are 2 files
> INSERT OVERWRITE TABLE P1 select key, val from P1;
> --
> This is not a regression; this has never worked.
> It was only discovered due to Hadoop2 changes.
> In Hadoop1, in local mode, the number of reducers will always be 1, regardless of
> what is requested by the app. Hadoop2 now honors the number-of-reducers setting in
> local mode (by spawning threads).
> The long-term solution seems to be to prevent load data for bucketed tables.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10835) Concurrency issues in JDBC driver

2015-05-29 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14565241#comment-14565241
 ] 

Chaoyu Tang commented on HIVE-10835:


I think resolving this JIRA might be helpful, since it can be used as one of the
test cases for HIVE-4239. [~thejas], [~xuefuz], [~szehon] and [~sershe], could
you review the patch? Thanks

> Concurrency issues in JDBC driver
> -
>
> Key: HIVE-10835
> URL: https://issues.apache.org/jira/browse/HIVE-10835
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 1.2.0
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Attachments: HIVE-10835.1.patch, HIVE-10835.2.patch, HIVE-10835.patch
>
>
> The JDBC specification states that "Each Connection object can create
> multiple Statement objects that may be used concurrently by the program", but
> that does not work in the current Hive JDBC driver. In addition, race
> conditions exist between DatabaseMetaData, Statement and ResultSet as long as
> they make RPC calls to HS2 using the same Thrift transport, which happens within
> a connection.
> So we need a connection-level lock to serialize all these RPC calls in a
> connection.
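
The shape of such a connection-level lock is straightforward. A sketch
(illustrative class and field names, not the committed patch; the Thrift types
come from Hive's generated {{TCLIService}} client):
{code}
// Sketch of the idea: every Statement/DatabaseMetaData/ResultSet created
// from one connection funnels its Thrift RPCs through a single
// per-connection lock, so two threads can never interleave frames on the
// shared Thrift transport.
import org.apache.hive.service.cli.thrift.TCLIService;
import org.apache.hive.service.cli.thrift.TFetchResultsReq;
import org.apache.hive.service.cli.thrift.TFetchResultsResp;
import org.apache.thrift.TException;

class LockedClient {
  private final Object transportLock = new Object(); // one per connection
  private final TCLIService.Iface client;            // shared Thrift client

  LockedClient(TCLIService.Iface client) {
    this.client = client;
  }

  TFetchResultsResp fetchResults(TFetchResultsReq req) throws TException {
    synchronized (transportLock) { // serialize the RPC on the transport
      return client.FetchResults(req);
    }
  }
}
{code}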



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10870) Merge Spark branch to trunk 5/29/2015

2015-05-29 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-10870:
---
Attachment: HIVE-10870.patch

The patch is generated by cherry-picking patches from the Spark branch and
resolving some conflicts.

> Merge Spark branch to trunk 5/29/2015
> -
>
> Key: HIVE-10870
> URL: https://issues.apache.org/jira/browse/HIVE-10870
> Project: Hive
>  Issue Type: Task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-10870.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10835) Concurrency issues in JDBC driver

2015-05-29 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14565269#comment-14565269
 ] 

Vaibhav Gumashta commented on HIVE-10835:
-

[~ctang.ma] Thanks for the patch. I'm reviewing it right now.

> Concurrency issues in JDBC driver
> -
>
> Key: HIVE-10835
> URL: https://issues.apache.org/jira/browse/HIVE-10835
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 1.2.0
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Attachments: HIVE-10835.1.patch, HIVE-10835.2.patch, HIVE-10835.patch
>
>
> The JDBC specification states that "Each Connection object can create
> multiple Statement objects that may be used concurrently by the program", but
> that does not work in the current Hive JDBC driver. In addition, race
> conditions exist between DatabaseMetaData, Statement and ResultSet as long as
> they make RPC calls to HS2 using the same Thrift transport, which happens within
> a connection.
> So we need a connection-level lock to serialize all these RPC calls in a
> connection.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10870) Merge Spark branch to trunk 5/29/2015

2015-05-29 Thread Chao Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14565279#comment-14565279
 ] 

Chao Sun commented on HIVE-10870:
-

+1 on pending tests.

> Merge Spark branch to trunk 5/29/2015
> -
>
> Key: HIVE-10870
> URL: https://issues.apache.org/jira/browse/HIVE-10870
> Project: Hive
>  Issue Type: Task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-10870.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10835) Concurrency issues in JDBC driver

2015-05-29 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14565292#comment-14565292
 ] 

Vaibhav Gumashta commented on HIVE-10835:
-

[~ctang.ma] Patch looks good. Minor comments on rb.

> Concurrency issues in JDBC driver
> -
>
> Key: HIVE-10835
> URL: https://issues.apache.org/jira/browse/HIVE-10835
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 1.2.0
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Attachments: HIVE-10835.1.patch, HIVE-10835.2.patch, HIVE-10835.patch
>
>
> The JDBC specification states that "Each Connection object can create
> multiple Statement objects that may be used concurrently by the program", but
> that does not work in the current Hive JDBC driver. In addition, race
> conditions exist between DatabaseMetaData, Statement and ResultSet as long as
> they make RPC calls to HS2 using the same Thrift transport, which happens within
> a connection.
> So we need a connection-level lock to serialize all these RPC calls in a
> connection.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10858) WebHCat specific resources should be added to HADOOP_CLASSPATH first

2015-05-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14565294#comment-14565294
 ] 

Hive QA commented on HIVE-10858:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12735990/HIVE-10858.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 8978 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_fold_case
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx_cbo_2
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchAbort
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4094/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4094/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4094/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12735990 - PreCommit-HIVE-TRUNK-Build

> WebHCat specific resources should be added to HADOOP_CLASSPATH first
> 
>
> Key: HIVE-10858
> URL: https://issues.apache.org/jira/browse/HIVE-10858
> Project: Hive
>  Issue Type: Bug
>  Components: WebHCat
>Affects Versions: 1.2.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-10858.patch
>
>
> When submitting jobs via WebHCat, the user may specify additional jars to be
> included with the job. Sqoop jobs are one such example, where the user may need to
> supply a jar with JDBC classes for a given database. If a different
> version of the same jar is already present in HADOOP_CLASSPATH, we need to
> make sure the user-specified jar is used.
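
The ordering fix amounts to prepending rather than appending. A sketch of the
idea (hypothetical helper, not the actual patch):
{code}
// Illustrative sketch: user-supplied jars must come *before* whatever is
// already on HADOOP_CLASSPATH, so the JVM class loader resolves the
// user's version of any duplicated class first.
import java.io.File;

public final class ClasspathUtil {
  private ClasspathUtil() {}

  public static String prependToHadoopClasspath(String userJars) {
    String existing = System.getenv("HADOOP_CLASSPATH");
    if (existing == null || existing.isEmpty()) {
      return userJars;
    }
    return userJars + File.pathSeparator + existing; // user jars win
  }
}
{code}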



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-4239) Remove lock on compilation stage

2015-05-29 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-4239:
---
Attachment: HIVE-4239.02.patch

Addressed feedback.
I also made some progress making the beeline tests work and added some random
compile-heavy tests; however, even with proper dependencies they now fail
after the first test(?) with a bunch of dependency errors, e.g.
{noformat}
java.lang.NoClassDefFoundError: 
org/apache/hive/service/cli/thrift/TCLIService$CloseSession_result$CloseSession_resultStandardScheme
at 
org.apache.hive.service.cli.thrift.TCLIService$CloseSession_result$CloseSession_resultStandardSchemeFactory.getScheme(TCLIService.java:2988)
at 
org.apache.hive.service.cli.thrift.TCLIService$CloseSession_result$CloseSession_resultStandardSchemeFactory.getScheme(TCLIService.java:2986)
at 
org.apache.hive.service.cli.thrift.TCLIService$CloseSession_result.write(TCLIService.java:2943)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:53)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at 
org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: 
org.apache.hive.service.cli.thrift.TCLIService$CloseSession_result$CloseSession_resultStandardScheme
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 10 more
{noformat}

Would you be OK with committing this off by default, and enabling the beeline test +
the flag as the next step?

> Remove lock on compilation stage
> 
>
> Key: HIVE-4239
> URL: https://issues.apache.org/jira/browse/HIVE-4239
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Query Processor
>Reporter: Carl Steinbach
>Assignee: Sergey Shelukhin
> Attachments: HIVE-4239.01.patch, HIVE-4239.02.patch, HIVE-4239.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10868) Update release note for 1.2.0 and 1.1.0

2015-05-29 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14565304#comment-14565304
 ] 

Thejas M Nair commented on HIVE-10868:
--

+1
I also took a look at a sample of the ones that are being removed. They are being
rightfully removed; they are ones that were part of 1.0.0 but also incorrectly
had a fix version of 1.1.0 (and I had later removed the 1.1.0 fix version from
them). [~leftylev] had pointed out this issue earlier.



> Update release note for 1.2.0 and 1.1.0
> ---
>
> Key: HIVE-10868
> URL: https://issues.apache.org/jira/browse/HIVE-10868
> Project: Hive
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 1.2.0, 1.1.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-10868.patch
>
>
> It's recently found that Hive's release notes don't contain all JIRAs fixed. 
> This happened due to a lack of correct or missing fix version in a JIRA. A 
> large chunk of such JIRAs are due to the fact that their fix versions didn't 
> get updated when a merge from feature branch to trunk (master). This JIRA is 
> to fix such JIRAs related to Hive on Spark work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10868) Update release note for 1.2.0 and 1.1.0

2015-05-29 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14565305#comment-14565305
 ] 

Thejas M Nair commented on HIVE-10868:
--

[~sushanth] Can you +1 for branch-0.12 ?


> Update release note for 1.2.0 and 1.1.0
> ---
>
> Key: HIVE-10868
> URL: https://issues.apache.org/jira/browse/HIVE-10868
> Project: Hive
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 1.2.0, 1.1.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-10868.patch
>
>
> It's recently found that Hive's release notes don't contain all JIRAs fixed. 
> This happened due to a lack of correct or missing fix version in a JIRA. A 
> large chunk of such JIRAs are due to the fact that their fix versions didn't 
> get updated when a merge from feature branch to trunk (master). This JIRA is 
> to fix such JIRAs related to Hive on Spark work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10802) Table join query with some constant field in select fails

2015-05-29 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-10802:

Attachment: HIVE-10802.patch

> Table join query with some constant field in select fails
> -
>
> Key: HIVE-10802
> URL: https://issues.apache.org/jira/browse/HIVE-10802
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 1.2.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-10802.patch
>
>
> The following query fails:
> {noformat}
> create table tb1 (year string, month string);
> create table tb2(month string);
> select unix_timestamp(a.year) 
> from (select * from tb1 where year='2001') a join tb2 b on (a.month=b.month);
> {noformat}
> with the exception {noformat}
> Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
> at java.util.ArrayList.rangeCheck(ArrayList.java:635)
> at java.util.ArrayList.get(ArrayList.java:411)
> at 
> org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.init(StandardStructObjectInspector.java:118)
> at 
> org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.(StandardStructObjectInspector.java:109)
> at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory.getStandardStructObjectInspector(ObjectInspectorFactory.java:290)
> at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory.getStandardStructObjectInspector(ObjectInspectorFactory.java:275)
> at 
> org.apache.hadoop.hive.ql.exec.CommonJoinOperator.getJoinOutputObjectInspector(CommonJoinOperator.java:175)
> {noformat}
> The issue seems to be that, during query compilation, the field in the select
> should be replaced with the constant when some UDFs are used.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10834) Support First_value()/last_value() over x preceding and y preceding windowing

2015-05-29 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-10834:

Fix Version/s: 1.3.0

> Support First_value()/last_value() over x preceding and y preceding windowing
> -
>
> Key: HIVE-10834
> URL: https://issues.apache.org/jira/browse/HIVE-10834
> Project: Hive
>  Issue Type: Sub-task
>  Components: PTF-Windowing
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Fix For: 1.3.0
>
> Attachments: HIVE-10834.patch
>
>
> Currently the following query
> {noformat}
> select ts, f, first_value(f) over (partition by ts order by t rows between 2 
> preceding and 1 preceding) from over10k limit 100;
> {noformat}
> throws an exception:
> {noformat}
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row (tag=0) 
> {"key":{"reducesinkkey0":"2013-03-01 
> 09:11:58.703071","reducesinkkey1":-3},"value":{"_col3":0.83}}
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:256)
> at 
> org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
> at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:449)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row (tag=0) {"key":{"reducesinkkey0":"2013-03-01 
> 09:11:58.703071","reducesinkkey1":-3},"value":{"_col3":0.83}}
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244)
> ... 3 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Internal Error: 
> cannot generate all output rows for a Partition
> at 
> org.apache.hadoop.hive.ql.udf.ptf.WindowingTableFunction.finishPartition(WindowingTableFunction.java:519)
> at 
> org.apache.hadoop.hive.ql.exec.PTFOperator$PTFInvocation.finishPartition(PTFOperator.java:337)
> at 
> org.apache.hadoop.hive.ql.exec.PTFOperator.process(PTFOperator.java:114)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
> at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:235)
> {noformat}
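> For illustration (hypothetical data, not from the patch), the frame that ROWS 
> BETWEEN 2 PRECEDING AND 1 PRECEDING should see, over ordered values 
> f = 10, 20, 30, 40 within one partition:
> {noformat}
> row 1: frame empty (no preceding rows)  -> first_value(f) = NULL
> row 2: frame = {10}                     -> first_value(f) = 10
> row 3: frame = {10, 20}                 -> first_value(f) = 10
> row 4: frame = {20, 30}                 -> first_value(f) = 20
> {noformat}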



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10834) Support First_value()/last_value() over x preceding and y preceding windowing

2015-05-29 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-10834:

Attachment: HIVE-10834.patch

> Support First_value()/last_value() over x preceding and y preceding windowing
> -
>
> Key: HIVE-10834
> URL: https://issues.apache.org/jira/browse/HIVE-10834
> Project: Hive
>  Issue Type: Sub-task
>  Components: PTF-Windowing
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Fix For: 1.3.0
>
> Attachments: HIVE-10834.patch
>
>
> Currently the following query
> {noformat}
> select ts, f, first_value(f) over (partition by ts order by t rows between 2 
> preceding and 1 preceding) from over10k limit 100;
> {noformat}
> throws an exception:
> {noformat}
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row (tag=0) 
> {"key":{"reducesinkkey0":"2013-03-01 
> 09:11:58.703071","reducesinkkey1":-3},"value":{"_col3":0.83}}
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:256)
> at 
> org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
> at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:449)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row (tag=0) {"key":{"reducesinkkey0":"2013-03-01 
> 09:11:58.703071","reducesinkkey1":-3},"value":{"_col3":0.83}}
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244)
> ... 3 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Internal Error: 
> cannot generate all output rows for a Partition
> at 
> org.apache.hadoop.hive.ql.udf.ptf.WindowingTableFunction.finishPartition(WindowingTableFunction.java:519)
> at 
> org.apache.hadoop.hive.ql.exec.PTFOperator$PTFInvocation.finishPartition(PTFOperator.java:337)
> at 
> org.apache.hadoop.hive.ql.exec.PTFOperator.process(PTFOperator.java:114)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
> at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:235)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10834) Support First_value()/last_value() over x preceding and y preceding windowing

2015-05-29 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-10834:

Attachment: (was: HIVE-10834.patch)

> Support First_value()/last_value() over x preceding and y preceding windowing
> -
>
> Key: HIVE-10834
> URL: https://issues.apache.org/jira/browse/HIVE-10834
> Project: Hive
>  Issue Type: Sub-task
>  Components: PTF-Windowing
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Fix For: 1.3.0
>
> Attachments: HIVE-10834.patch
>
>
> Currently the following query
> {noformat}
> select ts, f, first_value(f) over (partition by ts order by t rows between 2 
> preceding and 1 preceding) from over10k limit 100;
> {noformat}
> throws an exception:
> {noformat}
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row (tag=0) 
> {"key":{"reducesinkkey0":"2013-03-01 
> 09:11:58.703071","reducesinkkey1":-3},"value":{"_col3":0.83}}
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:256)
> at 
> org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
> at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:449)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row (tag=0) {"key":{"reducesinkkey0":"2013-03-01 
> 09:11:58.703071","reducesinkkey1":-3},"value":{"_col3":0.83}}
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244)
> ... 3 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Internal Error: 
> cannot generate all output rows for a Partition
> at 
> org.apache.hadoop.hive.ql.udf.ptf.WindowingTableFunction.finishPartition(WindowingTableFunction.java:519)
> at 
> org.apache.hadoop.hive.ql.exec.PTFOperator$PTFInvocation.finishPartition(PTFOperator.java:337)
> at 
> org.apache.hadoop.hive.ql.exec.PTFOperator.process(PTFOperator.java:114)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
> at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:235)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10834) Support First_value()/last_value() over x preceding and y preceding windowing

2015-05-29 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14565329#comment-14565329
 ] 

Aihua Xu commented on HIVE-10834:
-

Thanks Ashutosh.

> Support First_value()/last_value() over x preceding and y preceding windowing
> -
>
> Key: HIVE-10834
> URL: https://issues.apache.org/jira/browse/HIVE-10834
> Project: Hive
>  Issue Type: Sub-task
>  Components: PTF-Windowing
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Fix For: 1.3.0
>
> Attachments: HIVE-10834.patch
>
>
> Currently the following query
> {noformat}
> select ts, f, first_value(f) over (partition by ts order by t rows between 2 
> preceding and 1 preceding) from over10k limit 100;
> {noformat}
> throws an exception:
> {noformat}
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row (tag=0) 
> {"key":{"reducesinkkey0":"2013-03-01 
> 09:11:58.703071","reducesinkkey1":-3},"value":{"_col3":0.83}}
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:256)
> at 
> org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
> at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:449)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row (tag=0) {"key":{"reducesinkkey0":"2013-03-01 
> 09:11:58.703071","reducesinkkey1":-3},"value":{"_col3":0.83}}
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244)
> ... 3 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Internal Error: 
> cannot generate all output rows for a Partition
> at 
> org.apache.hadoop.hive.ql.udf.ptf.WindowingTableFunction.finishPartition(WindowingTableFunction.java:519)
> at 
> org.apache.hadoop.hive.ql.exec.PTFOperator$PTFInvocation.finishPartition(PTFOperator.java:337)
> at 
> org.apache.hadoop.hive.ql.exec.PTFOperator.process(PTFOperator.java:114)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
> at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:235)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10752) Revert HIVE-5193

2015-05-29 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14565339#comment-14565339
 ] 

Chaoyu Tang commented on HIVE-10752:


+1

> Revert HIVE-5193
> 
>
> Key: HIVE-10752
> URL: https://issues.apache.org/jira/browse/HIVE-10752
> Project: Hive
>  Issue Type: Sub-task
>  Components: HCatalog
>Affects Versions: 1.2.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-10752.patch
>
>
> Revert HIVE-5193 since it causes pig+hcatalog to stop working.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-6867) Bucketized Table feature fails in some cases

2015-05-29 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14565340#comment-14565340
 ] 

Pengcheng Xiong commented on HIVE-6867:
---

[~xuefuz], thanks for your comments. If you read the comments from 
[~jpullokkaran] on the review board, you will see that this patch targets 
"load into" a bucketed table rather than "insert into" a bucketed 
table.

> Bucketized Table feature fails in some cases
> 
>
> Key: HIVE-6867
> URL: https://issues.apache.org/jira/browse/HIVE-6867
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.12.0
>Reporter: Laljo John Pullokkaran
>Assignee: Pengcheng Xiong
> Attachments: HIVE-6867.01.patch, HIVE-6867.02.patch, 
> HIVE-6867.03.patch, HIVE-6867.04.patch, HIVE-6867.05.patch
>
>
> Bucketized Table feature fails in some cases: if the source and destination are 
> bucketed on the same key, and the actual data in the source is not bucketed 
> (because the data was loaded using LOAD DATA LOCAL INPATH), then the data won't 
> be bucketed while writing to the destination.
> Example
> --
> CREATE TABLE P1(key STRING, val STRING)
> CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH '/Users/jp/apache-hive1/data/files/P1.txt' INTO TABLE 
> P1;
> -- perform an insert to make sure there are 2 files
> INSERT OVERWRITE TABLE P1 select key, val from P1;
> --
> This is not a regression; it has never worked. It was only discovered because 
> of the Hadoop2 changes.
> In Hadoop1, in local mode, the number of reducers is always 1, regardless of 
> what the application requests. Hadoop2 now honors the requested number of 
> reducers in local mode (by spawning threads).
> The long-term solution seems to be to prevent LOAD DATA for bucketed tables.
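> As an illustration of that direction (assuming a plain, non-bucketed staging 
> table P1_STAGE; this is a sketch, not the actual patch), populating the 
> bucketed table through an insert lets Hive hash rows into the declared 
> buckets, which LOAD DATA cannot do:
> {noformat}
> set hive.enforce.bucketing=true;
> INSERT OVERWRITE TABLE P1 SELECT key, val FROM P1_STAGE;
> {noformat}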



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10807) Invalidate basic stats for insert queries if autogather=false

2015-05-29 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14565343#comment-14565343
 ] 

Gunther Hagleitner commented on HIVE-10807:
---

Seems straightforward (adding a clear to inserts iff autogather=false), but I 
don't understand some of the golden-file updates. Could you comment on why 
these are correct? (cc [~mmokhtar])

> Invalidate basic stats for insert queries if autogather=false
> -
>
> Key: HIVE-10807
> URL: https://issues.apache.org/jira/browse/HIVE-10807
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Affects Versions: 1.2.0
>Reporter: Gopal V
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-10807.2.patch, HIVE-10807.3.patch, 
> HIVE-10807.4.patch, HIVE-10807.5.patch, HIVE-10807.patch
>
>
> Setting stats.autogather=false leads to incorrect basic stats in the case of 
> insert statements.
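> A minimal illustration (hypothetical table t1; a sketch, not from the patch):
> {noformat}
> set hive.stats.autogather=false;
> create table t1 (a int);
> insert into table t1 values (1), (2);
> -- no stats were gathered for this insert, yet the existing basic
> -- stats (e.g. numRows) may still be treated as accurate by the planner
> {noformat}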



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10862) TestHiveAuthorizerShowFilters tests fail when run in sequence

2015-05-29 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14565348#comment-14565348
 ] 

Gunther Hagleitner commented on HIVE-10862:
---

+1

> TestHiveAuthorizerShowFilters tests fail when run in sequence
> -
>
> Key: HIVE-10862
> URL: https://issues.apache.org/jira/browse/HIVE-10862
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Attachments: HIVE-10862.1.patch, HIVE-10862.2.patch
>
>
> The tests fail when there are leftover tables or databases from other tests.
> They fail with an error like:
> java.lang.AssertionError: All tables should be passed as arguments 
> expected:<[testhiveauthorizershowfilterstable1, 
> testhiveauthorizershowfilterstable2]> but 
> was:<[testhiveauthorizercheckinvocationtable, 
> testhiveauthorizercheckinvocationtable_acid, 
> testhiveauthorizershowfilterstable1, testhiveauthorizershowfilterstable2]>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10752) Revert HIVE-5193

2015-05-29 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14565401#comment-14565401
 ] 

Mithun Radhakrishnan commented on HIVE-10752:
-


bq. Given that HIVE-5193 broke some functionality and it was just for columnar 
table performance improvement, in addition that patch provided in HIVE-10720 
did still not solve the issue.

While I agree that HIVE-5193 did introduce a bug, I can't yet agree that we 
should revert it. [~viraj] is currently testing whether the one-liner posted in 
HIVE-10720 resolves the issue. (My understanding is that it does.) 
I'll let him confirm shortly.

In the meantime, please consider that the fix 
({{ColumnProjectionUtils.setReadColumnIDs(job.getConfiguration(), null);}}) is 
only applied when {{requiredFieldsInfo == null}}, which is shorthand for Pig 
requiring all columns. So the full deserialization is not done in all cases; 
it happens only when all fields are required, and there is no loss of 
performance in that case.

Am I missing something?

> Revert HIVE-5193
> 
>
> Key: HIVE-10752
> URL: https://issues.apache.org/jira/browse/HIVE-10752
> Project: Hive
>  Issue Type: Sub-task
>  Components: HCatalog
>Affects Versions: 1.2.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-10752.patch
>
>
> Revert HIVE-5193 since it causes pig+hcatalog to stop working.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10605) Make hive version number update automatically in webhcat-default.xml during hive tar generation

2015-05-29 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-10605:
--
Fix Version/s: 1.2.1

> Make hive version number update automatically in webhcat-default.xml during 
> hive tar generation
> ---
>
> Key: HIVE-10605
> URL: https://issues.apache.org/jira/browse/HIVE-10605
> Project: Hive
>  Issue Type: Bug
>  Components: WebHCat
>Affects Versions: 1.3.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Fix For: 1.3.0, 1.2.1
>
> Attachments: HIVE-10605.patch
>
>
> so we don't have to do HIVE-10604 on each release



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10720) Pig using HCatLoader to access RCFile and perform join but get incorrect result.

2015-05-29 Thread Viraj Bhat (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14565406#comment-14565406
 ] 

Viraj Bhat commented on HIVE-10720:
---

Hi Aihua,
 Can you explain to me how the patch I put up in this JIRA would lead to a loss 
of functionality? I am setting
{code}
ColumnProjectionUtils.setReadColumnIDs(job.getConfiguration(), null);
{code}
only when RequiredFieldList is null. Also, let me check internally whether we 
have local changes to ColumnProjectionUtils.

Viraj

> Pig using HCatLoader to access RCFile and perform join but get incorrect 
> result.
> 
>
> Key: HIVE-10720
> URL: https://issues.apache.org/jira/browse/HIVE-10720
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 1.3.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-10720.patch
>
>
> {noformat}
> Create table tbl1 (key string, value string) stored as rcfile;
> Create table tbl2 (key string, value string);
> insert into tbl1 values('1', 'value1');
> insert into tbl2 values('1', 'value2');
> {noformat}
> Pig script:
> {noformat}
> tbl1 = LOAD 'tbl1' USING org.apache.hive.hcatalog.pig.HCatLoader();
> tbl2 = LOAD 'tbl2' USING org.apache.hive.hcatalog.pig.HCatLoader();
> src_tbl1 = FILTER tbl1 BY (key == '1');
> prj_tbl1 = FOREACH src_tbl1 GENERATE
>key as tbl1_key,
>value as tbl1_value,
>'333' as tbl1_v1;
>
> src_tbl2 = FILTER tbl2 BY (key == '1');
> prj_tbl2 = FOREACH src_tbl2 GENERATE
>key as tbl2_key,
>value as tbl2_value;
>
> result = JOIN prj_tbl1 BY (tbl1_key), prj_tbl2 BY (tbl2_key);
> prj_result = FOREACH result 
>   GENERATE  prj_tbl1::tbl1_key AS key1,
> prj_tbl1::tbl1_value AS value1,
> prj_tbl1::tbl1_v1 AS v1,
> prj_tbl2::tbl2_key AS key2,
> prj_tbl2::tbl2_value AS value2;
>
> dump prj_result;
> {noformat}
> We may see various invalid results, or even no results where results should 
> be returned.
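> For reference, the expected (correct) output of the join above, derived from 
> the two inserted rows, is a single tuple:
> {noformat}
> (1,value1,333,1,value2)
> {noformat}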



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

