from:"Xuefu Zhang"


[ 
https://issues.apache.org/jira/browse/HIVE-9378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277628#comment-14277628
 ] 

Xuefu Zhang commented on HIVE-9378:
---

[~jxiang], qfile test is supposed to run independently though it may seem 
inefficient. I'm not sure if there is a strong need to change this. However, 2e 
don't want them to share a session because each test may have different 
configurations.

 Spark qfile tests should reuse RSC
 --

 Key: HIVE-9378
 URL: https://issues.apache.org/jira/browse/HIVE-9378
 Project: Hive
  Issue Type: Improvement
Reporter: Jimmy Xiang

 Run several qfile tests, use jps to monitor the java processes. You will find 
 several SparkSubmitDriverBootstrapper processes are created (not the same 
 time of course).  It seems to me that we create a RSC for each qfile, then 
 terminate it when this qfile test is done. The RSC seems not shared among 
 qfiles.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-9367) CombineFileInputFormatShim#getDirIndices is expensive


[ 
https://issues.apache.org/jira/browse/HIVE-9367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277642#comment-14277642
 ] 

Xuefu Zhang commented on HIVE-9367:
---

Thanks for the explanation. This is a shim class, so we are okay. Patch looks 
good to me. One note though, is that prune() method seems no longer needed. 
Could you remove it? 

 CombineFileInputFormatShim#getDirIndices is expensive
 -

 Key: HIVE-9367
 URL: https://issues.apache.org/jira/browse/HIVE-9367
 Project: Hive
  Issue Type: Improvement
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: HIVE-9367.1.patch


 [~lirui] found out that we spent quite some time on 
 CombineFileInputFormatShim#getDirIndices. Looked into it and it seems to me 
 we should be able to get rid of this method completely if we can enhance 
 CombineFileInputFormatShim a little.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-9379) Fix tests with some versions of Spark + Snappy [Spark Branch]


[ 
https://issues.apache.org/jira/browse/HIVE-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277875#comment-14277875
 ] 

Xuefu Zhang commented on HIVE-9379:
---

+1. Looks good to me, and good for mac users.

 Fix tests with some versions of Spark + Snappy [Spark Branch]
 -

 Key: HIVE-9379
 URL: https://issues.apache.org/jira/browse/HIVE-9379
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Brock Noland
Assignee: Brock Noland
 Attachments: HIVE-9379.1-spark.patch


 Some versions of spark use a snappy versions which requires the following 
 properties on OSX:
 {noformat}
 -Dorg.xerial.snappy.tempdir=/tmp 
 -Dorg.xerial.snappy.lib.name=libsnappyjava.jnilib
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-9342) add num-executors / executor-cores / executor-memory option support for hive on spark in Yarn mode


[ 
https://issues.apache.org/jira/browse/HIVE-9342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277900#comment-14277900
 ] 

Xuefu Zhang commented on HIVE-9342:
---

+1

 add num-executors / executor-cores / executor-memory option support for hive 
 on spark in Yarn mode
 --

 Key: HIVE-9342
 URL: https://issues.apache.org/jira/browse/HIVE-9342
 Project: Hive
  Issue Type: Improvement
  Components: spark-branch
Affects Versions: spark-branch
Reporter: Pierre Yin
Priority: Minor
  Labels: spark
 Fix For: spark-branch

 Attachments: HIVE-9342.1-spark.patch, HIVE-9342.2-spark.patch


 When I run hive on spark with Yarn mode, I want to control some yarn option, 
 such as --num-executors, --executor-cores, --executor-memory.
 We can append these options into argv in SparkClientImpl.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-9342) add num-executors / executor-cores / executor-memory option support for hive on spark in Yarn mode [Spark Branch]


 [ 
https://issues.apache.org/jira/browse/HIVE-9342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-9342:
--
Summary: add num-executors / executor-cores / executor-memory option 
support for hive on spark in Yarn mode [Spark Branch]  (was: add num-executors 
/ executor-cores / executor-memory option support for hive on spark in Yarn 
mode)

 add num-executors / executor-cores / executor-memory option support for hive 
 on spark in Yarn mode [Spark Branch]
 -

 Key: HIVE-9342
 URL: https://issues.apache.org/jira/browse/HIVE-9342
 Project: Hive
  Issue Type: Improvement
  Components: spark-branch
Affects Versions: spark-branch
Reporter: Pierre Yin
Priority: Minor
  Labels: spark
 Fix For: spark-branch

 Attachments: HIVE-9342.1-spark.patch, HIVE-9342.2-spark.patch


 When I run hive on spark with Yarn mode, I want to control some yarn option, 
 such as --num-executors, --executor-cores, --executor-memory.
 We can append these options into argv in SparkClientImpl.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-9372) Parallel checking non-combinable paths in CombineHiveInputFormat


[ 
https://issues.apache.org/jira/browse/HIVE-9372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278053#comment-14278053
 ] 

Xuefu Zhang commented on HIVE-9372:
---

Your patch is for trunk, which has a longer test queue: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/

 Parallel checking non-combinable paths in CombineHiveInputFormat
 

 Key: HIVE-9372
 URL: https://issues.apache.org/jira/browse/HIVE-9372
 Project: Hive
  Issue Type: Improvement
Reporter: Rui Li
Assignee: Rui Li
 Attachments: HIVE-9372.1.patch


 Checking if an input path is combinable is expensive. So we should make it 
 parallel.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-9342) add num-executors / executor-cores / executor-memory option support for hive on spark in Yarn mode [Spark Branch]


 [ 
https://issues.apache.org/jira/browse/HIVE-9342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-9342:
--
Issue Type: Sub-task  (was: Improvement)
Parent: HIVE-7292

 add num-executors / executor-cores / executor-memory option support for hive 
 on spark in Yarn mode [Spark Branch]
 -

 Key: HIVE-9342
 URL: https://issues.apache.org/jira/browse/HIVE-9342
 Project: Hive
  Issue Type: Sub-task
  Components: spark-branch
Affects Versions: spark-branch
Reporter: Pierre Yin
Priority: Minor
  Labels: spark
 Fix For: spark-branch

 Attachments: HIVE-9342.1-spark.patch, HIVE-9342.2-spark.patch


 When I run hive on spark with Yarn mode, I want to control some yarn option, 
 such as --num-executors, --executor-cores, --executor-memory.
 We can append these options into argv in SparkClientImpl.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-9367) CombineFileInputFormatShim#getDirIndices is expensive


[ 
https://issues.apache.org/jira/browse/HIVE-9367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277931#comment-14277931
 ] 

Xuefu Zhang commented on HIVE-9367:
---

+1 pending on test

 CombineFileInputFormatShim#getDirIndices is expensive
 -

 Key: HIVE-9367
 URL: https://issues.apache.org/jira/browse/HIVE-9367
 Project: Hive
  Issue Type: Improvement
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: HIVE-9367.1.patch, HIVE-9367.2.patch


 [~lirui] found out that we spent quite some time on 
 CombineFileInputFormatShim#getDirIndices. Looked into it and it seems to me 
 we should be able to get rid of this method completely if we can enhance 
 CombineFileInputFormatShim a little.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-9372) Parallel checking non-combinable paths in CombineHiveInputFormat


[ 
https://issues.apache.org/jira/browse/HIVE-9372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277982#comment-14277982
 ] 

Xuefu Zhang commented on HIVE-9372:
---

Patch looks good to me. One minor comment, can we give initial size to the 
following list since we know how many elements we will have?
{code}
ListFutureSetInteger futureList = new ArrayListFutureSetInteger();
{code}

 Parallel checking non-combinable paths in CombineHiveInputFormat
 

 Key: HIVE-9372
 URL: https://issues.apache.org/jira/browse/HIVE-9372
 Project: Hive
  Issue Type: Improvement
Reporter: Rui Li
Assignee: Rui Li
 Attachments: HIVE-9372.1.patch


 Checking if an input path is combinable is expensive. So we should make it 
 parallel.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-9178) Create a separate API for remote Spark Context RPC other than job submission [Spark Branch]


[ 
https://issues.apache.org/jira/browse/HIVE-9178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278179#comment-14278179
 ] 

Xuefu Zhang commented on HIVE-9178:
---

+1

 Create a separate API for remote Spark Context RPC other than job submission 
 [Spark Branch]
 ---

 Key: HIVE-9178
 URL: https://issues.apache.org/jira/browse/HIVE-9178
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Marcelo Vanzin
 Attachments: HIVE-9178.1-spark.patch, HIVE-9178.1-spark.patch, 
 HIVE-9178.2-spark.patch, HIVE-9178.2-spark.patch


 Based on discussions in HIVE-8972, it seems making sense to create a separate 
 API for RPCs, such as addJar and getExecutorCounter. These jobs are different 
 from a query submission in that they don't need to be queued in the backend 
 and can be executed right away.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-9367) CombineFileInputFormatShim#getDirIndices is expensive


[ 
https://issues.apache.org/jira/browse/HIVE-9367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277070#comment-14277070
 ] 

Xuefu Zhang commented on HIVE-9367:
---

Nice improvement. However, I'm a little concerned about overriding listStatus() 
method, as an caller (including subclasses) would suddently get a list with 
folders excluded. I'm wondering if it's possible to achieve the same 
optimization w/o overriding that method.

 CombineFileInputFormatShim#getDirIndices is expensive
 -

 Key: HIVE-9367
 URL: https://issues.apache.org/jira/browse/HIVE-9367
 Project: Hive
  Issue Type: Improvement
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: HIVE-9367.1.patch


 [~lirui] found out that we spent quite some time on 
 CombineFileInputFormatShim#getDirIndices. Looked into it and it seems to me 
 we should be able to get rid of this method completely if we can enhance 
 CombineFileInputFormatShim a little.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Comment Edited] (HIVE-9178) Create a separate API for remote Spark Context RPC other than job submission [Spark Branch]


[ 
https://issues.apache.org/jira/browse/HIVE-9178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276485#comment-14276485
 ] 

Xuefu Zhang edited comment on HIVE-9178 at 1/14/15 5:10 AM:


The dummy patch produces 3 failures and the run takes 1 hour 48 minutes. It 
seems likely that the patch has some defects. Chengxiang's question might be a 
hint.


was (Author: xuefuz):
The dummy patch produces 3 failures and the run takes 1 hour 48 minutes. It 
seems likely that the patch has some defects.

 Create a separate API for remote Spark Context RPC other than job submission 
 [Spark Branch]
 ---

 Key: HIVE-9178
 URL: https://issues.apache.org/jira/browse/HIVE-9178
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Marcelo Vanzin
 Attachments: HIVE-9178.1-spark.patch, HIVE-9178.1-spark.patch, 
 HIVE-9178.2-spark.patch


 Based on discussions in HIVE-8972, it seems making sense to create a separate 
 API for RPCs, such as addJar and getExecutorCounter. These jobs are different 
 from a query submission in that they don't need to be queued in the backend 
 and can be executed right away.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-9178) Create a separate API for remote Spark Context RPC other than job submission [Spark Branch]


[ 
https://issues.apache.org/jira/browse/HIVE-9178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276485#comment-14276485
 ] 

Xuefu Zhang commented on HIVE-9178:
---

The dummy patch produces 3 failures and the run takes 1 hour 48 minutes. It 
seems likely that the patch has some defects.

 Create a separate API for remote Spark Context RPC other than job submission 
 [Spark Branch]
 ---

 Key: HIVE-9178
 URL: https://issues.apache.org/jira/browse/HIVE-9178
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Marcelo Vanzin
 Attachments: HIVE-9178.1-spark.patch, HIVE-9178.1-spark.patch, 
 HIVE-9178.2-spark.patch


 Based on discussions in HIVE-8972, it seems making sense to create a separate 
 API for RPCs, such as addJar and getExecutorCounter. These jobs are different 
 from a query submission in that they don't need to be queued in the backend 
 and can be executed right away.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-9178) Create a separate API for remote Spark Context RPC other than job submission [Spark Branch]


 [ 
https://issues.apache.org/jira/browse/HIVE-9178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-9178:
--
Attachment: HIVE-9178.1-spark.patch

Reattach the same patch to have another test run.

 Create a separate API for remote Spark Context RPC other than job submission 
 [Spark Branch]
 ---

 Key: HIVE-9178
 URL: https://issues.apache.org/jira/browse/HIVE-9178
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Marcelo Vanzin
 Attachments: HIVE-9178.1-spark.patch, HIVE-9178.1-spark.patch


 Based on discussions in HIVE-8972, it seems making sense to create a separate 
 API for RPCs, such as addJar and getExecutorCounter. These jobs are different 
 from a query submission in that they don't need to be queued in the backend 
 and can be executed right away.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-9178) Create a separate API for remote Spark Context RPC other than job submission [Spark Branch]


 [ 
https://issues.apache.org/jira/browse/HIVE-9178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-9178:
--
Attachment: HIVE-9178.2-spark.patch

Attached a dummy patch to test the test env.

 Create a separate API for remote Spark Context RPC other than job submission 
 [Spark Branch]
 ---

 Key: HIVE-9178
 URL: https://issues.apache.org/jira/browse/HIVE-9178
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Marcelo Vanzin
 Attachments: HIVE-9178.1-spark.patch, HIVE-9178.1-spark.patch, 
 HIVE-9178.2-spark.patch


 Based on discussions in HIVE-8972, it seems making sense to create a separate 
 API for RPCs, such as addJar and getExecutorCounter. These jobs are different 
 from a query submission in that they don't need to be queued in the backend 
 and can be executed right away.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-9178) Create a separate API for remote Spark Context RPC other than job submission [Spark Branch]


[ 
https://issues.apache.org/jira/browse/HIVE-9178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276223#comment-14276223
 ] 

Xuefu Zhang commented on HIVE-9178:
---

It looks like that the patch somehow has increased the test run time quite 
dramatically. Normally it takes about an hour to finish, but now it has been 
running for over 4 hours and is still going.

http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/638/
 -- last one
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/639/
 -- currently running


 Create a separate API for remote Spark Context RPC other than job submission 
 [Spark Branch]
 ---

 Key: HIVE-9178
 URL: https://issues.apache.org/jira/browse/HIVE-9178
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Marcelo Vanzin
 Attachments: HIVE-9178.1-spark.patch, HIVE-9178.1-spark.patch


 Based on discussions in HIVE-8972, it seems making sense to create a separate 
 API for RPCs, such as addJar and getExecutorCounter. These jobs are different 
 from a query submission in that they don't need to be queued in the backend 
 and can be executed right away.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-9342) add num-executors / executor-cores / executor-memory option support for hive on spark in Yarn mode


[ 
https://issues.apache.org/jira/browse/HIVE-9342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14274074#comment-14274074
 ] 

Xuefu Zhang commented on HIVE-9342:
---

[~fangxi.yin], thanks for working on this.

[~chengxiang li], could you please take a look the proposed change, especially 
in light of Spark dynamic executor scaling? Also note that Spark standalone 
mode is also supported by Hive.

 add num-executors / executor-cores / executor-memory option support for hive 
 on spark in Yarn mode
 --

 Key: HIVE-9342
 URL: https://issues.apache.org/jira/browse/HIVE-9342
 Project: Hive
  Issue Type: Improvement
  Components: spark-branch
Affects Versions: spark-branch
Reporter: Pierre Yin
Priority: Minor
  Labels: spark
 Fix For: spark-branch

 Attachments: HIVE-9342.1-spark.patch


 When I run hive on spark with Yarn mode, I want to control some yarn option, 
 such as --num-executors, --executor-cores, --executor-memory.
 We can append these options into argv in SparkClientImpl.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-9340) Address review of HIVE-9257 (ii) [Spark Branch]


 [ 
https://issues.apache.org/jira/browse/HIVE-9340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-9340:
--
Summary: Address review of HIVE-9257 (ii) [Spark Branch]  (was: Address 
review of HIVE-9257 (ii))

 Address review of HIVE-9257 (ii) [Spark Branch]
 ---

 Key: HIVE-9340
 URL: https://issues.apache.org/jira/browse/HIVE-9340
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Szehon Ho
Assignee: Szehon Ho
 Attachments: HIVE-9257-spark.patch


 Some minor fixes:
 1.  Get rid of spark_test.q, which was used to test the sparkCliDriver test 
 fw.
 2.  Get rid of spark-snapshot repository dep in pom (found by Xuefu)
 3.  Cleanup ExplainTask to get rid of * in imports. (found by Xuefu)
 4.  Reorder the scala/spark dependencies in pom to fit the alphabetical order.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-9340) Address review of HIVE-9257 (ii)


[ 
https://issues.apache.org/jira/browse/HIVE-9340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14274303#comment-14274303
 ] 

Xuefu Zhang commented on HIVE-9340:
---

+1 pending on test

 Address review of HIVE-9257 (ii)
 

 Key: HIVE-9340
 URL: https://issues.apache.org/jira/browse/HIVE-9340
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Szehon Ho
Assignee: Szehon Ho
 Attachments: HIVE-9257-spark.patch


 Some minor fixes:
 1.  Get rid of spark_test.q, which was used to test the sparkCliDriver test 
 fw.
 2.  Get rid of spark-snapshot repository dep in pom (found by Xuefu)
 3.  Cleanup ExplainTask to get rid of * in imports. (found by Xuefu)
 4.  Reorder the scala/spark dependencies in pom to fit the alphabetical order.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Re: new hive udfs

2015-01-12 Thread Xuefu Zhang

Hi Alex,

This should be a good starting point:
https://cwiki.apache.org/confluence/display/Hive/HowToContribute.

Thanks,
Xuefu

On Mon, Jan 12, 2015 at 2:37 PM, Alexander Pivovarov apivova...@gmail.com
wrote:

 Hi Everyone

 I have several custom udfs I want to contribute to hive

 month_add
 last_day

 greatest
 least

 What is the process for adding new UDFs?

 Alex

Re: new hive udfs

2015-01-12 Thread Xuefu Zhang

No. You can just create JIRA describing your reasoning and attach your
patch for review.

On Mon, Jan 12, 2015 at 2:53 PM, Alexander Pivovarov apivova...@gmail.com
wrote:

 I mean should I get any approval before creating JIRA?

 Just want to make sure that these UDFs are needed.

 On Mon, Jan 12, 2015 at 2:48 PM, Xuefu Zhang xzh...@cloudera.com wrote:

  Hi Alex,
 
  This should be a good starting point:
  https://cwiki.apache.org/confluence/display/Hive/HowToContribute.
 
  Thanks,
  Xuefu
 
  On Mon, Jan 12, 2015 at 2:37 PM, Alexander Pivovarov 
 apivova...@gmail.com
  
  wrote:
 
   Hi Everyone
  
   I have several custom udfs I want to contribute to hive
  
   month_add
   last_day
  
   greatest
   least
  
   What is the process for adding new UDFs?
  
   Alex

[jira] [Commented] (HIVE-9178) Create a separate API for remote Spark Context RPC other than job submission [Spark Branch]


[ 
https://issues.apache.org/jira/browse/HIVE-9178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14274678#comment-14274678
 ] 

Xuefu Zhang commented on HIVE-9178:
---

The patch looks good to me. [~chengxiang li], could you also take a look.

[~brocknoland], I'm wondering why the test hasn't kicked in for this.

 Create a separate API for remote Spark Context RPC other than job submission 
 [Spark Branch]
 ---

 Key: HIVE-9178
 URL: https://issues.apache.org/jira/browse/HIVE-9178
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Marcelo Vanzin
 Attachments: HIVE-9178.1-spark.patch


 Based on discussions in HIVE-8972, it seems making sense to create a separate 
 API for RPCs, such as addJar and getExecutorCounter. These jobs are different 
 from a query submission in that they don't need to be queued in the backend 
 and can be executed right away.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-9258) Explain query should share the same Spark application with regular queries [Spark Branch]


 [ 
https://issues.apache.org/jira/browse/HIVE-9258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-9258:
--
Description: Currently for Hive on Spark, query plan includes the number of 
reducers, which is determined partly by the Spark cluster. Thus, explain query 
will need to launch a Spark application (Spark remote context), which should be 
shared with regular queries so that we don't launch additional Spark remote 
context.  (was: Currently for Hive on Spark, query plan includes the number of 
reducers, which is determined partly by the Spark cluster. Thus, explain query 
will need to launch a Spark application (Spark remote context), which is 
costly. To make things worse, the application is discarded right way.

Ideally, we shouldn't launch a Spark application even for an explain query.)

 Explain query should share the same Spark application with regular queries 
 [Spark Branch]
 -

 Key: HIVE-9258
 URL: https://issues.apache.org/jira/browse/HIVE-9258
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Jimmy Xiang

 Currently for Hive on Spark, query plan includes the number of reducers, 
 which is determined partly by the Spark cluster. Thus, explain query will 
 need to launch a Spark application (Spark remote context), which should be 
 shared with regular queries so that we don't launch additional Spark remote 
 context.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-9258) Explain query should share the same Spark application with regular queries [Spark Branch]


 [ 
https://issues.apache.org/jira/browse/HIVE-9258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-9258:
--
Summary: Explain query should share the same Spark application with regular 
queries [Spark Branch]  (was: Explain query shouldn't launch a Spark 
application [Spark Branch])

 Explain query should share the same Spark application with regular queries 
 [Spark Branch]
 -

 Key: HIVE-9258
 URL: https://issues.apache.org/jira/browse/HIVE-9258
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Jimmy Xiang

 Currently for Hive on Spark, query plan includes the number of reducers, 
 which is determined partly by the Spark cluster. Thus, explain query will 
 need to launch a Spark application (Spark remote context), which is costly. 
 To make things worse, the application is discarded right way.
 Ideally, we shouldn't launch a Spark application even for an explain query.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-9258) Explain query shouldn't launch a Spark application [Spark Branch]


[ 
https://issues.apache.org/jira/browse/HIVE-9258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14274528#comment-14274528
 ] 

Xuefu Zhang commented on HIVE-9258:
---

[~jxiang], thanks for looking into this. Looking at the code, I see that it 
uses SparkSession instance which is indeed shared with regular queries. Since 
this is confirmed, please close this as not a problem. 

BTW, I noticed that we have local cache for sparkMemoryAndCores as in:
{code}
if (sparkMemoryAndCores == null) {
{code}
This would mean that we wouldn't update the value in the entire user session. 
However, this value can change dynamically. Do you think we should not cache 
the value?

 Explain query shouldn't launch a Spark application [Spark Branch]
 -

 Key: HIVE-9258
 URL: https://issues.apache.org/jira/browse/HIVE-9258
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Jimmy Xiang

 Currently for Hive on Spark, query plan includes the number of reducers, 
 which is determined partly by the Spark cluster. Thus, explain query will 
 need to launch a Spark application (Spark remote context), which is costly. 
 To make things worse, the application is discarded right way.
 Ideally, we shouldn't launch a Spark application even for an explain query.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-9258) Explain query should share the same Spark application with regular queries [Spark Branch]


[ 
https://issues.apache.org/jira/browse/HIVE-9258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14274632#comment-14274632
 ] 

Xuefu Zhang commented on HIVE-9258:
---

Makes sense. Thanks for the explanation.

 Explain query should share the same Spark application with regular queries 
 [Spark Branch]
 -

 Key: HIVE-9258
 URL: https://issues.apache.org/jira/browse/HIVE-9258
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Jimmy Xiang

 Currently for Hive on Spark, query plan includes the number of reducers, 
 which is determined partly by the Spark cluster. Thus, explain query will 
 need to launch a Spark application (Spark remote context), which should be 
 shared with regular queries so that we don't launch additional Spark remote 
 context.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-9135) Cache Map and Reduce works in RSC [Spark Branch]


[ 
https://issues.apache.org/jira/browse/HIVE-9135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14274537#comment-14274537
 ] 

Xuefu Zhang commented on HIVE-9135:
---

+1

 Cache Map and Reduce works in RSC [Spark Branch]
 

 Key: HIVE-9135
 URL: https://issues.apache.org/jira/browse/HIVE-9135
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Brock Noland
Assignee: Jimmy Xiang
 Fix For: spark-branch

 Attachments: HIVE-9135.1-spark.patch, HIVE-9135.1-spark.patch, 
 HIVE-9135.3-spark.patch, HIVE-9135.3.patch, HIVE-9135.4-spark.patch


 HIVE-9127 works around the fact that we don't cache Map/Reduce works in 
 Spark. However, other input formats such as HiveInputFormat will not benefit 
 from that fix. We should investigate how to allow caching on the RSC while 
 not on tasks (see HIVE-7431).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-9135) Cache Map and Reduce works in RSC [Spark Branch]


 [ 
https://issues.apache.org/jira/browse/HIVE-9135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-9135:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to Spark branch. Thanks, Jimmy!

 Cache Map and Reduce works in RSC [Spark Branch]
 

 Key: HIVE-9135
 URL: https://issues.apache.org/jira/browse/HIVE-9135
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Brock Noland
Assignee: Jimmy Xiang
 Fix For: spark-branch

 Attachments: HIVE-9135.1-spark.patch, HIVE-9135.1-spark.patch, 
 HIVE-9135.3-spark.patch, HIVE-9135.3.patch, HIVE-9135.4-spark.patch


 HIVE-9127 works around the fact that we don't cache Map/Reduce works in 
 Spark. However, other input formats such as HiveInputFormat will not benefit 
 from that fix. We should investigate how to allow caching on the RSC while 
 not on tasks (see HIVE-7431).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-9257) Merge from spark to trunk January 2015


[ 
https://issues.apache.org/jira/browse/HIVE-9257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14273177#comment-14273177
 ] 

Xuefu Zhang commented on HIVE-9257:
---

Actually my comments on RB were not covered by HIVE-9335, which already has +1 
pending. We may need a separate JIRA to cover them.

 Merge from spark to trunk January 2015
 --

 Key: HIVE-9257
 URL: https://issues.apache.org/jira/browse/HIVE-9257
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: 0.15.0
Reporter: Szehon Ho
Assignee: Szehon Ho
  Labels: TODOC15
 Fix For: 0.15.0

 Attachments: trunk-mr2-spark-merge.properties


 The hive on spark work has reached a point where we can merge it into the 
 trunk branch.  Note that spark execution engine is optional and no current 
 users should be impacted.
 This JIRA will be used to track the merge.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-9339) Optimize split grouping for CombineHiveInputFormat [Spark Branch]


[ 
https://issues.apache.org/jira/browse/HIVE-9339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14273185#comment-14273185
 ] 

Xuefu Zhang commented on HIVE-9339:
---

cc: [~lirui]

 Optimize split grouping for CombineHiveInputFormat [Spark Branch]
 -

 Key: HIVE-9339
 URL: https://issues.apache.org/jira/browse/HIVE-9339
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang

 It seems that split generation, especially in terms of grouping inputs, needs 
 to be improved. For this, we may need cluster information. Because of this, 
 we will first try to solve the problem for Spark.
 As to cluster information, Spark doesn't provide an API (SPARK-5080). 
 However, Spark doesn't have a listener API, with which Spark driver can get 
 notifications about executor going up/down, task starting/finishing, etc. 
 With this information, Spark client should be able to have a view of the 
 current cluster image.
 Spark developers mentioned that the listener can only be created after 
 SparkContext is started, at which time, some executions may have already 
 started and so the listener will miss some information. This can be fixed. 
 File a JIRA with Spark project if necessary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (HIVE-9339) Optimize split grouping for CombineHiveInputFormat [Spark Branch]

Xuefu Zhang created HIVE-9339:
-

 Summary: Optimize split grouping for CombineHiveInputFormat [Spark 
Branch]
 Key: HIVE-9339
 URL: https://issues.apache.org/jira/browse/HIVE-9339
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang


It seems that split generation, especially in terms of grouping inputs, needs 
to be improved. For this, we may need cluster information. Because of this, we 
will first try to solve the problem for Spark.

As to cluster information, Spark doesn't provide an API (SPARK-5080). However, 
Spark doesn't have a listener API, with which Spark driver can get 
notifications about executor going up/down, task starting/finishing, etc. With 
this information, Spark client should be able to have a view of the current 
cluster image.

Spark developers mentioned that the listener can only be created after 
SparkContext is started, at which time, some executions may have already 
started and so the listener will miss some information. This can be fixed. File 
a JIRA with Spark project if necessary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-9335) Address review items on HIVE-9257 [Spark Branch]


[ 
https://issues.apache.org/jira/browse/HIVE-9335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14273104#comment-14273104
 ] 

Xuefu Zhang commented on HIVE-9335:
---

+1

 Address review items on HIVE-9257 [Spark Branch]
 

 Key: HIVE-9335
 URL: https://issues.apache.org/jira/browse/HIVE-9335
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Brock Noland
Assignee: Brock Noland
 Attachments: HIVE-9335.1-spark.patch, HIVE-9335.2-spark.patch


 I made a pass through HIVE-9257 and found the following issues: 
 {{HashTableSinkOperator.java}}
 The fields EMPTY_OBJECT_ARRAY and EMPTY_ROW_CONTAINER are no longer constants 
 and should not be in upper case.
 {{HivePairFlatMapFunction.java}}
 We share NumberFormat accross threads and it's not thread safe.
 {{KryoSerializer.java}}
 we eat the stack trace in deserializeJobConf
 {{SparkMapRecordHandler}}
 in processRow we should not be using {{StringUtils.stringifyException}} since 
 LOG can handle stack traces.
 in close:
 {noformat}
 // signal new failure to map-reduce
 LOG.error(Hit error while closing operators - failing tree);
 throw new IllegalStateException(Error while closing operators, e);
 {noformat}
 Should be:
 {noformat}
  String msg = Error while closing operators:  + e;
 throw new IllegalStateException(msg, e);
 {noformat}
 {{SparkSessionManagerImpl}} - the method {{canReuseSession}} is useless
 {{GenSparkSkewJoinProcessor}}
 {noformat}
 +  // keep it as reference in case we need fetch work
 +//localPlan.getAliasToFetchWork().put(small_alias.toString(),
 +//new FetchWork(tblDir, tableDescList.get(small_alias)));
 {noformat}
 {{GenSparkWorkWalker}} trim ws
 {{SparkCompiler}} remote init
 {{SparkEdgeProperty}} trim ws
 {{CounterStatsPublisher}} eat exception
 {{Hadoop23Shims}} unused import of {{ResourceBundles}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-9119) ZooKeeperHiveLockManager does not use zookeeper in the proper way


 [ 
https://issues.apache.org/jira/browse/HIVE-9119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-9119:
--
   Resolution: Fixed
Fix Version/s: 0.15.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Na.

 ZooKeeperHiveLockManager does not use zookeeper in the proper way
 -

 Key: HIVE-9119
 URL: https://issues.apache.org/jira/browse/HIVE-9119
 Project: Hive
  Issue Type: Improvement
  Components: Locking
Affects Versions: 0.13.0, 0.14.0, 0.13.1
Reporter: Na Yang
Assignee: Na Yang
 Fix For: 0.15.0

 Attachments: HIVE-9119.1.patch, HIVE-9119.2.patch, HIVE-9119.3.patch, 
 HIVE-9119.4.patch


 ZooKeeperHiveLockManager does not use zookeeper in the proper way. 
 Currently a new zookeeper client instance is created for each 
 getlock/releaselock query which sometimes causes the number of open 
 connections between
 HiveServer2 and ZooKeeper exceed the max connection number that zookeeper 
 server allows. 
 To use zookeeper as a distributed lock, there is no need to create a new 
 zookeeper instance for every getlock try. A single zookeeper instance could 
 be reused and shared by ZooKeeperHiveLockManagers.   



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-9119) ZooKeeperHiveLockManager does not use zookeeper in the proper way


 [ 
https://issues.apache.org/jira/browse/HIVE-9119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-9119:
--
Labels: TODOC15  (was: )

 ZooKeeperHiveLockManager does not use zookeeper in the proper way
 -

 Key: HIVE-9119
 URL: https://issues.apache.org/jira/browse/HIVE-9119
 Project: Hive
  Issue Type: Improvement
  Components: Locking
Affects Versions: 0.13.0, 0.14.0, 0.13.1
Reporter: Na Yang
Assignee: Na Yang
  Labels: TODOC15
 Fix For: 0.15.0

 Attachments: HIVE-9119.1.patch, HIVE-9119.2.patch, HIVE-9119.3.patch, 
 HIVE-9119.4.patch


 ZooKeeperHiveLockManager does not use zookeeper in the proper way. 
 Currently a new zookeeper client instance is created for each 
 getlock/releaselock query which sometimes causes the number of open 
 connections between
 HiveServer2 and ZooKeeper exceed the max connection number that zookeeper 
 server allows. 
 To use zookeeper as a distributed lock, there is no need to create a new 
 zookeeper instance for every getlock try. A single zookeeper instance could 
 be reused and shared by ZooKeeperHiveLockManagers.   



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-9112) Query may generate different results depending on the number of reducers


 [ 
https://issues.apache.org/jira/browse/HIVE-9112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-9112:
--
Status: Patch Available  (was: Open)

 Query may generate different results depending on the number of reducers
 

 Key: HIVE-9112
 URL: https://issues.apache.org/jira/browse/HIVE-9112
 Project: Hive
  Issue Type: Bug
Reporter: Chao
Assignee: Ted Xu
 Attachments: HIVE-9112.patch


 Some queries may generate different results depending on the number of 
 reducers, for example, tests like ppd_multi_insert.q, join_nullsafe.q, 
 subquery_in.q, etc.
 Take subquery_in.q as example, if we add
 {noformat}
 set mapred.reduce.tasks=3;
 {noformat}
 to this test file, the result will be different (and wrong):
 {noformat}
 @@ -903,5 +903,3 @@ where li.l_linenumber = 1 and
  POSTHOOK: type: QUERY
  POSTHOOK: Input: default@lineitem
   A masked pattern was here 
 -108570 8571
 -4297   1798
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Re: Review Request 29787: HIVE-9257 : Merge spark to trunk January 2015 (Modified files)

2015-01-10 Thread Xuefu Zhang


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29787/#review67603
---



pom.xml
https://reviews.apache.org/r/29787/#comment111661

A followup to get rid of this?



ql/src/java/org/apache/hadoop/hive/ql/exec/ExplainTask.java
https://reviews.apache.org/r/29787/#comment111662

We should restrain from using * in imports.


- Xuefu Zhang


On Jan. 9, 2015, 11:55 p.m., Szehon Ho wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/29787/
 ---
 
 (Updated Jan. 9, 2015, 11:55 p.m.)
 
 
 Review request for hive.
 
 
 Bugs: HIVE-9257
 https://issues.apache.org/jira/browse/HIVE-9257
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 As the entire patch is too big, this shows the modified files.  These have 
 been cleanuped as part of HIVE-9319, HIVE-9306, HIVE-9305.
 
 The new files can be found here:  
 http://svn.apache.org/repos/asf/hive/branches/spark/ or 
 https://github.com/apache/hive/tree/spark under:
 # data/conf/spark/
 # 
 itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcWithLocalClusterSpark.java
 # 
 itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestMultiSessionsHS2WithLocalClusterSpark.java
 # itests/qtest-spark/
 # 
 ql/src/java/org/apache/hadoop/hive/ql/exec/SparkHashTableSinkOperator.java
 # ql/src/java/org/apache/hadoop/hive/ql/exec/spark/
 # ql/src/java/org/apache/hadoop/hive/ql/lib/TypeRule.java
 # 
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkMapJoinProcessor.java
 # 
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/GenSparkSkewJoinProcessor.java
 # 
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkCrossProductCheck.java
 # 
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java
 # ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/
 # ql/src/java/org/apache/hadoop/hive/ql/parse/spark/
 # 
 ql/src/java/org/apache/hadoop/hive/ql/plan/SparkBucketMapJoinContext.java
 # ql/src/java/org/apache/hadoop/hive/ql/plan/SparkEdgeProperty.java
 # ql/src/java/org/apache/hadoop/hive/ql/plan/SparkHashTableSinkDesc.java
 # ql/src/java/org/apache/hadoop/hive/ql/plan/SparkWork.java
 # 
 ql/src/java/org/apache/hadoop/hive/ql/stats/CounterStatsAggregatorSpark.java
 # ql/src/test/org/apache/hadoop/hive/ql/exec/spark/
 # ql/src/test/queries/clientpositive/auto_join_stats.q
 # ql/src/test/queries/clientpositive/auto_join_stats2.q
 # ql/src/test/queries/clientpositive/bucket_map_join_spark1.q
 # ql/src/test/queries/clientpositive/bucket_map_join_spark2.q
 # ql/src/test/queries/clientpositive/bucket_map_join_spark3.q
 # ql/src/test/queries/clientpositive/bucket_map_join_spark4.q
 # ql/src/test/queries/clientpositive/multi_insert_mixed.q
 # ql/src/test/queries/clientpositive/multi_insert_union_src.q
 # ql/src/test/queries/clientpositive/parallel_join0.q
 # ql/src/test/queries/clientpositive/parallel_join1.q
 # ql/src/test/queries/clientpositive/spark_test.q
 # ql/src/test/queries/clientpositive/udf_example_add.q
 # ql/src/test/results/clientpositive/auto_join_stats.q.out
 # ql/src/test/results/clientpositive/auto_join_stats2.q.out
 # ql/src/test/results/clientpositive/bucket_map_join_spark1.q.out
 # ql/src/test/results/clientpositive/bucket_map_join_spark2.q.out
 # ql/src/test/results/clientpositive/bucket_map_join_spark3.q.out
 # ql/src/test/results/clientpositive/bucket_map_join_spark4.q.out
 # ql/src/test/results/clientpositive/multi_insert_mixed.q.out
 # ql/src/test/results/clientpositive/multi_insert_union_src.q.out
 # ql/src/test/results/clientpositive/parallel_join0.q.out
 # ql/src/test/results/clientpositive/parallel_join1.q.out
 # ql/src/test/results/clientpositive/spark/
 # ql/src/test/results/clientpositive/spark_test.q.out
 # ql/src/test/results/clientpositive/udf_example_add.q.out
 # spark-client/
 Cleanup and review of those have been done as part of HIVE-9281 and HIVE-9288.
 
 
 Diffs
 -
 
   common/src/java/org/apache/hadoop/hive/common/StatsSetupConst.java cd4beeb 
   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 8264b16 
   data/conf/hive-log4j.properties a5b9c9a 
   itests/hive-unit/pom.xml f9f59c9 
   itests/pom.xml 0a154d6 
   itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java 878202a 
   pom.xml efe5e3a 
   ql/pom.xml 84e912e 
   ql/src/java/org/apache/hadoop/hive/ql/Context.java 0373273 
   ql/src/java/org/apache/hadoop/hive/ql/Driver.java 8bb6d0f 
   ql/src/java/org/apache/hadoop/hive/ql/HashTableLoaderFactory.java 10ad933 
   ql/src/java/org/apache/hadoop/hive/ql/exec

[jira] [Updated] (HIVE-9104) windowing.q failed when mapred.reduce.tasks is set to larger than one


 [ 
https://issues.apache.org/jira/browse/HIVE-9104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-9104:
--
   Resolution: Fixed
Fix Version/s: 0.15.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thank Chao for the contribution and Harish for the review.

 windowing.q failed when mapred.reduce.tasks is set to larger than one
 -

 Key: HIVE-9104
 URL: https://issues.apache.org/jira/browse/HIVE-9104
 Project: Hive
  Issue Type: Sub-task
Reporter: Chao
Assignee: Chao
 Fix For: 0.15.0

 Attachments: HIVE-9104.2.patch, HIVE-9104.patch


 Test {{windowing.q}} is actually not enabled in Spark branch - in test 
 configurations it is {{windowing.q.q}}.
 I just run this test, and query
 {code}
 -- 12. testFirstLastWithWhere
 select  p_mfgr,p_name, p_size,
 rank() over(distribute by p_mfgr sort by p_name) as r,
 sum(p_size) over (distribute by p_mfgr sort by p_name rows between current 
 row and current row) as s2,
 first_value(p_size) over w1 as f,
 last_value(p_size, false) over w1 as l
 from part
 where p_mfgr = 'Manufacturer#3'
 window w1 as (distribute by p_mfgr sort by p_name rows between 2 preceding 
 and 2 following);
 {code}
 failed with the following exception:
 {noformat}
 java.lang.RuntimeException: Hive Runtime Error while closing operators: null
   at 
 org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.close(SparkReduceRecordHandler.java:446)
   at 
 org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.closeRecordProcessor(HiveReduceFunctionResultList.java:58)
   at 
 org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:108)
   at 
 scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
   at 
 org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115)
   at 
 org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115)
   at org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1390)
   at org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1390)
   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
   at org.apache.spark.scheduler.Task.run(Task.scala:56)
   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 Caused by: java.util.NoSuchElementException
   at java.util.ArrayDeque.getFirst(ArrayDeque.java:318)
   at 
 org.apache.hadoop.hive.ql.udf.generic.GenericUDAFFirstValue$FirstValStreamingFixedWindow.terminate(GenericUDAFFirstValue.java:290)
   at 
 org.apache.hadoop.hive.ql.udf.ptf.WindowingTableFunction.finishPartition(WindowingTableFunction.java:413)
   at 
 org.apache.hadoop.hive.ql.exec.PTFOperator$PTFInvocation.finishPartition(PTFOperator.java:337)
   at org.apache.hadoop.hive.ql.exec.PTFOperator.closeOp(PTFOperator.java:95)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
   at 
 org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.close(SparkReduceRecordHandler.java:431)
   ... 15 more
 {noformat}
 We need to find out:
 - Since which commit this test started failing, and
 - Why it fails



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-9257) Merge from spark to trunk January 2015


[ 
https://issues.apache.org/jira/browse/HIVE-9257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14272801#comment-14272801
 ] 

Xuefu Zhang commented on HIVE-9257:
---

I reviewed most of patches in Spark branch during the months, and also produced 
some. I reviewed the mega patch here, and left a couple of comments on RB. 
However, these can be addressed as followups.

+1 pending on test.




 Merge from spark to trunk January 2015
 --

 Key: HIVE-9257
 URL: https://issues.apache.org/jira/browse/HIVE-9257
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: 0.15.0
Reporter: Szehon Ho
Assignee: Szehon Ho

 The hive on spark work has reached a point where we can merge it into the 
 trunk branch.  Note that spark execution engine is optional and no current 
 users should be impacted.
 This JIRA will be used to track the merge.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-9335) Address review items on HIVE-9257 [Spark Branch]


[ 
https://issues.apache.org/jira/browse/HIVE-9335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14272780#comment-14272780
 ] 

Xuefu Zhang commented on HIVE-9335:
---

Patch looks good, except it seems containing code changes from HIVE-9289, which 
hasn't been finalized. It's okay to address HIVE-9289 after the merge, but I 
think we shouldn't include that patch here.

 Address review items on HIVE-9257 [Spark Branch]
 

 Key: HIVE-9335
 URL: https://issues.apache.org/jira/browse/HIVE-9335
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Brock Noland
Assignee: Brock Noland
 Attachments: HIVE-9335.1-spark.patch


 I made a pass through HIVE-9257 and found the following issues: 
 {{HashTableSinkOperator.java}}
 The fields EMPTY_OBJECT_ARRAY and EMPTY_ROW_CONTAINER are no longer constants 
 and should not be in upper case.
 {{HivePairFlatMapFunction.java}}
 We share NumberFormat accross threads and it's not thread safe.
 {{KryoSerializer.java}}
 we eat the stack trace in deserializeJobConf
 {{SparkMapRecordHandler}}
 in processRow we should not be using {{StringUtils.stringifyException}} since 
 LOG can handle stack traces.
 in close:
 {noformat}
 // signal new failure to map-reduce
 LOG.error(Hit error while closing operators - failing tree);
 throw new IllegalStateException(Error while closing operators, e);
 {noformat}
 Should be:
 {noformat}
  String msg = Error while closing operators:  + e;
 throw new IllegalStateException(msg, e);
 {noformat}
 {{SparkSessionManagerImpl}} - the method {{canReuseSession}} is useless
 {{GenSparkSkewJoinProcessor}}
 {noformat}
 +  // keep it as reference in case we need fetch work
 +//localPlan.getAliasToFetchWork().put(small_alias.toString(),
 +//new FetchWork(tblDir, tableDescList.get(small_alias)));
 {noformat}
 {{GenSparkWorkWalker}} trim ws
 {{SparkCompiler}} remote init
 {{SparkEdgeProperty}} trim ws
 {{CounterStatsPublisher}} eat exception
 {{Hadoop23Shims}} unused import of {{ResourceBundles}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-9104) windowing.q failed when mapred.reduce.tasks is set to larger than one


 [ 
https://issues.apache.org/jira/browse/HIVE-9104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-9104:
--
Component/s: (was: Spark)

 windowing.q failed when mapred.reduce.tasks is set to larger than one
 -

 Key: HIVE-9104
 URL: https://issues.apache.org/jira/browse/HIVE-9104
 Project: Hive
  Issue Type: Sub-task
Reporter: Chao
Assignee: Chao
 Fix For: 0.15.0

 Attachments: HIVE-9104.2.patch, HIVE-9104.patch


 Test {{windowing.q}} is actually not enabled in Spark branch - in test 
 configurations it is {{windowing.q.q}}.
 I just run this test, and query
 {code}
 -- 12. testFirstLastWithWhere
 select  p_mfgr,p_name, p_size,
 rank() over(distribute by p_mfgr sort by p_name) as r,
 sum(p_size) over (distribute by p_mfgr sort by p_name rows between current 
 row and current row) as s2,
 first_value(p_size) over w1 as f,
 last_value(p_size, false) over w1 as l
 from part
 where p_mfgr = 'Manufacturer#3'
 window w1 as (distribute by p_mfgr sort by p_name rows between 2 preceding 
 and 2 following);
 {code}
 failed with the following exception:
 {noformat}
 java.lang.RuntimeException: Hive Runtime Error while closing operators: null
   at 
 org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.close(SparkReduceRecordHandler.java:446)
   at 
 org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.closeRecordProcessor(HiveReduceFunctionResultList.java:58)
   at 
 org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:108)
   at 
 scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
   at 
 org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115)
   at 
 org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115)
   at org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1390)
   at org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1390)
   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
   at org.apache.spark.scheduler.Task.run(Task.scala:56)
   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 Caused by: java.util.NoSuchElementException
   at java.util.ArrayDeque.getFirst(ArrayDeque.java:318)
   at 
 org.apache.hadoop.hive.ql.udf.generic.GenericUDAFFirstValue$FirstValStreamingFixedWindow.terminate(GenericUDAFFirstValue.java:290)
   at 
 org.apache.hadoop.hive.ql.udf.ptf.WindowingTableFunction.finishPartition(WindowingTableFunction.java:413)
   at 
 org.apache.hadoop.hive.ql.exec.PTFOperator$PTFInvocation.finishPartition(PTFOperator.java:337)
   at org.apache.hadoop.hive.ql.exec.PTFOperator.closeOp(PTFOperator.java:95)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
   at 
 org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.close(SparkReduceRecordHandler.java:431)
   ... 15 more
 {noformat}
 We need to find out:
 - Since which commit this test started failing, and
 - Why it fails



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Re: Review Request 29494: HIVE-9119: ZooKeeperHiveLockManager does not use zookeeper in the proper way

2015-01-10 Thread Xuefu Zhang


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29494/#review67606
---

Ship it!


Ship It!

- Xuefu Zhang


On Jan. 5, 2015, 6:43 p.m., Na Yang wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/29494/
 ---
 
 (Updated Jan. 5, 2015, 6:43 p.m.)
 
 
 Review request for hive, Brock Noland, Szehon Ho, and Xuefu Zhang.
 
 
 Bugs: HIVE-9119
 https://issues.apache.org/jira/browse/HIVE-9119
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 1. Use singleton ZooKeeper client for ZooKeeperHiveLocManager
 2. Use CuratorFramework to manage ZooKeeper client 
 
 
 Diffs
 -
 
   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 2e51518 
   itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java 878202a 
   ql/pom.xml 84e912e 
   
 ql/src/java/org/apache/hadoop/hive/ql/lockmgr/zookeeper/CuratorFrameworkSingleton.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/lockmgr/zookeeper/ZooKeeperHiveLockManager.java
  1334a91 
   
 ql/src/test/org/apache/hadoop/hive/ql/lockmgr/zookeeper/TestZookeeperLockManager.java
  aacb73f 
 
 Diff: https://reviews.apache.org/r/29494/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Na Yang

[jira] [Commented] (HIVE-9119) ZooKeeperHiveLockManager does not use zookeeper in the proper way


[ 
https://issues.apache.org/jira/browse/HIVE-9119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14272806#comment-14272806
 ] 

Xuefu Zhang commented on HIVE-9119:
---

+1 patch looks good to me. Thanks for fixing this long haunting issue, and now 
the zookeeper related code is much cleaner also.

 ZooKeeperHiveLockManager does not use zookeeper in the proper way
 -

 Key: HIVE-9119
 URL: https://issues.apache.org/jira/browse/HIVE-9119
 Project: Hive
  Issue Type: Improvement
  Components: Locking
Affects Versions: 0.13.0, 0.14.0, 0.13.1
Reporter: Na Yang
Assignee: Na Yang
 Attachments: HIVE-9119.1.patch, HIVE-9119.2.patch, HIVE-9119.3.patch, 
 HIVE-9119.4.patch


 ZooKeeperHiveLockManager does not use zookeeper in the proper way. 
 Currently a new zookeeper client instance is created for each 
 getlock/releaselock query which sometimes causes the number of open 
 connections between
 HiveServer2 and ZooKeeper exceed the max connection number that zookeeper 
 server allows. 
 To use zookeeper as a distributed lock, there is no need to create a new 
 zookeeper instance for every getlock try. A single zookeeper instance could 
 be reused and shared by ZooKeeperHiveLockManagers.   



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-9104) windowing.q failed when mapred.reduce.tasks is set to larger than one


[ 
https://issues.apache.org/jira/browse/HIVE-9104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14272187#comment-14272187
 ] 

Xuefu Zhang commented on HIVE-9104:
---

+1. Code looks reasonable to me. However, it's great if [~rhbutani] or someone 
else familiar to this part of code to take a look.

 windowing.q failed when mapred.reduce.tasks is set to larger than one
 -

 Key: HIVE-9104
 URL: https://issues.apache.org/jira/browse/HIVE-9104
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Chao
Assignee: Chao
 Attachments: HIVE-9104.patch


 Test {{windowing.q}} is actually not enabled in Spark branch - in test 
 configurations it is {{windowing.q.q}}.
 I just run this test, and query
 {code}
 -- 12. testFirstLastWithWhere
 select  p_mfgr,p_name, p_size,
 rank() over(distribute by p_mfgr sort by p_name) as r,
 sum(p_size) over (distribute by p_mfgr sort by p_name rows between current 
 row and current row) as s2,
 first_value(p_size) over w1 as f,
 last_value(p_size, false) over w1 as l
 from part
 where p_mfgr = 'Manufacturer#3'
 window w1 as (distribute by p_mfgr sort by p_name rows between 2 preceding 
 and 2 following);
 {code}
 failed with the following exception:
 {noformat}
 java.lang.RuntimeException: Hive Runtime Error while closing operators: null
   at 
 org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.close(SparkReduceRecordHandler.java:446)
   at 
 org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.closeRecordProcessor(HiveReduceFunctionResultList.java:58)
   at 
 org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:108)
   at 
 scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
   at 
 org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115)
   at 
 org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115)
   at org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1390)
   at org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1390)
   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
   at org.apache.spark.scheduler.Task.run(Task.scala:56)
   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 Caused by: java.util.NoSuchElementException
   at java.util.ArrayDeque.getFirst(ArrayDeque.java:318)
   at 
 org.apache.hadoop.hive.ql.udf.generic.GenericUDAFFirstValue$FirstValStreamingFixedWindow.terminate(GenericUDAFFirstValue.java:290)
   at 
 org.apache.hadoop.hive.ql.udf.ptf.WindowingTableFunction.finishPartition(WindowingTableFunction.java:413)
   at 
 org.apache.hadoop.hive.ql.exec.PTFOperator$PTFInvocation.finishPartition(PTFOperator.java:337)
   at org.apache.hadoop.hive.ql.exec.PTFOperator.closeOp(PTFOperator.java:95)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
   at 
 org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.close(SparkReduceRecordHandler.java:431)
   ... 15 more
 {noformat}
 We need to find out:
 - Since which commit this test started failing, and
 - Why it fails



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-9104) windowing.q failed when mapred.reduce.tasks is set to larger than one


[ 
https://issues.apache.org/jira/browse/HIVE-9104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14272299#comment-14272299
 ] 

Xuefu Zhang commented on HIVE-9104:
---

[~csun] Could you add a test case in which perhaps the same query runs with 
multiple reducers. It can be in the same .q file.

 windowing.q failed when mapred.reduce.tasks is set to larger than one
 -

 Key: HIVE-9104
 URL: https://issues.apache.org/jira/browse/HIVE-9104
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Chao
Assignee: Chao
 Attachments: HIVE-9104.patch


 Test {{windowing.q}} is actually not enabled in Spark branch - in test 
 configurations it is {{windowing.q.q}}.
 I just run this test, and query
 {code}
 -- 12. testFirstLastWithWhere
 select  p_mfgr,p_name, p_size,
 rank() over(distribute by p_mfgr sort by p_name) as r,
 sum(p_size) over (distribute by p_mfgr sort by p_name rows between current 
 row and current row) as s2,
 first_value(p_size) over w1 as f,
 last_value(p_size, false) over w1 as l
 from part
 where p_mfgr = 'Manufacturer#3'
 window w1 as (distribute by p_mfgr sort by p_name rows between 2 preceding 
 and 2 following);
 {code}
 failed with the following exception:
 {noformat}
 java.lang.RuntimeException: Hive Runtime Error while closing operators: null
   at 
 org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.close(SparkReduceRecordHandler.java:446)
   at 
 org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.closeRecordProcessor(HiveReduceFunctionResultList.java:58)
   at 
 org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:108)
   at 
 scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
   at 
 org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115)
   at 
 org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115)
   at org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1390)
   at org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1390)
   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
   at org.apache.spark.scheduler.Task.run(Task.scala:56)
   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 Caused by: java.util.NoSuchElementException
   at java.util.ArrayDeque.getFirst(ArrayDeque.java:318)
   at 
 org.apache.hadoop.hive.ql.udf.generic.GenericUDAFFirstValue$FirstValStreamingFixedWindow.terminate(GenericUDAFFirstValue.java:290)
   at 
 org.apache.hadoop.hive.ql.udf.ptf.WindowingTableFunction.finishPartition(WindowingTableFunction.java:413)
   at 
 org.apache.hadoop.hive.ql.exec.PTFOperator$PTFInvocation.finishPartition(PTFOperator.java:337)
   at org.apache.hadoop.hive.ql.exec.PTFOperator.closeOp(PTFOperator.java:95)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
   at 
 org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.close(SparkReduceRecordHandler.java:431)
   ... 15 more
 {noformat}
 We need to find out:
 - Since which commit this test started failing, and
 - Why it fails



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-9251) SetSparkReducerParallelism is likely to set too small number of reducers [Spark Branch]


 [ 
https://issues.apache.org/jira/browse/HIVE-9251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-9251:
--
   Resolution: Fixed
Fix Version/s: spark-branch
   Status: Resolved  (was: Patch Available)

Committed to spark branch. Thanks, Rui.

 SetSparkReducerParallelism is likely to set too small number of reducers 
 [Spark Branch]
 ---

 Key: HIVE-9251
 URL: https://issues.apache.org/jira/browse/HIVE-9251
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li
 Fix For: spark-branch

 Attachments: HIVE-9251.1-spark.patch, HIVE-9251.2-spark.patch, 
 HIVE-9251.3-spark.patch, HIVE-9251.4-spark.patch, HIVE-9251.5-spark.patch, 
 HIVE-9251.6-spark.patch


 This may hurt performance or even lead to task failures. For example, spark's 
 netty-based shuffle limits the max frame size to be 2G.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-9290) Make some test results deterministic


[ 
https://issues.apache.org/jira/browse/HIVE-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14271116#comment-14271116
 ] 

Xuefu Zhang commented on HIVE-9290:
---

I was aware of this but knew Rui was immediately working on HIVE-9251 which 
depends on this issue. Yes, there will be a little time period where the tests 
would fail, but I think it's okay as long as we are aware.

 Make some test results deterministic
 

 Key: HIVE-9290
 URL: https://issues.apache.org/jira/browse/HIVE-9290
 Project: Hive
  Issue Type: Test
Reporter: Rui Li
Assignee: Rui Li
 Fix For: spark-branch, 0.15.0

 Attachments: HIVE-9290-spark.patch, HIVE-9290.1.patch, 
 HIVE-9290.1.patch


 {noformat}
 limit_pushdown.q
 optimize_nullscan.q
 ppd_gby_join.q
 vector_string_concat.q
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Comment Edited] (HIVE-9290) Make some test results deterministic


[ 
https://issues.apache.org/jira/browse/HIVE-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14271116#comment-14271116
 ] 

Xuefu Zhang edited comment on HIVE-9290 at 1/9/15 3:42 PM:
---

I was aware of this but knew Rui was immediately working on HIVE-9251 which 
depends on this issue. Yes, there would be a little time period where the tests 
would fail, but I thought it's okay as long as we are aware. Sorry for the 
inconvenience.


was (Author: xuefuz):
I was aware of this but knew Rui was immediately working on HIVE-9251 which 
depends on this issue. Yes, there will be a little time period where the tests 
would fail, but I think it's okay as long as we are aware.

 Make some test results deterministic
 

 Key: HIVE-9290
 URL: https://issues.apache.org/jira/browse/HIVE-9290
 Project: Hive
  Issue Type: Test
Reporter: Rui Li
Assignee: Rui Li
 Fix For: spark-branch, 0.15.0

 Attachments: HIVE-9290-spark.patch, HIVE-9290.1.patch, 
 HIVE-9290.1.patch


 {noformat}
 limit_pushdown.q
 optimize_nullscan.q
 ppd_gby_join.q
 vector_string_concat.q
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-9326) BaseProtocol.Error failed to deserialization due to NPE.[Spark Branch]


 [ 
https://issues.apache.org/jira/browse/HIVE-9326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-9326:
--
   Resolution: Fixed
Fix Version/s: spark-branch
   Status: Resolved  (was: Patch Available)

The test failures are known and unrelated.

Committed to Spark branch. Thanks, Chengxiang.

 BaseProtocol.Error failed to deserialization due to NPE.[Spark Branch]
 --

 Key: HIVE-9326
 URL: https://issues.apache.org/jira/browse/HIVE-9326
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li
  Labels: Spark-M5
 Fix For: spark-branch

 Attachments: HIVE-9326.1-spark.patch


 Throwables.getStackTraceAsString(cause) throw NPE if cause is null.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-9326) BaseProtocol.Error failed to deserialization due to NPE.[Spark Branch]


[ 
https://issues.apache.org/jira/browse/HIVE-9326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14271772#comment-14271772
 ] 

Xuefu Zhang commented on HIVE-9326:
---

+1

 BaseProtocol.Error failed to deserialization due to NPE.[Spark Branch]
 --

 Key: HIVE-9326
 URL: https://issues.apache.org/jira/browse/HIVE-9326
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li
  Labels: Spark-M5
 Attachments: HIVE-9326.1-spark.patch


 Throwables.getStackTraceAsString(cause) throw NPE if cause is null.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-9306) Let Context.isLocalOnlyExecutionMode() return false if execution engine is Spark [Spark Branch]


[ 
https://issues.apache.org/jira/browse/HIVE-9306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14270202#comment-14270202
 ] 

Xuefu Zhang commented on HIVE-9306:
---

Test failure above,  
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_percentile_approx_23.q,
 doesn't seem related to the patch. It didn't happen in previous run, and 
neither in my local run.

 Let Context.isLocalOnlyExecutionMode() return false if execution engine is 
 Spark [Spark Branch]
 ---

 Key: HIVE-9306
 URL: https://issues.apache.org/jira/browse/HIVE-9306
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Attachments: HIVE-9306.1-spark.patch, HIVE-9306.2-spark.patch, 
 HIVE-9306.3-spark.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-9306) Let Context.isLocalOnlyExecutionMode() return false if execution engine is Spark [Spark Branch]


 [ 
https://issues.apache.org/jira/browse/HIVE-9306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-9306:
--
   Resolution: Fixed
Fix Version/s: spark-branch
   Status: Resolved  (was: Patch Available)

Committed to Spark branch. Thanks to Szehon for the review.

 Let Context.isLocalOnlyExecutionMode() return false if execution engine is 
 Spark [Spark Branch]
 ---

 Key: HIVE-9306
 URL: https://issues.apache.org/jira/browse/HIVE-9306
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Fix For: spark-branch

 Attachments: HIVE-9306.1-spark.patch, HIVE-9306.2-spark.patch, 
 HIVE-9306.3-spark.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-9290) Make some test results deterministic


 [ 
https://issues.apache.org/jira/browse/HIVE-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-9290:
--
   Resolution: Fixed
Fix Version/s: 0.15.0
   spark-branch
   Status: Resolved  (was: Patch Available)

Committed to trunk and merged to Spark branch. Thanks, Rui.

 Make some test results deterministic
 

 Key: HIVE-9290
 URL: https://issues.apache.org/jira/browse/HIVE-9290
 Project: Hive
  Issue Type: Test
Reporter: Rui Li
Assignee: Rui Li
 Fix For: spark-branch, 0.15.0

 Attachments: HIVE-9290.1.patch, HIVE-9290.1.patch


 {noformat}
 limit_pushdown.q
 optimize_nullscan.q
 ppd_gby_join.q
 vector_string_concat.q
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Re: Review Request 29733: HIVE-9319 : Cleanup Modified Files [Spark Branch]

2015-01-08 Thread Xuefu Zhang


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29733/#review67348
---

Ship it!


Ship It!

- Xuefu Zhang


On Jan. 9, 2015, 12:01 a.m., Szehon Ho wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/29733/
 ---
 
 (Updated Jan. 9, 2015, 12:01 a.m.)
 
 
 Review request for hive and Xuefu Zhang.
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Note that this limits cleanup to lines of code changed in spark-branch in the 
 merge to trunk, not cleanup of all of the modified files, in order to reduce 
 merge conflicts.
 
 
 Diffs
 -
 
   ql/src/java/org/apache/hadoop/hive/ql/Driver.java fa40082 
   ql/src/java/org/apache/hadoop/hive/ql/exec/ExplainTask.java b25a639 
   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java ee42f4c 
   
 ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainerSerDe.java
  abdb6af 
   ql/src/java/org/apache/hadoop/hive/ql/io/HiveKey.java 33aeda4 
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 6f216c9 
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java a6d5c62 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/unionproc/UnionProcessor.java 
 fec6822 
   ql/src/java/org/apache/hadoop/hive/ql/parse/MapReduceCompiler.java 1b6de64 
   ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java 1efbb12 
   
 ql/src/java/org/apache/hadoop/hive/ql/plan/ConditionalResolverMergeFiles.java 
 4582678 
   ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java 076d2fa 
   shims/common/src/main/java/org/apache/hadoop/hive/shims/HadoopShims.java 
 f1743ae 
 
 Diff: https://reviews.apache.org/r/29733/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Szehon Ho

[jira] [Commented] (HIVE-9319) Cleanup Modified Files [Spark Branch]


[ 
https://issues.apache.org/jira/browse/HIVE-9319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14270303#comment-14270303
 ] 

Xuefu Zhang commented on HIVE-9319:
---

+1 pending on test

 Cleanup Modified Files [Spark Branch]
 -

 Key: HIVE-9319
 URL: https://issues.apache.org/jira/browse/HIVE-9319
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Szehon Ho
Assignee: Szehon Ho
Priority: Minor
 Attachments: HIVE-9319-spark.patch


 Cleanup the code that is modified based on checkstyle/TODO/warnings.
 It is a follow-up of HIVE-9281 which is for new files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-9267) Ensure custom UDF works with Spark [Spark Branch]


 [ 
https://issues.apache.org/jira/browse/HIVE-9267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-9267:
--
   Resolution: Fixed
Fix Version/s: spark-branch
   Status: Resolved  (was: Patch Available)

Committed to Spark branch. Thanks to Szehon for the review.

 Ensure custom UDF works with Spark [Spark Branch]
 -

 Key: HIVE-9267
 URL: https://issues.apache.org/jira/browse/HIVE-9267
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Fix For: spark-branch

 Attachments: HIVE-9267.1-spark.patch


 Create or add auto qtest if necessary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-9293) Cleanup SparkTask getMapWork to skip UnionWork check [Spark Branch]


 [ 
https://issues.apache.org/jira/browse/HIVE-9293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-9293:
--
   Resolution: Fixed
Fix Version/s: spark-branch
   Status: Resolved  (was: Patch Available)

Committed to Spark branch. Thanks, Chao.

 Cleanup SparkTask getMapWork to skip UnionWork check [Spark Branch]
 ---

 Key: HIVE-9293
 URL: https://issues.apache.org/jira/browse/HIVE-9293
 Project: Hive
  Issue Type: Task
  Components: Spark
Affects Versions: spark-branch
Reporter: Szehon Ho
Assignee: Chao
Priority: Minor
 Fix For: spark-branch

 Attachments: HIVE-9293.1-spark.patch


 As we don't have UnionWork anymore, we can simplify the logic to get root 
 mapworks from the SparkWork.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-9306) Let Context.isLocalOnlyExecutionMode() return false if execution engine is Spark [Spark Branch]


[ 
https://issues.apache.org/jira/browse/HIVE-9306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14269409#comment-14269409
 ] 

Xuefu Zhang commented on HIVE-9306:
---

fs_default_name2.q output needs to be updated, which will be consistent with 
trunk.

skewjoinopt5.q failed due to error below. It shouldn't be related to the 
changes here. In hive.log:
{code}
2015-01-07 22:05:46,510 INFO  [main]: impl.RemoteSparkJobStatus 
(RemoteSparkJobStatus.java:getSparkJobInfo(143)) - Job hasn't been submitted 
after 30s. Aborting it.
2015-01-07 22:05:46,511 ERROR [main]: exec.Task 
(SessionState.java:printError(839)) - Failed to execute spark task, with 
exception 'java.lang.IllegalStateException(RPC channel is closed.)'
java.lang.IllegalStateException: RPC channel is closed.
at 
com.google.common.base.Preconditions.checkState(Preconditions.java:149)
at org.apache.hive.spark.client.rpc.Rpc.call(Rpc.java:264)
at org.apache.hive.spark.client.rpc.Rpc.call(Rpc.java:251)
at 
org.apache.hive.spark.client.SparkClientImpl$ClientProtocol.cancel(SparkClientImpl.java:375)
at 
org.apache.hive.spark.client.SparkClientImpl.cancel(SparkClientImpl.java:159)
at 
org.apache.hive.spark.client.JobHandleImpl.cancel(JobHandleImpl.java:59)
at 
org.apache.hadoop.hive.ql.exec.spark.status.impl.RemoteSparkJobStatus.getSparkJobInfo(RemoteSparkJobStatus.java:144)
at 
org.apache.hadoop.hive.ql.exec.spark.status.impl.RemoteSparkJobStatus.getState(RemoteSparkJobStatus.java:75)
at 
org.apache.hadoop.hive.ql.exec.spark.status.SparkJobMonitor.startMonitor(SparkJobMonitor.java:72)
at 
org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:108)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1634)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1393)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1179)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1045)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1035)
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:206)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:158)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:369)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:304)
at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:880)
at 
org.apache.hadoop.hive.cli.TestSparkCliDriver.runTest(TestSparkCliDriver.java:234)
at 
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_skewjoinopt5(TestSparkCliDriver.java:206)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at junit.framework.TestCase.runTest(TestCase.java:176)
at junit.framework.TestCase.runBare(TestCase.java:141)
at junit.framework.TestResult$1.protect(TestResult.java:122)
at junit.framework.TestResult.runProtected(TestResult.java:142)
at junit.framework.TestResult.run(TestResult.java:125)
at junit.framework.TestCase.run(TestCase.java:129)
at junit.framework.TestSuite.runTest(TestSuite.java:255)
at junit.framework.TestSuite.run(TestSuite.java:250)
at 
org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:84)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
{code}
This might be transient, but we need to address it if the problem persists.

 Let Context.isLocalOnlyExecutionMode() return false if execution engine is 
 Spark [Spark Branch]
 ---

 Key: HIVE-9306
 URL: https://issues.apache.org/jira/browse/HIVE-9306
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Attachments: HIVE

[jira] [Updated] (HIVE-9301) Potential null dereference in MoveTask#createTargetPath()


 [ 
https://issues.apache.org/jira/browse/HIVE-9301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-9301:
--
   Resolution: Fixed
Fix Version/s: 0.15.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Ted.

 Potential null dereference in MoveTask#createTargetPath()
 -

 Key: HIVE-9301
 URL: https://issues.apache.org/jira/browse/HIVE-9301
 Project: Hive
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Ted Yu
 Fix For: 0.15.0

 Attachments: HIVE-9301.patch


 {code}
 if (mkDirPath != null  !fs.exists(mkDirPath)) {
 {code}
 '' should be used instead of single ampersand.
 If mkDirPath is null, fs.exists() would still be called - resulting in NPE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-9293) Cleanup SparkTask getMapWork to skip UnionWork check [Spark Branch]


[ 
https://issues.apache.org/jira/browse/HIVE-9293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14269352#comment-14269352
 ] 

Xuefu Zhang commented on HIVE-9293:
---

+1

 Cleanup SparkTask getMapWork to skip UnionWork check [Spark Branch]
 ---

 Key: HIVE-9293
 URL: https://issues.apache.org/jira/browse/HIVE-9293
 Project: Hive
  Issue Type: Task
  Components: Spark
Affects Versions: spark-branch
Reporter: Szehon Ho
Assignee: Chao
Priority: Minor
 Attachments: HIVE-9293.1-spark.patch


 As we don't have UnionWork anymore, we can simplify the logic to get root 
 mapworks from the SparkWork.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-9290) Make some test results deterministic


[ 
https://issues.apache.org/jira/browse/HIVE-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14269371#comment-14269371
 ] 

Xuefu Zhang commented on HIVE-9290:
---

+1

 Make some test results deterministic
 

 Key: HIVE-9290
 URL: https://issues.apache.org/jira/browse/HIVE-9290
 Project: Hive
  Issue Type: Test
Reporter: Rui Li
Assignee: Rui Li
 Attachments: HIVE-9290.1.patch


 {noformat}
 limit_pushdown.q
 optimize_nullscan.q
 ppd_gby_join.q
 vector_string_concat.q
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Comment Edited] (HIVE-9290) Make some test results deterministic


[ 
https://issues.apache.org/jira/browse/HIVE-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14269371#comment-14269371
 ] 

Xuefu Zhang edited comment on HIVE-9290 at 1/8/15 2:37 PM:
---

+1 pending on test


was (Author: xuefuz):
+1

 Make some test results deterministic
 

 Key: HIVE-9290
 URL: https://issues.apache.org/jira/browse/HIVE-9290
 Project: Hive
  Issue Type: Test
Reporter: Rui Li
Assignee: Rui Li
 Attachments: HIVE-9290.1.patch


 {noformat}
 limit_pushdown.q
 optimize_nullscan.q
 ppd_gby_join.q
 vector_string_concat.q
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Assigned] (HIVE-9112) Query may generate different results depending on the number of reducers


 [ 
https://issues.apache.org/jira/browse/HIVE-9112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang reassigned HIVE-9112:
-

Assignee: Ted Xu  (was: Chao)

Hi [~tedxu], I'm assigning this to you for investigation/fix. Thanks.

 Query may generate different results depending on the number of reducers
 

 Key: HIVE-9112
 URL: https://issues.apache.org/jira/browse/HIVE-9112
 Project: Hive
  Issue Type: Bug
Reporter: Chao
Assignee: Ted Xu

 Some queries may generate different results depending on the number of 
 reducers, for example, tests like ppd_multi_insert.q, join_nullsafe.q, 
 subquery_in.q, etc.
 Take subquery_in.q as example, if we add
 {noformat}
 set mapred.reduce.tasks=3;
 {noformat}
 to this test file, the result will be different (and wrong):
 {noformat}
 @@ -903,5 +903,3 @@ where li.l_linenumber = 1 and
  POSTHOOK: type: QUERY
  POSTHOOK: Input: default@lineitem
   A masked pattern was here 
 -108570 8571
 -4297   1798
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-9306) Let Context.isLocalOnlyExecutionMode() return false if execution engine is Spark [Spark Branch]


 [ 
https://issues.apache.org/jira/browse/HIVE-9306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-9306:
--
Attachment: HIVE-9306.2-spark.patch

 Let Context.isLocalOnlyExecutionMode() return false if execution engine is 
 Spark [Spark Branch]
 ---

 Key: HIVE-9306
 URL: https://issues.apache.org/jira/browse/HIVE-9306
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Attachments: HIVE-9306.1-spark.patch, HIVE-9306.2-spark.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-9306) Let Context.isLocalOnlyExecutionMode() return false if execution engine is Spark [Spark Branch]


 [ 
https://issues.apache.org/jira/browse/HIVE-9306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-9306:
--
Attachment: HIVE-9306.3-spark.patch

 Let Context.isLocalOnlyExecutionMode() return false if execution engine is 
 Spark [Spark Branch]
 ---

 Key: HIVE-9306
 URL: https://issues.apache.org/jira/browse/HIVE-9306
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Attachments: HIVE-9306.1-spark.patch, HIVE-9306.2-spark.patch, 
 HIVE-9306.3-spark.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-9306) Let Context.isLocalOnlyExecutionMode() return false if execution engine is Spark [Spark Branch]


 [ 
https://issues.apache.org/jira/browse/HIVE-9306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-9306:
--
Attachment: (was: HIVE-9306.3-spark.patch)

 Let Context.isLocalOnlyExecutionMode() return false if execution engine is 
 Spark [Spark Branch]
 ---

 Key: HIVE-9306
 URL: https://issues.apache.org/jira/browse/HIVE-9306
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Attachments: HIVE-9306.1-spark.patch, HIVE-9306.2-spark.patch, 
 HIVE-9306.3-spark.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-9306) Let Context.isLocalOnlyExecutionMode() return false if execution engine is Spark [Spark Branch]


 [ 
https://issues.apache.org/jira/browse/HIVE-9306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-9306:
--
Attachment: HIVE-9306.3-spark.patch

 Let Context.isLocalOnlyExecutionMode() return false if execution engine is 
 Spark [Spark Branch]
 ---

 Key: HIVE-9306
 URL: https://issues.apache.org/jira/browse/HIVE-9306
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Attachments: HIVE-9306.1-spark.patch, HIVE-9306.2-spark.patch, 
 HIVE-9306.3-spark.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Resolved] (HIVE-9219) Investigate differences for auto join tests in explain after merge from trunk [Spark Branch]


 [ 
https://issues.apache.org/jira/browse/HIVE-9219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang resolved HIVE-9219.
---
Resolution: Not a Problem

 Investigate differences for auto join tests in explain after merge from trunk 
 [Spark Branch]
 

 Key: HIVE-9219
 URL: https://issues.apache.org/jira/browse/HIVE-9219
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Brock Noland
Assignee: Chao

 {noformat}
 diff --git a/ql/src/test/results/clientpositive/spark/auto_join14.q.out 
 b/ql/src/test/results/clientpositive/spark/auto_join14.q.out
 index cbca649..830314e 100644
 --- a/ql/src/test/results/clientpositive/spark/auto_join14.q.out
 +++ b/ql/src/test/results/clientpositive/spark/auto_join14.q.out
 @@ -38,9 +38,6 @@ STAGE PLANS:
  predicate: (key  100) (type: boolean)
  Statistics: Num rows: 166 Data size: 1763 Basic stats: 
 COMPLETE Column stats: NONE
  Spark HashTable Sink Operator
 -  condition expressions:
 -0 
 -1 {value}
keys:
  0 key (type: string)
  1 key (type: string)
 @@ -62,9 +59,6 @@ STAGE PLANS:
  Map Join Operator
condition map:
 Inner Join 0 to 1
 -  condition expressions:
 -0 {key}
 -1 {value}
keys:
  0 key (type: string)
  1 key (type: string)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (HIVE-9306) Let Context.isLocalOnlyExecutionMode() return false if execution engine is Spark [Spark Branch]

Xuefu Zhang created HIVE-9306:
-

 Summary: Let Context.isLocalOnlyExecutionMode() return false if 
execution engine is Spark [Spark Branch]
 Key: HIVE-9306
 URL: https://issues.apache.org/jira/browse/HIVE-9306
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-9306) Let Context.isLocalOnlyExecutionMode() return false if execution engine is Spark [Spark Branch]


 [ 
https://issues.apache.org/jira/browse/HIVE-9306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-9306:
--
Status: Patch Available  (was: Open)

 Let Context.isLocalOnlyExecutionMode() return false if execution engine is 
 Spark [Spark Branch]
 ---

 Key: HIVE-9306
 URL: https://issues.apache.org/jira/browse/HIVE-9306
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Attachments: HIVE-9306.1-spark.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-9251) SetSparkReducerParallelism is likely to set too small number of reducers [Spark Branch]


[ 
https://issues.apache.org/jira/browse/HIVE-9251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268841#comment-14268841
 ] 

Xuefu Zhang commented on HIVE-9251:
---

It should be okay. Limit is still pushed down in the extra stage introduced by 
order by.

 SetSparkReducerParallelism is likely to set too small number of reducers 
 [Spark Branch]
 ---

 Key: HIVE-9251
 URL: https://issues.apache.org/jira/browse/HIVE-9251
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li
 Attachments: HIVE-9251.1-spark.patch, HIVE-9251.2-spark.patch, 
 HIVE-9251.3-spark.patch


 This may hurt performance or even lead to task failures. For example, spark's 
 netty-based shuffle limits the max frame size to be 2G.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-9306) Let Context.isLocalOnlyExecutionMode() return false if execution engine is Spark [Spark Branch]


 [ 
https://issues.apache.org/jira/browse/HIVE-9306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-9306:
--
Attachment: HIVE-9306.1-spark.patch

 Let Context.isLocalOnlyExecutionMode() return false if execution engine is 
 Spark [Spark Branch]
 ---

 Key: HIVE-9306
 URL: https://issues.apache.org/jira/browse/HIVE-9306
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Attachments: HIVE-9306.1-spark.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-9290) Make some test results deterministic


[ 
https://issues.apache.org/jira/browse/HIVE-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268835#comment-14268835
 ] 

Xuefu Zhang commented on HIVE-9290:
---

It should be okay. Even though an extra stage is introduced, I see the limit is 
still pushed down in the second stage according to the plan.

 Make some test results deterministic
 

 Key: HIVE-9290
 URL: https://issues.apache.org/jira/browse/HIVE-9290
 Project: Hive
  Issue Type: Test
Reporter: Rui Li
Assignee: Rui Li
 Attachments: HIVE-9290.1.patch


 {noformat}
 limit_pushdown.q
 optimize_nullscan.q
 ppd_gby_join.q
 vector_string_concat.q
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-9281) Code cleanup [Spark Branch]


[ 
https://issues.apache.org/jira/browse/HIVE-9281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268466#comment-14268466
 ] 

Xuefu Zhang commented on HIVE-9281:
---

Could you load both versions to RB so that I can just look at the diff between 
the versions?

 Code cleanup [Spark Branch]
 ---

 Key: HIVE-9281
 URL: https://issues.apache.org/jira/browse/HIVE-9281
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Szehon Ho
Assignee: Szehon Ho
 Attachments: HIVE-9281-spark.patch, HIVE-9281.2-spark.patch


 In preparation for merge, we need to cleanup the codes.
 This includes removing TODO's, fixing checkstyles, removing commented or 
 unused code, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-9104) windowing.q failed when mapred.reduce.tasks is set to larger than one


 [ 
https://issues.apache.org/jira/browse/HIVE-9104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-9104:
--
Affects Version/s: (was: spark-branch)

 windowing.q failed when mapred.reduce.tasks is set to larger than one
 -

 Key: HIVE-9104
 URL: https://issues.apache.org/jira/browse/HIVE-9104
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Chao
Assignee: Chao

 Test {{windowing.q}} is actually not enabled in Spark branch - in test 
 configurations it is {{windowing.q.q}}.
 I just run this test, and query
 {code}
 -- 12. testFirstLastWithWhere
 select  p_mfgr,p_name, p_size,
 rank() over(distribute by p_mfgr sort by p_name) as r,
 sum(p_size) over (distribute by p_mfgr sort by p_name rows between current 
 row and current row) as s2,
 first_value(p_size) over w1 as f,
 last_value(p_size, false) over w1 as l
 from part
 where p_mfgr = 'Manufacturer#3'
 window w1 as (distribute by p_mfgr sort by p_name rows between 2 preceding 
 and 2 following);
 {code}
 failed with the following exception:
 {noformat}
 java.lang.RuntimeException: Hive Runtime Error while closing operators: null
   at 
 org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.close(SparkReduceRecordHandler.java:446)
   at 
 org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.closeRecordProcessor(HiveReduceFunctionResultList.java:58)
   at 
 org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:108)
   at 
 scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
   at 
 org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115)
   at 
 org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115)
   at org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1390)
   at org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1390)
   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
   at org.apache.spark.scheduler.Task.run(Task.scala:56)
   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 Caused by: java.util.NoSuchElementException
   at java.util.ArrayDeque.getFirst(ArrayDeque.java:318)
   at 
 org.apache.hadoop.hive.ql.udf.generic.GenericUDAFFirstValue$FirstValStreamingFixedWindow.terminate(GenericUDAFFirstValue.java:290)
   at 
 org.apache.hadoop.hive.ql.udf.ptf.WindowingTableFunction.finishPartition(WindowingTableFunction.java:413)
   at 
 org.apache.hadoop.hive.ql.exec.PTFOperator$PTFInvocation.finishPartition(PTFOperator.java:337)
   at org.apache.hadoop.hive.ql.exec.PTFOperator.closeOp(PTFOperator.java:95)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
   at 
 org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.close(SparkReduceRecordHandler.java:431)
   ... 15 more
 {noformat}
 We need to find out:
 - Since which commit this test started failing, and
 - Why it fails



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-9281) Code cleanup [Spark Branch]


[ 
https://issues.apache.org/jira/browse/HIVE-9281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268443#comment-14268443
 ] 

Xuefu Zhang commented on HIVE-9281:
---

My eyes got sored after going thru the patch, but +1. It's a big, nice cleanup.

 Code cleanup [Spark Branch]
 ---

 Key: HIVE-9281
 URL: https://issues.apache.org/jira/browse/HIVE-9281
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Szehon Ho
Assignee: Szehon Ho
 Attachments: HIVE-9281-spark.patch


 In preparation for merge, we need to cleanup the codes.
 This includes removing TODO's, fixing checkstyles, removing commented or 
 unused code, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-9281) Code cleanup [Spark Branch]


[ 
https://issues.apache.org/jira/browse/HIVE-9281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268523#comment-14268523
 ] 

Xuefu Zhang commented on HIVE-9281:
---

+1

 Code cleanup [Spark Branch]
 ---

 Key: HIVE-9281
 URL: https://issues.apache.org/jira/browse/HIVE-9281
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Szehon Ho
Assignee: Szehon Ho
 Attachments: HIVE-9281-spark.patch, HIVE-9281.2-spark.patch


 In preparation for merge, we need to cleanup the codes.
 This includes removing TODO's, fixing checkstyles, removing commented or 
 unused code, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-9301) Potential null dereference in MoveTask#createTargetPath()


[ 
https://issues.apache.org/jira/browse/HIVE-9301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268564#comment-14268564
 ] 

Xuefu Zhang commented on HIVE-9301:
---

+1

 Potential null dereference in MoveTask#createTargetPath()
 -

 Key: HIVE-9301
 URL: https://issues.apache.org/jira/browse/HIVE-9301
 Project: Hive
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Ted Yu
 Attachments: HIVE-9301.patch


 {code}
 if (mkDirPath != null  !fs.exists(mkDirPath)) {
 {code}
 '' should be used instead of single ampersand.
 If mkDirPath is null, fs.exists() would still be called - resulting in NPE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-9289) TODO : Store user name in session [Spark Branch]


[ 
https://issues.apache.org/jira/browse/HIVE-9289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268105#comment-14268105
 ] 

Xuefu Zhang commented on HIVE-9289:
---

Maybe I don't fully understand the code, but it seems a little concerning that 
session is reused purely based on user name. Hive supports multiple sessions 
for the same user, and we don't want such session is reused.

I also have a feeling that we don't have any session reuse (though we have code 
here and there for that). If that's the case, we'd rather just get rid of the 
code.

[~chengxiang li], could you comment on this?

 TODO : Store user name in session [Spark Branch]
 

 Key: HIVE-9289
 URL: https://issues.apache.org/jira/browse/HIVE-9289
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
 Attachments: HIVE-9289.1-spark.patch


 TODO  : this we need to store the session username somewhere else as 
 getUGIForConf never used the conf SparkSessionManagerImpl.java 
 /hive-exec/src/java/org/apache/hadoop/hive/ql/exec/spark/session line 145 
 Java Task



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Re: Review Request 29655: HIVE-9288 TODO cleanup task1[Spark Branch]

2015-01-07 Thread Xuefu Zhang


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29655/#review67024
---

Ship it!


Ship It!

- Xuefu Zhang


On Jan. 7, 2015, 9:03 a.m., chengxiang li wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/29655/
 ---
 
 (Updated Jan. 7, 2015, 9:03 a.m.)
 
 
 Review request for hive, Szehon Ho and Xuefu Zhang.
 
 
 Bugs: HIVE-9288
 https://issues.apache.org/jira/browse/HIVE-9288
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 clean job status related TODO.
 
 
 Diffs
 -
 
   spark-client/src/main/java/org/apache/hive/spark/client/JobHandle.java 
 fd5daf4 
   spark-client/src/main/java/org/apache/hive/spark/client/JobHandleImpl.java 
 6aeb6b7 
   spark-client/src/main/java/org/apache/hive/spark/client/RemoteDriver.java 
 c873e8a 
 
 Diff: https://reviews.apache.org/r/29655/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 chengxiang li

[jira] [Commented] (HIVE-9288) TODO cleanup task1.[Spark Branch]


[ 
https://issues.apache.org/jira/browse/HIVE-9288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267729#comment-14267729
 ] 

Xuefu Zhang commented on HIVE-9288:
---

+1

 TODO cleanup task1.[Spark Branch]
 -

 Key: HIVE-9288
 URL: https://issues.apache.org/jira/browse/HIVE-9288
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li
Priority: Minor
  Labels: Spark-M5
 Attachments: HIVE-9288.1-spark.patch


 cleanup TODO for job status related class if available before merge back to 
 trunk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-9110) Performance of SELECT COUNT(*) FROM store_sales WHERE ss_item_sk IS NOT NULL [Spark Branch]


 [ 
https://issues.apache.org/jira/browse/HIVE-9110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-9110:
--
Assignee: Rui Li  (was: Chao)

 Performance of SELECT COUNT(*) FROM store_sales WHERE ss_item_sk IS NOT NULL 
 [Spark Branch]
 ---

 Key: HIVE-9110
 URL: https://issues.apache.org/jira/browse/HIVE-9110
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Brock Noland
Assignee: Rui Li

 The query 
 {noformat}
 SELECT COUNT(*) FROM store_sales WHERE ss_item_sk IS NOT NULL
 {noformat}
 could benefit from performance enhancements.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-9251) SetSparkReducerParallelism is likely to set too small number of reducers [Spark Branch]


[ 
https://issues.apache.org/jira/browse/HIVE-9251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268017#comment-14268017
 ] 

Xuefu Zhang commented on HIVE-9251:
---

Besides HIVE-9290, it seems that golden files for limit_pushdown.q and 
outer_join_ppr.q also need to be updated.

+1 for the code change.

 SetSparkReducerParallelism is likely to set too small number of reducers 
 [Spark Branch]
 ---

 Key: HIVE-9251
 URL: https://issues.apache.org/jira/browse/HIVE-9251
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li
 Attachments: HIVE-9251.1-spark.patch, HIVE-9251.2-spark.patch, 
 HIVE-9251.3-spark.patch


 This may hurt performance or even lead to task failures. For example, spark's 
 netty-based shuffle limits the max frame size to be 2G.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-9288) TODO cleanup task1.[Spark Branch]


 [ 
https://issues.apache.org/jira/browse/HIVE-9288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-9288:
--
   Resolution: Fixed
Fix Version/s: spark-branch
   Status: Resolved  (was: Patch Available)

Committed to Spark branch. Thanks, Chengxiang.

 TODO cleanup task1.[Spark Branch]
 -

 Key: HIVE-9288
 URL: https://issues.apache.org/jira/browse/HIVE-9288
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li
Priority: Minor
  Labels: Spark-M5
 Fix For: spark-branch

 Attachments: HIVE-9288.1-spark.patch


 cleanup TODO for job status related class if available before merge back to 
 trunk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-9288) TODO cleanup task1.[Spark Branch]


 [ 
https://issues.apache.org/jira/browse/HIVE-9288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-9288:
--
Status: Patch Available  (was: Open)

 TODO cleanup task1.[Spark Branch]
 -

 Key: HIVE-9288
 URL: https://issues.apache.org/jira/browse/HIVE-9288
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li
Priority: Minor
  Labels: Spark-M5
 Attachments: HIVE-9288.1-spark.patch


 cleanup TODO for job status related class if available before merge back to 
 trunk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-6173) Beeline doesn't accept --hiveconf option as Hive CLI does

[
https://issues.apache.org/jira/browse/HIVE-6173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14266711#comment-14266711
]

Xuefu Zhang commented on HIVE-6173:
---

Aha. I see. These undocumented properties, such as maxHeight and trimScripts,
are not internal, but unknown to majority of users (probably due to lack of
documentation). While it's nice to have them documented, but doing so requires
work load from the community. Not just write a few works about them, but also
ensure they are doing what they are supposed to do. Here I give some
descriptions w/o the merit of guaranty:

1. showElapsedTime -- whether to log elapsed time at command prompt. Default
true.
2. maxHeight, maxWidth -- maximum height/width of the output. Default, the
height/width of the terminal.
3. timeout -- unused
4. trimScripts -- whether to trim leading/trailing spaces/tabs in the script.
Default true.
5. allowMultiLineCommand -- whether to allow multi-line commands. Default true.

Beeline doesn't accept --hiveconf option as Hive CLI does
-

Key: HIVE-6173
URL: https://issues.apache.org/jira/browse/HIVE-6173
Project: Hive
Issue Type: Improvement
Components: CLI
Affects Versions: 0.10.0, 0.11.0, 0.12.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
Labels: TODOC13
Fix For: 0.13.0

Attachments: HIVE-6173.1.patch, HIVE-6173.2.patch, HIVE-6173.patch

{code}
beeline -u jdbc:hive2:// --hiveconf a=b
Usage: java org.apache.hive.cli.beeline.BeeLine
{code}
Since Beeline is replacing Hive CLI, it should support this command line
option as well.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (HIVE-9267) Ensure custom UDF works with Spark [Spark Branch]

Xuefu Zhang created HIVE-9267:
-

 Summary: Ensure custom UDF works with Spark [Spark Branch]
 Key: HIVE-9267
 URL: https://issues.apache.org/jira/browse/HIVE-9267
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang


Create or add auto qtest if necessary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-9267) Ensure custom UDF works with Spark [Spark Branch]


 [ 
https://issues.apache.org/jira/browse/HIVE-9267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-9267:
--
Status: Patch Available  (was: Open)

 Ensure custom UDF works with Spark [Spark Branch]
 -

 Key: HIVE-9267
 URL: https://issues.apache.org/jira/browse/HIVE-9267
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Attachments: HIVE-9267.1-spark.patch


 Create or add auto qtest if necessary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-9267) Ensure custom UDF works with Spark [Spark Branch]


 [ 
https://issues.apache.org/jira/browse/HIVE-9267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-9267:
--
Attachment: HIVE-9267.1-spark.patch

 Ensure custom UDF works with Spark [Spark Branch]
 -

 Key: HIVE-9267
 URL: https://issues.apache.org/jira/browse/HIVE-9267
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Attachments: HIVE-9267.1-spark.patch


 Create or add auto qtest if necessary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-9251) SetSparkReducerParallelism is likely to set too small number of reducers [Spark Branch]


[ 
https://issues.apache.org/jira/browse/HIVE-9251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267223#comment-14267223
 ] 

Xuefu Zhang commented on HIVE-9251:
---

Hi Rui, for our unit test, the input size and cluster are all fixed. It 
shouldn't matter whether reducer count is exposed in the plan. As to the 
question of whether or not, we briefly discussed about this today and we will 
try to use the same RSC with query execution for explain query. If this can be 
nicely shared, it seems okay to have it in the plan. Let me know if I missed 
anything. 

 SetSparkReducerParallelism is likely to set too small number of reducers 
 [Spark Branch]
 ---

 Key: HIVE-9251
 URL: https://issues.apache.org/jira/browse/HIVE-9251
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li
 Attachments: HIVE-9251.1-spark.patch, HIVE-9251.2-spark.patch


 This may hurt performance or even lead to task failures. For example, spark's 
 netty-based shuffle limits the max frame size to be 2G.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-9251) SetSparkReducerParallelism is likely to set too small number of reducers [Spark Branch]


[ 
https://issues.apache.org/jira/browse/HIVE-9251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267243#comment-14267243
 ] 

Xuefu Zhang commented on HIVE-9251:
---

The patch looks good. One question though: (-1, -1) is returned for get memory 
and core call, which makes me wonder what's the behavior on Hive side if that's 
the case. Should we somehow safeguard on this?

 SetSparkReducerParallelism is likely to set too small number of reducers 
 [Spark Branch]
 ---

 Key: HIVE-9251
 URL: https://issues.apache.org/jira/browse/HIVE-9251
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li
 Attachments: HIVE-9251.1-spark.patch, HIVE-9251.2-spark.patch


 This may hurt performance or even lead to task failures. For example, spark's 
 netty-based shuffle limits the max frame size to be 2G.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-9154) Cache pathToPartitionInfo in context aware record reader


 [ 
https://issues.apache.org/jira/browse/HIVE-9154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-9154:
--
   Resolution: Fixed
Fix Version/s: (was: spark-branch)
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Jimmy.

 Cache pathToPartitionInfo in context aware record reader
 

 Key: HIVE-9154
 URL: https://issues.apache.org/jira/browse/HIVE-9154
 Project: Hive
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
 Fix For: 0.15.0

 Attachments: HIVE-9154.1-spark.patch, HIVE-9154.1-spark.patch, 
 HIVE-9154.2.patch, HIVE-9154.3.patch


 This is similar to HIVE-9127.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-9219) Investigate differences for auto join tests in explain after merge from trunk [Spark Branch]


[ 
https://issues.apache.org/jira/browse/HIVE-9219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267299#comment-14267299
 ] 

Xuefu Zhang commented on HIVE-9219:
---

[~csun], anything to be done here? If not, we just close this as not a 
problem then.

 Investigate differences for auto join tests in explain after merge from trunk 
 [Spark Branch]
 

 Key: HIVE-9219
 URL: https://issues.apache.org/jira/browse/HIVE-9219
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Brock Noland
Assignee: Chao

 {noformat}
 diff --git a/ql/src/test/results/clientpositive/spark/auto_join14.q.out 
 b/ql/src/test/results/clientpositive/spark/auto_join14.q.out
 index cbca649..830314e 100644
 --- a/ql/src/test/results/clientpositive/spark/auto_join14.q.out
 +++ b/ql/src/test/results/clientpositive/spark/auto_join14.q.out
 @@ -38,9 +38,6 @@ STAGE PLANS:
  predicate: (key  100) (type: boolean)
  Statistics: Num rows: 166 Data size: 1763 Basic stats: 
 COMPLETE Column stats: NONE
  Spark HashTable Sink Operator
 -  condition expressions:
 -0 
 -1 {value}
keys:
  0 key (type: string)
  1 key (type: string)
 @@ -62,9 +59,6 @@ STAGE PLANS:
  Map Join Operator
condition map:
 Inner Join 0 to 1
 -  condition expressions:
 -0 {key}
 -1 {value}
keys:
  0 key (type: string)
  1 key (type: string)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-9251) SetSparkReducerParallelism is likely to set too small number of reducers [Spark Branch]


[ 
https://issues.apache.org/jira/browse/HIVE-9251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267293#comment-14267293
 ] 

Xuefu Zhang commented on HIVE-9251:
---

I see it in the code now. Patch looks good. I just had one minor 
comment/question on RB.

 SetSparkReducerParallelism is likely to set too small number of reducers 
 [Spark Branch]
 ---

 Key: HIVE-9251
 URL: https://issues.apache.org/jira/browse/HIVE-9251
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li
 Attachments: HIVE-9251.1-spark.patch, HIVE-9251.2-spark.patch


 This may hurt performance or even lead to task failures. For example, spark's 
 netty-based shuffle limits the max frame size to be 2G.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-9243) Static Map in IOContext is not thread safe