[jira] [Created] (HIVE-10434) Cancel connection to HS2 when remote Spark driver process has failed [Spark Branch]
Chao Sun created HIVE-10434: --- Summary: Cancel connection to HS2 when remote Spark driver process has failed [Spark Branch] Key: HIVE-10434 URL: https://issues.apache.org/jira/browse/HIVE-10434 Project: Hive Issue Type: Improvement Components: Spark Affects Versions: 1.2.0 Reporter: Chao Sun Assignee: Chao Sun Currently in HoS, SparkClientImpl first launches a remote driver process, and then waits for it to connect back to HS2. However, in certain situations (for instance, a permission issue), the remote process may fail and exit with an error code. In this situation, the HS2 process will still wait for the process to connect, and will wait for a full timeout period before it throws the exception. What makes it worse, the user may need to wait for two timeout periods: one for SparkSetReducerParallelism, and another for the actual Spark job. This could be very annoying. We should cancel the timeout task as soon as we find out that the process has failed, and set the promise as failed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
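The mechanism this JIRA asks for — cancel the pending timeout task and fail the connection promise the moment the launched driver is known to have exited with an error — can be sketched with plain java.util.concurrent types. This is an illustrative sketch only; the names (`DriverWatchdog`, `watch`, `connectPromise`) are hypothetical and not the actual SparkClientImpl/RpcServer API:

```java
import java.util.concurrent.*;

// Illustrative sketch of the proposed fix; all names here are hypothetical,
// not Hive's actual SparkClientImpl/RpcServer API.
public class DriverWatchdog {

    /** Fail the connect promise early if the driver dies before connecting. */
    public static void watch(CompletableFuture<Integer> exitCode,
                             CompletableFuture<String> connectPromise,
                             ScheduledFuture<?> timeoutTask) {
        exitCode.thenAccept(code -> {
            if (code != 0 && !connectPromise.isDone()) {
                timeoutTask.cancel(false);  // stop waiting out the full timeout
                connectPromise.completeExceptionally(
                    new IllegalStateException("driver exited with code " + code));
            }
        });
    }

    public static void main(String[] args) throws Exception {
        ScheduledExecutorService sched = Executors.newSingleThreadScheduledExecutor();
        CompletableFuture<String> connect = new CompletableFuture<>();
        // The long timeout period the user would otherwise have to sit through.
        ScheduledFuture<?> timeout = sched.schedule(
            () -> connect.completeExceptionally(new TimeoutException("no connection")),
            90, TimeUnit.SECONDS);
        CompletableFuture<Integer> exitCode = new CompletableFuture<>();
        watch(exitCode, connect, timeout);

        exitCode.complete(1);  // simulate the driver failing right after launch
        try {
            connect.get();
        } catch (ExecutionException e) {
            System.out.println("failed fast: " + e.getCause().getMessage());
        }
        sched.shutdownNow();
    }
}
```

The key point is that the promise is failed by whichever event happens first — child exit or timeout — so a fast-failing driver no longer costs two full timeout periods.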
Re: Review Request 33422: HIVE-10434 - Cancel connection when remote Spark driver process has failed [Spark Branch]
On April 22, 2015, 12:38 a.m., Marcelo Vanzin wrote: spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java, line 172 https://reviews.apache.org/r/33422/diff/1/?file=938965#file938965line172 This will throw an exception if the child process exits with a non-zero status after the RSC connects back to HS2. I don't think you want that. Oh yes. I forgot that case. - Chao --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33422/#review81103 --- On April 22, 2015, 12:30 a.m., Chao Sun wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33422/ --- (Updated April 22, 2015, 12:30 a.m.) Review request for hive and Marcelo Vanzin. Bugs: HIVE-10434 https://issues.apache.org/jira/browse/HIVE-10434 Repository: hive-git Description --- This patch cancels the connection from HS2 to the remote process once the latter has failed and exited with an error code, to avoid a potentially long timeout. It adds a new public method cancelClient to the RpcServer class - not sure whether there's an easier way to do this. Diffs - spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 71e432d spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java 32d4c46 Diff: https://reviews.apache.org/r/33422/diff/ Testing --- Tested on my own cluster, and it worked. Thanks, Chao Sun
Re: Review Request 33422: HIVE-10434 - Cancel connection when remote Spark driver process has failed [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33422/ --- (Updated April 22, 2015, 1:25 a.m.) Review request for hive and Marcelo Vanzin. Bugs: HIVE-10434 https://issues.apache.org/jira/browse/HIVE-10434 Repository: hive-git Description --- This patch cancels the connection from HS2 to the remote process once the latter has failed and exited with an error code, to avoid a potentially long timeout. It adds a new public method cancelClient to the RpcServer class - not sure whether there's an easier way to do this. Diffs (updated) - spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 71e432d spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java 32d4c46 Diff: https://reviews.apache.org/r/33422/diff/ Testing --- Tested on my own cluster, and it worked. Thanks, Chao Sun
[jira] [Created] (HIVE-10433) Cancel connection when remote driver process exited with error code [Spark Branch]
Chao Sun created HIVE-10433: --- Summary: Cancel connection when remote driver process exited with error code [Spark Branch] Key: HIVE-10433 URL: https://issues.apache.org/jira/browse/HIVE-10433 Project: Hive Issue Type: Bug Components: spark-branch Reporter: Chao Sun Currently in HoS, after starting a remote process in SparkClientImpl, it will wait for the process to connect back. However, there are cases where the process may fail and exit with an error code, and thus no connection is attempted. In this situation, the HS2 process will still wait for the connection and eventually time itself out. What makes it worse, the user may need to wait for two timeout periods, one for SparkSetReducerParallelism, and another for the actual Spark job. We should cancel the timeout task and mark the promise as failed once we know that the process has failed.
Review Request 33422: HIVE-10434 - Cancel connection when remote Spark driver process has failed [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33422/ --- Review request for hive and Marcelo Vanzin. Bugs: HIVE-10434 https://issues.apache.org/jira/browse/HIVE-10434 Repository: hive-git Description --- This patch cancels the connection from HS2 to the remote process once the latter has failed and exited with an error code, to avoid a potentially long timeout. It adds a new public method cancelClient to the RpcServer class - not sure whether there's an easier way to do this. Diffs - spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 71e432d spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java 32d4c46 Diff: https://reviews.apache.org/r/33422/diff/ Testing --- Tested on my own cluster, and it worked. Thanks, Chao Sun
Re: VOTE: move to git
+1. Looking forward to seeing it get implemented. On Thu, Apr 16, 2015 at 12:11 AM, Owen O'Malley owen.omal...@gmail.com wrote: +1 Thanks for taking the initiative and starting this. .. Owen On Apr 15, 2015, at 23:46, Sergey Shelukhin ser...@apache.org wrote: Hi. We’ve been discussing this some time ago; this time I'd like to start an official vote about moving the Hive project to git from svn. I volunteer to facilitate the move; that seems to be just filing an INFRA jira, and following instructions such as verifying that the new repo is sane. Please vote: +1 move to git 0 don’t care -1 stay on svn +1. -- Best, Chao
Re: [ANNOUNCE] New Hive Committer - Mithun Radhakrishnan
Congrats Mithun! On Tue, Apr 14, 2015 at 3:29 PM, Chris Drome cdr...@yahoo-inc.com.invalid wrote: Congratulations Mithun! On Tuesday, April 14, 2015 2:57 PM, Carl Steinbach c...@apache.org wrote: The Apache Hive PMC has voted to make Mithun Radhakrishnan a committer on the Apache Hive Project. Please join me in congratulating Mithun. Thanks. - Carl -- Best, Chao
Re: Dataset for Hive
Hi Xiaohe, You can try TPC-DS from https://github.com/hortonworks/hive-testbench. It contains a large number of queries with complex joins. Chao On Wed, Apr 1, 2015 at 9:30 PM, xiaohe lan zombiexco...@gmail.com wrote: Hi All, I am new to Hive. Just set up a 5-node Hadoop environment and want to have a try at HiveQL. Is there any dataset I can download to play with HiveQL? The dataset should have several tables so I can write some complex joins. About 100G should be fine. Thanks, Xiaohe
Re: Request for feedback on work intent for non-equijoin support
Hey Lefty, You need to use the ftp protocol, not http. After clicking the link, you'll need to remove "http://" from the address bar. Best, Chao On Wed, Apr 1, 2015 at 9:41 PM, Lefty Leverenz leftylever...@gmail.com wrote: Andrés, I followed that link and got the dreaded 404 Not Found: The requested URI /pub/torres/Hiperfuse/extended_hiperfuse.pdf was not found on this server. -- Lefty On Wed, Apr 1, 2015 at 7:23 PM, andres.qui...@parc.com wrote: Dear Lefty, Thank you very much for pointing that out and for your initial pointers. Here is the missing link: ftp.parc.com/pub/torres/Hiperfuse/extended_hiperfuse.pdf Regards, Andrés -Original Message- From: Lefty Leverenz [mailto:leftylever...@gmail.com] Sent: Wednesday, April 01, 2015 12:48 AM To: dev@hive.apache.org Subject: Re: Request for feedback on work intent for non-equijoin support Hello Andres, the link to your paper is missing: In our preliminary work, which you can find here (pointer to the paper) ... You can find general information about contributing to Hive in the wiki: Resources for Contributors https://cwiki.apache.org/confluence/display/Hive/Home#Home-ResourcesforContributors , How to Contribute https://cwiki.apache.org/confluence/display/Hive/HowToContribute. -- Lefty On Tue, Mar 31, 2015 at 10:42 PM, andres.qui...@parc.com wrote: Dear Hive development community members, I am interested in learning more about the current support for non-equijoins in Hive and/or other Hadoop SQL engines, and in getting feedback about community interest in more extensive support for such a feature. I intend to work on this challenge, assuming people find it compelling, and I intend to contribute results to the community. Where possible, it would be great to receive feedback and engage in collaborations along the way (for a bit more context, see the postscript of this message). 
My initial goal is to support query conditions such as the following: A.x < B.y, A.x in_range [B.y, B.z], distance(A.x, B.y) < D, where A and B are distinct tables/files. It is my understanding that current support for performing non-equijoins like those above is quite limited, and where some forms are supported (like in Cloudera's Impala), this support is based on doing a potentially expensive cross product join. Depending on the data types involved, I believe that joins with these conditions can be made to be tractable (at least on the average) with join algorithms that exploit properties of the data types, possibly with some pre-scanning of the data. I am asking for feedback on the interest/need in the community for this work, as well as any pointers to similar work. In particular, I would appreciate any answers people could give on the following questions: - Is my understanding of the state of the art in Hive and similar tools accurate? Are there groups currently working on similar or related issues, or tools that already accomplish some or all of what I have proposed? - Is there significant value to the community in the support of such a feature? In other words, are the manual workarounds necessary because of the absence of non-equijoins such as these enough of a pain to justify the work I propose? - Being aware that the potential pre-scanning adds to the cost of the join, and that data could still blow up in the worst case, am I missing any other important considerations and tradeoffs for this problem? - What would be a good avenue to contribute this feature to the community (e.g. as a standalone tool on top of Hadoop, or as a Hive extension or plugin)? - What is the best way to get started in working with the community? Thanks for your attention and any info you can provide! Andres Quiroz P.S. If you are interested in some context, and why/how I am proposing to do this work, please read on. 
I am part of a small project team at PARC working on the general problems of data integration and automated ETL. We have proposed a tool called HiperFuse that is designed to accept declarative, high-level queries in order to produce joined (fused) data sets from multiple heterogeneous raw data sources. In our preliminary work, which you can find here (pointer to the paper), we designed the architecture of the tool and obtained some results separately on the problems of automated data cleansing, data type inference, and query planning. One of the planned prototype implementations of HiperFuse relies on Hadoop MR, and because the declarative language we proposed was closely related to SQL, we thought that we could exploit the existing work in Hive and/or other open-source tools for handling the SQL part and layer our work on top of
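As a concrete illustration of the claim above that a range predicate like A.x in_range [B.y, B.z] need not force a cross product: one side can be sorted once (a pre-scan) and each interval then probed with binary search. A minimal, hypothetical sketch — not HiperFuse or Hive code, just the algorithmic idea:

```java
import java.util.*;

// Hypothetical sketch of a range join A.x IN [B.y, B.z] that avoids a full
// cross product: sort the probe values once, binary-search each interval.
public class RangeJoin {

    /** Returns pairs {x, i} such that lo[i] <= x <= hi[i]. */
    public static List<long[]> join(long[] xs, long[] lo, long[] hi) {
        long[] sorted = xs.clone();
        Arrays.sort(sorted);                       // one O(n log n) pre-scan
        List<long[]> out = new ArrayList<>();
        for (int i = 0; i < lo.length; i++) {
            int from = lowerBound(sorted, lo[i]);  // first x >= lo[i]
            for (int j = from; j < sorted.length && sorted[j] <= hi[i]; j++) {
                out.add(new long[]{sorted[j], i});
            }
        }
        return out;
    }

    /** Index of the first element >= key in a sorted array. */
    private static int lowerBound(long[] a, long key) {
        int lo = 0, hi = a.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (a[mid] < key) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    public static void main(String[] args) {
        long[] xs = {1, 5, 7, 12};
        long[] lo = {4, 10};
        long[] hi = {8, 20};
        for (long[] pair : join(xs, lo, hi)) {
            System.out.println(pair[0] + " matches interval " + pair[1]);
        }
    }
}
```

The cost is sort-plus-output rather than |A| × |B|, though as the proposal notes the output itself can still blow up in the worst case when every interval covers most values.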
Re: Review Request 32692: HIVE-10083 SMBJoin fails in case one table is uninitialized
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32692/#review78380 --- Ship it! Ship It! - Chao Sun On March 31, 2015, 5:01 p.m., Na Yang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32692/ --- (Updated March 31, 2015, 5:01 p.m.) Review request for hive, Brock Noland, Chao Sun, and Xuefu Zhang. Bugs: 10083 https://issues.apache.org/jira/browse/10083 Repository: hive-git Description --- When one table is uninitialized, smallTblFileNames is an empty list, which causes the IndexOutOfBoundsException when smallTblFileNames.get(toAddSmallIndex) is called. Diffs - ql/src/java/org/apache/hadoop/hive/ql/optimizer/AbstractBucketJoinProc.java 70c23a6 Diff: https://reviews.apache.org/r/32692/diff/ Testing --- Thanks, Na Yang
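The bug described above is an unguarded index into a possibly empty file list. A minimal illustration of the guard (names are illustrative, not the actual AbstractBucketJoinProc code):

```java
import java.util.*;

// Minimal illustration of the HIVE-10083 failure mode: indexing into an
// empty small-table file list. Names are illustrative, not Hive's.
public class SmallTableGuard {

    /** Returns the chosen file name, or null when the table is uninitialized. */
    public static String pick(List<String> smallTblFileNames, int toAddSmallIndex) {
        // Guard: an uninitialized table yields an empty file list, so
        // get(toAddSmallIndex) would throw IndexOutOfBoundsException.
        if (smallTblFileNames.isEmpty()) {
            return null;  // caller can fall back to the non-bucketed join path
        }
        return smallTblFileNames.get(toAddSmallIndex);
    }

    public static void main(String[] args) {
        System.out.println(pick(Collections.emptyList(), 0));
        System.out.println(pick(Arrays.asList("part-0", "part-1"), 1));
    }
}
```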
Re: [ANNOUNCE] New Hive Committers - Jimmy Xiang, Matt McCline, and Sergio Pena
Congrats everyone! On Mon, Mar 23, 2015 at 11:33 AM, Alexander Pivovarov apivova...@gmail.com wrote: Congrats to Matt, Jimmy and Sergio! On Mon, Mar 23, 2015 at 11:30 AM, Chaoyu Tang ct...@cloudera.com wrote: Congratulations to Jimmy and Sergio! On Mon, Mar 23, 2015 at 2:08 PM, Carl Steinbach c...@apache.org wrote: The Apache Hive PMC has voted to make Jimmy Xiang, Matt McCline, and Sergio Pena committers on the Apache Hive Project. Please join me in congratulating Jimmy, Matt, and Sergio. Thanks. - Carl -- Best, Chao
Re: Review Request 31942: HIVE-9930 fix QueryPlan.makeQueryId time format
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31942/#review76102 --- Ship it! Ship It! - Chao Sun On March 11, 2015, 5:48 p.m., Alexander Pivovarov wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31942/ --- (Updated March 11, 2015, 5:48 p.m.) Review request for hive, Jason Dere, Thejas Nair, and Xuefu Zhang. Bugs: HIVE-9930 https://issues.apache.org/jira/browse/HIVE-9930 Repository: hive-git Description --- HIVE-9930 fix QueryPlan.makeQueryId time format Diffs - ql/src/java/org/apache/hadoop/hive/ql/QueryPlan.java 8e1e6e2b4f29a2499845df1f565dbb6859b262a8 Diff: https://reviews.apache.org/r/31942/diff/ Testing --- Thanks, Alexander Pivovarov
Re: [VOTE] Apache Hive 1.1.0 Release Candidate 3
+1 1. Build src with hadoop-1 and hadoop-2, tested the generated bin with some DDL/DML queries. 2. Tested the bin with some DDL/DML queries. 3. Verified signature for bin and src, both asc and md5. Chao On Thu, Feb 19, 2015 at 1:55 AM, Szehon Ho sze...@cloudera.com wrote: +1 1. Verified signature for bin and src 2. Built src with hadoop2 3. Ran few queries from beeline with src 4. Ran few queries from beeline with bin 5. Verified no SNAPSHOT deps Thanks Szehon On Wed, Feb 18, 2015 at 10:03 PM, Xuefu Zhang xzh...@cloudera.com wrote: +1 1. downloaded the src tarball and built w/ -Phadoop-1/2 2. verified no binary (jars) in the src tarball On Wed, Feb 18, 2015 at 8:56 PM, Brock Noland br...@cloudera.com wrote: +1 verified sigs, hashes, created tables, ran MR on YARN jobs On Wed, Feb 18, 2015 at 8:54 PM, Brock Noland br...@cloudera.com wrote: Apache Hive 1.1.0 Release Candidate 3 is available here: http://people.apache.org/~brock/apache-hive-1.1.0-rc3/ Maven artifacts are available here: https://repository.apache.org/content/repositories/orgapachehive-1026/ Source tag for RC3 is at: http://svn.apache.org/repos/asf/hive/tags/release-1.1.0-rc3/ My key is located here: https://people.apache.org/keys/group/hive.asc Voting will conclude in 72 hours -- Best, Chao
Re: [VOTE] Apache Hive 1.1.0 Release Candidate 2
I tested apache-hive.1.1.0-bin and I also got the same error as Szehon reported. On Wed, Feb 18, 2015 at 3:48 PM, Brock Noland br...@cloudera.com wrote: Hi, On Wed, Feb 18, 2015 at 2:21 PM, Gopal Vijayaraghavan gop...@apache.org wrote: Hi, From the release branch, I noticed that the hive-exec.jar now contains a copy of guava-14 without any relocations. The hive spark-client pom.xml adds guava as a lib jar instead of shading it in. https://github.com/apache/hive/blob/branch-1.1/spark-client/pom.xml#L111 That seems to be a great approach for guava compat issues across execution engines. Spark itself relocates guava-14 for compatibility with Hive-on-Spark(??). https://issues.apache.org/jira/browse/SPARK-2848 Does any of the same compatibility issues occur when using a hive-exec.jar containing guava-14 on MRv2 (which has guava-11 in the classpath)? Not that I am aware of. I've tested it on top of MRv2 a number of times and I think the unit tests also excercise these code paths. Cheers, Gopal On 2/17/15, 3:14 PM, Brock Noland br...@cloudera.com wrote: Apache Hive 1.1.0 Release Candidate 2 is available here: http://people.apache.org/~brock/apache-hive-1.1.0-rc2/ Maven artifacts are available here: https://repository.apache.org/content/repositories/orgapachehive-1025/ Source tag for RC1 is at: http://svn.apache.org/repos/asf/hive/tags/release-1.1.0-rc2/ My key is located here: https://people.apache.org/keys/group/hive.asc Voting will conclude in 72 hours -- Best, Chao
Re: [VOTE] Apache Hive 1.1.0 Release Candidate 1
- Tried to build the src for both hadoop-1 and hadoop-2, and some simple DDL/DML queries from generated bin. They worked fine. - Tried to run some simple DDL/DML queries from the bin, and worked fine. - Verified PGP signature and MD5 sum for both src and bin. They are OK. +1 On Mon, Feb 16, 2015 at 9:08 PM, Brock Noland br...@cloudera.com wrote: Apache Hive 1.1.0 Release Candidate 0 is available here: http://people.apache.org/~brock/apache-hive-1.1.0-rc1/ Maven artifacts are available here: https://repository.apache.org/content/repositories/orgapachehive-1024/ Source tag for RC1 is at: http://svn.apache.org/repos/asf/hive/tags/release-1.1.0-rc1/ My key is located here: https://people.apache.org/keys/group/hive.asc Voting will conclude in 72 hours -- Best, Chao
Re: Hive 1.0 patch 9481
No, there's no such way. You need to rebuild the project from source after applying the patch. Please check out https://cwiki.apache.org/confluence/display/Hive/HowToContribute for more details. Chao On Thu, Feb 12, 2015 at 4:05 AM, Srinivas Thunga srinivas.thu...@gmail.com wrote: Hi Team, Is there any way that we can apply the patch directly on Hive instead of the source? I am using the hive-0.14 binary, so I need to apply the patch directly in Hive to get the support for selecting columns for insert. *Thanks Regards,* *Srinivas T* -- Best, Chao
Re: Review Request 30388: HIVE-9103 - Support backup task for join related optimization [Spark Branch]
On Jan. 29, 2015, 4:20 a.m., Xuefu Zhang wrote: ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java, line 295 https://reviews.apache.org/r/30388/diff/1/?file=839499#file839499line295 childrenBackupTasks or backChildrenTasks? I suggest more consistent variable/method names. Since the noun is task, I suggest child. Good point. Will change. On Jan. 29, 2015, 4:20 a.m., Xuefu Zhang wrote: ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java, line 110 https://reviews.apache.org/r/30388/diff/1/?file=839504#file839504line110 In Spark branch - For Spark Will change. - Chao --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30388/#review70150 --- On Jan. 29, 2015, 1:05 a.m., Chao Sun wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30388/ --- (Updated Jan. 29, 2015, 1:05 a.m.) Review request for hive and Xuefu Zhang. Bugs: HIVE-9103 https://issues.apache.org/jira/browse/HIVE-9103 Repository: hive-git Description --- This patch adds a backup task to the map join task. The backup task, which uses common join, will be triggered in case the map join task fails. Note that, no matter how many map joins there are in the SparkTask, we will only generate one backup task. This means that if the original task fails at the very last map join, the whole task will be re-executed. The handling of the backup task is a little different from what MR does, mostly because we convert JOIN to MAPJOIN during the operator plan optimization phase, at which time no task/work exists yet. In the patch, we clone the whole operator tree before the JOIN operator is converted. The cloned operator tree is then processed to generate a separate work tree for a separate backup SparkTask. 
Diffs - ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java 69004dc ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/StageIDsRearranger.java 79c3e02 ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkJoinOptimizer.java d57ceff ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java 9ff47c7 ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkSortMergeJoinFactory.java 6e0ac38 ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java b838bff ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 773cfbd ql/src/java/org/apache/hadoop/hive/ql/parse/spark/OptimizeSparkProcContext.java f7586a4 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 3a7477a ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 0e85990 ql/src/test/results/clientpositive/spark/auto_join25.q.out ab01b8a Diff: https://reviews.apache.org/r/30388/diff/ Testing --- auto_join25.q Thanks, Chao Sun
Re: Review Request 30388: HIVE-9103 - Support backup task for join related optimization [Spark Branch]
/auto_sortmerge_join_13.q.out 7eadcd0 ql/src/test/results/clientpositive/spark/auto_sortmerge_join_14.q.out 984db20 ql/src/test/results/clientpositive/spark/auto_sortmerge_join_15.q.out 2acc323 ql/src/test/results/clientpositive/spark/auto_sortmerge_join_2.q.out f05b0cc ql/src/test/results/clientpositive/spark/auto_sortmerge_join_3.q.out c7d23f8 ql/src/test/results/clientpositive/spark/auto_sortmerge_join_4.q.out f5dc2f7 ql/src/test/results/clientpositive/spark/auto_sortmerge_join_5.q.out 26e7957 ql/src/test/results/clientpositive/spark/auto_sortmerge_join_7.q.out a5c0562 ql/src/test/results/clientpositive/spark/auto_sortmerge_join_8.q.out ef13a40 ql/src/test/results/clientpositive/spark/auto_sortmerge_join_9.q.out a2b98fc ql/src/test/results/clientpositive/spark/bucket_map_join_spark1.q.out 6230bef ql/src/test/results/clientpositive/spark/bucket_map_join_spark2.q.out 1a33625 ql/src/test/results/clientpositive/spark/bucket_map_join_spark3.q.out fed923c ql/src/test/results/clientpositive/spark/bucket_map_join_spark4.q.out 8b5e8d4 ql/src/test/results/clientpositive/spark/bucket_map_join_tez1.q.out 1c81d1b ql/src/test/results/clientpositive/spark/bucket_map_join_tez2.q.out 04a934f ql/src/test/results/clientpositive/spark/bucketsortoptimize_insert_2.q.out 365306e ql/src/test/results/clientpositive/spark/bucketsortoptimize_insert_4.q.out 3846de7 ql/src/test/results/clientpositive/spark/bucketsortoptimize_insert_6.q.out 5b559c4 ql/src/test/results/clientpositive/spark/bucketsortoptimize_insert_7.q.out cefc6aa ql/src/test/results/clientpositive/spark/bucketsortoptimize_insert_8.q.out ca44d7c ql/src/test/results/clientpositive/spark/cross_product_check_2.q.out dda6c38 ql/src/test/results/clientpositive/spark/identity_project_remove_skip.q.out 7238009 ql/src/test/results/clientpositive/spark/infer_bucket_sort_convert_join.q.out 3d4eb18 ql/src/test/results/clientpositive/spark/join28.q.out f23f662 ql/src/test/results/clientpositive/spark/join29.q.out 0b4284c 
ql/src/test/results/clientpositive/spark/join31.q.out a52a8b6 ql/src/test/results/clientpositive/spark/join32.q.out a9d50b4 ql/src/test/results/clientpositive/spark/join32_lessSize.q.out dac9610 ql/src/test/results/clientpositive/spark/join33.q.out a9d50b4 ql/src/test/results/clientpositive/spark/join_reorder4.q.out 5cc30f7 ql/src/test/results/clientpositive/spark/join_star.q.out 69c2fd7 ql/src/test/results/clientpositive/spark/mapjoin_decimal.q.out b681e5f ql/src/test/results/clientpositive/spark/mapjoin_filter_on_outerjoin.q.out 0271f97 ql/src/test/results/clientpositive/spark/mapjoin_hook.q.out 7aa8ce9 ql/src/test/results/clientpositive/spark/mapjoin_mapjoin.q.out 65a7d06 ql/src/test/results/clientpositive/spark/mapjoin_memcheck.q.out 14f316c ql/src/test/results/clientpositive/spark/mapjoin_subquery.q.out 2d1e7a7 ql/src/test/results/clientpositive/spark/mapjoin_subquery2.q.out a757d0b ql/src/test/results/clientpositive/spark/mapjoin_test_outer.q.out 7143348 ql/src/test/results/clientpositive/spark/multi_join_union.q.out bda569d ql/src/test/results/clientpositive/spark/parquet_join.q.out 390aeb1 ql/src/test/results/clientpositive/spark/reduce_deduplicate_exclude_join.q.out 19ab4c8 ql/src/test/results/clientpositive/spark/smb_mapjoin_17.q.out bd3a6a1 ql/src/test/results/clientpositive/spark/smb_mapjoin_25.q.out cb811ed ql/src/test/results/clientpositive/spark/subquery_multiinsert.q.java1.7.out 92a8595 ql/src/test/results/clientpositive/spark/vector_decimal_mapjoin.q.out 5ec95c2 ql/src/test/results/clientpositive/spark/vector_left_outer_join.q.out ca8918a ql/src/test/results/clientpositive/spark/vector_mapjoin_reduce.q.out 02c1fc6 ql/src/test/results/clientpositive/spark/vectorized_mapjoin.q.out 237df98 ql/src/test/results/clientpositive/spark/vectorized_nested_mapjoin.q.out f8e8ba7 Diff: https://reviews.apache.org/r/30388/diff/ Testing --- auto_join25.q Thanks, Chao Sun
Re: Review Request 30388: HIVE-9103 - Support backup task for join related optimization [Spark Branch]
ql/src/test/results/clientpositive/spark/smb_mapjoin_17.q.out bd3a6a1 ql/src/test/results/clientpositive/spark/smb_mapjoin_25.q.out cb811ed ql/src/test/results/clientpositive/spark/subquery_multiinsert.q.java1.7.out 92a8595 ql/src/test/results/clientpositive/spark/vector_decimal_mapjoin.q.out 5ec95c2 ql/src/test/results/clientpositive/spark/vector_left_outer_join.q.out ca8918a ql/src/test/results/clientpositive/spark/vector_mapjoin_reduce.q.out 02c1fc6 ql/src/test/results/clientpositive/spark/vectorized_mapjoin.q.out 237df98 ql/src/test/results/clientpositive/spark/vectorized_nested_mapjoin.q.out f8e8ba7 ql/src/test/results/clientpositive/vector_mapjoin_reduce.q.out 6f11b8c Diff: https://reviews.apache.org/r/30388/diff/ Testing --- auto_join25.q Thanks, Chao Sun
Re: [ANNOUNCE] New Hive PMC Members - Szehon Ho, Vikram Dixit, Jason Dere, Owen O'Malley and Prasanth Jayachandran
Congrats!!! On Wed, Jan 28, 2015 at 1:21 PM, Vaibhav Gumashta vgumas...@hortonworks.com wrote: Congratulations e’one! —Vaibhav On Jan 28, 2015, at 1:20 PM, Xuefu Zhang xzh...@cloudera.commailto: xzh...@cloudera.com wrote: Congratulations to all! --Xuefu On Wed, Jan 28, 2015 at 1:15 PM, Carl Steinbach c...@apache.orgmailto: c...@apache.org wrote: I am pleased to announce that Szehon Ho, Vikram Dixit, Jason Dere, Owen O'Malley and Prasanth Jayachandran have been elected to the Hive Project Management Committee. Please join me in congratulating these new PMC members! Thanks. - Carl -- Best, Chao
Review Request 30388: HIVE-9103 - Support backup task for join related optimization [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30388/ --- Review request for hive and Xuefu Zhang. Bugs: HIVE-9103 https://issues.apache.org/jira/browse/HIVE-9103 Repository: hive-git Description --- This patch adds a backup task to the map join task. The backup task, which uses common join, will be triggered in case the map join task fails. Note that, no matter how many map joins there are in the SparkTask, we will only generate one backup task. This means that if the original task fails at the very last map join, the whole task will be re-executed. The handling of the backup task is a little different from what MR does, mostly because we convert JOIN to MAPJOIN during the operator plan optimization phase, at which time no task/work exists yet. In the patch, we clone the whole operator tree before the JOIN operator is converted. The cloned operator tree is then processed to generate a separate work tree for a separate backup SparkTask. Diffs - ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java 69004dc ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/StageIDsRearranger.java 79c3e02 ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkJoinOptimizer.java d57ceff ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java 9ff47c7 ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkSortMergeJoinFactory.java 6e0ac38 ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java b838bff ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 773cfbd ql/src/java/org/apache/hadoop/hive/ql/parse/spark/OptimizeSparkProcContext.java f7586a4 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 3a7477a ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 0e85990 ql/src/test/results/clientpositive/spark/auto_join25.q.out ab01b8a Diff: https://reviews.apache.org/r/30388/diff/ Testing --- auto_join25.q Thanks, Chao Sun
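The control flow described in this patch — run the optimized map-join task, and on failure re-execute a single common-join backup from the start — can be sketched generically. This is illustrative only; the real wiring lives in SparkMapJoinResolver, and the names below are hypothetical:

```java
import java.util.concurrent.Callable;

// Hedged sketch of the backup-task pattern described in the review request;
// names are illustrative, not Hive's actual task classes.
public class BackupTask {

    /** Run the optimized task; if it throws, re-execute the backup task. */
    public static <T> T runWithBackup(Callable<T> mapJoinTask,
                                      Callable<T> commonJoinBackup) throws Exception {
        try {
            return mapJoinTask.call();
        } catch (Exception primaryFailure) {
            // Only one backup exists per SparkTask, so even a failure at the
            // very last map join re-runs the whole task from the beginning.
            return commonJoinBackup.call();
        }
    }

    public static void main(String[] args) throws Exception {
        String result = runWithBackup(
            () -> { throw new IllegalStateException("map join failed"); },
            () -> "rows from common join");
        System.out.println(result);
    }
}
```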
Re: [VOTE] Apache Hive 1.0 Release Candidate 1
- Tried to build the src for both hadoop-1 and hadoop-2, and some simple queries from generated bin. They worked fine. - Tried to run some simple queries from the bin, and worked fine. - Checked RELEASE_NOTES, NOTICE, README.txt. The copyright in the NOTICE file needs to be updated to 2008-2015. In README.txt it mentions @VERSION@, shouldn't that be a concrete number? - Verified PGP signature and MD5 sum for both src and bin. One minor thing is, for the PGP signature I kept getting a warning saying it's not certified with a trusted signature. Maybe the public key is not updated? Best, Chao On Tue, Jan 27, 2015 at 3:36 PM, Lefty Leverenz leftylever...@gmail.com wrote: Can webhcat-default.xml be updated? Besides 0.11.0 in the defaults for templeton.hive.path and templeton.pig.path (HIVE-8807 https://issues.apache.org/jira/browse/HIVE-8807) there are 0.14.0-SNAPSHOT values for templeton.hive.home and templeton.hcat.home. -- Lefty On Tue, Jan 27, 2015 at 2:28 PM, Vikram Dixit K vikram.di...@gmail.com wrote: Apache Hive 1.0 Release Candidate 1 is available here: http://people.apache.org/~vikram/hive/apache-hive-1.0-rc1/ Maven artifacts are available here: https://repository.apache.org/content/repositories/orgapachehive-1020/ Source tag for RC1 is at: http://svn.apache.org/repos/asf/hive/branches/branch-1.0/ Voting will conclude in 72 hours. Hive PMC Members: Please test and vote. Thanks Vikram. -- Nothing better than when appreciated for hard work. -Mark -- Best, Chao
Review Request 29111: HIVE-9041 - Generate better plan for queries containing both union and multi-insert [Spark Branch]
/results/clientpositive/spark/union33.q.out ql/src/test/results/clientpositive/spark/union4.q.out ql/src/test/results/clientpositive/spark/union5.q.out ql/src/test/results/clientpositive/spark/union6.q.out ql/src/test/results/clientpositive/spark/union7.q.out ql/src/test/results/clientpositive/spark/union8.q.out ql/src/test/results/clientpositive/spark/union9.q.out ql/src/test/results/clientpositive/spark/union_ppr.q.out ql/src/test/results/clientpositive/spark/union_remove_1.q.out ql/src/test/results/clientpositive/spark/union_remove_10.q.out ql/src/test/results/clientpositive/spark/union_remove_11.q.out ql/src/test/results/clientpositive/spark/union_remove_15.q.out ql/src/test/results/clientpositive/spark/union_remove_16.q.out ql/src/test/results/clientpositive/spark/union_remove_17.q.out ql/src/test/results/clientpositive/spark/union_remove_18.q.out ql/src/test/results/clientpositive/spark/union_remove_19.q.out ql/src/test/results/clientpositive/spark/union_remove_2.q.out ql/src/test/results/clientpositive/spark/union_remove_20.q.out ql/src/test/results/clientpositive/spark/union_remove_21.q.out ql/src/test/results/clientpositive/spark/union_remove_24.q.out ql/src/test/results/clientpositive/spark/union_remove_25.q.out ql/src/test/results/clientpositive/spark/union_remove_3.q.out ql/src/test/results/clientpositive/spark/union_remove_4.q.out ql/src/test/results/clientpositive/spark/union_remove_5.q.out ql/src/test/results/clientpositive/spark/union_remove_6.q.out ql/src/test/results/clientpositive/spark/union_remove_7.q.out ql/src/test/results/clientpositive/spark/union_remove_8.q.out ql/src/test/results/clientpositive/spark/union_remove_9.q.out Thanks, Chao Sun
Re: Review Request 29111: HIVE-9041 - Generate better plan for queries containing both union and multi-insert [Spark Branch]
/clientpositive/spark/union30.q.out ql/src/test/results/clientpositive/spark/union33.q.out ql/src/test/results/clientpositive/spark/union4.q.out ql/src/test/results/clientpositive/spark/union5.q.out ql/src/test/results/clientpositive/spark/union6.q.out ql/src/test/results/clientpositive/spark/union7.q.out ql/src/test/results/clientpositive/spark/union8.q.out ql/src/test/results/clientpositive/spark/union9.q.out ql/src/test/results/clientpositive/spark/union_ppr.q.out ql/src/test/results/clientpositive/spark/union_remove_1.q.out ql/src/test/results/clientpositive/spark/union_remove_10.q.out ql/src/test/results/clientpositive/spark/union_remove_11.q.out ql/src/test/results/clientpositive/spark/union_remove_15.q.out ql/src/test/results/clientpositive/spark/union_remove_16.q.out ql/src/test/results/clientpositive/spark/union_remove_17.q.out ql/src/test/results/clientpositive/spark/union_remove_18.q.out ql/src/test/results/clientpositive/spark/union_remove_19.q.out ql/src/test/results/clientpositive/spark/union_remove_2.q.out ql/src/test/results/clientpositive/spark/union_remove_20.q.out ql/src/test/results/clientpositive/spark/union_remove_21.q.out ql/src/test/results/clientpositive/spark/union_remove_24.q.out ql/src/test/results/clientpositive/spark/union_remove_25.q.out ql/src/test/results/clientpositive/spark/union_remove_3.q.out ql/src/test/results/clientpositive/spark/union_remove_4.q.out ql/src/test/results/clientpositive/spark/union_remove_5.q.out ql/src/test/results/clientpositive/spark/union_remove_6.q.out ql/src/test/results/clientpositive/spark/union_remove_7.q.out ql/src/test/results/clientpositive/spark/union_remove_8.q.out ql/src/test/results/clientpositive/spark/union_remove_9.q.out Thanks, Chao Sun
Re: Review Request 29111: HIVE-9041 - Generate better plan for queries containing both union and multi-insert [Spark Branch]
On Dec. 17, 2014, midnight, Xuefu Zhang wrote: ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkReduceSinkMapJoinProc.java, line 207 https://reviews.apache.org/r/29111/diff/1/?file=793109#file793109line207 should we remove this variable completely? Yes, I'll remove it completely. On Dec. 17, 2014, midnight, Xuefu Zhang wrote: ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java, line 93 https://reviews.apache.org/r/29111/diff/1/?file=793110#file793110line93 Original name seems more meaningful. OK, will fix. On Dec. 17, 2014, midnight, Xuefu Zhang wrote: ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java, line 98 https://reviews.apache.org/r/29111/diff/1/?file=793112#file793112line98 Should we keep it? You're right - this was a mistake. On Dec. 17, 2014, midnight, Xuefu Zhang wrote: ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java, line 212 https://reviews.apache.org/r/29111/diff/1/?file=793108#file793108line212 An assert here would be good. OK, will add. On Dec. 17, 2014, midnight, Xuefu Zhang wrote: ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java, line 133 https://reviews.apache.org/r/29111/diff/1/?file=793108#file793108line133 An assert here would be good. OK, will add. - Chao --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29111/#review65257 --- On Dec. 16, 2014, 7:02 p.m., Chao Sun wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29111/ --- (Updated Dec. 16, 2014, 7:02 p.m.) Review request for hive and Xuefu Zhang. Bugs: HIVE-9041 https://issues.apache.org/jira/browse/HIVE-9041 Repository: hive-git Description --- This JIRA removes UnionWork from Spark plan. UnionWork right now is just a dummy work - in execution, it is translated to IdentityTran, which does nothing. 
The actual union operation is implemented with rdd.union, which happens when a BaseWork has multiple parent BaseWorks. For instance:

MW_1   MW_2
   \   /
    \ /
   RW_1

In this case, MW_1 and MW_2 translate to RDD_1 and RDD_2 respectively, and then we create another RDD_3 which is the result of rdd.union(RDD_1, RDD_2). We then create RDD_4 for RW_1, whose parent is RDD_3. *Changes on GenSparkWork* To remove the UnionWork, most changes are in GenSparkWork. I got rid of a chunk of code that creates UnionWork and links the work with parent works. But I still kept `currentUnionOperators` and `workWithUnionOperators`, since they are needed for removing union operators later. I also changed how `followingWork` is handled. This happens when we have the following operator tree:

TS_0   TS_1
   \   /
  UNION_2
     |
   RS_3
     |
   FS_4

(You can see that I ignored quite a few operators here. They are not required to illustrate the problem.) In this plan, we will reach `RS_3` via two different paths: `TS_0` and `TS_1`. The first time we get to `RS_3`, say via `TS_0`, we break `RS_3` from its child and create a work for the path `TS_0 - UNION_2 - RS_3`. Let's say the work is `MW_1`. We then proceed to `FS_4`, create another ReduceWork `RW_2` for it, and link `RW_2` with `MW_1`. We will then visit `RS_3` a second time, from `TS_1`, and create another work for the path `TS_1 - UNION_2 - RS_3`, say `MW_3`. But the problem is that `RS_3` is already disconnected from `FS_4`. In order to link `MW_3` with `RW_2`, we need to save that information somewhere. This is why we need `leafOpToChildWorkInfo`; it is changed from `leafOpToFollowingWork`. But I found that we also need to save the edge property between `RS_3` and its child in order to connect the two works. I also encountered a case where two BaseWorks may be connected twice; I've explained that in the comments in the source code. *Changes on SparkPlanGenerator* Without UnionWork, SparkPlanGenerator can be a bit cleaner. 
The changes on this class are mostly refactoring. I got rid of some redundant code in `generate(SparkWork)` method, and combined `generate(MapWork)` and `generate(ReduceWork)` into one. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/spark/IdentityTran.java eb758e09888d7864acc9d88c7186ae2de48bc8f7 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 438efabb062112da8fefc1bed9d8bd90ade26c67 ql/src/java/org/apache/hadoop/hive/ql/optimizer
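The multi-parent union translation described in the review above can be sketched abstractly. The function and graph encoding below are invented for illustration (Hive's actual translation works on BaseWork/SparkTran objects); the point is only that a work with several parent works gets the union of their outputs as its input, with no dummy UnionWork in between.

```python
# Hypothetical model of the plan translation: each work produces an
# "RDD" (here, just the list of leaf works whose rows feed it). A work
# with multiple parents takes the union of its parents' outputs, which
# is what rdd.union provides once UnionWork is removed.

def generate_plan(works, parents):
    """works: work names in topological order; parents: name -> parent names."""
    rdds = {}
    for w in works:
        inputs = [rdds[p] for p in parents.get(w, [])]
        if not inputs:                       # a MapWork scanning a table
            rdds[w] = [w]
        elif len(inputs) == 1:               # ordinary parent/child edge
            rdds[w] = list(inputs[0])
        else:                                # union point: no dummy UnionWork needed
            rdds[w] = [src for rdd in inputs for src in rdd]
    return rdds

# The MW_1/MW_2 -> RW_1 example from the description:
plan = generate_plan(["MW_1", "MW_2", "RW_1"], {"RW_1": ["MW_1", "MW_2"]})
```

`plan["RW_1"]` then carries rows from both map works, mirroring `rdd.union(RDD_1, RDD_2)` feeding the reduce work's RDD.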
Re: Review Request 29111: HIVE-9041 - Generate better plan for queries containing both union and multi-insert [Spark Branch]
/union29.q.out ql/src/test/results/clientpositive/spark/union3.q.out ql/src/test/results/clientpositive/spark/union30.q.out ql/src/test/results/clientpositive/spark/union33.q.out ql/src/test/results/clientpositive/spark/union4.q.out ql/src/test/results/clientpositive/spark/union5.q.out ql/src/test/results/clientpositive/spark/union6.q.out ql/src/test/results/clientpositive/spark/union7.q.out ql/src/test/results/clientpositive/spark/union8.q.out ql/src/test/results/clientpositive/spark/union9.q.out ql/src/test/results/clientpositive/spark/union_ppr.q.out ql/src/test/results/clientpositive/spark/union_remove_1.q.out ql/src/test/results/clientpositive/spark/union_remove_10.q.out ql/src/test/results/clientpositive/spark/union_remove_11.q.out ql/src/test/results/clientpositive/spark/union_remove_15.q.out ql/src/test/results/clientpositive/spark/union_remove_16.q.out ql/src/test/results/clientpositive/spark/union_remove_17.q.out ql/src/test/results/clientpositive/spark/union_remove_18.q.out ql/src/test/results/clientpositive/spark/union_remove_19.q.out ql/src/test/results/clientpositive/spark/union_remove_2.q.out ql/src/test/results/clientpositive/spark/union_remove_20.q.out ql/src/test/results/clientpositive/spark/union_remove_21.q.out ql/src/test/results/clientpositive/spark/union_remove_24.q.out ql/src/test/results/clientpositive/spark/union_remove_25.q.out ql/src/test/results/clientpositive/spark/union_remove_3.q.out ql/src/test/results/clientpositive/spark/union_remove_4.q.out ql/src/test/results/clientpositive/spark/union_remove_5.q.out ql/src/test/results/clientpositive/spark/union_remove_6.q.out ql/src/test/results/clientpositive/spark/union_remove_7.q.out ql/src/test/results/clientpositive/spark/union_remove_8.q.out ql/src/test/results/clientpositive/spark/union_remove_9.q.out Thanks, Chao Sun
Re: Review Request 28889: HIVE-8911 - Enable mapjoin hints [Spark Branch]
On Dec. 12, 2014, 7:45 p.m., Xuefu Zhang wrote: ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkMapJoinProcessor.java, line 78 https://reviews.apache.org/r/28889/diff/2/?file=789801#file789801line78 nit: grandParentOps.get(0) is repeated in the next line. nice to have a var for it. Sure. Will fix. - Chao --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28889/#review64959 --- On Dec. 11, 2014, 10:36 p.m., Chao Sun wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28889/ --- (Updated Dec. 11, 2014, 10:36 p.m.) Review request for hive, Szehon Ho and Xuefu Zhang. Bugs: HIVE-8911 https://issues.apache.org/jira/browse/HIVE-8911 Repository: hive-git Description --- Basically the idea is to reuse as much code as possible from MR. The issue is that in MR's MapJoinProcessor, after the join op is converted to a mapjoin op, all the parent ReduceSinkOperators are removed. However, for our Spark branch, we need to preserve those, because they serve as boundaries between BaseWorks, and SparkReduceSinkMapJoinProc triggers upon them. Initially I tried to move this part of the logic to SparkMapJoinOptimizer, which happens at a later stage. Although this works, I'm worried it may have too much effect on SMB join w/ hints, because we would then have to move that part of the logic to SparkMapJoinOptimizer too. In general, I want to minimize the effect on the code path. This patch makes changes to MapJoinProcessor: I created a separate method, convertMapJoinForSpark, which doesn't remove the ReduceSinkOperators for the small tables. Then, the transform method decides which method to call based on the execution engine. I also had to disable several tests related to SMB join w/ hints. They can be activated once HIVE-8640 is resolved. 
Diffs - data/conf/spark/hive-site.xml 44eac86 itests/src/test/resources/testconfiguration.properties 2348e06 ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java 773c827 ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java a8a3d86 ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkMapJoinProcessor.java PRE-CREATION ql/src/test/results/clientpositive/spark/bucket_map_join_1.q.out f24ae73 ql/src/test/results/clientpositive/spark/bucket_map_join_2.q.out 33e9e8b ql/src/test/results/clientpositive/spark/bucketmapjoin1.q.out aaa0151 ql/src/test/results/clientpositive/spark/bucketmapjoin10.q.out 9954b77 ql/src/test/results/clientpositive/spark/bucketmapjoin11.q.out ad8f0a5 ql/src/test/results/clientpositive/spark/bucketmapjoin12.q.out aa3e2b6 ql/src/test/results/clientpositive/spark/bucketmapjoin13.q.out 44233f6 ql/src/test/results/clientpositive/spark/bucketmapjoin2.q.out c4702ef ql/src/test/results/clientpositive/spark/bucketmapjoin3.q.out 7c31e05 ql/src/test/results/clientpositive/spark/bucketmapjoin4.q.out a8e892e ql/src/test/results/clientpositive/spark/bucketmapjoin5.q.out 041ba12 ql/src/test/results/clientpositive/spark/bucketmapjoin7.q.out 54c4be3 ql/src/test/results/clientpositive/spark/bucketmapjoin8.q.out da9fe1c ql/src/test/results/clientpositive/spark/bucketmapjoin9.q.out 5a5e3f6 ql/src/test/results/clientpositive/spark/bucketmapjoin_negative.q.out 5ac3f4c ql/src/test/results/clientpositive/spark/bucketmapjoin_negative2.q.out e4ff965 ql/src/test/results/clientpositive/spark/bucketmapjoin_negative3.q.out fce5566 ql/src/test/results/clientpositive/spark/join25.q.out 284c97d ql/src/test/results/clientpositive/spark/join26.q.out e271184 ql/src/test/results/clientpositive/spark/join27.q.out d31f29e ql/src/test/results/clientpositive/spark/join30.q.out 7fbbcfa ql/src/test/results/clientpositive/spark/join36.q.out f1317ea ql/src/test/results/clientpositive/spark/join37.q.out 448e983 ql/src/test/results/clientpositive/spark/join38.q.out 
735d7ea ql/src/test/results/clientpositive/spark/join39.q.out 0734d4b ql/src/test/results/clientpositive/spark/join40.q.out 60ef13d ql/src/test/results/clientpositive/spark/join_map_ppr.q.out 59fdb99 ql/src/test/results/clientpositive/spark/mapjoin1.q.out 80e38b9 ql/src/test/results/clientpositive/spark/mapjoin_distinct.q.out dc7241c ql/src/test/results/clientpositive/spark/mapjoin_filter_on_outerjoin.q.out 3b80437 ql/src/test/results/clientpositive/spark/mapjoin_test_outer.q.out fdf8f24 ql/src/test/results/clientpositive/spark/semijoin.q.out 2b8e04b ql/src/test/results/clientpositive/spark/skewjoin.q.out 56b78be Diff: https://reviews.apache.org/r/28889
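The engine-based dispatch described in the review above can be illustrated with a toy model. The structures and names below are invented for the sketch (Hive's MapJoinProcessor operates on real operator trees); the point is only the branch: the MR conversion drops the small-table ReduceSinks after converting a join, while the Spark conversion keeps them because they mark BaseWork boundaries.

```python
# Toy operators: (kind, is_big_table). "RS" stands in for a
# ReduceSinkOperator parent of the join being converted.

def convert_join(parents, engine):
    """Return the parent list the converted mapjoin keeps, per engine."""
    if engine == "spark":
        # Analogue of convertMapJoinForSpark: keep every ReduceSink so a
        # later rule (SparkReduceSinkMapJoinProc) can still trigger on it.
        return list(parents)
    # MR path: small-table ReduceSinks are removed during conversion.
    return [(kind, big) for kind, big in parents if kind != "RS" or big]

parents = [("RS", True), ("RS", False), ("RS", False)]
spark_parents = convert_join(parents, "spark")  # all three survive
mr_parents = convert_join(parents, "mr")        # only the big-table RS remains
```

The real patch makes the same choice inside `transform`, picking the conversion method from the configured execution engine.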
Re: Review Request 28889: HIVE-8911 - Enable mapjoin hints [Spark Branch]
On Dec. 12, 2014, 7:45 p.m., Xuefu Zhang wrote: Patch looks good. One suggestion: we should be able to change the static methods to non-static, which would further simplify the code. I agree. Let me change it. - Chao --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28889/#review64959 --- On Dec. 11, 2014, 10:36 p.m., Chao Sun wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28889/ --- (Updated Dec. 11, 2014, 10:36 p.m.) Review request for hive, Szehon Ho and Xuefu Zhang. Bugs: HIVE-8911 https://issues.apache.org/jira/browse/HIVE-8911 Repository: hive-git Description --- Basically the idea is to reuse as much code as possible from MR. The issue is that in MR's MapJoinProcessor, after the join op is converted to a mapjoin op, all the parent ReduceSinkOperators are removed. However, for our Spark branch, we need to preserve those, because they serve as boundaries between BaseWorks, and SparkReduceSinkMapJoinProc triggers upon them. Initially I tried to move this part of the logic to SparkMapJoinOptimizer, which happens at a later stage. Although this works, I'm worried it may have too much effect on SMB join w/ hints, because we would then have to move that part of the logic to SparkMapJoinOptimizer too. In general, I want to minimize the effect on the code path. This patch makes changes to MapJoinProcessor: I created a separate method, convertMapJoinForSpark, which doesn't remove the ReduceSinkOperators for the small tables. Then, the transform method decides which method to call based on the execution engine. I also had to disable several tests related to SMB join w/ hints. They can be activated once HIVE-8640 is resolved. 
Diffs - data/conf/spark/hive-site.xml 44eac86 itests/src/test/resources/testconfiguration.properties 2348e06 ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java 773c827 ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java a8a3d86 ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkMapJoinProcessor.java PRE-CREATION ql/src/test/results/clientpositive/spark/bucket_map_join_1.q.out f24ae73 ql/src/test/results/clientpositive/spark/bucket_map_join_2.q.out 33e9e8b ql/src/test/results/clientpositive/spark/bucketmapjoin1.q.out aaa0151 ql/src/test/results/clientpositive/spark/bucketmapjoin10.q.out 9954b77 ql/src/test/results/clientpositive/spark/bucketmapjoin11.q.out ad8f0a5 ql/src/test/results/clientpositive/spark/bucketmapjoin12.q.out aa3e2b6 ql/src/test/results/clientpositive/spark/bucketmapjoin13.q.out 44233f6 ql/src/test/results/clientpositive/spark/bucketmapjoin2.q.out c4702ef ql/src/test/results/clientpositive/spark/bucketmapjoin3.q.out 7c31e05 ql/src/test/results/clientpositive/spark/bucketmapjoin4.q.out a8e892e ql/src/test/results/clientpositive/spark/bucketmapjoin5.q.out 041ba12 ql/src/test/results/clientpositive/spark/bucketmapjoin7.q.out 54c4be3 ql/src/test/results/clientpositive/spark/bucketmapjoin8.q.out da9fe1c ql/src/test/results/clientpositive/spark/bucketmapjoin9.q.out 5a5e3f6 ql/src/test/results/clientpositive/spark/bucketmapjoin_negative.q.out 5ac3f4c ql/src/test/results/clientpositive/spark/bucketmapjoin_negative2.q.out e4ff965 ql/src/test/results/clientpositive/spark/bucketmapjoin_negative3.q.out fce5566 ql/src/test/results/clientpositive/spark/join25.q.out 284c97d ql/src/test/results/clientpositive/spark/join26.q.out e271184 ql/src/test/results/clientpositive/spark/join27.q.out d31f29e ql/src/test/results/clientpositive/spark/join30.q.out 7fbbcfa ql/src/test/results/clientpositive/spark/join36.q.out f1317ea ql/src/test/results/clientpositive/spark/join37.q.out 448e983 ql/src/test/results/clientpositive/spark/join38.q.out 
735d7ea ql/src/test/results/clientpositive/spark/join39.q.out 0734d4b ql/src/test/results/clientpositive/spark/join40.q.out 60ef13d ql/src/test/results/clientpositive/spark/join_map_ppr.q.out 59fdb99 ql/src/test/results/clientpositive/spark/mapjoin1.q.out 80e38b9 ql/src/test/results/clientpositive/spark/mapjoin_distinct.q.out dc7241c ql/src/test/results/clientpositive/spark/mapjoin_filter_on_outerjoin.q.out 3b80437 ql/src/test/results/clientpositive/spark/mapjoin_test_outer.q.out fdf8f24 ql/src/test/results/clientpositive/spark/semijoin.q.out 2b8e04b ql/src/test/results/clientpositive/spark/skewjoin.q.out 56b78be Diff: https://reviews.apache.org/r/28889/diff/ Testing --- bucket_map_join_1.q bucket_map_join_2.q bucketmapjoin1.q bucketmapjoin10.q
Re: Review Request 28889: HIVE-8911 - Enable mapjoin hints [Spark Branch]
mapjoin_hook.q mapjoin_tester.q semijoin.q skewjoin.q table_access_keys_stats.q Thanks, Chao Sun
Re: Review Request 28889: HIVE-8911 - Enable mapjoin hints [Spark Branch]
join36.q join37.q join38.q join39.q join40.q join_empty.q join_filters_overlap.q join_map_ppr.q mapjoin1.q mapjoin_distinct.q mapjoin_filter_on_outerjoin.q mapjoin_hook.q mapjoin_tester.q semijoin.q skewjoin.q table_access_keys_stats.q Thanks, Chao Sun
Re: Review Request 28791: HIVE-9025 join38.q (without map join) produces incorrect result when testing with multiple reducers
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28791/#review64582 --- http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConstantPropagateProcFactory.java https://reviews.apache.org/r/28791/#comment107348 trailing whitespace. - Chao Sun On Dec. 10, 2014, 6:09 p.m., Ted Xu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28791/ --- (Updated Dec. 10, 2014, 6:09 p.m.) Review request for hive, Ashutosh Chauhan and Chao Sun. Bugs: HIVE-9025 https://issues.apache.org/jira/browse/HIVE-9025 Repository: hive Description --- HIVE-5771 introduced a bug that when all partition columns are constants, the partition is transformed to be a random dispatch, which is not expected. This patch adds a constant column in the above case to avoid random partitioning. Diffs - http://svn.apache.org/repos/asf/hive/trunk/itests/src/test/resources/testconfiguration.properties 1644497 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConstantPropagateProcFactory.java 1644497 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientpositive/constprog_partitioner.q PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/cluster.q.out 1644497 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/constprog2.q.out 1644497 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/constprog_partitioner.q.out PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/join_nullsafe.q.out 1644497 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd2.q.out 1644497 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_clusterby.q.out 1644497 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_join4.q.out 1644497 
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_outer_join5.q.out 1644497 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/quotedid_basic.q.out 1644497 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/smb_mapjoin_25.q.out 1644497 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/tez/dynamic_partition_pruning.q.out 1644497 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/tez/dynamic_partition_pruning_2.q.out 1644497 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/tez/join_nullsafe.q.out 1644497 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/tez/vector_decimal_mapjoin.q.out 1644497 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/tez/vectorized_dynamic_partition_pruning.q.out 1644497 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/union27.q.out 1644497 Diff: https://reviews.apache.org/r/28791/diff/ Testing --- TestCliDriver passed. Thanks, Ted Xu
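The HIVE-9025 bug described above can be shown with a toy partitioner. Everything here is a stand-in for Hive's actual shuffle machinery: when constant folding removes every partition column, the shuffle key is empty and rows scatter randomly across reducers; keeping one constant column (the fix) restores deterministic placement, so identical rows land together.

```python
import random

def reducer_for(row, part_cols, n_reducers):
    """Pick a reducer the way a hash partitioner would."""
    key = tuple(row[c] for c in part_cols)
    if not key:
        # All partition columns were constant-folded away: random dispatch,
        # so identical rows can land on different reducers (the bug).
        return random.randrange(n_reducers)
    return hash(key) % n_reducers

rows = [{"c": 7} for _ in range(100)]

random.seed(0)  # seeded only so the sketch is repeatable
buggy_targets = {reducer_for(r, [], 8) for r in rows}    # empty key: pre-fix behavior

# With the fix, one constant column is kept, so every row maps to the
# same reducer and the grouped result stays correct:
fixed_targets = {reducer_for(r, ["c"], 8) for r in rows}
```

This is why join38.q produced incorrect results only with multiple reducers: with one reducer the random dispatch is harmless.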
Re: Review Request 28791: HIVE-9025 join38.q (without map join) produces incorrect result when testing with multiple reducers
On Dec. 10, 2014, 6:16 p.m., Chao Sun wrote: I don't think optimize_nullscan and vector_decimal_aggregate are related. Ashutosh can correct me if I'm wrong. - Chao --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28791/#review64582 --- On Dec. 10, 2014, 6:09 p.m., Ted Xu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28791/ --- (Updated Dec. 10, 2014, 6:09 p.m.) Review request for hive, Ashutosh Chauhan and Chao Sun. Bugs: HIVE-9025 https://issues.apache.org/jira/browse/HIVE-9025 Repository: hive Description --- HIVE-5771 introduced a bug that when all partition columns are constants, the partition is transformed to be a random dispatch, which is not expected. This patch adds a constant column in the above case to avoid random partitioning. Diffs - http://svn.apache.org/repos/asf/hive/trunk/itests/src/test/resources/testconfiguration.properties 1644497 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConstantPropagateProcFactory.java 1644497 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientpositive/constprog_partitioner.q PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/cluster.q.out 1644497 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/constprog2.q.out 1644497 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/constprog_partitioner.q.out PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/join_nullsafe.q.out 1644497 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd2.q.out 1644497 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_clusterby.q.out 1644497 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_join4.q.out 1644497 
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_outer_join5.q.out 1644497 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/quotedid_basic.q.out 1644497 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/smb_mapjoin_25.q.out 1644497 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/tez/dynamic_partition_pruning.q.out 1644497 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/tez/dynamic_partition_pruning_2.q.out 1644497 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/tez/join_nullsafe.q.out 1644497 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/tez/vector_decimal_mapjoin.q.out 1644497 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/tez/vectorized_dynamic_partition_pruning.q.out 1644497 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/union27.q.out 1644497 Diff: https://reviews.apache.org/r/28791/diff/ Testing --- TestCliDriver passed. Thanks, Ted Xu
Re: Review Request 28791: HIVE-9025 join38.q (without map join) produces incorrect result when testing with multiple reducers
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28791/#review64666 --- Ship it! Ship It! - Chao Sun On Dec. 11, 2014, 1:36 a.m., Ted Xu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28791/ --- (Updated Dec. 11, 2014, 1:36 a.m.) Review request for hive, Ashutosh Chauhan and Chao Sun. Bugs: HIVE-9025 https://issues.apache.org/jira/browse/HIVE-9025 Repository: hive Description --- HIVE-5771 introduced a bug that when all partition columns are constants, the partition is transformed to be a random dispatch, which is not expected. This patch adds a constant column in the above case to avoid random partitioning. Diffs - http://svn.apache.org/repos/asf/hive/trunk/itests/src/test/resources/testconfiguration.properties 1644497 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConstantPropagateProcFactory.java 1644497 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientpositive/constprog_partitioner.q PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/cluster.q.out 1644497 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/constprog2.q.out 1644497 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/constprog_partitioner.q.out PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/join_nullsafe.q.out 1644497 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd2.q.out 1644497 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_clusterby.q.out 1644497 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_join4.q.out 1644497 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_outer_join5.q.out 1644497 
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/quotedid_basic.q.out 1644497 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/smb_mapjoin_25.q.out 1644497 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/tez/dynamic_partition_pruning.q.out 1644497 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/tez/dynamic_partition_pruning_2.q.out 1644497 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/tez/join_nullsafe.q.out 1644497 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/tez/vector_decimal_mapjoin.q.out 1644497 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/tez/vectorized_dynamic_partition_pruning.q.out 1644497 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/union27.q.out 1644497 Diff: https://reviews.apache.org/r/28791/diff/ Testing --- TestCliDriver passed. Thanks, Ted Xu
Review Request 28889: HIVE-8911 - Enable mapjoin hints [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28889/ --- Review request for hive, Szehon Ho and Xuefu Zhang. Bugs: HIVE-8911 https://issues.apache.org/jira/browse/HIVE-8911 Repository: hive-git Description --- Basically the idea is to reuse as much code as possible from MR. The issue is that in MR's MapJoinProcessor, after the join op is converted to a mapjoin op, all the parent ReduceSinkOperators are removed. However, for our Spark branch, we need to preserve those, because they serve as boundaries between BaseWorks, and SparkReduceSinkMapJoinProc triggers upon them. Initially I tried to move this part of the logic to SparkMapJoinOptimizer, which happens at a later stage. Although this works, I'm worried it may have too much effect on SMB join w/ hints, because we would then have to move that part of the logic to SparkMapJoinOptimizer too. In general, I want to minimize the effect on the code path. This patch makes changes to MapJoinProcessor: I created a separate method, convertMapJoinForSpark, which doesn't remove the ReduceSinkOperators for the small tables. Then, the transform method decides which method to call based on the execution engine. I also had to disable several tests related to SMB join w/ hints. They can be activated once HIVE-8640 is resolved. 
Diffs - data/conf/spark/hive-site.xml 44eac86 itests/src/test/resources/testconfiguration.properties d6f8267 ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java 773c827 ql/src/test/results/clientpositive/spark/bucket_map_join_1.q.out f24ae73 ql/src/test/results/clientpositive/spark/bucket_map_join_2.q.out 33e9e8b ql/src/test/results/clientpositive/spark/bucketmapjoin1.q.out aaa0151 ql/src/test/results/clientpositive/spark/bucketmapjoin10.q.out 9954b77 ql/src/test/results/clientpositive/spark/bucketmapjoin11.q.out ad8f0a5 ql/src/test/results/clientpositive/spark/bucketmapjoin12.q.out aa3e2b6 ql/src/test/results/clientpositive/spark/bucketmapjoin13.q.out 44233f6 ql/src/test/results/clientpositive/spark/bucketmapjoin2.q.out c4702ef ql/src/test/results/clientpositive/spark/bucketmapjoin3.q.out 7c31e05 ql/src/test/results/clientpositive/spark/bucketmapjoin4.q.out a8e892e ql/src/test/results/clientpositive/spark/bucketmapjoin5.q.out 041ba12 ql/src/test/results/clientpositive/spark/bucketmapjoin7.q.out 54c4be3 ql/src/test/results/clientpositive/spark/bucketmapjoin8.q.out da9fe1c ql/src/test/results/clientpositive/spark/bucketmapjoin9.q.out 5a5e3f6 ql/src/test/results/clientpositive/spark/bucketmapjoin_negative.q.out 5ac3f4c ql/src/test/results/clientpositive/spark/bucketmapjoin_negative2.q.out e4ff965 ql/src/test/results/clientpositive/spark/bucketmapjoin_negative3.q.out fce5566 ql/src/test/results/clientpositive/spark/join25.q.out 284c97d ql/src/test/results/clientpositive/spark/join26.q.out e271184 ql/src/test/results/clientpositive/spark/join27.q.out d31f29e ql/src/test/results/clientpositive/spark/join30.q.out 7fbbcfa ql/src/test/results/clientpositive/spark/join36.q.out f1317ea ql/src/test/results/clientpositive/spark/join37.q.out 448e983 ql/src/test/results/clientpositive/spark/join38.q.out 735d7ea ql/src/test/results/clientpositive/spark/join39.q.out 0734d4b ql/src/test/results/clientpositive/spark/join40.q.out 60ef13d 
ql/src/test/results/clientpositive/spark/join_map_ppr.q.out 59fdb99 ql/src/test/results/clientpositive/spark/mapjoin1.q.out 80e38b9 ql/src/test/results/clientpositive/spark/mapjoin_distinct.q.out dc7241c ql/src/test/results/clientpositive/spark/mapjoin_filter_on_outerjoin.q.out 3b80437 ql/src/test/results/clientpositive/spark/mapjoin_test_outer.q.out fdf8f24 ql/src/test/results/clientpositive/spark/semijoin.q.out 2b8e04b ql/src/test/results/clientpositive/spark/skewjoin.q.out 56b78be Diff: https://reviews.apache.org/r/28889/diff/ Testing --- bucket_map_join_1.q bucket_map_join_2.q bucketmapjoin1.q bucketmapjoin10.q bucketmapjoin11.q bucketmapjoin12.q bucketmapjoin13.q bucketmapjoin2.q bucketmapjoin3.q bucketmapjoin4.q bucketmapjoin5.q bucketmapjoin7.q bucketmapjoin8.q bucketmapjoin9.q bucketmapjoin_negative.q bucketmapjoin_negative2.q column_access_stats.q join25.q join26.q join27.q join30.q join36.q join37.q join38.q join39.q join40.q join_empty.q join_filters_overlap.q join_map_ppr.q mapjoin1.q mapjoin_distinct.q mapjoin_filter_on_outerjoin.q mapjoin_hook.q mapjoin_tester.q semijoin.q skewjoin.q table_access_keys_stats.q Thanks, Chao Sun
Re: Review Request 28791: HIVE-9025 join38.q (without map join) produces incorrect result when testing with multiple reducers
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28791/#review64242 --- Hi, [~tedxu], thanks for the quick work! I just have one minor question: do you think it would be good to have a new test case for this? Maybe one just like join38.q, but using a common join, with the number of reducers set to a value greater than one? - Chao Sun On Dec. 7, 2014, 9:30 a.m., Ted Xu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28791/ --- (Updated Dec. 7, 2014, 9:30 a.m.) Review request for hive, Ashutosh Chauhan and Chao Sun. Bugs: HIVE-9025 https://issues.apache.org/jira/browse/HIVE-9025 Repository: hive Description --- HIVE-5771 introduced a bug: when all partition columns are constants, the partitioning is turned into a random dispatch, which is not expected. This patch adds a constant column in the above case to avoid random partitioning. Diffs - http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConstantPropagateProcFactory.java 1643530 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/cluster.q.out 1643530 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/constprog2.q.out 1643530 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/join_nullsafe.q.out 1643530 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd2.q.out 1643530 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_clusterby.q.out 1643530 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_join4.q.out 1643530 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_outer_join5.q.out 1643530 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/quotedid_basic.q.out 1643530 
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/smb_mapjoin_25.q.out 1643530 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/union27.q.out 1643530 Diff: https://reviews.apache.org/r/28791/diff/ Testing --- TestCliDriver passed. Thanks, Ted Xu
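The fix described above can be sketched as follows. This is only an illustrative sketch with hypothetical names, not the actual ConstantPropagateProcFactory logic: if constant folding would remove every partition column of a reduce sink, rows end up dispatched to random reducers, so one constant column is kept back to make the partitioning deterministic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the HIVE-9025 idea; names are illustrative.
public class ConstantPartitionSketch {

  static List<String> foldPartitionCols(List<String> partCols, Set<String> constants) {
    List<String> remaining = new ArrayList<>();
    for (String col : partCols) {
      if (!constants.contains(col)) {
        remaining.add(col); // non-constant partition columns are kept
      }
    }
    if (remaining.isEmpty() && !partCols.isEmpty()) {
      // All partition columns were constants: keep one constant column so
      // every row hashes to the same reducer instead of a random one.
      remaining.add(partCols.get(0));
    }
    return remaining;
  }

  public static void main(String[] args) {
    List<String> folded = foldPartitionCols(
        Arrays.asList("key", "ds"), new HashSet<>(Arrays.asList("key", "ds")));
    System.out.println(folded); // prints "[key]"
  }
}
```

With more than one reducer, random dispatch of identical keys is what made join38.q without map join produce incorrect results, which is why the suggested regression test forces the reducer count above one.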
Re: Review Request 28791: HIVE-9025 join38.q (without map join) produces incorrect result when testing with multiple reducers
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28791/#review64244 --- Also, some golden files for tez branch need to be updated. - Chao Sun On Dec. 7, 2014, 9:30 a.m., Ted Xu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28791/ --- (Updated Dec. 7, 2014, 9:30 a.m.) Review request for hive, Ashutosh Chauhan and Chao Sun. Bugs: HIVE-9025 https://issues.apache.org/jira/browse/HIVE-9025 Repository: hive Description --- HIVE-5771 introduced a bug that when all partition columns are constants, the partition is transformed to be a random dispatch, which is not expected. This patch adds a constant column in the above case to avoid random partitioning. Diffs - http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConstantPropagateProcFactory.java 1643530 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/cluster.q.out 1643530 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/constprog2.q.out 1643530 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/join_nullsafe.q.out 1643530 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd2.q.out 1643530 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_clusterby.q.out 1643530 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_join4.q.out 1643530 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_outer_join5.q.out 1643530 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/quotedid_basic.q.out 1643530 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/smb_mapjoin_25.q.out 1643530 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/union27.q.out 1643530 Diff: https://reviews.apache.org/r/28791/diff/ Testing --- 
TestCliDriver passed. Thanks, Ted Xu
Re: Review Request 28727: HIVE-8638 Implement bucket map join optimization [Spark Branch]
On Dec. 5, 2014, 2:27 a.m., Chao Sun wrote: ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java, line 111 https://reviews.apache.org/r/28727/diff/1/?file=782895#file782895line111 why check twice here? Jimmy Xiang wrote: estimatedBuckets could = 0 too. Sorry, you are right. My mistake. - Chao --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28727/#review63952 --- On Dec. 4, 2014, 11:38 p.m., Jimmy Xiang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28727/ --- (Updated Dec. 4, 2014, 11:38 p.m.) Review request for hive and Xuefu Zhang. Bugs: HIVE-8638 https://issues.apache.org/jira/browse/HIVE-8638 Repository: hive-git Description --- Patch v3 that works when bucket number matches Diffs - itests/src/test/resources/testconfiguration.properties 09c667e ql/src/java/org/apache/hadoop/hive/ql/exec/SparkHashTableSinkOperator.java cfc1501 ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainerSerDe.java 2f9e55a ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java 4054173 ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkBucketJoinProcCtx.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java 8b78123 ql/src/test/queries/clientpositive/bucket_map_join_spark1.q PRE-CREATION ql/src/test/queries/clientpositive/bucket_map_join_spark2.q PRE-CREATION ql/src/test/results/clientpositive/bucket_map_join_spark1.q.out PRE-CREATION ql/src/test/results/clientpositive/bucket_map_join_spark2.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/bucket_map_join_spark1.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/bucket_map_join_spark2.q.out PRE-CREATION Diff: https://reviews.apache.org/r/28727/diff/ Testing --- Thanks, Jimmy Xiang
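The point Jimmy makes above can be captured in a small sketch. This is a hypothetical guard, not the actual SparkMapJoinOptimizer code: both the declared bucket count and the estimated bucket count can independently be non-positive, so both must be checked before converting to a bucket map join.

```java
// Hypothetical guard mirroring the review exchange; not the real optimizer code.
public class BucketJoinGuardSketch {

  static boolean canConvert(int numBuckets, int estimatedBuckets) {
    // Either count can legitimately be 0 (or negative when unknown),
    // which is why the optimizer appears to "check twice".
    return numBuckets > 0 && estimatedBuckets > 0;
  }

  public static void main(String[] args) {
    System.out.println(canConvert(4, 0)); // prints "false"
    System.out.println(canConvert(4, 4)); // prints "true"
  }
}
```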
Re: Review Request 28727: HIVE-8638 Implement bucket map join optimization [Spark Branch]
On Dec. 5, 2014, 2:27 a.m., Chao Sun wrote: ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java, line 96 https://reviews.apache.org/r/28727/diff/1/?file=782893#file782893line96 I'm wondering if we can get rid of containsOp, and replace with this one. Jimmy Xiang wrote: containsOp is used in many places. It's better to keep it. I changed getOp a little so that getOp and containsOp share the same logic. Sounds good. - Chao --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28727/#review63952 --- On Dec. 4, 2014, 11:38 p.m., Jimmy Xiang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28727/ --- (Updated Dec. 4, 2014, 11:38 p.m.) Review request for hive and Xuefu Zhang. Bugs: HIVE-8638 https://issues.apache.org/jira/browse/HIVE-8638 Repository: hive-git Description --- Patch v3 that works when bucket number matches Diffs - itests/src/test/resources/testconfiguration.properties 09c667e ql/src/java/org/apache/hadoop/hive/ql/exec/SparkHashTableSinkOperator.java cfc1501 ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainerSerDe.java 2f9e55a ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java 4054173 ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkBucketJoinProcCtx.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java 8b78123 ql/src/test/queries/clientpositive/bucket_map_join_spark1.q PRE-CREATION ql/src/test/queries/clientpositive/bucket_map_join_spark2.q PRE-CREATION ql/src/test/results/clientpositive/bucket_map_join_spark1.q.out PRE-CREATION ql/src/test/results/clientpositive/bucket_map_join_spark2.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/bucket_map_join_spark1.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/bucket_map_join_spark2.q.out PRE-CREATION Diff: https://reviews.apache.org/r/28727/diff/ Testing --- 
Thanks, Jimmy Xiang
Re: Review Request 28727: HIVE-8638 Implement bucket map join optimization [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28727/#review63952 --- ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java https://reviews.apache.org/r/28727/#comment106303 I'm wondering if we can get rid of containsOp, and replace with this one. ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java https://reviews.apache.org/r/28727/#comment106304 trailing whitespace. ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java https://reviews.apache.org/r/28727/#comment106305 should have space between and parentOp. ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java https://reviews.apache.org/r/28727/#comment106306 why check twice here? - Chao Sun On Dec. 4, 2014, 11:38 p.m., Jimmy Xiang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28727/ --- (Updated Dec. 4, 2014, 11:38 p.m.) Review request for hive and Xuefu Zhang. 
Bugs: HIVE-8638 https://issues.apache.org/jira/browse/HIVE-8638 Repository: hive-git Description --- Patch v3 that works when bucket number matches Diffs - itests/src/test/resources/testconfiguration.properties 09c667e ql/src/java/org/apache/hadoop/hive/ql/exec/SparkHashTableSinkOperator.java cfc1501 ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainerSerDe.java 2f9e55a ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java 4054173 ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkBucketJoinProcCtx.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java 8b78123 ql/src/test/queries/clientpositive/bucket_map_join_spark1.q PRE-CREATION ql/src/test/queries/clientpositive/bucket_map_join_spark2.q PRE-CREATION ql/src/test/results/clientpositive/bucket_map_join_spark1.q.out PRE-CREATION ql/src/test/results/clientpositive/bucket_map_join_spark2.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/bucket_map_join_spark1.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/bucket_map_join_spark2.q.out PRE-CREATION Diff: https://reviews.apache.org/r/28727/diff/ Testing --- Thanks, Jimmy Xiang
Review Request 28464: HIVE-8934 - Investigate test failure on bucketmapjoin10.q and bucketmapjoin11.q [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28464/ --- Review request for hive, Jimmy Xiang, Szehon Ho, and Xuefu Zhang. Bugs: HIVE-8934 https://issues.apache.org/jira/browse/HIVE-8934 Repository: hive-git Description --- With MapJoin enabled, these two tests will generate incorrect results. This seems to be related to the HiveInputFormat that these two are using. We need to investigate the issue. Diffs - itests/src/test/resources/testconfiguration.properties 38380fb ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinEagerRowContainer.java 65bb1b7 ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainerSerDe.java eddbf18 ql/src/test/results/clientpositive/spark/bucketmapjoin10.q.out 4188ad8 ql/src/test/results/clientpositive/spark/bucketmapjoin11.q.out e4a98ba Diff: https://reviews.apache.org/r/28464/diff/ Testing --- bucketmapjoin10.q and bucketmapjoin11.q now return correct results. Thanks, Chao Sun
Review Request 28299: HIVE-8921 - Investigate test failure on auto_join2.q [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28299/ --- Review request for hive, Jimmy Xiang and Szehon Ho. Bugs: HIVE-8921 https://issues.apache.org/jira/browse/HIVE-8921 Repository: hive-git Description --- Running this test, sometimes it produces the correct result, sometimes it just produces NULL. Looks like there's some concurrency issue. Diffs - ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java 96481f1 ql/src/java/org/apache/hadoop/hive/ql/plan/MapredLocalWork.java 6fbdcd2 Diff: https://reviews.apache.org/r/28299/diff/ Testing --- Thanks, Chao Sun
Review Request 28307: HIVE-8908 - Investigate test failure on join34.q [Spark Branch]
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe name: default.dest_j1 Local Work: Map Reduce Local Work Union 2 Vertex: Union 2 Stage: Stage-2 Dependency Collection Stage: Stage-0 Move Operator tables: replace: true table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe name: default.dest_j1 Stage: Stage-3 Stats-Aggr Operator Stage: Stage-5 Spark DagName: chao_20141118150101_a47a2d7b-e750-4764-be66-5ba95ebbe433:5 Vertices: Map 4 Map Operator Tree: TableScan alias: x Statistics: Num rows: 1 Data size: 216 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: key is not null (type: boolean) Statistics: Num rows: 1 Data size: 216 Basic stats: COMPLETE Column stats: NONE Spark HashTable Sink Operator condition expressions: 0 {_col1} 1 {value} keys: 0 _col0 (type: string) 1 key (type: string) Reduce Output Operator key expressions: key (type: string) sort order: + Map-reduce partition columns: key (type: string) Statistics: Num rows: 1 Data size: 216 Basic stats: COMPLETE Column stats: NONE value expressions: value (type: string) Local Work: Map Reduce Local Work Time taken: 0.127 seconds, Fetched: 156 row(s) Note that Stage-4 and Stage-5 are identical. Also, in Stage-4 there's a parallel RS operator with the HTS operator, which is strange. Diffs - ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java 4bfc26f Diff: https://reviews.apache.org/r/28307/diff/ Testing --- Thanks, Chao Sun
Re: Review Request 28299: HIVE-8921 - Investigate test failure on auto_join2.q [Spark Branch]
On Nov. 20, 2014, 9:56 p.m., Jimmy Xiang wrote: ql/src/java/org/apache/hadoop/hive/ql/plan/MapredLocalWork.java, line 67 https://reviews.apache.org/r/28299/diff/1/?file=771588#file771588line67 Should be the other way around, i.e., the default constructor should call this one: this(new LinkedHashMap...) OK, will do it this way. - Chao --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28299/#review62437 --- On Nov. 20, 2014, 9:43 p.m., Chao Sun wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28299/ --- (Updated Nov. 20, 2014, 9:43 p.m.) Review request for hive, Jimmy Xiang and Szehon Ho. Bugs: HIVE-8921 https://issues.apache.org/jira/browse/HIVE-8921 Repository: hive-git Description --- Running this test, sometimes it produces the correct result, sometimes it just produces NULL. Looks like there's some concurrency issue. Diffs - ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java 96481f1 ql/src/java/org/apache/hadoop/hive/ql/plan/MapredLocalWork.java 6fbdcd2 Diff: https://reviews.apache.org/r/28299/diff/ Testing --- Thanks, Chao Sun
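The constructor-chaining pattern suggested in the review above can be sketched as follows. Field names here are illustrative, not the real MapredLocalWork members: the default constructor delegates to the parameterized one, so initialization logic lives in exactly one place.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch of the suggested pattern; not the actual MapredLocalWork.
public class LocalWorkSketch {
  private final Map<String, String> aliasToWork;

  public LocalWorkSketch() {
    // Delegate to the parameterized constructor instead of
    // duplicating its initialization logic.
    this(new LinkedHashMap<String, String>());
  }

  public LocalWorkSketch(Map<String, String> aliasToWork) {
    this.aliasToWork = aliasToWork;
  }

  public Map<String, String> getAliasToWork() {
    return aliasToWork;
  }

  public static void main(String[] args) {
    System.out.println(new LocalWorkSketch().getAliasToWork().isEmpty()); // prints "true"
  }
}
```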
Re: Review Request 28299: HIVE-8921 - Investigate test failure on auto_join2.q [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28299/ --- (Updated Nov. 21, 2014, 1:28 a.m.) Review request for hive, Jimmy Xiang and Szehon Ho. Bugs: HIVE-8921 https://issues.apache.org/jira/browse/HIVE-8921 Repository: hive-git Description --- Running this test, sometimes it produces the correct result, sometimes it just produces NULL. Looks like there's some concurrency issue. Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java 96481f1 ql/src/java/org/apache/hadoop/hive/ql/plan/MapredLocalWork.java 6fbdcd2 Diff: https://reviews.apache.org/r/28299/diff/ Testing --- Thanks, Chao Sun
Re: Review Request 28145: HIVE-8883 - Investigate test failures on auto_join30.q [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28145/ --- (Updated Nov. 19, 2014, 11:35 p.m.) Review request for hive, Jimmy Xiang and Szehon Ho. Bugs: HIVE-8883 https://issues.apache.org/jira/browse/HIVE-8883 Repository: hive-git Description --- This test fails with the following stack trace: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:257) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:319) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:276) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:48) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:96) at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:214) at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:186) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 2014-11-14 17:05:09,206 ERROR [Executor 
task launch worker-4]: spark.SparkReduceRecordHandler (SparkReduceRecordHandler.java:processRow(285)) - org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {key:{reducesinkkey0:val_0},value:{_col0:0}} at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:328) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:276) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:48) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:96) at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:214) at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:186) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unexpected exception: null at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:318) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at 
org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:319) ... 14 more Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:257) ... 17 more auto_join27.q and auto_join31.q seem to fail with the same error. Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java 2895d80 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkReduceRecordHandler.java 141ae6f Diff: https://reviews.apache.org/r/28145/diff/ Testing --- Tested with auto_join30.q, auto_join31.q, and auto_join27.q. They now generate correct results. Thanks, Chao Sun
Re: Review Request 28145: HIVE-8883 - Investigate test failures on auto_join30.q [Spark Branch]
On Nov. 19, 2014, 11:50 p.m., Jimmy Xiang wrote: ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java, line 74 https://reviews.apache.org/r/28145/diff/3/?file=770558#file770558line74 We don't need this any more? I was thinking about cleaning it and then restoring the code in the non-staged map join JIRA. But, after talking with Szehon, I decided to keep it anyway. - Chao --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28145/#review62285 --- On Nov. 19, 2014, 11:35 p.m., Chao Sun wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28145/ --- (Updated Nov. 19, 2014, 11:35 p.m.) Review request for hive, Jimmy Xiang and Szehon Ho. Bugs: HIVE-8883 https://issues.apache.org/jira/browse/HIVE-8883 Repository: hive-git Description --- This test fails with the following stack trace: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:257) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:319) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:276) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:48) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:96) at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:214) at 
org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:186) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 2014-11-14 17:05:09,206 ERROR [Executor task launch worker-4]: spark.SparkReduceRecordHandler (SparkReduceRecordHandler.java:processRow(285)) - org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {key:{reducesinkkey0:val_0},value:{_col0:0}} at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:328) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:276) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:48) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:96) at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:214) at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:186) 
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unexpected exception: null at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:318) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84
Re: Review Request 28145: HIVE-8883 - Investigate test failures on auto_join30.q [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28145/ --- (Updated Nov. 19, 2014, 11:57 p.m.) Review request for hive, Jimmy Xiang and Szehon Ho. Bugs: HIVE-8883 https://issues.apache.org/jira/browse/HIVE-8883 Repository: hive-git Description --- This test fails with the following stack trace: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:257) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:319) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:276) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:48) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:96) at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:214) at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:186) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 2014-11-14 17:05:09,206 ERROR [Executor 
task launch worker-4]: spark.SparkReduceRecordHandler (SparkReduceRecordHandler.java:processRow(285)) - org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {key:{reducesinkkey0:val_0},value:{_col0:0}} at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:328) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:276) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:48) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:96) at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:214) at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:186) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unexpected exception: null at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:318) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at 
org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:319) ... 14 more Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:257) ... 17 more auto_join27.q and auto_join31.q seem to fail with the same error. Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java 2895d80 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkReduceRecordHandler.java 141ae6f Diff: https://reviews.apache.org/r/28145/diff/ Testing --- Tested with auto_join30.q, auto_join31.q, and auto_join27.q. They now generate correct results. Thanks, Chao Sun
Review Request 28145: HIVE-8883 - Investigate test failures on auto_join30.q [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28145/ --- Review request for hive, Jimmy Xiang and Szehon Ho. Bugs: HIVE-8883 https://issues.apache.org/jira/browse/HIVE-8883 Repository: hive-git Description --- This test fails with the following stack trace: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:257) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:319) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:276) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:48) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:96) at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:214) at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:186) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 2014-11-14 17:05:09,206 ERROR [Executor task launch worker-4]: 
spark.SparkReduceRecordHandler (SparkReduceRecordHandler.java:processRow(285)) - org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {key:{reducesinkkey0:val_0},value:{_col0:0}} at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:328) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:276) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:48) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:96) at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:214) at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:186) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unexpected exception: null at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:318) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at 
org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:319) ... 14 more Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:257) ... 17 more auto_join27.q and auto_join31.q seem to fail with the same error. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java 2895d80 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkReduceRecordHandler.java 141ae6f Diff: https://reviews.apache.org/r/28145/diff/ Testing --- Tested with auto_join30.q, auto_join31.q, and auto_join27.q. They now generate correct results. Thanks, Chao Sun
Re: Review Request 28145: HIVE-8883 - Investigate test failures on auto_join30.q [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28145/ --- (Updated Nov. 18, 2014, 2:51 a.m.) Review request for hive, Jimmy Xiang and Szehon Ho. Changes --- Last patch failed because of upstream change on HashTableLoader#load(). Now fixed. Bugs: HIVE-8883 https://issues.apache.org/jira/browse/HIVE-8883 Repository: hive-git Description --- This test fails with the following stack trace: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:257) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:319) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:276) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:48) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:96) at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:214) at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:186) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 2014-11-14 17:05:09,206 ERROR [Executor task launch worker-4]: spark.SparkReduceRecordHandler (SparkReduceRecordHandler.java:processRow(285)) - org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {key:{reducesinkkey0:val_0},value:{_col0:0}} at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:328) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:276) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:48) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:96) at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:214) at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:186) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unexpected exception: null at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:318) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) 
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:319) ... 14 more Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:257) ... 17 more auto_join27.q and auto_join31.q seem to fail with the same error. Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java 2895d80 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkReduceRecordHandler.java 141ae6f Diff: https://reviews.apache.org/r/28145/diff/ Testing --- Tested with auto_join30.q, auto_join31.q, and auto_join27.q. They now generate correct results. Thanks, Chao Sun
Review Request 28045: HIVE-8865 - Needs to set hashTableMemoryUsage for MapJoinDesc [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28045/ --- Review request for hive, Szehon Ho and Xuefu Zhang. Bugs: HIVE-8865 https://issues.apache.org/jira/browse/HIVE-8865 Repository: hive-git Description --- If this part is not done, hashTableMemoryUsage is always 0.0, which will cause MapJoinMemoryExhaustionException. Diffs - ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkReduceSinkMapJoinProc.java 83d54bd Diff: https://reviews.apache.org/r/28045/diff/ Testing --- Thanks, Chao Sun
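A minimal, self-contained sketch of why an unset hashTableMemoryUsage of 0.0 always trips the exhaustion check: the check compares the fraction of heap in use against the configured limit, and a limit of 0.0 is exceeded by any nonzero usage. This is an illustrative stand-in, not Hive's actual MapJoinMemoryExhaustionHandler; the class and method names below are hypothetical.

```java
// Illustrative stand-in for the memory-exhaustion check described above.
// Hive's real handler differs; names here are hypothetical.
public class MemoryUsageSketch {
    // Fail when the used fraction of the heap exceeds the configured limit.
    public static boolean exceedsLimit(long usedBytes, long maxBytes, double maxFraction) {
        return (double) usedBytes / maxBytes > maxFraction;
    }

    public static void main(String[] args) {
        long max = 1_073_741_824L; // 1 GB heap
        // Unset limit (0.0): even one byte in use counts as exhaustion.
        System.out.println(exceedsLimit(1, max, 0.0)); // true
        // A properly populated limit (e.g. 0.9) behaves sensibly.
        System.out.println(exceedsLimit(1, max, 0.9)); // false
    }
}
```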
Review Request 28051: HIVE-8860 - Populate ExecMapperContext in SparkReduceRecordHandler [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28051/ --- Review request for hive, Szehon Ho and Xuefu Zhang. Bugs: HIVE-8860 https://issues.apache.org/jira/browse/HIVE-8860 Repository: hive-git Description --- Currently, only SparkMapRecordHandler populates this information. However, since in the Spark branch a HashTableSinkOperator could also appear in a ReduceWork, and it needs an ExecMapperContext to get a MapredLocalWork, we need to do the same thing in SparkReduceRecordHandler as well. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkReduceRecordHandler.java 21ac7ab Diff: https://reviews.apache.org/r/28051/diff/ Testing --- Thanks, Chao Sun
Re: Review Request 28064: HIVE-8844 Choose a persisent policy for RDD caching [Spark Branch]
On Nov. 15, 2014, 2:34 a.m., Szehon Ho wrote: ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ShuffleTran.java, line 39 https://reviews.apache.org/r/28064/diff/1/?file=764642#file764642line39 OK, does Spark handle that as a no-op if we pass NONE in? If that's the case, then it's maybe cleaner for our code. I'm a bit confused about what NONE means. If we don't want to pass NONE due to side effects, can we just change the HadoopRDD call to: storageHandler.equals(StorageHandler.NONE) ? hadoopRdd : ... Then the logic is centralized there. Jimmy Xiang wrote: Sure. Will fix it as suggested. Thanks. persist() also registers the RDD for GC cleanup, but there seems to be no extra cost besides that. Either way is fine with me. - Chao --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28064/#review61616 --- On Nov. 15, 2014, 12:32 a.m., Jimmy Xiang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28064/ --- (Updated Nov. 15, 2014, 12:32 a.m.) Review request for hive and Xuefu Zhang. Bugs: HIVE-8844 https://issues.apache.org/jira/browse/HIVE-8844 Repository: hive-git Description --- Changed the Spark cache policy to be configurable, with memory+disk as the default. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java 79baea7 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ShuffleTran.java 8565ba0 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 11f4236 Diff: https://reviews.apache.org/r/28064/diff/ Testing --- Thanks, Jimmy Xiang
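The centralized-NONE idea discussed above can be sketched without Spark: skip persist() entirely when no caching is requested, so callers never pay persist()'s bookkeeping. The Rdd and StorageLevel types below are minimal hypothetical stand-ins, not Spark's actual API.

```java
// Sketch of centralizing the NONE check (hypothetical types, not Spark's API).
public class CacheSketch {
    public enum StorageLevel { NONE, MEMORY_ONLY, MEMORY_AND_DISK }

    // Minimal stand-in for an RDD that records whether it was persisted.
    public static final class Rdd {
        public StorageLevel persistedAt = StorageLevel.NONE;
        public Rdd persist(StorageLevel level) { persistedAt = level; return this; }
    }

    // Centralized check: NONE is a pure no-op, so persist() (and any side
    // effects such as registration for GC cleanup) is never invoked.
    public static Rdd cache(Rdd rdd, StorageLevel level) {
        return level == StorageLevel.NONE ? rdd : rdd.persist(level);
    }

    public static void main(String[] args) {
        Rdd rdd = new Rdd();
        System.out.println(cache(rdd, StorageLevel.NONE) == rdd);       // true: untouched
        System.out.println(cache(rdd, StorageLevel.MEMORY_AND_DISK).persistedAt);
    }
}
```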
Review Request 28017: HIVE-8776 - Generate MapredLocalWork in SparkMapJoinResolver [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28017/ --- Review request for hive, Jimmy Xiang, Szehon Ho, and Xuefu Zhang. Bugs: HIVE-8776 https://issues.apache.org/jira/browse/HIVE-8776 Repository: hive-git Description --- In SparkMapJoinResolver, we need to populate MapredLocalWork for all MapWorks with MapJoinOperator. It is needed later in HashTableLoader, for example, to retrieve small hash tables and direct fetch tables. We need to set up information, such as aliasToWork, aliasToFetchWork, directFetchOp, inputFileChangeSensitive, tmpPath, etc., for the new local works. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java d30ae51 ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java 4b9a6cb ql/src/java/org/apache/hadoop/hive/ql/plan/MapredLocalWork.java 785e4a0 Diff: https://reviews.apache.org/r/28017/diff/ Testing --- Thanks, Chao Sun
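The bookkeeping described above can be sketched with plain collections. The class below is a hypothetical stand-in, not Hive's MapredLocalWork: the real local work holds operator trees and FetchWork objects, while plain strings keep this sketch self-contained.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the per-MapWork local-work bookkeeping described
// above: each small-table alias maps to the work to run and the fetch source.
public class LocalWorkSketch {
    public final Map<String, String> aliasToWork = new LinkedHashMap<>();
    public final Map<String, String> aliasToFetchWork = new LinkedHashMap<>();

    // Register one small-table alias with its operator tree and fetch source.
    public void addSmallTable(String alias, String work, String fetchWork) {
        aliasToWork.put(alias, work);
        aliasToFetchWork.put(alias, fetchWork);
    }

    public static void main(String[] args) {
        LocalWorkSketch lw = new LocalWorkSketch();
        lw.addSmallTable("src2", "HashTableSink tree", "scan of src2");
        // A loader-side consumer would later look up both maps by alias.
        System.out.println(lw.aliasToWork.keySet()); // [src2]
    }
}
```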
Re: Review Request 28017: HIVE-8776 - Generate MapredLocalWork in SparkMapJoinResolver [Spark Branch]
On Nov. 14, 2014, 1:53 a.m., Xuefu Zhang wrote: ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java, line 138 https://reviews.apache.org/r/28017/diff/1/?file=763012#file763012line138 currentTask seems to be the container for sparkWork. Do we need to pass in both of them? BTW, currentTask seems to be a misleading variable name. Good point. I always forget this. Changed the name to originalTask. - Chao --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28017/#review61373 --- On Nov. 14, 2014, 12:03 a.m., Chao Sun wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28017/ --- (Updated Nov. 14, 2014, 12:03 a.m.) Review request for hive, Jimmy Xiang, Szehon Ho, and Xuefu Zhang. Bugs: HIVE-8776 https://issues.apache.org/jira/browse/HIVE-8776 Repository: hive-git Description --- In SparkMapJoinResolver, we need to populate MapredLocalWork for all MapWorks with MapJoinOperator. It is needed later in HashTableLoader, for example, to retrieve small hash tables and direct fetch tables. We need to set up information, such as aliasToWork, aliasToFetchWork, directFetchOp, inputFileChangeSensitive, tmpPath, etc., for the new local works. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java d30ae51 ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java 4b9a6cb ql/src/java/org/apache/hadoop/hive/ql/plan/MapredLocalWork.java 785e4a0 Diff: https://reviews.apache.org/r/28017/diff/ Testing --- Thanks, Chao Sun
Re: Review Request 28017: HIVE-8776 - Generate MapredLocalWork in SparkMapJoinResolver [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28017/ --- (Updated Nov. 14, 2014, 2:43 a.m.) Review request for hive, Jimmy Xiang, Szehon Ho, and Xuefu Zhang. Changes --- Thanks Xuefu for the comments! Bugs: HIVE-8776 https://issues.apache.org/jira/browse/HIVE-8776 Repository: hive-git Description --- In SparkMapJoinResolver, we need to populate MapredLocalWork for all MapWorks with MapJoinOperator. It is needed later in HashTableLoader, for example, to retrieve small hash tables and direct fetch tables. We need to set up information, such as aliasToWork, aliasToFetchWork, directFetchOp, inputFileChangeSensitive, tmpPath, etc., for the new local works. Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java d30ae51 ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java 9ce1a18 ql/src/java/org/apache/hadoop/hive/ql/plan/MapredLocalWork.java 785e4a0 Diff: https://reviews.apache.org/r/28017/diff/ Testing --- Thanks, Chao Sun
Re: Review Request 27933: HIVE-8810 Make HashTableSinkOperator works for Spark Branch [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27933/#review61150 --- ql/src/java/org/apache/hadoop/hive/ql/exec/SparkHashTableSinkOperator.java https://reviews.apache.org/r/27933/#comment102640 Don't need this check anymore. ql/src/java/org/apache/hadoop/hive/ql/exec/SparkHashTableSinkOperator.java https://reviews.apache.org/r/27933/#comment102639 Can we use SPARKHASHTABLESINK, or something similar? - Chao Sun On Nov. 12, 2014, 11:58 p.m., Jimmy Xiang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27933/ --- (Updated Nov. 12, 2014, 11:58 p.m.) Review request for hive, Chao Sun, Szehon Ho, and Xuefu Zhang. Bugs: HIVE-8810 https://issues.apache.org/jira/browse/HIVE-8810 Repository: hive-git Description --- Fixed the Spark HashTableSinkOperator Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java 78d9012 ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java f1c3564 ql/src/java/org/apache/hadoop/hive/ql/exec/SparkHashTableSinkOperator.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkReduceSinkMapJoinProc.java a58a6c5 ql/src/java/org/apache/hadoop/hive/ql/plan/SparkHashTableSinkDesc.java PRE-CREATION Diff: https://reviews.apache.org/r/27933/diff/ Testing --- Thanks, Jimmy Xiang
Re: Review Request 27933: HIVE-8810 Make HashTableSinkOperator works for Spark Branch [Spark Branch]
On Nov. 13, 2014, 12:34 a.m., Chao Sun wrote: ql/src/java/org/apache/hadoop/hive/ql/exec/SparkHashTableSinkOperator.java, line 326 https://reviews.apache.org/r/27933/diff/2/?file=760734#file760734line326 Can we use SPARKHASHTABLESINK, or something similar? Jimmy Xiang wrote: Does this need to match the Operator type? I think these two are not related. Somebody can correct me if I'm wrong. One potential issue with using the same name is that RuleRegExp may become harder to define. - Chao --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27933/#review61150 --- On Nov. 12, 2014, 11:58 p.m., Jimmy Xiang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27933/ --- (Updated Nov. 12, 2014, 11:58 p.m.) Review request for hive, Chao Sun, Szehon Ho, and Xuefu Zhang. Bugs: HIVE-8810 https://issues.apache.org/jira/browse/HIVE-8810 Repository: hive-git Description --- Fixed the Spark HashTableSinkOperator Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java 78d9012 ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java f1c3564 ql/src/java/org/apache/hadoop/hive/ql/exec/SparkHashTableSinkOperator.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkReduceSinkMapJoinProc.java a58a6c5 ql/src/java/org/apache/hadoop/hive/ql/plan/SparkHashTableSinkDesc.java PRE-CREATION Diff: https://reviews.apache.org/r/27933/diff/ Testing --- Thanks, Jimmy Xiang
Review Request 27955: HIVE-8842 - auto_join2.q produces incorrect tree [Spark Branch]
: string), _col10 (type: string), _col11 (type: string) outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5 Statistics: Num rows: 16 Data size: 3306 Basic stats: COMPLETE Column stats: NONE File Output Operator compressed: false Statistics: Num rows: 16 Data size: 3306 Basic stats: COMPLETE Column stats: NONE table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe Stage: Stage-7 Spark DagName: szehon_20141112105656_dd50e07d-94ad-4f9d-899e-bcb6d9a39c13:3 Vertices: Map 1 Map Operator Tree: TableScan alias: src2 Statistics: Num rows: 29 Data size: 5812 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: key is not null (type: boolean) Statistics: Num rows: 15 Data size: 3006 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator condition expressions: 0 {key} {value} 1 {key} {value} keys: 0 key (type: string) 1 key (type: string) Stage: Stage-6 Spark DagName: szehon_20141112105656_dd50e07d-94ad-4f9d-899e-bcb6d9a39c13:2 Vertices: Map 3 Map Operator Tree: TableScan alias: src1 Statistics: Num rows: 29 Data size: 5812 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: key is not null (type: boolean) Statistics: Num rows: 15 Data size: 3006 Basic stats: COMPLETE Column stats: NONE Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {key} {value} 1 {key} {value} keys: 0 key (type: string) 1 key (type: string) outputColumnNames: _col0, _col1, _col5, _col6 input vertices: 1 Map 1 Statistics: Num rows: 16 Data size: 3306 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (_col0 + _col5) is not null (type: boolean) Statistics: Num rows: 8 Data size: 1653 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator condition expressions: 0 {_col0} {_col1} {_col5} {_col6} 1 {key} {value} keys: 0 (_col0 + _col5) (type: double) 1 UDFToDouble(key) 
(type: double) Stage: Stage-0 Fetch Operator limit: -1 Processor Tree: ListSink {noformat} Diffs - ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java a8b7ac6 Diff: https://reviews.apache.org/r/27955/diff/ Testing --- Thanks, Chao Sun
Re: Review Request 27955: HIVE-8842 - auto_join2.q produces incorrect tree [Spark Branch]
: _col0 (type: string), _col1 (type: string), _col5 (type: string), _col6 (type: string), _col10 (type: string), _col11 (type: string) outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5 Statistics: Num rows: 16 Data size: 3306 Basic stats: COMPLETE Column stats: NONE File Output Operator compressed: false Statistics: Num rows: 16 Data size: 3306 Basic stats: COMPLETE Column stats: NONE table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe Stage: Stage-7 Spark DagName: szehon_20141112105656_dd50e07d-94ad-4f9d-899e-bcb6d9a39c13:3 Vertices: Map 1 Map Operator Tree: TableScan alias: src2 Statistics: Num rows: 29 Data size: 5812 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: key is not null (type: boolean) Statistics: Num rows: 15 Data size: 3006 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator condition expressions: 0 {key} {value} 1 {key} {value} keys: 0 key (type: string) 1 key (type: string) Stage: Stage-6 Spark DagName: szehon_20141112105656_dd50e07d-94ad-4f9d-899e-bcb6d9a39c13:2 Vertices: Map 3 Map Operator Tree: TableScan alias: src1 Statistics: Num rows: 29 Data size: 5812 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: key is not null (type: boolean) Statistics: Num rows: 15 Data size: 3006 Basic stats: COMPLETE Column stats: NONE Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {key} {value} 1 {key} {value} keys: 0 key (type: string) 1 key (type: string) outputColumnNames: _col0, _col1, _col5, _col6 input vertices: 1 Map 1 Statistics: Num rows: 16 Data size: 3306 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (_col0 + _col5) is not null (type: boolean) Statistics: Num rows: 8 Data size: 1653 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator condition expressions: 0 {_col0} {_col1} {_col5} {_col6} 
1 {key} {value} keys: 0 (_col0 + _col5) (type: double) 1 UDFToDouble(key) (type: double) Stage: Stage-0 Fetch Operator limit: -1 Processor Tree: ListSink {noformat} Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java a8b7ac6 Diff: https://reviews.apache.org/r/27955/diff/ Testing --- Thanks, Chao Sun
Re: Review Request 27955: HIVE-8842 - auto_join2.q produces incorrect tree [Spark Branch]
On Nov. 13, 2014, 3:56 a.m., Xuefu Zhang wrote: ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java, line 136 https://reviews.apache.org/r/27955/diff/1/?file=760901#file760901line136 It seems that originalWork is the work enclosed in originalTask. Do we really need both as parameters? You're right - originalWork is redundant. Let me change it. - Chao --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27955/#review61198 --- On Nov. 13, 2014, 2:29 a.m., Chao Sun wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27955/ --- (Updated Nov. 13, 2014, 2:29 a.m.) Review request for hive, Szehon Ho and Xuefu Zhang. Bugs: HIVE-8842 https://issues.apache.org/jira/browse/HIVE-8842 Repository: hive-git Description --- Enabling the SparkMapJoinResolver and SparkReduceSinkMapJoinProc, I see the following: {noformat} explain select * from src src1 JOIN src src2 ON (src1.key = src2.key) JOIN src src3 ON (src1.key + src2.key = src3.key); {noformat} produces too many stages (six), and too many HashTableSink.
{noformat} STAGE DEPENDENCIES: Stage-5 is a root stage Stage-4 depends on stages: Stage-5 Stage-3 depends on stages: Stage-4 Stage-7 is a root stage Stage-6 depends on stages: Stage-7 Stage-0 is a root stage STAGE PLANS: Stage: Stage-5 Spark DagName: szehon_20141112105656_dd50e07d-94ad-4f9d-899e-bcb6d9a39c13:3 Vertices: Map 1 Map Operator Tree: TableScan alias: src2 Statistics: Num rows: 29 Data size: 5812 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: key is not null (type: boolean) Statistics: Num rows: 15 Data size: 3006 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator condition expressions: 0 {key} {value} 1 {key} {value} keys: 0 key (type: string) 1 key (type: string) Stage: Stage-4 Spark DagName: szehon_20141112105656_dd50e07d-94ad-4f9d-899e-bcb6d9a39c13:2 Vertices: Map 3 Map Operator Tree: TableScan alias: src1 Statistics: Num rows: 29 Data size: 5812 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: key is not null (type: boolean) Statistics: Num rows: 15 Data size: 3006 Basic stats: COMPLETE Column stats: NONE Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {key} {value} 1 {key} {value} keys: 0 key (type: string) 1 key (type: string) outputColumnNames: _col0, _col1, _col5, _col6 input vertices: 1 Map 1 Statistics: Num rows: 16 Data size: 3306 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (_col0 + _col5) is not null (type: boolean) Statistics: Num rows: 8 Data size: 1653 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator condition expressions: 0 {_col0} {_col1} {_col5} {_col6} 1 {key} {value} keys: 0 (_col0 + _col5) (type: double) 1 UDFToDouble(key) (type: double) Stage: Stage-3 Spark DagName: szehon_20141112105656_dd50e07d-94ad-4f9d-899e-bcb6d9a39c13:1 Vertices: Map 2 Map Operator Tree: TableScan alias: src3 Statistics: Num rows: 29 Data size: 5812 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: UDFToDouble(key) is not 
null (type: boolean) Statistics: Num rows: 15 Data size: 3006 Basic stats
Re: Review Request 27627: Split map-join plan into 2 SparkTasks in 3 stages [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27627/ --- (Updated Nov. 9, 2014, 10:39 p.m.) Review request for hive. Changes --- Adopting Xuefu's pseudo code. Now, for each BaseWork with a MJ operator, we use a SparkWork for its parent BaseWorks that contain HashTableSinkOperator. I manually tested this patch with several qfiles containing map-join queries, and the results look correct. Bugs: HIVE-8622 https://issues.apache.org/jira/browse/HIVE-8622 Repository: hive-git Description --- This is a sub-task of map-join for spark https://issues.apache.org/jira/browse/HIVE-7613 This can use the baseline patch for map-join https://issues.apache.org/jira/browse/HIVE-8616 Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/plan/SparkWork.java 46d02bf Diff: https://reviews.apache.org/r/27627/diff/ Testing --- Thanks, Chao Sun
Re: Review Request 27627: Split map-join plan into 2 SparkTasks in 3 stages [Spark Branch]
On Nov. 8, 2014, 3:15 p.m., Xuefu Zhang wrote: ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java, line 214 https://reviews.apache.org/r/27627/diff/3/?file=754597#file754597line214 This assumes that the result SparkWorks will be linearly dependent on each other, which isn't true in general. Let's say there are two works (w1 and w2), each having a map join operator. w1 and w2 are connected to w3 via HTS. w3 also contains a map join operator. Dependency in this scenario will be graph-like rather than linear. Chao Sun wrote: I was thinking, in this case, if there's no dependency between w1 and w2, they can be put in the same SparkWork, right? Otherwise, they will form a linear dependency too. Xuefu Zhang wrote: w1 and w2 are fine. They will be in the same SparkWork. This SparkWork will depend on both the SparkWork generated at w1 and the SparkWork generated at w2. This dependency is not linear. To add more detail, for each work that has a map join op, we need to create a SparkWork to handle its small tables. So, both w1 and w2 will need to create such a SparkWork. While w1 and w2 are in the same SparkWork, this SparkWork depends on the two SparkWorks created. Chao Sun wrote: I'm not getting it - why is this dependency not linear? Can you give a counterexample? Suppose w1 (MJ_1), w2 (MJ_2), and w3 (MJ_3) are like the following:

HTS_1  HTS_2    HTS_3  HTS_4
   \    /          \    /
    MJ_1            MJ_2
     |               |
   HTS_5           HTS_6
       \            /
           MJ_3

Then, what I'm doing is to put HTS_1, HTS_2, HTS_3, and HTS_4 in the same SparkWork, say SW_1; then MJ_1, MJ_2, HTS_5, and HTS_6 will be in another SparkWork SW_2, and MJ_3 in another SparkWork SW_3: SW_1 - SW_2 - SW_3. Xuefu Zhang wrote: I don't think we should put (HTS1,HTS2) and (HTS3, HTS4) in the same SparkWork. They belong to different MJs handling different sets of small tables. This will complicate things, making HashTableSinkOperator and HashTableLoader more complicated.
Per dependency, MJ1 doesn't need to wait for HTS3/HTS4 in order to run, and vice versa. Please refer to the pseudo code posted in the JIRA for implementation ideas. Thanks. Resolved via an offline chat. - Chao --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27627/#review60482 --- On Nov. 9, 2014, 10:39 p.m., Chao Sun wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27627/ --- (Updated Nov. 9, 2014, 10:39 p.m.) Review request for hive. Bugs: HIVE-8622 https://issues.apache.org/jira/browse/HIVE-8622 Repository: hive-git Description --- This is a sub-task of map-join for spark https://issues.apache.org/jira/browse/HIVE-7613 This can use the baseline patch for map-join https://issues.apache.org/jira/browse/HIVE-8616 Diffs - ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/plan/SparkWork.java 46d02bf Diff: https://reviews.apache.org/r/27627/diff/ Testing --- Thanks, Chao Sun
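The non-linearity being argued above can be made concrete with a small dependency map: the SparkWork holding w1 and w2 depends on two upstream small-table SparkWorks, so the plan is a DAG rather than a chain. The names below are illustrative stand-ins, not Hive's actual SparkWork API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of the scenario discussed above (hypothetical names, not Hive's API).
public class DependencySketch {
    // A plan is a chain only if every work has at most one upstream parent.
    public static boolean isLinear(Map<String, List<String>> deps) {
        return deps.values().stream().allMatch(parents -> parents.size() <= 1);
    }

    // w1 and w2 each get a SparkWork for their small tables; the SparkWork
    // containing (w1, w2) then depends on BOTH of them.
    public static Map<String, List<String>> sampleDeps() {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("SW_small_w1", List.of());                          // small tables of w1
        deps.put("SW_small_w2", List.of());                          // small tables of w2
        deps.put("SW_w1_w2", List.of("SW_small_w1", "SW_small_w2")); // two parents
        deps.put("SW_w3", List.of("SW_w1_w2"));
        return deps;
    }

    public static void main(String[] args) {
        System.out.println(isLinear(sampleDeps())); // false: a DAG, not a chain
    }
}
```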
Re: Review Request 27627: Split map-join plan into 2 SparkTasks in 3 stages [Spark Branch]
On Nov. 8, 2014, 3:15 p.m., Xuefu Zhang wrote: ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java, line 214 https://reviews.apache.org/r/27627/diff/3/?file=754597#file754597line214 This assumes that the result SparkWorks will be linearly dependent on each other, which isn't true in general. Let's say there are two works (w1 and w2), each having a map join operator. w1 and w2 are connected to w3 via HTS. w3 also contains a map join operator. Dependency in this scenario will be graph-like rather than linear. I was thinking, in this case, if there's no dependency between w1 and w2, they can be put in the same SparkWork, right? Otherwise, they will form a linear dependency too. - Chao --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27627/#review60482 --- On Nov. 7, 2014, 6:07 p.m., Chao Sun wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27627/ --- (Updated Nov. 7, 2014, 6:07 p.m.) Review request for hive. Bugs: HIVE-8622 https://issues.apache.org/jira/browse/HIVE-8622 Repository: hive-git Description --- This is a sub-task of map-join for spark https://issues.apache.org/jira/browse/HIVE-7613 This can use the baseline patch for map-join https://issues.apache.org/jira/browse/HIVE-8616 Diffs - ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/plan/SparkWork.java 66fd6b6 Diff: https://reviews.apache.org/r/27627/diff/ Testing --- Thanks, Chao Sun
Re: Review Request 27627: Split map-join plan into 2 SparkTasks in 3 stages [Spark Branch]
On Nov. 8, 2014, 12:44 a.m., Szehon Ho wrote: ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java, line 224 https://reviews.apache.org/r/27627/diff/3/?file=754597#file754597line224 I've been thinking about this, as you had brought up a pretty rare use-case where a big-table parent of mapjoin1 still had an HTS, but it's for another(!) mapjoin. I don't know if this is still a valid case, but do you think this handles it, as it just indiscriminately adds it to the parent map if it has an HTS? Fixed through an offline chat. - Chao --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27627/#review60380 --- On Nov. 7, 2014, 6:07 p.m., Chao Sun wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27627/ --- (Updated Nov. 7, 2014, 6:07 p.m.) Review request for hive. Bugs: HIVE-8622 https://issues.apache.org/jira/browse/HIVE-8622 Repository: hive-git Description --- This is a sub-task of map-join for spark https://issues.apache.org/jira/browse/HIVE-7613 This can use the baseline patch for map-join https://issues.apache.org/jira/browse/HIVE-8616 Diffs - ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/plan/SparkWork.java 66fd6b6 Diff: https://reviews.apache.org/r/27627/diff/ Testing --- Thanks, Chao Sun
Re: Review Request 27627: Split map-join plan into 2 SparkTasks in 3 stages [Spark Branch]
On Nov. 7, 2014, 11:07 p.m., Xuefu Zhang wrote: ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java, line 100 https://reviews.apache.org/r/27627/diff/2/?file=754549#file754549line100 It seems possible that current is MJWork, right? Are you going to add it to the target? Yes, it's possible. But that MJWork will be one whose HTSs are all already handled, so we can go through it to reach the HTSs for other MJWorks. On Nov. 7, 2014, 11:07 p.m., Xuefu Zhang wrote: ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java, line 115 https://reviews.apache.org/r/27627/diff/2/?file=754549#file754549line115 Frankly, I'm not 100% following the logic. The diagram has operators mixed with works, which makes it hard. But I'm seeing where you're coming from. Maybe you can explain it to me better in person. Here the operator name (MJ, HTS) means a work that contains the operator, so MJ is a BaseWork containing an MJ operator, and the same for HTS. Yes, I think explaining in person would be better. On Nov. 7, 2014, 11:07 p.m., Xuefu Zhang wrote: ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java, line 155 https://reviews.apache.org/r/27627/diff/2/?file=754549#file754549line155 I think there is a separate JIRA handling combining mapjoins, owned by Szehon. In my understanding, Szehon's JIRA is trying to put MJ operators in the same BaseWork. But there are some cases where we cannot apply this optimization, and MJ operators will be in different BaseWorks. My work here is to try to put them in the same SparkWork, if there's no dependency among them. - Chao --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27627/#review60403 --- On Nov. 7, 2014, 6:07 p.m., Chao Sun wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27627/ --- (Updated Nov. 7, 2014, 6:07 p.m.) Review request for hive. 
Bugs: HIVE-8622 https://issues.apache.org/jira/browse/HIVE-8622 Repository: hive-git Description --- This is a sub-task of map-join for spark https://issues.apache.org/jira/browse/HIVE-7613 This can use the baseline patch for map-join https://issues.apache.org/jira/browse/HIVE-8616 Diffs - ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/plan/SparkWork.java 66fd6b6 Diff: https://reviews.apache.org/r/27627/diff/ Testing --- Thanks, Chao Sun
Re: Review Request 27627: Split map-join plan into 2 SparkTasks in 3 stages [Spark Branch]
On Nov. 8, 2014, 3:15 p.m., Xuefu Zhang wrote: ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java, line 214 https://reviews.apache.org/r/27627/diff/3/?file=754597#file754597line214 This assumes that the resulting SparkWorks will be linearly dependent on each other, which isn't true in general. Let's say there are two works (w1 and w2), each having a map join operator. w1 and w2 are connected to w3 via HTS. w3 also contains a map join operator. The dependency in this scenario will be graph-shaped rather than linear. Chao Sun wrote: I was thinking, in this case, if there's no dependency between w1 and w2, they can be put in the same SparkWork, right? Otherwise, they will form a linear dependency too. Xuefu Zhang wrote: w1 and w2 are fine; they will be in the same SparkWork. This SparkWork will depend on both the SparkWork generated at w1 and the SparkWork generated at w2. This dependency is not linear. In more detail: for each work that has a map join op, we need to create a SparkWork to handle its small tables. So, both w1 and w2 will need to create such a SparkWork. While w1 and w2 are in the same SparkWork, this SparkWork depends on the two SparkWorks created. I'm not getting it: why is this dependency not linear? Can you give a counterexample? Suppose w1 (MJ_1), w2 (MJ_2), and w3 (MJ_3) are like the following:

    HTS_1  HTS_2    HTS_3  HTS_4
       \   /           \   /
       MJ_1            MJ_2
        |                |
      HTS_5            HTS_6
          \             /
           \           /
              MJ_3

Then, what I'm doing is to put HTS_1, HTS_2, HTS_3, and HTS_4 in the same SparkWork, say SW_1; then MJ_1, MJ_2, HTS_5, and HTS_6 will be in another SparkWork SW_2, and MJ_3 in another SparkWork SW_3. So: SW_1 -> SW_2 -> SW_3. - Chao --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27627/#review60482 --- On Nov. 7, 2014, 6:07 p.m., Chao Sun wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27627/ --- (Updated Nov. 7, 2014, 6:07 p.m.) Review request for hive. 
Bugs: HIVE-8622 https://issues.apache.org/jira/browse/HIVE-8622 Repository: hive-git Description --- This is a sub-task of map-join for spark https://issues.apache.org/jira/browse/HIVE-7613 This can use the baseline patch for map-join https://issues.apache.org/jira/browse/HIVE-8616 Diffs - ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/plan/SparkWork.java 66fd6b6 Diff: https://reviews.apache.org/r/27627/diff/ Testing --- Thanks, Chao Sun
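The SW_1 - SW_2 - SW_3 grouping Chao describes above amounts to layering works by their longest distance across HTS boundaries. The sketch below is purely illustrative: the work names come from the example in the thread, and `assignStages` is a hypothetical helper written for this email, not part of Hive's actual SparkMapJoinResolver. Works joined without an intervening HTS (e.g. MJ_1 and HTS_5) are modeled as a single node.

```java
import java.util.*;

// Illustrative sketch only: layer BaseWork groups into SparkWork stages so
// that every HashTableSink (HTS) producer lands in an earlier stage than
// the map-join work that consumes its small table.
public class StageSplitSketch {

    // htsEdges contains only edges that cross an HTS boundary:
    // producer work -> consumer work. Stage index = longest path from a source.
    static Map<String, Integer> assignStages(Set<String> works,
                                             Map<String, List<String>> htsEdges) {
        Map<String, Integer> stage = new HashMap<>();
        for (String w : works) stage.put(w, 0);
        boolean changed = true;
        while (changed) {                         // relax until fixpoint (DAG)
            changed = false;
            for (Map.Entry<String, List<String>> e : htsEdges.entrySet()) {
                for (String consumer : e.getValue()) {
                    int s = stage.get(e.getKey()) + 1;
                    if (stage.get(consumer) < s) {
                        stage.put(consumer, s);
                        changed = true;
                    }
                }
            }
        }
        return stage;
    }

    public static void main(String[] args) {
        // the example from the thread: MJ_1 shares a work with HTS_5,
        // and MJ_2 with HTS_6, so only three stages (SparkWorks) result
        Set<String> works = new HashSet<>(Arrays.asList(
            "HTS_1", "HTS_2", "HTS_3", "HTS_4", "MJ_1+HTS_5", "MJ_2+HTS_6", "MJ_3"));
        Map<String, List<String>> htsEdges = new HashMap<>();
        htsEdges.put("HTS_1", Arrays.asList("MJ_1+HTS_5"));
        htsEdges.put("HTS_2", Arrays.asList("MJ_1+HTS_5"));
        htsEdges.put("HTS_3", Arrays.asList("MJ_2+HTS_6"));
        htsEdges.put("HTS_4", Arrays.asList("MJ_2+HTS_6"));
        htsEdges.put("MJ_1+HTS_5", Arrays.asList("MJ_3"));
        htsEdges.put("MJ_2+HTS_6", Arrays.asList("MJ_3"));
        Map<String, Integer> stage = assignStages(works, htsEdges);
        // stage 0 = SW_1 (all four small-table HTS works),
        // stage 1 = SW_2 (MJ_1, MJ_2 with their HTSs), stage 2 = SW_3 (MJ_3)
        System.out.println("MJ_3 lands in SW_" + (stage.get("MJ_3") + 1));
    }
}
```

Xuefu's objection still holds in general: with works that cannot be merged, the stage graph is a DAG rather than a chain, and this layering only collapses to a linear SW_1 -> SW_2 -> SW_3 in examples like the one above.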
Re: Review Request 27627: Split map-join plan into 2 SparkTasks in 3 stages [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27627/ --- (Updated Nov. 7, 2014, 3:57 p.m.) Review request for hive. Changes --- Another patch, with a cleaner solution in my opinion. I tested it with subquery_multiinsert.q and the result looks fine. Please give suggestions! Bugs: HIVE-8622 https://issues.apache.org/jira/browse/HIVE-8622 Repository: hive-git Description --- This is a sub-task of map-join for spark https://issues.apache.org/jira/browse/HIVE-7613 This can use the baseline patch for map-join https://issues.apache.org/jira/browse/HIVE-8616 Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java PRE-CREATION Diff: https://reviews.apache.org/r/27627/diff/ Testing --- Thanks, Chao Sun
Re: Review Request 27627: Split map-join plan into 2 SparkTasks in 3 stages [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27627/ --- (Updated Nov. 7, 2014, 6:07 p.m.) Review request for hive. Changes --- Instead of using a Set, we should use a Map from a BaseWork w/ MJ to all its parent BaseWorks w/ HTSs. The principle is, we cannot process all BaseWorks below this MJ until all HTSs are processed. Bugs: HIVE-8622 https://issues.apache.org/jira/browse/HIVE-8622 Repository: hive-git Description --- This is a sub-task of map-join for spark https://issues.apache.org/jira/browse/HIVE-7613 This can use the baseline patch for map-join https://issues.apache.org/jira/browse/HIVE-8616 Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/plan/SparkWork.java 66fd6b6 Diff: https://reviews.apache.org/r/27627/diff/ Testing --- Thanks, Chao Sun
Review Request 27627: Split map-join plan into 2 SparkTasks in 3 stages [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27627/ --- Review request for hive. Bugs: HIVE-8622 https://issues.apache.org/jira/browse/HIVE-8622 Repository: hive-git Description --- This is a sub-task of map-join for spark https://issues.apache.org/jira/browse/HIVE-7613 This can use the baseline patch for map-join https://issues.apache.org/jira/browse/HIVE-8616 Diffs - ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java PRE-CREATION Diff: https://reviews.apache.org/r/27627/diff/ Testing --- Thanks, Chao Sun
Re: Review Request 27627: Split map-join plan into 2 SparkTasks in 3 stages [Spark Branch]
On Nov. 5, 2014, 9:24 p.m., Szehon Ho wrote: Hi Chao, I left a review for a form of this patch at https://reviews.apache.org/r/27640/, as Suhas put it up for a separate review in combination with his patch. Thanks, I'll take a look there. - Chao --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27627/#review60034 --- On Nov. 5, 2014, 5:51 p.m., Chao Sun wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27627/ --- (Updated Nov. 5, 2014, 5:51 p.m.) Review request for hive. Bugs: HIVE-8622 https://issues.apache.org/jira/browse/HIVE-8622 Repository: hive-git Description --- This is a sub-task of map-join for spark https://issues.apache.org/jira/browse/HIVE-7613 This can use the baseline patch for map-join https://issues.apache.org/jira/browse/HIVE-8616 Diffs - ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java PRE-CREATION Diff: https://reviews.apache.org/r/27627/diff/ Testing --- Thanks, Chao Sun
Re: Review Request 27627: Split map-join plan into 2 SparkTasks in 3 stages [Spark Branch]
On Nov. 5, 2014, 7:16 p.m., Xuefu Zhang wrote: ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java, line 128 https://reviews.apache.org/r/27627/diff/1/?file=750389#file750389line128 Do you mean parentTasks != null? That was a silly mistake. On Nov. 5, 2014, 7:16 p.m., Xuefu Zhang wrote: ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java, line 185 https://reviews.apache.org/r/27627/diff/1/?file=750389#file750389line185 Merge with itself? Yes, in this case (current BaseWork has no MJ), we merge all parent SparkWorks into the current SparkWork. - Chao --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27627/#review59987 --- On Nov. 5, 2014, 5:51 p.m., Chao Sun wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27627/ --- (Updated Nov. 5, 2014, 5:51 p.m.) Review request for hive. Bugs: HIVE-8622 https://issues.apache.org/jira/browse/HIVE-8622 Repository: hive-git Description --- This is a sub-task of map-join for spark https://issues.apache.org/jira/browse/HIVE-7613 This can use the baseline patch for map-join https://issues.apache.org/jira/browse/HIVE-8616 Diffs - ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java PRE-CREATION Diff: https://reviews.apache.org/r/27627/diff/ Testing --- Thanks, Chao Sun
Re: Review Request 27640: HIVE-8700 Replace ReduceSink to HashTableSink (or equi.) for small tables [Spark Branch]
On Nov. 5, 2014, 9:23 p.m., Szehon Ho wrote: ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java, line 188 https://reviews.apache.org/r/27640/diff/1/?file=750693#file750693line188 Can you elaborate why we need this assumption? This may not be true in all cases. Actually, we don't need this assumption anymore. I'll remove it. On Nov. 5, 2014, 9:23 p.m., Szehon Ho wrote: ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java, line 141 https://reviews.apache.org/r/27640/diff/1/?file=750693#file750693line141 Please use proper javadoc notation for your javadocs. I didn't use javadoc since it's a private method. Maybe I can write a better description of what it does? - Chao --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27640/#review60031 --- On Nov. 5, 2014, 8:29 p.m., Suhas Satish wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27640/ --- (Updated Nov. 5, 2014, 8:29 p.m.) Review request for hive, Chao Sun, Jimmy Xiang, Szehon Ho, and Xuefu Zhang. Repository: hive-git Description --- This replaces ReduceSinks with HashTableSinks in smaller tables for a map-join. But the condition check field to detect map-join is actually being set in CommonJoinResolver, which doesn't exist yet. We need to decide where is the right place to populate this field. Diffs - ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 795a5d7 Diff: https://reviews.apache.org/r/27640/diff/ Testing --- Thanks, Suhas Satish
Re: Review Request 27117: HIVE-8457 - MapOperator initialization fails when multiple Spark threads is enabled [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27117/ --- (Updated Oct. 24, 2014, 4:51 p.m.) Review request for hive and Xuefu Zhang. Changes --- Thanks Xuefu for the comments. I've updated my patch. Bugs: HIVE-8457 https://issues.apache.org/jira/browse/HIVE-8457 Repository: hive-git Description --- Currently, on the Spark branch, each thread is bound to a thread-local IOContext, which gets initialized when we generate an input HadoopRDD, and is later used in MapOperator, FilterOperator, etc. Given the introduction of HIVE-8118, we may have multiple downstream RDDs that share the same input HadoopRDD, and we would like the HadoopRDD to be cached, to avoid scanning the same table multiple times. A typical case would be like the following:

    inputRDD   inputRDD
       |          |
     MT_11      MT_12
       |          |
     RT_1       RT_2

Here, MT_11 and MT_12 are MapTrans from a split MapWork, and RT_1 and RT_2 are two ReduceTrans. Note that this example is simplified, as we may also have a ShuffleTran between a MapTran and a ReduceTran. When multiple Spark threads are running, MT_11 may be executed first, and its request for an iterator from the HadoopRDD will trigger the creation of the iterator, which in turn triggers the initialization of the IOContext associated with that particular thread. Now, the problem is: when MT_12 starts executing, it will also ask for an iterator from the HadoopRDD, and since the RDD is already cached, instead of creating a new iterator, it will just fetch it from the cached result. However, this skips the initialization of the IOContext associated with this particular thread. So, when MT_12 starts executing, it will try to initialize the MapOperator, but since the IOContext is not initialized, this will fail miserably. 
Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMapRecordHandler.java 20ea977 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 00a6f3d ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java 4de3ad4 ql/src/java/org/apache/hadoop/hive/ql/io/HiveContextAwareRecordReader.java 58e1ceb ql/src/java/org/apache/hadoop/hive/ql/io/IOContext.java 5fb3b13 Diff: https://reviews.apache.org/r/27117/diff/ Testing --- All multi-insertion related tests are passing on my local machine. Thanks, Chao Sun
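The race described above boils down to needing an idempotent per-thread initialization guard: a thread whose input comes from the cached RDD never runs the reader-creation path that would normally set up its IOContext, so the consuming side must be able to initialize it on demand. This is a minimal, self-contained sketch of that pattern only; the class and method names are hypothetical stand-ins, not Hive's actual IOContext or SparkMapRecordHandler code.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a per-thread init guard: whichever code path runs
// first on a thread (record-reader creation, or the map handler served from
// a cached RDD) performs the one-time setup; later calls are no-ops.
public class ThreadLocalInitGuard {
    private static final ThreadLocal<Boolean> initialized =
        ThreadLocal.withInitial(() -> Boolean.FALSE);

    // counts how many real initializations happened (for demonstration)
    public static final AtomicInteger initCount = new AtomicInteger();

    public static void ensureInitialized() {
        if (!initialized.get()) {
            initCount.incrementAndGet();   // stand-in for real IOContext setup
            initialized.set(Boolean.TRUE);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // each thread calls the guard twice; setup still runs once per thread
        Runnable task = () -> { ensureInitialized(); ensureInitialized(); };
        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(initCount.get());
    }
}
```

The point of the guard is that correctness no longer depends on which thread happens to materialize the HadoopRDD iterator first.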
Review Request 27148: HIVE-8533 - Enable all q-tests for multi-insertion [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27148/ --- Review request for hive, Brock Noland, Szehon Ho, and Xuefu Zhang. Bugs: HIVE-8533 https://issues.apache.org/jira/browse/HIVE-8533 Repository: hive-git Description --- As HIVE-8436 is done, we should be able to enable all multi-insertion related tests. This JIRA is created to track this and record any potential issue encountered. Diffs - itests/src/test/resources/testconfiguration.properties db8866d ql/src/test/results/clientpositive/spark/auto_smb_mapjoin_14.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby10.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby11.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby3_map_skew.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby7.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby7_noskew_multi_single_reducer.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby8.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby8_map.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby8_map_skew.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby8_noskew.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby9.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby_complex_types.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby_complex_types_multi_single_reducer.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby_multi_insert_common_distinct.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/pcr.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/smb_mapjoin_13.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/smb_mapjoin_15.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/smb_mapjoin_16.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/table_access_keys_stats.q.out PRE-CREATION Diff: 
https://reviews.apache.org/r/27148/diff/ Testing --- auto_smb_mapjoin_14.q groupby10.q groupby11.q groupby3_map_skew.q groupby7.q groupby7_noskew_multi_single_reducer.q groupby8.q groupby8_map.q groupby8_map_skew.q groupby8_noskew.q groupby9.q groupby_complex_types.q groupby_complex_types_multi_single_reducer.q groupby_multi_insert_common_distinct.q pcr.q smb_mapjoin_13.q smb_mapjoin_15.q smb_mapjoin_16.q table_access_keys_stats.q Thanks, Chao Sun
Re: Review Request 27148: HIVE-8533 - Enable all q-tests for multi-insertion [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27148/ --- (Updated Oct. 24, 2014, 6:03 p.m.) Review request for hive, Brock Noland, Szehon Ho, and Xuefu Zhang. Bugs: HIVE-8533 https://issues.apache.org/jira/browse/HIVE-8533 Repository: hive-git Description --- As HIVE-8436 is done, we should be able to enable all multi-insertion related tests. This JIRA is created to track this and record any potential issue encountered. Diffs - itests/src/test/resources/testconfiguration.properties db8866d ql/src/test/results/clientpositive/spark/auto_smb_mapjoin_14.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby10.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby11.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby3_map_skew.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby7.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby7_noskew_multi_single_reducer.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby8.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby8_map.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby8_map_skew.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby8_noskew.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby9.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby_complex_types.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby_complex_types_multi_single_reducer.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby_multi_insert_common_distinct.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/pcr.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/smb_mapjoin_13.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/smb_mapjoin_15.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/smb_mapjoin_16.q.out PRE-CREATION 
ql/src/test/results/clientpositive/spark/table_access_keys_stats.q.out PRE-CREATION Diff: https://reviews.apache.org/r/27148/diff/ Testing --- auto_smb_mapjoin_14.q groupby10.q groupby11.q groupby3_map_skew.q groupby7.q groupby7_noskew_multi_single_reducer.q groupby8.q groupby8_map.q groupby8_map_skew.q groupby8_noskew.q groupby9.q groupby_complex_types.q groupby_complex_types_multi_single_reducer.q groupby_multi_insert_common_distinct.q pcr.q smb_mapjoin_13.q smb_mapjoin_15.q smb_mapjoin_16.q table_access_keys_stats.q Thanks, Chao Sun
Re: Build failure on trunk
Maybe it's because the patch didn't apply? 2014-10-24 17:14:50,934 INFO LocalCommand$CollectLogPolicy.handleOutput:69 The patch does not appear to apply with p0, p1, or p2 2014-10-24 17:14:50,938 INFO LocalCommand$CollectLogPolicy.handleOutput:69 + exit 1 2014-10-24 17:14:50,939 ERROR PTest.run:175 Test run exited with an unexpected error org.apache.hive.ptest.execution.ssh.NonZeroExitCodeException: Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]] On Fri, Oct 24, 2014 at 2:19 PM, Prasanth Jayachandran pjayachand...@hortonworks.com wrote: Unit test run for HIVE-8454 spent 2hr 48mins but finally it says “no tests executed”. https://issues.apache.org/jira/browse/HIVE-8454?focusedCommentId=14183509page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14183509 http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1444/ Anyone know why? - Prasanth On Thu, Oct 23, 2014 at 9:26 PM, Gunther Hagleitner ghagleit...@hortonworks.com wrote: Thanks Xuefu - I appreciate it! On Thu, Oct 23, 2014 at 9:15 PM, Xuefu Zhang xzh...@cloudera.com wrote: You can add CLEAR LIBRARY CACHE in the description for any JIRA, which will clear local maven repo. I added it to HIVE-6165. On Thu, Oct 23, 2014 at 9:09 PM, Gunther Hagleitner ghagleit...@hortonworks.com wrote: Builds are running again (reverted patch). I've re-uploaded the patches that had a failed run because of it. Sorry about that... Thanks, Gunther. On Thu, Oct 23, 2014 at 8:07 PM, Gunther Hagleitner gunther.hagleit...@gmail.com wrote: The builds are failing right now on trunk after I committed a change that requires new/updated calcite libs. (Sorry about that). Is it possible for someone to wipe the .m2 cache on the build machine, so it would download a new version with the changes? Thank you, Gunther. 
-- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
Re: Review Request 27046: HIVE-8545 - Exception when casting Text to BytesWritable [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27046/ --- (Updated Oct. 23, 2014, 5:32 p.m.) Review request for hive, Brock Noland and Xuefu Zhang. Changes --- Thanks Xuefu for the suggestions. This patch uses a blank Configuration instead of serializing/deserializing the JobConf. Bugs: hive-8545 https://issues.apache.org/jira/browse/hive-8545 Repository: hive-git Description --- With the current multi-insertion implementation, when caching is enabled for the input RDD, a query may fail with the following exception: 2014-10-21 13:57:34,742 WARN [task-result-getter-0]: scheduler.TaskSetManager (Logging.scala:logWarning(71)) - Lost task 0.0 in stage 1.0 (TID 1, localhost): java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.BytesWritable org.apache.hadoop.hive.ql.exec.spark.MapInput$CopyFunction.call(MapInput.java:67) org.apache.hadoop.hive.ql.exec.spark.MapInput$CopyFunction.call(MapInput.java:61) org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1002) org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1002) scala.collection.Iterator$$anon$11.next(Iterator.scala:328) org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:234) org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:163) org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:70) org.apache.spark.rdd.RDD.iterator(RDD.scala:227) org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) org.apache.spark.rdd.RDD.iterator(RDD.scala:229) org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) org.apache.spark.scheduler.Task.run(Task.scala:56) org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181) 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) java.lang.Thread.run(Thread.java:745) The fix should be easy. However, interestingly, this error doesn't show up when the caching is turned off. We need to find out why. Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveBaseFunctionResultList.java dc5d148 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java 9849b49 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 25a4515 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTran.java 8a3dbf2 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java 0f21b46 Diff: https://reviews.apache.org/r/27046/diff/ Testing --- Thanks, Chao Sun
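The ClassCastException above comes from copy code that assumed a concrete key type (BytesWritable) while the cached input actually carried Text keys. A type-agnostic copy that round-trips the value through its own serialization avoids the cast entirely. The sketch below is illustrative only: it uses a tiny stand-in Writable interface rather than Hadoop's, and `copy`/`SimpleText` are hypothetical names; a real patch would more likely lean on Hadoop's own cloning utilities.

```java
import java.io.*;

// Hedged sketch of type-agnostic record copying: instead of casting every
// key to one concrete class, serialize the value and deserialize it into a
// fresh instance of the same runtime class.
public class GenericCopySketch {
    // minimal stand-in for org.apache.hadoop.io.Writable
    public interface Writable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // copy without knowing the concrete type at compile time
    @SuppressWarnings("unchecked")
    public static <W extends Writable> W copy(W value) throws Exception {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        value.write(new DataOutputStream(buf));
        W fresh = (W) value.getClass().getDeclaredConstructor().newInstance();
        fresh.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        return fresh;
    }

    // minimal Text-like value for demonstration
    public static class SimpleText implements Writable {
        public String s = "";
        public SimpleText() {}
        public SimpleText(String s) { this.s = s; }
        public void write(DataOutput out) throws IOException { out.writeUTF(s); }
        public void readFields(DataInput in) throws IOException { s = in.readUTF(); }
    }

    public static void main(String[] args) throws Exception {
        SimpleText orig = new SimpleText("hello");
        SimpleText dup = copy(orig);
        // dup holds equal content but is a distinct object, which matters
        // when Hadoop record readers reuse the same key/value instances
        System.out.println(dup.s.equals(orig.s) && dup != orig);
    }
}
```

The deep copy matters here because cached RDD entries must not alias the record reader's reused key/value objects.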
Review Request 27117: HIVE-8457 - MapOperator initialization fails when multiple Spark threads is enabled [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27117/ --- Review request for hive and Xuefu Zhang. Bugs: HIVE-8457 https://issues.apache.org/jira/browse/HIVE-8457 Repository: hive-git Description --- Currently, on the Spark branch, each thread is bound to a thread-local IOContext, which gets initialized when we generate an input HadoopRDD, and is later used in MapOperator, FilterOperator, etc. Given the introduction of HIVE-8118, we may have multiple downstream RDDs that share the same input HadoopRDD, and we would like the HadoopRDD to be cached, to avoid scanning the same table multiple times. A typical case would be like the following:

    inputRDD   inputRDD
       |          |
     MT_11      MT_12
       |          |
     RT_1       RT_2

Here, MT_11 and MT_12 are MapTrans from a split MapWork, and RT_1 and RT_2 are two ReduceTrans. Note that this example is simplified, as we may also have a ShuffleTran between a MapTran and a ReduceTran. When multiple Spark threads are running, MT_11 may be executed first, and its request for an iterator from the HadoopRDD will trigger the creation of the iterator, which in turn triggers the initialization of the IOContext associated with that particular thread. Now, the problem is: when MT_12 starts executing, it will also ask for an iterator from the HadoopRDD, and since the RDD is already cached, instead of creating a new iterator, it will just fetch it from the cached result. However, this skips the initialization of the IOContext associated with this particular thread. So, when MT_12 starts executing, it will try to initialize the MapOperator, but since the IOContext is not initialized, this will fail miserably. 
Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMapRecordHandler.java 20ea977 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 00a6f3d ql/src/java/org/apache/hadoop/hive/ql/io/HiveContextAwareRecordReader.java 58e1ceb Diff: https://reviews.apache.org/r/27117/diff/ Testing --- All multi-insertion related tests are passing on my local machine. Thanks, Chao Sun
Review Request 27046: HIVE-8545 - Exception when casting Text to BytesWritable [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27046/ --- Review request for hive, Brock Noland and Xuefu Zhang. Bugs: hive-8545 https://issues.apache.org/jira/browse/hive-8545 Repository: hive-git Description --- With the current multi-insertion implementation, when caching is enabled for input RDD, query may fail with the following exception: 2014-10-21 13:57:34,742 WARN [task-result-getter-0]: scheduler.TaskSetManager (Logging.scala:logWarning(71)) - Lost task 0.0 in stage 1.0 (TID 1, localhost): java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.BytesWritable org.apache.hadoop.hive.ql.exec.spark.MapInput$CopyFunction.call(MapInput.java:67) org.apache.hadoop.hive.ql.exec.spark.MapInput$CopyFunction.call(MapInput.java:61) org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1002) org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1002) scala.collection.Iterator$$anon$11.next(Iterator.scala:328) org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:234) org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:163) org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:70) org.apache.spark.rdd.RDD.iterator(RDD.scala:227) org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) org.apache.spark.rdd.RDD.iterator(RDD.scala:229) org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) org.apache.spark.scheduler.Task.run(Task.scala:56) org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) java.lang.Thread.run(Thread.java:745) The 
fix should be easy. However, interestingly, this error doesn't show up when the caching is turned off. We need to find out why. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveBaseFunctionResultList.java dc5d148 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveCopyFunction.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java 9849b49 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 25a4515 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTran.java 8a3dbf2 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java 0f21b46 Diff: https://reviews.apache.org/r/27046/diff/ Testing --- Thanks, Chao Sun
Re: Review Request 27046: HIVE-8545 - Exception when casting Text to BytesWritable [Spark Branch]
On Oct. 22, 2014, 11:40 p.m., Xuefu Zhang wrote:

  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTran.java, line 25
  https://reviews.apache.org/r/27046/diff/1/?file=728820#file728820line25

  Why does KO become Writable now? Should it be WritableComparable, according to MapInput?

My mistake, it should be WritableComparable. Thanks for pointing that out.

- Chao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27046/#review57932
---

On Oct. 22, 2014, 5:50 p.m., Chao Sun wrote:

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27046/
---

(Updated Oct. 22, 2014, 5:50 p.m.)

Review request for hive, Brock Noland and Xuefu Zhang.

Bugs: HIVE-8545
    https://issues.apache.org/jira/browse/HIVE-8545

Repository: hive-git

Description
---

With the current multi-insertion implementation, when caching is enabled for the input RDD, a query may fail with the following exception:

2014-10-21 13:57:34,742 WARN [task-result-getter-0]: scheduler.TaskSetManager (Logging.scala:logWarning(71)) - Lost task 0.0 in stage 1.0 (TID 1, localhost): java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.BytesWritable
    org.apache.hadoop.hive.ql.exec.spark.MapInput$CopyFunction.call(MapInput.java:67)
    org.apache.hadoop.hive.ql.exec.spark.MapInput$CopyFunction.call(MapInput.java:61)
    org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1002)
    org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1002)
    scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:234)
    org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:163)
    org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:70)
    org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
    org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
    org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    org.apache.spark.scheduler.Task.run(Task.scala:56)
    org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    java.lang.Thread.run(Thread.java:745)

The fix should be easy. However, interestingly, this error doesn't show up when caching is turned off. We need to find out why.

Diffs
---

  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveBaseFunctionResultList.java dc5d148
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveCopyFunction.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java 9849b49
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 25a4515
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTran.java 8a3dbf2
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java 0f21b46

Diff: https://reviews.apache.org/r/27046/diff/

Testing
---

Thanks,
Chao Sun
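The ClassCastException above comes from a copy function that hard-casts every key to one concrete type, even though cached keys can also be Text. The following self-contained sketch (hypothetical classes, not Hive's actual MapInput/HiveCopyFunction code) shows why the hard cast fails and how dispatching through the common interface, as the patch's move to WritableComparable does, avoids it:

```java
// Hypothetical stand-ins for Hadoop's Writable key types.
interface Key { Key copy(); }

final class TextKey implements Key {
    final String s;
    TextKey(String s) { this.s = s; }
    public Key copy() { return new TextKey(s); }
}

final class BytesKey implements Key {
    final byte[] b;
    BytesKey(byte[] b) { this.b = b.clone(); }
    public Key copy() { return new BytesKey(b); }
}

public class CopyDemo {
    // Buggy version: assumes every key is a BytesKey, like the original
    // CopyFunction's hard cast of Text to BytesWritable.
    static Key copyAssumingBytes(Key k) {
        return new BytesKey(((BytesKey) k).b);  // ClassCastException for TextKey
    }

    // Fixed version: dispatch through the common interface instead of casting.
    static Key copyPolymorphic(Key k) {
        return k.copy();
    }

    public static void main(String[] args) {
        Key text = new TextKey("row1");
        try {
            copyAssumingBytes(text);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException, as in HIVE-8545");
        }
        System.out.println(copyPolymorphic(text) instanceof TextKey);
    }
}
```

This also suggests why the error only surfaces with caching enabled: the copy function only runs on the cache path, so without caching the mismatched cast is never executed.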
Re: Review Request 27046: HIVE-8545 - Exception when casting Text to BytesWritable [Spark Branch]
On Oct. 22, 2014, 11:36 p.m., Xuefu Zhang wrote:

  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveBaseFunctionResultList.java, line 77
  https://reviews.apache.org/r/27046/diff/1/?file=728816#file728816line77

  I think we should let this stay in SparkUtils, which would otherwise become an empty class.

OK. To make it consistent, I also moved copyHiveKey to SparkUtilities.

- Chao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27046/#review57929
---

On Oct. 22, 2014, 5:50 p.m., Chao Sun wrote:

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27046/
---

(Updated Oct. 22, 2014, 5:50 p.m.)

Review request for hive, Brock Noland and Xuefu Zhang.

Bugs: HIVE-8545
    https://issues.apache.org/jira/browse/HIVE-8545

Repository: hive-git

Description
---

With the current multi-insertion implementation, when caching is enabled for the input RDD, a query may fail with the following exception:

2014-10-21 13:57:34,742 WARN [task-result-getter-0]: scheduler.TaskSetManager (Logging.scala:logWarning(71)) - Lost task 0.0 in stage 1.0 (TID 1, localhost): java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.BytesWritable
    org.apache.hadoop.hive.ql.exec.spark.MapInput$CopyFunction.call(MapInput.java:67)
    org.apache.hadoop.hive.ql.exec.spark.MapInput$CopyFunction.call(MapInput.java:61)
    org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1002)
    org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1002)
    scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:234)
    org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:163)
    org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:70)
    org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
    org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
    org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    org.apache.spark.scheduler.Task.run(Task.scala:56)
    org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    java.lang.Thread.run(Thread.java:745)

The fix should be easy. However, interestingly, this error doesn't show up when caching is turned off. We need to find out why.

Diffs
---

  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveBaseFunctionResultList.java dc5d148
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveCopyFunction.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java 9849b49
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 25a4515
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTran.java 8a3dbf2
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java 0f21b46

Diff: https://reviews.apache.org/r/27046/diff/

Testing
---

Thanks,
Chao Sun
Re: Review Request 27046: HIVE-8545 - Exception when casting Text to BytesWritable [Spark Branch]
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27046/
---

(Updated Oct. 23, 2014, 12:42 a.m.)

Review request for hive, Brock Noland and Xuefu Zhang.

Changes
---

Thanks Xuefu for the comments. I've changed my patch accordingly.

Bugs: HIVE-8545
    https://issues.apache.org/jira/browse/HIVE-8545

Repository: hive-git

Description
---

With the current multi-insertion implementation, when caching is enabled for the input RDD, a query may fail with the following exception:

2014-10-21 13:57:34,742 WARN [task-result-getter-0]: scheduler.TaskSetManager (Logging.scala:logWarning(71)) - Lost task 0.0 in stage 1.0 (TID 1, localhost): java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.BytesWritable
    org.apache.hadoop.hive.ql.exec.spark.MapInput$CopyFunction.call(MapInput.java:67)
    org.apache.hadoop.hive.ql.exec.spark.MapInput$CopyFunction.call(MapInput.java:61)
    org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1002)
    org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1002)
    scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:234)
    org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:163)
    org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:70)
    org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
    org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
    org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    org.apache.spark.scheduler.Task.run(Task.scala:56)
    org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    java.lang.Thread.run(Thread.java:745)

The fix should be easy. However, interestingly, this error doesn't show up when caching is turned off. We need to find out why.

Diffs (updated)
---

  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveBaseFunctionResultList.java dc5d148
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveCopyFunction.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java 9849b49
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 25a4515
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTran.java 8a3dbf2
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java 0f21b46

Diff: https://reviews.apache.org/r/27046/diff/

Testing
---

Thanks,
Chao Sun
Re: Review Request 26706: HIVE-8436 - Modify SparkWork to split works with multiple child works [Spark Branch]
On Oct. 19, 2014, 12:15 a.m., Xuefu Zhang wrote:

  ql/src/test/queries/clientpositive/spark_multi_insert_split_work.q, line 1
  https://reviews.apache.org/r/26706/diff/4/?file=724864#file724864line1

  Could we make this test Spark-only, since splitting doesn't apply to MR or Tez? I think we have a dir for Spark-only tests.

Chao Sun wrote:

  I also wanted to make this a Spark-only test, but the feature hasn't been implemented yet (I think Szehon is working on it). I made the file name start with spark_ so that in the future we can move it to the Spark-only test directory. Currently, though, there's no test dir for Spark, only a result dir.

Xuefu Zhang wrote:

  In that case, let's rename the test to something more generic. It's a valid test case for MR as well, but also a special case for Spark.

OK, thanks. I've updated the patch accordingly.

- Chao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26706/#review57286
---

On Oct. 19, 2014, 12:46 a.m., Chao Sun wrote:

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26706/
---

(Updated Oct. 19, 2014, 12:46 a.m.)

Review request for hive and Xuefu Zhang.

Bugs: HIVE-8436
    https://issues.apache.org/jira/browse/HIVE-8436

Repository: hive-git

Description
---

Based on the design doc, we need to split the operator tree of a work in SparkWork when the work is connected to multiple child works. The splitting is done by cloning the original work and removing the unwanted branches from the operator tree; please refer to the design doc for details. This process should happen right before we generate the SparkPlan. We should have a utility method that takes the original SparkWork and returns a modified SparkWork. The process should also keep information about the original work and its clones, since that information will be needed during SparkPlan generation (HIVE-8437).
Diffs - itests/src/test/resources/testconfiguration.properties 558dd02 ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 7d9feac ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveBaseFunctionResultList.java c956101 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java 5153885 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java 3fd37a0 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 126cb9f ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java 3773dcb ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java d7744e9 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 280edde ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java ac94ea0 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 644c681 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMergeTaskProcessor.java 1d01040 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMultiInsertionProcessor.java 93940bc ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java 20eb344 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java a62643a ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java 05be1f1 ql/src/test/queries/clientpositive/spark_multi_insert_split_work.q PRE-CREATION ql/src/test/results/clientpositive/spark/groupby7_map.q.out 2d99a81 ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out ca73985 ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out 2d2c55b ql/src/test/results/clientpositive/spark/groupby_cube1.q.out 942cdaa ql/src/test/results/clientpositive/spark/groupby_multi_single_reducer.q.out 399fe41 ql/src/test/results/clientpositive/spark/groupby_position.q.out 5e68807 ql/src/test/results/clientpositive/spark/groupby_rollup1.q.out 4259412 ql/src/test/results/clientpositive/spark/groupby_sort_1_23.q.out e0e882e ql/src/test/results/clientpositive/spark/groupby_sort_skew_1_23.q.out 
a43921e ql/src/test/results/clientpositive/spark/input12.q.out 4b0cf44 ql/src/test/results/clientpositive/spark/input13.q.out 260a65a ql/src/test/results/clientpositive/spark/input1_limit.q.out 1f3b484 ql/src/test/results/clientpositive/spark/input_part2.q.out f2f3a2d ql/src/test/results/clientpositive/spark/insert1.q.out 65032cb ql/src/test/results/clientpositive/spark/insert_into3.q.out 5318a8b ql/src/test/results/clientpositive/spark/load_dyn_part1.q.out 3b669fc ql/src/test/results/clientpositive/spark/load_dyn_part8.q.out 50c052d ql/src/test/results
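The description above says a work connected to multiple child works is split by cloning the original work and pruning the unwanted branches. A minimal self-contained sketch of that idea (hypothetical `Op`/`SplitWorkDemo` classes, not Hive's actual SparkWork or operator types): an operator tree that fans out at one operator is cloned once per branch, and each clone keeps only its own branch, so every resulting work has a single output path.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical operator-tree node; Hive's real operators are far richer.
class Op {
    final String name;
    final List<Op> children = new ArrayList<>();
    Op(String name) { this.name = name; }

    // Deep-copy this operator and its entire subtree.
    Op deepCopy() {
        Op c = new Op(name);
        for (Op child : children) c.children.add(child.deepCopy());
        return c;
    }
}

public class SplitWorkDemo {
    // Clone the tree once per child of the branching operator, pruning the
    // other branches so each clone carries exactly one output path.
    static List<Op> split(Op root, Op branchPoint) {
        List<Op> works = new ArrayList<>();
        for (int i = 0; i < branchPoint.children.size(); i++) {
            Op clone = root.deepCopy();
            Op clonedBranch = find(clone, branchPoint.name);
            Op keep = clonedBranch.children.get(i);
            clonedBranch.children.clear();
            clonedBranch.children.add(keep);
            works.add(clone);
        }
        return works;
    }

    static Op find(Op node, String name) {
        if (node.name.equals(name)) return node;
        for (Op c : node.children) {
            Op r = find(c, name);
            if (r != null) return r;
        }
        return null;
    }

    public static void main(String[] args) {
        // TS -> SEL -> {FS1, FS2}: one table scan feeding two file sinks,
        // as in a multi-insert query.
        Op ts = new Op("TS"); Op sel = new Op("SEL");
        ts.children.add(sel);
        sel.children.add(new Op("FS1"));
        sel.children.add(new Op("FS2"));

        List<Op> works = split(ts, sel);
        System.out.println(works.size());  // one work per file sink: 2
    }
}
```

In the real patch the mapping from each clone back to the original work is also recorded, since SparkPlan generation (HIVE-8437) needs it; this sketch omits that bookkeeping.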
Re: Review Request 26706: HIVE-8436 - Modify SparkWork to split works with multiple child works [Spark Branch]
checked against the old results. Also, I created a new test, spark_multi_insert_split_work.q, to check that splitting won't generate duplicate FSs. Thanks, Chao Sun
Re: Review Request 26706: HIVE-8436 - Modify SparkWork to split works with multiple child works [Spark Branch]
On Oct. 20, 2014, 9:52 p.m., Xuefu Zhang wrote:

  itests/src/test/resources/testconfiguration.properties, line 509
  https://reviews.apache.org/r/26706/diff/7/?file=726397#file726397line509

  We might need to change this as well.

Can't believe I missed this. Sorry for the sloppiness!

- Chao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26706/#review57445
---

On Oct. 20, 2014, 9:10 p.m., Chao Sun wrote:

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26706/
---

(Updated Oct. 20, 2014, 9:10 p.m.)

Review request for hive and Xuefu Zhang.

Bugs: HIVE-8436
    https://issues.apache.org/jira/browse/HIVE-8436

Repository: hive-git

Description
---

Based on the design doc, we need to split the operator tree of a work in SparkWork when the work is connected to multiple child works. The splitting is done by cloning the original work and removing the unwanted branches from the operator tree; please refer to the design doc for details. This process should happen right before we generate the SparkPlan. We should have a utility method that takes the original SparkWork and returns a modified SparkWork. The process should also keep information about the original work and its clones, since that information will be needed during SparkPlan generation (HIVE-8437).
Diffs - itests/src/test/resources/testconfiguration.properties 558dd02 ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 7d9feac ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveBaseFunctionResultList.java c956101 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java 5153885 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java 3fd37a0 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 126cb9f ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java 3773dcb ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java d7744e9 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 280edde ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java ac94ea0 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 644c681 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMergeTaskProcessor.java 1d01040 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMultiInsertionProcessor.java 93940bc ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java 20eb344 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java a62643a ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java 05be1f1 ql/src/test/queries/clientpositive/multi_insert_mixed.q PRE-CREATION ql/src/test/results/clientpositive/multi_insert_mixed.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby7_map.q.out 310f2fe ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out e6054c9 ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out d0f3e76 ql/src/test/results/clientpositive/spark/groupby_cube1.q.out d40c7bb ql/src/test/results/clientpositive/spark/groupby_multi_single_reducer.q.out b4ded62 ql/src/test/results/clientpositive/spark/groupby_position.q.out d2529bb ql/src/test/results/clientpositive/spark/groupby_rollup1.q.out 7fa6130 ql/src/test/results/clientpositive/spark/groupby_sort_1_23.q.out 4a4070b 
ql/src/test/results/clientpositive/spark/groupby_sort_skew_1_23.q.out 62c179e ql/src/test/results/clientpositive/spark/input12.q.out a4b7a3c ql/src/test/results/clientpositive/spark/input13.q.out 5c799dc ql/src/test/results/clientpositive/spark/input1_limit.q.out 1105ed8 ql/src/test/results/clientpositive/spark/input_part2.q.out 514f54a ql/src/test/results/clientpositive/spark/insert1.q.out 1b88026 ql/src/test/results/clientpositive/spark/insert_into3.q.out 5b2aa78 ql/src/test/results/clientpositive/spark/load_dyn_part1.q.out cbf7204 ql/src/test/results/clientpositive/spark/load_dyn_part8.q.out 3905d84 ql/src/test/results/clientpositive/spark/multi_insert.q.out 0404119 ql/src/test/results/clientpositive/spark/multi_insert_gby3.q.out 903e966 ql/src/test/results/clientpositive/spark/multi_insert_lateral_view.q.out 730fb4f ql/src/test/results/clientpositive/spark/multi_insert_mixed.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/multi_insert_move_tasks_share_dependencies.q.out 1f31f56 ql/src/test/results/clientpositive/spark/multigroupby_singlemr.q.out 4ded9d2 ql/src/test/results
Re: Review Request 26706: HIVE-8436 - Modify SparkWork to split works with multiple child works [Spark Branch]
On Oct. 19, 2014, 12:15 a.m., Xuefu Zhang wrote:

  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java, line 64
  https://reviews.apache.org/r/26706/diff/4/?file=724853#file724853line64

  Could we reuse this as a utility? I think we have the same/similar thing somewhere.

You're right - HiveBaseFunctionResultList has the same method. I've put it in SparkUtilities.

On Oct. 19, 2014, 12:15 a.m., Xuefu Zhang wrote:

  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java, line 250
  https://reviews.apache.org/r/26706/diff/4/?file=724854#file724854line250

  Do we need to disconnect it, or does remove do this automatically?

Yes, remove also removes all edges connected to this node.

On Oct. 19, 2014, 12:15 a.m., Xuefu Zhang wrote:

  ql/src/test/queries/clientpositive/spark_multi_insert_split_work.q, line 1
  https://reviews.apache.org/r/26706/diff/4/?file=724864#file724864line1

  Could we make this test Spark-only, since splitting doesn't apply to MR or Tez? I think we have a dir for Spark-only tests.

I also wanted to make this a Spark-only test, but the feature hasn't been implemented yet (I think Szehon is working on it). I made the file name start with spark_ so that in the future we can move it to the Spark-only test directory. Currently, though, there's no test dir for Spark, only a result dir.

- Chao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26706/#review57286
---

On Oct. 17, 2014, 9:24 p.m., Chao Sun wrote:

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26706/
---

(Updated Oct. 17, 2014, 9:24 p.m.)

Review request for hive and Xuefu Zhang.

Bugs: HIVE-8436
    https://issues.apache.org/jira/browse/HIVE-8436

Repository: hive-git

Description
---

Based on the design doc, we need to split the operator tree of a work in SparkWork when the work is connected to multiple child works. The splitting is done by cloning the original work and removing the unwanted branches from the operator tree; please refer to the design doc for details. This process should happen right before we generate the SparkPlan. We should have a utility method that takes the original SparkWork and returns a modified SparkWork. The process should also keep information about the original work and its clones, since that information will be needed during SparkPlan generation (HIVE-8437).

Diffs
---

  itests/src/test/resources/testconfiguration.properties 558dd02
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 7d9feac
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java 5153885
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java 3fd37a0
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 126cb9f
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java d7744e9
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 280edde
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java ac94ea0
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 644c681
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMergeTaskProcessor.java 1d01040
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMultiInsertionProcessor.java 93940bc
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java 20eb344
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java a62643a
  ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java 05be1f1
  ql/src/test/queries/clientpositive/spark_multi_insert_split_work.q PRE-CREATION
  ql/src/test/results/clientpositive/spark/groupby7_map.q.out 2d99a81
  ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out ca73985
  ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out 2d2c55b
  ql/src/test/results/clientpositive/spark/groupby_cube1.q.out 942cdaa
ql/src/test/results/clientpositive/spark/groupby_multi_single_reducer.q.out 399fe41 ql/src/test/results/clientpositive/spark/groupby_position.q.out 5e68807 ql/src/test/results/clientpositive/spark/groupby_rollup1.q.out 4259412 ql/src/test/results/clientpositive/spark/groupby_sort_1_23.q.out e0e882e ql/src/test/results/clientpositive/spark/groupby_sort_skew_1_23.q.out a43921e ql/src/test/results/clientpositive/spark/input12.q.out 4b0cf44 ql/src/test/results/clientpositive/spark/input13.q.out 260a65a ql/src/test/results/clientpositive/spark/input1_limit.q.out 1f3b484 ql/src/test/results/clientpositive/spark/input_part2.q.out f2f3a2d ql/src
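The exchange above about SparkPlanGenerator asks whether a node must be explicitly disconnected before removal, and the answer is that removing it also removes its incident edges. A tiny self-contained sketch of that semantics (hypothetical `GraphDemo` class, not Hive's actual graph or plan types):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical directed graph where remove(node) also drops every edge
// touching the node, so no separate disconnect step is needed.
public class GraphDemo {
    final Set<String> nodes = new HashSet<>();
    final Set<List<String>> edges = new HashSet<>();

    void connect(String a, String b) {
        nodes.add(a);
        nodes.add(b);
        edges.add(Arrays.asList(a, b));
    }

    // Remove the node and all edges incident to it in a single call.
    void remove(String n) {
        nodes.remove(n);
        edges.removeIf(e -> e.contains(n));
    }

    public static void main(String[] args) {
        GraphDemo g = new GraphDemo();
        g.connect("map1", "reduce1");
        g.connect("map2", "reduce1");
        g.remove("reduce1");
        System.out.println(g.edges.size());  // both incident edges are gone: 0
        System.out.println(g.nodes);         // map1 and map2 remain
    }
}
```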
Re: Review Request 26706: HIVE-8436 - Modify SparkWork to split works with multiple child works [Spark Branch]
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26706/
---

(Updated Oct. 17, 2014, 6:04 p.m.)

Review request for hive and Xuefu Zhang.

Changes
---

Added a test to check that splitting work doesn't create duplicate FSs.

Bugs: HIVE-8436
    https://issues.apache.org/jira/browse/HIVE-8436

Repository: hive-git

Description
---

Based on the design doc, we need to split the operator tree of a work in SparkWork when the work is connected to multiple child works. The splitting is done by cloning the original work and removing the unwanted branches from the operator tree; please refer to the design doc for details. This process should happen right before we generate the SparkPlan. We should have a utility method that takes the original SparkWork and returns a modified SparkWork. The process should also keep information about the original work and its clones, since that information will be needed during SparkPlan generation (HIVE-8437).

Diffs (updated)
---

  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 7d9feac
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java 5153885
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java 3fd37a0
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 126cb9f
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java d7744e9
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 280edde
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java ac94ea0
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 644c681
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMergeTaskProcessor.java 1d01040
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMultiInsertionProcessor.java 93940bc
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java 20eb344
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java
ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java 05be1f1 ql/src/test/queries/clientpositive/spark_multi_insert_split_work.q PRE-CREATION ql/src/test/results/clientpositive/spark/groupby7_map.q.out 2d99a81 ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out ca73985 ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out 2d2c55b ql/src/test/results/clientpositive/spark/groupby_cube1.q.out 942cdaa ql/src/test/results/clientpositive/spark/groupby_multi_single_reducer.q.out 399fe41 ql/src/test/results/clientpositive/spark/groupby_position.q.out 5e68807 ql/src/test/results/clientpositive/spark/groupby_rollup1.q.out 4259412 ql/src/test/results/clientpositive/spark/groupby_sort_1_23.q.out e0e882e ql/src/test/results/clientpositive/spark/groupby_sort_skew_1_23.q.out a43921e ql/src/test/results/clientpositive/spark/input12.q.out 4b0cf44 ql/src/test/results/clientpositive/spark/input13.q.out 260a65a ql/src/test/results/clientpositive/spark/input1_limit.q.out 1f3b484 ql/src/test/results/clientpositive/spark/input_part2.q.out f2f3a2d ql/src/test/results/clientpositive/spark/insert1.q.out 65032cb ql/src/test/results/clientpositive/spark/insert_into3.q.out 5318a8b ql/src/test/results/clientpositive/spark/load_dyn_part1.q.out 3b669fc ql/src/test/results/clientpositive/spark/load_dyn_part8.q.out 50c052d ql/src/test/results/clientpositive/spark/multi_insert.q.out bae325f ql/src/test/results/clientpositive/spark/multi_insert_gby3.q.out 280a893 ql/src/test/results/clientpositive/spark/multi_insert_lateral_view.q.out b07c582 ql/src/test/results/clientpositive/spark/multi_insert_move_tasks_share_dependencies.q.out fd477ca ql/src/test/results/clientpositive/spark/multigroupby_singlemr.q.out 44991e3 ql/src/test/results/clientpositive/spark/ppd_multi_insert.q.out 96f2c06 ql/src/test/results/clientpositive/spark/ppd_transform.q.out 7ec5d8d ql/src/test/results/clientpositive/spark/spark_multi_insert_split_work.q.out PRE-CREATION 
ql/src/test/results/clientpositive/spark/subquery_multiinsert.q.out 2b4a331 ql/src/test/results/clientpositive/spark/union18.q.out f94fa0b ql/src/test/results/clientpositive/spark/union19.q.out 8dcb543 ql/src/test/results/clientpositive/spark/union_remove_6.q.out 6730010 ql/src/test/results/clientpositive/spark/vectorized_ptf.q.out 909378b Diff: https://reviews.apache.org/r/26706/diff/ Testing --- Thanks, Chao Sun
Review Request 26884: HIVE-8496 - Re-enable statistics [Spark Branch]
/clientpositive/spark/union23.q.out 22aa965 ql/src/test/results/clientpositive/spark/union25.q.out bad0e5c ql/src/test/results/clientpositive/spark/union28.q.out 1478976 ql/src/test/results/clientpositive/spark/union3.q.out 8a7954b ql/src/test/results/clientpositive/spark/union30.q.out a33e999 ql/src/test/results/clientpositive/spark/union33.q.out 840cb4d ql/src/test/results/clientpositive/spark/union4.q.out 78c3979 ql/src/test/results/clientpositive/spark/union5.q.out 9717853 ql/src/test/results/clientpositive/spark/union6.q.out eb42a40 ql/src/test/results/clientpositive/spark/union7.q.out 8606278 ql/src/test/results/clientpositive/spark/union9.q.out 9db0539 ql/src/test/results/clientpositive/spark/union_ppr.q.out 15dec39 ql/src/test/results/clientpositive/spark/union_remove_1.q.out 0d0ec26 ql/src/test/results/clientpositive/spark/union_remove_10.q.out e03c2d9 ql/src/test/results/clientpositive/spark/union_remove_15.q.out ab98518 ql/src/test/results/clientpositive/spark/union_remove_16.q.out 90cb97c ql/src/test/results/clientpositive/spark/union_remove_18.q.out 83fab64 ql/src/test/results/clientpositive/spark/union_remove_19.q.out 07e1cc3 ql/src/test/results/clientpositive/spark/union_remove_2.q.out 00dd51e ql/src/test/results/clientpositive/spark/union_remove_20.q.out 9140453 ql/src/test/results/clientpositive/spark/union_remove_21.q.out b921b1a ql/src/test/results/clientpositive/spark/union_remove_24.q.out 7d54e78 ql/src/test/results/clientpositive/spark/union_remove_25.q.out d8292aa ql/src/test/results/clientpositive/spark/union_remove_4.q.out db816e4 ql/src/test/results/clientpositive/spark/union_remove_5.q.out 7c85791 ql/src/test/results/clientpositive/spark/union_remove_6.q.out 6730010 ql/src/test/results/clientpositive/spark/union_remove_7.q.out ed30b09 ql/src/test/results/clientpositive/spark/union_remove_8.q.out 16f15f4 ql/src/test/results/clientpositive/spark/union_remove_9.q.out 4a33436 ql/src/test/results/clientpositive/spark/vector_cast_constant.q.out 
f30c803 ql/src/test/results/clientpositive/spark/vector_data_types.q.out d21c68f ql/src/test/results/clientpositive/spark/vector_decimal_aggregate.q.out 99606d9 ql/src/test/results/clientpositive/spark/vector_left_outer_join.q.out 8c28349 ql/src/test/results/clientpositive/spark/vectorization_14.q.out f1e4916 ql/src/test/results/clientpositive/spark/vectorization_15.q.out 3eb3722 ql/src/test/results/clientpositive/spark/vectorization_9.q.out 21434d4 ql/src/test/results/clientpositive/spark/vectorization_part_project.q.out c6458ec ql/src/test/results/clientpositive/spark/vectorized_mapjoin.q.out e8751a6 ql/src/test/results/clientpositive/spark/vectorized_nested_mapjoin.q.out d163d42 ql/src/test/results/clientpositive/spark/vectorized_ptf.q.out 909378b ql/src/test/results/clientpositive/spark/vectorized_shufflejoin.q.out e8751a6 ql/src/test/results/clientpositive/spark/vectorized_timestamp_funcs.q.out abf1d86 Diff: https://reviews.apache.org/r/26884/diff/ Testing --- All test results are regenerated. Thanks, Chao Sun
Re: Review Request 26706: HIVE-8436 - Modify SparkWork to split works with multiple child works [Spark Branch]
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26706/
---

(Updated Oct. 17, 2014, 9:22 p.m.)

Review request for hive and Xuefu Zhang.

Changes
---

Included a qfile result for MR mode.

Bugs: HIVE-8436
    https://issues.apache.org/jira/browse/HIVE-8436

Repository: hive-git

Description
---

Based on the design doc, we need to split the operator tree of a work in SparkWork when the work is connected to multiple child works. The splitting is done by cloning the original work and removing the unwanted branches from the operator tree; please refer to the design doc for details. This process should happen right before we generate the SparkPlan. We should have a utility method that takes the original SparkWork and returns a modified SparkWork. The process should also keep information about the original work and its clones, since that information will be needed during SparkPlan generation (HIVE-8437).

Diffs (updated)
---

  itests/src/test/resources/testconfiguration.properties 558dd02
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 7d9feac
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java 5153885
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java 3fd37a0
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 126cb9f
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java d7744e9
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 280edde
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java ac94ea0
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 644c681
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMergeTaskProcessor.java 1d01040
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMultiInsertionProcessor.java 93940bc
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java 20eb344
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java
a62643a ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java 05be1f1 ql/src/test/queries/clientpositive/spark_multi_insert_split_work.q PRE-CREATION ql/src/test/results/clientpositive/spark/groupby7_map.q.out 2d99a81 ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out ca73985 ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out 2d2c55b ql/src/test/results/clientpositive/spark/groupby_cube1.q.out 942cdaa ql/src/test/results/clientpositive/spark/groupby_multi_single_reducer.q.out 399fe41 ql/src/test/results/clientpositive/spark/groupby_position.q.out 5e68807 ql/src/test/results/clientpositive/spark/groupby_rollup1.q.out 4259412 ql/src/test/results/clientpositive/spark/groupby_sort_1_23.q.out e0e882e ql/src/test/results/clientpositive/spark/groupby_sort_skew_1_23.q.out a43921e ql/src/test/results/clientpositive/spark/input12.q.out 4b0cf44 ql/src/test/results/clientpositive/spark/input13.q.out 260a65a ql/src/test/results/clientpositive/spark/input1_limit.q.out 1f3b484 ql/src/test/results/clientpositive/spark/input_part2.q.out f2f3a2d ql/src/test/results/clientpositive/spark/insert1.q.out 65032cb ql/src/test/results/clientpositive/spark/insert_into3.q.out 5318a8b ql/src/test/results/clientpositive/spark/load_dyn_part1.q.out 3b669fc ql/src/test/results/clientpositive/spark/load_dyn_part8.q.out 50c052d ql/src/test/results/clientpositive/spark/multi_insert.q.out bae325f ql/src/test/results/clientpositive/spark/multi_insert_gby3.q.out 280a893 ql/src/test/results/clientpositive/spark/multi_insert_lateral_view.q.out b07c582 ql/src/test/results/clientpositive/spark/multi_insert_move_tasks_share_dependencies.q.out fd477ca ql/src/test/results/clientpositive/spark/multigroupby_singlemr.q.out 44991e3 ql/src/test/results/clientpositive/spark/ppd_multi_insert.q.out 96f2c06 ql/src/test/results/clientpositive/spark/ppd_transform.q.out 7ec5d8d ql/src/test/results/clientpositive/spark/spark_multi_insert_split_work.q.out PRE-CREATION 
ql/src/test/results/clientpositive/spark/subquery_multiinsert.q.out 2b4a331 ql/src/test/results/clientpositive/spark/union18.q.out f94fa0b ql/src/test/results/clientpositive/spark/union19.q.out 8dcb543 ql/src/test/results/clientpositive/spark/union_remove_6.q.out 6730010 ql/src/test/results/clientpositive/spark/vectorized_ptf.q.out 909378b ql/src/test/results/clientpositive/spark_multi_insert_split_work.q.out PRE-CREATION Diff: https://reviews.apache.org/r/26706/diff/ Testing --- Thanks, Chao Sun
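The clone-and-prune splitting described in the review can be sketched in a simplified, self-contained form. The sketch below is a hypothetical illustration in plain Java, not the actual Hive `BaseWork`/`GenSparkUtils` API: an operator tree whose root fans out into multiple branches is "cloned" once per branch, and each clone keeps the shared root plus exactly one branch (so each clone ends in a single FileSink).

```java
import java.util.*;

public class SplitWorkSketch {
    // Clone the tree once per branch under the root: each "clone" keeps the
    // shared root plus one child branch, mirroring the clone-and-prune
    // splitting described in the review. Names are illustrative only.
    static List<List<String>> splitBranches(Map<String, List<String>> tree, String root) {
        List<List<String>> clones = new ArrayList<>();
        for (String child : tree.getOrDefault(root, Collections.<String>emptyList())) {
            List<String> clone = new ArrayList<>();
            clone.add(root);
            collect(tree, child, clone);
            clones.add(clone);
        }
        return clones;
    }

    // Depth-first walk that copies one branch into the clone.
    static void collect(Map<String, List<String>> tree, String node, List<String> out) {
        out.add(node);
        for (String c : tree.getOrDefault(node, Collections.<String>emptyList())) {
            collect(tree, c, out);
        }
    }

    public static void main(String[] args) {
        // A TableScan fans out into two branches, each ending in its own FileSink.
        Map<String, List<String>> tree = new HashMap<>();
        tree.put("TS", Arrays.asList("FIL1", "FIL2"));
        tree.put("FIL1", Arrays.asList("FS1"));
        tree.put("FIL2", Arrays.asList("FS2"));
        System.out.println(splitBranches(tree, "TS"));
        // prints [[TS, FIL1, FS1], [TS, FIL2, FS2]]
    }
}
```

In the real patch the clones must also remember which original work they came from, since SparkPlan generation (HIVE-8437) needs that mapping.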
Re: Review Request 26706: HIVE-8436 - Modify SparkWork to split works with multiple child works [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26706/ --- (Updated Oct. 17, 2014, 9:24 p.m.) Review request for hive and Xuefu Zhang. Bugs: HIVE-8436 https://issues.apache.org/jira/browse/HIVE-8436 Repository: hive-git Description --- Based on the design doc, we need to split the operator tree of a work in SparkWork if the work is connected to multiple child works. The splitting is performed by cloning the original work and removing the unwanted branches from the operator tree. Please refer to the design doc for details. This process should be done right before we generate the SparkPlan. We should have a utility method that takes the original SparkWork and returns a modified SparkWork. This process should also keep the information about the original work and its clones. Such information will be needed during SparkPlan generation (HIVE-8437). Diffs - itests/src/test/resources/testconfiguration.properties 558dd02 ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 7d9feac ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java 5153885 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java 3fd37a0 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 126cb9f ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java d7744e9 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 280edde ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java ac94ea0 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 644c681 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMergeTaskProcessor.java 1d01040 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMultiInsertionProcessor.java 93940bc ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java 20eb344 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java a62643a 
ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java 05be1f1 ql/src/test/queries/clientpositive/spark_multi_insert_split_work.q PRE-CREATION ql/src/test/results/clientpositive/spark/groupby7_map.q.out 2d99a81 ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out ca73985 ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out 2d2c55b ql/src/test/results/clientpositive/spark/groupby_cube1.q.out 942cdaa ql/src/test/results/clientpositive/spark/groupby_multi_single_reducer.q.out 399fe41 ql/src/test/results/clientpositive/spark/groupby_position.q.out 5e68807 ql/src/test/results/clientpositive/spark/groupby_rollup1.q.out 4259412 ql/src/test/results/clientpositive/spark/groupby_sort_1_23.q.out e0e882e ql/src/test/results/clientpositive/spark/groupby_sort_skew_1_23.q.out a43921e ql/src/test/results/clientpositive/spark/input12.q.out 4b0cf44 ql/src/test/results/clientpositive/spark/input13.q.out 260a65a ql/src/test/results/clientpositive/spark/input1_limit.q.out 1f3b484 ql/src/test/results/clientpositive/spark/input_part2.q.out f2f3a2d ql/src/test/results/clientpositive/spark/insert1.q.out 65032cb ql/src/test/results/clientpositive/spark/insert_into3.q.out 5318a8b ql/src/test/results/clientpositive/spark/load_dyn_part1.q.out 3b669fc ql/src/test/results/clientpositive/spark/load_dyn_part8.q.out 50c052d ql/src/test/results/clientpositive/spark/multi_insert.q.out bae325f ql/src/test/results/clientpositive/spark/multi_insert_gby3.q.out 280a893 ql/src/test/results/clientpositive/spark/multi_insert_lateral_view.q.out b07c582 ql/src/test/results/clientpositive/spark/multi_insert_move_tasks_share_dependencies.q.out fd477ca ql/src/test/results/clientpositive/spark/multigroupby_singlemr.q.out 44991e3 ql/src/test/results/clientpositive/spark/ppd_multi_insert.q.out 96f2c06 ql/src/test/results/clientpositive/spark/ppd_transform.q.out 7ec5d8d ql/src/test/results/clientpositive/spark/spark_multi_insert_split_work.q.out PRE-CREATION 
ql/src/test/results/clientpositive/spark/subquery_multiinsert.q.out 2b4a331 ql/src/test/results/clientpositive/spark/union18.q.out f94fa0b ql/src/test/results/clientpositive/spark/union19.q.out 8dcb543 ql/src/test/results/clientpositive/spark/union_remove_6.q.out 6730010 ql/src/test/results/clientpositive/spark/vectorized_ptf.q.out 909378b ql/src/test/results/clientpositive/spark_multi_insert_split_work.q.out PRE-CREATION Diff: https://reviews.apache.org/r/26706/diff/ Testing (updated) --- All multi-insertion-related results were regenerated and manually checked against the old results. I also created a new test, spark_multi_insert_split_work.q, to check that splitting won't generate duplicate FileSinks (FSs). Thanks, Chao Sun
Re: Review Request 26706: HIVE-8436 - Modify SparkWork to split works with multiple child works [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26706/ --- (Updated Oct. 16, 2014, 1:25 a.m.) Review request for hive and Xuefu Zhang. Changes --- Addressing the comments. Also, I'm thinking about adding another test for multi-insert in another JIRA, specifically to check whether the plan after splitting has the correct shape. Bugs: HIVE-8436 https://issues.apache.org/jira/browse/HIVE-8436 Repository: hive-git Description --- Based on the design doc, we need to split the operator tree of a work in SparkWork if the work is connected to multiple child works. The splitting is performed by cloning the original work and removing the unwanted branches from the operator tree. Please refer to the design doc for details. This process should be done right before we generate the SparkPlan. We should have a utility method that takes the original SparkWork and returns a modified SparkWork. This process should also keep the information about the original work and its clones. Such information will be needed during SparkPlan generation (HIVE-8437). 
Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 7d9feac ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java 5153885 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java 3fd37a0 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 126cb9f ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java d7744e9 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 280edde ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java ac94ea0 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 644c681 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMergeTaskProcessor.java 1d01040 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMultiInsertionProcessor.java 93940bc ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java 20eb344 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java a62643a ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java 05be1f1 ql/src/test/results/clientpositive/spark/groupby7_map.q.out 2d99a81 ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out ca73985 ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out 2d2c55b ql/src/test/results/clientpositive/spark/groupby_cube1.q.out 942cdaa ql/src/test/results/clientpositive/spark/groupby_multi_single_reducer.q.out 399fe41 ql/src/test/results/clientpositive/spark/groupby_position.q.out 5e68807 ql/src/test/results/clientpositive/spark/groupby_rollup1.q.out 4259412 ql/src/test/results/clientpositive/spark/groupby_sort_1_23.q.out e0e882e ql/src/test/results/clientpositive/spark/groupby_sort_skew_1_23.q.out a43921e ql/src/test/results/clientpositive/spark/input12.q.out 4b0cf44 ql/src/test/results/clientpositive/spark/input13.q.out 260a65a ql/src/test/results/clientpositive/spark/input1_limit.q.out 1f3b484 ql/src/test/results/clientpositive/spark/input_part2.q.out f2f3a2d 
ql/src/test/results/clientpositive/spark/insert1.q.out 65032cb ql/src/test/results/clientpositive/spark/insert_into3.q.out 5318a8b ql/src/test/results/clientpositive/spark/load_dyn_part1.q.out 3b669fc ql/src/test/results/clientpositive/spark/load_dyn_part8.q.out 50c052d ql/src/test/results/clientpositive/spark/multi_insert.q.out bae325f ql/src/test/results/clientpositive/spark/multi_insert_gby3.q.out 280a893 ql/src/test/results/clientpositive/spark/multi_insert_lateral_view.q.out b07c582 ql/src/test/results/clientpositive/spark/multi_insert_move_tasks_share_dependencies.q.out fd477ca ql/src/test/results/clientpositive/spark/multigroupby_singlemr.q.out 44991e3 ql/src/test/results/clientpositive/spark/ppd_multi_insert.q.out 96f2c06 ql/src/test/results/clientpositive/spark/ppd_transform.q.out 7ec5d8d ql/src/test/results/clientpositive/spark/subquery_multiinsert.q.out 2b4a331 ql/src/test/results/clientpositive/spark/union18.q.out f94fa0b ql/src/test/results/clientpositive/spark/union19.q.out 8dcb543 ql/src/test/results/clientpositive/spark/union_remove_6.q.out 6730010 ql/src/test/results/clientpositive/spark/vectorized_ptf.q.out 909378b Diff: https://reviews.apache.org/r/26706/diff/ Testing --- Thanks, Chao Sun
Review Request 26706: HIVE-8436 - Modify SparkWork to split works with multiple child works [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26706/ --- Review request for hive and Xuefu Zhang. Bugs: HIVE-8436 https://issues.apache.org/jira/browse/HIVE-8436 Repository: hive-git Description --- Based on the design doc, we need to split the operator tree of a work in SparkWork if the work is connected to multiple child works. The splitting is performed by cloning the original work and removing the unwanted branches from the operator tree. Please refer to the design doc for details. This process should be done right before we generate the SparkPlan. We should have a utility method that takes the original SparkWork and returns a modified SparkWork. This process should also keep the information about the original work and its clones. Such information will be needed during SparkPlan generation (HIVE-8437). Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 7d9feac ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java 5153885 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java 3fd37a0 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 126cb9f ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java d7744e9 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 280edde ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java ac94ea0 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 644c681 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMergeTaskProcessor.java 1d01040 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMultiInsertionProcessor.java 93940bc ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java 20eb344 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java a62643a ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java 05be1f1 ql/src/test/results/clientpositive/spark/groupby7_map.q.out 95d7b59 
ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out b425c67 ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out dc713b3 ql/src/test/results/clientpositive/spark/groupby_cube1.q.out cd8e85e ql/src/test/results/clientpositive/spark/groupby_multi_single_reducer.q.out 801ac8a ql/src/test/results/clientpositive/spark/groupby_position.q.out b04e55c ql/src/test/results/clientpositive/spark/groupby_rollup1.q.out 4bde6ea ql/src/test/results/clientpositive/spark/groupby_sort_1_23.q.out ab2fe84 ql/src/test/results/clientpositive/spark/groupby_sort_skew_1_23.q.out 5c1cbc4 ql/src/test/results/clientpositive/spark/input12.q.out 4b0cf44 ql/src/test/results/clientpositive/spark/input13.q.out 260a65a ql/src/test/results/clientpositive/spark/input1_limit.q.out 90bc8ea ql/src/test/results/clientpositive/spark/input_part2.q.out f2f3a2d ql/src/test/results/clientpositive/spark/insert1.q.out 65032cb ql/src/test/results/clientpositive/spark/insert_into3.q.out 7964802 ql/src/test/results/clientpositive/spark/load_dyn_part1.q.out 3b669fc ql/src/test/results/clientpositive/spark/load_dyn_part8.q.out 50c052d ql/src/test/results/clientpositive/spark/multi_insert.q.out 31ebbeb ql/src/test/results/clientpositive/spark/multi_insert_gby3.q.out 0a983d8 ql/src/test/results/clientpositive/spark/multi_insert_lateral_view.q.out 68b1312 ql/src/test/results/clientpositive/spark/multi_insert_move_tasks_share_dependencies.q.out f7867ac ql/src/test/results/clientpositive/spark/multigroupby_singlemr.q.out dbb78a6 ql/src/test/results/clientpositive/spark/orc_analyze.q.out a0af7ba ql/src/test/results/clientpositive/spark/parallel.q.out acd418f ql/src/test/results/clientpositive/spark/ppd_multi_insert.q.out 169d2f1 ql/src/test/results/clientpositive/spark/ppd_transform.q.out 54b8a8a ql/src/test/results/clientpositive/spark/subquery_multiinsert.q.out 6f8066d ql/src/test/results/clientpositive/spark/union18.q.out 07ea2c5 ql/src/test/results/clientpositive/spark/union19.q.out 2fefe8e 
ql/src/test/results/clientpositive/spark/union_remove_6.q.out 147f1fe ql/src/test/results/clientpositive/spark/vectorized_ptf.q.out e12943c Diff: https://reviews.apache.org/r/26706/diff/ Testing --- Thanks, Chao Sun
Re: Review Request 26569: HIVE-8276 - Separate shuffle from ReduceTran and so create ShuffleTran [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26569/ --- (Updated Oct. 11, 2014, 3:01 p.m.) Review request for hive and Xuefu Zhang. Changes --- Addressing comments. Thanks Xuefu! Bugs: HIVE-8276 https://issues.apache.org/jira/browse/HIVE-8276 Repository: hive-git Description --- Currently ReduceTran captures both shuffle and reduce-side processing. Per HIVE-8118, the output RDD from the shuffle sometimes needs to be cached for better performance. Thus, it makes sense to separate the shuffle from ReduceTran and create a ShuffleTran class. Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/exec/spark/IdentityTran.java 6c3cf2f ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java 0732e06 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapTran.java e62527c ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ReduceTran.java 52ac724 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ShuffleTran.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 8e251df ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTran.java e770158 Diff: https://reviews.apache.org/r/26569/diff/ Testing --- Thanks, Chao Sun
Review Request 26569: HIVE-8276 - Separate shuffle from ReduceTran and so create ShuffleTran [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26569/ --- Review request for hive and Xuefu Zhang. Bugs: HIVE-8276 https://issues.apache.org/jira/browse/HIVE-8276 Repository: hive-git Description --- Currently ReduceTran captures both shuffle and reduce-side processing. Per HIVE-8118, the output RDD from the shuffle sometimes needs to be cached for better performance. Thus, it makes sense to separate the shuffle from ReduceTran and create a ShuffleTran class. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/spark/IdentityTran.java 6c3cf2f ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java 0732e06 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapTran.java e62527c ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ReduceTran.java 52ac724 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ShuffleTran.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 8e251df ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTran.java e770158 Diff: https://reviews.apache.org/r/26569/diff/ Testing --- Thanks, Chao Sun
Re: Review Request 26569: HIVE-8276 - Separate shuffle from ReduceTran and so create ShuffleTran [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26569/ --- (Updated Oct. 10, 2014, 7:18 p.m.) Review request for hive and Xuefu Zhang. Changes --- Added a cache flag to ShuffleTran. Bugs: HIVE-8276 https://issues.apache.org/jira/browse/HIVE-8276 Repository: hive-git Description --- Currently ReduceTran captures both shuffle and reduce-side processing. Per HIVE-8118, the output RDD from the shuffle sometimes needs to be cached for better performance. Thus, it makes sense to separate the shuffle from ReduceTran and create a ShuffleTran class. Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/exec/spark/IdentityTran.java 6c3cf2f ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java 0732e06 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapTran.java e62527c ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ReduceTran.java 52ac724 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ShuffleTran.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 8e251df ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTran.java e770158 Diff: https://reviews.apache.org/r/26569/diff/ Testing --- Thanks, Chao Sun
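The separation with a cache flag can be sketched as follows. This is a minimal, hypothetical model in plain Java (not the actual Hive `SparkTran`/`ShuffleTran` code, which operates on Spark RDDs): the shuffle becomes its own transformation whose output can optionally be cached and reused, instead of being folded into the reduce-side processing.

```java
import java.util.*;

public class ShuffleTranSketch {
    // Simplified stand-in for the SparkTran interface; the real one
    // transforms RDDs, here we just transform lists of integers.
    interface Tran { List<Integer> transform(List<Integer> input); }

    // Shuffle as its own stage, with an optional cache flag (per HIVE-8118).
    static class ShuffleTran implements Tran {
        private final boolean toCache;
        private List<Integer> cached;
        ShuffleTran(boolean toCache) { this.toCache = toCache; }
        public List<Integer> transform(List<Integer> input) {
            if (cached != null) return cached;   // reuse the cached shuffle output
            List<Integer> out = new ArrayList<>(input);
            Collections.sort(out);               // stand-in for the actual shuffle
            if (toCache) cached = out;
            return out;
        }
    }

    // Reduce-side processing, now free of any shuffle logic.
    static class ReduceTran implements Tran {
        public List<Integer> transform(List<Integer> input) {
            int sum = 0;
            for (int v : input) sum += v;        // stand-in for the reducer
            return Collections.singletonList(sum);
        }
    }

    public static void main(String[] args) {
        Tran shuffle = new ShuffleTran(true);
        Tran reduce = new ReduceTran();
        List<Integer> shuffled = shuffle.transform(Arrays.asList(3, 1, 2));
        System.out.println(reduce.transform(shuffled)); // prints [6]
    }
}
```

With the two stages decoupled, a plan generator can insert the shuffle once, flip its cache flag when HIVE-8118 calls for it, and feed the (possibly cached) output to one or more downstream reduce stages.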
Re: Review Request 26569: HIVE-8276 - Separate shuffle from ReduceTran and so create ShuffleTran [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26569/ --- (Updated Oct. 11, 2014, 12:31 a.m.) Review request for hive and Xuefu Zhang. Bugs: HIVE-8276 https://issues.apache.org/jira/browse/HIVE-8276 Repository: hive-git Description --- Currently ReduceTran captures both shuffle and reduce-side processing. Per HIVE-8118, the output RDD from the shuffle sometimes needs to be cached for better performance. Thus, it makes sense to separate the shuffle from ReduceTran and create a ShuffleTran class. Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/exec/spark/IdentityTran.java 6c3cf2f ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java 0732e06 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapTran.java e62527c ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ReduceTran.java 52ac724 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ShuffleTran.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 8e251df ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTran.java e770158 Diff: https://reviews.apache.org/r/26569/diff/ Testing --- Thanks, Chao Sun