[jira] [Created] (HIVE-10460) change the key of Parquet Record to Nullwritable instead of void

2015-04-23 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-10460:
---

 Summary: change the key of Parquet Record to Nullwritable instead 
of void
 Key: HIVE-10460
 URL: https://issues.apache.org/jira/browse/HIVE-10460
 Project: Hive
  Issue Type: Sub-task
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu


AcidInputFormat requires a key type that implements the Writable interface, so 
the void type is not valid if we want to make ACID work for Parquet. 
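As a rough illustration of why the key type matters, here is a self-contained sketch. The `Writable` and `NullWritable` below are simplified stand-ins for the real `org.apache.hadoop.io` classes, written out so the example compiles on its own: a generic bound of `K extends Writable` can be satisfied by a data-free singleton like NullWritable, but never by `void`.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Simplified stand-in for org.apache.hadoop.io.Writable.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// A singleton Writable that carries no data, like Hadoop's NullWritable.
final class NullWritable implements Writable {
    private static final NullWritable INSTANCE = new NullWritable();
    private NullWritable() {}
    public static NullWritable get() { return INSTANCE; }
    public void write(DataOutput out) { /* nothing to serialize */ }
    public void readFields(DataInput in) { /* nothing to deserialize */ }
}

public class KeyTypeSketch {
    // A generic consumer that, like AcidInputFormat, bounds its key type by
    // Writable. `void` can never satisfy this bound, but NullWritable can,
    // while still carrying no actual key data.
    static <K extends Writable> String describeKey(K key) {
        return key.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        System.out.println(describeKey(NullWritable.get())); // prints NullWritable
    }
}
```

The singleton pattern is the point: there is exactly one NullWritable instance, so using it as a key adds no per-record allocation or serialization cost.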



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10461) Implement Record Updater and Raw Merger for Parquet as well

2015-04-23 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-10461:
---

 Summary: Implement Record Updater and Raw Merger for Parquet as 
well
 Key: HIVE-10461
 URL: https://issues.apache.org/jira/browse/HIVE-10461
 Project: Hive
  Issue Type: Sub-task
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu


The RecordUpdater writes data along with ACID metadata, while the raw record 
merger reconstructs the user-visible view of the data. In this JIRA, we should 
implement these two classes and make the basic ACID read/write case work. For 
the upper layers such as FileSinkOperator, CompactorMR, and TxnManager, we can 
file new JIRAs.
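To make the division of labor concrete, here is a standalone sketch of the two roles. The names and the deliberately simplified event model are hypothetical, not Hive's actual RecordUpdater or merger APIs: the updater stamps every write with ACID metadata (transaction and row identity), and the merger replays those events to reconstruct the user-visible rows.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class AcidSketch {
    // One write event carrying ACID metadata alongside the payload.
    static final class Event {
        final long txnId, rowId;
        final String op, value;
        Event(long txnId, long rowId, String op, String value) {
            this.txnId = txnId; this.rowId = rowId; this.op = op; this.value = value;
        }
    }

    static final List<Event> log = new ArrayList<>();
    static long nextRowId = 0;

    // "Record updater" role: every write carries transaction + row identity.
    static void insert(long txn, String value) { log.add(new Event(txn, nextRowId++, "INSERT", value)); }
    static void delete(long txn, long rowId)   { log.add(new Event(txn, rowId, "DELETE", null)); }

    // "Raw record merger" role: replay events in (rowId, txnId) order,
    // keeping the latest state of each row, to produce the user-visible data.
    static List<String> userView() {
        Map<Long, String> latest = new TreeMap<>();
        log.stream()
           .sorted(Comparator.<Event>comparingLong(e -> e.rowId).thenComparingLong(e -> e.txnId))
           .forEach(e -> {
               if (e.op.equals("DELETE")) latest.remove(e.rowId);
               else latest.put(e.rowId, e.value);
           });
        return new ArrayList<>(latest.values());
    }

    public static void main(String[] args) {
        insert(1, "a");
        insert(1, "b");
        delete(2, 0);              // txn 2 deletes the row holding "a"
        System.out.println(userView()); // prints [b]
    }
}
```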



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10462) CBO (Calcite Return Path): Exception thrown in conversion to MapJoin

2015-04-23 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-10462:
--

 Summary: CBO (Calcite Return Path): Exception thrown in conversion 
to MapJoin
 Key: HIVE-10462
 URL: https://issues.apache.org/jira/browse/HIVE-10462
 Project: Hive
  Issue Type: Sub-task
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


When the return path is on, the mapjoin conversion optimization fails because 
some data structures in the Join descriptor have not been initialized properly.

The failure can be reproduced with auto_join4.q. In particular, the following 
Exception is thrown:

{noformat}
org.apache.hadoop.hive.ql.parse.SemanticException: Generate Map Join Task 
Error: null
at 
org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinTaskDispatcher.processCurrentTask(CommonJoinTaskDispatcher.java:516)
at 
org.apache.hadoop.hive.ql.optimizer.physical.AbstractJoinTaskDispatcher.dispatch(AbstractJoinTaskDispatcher.java:179)
at 
org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(TaskGraphWalker.java:111)
at 
org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(TaskGraphWalker.java:180)
at 
org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(TaskGraphWalker.java:125)
at 
org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinResolver.resolve(CommonJoinResolver.java:79)
at 
org.apache.hadoop.hive.ql.optimizer.physical.PhysicalOptimizer.optimize(PhysicalOptimizer.java:107)
at 
org.apache.hadoop.hive.ql.parse.MapReduceCompiler.optimizeTaskPlan(MapReduceCompiler.java:270)
at 
org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:227)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10084)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:203)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:225)
at 
org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:225)
...
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10463) CBO (Calcite Return Path): Insert overwrite... select * from... queries failing for bucketed tables

2015-04-23 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-10463:
--

 Summary: CBO (Calcite Return Path): Insert overwrite... select * 
from... queries failing for bucketed tables
 Key: HIVE-10463
 URL: https://issues.apache.org/jira/browse/HIVE-10463
 Project: Hive
  Issue Type: Sub-task
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


This happens when the return path is on. To reproduce the exception, take the 
following excerpt from auto_sortmerge_join_10.q:

{noformat}
set hive.enforce.bucketing = true;
set hive.enforce.sorting = true;
set hive.exec.reducers.max = 1;

CREATE TABLE tbl1(key int, value string) CLUSTERED BY (key) SORTED BY (key) 
INTO 2 BUCKETS;

insert overwrite table tbl1
select * from src where key < 10;
{noformat}

It produces the following Exception:

{noformat}
java.lang.Exception: java.lang.RuntimeException: Error in configuring object
at 
org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at 
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
Caused by: java.lang.RuntimeException: Error in configuring object
at 
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
at 
org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at 
org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:409)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at 
org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
... 10 more
Caused by: java.lang.RuntimeException: Reduce operator initialization failed
at 
org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:157)
... 14 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.RuntimeException: cannot find field key from [0:_col0, 1:_col1]
at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.initializeOp(FileSinkOperator.java:446)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:362)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481)
at 
org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481)
at 
org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
at 
org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:150)
... 14 more
Caused by: java.lang.RuntimeException: cannot find field key from [0:_col0, 
1:_col1]
at 
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:416)
at 
org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147)
at 
org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:55)
at 
org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:978)
at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.initializeOp(FileSinkOperator.java:383)
... 22 more
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 33422: HIVE-10434 - Cancel connection when remote Spark driver process has failed [Spark Branch]

2015-04-23 Thread Marcelo Vanzin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33422/#review81328
---

Ship it!


Just a minor thing left to fix.


spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java


To avoid races, I'd do:

final ClientInfo cinfo = pendingClients.remove(clientId);
if (cinfo == null) { /* nothing to do */ }


- Marcelo Vanzin


On April 22, 2015, 1:25 a.m., Chao Sun wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/33422/
> ---
> 
> (Updated April 22, 2015, 1:25 a.m.)
> 
> 
> Review request for hive and Marcelo Vanzin.
> 
> 
> Bugs: HIVE-10434
> https://issues.apache.org/jira/browse/HIVE-10434
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> This patch cancels the connection from HS2 to remote process once the latter 
> has failed and exited with error code, to
> avoid potential long timeout.
> It add a new public method cancelClient to the RpcServer class - not sure 
> whether there's an easier way to do this..
> 
> 
> Diffs
> -
> 
>   
> spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 
> 71e432d 
>   spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java 
> 32d4c46 
> 
> Diff: https://reviews.apache.org/r/33422/diff/
> 
> 
> Testing
> ---
> 
> Tested on my own cluster, and it worked.
> 
> 
> Thanks,
> 
> Chao Sun
> 
>



PreCommit-HIVE-TRUNK-Build still on svn repo

2015-04-23 Thread Ashutosh Chauhan
Hi all,

Seems like Hive QA is still doing checkouts from locked down svn repo for
running tests. When I went to configure page of jenkins job, it doesn't
list git as an option in Source code management section. Does anyone know
if git repo is supported there? And if so, how to enable it?

Thanks,
Ashutosh


[jira] [Created] (HIVE-10464) How i find the kryo version

2015-04-23 Thread ankush (JIRA)
ankush created HIVE-10464:
-

 Summary: How i find the kryo version 
 Key: HIVE-10464
 URL: https://issues.apache.org/jira/browse/HIVE-10464
 Project: Hive
  Issue Type: Improvement
Reporter: ankush


Could you please let me know how i find the kryo version that i using ?

Please help me on this,

We are just running HQL (Hive) queries



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 33251: HIVE-10302 Cache small tables in memory [Spark Branch]

2015-04-23 Thread Jimmy Xiang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33251/
---

(Updated April 23, 2015, 5:48 p.m.)


Review request for hive, Chao Sun, Szehon Ho, and Xuefu Zhang.


Bugs: HIVE-10302
https://issues.apache.org/jira/browse/HIVE-10302


Repository: hive-git


Description
---

Cache the small-table container so that mapjoin tasks can use it if the task 
is executed on the same Spark executor.
The cache is released right before the next job, after the mapjoin job is done.


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java fe108c4 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HivePairFlatMapFunction.java 
2f137f9 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 
3f240f5 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java 72ab913 

Diff: https://reviews.apache.org/r/33251/diff/


Testing
---

Ran several queries on a live cluster. ptest pending.


Thanks,

Jimmy Xiang



Re: Review Request 33422: HIVE-10434 - Cancel connection when remote Spark driver process has failed [Spark Branch]

2015-04-23 Thread Chao Sun


> On April 23, 2015, 4:55 p.m., Marcelo Vanzin wrote:
> > spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java, 
> > line 172
> > 
> >
> > To avoid races, I'd do:
> > 
> > final ClientInfo cinfo = pendingClients.remove(clientId);
> > if (cinfo == null) { /* nothing to do */ }

Yeah, this could happen if the method is called by multiple threads with the 
same clientId, even though that shouldn't happen in the current case, I think. 
It should be fixed. Thanks for the suggestion.


- Chao


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33422/#review81328
---


On April 22, 2015, 1:25 a.m., Chao Sun wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/33422/
> ---
> 
> (Updated April 22, 2015, 1:25 a.m.)
> 
> 
> Review request for hive and Marcelo Vanzin.
> 
> 
> Bugs: HIVE-10434
> https://issues.apache.org/jira/browse/HIVE-10434
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> This patch cancels the connection from HS2 to remote process once the latter 
> has failed and exited with error code, to
> avoid potential long timeout.
> It add a new public method cancelClient to the RpcServer class - not sure 
> whether there's an easier way to do this..
> 
> 
> Diffs
> -
> 
>   
> spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 
> 71e432d 
>   spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java 
> 32d4c46 
> 
> Diff: https://reviews.apache.org/r/33422/diff/
> 
> 
> Testing
> ---
> 
> Tested on my own cluster, and it worked.
> 
> 
> Thanks,
> 
> Chao Sun
> 
>



Re: Review Request 33422: HIVE-10434 - Cancel connection when remote Spark driver process has failed [Spark Branch]

2015-04-23 Thread Chao Sun

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33422/
---

(Updated April 23, 2015, 6:11 p.m.)


Review request for hive and Marcelo Vanzin.


Changes
---

Addressing Marcelo's comment.


Bugs: HIVE-10434
https://issues.apache.org/jira/browse/HIVE-10434


Repository: hive-git


Description
---

This patch cancels the connection from HS2 to the remote process once the 
latter has failed and exited with an error code, to avoid a potentially long 
timeout.
It adds a new public method, cancelClient, to the RpcServer class; I'm not 
sure whether there's an easier way to do this.


Diffs (updated)
-

  spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 
71e432d 
  spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java 
32d4c46 

Diff: https://reviews.apache.org/r/33422/diff/


Testing
---

Tested on my own cluster, and it worked.


Thanks,

Chao Sun



Re: Review Request 33422: HIVE-10434 - Cancel connection when remote Spark driver process has failed [Spark Branch]

2015-04-23 Thread Xuefu Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33422/#review81361
---



spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java


I'm wondering if cinfo can be null here. After the contains() check above, 
things might have changed, so cinfo is not guaranteed to be non-null.


- Xuefu Zhang


On April 23, 2015, 6:11 p.m., Chao Sun wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/33422/
> ---
> 
> (Updated April 23, 2015, 6:11 p.m.)
> 
> 
> Review request for hive and Marcelo Vanzin.
> 
> 
> Bugs: HIVE-10434
> https://issues.apache.org/jira/browse/HIVE-10434
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> This patch cancels the connection from HS2 to remote process once the latter 
> has failed and exited with error code, to
> avoid potential long timeout.
> It add a new public method cancelClient to the RpcServer class - not sure 
> whether there's an easier way to do this..
> 
> 
> Diffs
> -
> 
>   
> spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 
> 71e432d 
>   spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java 
> 32d4c46 
> 
> Diff: https://reviews.apache.org/r/33422/diff/
> 
> 
> Testing
> ---
> 
> Tested on my own cluster, and it worked.
> 
> 
> Thanks,
> 
> Chao Sun
> 
>



Please cancel Mail forward

2015-04-23 Thread Ahmed Mosaad

Dear All

Kindly be informed that mail sent to the accounts I am addressing is being
forwarded to " amos...@eg.ibm.com ". That forward was set up for a former
employee, " Anas Mosaad ", who no longer owns this mail ID.
Please cancel this forward on your side, as I have been receiving a lot of
mails I shouldn't receive (I have now been assigned the same mail ID).

Regards
Ahmed Mosaad



Re: Review Request 33422: HIVE-10434 - Cancel connection when remote Spark driver process has failed [Spark Branch]

2015-04-23 Thread Chao Sun

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33422/
---

(Updated April 23, 2015, 6:54 p.m.)


Review request for hive and Marcelo Vanzin.


Changes
---

Addressing comments from Xuefu and Marcelo.


Bugs: HIVE-10434
https://issues.apache.org/jira/browse/HIVE-10434


Repository: hive-git


Description
---

This patch cancels the connection from HS2 to the remote process once the 
latter has failed and exited with an error code, to avoid a potentially long 
timeout.
It adds a new public method, cancelClient, to the RpcServer class; I'm not 
sure whether there's an easier way to do this.


Diffs (updated)
-

  spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 
71e432d 
  spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java 
32d4c46 

Diff: https://reviews.apache.org/r/33422/diff/


Testing
---

Tested on my own cluster, and it worked.


Thanks,

Chao Sun



Re: Review Request 33422: HIVE-10434 - Cancel connection when remote Spark driver process has failed [Spark Branch]

2015-04-23 Thread Chao Sun


> On April 23, 2015, 6:22 p.m., Xuefu Zhang wrote:
> > spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java, 
> > line 176
> > 
> >
> > I'm wondering if cinfo can be null here. After the contains() check 
> > above, things might have changed. So, cinfo is not guaranteed to be not 
> > null.
> 
> Marcelo Vanzin wrote:
> Yeah, that was my suggestion above. Don't use `containsKey`, instead just 
> remove and check for null.

Yes, that's correct. I missed that case as well.


- Chao


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33422/#review81361
---


On April 23, 2015, 6:54 p.m., Chao Sun wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/33422/
> ---
> 
> (Updated April 23, 2015, 6:54 p.m.)
> 
> 
> Review request for hive and Marcelo Vanzin.
> 
> 
> Bugs: HIVE-10434
> https://issues.apache.org/jira/browse/HIVE-10434
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> This patch cancels the connection from HS2 to remote process once the latter 
> has failed and exited with error code, to
> avoid potential long timeout.
> It add a new public method cancelClient to the RpcServer class - not sure 
> whether there's an easier way to do this..
> 
> 
> Diffs
> -
> 
>   
> spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 
> 71e432d 
>   spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java 
> 32d4c46 
> 
> Diff: https://reviews.apache.org/r/33422/diff/
> 
> 
> Testing
> ---
> 
> Tested on my own cluster, and it worked.
> 
> 
> Thanks,
> 
> Chao Sun
> 
>



Re: Review Request 33422: HIVE-10434 - Cancel connection when remote Spark driver process has failed [Spark Branch]

2015-04-23 Thread Marcelo Vanzin


> On April 23, 2015, 6:22 p.m., Xuefu Zhang wrote:
> > spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java, 
> > line 176
> > 
> >
> > I'm wondering if cinfo can be null here. After the contains() check 
> > above, things might have changed. So, cinfo is not guaranteed to be not 
> > null.

Yeah, that was my suggestion above. Don't use `containsKey`, instead just 
remove and check for null.


- Marcelo


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33422/#review81361
---


On April 23, 2015, 6:11 p.m., Chao Sun wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/33422/
> ---
> 
> (Updated April 23, 2015, 6:11 p.m.)
> 
> 
> Review request for hive and Marcelo Vanzin.
> 
> 
> Bugs: HIVE-10434
> https://issues.apache.org/jira/browse/HIVE-10434
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> This patch cancels the connection from HS2 to remote process once the latter 
> has failed and exited with error code, to
> avoid potential long timeout.
> It add a new public method cancelClient to the RpcServer class - not sure 
> whether there's an easier way to do this..
> 
> 
> Diffs
> -
> 
>   
> spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 
> 71e432d 
>   spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java 
> 32d4c46 
> 
> Diff: https://reviews.apache.org/r/33422/diff/
> 
> 
> Testing
> ---
> 
> Tested on my own cluster, and it worked.
> 
> 
> Thanks,
> 
> Chao Sun
> 
>



Re: PreCommit-HIVE-TRUNK-Build still on svn repo

2015-04-23 Thread Sergio Pena
I'm going to investigate that. I think some work is needed in the
dev-support/ scripts in order to switch to git.

On Thu, Apr 23, 2015 at 12:12 PM, Ashutosh Chauhan 
wrote:

> Hi all,
>
> Seems like Hive QA is still doing checkouts from locked down svn repo for
> running tests. When I went to configure page of jenkins job, it doesn't
> list git as an option in Source code management section. Does anyone know
> if git repo is supported there? And if so, how to enable it?
>
> Thanks,
> Ashutosh
>


Re: PreCommit-HIVE-TRUNK-Build still on svn repo

2015-04-23 Thread Szehon Ho
Yeah, it is probably specified in the source code or the properties file; the
option is not currently exposed as a parameter via the Jenkins job (it wasn't
expected that it would switch). We will have to look at it.

On Thu, Apr 23, 2015 at 10:12 AM, Ashutosh Chauhan 
wrote:

> Hi all,
>
> Seems like Hive QA is still doing checkouts from locked down svn repo for
> running tests. When I went to configure page of jenkins job, it doesn't
> list git as an option in Source code management section. Does anyone know
> if git repo is supported there? And if so, how to enable it?
>
> Thanks,
> Ashutosh
>


[jira] [Created] (HIVE-10465) whitelist restrictions don't get initialized in initial part of session

2015-04-23 Thread Thejas M Nair (JIRA)
Thejas M Nair created HIVE-10465:


 Summary: whitelist restrictions don't get initialized in initial 
part of session
 Key: HIVE-10465
 URL: https://issues.apache.org/jira/browse/HIVE-10465
 Project: Hive
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Thejas M Nair


Whitelist restrictions use a regex pattern in HiveConf, but when a copy of a 
HiveConf object is created, the regex pattern is not initialized in the new 
copy.
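A minimal illustration of this class of bug, using hypothetical names rather than the actual HiveConf code: a compiled regex is derived state, and a copy constructor that only copies the raw setting silently drops it unless the derived field is explicitly re-initialized.

```java
import java.util.regex.Pattern;

public class ConfCopySketch {
    static class Conf {
        String whitelistRegex;     // raw setting
        Pattern whitelistPattern;  // derived state; must be rebuilt on copy

        Conf(String regex) {
            this.whitelistRegex = regex;
            this.whitelistPattern = Pattern.compile(regex);
        }

        // Buggy copy constructor: copies the raw string but forgets the
        // derived Pattern. The fix is to re-initialize it:
        //   this.whitelistPattern = Pattern.compile(other.whitelistRegex);
        Conf(Conf other) {
            this.whitelistRegex = other.whitelistRegex;
        }

        boolean isAllowed(String cmd) {
            // Guard against the uninitialized pattern: in the buggy copy,
            // the whitelist restriction is silently lost.
            return whitelistPattern != null && whitelistPattern.matcher(cmd).matches();
        }
    }

    public static void main(String[] args) {
        Conf original = new Conf("set|select");
        Conf copy = new Conf(original);
        System.out.println(original.isAllowed("select")); // true
        System.out.println(copy.isAllowed("select"));     // false: restriction lost
    }
}
```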




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10466) LLAP: fix container sizing configuration for memory

2015-04-23 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-10466:
---

 Summary: LLAP: fix container sizing configuration for memory
 Key: HIVE-10466
 URL: https://issues.apache.org/jira/browse/HIVE-10466
 Project: Hive
  Issue Type: Sub-task
Reporter: Gopal V
Assignee: Vikram Dixit K


This is [~sershe] impersonating :)

We cannot use the full machine for LLAP because the configs for the cache and 
executors are "split brain"... please refer to [~gopalv] for details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10467) Switch to GIT repository on Jenkins precommit tests

2015-04-23 Thread JIRA
Sergio Peña created HIVE-10467:
--

 Summary: Switch to GIT repository on Jenkins precommit tests 
 Key: HIVE-10467
 URL: https://issues.apache.org/jira/browse/HIVE-10467
 Project: Hive
  Issue Type: Improvement
  Components: Testing Infrastructure
Reporter: Sergio Peña
Assignee: Sergio Peña
 Attachments: HIVE-10467.1.patch





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10468) Create scripts to do metastore upgrade tests on jenkins for Oracle DB.

2015-04-23 Thread Naveen Gangam (JIRA)
Naveen Gangam created HIVE-10468:


 Summary: Create scripts to do metastore upgrade tests on jenkins 
for Oracle DB.
 Key: HIVE-10468
 URL: https://issues.apache.org/jira/browse/HIVE-10468
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 1.1.0
Reporter: Naveen Gangam
Assignee: Naveen Gangam






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10469) Create table Like in HcatClient does not create partitions

2015-04-23 Thread Antoni Ivanov (JIRA)
Antoni Ivanov created HIVE-10469:


 Summary: Create table Like in HcatClient does not create partitions
 Key: HIVE-10469
 URL: https://issues.apache.org/jira/browse/HIVE-10469
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Reporter: Antoni Ivanov
 Fix For: 0.12.1


When using HCatClient#createTableLike, the created table is missing the 
partition columns, although the original table does have them.

This is unlike the HCatalog REST API: 
https://hive.apache.org/javadocs/hcat-r0.5.0/rest.html 
or an Impala/Hive SQL query. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[DISCUSS] Deprecating Hive CLI

2015-04-23 Thread Xuefu Zhang
Hi all,

I'd like to revive the discussion about the fate of Hive CLI, as this topic
has haunted us several times, including [1][2]. It looks to me that there is
a consensus that it's not wise for the Hive community to keep both Hive CLI
as it is and Beeline + HS2. However, I don't believe that no action is
the best action for us. From the discussion so far, I see the following
proposals:

1. Deprecate Hive CLI and advise users to use Beeline.
2. Make Hive CLI a naming alias for Beeline in embedded mode.

Frankly, I don't see much difference between the two approaches. Keeping an
alias at the script or even code level isn't that much work. However,
shouldn't we pick a direction and start moving toward it? If there are any
gaps between embedded Beeline and Hive CLI, we should identify and fill them.

I'd love to hear the thoughts from the community and hope this time we will
have concrete action items to work on.

Thanks,
Xuefu

[1]
http://mail-archives.apache.org/mod_mbox/hive-dev/201412.mbox/%3C5485E1BE.3060709%40hortonworks.com%3E
[2] https://www.mail-archive.com/dev@hive.apache.org/msg112378.html


[jira] [Created] (HIVE-10470) LLAP: NPE in IO when returning 0 rows with no projection

2015-04-23 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-10470:
---

 Summary: LLAP: NPE in IO when returning 0 rows with no projection
 Key: HIVE-10470
 URL: https://issues.apache.org/jira/browse/HIVE-10470
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Prasanth Jayachandran


Looks like a trivial fix, unless I'm missing something. I may do it later if 
you don't ;)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Please cancel Mail forward

2015-04-23 Thread Thejas Nair
To unsubscribe from an X@*.apache.org mailing list, send an
email to X-unsubscribe@*.apache.org.

For example, to unsubscribe from dev@hive.apache.org, send email to
dev-unsubscr...@hive.apache.org

On Thu, Apr 23, 2015 at 11:21 AM, Ahmed Mosaad  wrote:
>
> Dear All
>
> Kindly be informed that there is mail forwarding done from those accounts
> which I'm sending " TO "to be received on " amos...@eg.ibm.com ", which was
> related to an old employee " Anas Mosaad " who is no longer owns this mail
> ID.
> please cancel this forward from your side as I've been receiving a lot of
> mails I shouldn't receive ( as I was assigned the same mail ID now ).
>
> Regards
> Ahmed Mosaad
>


[jira] [Created] (HIVE-10471) Derive column definitions from a raw Parquet data file

2015-04-23 Thread Mariano Dominguez (JIRA)
Mariano Dominguez created HIVE-10471:


 Summary: Derive column definitions from a raw Parquet data file
 Key: HIVE-10471
 URL: https://issues.apache.org/jira/browse/HIVE-10471
 Project: Hive
  Issue Type: Improvement
Reporter: Mariano Dominguez


This feature will allow Hive to create Parquet-backed tables the same way 
Cloudera's Impala[1] does:

CREATE EXTERNAL TABLE ingest_existing_files LIKE PARQUET 
'/user/etl/destination/datafile1.dat'
  STORED AS PARQUET
  LOCATION '/user/etl/destination';

CREATE TABLE columns_from_data_file LIKE PARQUET 
'/user/etl/destination/datafile1.dat'
  STORED AS PARQUET;

[1] 
http://www.cloudera.com/content/cloudera/en/documentation/cloudera-impala/latest/topics/impala_parquet.html#parquet_ddl_unique_1





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] Deprecating Hive CLI

2015-04-23 Thread Alan Gates
Xuefu, thanks for getting this discussion started.  Limiting our code 
paths is definitely a plus.  My inclination would be to go towards 
option 2.  A few questions:


1) Is there any functionality in CLI that's not in beeline?
2) If I understand correctly option 2 would have an implicit HS2 in 
process when a user runs the CLI.  Would this be available in option 1 
as well?
3) Are there any performance implications, since now commands have to 
hop through a thrift/jdbc loop even in the embedded mode?
4) If we choose option 2 how backward compatible can we make it?  Will 
users need to change any scripts they have that use the CLI?  Do we have 
tests that will make sure of this?


Alan.


Xuefu Zhang 
April 23, 2015 at 14:43
Hi all,

I'd like to revive the discussion about the fate of Hive CLI, as this topic
has haunted us several times including [1][2]. It looks to me that there is
a consensus that it's not wise for Hive community to keep both Hive CLI as
it is as well as Beeline + HS2. However, I don't believe that no action is
the best action for us. From discussion so far, I see the following
proposals:

1. Deprecating Hive CLI and advise that users use Beeline.
2. Make Hive CLI as naming flavor to beeline with embedded mode.

Frankly, I don't see much difference between the two approaches. Keeping an
alias at script or even code level isn't that much work. However, shouldn't
we pick a direction and start moving to it? If there is any gaps between
beeline embedded and Hive CLI, we should identify and fill in those.

I'd love to hear the thoughts from the community and hope this time we will
have concrete action items to work on.

Thanks,
Xuefu

[1]
http://mail-archives.apache.org/mod_mbox/hive-dev/201412.mbox/%3C5485E1BE.3060709%40hortonworks.com%3E
[2] https://www.mail-archive.com/dev@hive.apache.org/msg112378.html



[jira] [Created] (HIVE-10472) Jenkins HMS upgrade test is not publishing results due to GIT change

2015-04-23 Thread JIRA
Sergio Peña created HIVE-10472:
--

 Summary: Jenkins HMS upgrade test is not publishing results due to 
GIT change
 Key: HIVE-10472
 URL: https://issues.apache.org/jira/browse/HIVE-10472
 Project: Hive
  Issue Type: Bug
Reporter: Sergio Peña
Assignee: Sergio Peña


This error is happening on Jenkins when running the HMS upgrade tests. 
The class used to publish the results is not found in any directory.

+ cd /var/lib/jenkins/jobs/PreCommit-HIVE-METASTORE-Test/workspace
+ set +x
Exception in thread "main" java.lang.NoClassDefFoundError: 
org/apache/hive/ptest/execution/JIRAService
Caused by: java.lang.ClassNotFoundException: 
org.apache.hive.ptest.execution.JIRAService
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
Could not find the main class: org.apache.hive.ptest.execution.JIRAService.  
Program will exit.
+ ret=0

The problem is that jenkins-execute-hms-test.sh downloads the code to a 
different directory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10473) Spark client is recreated even spark configuration is not changed

2015-04-23 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-10473:
--

 Summary: Spark client is recreated even spark configuration is not 
changed
 Key: HIVE-10473
 URL: https://issues.apache.org/jira/browse/HIVE-10473
 Project: Hive
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor


Currently, we consider a Spark setting changed whenever the set method is 
called, even if it is set to the same value as before. We should also check 
whether the value actually changed, since it takes time to start a new Spark client. 
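The proposed check can be sketched roughly as follows (class and method names here are illustrative, not Hive's actual API): only treat a set() call as a change when the stored value actually differs, so a no-op set does not force a costly new Spark client.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Sketch only: track whether any Spark setting really changed value.
class SparkConfTracker {
    private final Map<String, String> conf = new HashMap<>();
    private boolean changed = false;

    void set(String key, String value) {
        String old = conf.put(key, value);
        // Mark dirty only if the value really differs from what was stored.
        if (!Objects.equals(old, value)) {
            changed = true;
        }
    }

    boolean needsNewClient() { return changed; }

    void clientRecreated() { changed = false; }
}
```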



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] Deprecating Hive CLI

2015-04-23 Thread Alexander Pivovarov
Hive CLI should be part of Hive project

Beeline is a jdbc client. It should be separate from Hive (like any other
jdbc client).

I'm not sure I understand the business value of beeline.

Power users (who have ssh access to the cluster) can use Hive CLI.
- more reliable approach because it does not need HS2
- simpler authentication (in many cases application users do not have LDAP
password)

Business users (LDAP users) will probably use GUI jdbc clients (e.g. SQL
Workbench/J or SQuirreL SQL)

Is beeline production ready?
- beeline q tests were disabled about 2 years ago

I do not think beeline is popular.

I'd rather focus on improving JDBC driver to make sure SQL Workbench/J or
SQuirreL SQL can work with hiveserver2 effectively.
Seven months ago I did several fixes for the Hive JDBC driver to make basic
operations work in SQL Workbench/J and SQuirreL SQL.
But lots of JDBC methods are still not supported by the Hive JDBC driver.

On Thu, Apr 23, 2015 at 2:43 PM, Xuefu Zhang  wrote:

> Hi all,
>
> I'd like to revive the discussion about the fate of Hive CLI, as this topic
> has haunted us several times including [1][2]. It looks to me that there is
> a consensus that it's not wise for Hive community to keep both Hive CLI as
> it is as well as Beeline + HS2. However, I don't believe that no action is
> the best action for us. From discussion so far, I see the following
> proposals:
>
> 1. Deprecating Hive CLI and advise that users use Beeline.
> 2. Make Hive CLI as naming flavor to beeline with embedded mode.
>
> Frankly, I don't see much difference between the two approaches. Keeping an
> alias at script or even code level isn't that much work. However, shouldn't
> we pick a direction and start moving to it? If there is any gaps between
> beeline embedded and Hive CLI, we should identify and fill in those.
>
> I'd love to hear the thoughts from the community and hope this time we will
> have concrete action items to work on.
>
> Thanks,
> Xuefu
>
> [1]
>
> http://mail-archives.apache.org/mod_mbox/hive-dev/201412.mbox/%3C5485E1BE.3060709%40hortonworks.com%3E
> [2] https://www.mail-archive.com/dev@hive.apache.org/msg112378.html
>


Re: [DISCUSS] Deprecating Hive CLI

2015-04-23 Thread Lars Francke
I've been at about 20 different customers in the years since Beeline was
added. I can only think of a single one that has used beeline. The instinct
is to use "hive", partly because it is easy to remember and intuitive, and
partly because it is easier to use. I end up googling the stupid JDBC syntax
every single time.

I know this might be a bit "out there" but I propose something else:
1) Rename (or link) "beeline" to "hive"
2) Add a "--hiveserver2" (or "--jdbc" or "--beeline") option to the "hive"
command to get the current "beeline"; this'd keep the CLI as default. We
could also add a "--legacy" or "--cli" option and make "hiveserver2/beeline"
the default.
3) Add a "--embedded-hs2" option to the "hive" command to get an embedded
HS2 in Beeline
4) Add some documentation to beeline reminding people on startup how to
connect and how to use embedded mode

The fact is that the old shell just works for lots of people and there's
just no need for beeline for them. Also the name is confusing, especially
for non-native speakers. It's not a common word, so it's not easy to
remember.

On Fri, Apr 24, 2015 at 12:35 AM, Alan Gates  wrote:

> Xuefu, thanks for getting this discussion started.  Limiting our code
> paths is definitely a plus.  My inclination would be to go towards option
> 2.  A few questions:
>
> 1) Is there any functionality in CLI that's not in beeline?
> 2) If I understand correctly option 2 would have an implicit HS2 in
> process when a user runs the CLI.  Would this be available in option 1 as
> well?
> 3) Are there any performance implications, since now commands have to hop
> through a thrift/jdbc loop even in the embedded mode?
> 4) If we choose option 2 how backward compatible can we make it?  Will
> users need to change any scripts they have that use the CLI?  Do we have
> tests that will make sure of this?
>
> Alan.
>
>   Xuefu Zhang 
>  April 23, 2015 at 14:43
> Hi all,
>
> I'd like to revive the discussion about the fate of Hive CLI, as this topic
> has haunted us several times including [1][2]. It looks to me that there is
> a consensus that it's not wise for Hive community to keep both Hive CLI as
> it is as well as Beeline + HS2. However, I don't believe that no action is
> the best action for us. From discussion so far, I see the following
> proposals:
>
> 1. Deprecating Hive CLI and advise that users use Beeline.
> 2. Make Hive CLI as naming flavor to beeline with embedded mode.
>
> Frankly, I don't see much difference between the two approaches. Keeping an
> alias at script or even code level isn't that much work. However, shouldn't
> we pick a direction and start moving to it? If there is any gaps between
> beeline embedded and Hive CLI, we should identify and fill in those.
>
> I'd love to hear the thoughts from the community and hope this time we will
> have concrete action items to work on.
>
> Thanks,
> Xuefu
>
> [1]
>
> http://mail-archives.apache.org/mod_mbox/hive-dev/201412.mbox/%3C5485E1BE.3060709%40hortonworks.com%3E
> [2] https://www.mail-archive.com/dev@hive.apache.org/msg112378.html
>
>


Re: [DISCUSS] Deprecating Hive CLI

2015-04-23 Thread Xuefu Zhang
Hi Alan,

Here is my understanding to the questions you asked:

Re 1): There used to be many gaps, but the majority, if not all, of them
have been filled. One of the action items out of this discussion is to
identify any remaining gaps.

Re 2): if you run "beeline -u jdbc:hive2://", there will be an HS2 embedded
in the beeline process. We can change the shell so that users just need to
type "beeline" for embedded HS2.

Re 3): I don't know whether there will be any perf penalty for beeline +
embedded HS2, or how much. I'm also not certain there is such a loop (to be
found out). Even if there is, I don't believe the perf impact would be
noticeable.

Re 4): Sort of related to #1. The goal of this discussion is to choose a
route and take whatever actions are needed to keep backward compatibility at
the functional level. Choosing option 1 of course requires users to change
their scripts, while option 2 doesn't. Testing is also an action item.

FYI, my old blog post [1], though outdated, gives some idea of the
differences between beeline and Hive CLI at the time of writing.

Thanks,
Xuefu

[1]
http://blog.cloudera.com/blog/2014/02/migrating-from-hive-cli-to-beeline-a-primer/

On Thu, Apr 23, 2015 at 3:35 PM, Alan Gates  wrote:

> Xuefu, thanks for getting this discussion started.  Limiting our code
> paths is definitely a plus.  My inclination would be to go towards option
> 2.  A few questions:
>
> 1) Is there any functionality in CLI that's not in beeline?
> 2) If I understand correctly option 2 would have an implicit HS2 in
> process when a user runs the CLI.  Would this be available in option 1 as
> well?
> 3) Are there any performance implications, since now commands have to hop
> through a thrift/jdbc loop even in the embedded mode?
> 4) If we choose option 2 how backward compatible can we make it?  Will
> users need to change any scripts they have that use the CLI?  Do we have
> tests that will make sure of this?
>
> Alan.
>
>   Xuefu Zhang 
>  April 23, 2015 at 14:43
> Hi all,
>
> I'd like to revive the discussion about the fate of Hive CLI, as this topic
> has haunted us several times including [1][2]. It looks to me that there is
> a consensus that it's not wise for Hive community to keep both Hive CLI as
> it is as well as Beeline + HS2. However, I don't believe that no action is
> the best action for us. From discussion so far, I see the following
> proposals:
>
> 1. Deprecating Hive CLI and advise that users use Beeline.
> 2. Make Hive CLI as naming flavor to beeline with embedded mode.
>
> Frankly, I don't see much difference between the two approaches. Keeping an
> alias at script or even code level isn't that much work. However, shouldn't
> we pick a direction and start moving to it? If there is any gaps between
> beeline embedded and Hive CLI, we should identify and fill in those.
>
> I'd love to hear the thoughts from the community and hope this time we will
> have concrete action items to work on.
>
> Thanks,
> Xuefu
>
> [1]
>
> http://mail-archives.apache.org/mod_mbox/hive-dev/201412.mbox/%3C5485E1BE.3060709%40hortonworks.com%3E
> [2] https://www.mail-archive.com/dev@hive.apache.org/msg112378.html
>
>


[jira] [Created] (HIVE-10474) LLAP: investigate why TPCH Q1 1k is slow

2015-04-23 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-10474:
---

 Summary: LLAP: investigate why TPCH Q1 1k is slow
 Key: HIVE-10474
 URL: https://issues.apache.org/jira/browse/HIVE-10474
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergey Shelukhin


While most queries run faster in LLAP than just Tez with container reuse, TPCH 
Q1 is much slower.
On my run, tez with container reuse (current default LLAP configuration but 
mode == container and no daemons running) finished in 25.5sec average; with 16 
LLAP daemons in default config it finished in 35.5sec; w/the daemons w/o IO 
elevator (to rule out its impact) it took 59.7sec w/strange distribution (later 
runs were slower than earlier runs, still, fastest run was 49.5sec).

We need to figure out why this is happening. Is it just slot discrepancy? 
Regardless, this needs to be addressed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 33251: HIVE-10302 Cache small tables in memory [Spark Branch]

2015-04-23 Thread Jimmy Xiang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33251/
---

(Updated April 23, 2015, 11:22 p.m.)


Review request for hive, Chao Sun, Szehon Ho, and Xuefu Zhang.


Bugs: HIVE-10302
https://issues.apache.org/jira/browse/HIVE-10302


Repository: hive-git


Description
---

Cached the small table container so that mapjoin tasks can use it if the task 
is executed on the same Spark executor.
The cache is released right before the next job after the mapjoin job is done.
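The caching scheme can be sketched roughly like this (names are illustrative, not the actual patch's SmallTableCache API): a static per-executor cache lets mapjoin tasks scheduled on the same Spark executor reuse a loaded small-table container, and the first task of a new query evicts the previous query's entries.

```java
import java.util.concurrent.ConcurrentHashMap;

// Sketch only: per-executor cache of small-table containers, keyed by path.
class SmallTableCacheSketch {
    private static final ConcurrentHashMap<String, Object> cache = new ConcurrentHashMap<>();
    private static String currentQueryId;

    // Called at task start; a new query id releases the old query's containers.
    static synchronized void initialize(String queryId) {
        if (!queryId.equals(currentQueryId)) {
            cache.clear();
            currentQueryId = queryId;
        }
    }

    static Object get(String path) { return cache.get(path); }

    static void put(String path, Object container) { cache.put(path, container); }
}
```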


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java fe108c4 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HivePairFlatMapFunction.java 
2f137f9 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java 72ab913 

Diff: https://reviews.apache.org/r/33251/diff/


Testing
---

Ran several queries in live cluster. ptest pending.


Thanks,

Jimmy Xiang



jsonpath UDF HIVE-9864

2015-04-23 Thread Alexander Pivovarov
Hi Everyone

I implemented a jsonpath UDF because the existing get_json_object supports
only limited JSON path syntax.

Quite often people store JSON in a string column in a Hive table.
e.g.
Mongo migration
Oracle GoldenGate data
Rest API responses from devices
application logs in json
etc.

I think the jsonpath UDF has great business value because it provides full
JSON path support for querying the data.
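To illustrate the kind of lookup involved (this is a toy, not the HIVE-9864 implementation): a dotted-path subset such as "book.title" against nested maps is the basic case that a full JSONPath UDF generalizes with wildcards, array indexing, and filters.

```java
import java.util.Map;

// Toy illustration only: evaluate a dotted path against nested Maps.
class DottedPathLookup {
    static Object lookup(Map<String, Object> root, String path) {
        Object cur = root;
        for (String part : path.split("\\.")) {
            if (!(cur instanceof Map)) {
                return null;               // path descends past a leaf value
            }
            cur = ((Map<?, ?>) cur).get(part);
        }
        return cur;
    }
}
```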

Can you put your comments on the ticket?
https://issues.apache.org/jira/browse/HIVE-9864


Thank you
Alex


[jira] [Created] (HIVE-10475) LLAP: Minor fixes after tez api enhancements for dag completion

2015-04-23 Thread Siddharth Seth (JIRA)
Siddharth Seth created HIVE-10475:
-

 Summary: LLAP: Minor fixes after tez api enhancements for dag 
completion
 Key: HIVE-10475
 URL: https://issues.apache.org/jira/browse/HIVE-10475
 Project: Hive
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Siddharth Seth
 Fix For: llap


TEZ-2212 and TEZ-2361 add APIs to propagate dag completion information to the 
TaskCommunicator plugin. This jira is for minor fixes to get the llap branch to 
compile against these changes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10476) Hive query should fail when it fails to initialize a session in SetSparkReducerParallelism [Spark Branch]

2015-04-23 Thread Chao Sun (JIRA)
Chao Sun created HIVE-10476:
---

 Summary: Hive query should fail when it fails to initialize a 
session in SetSparkReducerParallelism [Spark Branch]
 Key: HIVE-10476
 URL: https://issues.apache.org/jira/browse/HIVE-10476
 Project: Hive
  Issue Type: Improvement
  Components: Spark
Affects Versions: spark-branch
Reporter: Chao Sun
Assignee: Chao Sun
Priority: Minor


Currently, for a Hive query, HoS needs to get a session twice: once in
SetSparkReducerParallelism, and once when submitting the actual job.
The issue is that sometimes there is a problem launching a YARN application 
(e.g., the user doesn't have permission), and the user then has to wait for 
two timeouts, because both session initializations fail. This turns out to 
happen frequently.

This JIRA proposes to fail the query in SetSparkReducerParallelism when it 
cannot initialize the session.
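The intended behavior can be sketched as follows (the exception and interface names are illustrative, not Hive's actual API): if the session cannot be opened while computing reducer parallelism, propagate the failure right away instead of letting a second timeout happen at job-submission time.

```java
// Sketch only: fail fast when the Spark session cannot be initialized.
class FailFastSession {
    static class SessionInitException extends RuntimeException {
        SessionInitException(String msg, Throwable cause) { super(msg, cause); }
    }

    interface SessionFactory {
        Object open() throws Exception;
    }

    // Propagate the failure immediately rather than deferring it to job submission.
    static Object getSessionOrFail(SessionFactory factory) {
        try {
            return factory.open();
        } catch (Exception e) {
            throw new SessionInitException(
                "Failed to get Spark session while setting reducer parallelism", e);
        }
    }
}
```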



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 33251: HIVE-10302 Cache small tables in memory [Spark Branch]

2015-04-23 Thread Xuefu Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33251/#review81444
---

Ship it!


Ship It!

- Xuefu Zhang


On April 23, 2015, 11:22 p.m., Jimmy Xiang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/33251/
> ---
> 
> (Updated April 23, 2015, 11:22 p.m.)
> 
> 
> Review request for hive, Chao Sun, Szehon Ho, and Xuefu Zhang.
> 
> 
> Bugs: HIVE-10302
> https://issues.apache.org/jira/browse/HIVE-10302
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Cached the small table container so that mapjoin tasks can use it if the 
> task is executed on the same Spark executor.
> The cache is released right before the next job after the mapjoin job is done.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java 
> fe108c4 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HivePairFlatMapFunction.java 
> 2f137f9 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java 
> 72ab913 
> 
> Diff: https://reviews.apache.org/r/33251/diff/
> 
> 
> Testing
> ---
> 
> Ran several queries in live cluster. ptest pending.
> 
> 
> Thanks,
> 
> Jimmy Xiang
> 
>



Re: PreCommit-HIVE-TRUNK-Build still on svn repo

2015-04-23 Thread Thejas Nair
Sergio, Szehon, thanks for updating the repo checkout from svn to git.

I see jobs getting submitted in precommit, and git checkout being
used, but it still fails with an NPE -
2015-04-23 22:32:34,618 ERROR TestExecutor.run:138 Error executing
PreCommit-HIVE-TRUNK-Build-3565 java.lang.NullPointerException: branch
at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:209)
at 
org.apache.hive.ptest.execution.conf.TestConfiguration.(TestConfiguration.java:99)
at 
org.apache.hive.ptest.execution.conf.TestConfiguration.fromInputStream(TestConfiguration.java:272)
at 
org.apache.hive.ptest.execution.conf.TestConfiguration.fromFile(TestConfiguration.java:280)
at org.apache.hive.ptest.api.server.TestExecutor.run(TestExecutor.java:110)
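For illustration, the guard that throws here can be reconstructed like this (the property name is an assumption about ptest's configuration keys, not verified): TestConfiguration requires a "branch" entry, and the svn-to-git switch apparently left it unset, so a checkNotNull-style precondition throws an NPE whose message is just "branch".

```java
import java.util.Objects;
import java.util.Properties;

// Sketch of the failing pattern: require a "branch" config entry or fail loudly.
class BranchConfigCheck {
    static String requireBranch(Properties props) {
        String branch = props.getProperty("branch");
        // Same effect as Guava's Preconditions.checkNotNull(branch, "branch")
        return Objects.requireNonNull(branch, "branch");
    }
}
```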


On Thu, Apr 23, 2015 at 12:02 PM, Szehon Ho  wrote:
> Yea it is probably specified in the source code or the properties file; the
> option is not exposed as a parameter via the jenkins job currently (it wasn't
> expected that it would switch).  We will have to look at it.
>
> On Thu, Apr 23, 2015 at 10:12 AM, Ashutosh Chauhan 
> wrote:
>
>> Hi all,
>>
>> Seems like Hive QA is still doing checkouts from locked down svn repo for
>> running tests. When I went to configure page of jenkins job, it doesn't
>> list git as an option in Source code management section. Does anyone know
>> if git repo is supported there? And if so, how to enable it?
>>
>> Thanks,
>> Ashutosh
>>


[jira] [Created] (HIVE-10477) Provide option to disable Spark tests in Windows OS

2015-04-23 Thread Hari Sankar Sivarama Subramaniyan (JIRA)
Hari Sankar Sivarama Subramaniyan created HIVE-10477:


 Summary: Provide option to disable Spark tests in Windows OS
 Key: HIVE-10477
 URL: https://issues.apache.org/jira/browse/HIVE-10477
 Project: Hive
  Issue Type: Bug
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan


In the current master branch, unit tests fail on Windows OS because of the 
dependency on the "bash" executable in itests/hive-unit/pom.xml around these lines:
{code}
...
{code}

We should provide an option to disable Spark tests on OSes like Windows where 
bash may be absent.
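One possible shape for such an option (a sketch with a made-up profile id, not the committed fix): guard the bash-dependent execution behind a Maven profile that activates only on non-Windows systems, using Maven's OS-family activation with negation.

```xml
<!-- Hypothetical profile: active everywhere except Windows, so the
     bash-invoking Spark test setup is skipped where bash is absent. -->
<profile>
  <id>spark-tests</id>
  <activation>
    <os>
      <family>!windows</family>
    </os>
  </activation>
  <!-- move the bash-dependent plugin execution here -->
</profile>
```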



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)