[jira] [Commented] (HIVE-10807) Invalidate basic stats for insert queries if autogather=false

2015-05-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564308#comment-14564308
 ] 

Hive QA commented on HIVE-10807:




{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12735815/HIVE-10807.4.patch

{color:red}ERROR:{color} -1 due to 40 failed/errored test(s), 8978 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_fold_case
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_histogram_numeric
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx_cbo_2
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_stats_partialscan_autogether
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_insert_into1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_skewjoin_union_remove_1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_stats12
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_stats13
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_stats2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_stats5
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_stats6
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_stats9
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_stats_noscan_1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_stats_partscan_1_23
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_10
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_11
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_12
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_13
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_14
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_15
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_16
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_17
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_18
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_19
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_20
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_21
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_22
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_23
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_24
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_25
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_3
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_4
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_5
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_6
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_6_subq
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_7
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_8
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_9
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4086/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4086/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4086/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 40 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12735815 - PreCommit-HIVE-TRUNK-Build

> Invalidate basic stats for insert queries if autogather=false
> -
>
> Key: HIVE-10807
> URL: https://issues.apache.org/jira/browse/HIVE-10807
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Affects Versions: 1.2.0
>Reporter: Gopal V
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-10807.2.patch, HIVE-10807.3.patch, 
> HIVE-10807.4.patch, HIVE-10807.patch
>
>
> stats.autogather=false leads to incorrect basic stats in the case of insert 
> statements.
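> A minimal repro sketch (table name and values invented for illustration):
> {code}
> set hive.stats.autogather=false;
> create table t (x int);
> insert into table t values (1), (2), (3);
> -- basic stats (e.g. numRows) may still reflect the pre-insert state:
> describe formatted t;
> {code}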



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements

2015-05-28 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564297#comment-14564297
 ] 

Alexander Pivovarov commented on HIVE-10841:


If I set hive.optimize.ppd=false, the query returns 1 row BUT the plan does not 
have "brn is not null".

If I set hive.optimize.ppd=true and change the order of the JOIN statements to 
(A,acct,PI), the query returns 1 row AND the plan HAS "brn is not null".
{code}
set hive.optimize.ppd=true;

select
  acct.ACC_N,
  acct.brn
FROM LA
JOIN A ON LA.aid = A.id
JOIN acct ON LA.aid = acct.aid
JOIN PI ON PI.id = LA.pi_id
WHERE
  LA.loan_id = 4436
  and acct.brn is not null;
OK
10  122
{code}




> [WHERE col is not null] does not work sometimes for queries with many JOIN 
> statements
> -
>
> Key: HIVE-10841
> URL: https://issues.apache.org/jira/browse/HIVE-10841
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning, Query Processor
>Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0
>Reporter: Alexander Pivovarov
>
> The result from the following SELECT query is 3 rows but it should be 1 row.
> I checked it in MySQL - it returned 1 row.
> To reproduce the issue in Hive
> 1. prepare tables
> {code}
> drop table if exists L;
> drop table if exists LA;
> drop table if exists FR;
> drop table if exists A;
> drop table if exists PI;
> drop table if exists acct;
> create table L as select 4436 id;
> create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id;
> create table FR as select 4436 loan_id;
> create table A as select 4748 id;
> create table PI as select 4415 id;
> create table acct as select 4748 aid, 10 acc_n, 122 brn;
> insert into table acct values(4748, null, null);
> insert into table acct values(4748, null, null);
> {code}
> 2. run SELECT query
> {code}
> select
>   acct.ACC_N,
>   acct.brn
> FROM L
> JOIN LA ON L.id = LA.loan_id
> JOIN FR ON L.id = FR.loan_id
> JOIN A ON LA.aid = A.id
> JOIN PI ON PI.id = LA.pi_id
> JOIN acct ON A.id = acct.aid
> WHERE
>   L.id = 4436
>   and acct.brn is not null;
> {code}
> the result is 3 rows
> {code}
> 10  122
> NULL  NULL
> NULL  NULL
> {code}
> but it should be 1 row
> {code}
> 10  122
> {code}
> 2.1 "explain select ..." output for hive-1.3.0 MR
> {code}
> STAGE DEPENDENCIES:
>   Stage-12 is a root stage
>   Stage-9 depends on stages: Stage-12
>   Stage-0 depends on stages: Stage-9
> STAGE PLANS:
>   Stage: Stage-12
> Map Reduce Local Work
>   Alias -> Map Local Tables:
> a 
>   Fetch Operator
> limit: -1
> acct 
>   Fetch Operator
> limit: -1
> fr 
>   Fetch Operator
> limit: -1
> l 
>   Fetch Operator
> limit: -1
> pi 
>   Fetch Operator
> limit: -1
>   Alias -> Map Local Operator Tree:
> a 
>   TableScan
> alias: a
> Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
> stats: NONE
> Filter Operator
>   predicate: id is not null (type: boolean)
>   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
> Column stats: NONE
>   HashTable Sink Operator
> keys:
>   0 _col5 (type: int)
>   1 id (type: int)
>   2 aid (type: int)
> acct 
>   TableScan
> alias: acct
> Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE 
> Column stats: NONE
> Filter Operator
>   predicate: aid is not null (type: boolean)
>   Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE 
> Column stats: NONE
>   HashTable Sink Operator
> keys:
>   0 _col5 (type: int)
>   1 id (type: int)
>   2 aid (type: int)
> fr 
>   TableScan
> alias: fr
> Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
> stats: NONE
> Filter Operator
>   predicate: (loan_id = 4436) (type: boolean)
>   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
> Column stats: NONE
>   HashTable Sink Operator
> keys:
>   0 4436 (type: int)
>   1 4436 (type: int)
>   2 4436 (type: int)
> l 
>   TableScan
> alias: l
> Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
> stats: NONE
> Filter Operator
>   predicate: (id = 4436) (type: boolean)
>   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
> Column stats: NONE
>   HashTable Sink Operat

[jira] [Commented] (HIVE-10821) Beeline-CLI: Implement CLI source command using Beeline functionality

2015-05-28 Thread Chinna Rao Lalam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564288#comment-14564288
 ] 

Chinna Rao Lalam commented on HIVE-10821:
-

Hi [~Ferd],
Please remove String trimedCmd = cmd.trim() in sourceFile(String cmd).
Other than this, the patch looks good to me. +1 (non-binding)

> Beeline-CLI: Implement CLI source command using Beeline functionality
> -
>
> Key: HIVE-10821
> URL: https://issues.apache.org/jira/browse/HIVE-10821
> Project: Hive
>  Issue Type: Sub-task
>  Components: CLI
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
> Attachments: HIVE-10821.1-beeline-cli.patch, 
> HIVE-10821.1-beeline-cli.patch, HIVE-10821.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10863) Merge trunk to Spark branch 5/28/2015 [Spark Branch]

2015-05-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564280#comment-14564280
 ] 

Hive QA commented on HIVE-10863:




{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12736048/HIVE-10863.0-spark.patch

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 7962 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.initializationError
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx_cbo_2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_sortmerge_join_1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_sortmerge_join_2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_sortmerge_join_7
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_sortmerge_join_8
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin11
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin5
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketsortoptimize_insert_2
org.apache.hive.jdbc.TestSSL.testSSLConnectionWithProperty
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/869/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/869/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-869/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12736048 - PreCommit-HIVE-SPARK-Build

> Merge trunk to Spark branch 5/28/2015 [Spark Branch]
> 
>
> Key: HIVE-10863
> URL: https://issues.apache.org/jira/browse/HIVE-10863
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Deepesh Khandelwal
> Attachments: HIVE-10863.0-spark.patch, mj.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10821) Beeline-CLI: Implement CLI source command using Beeline functionality

2015-05-28 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-10821:

Release Note: 
Examples:
{noformat}
#cat /root/workspace/test.sql 
create table test2(a string, b string);
#0: jdbc:hive2://> source /root/workspace/test.sql
#0: jdbc:hive2://> create table test2(a string, b string);
{noformat}

> Beeline-CLI: Implement CLI source command using Beeline functionality
> -
>
> Key: HIVE-10821
> URL: https://issues.apache.org/jira/browse/HIVE-10821
> Project: Hive
>  Issue Type: Sub-task
>  Components: CLI
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
> Attachments: HIVE-10821.1-beeline-cli.patch, 
> HIVE-10821.1-beeline-cli.patch, HIVE-10821.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10864) CBO (Calcite Return Path): auto_join2.q returning wrong results

2015-05-28 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-10864:
---
Attachment: HIVE-10864.patch

[~ashutoshc], could you review it? Thanks

> CBO (Calcite Return Path): auto_join2.q returning wrong results
> ---
>
> Key: HIVE-10864
> URL: https://issues.apache.org/jira/browse/HIVE-10864
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-10864.patch
>
>
> auto_join2.q returns wrong results when the return path is on. The problem is 
> that we create the same join expression once per input reference when we are 
> translating. Thus, we incorrectly end up with a key composed of multiple 
> expressions in those cases.
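> For reference, a query in the spirit of auto_join2.q (the exact qtest text may 
> differ): the second join key is an expression over two input references, which 
> is where the duplicated translation shows up.
> {code}
> SELECT sum(hash(src1.key, src3.value))
> FROM src src1
> JOIN src src2 ON (src1.key = src2.key)
> JOIN src src3 ON (src1.key + src2.key = src3.key);
> {code}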



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements

2015-05-28 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564273#comment-14564273
 ] 

Alexander Pivovarov commented on HIVE-10841:


The L table can also be removed from the query - it does not affect the result.

But I found that the order of the JOIN statements matters.
I tried the following combinations of the JOIN statements:
(PI,acct,A) (PI,A,acct) (A,PI,acct) (acct,PI,A) (A,acct,PI) (acct,A,PI)

3 rows are returned only for the (A,PI,acct) combination:
FROM LA
JOIN A
JOIN PI
JOIN acct

{code}
select
  acct.ACC_N,
  acct.brn
FROM LA
JOIN A ON LA.aid = A.id
JOIN PI ON PI.id = LA.pi_id
JOIN acct ON LA.aid = acct.aid
WHERE
  LA.loan_id = 4436
  and acct.brn is not null;
OK
10  122
NULL  NULL
NULL  NULL
{code}

> [WHERE col is not null] does not work sometimes for queries with many JOIN 
> statements
> -
>
> Key: HIVE-10841
> URL: https://issues.apache.org/jira/browse/HIVE-10841
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning, Query Processor
>Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0
>Reporter: Alexander Pivovarov
>
> The result from the following SELECT query is 3 rows but it should be 1 row.
> I checked it in MySQL - it returned 1 row.
> To reproduce the issue in Hive
> 1. prepare tables
> {code}
> drop table if exists L;
> drop table if exists LA;
> drop table if exists FR;
> drop table if exists A;
> drop table if exists PI;
> drop table if exists acct;
> create table L as select 4436 id;
> create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id;
> create table FR as select 4436 loan_id;
> create table A as select 4748 id;
> create table PI as select 4415 id;
> create table acct as select 4748 aid, 10 acc_n, 122 brn;
> insert into table acct values(4748, null, null);
> insert into table acct values(4748, null, null);
> {code}
> 2. run SELECT query
> {code}
> select
>   acct.ACC_N,
>   acct.brn
> FROM L
> JOIN LA ON L.id = LA.loan_id
> JOIN FR ON L.id = FR.loan_id
> JOIN A ON LA.aid = A.id
> JOIN PI ON PI.id = LA.pi_id
> JOIN acct ON A.id = acct.aid
> WHERE
>   L.id = 4436
>   and acct.brn is not null;
> {code}
> the result is 3 rows
> {code}
> 10  122
> NULL  NULL
> NULL  NULL
> {code}
> but it should be 1 row
> {code}
> 10  122
> {code}
> 2.1 "explain select ..." output for hive-1.3.0 MR
> {code}
> STAGE DEPENDENCIES:
>   Stage-12 is a root stage
>   Stage-9 depends on stages: Stage-12
>   Stage-0 depends on stages: Stage-9
> STAGE PLANS:
>   Stage: Stage-12
> Map Reduce Local Work
>   Alias -> Map Local Tables:
> a 
>   Fetch Operator
> limit: -1
> acct 
>   Fetch Operator
> limit: -1
> fr 
>   Fetch Operator
> limit: -1
> l 
>   Fetch Operator
> limit: -1
> pi 
>   Fetch Operator
> limit: -1
>   Alias -> Map Local Operator Tree:
> a 
>   TableScan
> alias: a
> Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
> stats: NONE
> Filter Operator
>   predicate: id is not null (type: boolean)
>   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
> Column stats: NONE
>   HashTable Sink Operator
> keys:
>   0 _col5 (type: int)
>   1 id (type: int)
>   2 aid (type: int)
> acct 
>   TableScan
> alias: acct
> Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE 
> Column stats: NONE
> Filter Operator
>   predicate: aid is not null (type: boolean)
>   Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE 
> Column stats: NONE
>   HashTable Sink Operator
> keys:
>   0 _col5 (type: int)
>   1 id (type: int)
>   2 aid (type: int)
> fr 
>   TableScan
> alias: fr
> Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
> stats: NONE
> Filter Operator
>   predicate: (loan_id = 4436) (type: boolean)
>   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
> Column stats: NONE
>   HashTable Sink Operator
> keys:
>   0 4436 (type: int)
>   1 4436 (type: int)
>   2 4436 (type: int)
> l 
>   TableScan
> alias: l
> Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
> stats: NONE
> Filter Operator
>   predicate: (id = 4436) (type: boolean)
>   Statistics: Num rows: 1 Dat

[jira] [Updated] (HIVE-9069) Simplify filter predicates for CBO

2015-05-28 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-9069:
--
Attachment: HIVE-9069.18.patch

> Simplify filter predicates for CBO
> --
>
> Key: HIVE-9069
> URL: https://issues.apache.org/jira/browse/HIVE-9069
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 0.14.0
>Reporter: Mostafa Mokhtar
>Assignee: Jesus Camacho Rodriguez
> Fix For: 0.14.1
>
> Attachments: HIVE-9069.01.patch, HIVE-9069.02.patch, 
> HIVE-9069.03.patch, HIVE-9069.04.patch, HIVE-9069.05.patch, 
> HIVE-9069.06.patch, HIVE-9069.07.patch, HIVE-9069.08.patch, 
> HIVE-9069.08.patch, HIVE-9069.09.patch, HIVE-9069.10.patch, 
> HIVE-9069.11.patch, HIVE-9069.12.patch, HIVE-9069.13.patch, 
> HIVE-9069.14.patch, HIVE-9069.14.patch, HIVE-9069.15.patch, 
> HIVE-9069.16.patch, HIVE-9069.17.patch, HIVE-9069.17.patch, 
> HIVE-9069.18.patch, HIVE-9069.patch
>
>
> Simplify disjunctive predicates so that they can get pushed down to the scan.
> Looks like this is still an issue; some of the filters can be pushed down to 
> the scan.
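> A minimal sketch of the intended simplification (predicate shortened from the 
> query below):
> {code}
> -- before: the common conjunct is trapped inside the disjunction
> --   (ca_country = 'United States' and ca_state in ('KY', 'GA', 'NM'))
> --   or (ca_country = 'United States' and ca_state in ('MT', 'OR', 'IN'))
> -- after factoring out the common conjunct, it can reach the scan:
> --   ca_country = 'United States'
> --   and (ca_state in ('KY', 'GA', 'NM') or ca_state in ('MT', 'OR', 'IN'))
> {code}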
> {code}
> set hive.cbo.enable=true
> set hive.stats.fetch.column.stats=true
> set hive.exec.dynamic.partition.mode=nonstrict
> set hive.tez.auto.reducer.parallelism=true
> set hive.auto.convert.join.noconditionaltask.size=32000
> set hive.exec.reducers.bytes.per.reducer=1
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager
> set hive.support.concurrency=false
> set hive.tez.exec.print.summary=true
> explain  
> select  substr(r_reason_desc,1,20) as r
>,avg(ws_quantity) wq
>,avg(wr_refunded_cash) ref
>,avg(wr_fee) fee
>  from web_sales, web_returns, web_page, customer_demographics cd1,
>   customer_demographics cd2, customer_address, date_dim, reason 
>  where web_sales.ws_web_page_sk = web_page.wp_web_page_sk
>and web_sales.ws_item_sk = web_returns.wr_item_sk
>and web_sales.ws_order_number = web_returns.wr_order_number
>and web_sales.ws_sold_date_sk = date_dim.d_date_sk and d_year = 1998
>and cd1.cd_demo_sk = web_returns.wr_refunded_cdemo_sk 
>and cd2.cd_demo_sk = web_returns.wr_returning_cdemo_sk
>and customer_address.ca_address_sk = web_returns.wr_refunded_addr_sk
>and reason.r_reason_sk = web_returns.wr_reason_sk
>and
>(
> (
>  cd1.cd_marital_status = 'M'
>  and
>  cd1.cd_marital_status = cd2.cd_marital_status
>  and
>  cd1.cd_education_status = '4 yr Degree'
>  and 
>  cd1.cd_education_status = cd2.cd_education_status
>  and
>  ws_sales_price between 100.00 and 150.00
> )
>or
> (
>  cd1.cd_marital_status = 'D'
>  and
>  cd1.cd_marital_status = cd2.cd_marital_status
>  and
>  cd1.cd_education_status = 'Primary' 
>  and
>  cd1.cd_education_status = cd2.cd_education_status
>  and
>  ws_sales_price between 50.00 and 100.00
> )
>or
> (
>  cd1.cd_marital_status = 'U'
>  and
>  cd1.cd_marital_status = cd2.cd_marital_status
>  and
>  cd1.cd_education_status = 'Advanced Degree'
>  and
>  cd1.cd_education_status = cd2.cd_education_status
>  and
>  ws_sales_price between 150.00 and 200.00
> )
>)
>and
>(
> (
>  ca_country = 'United States'
>  and
>  ca_state in ('KY', 'GA', 'NM')
>  and ws_net_profit between 100 and 200  
> )
> or
> (
>  ca_country = 'United States'
>  and
>  ca_state in ('MT', 'OR', 'IN')
>  and ws_net_profit between 150 and 300  
> )
> or
> (
>  ca_country = 'United States'
>  and
>  ca_state in ('WI', 'MO', 'WV')
>  and ws_net_profit between 50 and 250  
> )
>)
> group by r_reason_desc
> order by r, wq, ref, fee
> limit 100
> OK
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   Edges:
> Map 9 <- Map 1 (BROADCAST_EDGE)
> Reducer 3 <- Map 13 (SIMPLE_EDGE), Map 2 (SIMPLE_EDGE)
> Reducer 4 <- Map 9 (SIMPLE_EDGE), Reducer 3 (SIMPLE_EDGE)
> Reducer 5 <- Map 14 (SIMPLE_EDGE), Reducer 4 (SIMPLE_EDGE)
> Reducer 6 <- Map 10 (SIMPLE_EDGE), Map 11 (BROADCAST_EDGE), Map 12 
> (BROADCAST_EDGE), Reducer 5 (SIMPLE_EDGE)
> Reducer 7 <- Reducer 6 (SIMPLE_EDGE)
> Reducer 8 <- Reducer 7 (SIMPLE_EDGE)
>   DagName: mmokhtar_2014161818_f5fd23ba-d783-4b13-8507-7faa65851798:1
>   Vertices:
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: web_page
>   filterExpr: wp_web_page_sk is not null (type: boolean)
>   Statistics: Num rows: 4602 Data size: 2

[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements

2015-05-28 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564244#comment-14564244
 ] 

Gopal V commented on HIVE-10841:


Can you try this with 

set hive.optimize.ppd=false;

?

> [WHERE col is not null] does not work sometimes for queries with many JOIN 
> statements
> -
>
> Key: HIVE-10841
> URL: https://issues.apache.org/jira/browse/HIVE-10841
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning, Query Processor
>Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0
>Reporter: Alexander Pivovarov
>
> The result from the following SELECT query is 3 rows but it should be 1 row.
> I checked it in MySQL - it returned 1 row.
> To reproduce the issue in Hive
> 1. prepare tables
> {code}
> drop table if exists L;
> drop table if exists LA;
> drop table if exists FR;
> drop table if exists A;
> drop table if exists PI;
> drop table if exists acct;
> create table L as select 4436 id;
> create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id;
> create table FR as select 4436 loan_id;
> create table A as select 4748 id;
> create table PI as select 4415 id;
> create table acct as select 4748 aid, 10 acc_n, 122 brn;
> insert into table acct values(4748, null, null);
> insert into table acct values(4748, null, null);
> {code}
> 2. run SELECT query
> {code}
> select
>   acct.ACC_N,
>   acct.brn
> FROM L
> JOIN LA ON L.id = LA.loan_id
> JOIN FR ON L.id = FR.loan_id
> JOIN A ON LA.aid = A.id
> JOIN PI ON PI.id = LA.pi_id
> JOIN acct ON A.id = acct.aid
> WHERE
>   L.id = 4436
>   and acct.brn is not null;
> {code}
> the result is 3 rows
> {code}
> 10  122
> NULL  NULL
> NULL  NULL
> {code}
> but it should be 1 row
> {code}
> 10  122
> {code}
> 2.1 "explain select ..." output for hive-1.3.0 MR
> {code}
> STAGE DEPENDENCIES:
>   Stage-12 is a root stage
>   Stage-9 depends on stages: Stage-12
>   Stage-0 depends on stages: Stage-9
> STAGE PLANS:
>   Stage: Stage-12
> Map Reduce Local Work
>   Alias -> Map Local Tables:
> a 
>   Fetch Operator
> limit: -1
> acct 
>   Fetch Operator
> limit: -1
> fr 
>   Fetch Operator
> limit: -1
> l 
>   Fetch Operator
> limit: -1
> pi 
>   Fetch Operator
> limit: -1
>   Alias -> Map Local Operator Tree:
> a 
>   TableScan
> alias: a
> Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
> stats: NONE
> Filter Operator
>   predicate: id is not null (type: boolean)
>   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
> Column stats: NONE
>   HashTable Sink Operator
> keys:
>   0 _col5 (type: int)
>   1 id (type: int)
>   2 aid (type: int)
> acct 
>   TableScan
> alias: acct
> Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE 
> Column stats: NONE
> Filter Operator
>   predicate: aid is not null (type: boolean)
>   Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE 
> Column stats: NONE
>   HashTable Sink Operator
> keys:
>   0 _col5 (type: int)
>   1 id (type: int)
>   2 aid (type: int)
> fr 
>   TableScan
> alias: fr
> Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
> stats: NONE
> Filter Operator
>   predicate: (loan_id = 4436) (type: boolean)
>   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
> Column stats: NONE
>   HashTable Sink Operator
> keys:
>   0 4436 (type: int)
>   1 4436 (type: int)
>   2 4436 (type: int)
> l 
>   TableScan
> alias: l
> Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
> stats: NONE
> Filter Operator
>   predicate: (id = 4436) (type: boolean)
>   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
> Column stats: NONE
>   HashTable Sink Operator
> keys:
>   0 4436 (type: int)
>   1 4436 (type: int)
>   2 4436 (type: int)
> pi 
>   TableScan
> alias: pi
> Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
> stats: NONE
> Filter Operator
>   predicate: id is not null (type: boolean)
>   Statistics: Num rows: 1 Data size: 4 Basic stats: C

[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements

2015-05-28 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564243#comment-14564243
 ] 

Alexander Pivovarov commented on HIVE-10841:


I found that "JOIN FR" can be removed - the result will still be 3 rows.
But adding or removing "JOIN PI" changes the Filter Operator predicate for the 
acct table.

If we remove "JOIN PI", the Filter Operator predicate for the acct table has 
"brn is not null" and the query returns 1 row:
{code}
acct 
  TableScan
alias: acct
Statistics: Num rows: 5 Data size: 63 Basic stats: COMPLETE Column 
stats: NONE
Filter Operator
  predicate: (aid is not null and brn is not null) (type: boolean)
{code}

How can removing "JOIN PI" change the Filter Operator predicate for the acct 
table?

The query below returns 1 row. Its plan has the "brn is not null" predicate in 
the Filter Operator for the acct table.
But if we remove the comment before "JOIN PI", the query plan will not have the 
"brn is not null" predicate.
{code}
explain select
  acct.ACC_N,
  acct.brn
FROM L
JOIN LA ON L.id = LA.loan_id
JOIN A ON LA.aid = A.id
--JOIN PI ON PI.id = LA.pi_id
JOIN acct ON A.id = acct.aid
WHERE
  L.id = 4436
  and acct.brn is not null;
{code}


> [WHERE col is not null] does not work sometimes for queries with many JOIN 
> statements
> -
>
> Key: HIVE-10841
> URL: https://issues.apache.org/jira/browse/HIVE-10841
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning, Query Processor
>Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0
>Reporter: Alexander Pivovarov
>
> The result from the following SELECT query is 3 rows but it should be 1 row.
> I checked it in MySQL - it returned 1 row.
> To reproduce the issue in Hive
> 1. prepare tables
> {code}
> drop table if exists L;
> drop table if exists LA;
> drop table if exists FR;
> drop table if exists A;
> drop table if exists PI;
> drop table if exists acct;
> create table L as select 4436 id;
> create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id;
> create table FR as select 4436 loan_id;
> create table A as select 4748 id;
> create table PI as select 4415 id;
> create table acct as select 4748 aid, 10 acc_n, 122 brn;
> insert into table acct values(4748, null, null);
> insert into table acct values(4748, null, null);
> {code}
> 2. run SELECT query
> {code}
> select
>   acct.ACC_N,
>   acct.brn
> FROM L
> JOIN LA ON L.id = LA.loan_id
> JOIN FR ON L.id = FR.loan_id
> JOIN A ON LA.aid = A.id
> JOIN PI ON PI.id = LA.pi_id
> JOIN acct ON A.id = acct.aid
> WHERE
>   L.id = 4436
>   and acct.brn is not null;
> {code}
> the result is 3 rows
> {code}
> 10  122
> NULL  NULL
> NULL  NULL
> {code}
> but it should be 1 row
> {code}
> 10  122
> {code}
> 2.1 "explain select ..." output for hive-1.3.0 MR
> {code}
> STAGE DEPENDENCIES:
>   Stage-12 is a root stage
>   Stage-9 depends on stages: Stage-12
>   Stage-0 depends on stages: Stage-9
> STAGE PLANS:
>   Stage: Stage-12
> Map Reduce Local Work
>   Alias -> Map Local Tables:
> a 
>   Fetch Operator
> limit: -1
> acct 
>   Fetch Operator
> limit: -1
> fr 
>   Fetch Operator
> limit: -1
> l 
>   Fetch Operator
> limit: -1
> pi 
>   Fetch Operator
> limit: -1
>   Alias -> Map Local Operator Tree:
> a 
>   TableScan
> alias: a
> Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
> stats: NONE
> Filter Operator
>   predicate: id is not null (type: boolean)
>   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
> Column stats: NONE
>   HashTable Sink Operator
> keys:
>   0 _col5 (type: int)
>   1 id (type: int)
>   2 aid (type: int)
> acct 
>   TableScan
> alias: acct
> Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE 
> Column stats: NONE
> Filter Operator
>   predicate: aid is not null (type: boolean)
>   Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE 
> Column stats: NONE
>   HashTable Sink Operator
> keys:
>   0 _col5 (type: int)
>   1 id (type: int)
>   2 aid (type: int)
> fr 
>   TableScan
> alias: fr
> Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
> stats: NONE
> Filter Operator
>   predicate: (loan_id = 4436) (type: boolean)
>   Statistics: Num rows: 1 Data size: 4 Basic stat

[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements

2015-05-28 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564229#comment-14564229
 ] 

Alexander Pivovarov commented on HIVE-10841:


If we change "is not null" to "is null" or to "= 122", then all 3 rows have 
NULL or 122 in column "brn" (the second column).

and acct.brn is null;
{code}
10  NULL
NULL  NULL
NULL  NULL
{code}

and acct.brn = 122;
{code}
10  122
NULL  122
NULL  122
{code}

and acct.brn is not null;
{code}
10  122
NULL  NULL
NULL  NULL
{code}

> [WHERE col is not null] does not work sometimes for queries with many JOIN 
> statements
> -
>
> Key: HIVE-10841
> URL: https://issues.apache.org/jira/browse/HIVE-10841
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning, Query Processor
>Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0
>Reporter: Alexander Pivovarov
>
> The result from the following SELECT query is 3 rows but it should be 1 row.
> I checked it in MySQL - it returned 1 row.
> To reproduce the issue in Hive
> 1. prepare tables
> {code}
> drop table if exists L;
> drop table if exists LA;
> drop table if exists FR;
> drop table if exists A;
> drop table if exists PI;
> drop table if exists acct;
> create table L as select 4436 id;
> create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id;
> create table FR as select 4436 loan_id;
> create table A as select 4748 id;
> create table PI as select 4415 id;
> create table acct as select 4748 aid, 10 acc_n, 122 brn;
> insert into table acct values(4748, null, null);
> insert into table acct values(4748, null, null);
> {code}
> 2. run SELECT query
> {code}
> select
>   acct.ACC_N,
>   acct.brn
> FROM L
> JOIN LA ON L.id = LA.loan_id
> JOIN FR ON L.id = FR.loan_id
> JOIN A ON LA.aid = A.id
> JOIN PI ON PI.id = LA.pi_id
> JOIN acct ON A.id = acct.aid
> WHERE
>   L.id = 4436
>   and acct.brn is not null;
> {code}
> the result is 3 rows
> {code}
> 10  122
> NULL  NULL
> NULL  NULL
> {code}
> but it should be 1 row
> {code}
> 10  122
> {code}
> 2.1 "explain select ..." output for hive-1.3.0 MR
> {code}
> STAGE DEPENDENCIES:
>   Stage-12 is a root stage
>   Stage-9 depends on stages: Stage-12
>   Stage-0 depends on stages: Stage-9
> STAGE PLANS:
>   Stage: Stage-12
> Map Reduce Local Work
>   Alias -> Map Local Tables:
> a 
>   Fetch Operator
> limit: -1
> acct 
>   Fetch Operator
> limit: -1
> fr 
>   Fetch Operator
> limit: -1
> l 
>   Fetch Operator
> limit: -1
> pi 
>   Fetch Operator
> limit: -1
>   Alias -> Map Local Operator Tree:
> a 
>   TableScan
> alias: a
> Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
> stats: NONE
> Filter Operator
>   predicate: id is not null (type: boolean)
>   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
> Column stats: NONE
>   HashTable Sink Operator
> keys:
>   0 _col5 (type: int)
>   1 id (type: int)
>   2 aid (type: int)
> acct 
>   TableScan
> alias: acct
> Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE 
> Column stats: NONE
> Filter Operator
>   predicate: aid is not null (type: boolean)
>   Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE 
> Column stats: NONE
>   HashTable Sink Operator
> keys:
>   0 _col5 (type: int)
>   1 id (type: int)
>   2 aid (type: int)
> fr 
>   TableScan
> alias: fr
> Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
> stats: NONE
> Filter Operator
>   predicate: (loan_id = 4436) (type: boolean)
>   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
> Column stats: NONE
>   HashTable Sink Operator
> keys:
>   0 4436 (type: int)
>   1 4436 (type: int)
>   2 4436 (type: int)
> l 
>   TableScan
> alias: l
> Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
> stats: NONE
> Filter Operator
>   predicate: (id = 4436) (type: boolean)
>   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
> Column stats: NONE
>   HashTable Sink Operator
> keys:
>   0 4436 (type: int)
>   1 4436 (type: int)
>   2 4436 (type:

[jira] [Commented] (HIVE-9069) Simplify filter predicates for CBO

2015-05-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564227#comment-14564227
 ] 

Hive QA commented on HIVE-9069:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12735925/HIVE-9069.17.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4085/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4085/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4085/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-4085/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ cd apache-github-source-source
+ git fetch origin
From https://github.com/apache/hive
   4610084..aafd586  spark  -> origin/spark
+ git reset --hard HEAD
HEAD is now at 52221a7 HIVE-10684: Fix the unit test failures for HIVE-7553 
after HIVE-10674 removed the binary jar files(Ferdinand Xu, reviewed by Hari 
Sankar Sivarama Subramaniyan and Sushanth Sowmyan)
+ git clean -f -d
Removing common/src/java/org/apache/hadoop/hive/common/JvmPauseMonitor.java
Removing 
common/src/java/org/apache/hadoop/hive/common/metrics/LegacyMetrics.java
Removing common/src/java/org/apache/hadoop/hive/common/metrics/common/
Removing common/src/java/org/apache/hadoop/hive/common/metrics/metrics2/
Removing 
common/src/test/org/apache/hadoop/hive/common/metrics/TestLegacyMetrics.java
Removing common/src/test/org/apache/hadoop/hive/common/metrics/metrics2/
Removing 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestMetaStoreMetrics.java
+ git checkout master
Already on 'master'
+ git reset --hard origin/master
HEAD is now at 52221a7 HIVE-10684: Fix the unit test failures for HIVE-7553 
after HIVE-10674 removed the binary jar files(Ferdinand Xu, reviewed by Hari 
Sankar Sivarama Subramaniyan and Sushanth Sowmyan)
+ git merge --ff-only origin/master
Already up-to-date.
+ git gc
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12735925 - PreCommit-HIVE-TRUNK-Build

> Simplify filter predicates for CBO
> --
>
> Key: HIVE-9069
> URL: https://issues.apache.org/jira/browse/HIVE-9069
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 0.14.0
>Reporter: Mostafa Mokhtar
>Assignee: Jesus Camacho Rodriguez
> Fix For: 0.14.1
>
> Attachments: HIVE-9069.01.patch, HIVE-9069.02.patch, 
> HIVE-9069.03.patch, HIVE-9069.04.patch, HIVE-9069.05.patch, 
> HIVE-9069.06.patch, HIVE-9069.07.patch, HIVE-9069.08.patch, 
> HIVE-9069.08.patch, HIVE-9069.09.patch, HIVE-9069.10.patch, 
> HIVE-9069.11.patch, HIVE-9069.12.patch, HIVE-9069.13.patch, 
> HIVE-9069.14.patch, HIVE-9069.14.patch, HIVE-9069.15.patch, 
> HIVE-9069.16.patch, HIVE-9069.17.patch, HIVE-9069.17.patch, HIVE-9069.patch
>
>
> Simplify disjunctive predicates so that they can get pushed down to the scan.
> Looks like this

[jira] [Commented] (HIVE-10761) Create codahale-based metrics system for Hive

2015-05-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564222#comment-14564222
 ] 

Hive QA commented on HIVE-10761:




{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12736002/HIVE-10761.5.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 8983 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_fold_case
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx_cbo_2
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4082/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4082/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4082/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12736002 - PreCommit-HIVE-TRUNK-Build

> Create codahale-based metrics system for Hive
> -
>
> Key: HIVE-10761
> URL: https://issues.apache.org/jira/browse/HIVE-10761
> Project: Hive
>  Issue Type: New Feature
>  Components: Diagnosability
>Reporter: Szehon Ho
>Assignee: Szehon Ho
> Attachments: HIVE-10761.2.patch, HIVE-10761.3.patch, 
> HIVE-10761.4.patch, HIVE-10761.5.patch, HIVE-10761.patch, hms-metrics.json
>
>
> There is a current Hive metrics system that hooks up to a JMX reporting, but 
> all its measurements, models are custom.
> This is to make another metrics system that will be based on Codahale (ie 
> yammer, dropwizard), which has the following advantage:
> * Well-defined metric model for frequently-needed metrics (ie JVM metrics)
> * Well-defined measurements for all metrics (ie max, mean, stddev, mean_rate, 
> etc), 
> * Built-in reporting frameworks like JMX, Console, Log, JSON webserver
> It is used for many projects, including several Apache projects like Oozie.  
> Overall, monitoring tools should find it easier to understand these common 
> metric, measurement, reporting models.
> The existing metric subsystem will be kept and can be enabled if backward 
> compatibility is desired.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements

2015-05-28 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564215#comment-14564215
 ] 

Alexander Pivovarov commented on HIVE-10841:


hive-0.12.0 plan
{code}
ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_JOIN (TOK_JOIN (TOK_JOIN (TOK_JOIN 
(TOK_TABREF (TOK_TABNAME L)) (TOK_TABREF (TOK_TABNAME LA)) (= (. 
(TOK_TABLE_OR_COL L) id) (. (TOK_TABLE_OR_COL LA) loan_id))) (TOK_TABREF 
(TOK_TABNAME FR)) (= (. (TOK_TABLE_OR_COL L) id) (. (TOK_TABLE_OR_COL FR) 
loan_id))) (TOK_TABREF (TOK_TABNAME A)) (= (. (TOK_TABLE_OR_COL LA) aid) (. 
(TOK_TABLE_OR_COL A) id))) (TOK_TABREF (TOK_TABNAME PI)) (= (. 
(TOK_TABLE_OR_COL PI) id) (. (TOK_TABLE_OR_COL LA) pi_id))) (TOK_TABREF 
(TOK_TABNAME acct)) (= (. (TOK_TABLE_OR_COL A) id) (. (TOK_TABLE_OR_COL acct) 
aid (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT 
(TOK_SELEXPR (. (TOK_TABLE_OR_COL acct) ACC_N)) (TOK_SELEXPR (. 
(TOK_TABLE_OR_COL acct) brn))) (TOK_WHERE (and (= (. (TOK_TABLE_OR_COL L) id) 
4436) (TOK_FUNCTION TOK_ISNOTNULL (. (TOK_TABLE_OR_COL acct) brn))

STAGE DEPENDENCIES:
  Stage-11 is a root stage
  Stage-8 depends on stages: Stage-11
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-11
Map Reduce Local Work
  Alias -> Map Local Tables:
a 
  Fetch Operator
limit: -1
acct 
  Fetch Operator
limit: -1
fr 
  Fetch Operator
limit: -1
l 
  Fetch Operator
limit: -1
pi 
  Fetch Operator
limit: -1
  Alias -> Map Local Operator Tree:
a 
  TableScan
alias: a
HashTable Sink Operator
  condition expressions:
0 {_col5}
1 
2 {acc_n} {brn}
  handleSkewJoin: false
  keys:
0 [Column[_col4]]
1 [Column[id]]
2 [Column[aid]]
  Position of Big Table: 0
acct 
  TableScan
alias: acct
Filter Operator
  predicate:
  expr: brn is not null
  type: boolean
  HashTable Sink Operator
condition expressions:
  0 {_col5}
  1 
  2 {acc_n} {brn}
handleSkewJoin: false
keys:
  0 [Column[_col4]]
  1 [Column[id]]
  2 [Column[aid]]
Position of Big Table: 0
fr 
  TableScan
alias: fr
Filter Operator
  predicate:
  expr: (loan_id = 4436)
  type: boolean
  HashTable Sink Operator
condition expressions:
  0 
  1 {aid} {pi_id}
  2 
handleSkewJoin: false
keys:
  0 [Column[id]]
  1 [Column[loan_id]]
  2 [Column[loan_id]]
Position of Big Table: 1
l 
  TableScan
alias: l
Filter Operator
  predicate:
  expr: (id = 4436)
  type: boolean
  HashTable Sink Operator
condition expressions:
  0 
  1 {aid} {pi_id}
  2 
handleSkewJoin: false
keys:
  0 [Column[id]]
  1 [Column[loan_id]]
  2 [Column[loan_id]]
Position of Big Table: 1
pi 
  TableScan
alias: pi
HashTable Sink Operator
  condition expressions:
0 {_col15} {_col16}
1 
  handleSkewJoin: false
  keys:
0 [Column[_col2]]
1 [Column[id]]
  Position of Big Table: 0

  Stage: Stage-8
Map Reduce
  Alias -> Map Operator Tree:
la 
  TableScan
alias: la
Filter Operator
  predicate:
  expr: (loan_id = 4436)
  type: boolean
  Map Join Operator
condition map:
 Inner Join 0 to 1
 Inner Join 0 to 2
condition expressions:
  0 
  1 {aid} {pi_id}
  2 
handleSkewJoin: false
keys:
  0 [Column[id]]
  1 [Column[loan_id]]
  2 [Column[loan_id]]
outputColumnNames: _col4, _col5
Position of Big Table: 1
Map Join Operator
  condition map:
   Inner Join 0 to 1
   Inner Join 1 to 2
  

[jira] [Updated] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements

2015-05-28 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-10841:
---
Component/s: Query Planning

> [WHERE col is not null] does not work sometimes for queries with many JOIN 
> statements
> -
>
> Key: HIVE-10841
> URL: https://issues.apache.org/jira/browse/HIVE-10841
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning, Query Processor
>Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0
>Reporter: Alexander Pivovarov
>
> The result from the following SELECT query is 3 rows but it should be 1 row.
> I checked it in MySQL - it returned 1 row.
> To reproduce the issue in Hive
> 1. prepare tables
> {code}
> drop table if exists L;
> drop table if exists LA;
> drop table if exists FR;
> drop table if exists A;
> drop table if exists PI;
> drop table if exists acct;
> create table L as select 4436 id;
> create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id;
> create table FR as select 4436 loan_id;
> create table A as select 4748 id;
> create table PI as select 4415 id;
> create table acct as select 4748 aid, 10 acc_n, 122 brn;
> insert into table acct values(4748, null, null);
> insert into table acct values(4748, null, null);
> {code}
> 2. run SELECT query
> {code}
> select
>   acct.ACC_N,
>   acct.brn
> FROM L
> JOIN LA ON L.id = LA.loan_id
> JOIN FR ON L.id = FR.loan_id
> JOIN A ON LA.aid = A.id
> JOIN PI ON PI.id = LA.pi_id
> JOIN acct ON A.id = acct.aid
> WHERE
>   L.id = 4436
>   and acct.brn is not null;
> {code}
> the result is 3 rows
> {code}
> 10  122
> NULL  NULL
> NULL  NULL
> {code}
> but it should be 1 row
> {code}
> 10  122
> {code}
> 2.1 "explain select ..." output for hive-1.3.0 MR
> {code}
> STAGE DEPENDENCIES:
>   Stage-12 is a root stage
>   Stage-9 depends on stages: Stage-12
>   Stage-0 depends on stages: Stage-9
> STAGE PLANS:
>   Stage: Stage-12
> Map Reduce Local Work
>   Alias -> Map Local Tables:
> a 
>   Fetch Operator
> limit: -1
> acct 
>   Fetch Operator
> limit: -1
> fr 
>   Fetch Operator
> limit: -1
> l 
>   Fetch Operator
> limit: -1
> pi 
>   Fetch Operator
> limit: -1
>   Alias -> Map Local Operator Tree:
> a 
>   TableScan
> alias: a
> Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
> stats: NONE
> Filter Operator
>   predicate: id is not null (type: boolean)
>   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
> Column stats: NONE
>   HashTable Sink Operator
> keys:
>   0 _col5 (type: int)
>   1 id (type: int)
>   2 aid (type: int)
> acct 
>   TableScan
> alias: acct
> Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE 
> Column stats: NONE
> Filter Operator
>   predicate: aid is not null (type: boolean)
>   Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE 
> Column stats: NONE
>   HashTable Sink Operator
> keys:
>   0 _col5 (type: int)
>   1 id (type: int)
>   2 aid (type: int)
> fr 
>   TableScan
> alias: fr
> Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
> stats: NONE
> Filter Operator
>   predicate: (loan_id = 4436) (type: boolean)
>   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
> Column stats: NONE
>   HashTable Sink Operator
> keys:
>   0 4436 (type: int)
>   1 4436 (type: int)
>   2 4436 (type: int)
> l 
>   TableScan
> alias: l
> Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
> stats: NONE
> Filter Operator
>   predicate: (id = 4436) (type: boolean)
>   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
> Column stats: NONE
>   HashTable Sink Operator
> keys:
>   0 4436 (type: int)
>   1 4436 (type: int)
>   2 4436 (type: int)
> pi 
>   TableScan
> alias: pi
> Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
> stats: NONE
> Filter Operator
>   predicate: id is not null (type: boolean)
>   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
> Column stats: NONE
>   HashTable Si

[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements

2015-05-28 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564213#comment-14564213
 ] 

Alexander Pivovarov commented on HIVE-10841:


If we look at the hive-0.12 plan, we can see that it has the "brn is not null" 
predicate in the Filter Operator:
{code}
acct 
  TableScan
alias: acct
Filter Operator
  predicate:
  expr: brn is not null
  type: boolean
{code}

But in the hive-1.3.0 plan I do not see "brn" at all. It only has "predicate: 
aid is not null" for the acct table.
Does that mean the hive-1.3.0 plan is wrong?

I checked the ppd folder diff between 0.12.0 and 0.13.0.
There were two fixes:
HIVE-4293 : Predicates following UDTF operator are removed by PPD
HIVE-5411 : Migrate expression serialization to Kryo



> [WHERE col is not null] does not work sometimes for queries with many JOIN 
> statements
> -
>
> Key: HIVE-10841
> URL: https://issues.apache.org/jira/browse/HIVE-10841
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning, Query Processor
>Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0
>Reporter: Alexander Pivovarov
>
> The result from the following SELECT query is 3 rows but it should be 1 row.
> I checked it in MySQL - it returned 1 row.
> To reproduce the issue in Hive
> 1. prepare tables
> {code}
> drop table if exists L;
> drop table if exists LA;
> drop table if exists FR;
> drop table if exists A;
> drop table if exists PI;
> drop table if exists acct;
> create table L as select 4436 id;
> create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id;
> create table FR as select 4436 loan_id;
> create table A as select 4748 id;
> create table PI as select 4415 id;
> create table acct as select 4748 aid, 10 acc_n, 122 brn;
> insert into table acct values(4748, null, null);
> insert into table acct values(4748, null, null);
> {code}
> 2. run SELECT query
> {code}
> select
>   acct.ACC_N,
>   acct.brn
> FROM L
> JOIN LA ON L.id = LA.loan_id
> JOIN FR ON L.id = FR.loan_id
> JOIN A ON LA.aid = A.id
> JOIN PI ON PI.id = LA.pi_id
> JOIN acct ON A.id = acct.aid
> WHERE
>   L.id = 4436
>   and acct.brn is not null;
> {code}
> the result is 3 rows
> {code}
> 10  122
> NULL  NULL
> NULL  NULL
> {code}
> but it should be 1 row
> {code}
> 10  122
> {code}
> 2.1 "explain select ..." output for hive-1.3.0 MR
> {code}
> STAGE DEPENDENCIES:
>   Stage-12 is a root stage
>   Stage-9 depends on stages: Stage-12
>   Stage-0 depends on stages: Stage-9
> STAGE PLANS:
>   Stage: Stage-12
> Map Reduce Local Work
>   Alias -> Map Local Tables:
> a 
>   Fetch Operator
> limit: -1
> acct 
>   Fetch Operator
> limit: -1
> fr 
>   Fetch Operator
> limit: -1
> l 
>   Fetch Operator
> limit: -1
> pi 
>   Fetch Operator
> limit: -1
>   Alias -> Map Local Operator Tree:
> a 
>   TableScan
> alias: a
> Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
> stats: NONE
> Filter Operator
>   predicate: id is not null (type: boolean)
>   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
> Column stats: NONE
>   HashTable Sink Operator
> keys:
>   0 _col5 (type: int)
>   1 id (type: int)
>   2 aid (type: int)
> acct 
>   TableScan
> alias: acct
> Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE 
> Column stats: NONE
> Filter Operator
>   predicate: aid is not null (type: boolean)
>   Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE 
> Column stats: NONE
>   HashTable Sink Operator
> keys:
>   0 _col5 (type: int)
>   1 id (type: int)
>   2 aid (type: int)
> fr 
>   TableScan
> alias: fr
> Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
> stats: NONE
> Filter Operator
>   predicate: (loan_id = 4436) (type: boolean)
>   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
> Column stats: NONE
>   HashTable Sink Operator
> keys:
>   0 4436 (type: int)
>   1 4436 (type: int)
>   2 4436 (type: int)
> l 
>   TableScan
> alias: l
> Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
> stats: NONE
> Filter Operator
>   predicate: (id = 4436) (

[jira] [Assigned] (HIVE-10863) Merge trunk to Spark branch 5/28/2015 [Spark Branch]

2015-05-28 Thread Deepesh Khandelwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepesh Khandelwal reassigned HIVE-10863:
-

Assignee: Deepesh Khandelwal  (was: Xuefu Zhang)

> Merge trunk to Spark branch 5/28/2015 [Spark Branch]
> 
>
> Key: HIVE-10863
> URL: https://issues.apache.org/jira/browse/HIVE-10863
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Deepesh Khandelwal
> Attachments: HIVE-10863.0-spark.patch, mj.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10863) Merge trunk to Spark branch 5/28/2015 [Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-10863:
---
Attachment: HIVE-10863.0-spark.patch

Patch #0 is a dummy patch to trigger a test run.

> Merge trunk to Spark branch 5/28/2015 [Spark Branch]
> 
>
> Key: HIVE-10863
> URL: https://issues.apache.org/jira/browse/HIVE-10863
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-10863.0-spark.patch, mj.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10863) Merge trunk to Spark branch 5/28/2015 [Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-10863:
---
Attachment: mj.patch

> Merge trunk to Spark branch 5/28/2015 [Spark Branch]
> 
>
> Key: HIVE-10863
> URL: https://issues.apache.org/jira/browse/HIVE-10863
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: mj.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10863) Merge trunk to Spark branch 5/28/2015 [Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564198#comment-14564198
 ] 

Xuefu Zhang commented on HIVE-10863:


Unfortunately, the patch is too big to be attached here. I had to commit the 
merge and fix any issues afterwards. There are conflicts, as shown below:
{code}
Conflicts:
pom.xml
ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java

ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java
ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java
ql/src/test/results/clientpositive/runtime_skewjoin_mapjoin_spark.q.out
ql/src/test/results/clientpositive/spark/cbo_gby.q.out
ql/src/test/results/clientpositive/spark/cbo_simple_select.q.out
ql/src/test/results/clientpositive/spark/cbo_udf_udaf.q.out

ql/src/test/results/clientpositive/spark/runtime_skewjoin_mapjoin_spark.q.out
ql/src/test/results/clientpositive/spark/union12.q.out
ql/src/test/results/clientpositive/spark/union17.q.out
ql/src/test/results/clientpositive/spark/union20.q.out
ql/src/test/results/clientpositive/spark/union21.q.out
ql/src/test/results/clientpositive/spark/union22.q.out
ql/src/test/results/clientpositive/spark/union24.q.out
ql/src/test/results/clientpositive/spark/union26.q.out
ql/src/test/results/clientpositive/spark/union27.q.out
ql/src/test/results/clientpositive/spark/union31.q.out
ql/src/test/results/clientpositive/spark/union32.q.out
ql/src/test/results/clientpositive/spark/union34.q.out
ql/src/test/results/clientpositive/spark/union_lateralview.q.out
ql/src/test/results/clientpositive/spark/union_remove_12.q.out
ql/src/test/results/clientpositive/spark/union_remove_13.q.out
ql/src/test/results/clientpositive/spark/union_remove_14.q.out
ql/src/test/results/clientpositive/spark/union_remove_22.q.out
ql/src/test/results/clientpositive/spark/union_remove_23.q.out
ql/src/test/results/clientpositive/spark/union_remove_6_subq.q.out
ql/src/test/results/clientpositive/spark/union_top_level.q.out

service/src/java/org/apache/hive/service/cli/thrift/ThriftBinaryCLIService.java

spark-client/src/main/java/org/apache/hive/spark/client/JobContextImpl.java

spark-client/src/main/java/org/apache/hive/spark/client/SparkClientUtilities.java
{code}
I resolved most of them, except that some changes from the Spark branch were 
lost. The diff is shown in the attached mj.patch file. [~jxiang], could you 
take a look and see how to apply the diff?

We will need to watch the test results and fix failures as needed.


> Merge trunk to Spark branch 5/28/2015 [Spark Branch]
> 
>
> Key: HIVE-10863
> URL: https://issues.apache.org/jira/browse/HIVE-10863
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: mj.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10853) Create ExplainTask in ATS hook through ExplainWork

2015-05-28 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564193#comment-14564193
 ] 

Gunther Hagleitner commented on HIVE-10853:
---

Minor nit: it would be nice to comment all the boolean flags when setting up 
an instance of ExplainWork. +1
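
As a hedged illustration of the nit (the constructor signature below is 
hypothetical, not the actual ExplainWork API), inline comments keep a long run 
of boolean arguments readable at the call site:
{code}
// Hypothetical call site -- parameter names and order are illustrative only;
// variables such as resFile and rootTasks are assumed to be in scope.
ExplainWork work = new ExplainWork(
    resFile,
    parseContext,
    rootTasks,
    fetchTask,
    analyzer,
    false,   /* extended   */
    true,    /* formatted  */
    false,   /* dependency */
    false);  /* logical    */
{code}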

> Create ExplainTask in ATS hook through ExplainWork
> --
>
> Key: HIVE-10853
> URL: https://issues.apache.org/jira/browse/HIVE-10853
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Pengcheng Xiong
> Attachments: HIVE-10853.01.patch
>
>
> Right now ExplainTask is created directly. That's fragile and can lead to 
> issues like HIVE-10829.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements

2015-05-28 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-10841:
---
Affects Version/s: 0.13.0

> [WHERE col is not null] does not work sometimes for queries with many JOIN 
> statements
> -
>
> Key: HIVE-10841
> URL: https://issues.apache.org/jira/browse/HIVE-10841
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0
>Reporter: Alexander Pivovarov
>
> The result from the following SELECT query is 3 rows but it should be 1 row.
> I checked it in MySQL - it returned 1 row.
> To reproduce the issue in Hive
> 1. prepare tables
> {code}
> drop table if exists L;
> drop table if exists LA;
> drop table if exists FR;
> drop table if exists A;
> drop table if exists PI;
> drop table if exists acct;
> create table L as select 4436 id;
> create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id;
> create table FR as select 4436 loan_id;
> create table A as select 4748 id;
> create table PI as select 4415 id;
> create table acct as select 4748 aid, 10 acc_n, 122 brn;
> insert into table acct values(4748, null, null);
> insert into table acct values(4748, null, null);
> {code}
> 2. run SELECT query
> {code}
> select
>   acct.ACC_N,
>   acct.brn
> FROM L
> JOIN LA ON L.id = LA.loan_id
> JOIN FR ON L.id = FR.loan_id
> JOIN A ON LA.aid = A.id
> JOIN PI ON PI.id = LA.pi_id
> JOIN acct ON A.id = acct.aid
> WHERE
>   L.id = 4436
>   and acct.brn is not null;
> {code}
> the result is 3 rows
> {code}
> 10122
> NULL  NULL
> NULL  NULL
> {code}
> but it should be 1 row
> {code}
> 10122
> {code}
> 2.1 "explain select ..." output for hive-1.3.0 MR
> {code}
> STAGE DEPENDENCIES:
>   Stage-12 is a root stage
>   Stage-9 depends on stages: Stage-12
>   Stage-0 depends on stages: Stage-9
> STAGE PLANS:
>   Stage: Stage-12
> Map Reduce Local Work
>   Alias -> Map Local Tables:
> a 
>   Fetch Operator
> limit: -1
> acct 
>   Fetch Operator
> limit: -1
> fr 
>   Fetch Operator
> limit: -1
> l 
>   Fetch Operator
> limit: -1
> pi 
>   Fetch Operator
> limit: -1
>   Alias -> Map Local Operator Tree:
> a 
>   TableScan
> alias: a
> Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
> stats: NONE
> Filter Operator
>   predicate: id is not null (type: boolean)
>   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
> Column stats: NONE
>   HashTable Sink Operator
> keys:
>   0 _col5 (type: int)
>   1 id (type: int)
>   2 aid (type: int)
> acct 
>   TableScan
> alias: acct
> Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE 
> Column stats: NONE
> Filter Operator
>   predicate: aid is not null (type: boolean)
>   Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE 
> Column stats: NONE
>   HashTable Sink Operator
> keys:
>   0 _col5 (type: int)
>   1 id (type: int)
>   2 aid (type: int)
> fr 
>   TableScan
> alias: fr
> Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
> stats: NONE
> Filter Operator
>   predicate: (loan_id = 4436) (type: boolean)
>   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
> Column stats: NONE
>   HashTable Sink Operator
> keys:
>   0 4436 (type: int)
>   1 4436 (type: int)
>   2 4436 (type: int)
> l 
>   TableScan
> alias: l
> Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
> stats: NONE
> Filter Operator
>   predicate: (id = 4436) (type: boolean)
>   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
> Column stats: NONE
>   HashTable Sink Operator
> keys:
>   0 4436 (type: int)
>   1 4436 (type: int)
>   2 4436 (type: int)
> pi 
>   TableScan
> alias: pi
> Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
> stats: NONE
> Filter Operator
>   predicate: id is not null (type: boolean)
>   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
> Column stats: NONE
>   HashTable Sink Operator
> 

[jira] [Updated] (HIVE-10821) Beeline-CLI: Implement CLI source command using Beeline functionality

2015-05-28 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-10821:

Attachment: (was: HIVE-10821.1-beeline-cli.patch)

> Beeline-CLI: Implement CLI source command using Beeline functionality
> -
>
> Key: HIVE-10821
> URL: https://issues.apache.org/jira/browse/HIVE-10821
> Project: Hive
>  Issue Type: Sub-task
>  Components: CLI
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
> Attachments: HIVE-10821.1-beeline-cli.patch, 
> HIVE-10821.1-beeline-cli.patch, HIVE-10821.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10821) Beeline-CLI: Implement CLI source command using Beeline functionality

2015-05-28 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-10821:

Attachment: (was: HIVE-10821.1-beeline-cli.patch)

> Beeline-CLI: Implement CLI source command using Beeline functionality
> -
>
> Key: HIVE-10821
> URL: https://issues.apache.org/jira/browse/HIVE-10821
> Project: Hive
>  Issue Type: Sub-task
>  Components: CLI
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
> Attachments: HIVE-10821.1-beeline-cli.patch, 
> HIVE-10821.1-beeline-cli.patch, HIVE-10821.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10821) Beeline-CLI: Implement CLI source command using Beeline functionality

2015-05-28 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-10821:

Attachment: HIVE-10821.1-beeline-cli.patch

Thanks [~chinnalalam] for the review. Updated the patch to address your comments.

> Beeline-CLI: Implement CLI source command using Beeline functionality
> -
>
> Key: HIVE-10821
> URL: https://issues.apache.org/jira/browse/HIVE-10821
> Project: Hive
>  Issue Type: Sub-task
>  Components: CLI
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
> Attachments: HIVE-10821.1-beeline-cli.patch, 
> HIVE-10821.1-beeline-cli.patch, HIVE-10821.1-beeline-cli.patch, 
> HIVE-10821.1-beeline-cli.patch, HIVE-10821.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10821) Beeline-CLI: Implement CLI source command using Beeline functionality

2015-05-28 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-10821:

Attachment: HIVE-10821.1-beeline-cli.patch

Thanks [~chinnalalam] for the review. Updated the patch to address your comments.

> Beeline-CLI: Implement CLI source command using Beeline functionality
> -
>
> Key: HIVE-10821
> URL: https://issues.apache.org/jira/browse/HIVE-10821
> Project: Hive
>  Issue Type: Sub-task
>  Components: CLI
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
> Attachments: HIVE-10821.1-beeline-cli.patch, 
> HIVE-10821.1-beeline-cli.patch, HIVE-10821.1-beeline-cli.patch, 
> HIVE-10821.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7767) hive.optimize.union.remove does not work properly [Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-7767:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> hive.optimize.union.remove does not work properly [Spark Branch]
> 
>
> Key: HIVE-7767
> URL: https://issues.apache.org/jira/browse/HIVE-7767
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Na Yang
>Assignee: Na Yang
> Fix For: 1.1.0
>
> Attachments: HIVE-7767.1-spark.patch, HIVE-7767.2-spark.patch, 
> HIVE-7767.2-spark.patch, HIVE-7767.3-spark.patch
>
>
> Turning on the hive.optimize.union.remove property generates wrong UNION ALL 
> results. 
> For Example:
> {noformat}
> create table inputTbl1(key string, val string) stored as textfile;
> load data local inpath '../../data/files/T1.txt' into table inputTbl1;
> SELECT *
> FROM (
>   SELECT key, count(1) as values from inputTbl1 group by key
>   UNION ALL
>   SELECT key, count(1) as values from inputTbl1 group by key
> ) a;  
> {noformat}
> When hive.optimize.union.remove is turned on, the query result is: 
> {noformat}
> 1 1
> 2 1
> 3 1
> 7 1
> 8 2
> {noformat}
> When hive.optimize.union.remove is turned off, the query result is: 
> {noformat}
> 7 1
> 2 1
> 8 2
> 3 1
> 1 1
> 7 1
> 2 1
> 8 2
> 3 1
> 1 1
> {noformat}
> The expected query result is:
> {noformat}
> 7 1
> 2 1
> 8 2
> 3 1
> 1 1
> 7 1
> 2 1
> 8 2
> 3 1
> 1 1
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9267) Ensure custom UDF works with Spark [Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-9267:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> Ensure custom UDF works with Spark [Spark Branch]
> -
>
> Key: HIVE-9267
> URL: https://issues.apache.org/jira/browse/HIVE-9267
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Fix For: 1.1.0
>
> Attachments: HIVE-9267.1-spark.patch
>
>
> Create or add auto qtest if necessary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8352) Enable windowing.q for spark [Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-8352:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> Enable windowing.q for spark [Spark Branch]
> ---
>
> Key: HIVE-8352
> URL: https://issues.apache.org/jira/browse/HIVE-8352
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Brock Noland
>Assignee: Jimmy Xiang
>Priority: Minor
> Fix For: 1.1.0
>
> Attachments: HIVE-8352.1-spark.patch, HIVE-8352.1-spark.patch, 
> hive-8385.patch
>
>
> We should enable windowing.q for basic windowing coverage. After checking out 
> the spark branch, we would build:
> {noformat}
> $ mvn clean install -DskipTests -Phadoop-2
> $ cd itests/
> $ mvn clean install -DskipTests -Phadoop-2
> {noformat}
> Then generate the windowing.q.out file:
> {noformat}
> $ cd qtest-spark/
> $ mvn test -Dtest=TestSparkCliDriver -Dqfile=windowing.q -Phadoop-2 
> -Dtest.output.overwrite=true
> {noformat}
> Compare the output against MapReduce:
> {noformat}
> $ diff -y -W 150 
> ../../ql/src/test/results/clientpositive/spark/windowing.q.out 
> ../../ql/src/test/results/clientpositive/windowing.q.out| less
> {noformat}
> And if everything looks good, add it to {{spark.query.files}} in 
> {{./itests/src/test/resources/testconfiguration.properties}}, and then submit 
> the patch including the .q file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9040) Spark Memory can be formatted string [Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-9040:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> Spark Memory can be formatted string [Spark Branch]
> ---
>
> Key: HIVE-9040
> URL: https://issues.apache.org/jira/browse/HIVE-9040
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Brock Noland
>Assignee: Jimmy Xiang
> Fix For: 1.1.0
>
> Attachments: HIVE-9040.1-spark.patch, HIVE-9040.2-spark.patch, 
> HIVE-9040.3-spark.patch
>
>
> Here: 
> https://github.com/apache/hive/blob/spark/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionImpl.java#L72
> we call {{getInt}} on {{spark.executor.memory}}, but this is a formatted 
> string (see the example here: 
> http://spark.apache.org/docs/1.0.1/configuration.html); as such, I get:
> {noformat}
> 2014-12-08 03:04:48,114 WARN  [HiveServer2-Handler-Pool: Thread-34]: 
> spark.SetSparkReducerParallelism 
> (SetSparkReducerParallelism.java:process(141)) - Failed to create spark 
> client.
> java.lang.NumberFormatException: For input string: "23000m"
> at 
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
> at java.lang.Integer.parseInt(Integer.java:492)
> at java.lang.Integer.parseInt(Integer.java:527)
> at 
> scala.collection.immutable.StringLike$class.toInt(StringLike.scala:229)
> at scala.collection.immutable.StringOps.toInt(StringOps.scala:31)
> at 
> org.apache.spark.SparkConf$$anonfun$getInt$2.apply(SparkConf.scala:184)
> at 
> org.apache.spark.SparkConf$$anonfun$getInt$2.apply(SparkConf.scala:184)
> at scala.Option.map(Option.scala:145)
> at org.apache.spark.SparkConf.getInt(SparkConf.scala:184)
> at 
> org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.getMemoryAndCores(SparkSessionImpl.java:72)
> {noformat}
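
For illustration only, a minimal sketch of the kind of fix this calls for: 
parse the formatted memory string (e.g. "23000m" or "2g") instead of calling 
{{getInt}} on it. The helper name and suffix handling are assumptions, not 
Hive's or Spark's actual code:
{code}
// Hypothetical helper: convert a Spark-style memory string to megabytes.
static int memoryStringToMb(String s) {
  String v = s.trim().toLowerCase();
  if (v.endsWith("g")) {          // gigabytes, e.g. "2g"
    return Integer.parseInt(v.substring(0, v.length() - 1)) * 1024;
  } else if (v.endsWith("m")) {   // megabytes, e.g. "23000m"
    return Integer.parseInt(v.substring(0, v.length() - 1));
  }
  return Integer.parseInt(v);     // plain number, assumed to be MB
}
{code}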



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8074) Merge trunk into spark 9/12/2014

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-8074:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> Merge trunk into spark 9/12/2014
> 
>
> Key: HIVE-8074
> URL: https://issues.apache.org/jira/browse/HIVE-8074
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Affects Versions: spark-branch
>Reporter: Brock Noland
>Assignee: Brock Noland
> Fix For: 1.1.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7382) Create a MiniSparkCluster and set up a testing framework [Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-7382:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> Create a MiniSparkCluster and set up a testing framework [Spark Branch]
> ---
>
> Key: HIVE-7382
> URL: https://issues.apache.org/jira/browse/HIVE-7382
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Rui Li
>  Labels: Spark-M1
> Fix For: 1.1.0
>
>
> To automatically test Hive functionality over the Spark execution engine, we 
> need to create a test framework that can execute Hive queries with Spark as 
> the backend. For that, we should create a MiniSparkCluster, similar to those 
> for other execution engines.
> Spark has a way to create a local cluster with a few processes on the local 
> machine, each process being a worker node. It's fairly close to a real Spark 
> cluster. Our mini cluster can be based on that.
> For more info, please refer to the design doc on the wiki.
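
For reference, a minimal sketch of the local-cluster mode being described, 
using Spark's public local-cluster master URL (the app name is illustrative):
{code}
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

// local-cluster[numWorkers,coresPerWorker,memoryPerWorkerMB] spawns separate
// worker processes on the local machine, fairly close to a real cluster.
SparkConf conf = new SparkConf()
    .setMaster("local-cluster[2,2,1024]")
    .setAppName("hive-mini-spark-cluster-test");  // illustrative name
JavaSparkContext sc = new JavaSparkContext(conf);
{code}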



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8843) Release RDD cache when Hive query is done [Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-8843:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> Release RDD cache when Hive query is done [Spark Branch]
> 
>
> Key: HIVE-8843
> URL: https://issues.apache.org/jira/browse/HIVE-8843
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Jimmy Xiang
> Fix For: 1.1.0
>
> Attachments: HIVE-8843.1-spark.patch, HIVE-8843.2-spark.patch, 
> HIVE-8843.3-spark.patch, HIVE-8843.3-spark.patch
>
>
> In some multi-insert cases, RDD.cache() is called to improve performance. An 
> RDD is SparkContext specific, but the caching is useful only for the query. 
> Thus, once the query is executed, we need to release the cache by calling 
> RDD.uncache().
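
A minimal sketch of the intended lifecycle; note that in Spark's public Java 
API the release call is JavaRDD.unpersist(). The input path and filters are 
illustrative, and sc is assumed to be an existing JavaSparkContext:
{code}
JavaRDD<String> input = sc.textFile("hdfs:///tmp/t1");  // illustrative input
input.cache();                    // reused by both branches of the multi-insert
long branch1 = input.filter(r -> r.startsWith("a")).count();
long branch2 = input.filter(r -> r.startsWith("b")).count();
input.unpersist();                // release the cache once the query is done
{code}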



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9110) Performance of SELECT COUNT(*) FROM store_sales WHERE ss_item_sk IS NOT NULL [Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-9110:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> Performance of SELECT COUNT(*) FROM store_sales WHERE ss_item_sk IS NOT NULL 
> [Spark Branch]
> ---
>
> Key: HIVE-9110
> URL: https://issues.apache.org/jira/browse/HIVE-9110
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Affects Versions: spark-branch
>Reporter: Brock Noland
>Assignee: Rui Li
> Fix For: 1.1.0
>
>
> The query 
> {noformat}
> SELECT COUNT(*) FROM store_sales WHERE ss_item_sk IS NOT NULL
> {noformat}
> could benefit from performance enhancements.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8908) Investigate test failure on join34.q [Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-8908:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> Investigate test failure on join34.q [Spark Branch]
> ---
>
> Key: HIVE-8908
> URL: https://issues.apache.org/jira/browse/HIVE-8908
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Affects Versions: spark-branch
>Reporter: Chao Sun
>Assignee: Chao Sun
> Fix For: 1.1.0
>
> Attachments: HIVE-8908.1-spark.patch, HIVE-8908.2-spark.patch
>
>
> For this query, the plan doesn't look correct:
> {noformat}
> OK
> STAGE DEPENDENCIES:
>   Stage-4 is a root stage
>   Stage-1 depends on stages: Stage-5, Stage-4
>   Stage-2 depends on stages: Stage-1
>   Stage-0 depends on stages: Stage-2
>   Stage-3 depends on stages: Stage-0
>   Stage-5 is a root stage
> STAGE PLANS:
>   Stage: Stage-4
> Spark
>   DagName: chao_20141118150101_a47a2d7b-e750-4764-be66-5ba95ebbe433:6
>   Vertices:
> Map 4 
> Map Operator Tree:
> TableScan
>   alias: x
>   Statistics: Num rows: 1 Data size: 216 Basic stats: 
> COMPLETE Column stats: NONE
>   Filter Operator
> predicate: key is not null (type: boolean)
> Statistics: Num rows: 1 Data size: 216 Basic stats: 
> COMPLETE Column stats: NONE
> Spark HashTable Sink Operator
>   condition expressions:
> 0 {_col1}
> 1 {value}
>   keys:
> 0 _col0 (type: string)
> 1 key (type: string)
> Reduce Output Operator
>   key expressions: key (type: string)
>   sort order: +
>   Map-reduce partition columns: key (type: string)
>   Statistics: Num rows: 1 Data size: 216 Basic stats: 
> COMPLETE Column stats: NONE
>   value expressions: value (type: string)
> Local Work:
>   Map Reduce Local Work
>   Stage: Stage-1
> Spark
>   Edges:
> Union 2 <- Map 1 (NONE, 0), Map 3 (NONE, 0)
>   DagName: chao_20141118150101_a47a2d7b-e750-4764-be66-5ba95ebbe433:4
>   Vertices:
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: x
>   Filter Operator
> predicate: (key < 20) (type: boolean)
> Select Operator
>   expressions: key (type: string), value (type: string)
>   outputColumnNames: _col0, _col1
>   Map Join Operator
> condition map:
>  Inner Join 0 to 1
> condition expressions:
>   0 {_col1}
>   1 {key} {value}
> keys:
>   0 _col0 (type: string)
>   1 key (type: string)
> outputColumnNames: _col1, _col2, _col3
> input vertices:
>   1 Map 4
> Select Operator
>   expressions: _col2 (type: string), _col3 (type: 
> string), _col1 (type: string)
>   outputColumnNames: _col0, _col1, _col2
>   File Output Operator
> compressed: false
> table:
> input format: 
> org.apache.hadoop.mapred.TextInputFormat
> output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> serde: 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> name: default.dest_j1
> Local Work:
>   Map Reduce Local Work
> Map 3 
> Map Operator Tree:
> TableScan
>   alias: x1
>   Filter Operator
> predicate: (key > 100) (type: boolean)
> Select Operator
>   expressions: key (type: string), value (type: string)
>   outputColumnNames: _col0, _col1
>   Map Join Operator
> condition map:
>  Inner Join 0 to 1
> condition expressions:
>   0 {_col1}
>   1 {key} {value}
> keys:
>   0 _col0 (type: string)
>

[jira] [Updated] (HIVE-8141) Refactor the GraphTran code by moving union handling logic to UnionTran [Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-8141:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> Refactor the GraphTran code by moving union handling logic to UnionTran 
> [Spark Branch]
> --
>
> Key: HIVE-8141
> URL: https://issues.apache.org/jira/browse/HIVE-8141
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Na Yang
>Assignee: Na Yang
>  Labels: Spark-M1
> Fix For: 1.1.0
>
> Attachments: HIVE-8141.1-spark.patch
>
>
> In the current Hive on Spark code, union logic is handled in the GraphTran 
> class. This union logic could be moved to the UnionTran class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8913) Make SparkMapJoinResolver handle runtime skew join [Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-8913:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> Make SparkMapJoinResolver handle runtime skew join [Spark Branch]
> -
>
> Key: HIVE-8913
> URL: https://issues.apache.org/jira/browse/HIVE-8913
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
> Fix For: 1.1.0
>
> Attachments: HIVE-8913.1-spark.patch, HIVE-8913.2-spark.patch, 
> HIVE-8913.3-spark.patch
>
>
> Sub-task of HIVE-8406.
> Now we have {{SparkMapJoinResolver}} in place. But at the moment, it doesn't 
> handle the map join tasks created by the upstream SkewJoinResolver, i.e. those 
> wrapped in a ConditionalTask. We have to implement this part for runtime skew 
> join to work on Spark. To do so, we can borrow logic from {{MapJoinResolver}}.
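
A hedged sketch of the missing handling (the resolver method is a hypothetical 
name; ConditionalTask.getListTasks() is the accessor for the wrapped tasks):
{code}
// Sketch: recurse into a ConditionalTask so that map join tasks wrapped by
// the skew join resolver are also processed; task is assumed in scope.
if (task instanceof ConditionalTask) {
  for (Task<? extends Serializable> wrapped :
      ((ConditionalTask) task).getListTasks()) {
    resolveMapJoinTask(wrapped);  // hypothetical recursive handler
  }
}
{code}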



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8536) Enable SkewJoinResolver for spark [Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-8536:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> Enable SkewJoinResolver for spark [Spark Branch]
> 
>
> Key: HIVE-8536
> URL: https://issues.apache.org/jira/browse/HIVE-8536
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
> Fix For: 1.1.0
>
> Attachments: HIVE-8536.1-spark.patch, HIVE-8536.2-spark.patch, 
> HIVE-8536.3-spark.patch, HIVE-8536.4-spark.patch
>
>
> Sub-task of HIVE-8406



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7748) Add qfile_regex to qtest-spark pom [Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-7748:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> Add qfile_regex to qtest-spark pom [Spark Branch]
> -
>
> Key: HIVE-7748
> URL: https://issues.apache.org/jira/browse/HIVE-7748
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Brock Noland
>Assignee: Brock Noland
> Fix For: 1.1.0
>
> Attachments: HIVE-7748.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7755) Enable avro* tests [Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-7755:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> Enable avro* tests [Spark Branch]
> -
>
> Key: HIVE-7755
> URL: https://issues.apache.org/jira/browse/HIVE-7755
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Brock Noland
>Assignee: Brock Noland
> Fix For: 1.1.0
>
> Attachments: HIVE-7755.1-spark.patch, HIVE-7755.2-spark.patch, 
> HIVE-7755.3-spark.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9081) Bucket mapjoin should use the new alias in posToAliasMap [Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-9081:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> Bucket mapjoin should use the new alias in posToAliasMap [Spark Branch]
> ---
>
> Key: HIVE-9081
> URL: https://issues.apache.org/jira/browse/HIVE-9081
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
> Fix For: 1.1.0
>
> Attachments: HIVE-9081.1-spark.patch, HIVE-9081.2-spark.patch
>
>
> In converting a mapjoin to a bucket mapjoin, the join aliases could be 
> updated. So we should update the posToAliasMap accordingly.
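
A hedged illustration of the fix (the map shape and alias names here are 
hypothetical): every position that referenced the old alias must be repointed 
at the new one:
{code}
// Hypothetical sketch of keeping posToAliasMap consistent after a rename.
Map<Integer, String> posToAliasMap = new HashMap<>();
posToAliasMap.put(0, "a");                    // alias before conversion
String oldAlias = "a", newAlias = "subq:a";   // assumed rename
for (Map.Entry<Integer, String> e : posToAliasMap.entrySet()) {
  if (oldAlias.equals(e.getValue())) {
    e.setValue(newAlias);                     // repoint position at new alias
  }
}
{code}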



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7708) Fix qtest-spark pom.xml reference to test properties [Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-7708:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> Fix qtest-spark pom.xml reference to test properties [Spark Branch]
> ---
>
> Key: HIVE-7708
> URL: https://issues.apache.org/jira/browse/HIVE-7708
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Brock Noland
>Assignee: Brock Noland
> Fix For: 1.1.0
>
> Attachments: HIVE-7708.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8029) Remove reducers number configure in SparkTask [Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-8029:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> Remove reducers number configure in SparkTask [Spark Branch]
> 
>
> Key: HIVE-8029
> URL: https://issues.apache.org/jira/browse/HIVE-8029
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Reporter: Chengxiang Li
>Assignee: Chengxiang Li
>  Labels: Spark-M4
> Fix For: 1.1.0
>
> Attachments: HIVE-8029.1-spark.patch
>
>
> We do not need duplicated logic to configure the number of reducers in 
> SparkTask, as SetSparkReducerParallelism always sets the number of reducers in 
> the compile phase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8743) Disable MapJoin [Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-8743:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> Disable MapJoin [Spark Branch]
> --
>
> Key: HIVE-8743
> URL: https://issues.apache.org/jira/browse/HIVE-8743
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
>Priority: Minor
> Fix For: 1.1.0
>
> Attachments: HIVE-8743.1-spark.patch
>
>
> Disable MapJoin in Spark branch for now. It is not implemented.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9054) Add additional logging to SetSparkReducerParallelism [Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-9054:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> Add additional logging to SetSparkReducerParallelism [Spark Branch]
> ---
>
> Key: HIVE-9054
> URL: https://issues.apache.org/jira/browse/HIVE-9054
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Affects Versions: spark-branch
>Reporter: Brock Noland
>Assignee: Brock Noland
> Fix For: 1.1.0
>
> Attachments: HIVE-9054.1-spark.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8883) Investigate test failures on auto_join30.q [Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-8883:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> Investigate test failures on auto_join30.q [Spark Branch]
> -
>
> Key: HIVE-8883
> URL: https://issues.apache.org/jira/browse/HIVE-8883
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Affects Versions: spark-branch
>Reporter: Chao Sun
>Assignee: Chao Sun
> Fix For: 1.1.0
>
> Attachments: HIVE-8883.1-spark.patch, HIVE-8883.2-spark.patch, 
> HIVE-8883.3-spark.patch, HIVE-8883.4-spark.patch
>
>
> This test fails with the following stack trace:
> {noformat}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:257)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
>   at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:319)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:276)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:48)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:96)
>   at 
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
>   at 
> org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:214)
>   at 
> org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>   at org.apache.spark.scheduler.Task.run(Task.scala:56)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:186)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> 2014-11-14 17:05:09,206 ERROR [Executor task launch worker-4]: 
> spark.SparkReduceRecordHandler 
> (SparkReduceRecordHandler.java:processRow(285)) - 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row (tag=0) 
> {"key":{"reducesinkkey0":"val_0"},"value":{"_col0":"0"}}
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:328)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:276)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:48)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:96)
>   at 
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
>   at 
> org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:214)
>   at 
> org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>   at org.apache.spark.scheduler.Task.run(Task.scala:56)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:186)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unexpected 
> exception: null
>   at 
> org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:318)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
>   at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:319)
>   ... 14 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOpe

[jira] [Updated] (HIVE-9378) Spark qfile tests should reuse RSC [Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-9378:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> Spark qfile tests should reuse RSC [Spark Branch]
> -
>
> Key: HIVE-9378
> URL: https://issues.apache.org/jira/browse/HIVE-9378
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Affects Versions: spark-branch
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
> Fix For: 1.1.0
>
> Attachments: HIVE-9378.1-spark.patch, HIVE-9378.2-spark.patch, 
> HIVE-9378.3-spark.patch, HIVE-9378.4-spark.patch
>
>
> Run several qfile tests and use jps to monitor the Java processes. You will 
> find that several SparkSubmitDriverBootstrapper processes are created (not at 
> the same time, of course). It seems that we create an RSC for each qfile and 
> terminate it when that qfile's test is done; the RSC is not shared among 
> qfiles.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8899) Merge from trunk to spark [Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-8899:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> Merge from trunk to spark [Spark Branch]
> 
>
> Key: HIVE-8899
> URL: https://issues.apache.org/jira/browse/HIVE-8899
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Affects Versions: spark-branch
>Reporter: Brock Noland
>Assignee: Brock Noland
> Fix For: 1.1.0
>
> Attachments: HIVE-8899.2-spark.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9192) One-pass SMB Optimizations [Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-9192:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> One-pass SMB Optimizations [Spark Branch]
> -
>
> Key: HIVE-9192
> URL: https://issues.apache.org/jira/browse/HIVE-9192
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: spark-branch
>Reporter: Szehon Ho
>Assignee: Szehon Ho
>Priority: Minor
> Fix For: 1.1.0
>
> Attachments: HIVE-9192-spark.patch
>
>
> Currently the Spark compiler's task generation makes a second pass to handle 
> SMB joins. This might be optimized to a single pass.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7745) NullPointerException when turn on hive.optimize.union.remove, hive.merge.mapfiles and hive.merge.mapredfiles [Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-7745:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> NullPointerException when turn on hive.optimize.union.remove, 
> hive.merge.mapfiles and hive.merge.mapredfiles [Spark Branch]
> ---
>
> Key: HIVE-7745
> URL: https://issues.apache.org/jira/browse/HIVE-7745
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: spark-branch
>Reporter: Na Yang
>Assignee: Na Yang
> Fix For: 1.1.0
>
> Attachments: HIVE-7745-spark.patch
>
>
> When hive.optimize.union.remove, hive.merge.mapfiles and 
> hive.merge.mapredfiles are turned on, Hive throws a NullPointerException when 
> I run the following queries: 
> {noformat}
> create table inputTbl1(key string, val string) stored as textfile;
> create table outputTbl1(key string, values bigint) stored as rcfile;
> load data local inpath '../../data/files/T1.txt' into table inputTbl1;
> explain
> insert overwrite table outputTbl1
> SELECT * FROM
> (
> select key, count(1) as values from inputTbl1 group by key 
> union all
> select * FROM (
>   SELECT key, 1 as values from inputTbl1 
>   UNION ALL
>   SELECT key, 2 as values from inputTbl1
> ) a
> )b;
> {noformat}
> If the hive.merge.mapfiles and hive.merge.mapredfiles are turned off, I do 
> not see any error. 
> Here is the stack trace:
> {noformat}
> 2014-08-16 01:32:26,849 ERROR [main]: ql.Driver 
> (SessionState.java:printError(681)) - FAILED: NullPointerException null
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.optimizer.GenMapRedUtils.createMoveTask(GenMapRedUtils.java:1738)
> at 
> org.apache.hadoop.hive.ql.parse.spark.GenSparkUtils.processFileSink(GenSparkUtils.java:281)
> at 
> org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.generateTaskTree(SparkCompiler.java:187)
> at 
> org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:199)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9508)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:208)
> at 
> org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:208)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:414)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:310)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1005)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1070)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:942)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:932)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:246)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:198)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:408)
> at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:781)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8783) Create some tests that use Spark counter for stats collection [Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-8783:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> Create some tests that use Spark counter for stats collection [Spark Branch]
> 
>
> Key: HIVE-8783
> URL: https://issues.apache.org/jira/browse/HIVE-8783
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Chengxiang Li
> Fix For: 1.1.0
>
> Attachments: HIVE-8783.1-spark.patch, HIVE-8783.2-spark.patch, 
> HIVE-8783.2-spark.patch
>
>
> Currently when .q tests are run with Spark, the default stats collection is 
> "fs". We need some tests that use Spark counters for stats collection to 
> enhance coverage.
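
For illustration, the switch such tests would exercise; hive.stats.dbclass is 
the HiveConf property that selects the stats publisher, and "counter" is the 
counter-based value:
{code}
import org.apache.hadoop.hive.conf.HiveConf;

// Use counter-based stats collection instead of the default "fs".
HiveConf conf = new HiveConf();
conf.set("hive.stats.dbclass", "counter");
{code}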



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9517) UNION ALL query failed with ArrayIndexOutOfBoundsException [Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-9517:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> UNION ALL query failed with ArrayIndexOutOfBoundsException [Spark Branch]
> -
>
> Key: HIVE-9517
> URL: https://issues.apache.org/jira/browse/HIVE-9517
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: spark-branch
>Reporter: Chao Sun
>Assignee: Chao Sun
> Fix For: 1.1.0
>
> Attachments: HIVE-9517.1.patch, HIVE-9517.2.patch
>
>
> I was running a query from cbo_gby_empty.q:
> {code}
> select unionsrc.key, unionsrc.value FROM (select 'max' as key, max(c_int) as 
> value from cbo_t3 s1
>   UNION  ALL
>   select 'min' as key,  min(c_int) as value from cbo_t3 s2
> UNION ALL
> select 'avg' as key,  avg(c_int) as value from cbo_t3 s3) unionsrc 
> order by unionsrc.key;
> {code}
> and got the following exception:
> {noformat}
> 2015-01-29 15:57:55,948 ERROR [Executor task launch worker-1]: 
> spark.SparkReduceRecordHandler 
> (SparkReduceRecordHandler.java:processRow(299)) - Fatal error: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Error while processing row 
> (tag=0) {"key":{"reducesinkkey0":"max"},"value":{"_col0":1.5}}
> org.apache.hadoop.hive.ql.metadata.HiveException: Error while processing row 
> (tag=0) {"key":{"reducesinkkey0":"max"},"value":{"_col0":1.5}}
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:339)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:289)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:49)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:98)
>   at 
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>   at 
> org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115)
>   at 
> org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115)
>   at org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1390)
>   at org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1390)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>   at org.apache.spark.scheduler.Task.run(Task.scala:56)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating 
> VALUE._col0
>   at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:82)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:330)
>   ... 17 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 3
>   at 
> org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.byteArrayToLong(LazyBinaryUtils.java:84)
>   at 
> org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryDouble.init(LazyBinaryDouble.java:43)
>   at 
> org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.uncheckedGetField(LazyBinaryStruct.java:264)
>   at 
> org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:201)
>   at 
> org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:64)
>   at 
> org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator._evaluate(ExprNodeColumnEvaluator.java:98)
>   at 
> org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
>   at 
> org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
>   at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:77)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8274) Refactoring SparkPlan and SparkPlanGeneration [Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-8274:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> Refactoring SparkPlan and SparkPlanGeneration [Spark Branch]
> 
>
> Key: HIVE-8274
> URL: https://issues.apache.org/jira/browse/HIVE-8274
> Project: Hive
>  Issue Type: Task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
>  Labels: Spark-M1
> Fix For: 1.1.0
>
>
> As part of HIVE-8118, SparkWork will be modified with cloned Map/ReduceWorks, 
> and input RDDs and some intermediate RDDs may need to be cached for 
> performance. To accommodate this, the SparkPlan model and SparkPlan generation 
> need to be refactored.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8422) Turn on all join .q tests [Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-8422:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> Turn on all join .q tests [Spark Branch]
> 
>
> Key: HIVE-8422
> URL: https://issues.apache.org/jira/browse/HIVE-8422
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Chao Sun
> Fix For: 1.1.0
>
> Attachments: HIVE-8422.1-spark.patch, HIVE-8422.2-spark.patch
>
>
> With HIVE-8412, all join queries should work on Spark, whether they require a 
> particular optimization or not.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8160) Upgrade Spark dependency to 1.2.0-SNAPSHOT [Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-8160:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> Upgrade Spark dependency to 1.2.0-SNAPSHOT [Spark Branch]
> -
>
> Key: HIVE-8160
> URL: https://issues.apache.org/jira/browse/HIVE-8160
> Project: Hive
>  Issue Type: Task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
>Priority: Minor
>  Labels: Spark-M1
> Fix For: 1.1.0
>
> Attachments: HIVE-8160.1-spark.patch
>
>
> Hive on Spark needs SPARK-2978, which is now available in the latest Spark 
> main branch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8436) Modify SparkWork to split works with multiple child works [Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-8436:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> Modify SparkWork to split works with multiple child works [Spark Branch]
> 
>
> Key: HIVE-8436
> URL: https://issues.apache.org/jira/browse/HIVE-8436
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Chao Sun
> Fix For: 1.1.0
>
> Attachments: HIVE-8436.1-spark.patch, HIVE-8436.10-spark.patch, 
> HIVE-8436.11-spark.patch, HIVE-8436.2-spark.patch, HIVE-8436.3-spark.patch, 
> HIVE-8436.4-spark.patch, HIVE-8436.5-spark.patch, HIVE-8436.6-spark.patch, 
> HIVE-8436.7-spark.patch, HIVE-8436.8-spark.patch, HIVE-8436.9-spark.patch
>
>
> Based on the design doc, we need to split the operator tree of a work in 
> SparkWork if the work is connected to multiple child works. The operator tree 
> is split by cloning the original work and removing unwanted branches; please 
> refer to the design doc for details.
> This process should be done right before we generate the SparkPlan. We should 
> have a utility method that takes the original SparkWork and returns a modified 
> SparkWork.
> This process should also keep information about the original work and its 
> clones. Such information will be needed during SparkPlan generation 
> (HIVE-8437).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8405) Research Bucket Map Join [Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-8405:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> Research Bucket Map Join [Spark Branch]
> ---
>
> Key: HIVE-8405
> URL: https://issues.apache.org/jira/browse/HIVE-8405
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Na Yang
>Assignee: Na Yang
> Fix For: 1.1.0
>
> Attachments: hive-on-spark-bucketmapjoin.pdf
>
>
> Research how to implement Bucket Map Join for Hive on Spark.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7793) Enable tests on Spark branch (3) [Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-7793:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> Enable tests on Spark branch (3) [Spark Branch]
> 
>
> Key: HIVE-7793
> URL: https://issues.apache.org/jira/browse/HIVE-7793
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Brock Noland
>Assignee: Chengxiang Li
> Fix For: 1.1.0
>
> Attachments: HIVE-7793.1-spark.patch
>
>
> This jira is to enable *most* of the tests below. If tests don't pass because 
> of some unsupported feature, ensure that a JIRA exists and move on.
> {noformat}
>  ptf.q,\
>   sample1.q,\
>   script_env_var1.q,\
>   script_env_var2.q,\
>   script_pipe.q,\
>   scriptfile1.q,\
>   stats_counter.q,\
>   stats_counter_partitioned.q,\
>   stats_noscan_1.q,\
>   subquery_exists.q,\
>   subquery_in.q,\
>   temp_table.q,\
>   transform1.q,\
>   transform2.q,\
>   transform_ppr1.q,\
>   transform_ppr2.q,\
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7411) Exclude hadoop 1 from spark dep [Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-7411:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> Exclude hadoop 1 from spark dep [Spark Branch]
> --
>
> Key: HIVE-7411
> URL: https://issues.apache.org/jira/browse/HIVE-7411
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Brock Noland
>Assignee: Brock Noland
> Fix For: 1.1.0
>
> Attachments: HIVE-7411.patch
>
>
> The branch does not compile on my machine. The attached patch fixes this.
> NO PRECOMMIT TESTS (I am working on this)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8540) HivePairFlatMapFunction.java missing license header [Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-8540:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> HivePairFlatMapFunction.java missing license header [Spark Branch]
> --
>
> Key: HIVE-8540
> URL: https://issues.apache.org/jira/browse/HIVE-8540
> Project: Hive
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: spark-branch
>Reporter: Xuefu Zhang
>Assignee: Chao Sun
> Fix For: 1.1.0
>
> Attachments: HIVE-8540.1-spark.patch
>
>
> Also, please remove unneeded imports in SparkUtilities.java.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9207) Add more log information for debug RSC[Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-9207:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> Add more log information for debug RSC[Spark Branch]
> 
>
> Key: HIVE-9207
> URL: https://issues.apache.org/jira/browse/HIVE-9207
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Chengxiang Li
>Assignee: Chengxiang Li
>Priority: Minor
> Fix For: 1.1.0
>
> Attachments: HIVE-9207.1-spark.patch
>
>
> Currently, error messages in certain scenarios are lost in RSC, and we need 
> more log info at DEBUG level for debugging.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8982) IndexOutOfBounds exception in mapjoin [Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-8982:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> IndexOutOfBounds exception in mapjoin [Spark Branch]
> 
>
> Key: HIVE-8982
> URL: https://issues.apache.org/jira/browse/HIVE-8982
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Affects Versions: spark-branch
>Reporter: Szehon Ho
>Assignee: Chao Sun
> Fix For: 1.1.0
>
> Attachments: HIVE-8982.1-spark.patch, HIVE-8982.2-spark.patch
>
>
> There are sometimes random failures in spark mapjoin during unit tests like:
> {noformat}
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
>   at 
> org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.closeOp(SparkHashTableSinkOperator.java:83)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.close(SparkMapRecordHandler.java:185)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.closeRecordProcessor(HiveMapFunctionResultList.java:57)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:108)
>   at 
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>   at 
> org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115)
>   at 
> org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115)
>   at 
> org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1365)
>   at 
> org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1365)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>   at org.apache.spark.scheduler.Task.run(Task.scala:56)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:744)
> Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
>   at java.util.ArrayList.rangeCheck(ArrayList.java:635)
>   at java.util.ArrayList.get(ArrayList.java:411)
>   at 
> org.apache.hadoop.hive.ql.exec.persistence.MapJoinEagerRowContainer.first(MapJoinEagerRowContainer.java:70)
>   at 
> org.apache.hadoop.hive.ql.exec.persistence.MapJoinEagerRowContainer.write(MapJoinEagerRowContainer.java:150)
>   at 
> org.apache.hadoop.hive.ql.exec.persistence.MapJoinTableContainerSerDe.persist(MapJoinTableContainerSerDe.java:167)
>   at 
> org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.flushToFile(SparkHashTableSinkOperator.java:128)
>   at 
> org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.closeOp(SparkHashTableSinkOperator.java:77)
>   ... 20 more
> {noformat}

[jira] [Updated] (HIVE-7880) Support subquery [Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-7880:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> Support subquery [Spark Branch]
> ---
>
> Key: HIVE-7880
> URL: https://issues.apache.org/jira/browse/HIVE-7880
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Chengxiang Li
>Assignee: Xuefu Zhang
>  Labels: Spark-M2
> Fix For: 1.1.0
>
> Attachments: HIVE-7880.1-spark.patch
>
>
> While trying to enable the SubQuery qtests, I found that SubQuery cases 
> currently return null values; we should enable subqueries for Hive on Spark. 
> We should enable subquery_exists.q and subquery_in.q in this task, as Tez 
> does.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9568) Revert changes in two test configuration files accidently brought in by HIVE-9552 [Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-9568:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> Revert changes in two test configuration files accidently brought in by 
> HIVE-9552 [Spark Branch]
> 
>
> Key: HIVE-9568
> URL: https://issues.apache.org/jira/browse/HIVE-9568
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Fix For: 1.1.0
>
> Attachments: HIVE-9568.1-spark.patch
>
>
> The changes in the following files, while harmless for tests, need to be 
> reverted because they are unnecessary.
> {code}
> data/conf/spark/standalone/hive-site.xml
> data/conf/spark/yarn-client/hive-site.xml
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7821) StarterProject: enable groupby4.q [Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-7821:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> StarterProject: enable groupby4.q [Spark Branch]
> 
>
> Key: HIVE-7821
> URL: https://issues.apache.org/jira/browse/HIVE-7821
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Brock Noland
>Assignee: Suhas Satish
> Fix For: 1.1.0
>
> Attachments: HIVE-7821-spark.patch, HIVE-7821.3-spark.patch, 
> HIVE-7821.4-spark.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9627) Add cbo_gby_empty.q.out for Spark [Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-9627:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> Add cbo_gby_empty.q.out for Spark [Spark Branch]
> 
>
> Key: HIVE-9627
> URL: https://issues.apache.org/jira/browse/HIVE-9627
> Project: Hive
>  Issue Type: Test
>Affects Versions: spark-branch
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
>Priority: Trivial
> Fix For: 1.1.0
>
> Attachments: HIVE-9627.1-spark.patch
>
>
> The golden file cbo_gby_empty.q.out for Spark is missing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8686) Enable vectorization tests with query results sort [Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-8686:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> Enable vectorization tests with query results sort [Spark Branch]
> -
>
> Key: HIVE-8686
> URL: https://issues.apache.org/jira/browse/HIVE-8686
> Project: Hive
>  Issue Type: Test
>  Components: Spark
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
>Priority: Trivial
> Fix For: 1.1.0
>
> Attachments: HIVE-8686.1-spark.patch, HIVE-8686.2-spark.patch
>
>
> HIVE-8573 added query result sorting to some vectorization tests. Now that 
> the patch is merged to the Spark branch, we can enable these tests in the 
> Spark branch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7591) GenMapRedUtils::addStatsTask only assumes either MapredWork or TezWork

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-7591:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> GenMapRedUtils::addStatsTask only assumes either MapredWork or TezWork
> --
>
> Key: HIVE-7591
> URL: https://issues.apache.org/jira/browse/HIVE-7591
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Chao Sun
>Assignee: Brock Noland
> Fix For: 1.1.0
>
> Attachments: HIVE-7591-spark.patch
>
>
> When running queries, I got exception like this:
> {noformat}
> FAILED: ClassCastException org.apache.hadoop.hive.ql.plan.SparkWork cannot be 
> cast to org.apache.hadoop.hive.ql.plan.TezWork
> 14/07/31 15:08:53 ERROR ql.Driver: FAILED: ClassCastException 
> org.apache.hadoop.hive.ql.plan.SparkWork cannot be cast to 
> org.apache.hadoop.hive.ql.plan.TezWork
> java.lang.ClassCastException: org.apache.hadoop.hive.ql.plan.SparkWork cannot 
> be cast to org.apache.hadoop.hive.ql.plan.TezWork
>   at 
> org.apache.hadoop.hive.ql.optimizer.GenMapRedUtils.addStatsTask(GenMapRedUtils.java:1419)
>   at 
> org.apache.hadoop.hive.ql.optimizer.GenMapRedUtils.isMergeRequired(GenMapRedUtils.java:1645)
>   at 
> org.apache.hadoop.hive.ql.parse.spark.GenSparkUtils.processFileSink(GenSparkUtils.java:313)
>   at 
> org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.generateTaskTree(SparkCompiler.java:180)
>   at 
> org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:199)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9514)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:207)
>   at 
> org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:207)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:413)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:309)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:984)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1049)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:921)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:265)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:217)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:427)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:800)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:694)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:633)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> {noformat}
> Apparently, GenMapRedUtils::addStatsTask assumes the work is either 
> MapredWork or TezWork, and since we are introducing SparkWork, this needs to 
> be fixed.
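> A minimal sketch of the shape of the fix, with simplified stand-ins rather 
> than the actual GenMapRedUtils code: dispatch on the concrete work type 
> instead of blindly casting to TezWork.
> {code}
> // Simplified stand-ins for the real MapredWork/TezWork/SparkWork classes.
> interface BaseWork {}
> class MapredWork implements BaseWork {}
> class TezWork implements BaseWork {}
> class SparkWork implements BaseWork {}
>
> public class AddStatsTaskSketch {
>   static void addStatsTask(BaseWork work) {
>     if (work instanceof MapredWork) {
>       // hook the stats task into the MR plan
>     } else if (work instanceof TezWork) {
>       // hook the stats task into the Tez DAG
>     } else if (work instanceof SparkWork) {
>       // the missing branch: handle SparkWork instead of casting to TezWork
>     } else {
>       throw new IllegalStateException("Unexpected work type: " + work.getClass());
>     }
>   }
> }
> {code}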



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8507) UT: fix rcfile_bigdata test [Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-8507:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> UT: fix rcfile_bigdata test [Spark Branch]
> --
>
> Key: HIVE-8507
> URL: https://issues.apache.org/jira/browse/HIVE-8507
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Thomas Friedrich
>Assignee: Chinna Rao Lalam
>Priority: Minor
> Fix For: 1.1.0
>
> Attachments: HIVE-8507.1-spark.patch, HIVE-8507.2-spark.patch
>
>
> The tests
> groupby_bigdata
> rcfile_bigdata 
> fail because they can't find the dumpdata_script.py file referenced in 
> rcfile_bigdata.q:
> /usr/bin/python: can't open file 'dumpdata_script.py': [Errno 2] No such file 
> or directory
> There are two references:
> add file ../../dumpdata_script.py;
> FROM (FROM src MAP src.key,src.value USING 'python dumpdata_script.py'
> Since it uses a relative path, this seems to be related to the Spark tests 
> living one level deeper than the regular tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7781) Enable windowing and analytic function qtests [Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-7781:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> Enable windowing and analytic function qtests [Spark Branch]
> 
>
> Key: HIVE-7781
> URL: https://issues.apache.org/jira/browse/HIVE-7781
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Chengxiang Li
>Assignee: Chengxiang Li
> Fix For: 1.1.0
>
> Attachments: HIVE-7781.1-spark.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9216) Avoid redundant clone of JobConf [Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-9216:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> Avoid redundant clone of JobConf [Spark Branch]
> ---
>
> Key: HIVE-9216
> URL: https://issues.apache.org/jira/browse/HIVE-9216
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
>Priority: Minor
> Fix For: 1.1.0
>
> Attachments: HIVE-9216.1-spark.patch
>
>
> Currently in SparkPlanGenerator, we clone the job conf twice for each 
> MapWork. We should avoid this, as cloning the job conf involves writing to 
> HDFS.
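> A minimal sketch of one way to avoid the second clone, memoizing the cloned 
> conf per MapWork; the types here are hypothetical stand-ins, not the actual 
> SparkPlanGenerator code:
> {code}
> import java.util.IdentityHashMap;
> import java.util.Map;
>
> public class JobConfCacheSketch {
>   // Hypothetical stand-ins; in Hive these are MapWork and JobConf.
>   static class MapWork {}
>   static class Conf {
>     Conf(Conf base) { /* expensive copy; in Hive it also writes the plan out */ }
>   }
>
>   private final Map<MapWork, Conf> cache = new IdentityHashMap<>();
>
>   // Clone the job conf at most once per MapWork instead of twice.
>   Conf confFor(MapWork work, Conf base) {
>     Conf c = cache.get(work);
>     if (c == null) {
>       c = new Conf(base);
>       cache.put(work, c);
>     }
>     return c;
>   }
> }
> {code}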



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7338) Create SparkPlanGenerator [Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-7338:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> Create SparkPlanGenerator [Spark Branch]
> 
>
> Key: HIVE-7338
> URL: https://issues.apache.org/jira/browse/HIVE-7338
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
>  Labels: Spark-M1
> Fix For: 1.1.0
>
> Attachments: HIVE-7338.patch
>
>
> Translate SparkWork into SparkPlan. The translation may be invoked by 
> SparkClient when executing SparkTask.
> NO PRECOMMIT TESTS. This is for spark branch only.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9493) Failed job may not throw exceptions [Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-9493:
--
Fix Version/s: (was: spark-branch)

> Failed job may not throw exceptions [Spark Branch]
> --
>
> Key: HIVE-9493
> URL: https://issues.apache.org/jira/browse/HIVE-9493
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
> Fix For: 1.1.0
>
> Attachments: HIVE-9493.1-spark.patch
>
>
> Currently the remote driver assumes an exception will be thrown when a job 
> fails to run. This may not hold since the job is submitted asynchronously, so 
> we have to check the futures before we decide the job is successful.
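> A minimal sketch of the check, using a plain {{java.util.concurrent.Future}} 
> in place of the driver's actual job handle:
> {code}
> import java.util.concurrent.ExecutionException;
> import java.util.concurrent.Future;
>
> public class JobResultCheckSketch {
>   // With asynchronous submission, a failed job may never surface as a thrown
>   // exception on the submitting thread; the future has to be consulted.
>   static boolean jobSucceeded(Future<?> jobFuture) {
>     try {
>       jobFuture.get();  // blocks; rethrows the job's failure if there was one
>       return true;
>     } catch (ExecutionException e) {
>       // the job failed remotely; e.getCause() carries the real error
>       return false;
>     } catch (InterruptedException e) {
>       Thread.currentThread().interrupt();
>       return false;
>     }
>   }
> }
> {code}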



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9101) bucket_map_join_spark4.q failed due to NPE.[Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-9101:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> bucket_map_join_spark4.q failed due to NPE.[Spark Branch]
> -
>
> Key: HIVE-9101
> URL: https://issues.apache.org/jira/browse/HIVE-9101
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Chengxiang Li
>Assignee: Jimmy Xiang
>  Labels: Spark-M4
> Fix For: 1.1.0
>
> Attachments: HIVE-9101.1-spark.patch
>
>
> bucket_map_join_spark4.q failed due to the following exception after 
> HIVE-9078:
> {noformat}
> 2014-12-15 04:48:56,241 ERROR [Executor task launch worker-0]: 
> executor.Executor (Logging.scala:logError(96)) - Exception in task 0.3 in 
> stage 7.0 (TID 15)
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:160)
> at 
> org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:47)
> at 
> org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:28)
> at 
> org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:96)
> at 
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
> at 
> org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115)
> at 
> org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115)
> at 
> org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1390)
> at 
> org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1390)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
> at org.apache.spark.scheduler.Task.run(Task.scala:56)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:722)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:114)
> at 
> org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:193)
> at 
> org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:219)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1051)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
> at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:486)
> at 
> org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:149)
> ... 16 more
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:104)
> ... 25 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7527) Support order by and sort by on Spark [Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-7527:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> Support order by and sort by on Spark [Spark Branch]
> 
>
> Key: HIVE-7527
> URL: https://issues.apache.org/jira/browse/HIVE-7527
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Rui Li
> Fix For: 1.1.0
>
> Attachments: HIVE-7527-spark.patch, HIVE-7527.2-spark.patch
>
>
> Currently Hive depends completely on MapReduce's sorting as part of shuffling 
> to achieve order by (global sort, one reducer) and sort by (local sort).
> Spark has a sortBy transformation in different variations that can be used to 
> support Hive's order by and sort by. However, we still need to evaluate 
> whether Spark's sortBy can achieve the same functionality inherited from 
> MapReduce's shuffle sort.
> Currently Hive on Spark should be able to run a simple sort by or order by, 
> by changing the current partitionBy to sortByKey. This is a way to verify the 
> theory. A complete solution will not be available until we have a complete 
> SparkPlanGenerator.
> There is also a question of how we determine that there is an order by or 
> sort by just by looking at the operator tree, from which the Spark task is 
> created. This is the responsibility of SparkPlanGenerator, but we need to 
> have an idea.
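> A minimal sketch of the two shuffle variants against the plain Spark Java 
> API, on toy data rather than Hive's actual shuffle plumbing; 
> {{repartitionAndSortWithinPartitions}} assumes Spark 1.2+:
> {code}
> import java.util.Arrays;
> import org.apache.spark.HashPartitioner;
> import org.apache.spark.api.java.JavaPairRDD;
> import org.apache.spark.api.java.JavaSparkContext;
> import scala.Tuple2;
>
> public class SortSketch {
>   public static void main(String[] args) {
>     JavaSparkContext sc = new JavaSparkContext("local", "sort-sketch");
>     JavaPairRDD<String, Integer> rows = sc.parallelizePairs(Arrays.asList(
>         new Tuple2<>("b", 2), new Tuple2<>("a", 1), new Tuple2<>("c", 3)));
>
>     // order by (global sort): sortByKey range-partitions and sorts, so a
>     // single partition reproduces Hive's one-reducer total order.
>     JavaPairRDD<String, Integer> orderBy = rows.sortByKey(true, 1);
>
>     // sort by (local sort): hash-partition like a shuffle, then sort within
>     // each partition; no total order across partitions.
>     JavaPairRDD<String, Integer> sortBy =
>         rows.repartitionAndSortWithinPartitions(new HashPartitioner(4));
>
>     System.out.println(orderBy.collect());
>     System.out.println(sortBy.collect());
>     sc.stop();
>   }
> }
> {code}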



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9211) Research on build mini HoS cluster on YARN for unit test[Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-9211:
--
Fix Version/s: (was: spark-branch)

> Research on build mini HoS cluster on YARN for unit test[Spark Branch]
> --
>
> Key: HIVE-9211
> URL: https://issues.apache.org/jira/browse/HIVE-9211
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Chengxiang Li
>Assignee: Chengxiang Li
>  Labels: Spark-M5
> Fix For: 1.1.0
>
> Attachments: HIVE-9211.1-spark.patch, HIVE-9211.2-spark.patch, 
> HIVE-9211.2-spark.patch, HIVE-9211.3-spark.patch, HIVE-9211.4-spark.patch, 
> HIVE-9211.5-spark.patch, HIVE-9211.6-spark.patch, HIVE-9211.7-spark.patch
>
>
> HoS on YARN is a common use case in production environments, so we should 
> enable unit tests for this case. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9135) Cache Map and Reduce works in RSC [Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-9135:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> Cache Map and Reduce works in RSC [Spark Branch]
> 
>
> Key: HIVE-9135
> URL: https://issues.apache.org/jira/browse/HIVE-9135
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Affects Versions: spark-branch
>Reporter: Brock Noland
>Assignee: Jimmy Xiang
> Fix For: 1.1.0
>
> Attachments: HIVE-9135.1-spark.patch, HIVE-9135.1-spark.patch, 
> HIVE-9135.3-spark.patch, HIVE-9135.3.patch, HIVE-9135.4-spark.patch
>
>
> HIVE-9127 works around the fact that we don't cache Map/Reduce works in 
> Spark. However, other input formats such as HiveInputFormat will not benefit 
> from that fix. We should investigate how to allow caching on the RSC while 
> not on tasks (see HIVE-7431).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7717) Add .q tests coverage for "union all" [Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-7717:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> Add .q tests coverage for "union all" [Spark Branch]
> 
>
> Key: HIVE-7717
> URL: https://issues.apache.org/jira/browse/HIVE-7717
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Affects Versions: spark-branch
>Reporter: Na Yang
>Assignee: Na Yang
> Fix For: 1.1.0
>
> Attachments: HIVE-7717.1-spark.patch, HIVE-7717.2-spark.patch, 
> HIVE-7717.3-spark.patch
>
>
> Add automation test coverage for "union all", by searching through the 
> q-tests in "ql/src/test/queries/clientpositive/" for union tests (like 
> union*.q) and verifying/enabling them on spark.
> Steps to do:
> 1.  Enable a qtest <test>.q in 
> itests/src/test/resources/testconfiguration.properties by adding the .q test 
> files to spark.query.files.
> 2.  Run mvn test -Dtest=TestSparkCliDriver -Dqfile=<test>.q 
> -Dtest.output.overwrite=true -Phadoop-2 to generate the output (located in 
> ql/src/test/results/clientpositive/spark).  The file will be called 
> <test>.q.out.
> 3.  Check the generated output is good by verifying the results.  For 
> comparison, check the MR version in 
> ql/src/test/results/clientpositive/<test>.q.out.  The reason it's 
> separate is because the explain plan outputs are different for Spark/MR.
> 4.  Check in the modification to testconfiguration.properties, and the 
> generated q.out file as well.  You only have to generate the output once.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8438) Clean up code introduced by HIVE-7503 and such [Spark Plan]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-8438:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> Clean up code introduced by HIVE-7503 and such [Spark Plan]
> ---
>
> Key: HIVE-8438
> URL: https://issues.apache.org/jira/browse/HIVE-8438
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Chao Sun
> Fix For: 1.1.0
>
>
> With HIVE-8436 and HIVE-8437, we don't need the previous, incomplete 
> solution for multi-insert. Thus, we need to clean up the unwanted code, 
> including any disabled optimizations or tricks added to make tests pass. All 
> multi-insert queries should pass.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7584) Change SparkCompiler to generate a SparkWork that contains UnionWork from logical operator tree

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-7584:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> Change SparkCompiler to generate a SparkWork that contains UnionWork from 
> logical operator tree
> ---
>
> Key: HIVE-7584
> URL: https://issues.apache.org/jira/browse/HIVE-7584
> Project: Hive
>  Issue Type: Task
>  Components: Spark
>Affects Versions: spark-branch
>Reporter: Na Yang
>Assignee: Na Yang
> Fix For: 1.1.0
>
> Attachments: HIVE-7584.1-spark.patch
>
>
> This is a subtask of supporting the union all operation for Hive on Spark. 
> We need to change the current SparkCompiler to generate a SparkWork that 
> contains UnionWork from the logical operator tree.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8777) Should only register used counters in SparkCounters[Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-8777:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> Should only register used counters in SparkCounters[Spark Branch]
> -
>
> Key: HIVE-8777
> URL: https://issues.apache.org/jira/browse/HIVE-8777
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Chengxiang Li
>Assignee: Chengxiang Li
>  Labels: Spark-M3
> Fix For: 1.1.0
>
> Attachments: HIVE-8777.1-spark.patch
>
>
> Currently we register all Hive operator counters in SparkCounters, but not 
> all Hive operators are actually used in a SparkTask; we should iterate over 
> the SparkTask's operators and register only the counters required.
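> A minimal sketch of the intended walk, with a hypothetical operator node 
> standing in for Hive's operator tree:
> {code}
> import java.util.ArrayDeque;
> import java.util.ArrayList;
> import java.util.Deque;
> import java.util.List;
>
> public class CounterRegistrationSketch {
>   // Hypothetical operator node; the real ones are Hive's Operator<?> tree.
>   static class Op {
>     final String counterGroup;
>     final List<Op> children = new ArrayList<>();
>     Op(String counterGroup) { this.counterGroup = counterGroup; }
>   }
>
>   // Walk only the operators actually present in this task's works and
>   // collect their counter groups, instead of registering every Hive operator.
>   static List<String> countersToRegister(List<Op> roots) {
>     List<String> groups = new ArrayList<>();
>     Deque<Op> pending = new ArrayDeque<>(roots);
>     while (!pending.isEmpty()) {
>       Op op = pending.poll();
>       groups.add(op.counterGroup);
>       pending.addAll(op.children);
>     }
>     return groups;
>   }
> }
> {code}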



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9487) Make Remote Spark Context secure [Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-9487:
--
Fix Version/s: (was: spark-branch)

> Make Remote Spark Context secure [Spark Branch]
> ---
>
> Key: HIVE-9487
> URL: https://issues.apache.org/jira/browse/HIVE-9487
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Marcelo Vanzin
>Assignee: Marcelo Vanzin
>  Labels: TODOC-SPARK
> Fix For: 1.1.0
>
> Attachments: HIVE-9487.1-spark.patch, HIVE-9487.2-spark.patch
>
>
> The RSC currently uses an ad-hoc, insecure authentication mechanism. We 
> should instead use a proper auth mechanism and add encryption to the mix.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9116) Add unit test for multi sessions.[Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-9116:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> Add unit test for multi sessions.[Spark Branch]
> ---
>
> Key: HIVE-9116
> URL: https://issues.apache.org/jira/browse/HIVE-9116
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Chengxiang Li
>Assignee: Chengxiang Li
>  Labels: Spark-M4
> Fix For: 1.1.0
>
> Attachments: HIVE-9116.1-spark.patch
>
>
> HS2 multi-session support is enabled in HoS; we should add some unit tests 
> for verification and regression testing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8457) MapOperator initialization fails when multiple Spark threads is enabled [Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-8457:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> MapOperator initialization fails when multiple Spark threads is enabled 
> [Spark Branch]
> --
>
> Key: HIVE-8457
> URL: https://issues.apache.org/jira/browse/HIVE-8457
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Chao Sun
>Assignee: Chao Sun
> Fix For: 1.1.0
>
> Attachments: HIVE-8457.1-spark.patch, HIVE-8457.2-spark.patch
>
>
> Currently, on the Spark branch, each thread is bound to a thread-local 
> IOContext, which gets initialized when we generate an input {{HadoopRDD}}, 
> and is later used in {{MapOperator}}, {{FilterOperator}}, etc.
> And, given the introduction of HIVE-8118, we may have multiple downstream 
> RDDs that share the same input {{HadoopRDD}}, and we would like the 
> {{HadoopRDD}} to be cached, to avoid scanning the same table multiple times. 
> A typical case would be like the following:
> {noformat}
>   inputRDD     inputRDD
>      |            |
>    MT_11        MT_12
>      |            |
>    RT_1         RT_2
> {noformat}
> Here, {{MT_11}} and {{MT_12}} are {{MapTran}}s from a split {{MapWork}},
> and {{RT_1}} and {{RT_2}} are two {{ReduceTran}}s. Note that this example is 
> simplified, as we may also have {{ShuffleTran}} between {{MapTran}} and 
> {{ReduceTran}}.
> When multiple Spark threads are running, {{MT_11}} may be executed first, and 
> its request for an iterator from the {{HadoopRDD}} will trigger the creation 
> of the iterator, which in turn triggers the initialization of the 
> {{IOContext}} associated with that particular thread.
> *Now, the problem is*: before {{MT_12}} starts executing, it will also ask 
> for an iterator from the
> {{HadoopRDD}}, and since the RDD is already cached, instead of creating a new 
> iterator, it will just fetch it from the cached result. However, *this will 
> skip the initialization of the IOContext associated with this particular 
> thread*. And, when {{MT_12}} starts executing, it will try to initialize the 
> {{MapOperator}}, but since the {{IOContext}} is not initialized, this will 
> fail miserably. 
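> A minimal sketch of the kind of guard that would fix this, with a 
> hypothetical context class standing in for Hive's {{IOContext}}:
> {code}
> public class IOContextSketch {
>   // Hypothetical per-thread context, standing in for Hive's IOContext.
>   static final ThreadLocal<IOContextSketch> CONTEXT = new ThreadLocal<>();
>   boolean initialized;
>
>   // Guard that MapOperator-style code would need before touching the
>   // context: with a cached RDD, iterator creation -- and hence the usual
>   // initialization path -- is skipped on every thread but the first.
>   static IOContextSketch getOrInit() {
>     IOContextSketch ctx = CONTEXT.get();
>     if (ctx == null || !ctx.initialized) {
>       ctx = new IOContextSketch();
>       ctx.initialized = true;  // real code would rebuild input path, offsets, etc.
>       CONTEXT.set(ctx);
>     }
>     return ctx;
>   }
> }
> {code}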



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8054) Disable hive.optimize.union.remove when hive.execution.engine=spark [Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-8054:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> Disable hive.optimize.union.remove when hive.execution.engine=spark [Spark 
> Branch]
> --
>
> Key: HIVE-8054
> URL: https://issues.apache.org/jira/browse/HIVE-8054
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Na Yang
>  Labels: Spark-M1, TODOC-SPARK
> Fix For: 1.1.0
>
> Attachments: HIVE-8054-spark.patch, HIVE-8054.2-spark.patch, 
> HIVE-8054.3-spark.patch
>
>
> Option hive.optimize.union.remove introduced in HIVE-3276 removes union 
> operators from the operator graph in certain cases as an optimization to 
> reduce the number of MR jobs. While making sense in MR, this optimization is 
> actually harmful to an execution engine such as Spark, which natively 
> supports union without requiring additional jobs. This is because removing 
> the union operator creates disjoint operator graphs, each graph generating a 
> job, and thus this optimization requires more jobs to run the query. Not to 
> mention the additional complexity of handling linked FS descriptors.
> I propose that we disable such optimization when the execution engine is 
> Spark.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7580) Support dynamic partitioning [Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-7580:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> Support dynamic partitioning [Spark Branch]
> ---
>
> Key: HIVE-7580
> URL: https://issues.apache.org/jira/browse/HIVE-7580
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Chinna Rao Lalam
>  Labels: Spark-M1
> Fix For: 1.1.0
>
> Attachments: HIVE-7580.1-spark.patch, HIVE-7580.patch
>
>
> My understanding is that we don't need to do anything special for this. 
> However, this needs to be verified and tested.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7954) Investigate query failures (3)

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-7954:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> Investigate query failures (3)
> --
>
> Key: HIVE-7954
> URL: https://issues.apache.org/jira/browse/HIVE-7954
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Brock Noland
>Assignee: Thomas Friedrich
> Fix For: 1.1.0
>
>
> I ran all q-file tests and the following failed with an exception:
> http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/HIVE-SPARK-ALL-TESTS-Build/lastCompletedBuild/testReport/
> we don't necessarily want to run all these tests as part of the Spark tests, 
> but we should understand why they failed with an exception. This JIRA is to 
> look into these failures and document them with one of:
> * New JIRA
> * Covered under existing JIRA
> * More investigation required
> Tests:
> {noformat}
>  
> org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_root_dir_external_table
>   0.28 sec2
>  org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_view   
> 12 sec  2
>  
> org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby_complex_types
> 1.5 sec 2
>  
> org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby_multi_insert_common_distinct
>  3.9 sec 2
>  
> org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_authorization_admin_almighty2
> 2.6 sec 2
>  org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_quotedid_smb 
> 3.2 sec 2
>  org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_input20  1.5 sec 
> 2
>  
> org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_dbtxnmgr_showlocks
>0.23 sec2
>  
> org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketsortoptimize_insert_5
>   9.9 sec 2
>  org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_schemeAuthority  
> 0.54 sec2
>  org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucket5  1.9 sec 
> 2
>  org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_load_fs2 0.83 
> sec2
>  org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_lock44.3 sec 
> 2
>  
> org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_exim_14_managed_location_over_existing
>1 sec   2
>  org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_udf_in_file  
> 0.73 sec2
>  org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_lock10.92 
> sec2
>  org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_mi   1.9 sec 
> 2
>  org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_nullformatdir
> 1 sec   2
>  
> org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_exim_13_managed_location
>  3.4 sec 2
>  
> org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_import_exported_table
> 2.6 sec 2
>  
> org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_correlationoptimizer8
> 10 sec  2
>  
> org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_authorization_create_macro1
>   2.5 sec 2
>  org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_stats4   2.5 sec 
> 2
>  
> org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_exim_11_managed_external
>  0.99 sec2
>  
> org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby_complex_types_multi_single_reducer
>8.2 sec 2
>  org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_nullgroup5   
> 1.2 sec 2
>  
> org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_sortmerge_join_5
> 9.9 sec 2
>  org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_lock34.2 sec 
> 2
>  org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ppd_union_view   
> 4.1 sec 2
>  org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_sample10 2.5 sec 
> 2
>  
> org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_rename_external_partition_location
>2 sec   2
>  org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_remote_script
> 0.35 sec2
>  
> org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_exim_12_external_location
> 1 sec   2
>  org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_load_dyn_part1   
> 6.4 sec 2
>  
> org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_authorization_insert
>  3.6 sec 2
>  org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_newline  4.2 sec 
> 2
>  
> org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_file_with_header_footer
>   2.7 sec 2
>  org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_17   
> 10 sec  2
>  
> org.apache.hadoop.hive.cli.TestSparkCliD

[jira] [Updated] (HIVE-7567) support automatic calculating reduce task number [Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-7567:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> support automatic calculating reduce task number [Spark Branch]
> ---
>
> Key: HIVE-7567
> URL: https://issues.apache.org/jira/browse/HIVE-7567
> Project: Hive
>  Issue Type: Task
>  Components: Spark
>Reporter: Chengxiang Li
>Assignee: Chengxiang Li
>  Labels: TODOC-SPARK
> Fix For: 1.1.0
>
> Attachments: HIVE-7567.1-spark.patch, HIVE-7567.2-spark.patch, 
> HIVE-7567.3-spark.patch, HIVE-7567.4-spark.patch, HIVE-7567.5-spark.patch, 
> HIVE-7567.6-spark.patch
>
>
> Hive has its own mechanism to calculate the reduce task number; we need to 
> implement it for Spark jobs.
> NO PRECOMMIT TESTS. This is for spark-branch only.
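> For reference, a rough sketch of the shape of the MR-side heuristic that 
> needs reproducing; the knobs correspond to 
> {{hive.exec.reducers.bytes.per.reducer}} and {{hive.exec.reducers.max}}, and 
> the values below are illustrative, not the actual defaults:
> {code}
> public class ReducerEstimateSketch {
>   // Roughly: one reducer per bytesPerReducer of input, clamped to [1, max].
>   static int estimateReducers(long totalInputBytes, long bytesPerReducer,
>       int maxReducers) {
>     int reducers = (int) ((totalInputBytes + bytesPerReducer - 1) / bytesPerReducer);
>     return Math.max(1, Math.min(maxReducers, reducers));
>   }
>
>   public static void main(String[] args) {
>     // 10 GB of input at 256 MB per reducer, capped at 1009 -> 40 reducers.
>     System.out.println(estimateReducers(10L << 30, 256L << 20, 1009));
>   }
> }
> {code}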



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7541) Support union all on Spark [Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-7541:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> Support union all on Spark [Spark Branch]
> -
>
> Key: HIVE-7541
> URL: https://issues.apache.org/jira/browse/HIVE-7541
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Affects Versions: spark-branch
>Reporter: Xuefu Zhang
>Assignee: Na Yang
> Fix For: 1.1.0
>
> Attachments: HIVE-7541.1-spark.patch, HIVE-7541.2-spark.patch, 
> HIVE-7541.3-spark.patch, HIVE-7541.4-spark.patch, HIVE-7541.5-spark.patch, 
> Hive on Spark Union All design.pdf
>
>
> For union all operator, we will use Spark's union transformation. Refer to 
> the design doc on wiki for more information.
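> A minimal sketch of the transformation against the plain Spark Java API (toy 
> data; the real code wires up SparkWork branches, not literal lists):
> {code}
> import java.util.Arrays;
> import org.apache.spark.api.java.JavaRDD;
> import org.apache.spark.api.java.JavaSparkContext;
>
> public class UnionSketch {
>   public static void main(String[] args) {
>     JavaSparkContext sc = new JavaSparkContext("local", "union-sketch");
>     JavaRDD<String> branch1 = sc.parallelize(Arrays.asList("a", "b"));
>     JavaRDD<String> branch2 = sc.parallelize(Arrays.asList("c", "d"));
>     // Spark's union is a cheap metadata operation: no extra shuffle or job,
>     // unlike the MR plans that union-remove tries to optimize away.
>     JavaRDD<String> unionAll = branch1.union(branch2);
>     System.out.println(unionAll.collect());  // [a, b, c, d]
>     sc.stop();
>   }
> }
> {code}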



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8649) Increase level of parallelism in reduce phase [Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-8649:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> Increase level of parallelism in reduce phase [Spark Branch]
> 
>
> Key: HIVE-8649
> URL: https://issues.apache.org/jira/browse/HIVE-8649
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Brock Noland
>Assignee: Jimmy Xiang
> Fix For: 1.1.0
>
> Attachments: HIVE-8649.1-spark.patch, HIVE-8649.2-spark.patch
>
>
> We calculate the number of reducers based on the same code as MapReduce. 
> However, reducers are vastly cheaper in Spark and it's generally recommended 
> we have many more reducers than in MR.
> Sandy Ryza, who works on Spark, has some ideas about a heuristic.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8437) Modify SparkPlan generation to set toCache flag to SparkTrans where caching is needed [Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-8437:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> Modify SparkPlan generation to set toCache flag to SparkTrans where caching 
> is needed [Spark Branch]
> 
>
> Key: HIVE-8437
> URL: https://issues.apache.org/jira/browse/HIVE-8437
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
> Fix For: 1.1.0
>
>
> HIVE-8436 may modify the SparkWork right before SparkPlan generation. When 
> this happens, the output from some SparkTrans needs to be cached to avoid 
> regenerating the RDD. For more information, please refer to the design doc.
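> A minimal sketch of the marking pass, with a hypothetical node standing in 
> for SparkTran; the multi-child test is a simplification of the real rule, 
> which comes from the HIVE-8436 clone bookkeeping:
> {code}
> import java.util.ArrayList;
> import java.util.List;
>
> public class CacheFlagSketch {
>   // Hypothetical transformation node standing in for SparkTran.
>   static class Tran {
>     final List<Tran> children = new ArrayList<>();
>     boolean toCache;
>   }
>
>   // A tran whose output feeds more than one consumer should cache its RDD
>   // rather than recompute it for each downstream branch.
>   static void markCaching(List<Tran> trans) {
>     for (Tran t : trans) {
>       t.toCache = t.children.size() > 1;
>     }
>   }
> }
> {code}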



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9007) Hive may generate wrong plan for map join queries due to IdentityProjectRemover [Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-9007:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> Hive may generate wrong plan for map join queries due to 
> IdentityProjectRemover [Spark Branch]
> --
>
> Key: HIVE-9007
> URL: https://issues.apache.org/jira/browse/HIVE-9007
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Affects Versions: spark-branch
>Reporter: Chao Sun
>Assignee: Szehon Ho
> Fix For: 1.1.0
>
> Attachments: HIVE-9007-spark.patch, HIVE-9007.2-spark.patch
>
>
> HIVE-8435 introduces a new logical optimizer called IdentityProjectRemover, 
> which may cause map join in the Spark branch to generate a wrong plan.
> Currently, the map join conversion in the Spark branch first goes through a 
> method {{convertJoinMapJoin}}, which replaces a join op with a mapjoin op, 
> removes the RS associated with the big table, and keeps the RSs for all small 
> tables. Afterwards, in {{SparkReduceSinkMapJoinProc}} it replaces all parent 
> RSs of the mapjoin op with HTS (note it doesn't check whether the RS belongs 
> to a small table or the big table.)
> The issue arises when IdentityProjectRemover comes into play, which may 
> result in a situation where an operator tree has two consecutive RSs. Imagine 
> the following example:
> {noformat}
>      Join                    MapJoin
>     /    \                  /       \
>   RS      RS     --->     RS         RS
>   |        |              |           |
>   TS       RS             TS          TS (big table)
>            |
>            TS (small table)
> {noformat}
> In this case, all parents of the mapjoin op will be RS, even the branch for 
> big table! In {{SparkReduceSinkMapJoinProc}}, they will be replaced with HTS, 
> which is obviously incorrect.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8841) Make RDD caching work for multi-insert after HIVE-8793 when map join is involved [Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-8841:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> Make RDD caching work for multi-insert after HIVE-8793 when map join is 
> involved [Spark Branch]
> ---
>
> Key: HIVE-8841
> URL: https://issues.apache.org/jira/browse/HIVE-8841
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Rui Li
> Fix For: 1.1.0
>
> Attachments: HIVE-8841.1-spark.patch
>
>
> Splitting SparkWork now happens before MapJoinResolver. As MapJoinResolver 
> may further spin off a dependent SparkWork for the small tables of a join, we 
> need to make Spark RDD caching continue to work even across SparkWorks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8788) UT: fix partition test case [Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-8788:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> UT: fix partition test case [Spark Branch]
> --
>
> Key: HIVE-8788
> URL: https://issues.apache.org/jira/browse/HIVE-8788
> Project: Hive
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: spark-branch
>Reporter: Thomas Friedrich
>Assignee: Chinna Rao Lalam
> Fix For: 1.1.0
>
> Attachments: HIVE-8788-spark.patch, HIVE-8788.1-spark.patch
>
>
> The test limit_partition_metadataonly fails with 
> 2014-11-06 18:40:12,891 ERROR ql.Driver (SessionState.java:printError(829)) - 
> FAILED: SemanticException Number of partitions scanned (=4) on table srcpart 
> exceeds limit (=1). This is controlled by 
> hive.limit.query.max.table.partition.
> org.apache.hadoop.hive.ql.parse.SemanticException: Number of partitions 
> scanned (=4) on table srcpart exceeds limit (=1). This is controlled by 
> hive.limit.query.max.table.partition.
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.enforceScanLimits(SemanticAnalyzer.java:10358)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10190)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:419)
> In the test, SemanticAnalyzer.enforceScanLimits expects only 1 partition 
> ds=2008-04-08/hr=11 but gets 4 partitions:
> [srcpart(ds=2008-04-08/hr=11), srcpart(ds=2008-04-08/hr=12), 
> srcpart(ds=2008-04-09/hr=11), srcpart(ds=2008-04-09/hr=12)]
> The log shows that the PartitionPruner ran, and it should have retained only 
> one partition:
> 2014-11-07 14:18:09,147 DEBUG ppr.PartitionPruner 
> (PartitionPruner.java:prune(206)) - Filter w/ compacting: ((hr = 11) and (ds 
> = '2008-04-08')); filter w/o compacting: ((hr = 11) and (ds = '2008-04-08'))
> 2014-11-07 14:18:09,147 INFO  metastore.HiveMetaStore 
> (HiveMetaStore.java:logInfo(719)) - 0: get_partitions_by_expr : db=default 
> tbl=srcpart
> 2014-11-07 14:18:09,165 DEBUG ppr.PartitionPruner 
> (PartitionPruner.java:prunePartitionNames(491)) - retained partition: 
> ds=2008-04-08/hr=11



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8924) Investigate test failure for join_empty.q [Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-8924:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> Investigate test failure for join_empty.q [Spark Branch]
> 
>
> Key: HIVE-8924
> URL: https://issues.apache.org/jira/browse/HIVE-8924
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Affects Versions: spark-branch
>Reporter: Chao Sun
>Assignee: Szehon Ho
> Fix For: 1.1.0
>
> Attachments: HIVE-8924-spark.patch, HIVE-8924.2-spark.patch, 
> HIVE-8924.3-spark.patch, HIVE-8924.4-spark.patch
>
>
> This query has an interesting case where the big table work is empty. Here's 
> the MR plan:
> {noformat}
> STAGE DEPENDENCIES:
>   Stage-4 is a root stage
>   Stage-3 depends on stages: Stage-4
>   Stage-0 depends on stages: Stage-3
> STAGE PLANS:
>   Stage: Stage-4
> Map Reduce Local Work
>   Alias -> Map Local Tables:
> b 
>   Fetch Operator
> limit: -1
>   Alias -> Map Local Operator Tree:
> b 
>   TableScan
> alias: b
> Statistics: Num rows: 29 Data size: 5812 Basic stats: COMPLETE 
> Column stats: NONE
> Filter Operator
>   predicate: UDFToDouble(key) is not null (type: boolean)
>   Statistics: Num rows: 15 Data size: 3006 Basic stats: COMPLETE 
> Column stats: NONE
>   HashTable Sink Operator
> condition expressions:
>   0 {key}
>   1 {value}
> keys:
>   0 UDFToDouble(key) (type: double)
>   1 UDFToDouble(key) (type: double)
>   Stage: Stage-3
> Map Reduce
>   Local Work:
> Map Reduce Local Work
>   Stage: Stage-0
> Fetch Operator
>   limit: -1
>   Processor Tree:
> ListSink
> {noformat}
> The plan for Spark is not correct. We need to investigate the issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7540) NotSerializableException encountered when using sortByKey transformation

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-7540:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> NotSerializableException encountered when using sortByKey transformation
> 
>
> Key: HIVE-7540
> URL: https://issues.apache.org/jira/browse/HIVE-7540
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
> Environment: Spark-1.0.1
>Reporter: Rui Li
>Assignee: Rui Li
> Fix For: 1.1.0
>
> Attachments: HIVE-7540-spark.patch, HIVE-7540.2-spark.patch, 
> HIVE-7540.3-spark.patch
>
>
> This exception is thrown when sortByKey is used as the shuffle transformation 
> between MapWork and ReduceWork:
> {quote}
> org.apache.spark.SparkException: Job aborted due to stage failure: Task not 
> serializable: java.io.NotSerializableException: 
> org.apache.hadoop.io.BytesWritable
> at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1049)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1033)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1031)
> at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> at 
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1031)
> at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitMissingTasks(DAGScheduler.scala:772)
> at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:715)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$submitStage$4.apply(DAGScheduler.scala:719)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$submitStage$4.apply(DAGScheduler.scala:718)
> at scala.collection.immutable.List.foreach(List.scala:318)
> at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:718)
> at 
> org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:699)
> …
> {quote}
>  The root cause is that the RangePartitioner used by sortByKey contains 
> rangeBounds: Array[BytesWritable], which is not serializable in Spark.
> A workaround to this issue is to set the number of partitions to 1 when 
> calling sortByKey, in which case the rangeBounds will be just an empty array.
> NO PRECOMMIT TESTS. This is for spark branch only.
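> A minimal sketch of the workaround against the plain Spark Java API (toy 
> String keys here; in the failing case the keys are {{BytesWritable}}):
> {code}
> import java.util.Arrays;
> import org.apache.spark.api.java.JavaPairRDD;
> import org.apache.spark.api.java.JavaSparkContext;
> import scala.Tuple2;
>
> public class SortByKeyWorkaroundSketch {
>   public static void main(String[] args) {
>     JavaSparkContext sc = new JavaSparkContext("local", "sortbykey-sketch");
>     JavaPairRDD<String, Integer> pairs = sc.parallelizePairs(Arrays.asList(
>         new Tuple2<>("b", 2), new Tuple2<>("a", 1)));
>     // With numPartitions == 1 the RangePartitioner's rangeBounds array stays
>     // empty, so no non-serializable key bounds are ever shipped.
>     JavaPairRDD<String, Integer> sorted = pairs.sortByKey(true, 1);
>     System.out.println(sorted.collect());
>     sc.stop();
>   }
> }
> {code}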



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7613) Research optimization of auto convert join to map join [Spark branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-7613:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> Research optimization of auto convert join to map join [Spark branch]
> -
>
> Key: HIVE-7613
> URL: https://issues.apache.org/jira/browse/HIVE-7613
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Chengxiang Li
>Assignee: Suhas Satish
>Priority: Minor
> Fix For: 1.1.0
>
> Attachments: HIve on Spark Map join background.docx, Hive on Spark 
> Join Master Design.pdf, small_table_broadcasting.pdf
>
>
> ConvertJoinMapJoin is an optimization that replaces a common join (aka 
> shuffle join) with a map join (aka broadcast or fragment replicate join) when 
> possible. We need to research how to make it work with Hive on Spark.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8842) auto_join2.q produces incorrect tree [Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-8842:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> auto_join2.q produces incorrect tree [Spark Branch]
> ---
>
> Key: HIVE-8842
> URL: https://issues.apache.org/jira/browse/HIVE-8842
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Szehon Ho
>Assignee: Chao Sun
> Fix For: 1.1.0
>
> Attachments: HIVE-8842.1-spark.patch, HIVE-8842.2-spark.patch, 
> HIVE-8842.3-spark.patch
>
>
> With SparkMapJoinResolver and SparkReduceSinkMapJoinProc enabled, I see the following:
> {noformat}
> explain select * from src src1 JOIN src src2 ON (src1.key = src2.key) JOIN src src3 ON (src1.key + src2.key = src3.key);
> {noformat}
> produces too many stages (six) and too many HashTableSink operators.
> {noformat}
> STAGE DEPENDENCIES:
>   Stage-5 is a root stage
>   Stage-4 depends on stages: Stage-5
>   Stage-3 depends on stages: Stage-4
>   Stage-7 is a root stage
>   Stage-6 depends on stages: Stage-7
>   Stage-0 is a root stage
> STAGE PLANS:
>   Stage: Stage-5
>     Spark
>       DagName: szehon_20141112105656_dd50e07d-94ad-4f9d-899e-bcb6d9a39c13:3
>       Vertices:
>         Map 1 
>             Map Operator Tree:
>                 TableScan
>                   alias: src2
>                   Statistics: Num rows: 29 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
>                   Filter Operator
>                     predicate: key is not null (type: boolean)
>                     Statistics: Num rows: 15 Data size: 3006 Basic stats: COMPLETE Column stats: NONE
>                     HashTable Sink Operator
>                       condition expressions:
>                         0 {key} {value}
>                         1 {key} {value}
>                       keys:
>                         0 key (type: string)
>                         1 key (type: string)
>   Stage: Stage-4
>     Spark
>       DagName: szehon_20141112105656_dd50e07d-94ad-4f9d-899e-bcb6d9a39c13:2
>       Vertices:
>         Map 3 
>             Map Operator Tree:
>                 TableScan
>                   alias: src1
>                   Statistics: Num rows: 29 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
>                   Filter Operator
>                     predicate: key is not null (type: boolean)
>                     Statistics: Num rows: 15 Data size: 3006 Basic stats: COMPLETE Column stats: NONE
>                     Map Join Operator
>                       condition map:
>                            Inner Join 0 to 1
>                       condition expressions:
>                         0 {key} {value}
>                         1 {key} {value}
>                       keys:
>                         0 key (type: string)
>                         1 key (type: string)
>                       outputColumnNames: _col0, _col1, _col5, _col6
>                       input vertices:
>                         1 Map 1
>                       Statistics: Num rows: 16 Data size: 3306 Basic stats: COMPLETE Column stats: NONE
>                       Filter Operator
>                         predicate: (_col0 + _col5) is not null (type: boolean)
>                         Statistics: Num rows: 8 Data size: 1653 Basic stats: COMPLETE Column stats: NONE
>                         HashTable Sink Operator
>                           condition expressions:
>                             0 {_col0} {_col1} {_col5} {_col6}
>                             1 {key} {value}
>                           keys:
>                             0 (_col0 + _col5) (type: double)
>                             1 UDFToDouble(key) (type: double)
>   Stage: Stage-3
>     Spark
>       DagName: szehon_20141112105656_dd50e07d-94ad-4f9d-899e-bcb6d9a39c13:1
>       Vertices:
>         Map 2 
>             Map Operator Tree:
>                 TableScan
>                   alias: src3
>                   Statistics: Num rows: 29 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
>                   Filter Operator
>                     predicate: UDFToDouble(key) is not null (type: boolean)
>                     Statistics: Num rows: 15 Data size: 3006 Basic stats: COMPLETE Column stats: NONE
>                     Map Join Operator
>                       condition map:
>                            Inner Join 0 to 1
>                       condition expressions:
>                         0 {_col0} {_col1} {_col5} {_col6}
>                         1 {key} {value}
>                       keys:
>                         0 (_col0 + _col5) (type: double)
>                         1 UDFToDouble(key) (type: double)
>                       outputColumnNames: _col0, _col1,

[jira] [Updated] (HIVE-9088) Spark counter serialization error in spark.log [Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-9088:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> Spark counter serialization error in spark.log [Spark Branch]
> -
>
> Key: HIVE-9088
> URL: https://issues.apache.org/jira/browse/HIVE-9088
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Chengxiang Li
> Fix For: 1.1.0
>
> Attachments: HIVE-9088.1-spark.patch
>
>
> It seems that the counter didn't get registered, and incrementing it in the executor caused this error. The task itself succeeds nevertheless.
> {code}
> 2014-12-11 05:24:48,951 ERROR [Executor task launch worker-0]: counter.SparkCounters (SparkCounters.java:increment(83)) - counter[HIVE, RECORDS_IN] has not initialized before.
> 2014-12-11 05:24:48,951 ERROR [Executor task launch worker-0]: counter.SparkCounters (SparkCounters.java:increment(83)) - counter[HIVE, DESERIALIZE_ERRORS] has not initialized before.
> {code}
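> A minimal sketch of the register-before-increment contract at play (a plain Spark accumulator stands in for Hive's SparkCounters wrapper here; the class and method names are hypothetical):
> {code}
> import org.apache.spark.Accumulator;
> import org.apache.spark.api.java.JavaRDD;
> import org.apache.spark.api.java.JavaSparkContext;
>
> public class CounterRegistrationSketch {
>   static long countRecords(JavaSparkContext sc, JavaRDD<String> records) {
>     // Driver side: the counter must be created before any task runs.
>     // Incrementing a counter that was never registered this way is what
>     // produces the "has not initialized before" errors quoted above.
>     final Accumulator<Integer> recordsIn = sc.accumulator(0);
>     // Executor side: tasks only increment the already-registered counter.
>     records.foreach(r -> recordsIn.add(1));
>     return recordsIn.value().longValue();
>   }
> }
> {code}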



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7816) Enable map-join tests which Tez executes [Spark Branch]

2015-05-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-7816:
--
Fix Version/s: (was: spark-branch)
   1.1.0

> Enable map-join tests which Tez executes [Spark Branch]
> ---
>
> Key: HIVE-7816
> URL: https://issues.apache.org/jira/browse/HIVE-7816
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Brock Noland
>Assignee: Rui Li
> Fix For: 1.1.0
>
> Attachments: HIVE-7816.1-spark.patch, HIVE-7816.2-spark.patch
>
>
>  
> {noformat}
>   auto_join0.q,\
>   auto_join1.q,\
>   cross_join.q,\
>   cross_product_check_1.q,\
>   cross_product_check_2.q,\
> {noformat}
> {noformat}
> filter_join_breaktask.q,\
> filter_join_breaktask2.q
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

