[jira] [Commented] (HIVE-10807) Invalidate basic stats for insert queries if autogather=false
[ https://issues.apache.org/jira/browse/HIVE-10807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564308#comment-14564308 ] Hive QA commented on HIVE-10807:

{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12735815/HIVE-10807.4.patch

{color:red}ERROR:{color} -1 due to 40 failed/errored test(s), 8978 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_fold_case
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_histogram_numeric
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx_cbo_2
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_stats_partialscan_autogether
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_insert_into1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_skewjoin_union_remove_1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_stats12
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_stats13
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_stats2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_stats5
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_stats6
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_stats9
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_stats_noscan_1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_stats_partscan_1_23
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_10
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_11
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_12
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_13
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_14
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_15
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_16
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_17
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_18
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_19
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_20
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_21
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_22
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_23
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_24
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_25
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_3
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_4
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_5
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_6
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_6_subq
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_7
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_8
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_9
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4086/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4086/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4086/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 40 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12735815 - PreCommit-HIVE-TRUNK-Build

> Invalidate basic stats for insert queries if autogather=false
> -
>
> Key: HIVE-10807
> URL: https://issues.apache.org/jira/browse/HIVE-10807
> Project: Hive
> Issue Type: Bug
> Components: Statistics
> Affects Versions: 1.2.0
> Reporter: Gopal V
> Assignee: Ashutosh Chauhan
> Attachments: HIVE-10807.2.patch, HIVE-10807.3.patch, HIVE-10807.4.patch, HIVE-10807.patch
>
> Setting hive.stats.autogather=false leads to incorrect basic stats for insert statements.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
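Until the invalidation fix lands, the usual way to keep basic stats honest with autogather off is to recompute them explicitly after the insert; a minimal HiveQL sketch (the table name {{t}} is hypothetical, not from the issue):

```sql
-- With autogather off, INSERT does not update basic stats (numRows, totalSize),
-- so previously gathered values can go stale and mislead the planner.
SET hive.stats.autogather=false;

CREATE TABLE t (id INT);
INSERT INTO TABLE t VALUES (1), (2);

-- Recompute the table-level basic stats explicitly:
ANALYZE TABLE t COMPUTE STATISTICS;
```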
[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements
[ https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564297#comment-14564297 ] Alexander Pivovarov commented on HIVE-10841:

If I set hive.optimize.ppd=false, the query returns 1 row BUT the plan does not have "brn is not null". If I set hive.optimize.ppd=true and change the JOIN order to (A, acct, PI), the query returns 1 row AND the plan HAS "brn is not null".

{code}
set hive.optimize.ppd=true;

select acct.ACC_N, acct.brn
FROM LA
JOIN A ON LA.aid = A.id
JOIN acct ON LA.aid = acct.aid
JOIN PI ON PI.id = LA.pi_id
WHERE LA.loan_id = 4436 and acct.brn is not null;
OK
10      122
{code}

> [WHERE col is not null] does not work sometimes for queries with many JOIN statements
> -
>
> Key: HIVE-10841
> URL: https://issues.apache.org/jira/browse/HIVE-10841
> Project: Hive
> Issue Type: Bug
> Components: Query Planning, Query Processor
> Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0
> Reporter: Alexander Pivovarov
>
> The result from the following SELECT query is 3 rows but it should be 1 row.
> I checked it in MySQL - it returned 1 row.
> To reproduce the issue in Hive
> 1. prepare tables
> {code}
> drop table if exists L;
> drop table if exists LA;
> drop table if exists FR;
> drop table if exists A;
> drop table if exists PI;
> drop table if exists acct;
> create table L as select 4436 id;
> create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id;
> create table FR as select 4436 loan_id;
> create table A as select 4748 id;
> create table PI as select 4415 id;
> create table acct as select 4748 aid, 10 acc_n, 122 brn;
> insert into table acct values(4748, null, null);
> insert into table acct values(4748, null, null);
> {code}
> 2. run SELECT query
> {code}
> select
>   acct.ACC_N,
>   acct.brn
> FROM L
> JOIN LA ON L.id = LA.loan_id
> JOIN FR ON L.id = FR.loan_id
> JOIN A ON LA.aid = A.id
> JOIN PI ON PI.id = LA.pi_id
> JOIN acct ON A.id = acct.aid
> WHERE
>   L.id = 4436
>   and acct.brn is not null;
> {code}
> the result is 3 rows
> {code}
> 10      122
> NULL    NULL
> NULL    NULL
> {code}
> but it should be 1 row
> {code}
> 10      122
> {code}
> 2.1 "explain select ..." output for hive-1.3.0 MR
> {code}
> STAGE DEPENDENCIES:
>   Stage-12 is a root stage
>   Stage-9 depends on stages: Stage-12
>   Stage-0 depends on stages: Stage-9
> STAGE PLANS:
>   Stage: Stage-12
>     Map Reduce Local Work
>       Alias -> Map Local Tables:
>         a
>           Fetch Operator
>             limit: -1
>         acct
>           Fetch Operator
>             limit: -1
>         fr
>           Fetch Operator
>             limit: -1
>         l
>           Fetch Operator
>             limit: -1
>         pi
>           Fetch Operator
>             limit: -1
>       Alias -> Map Local Operator Tree:
>         a
>           TableScan
>             alias: a
>             Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
>             Filter Operator
>               predicate: id is not null (type: boolean)
>               Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
>               HashTable Sink Operator
>                 keys:
>                   0 _col5 (type: int)
>                   1 id (type: int)
>                   2 aid (type: int)
>         acct
>           TableScan
>             alias: acct
>             Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE Column stats: NONE
>             Filter Operator
>               predicate: aid is not null (type: boolean)
>               Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE Column stats: NONE
>               HashTable Sink Operator
>                 keys:
>                   0 _col5 (type: int)
>                   1 id (type: int)
>                   2 aid (type: int)
>         fr
>           TableScan
>             alias: fr
>             Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
>             Filter Operator
>               predicate: (loan_id = 4436) (type: boolean)
>               Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
>               HashTable Sink Operator
>                 keys:
>                   0 4436 (type: int)
>                   1 4436 (type: int)
>                   2 4436 (type: int)
>         l
>           TableScan
>             alias: l
>             Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
>             Filter Operator
>               predicate: (id = 4436) (type: boolean)
>               Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
>               HashTable Sink Operat
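Since the comments debate what the correct answer should be, the expected semantics can be cross-checked outside Hive; a small illustrative Python sketch that replays the repro data through plain inner joins and the IS NOT NULL filter (NULL modeled as None):

```python
# Repro tables from the issue description, as lists of tuples (NULL -> None).
L = [(4436,)]                               # id
LA = [(4436, 4748, 4415)]                   # loan_id, aid, pi_id
FR = [(4436,)]                              # loan_id
A = [(4748,)]                               # id
PI = [(4415,)]                              # id
acct = [(4748, 10, 122), (4748, None, None), (4748, None, None)]  # aid, acc_n, brn

# Inner joins exactly as written in the query, then the WHERE clause.
# "acct.brn is not null" must eliminate the two all-NULL acct rows.
result = [
    (acc_n, brn)
    for (l_id,) in L
    for (loan_id, aid, pi_id) in LA if l_id == loan_id
    for (fr_loan,) in FR if l_id == fr_loan
    for (a_id,) in A if aid == a_id
    for (p_id,) in PI if p_id == pi_id
    for (ac_aid, acc_n, brn) in acct if a_id == ac_aid
    if l_id == 4436 and brn is not None
]

print(result)  # correct answer is a single row: [(10, 122)]
```

This matches the MySQL result quoted in the issue: one row, (10, 122), regardless of how the joins are ordered.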
[jira] [Commented] (HIVE-10821) Beeline-CLI: Implement CLI source command using Beeline functionality
[ https://issues.apache.org/jira/browse/HIVE-10821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564288#comment-14564288 ] Chinna Rao Lalam commented on HIVE-10821:

Hi [~Ferd], please remove {{String trimedCmd = cmd.trim()}} in sourceFile(String cmd). Other than that, the patch looks good to me. +1 (non-binding)

> Beeline-CLI: Implement CLI source command using Beeline functionality
> -
>
> Key: HIVE-10821
> URL: https://issues.apache.org/jira/browse/HIVE-10821
> Project: Hive
> Issue Type: Sub-task
> Components: CLI
> Reporter: Ferdinand Xu
> Assignee: Ferdinand Xu
> Attachments: HIVE-10821.1-beeline-cli.patch, HIVE-10821.1-beeline-cli.patch, HIVE-10821.patch
>
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10863) Merge trunk to Spark branch 5/28/2015 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564280#comment-14564280 ] Hive QA commented on HIVE-10863:

{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12736048/HIVE-10863.0-spark.patch

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 7962 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.initializationError
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx_cbo_2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_sortmerge_join_1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_sortmerge_join_2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_sortmerge_join_7
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_sortmerge_join_8
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin11
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin5
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketsortoptimize_insert_2
org.apache.hive.jdbc.TestSSL.testSSLConnectionWithProperty
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/869/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/869/console
Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-869/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12736048 - PreCommit-HIVE-SPARK-Build

> Merge trunk to Spark branch 5/28/2015 [Spark Branch]
>
> Key: HIVE-10863
> URL: https://issues.apache.org/jira/browse/HIVE-10863
> Project: Hive
> Issue Type: Sub-task
> Components: Spark
> Reporter: Xuefu Zhang
> Assignee: Deepesh Khandelwal
> Attachments: HIVE-10863.0-spark.patch, mj.patch
>
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10821) Beeline-CLI: Implement CLI source command using Beeline functionality
[ https://issues.apache.org/jira/browse/HIVE-10821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu updated HIVE-10821:

Release Note: Examples:
{noformat}
#cat /root/workspace/test.sql
create table test2(a string, b string);
#0: jdbc:hive2://> source /root/workspace/test.sql
#0: jdbc:hive2://> create table test2(a string, b string);
{noformat}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10864) CBO (Calcite Return Path): auto_join2.q returning wrong results
[ https://issues.apache.org/jira/browse/HIVE-10864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-10864:

Attachment: HIVE-10864.patch

[~ashutoshc], could you review it? Thanks

> CBO (Calcite Return Path): auto_join2.q returning wrong results
> ---
>
> Key: HIVE-10864
> URL: https://issues.apache.org/jira/browse/HIVE-10864
> Project: Hive
> Issue Type: Sub-task
> Components: CBO
> Reporter: Jesus Camacho Rodriguez
> Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-10864.patch
>
> auto_join2.q returns wrong results when the return path is on. The problem is that we create the same join expression once per input reference when we are translating. Thus, we incorrectly end up with a key composed of multiple expressions in those cases.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements
[ https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564273#comment-14564273 ] Alexander Pivovarov commented on HIVE-10841:

The L table can also be removed from the query - it does not affect the result. But I found that the order of the JOIN statements matters. I tried the following combinations of the JOIN statements: (PI,acct,A) (PI,A,acct) (A,PI,acct) (acct,PI,A) (A,acct,PI) (acct,A,PI). 3 rows are returned only for the (A,PI,acct) combination - FROM LA JOIN A JOIN PI JOIN acct:

{code}
select acct.ACC_N, acct.brn
FROM LA
JOIN A ON LA.aid = A.id
JOIN PI ON PI.id = LA.pi_id
JOIN acct ON LA.aid = acct.aid
WHERE LA.loan_id = 4436 and acct.brn is not null;
OK
10      122
NULL    NULL
NULL    NULL
{code}
[jira] [Updated] (HIVE-9069) Simplify filter predicates for CBO
[ https://issues.apache.org/jira/browse/HIVE-9069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-9069:

Attachment: HIVE-9069.18.patch

> Simplify filter predicates for CBO
> --
>
> Key: HIVE-9069
> URL: https://issues.apache.org/jira/browse/HIVE-9069
> Project: Hive
> Issue Type: Bug
> Components: CBO
> Affects Versions: 0.14.0
> Reporter: Mostafa Mokhtar
> Assignee: Jesus Camacho Rodriguez
> Fix For: 0.14.1
>
> Attachments: HIVE-9069.01.patch, HIVE-9069.02.patch, HIVE-9069.03.patch, HIVE-9069.04.patch, HIVE-9069.05.patch, HIVE-9069.06.patch, HIVE-9069.07.patch, HIVE-9069.08.patch, HIVE-9069.08.patch, HIVE-9069.09.patch, HIVE-9069.10.patch, HIVE-9069.11.patch, HIVE-9069.12.patch, HIVE-9069.13.patch, HIVE-9069.14.patch, HIVE-9069.14.patch, HIVE-9069.15.patch, HIVE-9069.16.patch, HIVE-9069.17.patch, HIVE-9069.17.patch, HIVE-9069.18.patch, HIVE-9069.patch
>
> Simplify disjunctive predicates so that they can get pushed down to the scan.
> Looks like this is still an issue; some of the filters could be pushed down to the scan.
> {code}
> set hive.cbo.enable=true
> set hive.stats.fetch.column.stats=true
> set hive.exec.dynamic.partition.mode=nonstrict
> set hive.tez.auto.reducer.parallelism=true
> set hive.auto.convert.join.noconditionaltask.size=32000
> set hive.exec.reducers.bytes.per.reducer=1
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager
> set hive.support.concurrency=false
> set hive.tez.exec.print.summary=true
> explain
> select substr(r_reason_desc,1,20) as r
>        ,avg(ws_quantity) wq
>        ,avg(wr_refunded_cash) ref
>        ,avg(wr_fee) fee
> from web_sales, web_returns, web_page, customer_demographics cd1,
>      customer_demographics cd2, customer_address, date_dim, reason
> where web_sales.ws_web_page_sk = web_page.wp_web_page_sk
>   and web_sales.ws_item_sk = web_returns.wr_item_sk
>   and web_sales.ws_order_number = web_returns.wr_order_number
>   and web_sales.ws_sold_date_sk = date_dim.d_date_sk and d_year = 1998
>   and cd1.cd_demo_sk = web_returns.wr_refunded_cdemo_sk
>   and cd2.cd_demo_sk = web_returns.wr_returning_cdemo_sk
>   and customer_address.ca_address_sk = web_returns.wr_refunded_addr_sk
>   and reason.r_reason_sk = web_returns.wr_reason_sk
>   and
>   (
>     (
>       cd1.cd_marital_status = 'M'
>       and cd1.cd_marital_status = cd2.cd_marital_status
>       and cd1.cd_education_status = '4 yr Degree'
>       and cd1.cd_education_status = cd2.cd_education_status
>       and ws_sales_price between 100.00 and 150.00
>     )
>     or
>     (
>       cd1.cd_marital_status = 'D'
>       and cd1.cd_marital_status = cd2.cd_marital_status
>       and cd1.cd_education_status = 'Primary'
>       and cd1.cd_education_status = cd2.cd_education_status
>       and ws_sales_price between 50.00 and 100.00
>     )
>     or
>     (
>       cd1.cd_marital_status = 'U'
>       and cd1.cd_marital_status = cd2.cd_marital_status
>       and cd1.cd_education_status = 'Advanced Degree'
>       and cd1.cd_education_status = cd2.cd_education_status
>       and ws_sales_price between 150.00 and 200.00
>     )
>   )
>   and
>   (
>     (
>       ca_country = 'United States'
>       and ca_state in ('KY', 'GA', 'NM')
>       and ws_net_profit between 100 and 200
>     )
>     or
>     (
>       ca_country = 'United States'
>       and ca_state in ('MT', 'OR', 'IN')
>       and ws_net_profit between 150 and 300
>     )
>     or
>     (
>       ca_country = 'United States'
>       and ca_state in ('WI', 'MO', 'WV')
>       and ws_net_profit between 50 and 250
>     )
>   )
> group by r_reason_desc
> order by r, wq, ref, fee
> limit 100
> OK
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
>     Tez
>       Edges:
>         Map 9 <- Map 1 (BROADCAST_EDGE)
>         Reducer 3 <- Map 13 (SIMPLE_EDGE), Map 2 (SIMPLE_EDGE)
>         Reducer 4 <- Map 9 (SIMPLE_EDGE), Reducer 3 (SIMPLE_EDGE)
>         Reducer 5 <- Map 14 (SIMPLE_EDGE), Reducer 4 (SIMPLE_EDGE)
>         Reducer 6 <- Map 10 (SIMPLE_EDGE), Map 11 (BROADCAST_EDGE), Map 12 (BROADCAST_EDGE), Reducer 5 (SIMPLE_EDGE)
>         Reducer 7 <- Reducer 6 (SIMPLE_EDGE)
>         Reducer 8 <- Reducer 7 (SIMPLE_EDGE)
>       DagName: mmokhtar_2014161818_f5fd23ba-d783-4b13-8507-7faa65851798:1
>       Vertices:
>         Map 1
>           Map Operator Tree:
>             TableScan
>               alias: web_page
>               filterExpr: wp_web_page_sk is not null (type: boolean)
>               Statistics: Num rows: 4602 Data size: 2
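The simplification requested here amounts to factoring conjuncts that appear in every disjunct (such as {{ca_country = 'United States'}} in the query above) out of the OR, so they become a single filter that is pushable to the scan. A small illustrative Python sketch of that transform; the predicate representation is invented for the example and is not Hive's internal one:

```python
# Each disjunct is modeled as a frozenset of atomic conjuncts (plain strings).
# (A and X) or (A and Y) or (A and Z)  ==>  A and (X or Y or Z)

def factor_common(disjuncts):
    """Split a disjunction of conjunctions into (common, residual) parts.

    `common` holds conjuncts present in every disjunct (pushable to the scan);
    `residual` is what remains of each disjunct after factoring.
    """
    common = frozenset.intersection(*disjuncts)
    residual = [d - common for d in disjuncts]
    return common, residual

# Toy version of the ca_state / ws_net_profit disjunction from the query above.
disjuncts = [
    frozenset({"ca_country = 'United States'", "ca_state in ('KY','GA','NM')"}),
    frozenset({"ca_country = 'United States'", "ca_state in ('MT','OR','IN')"}),
    frozenset({"ca_country = 'United States'", "ca_state in ('WI','MO','WV')"}),
]
common, residual = factor_common(disjuncts)
print(common)  # only the country predicate is common to all three disjuncts
```

The factored `common` part can then be applied at the TableScan, while the `residual` disjunction stays as a post-scan filter.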
[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements
[ https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564244#comment-14564244 ] Gopal V commented on HIVE-10841:

can you try this with set hive.optimize.ppd=false; ?
[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements
[ https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564243#comment-14564243 ] Alexander Pivovarov commented on HIVE-10841:

I found that "JOIN FR" can be removed - the result will still be 3 rows. But adding or removing "JOIN PI" changes the Filter Operator predicate for the acct table: if we remove "JOIN PI", then the acct table's Filter Operator predicate has "brn is not null" and the query returns 1 row.

{code}
acct
  TableScan
    alias: acct
    Statistics: Num rows: 5 Data size: 63 Basic stats: COMPLETE Column stats: NONE
    Filter Operator
      predicate: (aid is not null and brn is not null) (type: boolean)
{code}

How can removing "JOIN PI" change the Filter Operator predicate for the acct table? The query below returns 1 row, and its plan has the "brn is not null" predicate in the Filter Operator for the acct table. But if we remove the comment before "JOIN PI", the plan will not have the "brn is not null" predicate.

{code}
explain
select acct.ACC_N, acct.brn
FROM L
JOIN LA ON L.id = LA.loan_id
JOIN A ON LA.aid = A.id
--JOIN PI ON PI.id = LA.pi_id
JOIN acct ON A.id = acct.aid
WHERE L.id = 4436 and acct.brn is not null;
{code}
[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements
[ https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564229#comment-14564229 ] Alexander Pivovarov commented on HIVE-10841:

if we change "is not null" to "is null" or to " = 122" then all 3 rows will be NULL or 122 for column "brn" (second column).

and acct.brn is null;
{code}
10	NULL
NULL	NULL
NULL	NULL
{code}
and acct.brn = 122;
{code}
10	122
NULL	122
NULL	122
{code}
and acct.brn is null;
{code}
10	122
NULL	NULL
NULL	NULL
{code}

> [WHERE col is not null] does not work sometimes for queries with many JOIN statements
> -
>
> Key: HIVE-10841
> URL: https://issues.apache.org/jira/browse/HIVE-10841
> Project: Hive
> Issue Type: Bug
> Components: Query Planning, Query Processor
> Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0
> Reporter: Alexander Pivovarov
>
> The result from the following SELECT query is 3 rows but it should be 1 row.
> I checked it in MySQL - it returned 1 row.
> To reproduce the issue in Hive
> 1. prepare tables
> {code}
> drop table if exists L;
> drop table if exists LA;
> drop table if exists FR;
> drop table if exists A;
> drop table if exists PI;
> drop table if exists acct;
> create table L as select 4436 id;
> create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id;
> create table FR as select 4436 loan_id;
> create table A as select 4748 id;
> create table PI as select 4415 id;
> create table acct as select 4748 aid, 10 acc_n, 122 brn;
> insert into table acct values(4748, null, null);
> insert into table acct values(4748, null, null);
> {code}
> 2. run SELECT query
> {code}
> select
>   acct.ACC_N,
>   acct.brn
> FROM L
> JOIN LA ON L.id = LA.loan_id
> JOIN FR ON L.id = FR.loan_id
> JOIN A ON LA.aid = A.id
> JOIN PI ON PI.id = LA.pi_id
> JOIN acct ON A.id = acct.aid
> WHERE
>   L.id = 4436
>   and acct.brn is not null;
> {code}
> the result is 3 rows
> {code}
> 10	122
> NULL	NULL
> NULL	NULL
> {code}
> but it should be 1 row
> {code}
> 10	122
> {code}
> 2.1 "explain select ..." output for hive-1.3.0 MR
> {code}
> STAGE DEPENDENCIES:
>   Stage-12 is a root stage
>   Stage-9 depends on stages: Stage-12
>   Stage-0 depends on stages: Stage-9
> STAGE PLANS:
>   Stage: Stage-12
>     Map Reduce Local Work
>       Alias -> Map Local Tables:
>         a
>           Fetch Operator
>             limit: -1
>         acct
>           Fetch Operator
>             limit: -1
>         fr
>           Fetch Operator
>             limit: -1
>         l
>           Fetch Operator
>             limit: -1
>         pi
>           Fetch Operator
>             limit: -1
>       Alias -> Map Local Operator Tree:
>         a
>           TableScan
>             alias: a
>             Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
>             Filter Operator
>               predicate: id is not null (type: boolean)
>               Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
>               HashTable Sink Operator
>                 keys:
>                   0 _col5 (type: int)
>                   1 id (type: int)
>                   2 aid (type: int)
>         acct
>           TableScan
>             alias: acct
>             Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE Column stats: NONE
>             Filter Operator
>               predicate: aid is not null (type: boolean)
>               Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE Column stats: NONE
>               HashTable Sink Operator
>                 keys:
>                   0 _col5 (type: int)
>                   1 id (type: int)
>                   2 aid (type: int)
>         fr
>           TableScan
>             alias: fr
>             Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
>             Filter Operator
>               predicate: (loan_id = 4436) (type: boolean)
>               Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
>               HashTable Sink Operator
>                 keys:
>                   0 4436 (type: int)
>                   1 4436 (type: int)
>                   2 4436 (type: int)
>         l
>           TableScan
>             alias: l
>             Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
>             Filter Operator
>               predicate: (id = 4436) (type: boolean)
>               Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
>               HashTable Sink Operator
>                 keys:
>                   0 4436 (type: int)
>                   1 4436 (type: int)
>                   2 4436 (type:
[jira] [Commented] (HIVE-9069) Simplify filter predicates for CBO
[ https://issues.apache.org/jira/browse/HIVE-9069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564227#comment-14564227 ] Hive QA commented on HIVE-9069: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12735925/HIVE-9069.17.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4085/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4085/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4085/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]] + export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-4085/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ git = \s\v\n ]] + [[ git = \g\i\t ]] 
+ [[ -z master ]] + [[ -d apache-github-source-source ]] + [[ ! -d apache-github-source-source/.git ]] + [[ ! -d apache-github-source-source ]] + cd apache-github-source-source + git fetch origin >From https://github.com/apache/hive 4610084..aafd586 spark -> origin/spark + git reset --hard HEAD HEAD is now at 52221a7 HIVE-10684: Fix the unit test failures for HIVE-7553 after HIVE-10674 removed the binary jar files(Ferdinand Xu, reviewed by Hari Sankar Sivarama Subramaniyan and Sushanth Sowmyan) + git clean -f -d Removing common/src/java/org/apache/hadoop/hive/common/JvmPauseMonitor.java Removing common/src/java/org/apache/hadoop/hive/common/metrics/LegacyMetrics.java Removing common/src/java/org/apache/hadoop/hive/common/metrics/common/ Removing common/src/java/org/apache/hadoop/hive/common/metrics/metrics2/ Removing common/src/test/org/apache/hadoop/hive/common/metrics/TestLegacyMetrics.java Removing common/src/test/org/apache/hadoop/hive/common/metrics/metrics2/ Removing itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestMetaStoreMetrics.java + git checkout master Already on 'master' + git reset --hard origin/master HEAD is now at 52221a7 HIVE-10684: Fix the unit test failures for HIVE-7553 after HIVE-10674 removed the binary jar files(Ferdinand Xu, reviewed by Hari Sankar Sivarama Subramaniyan and Sushanth Sowmyan) + git merge --ff-only origin/master Already up-to-date. + git gc + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. 
ATTACHMENT ID: 12735925 - PreCommit-HIVE-TRUNK-Build

> Simplify filter predicates for CBO
> -
>
> Key: HIVE-9069
> URL: https://issues.apache.org/jira/browse/HIVE-9069
> Project: Hive
> Issue Type: Bug
> Components: CBO
> Affects Versions: 0.14.0
> Reporter: Mostafa Mokhtar
> Assignee: Jesus Camacho Rodriguez
> Fix For: 0.14.1
>
> Attachments: HIVE-9069.01.patch, HIVE-9069.02.patch, HIVE-9069.03.patch, HIVE-9069.04.patch, HIVE-9069.05.patch, HIVE-9069.06.patch, HIVE-9069.07.patch, HIVE-9069.08.patch, HIVE-9069.08.patch, HIVE-9069.09.patch, HIVE-9069.10.patch, HIVE-9069.11.patch, HIVE-9069.12.patch, HIVE-9069.13.patch, HIVE-9069.14.patch, HIVE-9069.14.patch, HIVE-9069.15.patch, HIVE-9069.16.patch, HIVE-9069.17.patch, HIVE-9069.17.patch, HIVE-9069.patch
>
> Simplify predicates for disjunctive predicates so that they can get pushed down to the scan.
> Looks like this
[jira] [Commented] (HIVE-10761) Create codahale-based metrics system for Hive
[ https://issues.apache.org/jira/browse/HIVE-10761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564222#comment-14564222 ] Hive QA commented on HIVE-10761:

{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12736002/HIVE-10761.5.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 8983 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_fold_case
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx_cbo_2
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4082/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4082/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4082/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12736002 - PreCommit-HIVE-TRUNK-Build

> Create codahale-based metrics system for Hive
> -
>
> Key: HIVE-10761
> URL: https://issues.apache.org/jira/browse/HIVE-10761
> Project: Hive
> Issue Type: New Feature
> Components: Diagnosability
> Reporter: Szehon Ho
> Assignee: Szehon Ho
> Attachments: HIVE-10761.2.patch, HIVE-10761.3.patch, HIVE-10761.4.patch, HIVE-10761.5.patch, HIVE-10761.patch, hms-metrics.json
>
> There is a current Hive metrics system that hooks up to JMX reporting, but all its measurements and models are custom.
> This is to make another metrics system that will be based on Codahale (ie yammer, dropwizard), which has the following advantages:
> * Well-defined metric model for frequently-needed metrics (ie JVM metrics)
> * Well-defined measurements for all metrics (ie max, mean, stddev, mean_rate, etc)
> * Built-in reporting frameworks like JMX, Console, Log, JSON webserver
> It is used by many projects, including several Apache projects like Oozie.
> Overall, monitoring tools should find it easier to understand these common metric, measurement, and reporting models.
> The existing metric subsystem will be kept and can be enabled if backward compatibility is desired.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
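For readers unfamiliar with the measurement model described above, here is a minimal, hypothetical Python sketch. It is not the Codahale/Dropwizard API (which is Java); it only illustrates the idea that every metric exposes the same fixed set of well-defined measurements (count, max, mean, stddev):

```python
import statistics

class Timer:
    """Toy illustration of a Codahale-style metric: every metric
    exposes a fixed, well-defined set of statistics via a snapshot."""

    def __init__(self):
        self.samples = []

    def update(self, seconds):
        # Record one timed duration in seconds.
        self.samples.append(seconds)

    def snapshot(self):
        # Return the same measurement set for every metric instance.
        return {
            "count": len(self.samples),
            "max": max(self.samples),
            "mean": statistics.mean(self.samples),
            "stddev": statistics.stdev(self.samples) if len(self.samples) > 1 else 0.0,
        }

t = Timer()
for s in (0.1, 0.2, 0.3):
    t.update(s)
snap = t.snapshot()
```

A reporter (JMX, console, log) would then only need to understand this one snapshot shape, regardless of which metric it is reading.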
[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements
[ https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564215#comment-14564215 ] Alexander Pivovarov commented on HIVE-10841: hive-0.12.0 plan {code} ABSTRACT SYNTAX TREE: (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_JOIN (TOK_JOIN (TOK_JOIN (TOK_JOIN (TOK_TABREF (TOK_TABNAME L)) (TOK_TABREF (TOK_TABNAME LA)) (= (. (TOK_TABLE_OR_COL L) id) (. (TOK_TABLE_OR_COL LA) loan_id))) (TOK_TABREF (TOK_TABNAME FR)) (= (. (TOK_TABLE_OR_COL L) id) (. (TOK_TABLE_OR_COL FR) loan_id))) (TOK_TABREF (TOK_TABNAME A)) (= (. (TOK_TABLE_OR_COL LA) aid) (. (TOK_TABLE_OR_COL A) id))) (TOK_TABREF (TOK_TABNAME PI)) (= (. (TOK_TABLE_OR_COL PI) id) (. (TOK_TABLE_OR_COL LA) pi_id))) (TOK_TABREF (TOK_TABNAME acct)) (= (. (TOK_TABLE_OR_COL A) id) (. (TOK_TABLE_OR_COL acct) aid (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL acct) ACC_N)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL acct) brn))) (TOK_WHERE (and (= (. (TOK_TABLE_OR_COL L) id) 4436) (TOK_FUNCTION TOK_ISNOTNULL (. 
(TOK_TABLE_OR_COL acct) brn)) STAGE DEPENDENCIES: Stage-11 is a root stage Stage-8 depends on stages: Stage-11 Stage-0 is a root stage STAGE PLANS: Stage: Stage-11 Map Reduce Local Work Alias -> Map Local Tables: a Fetch Operator limit: -1 acct Fetch Operator limit: -1 fr Fetch Operator limit: -1 l Fetch Operator limit: -1 pi Fetch Operator limit: -1 Alias -> Map Local Operator Tree: a TableScan alias: a HashTable Sink Operator condition expressions: 0 {_col5} 1 2 {acc_n} {brn} handleSkewJoin: false keys: 0 [Column[_col4]] 1 [Column[id]] 2 [Column[aid]] Position of Big Table: 0 acct TableScan alias: acct Filter Operator predicate: expr: brn is not null type: boolean HashTable Sink Operator condition expressions: 0 {_col5} 1 2 {acc_n} {brn} handleSkewJoin: false keys: 0 [Column[_col4]] 1 [Column[id]] 2 [Column[aid]] Position of Big Table: 0 fr TableScan alias: fr Filter Operator predicate: expr: (loan_id = 4436) type: boolean HashTable Sink Operator condition expressions: 0 1 {aid} {pi_id} 2 handleSkewJoin: false keys: 0 [Column[id]] 1 [Column[loan_id]] 2 [Column[loan_id]] Position of Big Table: 1 l TableScan alias: l Filter Operator predicate: expr: (id = 4436) type: boolean HashTable Sink Operator condition expressions: 0 1 {aid} {pi_id} 2 handleSkewJoin: false keys: 0 [Column[id]] 1 [Column[loan_id]] 2 [Column[loan_id]] Position of Big Table: 1 pi TableScan alias: pi HashTable Sink Operator condition expressions: 0 {_col15} {_col16} 1 handleSkewJoin: false keys: 0 [Column[_col2]] 1 [Column[id]] Position of Big Table: 0 Stage: Stage-8 Map Reduce Alias -> Map Operator Tree: la TableScan alias: la Filter Operator predicate: expr: (loan_id = 4436) type: boolean Map Join Operator condition map: Inner Join 0 to 1 Inner Join 0 to 2 condition expressions: 0 1 {aid} {pi_id} 2 handleSkewJoin: false keys: 0 [Column[id]] 1 [Column[loan_id]] 2 [Column[loan_id]] outputColumnNames: _col4, _col5 Position of Big Table: 1 Map Join Operator condition map: Inner Join 0 to 1 Inner 
Join 1 to 2
[jira] [Updated] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements
[ https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-10841:
---
Component/s: Query Planning

> [WHERE col is not null] does not work sometimes for queries with many JOIN statements
[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements
[ https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564213#comment-14564213 ] Alexander Pivovarov commented on HIVE-10841:

if we look at the hive-0.12 plan then we can see that it has the "brn is not null" predicate in a Filter Operator:
{code}
acct
  TableScan
    alias: acct
    Filter Operator
      predicate:
          expr: brn is not null
          type: boolean
{code}
But in the hive-1.3.0 plan I do not see "brn" at all. It only has "predicate: aid is not null" for the acct table. Does it mean that the hive-1.3.0 plan is wrong?

I checked the ppd folder diff between 0.12.0 and 0.13.0. There were two fixes:
HIVE-4293: Predicates following UDTF operator are removed by PPD
HIVE-5411: Migrate expression serialization to Kryo

> [WHERE col is not null] does not work sometimes for queries with many JOIN statements
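As a sanity check on what the query *should* return, the join-and-filter semantics can be simulated outside Hive. The table contents below mirror the reproduction steps quoted in the bug description (this is only a reference simulation, not Hive's execution path):

```python
# Stand-in data matching the repro tables in the bug report.
L = [{"id": 4436}]
LA = [{"loan_id": 4436, "aid": 4748, "pi_id": 4415}]
FR = [{"loan_id": 4436}]
A = [{"id": 4748}]
PI = [{"id": 4415}]
acct = [
    {"aid": 4748, "acc_n": 10, "brn": 122},
    {"aid": 4748, "acc_n": None, "brn": None},
    {"aid": 4748, "acc_n": None, "brn": None},
]

# Inner joins on the same keys as the SELECT, then the WHERE clause:
# L.id = 4436 AND acct.brn IS NOT NULL.
result = [
    (ac["acc_n"], ac["brn"])
    for l in L
    for la in LA if l["id"] == la["loan_id"]
    for fr in FR if l["id"] == fr["loan_id"]
    for a in A if la["aid"] == a["id"]
    for pi in PI if pi["id"] == la["pi_id"]
    for ac in acct if a["id"] == ac["aid"]
    if l["id"] == 4436 and ac["brn"] is not None
]
print(result)  # correct semantics keep only the non-NULL row: [(10, 122)]
```

Only one of the three matching acct rows survives the `brn is not null` filter, which is why the correct answer is a single row, not three.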
[jira] [Assigned] (HIVE-10863) Merge trunk to Spark branch 5/28/2015 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deepesh Khandelwal reassigned HIVE-10863: - Assignee: Deepesh Khandelwal (was: Xuefu Zhang) > Merge trunk to Spark branch 5/28/2015 [Spark Branch] > > > Key: HIVE-10863 > URL: https://issues.apache.org/jira/browse/HIVE-10863 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Xuefu Zhang >Assignee: Deepesh Khandelwal > Attachments: HIVE-10863.0-spark.patch, mj.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10863) Merge trunk to Spark branch 5/28/2015 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-10863: --- Attachment: HIVE-10863.0-spark.patch Patch #0 is a dummy patch to trigger a test run. > Merge trunk to Spark branch 5/28/2015 [Spark Branch] > > > Key: HIVE-10863 > URL: https://issues.apache.org/jira/browse/HIVE-10863 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Xuefu Zhang >Assignee: Xuefu Zhang > Attachments: HIVE-10863.0-spark.patch, mj.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10863) Merge trunk to Spark branch 5/28/2015 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-10863: --- Attachment: mj.patch > Merge trunk to Spark branch 5/28/2015 [Spark Branch] > > > Key: HIVE-10863 > URL: https://issues.apache.org/jira/browse/HIVE-10863 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Xuefu Zhang >Assignee: Xuefu Zhang > Attachments: mj.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10863) Merge trunk to Spark branch 5/28/2015 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564198#comment-14564198 ] Xuefu Zhang commented on HIVE-10863: Unfortunately, the patch is too big to be attached here. I had to commit the merge and fix anything later on. There are conflicts, as shown below: {code} Conflicts: pom.xml ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java ql/src/test/results/clientpositive/runtime_skewjoin_mapjoin_spark.q.out ql/src/test/results/clientpositive/spark/cbo_gby.q.out ql/src/test/results/clientpositive/spark/cbo_simple_select.q.out ql/src/test/results/clientpositive/spark/cbo_udf_udaf.q.out ql/src/test/results/clientpositive/spark/runtime_skewjoin_mapjoin_spark.q.out ql/src/test/results/clientpositive/spark/union12.q.out ql/src/test/results/clientpositive/spark/union17.q.out ql/src/test/results/clientpositive/spark/union20.q.out ql/src/test/results/clientpositive/spark/union21.q.out ql/src/test/results/clientpositive/spark/union22.q.out ql/src/test/results/clientpositive/spark/union24.q.out ql/src/test/results/clientpositive/spark/union26.q.out ql/src/test/results/clientpositive/spark/union27.q.out ql/src/test/results/clientpositive/spark/union31.q.out ql/src/test/results/clientpositive/spark/union32.q.out ql/src/test/results/clientpositive/spark/union34.q.out ql/src/test/results/clientpositive/spark/union_lateralview.q.out ql/src/test/results/clientpositive/spark/union_remove_12.q.out ql/src/test/results/clientpositive/spark/union_remove_13.q.out ql/src/test/results/clientpositive/spark/union_remove_14.q.out ql/src/test/results/clientpositive/spark/union_remove_22.q.out 
ql/src/test/results/clientpositive/spark/union_remove_23.q.out
ql/src/test/results/clientpositive/spark/union_remove_6_subq.q.out
ql/src/test/results/clientpositive/spark/union_top_level.q.out
service/src/java/org/apache/hive/service/cli/thrift/ThriftBinaryCLIService.java
spark-client/src/main/java/org/apache/hive/spark/client/JobContextImpl.java
spark-client/src/main/java/org/apache/hive/spark/client/SparkClientUtilities.java
{code}
I resolved most of them, except that some changes from the Spark branch were lost. The diff is shown in the attached mj.patch file. [~jxiang], could you take a look and see how to apply the diff? We will need to watch the test results and fix failures as needed.

> Merge trunk to Spark branch 5/28/2015 [Spark Branch]
> -
>
> Key: HIVE-10863
> URL: https://issues.apache.org/jira/browse/HIVE-10863
> Project: Hive
> Issue Type: Sub-task
> Components: Spark
> Reporter: Xuefu Zhang
> Assignee: Xuefu Zhang
> Attachments: mj.patch
>
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10853) Create ExplainTask in ATS hook through ExplainWork
[ https://issues.apache.org/jira/browse/HIVE-10853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564193#comment-14564193 ] Gunther Hagleitner commented on HIVE-10853: --- Minor nit: would be nice to comment all the bool flags in setting up an instance of explain work. +1 > Create ExplainTask in ATS hook through ExplainWork > -- > > Key: HIVE-10853 > URL: https://issues.apache.org/jira/browse/HIVE-10853 > Project: Hive > Issue Type: Bug >Reporter: Gunther Hagleitner >Assignee: Pengcheng Xiong > Attachments: HIVE-10853.01.patch > > > Right now ExplainTask is created directly. That's fragile and can lead to > stuff like: HIVE-10829 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements
[ https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-10841:
---
Affects Version/s: 0.13.0

> [WHERE col is not null] does not work sometimes for queries with many JOIN statements
[jira] [Updated] (HIVE-10821) Beeline-CLI: Implement CLI source command using Beeline functionality
[ https://issues.apache.org/jira/browse/HIVE-10821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu updated HIVE-10821: Attachment: (was: HIVE-10821.1-beeline-cli.patch) > Beeline-CLI: Implement CLI source command using Beeline functionality > - > > Key: HIVE-10821 > URL: https://issues.apache.org/jira/browse/HIVE-10821 > Project: Hive > Issue Type: Sub-task > Components: CLI >Reporter: Ferdinand Xu >Assignee: Ferdinand Xu > Attachments: HIVE-10821.1-beeline-cli.patch, > HIVE-10821.1-beeline-cli.patch, HIVE-10821.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10821) Beeline-CLI: Implement CLI source command using Beeline functionality
[ https://issues.apache.org/jira/browse/HIVE-10821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu updated HIVE-10821: Attachment: HIVE-10821.1-beeline-cli.patch Thanks [~chinnalalam] for review. Update the patch addressing your comments. > Beeline-CLI: Implement CLI source command using Beeline functionality > - > > Key: HIVE-10821 > URL: https://issues.apache.org/jira/browse/HIVE-10821 > Project: Hive > Issue Type: Sub-task > Components: CLI >Reporter: Ferdinand Xu >Assignee: Ferdinand Xu > Attachments: HIVE-10821.1-beeline-cli.patch, > HIVE-10821.1-beeline-cli.patch, HIVE-10821.1-beeline-cli.patch, > HIVE-10821.1-beeline-cli.patch, HIVE-10821.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10821) Beeline-CLI: Implement CLI source command using Beeline functionality
[ https://issues.apache.org/jira/browse/HIVE-10821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu updated HIVE-10821: Attachment: HIVE-10821.1-beeline-cli.patch Thanks [~chinnalalam] for review. Update the patch addressing your comments. > Beeline-CLI: Implement CLI source command using Beeline functionality > - > > Key: HIVE-10821 > URL: https://issues.apache.org/jira/browse/HIVE-10821 > Project: Hive > Issue Type: Sub-task > Components: CLI >Reporter: Ferdinand Xu >Assignee: Ferdinand Xu > Attachments: HIVE-10821.1-beeline-cli.patch, > HIVE-10821.1-beeline-cli.patch, HIVE-10821.1-beeline-cli.patch, > HIVE-10821.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7767) hive.optimize.union.remove does not work properly [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-7767: -- Fix Version/s: (was: spark-branch) 1.1.0 > hive.optimize.union.remove does not work properly [Spark Branch] > > > Key: HIVE-7767 > URL: https://issues.apache.org/jira/browse/HIVE-7767 > Project: Hive > Issue Type: Sub-task >Reporter: Na Yang >Assignee: Na Yang > Fix For: 1.1.0 > > Attachments: HIVE-7767.1-spark.patch, HIVE-7767.2-spark.patch, > HIVE-7767.2-spark.patch, HIVE-7767.3-spark.patch > > > Turning on the hive.optimize.union.remove property generates a wrong union all > result. > For example: > {noformat} > create table inputTbl1(key string, val string) stored as textfile; > load data local inpath '../../data/files/T1.txt' into table inputTbl1; > SELECT * > FROM ( > SELECT key, count(1) as values from inputTbl1 group by key > UNION ALL > SELECT key, count(1) as values from inputTbl1 group by key > ) a; > {noformat} > when the hive.optimize.union.remove is turned on, the query result is like: > {noformat} > 1 1 > 2 1 > 3 1 > 7 1 > 8 2 > {noformat} > when the hive.optimize.union.remove is turned off, the query result is like: > {noformat} > 7 1 > 2 1 > 8 2 > 3 1 > 1 1 > 7 1 > 2 1 > 8 2 > 3 1 > 1 1 > {noformat} > The expected query result is: > {noformat} > 7 1 > 2 1 > 8 2 > 3 1 > 1 1 > 7 1 > 2 1 > 8 2 > 3 1 > 1 1 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9267) Ensure custom UDF works with Spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9267: -- Fix Version/s: (was: spark-branch) 1.1.0 > Ensure custom UDF works with Spark [Spark Branch] > - > > Key: HIVE-9267 > URL: https://issues.apache.org/jira/browse/HIVE-9267 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Xuefu Zhang >Assignee: Xuefu Zhang > Fix For: 1.1.0 > > Attachments: HIVE-9267.1-spark.patch > > > Create or add auto qtest if necessary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8352) Enable windowing.q for spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-8352: -- Fix Version/s: (was: spark-branch) 1.1.0 > Enable windowing.q for spark [Spark Branch] > --- > > Key: HIVE-8352 > URL: https://issues.apache.org/jira/browse/HIVE-8352 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Brock Noland >Assignee: Jimmy Xiang >Priority: Minor > Fix For: 1.1.0 > > Attachments: HIVE-8352.1-spark.patch, HIVE-8352.1-spark.patch, > hive-8385.patch > > > We should enable windowing.q for basic windowing coverage. After checking out > the spark branch, we would build: > {noformat} > $ mvn clean install -DskipTests -Phadoop-2 > $ cd itests/ > $ mvn clean install -DskipTests -Phadoop-2 > {noformat} > Then generate the windowing.q.out file: > {noformat} > $ cd qtest-spark/ > $ mvn test -Dtest=TestSparkCliDriver -Dqfile=windowing.q -Phadoop-2 > -Dtest.output.overwrite=true > {noformat} > Compare the output against MapReduce: > {noformat} > $ diff -y -W 150 > ../../ql/src/test/results/clientpositive/spark/windowing.q.out > ../../ql/src/test/results/clientpositive/windowing.q.out| less > {noformat} > And if everything looks good, add it to {{spark.query.files}} in > {{./itests/src/test/resources/testconfiguration.properties}} > then submit the patch including the .q file -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9040) Spark Memory can be formatted string [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9040: -- Fix Version/s: (was: spark-branch) 1.1.0 > Spark Memory can be formatted string [Spark Branch] > --- > > Key: HIVE-9040 > URL: https://issues.apache.org/jira/browse/HIVE-9040 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Brock Noland >Assignee: Jimmy Xiang > Fix For: 1.1.0 > > Attachments: HIVE-9040.1-spark.patch, HIVE-9040.2-spark.patch, > HIVE-9040.3-spark.patch > > > Here: > https://github.com/apache/hive/blob/spark/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionImpl.java#L72 > we call {{getInt}} on {{spark.executor.memory}} but this is a formatted > string, example here: http://spark.apache.org/docs/1.0.1/configuration.html > as such, I get: > {noformat} > 2014-12-08 03:04:48,114 WARN [HiveServer2-Handler-Pool: Thread-34]: > spark.SetSparkReducerParallelism > (SetSparkReducerParallelism.java:process(141)) - Failed to create spark > client. > java.lang.NumberFormatException: For input string: "23000m" > at > java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) > at java.lang.Integer.parseInt(Integer.java:492) > at java.lang.Integer.parseInt(Integer.java:527) > at > scala.collection.immutable.StringLike$class.toInt(StringLike.scala:229) > at scala.collection.immutable.StringOps.toInt(StringOps.scala:31) > at > org.apache.spark.SparkConf$$anonfun$getInt$2.apply(SparkConf.scala:184) > at > org.apache.spark.SparkConf$$anonfun$getInt$2.apply(SparkConf.scala:184) > at scala.Option.map(Option.scala:145) > at org.apache.spark.SparkConf.getInt(SparkConf.scala:184) > at > org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.getMemoryAndCores(SparkSessionImpl.java:72) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
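The failure above comes from calling {{SparkConf.getInt()}} on a value that carries a unit suffix. A minimal sketch of the kind of parsing the fix needs (an illustrative helper, not the actual Hive patch; the no-suffix case is an assumption made for this sketch):

```java
public class SparkMemoryParser {
    // Parse a Spark-style memory setting such as "23000m", "2g" or "512k"
    // into megabytes. Calling SparkConf.getInt() directly on such a value
    // throws NumberFormatException, as shown in the stack trace above.
    static long toMegabytes(String memory) {
        String s = memory.trim().toLowerCase();
        char unit = s.charAt(s.length() - 1);
        if (Character.isDigit(unit)) {
            // No suffix: treat the bare number as megabytes (an assumption
            // made for this sketch, not Spark's documented behavior).
            return Long.parseLong(s);
        }
        long value = Long.parseLong(s.substring(0, s.length() - 1));
        switch (unit) {
            case 'k': return value / 1024;
            case 'm': return value;
            case 'g': return value * 1024L;
            case 't': return value * 1024L * 1024L;
            default:
                throw new IllegalArgumentException("Unknown memory suffix in: " + memory);
        }
    }
}
```

With a helper along these lines, the "23000m" value from the log parses cleanly instead of failing in {{Integer.parseInt}}.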
[jira] [Updated] (HIVE-8074) Merge trunk into spark 9/12/2014
[ https://issues.apache.org/jira/browse/HIVE-8074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-8074: -- Fix Version/s: (was: spark-branch) 1.1.0 > Merge trunk into spark 9/12/2014 > > > Key: HIVE-8074 > URL: https://issues.apache.org/jira/browse/HIVE-8074 > Project: Hive > Issue Type: Sub-task > Components: Spark >Affects Versions: spark-branch >Reporter: Brock Noland >Assignee: Brock Noland > Fix For: 1.1.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7382) Create a MiniSparkCluster and set up a testing framework [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-7382: -- Fix Version/s: (was: spark-branch) 1.1.0 > Create a MiniSparkCluster and set up a testing framework [Spark Branch] > --- > > Key: HIVE-7382 > URL: https://issues.apache.org/jira/browse/HIVE-7382 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Xuefu Zhang >Assignee: Rui Li > Labels: Spark-M1 > Fix For: 1.1.0 > > > To automatically test Hive functionality over the Spark execution engine, we need > to create a test framework that can execute Hive queries with Spark as the > backend. For that, we should create a MiniSparkCluster, similar to > other execution engines. > Spark has a way to create a local cluster with a few processes in the local > machine, each process being a worker node. It's fairly close to a real Spark > cluster. Our mini cluster can be based on that. > For more info, please refer to the design doc on the wiki. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8843) Release RDD cache when Hive query is done [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-8843: -- Fix Version/s: (was: spark-branch) 1.1.0 > Release RDD cache when Hive query is done [Spark Branch] > > > Key: HIVE-8843 > URL: https://issues.apache.org/jira/browse/HIVE-8843 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Xuefu Zhang >Assignee: Jimmy Xiang > Fix For: 1.1.0 > > Attachments: HIVE-8843.1-spark.patch, HIVE-8843.2-spark.patch, > HIVE-8843.3-spark.patch, HIVE-8843.3-spark.patch > > > In some multi-insert cases, RDD.cache() is called to improve performance. RDD > is SparkContext specific, but the caching is useful only for the query. Thus, > once the query is executed, we need to release the cache used by calling > RDD.uncache(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
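The lifecycle issue described here (cached RDDs outliving the query that needed them, because the SparkContext is long-lived) can be sketched with a small tracker that records what a query cached and releases everything in one call when the query finishes. Class and method names below are hypothetical illustrations; in real Spark code the release call would be RDD.unpersist().

```java
import java.util.ArrayList;
import java.util.List;

public class CachedRddTracker {
    private final List<String> cached = new ArrayList<>();

    // Stand-in for rdd.cache(): remember what this query cached.
    void markCached(String rddName) {
        cached.add(rddName);
    }

    // Called when the query is done (e.g. from a finally block):
    // release every RDD this query cached and reset the tracker.
    List<String> releaseAll() {
        List<String> released = new ArrayList<>(cached);
        cached.clear(); // stand-in for calling unpersist() on each RDD
        return released;
    }

    static int demo() {
        CachedRddTracker tracker = new CachedRddTracker();
        tracker.markCached("multiInsertInput");
        tracker.markCached("sharedScan");
        int first = tracker.releaseAll().size();  // releases both entries
        int second = tracker.releaseAll().size(); // nothing left to release
        return first * 10 + second;
    }
}
```

Releasing in a finally block keeps the cache scoped to the query even when execution fails partway.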
[jira] [Updated] (HIVE-9110) Performance of SELECT COUNT(*) FROM store_sales WHERE ss_item_sk IS NOT NULL [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9110: -- Fix Version/s: (was: spark-branch) 1.1.0 > Performance of SELECT COUNT(*) FROM store_sales WHERE ss_item_sk IS NOT NULL > [Spark Branch] > --- > > Key: HIVE-9110 > URL: https://issues.apache.org/jira/browse/HIVE-9110 > Project: Hive > Issue Type: Sub-task > Components: Spark >Affects Versions: spark-branch >Reporter: Brock Noland >Assignee: Rui Li > Fix For: 1.1.0 > > > The query > {noformat} > SELECT COUNT(*) FROM store_sales WHERE ss_item_sk IS NOT NULL > {noformat} > could benefit from performance enhancements. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8908) Investigate test failure on join34.q [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-8908: -- Fix Version/s: (was: spark-branch) 1.1.0 > Investigate test failure on join34.q [Spark Branch] > --- > > Key: HIVE-8908 > URL: https://issues.apache.org/jira/browse/HIVE-8908 > Project: Hive > Issue Type: Sub-task > Components: Spark >Affects Versions: spark-branch >Reporter: Chao Sun >Assignee: Chao Sun > Fix For: 1.1.0 > > Attachments: HIVE-8908.1-spark.patch, HIVE-8908.2-spark.patch > > > For this query, the plan doesn't look correct: > {noformat} > OK > STAGE DEPENDENCIES: > Stage-4 is a root stage > Stage-1 depends on stages: Stage-5, Stage-4 > Stage-2 depends on stages: Stage-1 > Stage-0 depends on stages: Stage-2 > Stage-3 depends on stages: Stage-0 > Stage-5 is a root stage > STAGE PLANS: > Stage: Stage-4 > Spark > DagName: chao_20141118150101_a47a2d7b-e750-4764-be66-5ba95ebbe433:6 > Vertices: > Map 4 > Map Operator Tree: > TableScan > alias: x > Statistics: Num rows: 1 Data size: 216 Basic stats: > COMPLETE Column stats: NONE > Filter Operator > predicate: key is not null (type: boolean) > Statistics: Num rows: 1 Data size: 216 Basic stats: > COMPLETE Column stats: NONE > Spark HashTable Sink Operator > condition expressions: > 0 {_col1} > 1 {value} > keys: > 0 _col0 (type: string) > 1 key (type: string) > Reduce Output Operator > key expressions: key (type: string) > sort order: + > Map-reduce partition columns: key (type: string) > Statistics: Num rows: 1 Data size: 216 Basic stats: > COMPLETE Column stats: NONE > value expressions: value (type: string) > Local Work: > Map Reduce Local Work > Stage: Stage-1 > Spark > Edges: > Union 2 <- Map 1 (NONE, 0), Map 3 (NONE, 0) > DagName: chao_20141118150101_a47a2d7b-e750-4764-be66-5ba95ebbe433:4 > Vertices: > Map 1 > Map Operator Tree: > TableScan > alias: x > Filter Operator > predicate: (key < 20) (type: boolean) > Select Operator > expressions: key (type: 
string), value (type: string) > outputColumnNames: _col0, _col1 > Map Join Operator > condition map: > Inner Join 0 to 1 > condition expressions: > 0 {_col1} > 1 {key} {value} > keys: > 0 _col0 (type: string) > 1 key (type: string) > outputColumnNames: _col1, _col2, _col3 > input vertices: > 1 Map 4 > Select Operator > expressions: _col2 (type: string), _col3 (type: > string), _col1 (type: string) > outputColumnNames: _col0, _col1, _col2 > File Output Operator > compressed: false > table: > input format: > org.apache.hadoop.mapred.TextInputFormat > output format: > org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat > serde: > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe > name: default.dest_j1 > Local Work: > Map Reduce Local Work > Map 3 > Map Operator Tree: > TableScan > alias: x1 > Filter Operator > predicate: (key > 100) (type: boolean) > Select Operator > expressions: key (type: string), value (type: string) > outputColumnNames: _col0, _col1 > Map Join Operator > condition map: > Inner Join 0 to 1 > condition expressions: > 0 {_col1} > 1 {key} {value} > keys: > 0 _col0 (type: string) >
[jira] [Updated] (HIVE-8141) Refactor the GraphTran code by moving union handling logic to UnionTran [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-8141: -- Fix Version/s: (was: spark-branch) 1.1.0 > Refactor the GraphTran code by moving union handling logic to UnionTran > [Spark Branch] > -- > > Key: HIVE-8141 > URL: https://issues.apache.org/jira/browse/HIVE-8141 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Na Yang >Assignee: Na Yang > Labels: Spark-M1 > Fix For: 1.1.0 > > Attachments: HIVE-8141.1-spark.patch > > > In the current hive on spark code, union logic is handled in the GraphTran > class. The Union logic could be moved to the UnionTran class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8913) Make SparkMapJoinResolver handle runtime skew join [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-8913: -- Fix Version/s: (was: spark-branch) 1.1.0 > Make SparkMapJoinResolver handle runtime skew join [Spark Branch] > - > > Key: HIVE-8913 > URL: https://issues.apache.org/jira/browse/HIVE-8913 > Project: Hive > Issue Type: Improvement > Components: Spark >Reporter: Rui Li >Assignee: Rui Li > Fix For: 1.1.0 > > Attachments: HIVE-8913.1-spark.patch, HIVE-8913.2-spark.patch, > HIVE-8913.3-spark.patch > > > Sub-task of HIVE-8406. > Now we have {{SparkMapJoinResolver}} in place. But at the moment, it doesn't > handle the map join task created by upstream SkewJoinResolver, i.e. those > wrapped in a ConditionalTask. We have to implement this part for runtime skew > join to work on spark. To do so, we can borrow logic from {{MapJoinResolver}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8536) Enable SkewJoinResolver for spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-8536: -- Fix Version/s: (was: spark-branch) 1.1.0 > Enable SkewJoinResolver for spark [Spark Branch] > > > Key: HIVE-8536 > URL: https://issues.apache.org/jira/browse/HIVE-8536 > Project: Hive > Issue Type: Improvement > Components: Spark >Reporter: Rui Li >Assignee: Rui Li > Fix For: 1.1.0 > > Attachments: HIVE-8536.1-spark.patch, HIVE-8536.2-spark.patch, > HIVE-8536.3-spark.patch, HIVE-8536.4-spark.patch > > > Sub-task of HIVE-8406 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7748) Add qfile_regex to qtest-spark pom [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-7748: -- Fix Version/s: (was: spark-branch) 1.1.0 > Add qfile_regex to qtest-spark pom [Spark Branch] > - > > Key: HIVE-7748 > URL: https://issues.apache.org/jira/browse/HIVE-7748 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Brock Noland >Assignee: Brock Noland > Fix For: 1.1.0 > > Attachments: HIVE-7748.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7755) Enable avro* tests [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-7755: -- Fix Version/s: (was: spark-branch) 1.1.0 > Enable avro* tests [Spark Branch] > - > > Key: HIVE-7755 > URL: https://issues.apache.org/jira/browse/HIVE-7755 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Brock Noland >Assignee: Brock Noland > Fix For: 1.1.0 > > Attachments: HIVE-7755.1-spark.patch, HIVE-7755.2-spark.patch, > HIVE-7755.3-spark.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9081) Bucket mapjoin should use the new alias in posToAliasMap [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9081: -- Fix Version/s: (was: spark-branch) 1.1.0 > Bucket mapjoin should use the new alias in posToAliasMap [Spark Branch] > --- > > Key: HIVE-9081 > URL: https://issues.apache.org/jira/browse/HIVE-9081 > Project: Hive > Issue Type: Sub-task >Reporter: Jimmy Xiang >Assignee: Jimmy Xiang > Fix For: 1.1.0 > > Attachments: HIVE-9081.1-spark.patch, HIVE-9081.2-spark.patch > > > In converting a mapjoin to a bucket mapjoin, the join aliases could be > updated. So we should update the posToAliasMap accordingly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
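The fix described above, keeping posToAliasMap consistent when join aliases are renamed during the mapjoin-to-bucket-mapjoin conversion, amounts to rewriting the map's values through the rename table. A hedged illustration (the helper and names are invented for this sketch; the actual patch works on Hive's operator plan, not a standalone map):

```java
import java.util.Map;
import java.util.TreeMap;

public class PosToAliasMapUpdate {
    // Rewrite each join position's alias through the rename table,
    // leaving positions whose alias was not renamed untouched.
    static Map<Integer, String> remap(Map<Integer, String> posToAlias,
                                      Map<String, String> oldToNew) {
        Map<Integer, String> updated = new TreeMap<>();
        for (Map.Entry<Integer, String> e : posToAlias.entrySet()) {
            updated.put(e.getKey(),
                        oldToNew.getOrDefault(e.getValue(), e.getValue()));
        }
        return updated;
    }

    static String demo() {
        Map<Integer, String> posToAlias = new TreeMap<>();
        posToAlias.put(0, "a");
        posToAlias.put(1, "b");
        Map<String, String> renames = new TreeMap<>();
        renames.put("a", "a_bucket"); // alias updated by the conversion
        return remap(posToAlias, renames).toString();
    }
}
```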
[jira] [Updated] (HIVE-7708) Fix qtest-spark pom.xml reference to test properties [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-7708: -- Fix Version/s: (was: spark-branch) 1.1.0 > Fix qtest-spark pom.xml reference to test properties [Spark Branch] > --- > > Key: HIVE-7708 > URL: https://issues.apache.org/jira/browse/HIVE-7708 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Brock Noland >Assignee: Brock Noland > Fix For: 1.1.0 > > Attachments: HIVE-7708.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8029) Remove reducers number configure in SparkTask [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-8029: -- Fix Version/s: (was: spark-branch) 1.1.0 > Remove reducers number configure in SparkTask [Spark Branch] > > > Key: HIVE-8029 > URL: https://issues.apache.org/jira/browse/HIVE-8029 > Project: Hive > Issue Type: Improvement > Components: Spark >Reporter: Chengxiang Li >Assignee: Chengxiang Li > Labels: Spark-M4 > Fix For: 1.1.0 > > Attachments: HIVE-8029.1-spark.patch > > > We do not need duplicated logic to configure reducers number in SparkTask, as > SetSparkReduceParallelism would always set reducers number in compiler phase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8743) Disable MapJoin [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-8743: -- Fix Version/s: (was: spark-branch) 1.1.0 > Disable MapJoin [Spark Branch] > -- > > Key: HIVE-8743 > URL: https://issues.apache.org/jira/browse/HIVE-8743 > Project: Hive > Issue Type: Bug > Components: Spark >Reporter: Jimmy Xiang >Assignee: Jimmy Xiang >Priority: Minor > Fix For: 1.1.0 > > Attachments: HIVE-8743.1-spark.patch > > > Disable MapJoin in Spark branch for now. It is not implemented. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9054) Add additional logging to SetSparkReducerParallelism [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9054: -- Fix Version/s: (was: spark-branch) 1.1.0 > Add additional logging to SetSparkReducerParallelism [Spark Branch] > --- > > Key: HIVE-9054 > URL: https://issues.apache.org/jira/browse/HIVE-9054 > Project: Hive > Issue Type: Sub-task > Components: Spark >Affects Versions: spark-branch >Reporter: Brock Noland >Assignee: Brock Noland > Fix For: 1.1.0 > > Attachments: HIVE-9054.1-spark.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8883) Investigate test failures on auto_join30.q [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-8883: -- Fix Version/s: (was: spark-branch) 1.1.0 > Investigate test failures on auto_join30.q [Spark Branch] > - > > Key: HIVE-8883 > URL: https://issues.apache.org/jira/browse/HIVE-8883 > Project: Hive > Issue Type: Sub-task > Components: Spark >Affects Versions: spark-branch >Reporter: Chao Sun >Assignee: Chao Sun > Fix For: 1.1.0 > > Attachments: HIVE-8883.1-spark.patch, HIVE-8883.2-spark.patch, > HIVE-8883.3-spark.patch, HIVE-8883.4-spark.patch > > > This test fails with the following stack trace: > {noformat} > java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:257) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) > at > org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) > at > org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:319) > at > org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:276) > at > org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:48) > at > org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28) > at > org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:96) > at > scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) > at > org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:214) > at > org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) 
> at org.apache.spark.scheduler.Task.run(Task.scala:56) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:186) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > 2014-11-14 17:05:09,206 ERROR [Executor task launch worker-4]: > spark.SparkReduceRecordHandler > (SparkReduceRecordHandler.java:processRow(285)) - > org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while > processing row (tag=0) > {"key":{"reducesinkkey0":"val_0"},"value":{"_col0":"0"}} > at > org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:328) > at > org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:276) > at > org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:48) > at > org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28) > at > org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:96) > at > scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) > at > org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:214) > at > org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) > at org.apache.spark.scheduler.Task.run(Task.scala:56) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:186) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unexpected > exception: null > at > org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:318) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) > at > org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) > at > org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:319) > ... 14 more > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOpe
[jira] [Updated] (HIVE-9378) Spark qfile tests should reuse RSC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9378: -- Fix Version/s: (was: spark-branch) 1.1.0 > Spark qfile tests should reuse RSC [Spark Branch] > - > > Key: HIVE-9378 > URL: https://issues.apache.org/jira/browse/HIVE-9378 > Project: Hive > Issue Type: Improvement > Components: Spark >Affects Versions: spark-branch >Reporter: Jimmy Xiang >Assignee: Jimmy Xiang > Fix For: 1.1.0 > > Attachments: HIVE-9378.1-spark.patch, HIVE-9378.2-spark.patch, > HIVE-9378.3-spark.patch, HIVE-9378.4-spark.patch > > > Run several qfile tests and use jps to monitor the java processes. You will find > several SparkSubmitDriverBootstrapper processes are created (not at the same > time, of course). It seems that we create an RSC for each qfile, then > terminate it when that qfile test is done. The RSC does not seem to be shared among > qfiles. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8899) Merge from trunk to spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-8899: -- Fix Version/s: (was: spark-branch) 1.1.0 > Merge from trunk to spark [Spark Branch] > > > Key: HIVE-8899 > URL: https://issues.apache.org/jira/browse/HIVE-8899 > Project: Hive > Issue Type: Sub-task > Components: Spark >Affects Versions: spark-branch >Reporter: Brock Noland >Assignee: Brock Noland > Fix For: 1.1.0 > > Attachments: HIVE-8899.2-spark.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9192) One-pass SMB Optimizations [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9192: -- Fix Version/s: (was: spark-branch) 1.1.0 > One-pass SMB Optimizations [Spark Branch] > - > > Key: HIVE-9192 > URL: https://issues.apache.org/jira/browse/HIVE-9192 > Project: Hive > Issue Type: Sub-task >Affects Versions: spark-branch >Reporter: Szehon Ho >Assignee: Szehon Ho >Priority: Minor > Fix For: 1.1.0 > > Attachments: HIVE-9192-spark.patch > > > Currently for Spark compiler's task-generation there is a second-pass to > handle SMB joins. This might be optimized to one-pass. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7745) NullPointerException when turn on hive.optimize.union.remove, hive.merge.mapfiles and hive.merge.mapredfiles [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-7745: -- Fix Version/s: (was: spark-branch) 1.1.0 > NullPointerException when turn on hive.optimize.union.remove, > hive.merge.mapfiles and hive.merge.mapredfiles [Spark Branch] > --- > > Key: HIVE-7745 > URL: https://issues.apache.org/jira/browse/HIVE-7745 > Project: Hive > Issue Type: Bug > Components: Spark >Affects Versions: spark-branch >Reporter: Na Yang >Assignee: Na Yang > Fix For: 1.1.0 > > Attachments: HIVE-7745-spark.patch > > > When the hive.optimize.union.remove, hive.merge.mapfiles and > hive.merge.mapredfiles are turned on, it throws NullPointerException when I > do the following queries: > {noformat} > create table inputTbl1(key string, val string) stored as textfile; > create table outputTbl1(key string, values bigint) stored as rcfile; > load data local inpath '../../data/files/T1.txt' into table inputTbl1; > explain > insert overwrite table outputTbl1 > SELECT * FROM > ( > select key, count(1) as values from inputTbl1 group by key > union all > select * FROM ( > SELECT key, 1 as values from inputTbl1 > UNION ALL > SELECT key, 2 as values from inputTbl1 > ) a > )b; > {noformat} > If the hive.merge.mapfiles and hive.merge.mapredfiles are turned off, I do > not see any error. 
> Here is the stack trace: > {noformat} > 2014-08-16 01:32:26,849 ERROR [main]: ql.Driver > (SessionState.java:printError(681)) - FAILED: NullPointerException null > java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.optimizer.GenMapRedUtils.createMoveTask(GenMapRedUtils.java:1738) > at > org.apache.hadoop.hive.ql.parse.spark.GenSparkUtils.processFileSink(GenSparkUtils.java:281) > at > org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.generateTaskTree(SparkCompiler.java:187) > at > org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:199) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9508) > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:208) > at > org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74) > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:208) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:414) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:310) > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1005) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1070) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:942) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:932) > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:246) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:198) > at > org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:408) > at > org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:781) > at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675) > at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at org.apache.hadoop.util.RunJar.main(RunJar.java:197) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8783) Create some tests that use Spark counter for stats collection [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-8783: -- Fix Version/s: (was: spark-branch) 1.1.0 > Create some tests that use Spark counter for stats collection [Spark Branch] > > > Key: HIVE-8783 > URL: https://issues.apache.org/jira/browse/HIVE-8783 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Xuefu Zhang >Assignee: Chengxiang Li > Fix For: 1.1.0 > > Attachments: HIVE-8783.1-spark.patch, HIVE-8783.2-spark.patch, > HIVE-8783.2-spark.patch > > > Currently when .q tests are run with Spark, the default stats collection is > "fs". We need to have some tests that use Spark counter for stats collection > to enhance coverage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9517) UNION ALL query failed with ArrayIndexOutOfBoundsException [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9517: -- Fix Version/s: (was: spark-branch) 1.1.0 > UNION ALL query failed with ArrayIndexOutOfBoundsException [Spark Branch] > - > > Key: HIVE-9517 > URL: https://issues.apache.org/jira/browse/HIVE-9517 > Project: Hive > Issue Type: Sub-task >Affects Versions: spark-branch >Reporter: Chao Sun >Assignee: Chao Sun > Fix For: 1.1.0 > > Attachments: HIVE-9517.1.patch, HIVE-9517.2.patch > > > I was running a query from cbo_gby_empty.q: > {code} > select unionsrc.key, unionsrc.value FROM (select 'max' as key, max(c_int) as > value from cbo_t3 s1 > UNION ALL > select 'min' as key, min(c_int) as value from cbo_t3 s2 > UNION ALL > select 'avg' as key, avg(c_int) as value from cbo_t3 s3) unionsrc > order by unionsrc.key; > {code} > and got the following exception: > {noformat} > 2015-01-29 15:57:55,948 ERROR [Executor task launch worker-1]: > spark.SparkReduceRecordHandler > (SparkReduceRecordHandler.java:processRow(299)) - Fatal error: > org.apache.hadoop.hive.ql.metadata.HiveException: Error while processing row > (tag=0) {"key":{"reducesinkkey0":"max"},"value":{"_col0":1.5}} > org.apache.hadoop.hive.ql.metadata.HiveException: Error while processing row > (tag=0) {"key":{"reducesinkkey0":"max"},"value":{"_col0":1.5}} > at > org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:339) > at > org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:289) > at > org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:49) > at > org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28) > at > org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:98) > 
at > scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) > at scala.collection.Iterator$class.foreach(Iterator.scala:727) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) > at > org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115) > at > org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115) > at org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1390) > at org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1390) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61) > at org.apache.spark.scheduler.Task.run(Task.scala:56) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating > VALUE._col0 > at > org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:82) > at > org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:330) > ... 
17 more > Caused by: java.lang.ArrayIndexOutOfBoundsException: 3 > at > org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.byteArrayToLong(LazyBinaryUtils.java:84) > at > org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryDouble.init(LazyBinaryDouble.java:43) > at > org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.uncheckedGetField(LazyBinaryStruct.java:264) > at > org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:201) > at > org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:64) > at > org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator._evaluate(ExprNodeColumnEvaluator.java:98) > at > org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77) > at > org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65) > at > org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:77) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8274) Refactoring SparkPlan and SparkPlanGeneration [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-8274: -- Fix Version/s: (was: spark-branch) 1.1.0 > Refactoring SparkPlan and SparkPlanGeneration [Spark Branch] > > > Key: HIVE-8274 > URL: https://issues.apache.org/jira/browse/HIVE-8274 > Project: Hive > Issue Type: Task > Components: Spark >Reporter: Xuefu Zhang >Assignee: Xuefu Zhang > Labels: Spark-M1 > Fix For: 1.1.0 > > > As part of HIVE-8118, SparkWork will be modified with cloned Map/ReduceWorks, and input RDDs and some intermediate RDDs may need to be cached for > performance. To accommodate these, the SparkPlan model and SparkPlan generation > need to be refactored. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8422) Turn on all join .q tests [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-8422: -- Fix Version/s: (was: spark-branch) 1.1.0 > Turn on all join .q tests [Spark Branch] > > > Key: HIVE-8422 > URL: https://issues.apache.org/jira/browse/HIVE-8422 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Xuefu Zhang >Assignee: Chao Sun > Fix For: 1.1.0 > > Attachments: HIVE-8422.1-spark.patch, HIVE-8422.2-spark.patch > > > With HIVE-8412, all join queries should work on Spark, whether they require a > particular optimization or not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8160) Upgrade Spark dependency to 1.2.0-SNAPSHOT [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-8160: -- Fix Version/s: (was: spark-branch) 1.1.0 > Upgrade Spark dependency to 1.2.0-SNAPSHOT [Spark Branch] > - > > Key: HIVE-8160 > URL: https://issues.apache.org/jira/browse/HIVE-8160 > Project: Hive > Issue Type: Task > Components: Spark >Reporter: Xuefu Zhang >Assignee: Xuefu Zhang >Priority: Minor > Labels: Spark-M1 > Fix For: 1.1.0 > > Attachments: HIVE-8160.1-spark.patch > > > Hive on Spark needs SPARK-2978, which is now available in latest Spark main > branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8436) Modify SparkWork to split works with multiple child works [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-8436: -- Fix Version/s: (was: spark-branch) 1.1.0 > Modify SparkWork to split works with multiple child works [Spark Branch] > > > Key: HIVE-8436 > URL: https://issues.apache.org/jira/browse/HIVE-8436 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Xuefu Zhang >Assignee: Chao Sun > Fix For: 1.1.0 > > Attachments: HIVE-8436.1-spark.patch, HIVE-8436.10-spark.patch, > HIVE-8436.11-spark.patch, HIVE-8436.2-spark.patch, HIVE-8436.3-spark.patch, > HIVE-8436.4-spark.patch, HIVE-8436.5-spark.patch, HIVE-8436.6-spark.patch, > HIVE-8436.7-spark.patch, HIVE-8436.8-spark.patch, HIVE-8436.9-spark.patch > > > Based on the design doc, we need to split the operator tree of a work in > SparkWork if the work is connected to multiple child works. The splitting of > the operator tree is performed by cloning the original work and removing > unwanted branches in the operator tree. Please refer to the design doc for > details. > This process should be done right before we generate SparkPlan. We should > have a utility method that takes the original SparkWork and returns a modified > SparkWork. > This process should also keep the information about the original work and its > clones. Such information will be needed during SparkPlan generation > (HIVE-8437). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
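The work-splitting scheme discussed in HIVE-8436 above can be sketched in plain Java. This is an illustrative model only, under stated assumptions: the names Work and splitWork are hypothetical stand-ins, not Hive's actual SparkWork classes. The idea it shows is that a work feeding N child works yields N clones, each keeping only one branch plus a link back to the original so later plan generation can correlate them.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of splitting a work with multiple children into
// per-child clones, each retaining a reference to the original work.
public class SplitSketch {
    static class Work {
        final String name;
        final Work original;          // null for an original work, set for clones
        final List<Work> children = new ArrayList<>();
        Work(String name, Work original) { this.name = name; this.original = original; }
    }

    // For a work with N children, return one clone per child.
    static Map<Work, Work> splitWork(Work parent) {
        Map<Work, Work> cloneForChild = new LinkedHashMap<>();
        for (Work child : parent.children) {
            Work clone = new Work(parent.name + "->" + child.name, parent);
            clone.children.add(child); // the clone keeps only this one branch
            cloneForChild.put(child, clone);
        }
        return cloneForChild;
    }

    public static void main(String[] args) {
        Work map = new Work("Map1", null);
        map.children.add(new Work("Reduce1", null));
        map.children.add(new Work("Reduce2", null));
        Map<Work, Work> clones = splitWork(map);
        System.out.println(clones.size()); // one clone per child: 2
        for (Work clone : clones.values()) {
            System.out.println(clone.name + " original=" + clone.original.name);
        }
    }
}
```

Keeping the original-to-clone mapping, as the issue description asks, is what lets a later stage (HIVE-8437 in the real design) treat the clones as views of one logical work.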
[jira] [Updated] (HIVE-8405) Research Bucket Map Join [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-8405: -- Fix Version/s: (was: spark-branch) 1.1.0 > Research Bucket Map Join [Spark Branch] > --- > > Key: HIVE-8405 > URL: https://issues.apache.org/jira/browse/HIVE-8405 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Na Yang >Assignee: Na Yang > Fix For: 1.1.0 > > Attachments: hive-on-spark-bucketmapjoin.pdf > > > Research on how to implement Bucket Map Join for hive on Spark -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7793) Enable tests on Spark branch (3) [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-7793: -- Fix Version/s: (was: spark-branch) 1.1.0 > Enable tests on Spark branch (3) [Spark Branch] > > > Key: HIVE-7793 > URL: https://issues.apache.org/jira/browse/HIVE-7793 > Project: Hive > Issue Type: Sub-task >Reporter: Brock Noland >Assignee: Chengxiang Li > Fix For: 1.1.0 > > Attachments: HIVE-7793.1-spark.patch > > > This JIRA is to enable *most* of the tests below. If tests don't pass because > of some unsupported feature, ensure that a JIRA exists and move on. > {noformat} > ptf.q,\ > sample1.q,\ > script_env_var1.q,\ > script_env_var2.q,\ > script_pipe.q,\ > scriptfile1.q,\ > stats_counter.q,\ > stats_counter_partitioned.q,\ > stats_noscan_1.q,\ > subquery_exists.q,\ > subquery_in.q,\ > temp_table.q,\ > transform1.q,\ > transform2.q,\ > transform_ppr1.q,\ > transform_ppr2.q,\ > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7411) Exclude hadoop 1 from spark dep [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-7411: -- Fix Version/s: (was: spark-branch) 1.1.0 > Exclude hadoop 1 from spark dep [Spark Branch] > -- > > Key: HIVE-7411 > URL: https://issues.apache.org/jira/browse/HIVE-7411 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Brock Noland >Assignee: Brock Noland > Fix For: 1.1.0 > > Attachments: HIVE-7411.patch > > > The branch does not compile on my machine. Attached patch fixes this. > NO PRECOMMIT TESTS (I am working on this) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8540) HivePairFlatMapFunction.java missing license header [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-8540: -- Fix Version/s: (was: spark-branch) 1.1.0 > HivePairFlatMapFunction.java missing license header [Spark Branch] > -- > > Key: HIVE-8540 > URL: https://issues.apache.org/jira/browse/HIVE-8540 > Project: Hive > Issue Type: Sub-task > Components: Documentation >Affects Versions: spark-branch >Reporter: Xuefu Zhang >Assignee: Chao Sun > Fix For: 1.1.0 > > Attachments: HIVE-8540.1-spark.patch > > > Also, please remove unneeded imports in SparkUtilities.java. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9207) Add more log information for debug RSC[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9207: -- Fix Version/s: (was: spark-branch) 1.1.0 > Add more log information for debug RSC[Spark Branch] > > > Key: HIVE-9207 > URL: https://issues.apache.org/jira/browse/HIVE-9207 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Chengxiang Li >Assignee: Chengxiang Li >Priority: Minor > Fix For: 1.1.0 > > Attachments: HIVE-9207.1-spark.patch > > > Currently, error messages in certain scenarios are lost in RSC, and we need more > log info at DEBUG level for debugging. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8982) IndexOutOfBounds exception in mapjoin [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-8982: -- Fix Version/s: (was: spark-branch) 1.1.0 > IndexOutOfBounds exception in mapjoin [Spark Branch] > > > Key: HIVE-8982 > URL: https://issues.apache.org/jira/browse/HIVE-8982 > Project: Hive > Issue Type: Sub-task > Components: Spark >Affects Versions: spark-branch >Reporter: Szehon Ho >Assignee: Chao Sun > Fix For: 1.1.0 > > Attachments: HIVE-8982.1-spark.patch, HIVE-8982.2-spark.patch > > > There are sometimes random failures in spark mapjoin during unit tests like: > {noformat} > org.apache.hadoop.hive.ql.metadata.HiveException: > java.lang.IndexOutOfBoundsException: Index: 1, Size: 1 > at > org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.closeOp(SparkHashTableSinkOperator.java:83) > at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598) > at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610) > at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610) > at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610) > at > org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.close(SparkMapRecordHandler.java:185) > at > org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.closeRecordProcessor(HiveMapFunctionResultList.java:57) > at > org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:108) > at > scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) > at scala.collection.Iterator$class.foreach(Iterator.scala:727) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) > at > org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115) > at > org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115) > at > org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1365) > at 
> org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1365) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61) > at org.apache.spark.scheduler.Task.run(Task.scala:56) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1 > at java.util.ArrayList.rangeCheck(ArrayList.java:635) > at java.util.ArrayList.get(ArrayList.java:411) > at > org.apache.hadoop.hive.ql.exec.persistence.MapJoinEagerRowContainer.first(MapJoinEagerRowContainer.java:70) > at > org.apache.hadoop.hive.ql.exec.persistence.MapJoinEagerRowContainer.write(MapJoinEagerRowContainer.java:150) > at > org.apache.hadoop.hive.ql.exec.persistence.MapJoinTableContainerSerDe.persist(MapJoinTableContainerSerDe.java:167) > at > org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.flushToFile(SparkHashTableSinkOperator.java:128) > at > org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.closeOp(SparkHashTableSinkOperator.java:77) > ... 
20 more > org.apache.hadoop.hive.ql.metadata.HiveException: > java.lang.IndexOutOfBoundsException: Index: 1, Size: 1 > at > org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.closeOp(SparkHashTableSinkOperator.java:83) > at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598) > at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610) > at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610) > at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610) > at > org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.close(SparkMapRecordHandler.java:185) > at > org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.closeRecordProcessor(HiveMapFunctionResultList.java:57) > at > org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:108) > at > scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) > at scala.collection.Iterator$class.foreach(Iterator.scala:727) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) > at > org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115) > at > org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreac
[jira] [Updated] (HIVE-7880) Support subquery [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-7880: -- Fix Version/s: (was: spark-branch) 1.1.0 > Support subquery [Spark Branch] > --- > > Key: HIVE-7880 > URL: https://issues.apache.org/jira/browse/HIVE-7880 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Chengxiang Li >Assignee: Xuefu Zhang > Labels: Spark-M2 > Fix For: 1.1.0 > > Attachments: HIVE-7880.1-spark.patch > > > While trying to enable SubQuery qtests, I found that SubQuery cases currently return null > values; we should enable subquery support for Hive on Spark. We should > enable subquery_exists.q and subquery_in.q in this task as Tez does. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9568) Revert changes in two test configuration files accidentally brought in by HIVE-9552 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9568: -- Fix Version/s: (was: spark-branch) 1.1.0 > Revert changes in two test configuration files accidentally brought in by > HIVE-9552 [Spark Branch] > > > Key: HIVE-9568 > URL: https://issues.apache.org/jira/browse/HIVE-9568 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Xuefu Zhang >Assignee: Xuefu Zhang > Fix For: 1.1.0 > > Attachments: HIVE-9568.1-spark.patch > > > The changes in the following files, while harmless for tests, need to be > reverted because they are unnecessary. > {code} > data/conf/spark/standalone/hive-site.xml > data/conf/spark/yarn-client/hive-site.xml > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7821) StarterProject: enable groupby4.q [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-7821: -- Fix Version/s: (was: spark-branch) 1.1.0 > StarterProject: enable groupby4.q [Spark Branch] > > > Key: HIVE-7821 > URL: https://issues.apache.org/jira/browse/HIVE-7821 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Brock Noland >Assignee: Suhas Satish > Fix For: 1.1.0 > > Attachments: HIVE-7821-spark.patch, HIVE-7821.3-spark.patch, > HIVE-7821.4-spark.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9627) Add cbo_gby_empty.q.out for Spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9627: -- Fix Version/s: (was: spark-branch) 1.1.0 > Add cbo_gby_empty.q.out for Spark [Spark Branch] > > > Key: HIVE-9627 > URL: https://issues.apache.org/jira/browse/HIVE-9627 > Project: Hive > Issue Type: Test >Affects Versions: spark-branch >Reporter: Jimmy Xiang >Assignee: Jimmy Xiang >Priority: Trivial > Fix For: 1.1.0 > > Attachments: HIVE-9627.1-spark.patch > > > The golden file cbo_gby_empty.q.out for Spark is missing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8686) Enable vectorization tests with query results sort [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-8686: -- Fix Version/s: (was: spark-branch) 1.1.0 > Enable vectorization tests with query results sort [Spark Branch] > - > > Key: HIVE-8686 > URL: https://issues.apache.org/jira/browse/HIVE-8686 > Project: Hive > Issue Type: Test > Components: Spark >Reporter: Jimmy Xiang >Assignee: Jimmy Xiang >Priority: Trivial > Fix For: 1.1.0 > > Attachments: HIVE-8686.1-spark.patch, HIVE-8686.2-spark.patch > > > HIVE-8573 added query results sorting to some vectorization tests. Now that the > patch is merged to the Spark branch, we can enable these tests in the Spark > branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7591) GenMapRedUtils::addStatsTask only assumes either MapredWork or TezWork
[ https://issues.apache.org/jira/browse/HIVE-7591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-7591: -- Fix Version/s: (was: spark-branch) 1.1.0 > GenMapRedUtils::addStatsTask only assumes either MapredWork or TezWork > -- > > Key: HIVE-7591 > URL: https://issues.apache.org/jira/browse/HIVE-7591 > Project: Hive > Issue Type: Bug > Components: Spark >Reporter: Chao Sun >Assignee: Brock Noland > Fix For: 1.1.0 > > Attachments: HIVE-7591-spark.patch > > > When running queries, I got exception like this: > {noformat} > FAILED: ClassCastException org.apache.hadoop.hive.ql.plan.SparkWork cannot be > cast to org.apache.hadoop.hive.ql.plan.TezWork > 14/07/31 15:08:53 ERROR ql.Driver: FAILED: ClassCastException > org.apache.hadoop.hive.ql.plan.SparkWork cannot be cast to > org.apache.hadoop.hive.ql.plan.TezWork > java.lang.ClassCastException: org.apache.hadoop.hive.ql.plan.SparkWork cannot > be cast to org.apache.hadoop.hive.ql.plan.TezWork > at > org.apache.hadoop.hive.ql.optimizer.GenMapRedUtils.addStatsTask(GenMapRedUtils.java:1419) > at > org.apache.hadoop.hive.ql.optimizer.GenMapRedUtils.isMergeRequired(GenMapRedUtils.java:1645) > at > org.apache.hadoop.hive.ql.parse.spark.GenSparkUtils.processFileSink(GenSparkUtils.java:313) > at > org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.generateTaskTree(SparkCompiler.java:180) > at > org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:199) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9514) > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:207) > at > org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74) > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:207) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:413) > at 
org.apache.hadoop.hive.ql.Driver.compile(Driver.java:309) > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:984) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1049) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:921) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911) > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:265) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:217) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:427) > at > org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:800) > at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:694) > at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:633) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.main(RunJar.java:212) > {noformat} > Apparently, GenMapRedUtils::addStatsTask only assumes either MapredWork or > TezWork, and since we are introducing SparkWork, this need to be fixed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8507) UT: fix rcfile_bigdata test [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-8507: -- Fix Version/s: (was: spark-branch) 1.1.0 > UT: fix rcfile_bigdata test [Spark Branch] > -- > > Key: HIVE-8507 > URL: https://issues.apache.org/jira/browse/HIVE-8507 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Thomas Friedrich >Assignee: Chinna Rao Lalam >Priority: Minor > Fix For: 1.1.0 > > Attachments: HIVE-8507.1-spark.patch, HIVE-8507.2-spark.patch > > > The tests > groupby_bigdata > rcfile_bigdata > fail because they can't find the dumpdata_script.py file that is referenced in > the script: rcfile_bigdata.q > /usr/bin/python: can't open file 'dumpdata_script.py': [Errno 2] No such file > or directory > There are two references: > add file ../../dumpdata_script.py; > FROM (FROM src MAP src.key,src.value USING 'python dumpdata_script.py' > Since it uses a relative path, this seems to be related to Spark tests being > one level deeper than regular tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7781) Enable windowing and analytic function qtests [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-7781: -- Fix Version/s: (was: spark-branch) 1.1.0 > Enable windowing and analytic function qtests [Spark Branch] > > > Key: HIVE-7781 > URL: https://issues.apache.org/jira/browse/HIVE-7781 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Chengxiang Li >Assignee: Chengxiang Li > Fix For: 1.1.0 > > Attachments: HIVE-7781.1-spark.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9216) Avoid redundant clone of JobConf [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9216: -- Fix Version/s: (was: spark-branch) 1.1.0 > Avoid redundant clone of JobConf [Spark Branch] > --- > > Key: HIVE-9216 > URL: https://issues.apache.org/jira/browse/HIVE-9216 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Rui Li >Assignee: Rui Li >Priority: Minor > Fix For: 1.1.0 > > Attachments: HIVE-9216.1-spark.patch > > > Currently in SparkPlanGenerator, we clone the job conf twice for each MapWork. > We should avoid this, as cloning the job conf involves writing to HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
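The redundancy HIVE-9216 describes can be illustrated with a generic lazy-clone cache. This is a minimal sketch under stated assumptions: cloneConf and confForWork are hypothetical names, not Hive's SparkPlanGenerator API, and a HashMap stands in for a real JobConf. The point is only that an expensive clone can happen once and be reused by subsequent works.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: perform an expensive configuration clone once and
// reuse the cached copy instead of re-cloning per work.
public class ConfCacheSketch {
    static int cloneCount = 0;

    // Stands in for the expensive clone (in Hive's case, it involves HDFS I/O).
    static Map<String, String> cloneConf(Map<String, String> base) {
        cloneCount++;
        return new HashMap<>(base);
    }

    private static Map<String, String> cached;

    // Clone lazily, once; later callers get the cached copy.
    static Map<String, String> confForWork(Map<String, String> base) {
        if (cached == null) {
            cached = cloneConf(base);
        }
        return cached;
    }

    public static void main(String[] args) {
        Map<String, String> base = new HashMap<>();
        base.put("hive.execution.engine", "spark");
        confForWork(base);
        confForWork(base); // a second work reuses the clone
        System.out.println(cloneCount); // 1
    }
}
```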
[jira] [Updated] (HIVE-7338) Create SparkPlanGenerator [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-7338: -- Fix Version/s: (was: spark-branch) 1.1.0 > Create SparkPlanGenerator [Spark Branch] > > > Key: HIVE-7338 > URL: https://issues.apache.org/jira/browse/HIVE-7338 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Xuefu Zhang >Assignee: Xuefu Zhang > Labels: Spark-M1 > Fix For: 1.1.0 > > Attachments: HIVE-7338.patch > > > Translate SparkWork into SparkPlan. The translation may be invoked by > SparkClient when executing SparkTask. > NO PRECOMMIT TESTS. This is for spark branch only. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9493) Failed job may not throw exceptions [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9493: -- Fix Version/s: (was: spark-branch) > Failed job may not throw exceptions [Spark Branch] > -- > > Key: HIVE-9493 > URL: https://issues.apache.org/jira/browse/HIVE-9493 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Rui Li >Assignee: Rui Li > Fix For: 1.1.0 > > Attachments: HIVE-9493.1-spark.patch > > > Currently the remote driver assumes an exception will be thrown when a job fails to > run. This may not hold, since the job is submitted asynchronously, so we have to > check the futures before we decide the job is successful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
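The pitfall HIVE-9493 describes can be shown in generic form with java.util.concurrent: when a job is submitted asynchronously, a failure does not surface as an exception at submit time, and is only observed when the returned future is inspected. This is a standalone sketch, not Hive's remote driver code; submitJob is a hypothetical name.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

// Illustrates why an async submission must check its future: the submit
// call returns normally even when the job is going to fail.
public class AsyncFailureSketch {
    static CompletableFuture<String> submitJob(boolean fail) {
        return CompletableFuture.supplyAsync(() -> {
            if (fail) throw new IllegalStateException("job failed on the cluster");
            return "OK";
        });
    }

    public static void main(String[] args) {
        CompletableFuture<String> f = submitJob(true); // no exception thrown here
        try {
            f.get(); // the failure only appears when the future is checked
            System.out.println("job succeeded");
        } catch (ExecutionException | InterruptedException e) {
            System.out.println("job failed: " + e.getCause().getMessage());
        }
    }
}
```

A driver that never calls `get()` (or never checks `isCompletedExceptionally()`) would report the job as successful, which is exactly the behavior the issue wants to guard against.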
[jira] [Updated] (HIVE-9101) bucket_map_join_spark4.q failed due to NPE.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9101: -- Fix Version/s: (was: spark-branch) 1.1.0 > bucket_map_join_spark4.q failed due to NPE.[Spark Branch] > - > > Key: HIVE-9101 > URL: https://issues.apache.org/jira/browse/HIVE-9101 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Chengxiang Li >Assignee: Jimmy Xiang > Labels: Spark-M4 > Fix For: 1.1.0 > > Attachments: HIVE-9101.1-spark.patch > > > bucket_map_join_spark4.q failed due to the following exception after > HIVE-9078: > {noformat} > 2014-12-15 04:48:56,241 ERROR [Executor task launch worker-0]: > executor.Executor (Logging.scala:logError(96)) - Exception in task 0.3 in > stage 7.0 (TID 15) > java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: > java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:160) > at > org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:47) > at > org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:28) > at > org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:96) > at > scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) > at scala.collection.Iterator$class.foreach(Iterator.scala:727) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) > at > org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115) > at > org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115) > at > org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1390) > at > org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1390) > at 
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61) > at org.apache.spark.scheduler.Task.run(Task.scala:56) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:722) > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: > java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:114) > at > org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:193) > at > org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:219) > at > org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1051) > at > org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055) > at > org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055) > at > org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055) > at > org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055) > at > org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:486) > at > org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:149) > ... 16 more > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:104) > ... 25 more > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7527) Support order by and sort by on Spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-7527: -- Fix Version/s: (was: spark-branch) 1.1.0 > Support order by and sort by on Spark [Spark Branch] > > > Key: HIVE-7527 > URL: https://issues.apache.org/jira/browse/HIVE-7527 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Xuefu Zhang >Assignee: Rui Li > Fix For: 1.1.0 > > Attachments: HIVE-7527-spark.patch, HIVE-7527.2-spark.patch > > > Currently Hive depends completely on MapReduce's sorting as part of shuffling > to achieve order by (global sort, one reducer) and sort by (local sort). > Spark has a sort by transformation in different variations that can be used to > support Hive's order by and sort by. However, we still need to evaluate > whether Spark's sortBy can achieve the same functionality inherited from > MapReduce's shuffle sort. > Currently Hive on Spark should be able to run a simple sort by or order by, by > changing the current partitionBy to sortBy. This is the way to verify > theories. A complete solution will not be available until we have a complete > SparkPlanGenerator. > There is also a question of how we determine that there is an order by or sort > by just by looking at the operator tree, from which the Spark task is created. > This is the responsibility of SparkPlanGenerator, but we need to have an idea. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
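The distinction the issue draws — order by as a global sort through a single reducer versus sort by as a per-reducer local sort — can be sketched with a toy model. This is plain Python, not Hive or Spark code; the function names and the hash-based partitioning are illustrative assumptions:

```python
# Toy model of Hive's ORDER BY (total order, one reducer) vs SORT BY
# (local order within each reducer's partition). Not Hive code.

def order_by(rows, key):
    """Global sort: everything flows through a single 'reducer'."""
    return [sorted(rows, key=key)]          # one fully ordered partition

def sort_by(rows, key, num_partitions):
    """Local sort: hash-partition first, then sort each partition."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        partitions[hash(key(row)) % num_partitions].append(row)
    return [sorted(p, key=key) for p in partitions]

rows = [5, 3, 8, 1, 9, 2]
print(order_by(rows, key=lambda r: r))      # one globally sorted list
# each partition is sorted internally, but there is no global order
print(sort_by(rows, key=lambda r: r, num_partitions=2))
```

The open question in the issue is whether Spark's sortBy variants reproduce exactly the ordering guarantees that MapReduce's shuffle sort gives each of these two cases.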
[jira] [Updated] (HIVE-9211) Research on build mini HoS cluster on YARN for unit test[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9211: -- Fix Version/s: (was: spark-branch) > Research on build mini HoS cluster on YARN for unit test[Spark Branch] > -- > > Key: HIVE-9211 > URL: https://issues.apache.org/jira/browse/HIVE-9211 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Chengxiang Li >Assignee: Chengxiang Li > Labels: Spark-M5 > Fix For: 1.1.0 > > Attachments: HIVE-9211.1-spark.patch, HIVE-9211.2-spark.patch, > HIVE-9211.2-spark.patch, HIVE-9211.3-spark.patch, HIVE-9211.4-spark.patch, > HIVE-9211.5-spark.patch, HIVE-9211.6-spark.patch, HIVE-9211.7-spark.patch > > > HoS on YARN is a common use case in production environments, so we should > enable unit tests for this case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9135) Cache Map and Reduce works in RSC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9135: -- Fix Version/s: (was: spark-branch) 1.1.0 > Cache Map and Reduce works in RSC [Spark Branch] > > > Key: HIVE-9135 > URL: https://issues.apache.org/jira/browse/HIVE-9135 > Project: Hive > Issue Type: Sub-task > Components: Spark >Affects Versions: spark-branch >Reporter: Brock Noland >Assignee: Jimmy Xiang > Fix For: 1.1.0 > > Attachments: HIVE-9135.1-spark.patch, HIVE-9135.1-spark.patch, > HIVE-9135.3-spark.patch, HIVE-9135.3.patch, HIVE-9135.4-spark.patch > > > HIVE-9127 works around the fact that we don't cache Map/Reduce works in > Spark. However, other input formats such as HiveInputFormat will not benefit > from that fix. We should investigate how to allow caching on the RSC while > not on tasks (see HIVE-7431). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7717) Add .q tests coverage for "union all" [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-7717: -- Fix Version/s: (was: spark-branch) 1.1.0 > Add .q tests coverage for "union all" [Spark Branch] > > > Key: HIVE-7717 > URL: https://issues.apache.org/jira/browse/HIVE-7717 > Project: Hive > Issue Type: Sub-task > Components: Spark >Affects Versions: spark-branch >Reporter: Na Yang >Assignee: Na Yang > Fix For: 1.1.0 > > Attachments: HIVE-7717.1-spark.patch, HIVE-7717.2-spark.patch, > HIVE-7717.3-spark.patch > > > Add automation test coverage for "union all", by searching through the > q-tests in "ql/src/test/queries/clientpositive/" for union tests (like > union*.q) and verifying/enabling them on Spark. > Steps to do: > 1. Enable a qtest .q in > itests/src/test/resources/testconfiguration.properties by adding the .q test > files to spark.query.files. > 2. Run mvn test -Dtest=TestSparkCliDriver -Dqfile=.q > -Dtest.output.overwrite=true -Phadoop-2 to generate the output (located in > ql/src/test/results/clientpositive/spark). File will be called > .q.out. > 3. Check that the generated output is good by verifying the results. For > comparison, check the MR version in > ql/src/test/results/clientpositive/.q.out. The reason it's > separate is that the explain plan outputs are different for Spark/MR. > 4. Check in the modification to testconfiguration.properties, and the > generated q.out file as well. You only have to generate the output once. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8438) Clean up code introduced by HIVE-7503 and such [Spark Plan]
[ https://issues.apache.org/jira/browse/HIVE-8438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-8438: -- Fix Version/s: (was: spark-branch) 1.1.0 > Clean up code introduced by HIVE-7503 and such [Spark Plan] > --- > > Key: HIVE-8438 > URL: https://issues.apache.org/jira/browse/HIVE-8438 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Xuefu Zhang >Assignee: Chao Sun > Fix For: 1.1.0 > > > With HIVE-8436 and HIVE-8437, we don't need the previous, incomplete > solution for multi-insert. Thus, we need to clean up the unwanted code, including > any disabled optimizations or tricks added to make tests pass. All > multi-insert queries should pass. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7584) Change SparkCompiler to generate a SparkWork that contains UnionWork from logical operator tree
[ https://issues.apache.org/jira/browse/HIVE-7584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-7584: -- Fix Version/s: (was: spark-branch) 1.1.0 > Change SparkCompiler to generate a SparkWork that contains UnionWork from > logical operator tree > --- > > Key: HIVE-7584 > URL: https://issues.apache.org/jira/browse/HIVE-7584 > Project: Hive > Issue Type: Task > Components: Spark >Affects Versions: spark-branch >Reporter: Na Yang >Assignee: Na Yang > Fix For: 1.1.0 > > Attachments: HIVE-7584.1-spark.patch > > > This is a subtask of supporting union all operation for Hive on Spark > We need to change the current SparkCompiler to generate a SparkWork that > contains UnionWork from logical operator tree. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8777) Should only register used counters in SparkCounters[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-8777: -- Fix Version/s: (was: spark-branch) 1.1.0 > Should only register used counters in SparkCounters[Spark Branch] > - > > Key: HIVE-8777 > URL: https://issues.apache.org/jira/browse/HIVE-8777 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Chengxiang Li >Assignee: Chengxiang Li > Labels: Spark-M3 > Fix For: 1.1.0 > > Attachments: HIVE-8777.1-spark.patch > > > Currently we register all Hive operator counters in SparkCounters, but in > fact not all Hive operators are used in a SparkTask; we should iterate over > the SparkTask's operators and register only the counters required. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9487) Make Remote Spark Context secure [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9487: -- Fix Version/s: (was: spark-branch) > Make Remote Spark Context secure [Spark Branch] > --- > > Key: HIVE-9487 > URL: https://issues.apache.org/jira/browse/HIVE-9487 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Marcelo Vanzin >Assignee: Marcelo Vanzin > Labels: TODOC-SPARK > Fix For: 1.1.0 > > Attachments: HIVE-9487.1-spark.patch, HIVE-9487.2-spark.patch > > > The RSC currently uses an ad-hoc, insecure authentication mechanism. We > should instead use a proper auth mechanism and add encryption to the mix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9116) Add unit test for multi sessions.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9116: -- Fix Version/s: (was: spark-branch) 1.1.0 > Add unit test for multi sessions.[Spark Branch] > --- > > Key: HIVE-9116 > URL: https://issues.apache.org/jira/browse/HIVE-9116 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Chengxiang Li >Assignee: Chengxiang Li > Labels: Spark-M4 > Fix For: 1.1.0 > > Attachments: HIVE-9116.1-spark.patch > > > HS2 multi-session support is enabled in HoS; we should add some unit tests > for verification and regression testing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8457) MapOperator initialization fails when multiple Spark threads is enabled [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-8457: -- Fix Version/s: (was: spark-branch) 1.1.0 > MapOperator initialization fails when multiple Spark threads is enabled > [Spark Branch] > -- > > Key: HIVE-8457 > URL: https://issues.apache.org/jira/browse/HIVE-8457 > Project: Hive > Issue Type: Bug > Components: Spark >Reporter: Chao Sun >Assignee: Chao Sun > Fix For: 1.1.0 > > Attachments: HIVE-8457.1-spark.patch, HIVE-8457.2-spark.patch > > > Currently, on the Spark branch, each thread is bound with a thread-local > IOContext, which gets initialized when we generate an input {{HadoopRDD}}, > and later used in {{MapOperator}}, {{FilterOperator}}, etc. > And, given the introduction of HIVE-8118, we may have multiple downstream > RDDs that share the same input {{HadoopRDD}}, and we would like the > {{HadoopRDD}} to be cached, to avoid scanning the same table multiple times. > A typical case would be like the following: > {noformat} > inputRDD inputRDD > | | > MT_11 MT_12 > | | > RT_1 RT_2 > {noformat} > Here, {{MT_11}} and {{MT_12}} are {{MapTran}} from a split {{MapWork}}, > and {{RT_1}} and {{RT_2}} are two {{ReduceTran}}. Note that this example is > simplified, as we may also have {{ShuffleTran}} between {{MapTran}} and > {{ReduceTran}}. > When multiple Spark threads are running, {{MT_11}} may be executed first, and > its request for an iterator from the {{HadoopRDD}} will trigger the creation > of the iterator, which in turn triggers the initialization of the > {{IOContext}} associated with that particular thread. > *Now, the problem is*: before {{MT_12}} starts executing, it will also ask > for an iterator from the > {{HadoopRDD}}, and since the RDD is already cached, instead of creating a new > iterator, it will just fetch it from the cached result. However, *this will > skip the initialization of the IOContext associated with this particular > thread*. 
And, when {{MT_12}} starts executing, it will try to initialize the > {{MapOperator}}, but since the {{IOContext}} is not initialized, this will > fail miserably. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
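The failure mode described above — per-thread initialization that only happens as a side effect of computing the input, so a cache hit silently skips it — can be modeled in a few lines. This is a toy model in plain Python, not Hive or Spark code; `io_context`, `_cache`, and the function names are stand-ins:

```python
import threading

# Toy model of the HIVE-8457 bug: the thread-local IOContext is initialized
# as a side effect of creating the input iterator, so a cache that returns
# an already-computed result skips that side effect for later threads.

io_context = threading.local()   # stands in for Hive's thread-local IOContext
_cache = {}                      # stands in for Spark's RDD cache

def compute_input(split_id):
    io_context.initialized = True            # per-thread init, as a side effect
    return [split_id * 10, split_id * 10 + 1]

def cached_input(split_id):
    if split_id not in _cache:               # only the first caller computes...
        _cache[split_id] = compute_input(split_id)
    return _cache[split_id]                  # ...later callers skip the init

def map_operator(split_id):
    rows = cached_input(split_id)
    if not getattr(io_context, "initialized", False):
        raise RuntimeError("IOContext not initialized for this thread")
    return rows

map_operator(0)                              # thread 1: init runs, succeeds

result = {}
def second_thread():                         # thread 2: cache hit, init skipped
    try:
        map_operator(0)
        result["err"] = None
    except RuntimeError as e:
        result["err"] = str(e)

t = threading.Thread(target=second_thread)
t.start()
t.join()
print(result["err"])                         # the NullPointerException analogue
```

The fix directions discussed in the issue amount to decoupling the per-thread initialization from iterator creation, so that a cached RDD still initializes the IOContext for whichever thread consumes it.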
[jira] [Updated] (HIVE-8054) Disable hive.optimize.union.remove when hive.execution.engine=spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-8054: -- Fix Version/s: (was: spark-branch) 1.1.0 > Disable hive.optimize.union.remove when hive.execution.engine=spark [Spark > Branch] > -- > > Key: HIVE-8054 > URL: https://issues.apache.org/jira/browse/HIVE-8054 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Xuefu Zhang >Assignee: Na Yang > Labels: Spark-M1, TODOC-SPARK > Fix For: 1.1.0 > > Attachments: HIVE-8054-spark.patch, HIVE-8054.2-spark.patch, > HIVE-8054.3-spark.patch > > > Option hive.optimize.union.remove introduced in HIVE-3276 removes union > operators from the operator graph in certain cases as an optimization to reduce > the number of MR jobs. While making sense in MR, this optimization is > actually harmful to an execution engine such as Spark, which natively supports > union without requiring additional jobs. This is because removing the union > operator creates disjointed operator graphs, each graph generating a job, and > thus this optimization requires more jobs to run the query. Not to mention > the additional complexity of handling linked FS descriptors. > I propose that we disable such optimization when the execution engine is > Spark. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
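The core of the argument is graph-theoretic: each disjoint operator graph becomes its own job, so removing the union operator turns one connected graph into several. A toy illustration (plain Python, not Hive code; the operator names are made up):

```python
# Toy illustration of why removing the union operator hurts an engine that
# supports union natively: jobs correspond to connected components of the
# operator graph, and removing the union splits one component into many.

def count_jobs(edges, operators):
    """Connected components of the operator graph = number of jobs."""
    parent = {op: op for op in operators}
    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x
    for a, b in edges:          # union-find merge along each edge
        parent[find(a)] = find(b)
    return len({find(op) for op in operators})

ops = ["TS1", "TS2", "UNION", "FS"]
with_union = [("TS1", "UNION"), ("TS2", "UNION"), ("UNION", "FS")]
print(count_jobs(with_union, ops))                      # -> 1 job

# union removed: two disjoint graphs, each writing its own FS sink
print(count_jobs([("TS1", "FS1"), ("TS2", "FS2")],
                 ["TS1", "TS2", "FS1", "FS2"]))         # -> 2 jobs
```

In MR the trade is worthwhile because a union would otherwise cost an extra MR job; in Spark the union transformation is free within one job, so the split is pure overhead.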
[jira] [Updated] (HIVE-7580) Support dynamic partitioning [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-7580: -- Fix Version/s: (was: spark-branch) 1.1.0 > Support dynamic partitioning [Spark Branch] > --- > > Key: HIVE-7580 > URL: https://issues.apache.org/jira/browse/HIVE-7580 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Xuefu Zhang >Assignee: Chinna Rao Lalam > Labels: Spark-M1 > Fix For: 1.1.0 > > Attachments: HIVE-7580.1-spark.patch, HIVE-7580.patch > > > My understanding is that we don't need to do anything special for this. > However, this needs to be verified and tested. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7954) Investigate query failures (3)
[ https://issues.apache.org/jira/browse/HIVE-7954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-7954: -- Fix Version/s: (was: spark-branch) 1.1.0 > Investigate query failures (3) > -- > > Key: HIVE-7954 > URL: https://issues.apache.org/jira/browse/HIVE-7954 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Brock Noland >Assignee: Thomas Friedrich > Fix For: 1.1.0 > > > I ran all q-file tests and the following failed with an exception: > http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/HIVE-SPARK-ALL-TESTS-Build/lastCompletedBuild/testReport/ > we don't necessarily want to run all these tests as part of the Spark tests, > but we should understand why they failed with an exception. This JIRA is to > look into these failures and document them with one of: > * New JIRA > * Covered under existing JIRA > * More investigation required > Tests: > {noformat} > > org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_root_dir_external_table > 0.28 sec2 > org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_view > 12 sec 2 > > org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby_complex_types > 1.5 sec 2 > > org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby_multi_insert_common_distinct > 3.9 sec 2 > > org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_authorization_admin_almighty2 > 2.6 sec 2 > org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_quotedid_smb > 3.2 sec 2 > org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_input20 1.5 sec > 2 > > org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_dbtxnmgr_showlocks >0.23 sec2 > > org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketsortoptimize_insert_5 > 9.9 sec 2 > org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_schemeAuthority > 0.54 sec2 > org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucket5 1.9 sec > 2 > 
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_load_fs2 0.83 > sec2 > org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_lock44.3 sec > 2 > > org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_exim_14_managed_location_over_existing >1 sec 2 > org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_udf_in_file > 0.73 sec2 > org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_lock10.92 > sec2 > org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_mi 1.9 sec > 2 > org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_nullformatdir > 1 sec 2 > > org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_exim_13_managed_location > 3.4 sec 2 > > org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_import_exported_table > 2.6 sec 2 > > org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_correlationoptimizer8 > 10 sec 2 > > org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_authorization_create_macro1 > 2.5 sec 2 > org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_stats4 2.5 sec > 2 > > org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_exim_11_managed_external > 0.99 sec2 > > org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby_complex_types_multi_single_reducer >8.2 sec 2 > org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_nullgroup5 > 1.2 sec 2 > > org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_sortmerge_join_5 > 9.9 sec 2 > org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_lock34.2 sec > 2 > org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ppd_union_view > 4.1 sec 2 > org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_sample10 2.5 sec > 2 > > org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_rename_external_partition_location >2 sec 2 > org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_remote_script > 0.35 sec2 > > 
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_exim_12_external_location > 1 sec 2 > org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_load_dyn_part1 > 6.4 sec 2 > > org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_authorization_insert > 3.6 sec 2 > org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_newline 4.2 sec > 2 > > org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_file_with_header_footer > 2.7 sec 2 > org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_17 > 10 sec 2 > > org.apache.hadoop.hive.cli.TestSparkCliD
[jira] [Updated] (HIVE-7567) support automatic calculating reduce task number [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-7567: -- Fix Version/s: (was: spark-branch) 1.1.0 > support automatic calculating reduce task number [Spark Branch] > --- > > Key: HIVE-7567 > URL: https://issues.apache.org/jira/browse/HIVE-7567 > Project: Hive > Issue Type: Task > Components: Spark >Reporter: Chengxiang Li >Assignee: Chengxiang Li > Labels: TODOC-SPARK > Fix For: 1.1.0 > > Attachments: HIVE-7567.1-spark.patch, HIVE-7567.2-spark.patch, > HIVE-7567.3-spark.patch, HIVE-7567.4-spark.patch, HIVE-7567.5-spark.patch, > HIVE-7567.6-spark.patch > > > Hive has its own mechanism to calculate the reduce task number; we need to > implement it for Spark jobs. > NO PRECOMMIT TESTS. This is for spark-branch only. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
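Hive's own mechanism referred to here is roughly "total input size divided by a per-reducer byte target, clamped to a maximum". A sketch of that heuristic — the default constants are meant to mirror hive.exec.reducers.bytes.per.reducer and hive.exec.reducers.max, but treat the exact values as assumptions, not a transcription of Hive's code:

```python
import math

# Sketch of the MR-style reducer-count heuristic: one reducer per
# `bytes_per_reducer` of input, at least 1, capped at `max_reducers`.
def estimate_reducers(total_input_bytes,
                      bytes_per_reducer=256 * 1024 * 1024,   # assumed default
                      max_reducers=1009):                    # assumed default
    reducers = math.ceil(total_input_bytes / bytes_per_reducer)
    return max(1, min(reducers, max_reducers))

print(estimate_reducers(10 * 1024 ** 3))   # 10 GB of input -> 40 reducers
```

Porting this to Spark means feeding the same kind of input-size estimate into the shuffle's partition count when generating the SparkWork.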
[jira] [Updated] (HIVE-7541) Support union all on Spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-7541: -- Fix Version/s: (was: spark-branch) 1.1.0 > Support union all on Spark [Spark Branch] > - > > Key: HIVE-7541 > URL: https://issues.apache.org/jira/browse/HIVE-7541 > Project: Hive > Issue Type: Sub-task > Components: Spark >Affects Versions: spark-branch >Reporter: Xuefu Zhang >Assignee: Na Yang > Fix For: 1.1.0 > > Attachments: HIVE-7541.1-spark.patch, HIVE-7541.2-spark.patch, > HIVE-7541.3-spark.patch, HIVE-7541.4-spark.patch, HIVE-7541.5-spark.patch, > Hive on Spark Union All design.pdf > > > For union all operator, we will use Spark's union transformation. Refer to > the design doc on wiki for more information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8649) Increase level of parallelism in reduce phase [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-8649: -- Fix Version/s: (was: spark-branch) 1.1.0 > Increase level of parallelism in reduce phase [Spark Branch] > > > Key: HIVE-8649 > URL: https://issues.apache.org/jira/browse/HIVE-8649 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Brock Noland >Assignee: Jimmy Xiang > Fix For: 1.1.0 > > Attachments: HIVE-8649.1-spark.patch, HIVE-8649.2-spark.patch > > > We calculate the number of reducers based on the same code as for MapReduce. > However, reducers are vastly cheaper in Spark, and it's generally recommended > that we have many more reducers than in MR. > Sandy Ryza, who works on Spark, has some ideas about a heuristic. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8437) Modify SparkPlan generation to set toCache flag to SparkTrans where caching is needed [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-8437: -- Fix Version/s: (was: spark-branch) 1.1.0 > Modify SparkPlan generation to set toCache flag to SparkTrans where caching > is needed [Spark Branch] > > > Key: HIVE-8437 > URL: https://issues.apache.org/jira/browse/HIVE-8437 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Xuefu Zhang > Fix For: 1.1.0 > > > HIVE-8436 may modify the SparkWork right before SparkPlan generation. When > this happens, the output from some SparkTrans needs to be cached to avoid > regenerating the RDD. For more information, please refer to the design doc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9007) Hive may generate wrong plan for map join queries due to IdentityProjectRemover [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9007: -- Fix Version/s: (was: spark-branch) 1.1.0 > Hive may generate wrong plan for map join queries due to > IdentityProjectRemover [Spark Branch] > -- > > Key: HIVE-9007 > URL: https://issues.apache.org/jira/browse/HIVE-9007 > Project: Hive > Issue Type: Sub-task > Components: Spark >Affects Versions: spark-branch >Reporter: Chao Sun >Assignee: Szehon Ho > Fix For: 1.1.0 > > Attachments: HIVE-9007-spark.patch, HIVE-9007.2-spark.patch > > > HIVE-8435 introduces a new logical optimizer called IdentityProjectRemover, > which may cause map join in the Spark branch to generate a wrong plan. > Currently, the map join conversion in the Spark branch first goes through a > method {{convertJoinMapJoin}}, which replaces a join op with a mapjoin op, > removes the RS associated with the big table, and keeps RSs for all small tables. > Afterwards, in {{SparkReduceSinkMapJoinProc}} it replaces all parent RSs of > the mapjoin op with HTS (note it doesn't check whether the RS belongs to a > small table or the big table.) > The issue arises when IdentityProjectRemover comes into play, which may > result in a situation where an operator tree has two consecutive RSs. Imagine > the following example: > {noformat} > Join MapJoin > / \/ \ > RS RS ---> RS RS >/ \ / \ > TS RS TS TS (big table) > \ (small table) > TS > {noformat} > In this case, all parents of the mapjoin op will be RS, even the branch for > the big table! In {{SparkReduceSinkMapJoinProc}}, they will be replaced with HTS, > which is obviously incorrect. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8841) Make RDD caching work for multi-insert after HIVE-8793 when map join is involved [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-8841: -- Fix Version/s: (was: spark-branch) 1.1.0 > Make RDD caching work for multi-insert after HIVE-8793 when map join is > involved [Spark Branch] > --- > > Key: HIVE-8841 > URL: https://issues.apache.org/jira/browse/HIVE-8841 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Xuefu Zhang >Assignee: Rui Li > Fix For: 1.1.0 > > Attachments: HIVE-8841.1-spark.patch > > > Splitting SparkWork now happens before MapJoinResolver. As MapJoinResolver may > further spin off a dependent SparkWork for small tables of a join, we need > to make Spark RDD caching continue to work even across SparkWorks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8788) UT: fix partition test case [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-8788: -- Fix Version/s: (was: spark-branch) 1.1.0 > UT: fix partition test case [Spark Branch] > -- > > Key: HIVE-8788 > URL: https://issues.apache.org/jira/browse/HIVE-8788 > Project: Hive > Issue Type: Sub-task > Components: Tests >Affects Versions: spark-branch >Reporter: Thomas Friedrich >Assignee: Chinna Rao Lalam > Fix For: 1.1.0 > > Attachments: HIVE-8788-spark.patch, HIVE-8788.1-spark.patch > > > The test limit_partition_metadataonly fails with > 2014-11-06 18:40:12,891 ERROR ql.Driver (SessionState.java:printError(829)) - > FAILED: SemanticException Number of partitions scanned (=4) on table srcpart > exceeds limit (=1). This is controlled by > hive.limit.query.max.table.partition. > org.apache.hadoop.hive.ql.parse.SemanticException: Number of partitions > scanned (=4) on table srcpart exceeds limit (=1). This is controlled by > hive.limit.query.max.table.partition. 
> at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.enforceScanLimits(SemanticAnalyzer.java:10358) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10190) > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:419) > In the test, SemanticAnalyzer.enforceScanLimits expects only 1 partition > ds=2008-04-08/hr=11 but gets 4 partitions: > [srcpart(ds=2008-04-08/hr=11), srcpart(ds=2008-04-08/hr=12), > srcpart(ds=2008-04-09/hr=11), srcpart(ds=2008-04-09/hr=12)] > In the log it shows that the ParitionPruner ran, and it should have only > retained one partition: > 2014-11-07 14:18:09,147 DEBUG ppr.PartitionPruner > (PartitionPruner.java:prune(206)) - Filter w/ compacting: ((hr = 11) and (ds > = '2008-04-08')); filter w/o compacting: ((hr = 11) and (ds = '2008-04-08')) > 2014-11-07 14:18:09,147 INFO metastore.HiveMetaStore > (HiveMetaStore.java:logInfo(719)) - 0: get_partitions_by_expr : db=default > tbl=srcpart > 2014-11-07 14:18:09,165 DEBUG ppr.PartitionPruner > (PartitionPruner.java:prunePartitionNames(491)) - retained partition: > ds=2008-04-08/hr=11 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8924) Investigate test failure for join_empty.q [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-8924: -- Fix Version/s: (was: spark-branch) 1.1.0 > Investigate test failure for join_empty.q [Spark Branch] > > > Key: HIVE-8924 > URL: https://issues.apache.org/jira/browse/HIVE-8924 > Project: Hive > Issue Type: Sub-task > Components: Spark >Affects Versions: spark-branch >Reporter: Chao Sun >Assignee: Szehon Ho > Fix For: 1.1.0 > > Attachments: HIVE-8924-spark.patch, HIVE-8924.2-spark.patch, > HIVE-8924.3-spark.patch, HIVE-8924.4-spark.patch > > > This query has an interesting case where the big table work is empty. Here's > the MR plan: > {noformat} > STAGE DEPENDENCIES: > Stage-4 is a root stage > Stage-3 depends on stages: Stage-4 > Stage-0 depends on stages: Stage-3 > STAGE PLANS: > Stage: Stage-4 > Map Reduce Local Work > Alias -> Map Local Tables: > b > Fetch Operator > limit: -1 > Alias -> Map Local Operator Tree: > b > TableScan > alias: b > Statistics: Num rows: 29 Data size: 5812 Basic stats: COMPLETE > Column stats: NONE > Filter Operator > predicate: UDFToDouble(key) is not null (type: boolean) > Statistics: Num rows: 15 Data size: 3006 Basic stats: COMPLETE > Column stats: NONE > HashTable Sink Operator > condition expressions: > 0 {key} > 1 {value} > keys: > 0 UDFToDouble(key) (type: double) > 1 UDFToDouble(key) (type: double) > Stage: Stage-3 > Map Reduce > Local Work: > Map Reduce Local Work > Stage: Stage-0 > Fetch Operator > limit: -1 > Processor Tree: > ListSink > {noformat} > The plan for Spark is not correct. We need to investigate the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7540) NotSerializableException encountered when using sortByKey transformation
[ https://issues.apache.org/jira/browse/HIVE-7540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-7540: -- Fix Version/s: (was: spark-branch) 1.1.0 > NotSerializableException encountered when using sortByKey transformation > > > Key: HIVE-7540 > URL: https://issues.apache.org/jira/browse/HIVE-7540 > Project: Hive > Issue Type: Bug > Components: Spark > Environment: Spark-1.0.1 >Reporter: Rui Li >Assignee: Rui Li > Fix For: 1.1.0 > > Attachments: HIVE-7540-spark.patch, HIVE-7540.2-spark.patch, > HIVE-7540.3-spark.patch > > > This exception is thrown when sortByKey is used as the shuffle transformation > between MapWork and ReduceWork: > {quote} > org.apache.spark.SparkException: Job aborted due to stage failure: Task not > serializable: java.io.NotSerializableException: > org.apache.hadoop.io.BytesWritable > at > org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1049) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1033) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1031) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) > at > org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1031) > at > org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitMissingTasks(DAGScheduler.scala:772) > at > org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:715) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$submitStage$4.apply(DAGScheduler.scala:719) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$submitStage$4.apply(DAGScheduler.scala:718) > at 
scala.collection.immutable.List.foreach(List.scala:318) > at > org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:718) > at > org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:699) > … > {quote} > The root cause is that the RangePartitioner used by sortByKey contains > rangeBounds: Array[BytesWritable], which is considered not serializable in > spark. > A workaround to this issue is to set the number of partitions to 1 when > calling sortByKey, in which case the rangeBounds will be just an empty array. > NO PRECOMMIT TESTS. This is for spark branch only. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
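The failure and the workaround can be modeled in a few lines of plain Python (not Spark code; `BytesWritableLike` and `range_bounds` are stand-ins for BytesWritable and RangePartitioner.rangeBounds): the partitioner's bounds array embeds sampled keys and must be shipped to every task, so the key type must be serializable — but with a single partition the bounds array is empty and nothing non-serializable is shipped:

```python
import pickle

# Toy model of the HIVE-7540 failure: a range partitioner computes its
# bounds from sampled keys and serializes them to tasks; a non-serializable
# key type breaks that, unless the bounds array is empty (one partition).

class BytesWritableLike:
    """Stands in for org.apache.hadoop.io.BytesWritable."""
    def __init__(self, data):
        self.data = data
    def __reduce__(self):            # simulate "not serializable"
        raise TypeError("BytesWritableLike is not serializable")

def range_bounds(sample_keys, num_partitions):
    if num_partitions <= 1:
        return []                    # the workaround: empty bounds array
    step = len(sample_keys) // num_partitions
    ordered = sorted(sample_keys, key=lambda k: k.data)
    return ordered[step::step][:num_partitions - 1]

keys = [BytesWritableLike(b) for b in (b"b", b"a", b"c", b"d")]

try:
    pickle.dumps(range_bounds(keys, 2))   # bounds contain keys -> fails
except TypeError as e:
    print("serialization failed:", e)

pickle.dumps(range_bounds(keys, 1))       # empty bounds serialize fine
```

This mirrors why forcing the partition count to 1 sidesteps the NotSerializableException at the cost of losing parallelism in the sort.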
[jira] [Updated] (HIVE-7613) Research optimization of auto convert join to map join [Spark branch]
[ https://issues.apache.org/jira/browse/HIVE-7613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-7613: -- Fix Version/s: (was: spark-branch) 1.1.0 > Research optimization of auto convert join to map join [Spark branch] > - > > Key: HIVE-7613 > URL: https://issues.apache.org/jira/browse/HIVE-7613 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Chengxiang Li >Assignee: Suhas Satish >Priority: Minor > Fix For: 1.1.0 > > Attachments: HIve on Spark Map join background.docx, Hive on Spark > Join Master Design.pdf, small_table_broadcasting.pdf > > > ConvertJoinMapJoin is an optimization that replaces a common join (aka shuffle > join) with a map join (aka broadcast or fragment replicate join) when > possible. We need to research how to make it workable with Hive on Spark. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
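The join rewrite this issue researches can be summarized in a few lines: a map join builds an in-memory hash table from the small table (broadcast to every task) and probes it while streaming the large table, so the large side is never shuffled. A minimal, self-contained sketch of that build/probe pattern (the class and method names here are illustrative, not Hive's):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MapJoinSketch {
    // Map join (broadcast join): hash the small table once, then probe it
    // while scanning the big table. Rows are modeled as {key, value} pairs.
    static List<String[]> mapJoin(List<String[]> bigRows, List<String[]> smallRows) {
        // Build side: key -> value from the small (broadcast) table.
        Map<String, String> hashTable = new HashMap<>();
        for (String[] row : smallRows) {
            hashTable.put(row[0], row[1]);
        }
        // Probe side: stream the big table, emit inner-join matches only.
        List<String[]> out = new ArrayList<>();
        for (String[] row : bigRows) {
            String match = hashTable.get(row[0]);
            if (match != null) {
                out.add(new String[] {row[0], row[1], match});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> big = List.of(
                new String[] {"1", "a"}, new String[] {"2", "b"}, new String[] {"3", "c"});
        List<String[]> small = List.of(
                new String[] {"2", "x"}, new String[] {"3", "y"});
        for (String[] r : mapJoin(big, small)) {
            System.out.println(String.join(",", r));  // prints "2,b,x" then "3,c,y"
        }
    }
}
```

The per-engine research question is where the build side runs: in the Spark attachments above, the small table must be materialized and broadcast to the map tasks, which is what the `HashTable Sink Operator` stages in the later plans do.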
[jira] [Updated] (HIVE-8842) auto_join2.q produces incorrect tree [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-8842: -- Fix Version/s: (was: spark-branch) 1.1.0 > auto_join2.q produces incorrect tree [Spark Branch] > --- > > Key: HIVE-8842 > URL: https://issues.apache.org/jira/browse/HIVE-8842 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Szehon Ho >Assignee: Chao Sun > Fix For: 1.1.0 > > Attachments: HIVE-8842.1-spark.patch, HIVE-8842.2-spark.patch, > HIVE-8842.3-spark.patch > > > Enabling the SparkMapJoinResolver and SparkReduceSinkMapJoinProc, I see the > following: > {noformat} > explain select * from src src1 JOIN src src2 ON (src1.key = src2.key) JOIN > src src3 ON (src1.key + src2.key = src3.key); > {noformat} > produces too many stages (six), and too many HashTableSink. > {noformat} > STAGE DEPENDENCIES: > Stage-5 is a root stage > Stage-4 depends on stages: Stage-5 > Stage-3 depends on stages: Stage-4 > Stage-7 is a root stage > Stage-6 depends on stages: Stage-7 > Stage-0 is a root stage > STAGE PLANS: > Stage: Stage-5 > Spark > DagName: szehon_20141112105656_dd50e07d-94ad-4f9d-899e-bcb6d9a39c13:3 > Vertices: > Map 1 > Map Operator Tree: > TableScan > alias: src2 > Statistics: Num rows: 29 Data size: 5812 Basic stats: > COMPLETE Column stats: NONE > Filter Operator > predicate: key is not null (type: boolean) > Statistics: Num rows: 15 Data size: 3006 Basic stats: > COMPLETE Column stats: NONE > HashTable Sink Operator > condition expressions: > 0 {key} {value} > 1 {key} {value} > keys: > 0 key (type: string) > 1 key (type: string) > Stage: Stage-4 > Spark > DagName: szehon_20141112105656_dd50e07d-94ad-4f9d-899e-bcb6d9a39c13:2 > Vertices: > Map 3 > Map Operator Tree: > TableScan > alias: src1 > Statistics: Num rows: 29 Data size: 5812 Basic stats: > COMPLETE Column stats: NONE > Filter Operator > predicate: key is not null (type: boolean) > Statistics: Num rows: 15 Data size: 3006 Basic stats: > 
COMPLETE Column stats: NONE > Map Join Operator > condition map: >Inner Join 0 to 1 > condition expressions: > 0 {key} {value} > 1 {key} {value} > keys: > 0 key (type: string) > 1 key (type: string) > outputColumnNames: _col0, _col1, _col5, _col6 > input vertices: > 1 Map 1 > Statistics: Num rows: 16 Data size: 3306 Basic stats: > COMPLETE Column stats: NONE > Filter Operator > predicate: (_col0 + _col5) is not null (type: boolean) > Statistics: Num rows: 8 Data size: 1653 Basic stats: > COMPLETE Column stats: NONE > HashTable Sink Operator > condition expressions: > 0 {_col0} {_col1} {_col5} {_col6} > 1 {key} {value} > keys: > 0 (_col0 + _col5) (type: double) > 1 UDFToDouble(key) (type: double) > Stage: Stage-3 > Spark > DagName: szehon_20141112105656_dd50e07d-94ad-4f9d-899e-bcb6d9a39c13:1 > Vertices: > Map 2 > Map Operator Tree: > TableScan > alias: src3 > Statistics: Num rows: 29 Data size: 5812 Basic stats: > COMPLETE Column stats: NONE > Filter Operator > predicate: UDFToDouble(key) is not null (type: boolean) > Statistics: Num rows: 15 Data size: 3006 Basic stats: > COMPLETE Column stats: NONE > Map Join Operator > condition map: >Inner Join 0 to 1 > condition expressions: > 0 {_col0} {_col1} {_col5} {_col6} > 1 {key} {value} > keys: > 0 (_col0 + _col5) (type: double) > 1 UDFToDouble(key) (type: double) > outputColumnNames: _col0, _col1,
[jira] [Updated] (HIVE-9088) Spark counter serialization error in spark.log [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9088: -- Fix Version/s: (was: spark-branch) 1.1.0 > Spark counter serialization error in spark.log [Spark Branch] > - > > Key: HIVE-9088 > URL: https://issues.apache.org/jira/browse/HIVE-9088 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Xuefu Zhang >Assignee: Chengxiang Li > Fix For: 1.1.0 > > Attachments: HIVE-9088.1-spark.patch > > > It seems that the counter didn't get registered. Increasing it in executor > caused this error. Task itself succeeds nevertheless. > {code} > 2014-12-11 05:24:48,951 ERROR [Executor task launch worker-0]: > counter.SparkCounters (SparkCounters.java:increment(83)) - > counter[HIVE, RECORDS_IN] has not initialized before. > 2014-12-11 05:24:48,951 ERROR [Executor task launch worker-0]: > counter.SparkCounters (SparkCounters.java:increment(83)) - > counter[HIVE, DESERIALIZE_ERRORS] has not initialized before. > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
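The log lines above indicate an increment arriving for counter names that were never registered with the counter group. A hypothetical sketch of the register-before-increment contract they imply (this is an illustration of the failure mode, not the actual `SparkCounters` implementation):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

public class CounterRegistrySketch {
    // Counters keyed by "group.name"; only registered counters accept updates.
    private final Map<String, AtomicLong> counters = new HashMap<>();

    void register(String group, String name) {
        counters.put(group + "." + name, new AtomicLong());
    }

    // Incrementing an unregistered counter logs an error and drops the
    // update, mirroring the "has not initialized before" messages: the task
    // itself still succeeds, only the statistic is lost.
    boolean increment(String group, String name, long delta) {
        AtomicLong c = counters.get(group + "." + name);
        if (c == null) {
            System.err.println("counter[" + group + ", " + name + "] has not initialized before.");
            return false;
        }
        c.addAndGet(delta);
        return true;
    }

    public static void main(String[] args) {
        CounterRegistrySketch group = new CounterRegistrySketch();
        group.register("HIVE", "RECORDS_IN");
        System.out.println(group.increment("HIVE", "RECORDS_IN", 1));          // prints "true"
        System.out.println(group.increment("HIVE", "DESERIALIZE_ERRORS", 1));  // prints "false"
    }
}
```

Under this model the fix is to make sure every counter name the executor may touch is registered (on the driver, before tasks launch) so the increment finds it, which matches the issue's framing that the counter "didn't get registered".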
[jira] [Updated] (HIVE-7816) Enable map-join tests which Tez executes [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-7816: -- Fix Version/s: (was: spark-branch) 1.1.0 > Enable map-join tests which Tez executes [Spark Branch] > --- > > Key: HIVE-7816 > URL: https://issues.apache.org/jira/browse/HIVE-7816 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Brock Noland >Assignee: Rui Li > Fix For: 1.1.0 > > Attachments: HIVE-7816.1-spark.patch, HIVE-7816.2-spark.patch > > > > {noformat} > auto_join0.q,\ > auto_join1.q,\ > cross_join.q,\ > cross_product_check_1.q,\ > cross_product_check_2.q,\ > {noformat} > {noformat} > filter_join_breaktask.q,\ > filter_join_breaktask2.q > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)