[ https://issues.apache.org/jira/browse/HIVE-20304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yongzhi Chen reassigned HIVE-20304: ----------------------------------- Assignee: Hui Huang (was: Yongzhi Chen) > When hive.optimize.skewjoin and hive.auto.convert.join are both set to true, > and the execution engine is mr, same stage may launch twice due to the wrong > generated plan > ------------------------------------------------------------------------------------------------------------------------------------------------------------------------ > > Key: HIVE-20304 > URL: https://issues.apache.org/jira/browse/HIVE-20304 > Project: Hive > Issue Type: Bug > Components: CLI > Affects Versions: 1.2.1, 2.3.3 > Reporter: Hui Huang > Assignee: Hui Huang > Priority: Major > Fix For: 1.2.1, 4.0.0 > > Attachments: HIVE-20304.1.patch, HIVE-20304.2.patch, HIVE-20304.patch > > > `When hive.optimize.skewjoin and hive.auto.convert.join are both set to true, > and the execution engine is set to mr, same stage of a query may launch twice > due to the wrong generated plan. If hive.exec.parallel is also true, the same > stage will launch at the same time and the job will failed due to the first > completed stage clear the map.xml/reduce.xml file stored in the hdfs. > use following sql to reproduce the issue: > {code:java} > CREATE TABLE `tbl1`( > `fence` string); > CREATE TABLE `tbl2`( > `order_id` string, > `phone` string, > `search_id` string > ) > PARTITIONED BY ( > `dt` string); > CREATE TABLE `tbl3`( > `order_id` string, > `platform` string) > PARTITIONED BY ( > `dt` string); > CREATE TABLE `tbl4`( > `groupname` string, > `phone` string) > PARTITIONED BY ( > `dt` string); > CREATE TABLE `tbl5`( > `search_id` string, > `fence` string) > PARTITIONED BY ( > `dt` string); > SET hive.exec.parallel = TRUE; > SET hive.auto.convert.join = TRUE; > SET hive.optimize.skewjoin = TRUE; > SELECT dt, > platform, > groupname, > count(1) as cnt > FROM > (SELECT dt, > platform, > groupname > FROM > (SELECT fence > FROM tbl1)ta > JOIN > (SELECT a0.dt, > a1.platform, > a2.groupname, > a3.fence > FROM > (SELECT dt, > order_id, > phone, > search_id > FROM tbl2 > WHERE dt =20180703 )a0 > JOIN > (SELECT order_id, > platform, > dt > FROM tbl3 > WHERE dt =20180703 )a1 ON a0.order_id = a1.order_id > INNER JOIN > (SELECT groupname, > phone, > dt > FROM tbl4 > WHERE dt =20180703 )a2 ON a0.phone = a2.phone > LEFT JOIN > (SELECT search_id, > fence, > dt > FROM tbl5 > WHERE dt =20180703)a3 ON a0.search_id = a3.search_id)t0 ON > ta.fence = t0.fence)t11 > GROUP BY dt, > platform, > groupname; > DROP TABLE tbl1; > DROP TABLE tbl2; > DROP TABLE tbl3; > DROP TABLE tbl4; > DROP TABLE tbl5; > {code} > We will get some error message like this: > Examining task ID: task_1531284442065_3637_m_000000 (and more) from job > job_1531284442065_3637 > Task with the most failures(4): > ---- > Task ID: > task_1531284442065_3637_m_000000 > URL: > > [http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1531284442065_3637&tipid=task_1531284442065_3637_m_000000] > ---- > Diagnostic Messages for this Task: > File does not exist: > hdfs://test/tmp/hive-hadoop/hadoop/fe5efa94-abb1-420f-b6ba-ec782e7b79ad/hive_2018-08-03_17-00-17_707_592882314975289971-5/-mr-10045/757eb1f7-7a37-4a7e-abc0-4a3b8b06510c/reduce.xml > java.io.FileNotFoundException: File does not exist: > hdfs://test/tmp/hive-hadoop/hadoop/fe5efa94-abb1-420f-b6ba-ec782e7b79ad/hive_2018-08-03_17-00-17_707_592882314975289971-5/-mr-10045/757eb1f7-7a37-4a7e-abc0-4a3b8b06510c/reduce.xml > Looking into the plan by executing explain, I found that the Stage-4 and > Stage-5 can reached from multi root tasks. > {code:java} > Explain > STAGE DEPENDENCIES: > Stage-21 is a root stage , consists of Stage-34, Stage-5 > Stage-34 has a backup stage: Stage-5 > Stage-20 depends on stages: Stage-34 > Stage-17 depends on stages: Stage-5, Stage-18, Stage-20 , consists of > Stage-32, Stage-33, Stage-1 > Stage-32 has a backup stage: Stage-1 > Stage-15 depends on stages: Stage-32 > Stage-10 depends on stages: Stage-1, Stage-15, Stage-16 , consists of > Stage-31, Stage-2 > Stage-31 > Stage-9 depends on stages: Stage-31 > Stage-2 depends on stages: Stage-9 > Stage-33 has a backup stage: Stage-1 > Stage-16 depends on stages: Stage-33 > Stage-1 > Stage-5 > Stage-27 is a root stage , consists of Stage-37, Stage-38, Stage-4 > Stage-37 has a backup stage: Stage-4 > Stage-25 depends on stages: Stage-37 > Stage-12 depends on stages: Stage-4, Stage-22, Stage-23, Stage-25, Stage-26 > , consists of Stage-36, Stage-5 > Stage-36 > Stage-11 depends on stages: Stage-36 > Stage-19 depends on stages: Stage-11 , consists of Stage-35, Stage-5 > Stage-35 has a backup stage: Stage-5 > Stage-18 depends on stages: Stage-35 > Stage-38 has a backup stage: Stage-4 > Stage-26 depends on stages: Stage-38 > Stage-4 > Stage-30 is a root stage , consists of Stage-42, Stage-43, Stage-3 > Stage-42 has a backup stage: Stage-3 > Stage-28 depends on stages: Stage-42 > Stage-14 depends on stages: Stage-3, Stage-28, Stage-29 , consists of > Stage-41, Stage-4 > Stage-41 > Stage-13 depends on stages: Stage-41 > Stage-24 depends on stages: Stage-13 , consists of Stage-39, Stage-40, > Stage-4 > Stage-39 has a backup stage: Stage-4 > Stage-22 depends on stages: Stage-39 > Stage-40 has a backup stage: Stage-4 > Stage-23 depends on stages: Stage-40 > Stage-43 has a backup stage: Stage-3 > Stage-29 depends on stages: Stage-43 > Stage-3 > Stage-0 depends on stages: Stage-2 > {code} > After skewjoin optimization, the processed node is added into the listTasks > of ConditionalTask and the parentTask of the processed node is removed and > during the commonJoin optimization of listTasks of ConditionalTask, the new > generated condTask will be added into root task list due to parentTask is > null. > workaround: do not set hive.optimize.skewjoin and hive.auto.convert.join to > true at the same time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)