[ https://issues.apache.org/jira/browse/HIVE-27056?focusedWorklogId=845547&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-845547 ]
ASF GitHub Bot logged work on HIVE-27056: ----------------------------------------- Author: ASF GitHub Bot Created on: 15/Feb/23 07:09 Start Date: 15/Feb/23 07:09 Worklog Time Spent: 10m Work Description: abstractdog merged PR #4037: URL: https://github.com/apache/hive/pull/4037 Issue Time Tracking ------------------- Worklog Id: (was: 845547) Time Spent: 2.5h (was: 2h 20m) > Ensure that MR distcp goes to the same Yarn queue as other parts of the query > ----------------------------------------------------------------------------- > > Key: HIVE-27056 > URL: https://issues.apache.org/jira/browse/HIVE-27056 > Project: Hive > Issue Type: Improvement > Reporter: László Bodor > Assignee: László Bodor > Priority: Major > Labels: pull-request-available > Fix For: 4.0.0-alpha-2 > > Attachments: image (1).png > > Time Spent: 2.5h > Remaining Estimate: 0h > > Given the following plan for an EXPORT TABLE query: > {code} > +----------------------------------------------------+ > | Explain | > +----------------------------------------------------+ > | STAGE DEPENDENCIES: | > | Stage-2 is a root stage | > | Stage-3 depends on stages: Stage-2 | > | Stage-1 depends on stages: Stage-3 | > | Stage-6 depends on stages: Stage-1 | > | Stage-5 depends on stages: Stage-6 | > | Stage-7 depends on stages: Stage-5 | > | | > | STAGE PLANS: | > | Stage: Stage-2 | > | Tez | > | DagId: hive_20230124103803_b865e63c-4565-4b95-8664-8c983938697a:7 | > | DagName: hive_20230124103803_b865e63c-4565-4b95-8664-8c983938697a:7 | > | Vertices: | > | Map 1 | > | Map Operator Tree: | > | TableScan | > | alias: distcptest | > | Statistics: Num rows: 2 Data size: 182658 Basic stats: > COMPLETE Column stats: COMPLETE | > | Select Operator | > | expressions: id (type: int), txt (type: string) | > | outputColumnNames: _col0, _col1 | > | Statistics: Num rows: 2 Data size: 182658 Basic stats: > COMPLETE Column stats: COMPLETE | > | File Output Operator | > | compressed: false | > | Statistics: Num rows: 2 Data size: 182658 Basic > stats: COMPLETE Column stats: COMPLETE | > | table: | > | input format: > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat | > | output format: > org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat | > | serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde | > | name: > bigd35368.distcptest_ced9015c_852b_49a4_8c28_4d284d598523 | > | Execution mode: vectorized | > | | > | Stage: Stage-3 | > | Dependency Collection | > | | > | Stage: Stage-1 | > | Move Operator | > | tables: | > | replace: false | > | table: | > | input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat > | > | output format: > org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat | > | serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde | > | name: > bigd35368.distcptest_ced9015c_852b_49a4_8c28_4d284d598523 | > | | > | Stage: Stage-6 | > | Set Properties | > | table name: bigd35368.distcptest_ced9015c_852b_49a4_8c28_4d284d598523 > | > | properties: | > | transactional true | > | | > | Stage: Stage-5 | > | Export Work | > | | > | Stage: Stage-7 | > | Drop Table | > | table: bigd35368.distcptest_ced9015c_852b_49a4_8c28_4d284d598523 | > | | > +----------------------------------------------------+ > {code} > here, Stage-5 Export Work starts MR distcp jobs > the problem and solution are described in code comment, pasted below: > {code} > /** > * This method ensures if there is an explicit tez.queue.name set, the > hadoop shim will submit jobs > * to the same yarn queue. This solves a security issue where e.g settings > have the following values: > * tez.queue.name=sample > * hive.server2.tez.queue.access.check=true > * In this case, when a query submits Tez DAGs, the tez client layer checks > whether the end user has access to > * the yarn queue 'sample' via YarnQueueHelper, but this is not respected > in case of MR jobs that run > * even if the query execution engine is Tez. E.g. an EXPORT TABLE can > submit DistCp MR jobs at some stages when > * certain criteria are met. We tend to restrict the setting of > mapreduce.job.queuename in order to bypass this > * security flaw, and even the default queue is unexpected if we explicitly > set tez.queue.name. > * Under the hood the desired behavior is to have DistCp jobs in the same > yarn queue as other parts > * of the query. Most of the time, the user isn't aware that a query > involves DistCp jobs, hence isn't aware > * of these details. > */ > {code} > after the fix, the MR job went into the same yarn queue: > https://issues.apache.org/jira/secure/attachment/13055215/image%20%281%29.png -- This message was sent by Atlassian Jira (v8.20.10#820010)