[ https://issues.apache.org/jira/browse/HIVE-22416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated HIVE-22416: ---------------------------------- Labels: pull-request-available (was: ) > MR-related operation logs missing when parallel execution is enabled > --------------------------------------------------------------------- > > Key: HIVE-22416 > URL: https://issues.apache.org/jira/browse/HIVE-22416 > Project: Hive > Issue Type: Bug > Reporter: Karen Coppage > Assignee: Karen Coppage > Priority: Major > Labels: pull-request-available > Fix For: 4.0.0-alpha-1 > > Attachments: HIVE-22416.01.patch > > > Repro steps: > 1. Happy path, parallel execution disabled > {code:java} > 0: jdbc:hive2://localhost:10000> set hive.exec.parallel=false; > No rows affected (0.023 seconds) > 0: jdbc:hive2://localhost:10000> select count (*) from t1; > INFO : Compiling > command(queryId=karencoppage_20191028152610_a26c25e1-9834-446a-9a56-c676cb693e7d): > select count (*) from t1 > INFO : Semantic Analysis Completed > INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:c0, > type:bigint, comment:null)], properties:null) > INFO : Completed compiling > command(queryId=karencoppage_20191028152610_a26c25e1-9834-446a-9a56-c676cb693e7d); > Time taken: 0.309 seconds > INFO : Executing > command(queryId=karencoppage_20191028152610_a26c25e1-9834-446a-9a56-c676cb693e7d): > select count (*) from t1 > WARN : > INFO : Query ID = > karencoppage_20191028152610_a26c25e1-9834-446a-9a56-c676cb693e7d > INFO : Total jobs = 1 > INFO : Launching Job 1 out of 1 > INFO : Starting task [Stage-1:MAPRED] in serial mode > INFO : Number of reduce tasks determined at compile time: 1 > INFO : In order to change the average load for a reducer (in bytes): > INFO : set hive.exec.reducers.bytes.per.reducer=<number> > INFO : In order to limit the maximum number of reducers: > INFO : set hive.exec.reducers.max=<number> > INFO : In order to set a constant number of reducers: > INFO : set mapreduce.job.reduces=<number> > DEBUG : Configuring job job_local495362389_0008 with > file:/tmp/hadoop/mapred/staging/karencoppage495362389/.staging/job_local495362389_0008 > as the submit dir > DEBUG : adding the following namenodes' delegation tokens:[file:///] > DEBUG : Creating splits at > file:/tmp/hadoop/mapred/staging/karencoppage495362389/.staging/job_local495362389_0008 > INFO : number of splits:0 > INFO : Submitting tokens for job: job_local495362389_0008 > INFO : Executing with tokens: [] > INFO : The url to track the job: http://localhost:8080/ > INFO : Job running in-process (local Hadoop) > INFO : 2019-10-28 15:26:22,537 Stage-1 map = 0%, reduce = 100% > INFO : Ended Job = job_local495362389_0008 > INFO : MapReduce Jobs Launched: > INFO : Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 SUCCESS > INFO : Total MapReduce CPU Time Spent: 0 msec > INFO : Completed executing > command(queryId=karencoppage_20191028152610_a26c25e1-9834-446a-9a56-c676cb693e7d); > Time taken: 6.497 seconds > INFO : OK > DEBUG : Shutting down query select count (*) from t1 > +-----+ > | c0 | > +-----+ > | 0 | > +-----+ > 1 row selected (11.874 seconds) > {code} > 2. Faulty path, parallel execution enabled > {code:java} > 0: jdbc:hive2://localhost:10000> set > hive.server2.logging.operation.level=EXECUTION; > No rows affected (0.236 seconds) > 0: jdbc:hive2://localhost:10000> set hive.exec.parallel=true; > No rows affected (0.01 seconds) > 0: jdbc:hive2://localhost:10000> select count (*) from t1; > INFO : Compiling > command(queryId=karencoppage_20191028155346_4e7b793b-654e-4d69-b588-f3f0d3ae0c77): > select count (*) from t1 > INFO : Semantic Analysis Completed > INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:c0, > type:bigint, comment:null)], properties:null) > INFO : Completed compiling > command(queryId=karencoppage_20191028155346_4e7b793b-654e-4d69-b588-f3f0d3ae0c77); > Time taken: 4.707 seconds > INFO : Executing > command(queryId=karencoppage_20191028155346_4e7b793b-654e-4d69-b588-f3f0d3ae0c77): > select count (*) from t1 > WARN : > INFO : Query ID = > karencoppage_20191028155346_4e7b793b-654e-4d69-b588-f3f0d3ae0c77 > INFO : Total jobs = 1 > INFO : Launching Job 1 out of 1 > INFO : Starting task [Stage-1:MAPRED] in parallel > INFO : MapReduce Jobs Launched: > INFO : Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 SUCCESS > INFO : Total MapReduce CPU Time Spent: 0 msec > INFO : Completed executing > command(queryId=karencoppage_20191028155346_4e7b793b-654e-4d69-b588-f3f0d3ae0c77); > Time taken: 44.577 seconds > INFO : OK > DEBUG : Shutting down query select count (*) from t1 > +-----+ > | c0 | > +-----+ > | 0 | > +-----+ > 1 row selected (54.665 seconds) > {code} > The issue is that Log4J stores the session ID and query ID in some atomic > thread metadata > (org.apache.logging.log4j.ThreadContext.getImmutableContext()). If the > queryId is missing from this metadata, then the RoutingAppender (which is > defined programmatically in LogDivertAppender) will route the log to a > NullAppender, which logs nothing. If the queryId is present, then the > RoutingAppender routes the event to the "query-appender", which will log the > line in the operation log/Beeline. This is not happening in a multi-threaded > context since new threads created for parallel query execution do not have > the queryId/sessionId metadata. > The solution is to add the queryId/sessionId metadata to any new threads > created for MR work. -- This message was sent by Atlassian Jira (v8.20.10#820010)