[ https://issues.apache.org/jira/browse/HIVE-14344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris Drome updated HIVE-14344:
-------------------------------
    Target Version/s: 2.1.0, 1.2.1  (was: 1.2.1, 2.1.0)
              Status: Patch Available  (was: Open)

> Intermittent failures caused by leaking delegation tokens
> ---------------------------------------------------------
>
>                 Key: HIVE-14344
>                 URL: https://issues.apache.org/jira/browse/HIVE-14344
>             Project: Hive
>          Issue Type: Bug
>          Components: Tez
>    Affects Versions: 2.1.0, 1.2.1
>            Reporter: Chris Drome
>            Assignee: Chris Drome
>         Attachments: HIVE-14344-branch-1.patch, HIVE-14344.patch
>
>
> We have experienced random job failures caused by leaking delegation tokens.
> The Tez child task will fail because it is attempting to read from the
> delegation tokens directory of a different (related) task.
> Failure results in the following type of stack trace:
> {noformat}
> 2016-07-21 16:57:18,061 [FATAL] [TezChild] |tez.ReduceRecordSource|: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0)
>     at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:370)
>     at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:292)
>     at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:249)
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148)
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
>     at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:362)
>     at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
>     at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1738)
>     at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
>     at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
>     at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: java.io.IOException: Exception reading file:/grid/4/tmp/yarn-local/usercache/.../appcache/application_1468602386465_489814/container_e02_1468602386465_489814_01_000001/container_tokens
>     at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:237)
>     at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:74)
>     at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:650)
>     at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:756)
>     at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinObject(CommonMergeJoinOperator.java:316)
>     at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinOneGroup(CommonMergeJoinOperator.java:279)
>     at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinOneGroup(CommonMergeJoinOperator.java:272)
>     at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.process(CommonMergeJoinOperator.java:258)
>     at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:361)
>     ... 17 more
> Caused by: java.lang.RuntimeException: java.io.IOException: Exception reading file:/grid/4/tmp/yarn-local/usercache/.../appcache/application_1468602386465_489814/container_e02_1468602386465_489814_01_000001/container_tokens
>     at org.apache.hadoop.mapreduce.security.TokenCache.mergeBinaryTokens(TokenCache.java:141)
>     at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:119)
>     at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100)
>     at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80)
>     at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:206)
>     at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:45)
>     at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
>     at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:222)
>     ... 25 more
> Caused by: java.io.IOException: Exception reading file:/grid/4/tmp/yarn-local/usercache/.../appcache/application_1468602386465_489814/container_e02_1468602386465_489814_01_000001/container_tokens
>     at org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:175)
>     at org.apache.hadoop.mapreduce.security.TokenCache.mergeBinaryTokens(TokenCache.java:136)
>     ... 32 more
> Caused by: java.io.FileNotFoundException: File file:/grid/4/tmp/yarn-local/usercache/.../appcache/application_1468602386465_489814/container_e02_1468602386465_489814_01_000001/container_tokens does not exist
>     at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:611)
>     at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:824)
>     at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:601)
>     at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
>     at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:142)
>     at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:346)
>     at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:768)
>     at org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:170)
>     ... 33 more
> {noformat}
> The application that failed was {{application_1468602386465_489844}} while
> complaining about
> {{appcache/application_1468602386465_489814/container_e02_1468602386465_489814_01_000001/container_tokens}}.
> This seems to only manifest via HiveAction through Oozie.
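
The symptom described in the report (a task in one application trying to read the `container_tokens` file of another application) can be spotted in logs by comparing the application IDs. The following is only a minimal diagnostic sketch, not part of the attached patches: the helper name `foreign_app_ids` is hypothetical, and it assumes the ID of the failing application is already known (for example, from the YARN UI). The path keeps the `...` elision from the report.

```python
import re

# YARN application IDs have the form application_<clusterTimestamp>_<sequence>.
APP_ID = re.compile(r"application_\d+_\d+")


def foreign_app_ids(log_text, running_app_id):
    """Return application IDs mentioned in log_text other than running_app_id.

    Hypothetical helper: a non-empty result suggests the task referenced
    files (such as container_tokens) belonging to a different application.
    """
    return sorted(set(APP_ID.findall(log_text)) - {running_app_id})


# One line of the stack trace from the report (path elision kept as-is).
trace_line = (
    "Caused by: java.io.FileNotFoundException: File "
    "file:/grid/4/tmp/yarn-local/usercache/.../appcache/"
    "application_1468602386465_489814/"
    "container_e02_1468602386465_489814_01_000001/container_tokens "
    "does not exist"
)

# The application that actually failed, as stated in the report.
leaked = foreign_app_ids(trace_line, "application_1468602386465_489844")
print(leaked)
```

For the trace above, the helper reports `application_1468602386465_489814`, the related application whose token file the failing task attempted to read.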