Alex, what version of the platform are you running?

Ram
On Wed, May 11, 2016 at 7:05 AM, McCullough, Alex <[email protected]> wrote:

> Hey Everyone,
>
> I have an application that is “failing” after running for a number of hours, and I was wondering if there is a standard way to determine the cause of the failure.
>
> In the STRAM events I see some final exceptions on containers at the end, related to loss of socket ownership; when I click on the operator and look at the logs, the last thing logged is a different error. Both are listed below.
>
> In the app master logs I see yet another error.
>
> Is there a best practice for determining why an application becomes “Failed”? And any insight on the exceptions below?
>
> Thanks,
> Alex
>
>
> App Master log, final lines:
>
> 2016-05-10 20:20:33,862 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block BP-88483743-10.24.28.46-1443641081815:blk_1167862818_94520797
> java.io.IOException: Bad response ERROR for block BP-88483743-10.24.28.46-1443641081815:blk_1167862818_94520797 from datanode DatanodeInfoWithStorage[10.24.28.58:50010,DS-d0329c7e-59b4-4c6b-b321-59f8c013f113,DISK]
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:1002)
> 2016-05-10 20:20:33,862 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block BP-88483743-10.24.28.46-1443641081815:blk_1167862818_94520797 in pipeline DatanodeInfoWithStorage[10.24.28.56:50010,DS-3664dd2d-8bf2-402a-badb-2016bce2c642,DISK], DatanodeInfoWithStorage[10.24.28.63:50010,DS-6c2824a3-a9f1-4cef-b3f2-4069e3a596e7,DISK], DatanodeInfoWithStorage[10.24.28.58:50010,DS-d0329c7e-59b4-4c6b-b321-59f8c013f113,DISK]: bad datanode DatanodeInfoWithStorage[10.24.28.58:50010,DS-d0329c7e-59b4-4c6b-b321-59f8c013f113,DISK]
> 2016-05-10 20:20:37,646 ERROR org.apache.hadoop.hdfs.DFSClient: Failed to close inode 96057235
> java.io.EOFException: Premature EOF: no length prefix available
>         at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2241)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.transfer(DFSOutputStream.java:1264)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:1234)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1375)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:1119)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:622)
>
>
> Error displayed for several HDHT operators in the STRAM events:
>
> Stopped running due to an exception.
> com.datatorrent.netlet.NetletThrowable$NetletRuntimeException: java.lang.UnsupportedOperationException: Client does not own the socket any longer!
>         at com.datatorrent.netlet.AbstractClient$1.offer(AbstractClient.java:364)
>         at com.datatorrent.netlet.AbstractClient$1.offer(AbstractClient.java:354)
>         at com.datatorrent.netlet.AbstractClient.send(AbstractClient.java:300)
>         at com.datatorrent.netlet.AbstractLengthPrependerClient.write(AbstractLengthPrependerClient.java:236)
>         at com.datatorrent.netlet.AbstractLengthPrependerClient.write(AbstractLengthPrependerClient.java:190)
>         at com.datatorrent.stram.stream.BufferServerPublisher.put(BufferServerPublisher.java:135)
>         at com.datatorrent.api.DefaultOutputPort.emit(DefaultOutputPort.java:51)
>         at com.capitalone.vault8.citadel.operators.AbstractTimedHdhtRecordWriter$2.emit(AbstractTimedHdhtRecordWriter.java:92)
>         at com.capitalone.vault8.citadel.operators.AbstractTimedHdhtRecordWriter$2.emit(AbstractTimedHdhtRecordWriter.java:89)
>         at com.capitalone.vault8.citadel.operators.AbstractTimedHdhtRecordWriter.processTuple(AbstractTimedHdhtRecordWriter.java:78)
>         at com.capitalone.vault8.citadel.operators.AbstractTimedHdhtRecordWriter$1.process(AbstractTimedHdhtRecordWriter.java:85)
>         at com.capitalone.vault8.citadel.operators.AbstractTimedHdhtRecordWriter$1.process(AbstractTimedHdhtRecordWriter.java:82)
>         at com.datatorrent.api.DefaultInputPort.put(DefaultInputPort.java:79)
>         at com.datatorrent.stram.stream.BufferServerSubscriber$BufferReservoir.sweep(BufferServerSubscriber.java:265)
>         at com.datatorrent.stram.engine.GenericNode.run(GenericNode.java:252)
>         at com.datatorrent.stram.engine.StreamingContainer$2.run(StreamingContainer.java:1388)
> Caused by: java.lang.UnsupportedOperationException: Client does not own the socket any longer!
>         ... 16 more
>
> Last lines in one of the stopped containers with the above exception:
>
> 2016-05-11 08:09:43,044 WARN com.datatorrent.stram.RecoverableRpcProxy: RPC failure, attempting reconnect after 10000 ms (remaining 29498 ms)
> java.lang.reflect.UndeclaredThrowableException
>         at com.sun.proxy.$Proxy18.processHeartbeat(Unknown Source)
>         at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at com.datatorrent.stram.RecoverableRpcProxy.invoke(RecoverableRpcProxy.java:138)
>         at com.sun.proxy.$Proxy18.processHeartbeat(Unknown Source)
>         at com.datatorrent.stram.engine.StreamingContainer.heartbeatLoop(StreamingContainer.java:693)
>         at com.datatorrent.stram.engine.StreamingContainer.main(StreamingContainer.java:312)
> Caused by: java.io.EOFException: End of File Exception between local host is: "mdcilabpdn04.kdc.capitalone.com/10.24.28.53"; destination host is: "mdcilabpdn06.kdc.capitalone.com":49859; : java.io.EOFException; For more details see: http://wiki.apache.org/hadoop/EOFException
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>         at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
>         at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1476)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1403)
>         at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:243)
>         ... 8 more
> Caused by: java.io.EOFException
>         at java.io.DataInputStream.readInt(DataInputStream.java:392)
>         at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1075)
>         at org.apache.hadoop.ipc.Client$Connection.run(Client.java:970)
>
> ________________________________________________________
>
> The information contained in this e-mail is confidential and/or proprietary to Capital One and/or its affiliates and may only be used solely in performance of work or services for Capital One. The information transmitted herewith is intended only for use by the individual or entity to which it is addressed. If the reader of this message is not the intended recipient, you are hereby notified that any review, retransmission, dissemination, distribution, copying or other use of, or taking of any action in reliance upon this information is strictly prohibited. If you have received this communication in error, please contact the sender and delete the material from your computer.
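
On the general question of how to work out why an application ended up FAILED: once the application has finished, a common first step is to pull the aggregated YARN logs, which collect the App Master log and every container log into one place so you can walk back to the earliest error rather than the last symptom. A minimal sketch, assuming log aggregation is enabled on the cluster (<applicationId> is the id shown in the console):

    # dump all container logs for the finished application into one file
    yarn logs -applicationId <applicationId> > app.log

    # then scan for errors/exceptions in time order to find the first failure
    grep -n -E "ERROR|Exception" app.log | head

This is generic YARN tooling rather than anything platform-specific, but it usually narrows down which container failed first and why the remaining failures cascaded from it.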
