[jira] [Commented] (FLINK-14525) buffer pool is destroyed
[ https://issues.apache.org/jira/browse/FLINK-14525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17217338#comment-17217338 ]

Zhijiang commented on FLINK-14525:
----------------------------------

Closing this issue for cleanup, since the reporter has not responded for a long time and the affected version is no longer maintained.

> buffer pool is destroyed
> ------------------------
>
> Key: FLINK-14525
> URL: https://issues.apache.org/jira/browse/FLINK-14525
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Network
> Affects Versions: 1.7.2
> Reporter: Saqib
> Priority: Major
>
> Have a flink app running in standalone mode. The app runs ok in our non-prod
> env. However on our prod server it throws this exception:
> Buffer pool is destroyed.
>
> This error is being thrown as a RuntimeException on the collect call, on the
> flatmap function. The flatmap is just collecting a Tuple,
> the Document is a XML Document object.
>
> As mentioned the non prod env (and we have multiple, DEV,QA,UAT) this is not
> happening. The UAT box is spec-ed exactly as our Prod host with 4CPU. The
> java version is the same too.
>
> Not sure how to proceed.
>
> Thanks

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (FLINK-14525) buffer pool is destroyed
[ https://issues.apache.org/jira/browse/FLINK-14525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16960815#comment-16960815 ]

zhijiang commented on FLINK-14525:
----------------------------------

The stack trace above does not help in tracing the root cause. If you can get the JobMaster log, it should be easy to find the first failure, which is what caused the buffer pool to be destroyed.

> buffer pool is destroyed
> ------------------------
>
> Key: FLINK-14525
> URL: https://issues.apache.org/jira/browse/FLINK-14525
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Network
> Affects Versions: 1.7.2
> Reporter: Saqib
> Priority: Blocker
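To illustrate the advice about looking for the first failure in the JobMaster log, here is a minimal sketch in Python. It is not a Flink tool. It assumes the log is read as a list of text lines in chronological order, that exception headers mention a dotted Java class name ending in `Exception` or `Error`, and that the secondary symptom always carries the text "Buffer pool is destroyed". The sample lines, including `com.example.Hypothetical`, are made up for the demonstration.

```python
import re

# Assumption: failures that are only a side effect of cancellation carry this text.
SECONDARY_MARKER = "Buffer pool is destroyed"

# Assumption: an exception header names a dotted Java class ending in Exception or Error.
EXCEPTION_RE = re.compile(r"\b(?:[\w$]+\.)+[\w$]*(?:Exception|Error)\b")


def first_primary_failure(log_lines):
    """Return (line_number, line) for the earliest exception line that is not
    the secondary 'Buffer pool is destroyed' symptom, or None if there is none."""
    for number, line in enumerate(log_lines, start=1):
        if SECONDARY_MARKER in line:
            continue  # skip the symptom, keep looking for the cause
        if line.lstrip().startswith("at "):
            continue  # a stack frame, not an exception header
        if EXCEPTION_RE.search(line):
            return number, line.rstrip("\n")
    return None


# Hypothetical log excerpt, invented for illustration only.
sample = [
    "2019-10-24 16:37:50 INFO  Job started",
    "2019-10-24 16:37:54 ERROR java.lang.IllegalArgumentException: hypothetical first failure",
    "    at com.example.Hypothetical.run(Hypothetical.java:1)",
    "2019-10-24 16:37:55 ERROR java.lang.RuntimeException: Buffer pool is destroyed.",
]

result = first_primary_failure(sample)
print(result)
```

On a real log, the lines would come from something like `open(path).readlines()`; the line it reports is only a candidate and still needs to be read in context.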
[jira] [Commented] (FLINK-14525) buffer pool is destroyed
[ https://issues.apache.org/jira/browse/FLINK-14525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16960037#comment-16960037 ]

Saqib commented on FLINK-14525:
-------------------------------

here is the stack trace of the exception:

java.lang.RuntimeException: Buffer pool is destroyed.
    at org.apache.flink.streaming.runtime.io.RecordWriterOutput.pushToRecordWriter(RecordWriterOutput.java:110)
    at org.apache.flink.streaming.runtime.io.RecordWriterOutput.collect(RecordWriterOutput.java:89)
    at org.apache.flink.streaming.runtime.io.RecordWriterOutput.collect(RecordWriterOutput.java:45)
    at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:718)
    at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:696)
    at org.apache.flink.streaming.api.operators.TimestampedCollector.collect(TimestampedCollector.java:51)
    at com.cs.ib.tarsan.cdds.flink.CMSAccountFilter.flatMap(CMSAccountFilter.java:51)
    at com.cs.ib.tarsan.cdds.flink.CMSAccountFilter.flatMap(CMSAccountFilter.java:15)
    at org.apache.flink.streaming.api.operators.StreamFlatMap.processElement(StreamFlatMap.java:50)
    at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:579)
    at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:554)
    at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:534)
    at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:718)
    at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:696)
    at org.apache.flink.streaming.api.operators.TimestampedCollector.collect(TimestampedCollector.java:51)
    at com.cs.ib.tarsan.cdds.flink.CddsXMLDocumentCreator.flatMap(CddsXMLDocumentCreator.java:50)
    at com.cs.ib.tarsan.cdds.flink.CddsXMLDocumentCreator.flatMap(CddsXMLDocumentCreator.java:22)
2019-10-24 16:37:55.734 [Source: Custom 5< - GPID=30428415 ...Exception= Buffer pool is destroyed.
    at org.apache.flink.streaming.api.operators.StreamFlatMap.processElement(StreamFlatMap.java:50)
    at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:579)
    at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:554)
    at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:534)
    at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:718)
    at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:696)
    at org.apache.flink.streaming.api.operators.StreamFilter.processElement(StreamFilter.java:40)
    at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:579)
    at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:554)
    at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:534)
    at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:718)
    at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:696)
    at org.apache.flink.streaming.api.operators.StreamSourceContexts$NonTimestampContext.collect(StreamSourceContexts.java:104)
    at org.apache.flink.streaming.api.operators.StreamSourceContexts$NonTimestampContext.collectWithTimestamp(StreamSourceContexts.java:111)
    at org.apache.flink.streaming.connectors.kafka.internals.AbstractFetcher.emitRecordWithTimestamp(AbstractFetcher.java:398)
    at org.apache.flink.streaming.connectors.kafka.internal.Kafka010Fetcher.emitRecord(Kafka010Fetcher.java:89)
    at org.apache.flink.streaming.connectors.kafka.internal.Kafka09Fetcher.runFetchLoop(Kafka09Fetcher.java:154)
    at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:665)
    at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:94)
    at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:58)
    at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:99)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:300)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:704)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalStateException: Buffer pool is destroyed.
    at org.apache.flink.runtime.io.network.buffer.LocalBufferPool.reques
[jira] [Commented] (FLINK-14525) buffer pool is destroyed
[ https://issues.apache.org/jira/browse/FLINK-14525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16959508#comment-16959508 ]

zhijiang commented on FLINK-14525:
----------------------------------

As [~wind_ljy] mentioned above, some other failure must have occurred in the job, which then triggered cancellation of all the tasks. The "Buffer pool is destroyed" error is related to that cancel operation. You can check further whether another task failed or a TaskExecutor was lost.

> buffer pool is destroyed
> ------------------------
>
> Key: FLINK-14525
> URL: https://issues.apache.org/jira/browse/FLINK-14525
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Network
> Affects Versions: 1.7.2
> Reporter: Saqib
> Priority: Blocker
[jira] [Commented] (FLINK-14525) buffer pool is destroyed
[ https://issues.apache.org/jira/browse/FLINK-14525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16959386#comment-16959386 ]

Jiayi Liao commented on FLINK-14525:
------------------------------------

I believe this is not the root cause. "Buffer pool is destroyed" appears because the NettyShuffleEnvironment has been closed, which is a fairly "normal" phenomenon when an exception is thrown. It would help if you could attach the full logs of the jobmanager and taskmanager.

> buffer pool is destroyed
> ------------------------
>
> Key: FLINK-14525
> URL: https://issues.apache.org/jira/browse/FLINK-14525
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Network
> Affects Versions: 1.7.2
> Reporter: Saqib
> Priority: Blocker
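The point that "Buffer pool is destroyed" is a symptom rather than a cause can be pictured with a toy model. The Python sketch below is not Flink's implementation: `ToyBufferPool`, its methods and the scenario are all hypothetical. It only assumes that once a pool has been torn down, any later buffer request fails with this message, so a task that is still emitting records when the environment closes sees the error.

```python
class ToyBufferPool:
    """Toy stand-in for a network buffer pool; NOT Flink's actual code."""

    def __init__(self, size):
        self.free = size
        self.destroyed = False

    def request_buffer(self):
        # After teardown, every request fails with the familiar message.
        if self.destroyed:
            raise RuntimeError("Buffer pool is destroyed.")
        if self.free == 0:
            return None  # no buffer available right now
        self.free -= 1
        return bytearray(8)

    def destroy(self):
        # Hypothetical teardown step, run when the task is cancelled.
        self.destroyed = True


def run_scenario():
    """Emit once, let an unrelated failure cancel the task, then emit again."""
    pool = ToyBufferPool(size=2)
    events = []

    pool.request_buffer()
    events.append("emit ok")

    # Hypothetical primary failure in some other task triggers cancellation.
    events.append("primary failure elsewhere (hypothetical)")
    pool.destroy()

    try:
        pool.request_buffer()  # the still-running operator tries to emit
    except RuntimeError as err:
        events.append(f"secondary: {err}")
    return events


events = run_scenario()
for event in events:
    print(event)
```

In this model the last event is the only one carrying the "Buffer pool is destroyed" text, yet the entry before it is the one worth investigating, which is why the full jobmanager and taskmanager logs are needed.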