[jira] [Commented] (FLINK-2089) Buffer recycled IllegalStateException during cancelling
[ https://issues.apache.org/jira/browse/FLINK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14715399#comment-14715399 ] Min Jiang commented on FLINK-2089: -- Sorry, just saw many mails in my mailbox. i am inside a firewall, not able to send out big file, also not able to use git to pull code to my machine. i will try from home tonight to grab the code. Will let you know the result. Thanks a lot! Min Buffer recycled IllegalStateException during cancelling - Key: FLINK-2089 URL: https://issues.apache.org/jira/browse/FLINK-2089 Project: Flink Issue Type: Bug Components: Distributed Runtime Affects Versions: master Reporter: Ufuk Celebi Assignee: Ufuk Celebi Fix For: 0.10, 0.9.1 [~rmetzger] reported the following stack trace during cancelling of high parallelism jobs: {code} Error: java.lang.IllegalStateException: Buffer has already been recycled. at org.apache.flink.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:173) at org.apache.flink.runtime.io.network.buffer.Buffer.ensureNotRecycled(Buffer.java:142) at org.apache.flink.runtime.io.network.buffer.Buffer.getMemorySegment(Buffer.java:78) at org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.setNextBuffer(SpillingAdaptiveSpanningRecordDeserializer.java:72) at org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:80) at org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:34) at org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:73) at org.apache.flink.runtime.operators.MapDriver.run(MapDriver.java:96) at org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496) at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559) at java.lang.Thread.run(Thread.java:745) {code} This looks like a concurrent buffer pool release/buffer usage error. I'm investing this today. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2089) Buffer recycled IllegalStateException during cancelling
[ https://issues.apache.org/jira/browse/FLINK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14712683#comment-14712683 ] ASF GitHub Bot commented on FLINK-2089: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/1050 Buffer recycled IllegalStateException during cancelling - Key: FLINK-2089 URL: https://issues.apache.org/jira/browse/FLINK-2089 Project: Flink Issue Type: Bug Components: Distributed Runtime Affects Versions: master Reporter: Ufuk Celebi Assignee: Ufuk Celebi Fix For: 0.10, 0.9.1 [~rmetzger] reported the following stack trace during cancelling of high parallelism jobs: {code} Error: java.lang.IllegalStateException: Buffer has already been recycled. at org.apache.flink.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:173) at org.apache.flink.runtime.io.network.buffer.Buffer.ensureNotRecycled(Buffer.java:142) at org.apache.flink.runtime.io.network.buffer.Buffer.getMemorySegment(Buffer.java:78) at org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.setNextBuffer(SpillingAdaptiveSpanningRecordDeserializer.java:72) at org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:80) at org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:34) at org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:73) at org.apache.flink.runtime.operators.MapDriver.run(MapDriver.java:96) at org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496) at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559) at java.lang.Thread.run(Thread.java:745) {code} This looks like a concurrent buffer pool release/buffer usage error. I'm investing this today. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2089) Buffer recycled IllegalStateException during cancelling
[ https://issues.apache.org/jira/browse/FLINK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14712685#comment-14712685 ] Ufuk Celebi commented on FLINK-2089: Fixed via b2f8e30 in master. Back port to release-0.9 pending on ML discussion (http://mail-archives.apache.org/mod_mbox/flink-dev/201508.mbox/%3cCANC1h_voqCXu_-5RqOcwf0q=87p_bqrxxv-gs2qcbpkfcc2...@mail.gmail.com%3e) Buffer recycled IllegalStateException during cancelling - Key: FLINK-2089 URL: https://issues.apache.org/jira/browse/FLINK-2089 Project: Flink Issue Type: Bug Components: Distributed Runtime Affects Versions: master Reporter: Ufuk Celebi Assignee: Ufuk Celebi Fix For: 0.10, 0.9.1 [~rmetzger] reported the following stack trace during cancelling of high parallelism jobs: {code} Error: java.lang.IllegalStateException: Buffer has already been recycled. at org.apache.flink.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:173) at org.apache.flink.runtime.io.network.buffer.Buffer.ensureNotRecycled(Buffer.java:142) at org.apache.flink.runtime.io.network.buffer.Buffer.getMemorySegment(Buffer.java:78) at org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.setNextBuffer(SpillingAdaptiveSpanningRecordDeserializer.java:72) at org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:80) at org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:34) at org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:73) at org.apache.flink.runtime.operators.MapDriver.run(MapDriver.java:96) at org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496) at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559) at java.lang.Thread.run(Thread.java:745) {code} This looks like a concurrent buffer pool release/buffer usage error. I'm investing this today. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2089) Buffer recycled IllegalStateException during cancelling
[ https://issues.apache.org/jira/browse/FLINK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711466#comment-14711466 ] ASF GitHub Bot commented on FLINK-2089: --- Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/1050#issuecomment-134634885 LGTM +1 Buffer recycled IllegalStateException during cancelling - Key: FLINK-2089 URL: https://issues.apache.org/jira/browse/FLINK-2089 Project: Flink Issue Type: Bug Components: Distributed Runtime Affects Versions: master Reporter: Ufuk Celebi Assignee: Ufuk Celebi Fix For: 0.10, 0.9.1 [~rmetzger] reported the following stack trace during cancelling of high parallelism jobs: {code} Error: java.lang.IllegalStateException: Buffer has already been recycled. at org.apache.flink.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:173) at org.apache.flink.runtime.io.network.buffer.Buffer.ensureNotRecycled(Buffer.java:142) at org.apache.flink.runtime.io.network.buffer.Buffer.getMemorySegment(Buffer.java:78) at org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.setNextBuffer(SpillingAdaptiveSpanningRecordDeserializer.java:72) at org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:80) at org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:34) at org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:73) at org.apache.flink.runtime.operators.MapDriver.run(MapDriver.java:96) at org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496) at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559) at java.lang.Thread.run(Thread.java:745) {code} This looks like a concurrent buffer pool release/buffer usage error. I'm investing this today. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2089) Buffer recycled IllegalStateException during cancelling
[ https://issues.apache.org/jira/browse/FLINK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711654#comment-14711654 ] ASF GitHub Bot commented on FLINK-2089: --- Github user uce commented on the pull request: https://github.com/apache/flink/pull/1050#issuecomment-134676836 I will address the comment and merge this for 0.10 and 0.9.1. Buffer recycled IllegalStateException during cancelling - Key: FLINK-2089 URL: https://issues.apache.org/jira/browse/FLINK-2089 Project: Flink Issue Type: Bug Components: Distributed Runtime Affects Versions: master Reporter: Ufuk Celebi Assignee: Ufuk Celebi Fix For: 0.10, 0.9.1 [~rmetzger] reported the following stack trace during cancelling of high parallelism jobs: {code} Error: java.lang.IllegalStateException: Buffer has already been recycled. at org.apache.flink.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:173) at org.apache.flink.runtime.io.network.buffer.Buffer.ensureNotRecycled(Buffer.java:142) at org.apache.flink.runtime.io.network.buffer.Buffer.getMemorySegment(Buffer.java:78) at org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.setNextBuffer(SpillingAdaptiveSpanningRecordDeserializer.java:72) at org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:80) at org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:34) at org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:73) at org.apache.flink.runtime.operators.MapDriver.run(MapDriver.java:96) at org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496) at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559) at java.lang.Thread.run(Thread.java:745) {code} This looks like a concurrent buffer pool release/buffer usage error. I'm investing this today. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2089) Buffer recycled IllegalStateException during cancelling
[ https://issues.apache.org/jira/browse/FLINK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711218#comment-14711218 ] ASF GitHub Bot commented on FLINK-2089: --- Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/1050#discussion_r37861035 --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/api/writer/RecordWriter.java --- @@ -20,7 +20,7 @@ import org.apache.flink.core.io.IOReadableWritable; import org.apache.flink.runtime.accumulators.AccumulatorRegistry; -import org.apache.flink.runtime.event.AbstractEvent; +import org.apache.flink.runtime.event.AbstractEvent; --- End diff -- Whitespace should be removed Buffer recycled IllegalStateException during cancelling - Key: FLINK-2089 URL: https://issues.apache.org/jira/browse/FLINK-2089 Project: Flink Issue Type: Bug Components: Distributed Runtime Affects Versions: master Reporter: Ufuk Celebi Assignee: Ufuk Celebi Fix For: 0.10, 0.9.1 [~rmetzger] reported the following stack trace during cancelling of high parallelism jobs: {code} Error: java.lang.IllegalStateException: Buffer has already been recycled. at org.apache.flink.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:173) at org.apache.flink.runtime.io.network.buffer.Buffer.ensureNotRecycled(Buffer.java:142) at org.apache.flink.runtime.io.network.buffer.Buffer.getMemorySegment(Buffer.java:78) at org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.setNextBuffer(SpillingAdaptiveSpanningRecordDeserializer.java:72) at org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:80) at org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:34) at org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:73) at org.apache.flink.runtime.operators.MapDriver.run(MapDriver.java:96) at org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496) at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559) at java.lang.Thread.run(Thread.java:745) {code} This looks like a concurrent buffer pool release/buffer usage error. I'm investing this today. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2089) Buffer recycled IllegalStateException during cancelling
[ https://issues.apache.org/jira/browse/FLINK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711265#comment-14711265 ] ASF GitHub Bot commented on FLINK-2089: --- Github user uce commented on a diff in the pull request: https://github.com/apache/flink/pull/1050#discussion_r37865266 --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/api/writer/RecordWriter.java --- @@ -20,7 +20,7 @@ import org.apache.flink.core.io.IOReadableWritable; import org.apache.flink.runtime.accumulators.AccumulatorRegistry; -import org.apache.flink.runtime.event.AbstractEvent; +import org.apache.flink.runtime.event.AbstractEvent; --- End diff -- Good catch. I'm wondering how that happened though as I don't import stuff manually... :dart: Buffer recycled IllegalStateException during cancelling - Key: FLINK-2089 URL: https://issues.apache.org/jira/browse/FLINK-2089 Project: Flink Issue Type: Bug Components: Distributed Runtime Affects Versions: master Reporter: Ufuk Celebi Assignee: Ufuk Celebi Fix For: 0.10, 0.9.1 [~rmetzger] reported the following stack trace during cancelling of high parallelism jobs: {code} Error: java.lang.IllegalStateException: Buffer has already been recycled. at org.apache.flink.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:173) at org.apache.flink.runtime.io.network.buffer.Buffer.ensureNotRecycled(Buffer.java:142) at org.apache.flink.runtime.io.network.buffer.Buffer.getMemorySegment(Buffer.java:78) at org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.setNextBuffer(SpillingAdaptiveSpanningRecordDeserializer.java:72) at org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:80) at org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:34) at org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:73) at org.apache.flink.runtime.operators.MapDriver.run(MapDriver.java:96) at org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496) at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559) at java.lang.Thread.run(Thread.java:745) {code} This looks like a concurrent buffer pool release/buffer usage error. I'm investing this today. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2089) Buffer recycled IllegalStateException during cancelling
[ https://issues.apache.org/jira/browse/FLINK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14709191#comment-14709191 ] Ufuk Celebi commented on FLINK-2089: Hey Min Jiang, is it possible for you to build a custom version of Flink? I can only think of a situation where this problem can arise, namely when there is another error in the execution. If you can share the data and program, I can also test it myself. {code} git clone -b illegal-2089-master --single-branch https://github.com/uce/flink.git cd flink mvn package -DskipTests {code} Then execute the program as before against this version. Buffer recycled IllegalStateException during cancelling - Key: FLINK-2089 URL: https://issues.apache.org/jira/browse/FLINK-2089 Project: Flink Issue Type: Bug Components: Distributed Runtime Affects Versions: master Reporter: Ufuk Celebi Assignee: Ufuk Celebi Fix For: 0.9 [~rmetzger] reported the following stack trace during cancelling of high parallelism jobs: {code} Error: java.lang.IllegalStateException: Buffer has already been recycled. at org.apache.flink.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:173) at org.apache.flink.runtime.io.network.buffer.Buffer.ensureNotRecycled(Buffer.java:142) at org.apache.flink.runtime.io.network.buffer.Buffer.getMemorySegment(Buffer.java:78) at org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.setNextBuffer(SpillingAdaptiveSpanningRecordDeserializer.java:72) at org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:80) at org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:34) at org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:73) at org.apache.flink.runtime.operators.MapDriver.run(MapDriver.java:96) at org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496) at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559) at java.lang.Thread.run(Thread.java:745) {code} This looks like a concurrent buffer pool release/buffer usage error. I'm investing this today. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2089) Buffer recycled IllegalStateException during cancelling
[ https://issues.apache.org/jira/browse/FLINK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14709214#comment-14709214 ] ASF GitHub Bot commented on FLINK-2089: --- GitHub user uce opened a pull request: https://github.com/apache/flink/pull/1050 [FLINK-2089] [runtime] Fix illegal state in RecordWriter after partition write failure I'm waiting for feedback from a user whether this fixes FLINK-2089, but this PR definitely addresses a problem. Record writers have a `clearBuffers` method, which is called by the task code in a finally block at the end of `invoke` (see `RegularPactTask` for example). This call clears the buffers of the record serializers. The following illegal state can arise: a buffer has been published to a partition, but the serializers still hold a reference to it. When a serializer tries to clear its current buffer, it might have already been recycled (because it was published to the partition). This will currently happen if there was an Exception during writing the buffer to the partition. This PR replaces the write-and-clear calls with a try-catch-finally block and tests the expected behaviour in a new test. The removed tests are superseded by this new test. You can merge this pull request into a Git repository by running: $ git pull https://github.com/uce/flink illegal-2089-master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/1050.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1050 commit 88ca58b5e3e78354ca1cffee4e11b48011333c6b Author: Ufuk Celebi u...@apache.org Date: 2015-08-19T14:11:13Z [FLINK-2089] [runtime] Fix illegal state in RecordWriter after partition write failure Buffer recycled IllegalStateException during cancelling - Key: FLINK-2089 URL: https://issues.apache.org/jira/browse/FLINK-2089 Project: Flink Issue Type: Bug Components: Distributed Runtime Affects Versions: master Reporter: Ufuk Celebi Assignee: Ufuk Celebi Fix For: 0.9 [~rmetzger] reported the following stack trace during cancelling of high parallelism jobs: {code} Error: java.lang.IllegalStateException: Buffer has already been recycled. at org.apache.flink.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:173) at org.apache.flink.runtime.io.network.buffer.Buffer.ensureNotRecycled(Buffer.java:142) at org.apache.flink.runtime.io.network.buffer.Buffer.getMemorySegment(Buffer.java:78) at org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.setNextBuffer(SpillingAdaptiveSpanningRecordDeserializer.java:72) at org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:80) at org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:34) at org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:73) at org.apache.flink.runtime.operators.MapDriver.run(MapDriver.java:96) at org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496) at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559) at java.lang.Thread.run(Thread.java:745) {code} This looks like a concurrent buffer pool release/buffer usage error. I'm investing this today. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2089) Buffer recycled IllegalStateException during cancelling
[ https://issues.apache.org/jira/browse/FLINK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14709481#comment-14709481 ] ASF GitHub Bot commented on FLINK-2089: --- Github user uce commented on the pull request: https://github.com/apache/flink/pull/1050#issuecomment-134256139 The failed Travis build is due to FLINK-2564 (see #1047 for the fix). The other builds pass. **Note**: This needs to be merged to `release-0.9` as well. Buffer recycled IllegalStateException during cancelling - Key: FLINK-2089 URL: https://issues.apache.org/jira/browse/FLINK-2089 Project: Flink Issue Type: Bug Components: Distributed Runtime Affects Versions: master Reporter: Ufuk Celebi Assignee: Ufuk Celebi Fix For: 0.9 [~rmetzger] reported the following stack trace during cancelling of high parallelism jobs: {code} Error: java.lang.IllegalStateException: Buffer has already been recycled. at org.apache.flink.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:173) at org.apache.flink.runtime.io.network.buffer.Buffer.ensureNotRecycled(Buffer.java:142) at org.apache.flink.runtime.io.network.buffer.Buffer.getMemorySegment(Buffer.java:78) at org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.setNextBuffer(SpillingAdaptiveSpanningRecordDeserializer.java:72) at org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:80) at org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:34) at org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:73) at org.apache.flink.runtime.operators.MapDriver.run(MapDriver.java:96) at org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496) at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559) at java.lang.Thread.run(Thread.java:745) {code} This looks like a concurrent buffer pool release/buffer usage error. I'm investing this today. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2089) Buffer recycled IllegalStateException during cancelling
[ https://issues.apache.org/jira/browse/FLINK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14703063#comment-14703063 ] Ufuk Celebi commented on FLINK-2089: Can you send me the complete logs? The email address is uce at apache dot org It's interesting to see whether another error occurred during execution. Buffer recycled IllegalStateException during cancelling - Key: FLINK-2089 URL: https://issues.apache.org/jira/browse/FLINK-2089 Project: Flink Issue Type: Bug Components: Distributed Runtime Affects Versions: master Reporter: Ufuk Celebi Assignee: Ufuk Celebi Fix For: 0.9 [~rmetzger] reported the following stack trace during cancelling of high parallelism jobs: {code} Error: java.lang.IllegalStateException: Buffer has already been recycled. at org.apache.flink.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:173) at org.apache.flink.runtime.io.network.buffer.Buffer.ensureNotRecycled(Buffer.java:142) at org.apache.flink.runtime.io.network.buffer.Buffer.getMemorySegment(Buffer.java:78) at org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.setNextBuffer(SpillingAdaptiveSpanningRecordDeserializer.java:72) at org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:80) at org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:34) at org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:73) at org.apache.flink.runtime.operators.MapDriver.run(MapDriver.java:96) at org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496) at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559) at java.lang.Thread.run(Thread.java:745) {code} This looks like a concurrent buffer pool release/buffer usage error. I'm investing this today. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2089) Buffer recycled IllegalStateException during cancelling
[ https://issues.apache.org/jira/browse/FLINK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14703065#comment-14703065 ] Ufuk Celebi commented on FLINK-2089: At the moment, the only way I can reproduce a similar issue is during a failure when writing the buffer out to the result partition. Buffer recycled IllegalStateException during cancelling - Key: FLINK-2089 URL: https://issues.apache.org/jira/browse/FLINK-2089 Project: Flink Issue Type: Bug Components: Distributed Runtime Affects Versions: master Reporter: Ufuk Celebi Assignee: Ufuk Celebi Fix For: 0.9 [~rmetzger] reported the following stack trace during cancelling of high parallelism jobs: {code} Error: java.lang.IllegalStateException: Buffer has already been recycled. at org.apache.flink.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:173) at org.apache.flink.runtime.io.network.buffer.Buffer.ensureNotRecycled(Buffer.java:142) at org.apache.flink.runtime.io.network.buffer.Buffer.getMemorySegment(Buffer.java:78) at org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.setNextBuffer(SpillingAdaptiveSpanningRecordDeserializer.java:72) at org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:80) at org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:34) at org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:73) at org.apache.flink.runtime.operators.MapDriver.run(MapDriver.java:96) at org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496) at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559) at java.lang.Thread.run(Thread.java:745) {code} This looks like a concurrent buffer pool release/buffer usage error. I'm investing this today. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2089) Buffer recycled IllegalStateException during cancelling
[ https://issues.apache.org/jira/browse/FLINK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14703425#comment-14703425 ] Min Jiang commented on FLINK-2089: -- I re-ran today and directed to a result log file. On the screen I got the message stack as below org.apache.flink.client.program.ProgramInvocationException: The program execution failed: Job execution failed. at org.apache.flink.client.program.Client.run(Client.java:413) at org.apache.flink.client.program.Client.run(Client.java:356) at org.apache.flink.client.program.Client.run(Client.java:349) at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:63) at min.play.RWA.main(RWA.java:71) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437) at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353) at org.apache.flink.client.program.Client.run(Client.java:315) at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:584) at org.apache.flink.client.CliFrontend.run(CliFrontend.java:290) at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:880) at org.apache.flink.client.CliFrontend.main(CliFrontend.java:922) Caused by: org.apache.flink.runtime.client.JobExecutionException: Job execution failed. at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1.applyOrElse(JobManager.scala:314) at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33) at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33) at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25) at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:36) at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:29) at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118) at org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:29) at akka.actor.Actor$class.aroundReceive(Actor.scala:465) at org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:92) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) at akka.actor.ActorCell.invoke(ActorCell.scala:487) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254) at akka.dispatch.Mailbox.run(Mailbox.scala:221) at akka.dispatch.Mailbox.exec(Mailbox.scala:231) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) Caused by: java.lang.IllegalStateException: Buffer has already been recycled. at org.apache.flink.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:173) at org.apache.flink.runtime.io.network.buffer.Buffer.ensureNotRecycled(Buffer.java:142) at org.apache.flink.runtime.io.network.buffer.Buffer.setSize(Buffer.java:105) at org.apache.flink.runtime.io.network.api.serialization.SpanningRecordSerializer.getCurrentBuffer(SpanningRecordSerializer.java:151) at org.apache.flink.runtime.io.network.api.writer.RecordWriter.clearBuffers(RecordWriter.java:166) at org.apache.flink.runtime.operators.RegularPactTask.clearWriters(RegularPactTask.java:1532) at org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:220) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559) at java.lang.Thread.run(Thread.java:745) Below is the log message from result log: 08/19/2015 13:06:58 Job execution switched to status RUNNING. 08/19/2015 13:06:58 CHAIN DataSource (at main(RWA.java:40) (org.apache.flink.api.java.io.TextInputFormat)) - Filter (Filter at main(RWA.java:40))(1/1) switched to SCHEDULED 08/19/2015 13:06:58 CHAIN DataSource (at main(RWA.java:40) (org.apache.flink.api.java.io.TextInputFormat)) - Filter (Filter at main(RWA.java:40))(1/1) switched to DEPLOYING 08/19/2015 13:06:58 CHAIN DataSource (at main(RWA.java:43) (org.apache.flink.api.java.io.TextInputFormat)) - Filter (Filter at
[jira] [Commented] (FLINK-2089) Buffer recycled IllegalStateException during cancelling
[ https://issues.apache.org/jira/browse/FLINK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14703445#comment-14703445 ] Min Jiang commented on FLINK-2089: -- I re-ran today and directed to a result log file. On the screen I got the message stack as below org.apache.flink.client.program.ProgramInvocationException: The program execution failed: Job execution failed. at org.apache.flink.client.program.Client.run(Client.java:413) at org.apache.flink.client.program.Client.run(Client.java:356) at org.apache.flink.client.program.Client.run(Client.java:349) at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:63) at min.play.RWA.main(RWA.java:71) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437) at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353) at org.apache.flink.client.program.Client.run(Client.java:315) at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:584) at org.apache.flink.client.CliFrontend.run(CliFrontend.java:290) at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:880) at org.apache.flink.client.CliFrontend.main(CliFrontend.java:922) Caused by: org.apache.flink.runtime.client.JobExecutionException: Job execution failed. at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1.applyOrElse(JobManager.scala:314) at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33) at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33) at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25) at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:36) at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:29) at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118) at org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:29) at akka.actor.Actor$class.aroundReceive(Actor.scala:465) at org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:92) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) at akka.actor.ActorCell.invoke(ActorCell.scala:487) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254) at akka.dispatch.Mailbox.run(Mailbox.scala:221) at akka.dispatch.Mailbox.exec(Mailbox.scala:231) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) Caused by: java.lang.IllegalStateException: Buffer has already been recycled. at org.apache.flink.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:173) at org.apache.flink.runtime.io.network.buffer.Buffer.ensureNotRecycled(Buffer.java:142) at org.apache.flink.runtime.io.network.buffer.Buffer.setSize(Buffer.java:105) at org.apache.flink.runtime.io.network.api.serialization.SpanningRecordSerializer.getCurrentBuffer(SpanningRecordSerializer.java:151) at org.apache.flink.runtime.io.network.api.writer.RecordWriter.clearBuffers(RecordWriter.java:166) at org.apache.flink.runtime.operators.RegularPactTask.clearWriters(RegularPactTask.java:1532) at org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:220) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559) at java.lang.Thread.run(Thread.java:745) Below is the log message from result log: 08/19/2015 13:06:58 Job execution switched to status RUNNING. 08/19/2015 13:06:58 CHAIN DataSource (at main(RWA.java:40) (org.apache.flink.api.java.io.TextInputFormat)) - Filter (Filter at main(RWA.java:40))(1/1) switched to SCHEDULED 08/19/2015 13:06:58 CHAIN DataSource (at main(RWA.java:40) (org.apache.flink.api.java.io.TextInputFormat)) - Filter (Filter at main(RWA.java:40))(1/1) switched to DEPLOYING 08/19/2015 13:06:58 CHAIN DataSource (at main(RWA.java:43) (org.apache.flink.api.java.io.TextInputFormat)) - Filter (Filter at main(RWA.java:43))(1/1) switched to SCHEDULED
[jira] [Commented] (FLINK-2089) Buffer recycled IllegalStateException during cancelling
[ https://issues.apache.org/jira/browse/FLINK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14702716#comment-14702716 ] Ufuk Celebi commented on FLINK-2089: Thanks for researching the existing issues and reporting this. You are right that it is not resolved. I will reopen. I've just looked at the fix 28eb274. You ran into a corner case that was not covered by 28eb274. Buffer recycled IllegalStateException during cancelling - Key: FLINK-2089 URL: https://issues.apache.org/jira/browse/FLINK-2089 Project: Flink Issue Type: Bug Components: Distributed Runtime Affects Versions: master Reporter: Ufuk Celebi Assignee: Ufuk Celebi Fix For: 0.9 [~rmetzger] reported the following stack trace during cancelling of high parallelism jobs: {code} Error: java.lang.IllegalStateException: Buffer has already been recycled. at org.apache.flink.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:173) at org.apache.flink.runtime.io.network.buffer.Buffer.ensureNotRecycled(Buffer.java:142) at org.apache.flink.runtime.io.network.buffer.Buffer.getMemorySegment(Buffer.java:78) at org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.setNextBuffer(SpillingAdaptiveSpanningRecordDeserializer.java:72) at org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:80) at org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:34) at org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:73) at org.apache.flink.runtime.operators.MapDriver.run(MapDriver.java:96) at org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496) at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559) at java.lang.Thread.run(Thread.java:745) {code} This looks like a concurrent buffer pool release/buffer usage error. I'm investing this today. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2089) Buffer recycled IllegalStateException during cancelling
[ https://issues.apache.org/jira/browse/FLINK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14702049#comment-14702049 ] Min Jiang commented on FLINK-2089: -- The issue seems not resolved. I downloaded the version 0.9 and ran my code. In IDE, my code was working fine with data containing about 300,000 records, each record having 550 fields. But when running the code in linux as client, i got the same error message stack: 08/18/2015 17:30:10 CHAIN DataSource (at main(RWA.java:40) (org.apache.flink.api.java.io.TextInputFormat)) - Filter (Filter at main(RWA.java:4 0))(1/1) switched to FAILED java.lang.IllegalStateException: Buffer has already been recycled. at org.apache.flink.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:173) at org.apache.flink.runtime.io.network.buffer.Buffer.ensureNotRecycled(Buffer.java:142) at org.apache.flink.runtime.io.network.buffer.Buffer.setSize(Buffer.java:105) at org.apache.flink.runtime.io.network.api.serialization.SpanningRecordSerializer.getCurrentBuffer(SpanningRecordSerializer.java:151) at org.apache.flink.runtime.io.network.api.writer.RecordWriter.clearBuffers(RecordWriter.java:166) at org.apache.flink.runtime.operators.RegularPactTask.clearWriters(RegularPactTask.java:1532) at org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:220) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559) at java.lang.Thread.run(Thread.java:745) Buffer recycled IllegalStateException during cancelling - Key: FLINK-2089 URL: https://issues.apache.org/jira/browse/FLINK-2089 Project: Flink Issue Type: Bug Components: Distributed Runtime Affects Versions: master Reporter: Ufuk Celebi Assignee: Ufuk Celebi Fix For: 0.9 [~rmetzger] reported the following stack trace during cancelling of high parallelism jobs: {code} Error: java.lang.IllegalStateException: Buffer has already been recycled. at org.apache.flink.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:173) at org.apache.flink.runtime.io.network.buffer.Buffer.ensureNotRecycled(Buffer.java:142) at org.apache.flink.runtime.io.network.buffer.Buffer.getMemorySegment(Buffer.java:78) at org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.setNextBuffer(SpillingAdaptiveSpanningRecordDeserializer.java:72) at org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:80) at org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:34) at org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:73) at org.apache.flink.runtime.operators.MapDriver.run(MapDriver.java:96) at org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496) at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559) at java.lang.Thread.run(Thread.java:745) {code} This looks like a concurrent buffer pool release/buffer usage error. I'm investing this today. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2089) Buffer recycled IllegalStateException during cancelling
[ https://issues.apache.org/jira/browse/FLINK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14561276#comment-14561276 ] ASF GitHub Bot commented on FLINK-2089: --- GitHub user uce opened a pull request: https://github.com/apache/flink/pull/736 [FLINK-2089] [runtime] Fix possible duplicate buffer release This PR contains multiple independent commits, which address issues discovered while debugging FLINK-2089. - It adds the partition request backoff logic to local requests as well. The backoffs were introduced recently for remote requests. I've missed that the same problem could also happen for local input channels. The fix was easy and moves the backoff logic to the abstract InputChannel, which both Local and RemoteInputChannel extend. - The duplicate buffer release was hard to track. In some corner cases, the record serializers were incorrectly holding references to buffers *after* written them out to a result partition. In failure cases, the serializers recycled these buffers too early. The later recycling (by the component, which is actually responsible for this) then resulted in an IllegalStateException. You can merge this pull request into a Git repository by running: $ git pull https://github.com/uce/incubator-flink cancel-2089 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/736.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #736 commit e134f44640f2caafc6cff76fd100b11e3aa47515 Author: Ufuk Celebi u...@apache.org Date: 2015-05-26T09:15:17Z [runtime] [tests] Add TaskCancelTest commit d6a33bfda84ea861e05e7a0aff6c529808c02bb2 Author: Ufuk Celebi u...@apache.org Date: 2015-05-26T13:37:35Z [FLINK-1636] [runtime] Add partition request backoff logic to LocalInputChannel commit 7ea3ed2ad4c95c1bec0f2d558ba0d4faf9716f14 Author: Ufuk Celebi u...@apache.org Date: 2015-05-27T12:49:01Z [FLINK-2089] Fix possible duplicate buffer release Problem: RecordWriter instances have stateful serializers, which include the buffer that they currently work with. If the serializer state is not cleared correctly by the writers after writing a buffer to the respective result partition, it is possible that buffers are recycled multiple times in failure cases. This results in an IllegalStateException. Solution: After writing a buffer to a ResultPartition, the RecordWriter makes sure that the serializer clears the reference to the respective buffer. The recycling of the buffer is then the responsibility of the result partition. commit 91b6049ac371a62671c36a8280b0a60f1b2b7408 Author: Ufuk Celebi u...@apache.org Date: 2015-05-27T16:26:46Z [runtime] [logging] Fix log message Buffer recycled IllegalStateException during cancelling - Key: FLINK-2089 URL: https://issues.apache.org/jira/browse/FLINK-2089 Project: Flink Issue Type: Bug Components: Distributed Runtime Affects Versions: master Reporter: Ufuk Celebi Assignee: Ufuk Celebi [~rmetzger] reported the following stack trace during cancelling of high parallelism jobs: {code} Error: java.lang.IllegalStateException: Buffer has already been recycled. at org.apache.flink.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:173) at org.apache.flink.runtime.io.network.buffer.Buffer.ensureNotRecycled(Buffer.java:142) at org.apache.flink.runtime.io.network.buffer.Buffer.getMemorySegment(Buffer.java:78) at org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.setNextBuffer(SpillingAdaptiveSpanningRecordDeserializer.java:72) at org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:80) at org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:34) at org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:73) at org.apache.flink.runtime.operators.MapDriver.run(MapDriver.java:96) at org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496) at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559) at java.lang.Thread.run(Thread.java:745) {code} This looks like a concurrent buffer pool release/buffer usage error. I'm investing this today. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2089) Buffer recycled IllegalStateException during cancelling
[ https://issues.apache.org/jira/browse/FLINK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14561736#comment-14561736 ] ASF GitHub Bot commented on FLINK-2089: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/736 Buffer recycled IllegalStateException during cancelling - Key: FLINK-2089 URL: https://issues.apache.org/jira/browse/FLINK-2089 Project: Flink Issue Type: Bug Components: Distributed Runtime Affects Versions: master Reporter: Ufuk Celebi Assignee: Ufuk Celebi [~rmetzger] reported the following stack trace during cancelling of high parallelism jobs: {code} Error: java.lang.IllegalStateException: Buffer has already been recycled. at org.apache.flink.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:173) at org.apache.flink.runtime.io.network.buffer.Buffer.ensureNotRecycled(Buffer.java:142) at org.apache.flink.runtime.io.network.buffer.Buffer.getMemorySegment(Buffer.java:78) at org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.setNextBuffer(SpillingAdaptiveSpanningRecordDeserializer.java:72) at org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:80) at org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:34) at org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:73) at org.apache.flink.runtime.operators.MapDriver.run(MapDriver.java:96) at org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496) at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559) at java.lang.Thread.run(Thread.java:745) {code} This looks like a concurrent buffer pool release/buffer usage error. I'm investing this today. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2089) Buffer recycled IllegalStateException during cancelling
[ https://issues.apache.org/jira/browse/FLINK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14561734#comment-14561734 ] ASF GitHub Bot commented on FLINK-2089: --- Github user uce commented on the pull request: https://github.com/apache/flink/pull/736#issuecomment-106074396 Travis gives a green light. I'm merging this. This will will give the changes as much testing exposure before the release as possible. Buffer recycled IllegalStateException during cancelling - Key: FLINK-2089 URL: https://issues.apache.org/jira/browse/FLINK-2089 Project: Flink Issue Type: Bug Components: Distributed Runtime Affects Versions: master Reporter: Ufuk Celebi Assignee: Ufuk Celebi [~rmetzger] reported the following stack trace during cancelling of high parallelism jobs: {code} Error: java.lang.IllegalStateException: Buffer has already been recycled. at org.apache.flink.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:173) at org.apache.flink.runtime.io.network.buffer.Buffer.ensureNotRecycled(Buffer.java:142) at org.apache.flink.runtime.io.network.buffer.Buffer.getMemorySegment(Buffer.java:78) at org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.setNextBuffer(SpillingAdaptiveSpanningRecordDeserializer.java:72) at org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:80) at org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:34) at org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:73) at org.apache.flink.runtime.operators.MapDriver.run(MapDriver.java:96) at org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496) at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559) at java.lang.Thread.run(Thread.java:745) {code} This looks like a concurrent buffer pool release/buffer usage error. I'm investing this today. -- This message was sent by Atlassian JIRA (v6.3.4#6332)