[jira] [Commented] (FLINK-2089) Buffer recycled IllegalStateException during cancelling

2015-08-26 Thread Min Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14715399#comment-14715399
 ] 

Min Jiang commented on FLINK-2089:
--

Sorry, just saw many mails in my mailbox.

i am inside a firewall, not able to send out big file, also not able to use
git to pull code to my machine.
i will try from home tonight to grab the code. Will let you know the result.

Thanks a lot!
Min




 Buffer recycled IllegalStateException during cancelling
 -

 Key: FLINK-2089
 URL: https://issues.apache.org/jira/browse/FLINK-2089
 Project: Flink
  Issue Type: Bug
  Components: Distributed Runtime
Affects Versions: master
Reporter: Ufuk Celebi
Assignee: Ufuk Celebi
 Fix For: 0.10, 0.9.1


 [~rmetzger] reported the following stack trace during cancelling of high 
 parallelism jobs:
 {code}
 Error: java.lang.IllegalStateException: Buffer has already been recycled.
 at 
 org.apache.flink.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:173)
 at 
 org.apache.flink.runtime.io.network.buffer.Buffer.ensureNotRecycled(Buffer.java:142)
 at 
 org.apache.flink.runtime.io.network.buffer.Buffer.getMemorySegment(Buffer.java:78)
 at 
 org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.setNextBuffer(SpillingAdaptiveSpanningRecordDeserializer.java:72)
 at 
 org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:80)
 at 
 org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:34)
 at 
 org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:73)
 at org.apache.flink.runtime.operators.MapDriver.run(MapDriver.java:96)
 at 
 org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496)
 at 
 org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
 at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
 at java.lang.Thread.run(Thread.java:745)
 {code}
 This looks like a concurrent buffer pool release/buffer usage error. I'm 
 investing this today.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2089) Buffer recycled IllegalStateException during cancelling

2015-08-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14712683#comment-14712683
 ] 

ASF GitHub Bot commented on FLINK-2089:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1050


 Buffer recycled IllegalStateException during cancelling
 -

 Key: FLINK-2089
 URL: https://issues.apache.org/jira/browse/FLINK-2089
 Project: Flink
  Issue Type: Bug
  Components: Distributed Runtime
Affects Versions: master
Reporter: Ufuk Celebi
Assignee: Ufuk Celebi
 Fix For: 0.10, 0.9.1


 [~rmetzger] reported the following stack trace during cancelling of high 
 parallelism jobs:
 {code}
 Error: java.lang.IllegalStateException: Buffer has already been recycled.
 at 
 org.apache.flink.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:173)
 at 
 org.apache.flink.runtime.io.network.buffer.Buffer.ensureNotRecycled(Buffer.java:142)
 at 
 org.apache.flink.runtime.io.network.buffer.Buffer.getMemorySegment(Buffer.java:78)
 at 
 org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.setNextBuffer(SpillingAdaptiveSpanningRecordDeserializer.java:72)
 at 
 org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:80)
 at 
 org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:34)
 at 
 org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:73)
 at org.apache.flink.runtime.operators.MapDriver.run(MapDriver.java:96)
 at 
 org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496)
 at 
 org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
 at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
 at java.lang.Thread.run(Thread.java:745)
 {code}
 This looks like a concurrent buffer pool release/buffer usage error. I'm 
 investing this today.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2089) Buffer recycled IllegalStateException during cancelling

2015-08-26 Thread Ufuk Celebi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14712685#comment-14712685
 ] 

Ufuk Celebi commented on FLINK-2089:


Fixed via b2f8e30 in master.

Back port to release-0.9 pending on ML discussion 
(http://mail-archives.apache.org/mod_mbox/flink-dev/201508.mbox/%3cCANC1h_voqCXu_-5RqOcwf0q=87p_bqrxxv-gs2qcbpkfcc2...@mail.gmail.com%3e)

 Buffer recycled IllegalStateException during cancelling
 -

 Key: FLINK-2089
 URL: https://issues.apache.org/jira/browse/FLINK-2089
 Project: Flink
  Issue Type: Bug
  Components: Distributed Runtime
Affects Versions: master
Reporter: Ufuk Celebi
Assignee: Ufuk Celebi
 Fix For: 0.10, 0.9.1


 [~rmetzger] reported the following stack trace during cancelling of high 
 parallelism jobs:
 {code}
 Error: java.lang.IllegalStateException: Buffer has already been recycled.
 at 
 org.apache.flink.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:173)
 at 
 org.apache.flink.runtime.io.network.buffer.Buffer.ensureNotRecycled(Buffer.java:142)
 at 
 org.apache.flink.runtime.io.network.buffer.Buffer.getMemorySegment(Buffer.java:78)
 at 
 org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.setNextBuffer(SpillingAdaptiveSpanningRecordDeserializer.java:72)
 at 
 org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:80)
 at 
 org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:34)
 at 
 org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:73)
 at org.apache.flink.runtime.operators.MapDriver.run(MapDriver.java:96)
 at 
 org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496)
 at 
 org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
 at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
 at java.lang.Thread.run(Thread.java:745)
 {code}
 This looks like a concurrent buffer pool release/buffer usage error. I'm 
 investing this today.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2089) Buffer recycled IllegalStateException during cancelling

2015-08-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711466#comment-14711466
 ] 

ASF GitHub Bot commented on FLINK-2089:
---

Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/1050#issuecomment-134634885
  
LGTM +1


 Buffer recycled IllegalStateException during cancelling
 -

 Key: FLINK-2089
 URL: https://issues.apache.org/jira/browse/FLINK-2089
 Project: Flink
  Issue Type: Bug
  Components: Distributed Runtime
Affects Versions: master
Reporter: Ufuk Celebi
Assignee: Ufuk Celebi
 Fix For: 0.10, 0.9.1


 [~rmetzger] reported the following stack trace during cancelling of high 
 parallelism jobs:
 {code}
 Error: java.lang.IllegalStateException: Buffer has already been recycled.
 at 
 org.apache.flink.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:173)
 at 
 org.apache.flink.runtime.io.network.buffer.Buffer.ensureNotRecycled(Buffer.java:142)
 at 
 org.apache.flink.runtime.io.network.buffer.Buffer.getMemorySegment(Buffer.java:78)
 at 
 org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.setNextBuffer(SpillingAdaptiveSpanningRecordDeserializer.java:72)
 at 
 org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:80)
 at 
 org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:34)
 at 
 org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:73)
 at org.apache.flink.runtime.operators.MapDriver.run(MapDriver.java:96)
 at 
 org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496)
 at 
 org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
 at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
 at java.lang.Thread.run(Thread.java:745)
 {code}
 This looks like a concurrent buffer pool release/buffer usage error. I'm 
 investing this today.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2089) Buffer recycled IllegalStateException during cancelling

2015-08-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711654#comment-14711654
 ] 

ASF GitHub Bot commented on FLINK-2089:
---

Github user uce commented on the pull request:

https://github.com/apache/flink/pull/1050#issuecomment-134676836
  
I will address the comment and merge this for 0.10 and 0.9.1.


 Buffer recycled IllegalStateException during cancelling
 -

 Key: FLINK-2089
 URL: https://issues.apache.org/jira/browse/FLINK-2089
 Project: Flink
  Issue Type: Bug
  Components: Distributed Runtime
Affects Versions: master
Reporter: Ufuk Celebi
Assignee: Ufuk Celebi
 Fix For: 0.10, 0.9.1


 [~rmetzger] reported the following stack trace during cancelling of high 
 parallelism jobs:
 {code}
 Error: java.lang.IllegalStateException: Buffer has already been recycled.
 at 
 org.apache.flink.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:173)
 at 
 org.apache.flink.runtime.io.network.buffer.Buffer.ensureNotRecycled(Buffer.java:142)
 at 
 org.apache.flink.runtime.io.network.buffer.Buffer.getMemorySegment(Buffer.java:78)
 at 
 org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.setNextBuffer(SpillingAdaptiveSpanningRecordDeserializer.java:72)
 at 
 org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:80)
 at 
 org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:34)
 at 
 org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:73)
 at org.apache.flink.runtime.operators.MapDriver.run(MapDriver.java:96)
 at 
 org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496)
 at 
 org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
 at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
 at java.lang.Thread.run(Thread.java:745)
 {code}
 This looks like a concurrent buffer pool release/buffer usage error. I'm 
 investing this today.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2089) Buffer recycled IllegalStateException during cancelling

2015-08-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711218#comment-14711218
 ] 

ASF GitHub Bot commented on FLINK-2089:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/1050#discussion_r37861035
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/io/network/api/writer/RecordWriter.java
 ---
@@ -20,7 +20,7 @@
 
 import org.apache.flink.core.io.IOReadableWritable;
 import org.apache.flink.runtime.accumulators.AccumulatorRegistry;
-import org.apache.flink.runtime.event.AbstractEvent;
+import  org.apache.flink.runtime.event.AbstractEvent;
--- End diff --

Whitespace should be removed


 Buffer recycled IllegalStateException during cancelling
 -

 Key: FLINK-2089
 URL: https://issues.apache.org/jira/browse/FLINK-2089
 Project: Flink
  Issue Type: Bug
  Components: Distributed Runtime
Affects Versions: master
Reporter: Ufuk Celebi
Assignee: Ufuk Celebi
 Fix For: 0.10, 0.9.1


 [~rmetzger] reported the following stack trace during cancelling of high 
 parallelism jobs:
 {code}
 Error: java.lang.IllegalStateException: Buffer has already been recycled.
 at 
 org.apache.flink.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:173)
 at 
 org.apache.flink.runtime.io.network.buffer.Buffer.ensureNotRecycled(Buffer.java:142)
 at 
 org.apache.flink.runtime.io.network.buffer.Buffer.getMemorySegment(Buffer.java:78)
 at 
 org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.setNextBuffer(SpillingAdaptiveSpanningRecordDeserializer.java:72)
 at 
 org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:80)
 at 
 org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:34)
 at 
 org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:73)
 at org.apache.flink.runtime.operators.MapDriver.run(MapDriver.java:96)
 at 
 org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496)
 at 
 org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
 at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
 at java.lang.Thread.run(Thread.java:745)
 {code}
 This looks like a concurrent buffer pool release/buffer usage error. I'm 
 investing this today.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2089) Buffer recycled IllegalStateException during cancelling

2015-08-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711265#comment-14711265
 ] 

ASF GitHub Bot commented on FLINK-2089:
---

Github user uce commented on a diff in the pull request:

https://github.com/apache/flink/pull/1050#discussion_r37865266
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/io/network/api/writer/RecordWriter.java
 ---
@@ -20,7 +20,7 @@
 
 import org.apache.flink.core.io.IOReadableWritable;
 import org.apache.flink.runtime.accumulators.AccumulatorRegistry;
-import org.apache.flink.runtime.event.AbstractEvent;
+import  org.apache.flink.runtime.event.AbstractEvent;
--- End diff --

Good catch. I'm wondering how that happened though as I don't import stuff 
manually... :dart: 


 Buffer recycled IllegalStateException during cancelling
 -

 Key: FLINK-2089
 URL: https://issues.apache.org/jira/browse/FLINK-2089
 Project: Flink
  Issue Type: Bug
  Components: Distributed Runtime
Affects Versions: master
Reporter: Ufuk Celebi
Assignee: Ufuk Celebi
 Fix For: 0.10, 0.9.1


 [~rmetzger] reported the following stack trace during cancelling of high 
 parallelism jobs:
 {code}
 Error: java.lang.IllegalStateException: Buffer has already been recycled.
 at 
 org.apache.flink.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:173)
 at 
 org.apache.flink.runtime.io.network.buffer.Buffer.ensureNotRecycled(Buffer.java:142)
 at 
 org.apache.flink.runtime.io.network.buffer.Buffer.getMemorySegment(Buffer.java:78)
 at 
 org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.setNextBuffer(SpillingAdaptiveSpanningRecordDeserializer.java:72)
 at 
 org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:80)
 at 
 org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:34)
 at 
 org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:73)
 at org.apache.flink.runtime.operators.MapDriver.run(MapDriver.java:96)
 at 
 org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496)
 at 
 org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
 at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
 at java.lang.Thread.run(Thread.java:745)
 {code}
 This looks like a concurrent buffer pool release/buffer usage error. I'm 
 investing this today.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2089) Buffer recycled IllegalStateException during cancelling

2015-08-24 Thread Ufuk Celebi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14709191#comment-14709191
 ] 

Ufuk Celebi commented on FLINK-2089:


Hey Min Jiang,

is it possible for you to build a custom version of Flink? I can only think of 
a situation where this problem can arise, namely when there is another error in 
the execution. If you can share the data and program, I can also test it myself.

{code}
git clone -b illegal-2089-master  --single-branch 
https://github.com/uce/flink.git
cd flink
mvn package -DskipTests
{code}

Then execute the program as before against this version.

 Buffer recycled IllegalStateException during cancelling
 -

 Key: FLINK-2089
 URL: https://issues.apache.org/jira/browse/FLINK-2089
 Project: Flink
  Issue Type: Bug
  Components: Distributed Runtime
Affects Versions: master
Reporter: Ufuk Celebi
Assignee: Ufuk Celebi
 Fix For: 0.9


 [~rmetzger] reported the following stack trace during cancelling of high 
 parallelism jobs:
 {code}
 Error: java.lang.IllegalStateException: Buffer has already been recycled.
 at 
 org.apache.flink.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:173)
 at 
 org.apache.flink.runtime.io.network.buffer.Buffer.ensureNotRecycled(Buffer.java:142)
 at 
 org.apache.flink.runtime.io.network.buffer.Buffer.getMemorySegment(Buffer.java:78)
 at 
 org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.setNextBuffer(SpillingAdaptiveSpanningRecordDeserializer.java:72)
 at 
 org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:80)
 at 
 org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:34)
 at 
 org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:73)
 at org.apache.flink.runtime.operators.MapDriver.run(MapDriver.java:96)
 at 
 org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496)
 at 
 org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
 at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
 at java.lang.Thread.run(Thread.java:745)
 {code}
 This looks like a concurrent buffer pool release/buffer usage error. I'm 
 investing this today.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2089) Buffer recycled IllegalStateException during cancelling

2015-08-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14709214#comment-14709214
 ] 

ASF GitHub Bot commented on FLINK-2089:
---

GitHub user uce opened a pull request:

https://github.com/apache/flink/pull/1050

[FLINK-2089] [runtime] Fix illegal state in RecordWriter after partition 
write failure

I'm waiting for feedback from a user whether this fixes FLINK-2089, but 
this PR definitely addresses a problem.

Record writers have a `clearBuffers` method, which is called by the task 
code in a finally block at the end of `invoke` (see `RegularPactTask` for 
example). This call clears the buffers of the record serializers.

The following illegal state can arise: a buffer has been published to a 
partition, but the serializers still hold a reference to it. When a serializer 
tries to clear its current buffer, it might have already been recycled (because 
it was published to the partition). This will currently happen if there was an 
Exception during writing the buffer to the partition.

This PR replaces the write-and-clear calls with a try-catch-finally block 
and tests the expected behaviour in a new test. The removed tests are 
superseded by this new test.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/uce/flink illegal-2089-master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1050.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1050


commit 88ca58b5e3e78354ca1cffee4e11b48011333c6b
Author: Ufuk Celebi u...@apache.org
Date:   2015-08-19T14:11:13Z

[FLINK-2089] [runtime] Fix illegal state in RecordWriter after partition 
write failure




 Buffer recycled IllegalStateException during cancelling
 -

 Key: FLINK-2089
 URL: https://issues.apache.org/jira/browse/FLINK-2089
 Project: Flink
  Issue Type: Bug
  Components: Distributed Runtime
Affects Versions: master
Reporter: Ufuk Celebi
Assignee: Ufuk Celebi
 Fix For: 0.9


 [~rmetzger] reported the following stack trace during cancelling of high 
 parallelism jobs:
 {code}
 Error: java.lang.IllegalStateException: Buffer has already been recycled.
 at 
 org.apache.flink.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:173)
 at 
 org.apache.flink.runtime.io.network.buffer.Buffer.ensureNotRecycled(Buffer.java:142)
 at 
 org.apache.flink.runtime.io.network.buffer.Buffer.getMemorySegment(Buffer.java:78)
 at 
 org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.setNextBuffer(SpillingAdaptiveSpanningRecordDeserializer.java:72)
 at 
 org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:80)
 at 
 org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:34)
 at 
 org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:73)
 at org.apache.flink.runtime.operators.MapDriver.run(MapDriver.java:96)
 at 
 org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496)
 at 
 org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
 at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
 at java.lang.Thread.run(Thread.java:745)
 {code}
 This looks like a concurrent buffer pool release/buffer usage error. I'm 
 investing this today.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2089) Buffer recycled IllegalStateException during cancelling

2015-08-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14709481#comment-14709481
 ] 

ASF GitHub Bot commented on FLINK-2089:
---

Github user uce commented on the pull request:

https://github.com/apache/flink/pull/1050#issuecomment-134256139
  
The failed Travis build is due to FLINK-2564 (see #1047 for the fix). The 
other builds pass.

**Note**: This needs to be merged to `release-0.9` as well.


 Buffer recycled IllegalStateException during cancelling
 -

 Key: FLINK-2089
 URL: https://issues.apache.org/jira/browse/FLINK-2089
 Project: Flink
  Issue Type: Bug
  Components: Distributed Runtime
Affects Versions: master
Reporter: Ufuk Celebi
Assignee: Ufuk Celebi
 Fix For: 0.9


 [~rmetzger] reported the following stack trace during cancelling of high 
 parallelism jobs:
 {code}
 Error: java.lang.IllegalStateException: Buffer has already been recycled.
 at 
 org.apache.flink.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:173)
 at 
 org.apache.flink.runtime.io.network.buffer.Buffer.ensureNotRecycled(Buffer.java:142)
 at 
 org.apache.flink.runtime.io.network.buffer.Buffer.getMemorySegment(Buffer.java:78)
 at 
 org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.setNextBuffer(SpillingAdaptiveSpanningRecordDeserializer.java:72)
 at 
 org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:80)
 at 
 org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:34)
 at 
 org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:73)
 at org.apache.flink.runtime.operators.MapDriver.run(MapDriver.java:96)
 at 
 org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496)
 at 
 org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
 at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
 at java.lang.Thread.run(Thread.java:745)
 {code}
 This looks like a concurrent buffer pool release/buffer usage error. I'm 
 investing this today.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2089) Buffer recycled IllegalStateException during cancelling

2015-08-19 Thread Ufuk Celebi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14703063#comment-14703063
 ] 

Ufuk Celebi commented on FLINK-2089:


Can you send me the complete logs? The email address is uce at apache dot org

It's interesting to see whether another error occurred during execution.

 Buffer recycled IllegalStateException during cancelling
 -

 Key: FLINK-2089
 URL: https://issues.apache.org/jira/browse/FLINK-2089
 Project: Flink
  Issue Type: Bug
  Components: Distributed Runtime
Affects Versions: master
Reporter: Ufuk Celebi
Assignee: Ufuk Celebi
 Fix For: 0.9


 [~rmetzger] reported the following stack trace during cancelling of high 
 parallelism jobs:
 {code}
 Error: java.lang.IllegalStateException: Buffer has already been recycled.
 at 
 org.apache.flink.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:173)
 at 
 org.apache.flink.runtime.io.network.buffer.Buffer.ensureNotRecycled(Buffer.java:142)
 at 
 org.apache.flink.runtime.io.network.buffer.Buffer.getMemorySegment(Buffer.java:78)
 at 
 org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.setNextBuffer(SpillingAdaptiveSpanningRecordDeserializer.java:72)
 at 
 org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:80)
 at 
 org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:34)
 at 
 org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:73)
 at org.apache.flink.runtime.operators.MapDriver.run(MapDriver.java:96)
 at 
 org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496)
 at 
 org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
 at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
 at java.lang.Thread.run(Thread.java:745)
 {code}
 This looks like a concurrent buffer pool release/buffer usage error. I'm 
 investing this today.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2089) Buffer recycled IllegalStateException during cancelling

2015-08-19 Thread Ufuk Celebi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14703065#comment-14703065
 ] 

Ufuk Celebi commented on FLINK-2089:


At the moment, the only way I can reproduce a similar issue is during a failure 
when writing the buffer out to the result partition.

 Buffer recycled IllegalStateException during cancelling
 -

 Key: FLINK-2089
 URL: https://issues.apache.org/jira/browse/FLINK-2089
 Project: Flink
  Issue Type: Bug
  Components: Distributed Runtime
Affects Versions: master
Reporter: Ufuk Celebi
Assignee: Ufuk Celebi
 Fix For: 0.9


 [~rmetzger] reported the following stack trace during cancelling of high 
 parallelism jobs:
 {code}
 Error: java.lang.IllegalStateException: Buffer has already been recycled.
 at 
 org.apache.flink.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:173)
 at 
 org.apache.flink.runtime.io.network.buffer.Buffer.ensureNotRecycled(Buffer.java:142)
 at 
 org.apache.flink.runtime.io.network.buffer.Buffer.getMemorySegment(Buffer.java:78)
 at 
 org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.setNextBuffer(SpillingAdaptiveSpanningRecordDeserializer.java:72)
 at 
 org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:80)
 at 
 org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:34)
 at 
 org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:73)
 at org.apache.flink.runtime.operators.MapDriver.run(MapDriver.java:96)
 at 
 org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496)
 at 
 org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
 at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
 at java.lang.Thread.run(Thread.java:745)
 {code}
 This looks like a concurrent buffer pool release/buffer usage error. I'm 
 investing this today.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2089) Buffer recycled IllegalStateException during cancelling

2015-08-19 Thread Min Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14703425#comment-14703425
 ] 

Min Jiang commented on FLINK-2089:
--

I re-ran today and directed to a result log file. On the screen I got the 
message stack as below
org.apache.flink.client.program.ProgramInvocationException: The program 
execution failed: Job execution failed.
at org.apache.flink.client.program.Client.run(Client.java:413)
at org.apache.flink.client.program.Client.run(Client.java:356)
at org.apache.flink.client.program.Client.run(Client.java:349)
at 
org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:63)
at min.play.RWA.main(RWA.java:71)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437)
at 
org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)
at org.apache.flink.client.program.Client.run(Client.java:315)
at 
org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:584)
at org.apache.flink.client.CliFrontend.run(CliFrontend.java:290)
at 
org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:880)
at org.apache.flink.client.CliFrontend.main(CliFrontend.java:922)
Caused by: org.apache.flink.runtime.client.JobExecutionException: Job execution 
failed.
at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1.applyOrElse(JobManager.scala:314)
at 
scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
at 
scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
at 
scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
at 
org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:36)
at 
org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:29)
at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
at 
org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:29)
at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
at 
org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:92)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
at akka.dispatch.Mailbox.run(Mailbox.scala:221)
at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.IllegalStateException: Buffer has already been recycled.
at 
org.apache.flink.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:173)
at 
org.apache.flink.runtime.io.network.buffer.Buffer.ensureNotRecycled(Buffer.java:142)
at 
org.apache.flink.runtime.io.network.buffer.Buffer.setSize(Buffer.java:105)
at 
org.apache.flink.runtime.io.network.api.serialization.SpanningRecordSerializer.getCurrentBuffer(SpanningRecordSerializer.java:151)
at 
org.apache.flink.runtime.io.network.api.writer.RecordWriter.clearBuffers(RecordWriter.java:166)
at 
org.apache.flink.runtime.operators.RegularPactTask.clearWriters(RegularPactTask.java:1532)
at 
org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:220)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
at java.lang.Thread.run(Thread.java:745)

Below is the log message from result log:
08/19/2015 13:06:58 Job execution switched to status RUNNING.
08/19/2015 13:06:58 CHAIN DataSource (at main(RWA.java:40) 
(org.apache.flink.api.java.io.TextInputFormat)) - Filter (Filter at 
main(RWA.java:40))(1/1) switched to SCHEDULED
08/19/2015 13:06:58 CHAIN DataSource (at main(RWA.java:40) 
(org.apache.flink.api.java.io.TextInputFormat)) - Filter (Filter at 
main(RWA.java:40))(1/1) switched to DEPLOYING
08/19/2015 13:06:58 CHAIN DataSource (at main(RWA.java:43) 
(org.apache.flink.api.java.io.TextInputFormat)) - Filter (Filter at 

[jira] [Commented] (FLINK-2089) Buffer recycled IllegalStateException during cancelling

2015-08-19 Thread Min Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14703445#comment-14703445
 ] 

Min Jiang commented on FLINK-2089:
--

I re-ran today and directed to a result log file. On the screen I got the
message stack as below
org.apache.flink.client.program.ProgramInvocationException: The program
execution failed: Job execution failed.
at org.apache.flink.client.program.Client.run(Client.java:413)
at org.apache.flink.client.program.Client.run(Client.java:356)
at org.apache.flink.client.program.Client.run(Client.java:349)
at
org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:63)
at min.play.RWA.main(RWA.java:71)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at
org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437)
at
org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)
at org.apache.flink.client.program.Client.run(Client.java:315)
at
org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:584)
at org.apache.flink.client.CliFrontend.run(CliFrontend.java:290)
at
org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:880)
at org.apache.flink.client.CliFrontend.main(CliFrontend.java:922)
Caused by: org.apache.flink.runtime.client.JobExecutionException: Job
execution failed.
at
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1.applyOrElse(JobManager.scala:314)
at
scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
at
scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
at
scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
at
org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:36)
at
org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:29)
at
scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
at
org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:29)
at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
at
org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:92)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
at akka.dispatch.Mailbox.run(Mailbox.scala:221)
at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
at
scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.IllegalStateException: Buffer has already been
recycled.
at
org.apache.flink.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:173)
at
org.apache.flink.runtime.io.network.buffer.Buffer.ensureNotRecycled(Buffer.java:142)
at
org.apache.flink.runtime.io.network.buffer.Buffer.setSize(Buffer.java:105)
at
org.apache.flink.runtime.io.network.api.serialization.SpanningRecordSerializer.getCurrentBuffer(SpanningRecordSerializer.java:151)
at
org.apache.flink.runtime.io.network.api.writer.RecordWriter.clearBuffers(RecordWriter.java:166)
at
org.apache.flink.runtime.operators.RegularPactTask.clearWriters(RegularPactTask.java:1532)
at
org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:220)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
at java.lang.Thread.run(Thread.java:745)

Below is the log message from result log:
08/19/2015 13:06:58 Job execution switched to status RUNNING.
08/19/2015 13:06:58 CHAIN DataSource (at main(RWA.java:40)
(org.apache.flink.api.java.io.TextInputFormat)) - Filter (Filter at
main(RWA.java:40))(1/1) switched to SCHEDULED
08/19/2015 13:06:58 CHAIN DataSource (at main(RWA.java:40)
(org.apache.flink.api.java.io.TextInputFormat)) - Filter (Filter at
main(RWA.java:40))(1/1) switched to DEPLOYING
08/19/2015 13:06:58 CHAIN DataSource (at main(RWA.java:43)
(org.apache.flink.api.java.io.TextInputFormat)) - Filter (Filter at
main(RWA.java:43))(1/1) switched to SCHEDULED

[jira] [Commented] (FLINK-2089) Buffer recycled IllegalStateException during cancelling

2015-08-19 Thread Ufuk Celebi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14702716#comment-14702716
 ] 

Ufuk Celebi commented on FLINK-2089:


Thanks for researching the existing issues and reporting this.

You are right that it is not resolved. I will reopen.

I've just looked at the fix 28eb274. You ran into a corner case that was not 
covered by 28eb274.

 Buffer recycled IllegalStateException during cancelling
 -

 Key: FLINK-2089
 URL: https://issues.apache.org/jira/browse/FLINK-2089
 Project: Flink
  Issue Type: Bug
  Components: Distributed Runtime
Affects Versions: master
Reporter: Ufuk Celebi
Assignee: Ufuk Celebi
 Fix For: 0.9


 [~rmetzger] reported the following stack trace during cancelling of high 
 parallelism jobs:
 {code}
 Error: java.lang.IllegalStateException: Buffer has already been recycled.
 at 
 org.apache.flink.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:173)
 at 
 org.apache.flink.runtime.io.network.buffer.Buffer.ensureNotRecycled(Buffer.java:142)
 at 
 org.apache.flink.runtime.io.network.buffer.Buffer.getMemorySegment(Buffer.java:78)
 at 
 org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.setNextBuffer(SpillingAdaptiveSpanningRecordDeserializer.java:72)
 at 
 org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:80)
 at 
 org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:34)
 at 
 org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:73)
 at org.apache.flink.runtime.operators.MapDriver.run(MapDriver.java:96)
 at 
 org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496)
 at 
 org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
 at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
 at java.lang.Thread.run(Thread.java:745)
 {code}
 This looks like a concurrent buffer pool release/buffer usage error. I'm 
 investing this today.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2089) Buffer recycled IllegalStateException during cancelling

2015-08-18 Thread Min Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14702049#comment-14702049
 ] 

Min Jiang commented on FLINK-2089:
--

The issue seems not resolved. I downloaded the version 0.9 and ran my code. In 
IDE, my code was working fine with data containing about 300,000 records, each 
record having 550 fields. But when running the code in linux as client, i got 
the same error message stack: 

08/18/2015 17:30:10 CHAIN DataSource (at main(RWA.java:40) 
(org.apache.flink.api.java.io.TextInputFormat)) - Filter (Filter at 
main(RWA.java:4   0))(1/1) switched to FAILED
java.lang.IllegalStateException: Buffer has already been recycled.
at 
org.apache.flink.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:173)
at 
org.apache.flink.runtime.io.network.buffer.Buffer.ensureNotRecycled(Buffer.java:142)
at 
org.apache.flink.runtime.io.network.buffer.Buffer.setSize(Buffer.java:105)
at 
org.apache.flink.runtime.io.network.api.serialization.SpanningRecordSerializer.getCurrentBuffer(SpanningRecordSerializer.java:151)
at 
org.apache.flink.runtime.io.network.api.writer.RecordWriter.clearBuffers(RecordWriter.java:166)
at 
org.apache.flink.runtime.operators.RegularPactTask.clearWriters(RegularPactTask.java:1532)
at 
org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:220)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
at java.lang.Thread.run(Thread.java:745)



 Buffer recycled IllegalStateException during cancelling
 -

 Key: FLINK-2089
 URL: https://issues.apache.org/jira/browse/FLINK-2089
 Project: Flink
  Issue Type: Bug
  Components: Distributed Runtime
Affects Versions: master
Reporter: Ufuk Celebi
Assignee: Ufuk Celebi
 Fix For: 0.9


 [~rmetzger] reported the following stack trace during cancelling of high 
 parallelism jobs:
 {code}
 Error: java.lang.IllegalStateException: Buffer has already been recycled.
 at 
 org.apache.flink.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:173)
 at 
 org.apache.flink.runtime.io.network.buffer.Buffer.ensureNotRecycled(Buffer.java:142)
 at 
 org.apache.flink.runtime.io.network.buffer.Buffer.getMemorySegment(Buffer.java:78)
 at 
 org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.setNextBuffer(SpillingAdaptiveSpanningRecordDeserializer.java:72)
 at 
 org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:80)
 at 
 org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:34)
 at 
 org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:73)
 at org.apache.flink.runtime.operators.MapDriver.run(MapDriver.java:96)
 at 
 org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496)
 at 
 org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
 at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
 at java.lang.Thread.run(Thread.java:745)
 {code}
 This looks like a concurrent buffer pool release/buffer usage error. I'm 
 investing this today.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2089) Buffer recycled IllegalStateException during cancelling

2015-05-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14561276#comment-14561276
 ] 

ASF GitHub Bot commented on FLINK-2089:
---

GitHub user uce opened a pull request:

https://github.com/apache/flink/pull/736

[FLINK-2089] [runtime] Fix possible duplicate buffer release

This PR contains multiple independent commits, which address issues 
discovered while debugging FLINK-2089.

- It adds the partition request backoff logic to local requests as well. 
The backoffs were introduced recently for remote requests. I've missed that the 
same problem could also happen for local input channels. The fix was easy and 
moves the backoff logic to the abstract InputChannel, which both Local and 
RemoteInputChannel extend.

- The duplicate buffer release was hard to track. In some corner cases, the 
record serializers were incorrectly holding references to buffers *after* 
written them out to a result partition. In failure cases, the serializers 
recycled these buffers too early. The later recycling (by the component, which 
is actually responsible for this) then resulted in an IllegalStateException.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/uce/incubator-flink cancel-2089

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/736.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #736


commit e134f44640f2caafc6cff76fd100b11e3aa47515
Author: Ufuk Celebi u...@apache.org
Date:   2015-05-26T09:15:17Z

[runtime] [tests] Add TaskCancelTest

commit d6a33bfda84ea861e05e7a0aff6c529808c02bb2
Author: Ufuk Celebi u...@apache.org
Date:   2015-05-26T13:37:35Z

[FLINK-1636] [runtime] Add partition request backoff logic to 
LocalInputChannel

commit 7ea3ed2ad4c95c1bec0f2d558ba0d4faf9716f14
Author: Ufuk Celebi u...@apache.org
Date:   2015-05-27T12:49:01Z

[FLINK-2089] Fix possible duplicate buffer release

Problem: RecordWriter instances have stateful serializers, which include the
buffer that they currently work with. If the serializer state is not cleared
correctly by the writers after writing a buffer to the respective result
partition, it is possible that buffers are recycled multiple times in 
failure
cases. This results in an IllegalStateException.

Solution: After writing a buffer to a ResultPartition, the RecordWriter 
makes
sure that the serializer clears the reference to the respective buffer. The
recycling of the buffer is then the responsibility of the result partition.

commit 91b6049ac371a62671c36a8280b0a60f1b2b7408
Author: Ufuk Celebi u...@apache.org
Date:   2015-05-27T16:26:46Z

[runtime] [logging] Fix log message




 Buffer recycled IllegalStateException during cancelling
 -

 Key: FLINK-2089
 URL: https://issues.apache.org/jira/browse/FLINK-2089
 Project: Flink
  Issue Type: Bug
  Components: Distributed Runtime
Affects Versions: master
Reporter: Ufuk Celebi
Assignee: Ufuk Celebi

 [~rmetzger] reported the following stack trace during cancelling of high 
 parallelism jobs:
 {code}
 Error: java.lang.IllegalStateException: Buffer has already been recycled.
 at 
 org.apache.flink.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:173)
 at 
 org.apache.flink.runtime.io.network.buffer.Buffer.ensureNotRecycled(Buffer.java:142)
 at 
 org.apache.flink.runtime.io.network.buffer.Buffer.getMemorySegment(Buffer.java:78)
 at 
 org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.setNextBuffer(SpillingAdaptiveSpanningRecordDeserializer.java:72)
 at 
 org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:80)
 at 
 org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:34)
 at 
 org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:73)
 at org.apache.flink.runtime.operators.MapDriver.run(MapDriver.java:96)
 at 
 org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496)
 at 
 org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
 at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
 at java.lang.Thread.run(Thread.java:745)
 {code}
 This looks like a concurrent buffer pool release/buffer usage error. I'm 
 investing this today.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2089) Buffer recycled IllegalStateException during cancelling

2015-05-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14561736#comment-14561736
 ] 

ASF GitHub Bot commented on FLINK-2089:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/736


 Buffer recycled IllegalStateException during cancelling
 -

 Key: FLINK-2089
 URL: https://issues.apache.org/jira/browse/FLINK-2089
 Project: Flink
  Issue Type: Bug
  Components: Distributed Runtime
Affects Versions: master
Reporter: Ufuk Celebi
Assignee: Ufuk Celebi

 [~rmetzger] reported the following stack trace during cancelling of high 
 parallelism jobs:
 {code}
 Error: java.lang.IllegalStateException: Buffer has already been recycled.
 at 
 org.apache.flink.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:173)
 at 
 org.apache.flink.runtime.io.network.buffer.Buffer.ensureNotRecycled(Buffer.java:142)
 at 
 org.apache.flink.runtime.io.network.buffer.Buffer.getMemorySegment(Buffer.java:78)
 at 
 org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.setNextBuffer(SpillingAdaptiveSpanningRecordDeserializer.java:72)
 at 
 org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:80)
 at 
 org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:34)
 at 
 org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:73)
 at org.apache.flink.runtime.operators.MapDriver.run(MapDriver.java:96)
 at 
 org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496)
 at 
 org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
 at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
 at java.lang.Thread.run(Thread.java:745)
 {code}
 This looks like a concurrent buffer pool release/buffer usage error. I'm 
 investing this today.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2089) Buffer recycled IllegalStateException during cancelling

2015-05-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14561734#comment-14561734
 ] 

ASF GitHub Bot commented on FLINK-2089:
---

Github user uce commented on the pull request:

https://github.com/apache/flink/pull/736#issuecomment-106074396
  
Travis gives a green light. I'm merging this. This will will give the 
changes as much testing exposure before the release as possible.


 Buffer recycled IllegalStateException during cancelling
 -

 Key: FLINK-2089
 URL: https://issues.apache.org/jira/browse/FLINK-2089
 Project: Flink
  Issue Type: Bug
  Components: Distributed Runtime
Affects Versions: master
Reporter: Ufuk Celebi
Assignee: Ufuk Celebi

 [~rmetzger] reported the following stack trace during cancelling of high 
 parallelism jobs:
 {code}
 Error: java.lang.IllegalStateException: Buffer has already been recycled.
 at 
 org.apache.flink.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:173)
 at 
 org.apache.flink.runtime.io.network.buffer.Buffer.ensureNotRecycled(Buffer.java:142)
 at 
 org.apache.flink.runtime.io.network.buffer.Buffer.getMemorySegment(Buffer.java:78)
 at 
 org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.setNextBuffer(SpillingAdaptiveSpanningRecordDeserializer.java:72)
 at 
 org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:80)
 at 
 org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:34)
 at 
 org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:73)
 at org.apache.flink.runtime.operators.MapDriver.run(MapDriver.java:96)
 at 
 org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496)
 at 
 org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
 at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
 at java.lang.Thread.run(Thread.java:745)
 {code}
 This looks like a concurrent buffer pool release/buffer usage error. I'm 
 investing this today.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)