[ https://issues.apache.org/jira/browse/FLINK-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14566817#comment-14566817 ]
ASF GitHub Bot commented on FLINK-2076:
---------------------------------------

Github user StephanEwen commented on the pull request:

    https://github.com/apache/flink/pull/751#issuecomment-107255722

    Thanks, this looks like some seriously great debugging! Very nice :-)

    It would be great if you could add a test that produces the error without the fix, and validates that the fix resolved it. I would guess that you have a setup that produced this error (for debugging). Can you add this as a test?

    Also, can we change the fix such that it adds a second memory segment, if it is non-null? That would help maintain the performance characteristics of the current code. I vaguely remember that there was a reason to add two memory segments (that code was written quite a while ago and I should have put more comments into the code).


> Bug in re-openable hash join
> ----------------------------
>
>                 Key: FLINK-2076
>                 URL: https://issues.apache.org/jira/browse/FLINK-2076
>             Project: Flink
>          Issue Type: Bug
>          Components: Local Runtime
>    Affects Versions: 0.9
>            Reporter: Stephan Ewen
>            Assignee: Chiwan Park
>
> It happens deterministically in my machine with the following setup:
> TaskManager:
>   - heap size: 512m
>   - network buffers: 4096
>   - slots: 32
> Job:
>   - ConnectedComponents
>   - 100k vertices
>   - 1.2m edges
> --> this gives around 260 m Flink managed memory, across 32 slots is 8MB per slot, with several mem consumers in the job, makes the iterative hash join out-of-core
> {code}
> java.lang.RuntimeException: Hash Join bug in memory management: Memory buffers leaked.
> 	at org.apache.flink.runtime.operators.hash.MutableHashTable.buildTableFromSpilledPartition(MutableHashTable.java:733)
> 	at org.apache.flink.runtime.operators.hash.MutableHashTable.prepareNextPartition(MutableHashTable.java:508)
> 	at org.apache.flink.runtime.operators.hash.ReOpenableMutableHashTable.prepareNextPartition(ReOpenableMutableHashTable.java:167)
> 	at org.apache.flink.runtime.operators.hash.MutableHashTable.nextRecord(MutableHashTable.java:541)
> 	at org.apache.flink.runtime.operators.hash.NonReusingBuildSecondHashMatchIterator.callWithNextKey(NonReusingBuildSecondHashMatchIterator.java:102)
> 	at org.apache.flink.runtime.operators.AbstractCachedBuildSideMatchDriver.run(AbstractCachedBuildSideMatchDriver.java:155)
> 	at org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496)
> 	at org.apache.flink.runtime.iterative.task.AbstractIterativePactTask.run(AbstractIterativePactTask.java:139)
> 	at org.apache.flink.runtime.iterative.task.IterationIntermediatePactTask.run(IterationIntermediatePactTask.java:92)
> 	at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
> 	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:560)
> 	at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)