[jira] [Commented] (FLINK-12113) User code passing to fromCollection(Iterator, Class) not cleaned
[ https://issues.apache.org/jira/browse/FLINK-12113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16816008#comment-16816008 ]

yankai zhang commented on FLINK-12113:
--------------------------------------

I'm not very familiar with Flink project development; maybe you can help fix this. Thanks.

> User code passing to fromCollection(Iterator, Class) not cleaned
> ----------------------------------------------------------------
>
>                 Key: FLINK-12113
>                 URL: https://issues.apache.org/jira/browse/FLINK-12113
>             Project: Flink
>          Issue Type: Bug
>          Components: API / DataStream
>    Affects Versions: 1.7.2
>            Reporter: yankai zhang
>            Priority: Major
>         Attachments: image-2019-04-07-21-52-37-264.png, image-2019-04-08-23-19-27-359.png
>
> {code:java}
> interface IS extends Iterator, Serializable { }
>
> StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
> env.fromCollection(new IS() {
>     @Override
>     public boolean hasNext() {
>         return false;
>     }
>
>     @Override
>     public Object next() {
>         return null;
>     }
> }, Object.class);
> {code}
> The code above throws an exception:
> {code:java}
> org.apache.flink.api.common.InvalidProgramException: The implementation of the SourceFunction is not serializable. The object probably contains or references non serializable fields.
>     at org.apache.flink.api.java.ClosureCleaner.clean(ClosureCleaner.java:99)
> {code}
> My workaround is to wrap the iterator instance in a call to clean, like this:
> {code:java}
> StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
> env.fromCollection(env.clean(new IS() {
>     @Override
>     public boolean hasNext() {
>         return false;
>     }
>
>     @Override
>     public Object next() {
>         return null;
>     }
> }), Object.class);
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Comment Edited] (FLINK-12113) User code passing to fromCollection(Iterator, Class) not cleaned
[ https://issues.apache.org/jira/browse/FLINK-12113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16812967#comment-16812967 ]

yankai zhang edited comment on FLINK-12113 at 4/9/19 3:20 AM:
--------------------------------------------------------------

Yes, _fromCollection(Iterator, Class)_ works as expected without an anonymous class.

The problem is that an anonymous class object created in an instance method implicitly references the outer _this_ (even though it is never actually used), and the outer _this_ is not serializable. Removing such references is exactly what _StreamExecutionEnvironment#clean_ is supposed to do.

In fact, the iterator passed by the user is wrapped in a _FromIteratorFunction_, and _StreamExecutionEnvironment#clean_ is then called on that wrapper instance, not on the iterator itself. However, the current implementation of _StreamExecutionEnvironment#clean_ is not recursive, so it cannot find and clean a _this_ reference nested deeper in the closure.

Here is my fully reproducible code:
{code:java}
import java.io.Serializable;
import java.util.Iterator;

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.junit.Test;

public class MainTest {

    interface IS extends Iterator, Serializable { }

    @Test
    public void cleanTest() {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // The anonymous class is created in an instance method, so it captures MainTest.this.
        env.fromCollection(new IS() {
            @Override
            public boolean hasNext() {
                return false;
            }

            @Override
            public Object next() {
                return null;
            }
        }, Object.class);
    }
}
{code}
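The capture described in this comment can be observed without Flink. The following is a minimal sketch using only the JDK, not Flink's ClosureCleaner; the names OuterThisDemo, SerializableIterator and isSerializable are hypothetical and chosen for illustration, and Iterator<Object> stands in for the raw Iterator used in the report. It attempts plain Java serialization on an anonymous iterator created in an instance method and on one created in a static method.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Iterator;

// Hypothetical demo class; deliberately NOT Serializable, like the test class in the report.
public class OuterThisDemo {

    interface SerializableIterator extends Iterator<Object>, Serializable { }

    // Created in an instance method: the anonymous class holds a hidden
    // reference to OuterThisDemo.this, even though the body never uses it.
    SerializableIterator fromInstanceMethod() {
        return new SerializableIterator() {
            @Override public boolean hasNext() { return false; }
            @Override public Object next() { return null; }
        };
    }

    // Created in a static method: there is no enclosing instance to capture.
    static SerializableIterator fromStaticMethod() {
        return new SerializableIterator() {
            @Override public boolean hasNext() { return false; }
            @Override public Object next() { return null; }
        };
    }

    // Returns true if standard Java serialization accepts the object graph.
    static boolean isSerializable(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("instance method: " + isSerializable(new OuterThisDemo().fromInstanceMethod()));
        System.out.println("static method:   " + isSerializable(fromStaticMethod()));
    }
}
```

With javac as I understand it, the instance-method iterator fails to serialize because its hidden outer reference points at a non-serializable object, while the static-method iterator serializes fine. This matches the behaviour discussed in this thread, but it only illustrates the Java mechanism and says nothing about how Flink cleans closures internally.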
[jira] [Comment Edited] (FLINK-12113) User code passing to fromCollection(Iterator, Class) not cleaned
[ https://issues.apache.org/jira/browse/FLINK-12113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16812049#comment-16812049 ]

yankai zhang edited comment on FLINK-12113 at 4/8/19 2:43 AM:
--------------------------------------------------------------

Interesting. My guess is that Java applies some optimization that makes your anonymous class instance static, so it holds no reference to an outer _this_. I found an explanation on Stack Overflow: https://stackoverflow.com/a/758616/4281058.

In your case there is actually no outer _this_. You can try moving your code into an instance method instead of the static main.
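The difference between the static and the instance context can also be checked by reflection. The following is a small sketch under stated assumptions: the names CapturedOuterCheck and capturesOuter are hypothetical, Iterator<Object> replaces the raw Iterator of the report, and the check relies on javac storing the enclosing instance in a synthetic field whose type is the enclosing class, which is compiler behaviour rather than something the language guarantees.

```java
import java.io.Serializable;
import java.lang.reflect.Field;
import java.util.Iterator;

// Hypothetical helper class for inspecting anonymous class instances.
public class CapturedOuterCheck {

    interface SerializableIterator extends Iterator<Object>, Serializable { }

    // Anonymous class created in an instance method.
    SerializableIterator inInstanceMethod() {
        return new SerializableIterator() {
            @Override public boolean hasNext() { return false; }
            @Override public Object next() { return null; }
        };
    }

    // Anonymous class created in a static method.
    static SerializableIterator inStaticMethod() {
        return new SerializableIterator() {
            @Override public boolean hasNext() { return false; }
            @Override public Object next() { return null; }
        };
    }

    // Returns true if the object's class declares a compiler-generated field
    // whose type is its enclosing class, i.e. a captured outer "this".
    static boolean capturesOuter(Object o) {
        Class<?> enclosing = o.getClass().getEnclosingClass();
        for (Field f : o.getClass().getDeclaredFields()) {
            if (f.isSynthetic() && f.getType() == enclosing) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("instance method captures outer: "
                + capturesOuter(new CapturedOuterCheck().inInstanceMethod()));
        System.out.println("static method captures outer:   "
                + capturesOuter(inStaticMethod()));
    }
}
```

If the guess in the comment is right, only the instance-method variant should report a captured outer reference, which would explain why the same anonymous iterator behaves differently when it is created in a static main.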
[jira] [Updated] (FLINK-12113) User code passing to fromCollection(Iterator, Class) not cleaned
[ https://issues.apache.org/jira/browse/FLINK-12113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

yankai zhang updated FLINK-12113:
---------------------------------
    Description:
{code:java}
interface IS extends Iterator, Serializable { }

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.fromCollection(new IS() {
    @Override
    public boolean hasNext() {
        return false;
    }

    @Override
    public Object next() {
        return null;
    }
}, Object.class);
{code}
Code piece above throws exception:
{code:java}
org.apache.flink.api.common.InvalidProgramException: The implementation of the SourceFunction is not serializable. The object probably contains or references non serializable fields.
    at org.apache.flink.api.java.ClosureCleaner.clean(ClosureCleaner.java:99)
{code}
And my workaround is wrapping clean around iterator instance, like this:
{code:java}
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.fromCollection(env.clean(new IS() {
    @Override
    public boolean hasNext() {
        return false;
    }

    @Override
    public Object next() {
        return null;
    }
}), Object.class);
{code}

  was:
{code:java}
interface IS extends Iterator, Serializable { }

StreamExecutionEnvironment
    .getExecutionEnvironment()
    .fromCollection(new IS() {
        @Override
        public boolean hasNext() {
            return false;
        }

        @Override
        public Object next() {
            return null;
        }
    }, Object.class);
{code}
Code piece above throws exception:
{code:java}
org.apache.flink.api.common.InvalidProgramException: The implementation of the SourceFunction is not serializable. The object probably contains or references non serializable fields.
    at org.apache.flink.api.java.ClosureCleaner.clean(ClosureCleaner.java:99)
{code}
[jira] [Created] (FLINK-12113) User code passing to fromCollection(Iterator, Class) not cleaned
yankai zhang created FLINK-12113:
------------------------------------

             Summary: User code passing to fromCollection(Iterator, Class) not cleaned
                 Key: FLINK-12113
                 URL: https://issues.apache.org/jira/browse/FLINK-12113
             Project: Flink
          Issue Type: Bug
          Components: API / DataStream
    Affects Versions: 1.7.2
            Reporter: yankai zhang

{code:java}
interface IS extends Iterator, Serializable { }

StreamExecutionEnvironment
    .getExecutionEnvironment()
    .fromCollection(new IS() {
        @Override
        public boolean hasNext() {
            return false;
        }

        @Override
        public Object next() {
            return null;
        }
    }, Object.class);
{code}
Code piece above throws exception:
{code:java}
org.apache.flink.api.common.InvalidProgramException: The implementation of the SourceFunction is not serializable. The object probably contains or references non serializable fields.
    at org.apache.flink.api.java.ClosureCleaner.clean(ClosureCleaner.java:99)
{code}