[ https://issues.apache.org/jira/browse/FLINK-1259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246554#comment-14246554 ]
Fabian Hueske commented on FLINK-1259:
--------------------------------------

Yes, let's add it to the documentation for now. If we find that many users run into this problem, we can integrate it into the object non-reuse mode.

> FilterFunction can modify data
> ------------------------------
>
>                 Key: FLINK-1259
>                 URL: https://issues.apache.org/jira/browse/FLINK-1259
>             Project: Flink
>          Issue Type: Bug
>          Components: Java API, Optimizer, Scala API
>    Affects Versions: 0.7.0-incubating
>            Reporter: Fabian Hueske
>
> The FilterFunction returns a boolean for an input record which determines
> whether the record is filtered or not.
> However, the function can also modify the input record which has effects if
> the record is not filtered.
> The optimizer assumes that the data is not changed by a FilterFunction, i.e.,
> it assumes that a Filter preserves physical data properties (orders,
> partitionings, etc.) and might also be pushed down in the future. These
> assumptions can result in semantically incorrect programs, if the function
> actually changes its incoming records.
> Possible solutions are:
> - document the requirements (and hope that users read it and behave nicely)
> - hand a copy to the function which can be modified but is not passed on.
> This has major performance implications and might confuse users as changes
> are invalidated. However, this could also be integrated with the
> mutable/immutable runtime switch (FLINK-1005)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
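
For illustration, here is a minimal, self-contained sketch of the problem described in the issue and of the second proposed solution (handing a copy to the function). It does not use Flink itself: the `FilterFunction` interface, the `Record` class, and the helpers `filterDirect` and `filterWithCopy` are hypothetical stand-ins, assumed only for this example. The input is sorted by key, and the misbehaving filter negates the key as a side effect, so the order the optimizer would assume to be preserved is lost unless the function only sees a copy.

```java
import java.util.ArrayList;
import java.util.List;

public class FilterMutationSketch {

    /** Hypothetical stand-in for a filter UDF interface; not Flink's actual class. */
    interface FilterFunction<T> {
        boolean filter(T value);
    }

    /** Hypothetical mutable record with a single sort key. */
    static final class Record {
        int key;
        Record(int key) { this.key = key; }
        Record copy() { return new Record(key); }
    }

    /** A misbehaving filter: keeps keys > 1 but also negates the key as a side effect. */
    static final FilterFunction<Record> MUTATING_FILTER = r -> {
        boolean keep = r.key > 1;
        r.key = -r.key;
        return keep;
    };

    /** Builds fresh input records with keys 1..4, i.e. sorted ascending by key. */
    static List<Record> sortedInput() {
        List<Record> in = new ArrayList<>();
        for (int k = 1; k <= 4; k++) {
            in.add(new Record(k));
        }
        return in;
    }

    /** Passes each record itself to the filter, so modifications are passed on. */
    static List<Integer> filterDirect(List<Record> input, FilterFunction<Record> f) {
        List<Integer> out = new ArrayList<>();
        for (Record r : input) {
            if (f.filter(r)) {
                out.add(r.key);
            }
        }
        return out;
    }

    /** Passes a copy to the filter; the untouched original is passed on. */
    static List<Integer> filterWithCopy(List<Record> input, FilterFunction<Record> f) {
        List<Integer> out = new ArrayList<>();
        for (Record r : input) {
            if (f.filter(r.copy())) {
                out.add(r.key);
            }
        }
        return out;
    }

    /** Checks whether the keys are in ascending order. */
    static boolean isSorted(List<Integer> keys) {
        for (int i = 1; i < keys.size(); i++) {
            if (keys.get(i - 1) > keys.get(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Integer> direct = filterDirect(sortedInput(), MUTATING_FILTER);
        List<Integer> copied = filterWithCopy(sortedInput(), MUTATING_FILTER);
        System.out.println(direct + " sorted=" + isSorted(direct)); // [-2, -3, -4] sorted=false
        System.out.println(copied + " sorted=" + isSorted(copied)); // [2, 3, 4] sorted=true
    }
}
```

The copy in `filterWithCopy` is made for every record, which is the performance cost mentioned in the issue, and the filter's changes are silently discarded, which is the potential source of user confusion.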