[ https://issues.apache.org/jira/browse/FLINK-1259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246543#comment-14246543 ]
Stephan Ewen commented on FLINK-1259:
-------------------------------------
Are there any updates or other opinions on this? Otherwise, let's state in the
Java / Scala docs that we assume a FilterFunction does not modify the data.
> FilterFunction can modify data
> ------------------------------
>
> Key: FLINK-1259
> URL: https://issues.apache.org/jira/browse/FLINK-1259
> Project: Flink
> Issue Type: Bug
> Components: Java API, Optimizer, Scala API
> Affects Versions: 0.7.0-incubating
> Reporter: Fabian Hueske
>
> The FilterFunction returns a boolean for each input record, which determines
> whether the record is kept or filtered out.
> However, the function can also modify the input record, and that modification
> takes effect whenever the record is not filtered out.
> The optimizer assumes that a FilterFunction does not change the data, i.e.,
> it assumes that a Filter preserves physical data properties (orders,
> partitionings, etc.). Filters might also be pushed down in the future. These
> assumptions can result in semantically incorrect programs if the function
> actually changes its incoming records.
> Possible solutions are:
> - document the requirement (and hope that users read it and behave nicely)
> - hand the function a copy, which it can modify but which is not passed on.
> This has major performance implications and might confuse users because their
> changes are discarded. However, this could also be integrated with the
> mutable/immutable runtime switch (FLINK-1005)
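
To make the problem and the second proposed solution concrete, here is a
minimal sketch in plain Java. It does not use Flink: the Filter interface, the
Record class, and the applyDirect / applyWithCopy helpers are hypothetical
stand-ins invented for illustration, not Flink's FilterFunction or runtime
code. The input is sorted by key, which is the kind of physical property the
optimizer would assume a Filter preserves.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FilterMutationSketch {

    /** Hypothetical stand-in for a filter function; NOT Flink's interface. */
    interface Filter<T> {
        boolean filter(T value);
    }

    /** Minimal mutable record with a single sort key (assumption for the demo). */
    static final class Record {
        int key;
        Record(int key) { this.key = key; }
        Record copy() { return new Record(key); }
    }

    /** Keeps every record but, as a side effect, negates its key. */
    static final Filter<Record> MUTATING = r -> {
        r.key = -r.key; // side effect: invalidates an ascending order on key
        return true;
    };

    /** Builds a fresh input that is sorted ascending by key. */
    static List<Record> sortedInput() {
        return Arrays.asList(new Record(1), new Record(2), new Record(3));
    }

    /** Passes the original record to the function and forwards that same record. */
    static List<Integer> applyDirect(List<Record> input, Filter<Record> f) {
        List<Integer> out = new ArrayList<>();
        for (Record r : input) {
            if (f.filter(r)) {
                out.add(r.key);
            }
        }
        return out;
    }

    /** Passes a copy to the function and forwards the untouched original. */
    static List<Integer> applyWithCopy(List<Record> input, Filter<Record> f) {
        List<Integer> out = new ArrayList<>();
        for (Record r : input) {
            if (f.filter(r.copy())) {
                out.add(r.key);
            }
        }
        return out;
    }

    /** Returns true if the keys are in non-decreasing order. */
    static boolean isSortedAscending(List<Integer> keys) {
        for (int i = 1; i < keys.size(); i++) {
            if (keys.get(i - 1) > keys.get(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Integer> direct = applyDirect(sortedInput(), MUTATING);
        List<Integer> copied = applyWithCopy(sortedInput(), MUTATING);
        System.out.println(direct + " sorted=" + isSortedAscending(direct));
        // prints "[-1, -2, -3] sorted=false"
        System.out.println(copied + " sorted=" + isSortedAscending(copied));
        // prints "[1, 2, 3] sorted=true"
    }
}
```

In the direct variant the output is no longer sorted, so any downstream
operator that relies on the preserved order would compute wrong results. The
copy variant keeps the order but pays for one copy per record and silently
drops the function's changes, which matches the trade-off described in the
issue.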