[ 
https://issues.apache.org/jira/browse/FLINK-1259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246554#comment-14246554
 ] 

Fabian Hueske commented on FLINK-1259:
--------------------------------------

Yes, let's add it to the documentation for now.
If we find that many users run into this problem, we can integrate it into the 
object non-reuse mode.

> FilterFunction can modify data
> ------------------------------
>
>                 Key: FLINK-1259
>                 URL: https://issues.apache.org/jira/browse/FLINK-1259
>             Project: Flink
>          Issue Type: Bug
>          Components: Java API, Optimizer, Scala API
>    Affects Versions: 0.7.0-incubating
>            Reporter: Fabian Hueske
>
> The FilterFunction returns a boolean for each input record, which determines 
> whether the record is kept or filtered out. 
> However, the function can also modify the input record, and that modification 
> is visible downstream if the record is not filtered out.
> The optimizer assumes that the data is not changed by a FilterFunction, i.e., 
> it assumes that a Filter preserves physical data properties (orders, 
> partitionings, etc.) and might also be pushed down in the future. These 
> assumptions can result in semantically incorrect programs, if the function 
> actually changes its incoming records.
> Possible solutions are:
> - document the requirements (and hope that users read them and behave nicely)
> - hand a copy to the function which can be modified but is not passed on. 
> This has major performance implications and might confuse users as changes 
> are invalidated. However, this could also be integrated with the 
> mutable/immutable runtime switch (FLINK-1005)
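
To make the problem described above concrete, here is a minimal, self-contained sketch in plain Java. It does not use Flink: RecordFilter, Rec, applyInPlace and applyOnCopy are hypothetical stand-ins invented for illustration, not Flink's actual classes or runtime behavior. The sketch assumes a data set sorted by key, runs a filter that keeps every record but negates its key, and then compares passing on the mutated object with the "hand a copy to the function" option from the list above.

```java
import java.util.ArrayList;
import java.util.List;

public class FilterMutationSketch {

    /** Hypothetical stand-in for a filter interface; not Flink's actual class. */
    interface RecordFilter<T> {
        boolean filter(T value);
    }

    /** A mutable record with a single key field (assumed for illustration). */
    static final class Rec {
        int key;
        Rec(int key) { this.key = key; }
        Rec copy() { return new Rec(key); }
    }

    /** Applies the filter to each record and passes on the same object. */
    static List<Rec> applyInPlace(List<Rec> input, RecordFilter<Rec> f) {
        List<Rec> out = new ArrayList<>();
        for (Rec r : input) {
            if (f.filter(r)) out.add(r);
        }
        return out;
    }

    /** Hands the filter a copy; the original, unmodified record is passed on. */
    static List<Rec> applyOnCopy(List<Rec> input, RecordFilter<Rec> f) {
        List<Rec> out = new ArrayList<>();
        for (Rec r : input) {
            if (f.filter(r.copy())) out.add(r);
        }
        return out;
    }

    /** Returns true if the keys are in non-decreasing order. */
    static boolean isSortedByKey(List<Rec> recs) {
        for (int i = 1; i < recs.size(); i++) {
            if (recs.get(i - 1).key > recs.get(i).key) return false;
        }
        return true;
    }

    /** Builds an input that is sorted by key: 1, 2, 3, 4, 5. */
    static List<Rec> sortedInput() {
        List<Rec> in = new ArrayList<>();
        for (int k = 1; k <= 5; k++) in.add(new Rec(k));
        return in;
    }

    /** A misbehaving filter: keeps every record but negates its key. */
    static final RecordFilter<Rec> MUTATING = r -> { r.key = -r.key; return true; };

    public static void main(String[] args) {
        // Mutation leaks downstream: the "sorted" property no longer holds.
        System.out.println(isSortedByKey(applyInPlace(sortedInput(), MUTATING))); // false
        // Mutation is applied to a throwaway copy: the order is preserved.
        System.out.println(isSortedByKey(applyOnCopy(sortedInput(), MUTATING)));  // true
    }
}
```

In the first call, a downstream operator that relies on the sort order being preserved would produce wrong results. The second call keeps the order intact, at the cost of one copy per record and of silently discarding the function's changes, which is the trade-off noted in the issue description.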



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
