[ 
https://issues.apache.org/jira/browse/FLINK-3335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15140820#comment-15140820
 ] 

ASF GitHub Bot commented on FLINK-3335:
---------------------------------------

Github user StephanEwen commented on the pull request:

    https://github.com/apache/flink/pull/1616#issuecomment-182390682
  
    I am very hesitant to introduce an "always copy" step in the data sources.
    Some data sources create Avro or Thrift types, which are incredibly
    expensive to copy.
    
    I would personally prefer to accept that inconsistency for now and make
    sure it is well documented: source implementers should not emit the same
    internal object more than once.
    
    A user who wants to optimize a data source can check the object reuse
    config flag (the ExecutionConfig can be obtained from the RuntimeContext)
    and, depending on that flag, either reuse objects or create new ones.
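
    As a rough sketch of that last point, a source could branch on the flag
    like this. This is not Flink code: Event and ReuseAwareSource are made-up
    stand-ins, and the boolean stands for what a real RichInputFormat would
    presumably read via
    getRuntimeContext().getExecutionConfig().isObjectReuseEnabled().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a record type that is expensive to copy
// (for example an Avro or Thrift object).
final class Event {
    long id;
}

// Hypothetical source, not a Flink class. The constructor argument stands
// for the object reuse flag; in a real source it is assumed to come from
// getRuntimeContext().getExecutionConfig().isObjectReuseEnabled().
final class ReuseAwareSource {
    private final boolean objectReuseEnabled;
    private final Event reusable = new Event();
    private long nextId = 0;

    ReuseAwareSource(boolean objectReuseEnabled) {
        this.objectReuseEnabled = objectReuseEnabled;
    }

    // Emits the next record: the single internal instance when reuse is
    // enabled, a freshly allocated instance otherwise.
    Event next() {
        Event out = objectReuseEnabled ? reusable : new Event();
        out.id = nextId++;
        return out;
    }

    // Collects references without copying, the way a downstream consumer
    // might when it assumes object reuse is disabled.
    static List<Event> drain(ReuseAwareSource source, int count) {
        List<Event> seen = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            seen.add(source.next());
        }
        return seen;
    }
}
```

    With the flag off, drain(new ReuseAwareSource(false), 3) holds three
    distinct objects with ids 0, 1 and 2. With the flag on, all three list
    entries are the same instance carrying the last id, which is exactly the
    hazard when a source emits the same internal object more than once.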
    
    What do you think?


> Fix DataSourceTask object reuse when disabled
> ---------------------------------------------
>
>                 Key: FLINK-3335
>                 URL: https://issues.apache.org/jira/browse/FLINK-3335
>             Project: Flink
>          Issue Type: Bug
>          Components: Distributed Runtime
>    Affects Versions: 1.0.0
>            Reporter: Greg Hogan
>            Assignee: Greg Hogan
>
> From {{DataSourceTask.invoke()}}:
> {code}
> if ((returned = format.nextRecord(serializer.createInstance())) != null) {
>     output.collect(returned);
> }
> {code}
> The returned value ({{returned}}) must be copied; creating and passing in a 
> new instance is not sufficient. The {{InputFormat}} interface only permits 
> the implementation to use the given object; it does not require a new 
> object to be returned otherwise.



