[ https://issues.apache.org/jira/browse/CRUNCH-269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773828#comment-13773828 ]

Gabriel Reid commented on CRUNCH-269:
-------------------------------------

Any idea of the actual slowdown caused by the deep copying? And is it
specific to pipelines that operate on large objects?

The reason I ask is that I'm wondering whether it would be worth disabling deep
copying by default. Deep copying is only needed if people modify objects in
place and then pass them through, which is probably not a great idea in
general anyway (even if it is at times very useful). If there's a big
performance hit (I've never profiled it to find out), then we might not want
to do it by default, since it usually isn't needed. Of course, that change
could potentially break some existing pipelines.
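
To make the in-place mutation case concrete, here is a minimal, self-contained
Java sketch. It is plain JDK code, not Crunch code, and the class and method
names are hypothetical. One parent output is handed to two children: a child
that modifies the record in place, followed by a read-only child. With a
per-child deep copy the reader sees only the original data; with a shared
instance it also sees the mutation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical illustration (not Crunch code): one parent output fans out to
 * several children. If a child mutates the record in place, siblings that run
 * after it see the mutated value unless each child receives its own copy.
 */
final class FanOutHazard {

    /** A simple mutable record standing in for a "large object". */
    static final class Record {
        final List<String> tags = new ArrayList<>();

        /** Deep copy: a new list holding the same (immutable) strings. */
        Record copy() {
            Record r = new Record();
            r.tags.addAll(tags);
            return r;
        }
    }

    /**
     * Emits {@code value} to every child, in order. With {@code deepCopy} set,
     * each child gets an independent copy; otherwise all children share one instance.
     */
    static void emit(Record value, List<Consumer<Record>> children, boolean deepCopy) {
        for (Consumer<Record> child : children) {
            child.accept(deepCopy ? value.copy() : value);
        }
    }

    /** Runs a mutating child, then a read-only child; returns what the reader saw. */
    static List<String> whatReaderSees(boolean deepCopy) {
        Record parentOutput = new Record();
        parentOutput.tags.add("original");

        List<String> seenByReader = new ArrayList<>();
        Consumer<Record> mutator = r -> r.tags.add("added-in-place"); // modifies its input in place
        Consumer<Record> reader = r -> seenByReader.addAll(r.tags);   // read-only consumer

        emit(parentOutput, List.of(mutator, reader), deepCopy);
        return seenByReader;
    }
}
```

Here `whatReaderSees(true)` returns `[original]`, while `whatReaderSees(false)`
returns `[original, added-in-place]`. When every child is read-only, both
settings give the same result, which is the situation where skipping the copy
is safe.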

+1 on the patch BTW.
                
> Allow clients to disable deep copies on intermediate DoFn outputs
> -----------------------------------------------------------------
>
>                 Key: CRUNCH-269
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-269
>             Project: Crunch
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.0
>            Reporter: Josh Wills
>            Assignee: Josh Wills
>         Attachments: CRUNCH-269.patch
>
>
> I have a pipeline that operates on some large objects, and the additional
> overhead of creating a deep copy of them on intermediate outputs (i.e., DoFns
> with more than one child operation) when I know that all of their consumers are
> going to be read-only is slowing down my runtime quite a bit. I'd like to
> have an option that would allow me to disable intermediate deep copies on a
> DoFn-by-DoFn basis and/or across an entire pipeline run when I know that it's
> safe to do so.
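
A rough sketch of the decision the requested option implies, assuming (as the
description states) that copies are only made for DoFns with more than one
child operation. `DeepCopyPolicy` and its parameters are hypothetical names for
illustration, not Crunch's actual API: there is one pipeline-wide switch and
one per-DoFn switch, and either of them suppresses the copy.

```java
/**
 * Hypothetical sketch of the option requested in CRUNCH-269 (names are
 * illustrative, not Crunch's API): an intermediate output is deep-copied only
 * when it fans out to more than one child and neither the DoFn nor the whole
 * pipeline run has opted out.
 */
final class DeepCopyPolicy {
    private final boolean disabledForPipeline; // pipeline-wide opt-out

    DeepCopyPolicy(boolean disabledForPipeline) {
        this.disabledForPipeline = disabledForPipeline;
    }

    /**
     * @param childCount        number of child operations consuming this DoFn's output
     * @param disabledForThisFn per-DoFn opt-out, set when all consumers are known to be read-only
     * @return true if the output should be deep-copied before being passed on
     */
    boolean shouldDeepCopy(int childCount, boolean disabledForThisFn) {
        if (childCount <= 1) {
            return false; // not an intermediate fan-out, so no copy is made
        }
        return !(disabledForPipeline || disabledForThisFn);
    }
}
```

Checking the per-DoFn flag and the pipeline-wide flag independently lets a
single known-safe DoFn skip copies without turning them off for the rest of
the run.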

--
This message is automatically generated by JIRA.
