[ https://issues.apache.org/jira/browse/CRUNCH-128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529657#comment-13529657 ]

Gabriel Reid commented on CRUNCH-128:
-------------------------------------

While you're at it, a related but distinct scenario is a pipeline that reads 
from and writes to the same path on the filesystem (typically via different 
PCollections). 

Such a pipeline will run without crashing as long as the directories are in 
place, but this is rarely if ever what is intended, and the resulting 
behavior is unpredictable. If you're adding extra checks for loops, it would 
be good to handle this situation as well.
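
To make the idea concrete, here is a rough sketch of the kind of check I have 
in mind. It is not taken from the attached patch and does not use any Crunch 
API: the class PathOverlapCheck and its methods are hypothetical names, and it 
assumes the planner can hand over the pipeline's input and output paths as 
plain strings that are valid URIs. It only catches exact matches after 
normalization, not a target nested inside a source directory.

```java
import java.net.URI;
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration only; not part of the Crunch API.
final class PathOverlapCheck {

  private PathOverlapCheck() {}

  // Normalize a path so that "/data/out", "/data/out/" and "/data/./out"
  // compare equal. Assumes the string is a syntactically valid URI.
  static String normalize(String path) {
    String p = URI.create(path).normalize().toString();
    while (p.length() > 1 && p.endsWith("/")) {
      p = p.substring(0, p.length() - 1);
    }
    return p;
  }

  // Return the normalized paths that appear both as an input and as an
  // output. An empty result means no read/write conflict was found.
  static Set<String> findOverlaps(Collection<String> inputs,
                                  Collection<String> outputs) {
    Set<String> normalizedInputs = new TreeSet<>();
    for (String in : inputs) {
      normalizedInputs.add(normalize(in));
    }
    Set<String> overlaps = new TreeSet<>();
    for (String out : outputs) {
      String n = normalize(out);
      if (normalizedInputs.contains(n)) {
        overlaps.add(n);
      }
    }
    return overlaps;
  }
}
```

The planner could call something like findOverlaps before launching any jobs 
and either fail fast or log a warning when the result is non-empty. Which of 
the two is appropriate is an open question.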
                
> Allow one stage of an MR pipeline to depend on another target being created
> ---------------------------------------------------------------------------
>
>                 Key: CRUNCH-128
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-128
>             Project: Crunch
>          Issue Type: Improvement
>            Reporter: Josh Wills
>         Attachments: CheckpointingIT.java, CRUNCH-128.patch
>
>
> There are a couple of problems (e.g., map-side joins and total orderings) 
> where we need to guarantee that one PCollection has been written to the 
> FileSystem before another MapReduce pipeline that depends on that file is 
> allowed to run. This doesn't fit cleanly into the current set of abstractions 
> for Crunch, which is why we force pipelines to execute via the run command to 
> guarantee that the files have been created before the second stage is run.
> We should add the ability for a particular PCollection to require that a 
> SourceTarget instance has been created before it can be executed, and the 
> planner should incorporate this information into the MR pipeline planning 
> process.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
