[ https://issues.apache.org/jira/browse/CRUNCH-128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gabriel Reid updated CRUNCH-128:
--------------------------------
Attachment: CheckpointingIT.java
As mentioned on Reviewboard, I encountered an issue with this implementation
where the planner ended up in an infinite loop.
I was trying to see if this dependency functionality would easily lend itself
to adding pipeline checkpointing (something we discussed in the past). I'm
actually not even sure if this is the way I would want to do it, but in any
case, the attached test case will put the planner into an infinite loop.
This isn't standard use of the API (yet), so it's probably not that big of a
deal; on the other hand, infinite loops aren't that cool, so if you can see an
easy way to avoid this one, that would be good.
It just occurred to me that this might somehow be a case of a circular
dependency, in which case it would be pretty important for the planner to
detect it automatically.
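If it does turn out to be a cycle, one way the planner could catch it up front (instead of looping) is a depth-first search over the stage dependency graph that reports the first back edge it finds. This is a minimal sketch under stated assumptions: stages are identified by plain strings, an edge means "needs the target written by", and `StageGraph`, `addDependency`, and `findCycle` are hypothetical names, not part of Crunch's planner.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical dependency graph: an edge stage -> required means "stage needs required's output". */
class StageGraph {
  private final Map<String, Set<String>> deps = new HashMap<>();

  void addDependency(String stage, String required) {
    deps.computeIfAbsent(stage, k -> new LinkedHashSet<>()).add(required);
    deps.computeIfAbsent(required, k -> new LinkedHashSet<>());
  }

  /** Returns one cycle (first element == last element), or empty if the graph is acyclic. */
  Optional<List<String>> findCycle() {
    Map<String, Integer> state = new HashMap<>(); // absent = unvisited, 1 = on current path, 2 = finished
    List<String> path = new ArrayList<>();
    for (String start : deps.keySet()) {
      if (!state.containsKey(start)) {
        List<String> cycle = dfs(start, state, path);
        if (cycle != null) {
          return Optional.of(cycle);
        }
      }
    }
    return Optional.empty();
  }

  private List<String> dfs(String node, Map<String, Integer> state, List<String> path) {
    state.put(node, 1);
    path.add(node);
    for (String next : deps.getOrDefault(node, Collections.emptySet())) {
      Integer s = state.get(next);
      if (s == null) {
        List<String> cycle = dfs(next, state, path);
        if (cycle != null) {
          return cycle;
        }
      } else if (s == 1) {
        // Back edge: 'next' is still on the current path, so path[next..node] + next is a cycle.
        List<String> cycle = new ArrayList<>(path.subList(path.indexOf(next), path.size()));
        cycle.add(next);
        return cycle;
      }
    }
    path.remove(path.size() - 1);
    state.put(node, 2);
    return null;
  }
}
```

Running a check like this before planning would let the planner fail fast with the offending stages in the error message, rather than spinning forever on a checkpointing setup like the one in the attached test.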
> Allow one stage of an MR pipeline to depend on another target being created
> ---------------------------------------------------------------------------
>
> Key: CRUNCH-128
> URL: https://issues.apache.org/jira/browse/CRUNCH-128
> Project: Crunch
> Issue Type: Bug
> Reporter: Josh Wills
> Attachments: CheckpointingIT.java, CRUNCH-128.patch
>
>
> There are a couple of problems (e.g., map-side joins, total orderings, etc.)
> where we need to guarantee that one PCollection has been written to the
> FileSystem before another MapReduce pipeline that depends on that file is
> allowed to run. This doesn't fit cleanly into the current set of abstractions
> for Crunch, which is why we force pipelines to execute via the run command to
> guarantee that the files have been created before the second stage is run.
> We should add the ability for a particular PCollection to require that a
> SourceTarget instance has been created before it can be executed, and the
> planner should incorporate this information into the MR pipeline planning
> process.
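To illustrate what "incorporate this information into the planning process" could mean, here is a minimal sketch of a Kahn-style topological sort that groups stages into waves: a stage is scheduled only after every stage producing a target it requires has finished, and any stages left unscheduled indicate a circular dependency. It assumes each required SourceTarget has already been mapped to the name of the stage that writes it; `WavePlanner` and `plan` are hypothetical names, not Crunch's actual planner API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical planner sketch: orders stages so required targets exist before dependents run. */
class WavePlanner {
  /**
   * @param requires stage name -> names of the stages whose output targets it requires
   * @return waves of stages; no stage depends on another stage in the same wave
   * @throws IllegalStateException if the dependencies contain a cycle
   */
  static List<List<String>> plan(Map<String, Set<String>> requires) {
    Map<String, Integer> pending = new TreeMap<>();         // unmet dependency count per stage
    Map<String, List<String>> dependents = new HashMap<>(); // reverse edges: producer -> consumers
    for (Map.Entry<String, Set<String>> e : requires.entrySet()) {
      pending.putIfAbsent(e.getKey(), 0);
      for (String dep : e.getValue()) {
        pending.putIfAbsent(dep, 0);
        pending.merge(e.getKey(), 1, Integer::sum);
        dependents.computeIfAbsent(dep, k -> new ArrayList<>()).add(e.getKey());
      }
    }

    List<List<String>> waves = new ArrayList<>();
    List<String> ready = new ArrayList<>();
    for (Map.Entry<String, Integer> e : pending.entrySet()) {
      if (e.getValue() == 0) {
        ready.add(e.getKey()); // stages with no prerequisites can run first
      }
    }
    int planned = 0;
    while (!ready.isEmpty()) {
      waves.add(ready);
      planned += ready.size();
      List<String> next = new ArrayList<>();
      for (String done : ready) {
        for (String d : dependents.getOrDefault(done, Collections.emptyList())) {
          if (pending.merge(d, -1, Integer::sum) == 0) {
            next.add(d); // all of d's required targets now exist
          }
        }
      }
      Collections.sort(next);
      ready = next;
    }
    if (planned != pending.size()) {
      // Stages never reached zero unmet dependencies: they form (or depend on) a cycle.
      throw new IllegalStateException("Circular dependency among stages; cannot plan");
    }
    return waves;
  }
}
```

For example, if `join` requires the outputs of `writeLeft` and `writeRight`, and `report` requires `join`, this yields the waves `[[writeLeft, writeRight], [join], [report]]`. Each wave boundary corresponds roughly to where a user currently has to call run() by hand; mapping stages onto actual MR jobs is outside the scope of this sketch.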