[
https://issues.apache.org/jira/browse/CRUNCH-128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13533963#comment-13533963
]
Matthias Friedrich edited comment on CRUNCH-128 at 12/17/12 2:56 PM:
---------------------------------------------------------------------
I think I would have just added the two parallelDo() methods like Josh did in
patch revision aa73524. These two methods take additional parameters, so it
should be clear to users that they are dealing with a generalized parallelDo().
This is more difficult to infer from advancedParallelDo(). Also, my Eclipse
version puts methods with shorter signatures at the top when auto-completing.
Anyway, this is a matter of taste.
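To make the comparison concrete, here is a minimal sketch of the overload
approach. MiniCollection and its List<String> "dependsOn" parameter are
hypothetical stand-ins I made up for illustration; they are not the Crunch API
and not the signatures from the patch. The point is only that the simple and
the generalized form share one name and differ in their parameter list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for a Crunch collection; not the real PCollection API.
final class MiniCollection<T> {
    private final List<T> items;
    // Names of targets that must exist before this collection can be computed.
    private final List<String> requiredTargets;

    private MiniCollection(List<T> items, List<String> requiredTargets) {
        this.items = new ArrayList<>(items);
        this.requiredTargets = new ArrayList<>(requiredTargets);
    }

    static <T> MiniCollection<T> of(List<T> items) {
        return new MiniCollection<>(items, Collections.<String>emptyList());
    }

    // Simple form: delegates to the generalized overload with no dependencies.
    <R> MiniCollection<R> parallelDo(Function<T, R> fn) {
        return parallelDo(fn, Collections.<String>emptyList());
    }

    // Generalized form: same name, one extra parameter (an assumed shape).
    <R> MiniCollection<R> parallelDo(Function<T, R> fn, List<String> dependsOn) {
        List<R> out = new ArrayList<>();
        for (T item : items) {
            out.add(fn.apply(item));
        }
        return new MiniCollection<>(out, dependsOn);
    }

    List<T> items() { return items; }

    List<String> requiredTargets() { return requiredTargets; }
}
```

With overloads, auto-completion shows both variants side by side under
parallelDo, and the extra argument signals the generalized behaviour.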
From the "keeping the interface small" point of view, I think those two extra
methods are warranted -- we need them, there's no way around it. We could
remove the sample() and sort() methods to make room ;-)
Regarding CrunchRuntimeException: How about moving it to the base package? Then
we could still use it without introducing cycles.
> Allow one stage of an MR pipeline to depend on another target being created
> ---------------------------------------------------------------------------
>
> Key: CRUNCH-128
> URL: https://issues.apache.org/jira/browse/CRUNCH-128
> Project: Crunch
> Issue Type: Improvement
> Reporter: Josh Wills
> Attachments: CheckpointingIT.java, CRUNCH-128.patch,
> CRUNCH-128v2.patch, CRUNCH-128-with-op.patch
>
>
> There are a couple of problems (e.g., map-side joins and total orderings)
> where we need to guarantee that one PCollection has been written to the
> FileSystem before another MapReduce pipeline that depends on that file is
> allowed to run. This doesn't fit cleanly into the current set of abstractions
> for Crunch, which is why we force pipelines to execute via the run command to
> guarantee that the files have been created before the second stage is run.
> We should add the ability for a particular PCollection to require that a
> SourceTarget instance has been created before it can be executed, and the
> planner should incorporate this information into the MR pipeline planning
> process.
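The planning requirement described above can be sketched as a small ordering
problem. This is only an illustration under assumptions: StageSketch, its
fields, and the plan() method are hypothetical names, not Crunch classes, and
the real planner works on a much richer graph. Each stage lists the targets it
requires and the one it produces, and a stage is scheduled only once all of its
required targets are available.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stage description; the names are illustrative, not Crunch classes.
final class StageSketch {
    final String name;
    final String produces;        // target written by this stage
    final List<String> requires;  // targets that must already exist

    StageSketch(String name, String produces, List<String> requires) {
        this.name = name;
        this.produces = produces;
        this.requires = new ArrayList<>(requires);
    }

    // Returns stage names in an order where every required target has been
    // written by an earlier stage or was present before planning started.
    static List<String> plan(List<StageSketch> stages, Set<String> existingTargets) {
        Set<String> available = new HashSet<>(existingTargets);
        List<StageSketch> pending = new ArrayList<>(stages);
        List<String> order = new ArrayList<>();
        while (!pending.isEmpty()) {
            StageSketch ready = null;
            for (StageSketch s : pending) {
                if (available.containsAll(s.requires)) {
                    ready = s;
                    break;
                }
            }
            if (ready == null) {
                // No remaining stage can run: a target is missing or there is a cycle.
                throw new IllegalStateException("unsatisfiable target dependency");
            }
            pending.remove(ready);
            available.add(ready.produces);
            order.add(ready.name);
        }
        return order;
    }
}
```

In this sketch a map-side join stage that requires a small lookup file is
always ordered after the stage that writes that file, regardless of the order
in which the stages were declared.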
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira