[ https://issues.apache.org/jira/browse/CRUNCH-420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14034330#comment-14034330 ]

Josh Wills commented on CRUNCH-420:
-----------------------------------
Yep, reading over it now. So it seems like we have two situations where
breakpointing is needed (I may have this wrong, but I'm going to try to write
it up):

1) We have two dependent GBK operations, and we want to signal to the planner
where to split in between them, which is handled by CRUNCH-294.

2) We have a single data prep step that is going to feed multiple downstream
GBKs. We don't want to run it twice in separate jobs (either b/c it's compute
intensive, or b/c it does an amazing job of filtering a large output file), so
we mark it as materialized and have it get created in a single map-only job
that then feeds the downstream GBKs, which is handled by this patch.

Is there another breakpoint situation I'm missing? Is there a reduce-side
version of this problem?
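
To make situation 2 concrete, here is a toy model in plain Java. It does not use the Crunch API and says nothing about how the planner is actually implemented; every name in it (BreakpointSketch, prep, countByFirstChar, countByLength) is made up for illustration. It only counts how many records the shared prep step processes when each downstream grouping re-runs it, versus when its output is computed once and reused.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class BreakpointSketch {
    // Number of records the prep step has processed since the last reset.
    static int prepRecords = 0;

    // Hypothetical prep step: stands in for the expensive or heavily
    // filtering DoFn. It normalizes each line and drops blank ones.
    static List<String> prep(List<String> raw) {
        List<String> kept = new ArrayList<>();
        for (String line : raw) {
            prepRecords++;
            String cleaned = line.trim().toLowerCase();
            if (!cleaned.isEmpty()) {
                kept.add(cleaned);
            }
        }
        return kept;
    }

    // Stand-in for downstream GBK #1: group by first character and count.
    static Map<Character, Integer> countByFirstChar(List<String> records) {
        Map<Character, Integer> counts = new TreeMap<>();
        for (String r : records) {
            counts.merge(r.charAt(0), 1, Integer::sum);
        }
        return counts;
    }

    // Stand-in for downstream GBK #2: group by record length and count.
    static Map<Integer, Integer> countByLength(List<String> records) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (String r : records) {
            counts.merge(r.length(), 1, Integer::sum);
        }
        return counts;
    }

    // No breakpoint: each downstream job re-runs prep over the raw input.
    static int prepWorkWithoutBreakpoint(List<String> raw) {
        prepRecords = 0;
        countByFirstChar(prep(raw));
        countByLength(prep(raw));
        return prepRecords;
    }

    // Breakpoint: prep runs once (the "map-only job"), and both
    // downstream groupings read its stored output.
    static int prepWorkWithBreakpoint(List<String> raw) {
        prepRecords = 0;
        List<String> materialized = prep(raw);
        countByFirstChar(materialized);
        countByLength(materialized);
        return prepRecords;
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList("Apple", " ", "avocado", "Banana", "");
        System.out.println("without breakpoint: " + prepWorkWithoutBreakpoint(raw));
        System.out.println("with breakpoint:    " + prepWorkWithBreakpoint(raw));
    }
}
```

With the five sample lines, the prep step touches 10 records without the breakpoint and 5 with it. In the real pipeline the saving would come from not repeating the DoFn in each downstream job, at the cost of writing the intermediate output once.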
> Breakpoints Not Working
> -----------------------
>
> Key: CRUNCH-420
> URL: https://issues.apache.org/jira/browse/CRUNCH-420
> Project: Crunch
> Issue Type: Bug
> Environment: Crunch 0.8.2
> Reporter: Allan Shoup
> Assignee: Josh Wills
> Attachments: Breakpoint2IT.java, CRUNCH-420.patch,
> testBreakpoint_plan.png
>
>
> Reading through CRUNCH-294, it looks like materialize is supposed to function
> as a breakpoint to the planner. I've seen several plans where it appeared to
> me a particular DoFn shouldn't have been repeated, but it was.
> I'll attach some supporting material.