Re: Generating DOT files for Crunch job plans

Matthias Friedrich Sat, 27 Oct 2012 06:40:26 -0700

Hi,

On Saturday, 2012-10-27, Gabriel Reid wrote:
> In the few times that I've debugged issues in the planner in Crunch,
> it always takes me a bit of time to figure out (again) how things
> work there. I've been thinking/planning of writing some more inline
> docs and doing a bit of refactoring in the code to help myself (and
> others) with doing this in the future, but something else that I was
> thinking of was the generation of DOT[1] files for pipelines so that
> it's easier to visualize what's going on.


That's a great idea. It will help win over prospective users who
wonder whether Crunch performs as well as a sequence of hand-written
MR jobs.

There are other ways to generate graphs in Java, BTW, but in my
experience none of them produces output that matches dot/graphviz. In
my opinion we shouldn't run dot ourselves, though, because most users
don't have dot installed. We should just generate the output and let
users call dot themselves.
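
To illustrate the "just generate the output" approach, here is a minimal
sketch of a DOT writer in plain Java with no Graphviz dependency. The
class name DotSketch, the method names, and the node names are all
hypothetical and are not part of Crunch; a real version would walk the
planner's own data structures instead of a map of edges.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the Crunch API.
class DotSketch {

    /**
     * Renders a directed graph as DOT text.
     * The map goes from a node name to the names of its successors.
     */
    static String toDot(String graphName, Map<String, List<String>> edges) {
        StringBuilder sb = new StringBuilder();
        sb.append("digraph ").append(quote(graphName)).append(" {\n");
        for (Map.Entry<String, List<String>> entry : edges.entrySet()) {
            for (String target : entry.getValue()) {
                sb.append("  ")
                  .append(quote(entry.getKey()))
                  .append(" -> ")
                  .append(quote(target))
                  .append(";\n");
            }
        }
        sb.append("}\n");
        return sb.toString();
    }

    /** Wraps a name in double quotes, escaping embedded double quotes. */
    static String quote(String s) {
        return "\"" + s.replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        // Made-up plan: one input feeding a map stage, then a reduce stage.
        Map<String, List<String>> edges = new LinkedHashMap<>();
        edges.put("input", Arrays.asList("map"));
        edges.put("map", Arrays.asList("reduce"));
        System.out.print(toDot("plan", edges));
    }
}
```

A user who has Graphviz installed could save that text as plan.dot and
render it with "dot -Tpng plan.dot -o plan.png"; everyone else still
gets a readable text description of the plan.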
 
> I'm sure that functionality like this can be useful (at least to me,
> as I was just using it in a somewhat ad-hoc way to debug
> CRUNCH-102), but I'm not sure if this is something we want to expose
> easily, or keep pretty hidden to just use for debugging. I believe
> Pig provides this same functionality with the "explain" command.
 
> Any thoughts on adding this, particularly around how we could/should
> expose it in the API?
 
I think we should make it available to users and make it really easy
to access. I'm not sure about the API, though. Since the dot output is
really cheap to create, we could always generate it, store it inside
the Configuration instance and provide a static utility class to
access it. A while ago we discussed moving the debugging/log4j
manipulation logic out of MRPipeline; perhaps we can use a single
CrunchDebug utility for both.
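
A rough sketch of what such a static utility could look like follows.
To keep it self-contained it uses java.util.Properties as a stand-in
for the Hadoop Configuration instance; with Hadoop on the classpath the
same idea would map to Configuration.set(String, String) and
Configuration.get(String). The class name CrunchDebugSketch, the method
names, and the key "crunch.debug.plan.dot" are assumptions made for
illustration only, not an existing Crunch API.

```java
import java.util.Properties;

// Hypothetical utility; Properties stands in for Hadoop's Configuration.
final class CrunchDebugSketch {

    // Hypothetical configuration key under which the DOT text is stored.
    static final String PLAN_DOT_KEY = "crunch.debug.plan.dot";

    private CrunchDebugSketch() {
        // Static utility class, not meant to be instantiated.
    }

    /** Called by the planner (in this sketch) once the plan is known. */
    static void storePlanDot(Properties conf, String dot) {
        conf.setProperty(PLAN_DOT_KEY, dot);
    }

    /** Returns the stored DOT text, or an empty string if none was stored. */
    static String getPlanDot(Properties conf) {
        return conf.getProperty(PLAN_DOT_KEY, "");
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        storePlanDot(conf, "digraph \"plan\" {\n}\n");
        System.out.print(getPlanDot(conf));
    }
}
```

Keeping the accessor in a separate static class means the pipeline
class itself would not need any new public methods for this.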

Regards,
  Matthias
