[ 
https://issues.apache.org/jira/browse/PIG-2586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265100#comment-13265100
 ] 

Bill Graham commented on PIG-2586:
----------------------------------

As part of Twitter's Hackweek we developed a first pass at a visualization tool 
for Pig that focuses on visualizing the run-time execution of jobs in a Pig 
script. This helps our developers when running scripts with very large DAGs. 
We're in the process of open sourcing it, but I'll describe it here to see if 
parts of it might be leveraged, built upon or learned from.

* Design
When executing a Pig script from the command line, we insert a 
{{PigProgressNotificationListener}} (PPNL) per PIG-2525. The PPNL launches an 
embedded Jetty server that exposes a JSON API of DAG/script/job/progress info. 
Also embedded is the HTML/JS/CSS content for a single page that renders the 
DAG, polls for updates, and shows progress.
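
The design above can be sketched roughly as follows. This is a minimal illustration in Python, not the Java of the actual tool, and the standard-library HTTP server stands in for embedded Jetty. The class name {{ProgressState}}, its callback names, the {{/dag}} path and the JSON field names are all assumptions made for the example; they are not the PPNL interface or the tool's real API.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class ProgressState:
    """In-memory snapshot of one script run, updated by listener callbacks."""

    def __init__(self, script_id):
        self._lock = threading.Lock()
        self._state = {"scriptId": script_id, "progress": 0, "jobs": {}}

    # Hypothetical callbacks, loosely modelled on what a progress
    # listener might receive; the names are illustrative only.
    def job_started(self, job_id, alias, feature):
        with self._lock:
            self._state["jobs"][job_id] = {
                "alias": alias, "feature": feature, "status": "RUNNING"}

    def job_finished(self, job_id):
        with self._lock:
            self._state["jobs"][job_id]["status"] = "COMPLETE"

    def progress_updated(self, percent):
        with self._lock:
            self._state["progress"] = percent

    def to_json(self):
        with self._lock:
            return json.dumps(self._state, sort_keys=True)


def make_server(state, port=0):
    """Build an HTTP server that serves the current state as JSON at /dag.

    port=0 asks the OS for any free port.
    """

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/dag":
                self.send_error(404)
                return
            body = state.to_json().encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, fmt, *args):
            pass  # keep the console quiet while the page polls

    return ThreadingHTTPServer(("127.0.0.1", port), Handler)
```

In this sketch the script runner would call {{make_server(state).serve_forever()}} on a background thread and update {{state}} from the listener callbacks, while the page polls {{/dag}} and redraws.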

* Viz
We use d3.js to render a chord diagram of the script (see 
http://mbostock.github.com/d3/ex/chord.html), where each arc in the circle is a 
job and each chord is a dependency. This requires PIG-2660. We also render a 
table view of all jobs where we show alias and feature initially, but then add 
jobName, #reducers, #mappers and progress percentages once they are available. 
Other related patches required are PIG-2663 and PIG-2664.
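
For illustration, d3's chord layout takes a square matrix as input, so a job DAG has to be turned into one before it can be drawn. The sketch below, in Python, shows one way that conversion might look. The function name {{dependency_matrix}}, the job ids and the choice of 1 as the edge weight are assumptions for the example, not the tool's actual code.

```python
def dependency_matrix(jobs, edges):
    """Return a square matrix where matrix[i][j] == 1 if jobs[i] feeds jobs[j]."""
    index = {job: i for i, job in enumerate(jobs)}
    matrix = [[0] * len(jobs) for _ in jobs]
    for upstream, downstream in edges:
        matrix[index[upstream]][index[downstream]] = 1
    return matrix


# Hypothetical three-job script: job_1 and job_2 both feed job_3.
jobs = ["job_1", "job_2", "job_3"]
edges = [("job_1", "job_3"), ("job_2", "job_3")]
matrix = dependency_matrix(jobs, edges)
```

Note that the chord layout sizes each arc by its row total, so a terminal job such as {{job_3}} here would get no arc length from this matrix alone. A real implementation would presumably need to weight rows differently; how the tool handles this is not described here.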

* Future work
- Better visualization. The chord diagram is OK, but we'd like to find a good 
JS library for DAG rendering (à la GraphViz) and include that as an option too.
- Non-embedded mode. The Jetty server should be deployable as a standalone app 
server. Clients would push their state to it, and the server would keep a 
persistent data store. Embedded mode is still useful during development.
- Better script bindings. Being able to reference a pop-up of the script with 
highlighting of certain parts (see PIG-2659) would be useful.


                
> A better plan/data flow visualizer
> ----------------------------------
>
>                 Key: PIG-2586
>                 URL: https://issues.apache.org/jira/browse/PIG-2586
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>            Reporter: Daniel Dai
>              Labels: gsoc2012
>
> Pig supports a dot-graph-style plan to visualize the 
> logical/physical/mapreduce plan (explain with the -dot option, see 
> http://ofps.oreilly.com/titles/9781449302641/developing_and_testing.html). 
> However, the dot graph takes an extra step to generate the plan graph, and 
> the quality of the output is not good. It would be better to implement a 
> proper visualizer for Pig. It should:
> 1. show operator type and alias
> 2. turn on/off output schema
> 3. dive into foreach inner plan on demand
> 4. provide a way to show operator source code, e.g., a tooltip on an operator 
> (plans don't currently have this information, but you can assume it is in 
> place)
> 5. besides visualizing the logical/physical/mapreduce plan, visualizing the 
> script itself would also be useful
> 6. may rely on a Java graphics library such as Swing
> This is a candidate project for Google Summer of Code 2012. More information 
> about the program can be found at 
> https://cwiki.apache.org/confluence/display/PIG/GSoc2012

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
