[jira] [Commented] (TEZ-2105) Totally Sorted Edge with auto-parallelism

Jeff Zhang (JIRA) Fri, 06 Mar 2015 00:58:09 -0800

    [ https://issues.apache.org/jira/browse/TEZ-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14350128#comment-14350128 ]


Jeff Zhang commented on TEZ-2105:
---------------------------------

[~shanaka.kuruwita] Glad to know you are interested in this. If you have any
questions about Tez or Pig on Tez, you can send them to the Pig or Tez user
mailing list. I'm not sure of your current knowledge of Pig and Tez, but for
this JIRA I think you should first understand how Pig implements "order by"
on Tez. You can start with a simple Pig script that uses ORDER BY and check the
source code under the package
org.apache.pig.backend.hadoop.executionengine.tez.





> Totally Sorted Edge with auto-parallelism
> -----------------------------------------
>
>                 Key: TEZ-2105
>                 URL: https://issues.apache.org/jira/browse/TEZ-2105
>             Project: Apache Tez
>          Issue Type: New Feature
>            Reporter: Gopal V
>              Labels: gsoc, gsoc2015, hadoop, java, pig, tez
>
> Pig-on-Tez supports an edge configuration using a sampled Output along with a
> vertex manager for automatic parallelism estimation.
> This is referred to in the Pig-on-Tez Hadoop Summit presentation.
> http://www.slideshare.net/Hadoop_Summit/pig-on-tez-low-latency-etl-with-big-data/19
> Migrating that plan-model into Tez as a native edge type would allow for much 
> more efficient scheduling of the downstream edges and effectively turn the 
> auto-parallelism implementation into a runtime skew-correcting mechanism 
> within this edge.
> The Tez Edge has enough information to sample, determine partitioning order 
> and correct parallelism.
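The sample, partition-order, and parallelism steps described in the issue can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the Pig or Tez implementation: it assumes long sort keys, an estimated total input size, and a target number of bytes per downstream task. The class SampledRangePartitioner and its methods are hypothetical names and use no Tez or Pig classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical sketch (not a Tez or Pig API): estimate downstream parallelism,
 * derive range boundaries from a key sample, and route keys so that every key
 * in partition i sorts at or before every key in partition i + 1.
 */
public class SampledRangePartitioner {
    // Sorted cut points; there are (parallelism - 1) of them.
    private final long[] boundaries;

    private SampledRangePartitioner(long[] boundaries) {
        this.boundaries = boundaries;
    }

    /** Assumed policy: ceil(totalBytes / bytesPerTask), clamped to [1, maxParallelism]. */
    static int estimateParallelism(long totalBytes, long bytesPerTask, int maxParallelism) {
        if (bytesPerTask <= 0) throw new IllegalArgumentException("bytesPerTask must be > 0");
        long tasks = (totalBytes + bytesPerTask - 1) / bytesPerTask; // ceiling division
        return (int) Math.max(1, Math.min(tasks, maxParallelism));
    }

    /** Picks (parallelism - 1) cut points at evenly spaced quantiles of the sorted sample. */
    static SampledRangePartitioner fromSample(List<Long> sample, int parallelism) {
        if (parallelism < 1 || sample.isEmpty()) {
            throw new IllegalArgumentException("need parallelism >= 1 and a non-empty sample");
        }
        List<Long> sorted = new ArrayList<>(sample);
        Collections.sort(sorted);
        long[] cuts = new long[parallelism - 1];
        for (int i = 1; i < parallelism; i++) {
            int idx = (int) ((long) i * sorted.size() / parallelism);
            cuts[i - 1] = sorted.get(Math.min(idx, sorted.size() - 1));
        }
        return new SampledRangePartitioner(cuts);
    }

    /** Partition index = number of cut points strictly less than the key (lower bound). */
    int getPartition(long key) {
        int lo = 0, hi = boundaries.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (boundaries[mid] < key) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    public static void main(String[] args) {
        // 10 GiB of input at 1 GiB per task gives 10 downstream tasks.
        int parallelism = estimateParallelism(10L << 30, 1L << 30, 100);
        List<Long> sample = new ArrayList<>();
        for (long k = 0; k < 1000; k++) sample.add(k * 7 % 1000); // shuffled 0..999
        SampledRangePartitioner p = fromSample(sample, parallelism);
        System.out.println(parallelism + " " + p.getPartition(0) + " " + p.getPartition(999));
    }
}
```

Running main prints "10 0 9": the cut points land at 100, 200, ..., 900, so the smallest key goes to the first task and the largest to the last. Because getPartition is monotonic in the key, concatenating the sorted outputs of tasks 0..N-1 yields a totally ordered result. Skew correction, as the issue suggests, would amount to recomputing the cut points (and possibly the parallelism) from the sample at runtime instead of fixing them at plan time.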



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
