[ https://issues.apache.org/jira/browse/TEZ-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14350128#comment-14350128 ]
Jeff Zhang commented on TEZ-2105:
---------------------------------

[~shanaka.kuruwita] Glad to know you are interested in this. If you have any questions about Tez or Pig on Tez, you can send them to the Pig or Tez user mailing lists. I'm not sure of your current knowledge of Pig and Tez, but for this JIRA I think you should first understand how Pig implements "order by" using Tez. You can start with some Pig scripts and check the source code under the package org.apache.pig.backend.hadoop.executionengine.tez.

> Totally Sorted Edge with auto-parallelism
> -----------------------------------------
>
>                 Key: TEZ-2105
>                 URL: https://issues.apache.org/jira/browse/TEZ-2105
>             Project: Apache Tez
>          Issue Type: New Feature
>            Reporter: Gopal V
>              Labels: gsoc, gsoc2015, hadoop, java, pig, tez
>
> Pig-on-Tez supports an edge configuration using a sampled Output along with a
> vertex manager for automatic parallelism estimation.
> This is referred to in the Pig-on-Tez Hadoop Summit presentation.
> http://www.slideshare.net/Hadoop_Summit/pig-on-tez-low-latency-etl-with-big-data/19
> Migrating that plan-model into Tez as a native edge type would allow for much
> more efficient scheduling of the downstream edges and effectively turn the
> auto-parallelism implementation into a runtime skew-correcting mechanism
> within this edge.
> The Tez Edge has enough information to sample, determine partitioning order
> and correct parallelism.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
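
For readers new to the idea in the issue description above, here is a minimal sketch of sample-based range partitioning for a total sort. It is not the Pig or Tez implementation. The class SampledRangePartitioner and its methods are hypothetical names, keys are assumed to be plain long values, and the row-count estimate is assumed to be supplied by the caller. It only illustrates the three steps the description names: sample, derive partition boundaries, and pick a parallelism.

```java
import java.util.Arrays;

// Hypothetical illustration only; none of these names come from Pig or Tez.
class SampledRangePartitioner {
    // Upper boundaries between partitions, derived from the sorted sample.
    private final long[] splitPoints;

    SampledRangePartitioner(long[] sample, int numPartitions) {
        if (sample.length == 0 || numPartitions < 1) {
            throw new IllegalArgumentException("need a non-empty sample and at least one partition");
        }
        long[] sorted = sample.clone();
        Arrays.sort(sorted);
        // Pick numPartitions - 1 evenly spaced quantiles of the sample as boundaries.
        splitPoints = new long[numPartitions - 1];
        for (int i = 1; i < numPartitions; i++) {
            int index = (int) ((long) i * sorted.length / numPartitions);
            splitPoints[i - 1] = sorted[index];
        }
    }

    // Ceiling division: how many tasks are needed for the estimated row count.
    static int estimateParallelism(long estimatedRows, long rowsPerTask) {
        long tasks = (estimatedRows + rowsPerTask - 1) / rowsPerTask;
        return (int) Math.max(1, tasks);
    }

    // Keys below the first boundary go to partition 0, and so on, so reading
    // the partitions in order yields globally sorted output once each
    // partition is sorted locally.
    int partitionFor(long key) {
        int idx = Arrays.binarySearch(splitPoints, key);
        // When the key is absent, binarySearch returns -(insertionPoint) - 1.
        return idx >= 0 ? idx + 1 : -idx - 1;
    }

    public static void main(String[] args) {
        long[] sample = {50, 10, 40, 20, 30, 60};
        int parallelism = estimateParallelism(750_000, 250_000);
        SampledRangePartitioner p = new SampledRangePartitioner(sample, parallelism);
        for (long key : new long[] {5, 30, 49, 50, 100}) {
            System.out.println(key + " -> partition " + p.partitionFor(key));
        }
    }
}
```

Because the boundaries come from sample quantiles, a skewed key distribution yields narrower ranges where keys are dense, which is the skew-correcting effect the description refers to.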