Hi,

The alternate sorter is a pet project of mine. It dates from Nov 2012.

When I joined HWX I was told to "Optimize MR" & handed Terasort. I suspect it could've been a joke at my expense :)

I wrote up a spec for what I thought would be a next gen sort buffer impl for MR & took 3 weeks to implement it (MAPREDUCE-4755).

But this was such a huge patch that it would never really make it into trunk without an insane amount of testing & it never actually went into PA.

That changed when Tez came along and made it possible to plug in a new MapOutputBuffer, not just a sort class, & switch between them.

PipelinedSorter has been sitting in the Tez repo without any way to enable it for more than a year now (it was in the initial import of Tez into the incubator).

At this point, I have forgotten how most of it works, which is why I'm making it possible to enable it via config so I can test it out.

The con is that it's not tested beyond my 3 node clusters running terasort & hive's TPC-DS queries.

You can poke around the spec I wrote in 2012 (some details might be unimplemented):

http://people.apache.org/~gopalv/PipelinedSorter.pdf

The document has most of the arguments for this sorter.

Cheers,
Gopal

On Fri, Jan 24, 2014 at 3:43 PM, Rohini Palaniswamy <[email protected]> wrote:
> Hi,
> Looks like PipelinedSorter uses multiple threads to do the sort. Can
> someone explain its use, pros and cons?
>
> Regards,
> Rohini
>
> ---------- Forwarded message ----------
> From: Gopal V (JIRA) <[email protected]>
> Date: Fri, Jan 24, 2014 at 3:07 PM
> Subject: [jira] [Created] (TEZ-765) Allow tez.runtime.sort.threads > 1 to
> turn on PipelinedSorter
> To: [email protected]
>
>
> Gopal V created TEZ-765:
> ---------------------------
>
>              Summary: Allow tez.runtime.sort.threads > 1 to turn on
> PipelinedSorter
>                  Key: TEZ-765
>                  URL: https://issues.apache.org/jira/browse/TEZ-765
>              Project: Apache Tez
>           Issue Type: Bug
>    Affects Versions: 0.3.0
>             Reporter: Gopal V
>             Assignee: Gopal V
>             Priority: Trivial
>              Fix For: 0.3.0
>
>
> The Tez pipelined sorter cannot be turned on without a rebuild.
>
> Allow the sorter to be turned on via already existing config key
> "tez.runtime.sort.threads".
>
> --
> This message was sent by Atlassian JIRA
> (v6.1.5#6160)
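
For illustration only, the selection rule proposed in TEZ-765 (a value of "tez.runtime.sort.threads" greater than 1 turns on PipelinedSorter) can be sketched roughly as below. This is not the actual Tez code: the class `SorterSelection`, the enum `SorterKind`, the method `chooseSorter`, the use of `java.util.Properties` in place of the real configuration object, and the assumed default of 1 thread are all hypothetical stand-ins.

```java
import java.util.Properties;

public class SorterSelection {
    // Config key named in TEZ-765.
    static final String SORT_THREADS_KEY = "tez.runtime.sort.threads";

    // Hypothetical labels for the two sorter implementations.
    enum SorterKind { DEFAULT, PIPELINED }

    // Pick the pipelined sorter only when more than one sort thread is requested.
    // The fallback value of 1 is an assumption made for this sketch.
    static SorterKind chooseSorter(Properties conf) {
        int threads = Integer.parseInt(conf.getProperty(SORT_THREADS_KEY, "1"));
        return threads > 1 ? SorterKind.PIPELINED : SorterKind.DEFAULT;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(chooseSorter(conf)); // key unset, so the default sorter is used

        conf.setProperty(SORT_THREADS_KEY, "2");
        System.out.println(chooseSorter(conf)); // more than one thread selects the pipelined sorter
    }
}
```

The point of keying off an existing setting is that no new config name is needed and no rebuild is required to switch sorters.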
