Hi Jeff,

Thank you, Jeff. I know Hive can handle skewed joins, but I think that is not enough:
1. It needs sampling, which has a cost
2. It can't control the size of a task
3. It is not exact
4. It requires using Hive or Pig

I think this is a fundamental solution to the skew problem: adding a balance step between map and reduce. Maybe I need to explain it in more detail.

Regards,
Jian YI

2010/2/8 Jeff Hammerbacher <[email protected]>

> Hey Jian,
>
> Hive supports arbitrary procedural languages through Hadoop Streaming; see
> http://wiki.apache.org/hadoop/Hive/LanguageManual/Transform for more.
>
> Also, both Hive and Pig have support for handling skewed joins if you use
> their higher-level interface. See
> https://issues.apache.org/jira/browse/HIVE-562 and
> http://wiki.apache.org/pig/PigSkewedJoinSpec.
>
> Thanks,
> Jeff
>
> On Sun, Feb 7, 2010 at 4:13 AM, jian yi <[email protected]> wrote:
>
> > Hey Jeff,
> >
> > Thank you, Jeff.
> > The procedure means procedure language, like Oracle PL/SQL, which is very
> > helpful to migrate old services. We want to build a data warehouse based on
> > MapReduce engine. I plan to optimize MapReduce to solve the skew problem by
> > adding a balance between map and reduce. Please refer to
> > http://bbs.hadoopor.com/thread-521-1-1.html
> >
> > Regards,
> > Jian
> >
> > 2010/2/7 Jeff Hammerbacher <[email protected]>
> >
> > > Hey Jian,
> > >
> > > I'm not sure what you mean by "Hive don't support procedure", but in any
> > > case, the Pig team has stated that they will support SQL over the Pig
> > > execution engine. See https://issues.apache.org/jira/browse/PIG-824.
> > >
> > > Regards,
> > > Jeff
> > >
> > > On Sat, Feb 6, 2010 at 6:16 PM, jian yi <[email protected]> wrote:
> > >
> > > > Hi,
> > > >
> > > > SQL is very helpful to develop data warehouse, but Hive don't support
> > > > procedure. if Pig support SQL, it will be more powerful.
