Sriram, Sailfish depends on append. I just noticed that HDFS has append disabled. How does one use this with Hadoop?

On Wed, May 9, 2012 at 9:00 AM, Otis Gospodnetic <otis_gospodne...@yahoo.com> wrote:

> Hi Sriram,
>
> >> The I-file concept could possibly be implemented here in a fairly
> >> self-contained way. One could even colocate/embed a KFS filesystem
> >> with such an alternate shuffle, like how MR task temporary space is
> >> usually colocated with HDFS storage.
> >
> > Exactly.
> >
> >> Does this seem reasonable in any way?
> >
> > Great. Where do we go from here? How do we get a collaborative effort
> > going?
>
> Sounds like a JIRA issue should be opened, the approach briefly described,
> and the first implementation attempt made. Then iterate.
>
> I look forward to seeing this! :)
>
> Otis
> --
> Performance Monitoring for Solr / ElasticSearch / HBase -
> http://sematext.com/spm
>
> > ________________________________
> > From: Sriram Rao <srirams...@gmail.com>
> > To: common-dev@hadoop.apache.org
> > Sent: Tuesday, May 8, 2012 6:48 PM
> > Subject: Re: Sailfish
> >
> > Dear Andy,
> >
> >> From: Andrew Purtell <apurt...@apache.org>
> >> ...
> >
> >> Do you intend this to be a joint project with the Hadoop community or
> >> a technology competitor?
> >
> > As I had said in my email, we are looking for folks to collaborate
> > with us to help get us integrated with Hadoop. So, to be explicitly
> > clear, we are intending for this to be a joint project with the
> > community.
> >
> >> Regrettably, KFS is not a "drop in replacement" for HDFS.
> >> Hypothetically: I have several petabytes of data in an existing HDFS
> >> deployment, which is the norm, and a continuous MapReduce workflow.
> >> How do you propose I, practically, migrate to something like Sailfish
> >> without a major capital expenditure and/or downtime and/or data loss?
> >
> > Well, we are not asking for KFS to replace HDFS. One path you could
> > take is to experiment with Sailfish---use KFS just for the
> > intermediate data and HDFS for everything else. There is no major
> > capex :). While you get comfy with pushing intermediate data into a
> > DFS, we get the ideas added to HDFS. This simplifies deployment
> > considerations.
> >
> >> However, can the Sailfish I-files implementation be plugged in as an
> >> alternate Shuffle implementation in MRv2 (see MAPREDUCE-3060 and
> >> MAPREDUCE-4049),
> >
> > This'd be great!
> >
> >> with necessary additional plumbing for dynamic
> >> adjustment of reduce task population? And the workbuilder could be
> >> part of an alternate MapReduce Application Manager?
> >
> > It should be part of the AM. (Currently, with our implementation in
> > Hadoop-0.20.2, the workbuilder serves the role of an AM).
> >
> >> The I-file concept could possibly be implemented here in a fairly
> >> self-contained way. One could even colocate/embed a KFS filesystem
> >> with such an alternate shuffle, like how MR task temporary space is
> >> usually colocated with HDFS storage.
> >
> > Exactly.
> >
> >> Does this seem reasonable in any way?
> >
> > Great. Where do we go from here? How do we get a collaborative effort
> > going?
> >
> > Best,
> >
> > Sriram
> >
> >>> From: Sriram Rao <srirams...@gmail.com>
> >>> To: common-dev@hadoop.apache.org
> >>> Sent: Tuesday, May 8, 2012 10:32 AM
> >>> Subject: Project announcement: Sailfish (also, looking for
> >>> collaborators)
> >>>
> >>> Hi,
> >>>
> >>> I'd like to announce the release of a new open source project,
> >>> Sailfish.
> >>>
> >>> http://code.google.com/p/sailfish/
> >>>
> >>> Sailfish tries to improve Hadoop performance, particularly for large
> >>> jobs which process TBs of data and run for hours. In building
> >>> Sailfish, we modify how map output is handled and transported from
> >>> map->reduce.
> >>>
> >>> The project pages provide more information about the project.
> >>>
> >>> We are looking for collaborators who can help get some of the ideas
> >>> into Apache Hadoop. A possible step forward could be to make the
> >>> "shuffle" phase of Hadoop pluggable.
> >>>
> >>> If you are interested in working with us, please get in touch with me.
> >>>
> >>> Sriram
> >>
> >
> > --
> > Best regards,
> >
> >    - Andy
> >
> > Problems worthy of attack prove their worth by hitting back. - Piet
> > Hein (via Tom White)