Hi Tom,

> There are also Hadoop tools like distcp, Hadoop archives, Streaming,
> etc, which should go with MapReduce.

Good point. I agree.

> The alternative would be to have a Common TLP,
> which we shouldn't necessarily dismiss, since more important than the
> size of the codebase is that there's a community to support the
> codebase, as there certainly is here.

I guess the question is who would want to be on that project? I don't think the current bundle of stuff in common would form a good kernel for a community. A lack of a coherent community for common has always been a problem with the project split IMO. I could see folks deciding that they were going to build a community around a really good RPC stack, or some other chunk of common, but frankly I think it is premature to do that. Proposals welcome of course, but I think the HDFS folks will want a copy of the RPC stuff in their project and most of the rest of the stuff in common is too small to merit a project and is more easily handled via duplication and then sorting it out / dead code elimination.

On Aug 29, 2012, at 10:30 AM, Tom White wrote:

> On Wed, Aug 29, 2012 at 5:31 PM, Arun C Murthy <a...@hortonworks.com> wrote:
>>
>> On Aug 28, 2012, at 8:50 PM, Alejandro Abdelnur wrote:
>>
>>> Chris, thanks for initiating the discussion.
>>
>> Likewise, thanks Chris!
>>
>>>
>>> IMO a pre-requisite to this is to figure out how we'll handle the following:
>>>
>>
>>
>> Good points - I'd recommend we keep Common and HDFS in the same project.
>
> That seems reasonable. The alternative would be to have a Common TLP,
> which we shouldn't necessarily dismiss, since more important than the
> size of the codebase is that there's a community to support the
> codebase, as there certainly is here. Having said that, a Common TLP
> lacks a clear 'mission' since it doesn't offer any standalone
> services. Also, it may diminish in utility over time if pieces are
> moved into HDFS, MapReduce and YARN.
>
>> Yes, MR/YARN will need some changes in Common occasionally, but core pieces
>> like RPC have been maintained by HDFS folks over time anyway e.g. move to
>> ProtoBufs were led by Sanjay, Suresh, Todd, Jitendra et al.
>
> Does the work to use versioned protocol buffers for RPC mean that
> different releases of HDFS and MapReduce can work together yet? If
> not, this is something we should be working towards (although that
> shouldn't block a move to TLPs).
>
>>
>> We can move SequenceFile into MR if necessary and keep same package names
>> for compatibility.
>
> There are also Hadoop tools like distcp, Hadoop archives, Streaming,
> etc, which should go with MapReduce.
>
> Cheers,
> Tom
>
>>
>> We should, of course, stop tweaking things in different projects in the same
>> jira - we've been reasonably good at not doing that.
>>
>> Thoughts?
>>
>> Arun
>>
>>> * Where does common stuff lives?
>>> * What are the public interfaces of each project (towards the other
>>>   projects)?
>>> * How do we do development/releases? In tandem? Separate? How this
>>>   will work in practice, currently we are constantly tweaking things
>>>   inter-projects, sometimes in the same JIRAs, sometimes in follow up
>>>   JIRAs.
>>>
>>> Thoughts?
>>>
>>> Thxs.
>>>
>>> On Tue, Aug 28, 2012 at 7:33 PM, Mattmann, Chris A (388J)
>>> <chris.a.mattm...@jpl.nasa.gov> wrote:
>>>> [decided to minimize traffic and to simply put this in one thread]
>>>>
>>>> Hi Guys,
>>>>
>>>> See the recent discussion on these threads:
>>>>
>>>> YARN as its own Hadoop "sub project": http://s.apache.org/WW1
>>>> Maintain a single committer list for the Hadoop project:
>>>> http://s.apache.org/Owx
>>>>
>>>> ...and just pay attention to the Hadoop project over the last 3-4 years.
>>>> It's operating as a single project, that's masking separate communities
>>>> that themselves are really separate ASF projects.
>>>>
>>>> At the ASF, this has been a problem area called "umbrella" projects and
>>>> over the years, all I've seen from them is wasted bandwidth, artificial
>>>> barriers and the inventions of new ways to perform process mongering and
>>>> to reduce the fun in developing software at this fantastic foundation.
>>>>
>>>> I've talked about umbrella projects enough. We've diverted conversation
>>>> enough. Enough people have tried to act like there is some technical
>>>> mumbo jumbo that is preventing the eventual act of higher power that I
>>>> myself hope comes should these discussions prove unfruitful through
>>>> normal means.
>>>>
>>>> *these. are. separate. projects.*
>>>> *there.are.not.blocker.issues.from.spinning.out.these.projects.as.their.own.communities*
>>>>
>>>> In this email: http://s.apache.org/rSm
>>>>
>>>> And in the 2 subsequent follow ons in that thread, I've outlined a process
>>>> that I'll copy through below for splitting these projects into their own
>>>> TLPs:
>>>>
>>>> -----snip
>>>> Process:
>>>>
>>>> 0. [DISCUSS] thread for <TLP name> in which you talk about #1 and #2
>>>> below, potentially draft resolution too.
>>>>
>>>> 1. Decide on an initial set of *PMC* members. I urge each new TLP to adopt
>>>> PMC==C. See reasons I've already discussed.
>>>>
>>>> 2. Decide on a chair. Try not to VOTE for this explicitly, see if can be
>>>> discussed and consensus can be reached (just a thought experiment). VOTE
>>>> if necessary.
>>>>
>>>> 3. [VOTE] thread for <TLP name>
>>>>
>>>> 4. Create Project:
>>>> a. paste resolution from #0 to board@ or;
>>>> b. go to general@incubator and start new Incubator project.
>>>>
>>>> 5. infrastructure set up.
>>>> MLs moving; new UNIX groups; website setup;
>>>> SVN setup like this:
>>>>
>>>> svn copy -m "MR TLP." https://svn.apache.org/repos/asf/hadoop/
>>>> https://svn.apache.org/repos/asf/<insert cool MR name>; or
>>>> svn copy -m "YARN TLP." https://svn.apache.org/repos/asf/hadoop/
>>>> https://svn.apache.org/repos/asf/<insert cool YARN name>; or
>>>> svn copy -m "HDFS TLP." https://svn.apache.org/repos/asf/hadoop/
>>>> https://svn.apache.org/repos/asf/<insert cool HDFS name>
>>>>
>>>> After all 3 have been created run:
>>>>
>>>> svn remove -m "Remove Hadoop umbrella TLP. Split into separate projects."
>>>> https://svn.apache.org/repos/asf/hadoop
>>>>
>>>> 6. (TLPs if 4a; Incubator podling if 4b;) proceed, collaborate, operate as
>>>> distinct communities, and try to solve the code duplication/dependency
>>>> issues from there.
>>>>
>>>> 7. If 4b; then graduate as TLP from Incubator.
>>>>
>>>> -----snip
>>>>
>>>> So that's my proposal.
>>>>
>>>> Thanks guys.
>>>>
>>>> Cheers,
>>>> Chris
>>>>
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>> Chris Mattmann, Ph.D.
>>>> Senior Computer Scientist
>>>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>>>> Office: 171-266B, Mailstop: 171-246
>>>> Email: chris.a.mattm...@nasa.gov
>>>> WWW: http://sunset.usc.edu/~mattmann/
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>> Adjunct Assistant Professor, Computer Science Department
>>>> University of Southern California, Los Angeles, CA 90089 USA
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>
>>>
>>>
>>>
>>> --
>>> Alejandro
>>
>> --
>> Arun C. Murthy
>> Hortonworks Inc.
>> http://hortonworks.com/
>>
>>