On Dec 6, 2010, at 10:40 AM, Chris Douglas wrote:
This question is backwards. If the assertion is that a part of the framework's development should be arrested, that claim requires a discussion and vote. The PMC should not have to weigh in on allowing code to change. -C
Agreed.
Arresting development on SequenceFile is preposterous. There are several petabytes of data stored in it across many deployments, for several reasons including legacy. Stopping development on it is unreasonable. Apache Hadoop is volunteer-driven; volunteers should be allowed to contribute as they see fit.
+1
Arun
On Mon, Dec 6, 2010 at 9:16 AM, Owen O'Malley <[email protected]> wrote:
On Dec 1, 2010, at 11:11 AM, Owen O'Malley wrote:
All,
We really need some guidance on the general direction for the project. Please comment and/or vote. If no one cares, then I'll probably commit it to Yahoo's internal branch.
-- Owen
The question is how the Hadoop project wants to move forward. It was motivated by Doug's veto of HADOOP-6685, which was based on his personal decisions about how the project should go forward and not on anything that had been decided by the PMC. These decisions are much more important to MapReduce, which is a framework, than to HDFS, which is a client/server model.
1. Should Hadoop include a user-facing library of useful code?

There has been a suggestion that user-facing library code, such as SequenceFile, TFile, DistCp, etc., should be deprecated and that Hadoop should allow third-party projects like Avro to supply the user-facing library code that makes Hadoop usable. I think it is critical that we keep those components as part of Hadoop and extend them as the framework evolves. Users depend heavily on SequenceFile for storing their data in Hadoop, and it should not be deprecated as Doug has suggested.
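[For context, the SequenceFile usage that users depend on looks roughly like the following. This is a minimal sketch, assuming the 0.20-era org.apache.hadoop.io.SequenceFile API, the local filesystem, and a hypothetical /tmp/example.seq path; a real job would typically write to HDFS instead.]

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SequenceFileSketch {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.getLocal(conf);   // local FS for this sketch
    Path path = new Path("/tmp/example.seq");    // hypothetical path

    // Write a few key/value records as Writables.
    SequenceFile.Writer writer = SequenceFile.createWriter(
        fs, conf, path, Text.class, IntWritable.class);
    try {
      writer.append(new Text("alpha"), new IntWritable(1));
      writer.append(new Text("beta"), new IntWritable(2));
    } finally {
      writer.close();
    }

    // Read the records back in the order they were written.
    SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
    try {
      Text key = new Text();
      IntWritable value = new IntWritable();
      while (reader.next(key, value)) {
        System.out.println(key + "\t" + value);
      }
    } finally {
      reader.close();
    }
  }
}
```

[The point of the sketch is that the key/value classes are Writables and the file carries its own metadata, which is why so much stored data is tied to this format.]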
2. Should MapReduce support non-Writables through the pipeline out of the box?

There has also been a discussion about whether we should support non-Writables natively. There is already library code in Avro that lets users use Avro types in a custom MapReduce API. A general MapReduce API that encompasses all of the serialization frameworks, and does not lock users into a particular one, is much more powerful. Furthermore, making it convenient for users by including the plugins in the default configuration and classpath will enable the use of Avro, Thrift, and ProtoBuf objects by people who would rather not focus on serialization. Avro and Writables should not be the only first-class serializations that Hadoop supports by default.
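[For illustration, "including the plugins in the default configuration" amounts to listing them in the io.serializations property. The Writable and Avro class names below are the ones shipped with Hadoop; Thrift or ProtoBuf entries would have to come from plugin jars, so the list here is a sketch, not a proposal for exact defaults.]

```xml
<!-- core-site.xml: register pluggable serializations.
     Writable and Avro serializers ship with Hadoop; others would be
     supplied by third-party plugin jars on the classpath. -->
<property>
  <name>io.serializations</name>
  <value>org.apache.hadoop.io.serializer.WritableSerialization,
         org.apache.hadoop.io.serializer.avro.AvroSpecificSerialization,
         org.apache.hadoop.io.serializer.avro.AvroReflectSerialization</value>
</property>
```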
3. Should a framework dependency on ProtoBuf be allowed?

Doug has added several framework dependencies on Avro. The question is whether it is acceptable to use the ProtoBuf library in the framework. Avro is good for uses where there are a lot of objects of the same type; ProtoBuf is better for a small number of objects. The question is whether Avro, JSON, and XML should be the only serialization libraries that are acceptable to use in the framework.