On 12/07/2010 02:37 PM, Roy T. Fielding wrote:
Good, these are technical reasons.  The benefits can be cleared by docs.
By incompatible, I assume you mean forward-compatibility of old versions
of Hadoop reading newer files.  Can we fix that by having the new
implementation use the old file format by default until it is configured
to use one of the new interfaces for writing?

+1
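Roy's suggestion is that the new implementation keep writing the old file format until a job explicitly opts in to a new interface. Here is a minimal sketch of that idea. It is not SequenceFile's actual code: the configuration key and the two-value format enum are hypothetical, and the configuration is modeled as a plain Map<String, String>.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a writer chooses its on-disk format from configuration,
// defaulting to the legacy format so older readers can still read the output.
class FormatSelector {
    // Hypothetical key; not a real Hadoop configuration property.
    static final String FORMAT_KEY = "example.seqfile.write.format";

    enum WriteFormat { LEGACY, NEW }

    static WriteFormat chooseFormat(Map<String, String> conf) {
        // Only an explicit opt-in selects the new format; an absent or
        // unrecognized value falls back to the legacy format.
        String value = conf.get(FORMAT_KEY);
        if ("new".equals(value)) {
            return WriteFormat.NEW;
        }
        return WriteFormat.LEGACY;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(chooseFormat(conf)); // LEGACY
        conf.put(FORMAT_KEY, "new");
        System.out.println(chooseFormat(conf)); // NEW
    }
}
```

Because the default is the legacy format, a job that sets nothing keeps producing files that older versions can read. Only jobs that opt in give that up.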

You keep referring to the kernel as if it were a product.  I don't see
a kernel product in the list of things released by Apache Hadoop.

The line is fairly clear. The kernel is the daemons plus the framework code that invokes user code. The set of pluggable user implementations is fairly small: InputFormat, OutputFormat, Mapper, Reducer, RawComparator.

SequenceFile was originally part of the kernel but is now only used by user-level InputFormats and OutputFormats.
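The kernel/user boundary described above depends on framework code invoking user classes that it knows only by name. The following minimal sketch shows that general pattern using plain Java reflection. RecordSource, LineSource and the key example.job.source.class are hypothetical stand-ins. They are not Hadoop's InputFormat API or its real property names.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of framework ("kernel") code instantiating a user implementation
// that is named only in configuration. All names here are hypothetical.
class PluginLoader {
    // Hypothetical user-pluggable interface, standing in for e.g. an input format.
    interface RecordSource {
        String describe();
    }

    // A user-supplied implementation the framework has no compile-time knowledge of.
    static class LineSource implements RecordSource {
        public String describe() { return "lines"; }
    }

    static final String SOURCE_KEY = "example.job.source.class";

    static RecordSource load(Map<String, String> conf) {
        String className = conf.get(SOURCE_KEY);
        if (className == null) {
            throw new IllegalArgumentException("missing " + SOURCE_KEY);
        }
        try {
            // Resolve the class by name, check that it implements the interface,
            // then call its no-argument constructor.
            Class<? extends RecordSource> cls =
                Class.forName(className).asSubclass(RecordSource.class);
            return cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot load " + className, e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(SOURCE_KEY, LineSource.class.getName());
        System.out.println(load(conf).describe()); // lines
    }
}
```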

If there were such a product, then it would make sense for Apache Hadoop
to also release ancillary products for common libraries, test frameworks,
and modular storage interfaces.  Rearchitecting the Hadoop product suite
into such a logical arrangement would make sense, and after such an
architecture is put into place then "keeping the kernel simple" would
be a reason to veto a change to the kernel.

Such a re-arrangement has been proposed but not completed. Relevant issues are MAPREDUCE-1638, MAPREDUCE-1453, and MAPREDUCE-1700. It mostly involves build issues; the architecture already largely supports the distinction.

Tom long ago provided patches showing how the existing
configuration system can provide equivalent extension
implementations outside of the kernel with no incompatible changes.
(MAPREDUCE-376 and MAPREDUCE-377)
They both seem to be active and unfinished.  If they are equivalent fixes
to the same problem, then I suggest applying them to a branch, documenting
how they work, and then agreeing to have a bake-off.  A bake-off is a
decision made by performance and feature-completeness as an objective
way to resolve an impasse due to mutually exclusive vetoes.  All sides agree
to drop the veto and accept whichever performs best, by majority decision.

A bake-off could be a good way to resolve this. Performance differences would likely not be measurable, but folks might examine user programs, consider the compatibility and support implications, and vote accordingly.

All action items can be voted on.  What we are talking about here is a
short term plan, and it is listed as a type of action item under
changes to products.

Then voting on specific short-term actions might be a good way to resolve this.

Some specific short-term questions we might vote on:

1. Should we add specific versions of Protocol Buffers and Thrift to the classpath of every MapReduce program?

2. Should SequenceFile be forward-compatible, i.e., if an existing program that stores Writables in a SequenceFile is run against the new version, should the old version still be able to read the output of the new version?

3. Should we continue to support a specified interchange format and/or data model for configuration data, or should configurations instead be opaque binary data? An interchange format might be JSON. An interchange data model might be Map<String,Value>, where values can be strings, booleans, numbers, bytes or nested configuration data, defined by a standard API that all configurable items would support. A specified format or model would permit things like using -D to set configuration options, and would permit generic interaction with external configuration systems. With opaque binary configurations, each configurable item would provide its own API, and each parameter that could be set with -D or from an external configuration system would require specific new code that calls this API.
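For question 3, here is a minimal sketch of a specified data model. It assumes values are held in a Map<String, Object> whose entries are strings, booleans, numbers, byte arrays, or nested maps of the same shape, and that dotted keys address nested sections. Both the dotted-key convention and the applyDefine helper are hypothetical. They are chosen only to show that, with a generic model, a single piece of code can handle any -D option without per-parameter API calls. For simplicity, values set this way are stored as strings.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical generic configuration model: nested Map<String, Object>,
// with dotted keys ("a.b.c") addressing nested sections (an assumption).
class ConfigModel {

    // Apply one "-D" style definition such as "mapred.reduce.tasks=4".
    // The value is stored as a string; typed conversion is left to readers.
    @SuppressWarnings("unchecked")
    static void applyDefine(Map<String, Object> conf, String define) {
        int eq = define.indexOf('=');
        if (eq <= 0) {
            throw new IllegalArgumentException("expected key=value: " + define);
        }
        String[] path = define.substring(0, eq).split("\\.");
        String value = define.substring(eq + 1);

        // Walk (and create as needed) nested sections for all but the last key part.
        Map<String, Object> node = conf;
        for (int i = 0; i < path.length - 1; i++) {
            Object child = node.get(path[i]);
            if (!(child instanceof Map)) {
                // Create a new section; this also replaces a scalar at that key.
                child = new LinkedHashMap<String, Object>();
                node.put(path[i], child);
            }
            node = (Map<String, Object>) child;
        }
        node.put(path[path.length - 1], value);
    }

    public static void main(String[] args) {
        Map<String, Object> conf = new LinkedHashMap<>();
        applyDefine(conf, "mapred.job.name=wordcount");
        applyDefine(conf, "mapred.reduce.tasks=4");
        System.out.println(conf); // {mapred={job={name=wordcount}, reduce={tasks=4}}}
    }
}
```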

They are also subject to veto if and only if they
are to be applied to the current release branch (or a released branch).

Owen intends to merge this patch to a release branch.

Right.

So votes on action items would be simple majority if they're not intended to be merged to a release branch, and vetoable if they are? Is that right?

Doug
