On 16/05/11 13:00, Segel, Mike wrote:
But Cloudera's release is a bit murky.

The math example is a bit flawed...

X represents the set of stable releases.
Y represents the set of available patches.
C represents the set of Cloudera releases.

So if C contains a release X(n) plus a set of patches that is contained in Y,
then does it not have the right to be considered Apache Hadoop?
It's my understanding that any enhancement to Hadoop is made available to 
Apache and will eventually make it into a later release...

It certainly contains it.

Now, if you want to make life more complex:
-view the contributions to the code base as a series of patches P1...Pn, each of which changes the code.
-These patches are essentially functions that transform the source S to a new state S'.
-the initial state of the source codebase is S0.

Hypothesis: the order in which the patch functions are applied determines the final state of the source tree.

If patches P1 and P2 were applied in order, you would get a state

S' = P2(P1(S0))

Applying the patches in the opposite order, you get a possibly different final state:
S'' = P1(P2(S0))


Question for the maths people, then: can you be sure that S' and S'' are the same? It would seem to me that it depends on the nature of the functions. It could be that the set of functions that SVN supports guarantees sameness, but given the conflict resolution problems I've encountered in the past, I doubt it.
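To make the question concrete, here is a toy sketch, assuming a source tree is modelled as a tuple of lines and a patch as a plain Python function. P1 and P2 below are hypothetical patches, not real SVN operations; they just show that for some pairs of functions the order matters:

```python
# Toy model (an assumption, not how SVN works): a source tree is a tuple of
# lines, and a patch is a function from one tree state to the next, S -> S'.

def p1(tree):
    """Hypothetical patch P1: rename every 'timeout' to 'rpc.timeout'."""
    return tuple(line.replace("timeout", "rpc.timeout") for line in tree)

def p2(tree):
    """Hypothetical patch P2: append a new line that mentions 'timeout'."""
    return tree + ("timeout = 60",)

s0 = ("timeout = 30",)

s_prime = p2(p1(s0))         # S'  = P2(P1(S0)): rename first, then append
s_double_prime = p1(p2(s0))  # S'' = P1(P2(S0)): append first, then rename

print(s_prime)                    # ('rpc.timeout = 30', 'timeout = 60')
print(s_double_prime)             # ('rpc.timeout = 30', 'rpc.timeout = 60')
print(s_prime == s_double_prime)  # False
```

In this model, two patches that each rewrite different, non-overlapping lines would commute; it is overlapping edits like these, the kind that produce merge conflicts, that make the order significant.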

Assuming that my belief holds, i.e. that the order in which a series of SVN patches is executed determines the final state of the source tree, then saying that the patch sets (the sets of functions applied to the source) of two codebases are equivalent does not mean the final state of the code is the same unless the sequence of application is also the same.

That would then define an Apache release as a strictly ordered sequence of patches, or at least a sequence of operations that leads to the same final code state, such as S0.20.3.

(oh look, I've just written a formal definition of what a release is, though I've avoided defining what a function is. View them as planar projections in Cartesian space or something.)
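Written out in symbols, the definition reads roughly as follows; the names R, S_R and the equivalence symbol are introduced here purely for illustration:

```latex
R = (P_1, P_2, \ldots, P_n), \qquad
S_R = (P_n \circ \cdots \circ P_2 \circ P_1)(S_0)

R \equiv R' \iff S_R = S_{R'}
```

With n = 2 this is the S' = P2(P1(S0)) case. Pairwise commutativity, $P_i \circ P_j = P_j \circ P_i$ for all i, j, is sufficient for every ordering of the same patch set to reach the same state; without it there is no such guarantee.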



So while it may not be 'official' release X(z), all of its components are in 
Apache.
(note: I'm talking about the core components and not Cloudera's additional 
toolsets that encompass Hadoop.)

Cloudera is clearly a derivative work.
And IMHO is the only one which can say ... 'Includes Apache Hadoop'.

Once you start thinking about the ordering of the patch functions it gets complicated.

That doesn't mean that others can't, depending on how they implemented their 
changes.

Yes, though again it depends on the sequence of functions applied to get from the released source code, such as S0.20.3, to the version they ship.

So it wouldn't be a superset since it doesn't contain a complete subset, but 
contains code that implements the API... So they can't say 'Includes Apache 
Hadoop', but they can say it's a derivative work based on Apache Hadoop and then 
go on to show how and why, in their opinion, their product is better. (That's 
marketing for you...)

I agree

Fragmentation of Hadoop will occur. It's inevitable. Too much money is on the 
table...

Clearly, but there are still some questions we can resolve here
-what do they call their products?
-how can they support assertions that their code is compatible if the series of patches they have applied to the codebase is not externally visible?
-what are the concerns of the community about naming and branching?
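On the second question, a minimal sketch of what external visibility would buy, under the same toy model (tree as a tuple of lines, patches as functions). The helpers `apply_patches`, `tree_digest` and `verify_claim` are hypothetical, not part of any Hadoop or SVN tooling: if a vendor published the base release and its ordered patch list, anyone could replay the sequence and compare a content hash against what ships.

```python
import hashlib
from functools import reduce

def apply_patches(base, patches):
    """Apply an ordered patch sequence: P_n(...P_2(P_1(S0)))."""
    return reduce(lambda tree, patch: patch(tree), patches, base)

def tree_digest(tree):
    """SHA-256 over the lines of a tree (hypothetical representation)."""
    h = hashlib.sha256()
    for line in tree:
        h.update(line.encode("utf-8"))
        h.update(b"\n")
    return h.hexdigest()

def verify_claim(base, published_patches, shipped_digest):
    """Replay the published patch sequence and compare with the shipped tree."""
    return tree_digest(apply_patches(base, published_patches)) == shipped_digest

# Toy example; all patch names are hypothetical.
def rename(tree):
    return tuple(line.replace("timeout", "rpc.timeout") for line in tree)

def add_default(tree):
    return tree + ("timeout = 60",)

s0 = ("timeout = 30",)
shipped = apply_patches(s0, [rename, add_default])
digest = tree_digest(shipped)

print(verify_claim(s0, [rename, add_default], digest))  # True
print(verify_claim(s0, [add_default, rename], digest))  # False: same set, wrong order
```

Note that publishing only the patch set is not enough: as the second call shows, the order has to be published too.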



But because Apache's licensing is so open, Apache will have a hard time 
controlling derivative works...

The Apache license permits anyone to fork and take that fork in-house or closed source. Most people would consider it daft to do this except for quick fixes, because any closed-source fork takes on the task of writing all the functions needed to transform it from the released state to one that matches customer needs (i.e. the working state).


I believe that Steve is incorrect in his assertion concerning potential loss of 
any patent protection. Again Apache's licensing is very open and as long as 
they follow Apache's Ts and Cs, they are covered.

Possibly. I avoid such legal issues.

-steve
