On Oct 21, 2008, at 5:23 PM, Konstantin Shvachko wrote:
Sanjay Radia wrote:
>> o Can't remove deprecated classes or methods until 2.0
Dhruba Borthakur wrote:
> 1. APIs that are deprecated in x.y release can be removed in (x+1).0
> release.
The current rule is that APIs deprecated in M.x.y can be removed in
M.(x+2).0.
I don't think we want to either relax or stiffen this requirement.
I think we want to strengthen this to: removal of deprecated
methods/classes only in major releases.
Isn't this what major and minor releases mean?
I believe that is what customers will expect from a 1.0 release:
stability until 2.0.
Are you worried that maintaining old methods is too much of a burden
because there will be too many of them?
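To make that concrete: under the rule I am suggesting, a method
deprecated anywhere in the 1.x line keeps working in every later 1.y
and can only be removed in 2.0. A rough sketch (the class and method
names below are made up):

  public class FileUtilExample {
    /**
     * @deprecated as of 1.2, use {@link #copyToLocal()} instead;
     * under this rule it cannot be removed before 2.0.
     */
    @Deprecated
    public void copyToLocalFile() {
      copyToLocal();  // old entry point keeps delegating to the new one
    }

    public void copyToLocal() {
      // new implementation lives here
    }
  }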
sanjay
> 2. Old 1.x clients can connect to new 1.y servers, where x <= y, but
> the old clients might get reduced functionality or performance. 1.x
> clients might not be able to connect to 2.z servers.
>
> 3. HDFS disk format can change from a 1.x to a 1.y release and is
> transparent to user applications. A cluster rolling back from 1.y to
> 1.x will revert to the old disk format.
>
>> * In a major release transition [i.e. from a release x.y to a release
>> (x+1).0], a user should be able to read data from the cluster running
>> the old version.
>
> I think this is a good requirement to have. This will be very useful
> when we run multiple clusters, especially across data centers
> (HADOOP-4058 is a use-case).
I don't see anything about the compatibility model going from 1.*.*
to 2.0.0.
Does that mean we do not provide compatibility between those?
Does that mean compatibility between 1.*.* and 2.*.* is provided by
distcp?
Or another way to ask the same question: will HDFS-1 and HDFS-2 be
as different as ext2 and ext3?
I am not saying this is bad; I just want it to be clarified.
Maybe we should somehow structure this discussion into sections, e.g.:
- deprecation rules;
- client/server communication compatibility;
- inter-version data format compatibility;
= meta-data compatibility
= block data compatibility
--Konstantin
>> --------
>> What does Hadoop 1.0 mean?
>> * Standard release numbering: Only bug fixes in 1.x.y releases and
>> new features in 1.x.0 releases.
>> * No need for client recompilation when upgrading from 1.x to 1.y,
>> where x <= y
>> o Can't remove deprecated classes or methods until 2.0
>> * Old 1.x clients can connect to new 1.y servers, where x <= y
>> * New FileSystem clients must be able to call old methods when talking
>> to old servers. This generally will be done by having old methods
>> continue to use old rpc methods. However, it is legal to have new
>> implementations of old methods call new rpc methods, as long as the
>> library transparently handles the fallback case for old servers.
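>> Roughly, a new implementation of an old method with such a fallback
>> could look like the sketch below (getListingV2, getListing and
>> toFileStatus are placeholders rather than real ClientProtocol calls,
>> and treating a RemoteException as "old server" is an assumption):
>>
>>   // assumes org.apache.hadoop.fs.{FileStatus,Path} and
>>   // org.apache.hadoop.ipc.RemoteException are imported
>>   public FileStatus[] listStatus(Path p) throws IOException {
>>     try {
>>       // prefer the new rpc when the server supports it
>>       return toFileStatus(namenode.getListingV2(p.toString()));
>>     } catch (RemoteException re) {
>>       // old server does not know the new rpc; fall back transparently
>>       return toFileStatus(namenode.getListing(p.toString()));
>>     }
>>   }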
>> -----------------
>>
>> A couple of additional compatibility requirements:
>>
>> * HDFS metadata and data are preserved across release changes, both
>> major and minor. That is, whenever a release is upgraded, the HDFS
>> metadata from the old release will be converted automatically as
>> needed.
>>
>> The above has been followed so far in Hadoop; I am just documenting
>> it in the 1.0 requirements list.
>>
>> * In a major release transition [i.e. from a release x.y to a release
>> (x+1).0], a user should be able to read data from the cluster running
>> the old version. (Or shall we generalize this to: from x.y to
>> (x+i).z?)
>>
>> The motivation: data copying across clusters is a common operation for
>> many customers (for example, this is routinely done at Yahoo). Today,
>> http (or hftp) provides a guaranteed compatible way of copying data
>> across versions. Clearly one cannot force a customer to simultaneously
>> update all its Hadoop clusters to a new major release. The above
>> documents this requirement; we can satisfy it via the http/hftp
>> mechanism or some other mechanism.
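>> For illustration only (host, port and path below are placeholders):
>> a client built against the newer release can read from the older
>> cluster over hftp, which is read-only and http based; distcp does the
>> same when given an hftp:// source URI.
>>
>>   import java.net.URI;
>>   import org.apache.hadoop.conf.Configuration;
>>   import org.apache.hadoop.fs.FSDataInputStream;
>>   import org.apache.hadoop.fs.FileSystem;
>>   import org.apache.hadoop.fs.Path;
>>
>>   public class CrossVersionRead {
>>     public static void main(String[] args) throws Exception {
>>       Configuration conf = new Configuration();
>>       // read from the old cluster's namenode over hftp
>>       FileSystem fs = FileSystem.get(
>>           URI.create("hftp://old-namenode.example.com:50070/"), conf);
>>       FSDataInputStream in = fs.open(new Path("/data/part-00000"));
>>       // ... copy the stream into the new cluster ...
>>       in.close();
>>     }
>>   }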
>>
>> Question: is one willing to break applications that operate across
>> clusters (i.e. an application that accesses data across clusters that
>> cross a major release boundary)? I asked the operations team at Yahoo
>> that runs our Hadoop clusters. We currently do not have any
>> applications that access data across clusters as part of an MR job,
>> the reason being that Hadoop routinely breaks wire compatibility
>> across releases and so such apps would be very unreliable. However,
>> the copying of data across clusters is crucial and needs to be
>> supported.
>>
>> Shall we add a stronger requirement for 1.0: wire compatibility across
>> major versions? This can be supported by class loading or other games.
>> Note that we can wait to provide this until 2.0 happens. If Hadoop
>> provided this guarantee, then it would allow customers to partition
>> their data across clusters without risking apps breaking across major
>> releases due to wire incompatibility issues.
>>
>>
>>
>