Thinking about this some more, it will be beneficial to move the client-datanode data transfer communication to a more formalized RPC protocol before the 1.0 release: http://issues.apache.org/jira/browse/HADOOP-4386

thanks,
dhruba
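For illustration only, here is a minimal sketch of what a more formalized client-datanode transfer protocol could look like if it were expressed as a versioned Hadoop RPC interface. The interface name, method, and version number are made up for this sketch and are not taken from HADOOP-4386; the only assumption is the existing org.apache.hadoop.ipc.VersionedProtocol convention that the other HDFS protocols already follow.

import java.io.IOException;
import org.apache.hadoop.ipc.VersionedProtocol;

/**
 * Hypothetical sketch only: a client-to-datanode block read expressed as a
 * versioned RPC interface instead of the hand-rolled streaming protocol.
 * Names and signatures are illustrative, not a proposed design.
 */
public interface ClientDatanodeTransferProtocol extends VersionedProtocol {

  /** Bumped whenever an incompatible change is made to this interface. */
  long versionID = 1L;

  /**
   * Read 'length' bytes of the given block starting at 'offset'.
   * A real protocol would stream the data rather than return one array;
   * this signature just shows the versioned-RPC shape.
   */
  byte[] readBlock(long blockId, long generationStamp, long offset, int length)
      throws IOException;
}

Formalizing the transfer path this way would let it follow the same version-negotiation and compatibility rules discussed below for the other RPC protocols.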
On Mon, Oct 20, 2008 at 11:44 PM, Dhruba Borthakur <[EMAIL PROTECTED]> wrote:
> 1. APIs that are deprecated in an x.y release can be removed in the (x+1).0 release.
>
> 2. Old 1.x clients can connect to new 1.y servers, where x <= y, but the old clients might get reduced functionality or performance. 1.x clients might not be able to connect to 2.z servers.
>
> 3. The HDFS disk format can change from a 1.x to a 1.y release and is transparent to user applications. A cluster rolling back from 1.y to 1.x will revert to the old disk format.
>
>> * In a major release transition [i.e. from a release x.y to a release (x+1).0], a user should be able to read data from the cluster running the old version.
>
> I think this is a good requirement to have. This will be very useful when we run multiple clusters, especially across data centers (HADOOP-4058 is a use-case).
>
> thanks,
> dhruba
>
>> --------
>> What does Hadoop 1.0 mean?
>> * Standard release numbering: only bug fixes in 1.x.y releases and new features in 1.x.0 releases.
>> * No need for client recompilation when upgrading from 1.x to 1.y, where x <= y
>>   o Can't remove deprecated classes or methods until 2.0
>> * Old 1.x clients can connect to new 1.y servers, where x <= y
>> * New FileSystem clients must be able to call old methods when talking to old servers. This will generally be done by having old methods continue to use old rpc methods. However, it is legal to have new implementations of old methods call new rpc methods, as long as the library transparently handles the fallback case for old servers.
>> -----------------
>>
>> A couple of additional compatibility requirements:
>>
>> * HDFS metadata and data are preserved across release changes, both major and minor. That is, whenever a release is upgraded, the HDFS metadata from the old release will be converted automatically as needed.
>>
>> The above has been followed so far in Hadoop; I am just documenting it in the 1.0 requirements list.
>>
>> * In a major release transition [i.e. from a release x.y to a release (x+1).0], a user should be able to read data from the cluster running the old version. (Or shall we generalize this to: from x.y to (x+i).z?)
>>
>> The motivation: data copying across clusters is a common operation for many customers (for example, this is routinely done at Yahoo). Today, http (or hftp) provides a guaranteed compatible way of copying data across versions. Clearly one cannot force a customer to simultaneously update all its Hadoop clusters to a new major release. The above documents this requirement; we can satisfy it via the http/hftp mechanism or some other mechanism.
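To make the hftp-based path above concrete, here is a small sketch of a cross-version copy done through the ordinary FileSystem API. The host names, ports, and paths are placeholders; it assumes the hftp:// scheme is served by the namenode's HTTP port (50070 by default) and that the client's own version matches the destination cluster.

import java.io.InputStream;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

/**
 * Sketch: copy one file from an older cluster to a newer one over hftp.
 * "old-nn", "new-nn" and the paths are placeholders.
 */
public class CrossVersionCopy {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // Read side: hftp goes over HTTP, so it stays readable across versions.
    FileSystem src = FileSystem.get(URI.create("hftp://old-nn:50070/"), conf);
    // Write side: plain HDFS on the cluster whose version matches this client.
    FileSystem dst = FileSystem.get(URI.create("hdfs://new-nn:8020/"), conf);

    InputStream in = src.open(new Path("/data/part-00000"));
    FSDataOutputStream out = dst.create(new Path("/copied/part-00000"));
    IOUtils.copyBytes(in, out, conf, true); // true: close both streams when done
  }
}

distcp does the same thing in bulk over the same mechanism, which is why an hftp source is the usual answer for cross-version copies today.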
>> Question: is one willing to break applications that operate across clusters (i.e. an application that accesses data across clusters that cross a major release boundary)? I asked the operations team at Yahoo that runs our Hadoop clusters. We currently do not have any applications that access data across clusters as part of an MR job, the reason being that Hadoop routinely breaks wire compatibility across releases and so such apps would be very unreliable. However, copying data across clusters is crucial and needs to be supported.
>>
>> Shall we add a stronger requirement for 1.0: wire compatibility across major versions? This can be supported by class loading or other games. Note we can wait to provide this until 2.0 happens. If Hadoop provided this guarantee, then it would allow customers to partition their data across clusters without risking apps breaking across major releases due to wire incompatibility issues.
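One of the 1.0 requirements quoted earlier, that new FileSystem clients must keep working against old servers by transparently falling back to old RPCs, can be illustrated with a short sketch. The proxy and method names below are placeholders rather than real HDFS code, and a real client would detect an old server more precisely than by catching a generic IOException.

import java.io.IOException;

/**
 * Sketch of the "transparent fallback" idiom: a new client first tries the
 * newer RPC and, if the server is too old to know it, silently falls back to
 * the older RPC so existing methods keep working. All names are placeholders.
 */
public class CompatibleClient {

  /** Stand-in for the RPC proxy the client holds to the server. */
  interface ServerProxy {
    FileStatus[] listStatusWithLocations(String path) throws IOException; // newer RPC
    FileStatus[] listStatus(String path) throws IOException;              // older RPC
  }

  /** Placeholder result type. */
  static class FileStatus { }

  private final ServerProxy server;

  CompatibleClient(ServerProxy server) {
    this.server = server;
  }

  /** Old public API method: callers never see which RPC was actually used. */
  public FileStatus[] listStatus(String path) throws IOException {
    try {
      // Prefer the newer RPC; a new server answers it directly.
      return server.listStatusWithLocations(path);
    } catch (IOException e) {
      // An old server does not know the new method; fall back transparently.
      return server.listStatus(path);
    }
  }
}

The requirement is exactly this shape: the fallback lives inside the client library, so applications see one stable method regardless of the server's age.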

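Finally, a small illustration of the deprecation rule stated at the top of the thread (deprecate in a 1.x release, remove no earlier than 2.0). The class is invented for this sketch, though the pair of methods is loosely modeled on FileSystem's own delete(Path) deprecation; the deprecated form keeps working by delegating to its replacement for the rest of the 1.x line.

import org.apache.hadoop.fs.Path;

/**
 * Illustration of the 1.0 deprecation rule: a method deprecated in some 1.x
 * release must keep working until 2.0, typically by delegating to its
 * replacement. The class is an invented example, not real Hadoop code.
 */
public class ExampleFileApi {

  /**
   * Old signature, deprecated in a 1.x release; it may only be removed in 2.0.
   *
   * @deprecated Use {@link #delete(Path, boolean)} instead.
   */
  @Deprecated
  public boolean delete(Path f) {
    return delete(f, true); // keep the old behavior by delegating
  }

  /** Replacement introduced alongside the deprecation. */
  public boolean delete(Path f, boolean recursive) {
    // Real deletion logic omitted in this sketch.
    return true;
  }
}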