[
https://issues.apache.org/jira/browse/HADOOP-5073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sanjay Radia updated HADOOP-5073:
---------------------------------
Description:
This jira proposes an interface classification for hadoop interfaces.
The discussion was started in email alias [email protected] in Nov
2008.
was:
This jira proposes an interface classification for hadoop interfaces.
The discussion was started in email alias [email protected] in Nov
2008.
h2. Interface Taxonomy - Scope & Stability Classification
The interface taxonomy classification provided here is for guidance to
developers and users of interfaces.
The classification guides a developer to declare the scope (or targeted
audience or users) of an interface and also its stability.
* *Benefits to the user of an interface*: the user knows which interfaces to
use or not use, and how stable they are.
* *Benefits to the developer*: it prevents accidental changes to interfaces and
hence accidental impact on users, other components, or the system. This is
particularly useful in large systems with many developers who may not all have
a shared state/history of the project.
This classification was derived from a taxonomy used inside Yahoo and
from the OpenSolaris taxonomy
(http://www.opensolaris.org/os/community/arc/policies/interface-taxonomy/#Advice)
Interfaces have two main attributes: *Scope* and *Stability*
* *Scope* - _denotes the potential customers of the interface_.
For example, many interfaces are merely internal or private interfaces of the
implementation, while others are public or external interfaces that applications
or clients are expected to use. In POSIX, libc is an external or public
interface, while large parts of the kernel are internal or private interfaces.
In addition, some interfaces are targeted at specific other subsystems.
Identifying the scope helps define the customers or users of the interfaces and
helps define the impact of breaking an interface. For example, we may be willing
to break the compatibility of an interface whose scope is a small number of
specific subsystems. On the other hand, one is unlikely to break a protocol
interface that millions of internet users depend on.
The following are useful scopes, in order of increasing (wider) visibility:
** *project-private*
*** the interface is for internal use _within_ the project and should not be
used by applications. It is subject to change at any time without notice. Most
interfaces of a project are project-private.
** *limited-private*
*** the interface is used by a specified set of projects or systems (typically
closely related projects). Other projects or systems should not use the
interface. Changes to the interface will be communicated/negotiated with the
specified projects. For example, in the hadoop project, some interfaces are
*hdfs-mapReduce-private* in that they are private to the hdfs and mapReduce
projects.
** *company-private* (*_This is not applicable to open-source projects such as
Hadoop._* It is mentioned here for completeness.)
*** the interface can be used by other projects within a company.
** *public*
*** the interface is for general use by any application.
* *Stability* - _denotes when changes can be made to the interface that break
compatibility_.
** *Stable*
*** Can evolve while retaining compatibility across minor release boundaries;
can break compatibility only at a major release (i.e. at m.0).
** *Evolving*
*** Still evolving; can break compatibility at a minor release (i.e. m.x).
** *Unstable*
*** This usually makes sense for only private interfaces.
*** However one may call this out for a _supposedly_ public interface to
highlight that it should not be used as an interface; for public interfaces,
labeling it as *Not-an-interface* is probably more appropriate than "unstable".
**** Examples of publicly visible interfaces that are unstable (i.e.
not-an-interface): GUI, CLIs whose output format will change
** *Deprecated* - should not be used, will be removed in the future.
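The scope ordering and the stability rules above can be sketched in Java as follows. This is only an illustration, not part of the proposal: the type names ({{Scope}}, {{ReleaseKind}}, {{Stability}}, {{TaxonomyDemo}}) are hypothetical, *Deprecated* is left out because the text gives no break rule for it, and treating *Unstable* as "may break at any release" is an assumption.

```java
/** Scopes from the taxonomy, declared in order of increasing visibility. */
enum Scope {
    PROJECT_PRIVATE, LIMITED_PRIVATE, COMPANY_PRIVATE, PUBLIC;

    /** True if this scope reaches a wider audience than the other one. */
    boolean isWiderThan(Scope other) {
        return compareTo(other) > 0; // enum order mirrors the list above
    }
}

/** The two release boundaries the taxonomy talks about. */
enum ReleaseKind { MAJOR, MINOR }

/** Stability levels; Deprecated is omitted (no break rule is given for it). */
enum Stability {
    STABLE, EVOLVING, UNSTABLE;

    /** May an interface with this stability break compatibility at the given release? */
    boolean mayBreakAt(ReleaseKind release) {
        if (this == STABLE) {
            // Stable: only at a major release (m.0).
            return release == ReleaseKind.MAJOR;
        }
        // Evolving: also at minor releases (m.x).
        // Unstable: assumed here to be breakable at any release.
        return true;
    }
}

class TaxonomyDemo {
    public static void main(String[] args) {
        System.out.println(Stability.STABLE.mayBreakAt(ReleaseKind.MINOR));   // false
        System.out.println(Stability.EVOLVING.mayBreakAt(ReleaseKind.MINOR)); // true
        System.out.println(Scope.PUBLIC.isWiderThan(Scope.LIMITED_PRIVATE));  // true
    }
}
```

Encoding the scopes as an ordered enum keeps the "increasing visibility" relation explicit, so a tool could flag a change whose impact widens with the scope of the interface it touches.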
h2. FAQ
# What is the harm in applications using a private interface that is stable?
How is it different from a public stable interface?
While a private interface marked as stable is targeted to change only at
major releases, it may break at other times if the providers of that interface
are willing to change the internal users of that interface. Further, a public
stable interface is less likely to break even at major releases (even though it
is allowed to break compatibility) because the impact of the change is larger.
*If you use a private interface (regardless of its stability) you run the risk
of incompatibility*.
# Why bother declaring the stability of a private interface?
** To communicate the intent to its internal users.
** To provide guidelines to developers of the interface.
** The stability may capture other internal properties of the system.
*** e.g. in HDFS, NN-DN protocol stability can help implement rolling
upgrades.
*** e.g. in HDFS, FSImage stability can help provide more flexible roll backs.
# How will the classification be recorded for hadoop APIs?
** Each interface or class will have the scope and stability recorded using
javadoc tags, annotations, or some other mechanism. Whatever mechanism we
choose, the classification must be visible in the generated javadoc.
** APIs of private scope will not be part of the "public" javadoc generated by
ant (i.e. by the _ant target_ "javadoc"); they will only be generated for the
developer javadoc (generated by _ant target_ "javadoc-dev").
** One can derive the scope of java classes and java interfaces by the scope of
the package in which they are contained. Hence it is useful to declare the
scope of each java package as public or private (along with the private scope
variations).
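If annotations were the chosen mechanism, the recording could look roughly like the sketch below. Everything named here is hypothetical and made up for illustration ({{Classification}}, {{ApiScope}}, {{ApiStability}}, {{ExampleFileApi}}, {{ExampleInternalHelper}}, {{ClassificationReader}}); the proposal does not fix a mechanism. The standard {{@Documented}} meta-annotation is used so that the classification shows up in generated javadoc, and the lookup falls back to a package-level annotation to mirror the idea of deriving scope from the containing package.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

enum ApiScope { PROJECT_PRIVATE, LIMITED_PRIVATE, PUBLIC }

enum ApiStability { STABLE, EVOLVING, UNSTABLE }

/** Hypothetical annotation; @Documented makes it appear in generated javadoc. */
@Documented
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.PACKAGE})
@interface Classification {
    ApiScope scope();
    ApiStability stability();
}

/** Made-up type standing in for a public, stable API. */
@Classification(scope = ApiScope.PUBLIC, stability = ApiStability.STABLE)
interface ExampleFileApi { }

/** Made-up type that carries no classification of its own. */
class ExampleInternalHelper { }

class ClassificationReader {
    /** Returns "SCOPE STABILITY" for the type, or "unclassified" if none is declared. */
    static String describe(Class<?> type) {
        Classification c = type.getAnnotation(Classification.class);
        if (c == null) {
            // Fall back to the package declaration (from package-info.java), if any.
            Package p = type.getPackage();
            if (p != null) {
                c = p.getAnnotation(Classification.class);
            }
        }
        return c == null ? "unclassified" : c.scope() + " " + c.stability();
    }

    public static void main(String[] args) {
        System.out.println(describe(ExampleFileApi.class));
        System.out.println(describe(ExampleInternalHelper.class));
    }
}
```

Runtime retention is chosen here only so the demo can read the annotation back through reflection; a javadoc-only scheme would not need it.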
h2. Proposed Classification for Hadoop Interfaces
* Scope Public
** Stable
*** FileSystem, MapReduce, Config, CLI (including output), parts of
Mapred.lib, Job Logs API, instrumentation metrics, audit logs
** Evolving
*** TFile, parts of Mapred.lib, some instrumentation metrics, JMX interface
(till it becomes stable)
*** Job logs and job history (some tools, scripts and Chukwa use this to
analyze job processing)
** Not An interface
*** Web GUI
* Scope Private
** Limited-Private Evolving
*** RPC, Metrics (HDFS-MapReduce Private) - once stable, we can consider
making these public-stable.
** Project-Private Stable
*** Intra-HDFS and MR protocols (facilitates rolling upgrades down the road)
*** FSImage
**** Note: this will enable old versions of HDFS to read a newer FSImage and
hence enable more flexible roll backs.
**** Q. Should this be Project-Private Evolving instead?
**** Regardless of the stability of FSImage, new versions of HDFS have to be
able to transparently convert older versions and provide roll-back.
** Project-Private Evolving
*** DFSClient (Q. Should this be "project-private unstable"?)
** Project-Private Unstable
*** System logs
*** All implementation classes and interfaces not otherwise classified are
considered to be project-private stable.
> Hadoop 1.0 Interface Classification - scope (visibility - public/private) and
> stability
> ---------------------------------------------------------------------------------------
>
> Key: HADOOP-5073
> URL: https://issues.apache.org/jira/browse/HADOOP-5073
> Project: Hadoop Core
> Issue Type: Sub-task
> Reporter: Sanjay Radia
> Assignee: Sanjay Radia
>
> This jira proposes an interface classification for hadoop interfaces.
> The discussion was started in email alias [email protected] in Nov
> 2008.