[1/2] hadoop git commit: HADOOP-11559. Add links to RackAwareness and InterfaceClassification to site index (Masatake Iwasaki via aw)

wang Mon, 17 Oct 2016 13:33:59 -0700

Repository: hadoop
Updated Branches:
  refs/heads/branch-2.8 b2667441e -> 15ff590c3



HADOOP-11559. Add links to RackAwareness and InterfaceClassification to site index (Masatake Iwasaki via aw)

(cherry picked from commit 7eeca90daabd74934d4c94af6f07fd598abdb4ed)

 Conflicts:
        hadoop-common-project/hadoop-common/CHANGES.txt
        hadoop-project/src/site/site.xml

(cherry picked from commit fbdb23d2afa993f96f073b9c4208282e8a280016)


Project: http://git-wip-us.apache.org/repos/asf/hadoop/repo
Commit: http://git-wip-us.apache.org/repos/asf/hadoop/commit/9d473b8d
Tree: http://git-wip-us.apache.org/repos/asf/hadoop/tree/9d473b8d
Diff: http://git-wip-us.apache.org/repos/asf/hadoop/diff/9d473b8d

Branch: refs/heads/branch-2.8
Commit: 9d473b8ddcea659cf362fa0f9331b10b9b4dfb7d
Parents: b266744
Author: Allen Wittenauer <a...@apache.org>
Authored: Tue Feb 10 17:06:03 2015 -0800
Committer: Andrew Wang <w...@apache.org>
Committed: Mon Oct 17 13:32:51 2016 -0700

----------------------------------------------------------------------
 .../site/markdown/InterfaceClassification.md    | 204 +++++++++++++++----
 .../src/site/markdown/RackAwareness.md          |  54 ++++-
 hadoop-project/src/site/site.xml                |   2 +-
 3 files changed, 204 insertions(+), 56 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/hadoop/blob/9d473b8d/hadoop-common-project/hadoop-common/src/site/markdown/InterfaceClassification.md
----------------------------------------------------------------------
diff --git a/hadoop-common-project/hadoop-common/src/site/markdown/InterfaceClassification.md b/hadoop-common-project/hadoop-common/src/site/markdown/InterfaceClassification.md
index 493b0dd..07abdac 100644
--- a/hadoop-common-project/hadoop-common/src/site/markdown/InterfaceClassification.md
+++ b/hadoop-common-project/hadoop-common/src/site/markdown/InterfaceClassification.md
@@ -20,80 +20,196 @@ Hadoop Interface Taxonomy: Audience and Stability Classification
 Motivation
 ----------
 
-The interface taxonomy classification provided here is for guidance to developers and users of interfaces. The classification guides a developer to declare the targeted audience or users of an interface and also its stability.
+The interface taxonomy classification provided here is for guidance to
+developers and users of interfaces. The classification guides a developer to
+declare the targeted audience or users of an interface and also its stability.
 
 * Benefits to the user of an interface: Knows which interfaces to use or not use and their stability.
-* Benefits to the developer: to prevent accidental changes of interfaces and hence accidental impact on users or other components or system. This is particularly useful in large systems with many developers who may not all have a shared state/history of the project.
+
+* Benefits to the developer: to prevent accidental changes of interfaces and
+  hence accidental impact on users or other components or system. This is
+  particularly useful in large systems with many developers who may not all have
+  a shared state/history of the project.
 
 Interface Classification
 ------------------------
 
-Hadoop adopts the following interface classification, this classification was derived from the [OpenSolaris taxonomy](http://www.opensolaris.org/os/community/arc/policies/interface-taxonomy/#Advice) and, to some extent, from taxonomy used inside Yahoo. Interfaces have two main attributes: Audience and Stability
+Hadoop adopts the following interface classification,
+this classification was derived from the
+[OpenSolaris taxonomy](http://www.opensolaris.org/os/community/arc/policies/interface-taxonomy/#Advice)
+and, to some extent, from taxonomy used inside Yahoo.
+Interfaces have two main attributes: Audience and Stability
 
 ### Audience
 
-Audience denotes the potential consumers of the interface. While many interfaces are internal/private to the implementation, other are public/external interfaces are meant for wider consumption by applications and/or clients. For example, in posix, libc is an external or public interface, while large parts of the kernel are internal or private interfaces. Also, some interfaces are targeted towards other specific subsystems.
+Audience denotes the potential consumers of the interface. While many interfaces
+are internal/private to the implementation, other are public/external interfaces
+are meant for wider consumption by applications and/or clients. For example, in
+posix, libc is an external or public interface, while large parts of the kernel
+are internal or private interfaces. Also, some interfaces are targeted towards
+other specific subsystems.
 
-Identifying the audience of an interface helps define the impact of breaking it. For instance, it might be okay to break the compatibility of an interface whose audience is a small number of specific subsystems. On the other hand, it is probably not okay to break a protocol interfaces that millions of Internet users depend on.
+Identifying the audience of an interface helps define the impact of breaking
+it. For instance, it might be okay to break the compatibility of an interface
+whose audience is a small number of specific subsystems. On the other hand, it
+is probably not okay to break a protocol interfaces that millions of Internet
+users depend on.
 
 Hadoop uses the following kinds of audience in order of increasing/wider visibility:
 
-* Private:
-    * The interface is for internal use within the project (such as HDFS or MapReduce) and should not be used by applications or by other projects. It is subject to change at anytime without notice. Most interfaces of a project are Private (also referred to as project-private).
-* Limited-Private:
-    * The interface is used by a specified set of projects or systems (typically closely related projects). Other projects or systems should not use the interface. Changes to the interface will be communicated/ negotiated with the specified projects. For example, in the Hadoop project, some interfaces are LimitedPrivate{HDFS, MapReduce} in that they are private to the HDFS and MapReduce projects.
-* Public
-    * The interface is for general use by any application.
+> Hadoop doesn't have a Company-Private classification, which is meant for APIs
+> which are intended to be used by other projects within the company, since it
+> doesn't apply to opensource projects. Also, certain APIs are annotated as
+> @VisibleForTesting (from com.google.common .annotations.VisibleForTesting) -
+> these are meant to be used strictly for unit tests and should be treated as
+> "Private" APIs.
+
+#### Private
+
+The interface is for internal use within the project (such as HDFS or MapReduce)
+and should not be used by applications or by other projects. It is subject to
+change at anytime without notice. Most interfaces of a project are Private (also
+referred to as project-private).
+
+#### Limited-Private
+
+The interface is used by a specified set of projects or systems (typically
+closely related projects). Other projects or systems should not use the
+interface. Changes to the interface will be communicated/ negotiated with the
+specified projects. For example, in the Hadoop project, some interfaces are
+LimitedPrivate{HDFS, MapReduce} in that they are private to the HDFS and
+MapReduce projects.
 
-Hadoop doesn't have a Company-Private classification, which is meant for APIs which are intended to be used by other projects within the company, since it doesn't apply to opensource projects. Also, certain APIs are annotated as @VisibleForTesting (from com.google.common .annotations.VisibleForTesting) - these are meant to be used strictly for unit tests and should be treated as "Private" APIs.
+#### Public
+
+The interface is for general use by any application.
 
 ### Stability
 
-Stability denotes how stable an interface is, as in when incompatible changes to the interface are allowed. Hadoop APIs have the following levels of stability.
+Stability denotes how stable an interface is, as in when incompatible changes to
+the interface are allowed. Hadoop APIs have the following levels of stability.
+
+#### Stable
+
+Can evolve while retaining compatibility for minor release boundaries; in other
+words, incompatible changes to APIs marked Stable are allowed only at major
+releases (i.e. at m.0).
+
+#### Evolving
+
+Evolving, but incompatible changes are allowed at minor release (i.e. m .x)
 
-* Stable
-    * Can evolve while retaining compatibility for minor release boundaries; in other words, incompatible changes to APIs marked Stable are allowed only at major releases (i.e. at m.0).
-* Evolving
-    * Evolving, but incompatible changes are allowed at minor release (i.e. m .x)
-* Unstable
-    * Incompatible changes to Unstable APIs are allowed any time. This usually makes sense for only private interfaces.
-    * However one may call this out for a supposedly public interface to highlight that it should not be used as an interface; for public interfaces, labeling it as Not-an-interface is probably more appropriate than "Unstable".
-        * Examples of publicly visible interfaces that are unstable (i.e. not-an-interface): GUI, CLIs whose output format will change
-* Deprecated
-    * APIs that could potentially removed in the future and should not be used.
+#### Unstable
+
+Incompatible changes to Unstable APIs are allowed any time. This usually makes
+sense for only private interfaces.
+
+However one may call this out for a supposedly public interface to highlight
+that it should not be used as an interface; for public interfaces, labeling it
+as Not-an-interface is probably more appropriate than "Unstable".
+
+Examples of publicly visible interfaces that are unstable
+(i.e. not-an-interface): GUI, CLIs whose output format will change
+
+#### Deprecated
+
+APIs that could potentially removed in the future and should not be used.
 
 How are the Classifications Recorded?
 -------------------------------------
 
 How will the classification be recorded for Hadoop APIs?
 
-* Each interface or class will have the audience and stability recorded using annotations in org.apache.hadoop.classification package.
+* Each interface or class will have the audience and stability recorded using
+  annotations in org.apache.hadoop.classification package.
+
 * The javadoc generated by the maven target javadoc:javadoc lists only the public API.
-* One can derive the audience of java classes and java interfaces by the audience of the package in which they are contained. Hence it is useful to declare the audience of each java package as public or private (along with the private audience variations).
+
+* One can derive the audience of java classes and java interfaces by the
+  audience of the package in which they are contained. Hence it is useful to
+  declare the audience of each java package as public or private (along with the
+  private audience variations).
 
 FAQ
 ---
 
 * Why aren’t the java scopes (private, package private and public) good enough?
-    * Java’s scoping is not very complete. One is often forced to make a class public in order for other internal components to use it. It does not have friends or sub-package-private like C++.
-* But I can easily access a private implementation interface if it is Java public. Where is the protection and control?
-    * The purpose of this is not providing absolute access control. Its purpose is to communicate to users and developers. One can access private implementation functions in libc; however if they change the internal implementation details, your application will break and you will have little sympathy from the folks who are supplying libc. If you use a non-public interface you understand the risks.
-* Why bother declaring the stability of a private interface? Aren’t private interfaces always unstable?
-    * Private interfaces are not always unstable. In the cases where they are stable they capture internal properties of the system and can communicate these properties to its internal users and to developers of the interface.
-        * e.g. In HDFS, NN-DN protocol is private but stable and can help implement rolling upgrades. It communicates that this interface should not be changed in incompatible ways even though it is private.
+    * Java’s scoping is not very complete. One is often forced to make a class
+      public in order for other internal components to use it. It does not have
+      friends or sub-package-private like C++.
+
+* But I can easily access a private implementation interface if it is Java public.
+  Where is the protection and control?
+    * The purpose of this is not providing absolute access control. Its purpose
+      is to communicate to users and developers. One can access private
+      implementation functions in libc; however if they change the internal
+      implementation details, your application will break and you will have
+      little sympathy from the folks who are supplying libc. If you use a
+      non-public interface you understand the risks.
+
+* Why bother declaring the stability of a private interface?
+  Aren’t private interfaces always unstable?
+    * Private interfaces are not always unstable. In the cases where they are
+      stable they capture internal properties of the system and can communicate
+      these properties to its internal users and to developers of the interface.
+        * e.g. In HDFS, NN-DN protocol is private but stable and can help
+          implement rolling upgrades. It communicates that this interface should
+          not be changed in incompatible ways even though it is private.
         * e.g. In HDFS, FSImage stability can help provide more flexible roll backs.
-* What is the harm in applications using a private interface that is stable? How is it different than a public stable interface?
-    * While a private interface marked as stable is targeted to change only at major releases, it may break at other times if the providers of that interface are willing to changes the internal users of that interface. Further, a public stable interface is less likely to break even at major releases (even though it is allowed to break compatibility) because the impact of the change is larger. If you use a private interface (regardless of its stability) you run the risk of incompatibility.
-* Why bother with Limited-private? Isn’t it giving special treatment to some projects? That is not fair.
-    * First, most interfaces should be public or private; actually let us state it even stronger: make it private unless you really want to expose it to public for general use.
-    * Limited-private is for interfaces that are not intended for general use. They are exposed to related projects that need special hooks. Such a classification has a cost to both the supplier and consumer of the limited interface. Both will have to work together if ever there is a need to break the interface in the future; for example the supplier and the consumers will have to work together to get coordinated releases of their respective projects. This should not be taken lightly – if you can get away with private then do so; if the interface is really for general use for all applications then do so. But remember that making an interface public has huge responsibility. Sometimes Limited-private is just right.
-    * A good example of a limited-private interface is BlockLocations, This is fairly low-level interface that we are willing to expose to MR and perhaps HBase. We are likely to change it down the road and at that time we will have get a coordinated effort with the MR team to release matching releases. While MR and HDFS are always released in sync today, they may change down the road.
-    * If you have a limited-private interface with many projects listed then you are fooling yourself. It is practically public.
-    * It might be worth declaring a special audience classification called Hadoop-Private for the Hadoop family.
-* Lets treat all private interfaces as Hadoop-private. What is the harm in projects in the Hadoop family have access to private classes?
-    * Do we want MR accessing class files that are implementation details inside HDFS. There used to be many such layer violations in the code that we have been cleaning up over the last few years. We don’t want such layer violations to creep back in by no separating between the major components like HDFS and MR.
-* Aren't all public interfaces stable?
-    * One may mark a public interface as evolving in its early days. Here one is promising to make an effort to make compatible changes but may need to break it at minor releases.
-    * One example of a public interface that is unstable is where one is providing an implementation of a standards-body based interface that is still under development. For example, many companies, in an attampt to be first to market, have provided implementations of a new NFS protocol even when the protocol was not fully completed by IETF. The implementor cannot evolve the interface in a fashion that causes least distruption because the stability is controlled by the standards body. Hence it is appropriate to label the interface as unstable.
 
+* What is the harm in applications using a private interface that is stable? How
+  is it different than a public stable interface?
+    * While a private interface marked as stable is targeted to change only at
+      major releases, it may break at other times if the providers of that
+      interface are willing to changes the internal users of that
+      interface. Further, a public stable interface is less likely to break even
+      at major releases (even though it is allowed to break compatibility)
+      because the impact of the change is larger. If you use a private interface
+      (regardless of its stability) you run the risk of incompatibility.
+
+* Why bother with Limited-private? Isn’t it giving special treatment to some projects?
+  That is not fair.
+    * First, most interfaces should be public or private; actually let us state
+      it even stronger: make it private unless you really want to expose it to
+      public for general use.
+    * Limited-private is for interfaces that are not intended for general
+      use. They are exposed to related projects that need special hooks. Such a
+      classification has a cost to both the supplier and consumer of the limited
+      interface. Both will have to work together if ever there is a need to
+      break the interface in the future; for example the supplier and the
+      consumers will have to work together to get coordinated releases of their
+      respective projects. This should not be taken lightly – if you can get
+      away with private then do so; if the interface is really for general use
+      for all applications then do so. But remember that making an interface
+      public has huge responsibility. Sometimes Limited-private is just right.
+    * A good example of a limited-private interface is BlockLocations, This is
+      fairly low-level interface that we are willing to expose to MR and perhaps
+      HBase. We are likely to change it down the road and at that time we will
+      have get a coordinated effort with the MR team to release matching
+      releases. While MR and HDFS are always released in sync today, they may
+      change down the road.
+    * If you have a limited-private interface with many projects listed then you
+      are fooling yourself. It is practically public.
+    * It might be worth declaring a special audience classification called
+      Hadoop-Private for the Hadoop family.
+
+* Lets treat all private interfaces as Hadoop-private. What is the harm in
+  projects in the Hadoop family have access to private classes?
+    * Do we want MR accessing class files that are implementation details inside
+      HDFS. There used to be many such layer violations in the code that we have
+      been cleaning up over the last few years. We don’t want such layer
+      violations to creep back in by no separating between the major components
+      like HDFS and MR.
 
+* Aren't all public interfaces stable?
+    * One may mark a public interface as evolving in its early days. Here one is
+      promising to make an effort to make compatible changes but may need to
+      break it at minor releases.
+    * One example of a public interface that is unstable is where one is
+      providing an implementation of a standards-body based interface that is
+      still under development. For example, many companies, in an attampt to be
+      first to market, have provided implementations of a new NFS protocol even
+      when the protocol was not fully completed by IETF. The implementor cannot
+      evolve the interface in a fashion that causes least distruption because
+      the stability is controlled by the standards body. Hence it is appropriate
+      to label the interface as unstable.

http://git-wip-us.apache.org/repos/asf/hadoop/blob/9d473b8d/hadoop-common-project/hadoop-common/src/site/markdown/RackAwareness.md
----------------------------------------------------------------------
diff --git a/hadoop-common-project/hadoop-common/src/site/markdown/RackAwareness.md b/hadoop-common-project/hadoop-common/src/site/markdown/RackAwareness.md
index 12c7bf6..ced6c84 100644
--- a/hadoop-common-project/hadoop-common/src/site/markdown/RackAwareness.md
+++ b/hadoop-common-project/hadoop-common/src/site/markdown/RackAwareness.md
@@ -17,17 +17,49 @@
 Rack Awareness
 ==============
 
-Hadoop components are rack-aware. For example, HDFS block placement will use rack awareness for fault tolerance by placing one block replica on a different rack. This provides data availability in the event of a network switch failure or partition within the cluster.
-
-Hadoop master daemons obtain the rack id of the cluster slaves by invoking either an external script or java class as specified by configuration files. Using either the java class or external script for topology, output must adhere to the java **org.apache.hadoop.net.DNSToSwitchMapping** interface. The interface expects a one-to-one correspondence to be maintained and the topology information in the format of '/myrack/myhost', where '/' is the topology delimiter, 'myrack' is the rack identifier, and 'myhost' is the individual host. Assuming a single /24 subnet per rack, one could use the format of '/192.168.100.0/192.168.100.5' as a unique rack-host topology mapping.
-
-To use the java class for topology mapping, the class name is specified by the **topology.node.switch.mapping.impl** parameter in the configuration file. An example, NetworkTopology.java, is included with the hadoop distribution and can be customized by the Hadoop administrator. Using a Java class instead of an external script has a performance benefit in that Hadoop doesn't need to fork an external process when a new slave node registers itself.
-
-If implementing an external script, it will be specified with the **topology.script.file.name** parameter in the configuration files. Unlike the java class, the external topology script is not included with the Hadoop distribution and is provided by the administrator. Hadoop will send multiple IP addresses to ARGV when forking the topology script. The number of IP addresses sent to the topology script is controlled with **net.topology.script.number.args** and defaults to 100. If **net.topology.script.number.args** was changed to 1, a topology script would get forked for each IP submitted by DataNodes and/or NodeManagers.
-
-If **topology.script.file.name** or **topology.node.switch.mapping.impl** is not set, the rack id '/default-rack' is returned for any passed IP address. While this behavior appears desirable, it can cause issues with HDFS block replication as default behavior is to write one replicated block off rack and is unable to do so as there is only a single rack named '/default-rack'.
-
-An additional configuration setting is **mapreduce.jobtracker.taskcache.levels** which determines the number of levels (in the network topology) of caches MapReduce will use. So, for example, if it is the default value of 2, two levels of caches will be constructed - one for hosts (host -\> task mapping) and another for racks (rack -\> task mapping). Giving us our one-to-one mapping of '/myrack/myhost'.
+Hadoop components are rack-aware. For example, HDFS block placement
+will use rack awareness for fault tolerance by placing one block
+replica on a different rack. This provides data availability in the
+event of a network switch failure or partition within the cluster.
+
+Hadoop master daemons obtain the rack id of the cluster slaves by
+invoking either an external script or java class as specified by
+configuration files. Using either the java class or external script
+for topology, output must adhere to the java
+**org.apache.hadoop.net.DNSToSwitchMapping** interface. The interface
+expects a one-to-one correspondence to be maintained and the topology
+information in the format of '/myrack/myhost', where '/' is the
+topology delimiter, 'myrack' is the rack identifier, and 'myhost' is
+the individual host. Assuming a single /24 subnet per rack, one could
+use the format of '/192.168.100.0/192.168.100.5' as a unique rack-host
+topology mapping.
+
+To use the java class for topology mapping, the class name is
+specified by the **net.topology.node.switch.mapping.impl** parameter
+in the configuration file. An example, NetworkTopology.java, is
+included with the hadoop distribution and can be customized by the
+Hadoop administrator. Using a Java class instead of an external script
+has a performance benefit in that Hadoop doesn't need to fork an
+external process when a new slave node registers itself.
+
+If implementing an external script, it will be specified with the
+**net.topology.script.file.name** parameter in the configuration
+files. Unlike the java class, the external topology script is not
+included with the Hadoop distribution and is provided by the
+administrator. Hadoop will send multiple IP addresses to ARGV when
+forking the topology script. The number of IP addresses sent to the
+topology script is controlled with **net.topology.script.number.args**
+and defaults to 100. If **net.topology.script.number.args** was
+changed to 1, a topology script would get forked for each IP submitted
+by DataNodes and/or NodeManagers.
+
+If **net.topology.script.file.name** or
+**net.topology.node.switch.mapping.impl** is not set, the rack id
+'/default-rack' is returned for any passed IP address. While this
+behavior appears desirable, it can cause issues with HDFS block
+replication as default behavior is to write one replicated block off
+rack and is unable to do so as there is only a single rack named
+'/default-rack'.
 
 python Example
 --------------

http://git-wip-us.apache.org/repos/asf/hadoop/blob/9d473b8d/hadoop-project/src/site/site.xml
----------------------------------------------------------------------
diff --git a/hadoop-project/src/site/site.xml b/hadoop-project/src/site/site.xml
index 0705cae..403048e 100644
--- a/hadoop-project/src/site/site.xml
+++ b/hadoop-project/src/site/site.xml
@@ -52,7 +52,7 @@
       <item name="Cluster Setup" href="hadoop-project-dist/hadoop-common/ClusterSetup.html"/>
       <item name="Commands Reference" href="hadoop-project-dist/hadoop-common/CommandsManual.html"/>
       <item name="FileSystem Shell" href="hadoop-project-dist/hadoop-common/FileSystemShell.html"/>
-      <item name="Compatibility" href="hadoop-project-dist/hadoop-common/Compatibility.html"/>
+      <item name="Hadoop Compatibility" href="hadoop-project-dist/hadoop-common/Compatibility.html"/>
       <item name="Interface Classification" href="hadoop-project-dist/hadoop-common/InterfaceClassification.html"/>
       <item name="FileSystem Specification"
         href="hadoop-project-dist/hadoop-common/filesystem/index.html"/>


---------------------------------------------------------------------
To unsubscribe, e-mail: common-commits-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-commits-h...@hadoop.apache.org
