http://git-wip-us.apache.org/repos/asf/hadoop/blob/a45ef2b6/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsDesign.md
----------------------------------------------------------------------
diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsDesign.md 
b/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsDesign.md
new file mode 100644
index 0000000..87a9fcd
--- /dev/null
+++ b/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsDesign.md
@@ -0,0 +1,240 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+HDFS Architecture
+=================
+
+* [HDFS Architecture](#HDFS_Architecture)
+    * [Introduction](#Introduction)
+    * [Assumptions and Goals](#Assumptions_and_Goals)
+        * [Hardware Failure](#Hardware_Failure)
+        * [Streaming Data Access](#Streaming_Data_Access)
+        * [Large Data Sets](#Large_Data_Sets)
+        * [Simple Coherency Model](#Simple_Coherency_Model)
+        * ["Moving Computation is Cheaper than Moving 
Data"](#aMoving_Computation_is_Cheaper_than_Moving_Data)
+        * [Portability Across Heterogeneous Hardware and Software 
Platforms](#Portability_Across_Heterogeneous_Hardware_and_Software_Platforms)
+    * [NameNode and DataNodes](#NameNode_and_DataNodes)
+    * [The File System Namespace](#The_File_System_Namespace)
+    * [Data Replication](#Data_Replication)
+        * [Replica Placement: The First Baby 
Steps](#Replica_Placement:_The_First_Baby_Steps)
+        * [Replica Selection](#Replica_Selection)
+        * [Safemode](#Safemode)
+    * [The Persistence of File System 
Metadata](#The_Persistence_of_File_System_Metadata)
+    * [The Communication Protocols](#The_Communication_Protocols)
+    * [Robustness](#Robustness)
+        * [Data Disk Failure, Heartbeats and 
Re-Replication](#Data_Disk_Failure_Heartbeats_and_Re-Replication)
+        * [Cluster Rebalancing](#Cluster_Rebalancing)
+        * [Data Integrity](#Data_Integrity)
+        * [Metadata Disk Failure](#Metadata_Disk_Failure)
+        * [Snapshots](#Snapshots)
+    * [Data Organization](#Data_Organization)
+        * [Data Blocks](#Data_Blocks)
+        * [Staging](#Staging)
+        * [Replication Pipelining](#Replication_Pipelining)
+    * [Accessibility](#Accessibility)
+        * [FS Shell](#FS_Shell)
+        * [DFSAdmin](#DFSAdmin)
+        * [Browser Interface](#Browser_Interface)
+    * [Space Reclamation](#Space_Reclamation)
+        * [File Deletes and Undeletes](#File_Deletes_and_Undeletes)
+        * [Decrease Replication Factor](#Decrease_Replication_Factor)
+    * [References](#References)
+
+Introduction
+------------
+
+The Hadoop Distributed File System (HDFS) is a distributed file system 
designed to run on commodity hardware. It has many similarities with existing 
distributed file systems. However, the differences from other distributed file 
systems are significant. HDFS is highly fault-tolerant and is designed to be 
deployed on low-cost hardware. HDFS provides high throughput access to 
application data and is suitable for applications that have large data sets. 
HDFS relaxes a few POSIX requirements to enable streaming access to file system 
data. HDFS was originally built as infrastructure for the Apache Nutch web 
search engine project. HDFS is part of the Apache Hadoop Core project. The 
project URL is <http://hadoop.apache.org/>.
+
+Assumptions and Goals
+---------------------
+
+### Hardware Failure
+
+Hardware failure is the norm rather than the exception. An HDFS instance may 
consist of hundreds or thousands of server machines, each storing part of the 
file system’s data. The fact that there are a huge number of components and 
that each component has a non-trivial probability of failure means that some 
component of HDFS is always non-functional. Therefore, detection of faults and 
quick, automatic recovery from them is a core architectural goal of HDFS.
+
+### Streaming Data Access
+
+Applications that run on HDFS need streaming access to their data sets. They 
are not general purpose applications that typically run on general purpose file 
systems. HDFS is designed more for batch processing rather than interactive use 
by users. The emphasis is on high throughput of data access rather than low 
latency of data access. POSIX imposes many hard requirements that are not 
needed for applications that are targeted for HDFS. POSIX semantics in a few 
key areas has been traded to increase data throughput rates.
+
+### Large Data Sets
+
+Applications that run on HDFS have large data sets. A typical file in HDFS is 
gigabytes to terabytes in size. Thus, HDFS is tuned to support large files. It 
should provide high aggregate data bandwidth and scale to hundreds of nodes in 
a single cluster. It should support tens of millions of files in a single 
instance.
+
+### Simple Coherency Model
+
+HDFS applications need a write-once-read-many access model for files. A file 
once created, written, and closed need not be changed. This assumption 
simplifies data coherency issues and enables high throughput data access. A 
Map/Reduce application or a web crawler application fits perfectly with this 
model. There is a plan to support appending-writes to files in the future.
+
+### "Moving Computation is Cheaper than Moving Data"
+
+A computation requested by an application is much more efficient if it is 
executed near the data it operates on. This is especially true when the size of 
the data set is huge. This minimizes network congestion and increases the 
overall throughput of the system. The assumption is that it is often better to 
migrate the computation closer to where the data is located rather than moving 
the data to where the application is running. HDFS provides interfaces for 
applications to move themselves closer to where the data is located.
+
+### Portability Across Heterogeneous Hardware and Software Platforms
+
+HDFS has been designed to be easily portable from one platform to another. 
This facilitates widespread adoption of HDFS as a platform of choice for a 
large set of applications.
+
+NameNode and DataNodes
+----------------------
+
+HDFS has a master/slave architecture. An HDFS cluster consists of a single 
NameNode, a master server that manages the file system namespace and regulates 
access to files by clients. In addition, there are a number of DataNodes, 
usually one per node in the cluster, which manage storage attached to the nodes 
that they run on. HDFS exposes a file system namespace and allows user data to 
be stored in files. Internally, a file is split into one or more blocks and 
these blocks are stored in a set of DataNodes. The NameNode executes file 
system namespace operations like opening, closing, and renaming files and 
directories. It also determines the mapping of blocks to DataNodes. The 
DataNodes are responsible for serving read and write requests from the file 
system’s clients. The DataNodes also perform block creation, deletion, and 
replication upon instruction from the NameNode.
+
+![HDFS Architecture](images/hdfsarchitecture.png)
+
+The NameNode and DataNode are pieces of software designed to run on commodity 
machines. These machines typically run a GNU/Linux operating system (OS). HDFS 
is built using the Java language; any machine that supports Java can run the 
NameNode or the DataNode software. Usage of the highly portable Java language 
means that HDFS can be deployed on a wide range of machines. A typical 
deployment has a dedicated machine that runs only the NameNode software. Each 
of the other machines in the cluster runs one instance of the DataNode 
software. The architecture does not preclude running multiple DataNodes on the 
same machine but in a real deployment that is rarely the case.
+
+The existence of a single NameNode in a cluster greatly simplifies the 
architecture of the system. The NameNode is the arbitrator and repository for 
all HDFS metadata. The system is designed in such a way that user data never 
flows through the NameNode.
+
+The File System Namespace
+-------------------------
+
+HDFS supports a traditional hierarchical file organization. A user or an 
application can create directories and store files inside these directories. 
The file system namespace hierarchy is similar to most other existing file 
systems; one can create and remove files, move a file from one directory to 
another, or rename a file. HDFS does not yet implement user quotas or access 
permissions. HDFS does not support hard links or soft links. However, the HDFS 
architecture does not preclude implementing these features.
+
+The NameNode maintains the file system namespace. Any change to the file 
system namespace or its properties is recorded by the NameNode. An application 
can specify the number of replicas of a file that should be maintained by HDFS. 
The number of copies of a file is called the replication factor of that file. 
This information is stored by the NameNode.
+
+Data Replication
+----------------
+
+HDFS is designed to reliably store very large files across machines in a large 
cluster. It stores each file as a sequence of blocks; all blocks in a file 
except the last block are the same size. The blocks of a file are replicated 
for fault tolerance. The block size and replication factor are configurable per 
file. An application can specify the number of replicas of a file. The 
replication factor can be specified at file creation time and can be changed 
later. Files in HDFS are write-once and have strictly one writer at any time.
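+
+For illustration, the replication factor can be supplied when a file is written and inspected afterwards from the FS shell (a rough sketch; the path and file name are made up):
+
+    bash$ bin/hadoop fs -D dfs.replication=2 -put largefile.dat /user/hadoop/largefile.dat
+    bash$ bin/hadoop fs -ls /user/hadoop/largefile.dat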
+
+The NameNode makes all decisions regarding replication of blocks. It 
periodically receives a Heartbeat and a Blockreport from each of the DataNodes 
in the cluster. Receipt of a Heartbeat implies that the DataNode is functioning 
properly. A Blockreport contains a list of all blocks on a DataNode.
+
+![HDFS DataNodes](images/hdfsdatanodes.png)
+
+### Replica Placement: The First Baby Steps
+
+The placement of replicas is critical to HDFS reliability and performance. 
Optimizing replica placement distinguishes HDFS from most other distributed 
file systems. This is a feature that needs lots of tuning and experience. The 
purpose of a rack-aware replica placement policy is to improve data 
reliability, availability, and network bandwidth utilization. The current 
implementation for the replica placement policy is a first effort in this 
direction. The short-term goals of implementing this policy are to validate it 
on production systems, learn more about its behavior, and build a foundation to 
test and research more sophisticated policies.
+
+Large HDFS instances run on a cluster of computers that commonly spread across 
many racks. Communication between two nodes in different racks has to go 
through switches. In most cases, network bandwidth between machines in the same 
rack is greater than network bandwidth between machines in different racks.
+
+The NameNode determines the rack id each DataNode belongs to via the process 
outlined in [Hadoop Rack 
Awareness](../hadoop-common/ClusterSetup.html#HadoopRackAwareness). A simple 
but non-optimal policy is to place replicas on unique racks. This prevents 
losing data when an entire rack fails and allows use of bandwidth from multiple 
racks when reading data. This policy evenly distributes replicas in the cluster 
which makes it easy to balance load on component failure. However, this policy 
increases the cost of writes because a write needs to transfer blocks to 
multiple racks.
+
+For the common case, when the replication factor is three, HDFS’s placement 
policy is to put one replica on one node in the local rack, another on a 
different node in the local rack, and the last on a different node in a 
different rack. This policy cuts the inter-rack write traffic which generally 
improves write performance. The chance of rack failure is far less than that of 
node failure; this policy does not impact data reliability and availability 
guarantees. However, it does reduce the aggregate network bandwidth used when 
reading data since a block is placed in only two unique racks rather than 
three. With this policy, the replicas of a file do not evenly distribute across 
the racks. One third of replicas are on one node, two thirds of replicas are on 
one rack, and the other third are evenly distributed across the remaining 
racks. This policy improves write performance without compromising data 
reliability or read performance.
+
+The current, default replica placement policy described here is a work in 
progress.
+
+### Replica Selection
+
+To minimize global bandwidth consumption and read latency, HDFS tries to 
satisfy a read request from a replica that is closest to the reader. If there 
exists a replica on the same rack as the reader node, then that replica is 
preferred to satisfy the read request. If an HDFS cluster spans multiple
data centers, then a replica that is resident in the local data center is 
preferred over any remote replica.
+
+### Safemode
+
+On startup, the NameNode enters a special state called Safemode. Replication 
of data blocks does not occur when the NameNode is in the Safemode state. The 
NameNode receives Heartbeat and Blockreport messages from the DataNodes. A 
Blockreport contains the list of data blocks that a DataNode is hosting. Each 
block has a specified minimum number of replicas. A block is considered safely 
replicated when the minimum number of replicas of that data block has checked 
in with the NameNode. After a configurable percentage of safely replicated data 
blocks checks in with the NameNode (plus an additional 30 seconds), the 
NameNode exits the Safemode state. It then determines the list of data blocks 
(if any) that still have fewer than the specified number of replicas. The 
NameNode then replicates these blocks to other DataNodes.
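+
+The current Safemode status can be checked, and Safemode left manually if needed, with the dfsadmin tool (a brief sketch):
+
+    bash$ bin/hdfs dfsadmin -safemode get
+    bash$ bin/hdfs dfsadmin -safemode leave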
+
+The Persistence of File System Metadata
+---------------------------------------
+
+The HDFS namespace is stored by the NameNode. The NameNode uses a transaction 
log called the EditLog to persistently record every change that occurs to file 
system metadata. For example, creating a new file in HDFS causes the NameNode 
to insert a record into the EditLog indicating this. Similarly, changing the 
replication factor of a file causes a new record to be inserted into the 
EditLog. The NameNode uses a file in its local host OS file system to store the 
EditLog. The entire file system namespace, including the mapping of blocks to 
files and file system properties, is stored in a file called the FsImage. The 
FsImage is stored as a file in the NameNode’s local file system too.
+
+The NameNode keeps an image of the entire file system namespace and file 
Blockmap in memory. This key metadata item is designed to be compact, such that 
a NameNode with 4 GB of RAM is plenty to support a huge number of files and 
directories. When the NameNode starts up, it reads the FsImage and EditLog from 
disk, applies all the transactions from the EditLog to the in-memory 
representation of the FsImage, and flushes out this new version into a new 
FsImage on disk. It can then truncate the old EditLog because its transactions 
have been applied to the persistent FsImage. This process is called a 
checkpoint. In the current implementation, a checkpoint only occurs when the 
NameNode starts up. Work is in progress to support periodic checkpointing in 
the near future.
+
+The DataNode stores HDFS data in files in its local file system. The DataNode 
has no knowledge about HDFS files. It stores each block of HDFS data in a 
separate file in its local file system. The DataNode does not create all files 
in the same directory. Instead, it uses a heuristic to determine the optimal 
number of files per directory and creates subdirectories appropriately. It is 
not optimal to create all local files in the same directory because the local 
file system might not be able to efficiently support a huge number of files in 
a single directory. When a DataNode starts up, it scans through its local file 
system, generates a list of all HDFS data blocks that correspond to each of 
these local files and sends this report to the NameNode: this is the 
Blockreport.
+
+The Communication Protocols
+---------------------------
+
+All HDFS communication protocols are layered on top of the TCP/IP protocol. A 
client establishes a connection to a configurable TCP port on the NameNode 
machine. It talks the ClientProtocol with the NameNode. The DataNodes talk to 
the NameNode using the DataNode Protocol. A Remote Procedure Call (RPC) 
abstraction wraps both the Client Protocol and the DataNode Protocol. By 
design, the NameNode never initiates any RPCs. Instead, it only responds to RPC 
requests issued by DataNodes or clients.
+
+Robustness
+----------
+
+The primary objective of HDFS is to store data reliably even in the presence 
of failures. The three common types of failures are NameNode failures, DataNode 
failures and network partitions.
+
+### Data Disk Failure, Heartbeats and Re-Replication
+
+Each DataNode sends a Heartbeat message to the NameNode periodically. A 
network partition can cause a subset of DataNodes to lose connectivity with the 
NameNode. The NameNode detects this condition by the absence of a Heartbeat 
message. The NameNode marks DataNodes without recent Heartbeats as dead and 
does not forward any new IO requests to them. Any data that was registered to a 
dead DataNode is not available to HDFS any more. DataNode death may cause the 
replication factor of some blocks to fall below their specified value. The 
NameNode constantly tracks which blocks need to be replicated and initiates 
replication whenever necessary. The necessity for re-replication may arise due 
to many reasons: a DataNode may become unavailable, a replica may become 
corrupted, a hard disk on a DataNode may fail, or the replication factor of a 
file may be increased.
+
+### Cluster Rebalancing
+
+The HDFS architecture is compatible with data rebalancing schemes. A scheme 
might automatically move data from one DataNode to another if the free space on 
a DataNode falls below a certain threshold. In the event of a sudden high 
demand for a particular file, a scheme might dynamically create additional 
replicas and rebalance other data in the cluster. These types of data 
rebalancing schemes are not yet implemented.
+
+### Data Integrity
+
+It is possible that a block of data fetched from a DataNode arrives corrupted. 
This corruption can occur because of faults in a storage device, network 
faults, or buggy software. The HDFS client software implements checksum 
checking on the contents of HDFS files. When a client creates an HDFS file, it 
computes a checksum of each block of the file and stores these checksums in a 
separate hidden file in the same HDFS namespace. When a client retrieves file 
contents it verifies that the data it received from each DataNode matches the 
checksum stored in the associated checksum file. If not, then the client can 
opt to retrieve that block from another DataNode that has a replica of that 
block.
+
+### Metadata Disk Failure
+
+The FsImage and the EditLog are central data structures of HDFS. A corruption 
of these files can cause the HDFS instance to be non-functional. For this 
reason, the NameNode can be configured to support maintaining multiple copies 
of the FsImage and EditLog. Any update to either the FsImage or EditLog causes 
each of the FsImages and EditLogs to get updated synchronously. This 
synchronous updating of multiple copies of the FsImage and EditLog may degrade 
the rate of namespace transactions per second that a NameNode can support. 
However, this degradation is acceptable because even though HDFS applications 
are very data intensive in nature, they are not metadata intensive. When a 
NameNode restarts, it selects the latest consistent FsImage and EditLog to use.
+
+The NameNode machine is a single point of failure for an HDFS cluster. If the 
NameNode machine fails, manual intervention is necessary. Currently, automatic 
restart and failover of the NameNode software to another machine is not 
supported.
+
+### Snapshots
+
+Snapshots support storing a copy of data at a particular instant of time. One 
usage of the snapshot feature may be to roll back a corrupted HDFS instance to 
a previously known good point in time. HDFS does not currently support 
snapshots but will in a future release.
+
+Data Organization
+-----------------
+
+### Data Blocks
+
+HDFS is designed to support very large files. Applications that are compatible 
with HDFS are those that deal with large data sets. These applications write 
their data only once but they read it one or more times and require these reads 
to be satisfied at streaming speeds. HDFS supports write-once-read-many 
semantics on files. A typical block size used by HDFS is 64 MB. Thus, an HDFS 
file is chopped up into 64 MB chunks, and if possible, each chunk will reside 
on a different DataNode.
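+
+The resulting block layout of a file, including which DataNodes hold each block replica, can be inspected with the fsck tool (a brief sketch; the path is made up):
+
+    bash$ bin/hdfs fsck /user/hadoop/largefile.dat -files -blocks -locations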
+
+### Staging
+
+A client request to create a file does not reach the NameNode immediately. In 
fact, initially the HDFS client caches the file data into a temporary local 
file. Application writes are transparently redirected to this temporary local 
file. When the local file accumulates data worth over one HDFS block size, the 
client contacts the NameNode. The NameNode inserts the file name into the file 
system hierarchy and allocates a data block for it. The NameNode responds to 
the client request with the identity of the DataNode and the destination data 
block. Then the client flushes the block of data from the local temporary file 
to the specified DataNode. When a file is closed, the remaining un-flushed data 
in the temporary local file is transferred to the DataNode. The client then 
tells the NameNode that the file is closed. At this point, the NameNode commits 
the file creation operation into a persistent store. If the NameNode dies 
before the file is closed, the file is lost.
+
+The above approach has been adopted after careful consideration of target 
applications that run on HDFS. These applications need streaming writes to 
files. If a client writes to a remote file directly without any client side 
buffering, the network speed and the congestion in the network impact
throughput considerably. This approach is not without precedent. Earlier 
distributed file systems, e.g. AFS, have used client side caching to improve 
performance. A POSIX requirement has been relaxed to achieve higher performance 
of data uploads.
+
+### Replication Pipelining
+
+When a client is writing data to an HDFS file, its data is first written to a 
local file as explained in the previous section. Suppose the HDFS file has a 
replication factor of three. When the local file accumulates a full block of 
user data, the client retrieves a list of DataNodes from the NameNode. This 
list contains the DataNodes that will host a replica of that block. The client 
then flushes the data block to the first DataNode. The first DataNode starts 
receiving the data in small portions, writes each portion to its local 
repository and transfers that portion to the second DataNode in the list. The 
second DataNode, in turn starts receiving each portion of the data block, 
writes that portion to its repository and then flushes that portion to the 
third DataNode. Finally, the third DataNode writes the data to its local 
repository. Thus, a DataNode can be receiving data from the previous one in the 
pipeline and at the same time forwarding data to the next one in the pipeline. 
Thus, the data is pipelined from one DataNode to the next.
+
+Accessibility
+-------------
+
+HDFS can be accessed from applications in many different ways. Natively, HDFS 
provides a [FileSystem Java API](http://hadoop.apache.org/docs/current/api/) 
for applications to use. A C language wrapper for this Java API is also 
available. In addition, an HTTP browser can also be used to browse the files of 
an HDFS instance. Work is in progress to expose HDFS through the WebDAV 
protocol.
+
+### FS Shell
+
+HDFS allows user data to be organized in the form of files and directories. It 
provides a commandline interface called FS shell that lets a user interact with 
the data in HDFS. The syntax of this command set is similar to other shells 
(e.g. bash, csh) that users are already familiar with. Here are some sample 
action/command pairs:
+
+| Action | Command |
+|:---- |:---- |
+| Create a directory named `/foodir` | `bin/hadoop dfs -mkdir /foodir` |
+| Remove a directory named `/foodir` | `bin/hadoop fs -rm -R /foodir` |
+| View the contents of a file named `/foodir/myfile.txt` | `bin/hadoop dfs 
-cat /foodir/myfile.txt` |
+
+FS shell is targeted for applications that need a scripting language to 
interact with the stored data.
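+
+For example, a short sequence of FS shell commands can stage data into HDFS and verify the upload (a sketch; the local file name is made up):
+
+    bash$ bin/hadoop dfs -mkdir /foodir
+    bash$ bin/hadoop dfs -put myfile.txt /foodir/myfile.txt
+    bash$ bin/hadoop dfs -cat /foodir/myfile.txt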
+
+### DFSAdmin
+
+The DFSAdmin command set is used for administering an HDFS cluster. These are 
commands that are used only by an HDFS administrator. Here are some sample 
action/command pairs:
+
+| Action | Command |
+|:---- |:---- |
+| Put the cluster in Safemode | `bin/hdfs dfsadmin -safemode enter` |
+| Generate a list of DataNodes | `bin/hdfs dfsadmin -report` |
+| Recommission or decommission DataNode(s) | `bin/hdfs dfsadmin -refreshNodes` 
|
+
+### Browser Interface
+
+A typical HDFS install configures a web server to expose the HDFS namespace 
through a configurable TCP port. This allows a user to navigate the HDFS 
namespace and view the contents of its files using a web browser.
+
+Space Reclamation
+-----------------
+
+### File Deletes and Undeletes
+
+When a file is deleted by a user or an application, it is not immediately 
removed from HDFS. Instead, HDFS first renames it to a file in the `/trash` 
directory. The file can be restored quickly as long as it remains in `/trash`. 
A file remains in `/trash` for a configurable amount of time. After the expiry 
of its life in `/trash`, the NameNode deletes the file from the HDFS namespace. 
The deletion of a file causes the blocks associated with the file to be freed. 
Note that there could be an appreciable time delay between the time a file is 
deleted by a user and the time of the corresponding increase in free space in 
HDFS.
+
+A user can undelete a file after deleting it as long as it remains in the `/trash` directory. To undelete a file, the user can navigate the `/trash` directory and retrieve it. The `/trash` directory contains only the latest copy of the file that was deleted. The `/trash` directory is just like any other directory, with one special feature: HDFS applies specified policies to automatically delete files from it. The current default trash interval is 0 (files are deleted without being stored in trash). This value is a configurable parameter, `fs.trash.interval`, set in core-site.xml.
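+
+As a sketch, assuming trash is enabled (`fs.trash.interval` greater than 0) and using made-up paths, a deleted file can be restored by moving it back out of the trash directory, and the trash can be emptied immediately if desired:
+
+    bash$ bin/hadoop fs -rm /foodir/myfile.txt
+    bash$ bin/hadoop fs -mv <trash-directory>/foodir/myfile.txt /foodir/myfile.txt
+    bash$ bin/hadoop fs -expunge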
+
+### Decrease Replication Factor
+
+When the replication factor of a file is reduced, the NameNode selects excess 
replicas that can be deleted. The next Heartbeat transfers this information to 
the DataNode. The DataNode then removes the corresponding blocks and the 
corresponding free space appears in the cluster. Once again, there might be a 
time delay between the completion of the setReplication API call and the 
appearance of free space in the cluster.
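+
+For example, the replication factor of a file can be lowered from the FS shell; the optional -w flag waits for the change to complete (the path is illustrative):
+
+    bash$ bin/hadoop fs -setrep -w 1 /foodir/myfile.txt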
+
+References
+----------
+
+Hadoop [JavaDoc API](http://hadoop.apache.org/docs/current/api/).
+
+HDFS source code: <http://hadoop.apache.org/version_control.html>

http://git-wip-us.apache.org/repos/asf/hadoop/blob/a45ef2b6/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsEditsViewer.md
----------------------------------------------------------------------
diff --git 
a/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsEditsViewer.md 
b/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsEditsViewer.md
new file mode 100644
index 0000000..267471f
--- /dev/null
+++ b/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsEditsViewer.md
@@ -0,0 +1,69 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+Offline Edits Viewer Guide
+==========================
+
+* [Offline Edits Viewer Guide](#Offline_Edits_Viewer_Guide)
+    * [Overview](#Overview)
+    * [Usage](#Usage)
+    * [Case study: Hadoop cluster 
recovery](#Case_study:_Hadoop_cluster_recovery)
+
+Overview
+--------
+
+Offline Edits Viewer is a tool to parse the Edits log file. The current processors are mostly useful for conversion between different formats, including XML, which is human-readable and easier to edit than the native binary format.
+
+The tool can parse the edits formats -18 (roughly Hadoop 0.19) and later. The tool operates on files only; it does not need a Hadoop cluster to be running.
+
+Input formats supported:
+
+1.  **binary**: native binary format that Hadoop uses internally
+2.  **xml**: XML format, as produced by xml processor, used if filename
+    has `.xml` (case insensitive) extension
+
+The Offline Edits Viewer provides several output processors (unless stated otherwise, the output of a processor can be converted back to the original edits file):
+
+1.  **binary**: native binary format that Hadoop uses internally
+2.  **xml**: XML format
+3.  **stats**: prints out statistics; this output cannot be converted back to
+    an edits file
+
+Usage
+-----
+
+       bash$ bin/hdfs oev -i edits -o edits.xml
+
+|                                       Flag | Description |
+|:---- |:---- |
+| [`-i` ; `--inputFile`] *input file* | Specify the input edits log file to 
process. Xml (case insensitive) extension means XML format otherwise binary 
format is assumed. Required. |
+| [`-o` ; `--outputFile`] *output file* | Specify the output filename, if the 
specified output processor generates one. If the specified file already exists, 
it is silently overwritten. Required. |
+| [`-p` ; `--processor`] *processor* | Specify the image processor to apply 
against the image file. Currently valid options are `binary`, `xml` (default) 
and `stats`. |
+| [`-v` ; `--verbose`] | Print the input and output filenames and pipe output 
of processor to console as well as specified file. On extremely large files, 
this may increase processing time by an order of magnitude. |
+| [`-h` ; `--help`] | Display the tool usage and help information and exit. |
+
+Case study: Hadoop cluster recovery
+-----------------------------------
+
+In case there is a problem with the Hadoop cluster and the edits file is corrupted, it is possible to save at least the part of the edits file that is correct. This can be done by converting the binary edits to XML, editing it manually, and then converting it back to binary. The most common problem is that the edits file is missing the closing record (the record with opCode -1). This should be recognized by the tool, and the XML output should be properly closed.
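+
+A sketch of that round trip (the file names are illustrative):
+
+       bash$ bin/hdfs oev -i edits -o edits.xml
+       # ... fix the XML by hand, e.g. add the missing closing record ...
+       bash$ bin/hdfs oev -p binary -i edits.xml -o edits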
+
+If there is no closing record in the XML file, you can add one after the last correct record. Anything after the record with opCode -1 is ignored.
+
+Example of a closing record (with opCode -1):
+
+      <RECORD>
+        <OPCODE>-1</OPCODE>
+        <DATA>
+        </DATA>
+      </RECORD>

http://git-wip-us.apache.org/repos/asf/hadoop/blob/a45ef2b6/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsImageViewer.md
----------------------------------------------------------------------
diff --git 
a/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsImageViewer.md 
b/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsImageViewer.md
new file mode 100644
index 0000000..864eee8
--- /dev/null
+++ b/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsImageViewer.md
@@ -0,0 +1,172 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+Offline Image Viewer Guide
+==========================
+
+* [Offline Image Viewer Guide](#Offline_Image_Viewer_Guide)
+    * [Overview](#Overview)
+    * [Usage](#Usage)
+        * [Web Processor](#Web_Processor)
+        * [XML Processor](#XML_Processor)
+    * [Options](#Options)
+    * [Analyzing Results](#Analyzing_Results)
+    * [oiv\_legacy Command](#oiv_legacy_Command)
+        * [Usage](#Usage)
+        * [Options](#Options)
+
+Overview
+--------
+
+The Offline Image Viewer is a tool to dump the contents of HDFS fsimage files to a human-readable format and to provide a read-only WebHDFS API in order to allow offline analysis and examination of a Hadoop cluster's namespace. The tool is able to process very large image files relatively quickly. The tool handles the layout formats that were included with Hadoop versions 2.4 and up. If you want to handle older layout formats, you can use the Offline Image Viewer of Hadoop 2.3 or the [oiv\_legacy Command](#oiv_legacy_Command). If the tool is not able to process an image file, it will exit cleanly. The Offline Image Viewer does not require a Hadoop cluster to be running; it is entirely offline in its operation.
+
+The Offline Image Viewer provides several output processors:
+
+1.  Web is the default output processor. It launches an HTTP server
+    that exposes a read-only WebHDFS API. Users can investigate the namespace
+    interactively by using the HTTP REST API.
+
+2.  XML creates an XML document of the fsimage and includes all of the
+    information within the fsimage, similar to the lsr processor. The
+    output of this processor is amenable to automated processing and
+    analysis with XML tools. Due to the verbosity of the XML syntax,
+    this processor will also generate the largest amount of output.
+
+3.  FileDistribution is the tool for analyzing file sizes in the
+    namespace image. In order to run the tool one should define a range
+    of integers [0, maxSize] by specifying maxSize and a step. The
+    range of integers is divided into segments of size step: [0, s[1],
+    ..., s[n-1], maxSize], and the processor calculates how many files
+    in the system fall into each segment [s[i-1], s[i]). Note that
+    files larger than maxSize always fall into the very last segment.
+    The output file is formatted as a tab-separated two-column table:
+    Size and NumFiles, where Size represents the start of the segment
+    and NumFiles is the number of files from the image whose size falls
+    into this segment. A sample invocation is shown below.
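+
+    For example, a sketch of a FileDistribution run over a 128 GB range with a 2 MB step (these values match the tool's documented defaults; the file names are illustrative):
+
+        bash$ bin/hdfs oiv -p FileDistribution -maxSize 137438953472 -step 2097152 -i fsimage -o fs_distribution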
+
+Usage
+-----
+
+### Web Processor
+
+The Web processor launches an HTTP server which exposes a read-only WebHDFS API. Users can specify the address to listen on with the -addr option (localhost:5978 by default).
+
+       bash$ bin/hdfs oiv -i fsimage
+       14/04/07 13:25:14 INFO offlineImageViewer.WebImageViewer: WebImageViewer
+       started. Listening on /127.0.0.1:5978. Press Ctrl+C to stop the viewer.
+
+Users can access the viewer and get the information of the fsimage by the 
following shell command:
+
+       bash$ bin/hdfs dfs -ls webhdfs://127.0.0.1:5978/
+       Found 2 items
+       drwxrwx--* - root supergroup          0 2014-03-26 20:16 
webhdfs://127.0.0.1:5978/tmp
+       drwxr-xr-x   - root supergroup          0 2014-03-31 14:08 
webhdfs://127.0.0.1:5978/user
+
+To get the information of all the files and directories, you can simply use 
the following command:
+
+       bash$ bin/hdfs dfs -ls -R webhdfs://127.0.0.1:5978/
+
+Users can also get JSON formatted FileStatuses via HTTP REST API.
+
+       bash$ curl -i http://127.0.0.1:5978/webhdfs/v1/?op=liststatus
+       HTTP/1.1 200 OK
+       Content-Type: application/json
+       Content-Length: 252
+
+       {"FileStatuses":{"FileStatus":[
+       
{"fileId":16386,"accessTime":0,"replication":0,"owner":"theuser","length":0,"permission":"755","blockSize":0,"modificationTime":1392772497282,"type":"DIRECTORY","group":"supergroup","childrenNum":1,"pathSuffix":"user"}
+       ]}}
+
+The Web processor now supports the following operations:
+
+* [LISTSTATUS](./WebHDFS.html#List_a_Directory)
+* [GETFILESTATUS](./WebHDFS.html#Status_of_a_FileDirectory)
+* [GETACLSTATUS](./WebHDFS.html#Get_ACL_Status)
+
+### XML Processor
+
+The XML processor is used to dump all the contents of the fsimage. Users can specify the input and output files via the -i and -o command-line options.
+
+       bash$ bin/hdfs oiv -p XML -i fsimage -o fsimage.xml
+
+This will create a file named fsimage.xml that contains all the information in the fsimage. For very large image files, this process may take several minutes.
+
+Applying the Offline Image Viewer with the XML processor would result in the following output:
+
+       <?xml version="1.0"?>
+       <fsimage>
+       <NameSection>
+         <genstampV1>1000</genstampV1>
+         <genstampV2>1002</genstampV2>
+         <genstampV1Limit>0</genstampV1Limit>
+         <lastAllocatedBlockId>1073741826</lastAllocatedBlockId>
+         <txid>37</txid>
+       </NameSection>
+       <INodeSection>
+         <lastInodeId>16400</lastInodeId>
+         <inode>
+           <id>16385</id>
+           <type>DIRECTORY</type>
+           <name></name>
+           <mtime>1392772497282</mtime>
+           <permission>theuser:supergroup:rwxr-xr-x</permission>
+           <nsquota>9223372036854775807</nsquota>
+           <dsquota>-1</dsquota>
+         </inode>
+       ...remaining output omitted...
+
+Options
+-------
+
+| **Flag** | **Description** |
+|:---- |:---- |
+| `-i`\|`--inputFile` *input file* | Specify the input fsimage file to 
process. Required. |
+| `-o`\|`--outputFile` *output file* | Specify the output filename, if the 
specified output processor generates one. If the specified file already exists, 
it is silently overwritten. (output to stdout by default) |
+| `-p`\|`--processor` *processor* | Specify the image processor to apply 
against the image file. Currently valid options are Web (default), XML and 
FileDistribution. |
+| `-addr` *address* | Specify the address (host:port) to listen on (localhost:5978 by default). This option is used with the Web processor. |
+| `-maxSize` *size* | Specify the range [0, maxSize] of file sizes to be 
analyzed in bytes (128GB by default). This option is used with FileDistribution 
processor. |
+| `-step` *size* | Specify the granularity of the distribution in bytes (2MB 
by default). This option is used with FileDistribution processor. |
+| `-h`\|`--help` | Display the tool usage and help information and exit. |
+
+Analyzing Results
+-----------------
+
+The Offline Image Viewer makes it easy to gather large amounts of data about the HDFS namespace. This information can then be used to explore file system usage patterns or find specific files that match arbitrary criteria, along with other types of namespace analysis.
+
+oiv\_legacy Command
+-------------------
+
+Due to the internal layout changes introduced by the ProtocolBuffer-based fsimage ([HDFS-5698](https://issues.apache.org/jira/browse/HDFS-5698)), OfflineImageViewer consumes an excessive amount of memory and loses some functions such as the Indented and Delimited processors. If you want to process images without a large amount of memory or use these processors, you can use the `oiv_legacy` command (the same as `oiv` in Hadoop 2.3).
+
+### Usage
+
+1. Set `dfs.namenode.legacy-oiv-image.dir` to an appropriate directory
+   to make standby NameNode or SecondaryNameNode save its namespace in the
+   old fsimage format during checkpointing.
+
+2. Use the `oiv_legacy` command to process the old-format fsimage.
+
+        bash$ bin/hdfs oiv_legacy -i fsimage_old -o output
+
+### Options
+
+| **Flag** | **Description** |
+|:---- |:---- |
+| `-i`\|`--inputFile` *input file* | Specify the input fsimage file to 
process. Required. |
+| `-o`\|`--outputFile` *output file* | Specify the output filename, if the 
specified output processor generates one. If the specified file already exists, 
it is silently overwritten. Required. |
+| `-p`\|`--processor` *processor* | Specify the image processor to apply 
against the image file. Valid options are Ls (default), XML, Delimited, 
Indented, and FileDistribution. |
+| `-skipBlocks` | Do not enumerate individual blocks within files. This may save processing time and output file space on namespaces with very large files. The Ls processor reads the blocks to correctly determine file sizes and ignores this option. |
+| `-printToScreen` | Pipe output of processor to console as well as specified 
file. On extremely large namespaces, this may increase processing time by an 
order of magnitude. |
+| `-delimiter` *arg* | When used in conjunction with the Delimited processor, 
replaces the default tab delimiter with the string specified by *arg*. |
+| `-h`\|`--help` | Display the tool usage and help information and exit. |

http://git-wip-us.apache.org/repos/asf/hadoop/blob/a45ef2b6/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsMultihoming.md
----------------------------------------------------------------------
diff --git 
a/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsMultihoming.md 
b/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsMultihoming.md
new file mode 100644
index 0000000..fc7cc3d
--- /dev/null
+++ b/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsMultihoming.md
@@ -0,0 +1,127 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+HDFS Support for Multihomed Networks
+====================================
+
+This document is targeted at cluster administrators deploying `HDFS` in multihomed networks. Similar support for `YARN`/`MapReduce` is a work in progress and will be documented when available.
+
+* [HDFS Support for Multihomed Networks](#HDFS_Support_for_Multihomed_Networks)
+    * [Multihoming Background](#Multihoming_Background)
+    * [Fixing Hadoop Issues In Multihomed 
Environments](#Fixing_Hadoop_Issues_In_Multihomed_Environments)
+        * [Ensuring HDFS Daemons Bind All 
Interfaces](#Ensuring_HDFS_Daemons_Bind_All_Interfaces)
+        * [Clients use Hostnames when connecting to 
DataNodes](#Clients_use_Hostnames_when_connecting_to_DataNodes)
+        * [DataNodes use HostNames when connecting to other 
DataNodes](#DataNodes_use_HostNames_when_connecting_to_other_DataNodes)
+
+Multihoming Background
+----------------------
+
+In multihomed networks the cluster nodes are connected to more than one 
network interface. There could be multiple reasons for doing so.
+
+1.  **Security**: Security requirements may dictate that intra-cluster
+    traffic be confined to a different network than the network used to
+    transfer data in and out of the cluster.
+
+2.  **Performance**: Intra-cluster traffic may use one or more high bandwidth
+    interconnects like Fiber Channel, Infiniband or 10GbE.
+
+3.  **Failover/Redundancy**: The nodes may have multiple network adapters
+    connected to a single network to handle network adapter failure.
+
+Note that NIC Bonding (also known as NIC Teaming or Link
+Aggregation) is a related but separate topic. The following settings
+are usually not applicable to a NIC bonding configuration which handles
+multiplexing and failover transparently while presenting a single 'logical
+network' to applications.
+
+Fixing Hadoop Issues In Multihomed Environments
+-----------------------------------------------
+
+### Ensuring HDFS Daemons Bind All Interfaces
+
+By default `HDFS` endpoints are specified as either hostnames or IP addresses. 
In either case `HDFS` daemons will bind to a single IP address making the 
daemons unreachable from other networks.
+
+The solution is to have separate settings for server endpoints that force binding to the wildcard IP address `INADDR_ANY`, i.e. `0.0.0.0`. Do NOT supply a port number with any of these settings.
+
+    <property>
+      <name>dfs.namenode.rpc-bind-host</name>
+      <value>0.0.0.0</value>
+      <description>
+        The actual address the RPC server will bind to. If this optional 
address is
+        set, it overrides only the hostname portion of 
dfs.namenode.rpc-address.
+        It can also be specified per name node or name service for 
HA/Federation.
+        This is useful for making the name node listen on all interfaces by
+        setting it to 0.0.0.0.
+      </description>
+    </property>
+
+    <property>
+      <name>dfs.namenode.servicerpc-bind-host</name>
+      <value>0.0.0.0</value>
+      <description>
+        The actual address the service RPC server will bind to. If this 
optional address is
+        set, it overrides only the hostname portion of 
dfs.namenode.servicerpc-address.
+        It can also be specified per name node or name service for 
HA/Federation.
+        This is useful for making the name node listen on all interfaces by
+        setting it to 0.0.0.0.
+      </description>
+    </property>
+
+    <property>
+      <name>dfs.namenode.http-bind-host</name>
+      <value>0.0.0.0</value>
+      <description>
+        The actual address the HTTP server will bind to. If this optional
address
+        is set, it overrides only the hostname portion of 
dfs.namenode.http-address.
+        It can also be specified per name node or name service for 
HA/Federation.
+        This is useful for making the name node HTTP server listen on all
+        interfaces by setting it to 0.0.0.0.
+      </description>
+    </property>
+
+    <property>
+      <name>dfs.namenode.https-bind-host</name>
+      <value>0.0.0.0</value>
+      <description>
+        The actual address the HTTPS server will bind to. If this optional
address
+        is set, it overrides only the hostname portion of 
dfs.namenode.https-address.
+        It can also be specified per name node or name service for 
HA/Federation.
+        This is useful for making the name node HTTPS server listen on all
+        interfaces by setting it to 0.0.0.0.
+      </description>
+    </property>
+
+### Clients use Hostnames when connecting to DataNodes
+
+By default `HDFS` clients connect to DataNodes using the IP address provided 
by the NameNode. Depending on the network configuration this IP address may be 
unreachable by the clients. The fix is letting clients perform their own DNS 
resolution of the DataNode hostname. The following setting enables this 
behavior.
+
+    <property>
+      <name>dfs.client.use.datanode.hostname</name>
+      <value>true</value>
+      <description>Whether clients should use datanode hostnames when
+        connecting to datanodes.
+      </description>
+    </property>
+
+### DataNodes use HostNames when connecting to other DataNodes
+
+Rarely, the NameNode-resolved IP address for a DataNode may be unreachable 
from other DataNodes. The fix is to force DataNodes to perform their own DNS 
resolution for inter-DataNode connections. The following setting enables this 
behavior.
+
+    <property>
+      <name>dfs.datanode.use.datanode.hostname</name>
+      <value>true</value>
+      <description>Whether datanodes should use datanode hostnames when
+        connecting to other datanodes for data transfer.
+      </description>
+    </property>

http://git-wip-us.apache.org/repos/asf/hadoop/blob/a45ef2b6/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsNfsGateway.md
----------------------------------------------------------------------
diff --git 
a/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsNfsGateway.md 
b/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsNfsGateway.md
new file mode 100644
index 0000000..cea491f
--- /dev/null
+++ b/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsNfsGateway.md
@@ -0,0 +1,254 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+HDFS NFS Gateway
+================
+
+* [HDFS NFS Gateway](#HDFS_NFS_Gateway)
+    * [Overview](#Overview)
+    * [Configuration](#Configuration)
+    * [Start and stop NFS gateway service](#Start_and_stop_NFS_gateway_service)
+    * [Verify validity of NFS related 
services](#Verify_validity_of_NFS_related_services)
+    * [Mount the export "/"](#Mount_the_export_)
+    * [Allow mounts from unprivileged 
clients](#Allow_mounts_from_unprivileged_clients)
+    * [User authentication and mapping](#User_authentication_and_mapping)
+
+Overview
+--------
+
+The NFS Gateway supports NFSv3 and allows HDFS to be mounted as part of the 
client's local file system. Currently NFS Gateway supports and enables the 
following usage patterns:
+
+* Users can browse the HDFS file system through their local file system
+  on NFSv3 client compatible operating systems.
+* Users can download files from the HDFS file system on to their
+  local file system.
+* Users can upload files from their local file system directly to the
+  HDFS file system.
+* Users can stream data directly to HDFS through the mount point. File
+  append is supported but random write is not supported.
+
+The NFS gateway machine needs everything required to run an HDFS client, such as the Hadoop JAR files and the HADOOP\_CONF directory. The NFS gateway can be on the same host as a DataNode, the NameNode, or any HDFS client.
+
+Configuration
+-------------
+
+The NFS gateway uses a proxy user to proxy all users accessing the NFS mounts. In non-secure mode, the user running the gateway is the proxy user, while in secure mode the user in the Kerberos keytab is the proxy user. Suppose the proxy user is 'nfsserver' and users belonging to the groups 'users-group1' and 'users-group2' use the NFS mounts. Then, in core-site.xml of the NameNode, the following two properties must be set, and only the NameNode needs a restart after the configuration change (NOTE: replace the string 'nfsserver' with the proxy user name in your cluster):
+
+    <property>
+      <name>hadoop.proxyuser.nfsserver.groups</name>
+      <value>root,users-group1,users-group2</value>
+      <description>
+             The 'nfsserver' user is allowed to proxy all members of the 
'users-group1' and
+             'users-group2' groups. Note that in most cases you will need to 
include the
+             group "root" because the user "root" (which usually belongs to
"root" group) will
+             generally be the user that initially executes the mount on the 
NFS client system.
+             Set this to '*' to allow nfsserver user to proxy any group.
+      </description>
+    </property>
+
+    <property>
+      <name>hadoop.proxyuser.nfsserver.hosts</name>
+      <value>nfs-client-host1.com</value>
+      <description>
+             This is the host where the nfs gateway is running. Set this to 
'*' to allow
+             requests from any hosts to be proxied.
+      </description>
+    </property>
+
+The above is the only configuration required for the NFS gateway in non-secure mode. For Kerberized Hadoop clusters, the following configurations need to be added to hdfs-site.xml for the gateway (NOTE: replace the string "nfsserver" with the proxy user name and ensure the user contained in the keytab is also the same proxy user):
+
+      <property>
+        <name>nfs.keytab.file</name>
+        <value>/etc/hadoop/conf/nfsserver.keytab</value> <!-- path to the nfs 
gateway keytab -->
+      </property>
+
+      <property>
+        <name>nfs.kerberos.principal</name>
+        <value>nfsserver/_h...@your-realm.com</value>
+      </property>
+
+The rest of the NFS gateway configurations are optional for both secure and 
non-secure mode.
+
+The AIX NFS client has a [few known 
issues](https://issues.apache.org/jira/browse/HDFS-6549) that prevent it from 
working correctly by default with the HDFS NFS Gateway. If you want to be able 
to access the HDFS NFS Gateway from AIX, you should set the following 
configuration setting to enable work-arounds for these issues:
+
+    <property>
+      <name>nfs.aix.compatibility.mode.enabled</name>
+      <value>true</value>
+    </property>
+
+Note that regular, non-AIX clients should NOT enable AIX compatibility mode. 
The work-arounds implemented by AIX compatibility mode effectively disable 
safeguards to ensure that listing of directory contents via NFS returns 
consistent results, and that all data sent to the NFS server can be assured to 
have been committed.
+
+Users are strongly encouraged to update a few configuration properties based on their use cases. All the following configuration properties can be added or updated in hdfs-site.xml.
+
+*   If the client mounts the export with access time update allowed, make sure the following
+    property is not disabled in the configuration file. Only the NameNode needs to restart after
+    this property is changed. On some Unix systems, the user can disable access time update
+    by mounting the export with "noatime". If the export is mounted with "noatime", the user
+    doesn't need to change the following property and thus does not need to restart the NameNode.
+
+        <property>
+          <name>dfs.namenode.accesstime.precision</name>
+          <value>3600000</value>
+          <description>The access time for HDFS file is precise upto this 
value.
+            The default value is 1 hour. Setting a value of 0 disables
+            access times for HDFS.
+          </description>
+        </property>
+
+*   Users are expected to update the file dump directory. The NFS client often
+    reorders writes, so sequential writes can arrive at the NFS gateway in random
+    order. This directory is used to temporarily save out-of-order writes
+    before writing to HDFS. For each file, the out-of-order writes are dumped after
+    they accumulate to exceed a certain threshold (e.g., 1MB) in memory.
+    One needs to make sure the directory has enough
+    space. For example, if the application uploads 10 files with each having
+    100MB, it is recommended for this directory to have roughly 1GB of space in case a
+    worst-case write reorder happens to every file. Only the NFS gateway needs to restart after
+    this property is updated.
+
+          <property>
+            <name>nfs.dump.dir</name>
+            <value>/tmp/.hdfs-nfs</value>
+          </property>
+
+*   By default, the export can be mounted by any client. To better control access,
+    users can update the following property. The value string contains the machine name and
+    access privilege, separated by whitespace characters. The machine name format can be a
+    single host, a Java regular expression, or an IPv4 address. The access privilege uses
+    rw or ro to specify read/write or read-only access of the machines to exports. If the
+    access privilege is not provided, the default is read-only. Entries are separated by ";".
+    For example: "192.168.0.0/22 rw ; host.\*\\.example\\.com ; host1.test.org ro;".
+    Only the NFS gateway needs to be restarted after this property is updated.
+
+        <property>
+          <name>nfs.exports.allowed.hosts</name>
+          <value>* rw</value>
+        </property>
+
+*   JVM and log settings. You can export JVM settings (e.g., heap size and GC log) in
+    HADOOP\_NFS3\_OPTS. More NFS-related settings can be found in hadoop-env.sh.
+    To get an NFS debug trace, you can edit the log4j.properties file
+    to add the following. Note that the debug trace, especially for ONCRPC, can be very verbose.
+
+    To change logging level:
+
+            log4j.logger.org.apache.hadoop.hdfs.nfs=DEBUG
+
+    To get more details of ONCRPC requests:
+
+            log4j.logger.org.apache.hadoop.oncrpc=DEBUG
+
+Start and stop NFS gateway service
+----------------------------------
+
+Three daemons are required to provide NFS service: rpcbind (or portmap), mountd and nfsd. The NFS gateway process includes both nfsd and mountd. It shares the HDFS root "/" as the only export. It is recommended to use the portmap included in the NFS gateway package. Even though the NFS gateway works with the portmap/rpcbind provided by most Linux distributions, the portmap included in the package is needed on some Linux systems such as RHEL 6.2 due to an [rpcbind bug](https://bugzilla.redhat.com/show_bug.cgi?id=731542). More detailed discussion can be found in [HDFS-4763](https://issues.apache.org/jira/browse/HDFS-4763).
+
+1.  Stop nfsv3 and rpcbind/portmap services provided by the platform (commands 
can be different on various Unix platforms):
+
+        [root]> service nfs stop
+        [root]> service rpcbind stop
+
+2.  Start Hadoop's portmap (needs root privileges):
+
+        [root]> $HADOOP_PREFIX/bin/hdfs --daemon start portmap
+
+3.  Start mountd and nfsd.
+
+    No root privileges are required for this command. In non-secure mode, the NFS gateway
+    should be started by the proxy user mentioned at the beginning of this user guide.
+    In secure mode, any user can start the NFS gateway
+    as long as the user has read access to the Kerberos keytab defined in "nfs.keytab.file".
+
+        [hdfs]$ $HADOOP_PREFIX/bin/hdfs --daemon start nfs3
+
+4.  Stop NFS gateway services.
+
+        [hdfs]$ $HADOOP_PREFIX/bin/hdfs --daemon stop nfs3
+        [root]> $HADOOP_PREFIX/bin/hdfs --daemon stop portmap
+
+Optionally, you can forgo running the Hadoop-provided portmap daemon and instead use the system portmap daemon on all operating systems if you start the NFS Gateway as root. This allows the HDFS NFS Gateway to work around the aforementioned bug and still register with the system portmap daemon. To do so, just start the NFS gateway daemon as you normally would, but make sure to do so as the "root" user, and also set the "HADOOP\_PRIVILEGED\_NFS\_USER" environment variable to an unprivileged user. In this mode the NFS Gateway starts as root to perform its initial registration with the system portmap, and then drops privileges back to the user specified by HADOOP\_PRIVILEGED\_NFS\_USER for the rest of the lifetime of the NFS Gateway process. Note that if you choose this route, you should skip steps 1 and 2 above.
+
+Verify validity of NFS related services
+---------------------------------------
+
+1.  Execute the following command to verify if all the services are up and 
running:
+
+        [root]> rpcinfo -p $nfs_server_ip
+
+    You should see output similar to the following:
+
+               program vers proto   port
+
+               100005    1   tcp   4242  mountd
+
+               100005    2   udp   4242  mountd
+
+               100005    2   tcp   4242  mountd
+
+               100000    2   tcp    111  portmapper
+
+               100000    2   udp    111  portmapper
+
+               100005    3   udp   4242  mountd
+
+               100005    1   udp   4242  mountd
+
+               100003    3   tcp   2049  nfs
+
+               100005    3   tcp   4242  mountd
+
+2.  Verify if the HDFS namespace is exported and can be mounted.
+
+        [root]> showmount -e $nfs_server_ip
+
+    You should see output similar to the following:
+
+                Exports list on $nfs_server_ip :
+
+                / (everyone)
+
+Mount the export "/"
+--------------------
+
+Currently NFS v3 only uses TCP as the transport protocol. NLM is not supported, so the mount option "nolock" is needed. It is recommended to use a hard mount. This is because, even after the client sends all data to the NFS gateway, it may take the NFS gateway some extra time to transfer data to HDFS when writes were reordered by the NFS client kernel.
+
+If a soft mount has to be used, the user should give it a relatively long timeout (at least no less than the default timeout on the host).
+
+The users can mount the HDFS namespace as shown below:
+
+     [root]> mount -t nfs -o vers=3,proto=tcp,nolock,noacl $server:/ $mount_point
+
+Then the users can access HDFS as part of the local file system, except that hard links and random writes are not yet supported. To optimize the performance of large file I/O, one can increase the NFS transfer size (rsize and wsize) during mount. By default, the NFS gateway supports 1MB as the maximum transfer size. For a larger data transfer size, one needs to update "nfs.rtmax" and "nfs.wtmax" in hdfs-site.xml.
+
+Allow mounts from unprivileged clients
+--------------------------------------
+
+In environments where root access on client machines is not generally 
available, some measure of security can be obtained by ensuring that only NFS 
clients originating from privileged ports can connect to the NFS server. This 
feature is referred to as "port monitoring." This feature is not enabled by 
default in the HDFS NFS Gateway, but can be optionally enabled by setting the 
following config in hdfs-site.xml on the NFS Gateway machine:
+
+    <property>
+      <name>nfs.port.monitoring.disabled</name>
+      <value>false</value>
+    </property>
+
+User authentication and mapping
+-------------------------------
+
+The NFS gateway in this release uses AUTH\_UNIX style authentication. When the user on the NFS client accesses the mount point, the NFS client passes the UID to the NFS gateway. The NFS gateway does a lookup to find the user name from the UID, and then passes the username to HDFS along with the HDFS requests. For example, if the current user on the NFS client is "admin", when the user accesses the mounted directory, the NFS gateway will access HDFS as user "admin". To access HDFS as the user "hdfs", one needs to switch the current user to "hdfs" on the client system when accessing the mounted directory.
+
+The system administrator must ensure that the user on NFS client host has the 
same name and UID as that on the NFS gateway host. This is usually not a 
problem if the same user management system (e.g., LDAP/NIS) is used to create 
and deploy users on HDFS nodes and NFS client node. In case the user account is 
created manually on different hosts, one might need to modify UID (e.g., do 
"usermod -u 123 myusername") on either NFS client or NFS gateway host in order 
to make it the same on both sides. More technical details of RPC AUTH\_UNIX can 
be found in [RPC specification](http://tools.ietf.org/html/rfc1057).
+
+Optionally, the system administrator can configure a custom static mapping 
file in the event one wishes to access the HDFS NFS Gateway from a system with 
a completely disparate set of UIDs/GIDs. By default this file is located at 
"/etc/nfs.map", but a custom location can be configured by setting the 
"static.id.mapping.file" property to the path of the static mapping file. The 
format of the static mapping file is similar to what is described in the 
exports(5) manual page, but roughly it is:
+
+    # Mapping for clients accessing the NFS gateway
+    uid 10 100 # Map the remote UID 10 to the local UID 100
+    gid 11 101 # Map the remote GID 11 to the local GID 101
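+
+The sketch below is only an illustration of this file format, not the gateway's actual parser; the class name is made up, and it assumes the default "/etc/nfs.map" location described above.
+
+    import java.io.IOException;
+    import java.nio.file.Files;
+    import java.nio.file.Paths;
+    import java.util.HashMap;
+    import java.util.Map;
+
+    // Hypothetical illustration of the static mapping file format shown above;
+    // this is not the NFS gateway's actual parser.
+    public class StaticIdMappingExample {
+      public static void main(String[] args) throws IOException {
+        Map<Integer, Integer> uidMap = new HashMap<>();
+        Map<Integer, Integer> gidMap = new HashMap<>();
+        for (String line : Files.readAllLines(Paths.get("/etc/nfs.map"))) {
+          String trimmed = line.replaceAll("#.*", "").trim(); // drop comments
+          if (trimmed.isEmpty()) {
+            continue;
+          }
+          String[] fields = trimmed.split("\\s+");            // e.g. "uid 10 100"
+          int remote = Integer.parseInt(fields[1]);
+          int local = Integer.parseInt(fields[2]);
+          ("uid".equals(fields[0]) ? uidMap : gidMap).put(remote, local);
+        }
+        System.out.println("uid map: " + uidMap + ", gid map: " + gidMap);
+      }
+    }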

http://git-wip-us.apache.org/repos/asf/hadoop/blob/a45ef2b6/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsPermissionsGuide.md
----------------------------------------------------------------------
diff --git 
a/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsPermissionsGuide.md 
b/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsPermissionsGuide.md
new file mode 100644
index 0000000..450255e
--- /dev/null
+++ b/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsPermissionsGuide.md
@@ -0,0 +1,284 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+HDFS Permissions Guide
+======================
+
+* [HDFS Permissions Guide](#HDFS_Permissions_Guide)
+    * [Overview](#Overview)
+    * [User Identity](#User_Identity)
+    * [Group Mapping](#Group_Mapping)
+    * [Understanding the Implementation](#Understanding_the_Implementation)
+    * [Changes to the File System API](#Changes_to_the_File_System_API)
+    * [Changes to the Application Shell](#Changes_to_the_Application_Shell)
+    * [The Super-User](#The_Super-User)
+    * [The Web Server](#The_Web_Server)
+    * [ACLs (Access Control Lists)](#ACLs_Access_Control_Lists)
+    * [ACLs File System API](#ACLs_File_System_API)
+    * [ACLs Shell Commands](#ACLs_Shell_Commands)
+    * [Configuration Parameters](#Configuration_Parameters)
+
+Overview
+--------
+
+The Hadoop Distributed File System (HDFS) implements a permissions model for 
files and directories that shares much of the POSIX model. Each file and 
directory is associated with an owner and a group. The file or directory has 
separate permissions for the user that is the owner, for other users that are 
members of the group, and for all other users. For files, the r permission is 
required to read the file, and the w permission is required to write or append 
to the file. For directories, the r permission is required to list the contents 
of the directory, the w permission is required to create or delete files or 
directories, and the x permission is required to access a child of the 
directory.
+
+In contrast to the POSIX model, there are no setuid or setgid bits for files as there is no notion of executable files. For directories, there are no setuid or setgid bits as a simplification. The sticky bit can be set on directories, preventing anyone except the superuser, directory owner or file owner from deleting or moving the files within the directory. Setting the sticky bit for a file has no effect. Collectively, the permissions of a file or directory are its mode. In general, Unix customs for representing and displaying modes will be used, including the use of octal numbers in this description. When a file or directory is created, its owner is the user identity of the client process, and its group is the group of the parent directory (the BSD rule).
+
+HDFS also provides optional support for POSIX ACLs (Access Control Lists) to 
augment file permissions with finer-grained rules for specific named users or 
named groups. ACLs are discussed in greater detail later in this document.
+
+Each client process that accesses HDFS has a two-part identity composed of the user name and groups list. Whenever HDFS must do a permissions check for a file or directory foo accessed by a client process,
+
+* If the user name matches the owner of foo, then the owner permissions are tested;
+* Else if the group of foo matches any member of the groups list, then the group permissions are tested;
+* Otherwise the other permissions of foo are tested.
+
+If a permissions check fails, the client operation fails.
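+
+For illustration, the following is a minimal sketch of this check order using the FsPermission and FsAction classes from hadoop-common; it is not the NameNode's actual implementation, and the method and parameter names are hypothetical.
+
+    import java.util.List;
+    import org.apache.hadoop.fs.permission.FsAction;
+    import org.apache.hadoop.fs.permission.FsPermission;
+
+    // Hypothetical sketch of the check order described above (owner, then
+    // group, then other); not the NameNode's actual code.
+    class PermissionCheckSketch {
+      static boolean isPermitted(String user, List<String> groups,
+          String owner, String group, FsPermission mode, FsAction requested) {
+        if (user.equals(owner)) {
+          return mode.getUserAction().implies(requested);
+        } else if (groups.contains(group)) {
+          return mode.getGroupAction().implies(requested);
+        } else {
+          return mode.getOtherAction().implies(requested);
+        }
+      }
+    }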
+
+User Identity
+-------------
+
+As of Hadoop 0.22, Hadoop supports two different modes of operation to 
determine the user's identity, specified by the hadoop.security.authentication 
property:
+
+*   **simple**
+
+    In this mode of operation, the identity of a client process is
+    determined by the host operating system. On Unix-like systems,
+    the user name is the equivalent of \`whoami\`.
+
+*   **kerberos**
+
+    In Kerberized operation, the identity of a client process is
+    determined by its Kerberos credentials. For example, in a
+    Kerberized environment, a user may use the kinit utility to
+    obtain a Kerberos ticket-granting-ticket (TGT) and use klist to
+    determine their current principal. When mapping a Kerberos
+    principal to an HDFS username, all components except for the
+    primary are dropped. For example, a principal
todd/foobar@CORP.COMPANY.COM will act as the simple username todd on HDFS.
+
+Regardless of the mode of operation, the user identity mechanism is extrinsic 
to HDFS itself. There is no provision within HDFS for creating user identities, 
establishing groups, or processing user credentials.
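+
+As a minimal sketch of how a client can observe the identity Hadoop resolves for it (assuming hadoop-common on the classpath; the class name is made up):
+
+    import org.apache.hadoop.conf.Configuration;
+    import org.apache.hadoop.security.UserGroupInformation;
+
+    public class WhoAmIExample {
+      public static void main(String[] args) throws Exception {
+        Configuration conf = new Configuration();
+        // "simple" uses the OS identity; "kerberos" uses Kerberos credentials.
+        conf.set("hadoop.security.authentication", "simple");
+        UserGroupInformation.setConfiguration(conf);
+
+        UserGroupInformation ugi = UserGroupInformation.getCurrentUser();
+        System.out.println("short user name: " + ugi.getShortUserName());
+        System.out.println("groups: " + String.join(",", ugi.getGroupNames()));
+      }
+    }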
+
+Group Mapping
+-------------
+
+Once a username has been determined as described above, the list of groups is determined by a group mapping service, configured by the hadoop.security.group.mapping property. The default implementation, org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback, will determine if the Java Native Interface (JNI) is available. If JNI is available, the implementation will use the API within Hadoop to resolve a list of groups for a user. If JNI is not available, then the shell implementation, org.apache.hadoop.security.ShellBasedUnixGroupsMapping, is used. This implementation shells out with the `bash -c groups` command (for a Linux/Unix environment) or the `net group` command (for a Windows environment) to resolve a list of groups for a user.
+
+An alternate implementation, which connects directly to an LDAP server to 
resolve the list of groups, is available via 
org.apache.hadoop.security.LdapGroupsMapping. However, this provider should 
only be used if the required groups reside exclusively in LDAP, and are not 
materialized on the Unix servers. More information on configuring the group 
mapping service is available in the Javadocs.
+
+For HDFS, the mapping of users to groups is performed on the NameNode. Thus, 
the host system configuration of the NameNode determines the group mappings for 
the users.
+
+Note that HDFS stores the user and group of a file or directory as strings; 
there is no conversion from user and group identity numbers as is conventional 
in Unix.
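+
+A minimal sketch of querying the configured group mapping service from Java follows; it assumes hadoop-common on the classpath, and the username "alice" is purely illustrative.
+
+    import java.util.List;
+    import org.apache.hadoop.conf.Configuration;
+    import org.apache.hadoop.security.Groups;
+
+    public class GroupLookupExample {
+      public static void main(String[] args) throws Exception {
+        Configuration conf = new Configuration();
+        // The default mapping falls back from JNI to the shell implementation.
+        conf.set("hadoop.security.group.mapping",
+            "org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback");
+        List<String> groups =
+            Groups.getUserToGroupsMappingService(conf).getGroups("alice");
+        System.out.println("groups for alice: " + groups);
+      }
+    }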
+
+Understanding the Implementation
+--------------------------------
+
+Each file or directory operation passes the full path name to the name node, 
and the permissions checks are applied along the path for each operation. The 
client framework will implicitly associate the user identity with the 
connection to the name node, reducing the need for changes to the existing 
client API. It has always been the case that when one operation on a file 
succeeds, the operation might fail when repeated because the file, or some 
directory on the path, no longer exists. For instance, when the client first 
begins reading a file, it makes a first request to the name node to discover 
the location of the first blocks of the file. A second request made to find 
additional blocks may fail. On the other hand, deleting a file does not revoke 
access by a client that already knows the blocks of the file. With the addition 
of permissions, a client's access to a file may be withdrawn between requests. 
Again, changing permissions does not revoke the access of a client that already 
 knows the file's blocks.
+
+Changes to the File System API
+------------------------------
+
+All methods that use a path parameter will throw `AccessControlException` if 
permission checking fails.
+
+New methods:
+
+* `public FSDataOutputStream create(Path f, FsPermission permission, boolean 
overwrite, int bufferSize, short replication, long
+  blockSize, Progressable progress) throws IOException;`
+* `public boolean mkdirs(Path f, FsPermission permission) throws IOException;`
+* `public void setPermission(Path p, FsPermission permission) throws 
IOException;`
+* `public void setOwner(Path p, String username, String groupname) throws 
IOException;`
+* `public FileStatus getFileStatus(Path f) throws IOException;` will additionally return the user, group and mode associated with the path.
+
+The mode of a new file or directory is restricted by the umask set as a configuration parameter. When the existing `create(path, …)` method (without the permission parameter) is used, the mode of the new file is `0666 & ^umask`. When the new `create(path, permission, …)` method (with the permission parameter P) is used, the mode of the new file is `P & ^umask & 0666`. When a new directory is created with the existing `mkdirs(path)` method (without the permission parameter), the mode of the new directory is `0777 & ^umask`. When the new `mkdirs(path, permission)` method (with the permission parameter P) is used, the mode of the new directory is `P & ^umask & 0777`.
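+
+A short usage sketch of these methods follows, assuming the default umask of 022; the paths, replication factor, block size and user/group names are illustrative only, and setOwner requires super-user privileges.
+
+    import org.apache.hadoop.conf.Configuration;
+    import org.apache.hadoop.fs.FileSystem;
+    import org.apache.hadoop.fs.Path;
+    import org.apache.hadoop.fs.permission.FsPermission;
+
+    public class PermissionApiExample {
+      public static void main(String[] args) throws Exception {
+        Configuration conf = new Configuration();
+        FileSystem fs = FileSystem.get(conf);
+
+        // mkdirs(path, P): resulting mode is P & ^umask & 0777.
+        // With P = 0755 and umask = 022 this stays 0755.
+        fs.mkdirs(new Path("/tmp/perm-demo"), new FsPermission((short) 0755));
+
+        // create(path, P, ...): resulting mode is P & ^umask & 0666.
+        // With P = 0666 and umask = 022 the file ends up as 0644.
+        fs.create(new Path("/tmp/perm-demo/file"),
+            new FsPermission((short) 0666), true, 4096, (short) 3,
+            128 * 1024 * 1024L, null).close();
+
+        // Changing the owner requires the super-user.
+        fs.setOwner(new Path("/tmp/perm-demo/file"), "alice", "analysts");
+        System.out.println(fs.getFileStatus(new Path("/tmp/perm-demo/file"))
+            .getPermission()); // e.g. rw-r--r--
+      }
+    }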
+
+Changes to the Application Shell
+--------------------------------
+
+New operations:
+
+*   `chmod [-R] mode file ...`
+
+    Only the owner of a file or the super-user is permitted to change the mode 
of a file.
+
+*   `chgrp [-R] group file ...`
+
+    The user invoking chgrp must belong to the specified group and be
+    the owner of the file, or be the super-user.
+
+*   `chown [-R] [owner][:[group]] file ...`
+
+    The owner of a file may only be altered by a super-user.
+
+*   `ls file ...`
+
+*   `lsr file ...`
+
+    The output is reformatted to display the owner, group and mode.
+
+The Super-User
+--------------
+
+The super-user is the user with the same identity as the name node process itself. Loosely, if you started the name node, then you are the super-user. The super-user can do anything, in that permissions checks never fail for the super-user. There is no persistent notion of who was the super-user; when the name node is started, the process identity determines who is the super-user for now. The HDFS super-user does not have to be the super-user of the name node host, nor is it necessary that all clusters have the same super-user. Also, an experimenter running HDFS on a personal workstation conveniently becomes that installation's super-user without any configuration.
+
+In addition, the administrator may identify a distinguished group using a configuration parameter. If set, members of this group are also super-users.
+
+The Web Server
+--------------
+
+By default, the identity of the web server is a configuration parameter. That 
is, the name node has no notion of the identity of the real user, but the web 
server behaves as if it has the identity (user and groups) of a user chosen by 
the administrator. Unless the chosen identity matches the super-user, parts of 
the name space may be inaccessible to the web server.
+
+ACLs (Access Control Lists)
+---------------------------
+
+In addition to the traditional POSIX permissions model, HDFS also supports 
POSIX ACLs (Access Control Lists). ACLs are useful for implementing permission 
requirements that differ from the natural organizational hierarchy of users and 
groups. An ACL provides a way to set different permissions for specific named 
users or named groups, not only the file's owner and the file's group.
+
+By default, support for ACLs is disabled, and the NameNode disallows creation 
of ACLs. To enable support for ACLs, set `dfs.namenode.acls.enabled` to true in 
the NameNode configuration.
+
+An ACL consists of a set of ACL entries. Each ACL entry names a specific user 
or group and grants or denies read, write and execute permissions for that 
specific user or group. For example:
+
+       user::rw-
+       user:bruce:rwx                  #effective:r--
+       group::r-x                      #effective:r--
+       group:sales:rwx                 #effective:r--
+       mask::r--
+       other::r--
+
+ACL entries consist of a type, an optional name and a permission string. For 
display purposes, ':' is used as the delimiter between each field. In this 
example ACL, the file owner has read-write access, the file group has 
read-execute access and others have read access. So far, this is equivalent to 
setting the file's permission bits to 654.
+
+Additionally, there are 2 extended ACL entries for the named user bruce and 
the named group sales, both granted full access. The mask is a special ACL 
entry that filters the permissions granted to all named user entries and named 
group entries, and also the unnamed group entry. In the example, the mask has 
only read permissions, and we can see that the effective permissions of several 
ACL entries have been filtered accordingly.
+
+Every ACL must have a mask. If the user doesn't supply a mask while setting an 
ACL, then a mask is inserted automatically by calculating the union of 
permissions on all entries that would be filtered by the mask.
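+
+In bitwise terms, the automatically calculated mask is the union (OR) of the permissions of the entries it filters, and each filtered entry's effective permission is its own permission ANDed with the mask. A minimal sketch using FsAction, with values taken from the example ACL above:
+
+    import org.apache.hadoop.fs.permission.FsAction;
+
+    public class AclMaskExample {
+      public static void main(String[] args) {
+        // Entries filtered by the mask: named user, unnamed group, named group.
+        FsAction namedUserBruce  = FsAction.ALL;          // user:bruce:rwx
+        FsAction unnamedGroup    = FsAction.READ_EXECUTE; // group::r-x
+        FsAction namedGroupSales = FsAction.ALL;          // group:sales:rwx
+
+        // Automatic mask = union of the filtered entries' permissions.
+        FsAction autoMask =
+            namedUserBruce.or(unnamedGroup).or(namedGroupSales); // rwx
+
+        // With an explicit mask of r--, effective permissions are ANDed with it.
+        FsAction mask = FsAction.READ;
+        System.out.println("effective user:bruce = " + namedUserBruce.and(mask));
+        System.out.println("effective group::    = " + unnamedGroup.and(mask));
+        System.out.println("auto-calculated mask = " + autoMask);
+      }
+    }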
+
+Running `chmod` on a file that has an ACL actually changes the permissions of 
the mask. Since the mask acts as a filter, this effectively constrains the 
permissions of all extended ACL entries instead of changing just the group 
entry and possibly missing other extended ACL entries.
+
+The model also differentiates between an "access ACL", which defines the rules 
to enforce during permission checks, and a "default ACL", which defines the ACL 
entries that new child files or sub-directories receive automatically during 
creation. For example:
+
+       user::rwx
+       group::r-x
+       other::r-x
+       default:user::rwx
+       default:user:bruce:rwx          #effective:r-x
+       default:group::r-x
+       default:group:sales:rwx         #effective:r-x
+       default:mask::r-x
+       default:other::r-x
+
+Only directories may have a default ACL. When a new file or sub-directory is 
created, it automatically copies the default ACL of its parent into its own 
access ACL. A new sub-directory also copies it to its own default ACL. In this 
way, the default ACL will be copied down through arbitrarily deep levels of the 
file system tree as new sub-directories get created.
+
+The exact permission values in the new child's access ACL are subject to 
filtering by the mode parameter. Considering the default umask of 022, this is 
typically 755 for new directories and 644 for new files. The mode parameter 
filters the copied permission values for the unnamed user (file owner), the 
mask and other. Using this particular example ACL, and creating a new 
sub-directory with 755 for the mode, this mode filtering has no effect on the 
final result. However, if we consider creation of a file with 644 for the mode, 
then mode filtering causes the new file's ACL to receive read-write for the 
unnamed user (file owner), read for the mask and read for others. This mask 
also means that effective permissions for named user bruce and named group 
sales are only read.
+
+Note that the copy occurs at time of creation of the new file or 
sub-directory. Subsequent changes to the parent's default ACL do not change 
existing children.
+
+The default ACL must have all minimum required ACL entries, including the unnamed user (file owner), unnamed group (file group) and other entries. If the user doesn't supply one of these entries while setting a default ACL, then the entries are inserted automatically by copying the corresponding permissions from the access ACL, or permission bits if there is no access ACL. The default ACL must also have a mask. As described above, if the mask is unspecified, then a mask is inserted automatically by calculating the union of permissions on all entries that would be filtered by the mask.
+
+When considering a file that has an ACL, the algorithm for permission checks changes to the following (a sketch of this ordering appears after the list):
+
+* If the user name matches the owner of file, then the owner
+  permissions are tested;
+
+* Else if the user name matches the name in one of the named user entries,
+  then these permissions are tested, filtered by the mask permissions;
+
+* Else if the group of file matches any member of the groups list,
+  and if these permissions filtered by the mask grant access, then these
+  permissions are used;
+
+* Else if there is a named group entry matching a member of the groups list,
+  and if these permissions filtered by the mask grant access, then these
+  permissions are used;
+
+* Else if the file group or any named group entry matches a member of the
+  groups list, but access was not granted by any of those permissions, then
+  access is denied;
+
+* Otherwise the other permissions of file are tested.
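+
+For illustration, the sketch below mirrors this ordering; it is not the NameNode's actual implementation, and the method signature and data structures are hypothetical.
+
+    import java.util.List;
+    import java.util.Map;
+    import org.apache.hadoop.fs.permission.FsAction;
+
+    // Hypothetical sketch of the ACL check order above; not the NameNode's code.
+    class AclCheckSketch {
+      static boolean isPermitted(String user, List<String> groups, String owner,
+          String fileGroup, FsAction ownerPerm, FsAction unnamedGroupPerm,
+          FsAction otherPerm, Map<String, FsAction> namedUsers,
+          Map<String, FsAction> namedGroups, FsAction mask, FsAction requested) {
+        if (user.equals(owner)) {
+          return ownerPerm.implies(requested);               // owner entry, unmasked
+        }
+        if (namedUsers.containsKey(user)) {
+          return namedUsers.get(user).and(mask).implies(requested);
+        }
+        boolean inAnyGroupEntry = false;
+        if (groups.contains(fileGroup)) {                    // unnamed group entry
+          inAnyGroupEntry = true;
+          if (unnamedGroupPerm.and(mask).implies(requested)) {
+            return true;
+          }
+        }
+        for (Map.Entry<String, FsAction> e : namedGroups.entrySet()) {
+          if (groups.contains(e.getKey())) {                 // named group entries
+            inAnyGroupEntry = true;
+            if (e.getValue().and(mask).implies(requested)) {
+              return true;
+            }
+          }
+        }
+        if (inAnyGroupEntry) {
+          return false;   // matched a group entry, but none granted access
+        }
+        return otherPerm.implies(requested);                 // other entry
+      }
+    }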
+
+Best practice is to rely on traditional permission bits to implement most 
permission requirements, and define a smaller number of ACLs to augment the 
permission bits with a few exceptional rules. A file with an ACL incurs an 
additional cost in memory in the NameNode compared to a file that has only 
permission bits.
+
+ACLs File System API
+--------------------
+
+New methods (a usage sketch follows the list):
+
+* `public void modifyAclEntries(Path path, List<AclEntry> aclSpec) throws 
IOException;`
+* `public void removeAclEntries(Path path, List<AclEntry> aclSpec) throws 
IOException;`
+* `public void removeDefaultAcl(Path path) throws IOException;`
+* `public void removeAcl(Path path) throws IOException;`
+* `public void setAcl(Path path, List<AclEntry> aclSpec) throws IOException;`
+* `public AclStatus getAclStatus(Path path) throws IOException;`
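+
+A short usage sketch of this API follows; it assumes `dfs.namenode.acls.enabled` is set to true as described above, and the path and user name are illustrative only.
+
+    import java.util.Arrays;
+    import org.apache.hadoop.conf.Configuration;
+    import org.apache.hadoop.fs.FileSystem;
+    import org.apache.hadoop.fs.Path;
+    import org.apache.hadoop.fs.permission.AclEntry;
+    import org.apache.hadoop.fs.permission.AclEntryScope;
+    import org.apache.hadoop.fs.permission.AclEntryType;
+    import org.apache.hadoop.fs.permission.FsAction;
+
+    public class AclApiExample {
+      public static void main(String[] args) throws Exception {
+        FileSystem fs = FileSystem.get(new Configuration());
+        Path path = new Path("/tmp/acl-demo");
+
+        // Grant the named user "bruce" read-write access in the access ACL.
+        AclEntry bruce = new AclEntry.Builder()
+            .setScope(AclEntryScope.ACCESS)
+            .setType(AclEntryType.USER)
+            .setName("bruce")
+            .setPermission(FsAction.READ_WRITE)
+            .build();
+        fs.modifyAclEntries(path, Arrays.asList(bruce));
+
+        // Read the ACL back; files with an ACL also show '+' in "hdfs dfs -ls".
+        System.out.println(fs.getAclStatus(path));
+      }
+    }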
+
+ACLs Shell Commands
+-------------------
+
+*   `hdfs dfs -getfacl [-R] <path>`
+
+    Displays the Access Control Lists (ACLs) of files and directories. If a
+    directory has a default ACL, then getfacl also displays the default ACL.
+
+*   `hdfs dfs -setfacl [-R] [-b|-k -m|-x <acl_spec> <path>]|[--set <acl_spec> <path>]`
+
+    Sets Access Control Lists (ACLs) of files and directories.
+
+*   `hdfs dfs -ls <args>`
+
+    The output of `ls` will append a '+' character to the permissions
+    string of any file or directory that has an ACL.
+
+    See the [File System Shell](../hadoop-common/FileSystemShell.html)
+    documentation for full coverage of these commands.
+
+Configuration Parameters
+------------------------
+
+*   `dfs.permissions.enabled = true`
+
+    If yes, use the permissions system as described here. If no,
+    permission checking is turned off, but all other behavior is
+    unchanged. Switching from one parameter value to the other does not
+    change the mode, owner or group of files or directories.
+    Regardless of whether permissions are on or off, chmod, chgrp, chown and
+    setfacl always check permissions. These functions are only useful in
+    the permissions context, and so there is no backwards compatibility
+    issue. Furthermore, this allows administrators to reliably set owners and 
permissions
+    in advance of turning on regular permissions checking.
+
+*   `dfs.web.ugi = webuser,webgroup`
+
+    The user name to be used by the web server. Setting this to the
+    name of the super-user allows any web client to see everything.
+    Changing this to an otherwise unused identity allows web clients to
+    see only those things visible using "other" permissions. Additional
+    groups may be added to the comma-separated list.
+
+*   `dfs.permissions.superusergroup = supergroup`
+
+    The name of the group of super-users.
+
+*   `fs.permissions.umask-mode = 0022`
+
+    The umask used when creating files and directories. For
+    configuration files, the decimal value 18 may be used.
+
+*   `dfs.cluster.administrators = ACL-for-admins`
+
+    The administrators for the cluster specified as an ACL. This
+    controls who can access the default servlets, etc., in HDFS.
+
+*   `dfs.namenode.acls.enabled = true`
+
+    Set to true to enable support for HDFS ACLs (Access Control Lists). By
+    default, ACLs are disabled. When ACLs are disabled, the NameNode rejects
+    all attempts to set an ACL.
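+
+These parameters are normally set in hdfs-site.xml. As a minimal illustrative sketch, the same keys can also be set programmatically on a Configuration object (for example, in a test):
+
+    import org.apache.hadoop.conf.Configuration;
+
+    public class HdfsPermissionConfExample {
+      public static void main(String[] args) {
+        Configuration conf = new Configuration();
+        conf.setBoolean("dfs.permissions.enabled", true);   // enforce permission checks
+        conf.setBoolean("dfs.namenode.acls.enabled", true); // allow setting ACLs
+        conf.set("dfs.permissions.superusergroup", "supergroup");
+        conf.set("fs.permissions.umask-mode", "022");
+        System.out.println(conf.get("fs.permissions.umask-mode"));
+      }
+    }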
+
+
