[GitHub] [hadoop] busbey commented on a change in pull request #575: HADOOP-13327 Output Stream Specification

GitBox Mon, 11 Mar 2019 09:20:08 -0700

busbey commented on a change in pull request #575: HADOOP-13327 Output Stream 
Specification
URL: https://github.com/apache/hadoop/pull/575#discussion_r264270980


 ##########
 File path: 
hadoop-common-project/hadoop-common/src/site/markdown/filesystem/outputstream.md
 ##########
 @@ -0,0 +1,857 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+<!-- MACRO{toc|fromDepth=1|toDepth=3} -->
+
+# Output: `OutputStream`, `Syncable` and `StreamCapabilities`
+
+
+With the exception of `FileSystem.copyFromLocalFile()`,
+all API operations which write data to a filesystem in Hadoop do so
+through the Java "OutputStreams" API. More specifically, they do
+so through `OutputStream` subclasses obtained through calls to
+`FileSystem.create()`, `FileSystem.append()`,
+or `FSDataOutputStreamBuilder.build()`.
+
+These all return instances of `FSDataOutputStream`, through which data
+can be written through various `write()` methods.
+After a stream's `close()` method is called, all data written to the
+stream MUST BE persisted to the fileysystem and visible to oll other
+clients attempting to read data from that path via `FileSystem.open()`.
+
+As well as operations to write the data, Hadoop's Output Streams
+provide methods to flush buffered data back to the filesystem,
+so as to ensure that the data is reliably persisted and/or visible
+to other callers. This is done via the `Syncable` interface. It was
+originally intended that the presence of this interface could be interpreted
+as a guarantee that the stream supported it's methods, but this has proven
+impossible to guarantee as the static nature of the interface is incompatible
+with filesystems whose syncability semantics may vary on a store/path basis.
+As an example, erasure coded files in HDFS have
+A new interface, `StreamCapabilities` has been implemented to allow callers to 
probe the exact capabilities of a stream, even transitively
+through a chain of streams.
+
+* HDFS's primary stream implementation is
 
 Review comment:
   why are we talking about these here? Are they meant to be examples of things 
that do or do not implement `StreamCapabilities`? If so, specifically call out 
that DFSOutputStream implements `StreamCapabilities` and through it claims to 
support sync/flush.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org

[GitHub] [hadoop] busbey commented on a change in pull request #575: HADOOP-13327 Output Stream Specification

Reply via email to