[GitHub] spark pull request #17834: [SPARK-7481] [build] Add spark-hadoop-cloud modul...

srowen Fri, 05 May 2017 04:22:45 -0700

Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17834#discussion_r114975135
  
    --- Diff: docs/cloud-integration.md ---
    @@ -0,0 +1,203 @@
    +---
    +layout: global
    +displayTitle: Integration with Cloud Infrastructures
    +title: Integration with Cloud Infrastructures
    +description: Introduction to cloud storage support in Apache Spark SPARK_VERSION_SHORT
    +---
    +<!---
    +  Licensed under the Apache License, Version 2.0 (the "License");
    +  you may not use this file except in compliance with the License.
    +  You may obtain a copy of the License at
    +
    +   http://www.apache.org/licenses/LICENSE-2.0
    +
    +  Unless required by applicable law or agreed to in writing, software
    +  distributed under the License is distributed on an "AS IS" BASIS,
    +  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +  See the License for the specific language governing permissions and
    +  limitations under the License. See accompanying LICENSE file.
    +-->
    +
    +* This will become a table of contents (this text will be scraped).
    +{:toc}
    +
    +## Introduction
    +
    +
    +All major cloud providers offer persistent data storage in *object stores*.
    +These are not classic "POSIX" file systems.
    +In order to store hundreds of petabytes of data without any single points of failure,
    +object stores replace the classic filesystem directory tree
    +with a simpler model of `object-name => data`. To enable remote access, operations
    +on objects are usually offered as (slow) HTTP REST operations.
    +
    +Spark can read and write data in object stores through filesystem connectors implemented
    +in Hadoop or provided by the infrastructure suppliers themselves.
    +These connectors make the object stores look *almost* like filesystems, with directories and files
    +and the classic operations on them such as list, delete and rename.
    +
    +
    +### Important: Cloud Object Stores are Not Real Filesystems
    +
    +While the stores appear to be filesystems, underneath
    +they are still object stores, [and the difference is significant](http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/filesystem/introduction.html)
    +
    +They cannot be used as a direct replacement for a cluster filesystem such as HDFS
    +*except where this is explicitly stated*.
    +
    +Key differences are
    --- End diff --
    
    Nit: I'd end the line with a colon to make it clear it's not dangling



