coryan commented on a change in pull request #11842:
URL: https://github.com/apache/arrow/pull/11842#discussion_r764261730



##########
File path: cpp/src/arrow/filesystem/gcsfs.h
##########
@@ -40,24 +40,43 @@ struct ARROW_EXPORT GcsOptions {
   bool Equals(const GcsOptions& other) const;
 };
 
+/// - TODO(ARROW-1231) - review this documentation before closing the bug.
 /// \brief GCS-backed FileSystem implementation.
 ///
-/// Some implementation notes:
-/// - TODO(ARROW-1231) - review all the notes once completed.
-/// - buckets are treated as top-level directories on a "root".
-/// - GCS buckets are in a global namespace, only one bucket
-///   named `foo` exists in Google Cloud.
-/// - Creating new top-level directories is implemented by creating
-///   a bucket, this may be a slower operation than usual.
-/// - A principal (service account, user, etc) can only list the
-///   buckets for a single project, but can access the buckets
-///   for many projects. It is possible that listing "all"
-///   the buckets returns fewer buckets than you have access to.
-/// - GCS does not have directories, they are emulated in this
-///   library by listing objects with a common prefix.
-/// - In general, GCS has much higher latency than local filesystems.
-///   The throughput of GCS is comparable to the throughput of
-///   a local file system.
+/// GCS (Google Cloud Storage - https://cloud.google.com/storage) is a scalable object
+/// storage system for any amount of data. The main abstractions in GCS are buckets and
+/// objects. A bucket is a namespace for objects; buckets can store any number of objects,
+/// and tens of millions or even billions are not uncommon.  Each object contains a single
+/// blob of data, up to 5TiB in size.  Buckets are typically configured to keep a single
+/// version of each object, but versioning can be enabled. Versioning is important because
+/// objects are immutable: once created, one cannot append data to the object or modify the
+/// object data in any way.
+///
+/// GCS buckets are in a global namespace: if a Google Cloud customer creates a bucket
+/// named `foo`, no other customer can create a bucket with the same name. Note that a
+/// principal (a user or service account) may only list the buckets they are entitled to,
+/// and then only within a project. It is not possible to list "all" the buckets.
+///
+/// Within each bucket, objects are in a flat namespace. GCS does not have folders or
+/// directories. However, by following some conventions it is possible to emulate
+/// directories. To this end, this class:
+///
+/// - All buckets are treated as directories at the "root".
+/// - Creating a root directory results in a new bucket being created; this may be slower
+///   than most GCS operations.
+/// - Any object with a name ending with a slash (`/`) character is treated as a
+///   directory.
+/// - The class creates marker objects for a directory, using a trailing slash in the
+///   marker names. For debugging purposes, the metadata and contents of these marker
+///   objects indicate that they are markers created by this class. The class does

Review comment:
       Ahhhhh.  No, the UI does not create any metadata, but it ignores it too.  I think the metadata is harmless and useful for debugging.
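The directory emulation described in the doc comment (buckets as root directories, a trailing `/` marking a directory, children found by name prefix in a flat namespace) can be illustrated with a minimal sketch. It assumes only the conventions stated in the comment; the names `GcsLocation`, `SplitPath`, `IsDirectoryMarker`, `DirectoryMarkerName`, and `ListChildren` are hypothetical and are not taken from Arrow's `gcsfs` code, and `ListChildren` filters an in-memory list of object names instead of calling GCS.

```cpp
#include <set>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helpers illustrating the conventions in the doc comment;
// these are NOT Arrow's actual gcsfs internals.

// A path as seen through the emulated filesystem: the first component is
// the bucket (a "root" directory), the rest is the object name.
struct GcsLocation {
  std::string bucket;
  std::string object;  // empty when the path names the bucket itself
};

// Splits "bucket/dir/file" into {"bucket", "dir/file"}.
GcsLocation SplitPath(std::string_view path) {
  // Tolerate a leading slash, e.g. "/bucket/dir".
  if (!path.empty() && path.front() == '/') path.remove_prefix(1);
  auto const pos = path.find('/');
  if (pos == std::string_view::npos) return {std::string(path), std::string()};
  return {std::string(path.substr(0, pos)), std::string(path.substr(pos + 1))};
}

// Convention from the doc comment: an object whose name ends in '/' is
// treated as a directory (marker).
bool IsDirectoryMarker(std::string_view object_name) {
  return !object_name.empty() && object_name.back() == '/';
}

// Marker name for a directory, e.g. "dir/sub" -> "dir/sub/".
// The bucket root (empty name) has no marker, so "" stays "".
std::string DirectoryMarkerName(std::string_view dir) {
  std::string name(dir);
  if (!name.empty() && name.back() != '/') name.push_back('/');
  return name;
}

// Emulates listing one "directory" over a flat namespace: keep the names
// under the directory prefix and collapse anything below the next '/' into
// a single subdirectory entry.
std::vector<std::string> ListChildren(const std::vector<std::string>& objects,
                                      std::string_view dir) {
  std::string const prefix = DirectoryMarkerName(dir);
  std::set<std::string> children;  // de-duplicates and sorts the entries
  for (auto const& name : objects) {
    // Skips names shorter than the prefix and the marker "dir/" itself.
    if (name.size() <= prefix.size()) continue;
    if (name.compare(0, prefix.size(), prefix) != 0) continue;
    std::string_view const rest = std::string_view(name).substr(prefix.size());
    auto const slash = rest.find('/');
    // Under "a/": "a/x" yields "a/x", while "a/b/c" yields the subdirectory "a/b/".
    std::string_view const child =
        slash == std::string_view::npos ? rest : rest.substr(0, slash + 1);
    children.insert(prefix + std::string(child));
  }
  return {children.begin(), children.end()};
}
```

With objects `a/`, `a/x`, `a/b/`, `a/b/c`, and `z` in a bucket, listing directory `a` would yield `a/b/` and `a/x`: the marker `a/` is hidden and the deeper object `a/b/c` is folded into its subdirectory. Real listings would more likely push this work to the server through a prefix and delimiter request rather than filtering every name on the client.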




-- 
This is an automated message from the Apache Git Service.