coryan commented on a change in pull request #11842:
URL: https://github.com/apache/arrow/pull/11842#discussion_r764261730
##########
File path: cpp/src/arrow/filesystem/gcsfs.h
##########
@@ -40,24 +40,43 @@ struct ARROW_EXPORT GcsOptions {
   bool Equals(const GcsOptions& other) const;
 };
+/// - TODO(ARROW-1231) - review this documentation before closing the bug.
 /// \brief GCS-backed FileSystem implementation.
 ///
-/// Some implementation notes:
-/// - TODO(ARROW-1231) - review all the notes once completed.
-/// - buckets are treated as top-level directories on a "root".
-/// - GCS buckets are in a global namespace, only one bucket
-/// named `foo` exists in Google Cloud.
-/// - Creating new top-level directories is implemented by creating
-/// a bucket, this may be a slower operation than usual.
-/// - A principal (service account, user, etc) can only list the
-/// buckets for a single project, but can access the buckets
-/// for many projects. It is possible that listing "all"
-/// the buckets returns fewer buckets than you have access to.
-/// - GCS does not have directories, they are emulated in this
-/// library by listing objects with a common prefix.
-/// - In general, GCS has much higher latency than local filesystems.
-/// The throughput of GCS is comparable to the throughput of
-/// a local file system.
+/// GCS (Google Cloud Storage - https://cloud.google.com/storage) is a scalable object
+/// storage system for any amount of data. The main abstractions in GCS are buckets and
+/// objects. A bucket is a namespace for objects. Buckets can store any number of objects;
+/// tens of millions or even billions are not uncommon. Each object contains a single
+/// blob of data, up to 5TiB in size. Buckets are typically configured to keep a single
+/// version of each object, but versioning can be enabled. Versioning is important because
+/// objects are immutable: once an object is created, one cannot append data to it or
+/// modify its data in any way.
+///
+/// GCS buckets are in a global namespace: if a Google Cloud customer creates a bucket
+/// named `foo`, no other customer can create a bucket with the same name. Note that a
+/// principal (a user or service account) may only list the buckets they are entitled to,
+/// and then only within a project. It is not possible to list "all" the buckets.
+///
+/// Within each bucket, objects are in a flat namespace. GCS does not have folders or
+/// directories. However, by following some conventions it is possible to emulate
+/// directories. To this end:
+///
+/// - All buckets are treated as directories at the "root".
+/// - Creating a root directory results in a new bucket being created; this may be slower
+/// than most GCS operations.
+/// - Any object with a name ending with a slash (`/`) character is treated as a
+/// directory.
+/// - The class creates marker objects for a directory, using a trailing slash in the
+/// marker names. For debugging purposes, the metadata and contents of these marker
+/// objects indicate that they are markers created by this class. The class does

Review comment:
Ahhhhh. No, the UI does not create any metadata, but it also ignores it. I think the metadata is harmless and useful for debugging.
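The directory-emulation conventions in the doc comment above can be illustrated with a minimal C++17 sketch. It does not use Arrow's GCS filesystem or any GCS client library: it models one bucket as a sorted in-memory map from object names to objects, and every name in it (`Object`, `Bucket`, `SplitPath`, `IsDirectory`, `ListChildren`, `CreateDirMarker`, and the `x-hypothetical-dir-marker` metadata key) is hypothetical. A path's first component is taken as the bucket, a "directory" exists if a trailing-slash marker or any object with that prefix exists, and listing returns immediate children only.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical in-memory model of a single GCS object.
struct Object {
  std::string contents;
  std::map<std::string, std::string> metadata;
};

// Hypothetical model of one bucket: a flat, sorted namespace of object names.
using Bucket = std::map<std::string, Object>;

static bool StartsWith(const std::string& s, const std::string& prefix) {
  return s.compare(0, prefix.size(), prefix) == 0;
}

// Split "bucket/path/to/object" into {"bucket", "path/to/object"}: the first
// component is treated as a bucket, i.e. a directory at the "root".
std::pair<std::string, std::string> SplitPath(const std::string& path) {
  auto pos = path.find('/');
  if (pos == std::string::npos) return {path, ""};
  return {path.substr(0, pos), path.substr(pos + 1)};
}

// A directory "dir" exists if a marker object "dir/" exists or if any object
// name starts with "dir/". Both cases are found by one sorted lookup, because
// all names sharing a prefix form a contiguous range in the map.
bool IsDirectory(const Bucket& bucket, const std::string& dir) {
  if (dir.empty()) return true;  // the bucket itself
  const std::string prefix = dir + "/";
  auto it = bucket.lower_bound(prefix);
  return it != bucket.end() && StartsWith(it->first, prefix);
}

// Emulate a delimiter-based listing: return the immediate children of `dir`.
// Child names ending in '/' denote sub-directories.
std::set<std::string> ListChildren(const Bucket& bucket, const std::string& dir) {
  const std::string prefix = dir.empty() ? "" : dir + "/";
  std::set<std::string> children;
  for (auto it = bucket.lower_bound(prefix); it != bucket.end(); ++it) {
    if (!StartsWith(it->first, prefix)) break;  // left the prefix range
    std::string rest = it->first.substr(prefix.size());
    if (rest.empty()) continue;  // skip the directory's own marker object
    auto slash = rest.find('/');
    children.insert(slash == std::string::npos ? rest : rest.substr(0, slash + 1));
  }
  return children;
}

// Create a marker object "dir/" so that an empty directory is still visible.
// The metadata key is hypothetical and only serves as a debugging hint.
void CreateDirMarker(Bucket& bucket, const std::string& dir) {
  assert(!dir.empty() && "the bucket root needs no marker object");
  Object marker;
  marker.metadata["x-hypothetical-dir-marker"] = "true";
  bucket[dir + "/"] = marker;
}
```

For example, with objects `a/b/c.parquet` and `a/d.csv` plus a marker for `a/empty`, `ListChildren(bucket, "a")` yields `b/`, `d.csv`, and `empty/`, while `IsDirectory(bucket, "a/d.csv")` is false. Against the real service this client-side filtering is usually unnecessary: the GCS objects list request accepts `prefix` and `delimiter` parameters and returns the common prefixes directly.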