> ...then I still expect the directory /foo to exist

Right, but if that is the sole purpose of empty directory markers, I'm curious
whether there was an attempt at keeping track of the prefixes/directories locally.



# ------------------------------

# Aldrin


https://github.com/drin/

https://gitlab.com/octalene

https://keybase.io/octalene


On Friday, July 12th, 2024 at 19:44, Hyunseok Seo <hsseo0...@gmail.com> wrote:

> I wonder why S3 (object storage) operates based on file system semantics.
> Python users are usually data scientists. They might not be familiar with
> the differences between object storage and file storage. Furthermore, I
> think there are a lot of pyarrow users.
> 

> > Avoiding file by file operations so that internal functions can batch as
> > much as possible.
> 

> Thank you for the detailed explanation. So, are you suggesting that a more
> fundamental solution is needed rather than just adding options? I thought
> supporting such options would help users who do not want markers, despite
> the issues you mentioned. Furthermore, I agree that supporting ObjectStore
> is necessary for a more fundamental solution.
> 

> Thank you.
> 

> On Sat, Jul 13, 2024 at 10:00 AM Weston Pace weston.p...@gmail.com wrote:
> 

> > > I think my question is still relevant: no matter what semantics
> > > `S3FileSystem` is trying to provide, I'm still not sure how the placeholder
> > > object helps. I assume it's for listing objects, but what else?
> > 

> > If I have a local filesystem and I delete a file /foo/bar then I still
> > expect the directory /foo to exist.
> > 

> > ```
> > mkdir /foo
> > touch /foo/bar
> > rm /foo/bar
> > ls /  # should show /foo
> > ```
> > 

> > In an object store there is no `mkdir`, and if I remove /foo/bar there is
> > no guarantee that /foo will still exist.
> > 
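The difference can be sketched with a toy model. The Python below is a hypothetical illustration, not Arrow's `S3FileSystem` code: it treats an object store as a flat map of keys, so "directories" exist only as key prefixes, and an empty object whose key ends in `/` plays the role of the directory marker.

```python
# Hypothetical in-memory model of an object store: a flat mapping from keys
# to bytes. There are no real directories; they are implied by key prefixes.


class FlatObjectStore:
    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes = b"") -> None:
        self.objects[key] = data

    def delete(self, key: str) -> None:
        del self.objects[key]

    def list_top_level(self) -> set[str]:
        """Return the names that would appear under the root, like `ls /`."""
        names = set()
        for key in self.objects:
            # "foo/bar" -> "foo"; marker "foo/" -> "foo"; "baz" -> "baz"
            names.add(key.split("/", 1)[0])
        return names


def simulate(create_marker_on_delete: bool) -> set[str]:
    store = FlatObjectStore()
    # There is no mkdir: "foo" only exists implicitly once foo/bar is written.
    store.put("foo/bar", b"data")
    store.delete("foo/bar")
    if create_marker_on_delete:
        # An empty object whose key ends in "/" acts as a directory marker.
        store.put("foo/")
    return store.list_top_level()


print(simulate(create_marker_on_delete=False))  # set()
print(simulate(create_marker_on_delete=True))   # {'foo'}
```

Without the marker, the `ls /` equivalent comes back empty as soon as `foo/bar` is gone.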

> > On Fri, Jul 12, 2024, 2:50 PM Aldrin octalene....@pm.me.invalid wrote:
> > 

> > > But I think the issue being addressed [1] is essentially, "`delete_file`
> > > shouldn't create additional files/directories in S3."
> > > 

> > > I think discussion about the semantics at large is interesting but may be
> > > a digression? Also, I think the discussion spans varying degrees of
> > > "filesystem semantics" (the naming system and hierarchical inode structure
> > > vs. atomicity of read/write operations).
> > > 

> > > I think my question is still relevant: no matter what semantics
> > > `S3FileSystem` is trying to provide, I'm still not sure how the placeholder
> > > object helps. I assume it's for listing objects, but what else?
> > > 

> > > On Friday, July 12th, 2024 at 14:26, Raphael Taylor-Davies
> > > r.taylordav...@googlemail.com.INVALID wrote:
> > > 

> > > > > Many people are familiar with object stores these days. You could
> > > > > create a new abstraction `ObjectStore` which is very similar to
> > > > > `FileSystem` except the semantics are object store semantics and not
> > > > > filesystem semantics.
> > > 

> > > > FWIW in the Arrow Rust ecosystem we only provide an object store
> > > > abstraction, and this has served us very well. My 2 cents is that object
> > > > store semantics are sufficient for, if not superior [1] to, filesystem-based
> > > > interfaces for the vast majority of use cases. The few workloads that
> > > > aren't sufficiently served require such close integration with often
> > > > OS-specific filesystem APIs and behaviours as to make building a coherent
> > > > abstraction extremely difficult.
> > > 

> > > > Iceberg also took a similar approach with its File IO abstraction [2].
> > > 

> > > > [1]: https://docs.rs/object_store/latest/object_store/#why-not-a-filesystem-interface

> > > > On 12/07/2024 22:05, Weston Pace wrote:
> > > 

> > > > > > The markers are necessary to offer file system semantics on top of
> > > > > > object stores. You will get a ton of subtle bugs otherwise.
> > > > > > Yes, object stores and filesystems are different. If you expect your
> > > > > > filesystem to act like a filesystem then these things need to be done
> > > > > > in order to avoid these bugs.
> > > 

> > > > > If an option modifies a filesystem to behave more like an object store
> > > > > then I don't think it's necessarily a bad thing as long as it isn't the
> > > > > default. By turning on the option the user is intentionally altering the
> > > > > behavior and should not have the same expectations.
> > > 

> > > > > On the other hand, there is another approach you could take. Many
> > > > > people are familiar with object stores these days. You could create a
> > > > > new abstraction `ObjectStore` which is very similar to `FileSystem`
> > > > > except the semantics are object store semantics and not filesystem
> > > > > semantics. I believe most of our filesystem classes could implement both
> > > > > `ObjectStore` and `FileSystem` abstractions without significant code
> > > > > duplication.
> > > 

> > > > > This way, if a user wants filesystem semantics, they use a `FileSystem`
> > > > > and they pay the abstraction cost. If a user is comfortable with
> > > > > `ObjectStore` semantics they use `ObjectStore` and they don't have to pay
> > > > > the costs.
> > > 

> > > > > This would be more work than just allowing options to violate
> > > > > `FileSystem` guarantees, but it would provide a clearer distinction
> > > > > between the two.
> > > 
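One way to picture the split proposed above is a thin `FileSystem`-style layer over a flat object-store interface. Everything here (`ObjectStore`, `InMemoryObjectStore`, `MarkerFileSystem` and their methods) is a hypothetical Python sketch, not an existing Arrow API; it only shows where the marker cost would live.

```python
from abc import ABC, abstractmethod


class ObjectStore(ABC):
    """Hypothetical object-store interface: flat keys, no directories."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def delete(self, key: str) -> None: ...

    @abstractmethod
    def list_prefix(self, prefix: str) -> list[str]: ...


class InMemoryObjectStore(ObjectStore):
    """Toy backend standing in for S3, GCS, or a local directory."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def delete(self, key: str) -> None:
        self._objects.pop(key, None)

    def list_prefix(self, prefix: str) -> list[str]:
        return sorted(k for k in self._objects if k.startswith(prefix))


class MarkerFileSystem:
    """Hypothetical FileSystem layer that keeps directories alive via markers."""

    def __init__(self, store: ObjectStore) -> None:
        self.store = store

    def write_file(self, path: str, data: bytes) -> None:
        self.store.put(path, data)

    def delete_file(self, path: str) -> None:
        self.store.delete(path)
        if "/" in path:
            # Always write the parent marker: one extra request per delete,
            # paid only by callers who asked for filesystem semantics.
            parent = path.rsplit("/", 1)[0] + "/"
            self.store.put(parent, b"")


store = InMemoryObjectStore()
fs = MarkerFileSystem(store)
fs.write_file("foo/bar", b"x")
fs.delete_file("foo/bar")
print(store.list_prefix("foo/"))  # ['foo/']

raw = InMemoryObjectStore()
raw.put("foo/bar", b"x")
raw.delete("foo/bar")
print(raw.list_prefix("foo/"))  # []
```

A caller using the store directly never pays for the extra marker write, while one going through `MarkerFileSystem` gets the "parent directory survives a delete" guarantee.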

> > > > > On Fri, Jul 12, 2024 at 9:25 AM Aldrin octalene....@pm.me.invalid
> > > > > wrote:
> > > 

> > > > > > Hello!
> > > 

> > > > > > This may be naive, but why does the empty directory marker need to
> > > > > > exist on the S3 side at all? If a local directory is created (because
> > > > > > of filesystem semantics), then I am not sure why a fake object needs to
> > > > > > exist on the object-store side.
> > > 

> > > > > > On Friday, July 12th, 2024 at 08:35, Felipe Oliveira Carvalho <felipe...@gmail.com> wrote:
> > > 

> > > > > > > Hi,
> > > 

> > > > > > > The markers are necessary to offer file system semantics on top of
> > > > > > > object stores. You will get a ton of subtle bugs otherwise.
> > > 

> > > > > > > If instead of arrow::FileSystem, Arrow offered an arrow::ObjectStore
> > > > > > > interface that wraps local filesystems and object stores with
> > > > > > > object-store semantics (i.e. no concept of empty directory or atomic
> > > > > > > directory deletion), then application developers would have more
> > > > > > > control over the actions performed on the object store they are using.
> > > > > > > The cons would be slower operations when working with a local
> > > > > > > filesystem, and no concept of directories.
> > > 

> > > > > > > > 1. Add an Option: Introduce an option in S3Options to control
> > > > > > > > whether empty directory markers are created, giving users the
> > > > > > > > choice.
> > > 

> > > > > > > Then it wouldn't be an honest implementation of arrow::FileSystem for
> > > > > > > the reasons listed above.
> > > 

> > > > > > > > 2. Change Default Behavior: Modify the default behavior to avoid
> > > > > > > > creating empty directory markers when a file is deleted.
> > > 

> > > > > > > That would bring in the bugs, because an arrow::FileSystem instance
> > > > > > > would behave differently depending on what is backing it.
> > > 

> > > > > > > > 3. Smarter Directory Creation: Improve the implementation to check
> > > > > > > > for other objects in the same path before creating an empty
> > > > > > > > directory marker.
> > > 

> > > > > > > This might be a problem when more than one client or thread is
> > > > > > > mutating the object store through the arrow::FileSystem. You can
> > > > > > > check now, but by the time you're done deleting, all the other files
> > > > > > > you thought existed may have been deleted as well. This is very likely
> > > > > > > if clients decide to implement parallel deletion.
> > > 
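The race can be replayed deterministically. This hypothetical Python sketch (in-memory keys, no real S3 calls) has two clients delete `foo/a` and `foo/b` in parallel; each checks for siblings before deleting, as approach 3 proposes, and acts on that check afterwards.

```python
def replay(always_create_marker: bool) -> list[str]:
    """Replay one interleaving of two clients deleting files under foo/."""
    objects = {"foo/a": b"", "foo/b": b""}

    def has_siblings(exclude: str) -> bool:
        return any(k.startswith("foo/") and k != exclude for k in objects)

    # Step 1: both clients check for siblings before either delete lands.
    client1_skip = has_siblings("foo/a")  # sees foo/b
    client2_skip = has_siblings("foo/b")  # sees foo/a

    # Step 2: both deletes land.
    del objects["foo/a"]
    del objects["foo/b"]

    # Step 3: each client acts on its now-stale check.
    for skip_marker in (client1_skip, client2_skip):
        if always_create_marker or not skip_marker:
            objects["foo/"] = b""
    return sorted(objects)


print(replay(always_create_marker=False))  # []
print(replay(always_create_marker=True))   # ['foo/']
```

With check-then-act, both clients see a sibling and skip the marker, so `foo/` silently disappears; always writing the marker keeps `foo/` at the cost of one extra request per delete.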

> > > > > > > The existing solution of always creating a marker when done is not
> > > > > > > perfect either, but it is less likely to break.
> > > 

> > > > > > > ## Suggested Workaround
> > > 

> > > > > > > Avoiding file by file operations so that internal functions can
> > > > > > > batch as
> > > > > > > much as possible.
> > > 

> > > > > > > --
> > > > > > > Felipe
> > > 

> > > > > > > On Fri, Jul 12, 2024 at 7:22 AM Hyunseok Seo hsseo0...@gmail.com
> > > > > > > wrote:
> > > 

> > > > > > > > Hello, community!
> > > 

> > > > > > > > I am currently working on addressing the issue described in
> > > > > > > > "[C++] Add option to not create parent directory with S3 delete_file".
> > > > > > > > In the process, I have found it necessary to gather feedback on how
> > > > > > > > best to resolve this issue. Below is a summary and some questions I
> > > > > > > > have for the community.
> > > 

> > > > > > > > ### Background
> > > > > > > > Currently, the S3FileSystem generates an empty directory marker (by
> > > > > > > > calling the EnsureParentExists function) when a file is deleted and
> > > > > > > > the directory becomes empty. This behavior maintains the appearance
> > > > > > > > of the directory structure. However, users have raised issues
> > > > > > > > regarding this behavior [1].
> > > 

> > > > > > > > ### Why Maintain Empty Directory Markers?
> > > > > > > > From what I understand, object stores like S3 do not have a concept
> > > > > > > > of directories. The motivation behind maintaining these markers could
> > > > > > > > be to manage the object store as if it were a traditional file system.
> > > > > > > > If anyone knows the context behind the implementation of S3FileSystem,
> > > > > > > > it would be great if you could share it.
> > > 

> > > > > > > > ### Issues with Marker Creation
> > > > > > > > Users who have raised concerns about the creation of empty directory
> > > > > > > > markers cite the following reasons:
> > > > > > > >
> > > > > > > > - Increase in Unnecessary Requests [2]: Creating empty directory
> > > > > > > >   markers leads to additional S3 requests, which can increase costs
> > > > > > > >   and affect performance.
> > > > > > > > - File System Consistency Issues [1]: S3 is designed as an object
> > > > > > > >   store, and creating empty directory markers can break the inherent
> > > > > > > >   consistency of the file system.
> > > 

> > > > > > > > ### Proposed Solutions
> > > > > > > > Issue [1] suggests the following approaches:
> > > > > > > >
> > > > > > > > 1. Add an Option: Introduce an option in S3Options to control whether
> > > > > > > >    empty directory markers are created, giving users the choice.
> > > > > > > > 2. Change Default Behavior: Modify the default behavior to avoid
> > > > > > > >    creating empty directory markers when a file is deleted.
> > > > > > > > 3. Smarter Directory Creation: Improve the implementation to check for
> > > > > > > >    other objects in the same path before creating an empty directory
> > > > > > > >    marker.
> > > > > > > >
> > > > > > > > Here is my personal thought (approach 1 + 3):
> > > 

> > > > > > > > (approach 1) I believe it would be best to make marker creation an
> > > > > > > > option (as some users might not want this enhancement).
> > > 

> > > > > > > > (approach 3) When the option is enabled, if there are no files
> > > > > > > > (objects) in the path (prefix) corresponding to a directory in the
> > > > > > > > file system sense, we should maintain the marker. Otherwise, we should
> > > > > > > > check the number of files in the same path and avoid calling
> > > > > > > > EnsureParentExists if there are two or more files.
> > > 
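The combined approach 1 + 3 can be sketched as a single decision on an in-memory key map. `create_dir_markers` stands in for the proposed S3Options flag; that name and the whole function are hypothetical, not Arrow code. The count-then-delete sequence here is the same one Felipe flags as racy when several clients mutate the same prefix.

```python
# Hypothetical sketch of approach 1 + 3; not Arrow's implementation.
# `objects` is an in-memory stand-in for the bucket: key -> object bytes.


def parent_prefix(path: str) -> str:
    """Return "foo/" for "foo/bar", or "" for a top-level key."""
    return path.rsplit("/", 1)[0] + "/" if "/" in path else ""


def delete_file(objects: dict[str, bytes], path: str, create_dir_markers: bool) -> None:
    parent = parent_prefix(path)
    # Approach 3: count objects under the parent prefix *before* deleting,
    # excluding an existing marker. The count includes `path` itself.
    files_in_dir = 0
    if parent:
        files_in_dir = sum(1 for k in objects if k.startswith(parent) and k != parent)
    del objects[path]
    # Approach 1: markers are only written when the option is enabled.
    # Approach 3: skip the marker when two or more files were present,
    # since at least one of them still keeps the prefix alive.
    if create_dir_markers and parent and files_in_dir < 2:
        objects[parent] = b""


objs = {"foo/a": b"1", "foo/b": b"2"}
delete_file(objs, "foo/a", create_dir_markers=True)
print(sorted(objs))  # ['foo/b']  (a sibling remains, so no marker)
delete_file(objs, "foo/b", create_dir_markers=True)
print(sorted(objs))  # ['foo/']   (last file gone, marker written)
```

With the option disabled, deleting the last file simply leaves nothing under `foo/`, which is the behavior users in the issue asked for.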

> > > > > > > > On the other hand, I also feel that this approach might make the
> > > > > > > > logic more complicated.
> > > 

> > > > > > > > ### We Would Like Your Feedback
> > > > > > > > - What are your thoughts on the creation of empty directory markers?
> > > > > > > > - Which of the proposed solutions do you prefer?
> > > > > > > > - Do you have any additional suggestions or comments?
> > > 

> > > > > > > > We appreciate your valuable feedback and aim to find the best
> > > > > > > > solution based on your input.
> > > 

> > > > > > > > Thank you.
