> The markers are necessary to offer file system semantics on top of object
> stores. You will get a ton of subtle bugs otherwise.

Yes, object stores and filesystems are different.  If you expect your
filesystem to act like a filesystem, then these things need to be done to
avoid these bugs.

If an option modifies a filesystem to behave more like an object store, then
I don't think it's necessarily a bad thing, as long as it isn't the
default.  By turning on the option, the user is intentionally altering the
behavior and should not expect the same guarantees.

On the other hand, there is another approach you could take.  Many people
are familiar with object stores these days.  You could create a new
abstraction `ObjectStore` which is very similar to `FileSystem` except the
semantics are object store semantics and not filesystem semantics.  I
believe most of our filesystem classes could implement both `ObjectStore`
and `FileSystem` abstractions without significant code duplication.

This way, if a user wants filesystem semantics, they use a `FileSystem` and
they pay the abstraction cost.  If a user is comfortable with `ObjectStore`
semantics they use `ObjectStore` and they don't have to pay the costs.
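As a rough illustration of this split, here is a minimal C++ sketch. The `ObjectStore` and `FileSystem` interfaces and the in-memory `ToyS3Backend` are hypothetical (this `FileSystem` is not `arrow::fs::FileSystem`), and `requests()` simply counts simulated calls. One backend implements both interfaces: the `FileSystem` methods reuse the `ObjectStore` ones and add the directory-marker bookkeeping, so only callers who ask for filesystem semantics pay for the extra prefix check and marker write.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical interfaces for illustration; they are NOT Arrow's classes.
// Object-store semantics: flat keys, no directories, no markers.
class ObjectStore {
 public:
  virtual ~ObjectStore() = default;
  virtual void PutObject(const std::string& key, const std::string& data) = 0;
  virtual void DeleteObject(const std::string& key) = 0;
  virtual bool ObjectExists(const std::string& key) const = 0;
};

// Filesystem semantics: a directory outlives the deletion of its last file.
class FileSystem {
 public:
  virtual ~FileSystem() = default;
  virtual void WriteFile(const std::string& path, const std::string& data) = 0;
  virtual void DeleteFile(const std::string& path) = 0;
  virtual bool DirExists(const std::string& dir) const = 0;
};

// Toy backend implementing both; FileSystem methods reuse the ObjectStore
// ones and add marker bookkeeping. requests() counts simulated S3 calls.
class ToyS3Backend : public ObjectStore, public FileSystem {
 public:
  void PutObject(const std::string& key, const std::string& data) override {
    ++requests_;
    objects_[key] = data;
  }
  void DeleteObject(const std::string& key) override {
    ++requests_;
    objects_.erase(key);
  }
  bool ObjectExists(const std::string& key) const override {
    return objects_.count(key) > 0;
  }

  void WriteFile(const std::string& path, const std::string& data) override {
    PutObject(path, data);
  }
  void DeleteFile(const std::string& path) override {
    DeleteObject(path);
    const std::string parent = Parent(path);
    if (parent.empty()) return;
    ++requests_;  // simulated LIST of the parent prefix
    // Write an empty "dir/" marker if nothing else keeps the prefix alive.
    if (!HasKeyWithPrefix(parent)) PutObject(parent, "");
  }
  bool DirExists(const std::string& dir) const override {
    assert(!dir.empty() && dir.back() == '/');  // toy convention: "dir/"
    return HasKeyWithPrefix(dir);
  }

  std::size_t requests() const { return requests_; }

 private:
  static std::string Parent(const std::string& path) {
    const auto pos = path.rfind('/');
    return pos == std::string::npos ? "" : path.substr(0, pos + 1);
  }
  bool HasKeyWithPrefix(const std::string& prefix) const {
    const auto it = objects_.lower_bound(prefix);
    return it != objects_.end() &&
           it->first.compare(0, prefix.size(), prefix) == 0;
  }

  std::map<std::string, std::string> objects_;
  std::size_t requests_ = 0;
};
```

Deleting the only file in `data/` through the `ObjectStore` interface costs two simulated requests in total (put and delete) and leaves no `data/` prefix behind; doing the same through `FileSystem` costs four (put, delete, prefix check, marker put) and keeps `data/` visible.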

This would be more work than just allowing options to violate FileSystem
guarantees, but it would provide a clearer distinction between the two.


On Fri, Jul 12, 2024 at 9:25 AM Aldrin <octalene....@pm.me.invalid> wrote:

> Hello!
>
> This may be naive, but why does the empty directory marker need to exist
> on the S3 side at all? If a local directory is created (because of filesystem
> semantics), then I am not sure why a fake object needs to exist on the
> object-store side.
>
> ------------------------------
> Aldrin
>
> https://github.com/drin/
> https://gitlab.com/octalene
> https://keybase.io/octalene
>
> On Friday, July 12th, 2024 at 08:35, Felipe Oliveira Carvalho <felipe...@gmail.com> wrote:
>
> > Hi,
> >
> > The markers are necessary to offer file system semantics on top of object
> > stores. You will get a ton of subtle bugs otherwise.
> >
> > If instead of arrow::FileSystem, Arrow offered an arrow::ObjectStore
> > interface that wraps local filesystems and object stores with object-store
> > semantics (i.e. no concept of empty directory or atomic directory
> > deletion), then application developers would have more control over the
> > actions performed on the object store they are using. The downsides would
> > be slower operations when working with a local filesystem and no concept of
> > directories.
> >
> > > 1. Add an Option: Introduce an option in S3Options to control
> > > whether empty directory markers are created, giving users the choice.
> >
> > Then it wouldn't be an honest implementation of arrow::FileSystem for the
> > reasons listed above.
> >
> > > 2. Change Default Behavior: Modify the default behavior to avoid
> > > creating empty directory markers when a file is deleted.
> >
> > That would introduce those bugs, because an arrow::FileSystem instance
> > would behave differently depending on what is backing it.
> >
> > > 3. Smarter Directory Creation: Improve the implementation to check
> > > for other objects in the same path before creating an empty directory
> > > marker.
> >
> > This might be a problem when more than one client or thread is mutating
> > the object store through the arrow::FileSystem. You can check now, but by
> > the time you're done deleting, all the other files you thought existed may
> > have been deleted as well. This is very likely if clients decide to
> > implement parallel deletion.
> >
> > The existing solution of always creating a marker when done is not
> > perfect either, but it is less likely to break.
> >
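Felipe's concern can be made concrete with a deterministic toy. The C++ sketch below models a bucket as a sorted set of keys (an assumption for illustration, not Arrow's S3 code) and replays one specific interleaving of two clients deleting `dir/a` and `dir/b`. `CountUnderPrefix`, `NeedsMarker`, and the two scenario functions are hypothetical names. With the check-before-delete rule, both clients see a sibling, both skip the marker, and `dir/` disappears; writing the marker after each delete keeps it.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Toy flat key space standing in for a bucket (hypothetical, not Arrow code).
using Bucket = std::set<std::string>;

std::size_t CountUnderPrefix(const Bucket& bucket, const std::string& prefix) {
  std::size_t n = 0;
  for (auto it = bucket.lower_bound(prefix);
       it != bucket.end() && it->compare(0, prefix.size(), prefix) == 0; ++it) {
    ++n;
  }
  return n;
}

// Approach 3: decide before deleting whether a marker will be needed.
// The decision is acted on later, which is the window in which another
// client can change the bucket.
bool NeedsMarker(const Bucket& bucket, const std::string& parent) {
  assert(!parent.empty() && parent.back() == '/');
  return CountUnderPrefix(bucket, parent) < 2;  // only the file being deleted
}

// Two clients delete dir/a and dir/b with the interleaving
// check(A), check(B), delete(A), delete(B).
bool DirSurvivesCheckFirst() {
  Bucket bucket = {"dir/a", "dir/b"};
  const bool a_marker = NeedsMarker(bucket, "dir/");  // sees 2 keys
  const bool b_marker = NeedsMarker(bucket, "dir/");  // also sees 2 keys
  bucket.erase("dir/a");
  bucket.erase("dir/b");
  if (a_marker || b_marker) bucket.insert("dir/");
  return CountUnderPrefix(bucket, "dir/") > 0;
}

// Same interleaving, but each client writes the parent marker when done,
// as in the "always create a marker" behavior; writing it twice is harmless.
bool DirSurvivesMarkAfterDelete() {
  Bucket bucket = {"dir/a", "dir/b"};
  bucket.erase("dir/a");
  bucket.erase("dir/b");
  bucket.insert("dir/");  // client A
  bucket.insert("dir/");  // client B
  return CountUnderPrefix(bucket, "dir/") > 0;
}
```

This is only one interleaving; with real threads the outcome depends on timing, which is what makes such bugs subtle.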
> > ## Suggested Workaround
> >
> > Avoid file-by-file operations so that internal functions can batch as
> > much as possible.
> >
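To see why batching can help, here is a rough request-count model in C++. It assumes, for illustration only, that a file-by-file delete costs one DELETE plus one parent check per file, with a single marker PUT at the end, whereas a batched delete uses S3's multi-object delete (which accepts up to 1000 keys per request) and checks and marks the parent once. `RequestsFileByFile` and `RequestsBatched` are hypothetical, and these counts are not measurements of Arrow's S3FileSystem.

```cpp
#include <cstddef>

// Toy cost model (an assumption, not Arrow's actual request pattern) for
// deleting n_files objects that all live in a single directory.

// File by file: each DELETE is followed by a check of the parent prefix,
// and the last deletion also writes one empty directory marker.
constexpr std::size_t RequestsFileByFile(std::size_t n_files) {
  return n_files == 0 ? 0
                      : n_files /* DELETE */ + n_files /* check */ + 1 /* PUT */;
}

// S3 multi-object delete accepts at most 1000 keys per request.
constexpr std::size_t kMaxKeysPerDelete = 1000;

// Batched: ceil(n_files / 1000) multi-object deletes, then the parent is
// checked and marked once.
constexpr std::size_t RequestsBatched(std::size_t n_files) {
  return n_files == 0
             ? 0
             : (n_files + kMaxKeysPerDelete - 1) / kMaxKeysPerDelete /* DELETE */
                   + 1 /* check */ + 1 /* PUT */;
}
```

Under this model, deleting 10 files in one directory costs 21 requests file by file and 3 when batched.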
> > --
> > Felipe
> >
> > On Fri, Jul 12, 2024 at 7:22 AM Hyunseok Seo hsseo0...@gmail.com wrote:
> >
> > > Hello, community!
> > >
> > > I am currently working on addressing the issue described in "[C++] Add
> > > option to not create parent directory with S3 delete_file". In the
> > > process, I have found it necessary to gather feedback on how best to
> > > resolve this issue. Below is a summary and some questions I have for the
> > > community.
> > >
> > > ### Background
> > > Currently, the S3FileSystem generates an empty directory marker (by
> > > calling the EnsureParentExists function) when a file is deleted and the
> > > directory becomes empty. This behavior maintains the appearance of the
> > > directory structure. However, users have raised concerns about this
> > > behavior in issue [1].
> > >
> > > ### Why Maintain Empty Directory Markers?
> > > From what I understand, object stores like S3 do not have a concept of
> > > directories. The motivation behind maintaining these markers could be to
> > > manage the object store as if it were a traditional file system. If
> > > anyone knows the context behind the implementation of S3FileSystem, it
> > > would be great if you could share it.
> > >
> > > ### Issues with Marker Creation
> > > Users who have raised concerns about the creation of empty directory
> > > markers cite the following reasons:
> > >
> > > - Increase in Unnecessary Requests [2]: Creating empty directory
> > > markers leads to additional S3 requests, which can increase costs and
> > > affect performance.
> > > - File System Consistency Issues [1]: S3 is designed as an object
> > > store, and creating empty directory markers can break the inherent
> > > consistency of the file system.
> > >
> > > ### Proposed Solutions
> > > Issue [1] suggests the following approaches:
> > >
> > > 1. Add an Option: Introduce an option in S3Options to control whether
> > > empty directory markers are created, giving users the choice.
> > > 2. Change Default Behavior: Modify the default behavior to avoid
> > > creating empty directory markers when a file is deleted.
> > > 3. Smarter Directory Creation: Improve the implementation to check for
> > > other objects in the same path before creating an empty directory
> > > marker.
> > >
> > > Here is my personal thought (approach 1 + 3):
> > >
> > > (approach 1) I believe it would be best to make the marker an option
> > > (as some users might not want this enhancement).
> > >
> > > (approach 3) When the option is enabled, if there are no files (objects)
> > > in the path (prefix) corresponding to a directory in the file system
> > > sense, we should maintain the marker. Otherwise, we should check the
> > > number of files under the same path and avoid calling EnsureParentExists
> > > if there are two or more files.
> > >
> > > On the other hand, I also feel that this approach might make the logic
> > > more complicated.
> > >
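The combined approach 1 + 3 could look roughly like the C++ sketch below. `ToyS3Options::create_marker_on_delete` is a hypothetical flag (not an existing S3Options member) and `ToyS3` is an in-memory stand-in; the prefix count stands in for a LIST request and the marker insert for EnsureParentExists. Because the count is taken before the delete, this sketch inherits the check-then-act race Felipe describes earlier in the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Hypothetical option for approach 1; arrow::fs::S3Options has no such field.
struct ToyS3Options {
  bool create_marker_on_delete = true;
};

// Toy S3-like store keeping keys in a sorted set (not Arrow code).
class ToyS3 {
 public:
  explicit ToyS3(ToyS3Options options) : options_(options) {}

  void Put(const std::string& key) { keys_.insert(key); }
  bool Exists(const std::string& key) const { return keys_.count(key) > 0; }

  // Deletes "dir/file" and applies approach 1 + 3 to the parent "dir/".
  void DeleteFile(const std::string& path) {
    const auto slash = path.rfind('/');
    assert(slash != std::string::npos);  // toy: every file has a parent
    const std::string parent = path.substr(0, slash + 1);
    // Approach 3: count keys under the parent before deleting. The count
    // includes the file itself, any siblings, and an existing marker.
    const std::size_t before = CountUnderPrefix(parent);
    keys_.erase(path);
    if (!options_.create_marker_on_delete) return;  // approach 1: opted out
    if (before >= 2) return;  // something else still keeps the prefix alive
    keys_.insert(parent);     // stand-in for EnsureParentExists: write "dir/"
  }

 private:
  std::size_t CountUnderPrefix(const std::string& prefix) const {
    std::size_t n = 0;
    for (auto it = keys_.lower_bound(prefix);
         it != keys_.end() && it->compare(0, prefix.size(), prefix) == 0; ++it) {
      ++n;
    }
    return n;
  }

  ToyS3Options options_;
  std::set<std::string> keys_;
};
```

With the option on, deleting `dir/a` while `dir/b` exists writes no marker, and deleting `dir/b` afterwards does. With the option off, no marker is ever written.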
> > > ### We Would Like Your Feedback
> > > - What are your thoughts on the creation of empty directory markers?
> > > - Which of the proposed solutions do you prefer?
> > > - Do you have any additional suggestions or comments?
> > >
> > > We appreciate your valuable feedback and aim to find the best solution
> > > based on your input.
> > >
> > > Thank you.
