Thanks Antoine!

Preserving the property across multiple clients (and presumably across 
independent sessions of the same client) is the part that I was missing.
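
To check my understanding, here is a toy sketch in plain Python. It makes no real S3 or Arrow calls, and `ToyBucket`, `LocalTrackingClient` and `MarkerClient` are names made up for illustration. It shows why tracking directories locally falls short: a second client never learns about the first client's empty directory unless something is written to the shared store.

```python
class ToyBucket:
    """Hypothetical stand-in for an S3 bucket: just a flat set of keys."""

    def __init__(self):
        self.keys = set()


class LocalTrackingClient:
    """Remembers empty directories only in this client's own memory."""

    def __init__(self, bucket):
        self.bucket = bucket
        self.local_dirs = set()

    def create_dir(self, path):
        self.local_dirs.add(path.rstrip("/") + "/")

    def dir_exists(self, path):
        prefix = path.rstrip("/") + "/"
        # A directory "exists" if we remember it or any key lives under it.
        return prefix in self.local_dirs or any(
            key.startswith(prefix) for key in self.bucket.keys
        )


class MarkerClient(LocalTrackingClient):
    """Writes a marker key to the bucket so other clients can see it."""

    def create_dir(self, path):
        self.bucket.keys.add(path.rstrip("/") + "/")


bucket = ToyBucket()

# Local tracking: only the client that created the directory sees it.
a, b = LocalTrackingClient(bucket), LocalTrackingClient(bucket)
a.create_dir("foo")
local_only = (a.dir_exists("foo"), b.dir_exists("foo"))

# Marker in the shared store: both clients agree.
c, d = MarkerClient(bucket), MarkerClient(bucket)
c.create_dir("bar")
with_marker = (c.dir_exists("bar"), d.dir_exists("bar"))
```

In this toy model `local_only` is `(True, False)` and `with_marker` is `(True, True)`, which is the cross-client property I had overlooked.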

From the link you shared, I saw an AWS page discussing the use of folders in 
the S3 console [1]. Their approach is to create the marker on folder creation. 
Instead of adding an `S3Options` property for avoiding marker creation on 
delete, what if the time of marker creation changes from "on delete" to "on 
creation"? That would seem to align better with tools like the S3 console and 
Cyberduck, and it would simplify the overall consensus logic that Felipe 
mentioned as being a potential pitfall for the 3rd proposed solution (folder 
creation should occur far less often than file deletion/move/replace).
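
A rough sketch of the idea, again as a toy in-memory model rather than the actual Arrow implementation (which I have not read). `ToyS3FileSystem` and its `marker_on` switch are hypothetical, and the "delete" mode assumes a marker is written after every delete, which is how I read the "always creating a marker" description elsewhere in this thread.

```python
class ToyS3FileSystem:
    """Hypothetical model: a flat key set plus a count of marker writes."""

    def __init__(self, marker_on):
        assert marker_on in ("delete", "create")
        self.marker_on = marker_on
        self.keys = set()
        self.marker_puts = 0

    def _put_marker(self, dir_path):
        self.keys.add(dir_path.rstrip("/") + "/")
        self.marker_puts += 1

    def create_dir(self, path):
        # "create" mode: the marker is written once, up front.
        if self.marker_on == "create":
            self._put_marker(path)

    def write_file(self, path):
        self.keys.add(path)

    def delete_file(self, path):
        self.keys.discard(path)
        # "delete" mode: re-assert the parent marker after every delete.
        if self.marker_on == "delete" and "/" in path:
            self._put_marker(path.rsplit("/", 1)[0])

    def dir_exists(self, path):
        prefix = path.rstrip("/") + "/"
        return any(key.startswith(prefix) for key in self.keys)


results = {}
for mode in ("delete", "create"):
    fs = ToyS3FileSystem(marker_on=mode)
    fs.create_dir("foo")
    for name in ("a", "b", "c"):
        fs.write_file("foo/" + name)
    for name in ("a", "b", "c"):
        fs.delete_file("foo/" + name)
    results[mode] = (fs.marker_puts, fs.dir_exists("foo"))
```

In this model both modes leave `foo` in place after its last file is deleted, but "delete" mode issues three marker writes and "create" mode issues one. One caveat the toy makes visible: in "create" mode, a directory that was only implied by writing `foo/a` (with no explicit `create_dir`) would disappear once its last file is deleted.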

I'm not sure if this is already an option (I don't know much about Arrow's 
`S3FileSystem` implementation) or was an old option that was changed in 
favor of creating the marker on deletion.


[1]: https://docs.aws.amazon.com/AmazonS3/latest/userguide/using-folders.html





# ------------------------------

# Aldrin


https://github.com/drin/

https://gitlab.com/octalene

https://keybase.io/octalene


On Monday, July 15th, 2024 at 07:59, Antoine Pitrou <anto...@python.org> wrote:

> No, because these markers also communicate the information to other
> implementations of S3 abstractions.
> 

> An example of this is: https://docs.cyberduck.io/protocols/s3/#folders
> 

> Regards
> 

> Antoine.
> 

> 

> On 13/07/2024 at 07:15, Aldrin wrote:
> 

> > > ...then I still expect the directory /foo to exist
> > 

> > Right, but if that is the sole purpose of empty directory markers, I'm 
> > curious if there was an attempt at keeping track of the 
> > prefixes/directories locally?
> > 

> > On Friday, July 12th, 2024 at 19:44, Hyunseok Seo hsseo0...@gmail.com wrote:
> > 

> > > I wonder why S3 (object storage) operates based on file system semantics.
> > > Python users are usually data scientists. They might not be familiar with
> > > the differences between object storage and file storage. Furthermore, I
> > > think there are a lot of pyarrow users.
> > 

> > > > Avoiding file by file operations so that internal functions can batch as
> > > > much as possible.
> > 

> > > Thank you for the detailed explanation. So, are you suggesting that a more
> > > fundamental solution is needed rather than just adding options? I thought
> > > supporting such options would help users who do not want markers, despite
> > > the issues you mentioned. Furthermore, I agree that supporting ObjectStore
> > > is necessary for a more fundamental solution.
> > 

> > > Thank you.
> > 

> > > On Sat, Jul 13, 2024 at 10:00 AM, Weston Pace weston.p...@gmail.com wrote:
> > 

> > > > > I think my question is still relevant: no matter what semantics
> > > > > `S3FileSystem` is trying to provide, I'm still not sure how the
> > > > > placeholder object helps. I assume it's for listing objects, but what else?
> > 

> > > > If I have a local filesystem and I delete a file /foo/bar then I still
> > > > expect the directory /foo to exist.
> > 

> > > > ```
> > > > mkdir /foo
> > > > touch /foo/bar
> > > > rm /foo/bar
> > > > ls / # should show /foo
> > > > ```
> > 

> > > > In an object store there is no `mkdir`, and once I remove /foo/bar
> > > > there is no guarantee that /foo will still exist.
> > 

> > > > On Fri, Jul 12, 2024, 2:50 PM Aldrin octalene....@pm.me.invalid wrote:
> > 

> > > > > But I think the issue being addressed [1] is essentially, "`delete_file`
> > > > > shouldn't create additional files/directories in S3."
> > 

> > > > > I think discussion about the semantics at large is interesting but may be
> > > > > a digression? Also, I think there are varying degrees of "filesystem
> > > > > semantics" that are even being discussed (the naming system and
> > > > > hierarchical inode structure vs atomicity of read/write operations).
> > 

> > > > > I think my question is still relevant: no matter what semantics
> > > > > `S3FileSystem` is trying to provide, I'm still not sure how the
> > > > > placeholder object helps. I assume it's for listing objects, but what else?
> > 

> > > > > On Friday, July 12th, 2024 at 14:26, Raphael Taylor-Davies
> > > > > r.taylordav...@googlemail.com.INVALID wrote:
> > 

> > > > > > > Many people are familiar with object stores these days. You could create
> > > > > > > a new abstraction `ObjectStore` which is very similar to `FileSystem`
> > > > > > > except the semantics are object store semantics and not filesystem
> > > > > > > semantics.
> > 

> > > > > > FWIW in the Arrow Rust ecosystem we only provide an object store
> > > > > > abstraction, and this has served us very well. My 2 cents is that object
> > > > > > store semantics are sufficient, if not superior [1] to filesystem-based
> > > > > > interfaces, for the vast majority of use cases, with the few workloads
> > > > > > that aren't sufficiently served requiring such close integration with
> > > > > > often OS-specific filesystem APIs and behaviours as to make building a
> > > > > > coherent abstraction extremely difficult.
> > 

> > > > > > Iceberg also took a similar approach with its File IO abstraction [2].
> > 

> > > > > > [1]: https://docs.rs/object_store/latest/object_store/#why-not-a-filesystem-interface
> > 

> > > > > > On 12/07/2024 22:05, Weston Pace wrote:
> > 

> > > > > > > > The markers are necessary to offer file system semantics on top of
> > > > > > > > object stores. You will get a ton of subtle bugs otherwise.
> > > > > > > > Yes, object stores and filesystems are different. If you expect your
> > > > > > > > filesystem to act like a filesystem then these things need to be done
> > > > > > > > in order to avoid these bugs.
> > 

> > > > > > > If an option modifies a filesystem to behave more like an object store
> > > > > > > then I don't think it's necessarily a bad thing as long as it isn't the
> > > > > > > default. By turning on the option the user is intentionally altering the
> > > > > > > behavior and should not hold the same expectations.
> > 

> > > > > > > On the other hand, there is another approach you could take. Many people
> > > > > > > are familiar with object stores these days. You could create a new
> > > > > > > abstraction `ObjectStore` which is very similar to `FileSystem` except
> > > > > > > the semantics are object store semantics and not filesystem semantics. I
> > > > > > > believe most of our filesystem classes could implement both `ObjectStore`
> > > > > > > and `FileSystem` abstractions without significant code duplication.
> > 

> > > > > > > This way, if a user wants filesystem semantics, they use a `FileSystem`
> > > > > > > and they pay the abstraction cost. If a user is comfortable with
> > > > > > > `ObjectStore` semantics they use `ObjectStore` and they don't have to pay
> > > > > > > the costs.
> > 

> > > > > > > This would be more work than just allowing options to violate FileSystem
> > > > > > > guarantees but it would provide a clearer distinction between the two.
> > 

> > > > > > > On Fri, Jul 12, 2024 at 9:25 AM Aldrin octalene....@pm.me.invalid wrote:
> > 

> > > > > > > > Hello!
> > 

> > > > > > > > This may be naive, but why does the empty directory marker need to
> > > > > > > > exist on the S3 side at all? If a local directory is created (because
> > > > > > > > filesystem semantics), then I am not sure why a fake object needs to
> > > > > > > > exist on the object-store side.
> > 

> > > > > > > > On Friday, July 12th, 2024 at 08:35, Felipe Oliveira Carvalho <felipe...@gmail.com> wrote:
> > 

> > > > > > > > > Hi,
> > 

> > > > > > > > > The markers are necessary to offer file system semantics on top of
> > > > > > > > > object stores. You will get a ton of subtle bugs otherwise.
> > 

> > > > > > > > > If instead of arrow::FileSystem, Arrow offered an arrow::ObjectStore
> > > > > > > > > interface that wraps local filesystems and object stores with
> > > > > > > > > object-store semantics (i.e. no concept of empty directory or atomic
> > > > > > > > > directory deletion), then application developers would have more
> > > > > > > > > control of the actions performed on the object store they are using.
> > > > > > > > > Cons would be slower operations when working with a local filesystem
> > > > > > > > > and no concept of directory.
> > 

> > > > > > > > > > 1. Add an Option: Introduce an option in S3Options to control
> > > > > > > > > > whether empty directory markers are created, giving users the choice.
> > 

> > > > > > > > > Then it wouldn't be an honest implementation of arrow::FileSystem for
> > > > > > > > > the reasons listed above.
> > 

> > > > > > > > > > 2. Change Default Behavior: Modify the default behavior to avoid
> > > > > > > > > > creating empty directory markers when a file is deleted.
> > 

> > > > > > > > > That would bring in the bugs because an arrow::FileSystem instance
> > > > > > > > > would behave differently depending on what is backing it.
> > 

> > > > > > > > > > 3. Smarter Directory Creation: Improve the implementation to check
> > > > > > > > > > for other objects in the same path before creating an empty
> > > > > > > > > > directory marker.
> > 

> > > > > > > > > This might be a problem when more than one client or thread is
> > > > > > > > > mutating the object store through the arrow::FileSystem. You can check
> > > > > > > > > now and, by the time you're done deleting, all the other files you
> > > > > > > > > thought existed have been deleted as well. This is very likely if
> > > > > > > > > clients decide to implement parallel deletion.
> > 

> > > > > > > > > The existing solution of always creating a marker when done is not
> > > > > > > > > perfect either, but less likely to break.
> > 

> > > > > > > > > ## Suggested Workaround
> > 

> > > > > > > > > Avoid file-by-file operations so that internal functions can batch as
> > > > > > > > > much as possible.
> > 

> > > > > > > > > --
> > > > > > > > > Felipe
> > 

> > > > > > > > > On Fri, Jul 12, 2024 at 7:22 AM Hyunseok Seo hsseo0...@gmail.com wrote:
> > 

> > > > > > > > > > Hello, community!
> > 

> > > > > > > > > > I am currently working on addressing the issue described in [C++]
> > > > > > > > > > Add option to not create parent directory with S3 delete_file. In
> > > > > > > > > > this process, I have found it necessary to gather feedback on how
> > > > > > > > > > to best resolve this issue. Below is a summary and some questions I
> > > > > > > > > > have for the community.
> > 

> > > > > > > > > > ### Background
> > > > > > > > > > Currently, the S3FileSystem generates an empty directory marker (by
> > > > > > > > > > calling the EnsureParentExists function) when a file is deleted and
> > > > > > > > > > the directory becomes empty. This behavior maintains the appearance
> > > > > > > > > > of the directory structure. However, users have raised concerns
> > > > > > > > > > about this behavior [1].
> > 

> > > > > > > > > > ### Why Maintain Empty Directory Markers?
> > > > > > > > > > From what I understand, object stores like S3 do not have a concept
> > > > > > > > > > of directories. The motivation behind maintaining these markers
> > > > > > > > > > could be to manage the object store as if it were a traditional
> > > > > > > > > > file system. If anyone knows the context behind the implementation
> > > > > > > > > > of S3FileSystem, it would be great if you could share it.
> > 

> > > > > > > > > > ### Issues with Marker Creation
> > > > > > > > > > Users who have raised concerns about the creation of empty
> > > > > > > > > > directory markers cite the following reasons:
> > 

> > > > > > > > > > - Increase in Unnecessary Requests [2]: Creating empty directory
> > > > > > > > > > markers leads to additional S3 requests, which can increase costs
> > > > > > > > > > and affect performance.
> > > > > > > > > > - File System Consistency Issues [1]: S3 is designed as an object
> > > > > > > > > > store, and creating empty directory markers can break the inherent
> > > > > > > > > > consistency of the file system.
> > 

> > > > > > > > > > ### Proposed Solutions
> > > > > > > > > > Issue [1] suggests the following approaches:
> > 

> > > > > > > > > > 1. Add an Option: Introduce an option in S3Options to control
> > > > > > > > > > whether empty directory markers are created, giving users the choice.
> > > > > > > > > > 2. Change Default Behavior: Modify the default behavior to avoid
> > > > > > > > > > creating empty directory markers when a file is deleted.
> > > > > > > > > > 3. Smarter Directory Creation: Improve the implementation to check
> > > > > > > > > > for other objects in the same path before creating an empty
> > > > > > > > > > directory marker.
> > > > > > > > > >
> > > > > > > > > > Here is my personal thought (approach 1 + 3):
> > 

> > > > > > > > > > (approach 1) I believe it would be best to make marker creation an
> > > > > > > > > > option (as some users might not want this enhancement).
> > 

> > > > > > > > > > (approach 3) When the option is enabled, if there are no files
> > > > > > > > > > (objects) in the path (prefix) corresponding to a directory based
> > > > > > > > > > on the file system concept, we should maintain the Marker.
> > > > > > > > > > Otherwise, we should check the number of files in the same path and
> > > > > > > > > > avoid calling EnsureParentExists if there are two or more files.
> > 

> > > > > > > > > > On the other hand, I also feel that this approach might make the
> > > > > > > > > > logic more complicated.
> > 

> > > > > > > > > > ### We Would Like Your Feedback
> > > > > > > > > > - What are your thoughts on the creation of empty directory markers?
> > > > > > > > > > - Which of the proposed solutions do you prefer?
> > > > > > > > > > - Do you have any additional suggestions or comments?
> > 

> > > > > > > > > > We appreciate your valuable feedback and aim to find the best
> > > > > > > > > > solution based on your input.
> > 

> > > > > > > > > > Thank you.

