Hello!

This may be naive, but why does the empty directory marker need to exist on the
S3 side at all? If a local directory is created (because of filesystem
semantics), then I am not sure why a fake object needs to exist on the
object-store side.
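
To make the question concrete, here is a minimal sketch of the behavior being discussed. It is a toy in-memory model, not Arrow's implementation: `ToyObjectStore`, `ToyFileSystem`, `keep_markers`, and `directory_survives_delete` are hypothetical names invented for illustration, and the only assumption taken from the thread is that a marker is written for the parent when a file is deleted. In the toy, a directory exists only implicitly through a key prefix, so deleting the last object under that prefix makes the directory vanish unless a marker object is written.

```python
class ToyObjectStore:
    """Flat key -> bytes mapping; a hypothetical stand-in, not the S3 API."""

    def __init__(self):
        self.objects = {}

    def put(self, key, data=b""):
        self.objects[key] = data

    def delete(self, key):
        self.objects.pop(key, None)

    def list_prefix(self, prefix):
        # Object stores only know keys; a "directory" is just a shared prefix.
        return sorted(k for k in self.objects if k.startswith(prefix))


class ToyFileSystem:
    """File-system-like view over ToyObjectStore (illustrative only)."""

    def __init__(self, store, keep_markers):
        self.store = store
        self.keep_markers = keep_markers

    def write_file(self, path, data):
        self.store.put(path, data)

    def dir_exists(self, path):
        # A directory "exists" if any key, including a marker, has its prefix.
        return bool(self.store.list_prefix(path.rstrip("/") + "/"))

    def delete_file(self, path):
        self.store.delete(path)
        if self.keep_markers and "/" in path:
            # Assumed behavior from the thread: write an empty marker object
            # for the parent so the directory outlives its last file.
            parent = path.rsplit("/", 1)[0]
            self.store.put(parent + "/")


def directory_survives_delete(keep_markers):
    fs = ToyFileSystem(ToyObjectStore(), keep_markers)
    fs.write_file("bucket/data/part-0.parquet", b"x")
    fs.delete_file("bucket/data/part-0.parquet")
    return fs.dir_exists("bucket/data")


print(directory_survives_delete(keep_markers=False))  # False
print(directory_survives_delete(keep_markers=True))   # True
```

On a local filesystem the directory would still be there after the last file is removed, which is the difference the marker papers over in this toy model.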





# ------------------------------
# Aldrin

https://github.com/drin/
https://gitlab.com/octalene
https://keybase.io/octalene


On Friday, July 12th, 2024 at 08:35, Felipe Oliveira Carvalho 
<felipe...@gmail.com> wrote:

> Hi,
> 

> The markers are necessary to offer file system semantics on top of object
> stores. You will get a ton of subtle bugs otherwise.
> 

> If instead of arrow::FileSystem, Arrow offered an arrow::ObjectStore
> interface that wraps local filesystems and object stores with object-store
> semantics (i.e. no concept of empty directory or atomic directory
> deletion), then application developers would have more control of the
> actions performed on the object store they are using. Cons would be slower
> operations when working with a local filesystem and no concept of directory.
> 

> > 1. Add an Option: Introduce an option in S3Options to control
> > whether empty directory markers are created, giving users the choice.
> 

> Then it wouldn't be an honest implementation of arrow::FileSystem for the
> reasons listed above.
> 

> > 2. Change Default Behavior: Modify the default behavior to avoid
> > creating empty directory markers when a file is deleted.
> 

> That would bring in the bugs because an arrow::FileSystem instance would
> behave differently depending on what is backing it.
> 

> > 3. Smarter Directory Creation: Improve the implementation to check
> > for other objects in the same path before creating an empty directory
> > marker.
> 

> This might be a problem when more than one client or thread is mutating the
> object store through the arrow::FileSystem. You can check now, and by the
> time you're done deleting, all the other files you thought existed may have
> been deleted as well. This is very likely if clients decide to implement
> parallel deletion.
> 

> The existing solution of always creating a marker when done is not perfect
> either, but less likely to break.
> 

> ## Suggested Workaround
> 

> Avoid file-by-file operations so that internal functions can batch as
> much as possible.
> 

> --
> Felipe
> 

> 

> On Fri, Jul 12, 2024 at 7:22 AM Hyunseok Seo hsseo0...@gmail.com wrote:
> 

> > Hello, community!
> > 

> > I am currently working on addressing the issue described in "[C++] Add
> > option to not create parent directory with S3 delete_file". In this process, I have
> > found it necessary to gather feedback on how to best resolve this issue.
> > Below is a summary and some questions I have for the community.
> > 

> > ### Background
> > Currently, the S3FileSystem generates an empty directory marker (by
> > calling the EnsureParentExists function) when a file is deleted and the
> > directory becomes empty. This behavior maintains the appearance of the
> > directory structure. However, users have raised concerns about this
> > behavior in issue [1].
> > 

> > ### Why Maintain Empty Directory Markers?
> > From what I understand, object stores like S3 do not have a concept of
> > directories. The motivation behind maintaining these markers could be to
> > manage the object store as if it were a traditional file system. If anyone
> > knows the context behind the implementation of S3FileSystem, it would be
> > great if you could share it.
> > 

> > ### Issues with Marker Creation
> > Users who have raised concerns about the creation of empty directory
> > markers cite the following reasons:
> > 

> > - Increase in Unnecessary Requests [2]: Creating empty directory
> >   markers leads to additional S3 requests, which can increase costs and
> >   affect performance.
> > - File System Consistency Issues [1]: S3 is designed as an object
> >   store, and creating empty directory markers can break the inherent
> >   consistency of the file system.
> > 

> > ### Proposed Solutions
> > Issue [1] suggests the following approaches:
> > 

> > 1. Add an Option: Introduce an option in S3Options to control whether
> > empty directory markers are created, giving users the choice.
> > 2. Change Default Behavior: Modify the default behavior to avoid
> > creating empty directory markers when a file is deleted.
> > 3. Smarter Directory Creation: Improve the implementation to check for
> > other objects in the same path before creating an empty directory marker.
> > 

> > Here is my personal thought (approach 1 + 3):
> > 

> > (approach 1) I believe it would be best to add the Marker as an option
> > (as some users might not want this enhancement).
> > 

> > (approach 3) When the option is enabled, if there are no files (objects)
> > in the path (prefix) corresponding to a directory based on the file system
> > concept, we should maintain the Marker. Otherwise, we should check the
> > number of files in the same path and avoid calling EnsureParentExists if
> > there are two or more files.
> > 

> > On the other hand, I also feel that this approach might make the logic more
> > complicated.
> > 

> > ### We Would Like Your Feedback
> > - What are your thoughts on the creation of empty directory markers?
> > - Which of the proposed solutions do you prefer?
> > - Do you have any additional suggestions or comments?
> > 

> > We appreciate your valuable feedback and aim to find the best solution
> > based on your input.
> > 

> > Thank you.

