Hello!

This may be naive, but why does the empty directory marker need to exist on the S3 side at all? If a directory is created locally (because of filesystem semantics), I am not sure why a fake object also needs to exist on the object-store side.
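
To make the question concrete, here is a toy sketch of the behavior as I understand it from the thread. It is not Arrow's code: `ToyObjectStore`, `delete_file`, and `keep_marker` are hypothetical names, and a plain dict stands in for a bucket. It assumes a "directory" is visible only while some key shares its prefix, and that a marker is a zero-byte object whose key is the directory path with a trailing slash.

```python
class ToyObjectStore:
    """Flat key -> bytes mapping; a stand-in for a bucket, not a real S3 client."""

    def __init__(self):
        self.objects = {}

    def put(self, key, data=b""):
        self.objects[key] = data

    def delete(self, key):
        del self.objects[key]

    def dir_exists(self, prefix):
        # A "directory" is visible only if some key starts with "<prefix>/".
        # A marker object (key exactly "<prefix>/") also satisfies this.
        prefix = prefix.rstrip("/") + "/"
        return any(key.startswith(prefix) for key in self.objects)


def delete_file(store, key, keep_marker):
    """Delete an object; optionally write an empty marker for its parent.

    Simplification: the marker is written unconditionally after the delete,
    and the key is assumed to contain at least one "/".
    """
    store.delete(key)
    if keep_marker:
        parent = key.rsplit("/", 1)[0] + "/"
        store.put(parent)  # zero-byte marker object


store = ToyObjectStore()

# Without a marker, the directory vanishes with its last object.
store.put("data/part-0.parquet", b"x")
delete_file(store, "data/part-0.parquet", keep_marker=False)
without_marker = store.dir_exists("data")

# With a marker, the now-empty directory is still visible.
store.put("data/part-0.parquet", b"x")
delete_file(store, "data/part-0.parquet", keep_marker=True)
with_marker = store.dir_exists("data")
```

In this toy model the only thing the marker buys is that `dir_exists("data")` keeps returning true after the last file is gone, which is the part I am asking about.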

# ------------------------------ #
Aldrin

https://github.com/drin/
https://gitlab.com/octalene
https://keybase.io/octalene

On Friday, July 12th, 2024 at 08:35, Felipe Oliveira Carvalho <felipe...@gmail.com> wrote:

> Hi,
>
> The markers are necessary to offer file system semantics on top of object
> stores. You will get a ton of subtle bugs otherwise.
>
> If instead of arrow::FileSystem, Arrow offered an arrow::ObjectStore
> interface that wraps local filesystems and object stores with object-store
> semantics (i.e. no concept of empty directory or atomic directory
> deletion), then application developers would have more control of the
> actions performed on the object store they are using. Cons would be slower
> operations when working with a local filesystem and no concept of directory.
>
> > 1. Add an Option: Introduce an option in S3Options to control
> > whether empty directory markers are created, giving users the choice.
>
> Then it wouldn't be an honest implementation of arrow::FileSystem for the
> reasons listed above.
>
> > 2. Change Default Behavior: Modify the default behavior to avoid
> > creating empty directory markers when a file is deleted.
>
> That would bring in the bugs because an arrow::FileSystem instance would
> behave differently depending on what is backing it.
>
> > 3. Smarter Directory Creation: Improve the implementation to check
> > for other objects in the same path before creating an empty directory
> > marker.
>
> This might be a problem when more than one client or thread is mutating the
> object store through the arrow::FileSystem. You can check now and once
> you're done deleting all the other files you thought existed are deleted as
> well. Very likely if clients decide to implement parallel deletion.
>
> The existing solution of always creating a marker when done is not perfect
> either, but less likely to break.
>
> ## Suggested Workaround
>
> Avoiding file by file operations so that internal functions can batch as
> much as possible.
>
> --
> Felipe
>
>
> On Fri, Jul 12, 2024 at 7:22 AM Hyunseok Seo hsseo0...@gmail.com wrote:
>
> > Hello, community!
> >
> > I am currently working on addressing the issue described in [C++] Add option
> > to not create parent directory with S3 delete_file. In this process, I have
> > found it necessary to gather feedback on how to best resolve this issue.
> > Below is a summary and some questions I have for the community.
> >
> > ### Background
> > Currently, the S3FileSystem generates an empty directory marker (by
> > calling the EnsureParentExists function) when a file is deleted and the
> > directory becomes empty. This behavior maintains the appearance of the
> > directory structure. However, there have been issues raised by users
> > regarding this behavior in issues [1].
> >
> > ### Why Maintain Empty Directory Markers?
> > From what I understand, object stores like S3 do not have a concept of
> > directories. The motivation behind maintaining these markers could be to
> > manage the object store as if it were a traditional file system. If anyone
> > knows the context behind the implementation of S3FileSystem, it would be
> > great if you could share it.
> >
> > ### Issues with Marker Creation
> > Users who have raised concerns about the creation of empty directory
> > markers cite the following reasons:
> >
> > - Increase in Unnecessary Requests [2]: Creating empty directory
> >   markers leads to additional S3 requests, which can increase costs and
> >   affect performance.
> > - File System Consistency Issues [1]: S3 is designed as an object
> >   store, and creating empty directory markers can break the inherent
> >   consistency of the file system.
> >
> > ### Proposed Solutions
> > Issue [1] suggests the following approaches:
> >
> > 1. Add an Option: Introduce an option in S3Options to control whether
> >    empty directory markers are created, giving users the choice.
> > 2. Change Default Behavior: Modify the default behavior to avoid
> >    creating empty directory markers when a file is deleted.
> > 3. Smarter Directory Creation: Improve the implementation to check for
> >    other objects in the same path before creating an empty directory marker.
> >
> > Here is my personal thought (approach 1 + 3):
> >
> > (approach 1) I believe it would be best to add the Marker as an option
> > (as some users might not want this enhancement).
> >
> > (approach 3) When the option is enabled, if there are no files (objects)
> > in the path (prefix) corresponding to a directory based on the file system
> > concept, we should maintain the Marker. Otherwise, we should check the
> > number of files in the same path and avoid calling EnsureParentExists if
> > there are two or more files.
> >
> > On the other hand, I also feel that this approach might make the logic more
> > complicated.
> >
> > ### We Would Like Your Feedback
> > - What are your thoughts on the creation of empty directory markers?
> > - Which of the proposed solutions do you prefer?
> > - Do you have any additional suggestions or comments?
> >
> > We appreciate your valuable feedback and aim to find the best solution
> > based on your input.
> >
> > Thank you.