coryan commented on pull request #11996:
URL: https://github.com/apache/arrow/pull/11996#issuecomment-1009463491
> My main concern here is: does this reduce interoperability with filesystem "hierarchies" created by other GCS clients?
This is how I understand the problem:
- The Apache Arrow APIs want to be able to query whether a directory "exists", even if it is empty.
  - That means we need to put some kind of marker in the GCS filesystem; otherwise any query (even a listing) will return `kNotFound`.
- Most tools and libraries using GCS natively do not bother with these markers at all, so whatever we do must also work when there are no markers (basically by listing all the objects prefixed with `directory/`, whether or not a marker is found).
- We can use markers with a trailing slash (`name/`), which makes things more similar to the GCS UI.
  - That breaks the generic FS tests, and one may need two RPCs to check whether `name` is a file or a directory.
  - We can still get in trouble, because something could create both `name` and `name/` in GCS (and `name//`, for that matter).
- Using metadata works with the generic FS tests, requires fewer RPCs, and generally seems to fit the Arrow APIs "better".
  - It makes things less similar to the GCS UI :shrug:
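The RPC trade-off between the two marker schemes can be sketched with a toy in-memory bucket. This is a minimal illustration, not the Arrow implementation: `FakeBucket`, `stat_trailing_slash`, `stat_metadata` and the `is_directory` metadata key are all hypothetical, and each `get()` call stands in for one metadata-lookup RPC.

```python
from dataclasses import dataclass, field
from enum import Enum

# Hypothetical metadata key for the metadata-marker scheme (an assumption).
DIRECTORY_KEY = "is_directory"


class FileType(Enum):
    NOT_FOUND = "not_found"
    FILE = "file"
    DIRECTORY = "directory"


@dataclass
class FakeBucket:
    """Hypothetical stand-in for a GCS bucket: object name -> metadata dict."""
    objects: dict = field(default_factory=dict)
    rpcs: int = 0  # number of simulated metadata lookups issued so far

    def get(self, name):
        """Simulate one metadata lookup; None means "not found"."""
        self.rpcs += 1
        return self.objects.get(name)


def stat_trailing_slash(bucket, name):
    """Directory marker is a separate, empty object named `name/`."""
    if bucket.get(name) is not None:
        return FileType.FILE
    # `name` itself was not found, so a second lookup is needed for `name/`.
    if bucket.get(name + "/") is not None:
        return FileType.DIRECTORY
    return FileType.NOT_FOUND


def stat_metadata(bucket, name):
    """Directory marker is the object `name` itself, tagged via metadata."""
    meta = bucket.get(name)
    if meta is None:
        return FileType.NOT_FOUND
    if meta.get(DIRECTORY_KEY) == "true":
        return FileType.DIRECTORY
    return FileType.FILE
```

With only a `data/` marker, `stat_trailing_slash` spends two lookups to report a directory, while `stat_metadata` answers from a single lookup on `data`. If both `data` and `data/` exist, `stat_trailing_slash` reports a file and the directory marker is silently shadowed, which is the ambiguity described in the list above.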
That is, both solutions have downsides. If you want things to be more compatible with the GCS UI, we can certainly do that. I think the really interesting case is when there are no markers at all, and we can make that work with either approach.
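The no-marker case can be handled the same way under either scheme: if no object named `name` is found, fall back to a prefix listing of `name/`. This is a hedged sketch, assuming a hypothetical `objects` dict (object name to metadata) in place of a real bucket and the same hypothetical `is_directory` metadata key; a real client would issue one metadata lookup for `name` and one prefix listing for `name/`, ideally capped at a single result. Note that without any marker an empty directory is invisible, which is why markers are needed at all.

```python
from enum import Enum

# Hypothetical metadata key used by the metadata-marker scheme (an assumption).
DIRECTORY_KEY = "is_directory"


class FileType(Enum):
    NOT_FOUND = "not_found"
    FILE = "file"
    DIRECTORY = "directory"


def has_objects_under(objects, name):
    """Stand-in for a prefix listing of `name/` that stops at the first hit."""
    prefix = name.rstrip("/") + "/"
    return any(key.startswith(prefix) for key in objects)


def stat(objects, name):
    """Classify `name`, honoring metadata markers but not requiring any marker."""
    meta = objects.get(name)
    if meta is not None:
        # An object named exactly `name`: a file, or a metadata directory marker.
        if meta.get(DIRECTORY_KEY) == "true":
            return FileType.DIRECTORY
        return FileType.FILE
    # No object named `name`: any object under `name/` (a `name/` marker from the
    # GCS UI or the trailing-slash scheme, or plain files written by other
    # clients) implies that the directory exists.
    if has_objects_under(objects, name):
        return FileType.DIRECTORY
    return FileType.NOT_FOUND
```

For a bucket containing only `logs/2022/01.parquet`, written by a client that uses no markers, `stat` reports `logs` and `logs/2022` as directories, the Parquet object as a file, and `log` as not found. The same fallback also recognizes `name/` markers, so it works whichever marker scheme is chosen.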