Fokko commented on code in PR #301:
URL: https://github.com/apache/iceberg-python/pull/301#discussion_r1465458297
##########
pyiceberg/io/pyarrow.py:
##########
@@ -288,6 +288,8 @@ def create(self, overwrite: bool = False) -> OutputStream:
try:
if not overwrite and self.exists() is True:
raise FileExistsError(f"Cannot create file, already exists:
{self.location}")
+ # Parent directories must be created first in certain file
systems, such as the LocalFileSystem.
+ self._filesystem.create_dir(os.path.dirname(self._path),
recursive=True)
Review Comment:
This is typically something that we try to avoid. Iceberg is designed to
work with object stores, and those don't have a notion of directories. One
recommendation is even to disallow moves and listing of directories. One thing
is also creating a directory. I'm not sure what the behavior is for the Arrow
S3 implementation. Since some of the implementations still make a call(s):
- For example, a list operation to check if the directory is there
- For example, they touch a small file under the prefix to indicate that the
path should be created.
Another concern is that we currently do this in Arrow, we also would need to
do this for other implementations to avoid discrepancies. The concept of the
FileIO is that you easily can swap them out.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]