kevinjqliu commented on code in PR #301:
URL: https://github.com/apache/iceberg-python/pull/301#discussion_r1465718521
##########
pyiceberg/io/pyarrow.py:
##########
@@ -288,6 +288,8 @@ def create(self, overwrite: bool = False) -> OutputStream:
         try:
             if not overwrite and self.exists() is True:
                 raise FileExistsError(f"Cannot create file, already exists: {self.location}")
+            # Parent directories must be created first in certain file systems, such as the LocalFileSystem.
+            self._filesystem.create_dir(os.path.dirname(self._path), recursive=True)
Review Comment:
@Fokko thanks for the review.
I agree with the above: the Arrow FileIO implementation might not be the
best place for this behavior. So far, both of the supported FileIO
implementations (`ARROW_FILE_IO` and `FSSPEC_FILE_IO`) fail to write to
the local file system.
I want to make writes work for the local file system.
Looking at the Java side, there is a [`LocalOutputFile`
implementation](https://github.com/apache/iceberg/blob/fd1cf49280bde07d67c6bc1a6ec60238e1e38f7f/api/src/main/java/org/apache/iceberg/Files.java#L59)
which implements the behavior for creating parent directories.
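To make the parent-directory behavior concrete, here is a rough Python sketch of a local-only output file. The class name `LocalOutputFileSketch` and its handling of `file://` locations are assumptions for illustration only; none of this exists in pyiceberg, and it uses only the standard library rather than the Arrow or fsspec file systems.

```python
import os
from typing import BinaryIO


class LocalOutputFileSketch:
    """Hypothetical local-only output file; not part of pyiceberg."""

    def __init__(self, location: str) -> None:
        self.location = location
        # Assumption: locations are plain paths or file:// URIs without a host.
        prefix = "file://"
        self._path = location[len(prefix):] if location.startswith(prefix) else location

    def exists(self) -> bool:
        return os.path.exists(self._path)

    def create(self, overwrite: bool = False) -> BinaryIO:
        if not overwrite and self.exists():
            raise FileExistsError(f"Cannot create file, already exists: {self.location}")
        # Create missing parent directories before opening the file for writing.
        parent = os.path.dirname(self._path)
        if parent:
            os.makedirs(parent, exist_ok=True)
        return open(self._path, "wb")
```

With this shape, `LocalOutputFileSketch("file:///tmp/warehouse/db/t/data.bin").create()` would succeed even if `/tmp/warehouse/db/t` did not exist yet, while a second call without `overwrite=True` would raise `FileExistsError`.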
Maybe we can add a new FileIO implementation and make it the preferred
one for the `file://` scheme:
https://github.com/apache/iceberg-python/blob/4cf1f35dfd3e7cfb2996887e861d740239746306/pyiceberg/io/__init__.py#L278
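A minimal sketch of what "preferred for the `file://` scheme" could look like, assuming the scheme lookup is an ordered list of candidates per scheme. The table name `PREFERRED_FILE_IO`, the helper `candidates_for`, and the placeholder strings `"local"`, `"arrow"` and `"fsspec"` are all hypothetical and do not mirror the actual contents of `pyiceberg/io/__init__.py`; treating a location without a scheme as local is also an assumption.

```python
from typing import Dict, List
from urllib.parse import urlparse

# Hypothetical preference table: earlier entries are tried first.
# "local" stands in for the proposed new local FileIO implementation.
PREFERRED_FILE_IO: Dict[str, List[str]] = {
    "file": ["local", "arrow", "fsspec"],
    "s3": ["arrow", "fsspec"],
}


def candidates_for(location: str) -> List[str]:
    """Return the FileIO candidates for a location, most preferred first."""
    # Assumption: a location without a scheme is a local path.
    scheme = urlparse(location).scheme or "file"
    return PREFERRED_FILE_IO.get(scheme, [])
```

Under these assumptions, only the `file` entry changes; other schemes keep their existing order, so remote object stores would be unaffected.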
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]