Fokko commented on PR #6010:
URL: https://github.com/apache/iceberg/pull/6010#issuecomment-1286621641

   Sorry for the limited context. I'm working on converting the files into a 
PyArrow Dataset. This requires passing in a single filesystem and a list of 
files. The file paths can't include a scheme, because PyArrow throws an error 
when they do. The idea is that the S3FileSystem already indicates that these 
are S3 paths.
   
   By splitting this we can re-use this logic to pass the list of files to the 
Dataset:
   
   ```python
   io = self.table.io()
   if isinstance(io, FsspecFileIO):
       ...
   elif isinstance(io, PyArrowFileIO):
       # We should not use internal methods
       fs = io._get_fs_and_path(files[0])[0]
       # This is also awkward, PyArrow requires removing the s3a://
       files = ["".join(urlparse(file)[1:3]) for file in files]
   else:
       raise ValueError(f"Unsupported FileSystem: {io}")
   ```
   
   Convert it into:
   ```python
   io = self.table.io()
   if isinstance(io, FsspecFileIO):
       ...
   elif isinstance(io, PyArrowFileIO):
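       # Build a list: a map() iterator would be partly consumed by the first lookup
       normalized_files = [PyArrowFileIO.normalize_location(file) for file in files]
       fs = io.get_fs(normalized_files[0].scheme)
       files = [file.path for file in normalized_files]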
   else:
       raise ValueError(f"Unsupported FileSystem: {io}")
   ```
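
To make the proposed split concrete, here is a minimal standard-library sketch of what it could look like. The names `Location`, `normalize_location`, and `split_scheme_and_paths` are hypothetical stand-ins and not the actual PyIceberg API. It also assumes that all files must share one scheme, since the Dataset takes a single filesystem.

```python
from typing import List, NamedTuple, Tuple
from urllib.parse import urlparse


class Location(NamedTuple):
    """Hypothetical result type: the scheme plus the scheme-less path."""

    scheme: str
    path: str


def normalize_location(location: str) -> Location:
    # Hypothetical stand-in for the proposed PyArrowFileIO.normalize_location.
    uri = urlparse(location)
    # Assumption: a location without a scheme is treated as a local file.
    # The netloc (bucket) and path (key) together form the scheme-less path.
    return Location(scheme=uri.scheme or "file", path=f"{uri.netloc}{uri.path}")


def split_scheme_and_paths(files: List[str]) -> Tuple[str, List[str]]:
    if not files:
        raise ValueError("Expected at least one file")
    normalized = [normalize_location(file) for file in files]
    schemes = {location.scheme for location in normalized}
    # A single filesystem can only serve files that share one scheme.
    if len(schemes) != 1:
        raise ValueError(f"Expected a single scheme, got: {sorted(schemes)}")
    return normalized[0].scheme, [location.path for location in normalized]
```

The returned scheme would then be used to look up the filesystem once, and the scheme-less paths would be passed to the Dataset as the file list.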


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

