Ben Kietzman created ARROW-8201:
-----------------------------------

             Summary: [Python][Dataset] Improve ergonomics of FileFragment
                 Key: ARROW-8201
                 URL: https://issues.apache.org/jira/browse/ARROW-8201
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++ - Dataset, Python
    Affects Versions: 0.16.0
            Reporter: Ben Kietzman
             Fix For: 1.0.0

FileFragment can be made more directly useful by adding convenience methods.

For example, a FileFragment could allow the underlying file/buffer to be opened directly:

{code}
def open(self):
    """
    Open a NativeFile of the buffer or file viewed by this fragment.
    """
    cdef:
        CFileSystem* c_filesystem
        shared_ptr[CRandomAccessFile] opened
        NativeFile out = NativeFile()

    buf = self.buffer
    if buf is not None:
        return pa.io.BufferReader(buf)

    with nogil:
        c_filesystem = self.file_fragment.source().filesystem()
        opened = GetResultValue(c_filesystem.OpenInputFile(
            self.file_fragment.source().path()))

    out.set_random_access_file(opened)
    out.is_readable = True
    return out
{code}

Additionally, a ParquetFileFragment's metadata could be introspectable:

{code}
@property
def metadata(self):
    from pyarrow._parquet import ParquetReader
    reader = ParquetReader()
    reader.open(self.open())
    return reader.metadata
{code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)