Joris Van den Bossche created ARROW-8290:
--------------------------------------------

             Summary: [Python][Dataset] Improve ergonomy of the 
FileSystemDataset constructor
                 Key: ARROW-8290
                 URL: https://issues.apache.org/jira/browse/ARROW-8290
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Python
            Reporter: Joris Van den Bossche


Currently, to manually create a FileSystemDataset, you can do something like:

{code}
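import pyarrow as pa
import pyarrow.fs
import pyarrow.dataset as ds
# `schema` is assumed to be an existing pyarrow.Schema
dataset = ds.FileSystemDataset(
        schema, None, ds.ParquetFileFormat(), pa.fs.LocalFileSystem(),
        ["data_file1.parquet", "data_file2.parquet"],
        [ds.field('file') == 1, ds.field('file') == 2])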
{code}

There are some usability improvements we can make, though:

- Allow passing the arguments by name to improve readability of the calling 
code (currently they all have to be passed positionally, due to the way they 
are declared in Cython as {{not None}})
- I would maybe change the order of the arguments (e.g. start with the paths; 
we don't need to match the order of the C++ constructor)
- Potentially allow {{partitions}} to be optional, in which case they would 
be set internally to a list of {{ScalarExpression(True)}} values.
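
As a rough illustration of the three points above, here is a minimal pure-Python sketch of how the argument handling could look. It is not the pyarrow implementation: the helper name {{normalize_dataset_args}}, its parameter names, and the use of plain {{True}} as a stand-in for {{ScalarExpression(True)}} are all assumptions made for this example.

```python
from typing import Any, List, Optional, Sequence, Tuple


def normalize_dataset_args(
    paths: Sequence[str],
    *,  # everything after the paths is keyword-only, so call sites stay readable
    schema: Any,
    file_format: Any,
    filesystem: Any,
    partitions: Optional[Sequence[Any]] = None,
    root_partition: Any = None,
) -> Tuple[Any, Any, Any, Any, List[str], List[Any]]:
    """Hypothetical helper: accept paths first and the rest by name, then
    return the values in the positional order used in the call above."""
    paths = list(paths)
    if partitions is None:
        # Stand-in for one ScalarExpression(True) per file (assumption).
        partitions = [True] * len(paths)
    else:
        partitions = list(partitions)
    if len(partitions) != len(paths):
        raise ValueError("expected one partition expression per path")
    return (schema, root_partition, file_format, filesystem, paths, partitions)
```

A call such as {{normalize_dataset_args(["data_file1.parquet", "data_file2.parquet"], schema=schema, file_format=fmt, filesystem=fs)}} would then fill in the partitions automatically, and passing any of the later arguments positionally raises a {{TypeError}}.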



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
