[ 
https://issues.apache.org/jira/browse/ARROW-9459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van den Bossche updated ARROW-9459:
-----------------------------------------
    Description: 
See some timing checks here: 
https://github.com/dask/dask/pull/6346#issuecomment-656548675

Parsing all statistics, even from a centralized {{_metadata}} file, can be 
quite expensive. If you know in advance that you are not going to use them 
(e.g., you only filter on the partition fields and otherwise read all data), 
it would be nice to have an option to disable parsing statistics.

cc [~rjzamora] [~bkietz] [~fsaintjacques]

  was:
See some timing checks here: 
https://github.com/dask/dask/pull/6346#issuecomment-656548675

Parsing all statistics, even from a centralized {{_metadata}} file can be quite 
expensive. If you know in advance that you are not going to use them (eg you 
are only going to do filtering on the partition fields, and otherwise read all 
data), it could be nice to have an option to disable parsing statistics.

cc [~rjzamora] [~bkietz] [~fsaintjacques]


> [C++][Dataset] Make collecting/parsing statistics optional for ParquetFragment
> ------------------------------------------------------------------------------
>
>                 Key: ARROW-9459
>                 URL: https://issues.apache.org/jira/browse/ARROW-9459
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Joris Van den Bossche
>            Priority: Major
>              Labels: dataset, dataset-dask-integration
>
> See some timing checks here: 
> https://github.com/dask/dask/pull/6346#issuecomment-656548675
> Parsing all statistics, even from a centralized {{_metadata}} file, can be 
> quite expensive. If you know in advance that you are not going to use them 
> (e.g., you only filter on the partition fields and otherwise read all data), 
> it would be nice to have an option to disable parsing statistics.
> cc [~rjzamora] [~bkietz] [~fsaintjacques]
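
The request above can be illustrated with a small toy model. This is only a sketch of the idea, not Arrow's API: ToyFragment, its parse_statistics flag, and filter_by_partition are hypothetical names invented for this example, and the "raw" statistics are simply little-endian integers. It shows that filtering on partition fields never touches row-group statistics, so decoding them up front is wasted work in that case.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical raw metadata: one dict per row group, mapping a column name
# to an undecoded (min, max) pair of byte strings.
RawRowGroup = Dict[str, Tuple[bytes, bytes]]


@dataclass
class ToyFragment:
    """Toy stand-in for a Parquet fragment (not an Arrow class)."""

    partition: Dict[str, str]
    raw_row_groups: List[RawRowGroup]
    parse_statistics: bool = True  # hypothetical option, as proposed in the issue
    statistics: Optional[List[Dict[str, Tuple[int, int]]]] = field(
        default=None, init=False
    )

    def __post_init__(self) -> None:
        # Decoding is the expensive step; skip it entirely when disabled.
        if self.parse_statistics:
            self.statistics = [
                {
                    col: (
                        int.from_bytes(lo, "little"),
                        int.from_bytes(hi, "little"),
                    )
                    for col, (lo, hi) in row_group.items()
                }
                for row_group in self.raw_row_groups
            ]


def filter_by_partition(
    fragments: List[ToyFragment], key: str, value: str
) -> List[ToyFragment]:
    """Select fragments using partition fields only; statistics are unused."""
    return [f for f in fragments if f.partition.get(key) == value]


raw = [{"x": ((1).to_bytes(8, "little"), (9).to_bytes(8, "little"))}]
fragments = [
    ToyFragment({"year": "2019"}, raw, parse_statistics=False),
    ToyFragment({"year": "2020"}, raw, parse_statistics=False),
]
selected = filter_by_partition(fragments, "year", "2020")
```

With parse_statistics=False the statistics attribute stays None, and the partition filter still returns the expected fragment. A real implementation would also need to decide what happens when a statistics-based filter is later applied to such a fragment; the toy model leaves that open.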



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
