[ https://issues.apache.org/jira/browse/ARROW-9063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17134085#comment-17134085 ]
William Liu commented on ARROW-9063:
------------------------------------

Thank you guys. I will install the nightly build and try today.

> [Python][C++] Order of files are not respected using the new pyarrow.dataset
> ----------------------------------------------------------------------------
>
>                 Key: ARROW-9063
>                 URL: https://issues.apache.org/jira/browse/ARROW-9063
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Python
>    Affects Versions: 0.17.1
>         Environment: ubuntu-18.04
>            Reporter: William Liu
>            Priority: Critical
>              Labels: bug, dataset
>   Original Estimate: 6h
>  Remaining Estimate: 6h
>
> Say we have multiple parquet files under the same folder (a.parquet,
> b.parquet, c.parquet). If I pass a list of file paths into either of the two
> statements below
> {code:java}
> ds = pq.ParquetDataset(fps, use_legacy_dataset=False)
> ds = pyarrow.dataset(fps){code}
> Then rows of the resulting table will have:
> aaaa...bbbb...aaa...bbbb...aaa...ccc..bbb...cccc

--
This message was sent by Atlassian Jira
(v8.3.4#803005)