[ https://issues.apache.org/jira/browse/ARROW-11629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17313037#comment-17313037 ]
Matthias Rosenthaler edited comment on ARROW-11629 at 4/1/21, 9:26 AM:
-----------------------------------------------------------------------

[~emkornfield], I was wrong: parquet-dotnet seems to be working with the current version now, but Apache Drill doesn't. I uploaded a CSV output from both tools so you can identify the differences.

I ran the following query on the sample data to get a smaller subset of it:

    WHERE `operating_point` = 214 AND `statistic` = 'mean'

Apache Drill 1.18 is using parquet-mr 1.11.0; maybe that's the cause. The new version will include 1.12.0.

> [C++] Writing float32 values with "Dictionary Encoding" makes parquet files not readable for some tools
> -------------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-11629
>                 URL: https://issues.apache.org/jira/browse/ARROW-11629
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Python
>    Affects Versions: 3.0.0
>            Reporter: Matthias Rosenthaler
>            Priority: Major
>         Attachments: drill_query.csv, foo.parquet, image-2021-02-15-15-49-41-908.png, output.csv, output.parquet, parquet-dotnet.csv
>
>
> If I read the attached csv file with pyarrow, change the float64 columns to float32, and export it to parquet, the parquet file gets corrupted.
> It is no longer readable by Apache Drill or Parquet.Net.
>
> Update: Bug in "*Dictionary Encoding*" feature. If I switch it off for float32 columns, everything works as expected.
-- This message was sent by Atlassian Jira (v8.3.4#803005)