Re: Missing min/max statistics in file footer

2017-02-10 Thread Lars Volker
In that case I don't see why reading the stats shouldn't work, assuming they are in the file in the first place. I don't know why writing them would fail, so unless someone else can help you, you may have to debug the code that writes them. On Fri, Feb 10, 2017 at 8:31 PM, Pradeep Gollakota

Re: Missing min/max statistics in file footer

2017-02-10 Thread Pradeep Gollakota
metadata.getFileMetaData().getCreatedBy() shows this: "parquet-mr version 1.9.1-SNAPSHOT (build 2fd62ee4d524c270764e9b91dca72e5cf1a005b7)". Ignore the 1.9.1-SNAPSHOT; that's my local build, as I'm trying to work on PARQUET-869. On Fri, Feb 10, 2017
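
For readers who want to inspect such a created_by value, here is a minimal sketch that splits it into application name, version, and build hash. It assumes the "<application> version <version> (build <hash>)" shape seen in the string quoted above. The class CreatedByParser, its fields, and its regular expression are hypothetical names made up for illustration; this is not the logic in CorruptStatistics.java and not part of parquet-mr.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of parquet-mr: splits a created_by string of the
// assumed form "<application> version <version> (build <hash>)" into its parts.
final class CreatedByParser {

    // Simple value holder for the parsed pieces.
    static final class CreatedBy {
        final String application;
        final String version;
        final String build; // null when no "(build ...)" suffix is present

        CreatedBy(String application, String version, String build) {
            this.application = application;
            this.version = version;
            this.build = build;
        }

        // True for local or development builds such as "1.9.1-SNAPSHOT".
        boolean isSnapshot() {
            return version.endsWith("-SNAPSHOT");
        }
    }

    // Assumed layout; the "(build ...)" suffix is treated as optional.
    private static final Pattern FORMAT =
        Pattern.compile("(\\S+) version (\\S+)(?: \\(build ([^)]*)\\))?");

    // Returns the parsed parts, or empty if the string does not match the assumed layout.
    static Optional<CreatedBy> parse(String createdBy) {
        if (createdBy == null) {
            return Optional.empty();
        }
        Matcher m = FORMAT.matcher(createdBy.trim());
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new CreatedBy(m.group(1), m.group(2), m.group(3)));
    }
}
```

Calling CreatedByParser.parse(...) on the string from the message above should yield application "parquet-mr", version "1.9.1-SNAPSHOT", and the build hash, which makes it easier to compare by hand against whatever version checks the reader applies.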

Re: Missing min/max statistics in file footer

2017-02-10 Thread Lars Volker
Can you check the value of ParquetMetaData.created_by? Once you have that, you should see if it gets filtered by the code in CorruptStatistics.java. On Fri, Feb 10, 2017 at 7:11 PM, Pradeep Gollakota wrote: > Data was written with Spark but I'm using the parquet APIs

Re: Missing min/max statistics in file footer

2017-02-10 Thread Pradeep Gollakota
Data was written with Spark but I'm using the parquet APIs directly for reads. I checked the stats in the footer with the following code. ParquetMetadata metadata = ParquetFileReader.readFooter(conf, path, ParquetMetadataConverter.NO_FILTER); ColumnPath deviceId = ColumnPath.get("deviceId");

Re: Missing min/max statistics in file footer

2017-02-10 Thread Lars Volker
Hi Pradeep, I don't have any experience with using Parquet APIs through Spark. That being said, there are currently several issues around column statistics, both in the format and in the parquet-mr implementation (PARQUET-686, PARQUET-839, PARQUET-840). However, in your case and depending on the

Re: Missing min/max statistics in file footer

2017-02-10 Thread Pradeep Gollakota
Bumping the thread to see if I get any responses. On Wed, Feb 8, 2017 at 6:49 PM, Pradeep Gollakota wrote: > Hi folks, > > I generated a bunch of Parquet files using Spark and > ParquetThriftOutputFormat. The Thrift model has a column called "deviceId" > which is a string

[jira] [Commented] (PARQUET-678) Allow for custom compression codecs

2017-02-10 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15860944#comment-15860944 ] Uwe L. Korn commented on PARQUET-678: [~cotton] A patch would be very welcome, I can help for that

[jira] [Commented] (PARQUET-678) Allow for custom compression codecs

2017-02-10 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15860940#comment-15860940 ] Uwe L. Korn commented on PARQUET-678: Adding them to parquet-cpp and parquet-format is easy, the