Re: [parquet-dev] Efficiently obtaining the number of rows of a Parquet data store.

Brandon Amos Sat, 26 Jul 2014 07:20:06 -0700

Hi Parquet team,

I apologize for the simple question, but I'm using Parquet on HDFS in
a Scala/Spark application and am having trouble efficiently
obtaining the number of rows in my Parquet data stores without
loading the data and counting the rows.


The README at https://github.com/apache/incubator-parquet-format
has great information about the format of the metadata,
and I want to extract the `num_rows` field from the
`FileMetaData` Thrift object.
However, the `_metadata` file in a Parquet data store
contains many other Thrift objects and additional information
besides the `FileMetaData` object that I want to extract.

Can anybody give recommendations on how I can most efficiently
extract the `num_rows` field?

Thanks,
Brandon.
