Hi Parquet team,

I apologize for the simple question, but I'm using Parquet on HDFS in
a Scala/Spark application and am having trouble efficiently
obtaining the number of rows in my Parquet data stores without
loading the data and counting the records.

The README at https://github.com/apache/incubator-parquet-format
has great information about the format of the metadata,
and I want to extract the `num_rows` field from the
`FileMetaData` Thrift object.
However, the `_metadata` file contained in Parquet databases
contains many Thrift objects and other information
in addition to the `FileMetaData` object that I want to extract.
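
To make the question concrete, here is a minimal sketch of the part I think I understand, based on the file layout described in the README: a Parquet file starts and ends with the 4-byte magic `PAR1`, and the 4 bytes before the trailing magic hold the little-endian length of the serialized `FileMetaData`. The sketch is in Python only for brevity, the function name `footer_metadata_bytes` is my own and not part of any Parquet library, and it assumes the file (or at least its tail) has already been read into memory.

```python
import struct

MAGIC = b"PAR1"  # 4-byte magic at both ends of a Parquet file, per the README


def footer_metadata_bytes(data: bytes) -> bytes:
    """Return the raw Thrift-encoded FileMetaData bytes of a Parquet file image.

    Hypothetical helper: it only locates the footer, it does not decode it.
    """
    # Smallest possible file: leading magic + footer length + trailing magic.
    if len(data) < 12 or data[:4] != MAGIC or data[-4:] != MAGIC:
        raise ValueError("not a Parquet file (missing PAR1 magic)")

    # The 4 bytes before the trailing magic are the footer length (little-endian).
    (footer_len,) = struct.unpack("<I", data[-8:-4])

    start = len(data) - 8 - footer_len
    if start < 4:
        raise ValueError("footer length exceeds file size")

    return data[start:len(data) - 8]
```

The returned bytes would still need to be decoded with the Thrift compact protocol, using classes generated from the format's Thrift definition, before `num_rows` can be read. That decoding step is the part I am unsure how to do cleanly.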

Can anybody give recommendations on how I can most efficiently
extract the `num_rows` field?

Thanks,
Brandon.
