Hi Parquet team,

I apologize for the simple question. I'm using Parquet on HDFS in a Scala/Spark application, and I'm having trouble efficiently obtaining the number of rows in my Parquet data stores without loading all of the data and counting it.

The README at https://github.com/apache/incubator-parquet-format has great information about the format of the metadata, and I want to extract the `num_rows` field from the `FileMetaData` Thrift object. However, the `_metadata` file in my Parquet datasets contains many Thrift objects and other information besides the `FileMetaData` object I want. Can anybody recommend the most efficient way to extract the `num_rows` field?

Thanks,
Brandon
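One possible approach, sketched under stated assumptions: a Parquet file ends with the serialized `FileMetaData`, a 4-byte little-endian footer length, and the magic bytes `PAR1`, and `FileMetaData` is written with Thrift's compact protocol, where `num_rows` is field 3 (an `i64`) after `version` (field 1) and `schema` (field 2). The sketch below reads only the trailer and the footer bytes, then walks the top-level fields and skips everything until field 3, so nothing else is deserialized. It assumes a single local file (a `_metadata` summary file should share the same trailer layout, but check that your writer fills in `num_rows` there); `ParquetRowCount`, `numRows`, and `numRowsFromFile` are hypothetical names. On HDFS you would open the file through Hadoop's `FileSystem` API instead of `RandomAccessFile`, and parquet-mr's own footer reader (for example `ParquetFileReader.readFooter`, summing `getRowCount()` over the returned blocks) avoids hand-parsing Thrift altogether.

```scala
import java.io.RandomAccessFile
import java.nio.{ByteBuffer, ByteOrder}
import java.nio.charset.StandardCharsets

// Hypothetical helper: reads num_rows from a Parquet footer without
// deserializing the rest of FileMetaData. Assumes the footer uses Thrift's
// compact protocol and that num_rows is field 3 (i64) of FileMetaData.
object ParquetRowCount {
  private val Magic = "PAR1".getBytes(StandardCharsets.US_ASCII)

  // Cursor over Thrift compact-protocol bytes.
  private final class Reader(buf: Array[Byte]) {
    private var pos = 0

    def byte(): Int = { val b = buf(pos) & 0xff; pos += 1; b }

    def skip(n: Int): Unit = pos += n

    // Unsigned varint: 7 bits per byte, high bit set means "more bytes follow".
    def varint(): Long = {
      var result = 0L
      var shift = 0
      var more = true
      while (more) {
        val b = byte()
        result |= (b & 0x7fL) << shift
        shift += 7
        more = (b & 0x80) != 0
      }
      result
    }

    // Zigzag-decoded varint, used for i16/i32/i64 values and explicit field ids.
    def zigzag(): Long = { val v = varint(); (v >>> 1) ^ -(v & 1) }

    // Field header: high nibble is the id delta (0 means an explicit id follows),
    // low nibble is the type; type 0 is STOP. Returns (fieldId, type).
    def fieldHeader(lastId: Int): (Int, Int) = {
      val h = byte()
      val tpe = h & 0x0f
      if (tpe == 0) (0, 0)
      else {
        val delta = h >> 4
        val id = if (delta != 0) lastId + delta else zigzag().toInt
        (id, tpe)
      }
    }

    // Booleans inside lists/maps take one byte; in a field header they take none.
    def skipElem(tpe: Int): Unit =
      if (tpe == 1 || tpe == 2) skip(1) else skipValue(tpe)

    // Skip one value of the given compact type id.
    def skipValue(tpe: Int): Unit = tpe match {
      case 1 | 2     => ()                    // bool stored in the field header
      case 3         => skip(1)               // i8
      case 4 | 5 | 6 => varint()              // i16 / i32 / i64
      case 7         => skip(8)               // double
      case 8         => skip(varint().toInt)  // binary / string
      case 9 | 10 =>                          // list / set
        val h = byte()
        val size = if ((h >> 4) == 15) varint().toInt else h >> 4
        for (_ <- 0 until size) skipElem(h & 0x0f)
      case 11 =>                              // map
        val size = varint().toInt
        if (size > 0) {
          val kv = byte()
          for (_ <- 0 until size) { skipElem(kv >> 4); skipElem(kv & 0x0f) }
        }
      case 12 => skipStruct()
      case other => throw new IllegalArgumentException(s"unknown compact type $other")
    }

    def skipStruct(): Unit = {
      var last = 0
      var done = false
      while (!done) {
        val (id, tpe) = fieldHeader(last)
        if (tpe == 0) done = true else { skipValue(tpe); last = id }
      }
    }
  }

  // Walk the top-level FileMetaData fields until num_rows (id 3, type i64).
  def numRows(footer: Array[Byte]): Long = {
    val r = new Reader(footer)
    var last = 0
    while (true) {
      val (id, tpe) = r.fieldHeader(last)
      if (tpe == 0) throw new IllegalArgumentException("num_rows not found")
      if (id == 3 && tpe == 6) return r.zigzag()
      r.skipValue(tpe)
      last = id
    }
    throw new IllegalStateException("unreachable")
  }

  // Read only the 8-byte trailer and the footer of a local file.
  def numRowsFromFile(path: String): Long = {
    val raf = new RandomAccessFile(path, "r")
    try {
      val trailer = new Array[Byte](8)
      raf.seek(raf.length() - 8)
      raf.readFully(trailer)
      require(trailer.slice(4, 8).sameElements(Magic), "missing PAR1 magic")
      val len = ByteBuffer.wrap(trailer, 0, 4).order(ByteOrder.LITTLE_ENDIAN).getInt
      require(len >= 0 && len <= raf.length() - 12, "bad footer length")
      val footer = new Array[Byte](len)
      raf.seek(raf.length() - 8 - len)
      raf.readFully(footer)
      numRows(footer)
    } finally raf.close()
  }
}
```

A call would look like `ParquetRowCount.numRowsFromFile("/path/to/part-00000.parquet")` (hypothetical path). Skipping fields instead of deserializing them avoids building the schema and row-group objects, although the footer bytes themselves still have to be read in full.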
