I think the metadata relationships diagram on the parquet-mr GitHub page ( https://github.com/Parquet/parquet-format ) is out of date with respect to the code.
I do not see any BlockMetaData in the diagram, and it seems that ColumnMetaData has been renamed to ColumnChunkMetaData. Some of the variable names have also changed. Could someone who knows the metadata classes please update the diagram? It would help a lot in understanding the code.

Currently I am trying to figure out how to get the total number of rows for a given column, and whether nulls are included in that count. I need this so that I can read entire columns into memory (I have to allocate the exact amount of space up front) and then index them by primary key for fast in-memory lookups.

Any help would be appreciated.

Thanks,
~Pratik
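
P.S. In case it helps to make the use case concrete, here is a rough, self-contained sketch of what I am trying to do. It does not call any Parquet API: the per-row-group row counts and the column values are hard-coded stand-ins for whatever the metadata and column readers would return, and the class and method names (ColumnIndexSketch, totalRows, buildIndex) are made up for illustration. It also assumes the row counts include rows where the column is null, which is exactly the part I am unsure about.

```java
import java.util.HashMap;
import java.util.Map;

public class ColumnIndexSketch {

    /** Adds up per-row-group row counts to get the exact array size to allocate. */
    public static int totalRows(long[] rowGroupRowCounts) {
        long total = 0;
        for (long count : rowGroupRowCounts) {
            total += count;
        }
        // Java arrays are int-indexed, so fail loudly if the total does not fit.
        return Math.toIntExact(total);
    }

    /** Maps each primary key to the row position where it was read. */
    public static Map<Long, Integer> buildIndex(long[] keys) {
        // Size the map up front so it does not rehash while being filled.
        Map<Long, Integer> index = new HashMap<>((int) (keys.length / 0.75f) + 1);
        for (int row = 0; row < keys.length; row++) {
            index.put(keys[row], row);
        }
        return index;
    }

    public static void main(String[] args) {
        // Stand-in for the row counts I hope to get from the file metadata.
        long[] rowGroupRowCounts = {3, 2};
        int n = totalRows(rowGroupRowCounts);

        // Exact-size allocation, one array per column.
        long[] keys = new long[n];
        double[] values = new double[n];

        // Stand-in for reading the two columns from the file.
        long[] keyData = {42L, 7L, 19L, 3L, 88L};
        double[] valueData = {1.5, 2.5, 3.5, 4.5, 5.5};
        System.arraycopy(keyData, 0, keys, 0, n);
        System.arraycopy(valueData, 0, values, 0, n);

        // Lookup by primary key goes through the index to the row position.
        Map<Long, Integer> index = buildIndex(keys);
        int row = index.get(19L);
        System.out.println("key 19 -> row " + row + ", value " + values[row]);
    }
}
```

If the count turns out not to include nulls, the arrays above would be too small for the number of rows, so I would need a different source for the allocation size.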
