I think the metadata relationships diagram on the parquet-format GitHub
page (https://github.com/Parquet/parquet-format) is out of date with what
is in the parquet-mr code.

I do not see any BlockMetaData in the diagram, and it seems that
ColumnMetaData has been renamed to ColumnChunkMetaData. Some of the
variable names have also changed.

Could anyone who knows the metadata well please update the diagram?
It would help a lot in understanding the code.

Currently I am trying to figure out how to get the total number of rows
for a given column, and whether nulls are included in that count. I need
this so that I can read entire columns into memory (I have to allocate the
exact amount of space up front) and then index them by the primary key for
fast in-memory lookups.
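
To make the goal concrete, here is a rough sketch of what I am after. The
types below are stand-ins I made up for illustration, not the parquet-mr
classes. The sketch assumes that each row group exposes a row count which
includes nulls, and that a per-column null count is available somewhere.
Both are assumptions, and they are exactly what I am trying to confirm.

```java
import java.util.Arrays;
import java.util.List;

// Stand-in types only: these are NOT the parquet-mr metadata classes.
final class RowCountSketch {

    /** Hypothetical per-row-group summary for a single column. */
    static final class RowGroup {
        final long rowCount;  // rows in this row group, nulls included (assumed)
        final long nullCount; // nulls for the column in this row group (assumed known)

        RowGroup(long rowCount, long nullCount) {
            this.rowCount = rowCount;
            this.nullCount = nullCount;
        }
    }

    /** Sums the row counts of all row groups: one slot per row, null or not. */
    static long totalRows(List<RowGroup> groups) {
        long total = 0;
        for (RowGroup g : groups) {
            total += g.rowCount;
        }
        return total;
    }

    /** Sums the non-null values only, in case nulls need no storage. */
    static long totalNonNull(List<RowGroup> groups) {
        long total = 0;
        for (RowGroup g : groups) {
            total += g.rowCount - g.nullCount;
        }
        return total;
    }

    public static void main(String[] args) {
        List<RowGroup> groups = Arrays.asList(new RowGroup(3, 1), new RowGroup(2, 0));
        // Allocate once; toIntExact throws if the count does not fit in an int.
        long[] column = new long[Math.toIntExact(totalRows(groups))];
        System.out.println(column.length + " slots, " + totalNonNull(groups) + " non-null");
    }
}
```

Which of the two totals is the right allocation size depends on whether
the count stored in the metadata includes nulls, which is my question.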

Any help would be appreciated.

Thanks,
~Pratik
