Dear contributors,

My PR has now gathered comments for a week, and the gist of all open issues is the question of how to encode pages/column chunks that contain only NaNs. There are different suggestions, and I don't see one common favorite yet.
I have outlined three alternatives for handling these, and I would like us to reach a conclusion so that I can update my PR accordingly and move on with it. As this is my first contribution to Parquet, I don't know the decision process here. Do we vote? Is there a single decision maker or a group of decision makers? *Please let me know how to come to a conclusion here; what are the next steps?*

For reference, here are the three alternatives I pointed out. You can find a detailed description of their pros and cons in my comment: https://github.com/apache/parquet-format/pull/196#issuecomment-1486416762

1. My initial proposal, i.e., encoding only-NaN pages by min=max=NaN.
2. Adding `num_values` to the ColumnIndex, to make it symmetric with Statistics in pages & `ColumnMetaData` and to enable the computation `num_values - null_count - nan_count == 0`.
3. Adding a `nan_pages` bool list to the column index, which indicates whether a page contains only NaNs.

Cheers,
Jan Finis
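P.S. To make the three alternatives easier to compare, here is a rough Python sketch of how a reader might test a page under each of them. It is only an illustration: the `PageStats` class, its field layout, and the function names are hypothetical and do not mirror the actual Thrift definitions; only `num_values`, `null_count`, `nan_count`, and the per-page `nan_pages` flag come from the proposals above.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class PageStats:
    """Hypothetical per-page summary; NOT the real Thrift ColumnIndex layout."""
    min_value: Optional[float]
    max_value: Optional[float]
    null_count: int
    nan_count: int
    num_values: Optional[int] = None  # only available under alternative 2
    nan_page: Optional[bool] = None   # one entry of `nan_pages`, alternative 3


def only_nan_alt1(p: PageStats) -> bool:
    # Alternative 1: an only-NaN page is written with min = max = NaN.
    return (
        p.min_value is not None
        and p.max_value is not None
        and math.isnan(p.min_value)
        and math.isnan(p.max_value)
    )


def no_ordinary_values_alt2(p: PageStats) -> bool:
    # Alternative 2: nothing is left once nulls and NaNs are subtracted.
    # This condition also holds for an all-null page (nan_count == 0).
    return p.num_values - p.null_count - p.nan_count == 0


def only_nan_alt3(p: PageStats) -> bool:
    # Alternative 3: read the explicit per-page flag.
    return bool(p.nan_page)
```

Note that the check for alternative 2 only says that no non-null, non-NaN value exists; a reader that wants to tell only-NaN pages apart from all-null pages would additionally look at `nan_count`.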