I also think most of the proposed benefits from these new formats can be achieved with the current Parquet format and improved implementations.
My concerns are:

1. For encodings: although many interesting encodings have been introduced, most implementations today only use PLAIN and Dictionary. We could make fuller use of the existing encodings, and introduce new ones that allow skipping, compressing, and reading data efficiently in specific scenarios.

2. We could start optimizing for semi-structured and ML data, with targeted optimizations for these cases like [1]. Rep-levels and def-levels are feature-rich, but we could also avoid reading them when they are not necessary. Besides, we could support types like geo within Parquet.

[1] https://github.com/apache/arrow/issues/34510#issuecomment-2109768275