Hi Ridha,

DELTA_BINARY_PACKED is enabled for Parquet v2 in the parquet-mr implementation. Have you tried setting `parquet.writer.version` [1] to PARQUET_2_0 in your Spark job? I'm not sure whether it helps.
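For example, here is a minimal sketch in Scala of how that property could be routed through Spark's Hadoop configuration (the app name, data, and output path are just placeholders; it assumes Spark 3.x with the bundled parquet-mr writer):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("parquet-v2-sketch").getOrCreate()

// parquet-mr reads these properties from the Hadoop configuration [1].
val hadoopConf = spark.sparkContext.hadoopConfiguration
hadoopConf.set("parquet.writer.version", "PARQUET_2_0")
hadoopConf.set("parquet.enable.dictionary", "false") // dictionary off, as in your setup

// Monotonically increasing values are a good fit for DELTA_BINARY_PACKED.
val df = spark.range(0L, 1000000L).toDF("id")

df.write.mode("overwrite").parquet("/tmp/parquet_v2_output") // placeholder path

You can then inspect the column-chunk encodings of the output files (e.g. with the parquet-cli `meta` command) to verify whether DELTA_BINARY_PACKED was actually used.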
[1] https://github.com/apache/parquet-mr/blob/86f90f57b7858ea1eede7bb8b6946c649d74f7e1/parquet-hadoop/README.md?plain=1#L130

Best,
Gang

On Wed, Feb 28, 2024 at 1:39 AM Ridha Khan <[email protected]> wrote:

> Hi Team,
>
> Hope you're all doing well.
> This is a query regarding the Parquet encoding used by Spark.
>
> We are interested in making the Parquet file size as small as possible.
> Given the nature of our data, DELTA_BINARY_PACKED seems to be a good
> option. However, with the dictionary disabled, the DefaultV1ValuesWriter
> class falls back to PlainValuesWriter.
>
> Is there a way to create a custom Parquet writer that Spark can use?
> Appreciate your help on this.
>
> Thanks,
> Ridha
