Hi Ridha,

DELTA_BINARY_PACKED is enabled for the v2 writer in the parquet-mr
implementation. Have you tried setting `parquet.writer.version` [1] to
PARQUET_2_0 in the Spark job? I'm not sure whether this helps, but a
sketch of what that could look like is below.
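
This is untested; the app name and output path are placeholders, and
the two Hadoop keys come from the parquet-mr README linked under [1]:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("parquet-v2-test") // placeholder
  .getOrCreate()

// parquet-mr reads these keys from the Hadoop configuration;
// WriterVersion.fromString accepts both "v2" and "PARQUET_2_0".
spark.sparkContext.hadoopConfiguration
  .set("parquet.writer.version", "PARQUET_2_0")
// Keep dictionary encoding disabled, as you already do, so the v2
// writer can pick DELTA_BINARY_PACKED for integer columns.
spark.sparkContext.hadoopConfiguration
  .set("parquet.enable.dictionary", "false")

spark.range(1000000L).toDF("id")
  .write
  .parquet("/tmp/parquet-v2-test") // placeholder

Passing --conf spark.hadoop.parquet.writer.version=PARQUET_2_0 at
spark-submit time should land in the same Hadoop configuration.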

[1]
https://github.com/apache/parquet-mr/blob/86f90f57b7858ea1eede7bb8b6946c649d74f7e1/parquet-hadoop/README.md?plain=1#L130

Best,
Gang

On Wed, Feb 28, 2024 at 1:39 AM Ridha Khan <[email protected]> wrote:

> Hi Team,
>
> Hope you're all doing well.
> This is a query regarding the Parquet encoding used by Spark.
>
> We are interested in making the Parquet file size as small as
> possible. Given the nature of our data, DELTA_BINARY_PACKED seems to
> be a good option.
> However, with dictionary encoding disabled, the DefaultV1ValuesWriter
> class defaults to the PlainValuesWriter.
>
> Is there a way to create a custom Parquet writer that Spark can use?
> Appreciate your help on this.
>
> Thanks,
> Ridha
>
