[ 
https://issues.apache.org/jira/browse/PARQUET-1784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Szadovszky updated PARQUET-1784:
--------------------------------------
    Description: 
After adding some new statistics and encodings to Parquet, it is getting very 
hard to be smart and choose the best configs automatically. For example, for 
which columns should we save column indexes and/or Bloom filters? Is it worth 
using dictionary encoding for a column that we know will fall back to another 
encoding?

The idea of this feature is to allow the library user to fine-tune the 
configuration by setting it column-wise. To support this we extend the existing 
configuration keys by a suffix to identify the related column. (From now on we 
introduce new keys following the same syntax.)
 \{key of the configuration}{{#}}\{column path in the file schema}
 For example: {{parquet.enable.dictionary#column.path.col_1}}
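
 A minimal sketch of how such a key could be resolved for one column. The class 
 and method names are hypothetical and not part of parquet-mr, and the fallback 
 to the un-suffixed key when no column-specific key is set is an assumption, 
 not something this issue specifies:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not the parquet-mr API: resolves a boolean setting for
// one column from keys of the form <key>#<column path>.
class ColumnWiseConfigSketch {
  private static final char SEPARATOR = '#';

  // Returns the value of "<key>#<columnPath>" if it is set; otherwise the
  // value of the plain "<key>" (assumed fallback); otherwise defaultValue.
  static boolean getBoolean(Map<String, String> conf, String key,
                            String columnPath, boolean defaultValue) {
    String value = conf.get(key + SEPARATOR + columnPath);
    if (value == null) {
      value = conf.get(key);
    }
    return value == null ? defaultValue : Boolean.parseBoolean(value);
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put("parquet.enable.dictionary", "true");
    conf.put("parquet.enable.dictionary#column.path.col_1", "false");

    // col_1 has its own key; col_2 falls back to the un-suffixed key.
    System.out.println(getBoolean(conf, "parquet.enable.dictionary", "column.path.col_1", true));
    System.out.println(getBoolean(conf, "parquet.enable.dictionary", "column.path.col_2", true));
  }
}
```

 Here a plain string map stands in for the real configuration object so that 
 the sketch stays self-contained.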

This Jira covers the framework to support the column-wise configuration with 
the implementation of some existing configs where it makes sense (e.g. 
{{parquet.enable.dictionary}}). Implementing new configurations is not part of 
this effort.

  was:
After adding some new statistics and encodings into Parquet it is getting very 
hard to be smart and choose the best configs automatically. For example for 
which columns should we save column index and/or bloom-filters? Is it worth 
using dictionary for a column that we know will fall back to another encoding?

The idea of this feature is to allow the library user to fine-tune the 
configuration by setting it column-wise. To support this we extend the existing 
configuration keys by a suffix to identify the related column. (From now on we 
introduce new keys following the same syntax.)
 \{key of the configuration}{{#}}{column path in the file schema}
 For example: {{parquet.enable.dictionary#column.path.col_1}}

This jira covers the framework to support the column-wise configuration with 
the implementation of some existing configs where it makes sense (e.g. 
{{parquet.enable.dictionary}}). Implementing new configuration is not part of 
this effort.


> Column-wise configuration
> -------------------------
>
>                 Key: PARQUET-1784
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1784
>             Project: Parquet
>          Issue Type: New Feature
>          Components: parquet-mr
>            Reporter: Gabor Szadovszky
>            Assignee: Gabor Szadovszky
>            Priority: Major
>
> After adding some new statistics and encodings to Parquet, it is getting 
> very hard to be smart and choose the best configs automatically. For example, 
> for which columns should we save column indexes and/or Bloom filters? Is it 
> worth using dictionary encoding for a column that we know will fall back to 
> another encoding?
> The idea of this feature is to allow the library user to fine-tune the 
> configuration by setting it column-wise. To support this we extend the 
> existing configuration keys by a suffix to identify the related column. (From 
> now on we introduce new keys following the same syntax.)
>  \{key of the configuration}{{#}}\{column path in the file schema}
>  For example: {{parquet.enable.dictionary#column.path.col_1}}
> This Jira covers the framework to support the column-wise configuration with 
> the implementation of some existing configs where it makes sense (e.g. 
> {{parquet.enable.dictionary}}). Implementing new configurations is not part 
> of this effort.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
