wecharyu opened a new issue, #12071:
URL: https://github.com/apache/gluten/issues/12071
### Backend
VL (Velox)
### Bug description
## Problem
Spark builds the write-side Hadoop configuration with
`sessionState.newHadoopConfWithOptions(options)` before invoking the file
writer. This makes configs provided as `spark.hadoop.<key>` visible to the
underlying Parquet writer as `<key>`.
Gluten's Velox native writer currently builds its native write parameters from
the write options only, so configs that come from Spark's HadoopConf are not
propagated to the native Parquet writer.
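The key resolution described above can be modeled with plain maps. This is a simplified sketch, not Spark's or Gluten's actual code: `EffectiveWriteConf` and `resolve` are hypothetical names, and the base Hadoop configuration that Spark also starts from is left out. It assumes session entries have the `spark.hadoop.` prefix stripped and that write options are then overlaid on top, with `path`/`paths` skipped.

```scala
// Simplified model of how a write-side configuration could be resolved.
// Hypothetical helper for illustration; not part of Spark or Gluten.
object EffectiveWriteConf {
  private val HadoopPrefix = "spark.hadoop."

  def resolve(
      sessionConfs: Map[String, String],
      writeOptions: Map[String, String]): Map[String, String] = {
    // Session entries: drop the "spark.hadoop." prefix so that
    // "spark.hadoop.<key>" becomes visible to the writer as "<key>".
    val fromSession = sessionConfs.collect {
      case (k, v) if v != null => k.stripPrefix(HadoopPrefix) -> v
    }
    // Write options are applied last, so they win on conflicts.
    // Location keys are not writer settings and are skipped.
    val fromOptions = writeOptions.filter {
      case (k, v) => v != null && k != "path" && k != "paths"
    }
    fromSession ++ fromOptions
  }
}
```

A native writer that reads its parameters from a map resolved this way, rather than from `writeOptions` alone, would see `parquet.enable.dictionary` when it is set through `spark.hadoop.parquet.enable.dictionary`.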
## Reproduction
Enable the Velox native writer and set a Parquet write config through Spark
HadoopConf, for example:
```scala
spark.conf.set("spark.hadoop.parquet.enable.dictionary", "false")
```
### Expected Behavior
The native writer should respect HadoopConf-backed Parquet write configs in
the same way Spark's built-in (vanilla) file write path does.
For `spark.hadoop.parquet.enable.dictionary=false`, the written Parquet footer
should not contain dictionary encodings such as `RLE_DICTIONARY` or
`PLAIN_DICTIONARY`.
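One way to check the footer is to read the column chunk encodings with the parquet-hadoop API. The sketch below assumes parquet-hadoop and hadoop-common are on the classpath (as they are in a Spark shell); `FooterEncodings` and its methods are hypothetical helper names, and the file path is supplied by the caller.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.parquet.column.Encoding
import org.apache.parquet.hadoop.ParquetFileReader
import org.apache.parquet.hadoop.util.HadoopInputFile

import scala.collection.JavaConverters._

// Hypothetical helper for inspecting a written Parquet file's footer.
object FooterEncodings {
  private val dictionaryEncodings: Set[Encoding] =
    Set(Encoding.RLE_DICTIONARY, Encoding.PLAIN_DICTIONARY)

  // True if any of the given encodings is a dictionary encoding.
  def usesDictionary(encodings: Set[Encoding]): Boolean =
    encodings.exists(dictionaryEncodings.contains)

  // Returns the dotted paths of column chunks whose footer metadata
  // lists a dictionary encoding (one entry per row group and column).
  def columnsWithDictionary(
      file: String,
      conf: Configuration = new Configuration()): Seq[String] = {
    val reader = ParquetFileReader.open(HadoopInputFile.fromPath(new Path(file), conf))
    try {
      for {
        block <- reader.getFooter.getBlocks.asScala.toSeq
        column <- block.getColumns.asScala
        if usesDictionary(column.getEncodings.asScala.toSet)
      } yield column.getPath.toDotString
    } finally {
      reader.close()
    }
  }
}
```

With dictionary encoding disabled, `FooterEncodings.columnsWithDictionary(...)` would be expected to return an empty sequence for each file written by the job.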
### Actual Behavior
The native writer ignores the `spark.hadoop.*` config, and the Parquet footer
still shows dictionary encoding.
### Gluten version
main branch
### Spark version
None
### Spark configurations
_No response_
### System information
_No response_
### Relevant logs
```bash
```