wecharyu opened a new issue, #12071:
URL: https://github.com/apache/gluten/issues/12071

   ### Backend
   
   VL (Velox)
   
   ### Bug description
   
   ## Problem
   
   Spark builds the write-side Hadoop configuration with 
`sessionState.newHadoopConfWithOptions(options)` before invoking the file 
writer. This makes configs provided as `spark.hadoop.<key>` visible to the 
underlying Parquet writer as `<key>`.
   
   Gluten's Velox native write path currently builds its native write parameters 
from the write options only, so configs set through Spark's HadoopConf 
(`spark.hadoop.*`) are never propagated to the native writer.
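   
   The merge described above can be modelled in a few lines of plain Scala. 
This is a simplified sketch, not Spark or Gluten code: `HadoopConfSketch` and 
`effectiveWriteConf` are hypothetical names, and it assumes that session confs 
are copied with the `spark.hadoop.` prefix stripped and that write options are 
applied last.
   
   ```scala
   // Hypothetical model of the write-side conf merge; not Spark or Gluten API.
   object HadoopConfSketch {
     private val HadoopPrefix = "spark.hadoop."

     def effectiveWriteConf(
         sessionConfs: Map[String, String],
         writeOptions: Map[String, String]): Map[String, String] = {
       // Session confs go in first; `spark.hadoop.<key>` becomes `<key>`.
       val fromSession = sessionConfs.map { case (k, v) =>
         k.stripPrefix(HadoopPrefix) -> v
       }
       // Write options are layered on top, so they win on conflict.
       fromSession ++ writeOptions
     }
   }

   val writeConf = HadoopConfSketch.effectiveWriteConf(
     sessionConfs = Map("spark.hadoop.parquet.enable.dictionary" -> "false"),
     writeOptions = Map("compression" -> "snappy"))
   ```
   
   Under these assumptions, `writeConf` contains `parquet.enable.dictionary` 
alongside the write options. Building the native write parameters from 
`writeOptions` alone drops that key, which is the gap reported here.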
   
   ## Reproduction
   
   Enable the Velox native writer and set a Parquet write config through Spark 
HadoopConf, for example:
   
   ```scala
   spark.conf.set("spark.hadoop.parquet.enable.dictionary", "false")
   ```
   
   ### Expected Behavior
   The native writer should respect HadoopConf-backed Parquet write configs in 
the same way vanilla Spark's file write path does.
   
   For `spark.hadoop.parquet.enable.dictionary=false`, the written Parquet 
footer should not contain dictionary encodings such as `RLE_DICTIONARY` or 
`PLAIN_DICTIONARY`.
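   
   One way to check the footer is to read the column chunk encodings with 
parquet-java, for example from a `spark-shell` where Hadoop and Parquet are 
already on the classpath. This is a sketch only: `footerEncodings` and 
`hasDictionaryEncoding` are hypothetical helper names, and the file path is 
supplied by the caller.
   
   ```scala
   import org.apache.hadoop.conf.Configuration
   import org.apache.hadoop.fs.Path
   import org.apache.parquet.hadoop.ParquetFileReader
   import org.apache.parquet.hadoop.util.HadoopInputFile
   import scala.collection.JavaConverters._

   // Collect the encoding names recorded for every column chunk in the footer.
   def footerEncodings(file: String): Set[String] = {
     val input = HadoopInputFile.fromPath(new Path(file), new Configuration())
     val reader = ParquetFileReader.open(input)
     try {
       reader.getFooter.getBlocks.asScala
         .flatMap(_.getColumns.asScala)
         .flatMap(_.getEncodings.asScala)
         .map(_.name())
         .toSet
     } finally {
       reader.close()
     }
   }

   // True if any of the dictionary encodings named in this issue is present.
   def hasDictionaryEncoding(encodings: Set[String]): Boolean =
     encodings.exists(e => e == "RLE_DICTIONARY" || e == "PLAIN_DICTIONARY")
   ```
   
   With the expected behavior, `hasDictionaryEncoding(footerEncodings(file))` 
would be `false` for each Parquet part file written by the native writer.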
   
   ### Actual Behavior
   The native writer ignores the `spark.hadoop.*` config, and the Parquet 
footer still shows dictionary encoding.
   
   
   ### Gluten version
   
   main branch
   
   ### Spark version
   
   None
   
   ### Spark configurations
   
   _No response_
   
   ### System information
   
   _No response_
   
   ### Relevant logs
   
   ```bash
   
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

