[I] Flink API rewriteDataFile How to set up scanning based on file size [iceberg]

via GitHub Wed, 27 Dec 2023 03:08:12 -0800


GuoZhaoY opened a new issue, #9386:
URL: https://github.com/apache/iceberg/issues/9386


   ### Query engine
   
   Flink code:

       Actions.forTable(executionEnvironment, table)
           .rewriteDataFiles()
           .maxParallelism(maxParallelism)
           .filter(Expressions.equal(xxxx, xxxxx))
           .targetSizeInBytes(TARGET_FILE_SIZE)
           .execute();
   
   ### Question
   
   I am using Flink to compact small files. With the target size set to 128 MB,
I found that files larger than 128 MB were also scanned and rewritten. This
seems unreasonable: only files smaller than the target size should be scanned
and merged. How can I configure this? The filter() method used here appears to
accept only partition-related filter conditions.
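
For illustration, the selection rule described above can be sketched in plain Java. This is a hypothetical helper operating on raw file sizes, not part of the Iceberg or Flink API; the class name `SmallFileSelector` and the method `selectBelowTarget` are assumptions made up for this example, and it says nothing about how the rewrite action actually plans its scan.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not an Iceberg class.
final class SmallFileSelector {
    private SmallFileSelector() {}

    // Returns only the sizes strictly below the target, i.e. the files
    // the question expects a small-file compaction to pick up.
    static List<Long> selectBelowTarget(List<Long> fileSizesInBytes, long targetSizeInBytes) {
        List<Long> selected = new ArrayList<>();
        for (long size : fileSizesInBytes) {
            if (size < targetSizeInBytes) {
                selected.add(size);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        long target = 128L * 1024 * 1024; // 128 MB target size
        List<Long> sizes = List.of(
                10L * 1024 * 1024,   // small file: expected to be selected
                200L * 1024 * 1024,  // larger than target: expected to be skipped
                64L * 1024 * 1024,   // small file: expected to be selected
                target);             // exactly at target: skipped by this rule
        System.out.println(selectBelowTarget(sizes, target));
    }
}
```

With a 128 MB target, this rule would keep the 10 MB and 64 MB files and skip the 200 MB file, which is the behavior the question is asking how to configure.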



