Re: [PR] Add ParquetFileMerger for efficient row-group level file merging [iceberg]

via GitHub Tue, 10 Feb 2026 08:44:33 -0800


RussellSpitzer commented on code in PR #14435:
URL: https://github.com/apache/iceberg/pull/14435#discussion_r2789091619



##########
api/src/main/java/org/apache/iceberg/actions/RewriteDataFiles.java:
##########
@@ -147,6 +147,32 @@ public interface RewriteDataFiles
    */
   String OUTPUT_SPEC_ID = "output-spec-id";
 
+  /**
+   * Use Parquet row-group level merging during rewrite operations when applicable.
+   *
+   * <p>When enabled, Parquet files will be merged at the row-group level by directly copying row

Review Comment:
   I'm not sure I follow the problem. I thought it was generally bad practice to have so many row groups in a single file, and I'm also not a fan of how careful we have to be about schema and field ID matching.
   
   @lintingbin Why would we have to do 3)? Why not simply skip marking the 150 MB file for compaction if that's an issue? You could always have compaction only compact files smaller than 10 MB or whatnot.
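The alternative suggested above (leave already well-sized files out of compaction and only rewrite small ones) amounts to a size filter applied when choosing rewrite candidates. The following is a minimal sketch of that selection step, not Iceberg's actual planner: `FileEntry`, `SmallFileSelector`, and the 10 MB `SMALL_FILE_THRESHOLD_BYTES` are hypothetical names chosen for illustration, and real compaction planning also weighs other criteria (for example delete files and target output size).

```java
import java.util.List;
import java.util.stream.Collectors;

public class SmallFileSelector {
  // Hypothetical stand-in for a data file entry; this is NOT an Iceberg API type.
  record FileEntry(String path, long sizeInBytes) {}

  // Hypothetical threshold mirroring the "smaller than 10 MB" example in the comment.
  static final long SMALL_FILE_THRESHOLD_BYTES = 10L * 1024 * 1024;

  // Keep only files strictly below the threshold as compaction candidates,
  // so an already well-sized file (e.g. 150 MB) is never selected for rewrite.
  static List<FileEntry> selectForCompaction(List<FileEntry> files, long thresholdBytes) {
    return files.stream()
        .filter(f -> f.sizeInBytes() < thresholdBytes)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<FileEntry> files = List.of(
        new FileEntry("a.parquet", 150L * 1024 * 1024), // large file: left alone
        new FileEntry("b.parquet", 2L * 1024 * 1024),   // small file: compacted
        new FileEntry("c.parquet", 8L * 1024 * 1024));  // small file: compacted
    for (FileEntry f : selectForCompaction(files, SMALL_FILE_THRESHOLD_BYTES)) {
      System.out.println(f.path());
    }
  }
}
```

With this filter the 150 MB file never enters the rewrite group, so there is no need to preserve its row groups by copying them into a merged output.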



-- 
This is an automated message from the Apache Git Service.
