twuebi commented on code in PR #948:
URL: https://github.com/apache/iceberg-go/pull/948#discussion_r3159648062
##########
table/write_records.go:
##########
@@ -52,6 +54,37 @@ func WithWriteUUID(id uuid.UUID) WriteRecordOption {
}
}
+// WithMaxWriteWorkers overrides the default number of fanout workers
+// used for partitioned writes. Each worker processes record batches,
+// partitions them, and writes to the appropriate partition files.
+// Fewer workers means fewer concurrent parquet writers compressing
+// pages simultaneously, which reduces peak memory. A value of 0
+// (the default) uses [config.EnvConfig.MaxWorkers].
+//
+// This option is ignored when [WithClusteredWrite] is set.
+func WithMaxWriteWorkers(n int) WriteRecordOption {
+	return func(c *writeRecordConfig) {
+		c.maxWriteWorkers = n
+	}
+}
+
+// WithClusteredWrite enables the memory-efficient clustered write
+// path for partitioned tables. It keeps at most one partition writer
+// open at a time: when a record arrives for a new partition, the
+// current writer is flushed and closed before a new one is opened.
+//
+// The input must be strictly clustered by partition: once a
+// partition's writer has been closed, encountering further records
+// for that partition returns an error. This is the natural order for
+// compaction, where each source data file typically belongs to a
+// single partition. If the input is not clustered, use the fanout
+// writer (the default) instead.
+func WithClusteredWrite() WriteRecordOption {
Review Comment:
Since the partitions are kept in a map with unspecified iteration order, sorting
is the only way to make this deterministic. IIRC, we cannot mimic Java's
ClusteredWriter here, since that operates at the row level while we work with
batches.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]