mystictraveler commented on issue #860: URL: https://github.com/apache/iceberg-go/issues/860#issuecomment-4198077327
**Diagnostic result: the partitionedFanoutWriter race hypothesis is REFUTED.**

I ran the same reproduction with `config.EnvConfig.MaxWorkers = 1` (set programmatically before calling `tx.Overwrite`, and confirmed to fire via a print on every compaction). The row losses persist with effectively the same rate and shape:

```
input rows  2100000 != staged rows  2098084 (Δ  -1916)
input rows  4500000 != staged rows  4490085 (Δ  -9915)
input rows 11400000 != staged rows 11395997 (Δ  -4003)
input rows  2850000 != staged rows  2849029 (Δ   -971)
input rows  6100000 != staged rows  6089842 (Δ -10158)
input rows 14800000 != staged rows 14796050 (Δ  -3950)
input rows 17800000 != staged rows 17791987 (Δ  -8013)
input rows  4500000 != staged rows  4498973 (Δ  -1027)
input rows  9200000 != staged rows  9190034 (Δ  -9966)
```

(15 override-confirmation prints in the log; 18 master-check failures + 2 CAS-retry exhaustions across 25 compaction attempts.)

So **the bug is upstream of the worker fanout**: there is no concurrent `getOrCreateRollingDataWriter` contention at `MaxWorkers=1`, yet rows still vanish. Apologies for the misdirection in the original report; please disregard the "Suspect: partitionedFanoutWriter" section as the leading hypothesis.

**Where I'd look next** (I haven't had time to dig in yet, but I'm recording the angles in case any of them ring a bell for a maintainer):

1. **Snapshot pinning between scan and classifier.** Our compactor calls `tbl.Scan(filter).ToArrowRecords` (which materializes the file list eagerly via `PlanFiles`) and then `tbl.NewTransaction()` on the same `tbl`. The transaction's `t.meta = MetadataBuilderFromBase(t.metadata, t.metadataLocation)` — does the builder *re-read* metadata from the metadata location, or is it a pure in-memory copy of the supplied `t.metadata`?
   If it re-reads, a streamer commit landing between the scan's `PlanFiles` and the transaction's metadata snapshot would put the classifier on a *newer* snapshot than the reader, and `classifyFilesForFilteredDeletions` would then mark for deletion files the reader never had a chance to read. That matches the symptom exactly: more rows deleted than rewritten.

2. **`updateSnapshot` / `mergeOverwrite` reading the manifest list at commit-stage time** rather than at transaction-creation time. If the staged producer walks the manifest list afresh when materializing the new manifest list (rather than using the snapshot the classifier saw), and a foreign commit lands in between, the same window opens.

3. **`SetProperties` mutating `t.meta` mid-transaction.** Lines 889–898 of `transaction.go` write a `DefaultNameMappingKey` property if `NameMapping()` is nil. If that path bumps the metadata's effective current-snapshot pointer in any way, the staged producer would see a newer snapshot. Probably benign, but worth ruling out.

4. **The streaming `RecordReader` adapter on our side.** Our adapter wraps the `iter.Seq2` from `ToArrowRecords` via `iter.Pull2` and exposes `Next()/RecordBatch()/Err()/Retain()/Release()`. `iter.Pull2` should be lossless, but if the reader's `Release()` is called before iteration completes (e.g. due to a `Retain` count bug interacting with `IterFromReader`'s `defer rdr.Release()`), batches could be silently dropped. I'm checking this on our side.

**What I'm doing on the janitor side**: replacing `tx.Overwrite` entirely with `Transaction.ReplaceDataFilesWithDataFiles` (table/transaction.go:599), which takes explicit `filesToDelete` / `filesToAdd` lists and has no row-filter classification path. This sidesteps both the classifier and the fanout writer in one move and doesn't depend on which of (1)–(4) above turns out to be the actual mechanism. I'll report back with the result.
Happy to provide a smaller, more focused reproduction (just iceberg-go, no janitor harness) if that would help triage. Let me know.
