mystictraveler commented on issue #860:
URL: https://github.com/apache/iceberg-go/issues/860#issuecomment-4198077327

   **Diagnostic result: the partitionedFanoutWriter race hypothesis is 
REFUTED.**
   
   I ran the same reproduction with `config.EnvConfig.MaxWorkers = 1` (set 
programmatically before calling `tx.Overwrite`, confirmed firing via a print on 
every compaction). The row losses persist with effectively the same rate and 
shape:
   
   ```
   input rows 2100000 != staged rows 2098084   (Δ -1916)
   input rows 4500000 != staged rows 4490085   (Δ -9915)
   input rows 11400000 != staged rows 11395997 (Δ -4003)
   input rows 2850000 != staged rows 2849029   (Δ -971)
   input rows 6100000 != staged rows 6089842   (Δ -10158)
   input rows 14800000 != staged rows 14796050 (Δ -3950)
   input rows 17800000 != staged rows 17791987 (Δ -8013)
   input rows 4500000 != staged rows 4498973   (Δ -1027)
   input rows 9200000 != staged rows 9190034   (Δ -9966)
   ```
   
   (15 override-confirmation prints in the log; 18 master-check failures + 2 
CAS-retry exhaustions across 25 compaction attempts.)
   
   So **the bug is upstream of the worker fanout** — there's no concurrent 
`getOrCreateRollingDataWriter` contention at MaxWorkers=1, yet rows still 
vanish. Apologies for the misdirection in the original report; please disregard 
the "Suspect: partitionedFanoutWriter" section as the leading hypothesis.
   
   **Where I'd look next** (haven't yet had time to dig in, but recording the 
angles in case any of them rings a bell for a maintainer):
   
   1. **Snapshot pinning between scan and classifier.** Our compactor calls 
`tbl.Scan(filter).ToArrowRecords` (which materializes the file list eagerly via 
`PlanFiles`) and then `tbl.NewTransaction()` on the same `tbl`. The 
transaction's `t.meta = MetadataBuilderFromBase(t.metadata, 
t.metadataLocation)` — does the builder *re-read* metadata from the metadata 
location, or is it a pure in-memory copy of the supplied `t.metadata`? If it 
re-reads, a streamer commit between the scan's `PlanFiles` and the 
transaction's metadata snapshot would put the classifier on a *newer* snapshot 
than the reader, and `classifyFilesForFilteredDeletions` would then mark for 
deletion files the reader never had a chance to read. That matches the symptom 
exactly: more rows deleted than rewritten.
   
   2. **`updateSnapshot` / `mergeOverwrite` reading the manifest list at 
commit-stage time** rather than at transaction-creation time. If the staged 
producer walks the manifest list afresh when materializing the new one (rather 
than using the snapshot the classifier saw), and a foreign commit lands in 
between, the same window opens.
   
   3. **`SetProperties` mutating `t.meta` mid-transaction.** Lines 889-898 of 
`transaction.go` write a `DefaultNameMappingKey` property if `NameMapping()` 
is nil. If that path bumps the metadata's effective current-snapshot pointer in 
any way, the staged producer would see a newer snapshot. Probably benign but 
worth ruling out.
   
   4. **The streaming `RecordReader` adapter on our side.** Our adapter wraps 
the `iter.Seq2` from `ToArrowRecords` via `iter.Pull2`, and exposes 
`Next()/RecordBatch()/Err()/Retain()/Release()`. `iter.Pull2` should be 
lossless, but if the reader's `Release()` is called before iteration completes 
(e.g. due to a `Retain` count bug interacting with `IterFromReader`'s `defer 
rdr.Release()`), batches could be silently dropped. I'm checking this on our 
side.
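   As a quick sanity check on (4), the failure mode I'm worried about is easy to 
demonstrate in isolation with a stdlib-only toy (none of these names come from 
our adapter or from iceberg-go — `fakeBatches`/`consume` are invented for 
illustration): `iter.Pull2` itself is lossless, but once `stop()` runs, 
`next()` reports `ok=false` exactly as it would at clean exhaustion, so a 
premature `Release()` that triggers `stop()` would drop the remaining batches 
with no error surfacing:

    ```go
    package main

    import (
    	"fmt"
    	"iter"
    )

    // fakeBatches stands in for ToArrowRecords: it yields n "record batches"
    // and counts how many the generator actually handed out.
    func fakeBatches(n int, produced *int) iter.Seq2[int, error] {
    	return func(yield func(int, error) bool) {
    		for i := 0; i < n; i++ {
    			*produced++
    			if !yield(i, nil) {
    				return // consumer stopped: the rest never surface, no error
    			}
    		}
    	}
    }

    // consume pulls at most limit batches, then stops -- stopping before the
    // sequence is drained models a premature Release() on the adapter.
    func consume(n, limit int) (produced, consumed int) {
    	next, stop := iter.Pull2(fakeBatches(n, &produced))
    	defer stop()
    	for consumed < limit {
    		if _, _, ok := next(); !ok {
    			break
    		}
    		consumed++
    	}
    	return produced, consumed
    }

    func main() {
    	p, c := consume(5, 5) // drained fully: Pull2 is lossless
    	fmt.Printf("full drain: produced=%d consumed=%d\n", p, c)

    	p, c = consume(5, 2) // stopped early: remaining batches never reach the consumer
    	fmt.Printf("early stop: produced=%d consumed=%d\n", p, c)
    }
    ```

   The point is that from the caller's side an early `stop()` is 
indistinguishable from clean exhaustion, which is exactly the "rows vanish with 
no error" signature. (Requires Go 1.23+ for the `iter` package.)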
   
   **What I'm doing on the janitor side**: replacing `tx.Overwrite` entirely 
with `Transaction.ReplaceDataFilesWithDataFiles` (table/transaction.go:599), 
which takes explicit `filesToDelete` / `filesToAdd` lists and has no row-filter 
classification path. This sidesteps both the classifier and the fanout writer 
in one move, and doesn't depend on which of (1)–(4) above turns out to be the 
actual mechanism. I'll report back with the result.
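   To make the shape of hypothesis (1) concrete, here's a stdlib-only toy model 
of the suspected window (all names are invented for illustration — 
`planFiles`/`classify` are *not* the iceberg-go APIs): if the delete set is 
computed from *re-read* metadata while the read set was planned against the 
older snapshot, any file committed in between gets deleted without ever having 
been rewritten. Pinning the scan's file list — which is effectively what 
explicit `filesToDelete`/`filesToAdd` lists give you — closes the window:

    ```go
    package main

    import "fmt"

    // toyTable's "metadata" is just a list of snapshots, each a set of files.
    type toyTable struct {
    	snapshots [][]string
    	current   int
    }

    func (t *toyTable) planFiles() []string { return t.snapshots[t.current] }

    // classify stands in for a rewrite-everything compaction's classifier:
    // every live file it sees is marked for deletion.
    func classify(live []string) []string { return live }

    // demo returns (filesRead, racyDeletes, pinnedDeletes).
    func demo() (int, int, int) {
    	t := &toyTable{snapshots: [][]string{{"a.parquet", "b.parquet"}}}

    	scanned := t.planFiles() // reader pins snapshot 0's file list here

    	// A streamer commit lands between the scan's plan and the
    	// transaction's metadata snapshot: snapshot 1 adds c.parquet.
    	t.snapshots = append(t.snapshots, []string{"a.parquet", "b.parquet", "c.parquet"})
    	t.current = 1

    	racy := classify(t.planFiles()) // hypothesis (1): classifier re-reads metadata
    	pinned := classify(scanned)     // safe: classifier uses the pinned list

    	return len(scanned), len(racy), len(pinned)
    }

    func main() {
    	read, racy, pinned := demo()
    	fmt.Printf("reader rewrote %d files; racy classifier deletes %d, pinned deletes %d\n",
    		read, racy, pinned)
    	// The racy path also deletes c.parquet, whose rows were never
    	// rewritten: net row loss, matching the observed Δ pattern.
    }
    ```

   Whether iceberg-go's `MetadataBuilderFromBase` actually re-reads is exactly 
the open question above; this only illustrates why the answer matters.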
   
   Happy to provide a smaller, more focused reproduction (just iceberg-go, no 
janitor harness) if that would help triage. Let me know.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

