aglinxinyuan opened a new issue, #4920:
URL: https://github.com/apache/texera/issues/4920

   While reviewing #4912, [Yicong-Huang 
noted](https://github.com/apache/texera/pull/4912#discussion_r3179057346) that 
there should be an API on `ProgressiveUtils` (or a sibling helper) that 
*resolves* a stream of insertion/retraction-flagged tuples down to a final 
materialized result — applying retractions to undo prior insertions.
   
   A search of the current codebase confirms no such public API exists today:
   - `ProgressiveUtils` exposes only producers (`addInsertionFlag`, 
`addRetractionFlag`) and per-tuple readers (`isInsertion`, 
`getTupleFlagAndValue`).
   - No downstream consumer references `insertRetractFlagAttr`, 
`__internal_is_insertion`, or `getTupleFlagAndValue` — so the flag column has 
no production reader applying retractions.
   
   This issue tracks adding that API. Suggested shape:
   
   ```scala
   object ProgressiveUtils {
     // Fold a stream of flagged tuples into the materialized "current" set:
     // an insertion-flagged tuple is added; a retraction-flagged tuple removes
     // any previously-inserted tuple that matches by value.
     def materialize(flagged: Iterator[Tuple]): Set[Tuple] = ???
   }
   ```
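   
   A minimal sketch of how that fold could look, assuming `Set` semantics are 
   acceptable. `Flagged` and `MaterializeSketch` are hypothetical stand-ins so 
   the snippet compiles on its own; the real method would take Texera `Tuple`s 
   and read the flag through the existing `ProgressiveUtils` readers.
   
   ```scala
   // Hypothetical stand-in for a flagged Tuple: the flag plus the payload value.
   // Not part of Texera; used only to keep this sketch self-contained.
   final case class Flagged[A](isInsertion: Boolean, value: A)

   object MaterializeSketch {
     // Fold the stream left to right: an insertion adds the value, a
     // retraction removes it. Removing an absent value from an immutable Set
     // is a no-op, so an unmatched retraction is silently ignored here.
     def materialize[A](flagged: Iterator[Flagged[A]]): Set[A] =
       flagged.foldLeft(Set.empty[A]) { (current, t) =>
         if (t.isInsertion) current + t.value else current - t.value
       }
   }
   ```
   
   With this sketch, the stream `Flagged(true, "a")`, `Flagged(true, "b")`, 
   `Flagged(false, "a")` folds to `Set("b")`.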
   
   Open questions for the implementer:
   1. Should the result keep ordering (use `LinkedHashSet`/`Vector`) or is 
`Set` fine?
   2. Equality basis for "matches by value": `Tuple` already has a value-based 
`equals`, so the default `Set` semantics should be enough, but it is worth 
confirming that downstream sinks agree.
   3. What should happen if a retraction arrives for a tuple that was never 
inserted? Today a tuple with no flag is read as an insertion; the materializer 
should presumably ignore an unmatched retraction (or log a warning).
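   
   To make questions 1 and 3 concrete, here is one possible variant, offered 
   only as a sketch: it keeps first-insertion order with a `LinkedHashSet` and 
   hands unmatched retractions back to the caller, who can ignore or log them. 
   The `(Boolean, A)` pairs and the name `materializeOrdered` are hypothetical 
   stand-ins for flagged `Tuple`s and are not existing Texera API.
   
   ```scala
   import scala.collection.mutable

   object OrderedMaterializeSketch {
     // Input pairs are (isInsertion, value); a hypothetical stand-in for
     // flagged Tuples. Returns (current values in insertion order,
     // retractions that matched nothing).
     def materializeOrdered[A](flagged: Iterator[(Boolean, A)]): (Vector[A], Vector[A]) = {
       val current = mutable.LinkedHashSet.empty[A]
       val unmatched = Vector.newBuilder[A]
       flagged.foreach { case (isInsertion, value) =>
         if (isInsertion) current += value
         // remove returns false when the value was not present
         else if (!current.remove(value)) unmatched += value
       }
       (current.toVector, unmatched.result())
     }
   }
   ```
   
   One consequence of this choice: a value that is retracted and later 
   re-inserted moves to the end of the ordering, which may or may not be what 
   downstream sinks expect.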
   
   Tests to add alongside the API:
   - Insertion-only stream → all inserted tuples present.
   - Insertion + matching retraction → the retracted tuple is gone.
   - Out-of-order retraction (retraction first, no prior matching insertion) → 
consistent behavior (probably no-op).
   - Insertion + retraction + re-insertion → the tuple is back in the result.
   - Mixed-type tuple payloads (the round-trip is already exercised in 
`ProgressiveUtilsSpec`).
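   
   The cases above could be expressed roughly as follows. This is a sketch 
   with plain `assert`s against a hypothetical stand-in `materialize` over 
   `(isInsertion, value)` pairs, and it assumes the no-op behavior for 
   unmatched retractions; the real tests would target the new API with 
   flagged `Tuple`s in the project's spec style.
   
   ```scala
   object MaterializeCasesSketch {
     // Hypothetical stand-in: (isInsertion, value) pairs instead of flagged Tuples.
     def materialize[A](flagged: Iterator[(Boolean, A)]): Set[A] =
       flagged.foldLeft(Set.empty[A]) { case (acc, (isInsertion, v)) =>
         if (isInsertion) acc + v else acc - v
       }

     def main(args: Array[String]): Unit = {
       // Insertion-only stream: every inserted value is present.
       assert(materialize(Iterator((true, "a"), (true, "b"))) == Set("a", "b"))
       // Insertion + matching retraction: the retracted value is gone.
       assert(materialize(Iterator((true, "a"), (false, "a"))) == Set.empty[String])
       // Retraction with no prior matching insertion: treated as a no-op.
       assert(materialize(Iterator((false, "a"), (true, "b"))) == Set("b"))
       // Insertion + retraction + re-insertion: the value is back.
       assert(materialize(Iterator((true, "a"), (false, "a"), (true, "a"))) == Set("a"))
       // Mixed-type payload: rows compare by value, so an equal row retracts it.
       val row: Seq[Any] = Seq(1, "x", 2.5, true)
       assert(materialize(Iterator((true, row), (false, Seq[Any](1, "x", 2.5, true)))).isEmpty)
     }
   }
   ```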
   
   This work is out of scope for #4912 (which is test-only) and is 
intentionally split out so that the test PR stays narrow.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
