Ma77Ball opened a new pull request, #5599:
URL: https://github.com/apache/texera/pull/5599

   ### What changes were proposed in this PR?
   - Replace the per-read `deepcopy` in `Tuple.as_dict()`
(`amber/src/main/python/core/models/tuple.py`) with a shallow copy, so reading
a tuple no longer recursively clones every field value; the cost of a read now
scales with the number of fields rather than the total byte size of their values.
   - This path is hot: `as_dict()` backs both `as_series()` (called once per
tuple in the batch operator path) and `as_key_value_pairs()`, so a tuple
carrying a large binary field previously duplicated that whole payload on
every read.
   - The deepcopy's isolation was unnecessary: `as_dict()` has no callers 
outside `Tuple`, its two users immediately build a new container, and the 
Tuple's mutators only reassign dict slots (never mutate a value in place), so a 
shallow copy preserves the independent-dict contract.
   - Remove the now-unused `from copy import deepcopy` import and document why 
the shallow copy is safe.
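
The reasoning in the bullets above can be illustrated with a minimal sketch. `MiniTuple`, `as_dict_deep`, and the `bytearray` payload are hypothetical stand-ins chosen for illustration, not the actual `Tuple` code in `tuple.py`; the sketch only assumes, as stated above, that mutators reassign dict slots and never mutate a value in place.

```python
from copy import deepcopy


class MiniTuple:
    """Hypothetical stand-in for Tuple; not the actual Texera class."""

    def __init__(self, fields):
        self._fields = dict(fields)

    def as_dict_deep(self):
        # Old behavior: recursively clone every field value.
        return deepcopy(self._fields)

    def as_dict(self):
        # New behavior: a new dict object that holds the same value objects.
        return dict(self._fields)

    def __setitem__(self, name, value):
        # Mutators only reassign a slot; they never edit a value in place.
        self._fields[name] = value


payload = bytearray(1_000_000)  # assumed mutable payload, for illustration
t = MiniTuple({"id": 1, "blob": payload})

deep = t.as_dict_deep()
shallow = t.as_dict()

deep_cloned_payload = deep["blob"] is not payload    # payload was duplicated
shallow_shares_payload = shallow["blob"] is payload  # payload is shared

t["id"] = 2              # reassigning a slot leaves the earlier dict untouched
shallow["extra"] = True  # editing the returned dict leaves the tuple untouched
snapshot_id = shallow["id"]
tuple_keys = sorted(t.as_dict())
```

Under these assumptions the returned dict stays independent of the tuple's own dict, while the large value is shared rather than cloned. The sharing would only become visible if some caller mutated a field value in place, which is the case the PR argues does not occur.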
   ### Any related issues, documentation, discussions?
   N/A
   ### How was this PR tested?
   - Covered by existing tests only; there is no behavior change. Run
`cd amber/src/main/python && python -m pytest ../../test/python/core/models/test_tuple.py -q`
and expect 23 passed (covers `as_dict`/`as_series`/`as_key_value_pairs`).
   - Run `cd amber/src/main/python && python -m pytest 
../../test/python/core/runnables/test_main_loop.py 
../../test/python/core/architecture/managers/test_tuple_processing_manager.py 
-q`, expect 22 passed (exercises the batch read path that calls `as_series`).
   ### Was this PR authored or co-authored using generative AI tooling?
   Generated-by: Claude Opus 4.8
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
