Xiao-zhen-Liu opened a new issue, #5885:
URL: https://github.com/apache/texera/issues/5885

   Parent: #5881 · Design: #5880
   
   ## Goal
   
   Wire the cache into the backend so that it is actually used: look up matched 
ports when a workflow is submitted, save a result when an output port finishes, 
expose the cache endpoints, and remove entries when executions are deleted. 
After this PR the feature works on the backend.
   
   ## What is included
   
   - Submission-time cache matcher: `WorkflowExecutionService` asks the cache 
service for the matched ports and stores them on 
`WorkflowSettings.cachedOutputs`, which PR 3 reads.
   - Saving on completion: when an output port finishes, `PortCompletedHandler` 
sends a new `PortMaterialized` event; `ExecutionCacheService` receives it and 
writes a cache entry. The engine sends an event and the service layer handles 
it, matching the existing pattern for statistics and worker updates, so the 
engine stays unaware of the web layer.
   - Two small websocket events so the UI can show cache entries and which ones 
the current run can use, plus the state stores that hold them; both default to 
empty.
   - Cache endpoints on `WorkflowExecutionsResource`: list entries, clear all, 
clear for a selected operator, and remove entries whose cache key no longer 
matches the current workflow. The result location is left out of the listing.
   - Cleanup: when executions are deleted, or a computing unit is torn down, 
the cache entries those executions produced are removed.
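
   The save-on-completion flow can be illustrated with a small, language-neutral sketch. It is written in Python for brevity and is not the Texera implementation: the `EventBus` class, the fields of `PortMaterialized`, the `list_entries` method, and the `mem://` URI are all assumptions made for illustration. Only the names `PortMaterialized`, `ExecutionCacheService`, and `PortCompletedHandler` come from this issue.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class PortMaterialized:
    """Hypothetical event payload; the real field names are not given in this issue."""
    cache_key: str
    operator_id: str
    port_id: int
    result_uri: str


class EventBus:
    """Minimal in-process publish/subscribe stand-in for the engine-to-service channel."""

    def __init__(self) -> None:
        self._handlers: List[Callable[[PortMaterialized], None]] = []

    def subscribe(self, handler: Callable[[PortMaterialized], None]) -> None:
        self._handlers.append(handler)

    def publish(self, event: PortMaterialized) -> None:
        for handler in self._handlers:
            handler(event)


class ExecutionCacheService:
    """Service-layer listener that turns events into cache entries."""

    def __init__(self, bus: EventBus) -> None:
        self.entries: Dict[str, PortMaterialized] = {}
        bus.subscribe(self.on_port_materialized)

    def on_port_materialized(self, event: PortMaterialized) -> None:
        self.entries[event.cache_key] = event

    def list_entries(self) -> List[dict]:
        # The listing leaves out the result location, as described in the issue.
        return [
            {"cache_key": e.cache_key, "operator_id": e.operator_id, "port_id": e.port_id}
            for e in self.entries.values()
        ]


def port_completed_handler(bus: EventBus, cache_key: str, operator_id: str,
                           port_id: int, result_uri: str) -> None:
    """Engine side: only publishes an event and never calls the service directly."""
    bus.publish(PortMaterialized(cache_key, operator_id, port_id, result_uri))


bus = EventBus()
service = ExecutionCacheService(bus)
port_completed_handler(bus, "k1", "op-filter", 0, "mem://results/k1")  # placeholder URI
listing = service.list_entries()
```

   In this sketch the engine-side function depends only on the bus, which mirrors the stated goal of keeping the engine unaware of the web layer.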
   
   ## Behavior change to call out
   
   To make reuse possible, this PR stops eagerly clearing the previous run's 
results when a new run starts. This affects every re-run, whether or not the 
cache is used, so the PR description will call it out explicitly.
   
   ## Why the rest is safe with no matched ports
   
   The lookup returns nothing when the table is empty, so `cachedOutputs` stays 
empty and the scheduler behaves as in PR 3. The saving path is wrapped so that 
a write failure is logged and does not fail the run. The endpoints are new 
routes, and the cleanup is a no-op when there are no rows.
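
   Two of these safety properties can be shown in a short sketch, again in Python and under stated assumptions: the function names `lookup_matched_ports` and `save_safely`, the dictionary-based cache table, and the port identifier format are hypothetical and chosen only for illustration.

```python
import logging
from typing import Callable, Dict, Set

logger = logging.getLogger("cache_sketch")


def lookup_matched_ports(cache_table: Dict[str, str], port_keys: Dict[str, str]) -> Set[str]:
    """Return the ports whose cache key has an entry in the table."""
    return {port for port, key in port_keys.items() if key in cache_table}


def save_safely(write: Callable[[], None]) -> bool:
    """Run a cache write; log and swallow any failure so the run continues."""
    try:
        write()
        return True
    except Exception:
        logger.exception("cache write failed; continuing without a cache entry")
        return False


def failing_write() -> None:
    raise IOError("simulated storage outage")


# Empty table: nothing matches, so the set of cached outputs stays empty.
cached_outputs = lookup_matched_ports({}, {"op1:out0": "key-a"})

# A failing write is reported as False instead of raising into the run.
saved = save_safely(failing_write)
```

   With an empty table the lookup yields an empty set, and the failing write returns `False` rather than propagating the exception, which is the behavior the paragraph above relies on.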
   
   ## Depends on
   
   PR 1 (cache table and service) and PR 3 (the scheduler that reads 
`cachedOutputs`). This PR activates the feature.
   
   ## Out of scope
   
   The cache panel and canvas display (PR 5). Any cost-based or eviction logic.
   
   ## Size
   
   About 750 lines of code, plus tests.

