Xiao-zhen-Liu opened a new issue, #5885: URL: https://github.com/apache/texera/issues/5885
Parent: #5881 · Design: #5880

## Goal

Connect the pieces so the cache is used. Look up matched ports when a workflow is submitted, save a result when an output port finishes, expose the cache endpoints, and remove entries when executions are deleted. After this PR the feature works on the backend.

## What is included

- Submission-time cache matcher: `WorkflowExecutionService` asks the cache service for the matched ports and stores them on `WorkflowSettings.cachedOutputs`, which PR 3 reads.
- Saving on completion: when an output port finishes, `PortCompletedHandler` sends a new `PortMaterialized` event; `ExecutionCacheService` receives it and writes a cache entry. The engine sends an event and the service layer handles it, matching the existing pattern for statistics and worker updates, so the engine stays unaware of the web layer.
- Two small websocket events so the UI can show cache entries and which ones the current run can use, plus the state stores that hold them; both default to empty.
- Cache endpoints on `WorkflowExecutionsResource`: list entries, clear all, clear for a selected operator, and remove entries whose cache key no longer matches the current workflow. The result location is left out of the listing.
- Cleanup: when executions are deleted, or a computing unit is torn down, the cache entries those executions produced are removed.

## Behavior change to call out

To make reuse possible, this PR stops eagerly clearing the previous run's results when a new run starts. This affects re-runs whether or not the cache is used, so it will be described clearly in the PR.

## Why the rest is safe with no matched ports

The lookup returns nothing when the table is empty, so `cachedOutputs` stays empty and the scheduler behaves as in PR 3. The saving path is wrapped so a write failure logs and does not fail the run. The endpoints are new routes, and the cleanup is a no-op with no rows.
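
The two safety properties above (an empty table yields no matched ports, and a failed cache write is logged rather than propagated) can be illustrated with a minimal sketch. This is illustrative Python, not the Texera code: `InMemoryCacheTable`, `match_cached_outputs`, `on_port_materialized`, and the fields on `PortMaterialized` are all hypothetical names assumed for the example, and the real service's signatures may differ.

```python
import logging
from dataclasses import dataclass, field

logger = logging.getLogger("execution_cache_sketch")


@dataclass(frozen=True)
class PortMaterialized:
    """Hypothetical event payload: which output port finished and where its result is."""
    cache_key: str
    port_id: str
    result_uri: str


@dataclass
class InMemoryCacheTable:
    """Stand-in for the cache table from PR 1 (assumed shape, not the real schema)."""
    rows: dict = field(default_factory=dict)
    fail_writes: bool = False  # lets a caller simulate a storage failure

    def lookup(self, cache_keys):
        # Return only keys that already have an entry; an empty table gives {}.
        return {k: self.rows[k] for k in cache_keys if k in self.rows}

    def write(self, event):
        if self.fail_writes:
            raise OSError("simulated cache write failure")
        self.rows[event.cache_key] = event.result_uri


def match_cached_outputs(table, port_cache_keys):
    """Submission-time lookup: map each matched port id to its cached result location."""
    hits = table.lookup(port_cache_keys.values())
    return {port: hits[key] for port, key in port_cache_keys.items() if key in hits}


def on_port_materialized(table, event):
    """Save path: a failed write is logged and swallowed so the run is not failed."""
    try:
        table.write(event)
        return True
    except Exception:
        logger.exception("cache write failed for port %s", event.port_id)
        return False
```

With an empty table, `match_cached_outputs(table, {"op1-out0": "key-a"})` returns an empty mapping, which corresponds to `cachedOutputs` staying empty. With `fail_writes=True`, `on_port_materialized` returns `False` and logs the error instead of raising, which is the behavior the issue describes for the saving path.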

## Depends on

PR 1 (cache table and service) and PR 3 (the scheduler that reads `cachedOutputs`). This PR activates the feature.

## Out of scope

The cache panel and canvas display (PR 5). Any cost-based or eviction logic.

## Size

About 750 lines of code, plus tests.
