Xiao-zhen-Liu opened a new issue, #5882: URL: https://github.com/apache/texera/issues/5882
Parent: #5881 · Design: #5880

## Goal

Add the storage layer for the cache: a table to hold cache entries, the code that reads and writes it, and the cache-key computation that identifies the computation behind an output port. Adds code only; nothing uses it during execution yet, so it changes no behavior.

## What is included

- New table `operator_port_cache`, primary key `(workflow_id, global_port_id, subdag_hash)`. Columns: the cache key and its hash, the result location, an optional tuple count, the source execution, and a database-managed updated-at timestamp. Added to the schema file and as a migration.
- `OperatorPortCacheDao`: get by key, list by workflow, upsert, delete.
- `OperatorPortCacheService`: look up matched ports for a plan, write an entry when a port finishes, and invalidate entries. Computes cache keys and serializes the port id.
- The cache-key computation: from an output port's upstream operators, their parameters, their output schemas, and the wiring, produce a value whose hash is the cache key. Same computation gives the same key; any upstream edit changes it.
- The `CachedOutput` type and a defaulted, empty `cachedOutputs` field on `WorkflowSettings`.

## Why this is safe

The table starts empty and nothing reads or writes it during execution in this PR. The new `WorkflowSettings` field defaults to empty. The cache-key computation is a pure function.

## Generated code

The DAO uses generated jOOQ classes for the new table; generated classes are not committed, so this PR ships the `.sql` schema and migration and the build regenerates the rest. Adding to the schema file sends an automatic dev-list notification, so I will post a short heads-up about the table first.

## Depends on

Nothing. PRs 3 and 4 depend on this.

## Out of scope

No cost or eviction columns, and no execution-time use of the cache (PRs 3 and 4).

## Size

About 900 lines of code, plus tests.
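
## Illustrative sketch of the cache-key computation

The cache-key idea listed under "What is included" can be illustrated with a minimal sketch. This is not the code in the PR. The `Operator` and `Link` classes, the `cacheKey` and `subdagHash` method names, the length-prefixed text encoding, and the choice of SHA-256 are all assumptions made for illustration. The sketch walks the links backwards from the output port, writes the upstream operators, their parameters, their output schemas, and the wiring in sorted order, and hashes the result.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

public class CacheKeySketch {

    // Hypothetical, simplified stand-ins for the real workflow model.
    static final class Operator {
        final String params;        // serialized operator parameters
        final String outputSchema;  // serialized output schema
        Operator(String params, String outputSchema) {
            this.params = params;
            this.outputSchema = outputSchema;
        }
    }

    static final class Link {
        final String fromOp; final int fromPort;
        final String toOp;   final int toPort;
        Link(String fromOp, int fromPort, String toOp, int toPort) {
            this.fromOp = fromOp; this.fromPort = fromPort;
            this.toOp = toOp;     this.toPort = toPort;
        }
    }

    // Length-prefix each field so adjacent fields cannot be confused with each other.
    private static String field(String s) {
        return s.length() + ":" + s;
    }

    // Pure function: canonical text describing everything upstream of (targetOp, outputPort).
    static String cacheKey(String targetOp, int outputPort,
                           Map<String, Operator> operators, List<Link> links) {
        // Walk links backwards from the target to collect the upstream sub-DAG.
        TreeSet<String> upstream = new TreeSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(targetOp);
        while (!pending.isEmpty()) {
            String op = pending.pop();
            if (!upstream.add(op)) {
                continue; // already visited
            }
            for (Link l : links) {
                if (l.toOp.equals(op)) {
                    pending.push(l.fromOp);
                }
            }
        }

        // Emit operators and wiring in sorted order so input ordering does not affect the key.
        StringBuilder key = new StringBuilder();
        key.append("port ").append(field(targetOp)).append(' ').append(outputPort).append('\n');
        for (String op : upstream) {
            Operator o = operators.get(op); // assumes every referenced operator is present
            key.append("op ").append(field(op)).append(field(o.params))
               .append(field(o.outputSchema)).append('\n');
        }
        TreeSet<String> wiring = new TreeSet<>();
        for (Link l : links) {
            if (upstream.contains(l.toOp)) {
                wiring.add("link " + field(l.fromOp) + " " + l.fromPort
                        + " " + field(l.toOp) + " " + l.toPort);
            }
        }
        for (String w : wiring) {
            key.append(w).append('\n');
        }
        return key.toString();
    }

    // Hex-encoded SHA-256 of the canonical key (the hash function is an assumption).
    static String subdagHash(String cacheKey) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(cacheKey.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 should be available on every Java platform", e);
        }
    }
}
```

In this sketch, editing an operator downstream of the port leaves the key unchanged, because only operators reachable by walking links backwards from the port are included. Whether the real implementation draws the boundary the same way is not stated above.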
