Xiao-zhen-Liu opened a new issue, #5882:
URL: https://github.com/apache/texera/issues/5882

   Parent: #5881 · Design: #5880
   
   ## Goal
   
   Add the storage layer for the cache: a table to hold cache entries, the code 
that reads and writes it, and the cache-key computation that identifies the 
computation behind an output port. Adds code only; nothing uses it during 
execution yet, so it changes no behavior.
   
   ## What is included
   
   - New table `operator_port_cache`, primary key `(workflow_id, 
global_port_id, subdag_hash)`. Columns: the cache key and its hash, the result 
location, an optional tuple count, the source execution, and a database-managed 
updated-at timestamp. Added to the schema file and as a migration.
   - `OperatorPortCacheDao`: get by key, list by workflow, upsert, delete.
   - `OperatorPortCacheService`: look up matched ports for a plan, write an 
entry when a port finishes, and invalidate entries. Computes cache keys and 
serializes the port id.
   - The cache-key computation: from an output port's upstream operators, their 
parameters, their output schemas, and the wiring between them, produce a value 
whose hash is the cache key. The same upstream computation always yields the 
same key; any edit to an upstream operator, parameter, schema, or link changes it.
   - The `CachedOutput` type and a defaulted, empty `cachedOutputs` field on 
`WorkflowSettings`.
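
The cache-key idea in the list above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in the PR: the `Op` and `Link` records, the textual canonical form, and the choice of SHA-256 are all hypothetical. What it tries to show is the stated property: sort everything into a canonical order, serialize it, and hash, so that input order does not matter but any upstream edit does.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CacheKeySketch {
    // Hypothetical, simplified view of one upstream operator.
    record Op(String id, String type, Map<String, String> params, List<String> outputSchema) {}

    // Hypothetical edge in the sub-DAG: fromOp.fromPort -> toOp.toPort.
    record Link(String fromOp, int fromPort, String toOp, int toPort) {}

    // Canonical text form: operators sorted by id, parameters sorted by name,
    // links sorted lexicographically. A real implementation would need an
    // unambiguous encoding (e.g. escaping delimiters); this sketch does not.
    static String canonicalForm(List<Op> ops, List<Link> links, String outputPort) {
        StringBuilder sb = new StringBuilder();
        List<Op> sortedOps = new ArrayList<>(ops);
        sortedOps.sort(Comparator.comparing(Op::id));
        for (Op op : sortedOps) {
            sb.append("op|").append(op.id()).append('|').append(op.type()).append('|')
              .append(new TreeMap<>(op.params())).append('|')
              .append(op.outputSchema()).append('\n');
        }
        List<String> wires = new ArrayList<>();
        for (Link l : links) {
            wires.add(l.fromOp() + ":" + l.fromPort() + "->" + l.toOp() + ":" + l.toPort());
        }
        Collections.sort(wires);
        for (String w : wires) {
            sb.append("link|").append(w).append('\n');
        }
        sb.append("port|").append(outputPort).append('\n');
        return sb.toString();
    }

    // Pure function: the result depends only on the arguments.
    // SHA-256 is an assumption; the issue does not name a hash function.
    static String cacheKey(List<Op> ops, List<Link> links, String outputPort) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(
                canonicalForm(ops, links, outputPort).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        List<String> schema = List.of("id:int", "name:string");
        Op scan = new Op("scan-1", "CSVScan", Map.of("path", "a.csv"), schema);
        Op filter = new Op("filter-1", "Filter", Map.of("predicate", "id > 10"), schema);
        Op edited = new Op("filter-1", "Filter", Map.of("predicate", "id > 20"), schema);
        List<Link> links = List.of(new Link("scan-1", 0, "filter-1", 0));

        String k1 = cacheKey(List.of(scan, filter), links, "filter-1:out0");
        String k2 = cacheKey(List.of(filter, scan), links, "filter-1:out0");
        String k3 = cacheKey(List.of(scan, edited), links, "filter-1:out0");

        System.out.println(k1.equals(k2)); // true: input order does not matter
        System.out.println(k1.equals(k3)); // false: an upstream parameter changed
    }
}
```

Sorting before serializing is the design choice doing the work here: it makes the key independent of how the plan happens to be enumerated, while leaving it sensitive to every operator, parameter, schema, and link that feeds the port.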
   
   ## Why this is safe
   
   The table starts empty and nothing reads or writes it during execution in 
this PR. The new `WorkflowSettings` field defaults to empty. The cache-key 
computation is a pure function.
   
   ## Generated code
   
   The DAO uses generated jOOQ classes for the new table; generated classes are 
not committed, so this PR ships the `.sql` schema and migration and the build 
regenerates the rest. Adding to the schema file sends an automatic dev-list 
notification, so I will post a short heads-up about the table first.
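
Because the generated jOOQ classes are not part of the diff, the DAO contract listed under "What is included" may be easier to review against a small in-memory stand-in. The sketch below is hypothetical: the class name, the `Key` and `Entry` records, and the field names and types are assumptions, and the database-managed updated-at timestamp is omitted. It only illustrates how the composite primary key `(workflow_id, global_port_id, subdag_hash)` makes upsert replace an existing row rather than add a second one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class InMemoryPortCacheSketch {
    // Mirrors the composite primary key (workflow_id, global_port_id, subdag_hash).
    record Key(long workflowId, String globalPortId, String subdagHash) {}

    // Hypothetical entry shape; tupleCount is nullable because the column is optional.
    record Entry(Key key, String resultUri, Long tupleCount, long sourceExecutionId) {}

    private final Map<Key, Entry> rows = new HashMap<>();

    // Get by key.
    Optional<Entry> get(Key key) {
        return Optional.ofNullable(rows.get(key));
    }

    // List by workflow.
    List<Entry> listByWorkflow(long workflowId) {
        List<Entry> out = new ArrayList<>();
        for (Entry e : rows.values()) {
            if (e.key().workflowId() == workflowId) {
                out.add(e);
            }
        }
        return out;
    }

    // Upsert: insert, or overwrite the row that has the same primary key.
    void upsert(Entry e) {
        rows.put(e.key(), e);
    }

    // Delete: returns true if a row was removed.
    boolean delete(Key key) {
        return rows.remove(key) != null;
    }

    public static void main(String[] args) {
        InMemoryPortCacheSketch dao = new InMemoryPortCacheSketch();
        Key k = new Key(1L, "filter-1:out0", "abc123");
        dao.upsert(new Entry(k, "result-location-1", null, 10L));
        dao.upsert(new Entry(k, "result-location-2", 5L, 11L));
        System.out.println(dao.listByWorkflow(1L).size()); // 1
        System.out.println(dao.get(k).get().resultUri());  // result-location-2
    }
}
```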
   
   ## Depends on
   
   Nothing. PRs 3 and 4 depend on this.
   
   ## Out of scope
   
   No cost or eviction columns, and no execution-time use of the cache (PRs 3 
and 4).
   
   ## Size
   
   About 900 lines of code, plus tests.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
