arunpandianp commented on code in PR #38458:
URL: https://github.com/apache/beam/pull/38458#discussion_r3240125870


##########
runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowStreamingPipelineOptions.java:
##########
@@ -249,6 +249,16 @@ public interface DataflowStreamingPipelineOptions extends PipelineOptions {
 
   void setIsWindmillServiceDirectPathEnabled(boolean isWindmillServiceDirectPathEnabled);
 
+  /**
+   * The maximum size of cached values in bytes. Values larger than this limit will not be cached by

Review Comment:
   Yes, side inputs are not included in the state cache.
   
   Renamed to `MaxWindmillStateCacheEntryBytes` to be generic. 
   
   Now we have a single state cache with this optional entry byte limit.
   
   > We could for example choose to cache large value states but not large bags.
   
   Even a few large entries can evict most small items from the cache, and the 
default cache size (configurable) is only 100MB. The added setting is optional 
and does not change the default behavior. I would like to keep a single limit 
across state types.
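   
   To illustrate the eviction concern, here is a minimal, hypothetical sketch 
(not the actual `WindmillStateCache` code): a byte-weighted LRU cache with an 
optional per-entry limit. The class name `ByteBoundedLruCache`, its methods, 
and the 100-byte budget standing in for the 100MB default are all assumptions 
made for the example.
   
   ```java
   import java.util.Iterator;
   import java.util.LinkedHashMap;
   import java.util.Map;

   /** Hypothetical sketch of a byte-weighted LRU cache; not Beam's implementation. */
   final class ByteBoundedLruCache {
     private final long maxTotalBytes;
     private final long maxEntryBytes; // assumption: a value <= 0 means "no per-entry limit"
     private long currentBytes = 0;
     // Access-ordered map: iteration starts at the least recently used entry.
     private final LinkedHashMap<String, byte[]> entries =
         new LinkedHashMap<>(16, 0.75f, true);

     ByteBoundedLruCache(long maxTotalBytes, long maxEntryBytes) {
       this.maxTotalBytes = maxTotalBytes;
       this.maxEntryBytes = maxEntryBytes;
     }

     /** Returns true if the value was cached, false if it was skipped as too large. */
     boolean put(String key, byte[] value) {
       boolean overEntryLimit = maxEntryBytes > 0 && value.length > maxEntryBytes;
       if (overEntryLimit || value.length > maxTotalBytes) {
         // Drop any older value for this key so the cache never serves stale data.
         byte[] stale = entries.remove(key);
         if (stale != null) {
           currentBytes -= stale.length;
         }
         return false;
       }
       byte[] previous = entries.put(key, value);
       if (previous != null) {
         currentBytes -= previous.length;
       }
       currentBytes += value.length;
       // Evict least recently used entries until the total fits the budget again.
       Iterator<Map.Entry<String, byte[]>> it = entries.entrySet().iterator();
       while (currentBytes > maxTotalBytes && it.hasNext()) {
         Map.Entry<String, byte[]> eldest = it.next();
         currentBytes -= eldest.getValue().length;
         it.remove();
       }
       return true;
     }

     byte[] get(String key) {
       return entries.get(key);
     }

     int size() {
       return entries.size();
     }
   }
   ```
   
   In this sketch, with no entry limit, a single 95-byte value pushes out ten 
10-byte values from a 100-byte cache. With a 50-byte entry limit, the large 
value is skipped and the small entries stay cached.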
   
   My desired end state is to have 2 caches inside WindmillStateCache, one for 
caching small entries and another for caching large entries. With a separate 
large entry cache, we won't need to differentiate by state type. I'll add the 
large cache in future PRs.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

