[ https://issues.apache.org/jira/browse/BEAM-7568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Luke Cwik resolved BEAM-7568.
-----------------------------
    Fix Version/s: 2.23.0
       Resolution: Fixed

> Java dataflow harness re-encodes value state cells even if they haven't changed
> -------------------------------------------------------------------------------
>
>                 Key: BEAM-7568
>                 URL: https://issues.apache.org/jira/browse/BEAM-7568
>             Project: Beam
>          Issue Type: Improvement
>          Components: runner-dataflow
>    Affects Versions: 2.13.0
>            Reporter: Steve Niemitz
>            Assignee: Steve Niemitz
>            Priority: P2
>             Fix For: 2.23.0
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> The Java Dataflow worker seems to re-encode ValueState cells after every work
> item, even if they weren't modified.
> You can see here
> [https://github.com/apache/beam/blob/a71bfda77df36aa1531f01533c372233cfba0dd9/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WindmillStateInternals.java#L413]
> that the value is always encoded (and used to weight the cache entry) even
> if it won't be persisted back to Windmill.
> This can have large performance implications if the values being stored
> are expensive or large to encode and are infrequently modified. Ideally, the
> weight would also be cached, and the value would only need to be re-encoded
> if it was changed.
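
The improvement requested above can be illustrated with a minimal sketch. This is not the code from WindmillStateInternals or from the 2.23.0 fix; the class CachedValueCell, its methods, and the use of a plain Function in place of a Beam Coder are all hypothetical, chosen only to show the idea of tracking a "modified" flag and caching the encoded weight so that an unchanged cell is not re-encoded after each work item. It assumes non-null values.

```java
import java.util.function.Function;

/**
 * Hypothetical stand-in for a value state cell. It is NOT Beam's actual
 * implementation; it only illustrates dirty tracking plus a cached weight.
 */
final class CachedValueCell<T> {
  // Stands in for a Beam Coder (assumption: encoding is the expensive step).
  private final Function<T, byte[]> encoder;

  private T value;
  private boolean modified;   // true if value changed since the last persist
  private long cachedWeight;  // encoded size remembered from the last encode
  private int encodeCount;    // counts encoder calls, for demonstration only

  CachedValueCell(Function<T, byte[]> encoder) {
    this.encoder = encoder;
  }

  T read() {
    return value;
  }

  void write(T newValue) {
    value = newValue;
    modified = true;
  }

  /**
   * Called at the end of each work item. Returns the bytes to persist,
   * or null when the cell is unchanged and nothing needs to be written.
   */
  byte[] persist() {
    if (!modified) {
      // Unchanged: skip encoding entirely; cachedWeight is still valid.
      return null;
    }
    byte[] encoded = encoder.apply(value);
    encodeCount++;
    cachedWeight = encoded.length;
    modified = false;
    return encoded;
  }

  /** Weight for the cache entry, served from the last encode. */
  long weight() {
    return cachedWeight;
  }

  int encodeCount() {
    return encodeCount;
  }
}
```

With this shape, a cell that is read on every work item but written rarely pays the encoding cost only on the work items that actually call write(); weight() keeps returning the size recorded at the last encode.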