Steve Niemitz created BEAM-7568:
-----------------------------------

             Summary: Java dataflow harness re-encodes value state cells even if they haven't changed
                 Key: BEAM-7568
                 URL: https://issues.apache.org/jira/browse/BEAM-7568
             Project: Beam
          Issue Type: Improvement
          Components: runner-dataflow
    Affects Versions: 2.13.0
            Reporter: Steve Niemitz

The Java Dataflow worker seems to re-encode ValueState cells after every work item, even if they weren't modified.

You can see here [https://github.com/apache/beam/blob/a71bfda77df36aa1531f01533c372233cfba0dd9/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WindmillStateInternals.java#L413] that the value is always encoded (and used to weight the cache entry) even if it won't be persisted back to Windmill.

This can have large performance implications if the values being stored are expensive or large to encode and are infrequently modified.

Ideally, the weight would also be cached, and the value would only need to be re-encoded if it was changed.
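
To make the proposal concrete, here is a minimal sketch of a state cell that caches its weight and re-encodes only after a write. It is not the actual WindmillStateInternals code: the class `CachedValueCell`, its methods, and the use of a plain `Function<T, byte[]>` in place of a Beam coder are all hypothetical, and taking the encoded length as the cache weight is an assumption made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

// Hypothetical sketch only; names and structure are assumptions, not Beam APIs.
final class CachedValueCell<T> {
    private final Function<T, byte[]> encoder; // stand-in for a coder
    private T value;
    private boolean modified;       // true if written since the last persist
    private long cachedWeight = -1; // -1 means the weight was never computed
    private int encodeCount;        // instrumentation for the demo

    CachedValueCell(Function<T, byte[]> encoder, T initialValue) {
        this.encoder = encoder;
        this.value = initialValue;
    }

    T read() {
        return value;
    }

    void write(T newValue) {
        value = newValue;
        modified = true;
    }

    // Called once per work item. Returns the bytes to persist,
    // or null when nothing needs to be written back.
    byte[] persist() {
        if (!modified && cachedWeight >= 0) {
            return null; // unchanged: skip encoding and keep the cached weight
        }
        byte[] encoded = encoder.apply(value);
        encodeCount++;
        cachedWeight = encoded.length; // assumption: weight is the encoded size
        boolean needsWrite = modified;
        modified = false;
        return needsWrite ? encoded : null;
    }

    long weight() {
        return cachedWeight;
    }

    int encodeCount() {
        return encodeCount;
    }
}

final class ValueCellSketch {
    public static void main(String[] args) {
        CachedValueCell<String> cell = new CachedValueCell<>(
            s -> s.getBytes(StandardCharsets.UTF_8), "hello");
        cell.persist();            // first work item: encodes once to learn the weight
        cell.persist();            // unchanged: no encode
        cell.write("hello world");
        cell.persist();            // modified: encodes again
        // prints "2 encodes, weight 11"
        System.out.println(cell.encodeCount() + " encodes, weight " + cell.weight());
    }
}
```

In this sketch an unmodified cell costs one encode in total (to learn its weight) rather than one per work item. If the size of the bytes originally read from the backing store were available, even that first encode could presumably be avoided.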