[ 
https://issues.apache.org/jira/browse/FLINK-27934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flink Jira Bot updated FLINK-27934:
-----------------------------------
    Labels: pull-request-available stale-minor  (was: pull-request-available)

I am the [Flink Jira Bot|https://github.com/apache/flink-jira-bot/] and I help 
the community manage its development. I see this issue has been marked as 
Minor but is unassigned and neither itself nor its Sub-Tasks have been updated 
for 180 days. I have gone ahead and marked it "stale-minor". If this ticket is 
still Minor, please either assign yourself or give an update. Afterwards, 
please remove the label or in 7 days the issue will be deprioritized.


> Python API- Inefficient deserialization/serialization of state variables 
> within a batch
> ---------------------------------------------------------------------------------------
>
>                 Key: FLINK-27934
>                 URL: https://issues.apache.org/jira/browse/FLINK-27934
>             Project: Flink
>          Issue Type: Improvement
>          Components: Stateful Functions
>    Affects Versions: statefun-3.2.0
>            Reporter: Frans King
>            Priority: Minor
>              Labels: pull-request-available, stale-minor
>
> In the Python API state variables can be accessed via the UserFacingContext:
> variable = context.storage.variable
> This calls into the Cell instance for that state variable which has get() & 
> set() methods.  The get() method always deserializes from the typed_value and 
> the set() always re-serializes and marks the cell dirty.
>  
> This has two side effects:
> 1:
> var1 = context.storage.variable
> var2 = context.storage.variable
> id(var2) != id(var1) - they are different instances
>  
> 2:
> In a large batch (say 1000 calls to the same function type and id) this can 
> result in deserializing and re-serializing the same state variable 1000 
> times when really it only needs to be deserialized in the first invocation in 
> the batch, held in memory until the last invocation and then re-serialized 
> prior to collecting the mutations.  
>  
> I think this can be improved by having a lazily initialized backing field in 
> the Cell class but I don't know if this was a conscious design decision to 
> have the behavior described in 1. 
>  
> Any feedback would be welcome. 
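
To make the proposal in the quoted description concrete, here is a minimal sketch of the two behaviors. It is not the StateFun SDK code: EagerCell, LazyCell, CountingJsonType, flush() and run_batch() are hypothetical names invented for illustration, and JSON stands in for whatever serializer the state type actually uses. The sketch only assumes what the issue states, namely that get() deserializes on every call and set() re-serializes and marks the cell dirty.

```python
import json


class CountingJsonType:
    """Hypothetical stand-in for a state type; counts (de)serialization calls."""

    def __init__(self):
        self.deserialize_calls = 0
        self.serialize_calls = 0

    def deserialize(self, raw: bytes):
        self.deserialize_calls += 1
        return json.loads(raw.decode("utf-8"))

    def serialize(self, value) -> bytes:
        self.serialize_calls += 1
        return json.dumps(value).encode("utf-8")


class EagerCell:
    """Behavior described in the issue: every get()/set() hits the serializer."""

    def __init__(self, tpe, raw):
        self.tpe = tpe
        self.raw = raw
        self.dirty = False

    def get(self):
        return None if self.raw is None else self.tpe.deserialize(self.raw)

    def set(self, value):
        self.raw = self.tpe.serialize(value)
        self.dirty = True


class LazyCell:
    """Proposed alternative: deserialize once, serialize once when flushed."""

    _UNSET = object()  # sentinel, so that a stored None is still cacheable

    def __init__(self, tpe, raw):
        self.tpe = tpe
        self.raw = raw
        self.dirty = False
        self._value = LazyCell._UNSET

    def get(self):
        # Deserialize only on first access; later calls return the same object.
        if self._value is LazyCell._UNSET:
            self._value = None if self.raw is None else self.tpe.deserialize(self.raw)
        return self._value

    def set(self, value):
        # Keep the value in memory; defer serialization to flush().
        self._value = value
        self.dirty = True

    def flush(self):
        """Hypothetical hook, called once before mutations are collected."""
        if self.dirty:
            self.raw = self.tpe.serialize(self._value)
        return self.raw


def run_batch(cell, invocations):
    """Simulate a batch in which every invocation increments a counter."""
    for _ in range(invocations):
        state = cell.get()
        state["count"] += 1
        cell.set(state)
```

Running run_batch() over 1000 invocations against each cell, followed by a single flush() on the lazy one, lets the counters on CountingJsonType show how many serializer calls each variant makes. One trade-off to weigh: with a cached value, repeated get() calls return the same instance (changing side effect 1), so an in-place mutation that is never followed by set() would not mark the cell dirty in this sketch.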



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
