I don't think I'd call it a bug?  The cache doesn't differentiate between a
state cell that existed but was cleared, and one that is missing from the
cache (maybe it should?).  The side input fetcher clears the blocked state
when it becomes unblocked:

https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/StreamingSideInputFetcher.java#L241

Given that interaction, once the side input becomes ready, every request
for the blocked map will result in a cache miss, and a state lookup
request.  Changing the cache to cache tombstones for cleared cells is
another possibility as well.

On Thu, Jul 22, 2021 at 2:33 AM Reuven Lax <[email protected]> wrote:

> So you're saying there's a bug in the caching logic that prevents the
> side-input cache from working?
>
> On Wed, Jul 21, 2021 at 7:07 PM Steve Niemitz <[email protected]> wrote:
>
>> I had opened a jira years ago [1] about this, but would like to actually
>> fix it for real now, given that our users have started using streaming more
>> and more.
>>
>> There's more detail in the jira, but basically side inputs in streaming
>> pipelines on dataflow lead to pretty bad performance because they result in
>> a state lookup request for each element.
>>
>
>
>> I think the best solution would be to stop storing null for the
>> blockedMap, and instead store and empty map.  This way there will
>> (generally) be a cache hit when looking it up (the cache can't cache a
>> null).  The only issue here though is that something then needs to clean up
>> the map.  It seems like the StreamingSideInputDoFnRunner could probably set
>> its own cleanup timer here to do that.
>>
>> I'm curious if there are other thoughts on the better ways to do this?
>>
>> Also, as a side note, I was looking at the SideInputHandler
>> implementation in runners-core, and I can't seem to see where the state
>> that it maintains is ever cleaned up?
>>
>> [1] https://issues.apache.org/jira/browse/BEAM-7745
>>
>

Reply via email to