StateSpec should not be used as a key within any maps. We should use the
logical name of the StateSpec relative to the DoFn as its id and should
only be using that id for comparisons/lookups.

On Fri, May 10, 2019 at 1:07 AM Jan Lukavský <je...@seznam.cz> wrote:

> I'm not sure. Generally it affects any runner that uses HashMap to store
> StateSpec.
>
> Jan
> On 5/9/19 6:32 PM, Reuven Lax wrote:
>
> Is this specific to the DirectRunner, or does it affect other runners?
>
> On Thu, May 9, 2019 at 8:13 AM Jan Lukavský <je...@seznam.cz> wrote:
>
>> Because of the use of hashCode in StateSpecs, I'd say that it is. But it
>> is not obvious. That's why I'd suggest to make it abstract on Coder, so
>> that all implementations have to override it. That's a simple solution, but
>> the question is - should hashCode of Coder be used that way? I think that
>> StateSpec instances should be equal only to itself. Then the hashCode can
>> be stored in the instance, e.g.
>>
>>   private final int hashCode = System.identityHashCode(this)
>>
>> and returned in hashCode(). There would be no need for Coder to implement
>> hashCode anymore (if there aren't any other cases, where it is needed, in
>> which case it would still be better to add abstract hashCode and equals
>> methods on Coder).
>>
>> Jan
>> On 5/9/19 5:04 PM, Reuven Lax wrote:
>>
>> Is a valid hashCode on Coder part of our contract or not? If it is, then
>> the lack of hashCode on SchemaCoder is simply a bug.
>>
>> On Thu, May 9, 2019 at 7:42 AM Jan Lukavský <je...@seznam.cz> wrote:
>>
>>> Hi,
>>>
>>> I have spent several hour digging into strange issue with DirectRunner,
>>> that manifested as non-deterministic run of pipeline. The pipeline
>>> contains basically only single stateful ParDo, which adds elements into
>>> state and after some timeout flushes these elements into output. The
>>> issues was, that sometimes (very often) when the timer fired, the state
>>> appeared to be empty, although I actually added something into the
>>> state. I will skip details, but the problem boils down to the fact, that
>>> StateSpecs hash Coder into hashCode - e.g.
>>>
>>>      @Override
>>>      public int hashCode() {
>>>        return Objects.hash(getClass(), coder);
>>>      }
>>>
>>> in ValueStateSpec. Now, when Coder doesn't have hashCode and equals
>>> implemented (and there are some of those in the codebase itself - e.g.
>>> SchemaCoder), it all blows up in a very hard-to-debug manner. So the
>>> proposal is - either to add abstract hashCode and equals to Coder, or
>>> don't hash the Coder into hashCode of StateSpecs (we can generate unique
>>> ID for each StateSpec instance for example).
>>>
>>> Any thoughts about which path to follow? Or maybe both? :)
>>>
>>> Jan
>>>
>>>
>>>

Reply via email to