Ok so this sounds like a bug in the DirectRunner then?

*From: *Lukasz Cwik <lc...@google.com>
*Date: *Fri, May 10, 2019 at 8:38 AM
*To: *dev

StateSpec should not be used as a key within any maps. We should use the
> logical name of the StateSpec relative to the DoFn as its id and should
> only be using that id for comparisons/lookups.
>
> On Fri, May 10, 2019 at 1:07 AM Jan Lukavský <je...@seznam.cz> wrote:
>
>> I'm not sure. Generally it affects any runner that uses HashMap to store
>> StateSpec.
>>
>> Jan
>> On 5/9/19 6:32 PM, Reuven Lax wrote:
>>
>> Is this specific to the DirectRunner, or does it affect other runners?
>>
>> On Thu, May 9, 2019 at 8:13 AM Jan Lukavský <je...@seznam.cz> wrote:
>>
>>> Because of the use of hashCode in StateSpecs, I'd say that it is. But it
>>> is not obvious. That's why I'd suggest to make it abstract on Coder, so
>>> that all implementations have to override it. That's a simple solution, but
>>> the question is - should hashCode of Coder be used that way? I think that
>>> StateSpec instances should be equal only to itself. Then the hashCode can
>>> be stored in the instance, e.g.
>>>
>>>   private final int hashCode = System.identityHashCode(this)
>>>
>>> and returned in hashCode(). There would be no need for Coder to
>>> implement hashCode anymore (if there aren't any other cases, where it is
>>> needed, in which case it would still be better to add abstract hashCode and
>>> equals methods on Coder).
>>>
>>> Jan
>>> On 5/9/19 5:04 PM, Reuven Lax wrote:
>>>
>>> Is a valid hashCode on Coder part of our contract or not? If it is, then
>>> the lack of hashCode on SchemaCoder is simply a bug.
>>>
>>> On Thu, May 9, 2019 at 7:42 AM Jan Lukavský <je...@seznam.cz> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have spent several hour digging into strange issue with DirectRunner,
>>>> that manifested as non-deterministic run of pipeline. The pipeline
>>>> contains basically only single stateful ParDo, which adds elements into
>>>> state and after some timeout flushes these elements into output. The
>>>> issues was, that sometimes (very often) when the timer fired, the state
>>>> appeared to be empty, although I actually added something into the
>>>> state. I will skip details, but the problem boils down to the fact,
>>>> that
>>>> StateSpecs hash Coder into hashCode - e.g.
>>>>
>>>>      @Override
>>>>      public int hashCode() {
>>>>        return Objects.hash(getClass(), coder);
>>>>      }
>>>>
>>>> in ValueStateSpec. Now, when Coder doesn't have hashCode and equals
>>>> implemented (and there are some of those in the codebase itself - e.g.
>>>> SchemaCoder), it all blows up in a very hard-to-debug manner. So the
>>>> proposal is - either to add abstract hashCode and equals to Coder, or
>>>> don't hash the Coder into hashCode of StateSpecs (we can generate
>>>> unique
>>>> ID for each StateSpec instance for example).
>>>>
>>>> Any thoughts about which path to follow? Or maybe both? :)
>>>>
>>>> Jan
>>>>
>>>>
>>>>

Reply via email to