Hm, yes, the fix might be also in fixing hashCode and equals of SimpleStateTag, so that it doesn't hash and compare the StateSpec, but only the StructureId. That looks like best option to me. But I'm not sure about other implications this might have.

Jan

On 5/10/19 5:43 PM, Reuven Lax wrote:
Ok so this sounds like a bug in the DirectRunner then?

*From: *Lukasz Cwik <lc...@google.com <mailto:lc...@google.com>>
*Date: *Fri, May 10, 2019 at 8:38 AM
*To: *dev

    StateSpec should not be used as a key within any maps. We should
    use the logical name of the StateSpec relative to the DoFn as its
    id and should only be using that id for comparisons/lookups.

    On Fri, May 10, 2019 at 1:07 AM Jan Lukavský <je...@seznam.cz
    <mailto:je...@seznam.cz>> wrote:

        I'm not sure. Generally it affects any runner that uses
        HashMap to store StateSpec.

        Jan

        On 5/9/19 6:32 PM, Reuven Lax wrote:
        Is this specific to the DirectRunner, or does it affect other
        runners?

        On Thu, May 9, 2019 at 8:13 AM Jan Lukavský <je...@seznam.cz
        <mailto:je...@seznam.cz>> wrote:

            Because of the use of hashCode in StateSpecs, I'd say
            that it is. But it is not obvious. That's why I'd suggest
            to make it abstract on Coder, so that all implementations
            have to override it. That's a simple solution, but the
            question is - should hashCode of Coder be used that way?
            I think that StateSpec instances should be equal only to
            itself. Then the hashCode can be stored in the instance, e.g.

              private final int hashCode = System.identityHashCode(this)

            and returned in hashCode(). There would be no need for
            Coder to implement hashCode anymore (if there aren't any
            other cases, where it is needed, in which case it would
            still be better to add abstract hashCode and equals
            methods on Coder).

            Jan

            On 5/9/19 5:04 PM, Reuven Lax wrote:
            Is a valid hashCode on Coder part of our contract or
            not? If it is, then the lack of hashCode on SchemaCoder
            is simply a bug.

            On Thu, May 9, 2019 at 7:42 AM Jan Lukavský
            <je...@seznam.cz <mailto:je...@seznam.cz>> wrote:

                Hi,

                I have spent several hour digging into strange issue
                with DirectRunner,
                that manifested as non-deterministic run of
                pipeline. The pipeline
                contains basically only single stateful ParDo, which
                adds elements into
                state and after some timeout flushes these elements
                into output. The
                issues was, that sometimes (very often) when the
                timer fired, the state
                appeared to be empty, although I actually added
                something into the
                state. I will skip details, but the problem boils
                down to the fact, that
                StateSpecs hash Coder into hashCode - e.g.

                     @Override
                     public int hashCode() {
                       return Objects.hash(getClass(), coder);
                     }

                in ValueStateSpec. Now, when Coder doesn't have
                hashCode and equals
                implemented (and there are some of those in the
                codebase itself - e.g.
                SchemaCoder), it all blows up in a very
                hard-to-debug manner. So the
                proposal is - either to add abstract hashCode and
                equals to Coder, or
                don't hash the Coder into hashCode of StateSpecs (we
                can generate unique
                ID for each StateSpec instance for example).

                Any thoughts about which path to follow? Or maybe
                both? :)

                Jan


Reply via email to