Hm, yes, the fix might be also in fixing hashCode and equals of
SimpleStateTag, so that it doesn't hash and compare the StateSpec, but
only the StructureId. That looks like best option to me. But I'm not
sure about other implications this might have.
Jan
On 5/10/19 5:43 PM, Reuven Lax wrote:
Ok so this sounds like a bug in the DirectRunner then?
*From: *Lukasz Cwik <lc...@google.com <mailto:lc...@google.com>>
*Date: *Fri, May 10, 2019 at 8:38 AM
*To: *dev
StateSpec should not be used as a key within any maps. We should
use the logical name of the StateSpec relative to the DoFn as its
id and should only be using that id for comparisons/lookups.
On Fri, May 10, 2019 at 1:07 AM Jan Lukavský <je...@seznam.cz
<mailto:je...@seznam.cz>> wrote:
I'm not sure. Generally it affects any runner that uses
HashMap to store StateSpec.
Jan
On 5/9/19 6:32 PM, Reuven Lax wrote:
Is this specific to the DirectRunner, or does it affect other
runners?
On Thu, May 9, 2019 at 8:13 AM Jan Lukavský <je...@seznam.cz
<mailto:je...@seznam.cz>> wrote:
Because of the use of hashCode in StateSpecs, I'd say
that it is. But it is not obvious. That's why I'd suggest
to make it abstract on Coder, so that all implementations
have to override it. That's a simple solution, but the
question is - should hashCode of Coder be used that way?
I think that StateSpec instances should be equal only to
itself. Then the hashCode can be stored in the instance, e.g.
private final int hashCode = System.identityHashCode(this)
and returned in hashCode(). There would be no need for
Coder to implement hashCode anymore (if there aren't any
other cases, where it is needed, in which case it would
still be better to add abstract hashCode and equals
methods on Coder).
Jan
On 5/9/19 5:04 PM, Reuven Lax wrote:
Is a valid hashCode on Coder part of our contract or
not? If it is, then the lack of hashCode on SchemaCoder
is simply a bug.
On Thu, May 9, 2019 at 7:42 AM Jan Lukavský
<je...@seznam.cz <mailto:je...@seznam.cz>> wrote:
Hi,
I have spent several hour digging into strange issue
with DirectRunner,
that manifested as non-deterministic run of
pipeline. The pipeline
contains basically only single stateful ParDo, which
adds elements into
state and after some timeout flushes these elements
into output. The
issues was, that sometimes (very often) when the
timer fired, the state
appeared to be empty, although I actually added
something into the
state. I will skip details, but the problem boils
down to the fact, that
StateSpecs hash Coder into hashCode - e.g.
@Override
public int hashCode() {
return Objects.hash(getClass(), coder);
}
in ValueStateSpec. Now, when Coder doesn't have
hashCode and equals
implemented (and there are some of those in the
codebase itself - e.g.
SchemaCoder), it all blows up in a very
hard-to-debug manner. So the
proposal is - either to add abstract hashCode and
equals to Coder, or
don't hash the Coder into hashCode of StateSpecs (we
can generate unique
ID for each StateSpec instance for example).
Any thoughts about which path to follow? Or maybe
both? :)
Jan