Re: Unexpected behavior of StateSpecs

2019-05-10 Thread Jan Lukavský

Hi Lukasz,

I've created JIRA issue [1] and PR [2].

Jan

[1] https://issues.apache.org/jira/browse/BEAM-7269

[2] https://github.com/apache/beam/pull/8555

On 5/10/19 7:39 PM, Lukasz Cwik wrote:
That seems like the correct fix as well. We could open up a PR and see 
what the tests catch as a first pass for understanding the implications.


On Fri, May 10, 2019 at 9:31 AM Jan Lukavský > wrote:


Hm, yes, the fix might be also in fixing hashCode and equals of
SimpleStateTag, so that it doesn't hash and compare the StateSpec,
but only the StructureId. That looks like best option to me. But
I'm not sure about other implications this might have.

Jan

On 5/10/19 5:43 PM, Reuven Lax wrote:

Ok so this sounds like a bug in the DirectRunner then?

*From: *Lukasz Cwik mailto:lc...@google.com>>
*Date: *Fri, May 10, 2019 at 8:38 AM
*To: *dev

StateSpec should not be used as a key within any maps. We
should use the logical name of the StateSpec relative to the
DoFn as its id and should only be using that id for
comparisons/lookups.

On Fri, May 10, 2019 at 1:07 AM Jan Lukavský mailto:je...@seznam.cz>> wrote:

I'm not sure. Generally it affects any runner that uses
HashMap to store StateSpec.

Jan

On 5/9/19 6:32 PM, Reuven Lax wrote:

Is this specific to the DirectRunner, or does it affect
other runners?

On Thu, May 9, 2019 at 8:13 AM Jan Lukavský
mailto:je...@seznam.cz>> wrote:

Because of the use of hashCode in StateSpecs, I'd
say that it is. But it is not obvious. That's why
I'd suggest to make it abstract on Coder, so that
all implementations have to override it. That's a
simple solution, but the question is - should
hashCode of Coder be used that way? I think that
StateSpec instances should be equal only to itself.
Then the hashCode can be stored in the instance, e.g.

  private final int hashCode =
System.identityHashCode(this)

and returned in hashCode(). There would be no need
for Coder to implement hashCode anymore (if there
aren't any other cases, where it is needed, in which
case it would still be better to add abstract
hashCode and equals methods on Coder).

Jan

On 5/9/19 5:04 PM, Reuven Lax wrote:

Is a valid hashCode on Coder part of our contract
or not? If it is, then the lack of hashCode on
SchemaCoder is simply a bug.

On Thu, May 9, 2019 at 7:42 AM Jan Lukavský
mailto:je...@seznam.cz>> wrote:

Hi,

I have spent several hour digging into strange
issue with DirectRunner,
that manifested as non-deterministic run of
pipeline. The pipeline
contains basically only single stateful ParDo,
which adds elements into
state and after some timeout flushes these
elements into output. The
issues was, that sometimes (very often) when
the timer fired, the state
appeared to be empty, although I actually added
something into the
state. I will skip details, but the problem
boils down to the fact, that
StateSpecs hash Coder into hashCode - e.g.

 @Override
 public int hashCode() {
   return Objects.hash(getClass(), coder);
 }

in ValueStateSpec. Now, when Coder doesn't have
hashCode and equals
implemented (and there are some of those in the
codebase itself - e.g.
SchemaCoder), it all blows up in a very
hard-to-debug manner. So the
proposal is - either to add abstract hashCode
and equals to Coder, or
don't hash the Coder into hashCode of
StateSpecs (we can generate unique
ID for each StateSpec instance for example).

Any thoughts about which path to follow? Or
maybe both? :)

Jan




Re: Unexpected behavior of StateSpecs

2019-05-10 Thread Lukasz Cwik
That seems like the correct fix as well. We could open up a PR and see what
the tests catch as a first pass for understanding the implications.

On Fri, May 10, 2019 at 9:31 AM Jan Lukavský  wrote:

> Hm, yes, the fix might be also in fixing hashCode and equals of
> SimpleStateTag, so that it doesn't hash and compare the StateSpec, but only
> the StructureId. That looks like best option to me. But I'm not sure about
> other implications this might have.
>
> Jan
> On 5/10/19 5:43 PM, Reuven Lax wrote:
>
> Ok so this sounds like a bug in the DirectRunner then?
>
> *From: *Lukasz Cwik 
> *Date: *Fri, May 10, 2019 at 8:38 AM
> *To: *dev
>
> StateSpec should not be used as a key within any maps. We should use the
>> logical name of the StateSpec relative to the DoFn as its id and should
>> only be using that id for comparisons/lookups.
>>
>> On Fri, May 10, 2019 at 1:07 AM Jan Lukavský  wrote:
>>
>>> I'm not sure. Generally it affects any runner that uses HashMap to store
>>> StateSpec.
>>>
>>> Jan
>>> On 5/9/19 6:32 PM, Reuven Lax wrote:
>>>
>>> Is this specific to the DirectRunner, or does it affect other runners?
>>>
>>> On Thu, May 9, 2019 at 8:13 AM Jan Lukavský  wrote:
>>>
 Because of the use of hashCode in StateSpecs, I'd say that it is. But
 it is not obvious. That's why I'd suggest to make it abstract on Coder, so
 that all implementations have to override it. That's a simple solution, but
 the question is - should hashCode of Coder be used that way? I think that
 StateSpec instances should be equal only to itself. Then the hashCode can
 be stored in the instance, e.g.

   private final int hashCode = System.identityHashCode(this)

 and returned in hashCode(). There would be no need for Coder to
 implement hashCode anymore (if there aren't any other cases, where it is
 needed, in which case it would still be better to add abstract hashCode and
 equals methods on Coder).

 Jan
 On 5/9/19 5:04 PM, Reuven Lax wrote:

 Is a valid hashCode on Coder part of our contract or not? If it is,
 then the lack of hashCode on SchemaCoder is simply a bug.

 On Thu, May 9, 2019 at 7:42 AM Jan Lukavský  wrote:

> Hi,
>
> I have spent several hour digging into strange issue with
> DirectRunner,
> that manifested as non-deterministic run of pipeline. The pipeline
> contains basically only single stateful ParDo, which adds elements
> into
> state and after some timeout flushes these elements into output. The
> issues was, that sometimes (very often) when the timer fired, the
> state
> appeared to be empty, although I actually added something into the
> state. I will skip details, but the problem boils down to the fact,
> that
> StateSpecs hash Coder into hashCode - e.g.
>
>  @Override
>  public int hashCode() {
>return Objects.hash(getClass(), coder);
>  }
>
> in ValueStateSpec. Now, when Coder doesn't have hashCode and equals
> implemented (and there are some of those in the codebase itself - e.g.
> SchemaCoder), it all blows up in a very hard-to-debug manner. So the
> proposal is - either to add abstract hashCode and equals to Coder, or
> don't hash the Coder into hashCode of StateSpecs (we can generate
> unique
> ID for each StateSpec instance for example).
>
> Any thoughts about which path to follow? Or maybe both? :)
>
> Jan
>
>
>


Re: Unexpected behavior of StateSpecs

2019-05-10 Thread Jan Lukavský
Hm, yes, the fix might be also in fixing hashCode and equals of 
SimpleStateTag, so that it doesn't hash and compare the StateSpec, but 
only the StructureId. That looks like best option to me. But I'm not 
sure about other implications this might have.


Jan

On 5/10/19 5:43 PM, Reuven Lax wrote:

Ok so this sounds like a bug in the DirectRunner then?

*From: *Lukasz Cwik mailto:lc...@google.com>>
*Date: *Fri, May 10, 2019 at 8:38 AM
*To: *dev

StateSpec should not be used as a key within any maps. We should
use the logical name of the StateSpec relative to the DoFn as its
id and should only be using that id for comparisons/lookups.

On Fri, May 10, 2019 at 1:07 AM Jan Lukavský mailto:je...@seznam.cz>> wrote:

I'm not sure. Generally it affects any runner that uses
HashMap to store StateSpec.

Jan

On 5/9/19 6:32 PM, Reuven Lax wrote:

Is this specific to the DirectRunner, or does it affect other
runners?

On Thu, May 9, 2019 at 8:13 AM Jan Lukavský mailto:je...@seznam.cz>> wrote:

Because of the use of hashCode in StateSpecs, I'd say
that it is. But it is not obvious. That's why I'd suggest
to make it abstract on Coder, so that all implementations
have to override it. That's a simple solution, but the
question is - should hashCode of Coder be used that way?
I think that StateSpec instances should be equal only to
itself. Then the hashCode can be stored in the instance, e.g.

  private final int hashCode = System.identityHashCode(this)

and returned in hashCode(). There would be no need for
Coder to implement hashCode anymore (if there aren't any
other cases, where it is needed, in which case it would
still be better to add abstract hashCode and equals
methods on Coder).

Jan

On 5/9/19 5:04 PM, Reuven Lax wrote:

Is a valid hashCode on Coder part of our contract or
not? If it is, then the lack of hashCode on SchemaCoder
is simply a bug.

On Thu, May 9, 2019 at 7:42 AM Jan Lukavský
mailto:je...@seznam.cz>> wrote:

Hi,

I have spent several hour digging into strange issue
with DirectRunner,
that manifested as non-deterministic run of
pipeline. The pipeline
contains basically only single stateful ParDo, which
adds elements into
state and after some timeout flushes these elements
into output. The
issues was, that sometimes (very often) when the
timer fired, the state
appeared to be empty, although I actually added
something into the
state. I will skip details, but the problem boils
down to the fact, that
StateSpecs hash Coder into hashCode - e.g.

 @Override
 public int hashCode() {
   return Objects.hash(getClass(), coder);
 }

in ValueStateSpec. Now, when Coder doesn't have
hashCode and equals
implemented (and there are some of those in the
codebase itself - e.g.
SchemaCoder), it all blows up in a very
hard-to-debug manner. So the
proposal is - either to add abstract hashCode and
equals to Coder, or
don't hash the Coder into hashCode of StateSpecs (we
can generate unique
ID for each StateSpec instance for example).

Any thoughts about which path to follow? Or maybe
both? :)

Jan




Re: Unexpected behavior of StateSpecs

2019-05-10 Thread Reuven Lax
Ok so this sounds like a bug in the DirectRunner then?

*From: *Lukasz Cwik 
*Date: *Fri, May 10, 2019 at 8:38 AM
*To: *dev

StateSpec should not be used as a key within any maps. We should use the
> logical name of the StateSpec relative to the DoFn as its id and should
> only be using that id for comparisons/lookups.
>
> On Fri, May 10, 2019 at 1:07 AM Jan Lukavský  wrote:
>
>> I'm not sure. Generally it affects any runner that uses HashMap to store
>> StateSpec.
>>
>> Jan
>> On 5/9/19 6:32 PM, Reuven Lax wrote:
>>
>> Is this specific to the DirectRunner, or does it affect other runners?
>>
>> On Thu, May 9, 2019 at 8:13 AM Jan Lukavský  wrote:
>>
>>> Because of the use of hashCode in StateSpecs, I'd say that it is. But it
>>> is not obvious. That's why I'd suggest to make it abstract on Coder, so
>>> that all implementations have to override it. That's a simple solution, but
>>> the question is - should hashCode of Coder be used that way? I think that
>>> StateSpec instances should be equal only to itself. Then the hashCode can
>>> be stored in the instance, e.g.
>>>
>>>   private final int hashCode = System.identityHashCode(this)
>>>
>>> and returned in hashCode(). There would be no need for Coder to
>>> implement hashCode anymore (if there aren't any other cases, where it is
>>> needed, in which case it would still be better to add abstract hashCode and
>>> equals methods on Coder).
>>>
>>> Jan
>>> On 5/9/19 5:04 PM, Reuven Lax wrote:
>>>
>>> Is a valid hashCode on Coder part of our contract or not? If it is, then
>>> the lack of hashCode on SchemaCoder is simply a bug.
>>>
>>> On Thu, May 9, 2019 at 7:42 AM Jan Lukavský  wrote:
>>>
 Hi,

 I have spent several hour digging into strange issue with DirectRunner,
 that manifested as non-deterministic run of pipeline. The pipeline
 contains basically only single stateful ParDo, which adds elements into
 state and after some timeout flushes these elements into output. The
 issues was, that sometimes (very often) when the timer fired, the state
 appeared to be empty, although I actually added something into the
 state. I will skip details, but the problem boils down to the fact,
 that
 StateSpecs hash Coder into hashCode - e.g.

  @Override
  public int hashCode() {
return Objects.hash(getClass(), coder);
  }

 in ValueStateSpec. Now, when Coder doesn't have hashCode and equals
 implemented (and there are some of those in the codebase itself - e.g.
 SchemaCoder), it all blows up in a very hard-to-debug manner. So the
 proposal is - either to add abstract hashCode and equals to Coder, or
 don't hash the Coder into hashCode of StateSpecs (we can generate
 unique
 ID for each StateSpec instance for example).

 Any thoughts about which path to follow? Or maybe both? :)

 Jan





Re: Unexpected behavior of StateSpecs

2019-05-10 Thread Lukasz Cwik
StateSpec should not be used as a key within any maps. We should use the
logical name of the StateSpec relative to the DoFn as its id and should
only be using that id for comparisons/lookups.

On Fri, May 10, 2019 at 1:07 AM Jan Lukavský  wrote:

> I'm not sure. Generally it affects any runner that uses HashMap to store
> StateSpec.
>
> Jan
> On 5/9/19 6:32 PM, Reuven Lax wrote:
>
> Is this specific to the DirectRunner, or does it affect other runners?
>
> On Thu, May 9, 2019 at 8:13 AM Jan Lukavský  wrote:
>
>> Because of the use of hashCode in StateSpecs, I'd say that it is. But it
>> is not obvious. That's why I'd suggest to make it abstract on Coder, so
>> that all implementations have to override it. That's a simple solution, but
>> the question is - should hashCode of Coder be used that way? I think that
>> StateSpec instances should be equal only to itself. Then the hashCode can
>> be stored in the instance, e.g.
>>
>>   private final int hashCode = System.identityHashCode(this)
>>
>> and returned in hashCode(). There would be no need for Coder to implement
>> hashCode anymore (if there aren't any other cases, where it is needed, in
>> which case it would still be better to add abstract hashCode and equals
>> methods on Coder).
>>
>> Jan
>> On 5/9/19 5:04 PM, Reuven Lax wrote:
>>
>> Is a valid hashCode on Coder part of our contract or not? If it is, then
>> the lack of hashCode on SchemaCoder is simply a bug.
>>
>> On Thu, May 9, 2019 at 7:42 AM Jan Lukavský  wrote:
>>
>>> Hi,
>>>
>>> I have spent several hour digging into strange issue with DirectRunner,
>>> that manifested as non-deterministic run of pipeline. The pipeline
>>> contains basically only single stateful ParDo, which adds elements into
>>> state and after some timeout flushes these elements into output. The
>>> issues was, that sometimes (very often) when the timer fired, the state
>>> appeared to be empty, although I actually added something into the
>>> state. I will skip details, but the problem boils down to the fact, that
>>> StateSpecs hash Coder into hashCode - e.g.
>>>
>>>  @Override
>>>  public int hashCode() {
>>>return Objects.hash(getClass(), coder);
>>>  }
>>>
>>> in ValueStateSpec. Now, when Coder doesn't have hashCode and equals
>>> implemented (and there are some of those in the codebase itself - e.g.
>>> SchemaCoder), it all blows up in a very hard-to-debug manner. So the
>>> proposal is - either to add abstract hashCode and equals to Coder, or
>>> don't hash the Coder into hashCode of StateSpecs (we can generate unique
>>> ID for each StateSpec instance for example).
>>>
>>> Any thoughts about which path to follow? Or maybe both? :)
>>>
>>> Jan
>>>
>>>
>>>


Re: Unexpected behavior of StateSpecs

2019-05-10 Thread Jan Lukavský
I'm not sure. Generally it affects any runner that uses HashMap to store 
StateSpec.


Jan

On 5/9/19 6:32 PM, Reuven Lax wrote:

Is this specific to the DirectRunner, or does it affect other runners?

On Thu, May 9, 2019 at 8:13 AM Jan Lukavský > wrote:


Because of the use of hashCode in StateSpecs, I'd say that it is.
But it is not obvious. That's why I'd suggest to make it abstract
on Coder, so that all implementations have to override it. That's
a simple solution, but the question is - should hashCode of Coder
be used that way? I think that StateSpec instances should be equal
only to itself. Then the hashCode can be stored in the instance, e.g.

  private final int hashCode = System.identityHashCode(this)

and returned in hashCode(). There would be no need for Coder to
implement hashCode anymore (if there aren't any other cases, where
it is needed, in which case it would still be better to add
abstract hashCode and equals methods on Coder).

Jan

On 5/9/19 5:04 PM, Reuven Lax wrote:

Is a valid hashCode on Coder part of our contract or not? If it
is, then the lack of hashCode on SchemaCoder is simply a bug.

On Thu, May 9, 2019 at 7:42 AM Jan Lukavský mailto:je...@seznam.cz>> wrote:

Hi,

I have spent several hour digging into strange issue with
DirectRunner,
that manifested as non-deterministic run of pipeline. The
pipeline
contains basically only single stateful ParDo, which adds
elements into
state and after some timeout flushes these elements into
output. The
issues was, that sometimes (very often) when the timer fired,
the state
appeared to be empty, although I actually added something
into the
state. I will skip details, but the problem boils down to the
fact, that
StateSpecs hash Coder into hashCode - e.g.

 @Override
 public int hashCode() {
   return Objects.hash(getClass(), coder);
 }

in ValueStateSpec. Now, when Coder doesn't have hashCode and
equals
implemented (and there are some of those in the codebase
itself - e.g.
SchemaCoder), it all blows up in a very hard-to-debug manner.
So the
proposal is - either to add abstract hashCode and equals to
Coder, or
don't hash the Coder into hashCode of StateSpecs (we can
generate unique
ID for each StateSpec instance for example).

Any thoughts about which path to follow? Or maybe both? :)

Jan




Re: Unexpected behavior of StateSpecs

2019-05-10 Thread Jan Lukavský

Hi Anton,

yes, if the keyCoder doesn't have proper hashCode and equals, then it 
would manifest exactly as described.


Jan

On 5/9/19 6:28 PM, Anton Kedin wrote:
Does it look similar to 
https://issues.apache.org/jira/browse/BEAM-6813 ? I also stumbled on a 
problem with a state in DirectRunner but wasn't able to figure it out 
yet: 
https://lists.apache.org/thread.html/dae8b605a218532c085a0eea4e71338eae51922c26820f37b24875c0@%3Cdev.beam.apache.org%3E


Regards,
Anton

*From: *Jan Lukavský mailto:je...@seznam.cz>>
*Date: *Thu, May 9, 2019 at 8:13 AM
*To: * mailto:dev@beam.apache.org>>

Because of the use of hashCode in StateSpecs, I'd say that it is.
But it is not obvious. That's why I'd suggest to make it abstract
on Coder, so that all implementations have to override it. That's
a simple solution, but the question is - should hashCode of Coder
be used that way? I think that StateSpec instances should be equal
only to itself. Then the hashCode can be stored in the instance, e.g.

  private final int hashCode = System.identityHashCode(this)

and returned in hashCode(). There would be no need for Coder to
implement hashCode anymore (if there aren't any other cases, where
it is needed, in which case it would still be better to add
abstract hashCode and equals methods on Coder).

Jan

On 5/9/19 5:04 PM, Reuven Lax wrote:

Is a valid hashCode on Coder part of our contract or not? If it
is, then the lack of hashCode on SchemaCoder is simply a bug.

On Thu, May 9, 2019 at 7:42 AM Jan Lukavský mailto:je...@seznam.cz>> wrote:

Hi,

I have spent several hour digging into strange issue with
DirectRunner,
that manifested as non-deterministic run of pipeline. The
pipeline
contains basically only single stateful ParDo, which adds
elements into
state and after some timeout flushes these elements into
output. The
issues was, that sometimes (very often) when the timer fired,
the state
appeared to be empty, although I actually added something
into the
state. I will skip details, but the problem boils down to the
fact, that
StateSpecs hash Coder into hashCode - e.g.

 @Override
 public int hashCode() {
   return Objects.hash(getClass(), coder);
 }

in ValueStateSpec. Now, when Coder doesn't have hashCode and
equals
implemented (and there are some of those in the codebase
itself - e.g.
SchemaCoder), it all blows up in a very hard-to-debug manner.
So the
proposal is - either to add abstract hashCode and equals to
Coder, or
don't hash the Coder into hashCode of StateSpecs (we can
generate unique
ID for each StateSpec instance for example).

Any thoughts about which path to follow? Or maybe both? :)

Jan




Re: Unexpected behavior of StateSpecs

2019-05-09 Thread Reuven Lax
Is this specific to the DirectRunner, or does it affect other runners?

On Thu, May 9, 2019 at 8:13 AM Jan Lukavský  wrote:

> Because of the use of hashCode in StateSpecs, I'd say that it is. But it
> is not obvious. That's why I'd suggest to make it abstract on Coder, so
> that all implementations have to override it. That's a simple solution, but
> the question is - should hashCode of Coder be used that way? I think that
> StateSpec instances should be equal only to itself. Then the hashCode can
> be stored in the instance, e.g.
>
>   private final int hashCode = System.identityHashCode(this)
>
> and returned in hashCode(). There would be no need for Coder to implement
> hashCode anymore (if there aren't any other cases, where it is needed, in
> which case it would still be better to add abstract hashCode and equals
> methods on Coder).
>
> Jan
> On 5/9/19 5:04 PM, Reuven Lax wrote:
>
> Is a valid hashCode on Coder part of our contract or not? If it is, then
> the lack of hashCode on SchemaCoder is simply a bug.
>
> On Thu, May 9, 2019 at 7:42 AM Jan Lukavský  wrote:
>
>> Hi,
>>
>> I have spent several hour digging into strange issue with DirectRunner,
>> that manifested as non-deterministic run of pipeline. The pipeline
>> contains basically only single stateful ParDo, which adds elements into
>> state and after some timeout flushes these elements into output. The
>> issues was, that sometimes (very often) when the timer fired, the state
>> appeared to be empty, although I actually added something into the
>> state. I will skip details, but the problem boils down to the fact, that
>> StateSpecs hash Coder into hashCode - e.g.
>>
>>  @Override
>>  public int hashCode() {
>>return Objects.hash(getClass(), coder);
>>  }
>>
>> in ValueStateSpec. Now, when Coder doesn't have hashCode and equals
>> implemented (and there are some of those in the codebase itself - e.g.
>> SchemaCoder), it all blows up in a very hard-to-debug manner. So the
>> proposal is - either to add abstract hashCode and equals to Coder, or
>> don't hash the Coder into hashCode of StateSpecs (we can generate unique
>> ID for each StateSpec instance for example).
>>
>> Any thoughts about which path to follow? Or maybe both? :)
>>
>> Jan
>>
>>
>>


Re: Unexpected behavior of StateSpecs

2019-05-09 Thread Anton Kedin
Does it look similar to https://issues.apache.org/jira/browse/BEAM-6813 ? I
also stumbled on a problem with a state in DirectRunner but wasn't able to
figure it out yet:
https://lists.apache.org/thread.html/dae8b605a218532c085a0eea4e71338eae51922c26820f37b24875c0@%3Cdev.beam.apache.org%3E

Regards,
Anton

*From: *Jan Lukavský 
*Date: *Thu, May 9, 2019 at 8:13 AM
*To: * 

Because of the use of hashCode in StateSpecs, I'd say that it is. But it is
> not obvious. That's why I'd suggest to make it abstract on Coder, so that
> all implementations have to override it. That's a simple solution, but the
> question is - should hashCode of Coder be used that way? I think that
> StateSpec instances should be equal only to itself. Then the hashCode can
> be stored in the instance, e.g.
>
>   private final int hashCode = System.identityHashCode(this)
>
> and returned in hashCode(). There would be no need for Coder to implement
> hashCode anymore (if there aren't any other cases, where it is needed, in
> which case it would still be better to add abstract hashCode and equals
> methods on Coder).
>
> Jan
> On 5/9/19 5:04 PM, Reuven Lax wrote:
>
> Is a valid hashCode on Coder part of our contract or not? If it is, then
> the lack of hashCode on SchemaCoder is simply a bug.
>
> On Thu, May 9, 2019 at 7:42 AM Jan Lukavský  wrote:
>
>> Hi,
>>
>> I have spent several hour digging into strange issue with DirectRunner,
>> that manifested as non-deterministic run of pipeline. The pipeline
>> contains basically only single stateful ParDo, which adds elements into
>> state and after some timeout flushes these elements into output. The
>> issues was, that sometimes (very often) when the timer fired, the state
>> appeared to be empty, although I actually added something into the
>> state. I will skip details, but the problem boils down to the fact, that
>> StateSpecs hash Coder into hashCode - e.g.
>>
>>  @Override
>>  public int hashCode() {
>>return Objects.hash(getClass(), coder);
>>  }
>>
>> in ValueStateSpec. Now, when Coder doesn't have hashCode and equals
>> implemented (and there are some of those in the codebase itself - e.g.
>> SchemaCoder), it all blows up in a very hard-to-debug manner. So the
>> proposal is - either to add abstract hashCode and equals to Coder, or
>> don't hash the Coder into hashCode of StateSpecs (we can generate unique
>> ID for each StateSpec instance for example).
>>
>> Any thoughts about which path to follow? Or maybe both? :)
>>
>> Jan
>>
>>
>>


Re: Unexpected behavior of StateSpecs

2019-05-09 Thread Jan Lukavský
Because of the use of hashCode in StateSpecs, I'd say that it is. But it 
is not obvious. That's why I'd suggest to make it abstract on Coder, so 
that all implementations have to override it. That's a simple solution, 
but the question is - should hashCode of Coder be used that way? I think 
that StateSpec instances should be equal only to itself. Then the 
hashCode can be stored in the instance, e.g.


  private final int hashCode = System.identityHashCode(this)

and returned in hashCode(). There would be no need for Coder to 
implement hashCode anymore (if there aren't any other cases, where it is 
needed, in which case it would still be better to add abstract hashCode 
and equals methods on Coder).


Jan

On 5/9/19 5:04 PM, Reuven Lax wrote:
Is a valid hashCode on Coder part of our contract or not? If it is, 
then the lack of hashCode on SchemaCoder is simply a bug.


On Thu, May 9, 2019 at 7:42 AM Jan Lukavský > wrote:


Hi,

I have spent several hour digging into strange issue with
DirectRunner,
that manifested as non-deterministic run of pipeline. The pipeline
contains basically only single stateful ParDo, which adds elements
into
state and after some timeout flushes these elements into output. The
issues was, that sometimes (very often) when the timer fired, the
state
appeared to be empty, although I actually added something into the
state. I will skip details, but the problem boils down to the
fact, that
StateSpecs hash Coder into hashCode - e.g.

 @Override
 public int hashCode() {
   return Objects.hash(getClass(), coder);
 }

in ValueStateSpec. Now, when Coder doesn't have hashCode and equals
implemented (and there are some of those in the codebase itself -
e.g.
SchemaCoder), it all blows up in a very hard-to-debug manner. So the
proposal is - either to add abstract hashCode and equals to Coder, or
don't hash the Coder into hashCode of StateSpecs (we can generate
unique
ID for each StateSpec instance for example).

Any thoughts about which path to follow? Or maybe both? :)

Jan




Re: Unexpected behavior of StateSpecs

2019-05-09 Thread Reuven Lax
Is a valid hashCode on Coder part of our contract or not? If it is, then
the lack of hashCode on SchemaCoder is simply a bug.

On Thu, May 9, 2019 at 7:42 AM Jan Lukavský  wrote:

> Hi,
>
> I have spent several hour digging into strange issue with DirectRunner,
> that manifested as non-deterministic run of pipeline. The pipeline
> contains basically only single stateful ParDo, which adds elements into
> state and after some timeout flushes these elements into output. The
> issues was, that sometimes (very often) when the timer fired, the state
> appeared to be empty, although I actually added something into the
> state. I will skip details, but the problem boils down to the fact, that
> StateSpecs hash Coder into hashCode - e.g.
>
>  @Override
>  public int hashCode() {
>return Objects.hash(getClass(), coder);
>  }
>
> in ValueStateSpec. Now, when Coder doesn't have hashCode and equals
> implemented (and there are some of those in the codebase itself - e.g.
> SchemaCoder), it all blows up in a very hard-to-debug manner. So the
> proposal is - either to add abstract hashCode and equals to Coder, or
> don't hash the Coder into hashCode of StateSpecs (we can generate unique
> ID for each StateSpec instance for example).
>
> Any thoughts about which path to follow? Or maybe both? :)
>
> Jan
>
>
>


Unexpected behavior of StateSpecs

2019-05-09 Thread Jan Lukavský

Hi,

I have spent several hour digging into strange issue with DirectRunner, 
that manifested as non-deterministic run of pipeline. The pipeline 
contains basically only single stateful ParDo, which adds elements into 
state and after some timeout flushes these elements into output. The 
issues was, that sometimes (very often) when the timer fired, the state 
appeared to be empty, although I actually added something into the 
state. I will skip details, but the problem boils down to the fact, that 
StateSpecs hash Coder into hashCode - e.g.


    @Override
    public int hashCode() {
  return Objects.hash(getClass(), coder);
    }

in ValueStateSpec. Now, when Coder doesn't have hashCode and equals 
implemented (and there are some of those in the codebase itself - e.g. 
SchemaCoder), it all blows up in a very hard-to-debug manner. So the 
proposal is - either to add abstract hashCode and equals to Coder, or 
don't hash the Coder into hashCode of StateSpecs (we can generate unique 
ID for each StateSpec instance for example).


Any thoughts about which path to follow? Or maybe both? :)

Jan