Thank you all for the replies. One last question: I noticed that 
ParDoTest$StateTests > testValueStateSameId 
<https://github.com/apache/beam/blob/c7e8c6bdb819b0d32f8727036a09c227529d01d0/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java#L2118>
 seems to be testing exactly this case. However, the first ParDo intentionally 
changes the key of its inputs, so the subsequent ParDo would never share the same 
state cell anyway. Is this expected, or do we actually want to test that the 
same state id in different DoFn(s) is completely separate?
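
For reference, here is a minimal sketch of the composite key suggested in the 
quoted replies below, in plain Java with no Beam dependency. ScopedStateStore 
and its methods are hypothetical names used only for illustration, not part of 
any Beam or runner API, and the transform identifier stands for whatever the 
runner uses (PTransform id or name).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical runner-side state store (an assumption, not a Beam API).
// Each state cell is addressed by (transform identifier, user key, state id),
// so the same state id declared in two different DoFns never collides.
final class ScopedStateStore {
    private final Map<List<String>, Object> cells = new HashMap<>();

    // Lists compare by value, so an immutable-size list works as a composite key.
    private static List<String> cellKey(String transformId, String userKey, String stateId) {
        return Arrays.asList(transformId, userKey, stateId);
    }

    void write(String transformId, String userKey, String stateId, Object value) {
        cells.put(cellKey(transformId, userKey, stateId), value);
    }

    // Returns null when the cell has never been written.
    Object read(String transformId, String userKey, String stateId) {
        return cells.get(cellKey(transformId, userKey, stateId));
    }
}
```

With this scoping, writing an Integer under ("ParDo(First)", "k", "foo") and a 
String under ("ParDo(Second)", "k", "foo") yields two independent cells, even 
though the user key and the state id are identical.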

> On Aug 21, 2020, at 4:50 PM, Robert Bradshaw <rober...@google.com> wrote:
> 
> We should be using PTransform Labels (aka Names), not ids, for naming
> state. This is why the names must line up when doing, for example, a
> Dataflow update operation with Stateful DoFns.
> 
> (And, yes, if the user does not specify the transform name, and it is
> autogenerated differently, this will be an error. This is why we throw
> exceptions in the SDK if a name is re-used rather than just appending
> a counter or similar.)
> 
> 
> On Fri, Aug 21, 2020 at 4:12 PM Ke Wu <ke.wu...@gmail.com> wrote:
>> 
>> If the user does not explicitly specify a transform name, in which case an 
>> autogenerated name will be used when generating the unique id, does it mean 
>> the id could change when the pipeline changes, for example by adding extra 
>> transforms?
>> 
>> On Aug 21, 2020, at 11:43 AM, Luke Cwik <lc...@google.com> wrote:
>> 
>> The DoFn is associated with a PTransform and in the pipeline proto there is 
>> a unique id associated with each PTransform. You can use that to generate a 
>> composite key (ptransformid, stateid) which will be unique within the 
>> pipeline.
>> 
>> On Fri, Aug 21, 2020 at 11:26 AM Ke Wu <ke.wu...@gmail.com> wrote:
>>> 
>>> Thank you, Reuven, for the confirmation. Do you know the recommended 
>>> way for underlying runners to distinguish the same state id in different 
>>> DoFn(s)?
>>> 
>>> On Aug 21, 2020, at 10:27 AM, Reuven Lax <re...@google.com> wrote:
>>> 
>>> StateId is scoped to the DoFn. You can use the same string in different 
>>> DoFns for completely different states.
>>> 
>>> On Fri, Aug 21, 2020 at 10:21 AM Ke Wu <ke.wu...@gmail.com> wrote:
>>>> 
>>>> Hello everyone,
>>>> 
>>>> After reading through "Stateful processing with Apache Beam" and 
>>>> DoFn.StateId, I understand that each state id must be unique and must have 
>>>> a single type, at least within the same DoFn. However, neither explicitly 
>>>> mentions whether it is expected and supported for the same state id 
>>>> to be declared in different DoFn(s). If yes, is the state supposed to be 
>>>> shared, or is it supposed to be completely separate, in which case it could 
>>>> even have different types? If no, it seems that the validation in the Beam 
>>>> SDK only checks uniqueness within the same DoFn.
>>>> 
>>>> Thanks,
>>>> Ke
>>> 
>>> 
>> 
