Re: Apache Flink - Sharing state in processors

2020-01-23 Thread Chesnay Schepler
1. a/b) No, they are deserialized into separate instances in any case 
and are independent afterwards.


2. a/b) No, see 1).

3. a/b) No, as individual tasks are isolated by different class-loaders.
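
To illustrate the answers above: a user function is shipped to each task in serialized form, so every task works on its own deserialized copy. The following is a minimal sketch in plain Java, without any Flink dependency. `CountingProcessor` and `shipToTask` are hypothetical names made up for this illustration; they only mimic the serialization round trip and are not Flink APIs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class SerializedCopiesDemo {

    /** Hypothetical stand-in for a user-defined function; not a Flink class. */
    static class CountingProcessor implements Serializable {
        private static final long serialVersionUID = 1L;
        final List<String> seen; // collection handed in by the client program
        int count;               // plain field, not Flink managed state

        CountingProcessor(List<String> seen) {
            this.seen = seen;
        }

        void process(String value) {
            seen.add(value);
            count++;
        }
    }

    /** Round-trips an object through Java serialization (assumed model of shipping a function to a task). */
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T shipToTask(T function) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(function);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization round trip failed", e);
        }
    }

    public static void main(String[] args) {
        List<String> shared = new ArrayList<>();
        CountingProcessor original = new CountingProcessor(shared);

        // Each "task" receives its own deserialized copy of the same instance.
        CountingProcessor taskA = shipToTask(original);
        CountingProcessor taskB = shipToTask(original);

        taskA.process("a1");
        taskA.process("a2");
        taskB.process("b1");

        System.out.println("taskA.count = " + taskA.count);       // 2
        System.out.println("taskB.count = " + taskB.count);       // 1
        System.out.println("original.count = " + original.count); // 0
        System.out.println("shared.size() = " + shared.size());   // 0
    }
}
```

The copies diverge as soon as they process elements, and the collection held by the client program is never touched, which matches the answers to questions 1 to 3.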

On 23/01/2020 09:25, M Singh wrote:

Thanks Yun for your answers.

By processor I did mean a user-defined processor function. Keeping that 
in view, do you have any advice on how the shared state, i.e., the 
parameters passed to the processor as mentioned above (not the keyed 
state or operator state), will be affected in a distributed runtime 
environment?


Mans

On Sunday, January 12, 2020, 09:51:10 PM EST, Yun Tang wrote:



Hi Mans

What do you mean by 'processor' here? A user-defined function?

As for sharing state, I'm afraid it's not so easy to implement in 
Flink. Both keyed state and operator state are instantiated and used 
within the scope of a single operator, and they are only thread-safe 
within that scope. The only way to read state from outside at runtime, 
and only read it, is via queryable state [1].

As for the keyBy question, each message is sent to exactly one 
downstream task, chosen according to the hash code of its key [2].


[1] 
https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/queryable_state.html
[2] 
https://github.com/apache/flink/blob/7a6ca9c03f67f488e40a114e94c389a5cfb67836/flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/partitioner/KeyGroupStreamPartitioner.java#L58



Best
Yun Tang


*From:* M Singh 
*Sent:* Friday, January 10, 2020 23:29
*To:* User 
*Subject:* Apache Flink - Sharing state in processors
Hi:

I have a few questions about how state is shared in processors in Flink.

1. If I instantiate a processor in the Flink app and use it multiple 
times in Flink -
    (a) if the tasks are in the same slot - do they share the same 
processor on the taskmanager?
    (b) if the tasks are on the same node but in different slots - do 
they share the same processor on the taskmanager?

2. If I instantiate a single processor with local state and use it 
multiple times in Flink -
    (a) if the tasks are in the same slot - do they share the same 
processor and state on the taskmanager?
    (b) if the tasks are on the same node but in different slots - do 
they share the same processor and state on the taskmanager?

3. If I instantiate multiple processors with a shared collection and 
use them multiple times in Flink -
    (a) if the tasks are in the same slot - do they share the state on 
the taskmanager?
    (b) if the tasks are on the same node but in different slots - do 
they share the state on the taskmanager?

4. How do the above scenarios affect sharing
(a) operator state
(b) keyed state

5. If I have a parallelism > 1 and use keyBy - is each key handled by 
only one instance of the processor? I believe so, but wanted to confirm.



Thanks

Mans









Re: Apache Flink - Sharing state in processors

2020-01-23 Thread M Singh
Thanks Yun for your answers.

By processor I did mean a user-defined processor function. Keeping that in view, 
do you have any advice on how the shared state, i.e., the parameters passed to 
the processor as mentioned above (not the keyed state or operator state), will be 
affected in a distributed runtime environment?

Mans




  

Re: Apache Flink - Sharing state in processors

2020-01-12 Thread Yun Tang
Hi Mans

What do you mean by 'processor' here? A user-defined function?

As for sharing state, I'm afraid it's not so easy to implement in Flink. Both 
keyed state and operator state are instantiated and used within the scope of a 
single operator, and they are only thread-safe within that scope. The only way 
to read state from outside at runtime, and only read it, is via queryable 
state [1].

As for the keyBy question, each message is sent to exactly one downstream task, 
chosen according to the hash code of its key [2].

[1] 
https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/queryable_state.html
[2] 
https://github.com/apache/flink/blob/7a6ca9c03f67f488e40a114e94c389a5cfb67836/flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/partitioner/KeyGroupStreamPartitioner.java#L58
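
The keyBy routing described above can be sketched roughly as follows: the key's hash code is mapped to a key group, and the key group is mapped to one parallel subtask. This is a simplified illustration under stated assumptions, not Flink's code. The class `KeyGroupSketch`, its method names, and the `spread` bit mixer are made up for this example (Flink applies its own hash function at that step), so the concrete subtask numbers will differ from what a real job computes.

```java
public class KeyGroupSketch {

    /** Stand-in bit mixer (assumption). Flink uses its own hash function here. */
    static int spread(int h) {
        h ^= (h >>> 16);
        h *= 0x85ebca6b;
        h ^= (h >>> 13);
        return h & Integer.MAX_VALUE; // clear the sign bit so the modulo stays non-negative
    }

    /** Maps a key to a key group in [0, maxParallelism). */
    static int keyGroupFor(Object key, int maxParallelism) {
        return spread(key.hashCode()) % maxParallelism;
    }

    /** Maps a key to the index of the single subtask that handles it, in [0, parallelism). */
    static int subtaskFor(Object key, int maxParallelism, int parallelism) {
        return keyGroupFor(key, maxParallelism) * parallelism / maxParallelism;
    }

    public static void main(String[] args) {
        int maxParallelism = 128; // example value
        int parallelism = 4;      // example value
        for (String key : new String[] {"alice", "bob", "carol"}) {
            System.out.println(key + " -> subtask " + subtaskFor(key, maxParallelism, parallelism));
        }
    }
}
```

Because the mapping depends only on the key's hash code and the two parallelism settings, the same key always lands on the same subtask for a fixed configuration, which is why each key is handled by only one instance of the function.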


Best
Yun Tang









Apache Flink - Sharing state in processors

2020-01-10 Thread M Singh
Hi:

I have a few questions about how state is shared in processors in Flink.

1. If I instantiate a processor in the Flink app and use it multiple 
times in Flink -
    (a) if the tasks are in the same slot - do they share the same 
processor on the taskmanager?
    (b) if the tasks are on the same node but in different slots - do 
they share the same processor on the taskmanager?

2. If I instantiate a single processor with local state and use it 
multiple times in Flink -
    (a) if the tasks are in the same slot - do they share the same 
processor and state on the taskmanager?
    (b) if the tasks are on the same node but in different slots - do 
they share the same processor and state on the taskmanager?

3. If I instantiate multiple processors with a shared collection and 
use them multiple times in Flink -
    (a) if the tasks are in the same slot - do they share the state on 
the taskmanager?
    (b) if the tasks are on the same node but in different slots - do 
they share the state on the taskmanager?

4. How do the above scenarios affect sharing
(a) operator state
(b) keyed state

5. If I have a parallelism > 1 and use keyBy - is each key handled by 
only one instance of the processor? I believe so, but wanted to confirm.

Thanks
Mans