[jira] [Commented] (FLINK-2491) Operators are not participating in state checkpointing in some cases

2017-02-20 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15874746#comment-15874746
 ] 

Stephan Ewen commented on FLINK-2491:
-

[~srichter] For this to work properly, we need to take the status of sources 
into account for the checkpoint.
There have been some discussions about how to do this in a more general way to 
allow checkpoints for batch programs as well.

I think we should have a FLIP on that in the near future.

> Operators are not participating in state checkpointing in some cases
> 
>
> Key: FLINK-2491
> URL: https://issues.apache.org/jira/browse/FLINK-2491
> Project: Flink
> Issue Type: Bug
> Components: Streaming
> Affects Versions: 0.10.0
> Reporter: Robert Metzger
> Assignee: Márton Balassi
> Priority: Critical
> Fix For: 1.0.0
>
>
> While implementing a test case for the Kafka Consumer, I came across the 
> following bug:
> Consider the following topology, with the operator parallelism in parentheses:
> Source (2) --> Sink (1).
> In this setup, the {{snapshotState()}} method is called on the source, but 
> not on the Sink.
> The sink receives the generated data.
> Only one of the two sources is generating data.
> I've implemented a test case for this, you can find it here: 
> https://github.com/rmetzger/flink/blob/para_checkpoint_bug/flink-tests/src/test/java/org/apache/flink/test/checkpointing/ParallelismChangeCheckpoinedITCase.java



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (FLINK-2491) Operators are not participating in state checkpointing in some cases

2017-02-17 Thread Stefan Richter (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15871538#comment-15871538
 ] 

Stefan Richter commented on FLINK-2491:
---

This still seems valid, but I assume it is no longer in progress. I think the 
reason for the problem is that some operator instances might shut down at some 
point, while the checkpoint coordinator still expects all instances that 
were initially started to confirm a checkpoint. What we would need is some 
way to unregister operator instances from checkpointing after their shutdown. 
This registration should also be reestablished in case of restarts. Is this 
summary correct, [~aljoscha] [~StephanEwen]?
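
To make the idea concrete, here is a minimal sketch of acknowledgement tracking with an unregister step. It is a toy model, not Flink's `CheckpointCoordinator`: the class `AckTrackingSketch` and all of its method names are hypothetical, and task instances are identified by plain strings for illustration only.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of acknowledgement tracking; every name here is hypothetical
// and none of this is Flink's actual CheckpointCoordinator API.
class AckTrackingSketch {
    // Task instances that are expected to acknowledge each checkpoint.
    private final Set<String> registered = new HashSet<>();
    // Instances that have acknowledged the checkpoint currently in flight.
    private final Set<String> acked = new HashSet<>();

    void register(String task) {
        registered.add(task);
    }

    // Called when an instance shuts down: it is no longer expected to acknowledge.
    void unregister(String task) {
        registered.remove(task);
        acked.remove(task);
    }

    void startCheckpoint() {
        acked.clear();
    }

    // Records an acknowledgement and reports whether the checkpoint is now complete.
    boolean acknowledge(String task) {
        if (registered.contains(task)) {
            acked.add(task);
        }
        return isComplete();
    }

    // Complete once every instance that is still registered has acknowledged.
    boolean isComplete() {
        return acked.containsAll(registered);
    }

    // After a restart, all initially started instances are expected again.
    void resetOnRestart(Set<String> allTasks) {
        registered.clear();
        registered.addAll(allTasks);
        acked.clear();
    }
}
```

In this model, a checkpoint that is waiting only on a finished source completes as soon as that source is unregistered, and `resetOnRestart` restores the full set of expected instances.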



[jira] [Commented] (FLINK-2491) Operators are not participating in state checkpointing in some cases

2016-10-24 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15601982#comment-15601982
 ] 

Aljoscha Krettek commented on FLINK-2491:
-

What source are you using, [~Revy1313]?



[jira] [Commented] (FLINK-2491) Operators are not participating in state checkpointing in some cases

2016-10-20 Thread Eryn Dahlstedt (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15592479#comment-15592479
 ] 

Eryn Dahlstedt commented on FLINK-2491:
---

We have been running into this problem as well. Do you have an example of this 
workaround? 

Thanks for any help on this. 



[jira] [Commented] (FLINK-2491) Operators are not participating in state checkpointing in some cases

2016-07-19 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384113#comment-15384113
 ] 

Aljoscha Krettek commented on FLINK-2491:
-

I think we want to tackle this for the 1.2 release, which should happen roughly 
3 months after 1.1 is out. We're currently in the last stages of releasing 1.1.



[jira] [Commented] (FLINK-2491) Operators are not participating in state checkpointing in some cases

2016-07-19 Thread Valentin Denisenkov (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384081#comment-15384081
 ] 

Valentin Denisenkov commented on FLINK-2491:


Can you please give an ETA for this issue?



[jira] [Commented] (FLINK-2491) Operators are not participating in state checkpointing in some cases

2015-10-21 Thread Márton Balassi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14967987#comment-14967987
 ] 

Márton Balassi commented on FLINK-2491:
---

Unfortunately it is. I did not find the time to fix it yet, but [~StephanEwen] 
gave me some pointers last week.

> Operators are not participating in state checkpointing in some cases
> 
>
> Key: FLINK-2491
> URL: https://issues.apache.org/jira/browse/FLINK-2491
> Project: Flink
> Issue Type: Bug
> Components: Streaming
> Affects Versions: 0.10
> Reporter: Robert Metzger
> Assignee: Márton Balassi
> Priority: Critical
> Fix For: 0.10
>
>
> While implementing a test case for the Kafka Consumer, I came across the 
> following bug:
> Consider the following topology, with the operator parallelism in parentheses:
> Source (2) --> Sink (1).
> In this setup, the {{snapshotState()}} method is called on the source, but 
> not on the Sink.
> The sink receives the generated data.
> Only one of the two sources is generating data.
> I've implemented a test case for this, you can find it here: 
> https://github.com/rmetzger/flink/blob/para_checkpoint_bug/flink-tests/src/test/java/org/apache/flink/test/checkpointing/ParallelismChangeCheckpoinedITCase.java





[jira] [Commented] (FLINK-2491) Operators are not participating in state checkpointing in some cases

2015-08-17 Thread Márton Balassi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699336#comment-14699336
 ] 

Márton Balassi commented on FLINK-2491:
---

That should be good enough. I was testing with a weaker version of that, with 
the buffer timeout set to 0. With this setup your test passed most of the time. 
Can you do the fix, or would you like me to do it?



[jira] [Commented] (FLINK-2491) Operators are not participating in state checkpointing in some cases

2015-08-17 Thread Robert Metzger (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699315#comment-14699315
 ] 

Robert Metzger commented on FLINK-2491:
---

I quickly talked with [~StephanEwen] about this. Currently, the checkpoints are 
aborted if not all tasks are online, so no checkpointing is happening if some 
of the tasks have finished.
A simple workaround is to keep the source running even though it's not 
producing anything.
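
A rough sketch of that workaround, assuming a source with the usual `run()`/`cancel()` shape: after emitting its records it idles instead of returning, so the task stays online. The class `IdleAfterEmitSource` below is a hypothetical plain-Java stand-in (using `java.util.function.Consumer` as the output) and not the Flink source API; the record count and sleep interval are arbitrary.

```java
import java.util.function.Consumer;

// Hypothetical stand-in modeled on a run()/cancel() source contract.
// This is plain Java for illustration, not the Flink source API.
class IdleAfterEmitSource {
    private volatile boolean running = true;
    private final int count;

    IdleAfterEmitSource(int count) {
        this.count = count;
    }

    // Emits `count` records, then stays alive until cancel() is called.
    void run(Consumer<Integer> out) throws InterruptedException {
        for (int i = 0; i < count && running; i++) {
            out.accept(i);
        }
        // Workaround: do not return here. Returning would finish the task,
        // and checkpoints are aborted once a task is no longer running.
        while (running) {
            Thread.sleep(10);
        }
    }

    // Signals the idle loop (and the emit loop) to stop.
    void cancel() {
        running = false;
    }
}
```

The cost of this approach is that the job never finishes on its own; something outside the source has to call `cancel()`.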




[jira] [Commented] (FLINK-2491) Operators are not participating in state checkpointing in some cases

2015-08-17 Thread Robert Metzger (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699292#comment-14699292
 ] 

Robert Metzger commented on FLINK-2491:
---

I suspect FLINK-2519 resolves this bug.
I'll re-run my test case to see whether it is fixed now.



[jira] [Commented] (FLINK-2491) Operators are not participating in state checkpointing in some cases

2015-08-17 Thread Robert Metzger (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699299#comment-14699299
 ] 

Robert Metzger commented on FLINK-2491:
---

The issue still persists:

{code}
1750 [Checkpoint Timer] INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator  - Triggering checkpoint 9 @ 1439805353030
1750 [Checkpoint Timer] INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator  - Checkpoint triggering task Custom Source (2/4) is not being executed at the moment. Aborting checkpoint.
1800 [Checkpoint Timer] INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator  - Triggering checkpoint 10 @ 1439805353080
1800 [Checkpoint Timer] INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator  - Checkpoint triggering task Custom Source (2/4) is not being executed at the moment. Aborting checkpoint.
1850 [Checkpoint Timer] INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator  - Triggering checkpoint 11 @ 1439805353130
1851 [Checkpoint Timer] INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator  - Checkpoint triggering task Custom Source (2/4) is not being executed at the moment. Aborting checkpoint.
1900 [Checkpoint Timer] INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator  - Triggering checkpoint 12 @ 1439805353180
1900 [Checkpoint Timer] INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator  - Checkpoint triggering task Custom Source (2/4) is not being executed at the moment. Aborting checkpoint.
1950 [Checkpoint Timer] INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator  - Triggering checkpoint 13 @ 1439805353230
1951 [Checkpoint Timer] INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator  - Checkpoint triggering task Custom Source (2/4) is not being executed at the moment. Aborting checkpoint.
2000 [Checkpoint Timer] INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator  - Triggering checkpoint 14 @ 1439805353280
2000 [Checkpoint Timer] INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator  - Checkpoint triggering task Custom Source (2/4) is not being executed at the moment. Aborting checkpoint.
2050 [Checkpoint Timer] INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator  - Triggering checkpoint 15 @ 1439805353330
2051 [Checkpoint Timer] INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator  - Checkpoint triggering task Custom Source (2/4) is not being executed at the moment. Aborting checkpoint.
2100 [Checkpoint Timer] INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator  - Triggering checkpoint 16 @ 1439805353380
2100 [Checkpoint Timer] INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator  - Checkpoint triggering task Custom Source (2/4) is not being executed at the moment. Aborting checkpoint.
2150 [Checkpoint Timer] INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator  - Triggering checkpoint 17 @ 1439805353430
2150 [Checkpoint Timer] INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator  - Checkpoint triggering task Custom Source (2/4) is not being executed at the moment. Aborting checkpoint.
{code}
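
The pattern in this log (every trigger attempt is aborted because one source subtask has already finished) can be modeled roughly as below. This is only a sketch of the observed behavior, not the actual coordinator code: `TriggerCheckSketch`, its `State` enum, and `abortReason` are hypothetical names, and only the message text is taken from the log above.

```java
import java.util.Map;
import java.util.Optional;

// Toy model of the trigger-time check seen in the log; names are hypothetical
// and this is not the actual CheckpointCoordinator implementation.
class TriggerCheckSketch {
    enum State { RUNNING, FINISHED }

    // Returns the abort reason if any task that must be triggered is not running,
    // or an empty Optional if the checkpoint may proceed.
    static Optional<String> abortReason(Map<String, State> tasksToTrigger) {
        for (Map.Entry<String, State> entry : tasksToTrigger.entrySet()) {
            if (entry.getValue() != State.RUNNING) {
                return Optional.of("Checkpoint triggering task " + entry.getKey()
                        + " is not being executed at the moment. Aborting checkpoint.");
            }
        }
        return Optional.empty();
    }
}
```

Because a finished subtask never returns to a running state, every later trigger attempt hits the same branch, which matches the repeated abort messages for checkpoints 9 through 17.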




[jira] [Commented] (FLINK-2491) Operators are not participating in state checkpointing in some cases

2015-08-12 Thread Márton Balassi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14693304#comment-14693304
 ] 

Márton Balassi commented on FLINK-2491:
---

This is troublesome: with the log level set to debug, the log shows that the 
`StreamTask` never calls a checkpoint on the sink. I am looking into it.



[jira] [Commented] (FLINK-2491) Operators are not participating in state checkpointing in some cases

2015-08-12 Thread Márton Balassi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14693387#comment-14693387
 ] 

Márton Balassi commented on FLINK-2491:
---

Here is the root cause. [1]

[1] 
https://github.com/apache/flink/blob/master/flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/api/graph/StreamingJobGraphGenerator.java#L415

The case where source and sink have the same parallelism works because of chaining.
