[ 
https://issues.apache.org/jira/browse/FLINK-15105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16993554#comment-16993554
 ] 

Congxian Qiu(klion26) edited comment on FLINK-15105 at 12/11/19 1:46 PM:
-------------------------------------------------------------------------

To answer the last question first: we can't simply remove the "error" text 
from the message of the {{RuntimeException}}. The test would still fail in 
{{common.sh#check_logs_for_exceptions()}} because of the 
{{RuntimeException}} itself.

Next, some more detail about {{FailureMapper}}:
 # {{FailureMapper}} is only used in {{DataStreamAllroundTestProgram}}.
 # {{DataStreamAllroundTestProgram}} adds a {{FailureMapper}} only if we 
[enabled 
TEST_SIMULATE_FAILURE|https://github.com/apache/flink/blob/eddad99123525211c900102206384dacaf8385fc/flink-end-to-end-tests/flink-datastream-allround-test/src/main/java/org/apache/flink/streaming/tests/DataStreamAllroundTestProgram.java#L173].
 # {{FailureMapper}} throws an exception in {{FailureMapper#map}} and 
{{FailureMapper#notifyCheckpointComplete}}.
 # {{TEST_SIMULATE_FAILURE}} is enabled in {{test_ha_datastream.sh}}, 
{{test_ha_per_job_cluster_datastream.sh}} and 
{{test_resume_externalized_checkpoints.sh}}.
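
The two failure sites in the list above can be pictured with a small, Flink-free model. This is only a sketch under stated assumptions: {{SimulatedFailureMapper}}, its two thresholds, and the exception message are hypothetical stand-ins and not the actual {{FailureMapper}} source.

```java
// Hypothetical, Flink-free stand-in for a failure-injecting mapper.
// It is NOT the real FailureMapper; names and thresholds are assumptions.
final class SimulatedFailureMapper<T> {
    private final long recordThreshold;
    private final long checkpointThreshold;
    private long processedRecords;
    private long completedCheckpoints;

    SimulatedFailureMapper(long recordThreshold, long checkpointThreshold) {
        this.recordThreshold = recordThreshold;
        this.checkpointThreshold = checkpointThreshold;
    }

    /** Record path: counts the record, then fails if both thresholds are met. */
    T map(T value) {
        processedRecords++;
        failIfThresholdsReached("map");
        return value;
    }

    /** Checkpoint path: counts the checkpoint, then fails if both thresholds are met. */
    void notifyCheckpointComplete(long checkpointId) {
        completedCheckpoints++;
        failIfThresholdsReached("notifyCheckpointComplete");
    }

    private void failIfThresholdsReached(String site) {
        if (processedRecords >= recordThreshold
                && completedCheckpoints >= checkpointThreshold) {
            // Unchecked exception only to keep the sketch short.
            throw new RuntimeException("Artificial failure in " + site);
        }
    }
}
```

In this model, whichever call happens to satisfy the last outstanding threshold is the one that throws, so the failure site depends only on timing.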

IIUC, all of the above tests check whether the job can be restored 
successfully from a checkpoint of the previously failed job. They do not care 
where the exception comes from, so an exception thrown from 
{{FailureMapper#map}} has the same effect as one thrown from 
{{FailureMapper#notifyCheckpointComplete}} (please correct me if I'm missing 
anything here). If we want to verify that a failure in 
{{notifyCheckpointComplete}} can fail the task, maybe we can add a unit test 
for it.
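
The shape of such a unit test might look like the sketch below. It does not use Flink's classes or test harnesses: {{ToyTask}}, {{ToyCheckpointListener}} and {{ToyTaskState}} are hypothetical names that only model the expectation that an exception from the checkpoint-complete callback moves the task into a failed state.

```java
// Hypothetical toy model; none of these types exist in Flink.
enum ToyTaskState { RUNNING, FAILED }

interface ToyCheckpointListener {
    void notifyCheckpointComplete(long checkpointId) throws Exception;
}

final class ToyTask {
    private final ToyCheckpointListener listener;
    private ToyTaskState state = ToyTaskState.RUNNING;
    private Throwable failureCause;

    ToyTask(ToyCheckpointListener listener) {
        this.listener = listener;
    }

    /** Forwards the notification; any exception from the listener fails the task. */
    void onCheckpointComplete(long checkpointId) {
        if (state != ToyTaskState.RUNNING) {
            return; // a failed task ignores further notifications
        }
        try {
            listener.notifyCheckpointComplete(checkpointId);
        } catch (Exception e) {
            state = ToyTaskState.FAILED;
            failureCause = e;
        }
    }

    ToyTaskState getState() {
        return state;
    }

    Throwable getFailureCause() {
        return failureCause;
    }
}
```

A real test would assert the same two things against the actual task implementation: the task stays running when the callback returns normally, and it fails with the callback's exception as the cause otherwise.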

 

 



> Resuming Externalized Checkpoint after terminal failure (rocks, incremental) 
> end-to-end test stalls on travis
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-15105
>                 URL: https://issues.apache.org/jira/browse/FLINK-15105
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.10.0
>            Reporter: Yu Li
>            Priority: Critical
>              Labels: test-stability
>             Fix For: 1.10.0
>
>
> Resuming Externalized Checkpoint after terminal failure (rocks, incremental) 
> end-to-end test on the release-1.9 nightly build stalls with "The job 
> exceeded the maximum log length, and has been terminated".
> https://api.travis-ci.org/v3/job/621090394/log.txt



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
