[
https://issues.apache.org/jira/browse/FLINK-40016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18092308#comment-18092308
]
Rui Fan commented on FLINK-40016:
---------------------------------
[^UnalignedCheckpointRescaleITCase_upscale_pipeline_20_to_21_FAILED.log]
^The log from failed case^
org.apache.flink.test.checkpointing.UnalignedCheckpointRescaleITCase.shouldRescaleUnalignedCheckpoint[upscale
pipeline from 20 to 21, sourceSleepMs = 0]
[https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=76170&view=logs&j=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3&t=115e5c38-6efb-5006-4921-5e2851da71ef&l=6608]
{code:java}
# unaligned checkpoint is always be enabled later by api
Randomly selected false for execution.checkpointing.unaligned.enabled
Randomly selected true for
execution.checkpointing.unaligned.recover-output-on-downstream.enabled
Randomly selected true for execution.checkpointing.during-recovery.enabled
Randomly selected PT2S for execution.checkpointing.aligned-checkpoint-timeout
Randomly selected false for execution.checkpointing.cleaner.parallel-mode
Randomly selected true for
execution.checkpointing.unaligned.interruptible-timers.enabled
Randomly selected false for execution.checkpointing.snapshot-compression
Randomly selected true for execution.checkpointing.file-merging.enabled
Randomly selected false for state.backend.rocksdb.use-ingest-db-restore-mode
Randomly selected 2 for pipeline.watermark-alignment.buffer-size
Randomly selected 2 for table.exec.unbounded-over.version
Randomly selected MAP for table.exec.sink.upsert-materialize-strategy.type
{code}
> UnalignedCheckpointRescaleITCase.shouldRescaleUnalignedCheckpoint fails with
> "Corrupt stream, found tag" (in-flight stream desync) on rescale recovery
> ------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: FLINK-40016
> URL: https://issues.apache.org/jira/browse/FLINK-40016
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Checkpointing
> Reporter: Martijn Visser
> Assignee: Rui Fan
> Priority: Major
> Attachments:
> UnalignedCheckpointRescaleITCase_upscale_pipeline_20_to_21_FAILED.log
>
>
> [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=76170&view=results]
> (leg: test_cron_azure tests)
> {code:java}
> org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
> at
> org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:180)
> at
> org.apache.flink.test.checkpointing.UnalignedCheckpointTestBase.execute(UnalignedCheckpointTestBase.java:217)
> at
> org.apache.flink.test.checkpointing.UnalignedCheckpointRescaleITCase.shouldRescaleUnalignedCheckpoint(UnalignedCheckpointRescaleITCase.java:683)
> Caused by: org.apache.flink.runtime.JobException: Recovery is suppressed by
> FixedDelayRestartBackoffTimeStrategy(maxNumberRestartAttempts=1,
> backoffTimeMS=100)
> Caused by: java.io.IOException: Can't get next record for channel
> InputChannelInfo{gateIdx=0, inputChannelIdx=5}
> Caused by: java.io.IOException: Corrupt stream, found tag: 8
> {code}
> {{org.apache.flink.test.checkpointing.UnalignedCheckpointRescaleITCase.shouldRescaleUnalignedCheckpoint}}
> (parameter {{{}[19]{}}}) fails on rescale recovery: the restored in-flight
> byte stream is misaligned, so {{StreamElementSerializer}} reads a data byte
> where an element tag is expected. Valid tags are 0-6; "tag 8" is out of
> range, confirming a stream desync rather than a format mismatch.
> Suspected to be in the recovered in-flight-buffer / filtering-recovery path
> that landed in master 2026-02 .. 2026-04 (FLINK-39018 buffer migration from
> RecoveredInputChannel to physical channels; FLINK-38930 filtering record
> before processing without spilling / heap-buffer fallback during recovery;
> FLINK-39519 reusable heap segment for pre-filter source buffers). The failing
> build's head commit (4902753429, FLINK-35562) is a flink-table-planner
> test-only change and cannot have introduced this, so the defect is latent in
> master.
> Related: FLINK-22197 (same test/method but a {{{}NoSuchFileException{}}},
> different cause), FLINK-38643 (hang), FLINK-35351 / FLINK-39162 (closed
> UC-corruption, custom-partitioner specific).
--
This message was sent by Atlassian Jira
(v8.20.10#820010)