Denis Chudov created IGNITE-25418:
-------------------------------------
Summary: Long write of 1000-rows batches into storage
Key: IGNITE-25418
URL: https://issues.apache.org/jira/browse/IGNITE-25418
Project: Ignite
Issue Type: Bug
Reporter: Denis Chudov
While investigating "Send with retry timeout" exceptions occurring under the
load from data streamer, I logged the time of call in
PartitionListener#handleUpdateAllCommand here:
{code:java}
storageUpdateHandler.handleUpdateAll(
txId,
cmd.rowsToUpdate(),
cmd.commitPartitionId().asReplicationGroupId(),
!cmd.full(),
() -> storage.lastApplied(commandIndex, commandTerm),
cmd.full() ? safeTimestamp : null,
indexIdsAtRwTxBeginTs(catalogService, txId, storage.tableId())
); {code}
At some time, it took over hundreds of milliseconds:
{code:java}
[2025-05-19T11:33:22,849][INFO
][%istod_sdt_3345%JRaft-FSMCaller-Disruptor_stripe_3-0][PartitionListener] qqq
groupId=19_part_9, storageWriteMcs=341377, hash=-1115824889 {code}
In this line, the number is in microseconds.
Also, I see that the delta files' compaction time is constantly growing (look
at the two pairs of lines, page number is the same but compaction time grew
dramatically, as well as the checkpoint fsync time):
{code:java}
2025-03-26 14:46:23:999 +0000 [INFO][%node1%checkpoint-thread][Checkpointer]
Checkpoint finished [checkpointId=67f35966-b0fd-4377-8e5b-7a3af7d73aa2,
pages=140807, pagesWriteTime=1132ms, fsyncTime=2200ms,
replicatorLogSyncTime=126ms, totalTime=3591ms, avgWriteSpeed=613MB/s]2025-03-26
14:46:26:609 +0000 [INFO][%node1%compaction-thread][Compactor] Compaction round
finished [compactionId=8fc11431-783f-4111-8e5b-017870110cf5, pages=140807,
duration=2610ms, avgWriteSpeed=843MB/s]
2025-03-26 15:11:02:160 +0000 [INFO][%node1%checkpoint-thread][Checkpointer]
Checkpoint finished [checkpointId=2b4d735f-1b09-471a-a47e-3d5c5c8c7f5d,
pages=165589, pagesWriteTime=1476ms, fsyncTime=6027ms,
replicatorLogSyncTime=230ms, totalTime=7798ms, avgWriteSpeed=332MB/s]2025-03-26
15:11:07:276 +0000 [INFO][%node1%compaction-thread][Compactor] Compaction round
finished [compactionId=6984cff7-e1e1-4ec7-ba5a-5e5d7037898a, pages=170116,
duration=10213ms, avgWriteSpeed=260MB/s] {code}
I have also seen the plenty of lines like this:
{code:java}
[2025-05-19T11:32:41,268][INFO ][checkpoint-runner-io0][CheckpointPagesWriter]
Checkpoint pages were not written yet due to unsuccessful page write lock
acquisition and will be retried [pageCount=1] {code}
*Definition of done*
The reason of long write time and the degradation of compaction time should be
investigated and fixed, if applicable.
*Notes*
Reproducer is attached.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)