[
https://issues.apache.org/jira/browse/IGNITE-25418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Denis Chudov updated IGNITE-25418:
----------------------------------
Attachment: ItStreamTonsOfData.java
> Long write of 1000-rows batches into storage
> --------------------------------------------
>
> Key: IGNITE-25418
> URL: https://issues.apache.org/jira/browse/IGNITE-25418
> Project: Ignite
> Issue Type: Bug
> Reporter: Denis Chudov
> Priority: Major
> Labels: ignite-3
> Attachments: ItStreamTonsOfData.java
>
>
> While investigating "Send with retry timeout" exceptions occurring under load
> from the data streamer, I logged the duration of this call in
> PartitionListener#handleUpdateAllCommand:
>
> {code:java}
> storageUpdateHandler.handleUpdateAll(
>         txId,
>         cmd.rowsToUpdate(),
>         cmd.commitPartitionId().asReplicationGroupId(),
>         !cmd.full(),
>         () -> storage.lastApplied(commandIndex, commandTerm),
>         cmd.full() ? safeTimestamp : null,
>         indexIdsAtRwTxBeginTs(catalogService, txId, storage.tableId())
> );
> {code}
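> The instrumentation that produced the log line below was, roughly, a
> System.nanoTime() delta around this call, reported in microseconds. A minimal
> self-contained sketch (the WriteTimer class and its names are illustrative,
> not actual Ignite code):

```java
import java.util.concurrent.TimeUnit;

public class WriteTimer {
    /** Runs the given storage write and returns its duration in microseconds. */
    static long timeCallMicros(Runnable storageWrite) {
        long startNanos = System.nanoTime();
        storageWrite.run();
        return TimeUnit.NANOSECONDS.toMicros(System.nanoTime() - startNanos);
    }

    public static void main(String[] args) {
        // Stand-in for the storageUpdateHandler.handleUpdateAll(...) call: sleep ~5 ms.
        long mcs = timeCallMicros(() -> {
            try {
                Thread.sleep(5);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.println("storageWriteMcs=" + mcs);
    }
}
```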
> At times, this call took hundreds of milliseconds:
>
> {code:java}
> [2025-05-19T11:33:22,849][INFO][%istod_sdt_3345%JRaft-FSMCaller-Disruptor_stripe_3-0][PartitionListener] qqq groupId=19_part_9, storageWriteMcs=341377, hash=-1115824889 {code}
> In this line, storageWriteMcs is in microseconds (341377 mcs ≈ 341 ms).
> Also, I see that the delta files' compaction time is constantly growing.
> Compare the two pairs of lines below: the page counts are similar, but the
> compaction time grew dramatically, as did the checkpoint fsync time:
> {code:java}
> 2025-03-26 14:46:23:999 +0000 [INFO][%node1%checkpoint-thread][Checkpointer] Checkpoint finished [checkpointId=67f35966-b0fd-4377-8e5b-7a3af7d73aa2, pages=140807, pagesWriteTime=1132ms, fsyncTime=2200ms, replicatorLogSyncTime=126ms, totalTime=3591ms, avgWriteSpeed=613MB/s]
> 2025-03-26 14:46:26:609 +0000 [INFO][%node1%compaction-thread][Compactor] Compaction round finished [compactionId=8fc11431-783f-4111-8e5b-017870110cf5, pages=140807, duration=2610ms, avgWriteSpeed=843MB/s]
> 2025-03-26 15:11:02:160 +0000 [INFO][%node1%checkpoint-thread][Checkpointer] Checkpoint finished [checkpointId=2b4d735f-1b09-471a-a47e-3d5c5c8c7f5d, pages=165589, pagesWriteTime=1476ms, fsyncTime=6027ms, replicatorLogSyncTime=230ms, totalTime=7798ms, avgWriteSpeed=332MB/s]
> 2025-03-26 15:11:07:276 +0000 [INFO][%node1%compaction-thread][Compactor] Compaction round finished [compactionId=6984cff7-e1e1-4ec7-ba5a-5e5d7037898a, pages=170116, duration=10213ms, avgWriteSpeed=260MB/s] {code}
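> The avgWriteSpeed values in these lines are consistent with a 16 KiB page
> size (an assumption on my part: pages * 16 KiB / duration reproduces the
> logged numbers). A quick self-contained check:

```java
public class WriteSpeed {
    // Assumed page size; 16 KiB reproduces the avgWriteSpeed values logged above.
    static final long PAGE_SIZE = 16 * 1024;

    /** Average write speed in MB/s for the given page count and duration. */
    static double avgWriteSpeedMbPerSec(long pages, long durationMs) {
        double mib = pages * (double) PAGE_SIZE / (1024 * 1024);
        return mib / (durationMs / 1000.0);
    }

    public static void main(String[] args) {
        // First compaction round: 140807 pages in 2610 ms -> ~843 MB/s (matches the log).
        System.out.printf("%.0f%n", avgWriteSpeedMbPerSec(140807, 2610));
        // Second round: 170116 pages in 10213 ms -> ~260 MB/s (matches the log).
        System.out.printf("%.0f%n", avgWriteSpeedMbPerSec(170116, 10213));
    }
}
```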
> I have also seen plenty of lines like this:
> {code:java}
> [2025-05-19T11:32:41,268][INFO][checkpoint-runner-io0][CheckpointPagesWriter] Checkpoint pages were not written yet due to unsuccessful page write lock acquisition and will be retried [pageCount=1] {code}
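> That message indicates the checkpoint writer skipping pages whose write lock
> could not be taken and re-queuing them for a later pass. A minimal sketch of
> that skip-and-retry pattern (the class and method names are mine, not the
> actual CheckpointPagesWriter internals):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class RetryingPageWriter {
    /** Illustrative stand-in for a page guarded by a read-write lock. */
    static class Page {
        final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        boolean written;
    }

    /**
     * Tries to write each page; pages whose write lock is busy are re-queued
     * for a later retry, which is what the "will be retried [pageCount=...]"
     * log line reports. Returns the number of deferred pages.
     */
    static int writeWithRetry(Deque<Page> pages) {
        int retried = 0;
        Deque<Page> deferred = new ArrayDeque<>();
        while (!pages.isEmpty()) {
            Page p = pages.poll();
            if (p.lock.writeLock().tryLock()) {
                try {
                    p.written = true; // a real checkpoint would copy the page to disk here
                } finally {
                    p.lock.writeLock().unlock();
                }
            } else {
                deferred.add(p);
                retried++;
            }
        }
        pages.addAll(deferred);
        return retried;
    }

    public static void main(String[] args) {
        Deque<Page> pages = new ArrayDeque<>();
        for (int i = 0; i < 3; i++) {
            pages.add(new Page());
        }
        // Uncontended case: no lock is held by another thread, nothing is deferred.
        System.out.println("retried=" + writeWithRetry(pages));
    }
}
```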
> *Definition of done*
> The reason for the long write time and for the degradation of compaction time
> should be investigated and fixed, if applicable.
> *Notes*
> A reproducer is attached; put it into the ignite-runner integrationTests.
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)