[ 
https://issues.apache.org/jira/browse/IGNITE-22310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denis Chudov updated IGNITE-22310:
----------------------------------
    Description: 
When running tests on my local machine, I've encountered a lot of messages like 
"Message handling has been too long" for messages of type 
{{TxCleanupMessage}}. Looks like the sync part of the 
{{TxCleanupRequestHandler}} should either be optimized or dispatched onto a 
different thread.

The reason is that we have the following flow: tx finish (on the commit 
partition) -> tx cleanup (on every node containing enlisted partitions) -> tx 
write intent switch (on every enlisted partition). In the case of the commit 
partition, all of this is executed on the same node, so the network engine does 
a "send to self" without changing the thread. In other words, on the commit 
partition the flow is as follows: tx finish (partition operations thread) -> 
tx cleanup (same thread) -> tx write intent switch (same thread).
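
To make the threading consequence concrete, here is a minimal sketch of a 
"send to self" shortcut. It is not Ignite's network code: the {{Messaging}} 
class, the node names and the thread names are all hypothetical, and a 
"remote" send is simplified to a hop onto a single network thread in the same 
JVM. It only shows that a locally delivered message is handled on the 
caller's thread.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

final class SendToSelfSketch {
    /** Hypothetical messaging facade (an assumption, not an Ignite class). */
    static final class Messaging {
        private final String localNode;
        private final ExecutorService networkExecutor;
        private final Consumer<String> handler;

        Messaging(String localNode, ExecutorService networkExecutor, Consumer<String> handler) {
            this.localNode = localNode;
            this.networkExecutor = networkExecutor;
            this.handler = handler;
        }

        void send(String targetNode, String message) {
            if (localNode.equals(targetNode)) {
                // "Send to self": the handler runs inline on the sender's thread.
                handler.accept(message);
            } else {
                // Simplified stand-in for a remote send: handling happens on a network thread.
                networkExecutor.execute(() -> handler.accept(message));
            }
        }
    }

    /** Returns the name of the thread that handled a message sent to targetNode. */
    static String handlerThreadFor(String targetNode) {
        ExecutorService network =
                Executors.newSingleThreadExecutor(r -> new Thread(r, "network-thread"));
        CompletableFuture<String> handledOn = new CompletableFuture<>();
        Messaging messaging = new Messaging("node-A", network,
                msg -> handledOn.complete(Thread.currentThread().getName()));

        // The sender plays the role of the partition operations thread doing tx finish.
        Thread sender = new Thread(
                () -> messaging.send(targetNode, "TxCleanupMessage"),
                "partition-operations-thread");
        sender.start();
        try {
            return handledOn.get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            network.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("local:  " + handlerThreadFor("node-A")); // partition-operations-thread
        System.out.println("remote: " + handlerThreadFor("node-B")); // network-thread
    }
}
```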

There is a network message handler that measures the handling time and writes 
the mentioned message to the log. In the case of tx cleanup on the commit 
partition, it actually measures the tx write intent switch, which is not very 
fast and happens synchronously because tx cleanup was started from within a 
partition operations thread. It seems that this time-measuring handler should 
be aware of thread permissions and should not write any warnings when the 
current thread has (storage_read, storage_write) permissions.
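
The proposed behaviour could look roughly like the sketch below. This is an 
illustration under stated assumptions, not the actual fix: 
{{ThreadOperation}}, {{PermittedThread}}, {{handleTimed}} and the log line 
format are hypothetical stand-ins defined here, not Ignite's real classes. 
The wrapper times the handler and skips the warning when the current thread 
is allowed both storage reads and storage writes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

final class PermissionAwareTimingSketch {
    /** Hypothetical permission flags mirroring (storage_read, storage_write). */
    enum ThreadOperation { STORAGE_READ, STORAGE_WRITE }

    /** Hypothetical thread type that carries its allowed operations. */
    static final class PermittedThread extends Thread {
        private final Set<ThreadOperation> allowed;

        PermittedThread(Runnable task, String name, Set<ThreadOperation> allowed) {
            super(task, name);
            this.allowed = allowed;
        }

        boolean allows(ThreadOperation op) {
            return allowed.contains(op);
        }
    }

    /** True if the thread may legitimately block on storage reads and writes. */
    static boolean isStorageThread(Thread thread) {
        if (!(thread instanceof PermittedThread)) {
            return false;
        }
        PermittedThread permitted = (PermittedThread) thread;
        return permitted.allows(ThreadOperation.STORAGE_READ)
                && permitted.allows(ThreadOperation.STORAGE_WRITE);
    }

    /** Runs the handler, then warns about slow handling unless on a storage thread. */
    static void handleTimed(String messageType, long thresholdMillis,
                            Runnable handler, List<String> log) {
        long startNanos = System.nanoTime();
        try {
            handler.run();
        } finally {
            long elapsedMillis = (System.nanoTime() - startNanos) / 1_000_000;
            if (elapsedMillis > thresholdMillis && !isStorageThread(Thread.currentThread())) {
                log.add("Message handling has been too long [type=" + messageType + "]");
            }
        }
    }

    /** Runs a deliberately slow handler on a thread with the given permissions. */
    static List<String> runSlowHandlerOn(Set<ThreadOperation> permissions) {
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        Runnable slowHandler = () -> {
            try {
                Thread.sleep(50); // stands in for a slow write intent switch
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        Thread worker = new PermittedThread(
                () -> handleTimed("TxCleanupMessage", 10, slowHandler, log),
                "worker", permissions);
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return log;
    }

    public static void main(String[] args) {
        // Storage thread: slow handling is expected, so no warning is logged.
        System.out.println(runSlowHandlerOn(EnumSet.allOf(ThreadOperation.class)).size());
        // Thread without storage permissions: one warning is logged.
        System.out.println(runSlowHandlerOn(EnumSet.noneOf(ThreadOperation.class)).size());
    }
}
```

A trade-off of this approach is that genuinely slow handlers running on 
storage threads would no longer be reported by this check at all.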

  was:When running tests on my local machine, I've encountered a lot of 
messages like "Message handling has been too long" for messages of type 
{{TxCleanupMessage}}. Looks like the sync part of the 
{{TxCleanupRequestHandler}} should either be optimized or dispatched onto a 
different thread.


> Handling of TxCleanupMessage takes too much time on the network thread 
> -----------------------------------------------------------------------
>
>                 Key: IGNITE-22310
>                 URL: https://issues.apache.org/jira/browse/IGNITE-22310
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Aleksandr Polovtsev
>            Priority: Major
>              Labels: ignite-3
>
> When running tests on my local machine, I've encountered a lot of messages 
> like "Message handling has been too long" for messages of type 
> {{TxCleanupMessage}}. Looks like the sync part of the 
> {{TxCleanupRequestHandler}} should either be optimized or dispatched onto a 
> different thread.
> The reason is that we have the following flow: tx finish (on the commit 
> partition) -> tx cleanup (on every node containing enlisted partitions) -> tx 
> write intent switch (on every enlisted partition). In the case of the commit 
> partition, all of this is executed on the same node, so the network engine 
> does a "send to self" without changing the thread. In other words, on the 
> commit partition the flow is as follows: tx finish (partition operations 
> thread) -> tx cleanup (same thread) -> tx write intent switch (same thread).
> There is a network message handler that measures the handling time and writes 
> the mentioned message to the log. In the case of tx cleanup on the commit 
> partition, it actually measures the tx write intent switch, which is not very 
> fast and happens synchronously because tx cleanup was started from within a 
> partition operations thread. It seems that this time-measuring handler should 
> be aware of thread permissions and should not write any warnings when the 
> current thread has (storage_read, storage_write) permissions.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
