[jira] [Commented] (IGNITE-22842) Clients return to the fork-join pool after handle sync-operation

2024-10-03 Thread Vladislav Pyatkov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-22842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17886721#comment-17886721
 ] 

Vladislav Pyatkov commented on IGNITE-22842:


Merged bc4855de8b290434d70cdaf0e7a68ad0157c00b3

> Clients return to the fork-join pool after handle sync-operation
> 
>
> Key: IGNITE-22842
> URL: https://issues.apache.org/jira/browse/IGNITE-22842
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Vladislav Pyatkov
>Assignee: Vladislav Pyatkov
>Priority: Major
>  Labels: ignite-3
>
> h3. Motivation
> In a synchronous operation, we return the result to the fork-join pool before 
> providing the result to the client in its pool. This trick costs several 
> microseconds.
> {noformat}
> loadSchemaAndReadData:ForkJoinPool.commonPool-worker-9 0.2 9950400 9950600
> Here is hidden 5.2 us
> kvClientPutEndMark:org.apache.ignite.internal.benchmark.InsertBenchmark.kvThinInsert-jmh-worker-1
>  0.0 9955800 9955800
> {noformat}
> The issue is similar to IGNITE-22838, but it concerns the type of operation 
> that is started on the client side.
> h3. Definition of done
> We need to develop a strategy that does not involve an extra thread in 
> synchronous operations.
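> A minimal illustration of the hop (assumed names; not the actual client code). 
> With a *Async continuation, the result is first delivered on 
> ForkJoinPool.commonPool() and the blocked caller wakes up only afterwards, 
> while a plain thenApply continuation typically runs on the completing network 
> thread, unblocking the caller directly:
> {code}
> import java.util.concurrent.CompletableFuture;
> 
> public class SyncHopDemo {
>     public static void main(String[] args) {
>         // Hop variant: the continuation is scheduled on the common fork-join pool.
>         CompletableFuture<String> response = new CompletableFuture<>();
>         CompletableFuture<String> viaFjp =
>                 response.thenApplyAsync(r -> r + " @ " + Thread.currentThread().getName());
>         new Thread(() -> response.complete("row"), "network-io").start();
>         System.out.println(viaFjp.join()); // ... @ ForkJoinPool.commonPool-worker-*
> 
>         // Direct variant: the continuation runs on the thread that completes the future.
>         CompletableFuture<String> response2 = new CompletableFuture<>();
>         CompletableFuture<String> direct =
>                 response2.thenApply(r -> r + " @ " + Thread.currentThread().getName());
>         new Thread(() -> response2.complete("row"), "network-io").start();
>         System.out.println(direct.join()); // ... @ network-io
>     }
> }
> {code}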



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-22842) Clients return to the fork-join pool after handle sync-operation

2024-10-03 Thread Vladislav Pyatkov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladislav Pyatkov updated IGNITE-22842:
---
Summary: Clients return to the fork-join pool after handle sync-operation  
(was: The client returns to the fork-join pool after handling operations on the 
server side)

> Clients return to the fork-join pool after handle sync-operation
> 
>
> Key: IGNITE-22842
> URL: https://issues.apache.org/jira/browse/IGNITE-22842
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Vladislav Pyatkov
>Assignee: Vladislav Pyatkov
>Priority: Major
>  Labels: ignite-3
>
> h3. Motivation
> In a synchronous operation, we return the result to the fork-join pool before 
> providing the result to the client in its pool. This trick costs several 
> microseconds.
> {noformat}
> loadSchemaAndReadData:ForkJoinPool.commonPool-worker-9 0.2 9950400 9950600
> Here is hidden 5.2 us
> kvClientPutEndMark:org.apache.ignite.internal.benchmark.InsertBenchmark.kvThinInsert-jmh-worker-1
>  0.0 9955800 9955800
> {noformat}
> The issue is similar to IGNITE-22838, but it concerns the type of operation 
> that is started on the client side.
> h3. Definition of done
> We need to develop a strategy that does not involve an extra thread in 
> synchronous operations.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (IGNITE-22842) The client returns to the fork-join pool after handling operations on the server side

2024-10-03 Thread Vladislav Pyatkov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-22842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17886690#comment-17886690
 ] 

Vladislav Pyatkov edited comment on IGNITE-22842 at 10/3/24 2:38 PM:
-

Currently, we do not use an asynchronous executor on synchronous operations.

{noformat}
kvClientGetMark:org.apache.ignite.internal.benchmark.SelectBenchmark.kvThinGet-jmh-worker-1
 0.3 6952900 6953200
    Here is hidden 3.3 us
writeMessage:org.apache.ignite.internal.benchmark.SelectBenchmark.kvThinGet-jmh-worker-1
 7.9 6956500 6964400
    Here is hidden 77.8 us
loadSchemaAndReadData:org.apache.ignite.internal.benchmark.SelectBenchmark.kvThinGet-jmh-worker-1
 0.3 7042200 7042500
    Here is hidden 0.5 us
kvClientGetEndMark:org.apache.ignite.internal.benchmark.SelectBenchmark.kvThinGet-jmh-worker-1
 0.1 7043000 7043100
Whole time: 93.1 us
Found time: 8600 Not found time: 84500.0 Percentage of found: 9.237379162191193
{noformat}


was (Author: v.pyatkov):
Currently, we do not use an asynchronous executor on synchronous operations.

{noformat}

kvClientGetMark:org.apache.ignite.internal.benchmark.SelectBenchmark.kvThinGet-jmh-worker-1
 0.3 6952900 6953200
    Here is hidden 3.3 us
writeMessage:org.apache.ignite.internal.benchmark.SelectBenchmark.kvThinGet-jmh-worker-1
 7.9 6956500 6964400
    Here is hidden 77.8 us
loadSchemaAndReadData:org.apache.ignite.internal.benchmark.SelectBenchmark.kvThinGet-jmh-worker-1
 0.3 7042200 7042500
    Here is hidden 0.5 us
kvClientGetEndMark:org.apache.ignite.internal.benchmark.SelectBenchmark.kvThinGet-jmh-worker-1
 0.1 7043000 7043100
Whole time: 93.1 us
Found time: 8600 Not found time: 84500.0 Percentage of found: 9.237379162191193

{noformat}

> The client returns to the fork-join pool after handling operations on the 
> server side
> -
>
> Key: IGNITE-22842
> URL: https://issues.apache.org/jira/browse/IGNITE-22842
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Vladislav Pyatkov
>Assignee: Vladislav Pyatkov
>Priority: Major
>  Labels: ignite-3
>
> h3. Motivation
> In a synchronous operation, we return the result to the fork-join pool before 
> providing the result to the client in its pool. This trick costs several 
> microseconds.
> {noformat}
> loadSchemaAndReadData:ForkJoinPool.commonPool-worker-9 0.2 9950400 9950600
> Here is hidden 5.2 us
> kvClientPutEndMark:org.apache.ignite.internal.benchmark.InsertBenchmark.kvThinInsert-jmh-worker-1
>  0.0 9955800 9955800
> {noformat}
> The issue is similar to IGNITE-22838, but it concerns the type of operation 
> that is started on the client side.
> h3. Definition of done
> We need to develop a strategy that does not involve an extra thread in 
> synchronous operations.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (IGNITE-22842) The client returns to the fork-join pool after handling operations on the server side

2024-10-03 Thread Vladislav Pyatkov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-22842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17886690#comment-17886690
 ] 

Vladislav Pyatkov edited comment on IGNITE-22842 at 10/3/24 2:37 PM:
-

Currently, we do not use an asynchronous executor on synchronous operations.

{noformat}

kvClientGetMark:org.apache.ignite.internal.benchmark.SelectBenchmark.kvThinGet-jmh-worker-1
 0.3 6952900 6953200
    Here is hidden 3.3 us
writeMessage:org.apache.ignite.internal.benchmark.SelectBenchmark.kvThinGet-jmh-worker-1
 7.9 6956500 6964400
    Here is hidden 77.8 us
loadSchemaAndReadData:org.apache.ignite.internal.benchmark.SelectBenchmark.kvThinGet-jmh-worker-1
 0.3 7042200 7042500
    Here is hidden 0.5 us
kvClientGetEndMark:org.apache.ignite.internal.benchmark.SelectBenchmark.kvThinGet-jmh-worker-1
 0.1 7043000 7043100
Whole time: 93.1 us
Found time: 8600 Not found time: 84500.0 Percentage of found: 9.237379162191193

{noformat}


was (Author: v.pyatkov):
Currently, we do not use an asynchronous executor on synchronous operations.
{noformat}

kvClientGetMark:org.apache.ignite.internal.benchmark.SelectBenchmark.kvThinGet-jmh-worker-1
 0.3 6952900 6953200
    Here is hidden 3.3 us
writeMessage:org.apache.ignite.internal.benchmark.SelectBenchmark.kvThinGet-jmh-worker-1
 7.9 6956500 6964400
    Here is hidden 77.8 us
loadSchemaAndReadData:org.apache.ignite.internal.benchmark.SelectBenchmark.kvThinGet-jmh-worker-1
 0.3 7042200 7042500
    Here is hidden 0.5 us
kvClientGetEndMark:org.apache.ignite.internal.benchmark.SelectBenchmark.kvThinGet-jmh-worker-1
 0.1 7043000 7043100
Whole time: 93.1 us
Found time: 8600 Not found time: 84500.0 Percentage of found: 9.237379162191193

{noformat}

> The client returns to the fork-join pool after handling operations on the 
> server side
> -
>
> Key: IGNITE-22842
> URL: https://issues.apache.org/jira/browse/IGNITE-22842
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Vladislav Pyatkov
>Assignee: Vladislav Pyatkov
>Priority: Major
>  Labels: ignite-3
>
> h3. Motivation
> In a synchronous operation, we return the result to the fork-join pool before 
> providing the result to the client in its pool. This trick costs several 
> microseconds.
> {noformat}
> loadSchemaAndReadData:ForkJoinPool.commonPool-worker-9 0.2 9950400 9950600
> Here is hidden 5.2 us
> kvClientPutEndMark:org.apache.ignite.internal.benchmark.InsertBenchmark.kvThinInsert-jmh-worker-1
>  0.0 9955800 9955800
> {noformat}
> The issue is similar to IGNITE-22838, but it concerns the type of operation 
> that is started on the client side.
> h3. Definition of done
> We need to develop a strategy that does not involve an extra thread in 
> synchronous operations.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (IGNITE-22842) The client returns to the fork-join pool after handling operations on the server side

2024-10-03 Thread Vladislav Pyatkov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-22842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17886690#comment-17886690
 ] 

Vladislav Pyatkov commented on IGNITE-22842:


Currently, we do not use an asynchronous executor on synchronous operations.
{noformat}

kvClientGetMark:org.apache.ignite.internal.benchmark.SelectBenchmark.kvThinGet-jmh-worker-1
 0.3 6952900 6953200
    Here is hidden 3.3 us
writeMessage:org.apache.ignite.internal.benchmark.SelectBenchmark.kvThinGet-jmh-worker-1
 7.9 6956500 6964400
    Here is hidden 77.8 us
loadSchemaAndReadData:org.apache.ignite.internal.benchmark.SelectBenchmark.kvThinGet-jmh-worker-1
 0.3 7042200 7042500
    Here is hidden 0.5 us
kvClientGetEndMark:org.apache.ignite.internal.benchmark.SelectBenchmark.kvThinGet-jmh-worker-1
 0.1 7043000 7043100
Whole time: 93.1 us
Found time: 8600 Not found time: 84500.0 Percentage of found: 9.237379162191193

{noformat}

> The client returns to the fork-join pool after handling operations on the 
> server side
> -
>
> Key: IGNITE-22842
> URL: https://issues.apache.org/jira/browse/IGNITE-22842
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Vladislav Pyatkov
>Assignee: Vladislav Pyatkov
>Priority: Major
>  Labels: ignite-3
>
> h3. Motivation
> In a synchronous operation, we return the result to the fork-join pool before 
> providing the result to the client in its pool. This trick costs several 
> microseconds.
> {noformat}
> loadSchemaAndReadData:ForkJoinPool.commonPool-worker-9 0.2 9950400 9950600
> Here is hidden 5.2 us
> kvClientPutEndMark:org.apache.ignite.internal.benchmark.InsertBenchmark.kvThinInsert-jmh-worker-1
>  0.0 9955800 9955800
> {noformat}
> The issue is similar to IGNITE-22838, but it concerns the type of operation 
> that is started on the client side.
> h3. Definition of done
> We need to develop a strategy that does not involve an extra thread in 
> synchronous operations.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (IGNITE-23255) Fastest way to use Placement driver for RO transaction

2024-10-03 Thread Vladislav Pyatkov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-23255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladislav Pyatkov reassigned IGNITE-23255:
--

Assignee: Vladislav Pyatkov

> Fastest way to use Placement driver for RO transaction
> --
>
> Key: IGNITE-23255
> URL: https://issues.apache.org/jira/browse/IGNITE-23255
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Vladislav Pyatkov
>Assignee: Vladislav Pyatkov
>Priority: Major
>  Labels: ignite-3
>
> h3. Motivation
> We know the _CompletableFuture#orTimeout_ method is not fast. For this reason, 
> we added the method _LeasePlacementDriver#getCurrentPrimaryReplica_, which can 
> return a primary replica immediately if the information has already been cached.
> This approach is still not used in methods for RO transactions:
> _InternalTableImpl#evaluateReadOnlyPrimaryNode_
> _InternalTableImpl#evaluateReadOnlyRecipientNode_
> h3. Definition of done
> Enrich the methods with a getCurrentPrimaryReplica invocation before waiting 
> directly (we have to check this in a benchmark; if the benchmark does not show 
> reduced latency, only the next point needs to be implemented). A sketch of the 
> fast path is shown below.
> Enforce the awaitPrimaryReplica method in the placement driver to use 
> getCurrentPrimaryReplica before waiting.
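> A hedged sketch of the proposed fast path (the method names come from this 
> issue, but the exact signatures are assumptions for illustration):
> {code}
> // Try the cached primary first; fall back to waiting only on a cache miss.
> CompletableFuture<ReplicaMeta> primaryReplica(
>         PlacementDriver placementDriver,
>         ReplicationGroupId groupId,
>         HybridTimestamp at
> ) {
>     ReplicaMeta cached = placementDriver.getCurrentPrimaryReplica(groupId, at);
> 
>     if (cached != null) {
>         // Fast path: the lease is already cached, no future chain and no orTimeout.
>         return CompletableFuture.completedFuture(cached);
>     }
> 
>     // Slow path: wait for a lease with the usual timeout.
>     return placementDriver.awaitPrimaryReplica(groupId, at, 10, TimeUnit.SECONDS);
> }
> {code}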



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (IGNITE-22842) The client returns to the fork-join pool after handling operations on the server side

2024-10-02 Thread Vladislav Pyatkov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-22842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17886340#comment-17886340
 ] 

Vladislav Pyatkov edited comment on IGNITE-22842 at 10/2/24 10:48 AM:
--

Local benchmark result:
{noformat}
Benchmark                  (clusterSize)  (fsync)  (useSyncOptimization)  Mode  Cnt   Score    Error  Units
SelectBenchmark.kvThinGet              1    false                   true  avgt   20  86,590  ± 2,708  us/op
SelectBenchmark.kvThinGet              1    false                  false  avgt   20  94,047  ± 0,979  us/op
{noformat}
The useSyncOptimization parameter is a flag that controls whether this 
optimization is applied. Based on this measurement, the optimization gives about 
an 8% reduction in latency.


was (Author: v.pyatkov):
Local benchmark result:
Benchmark                  (clusterSize)  (fsync)  (useSyncOptimization)  Mode  Cnt   Score    Error  Units
SelectBenchmark.kvThinGet              1    false                   true  avgt   20  86,590  ± 2,708  us/op
SelectBenchmark.kvThinGet              1    false                  false  avgt   20  94,047  ± 0,979  us/op

The useSyncOptimization parameter is a flag that controls whether this 
optimization is applied. Based on this measurement, the optimization gives about 
an 8% reduction in latency.

> The client returns to the fork-join pool after handling operations on the 
> server side
> -
>
> Key: IGNITE-22842
> URL: https://issues.apache.org/jira/browse/IGNITE-22842
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Vladislav Pyatkov
>Assignee: Vladislav Pyatkov
>Priority: Major
>  Labels: ignite-3
>
> h3. Motivation
> In a synchronous operation, we return the result to the fork-join pool before 
> providing the result to the client in its pool. This trick costs several 
> microseconds.
> {noformat}
> loadSchemaAndReadData:ForkJoinPool.commonPool-worker-9 0.2 9950400 9950600
> Here is hidden 5.2 us
> kvClientPutEndMark:org.apache.ignite.internal.benchmark.InsertBenchmark.kvThinInsert-jmh-worker-1
>  0.0 9955800 9955800
> {noformat}
> The issue is similar to IGNITE-22838, but it concerns the type of operation 
> that is started on the client side.
> h3. Definition of done
> We need to develop a strategy that does not involve an extra thread in 
> synchronous operations.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (IGNITE-22842) The client returns to the fork-join pool after handling operations on the server side

2024-10-02 Thread Vladislav Pyatkov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-22842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17886340#comment-17886340
 ] 

Vladislav Pyatkov commented on IGNITE-22842:


Local benchmark result:
Benchmark                  (clusterSize)  (fsync)  (useSyncOptimization)  Mode  Cnt   Score    Error  Units
SelectBenchmark.kvThinGet              1    false                   true  avgt   20  86,590  ± 2,708  us/op
SelectBenchmark.kvThinGet              1    false                  false  avgt   20  94,047  ± 0,979  us/op

The useSyncOptimization parameter is a flag that controls whether this 
optimization is applied. Based on this measurement, the optimization gives about 
an 8% reduction in latency.

> The client returns to the fork-join pool after handling operations on the 
> server side
> -
>
> Key: IGNITE-22842
> URL: https://issues.apache.org/jira/browse/IGNITE-22842
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Vladislav Pyatkov
>Assignee: Vladislav Pyatkov
>Priority: Major
>  Labels: ignite-3
>
> h3. Motivation
> In a synchronous operation, we return the result to the fork-join pool before 
> providing the result to the client in its pool. This trick costs several 
> microseconds.
> {noformat}
> loadSchemaAndReadData:ForkJoinPool.commonPool-worker-9 0.2 9950400 9950600
> Here is hidden 5.2 us
> kvClientPutEndMark:org.apache.ignite.internal.benchmark.InsertBenchmark.kvThinInsert-jmh-worker-1
>  0.0 9955800 9955800
> {noformat}
> The issue is similar to IGNITE-22838, but it concerns the type of operation 
> that is started on the client side.
> h3. Definition of done
> We need to develop a strategy that does not involve an extra thread in 
> synchronous operations.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (IGNITE-23199) Investigate GET scalability issues

2024-10-01 Thread Vladislav Pyatkov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-23199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladislav Pyatkov resolved IGNITE-23199.

Resolution: Fixed

> Investigate GET scalability issues
> --
>
> Key: IGNITE-23199
> URL: https://issues.apache.org/jira/browse/IGNITE-23199
> Project: Ignite
>  Issue Type: Test
>Reporter: Alexey Scherbakov
>Assignee: Vladislav Pyatkov
>Priority: Major
>  Labels: ignite-3, important
> Fix For: 3.0
>
>
> Current benchmark results show that GETs do not scale well, especially under 
> high concurrency.
> This should be investigated.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (IGNITE-23199) Investigate GET scalability issues

2024-10-01 Thread Vladislav Pyatkov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-23199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17886136#comment-17886136
 ] 

Vladislav Pyatkov commented on IGNITE-23199:


Benchmark results:

{noformat}
8
Benchmark              (clusterSize)  (delayDuration)  (fsync)  (hClockType)   Mode  Cnt       Score       Error  Units
SelectBenchmark.kvGet              1              500    false        origin  thrpt   20  549544.656 ±  7113.368  ops/s
SelectBenchmark.kvGet              1             5000    false        origin  thrpt   20  531773.282 ± 36628.275  ops/s

16
Benchmark              (clusterSize)  (delayDuration)  (fsync)  (hClockType)   Mode  Cnt       Score       Error  Units
SelectBenchmark.kvGet              1              500    false        origin  thrpt   20  481676.983 ± 13758.331  ops/s
SelectBenchmark.kvGet              1             5000    false        origin  thrpt   20  472153.175 ± 11941.622  ops/s

32
Benchmark              (clusterSize)  (delayDuration)  (fsync)  (hClockType)   Mode  Cnt       Score      Error  Units
SelectBenchmark.kvGet              1              500    false        origin  thrpt   20  581220.229 ± 7390.287  ops/s
SelectBenchmark.kvGet              1             5000    false        origin  thrpt   20  554774.290 ± 5439.051  ops/s

64
Benchmark              (clusterSize)  (delayDuration)  (fsync)  (hClockType)   Mode  Cnt       Score       Error  Units
SelectBenchmark.kvGet              1              500    false        origin  thrpt   20  706067.468 ± 34174.959  ops/s
SelectBenchmark.kvGet              1             5000    false        origin  thrpt   20  695281.461 ± 30481.453  ops/s

128
Benchmark              (clusterSize)  (delayDuration)  (fsync)  (hClockType)   Mode  Cnt       Score       Error  Units
SelectBenchmark.kvGet              1              500    false        origin  thrpt   20  655976.549 ± 11583.202  ops/s
SelectBenchmark.kvGet              1             5000    false        origin  thrpt   20  663624.438 ±  8228.357  ops/s
{noformat}

> Investigate GET scalability issues
> --
>
> Key: IGNITE-23199
> URL: https://issues.apache.org/jira/browse/IGNITE-23199
> Project: Ignite
>  Issue Type: Test
>Reporter: Alexey Scherbakov
>Assignee: Vladislav Pyatkov
>Priority: Major
>  Labels: ignite-3, important
> Fix For: 3.0
>
>
> Current benchmark results show that GETs do not scale well, especially under 
> high concurrency.
> This should be investigated.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-23323) Do not increment hybrid clock timestamp in RO transaction

2024-10-01 Thread Vladislav Pyatkov (Jira)
Vladislav Pyatkov created IGNITE-23323:
--

 Summary: Do not increment hybrid clock timestamp in RO transaction
 Key: IGNITE-23323
 URL: https://issues.apache.org/jira/browse/IGNITE-23323
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladislav Pyatkov


h3. Motivation
The hybrid clock is incremented every time we get a current timestamp. For RO 
transactions, we can avoid the increment and reduce latency by replacing the CAS 
operation with a single volatile read.
{code}
@Override
public long nonUniqNow() {
    // A single volatile read: no increment and no CAS.
    return latestTime;
}

@Override
public HybridTimestamp nonUniqTimestampNow() {
    return hybridTimestamp(latestTime);
}
{code}
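For contrast, a minimal sketch of the unique-timestamp path that the methods 
above avoid (assumed names and timestamp layout; the real HybridClockImpl 
differs in details):
{code}
import java.util.concurrent.atomic.AtomicLong;

class HybridClockSketch {
    private final AtomicLong latestTime = new AtomicLong();

    // Unique timestamp: the clock must advance, so a CAS retry loop is required.
    long nowLong() {
        long physical = System.currentTimeMillis() << 16; // logical part kept in the low bits

        while (true) {
            long old = latestTime.get();
            long next = Math.max(old + 1, physical);

            if (latestTime.compareAndSet(old, next)) {
                return next;
            }
        }
    }

    // Non-unique timestamp for RO transactions: a single volatile read, no CAS.
    long nonUniqNow() {
        return latestTime.get();
    }
}
{code}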



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-23316) Hybrid Clock#update can be optimized in case where return value is not needed

2024-09-30 Thread Vladislav Pyatkov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-23316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladislav Pyatkov updated IGNITE-23316:
---
Description: 
h3. Motivation
A thread-safe update is an expensive operation, and we ought to avoid it in cases 
where it is not needed. The stack below shows a usage of the update operation 
where the result value is not used:
{code}
if (response instanceof TimestampAware) {
    clock.update(((TimestampAware) response).timestamp());
}
{code}
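A hypothetical sketch of such a variant: because no new value is returned, the 
method can skip the CAS entirely whenever the clock is already ahead of the 
received timestamp (names are illustrative, not the actual HybridClockImpl API):
{code}
// Advance the clock to at least requestTime, returning nothing.
void fastUpdate(AtomicLong latestTime, long requestTime) {
    while (true) {
        long current = latestTime.get();

        if (requestTime <= current) {
            return; // Clock is already ahead: no CAS and no new value to compute.
        }

        if (latestTime.compareAndSet(current, requestTime)) {
            return;
        }
    }
}
{code}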
{noformat}
hybridClock:update-:java.lang.Exception: 
at 
org.apache.ignite.internal.util.IgniteUtils.dumpStack(IgniteUtils.java:682)
at 
org.apache.ignite.internal.hlc.HybridClockImpl.update(HybridClockImpl.java:119)
at 
org.apache.ignite.internal.replicator.ReplicaService.lambda$sendToReplica$7(ReplicaService.java:164)
at 
java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
at 
java.base/java.util.concurrent.CompletableFuture.uniWhenCompleteStage(CompletableFuture.java:883)
at 
java.base/java.util.concurrent.CompletableFuture.whenComplete(CompletableFuture.java:2257)
at 
org.apache.ignite.internal.replicator.ReplicaService.sendToReplica(ReplicaService.java:144)
at 
org.apache.ignite.internal.replicator.ReplicaService.invoke(ReplicaService.java:275)
at 
org.apache.ignite.internal.table.distributed.storage.InternalTableImpl.lambda$evaluateReadOnlyPrimaryNode$19(InternalTableImpl.java:772)
at 
java.base/java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:1106)
at 
java.base/java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2241)
at 
org.apache.ignite.internal.table.distributed.storage.InternalTableImpl.evaluateReadOnlyPrimaryNode(InternalTableImpl.java:763)
at 
org.apache.ignite.internal.table.distributed.storage.InternalTableImpl.get(InternalTableImpl.java:859)
at 
org.apache.ignite.internal.table.KeyValueBinaryViewImpl.lambda$getAsync$1(KeyValueBinaryViewImpl.java:120)
at 
org.apache.ignite.internal.table.AbstractTableView.lambda$withSchemaSync$2(AbstractTableView.java:143)
at 
java.base/java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:1106)
at 
java.base/java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2241)
at 
org.apache.ignite.internal.table.AbstractTableView.withSchemaSync(AbstractTableView.java:143)
at 
org.apache.ignite.internal.table.AbstractTableView.withSchemaSync(AbstractTableView.java:134)
at 
org.apache.ignite.internal.table.AbstractTableView.doOperation(AbstractTableView.java:112)
at 
org.apache.ignite.internal.table.KeyValueBinaryViewImpl.getAsync(KeyValueBinaryViewImpl.java:117)
at 
org.apache.ignite.internal.table.KeyValueBinaryViewImpl.get(KeyValueBinaryViewImpl.java:104)
at 
org.apache.ignite.internal.table.KeyValueBinaryViewImpl.get(KeyValueBinaryViewImpl.java:74)
at 
org.apache.ignite.internal.table.PublicApiThreadingKeyValueView.lambda$get$0(PublicApiThreadingKeyValueView.java:57)
at 
org.apache.ignite.internal.thread.PublicApiThreading.executeWithRole(PublicApiThreading.java:144)
at 
org.apache.ignite.internal.thread.PublicApiThreading.execUserSyncOperation(PublicApiThreading.java:102)
at 
org.apache.ignite.internal.table.PublicApiThreadingViewBase.executeSyncOp(PublicApiThreadingViewBase.java:107)
at 
org.apache.ignite.internal.table.PublicApiThreadingKeyValueView.get(PublicApiThreadingKeyValueView.java:57)
at 
org.apache.ignite.internal.restart.RestartProofKeyValueView.lambda$get$0(RestartProofKeyValueView.java:58)
at 
org.apache.ignite.internal.restart.RestartProofApiObject.lambda$attached$0(RestartProofApiObject.java:46)
at 
org.apache.ignite.internal.restart.IgniteAttachmentLock.attached(IgniteAttachmentLock.java:59)
at 
org.apache.ignite.internal.restart.RestartProofApiObject.attached(RestartProofApiObject.java:46)
at 
org.apache.ignite.internal.restart.RestartProofKeyValueView.get(RestartProofKeyValueView.java:58)
at 
org.apache.ignite.internal.benchmark.SelectBenchmark.kvGet(SelectBenchmark.java:175)
at 
org.apache.ignite.internal.benchmark.jmh_generated.SelectBenchmark_kvGet_jmhTest.kvGet_avgt_jmhStub(SelectBenchmark_kvGet_jmhTest.java:238)
at 
org.apache.ignite.internal.benchmark.jmh_generated.SelectBenchmark_kvGet_jmhTest.kvGet_AverageTime(SelectBenchmark_kvGet_jmhTest.java:177)
at jdk.internal.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at 
org.openjdk.jmh.runner.BenchmarkHandler$BenchmarkTask.

[jira] [Created] (IGNITE-23316) Hybrid Clock#update can be optimized in case where return value is not needed

2024-09-30 Thread Vladislav Pyatkov (Jira)
Vladislav Pyatkov created IGNITE-23316:
--

 Summary: Hybrid Clock#update can be optimized in case where return 
value is not needed
 Key: IGNITE-23316
 URL: https://issues.apache.org/jira/browse/IGNITE-23316
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladislav Pyatkov


h3. Motivation
A thread-safe update is an expensive operation, and we ought to avoid it in cases 
where it is not needed. The stack below shows a usage of the update operation 
where the result value is not used:
{code}
if (response instanceof TimestampAware) {
    clock.fastUpdate(((TimestampAware) response).timestamp());
}
{code}
{noformat}
hybridClock:update-:java.lang.Exception: 
at 
org.apache.ignite.internal.util.IgniteUtils.dumpStack(IgniteUtils.java:682)
at 
org.apache.ignite.internal.hlc.HybridClockImpl.update(HybridClockImpl.java:119)
at 
org.apache.ignite.internal.replicator.ReplicaService.lambda$sendToReplica$7(ReplicaService.java:164)
at 
java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
at 
java.base/java.util.concurrent.CompletableFuture.uniWhenCompleteStage(CompletableFuture.java:883)
at 
java.base/java.util.concurrent.CompletableFuture.whenComplete(CompletableFuture.java:2257)
at 
org.apache.ignite.internal.replicator.ReplicaService.sendToReplica(ReplicaService.java:144)
at 
org.apache.ignite.internal.replicator.ReplicaService.invoke(ReplicaService.java:275)
at 
org.apache.ignite.internal.table.distributed.storage.InternalTableImpl.lambda$evaluateReadOnlyPrimaryNode$19(InternalTableImpl.java:772)
at 
java.base/java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:1106)
at 
java.base/java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2241)
at 
org.apache.ignite.internal.table.distributed.storage.InternalTableImpl.evaluateReadOnlyPrimaryNode(InternalTableImpl.java:763)
at 
org.apache.ignite.internal.table.distributed.storage.InternalTableImpl.get(InternalTableImpl.java:859)
at 
org.apache.ignite.internal.table.KeyValueBinaryViewImpl.lambda$getAsync$1(KeyValueBinaryViewImpl.java:120)
at 
org.apache.ignite.internal.table.AbstractTableView.lambda$withSchemaSync$2(AbstractTableView.java:143)
at 
java.base/java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:1106)
at 
java.base/java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2241)
at 
org.apache.ignite.internal.table.AbstractTableView.withSchemaSync(AbstractTableView.java:143)
at 
org.apache.ignite.internal.table.AbstractTableView.withSchemaSync(AbstractTableView.java:134)
at 
org.apache.ignite.internal.table.AbstractTableView.doOperation(AbstractTableView.java:112)
at 
org.apache.ignite.internal.table.KeyValueBinaryViewImpl.getAsync(KeyValueBinaryViewImpl.java:117)
at 
org.apache.ignite.internal.table.KeyValueBinaryViewImpl.get(KeyValueBinaryViewImpl.java:104)
at 
org.apache.ignite.internal.table.KeyValueBinaryViewImpl.get(KeyValueBinaryViewImpl.java:74)
at 
org.apache.ignite.internal.table.PublicApiThreadingKeyValueView.lambda$get$0(PublicApiThreadingKeyValueView.java:57)
at 
org.apache.ignite.internal.thread.PublicApiThreading.executeWithRole(PublicApiThreading.java:144)
at 
org.apache.ignite.internal.thread.PublicApiThreading.execUserSyncOperation(PublicApiThreading.java:102)
at 
org.apache.ignite.internal.table.PublicApiThreadingViewBase.executeSyncOp(PublicApiThreadingViewBase.java:107)
at 
org.apache.ignite.internal.table.PublicApiThreadingKeyValueView.get(PublicApiThreadingKeyValueView.java:57)
at 
org.apache.ignite.internal.restart.RestartProofKeyValueView.lambda$get$0(RestartProofKeyValueView.java:58)
at 
org.apache.ignite.internal.restart.RestartProofApiObject.lambda$attached$0(RestartProofApiObject.java:46)
at 
org.apache.ignite.internal.restart.IgniteAttachmentLock.attached(IgniteAttachmentLock.java:59)
at 
org.apache.ignite.internal.restart.RestartProofApiObject.attached(RestartProofApiObject.java:46)
at 
org.apache.ignite.internal.restart.RestartProofKeyValueView.get(RestartProofKeyValueView.java:58)
at 
org.apache.ignite.internal.benchmark.SelectBenchmark.kvGet(SelectBenchmark.java:175)
at 
org.apache.ignite.internal.benchmark.jmh_generated.SelectBenchmark_kvGet_jmhTest.kvGet_avgt_jmhStub(SelectBenchmark_kvGet_jmhTest.java:238)
at 
org.apache.ignite.internal.benchmark.jmh_generated.SelectBenchmark_kvGet_jmhTest.kvGet_AverageTime(SelectBenchmark_kvGet_jmhTest.java:177)
at jdk.internal.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(D

[jira] [Created] (IGNITE-23315) Use direct read transaction start time in case of the primary replica locally

2024-09-30 Thread Vladislav Pyatkov (Jira)
Vladislav Pyatkov created IGNITE-23315:
--

 Summary: Use direct read transaction start time in case of the 
primary replica locally
 Key: IGNITE-23315
 URL: https://issues.apache.org/jira/browse/IGNITE-23315
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladislav Pyatkov


h3. Motivation
A direct read transaction is an implicit transaction in a single partition, 
where the read timestamp is determined on the server side.
The stack below was captured on a single-node cluster. It shows that we get 
HybridClock#now twice: once at transaction start and again in the primary 
replica listener.
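A minimal sketch of the proposed single-now() flow (illustrative names; the real 
change would live around PartitionReplicaListener#getTxOpTimestamp):
{code}
// When the primary replica is local, reuse the timestamp that was already
// taken at transaction start instead of calling the clock a second time.
HybridTimestamp txOpTimestamp(@Nullable HybridTimestamp txStartTs, ClockService clock) {
    return txStartTs != null ? txStartTs : clock.now();
}
{code}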
{noformat}
hybridClock:nowLong-:java.lang.Exception: 
at 
org.apache.ignite.internal.util.IgniteUtils.dumpStack(IgniteUtils.java:682)
at 
org.apache.ignite.internal.hlc.HybridClockImpl.nowLong(HybridClockImpl.java:76)
at 
org.apache.ignite.internal.hlc.HybridClockImpl.now(HybridClockImpl.java:108)
at 
org.apache.ignite.internal.hlc.ClockServiceImpl.now(ClockServiceImpl.java:43)
at 
org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.getTxOpTimestamp(PartitionReplicaListener.java:655)
at 
org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.lambda$processRequest$5(PartitionReplicaListener.java:524)
at 
org.apache.ignite.internal.tracing.Instrumentation.measure(Instrumentation.java:202)
at 
org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.processRequest(PartitionReplicaListener.java:524)
at 
org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.lambda$invoke$2(PartitionReplicaListener.java:471)
at 
java.base/java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:1106)
at 
java.base/java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2241)
at 
org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.invoke(PartitionReplicaListener.java:471)
at 
org.apache.ignite.internal.replicator.ReplicaImpl.processRequest(ReplicaImpl.java:154)
at 
org.apache.ignite.internal.replicator.ReplicaManager.handleReplicaRequest(ReplicaManager.java:451)
at 
org.apache.ignite.internal.replicator.ReplicaManager.onReplicaMessageReceived(ReplicaManager.java:382)
at 
org.apache.ignite.internal.network.DefaultMessagingService.sendToSelf(DefaultMessagingService.java:390)
at 
org.apache.ignite.internal.network.DefaultMessagingService.invoke0(DefaultMessagingService.java:318)
at 
org.apache.ignite.internal.network.DefaultMessagingService.invoke(DefaultMessagingService.java:247)
at 
org.apache.ignite.internal.network.wrapper.JumpToExecutorByConsistentIdAfterSend.invoke(JumpToExecutorByConsistentIdAfterSend.java:97)
at 
org.apache.ignite.internal.network.MessagingService.invoke(MessagingService.java:198)
at 
org.apache.ignite.internal.replicator.ReplicaService.sendToReplica(ReplicaService.java:140)
at 
org.apache.ignite.internal.replicator.ReplicaService.invoke(ReplicaService.java:275)
at 
org.apache.ignite.internal.table.distributed.storage.InternalTableImpl.lambda$evaluateReadOnlyPrimaryNode$19(InternalTableImpl.java:770)
at 
java.base/java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:1106)
at 
java.base/java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2241)
at 
org.apache.ignite.internal.table.distributed.storage.InternalTableImpl.evaluateReadOnlyPrimaryNode(InternalTableImpl.java:761)
at 
org.apache.ignite.internal.table.distributed.storage.InternalTableImpl.get(InternalTableImpl.java:857)
at 
org.apache.ignite.internal.table.KeyValueBinaryViewImpl.lambda$getAsync$1(KeyValueBinaryViewImpl.java:120)
at 
org.apache.ignite.internal.table.AbstractTableView.lambda$withSchemaSync$2(AbstractTableView.java:143)
at 
java.base/java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:1106)
at 
java.base/java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2241)
at 
org.apache.ignite.internal.table.AbstractTableView.withSchemaSync(AbstractTableView.java:143)
at 
org.apache.ignite.internal.table.AbstractTableView.withSchemaSync(AbstractTableView.java:134)
at 
org.apache.ignite.internal.table.AbstractTableView.doOperation(AbstractTableView.java:112)
at 
org.apache.ignite.internal.table.KeyValueBinaryViewImpl.getAsync(KeyValueBinaryViewImpl.java:117)
at 
org.apache.ignite.internal.table.KeyValueBinaryViewImpl.get(KeyValueBinaryViewImpl.java:104)
at 
org.apache.ignite.internal.table.KeyValueBinaryViewImpl.get(KeyValueBinaryViewImpl.java:74)
at 
org.apache.ignite.internal.table.

[jira] [Updated] (IGNITE-23315) Use direct read transaction start time in case of the primary replica locally

2024-09-30 Thread Vladislav Pyatkov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-23315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladislav Pyatkov updated IGNITE-23315:
---
Description: 
h3. Motivation
A direct read transaction is an implicit transaction in a single partition, 
where the read timestamp is determined on the server side.
The stack below was captured on a single-node cluster. It shows that we get 
HybridClock#now twice: once at transaction start and again in the primary 
replica listener.
{noformat}
hybridClock:nowLong-:java.lang.Exception: 
at 
org.apache.ignite.internal.util.IgniteUtils.dumpStack(IgniteUtils.java:682)
at 
org.apache.ignite.internal.hlc.HybridClockImpl.nowLong(HybridClockImpl.java:76)
at 
org.apache.ignite.internal.hlc.HybridClockImpl.now(HybridClockImpl.java:108)
at 
org.apache.ignite.internal.hlc.ClockServiceImpl.now(ClockServiceImpl.java:43)
at 
org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.getTxOpTimestamp(PartitionReplicaListener.java:655)
at 
org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.lambda$processRequest$5(PartitionReplicaListener.java:524)
at 
org.apache.ignite.internal.tracing.Instrumentation.measure(Instrumentation.java:202)
at 
org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.processRequest(PartitionReplicaListener.java:524)
at 
org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.lambda$invoke$2(PartitionReplicaListener.java:471)
at 
java.base/java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:1106)
at 
java.base/java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2241)
at 
org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.invoke(PartitionReplicaListener.java:471)
at 
org.apache.ignite.internal.replicator.ReplicaImpl.processRequest(ReplicaImpl.java:154)
at 
org.apache.ignite.internal.replicator.ReplicaManager.handleReplicaRequest(ReplicaManager.java:451)
at 
org.apache.ignite.internal.replicator.ReplicaManager.onReplicaMessageReceived(ReplicaManager.java:382)
at 
org.apache.ignite.internal.network.DefaultMessagingService.sendToSelf(DefaultMessagingService.java:390)
at 
org.apache.ignite.internal.network.DefaultMessagingService.invoke0(DefaultMessagingService.java:318)
at 
org.apache.ignite.internal.network.DefaultMessagingService.invoke(DefaultMessagingService.java:247)
at 
org.apache.ignite.internal.network.wrapper.JumpToExecutorByConsistentIdAfterSend.invoke(JumpToExecutorByConsistentIdAfterSend.java:97)
at 
org.apache.ignite.internal.network.MessagingService.invoke(MessagingService.java:198)
at 
org.apache.ignite.internal.replicator.ReplicaService.sendToReplica(ReplicaService.java:140)
at 
org.apache.ignite.internal.replicator.ReplicaService.invoke(ReplicaService.java:275)
at 
org.apache.ignite.internal.table.distributed.storage.InternalTableImpl.lambda$evaluateReadOnlyPrimaryNode$19(InternalTableImpl.java:770)
at 
java.base/java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:1106)
at 
java.base/java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2241)
at 
org.apache.ignite.internal.table.distributed.storage.InternalTableImpl.evaluateReadOnlyPrimaryNode(InternalTableImpl.java:761)
at 
org.apache.ignite.internal.table.distributed.storage.InternalTableImpl.get(InternalTableImpl.java:857)
at 
org.apache.ignite.internal.table.KeyValueBinaryViewImpl.lambda$getAsync$1(KeyValueBinaryViewImpl.java:120)
at 
org.apache.ignite.internal.table.AbstractTableView.lambda$withSchemaSync$2(AbstractTableView.java:143)
at 
java.base/java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:1106)
at 
java.base/java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2241)
at 
org.apache.ignite.internal.table.AbstractTableView.withSchemaSync(AbstractTableView.java:143)
at 
org.apache.ignite.internal.table.AbstractTableView.withSchemaSync(AbstractTableView.java:134)
at 
org.apache.ignite.internal.table.AbstractTableView.doOperation(AbstractTableView.java:112)
at 
org.apache.ignite.internal.table.KeyValueBinaryViewImpl.getAsync(KeyValueBinaryViewImpl.java:117)
at 
org.apache.ignite.internal.table.KeyValueBinaryViewImpl.get(KeyValueBinaryViewImpl.java:104)
at 
org.apache.ignite.internal.table.KeyValueBinaryViewImpl.get(KeyValueBinaryViewImpl.java:74)
at 
org.apache.ignite.internal.table.PublicApiThreadingKeyValueView.lambda$get$0(PublicApiThreadingKeyValueView.java:57)
at 
org.apache.ignite.internal.thread.PublicApiThreading.executeWithRole(Pub

[jira] [Updated] (IGNITE-23315) Use direct read transaction start time in case of the primary replica locally

2024-09-30 Thread Vladislav Pyatkov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-23315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladislav Pyatkov updated IGNITE-23315:
---
Description: 
h3. Motivation
A direct read transaction is an implicit transaction in a single partition, 
where the read timestamp is determined on the server side.
The stack below was captured on a single-node cluster. It shows that we get 
HybridClock#now twice: once at transaction start and again in the primary 
replica listener.
{noformat}
hybridClock:nowLong-:java.lang.Exception: 
at 
org.apache.ignite.internal.util.IgniteUtils.dumpStack(IgniteUtils.java:682)
at 
org.apache.ignite.internal.hlc.HybridClockImpl.nowLong(HybridClockImpl.java:76)
at 
org.apache.ignite.internal.hlc.HybridClockImpl.now(HybridClockImpl.java:108)
at 
org.apache.ignite.internal.hlc.ClockServiceImpl.now(ClockServiceImpl.java:43)
at 
org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.getTxOpTimestamp(PartitionReplicaListener.java:655)
at 
org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.lambda$processRequest$5(PartitionReplicaListener.java:524)
at 
org.apache.ignite.internal.tracing.Instrumentation.measure(Instrumentation.java:202)
at 
org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.processRequest(PartitionReplicaListener.java:524)
at 
org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.lambda$invoke$2(PartitionReplicaListener.java:471)
at 
java.base/java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:1106)
at 
java.base/java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2241)
at 
org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.invoke(PartitionReplicaListener.java:471)
at 
org.apache.ignite.internal.replicator.ReplicaImpl.processRequest(ReplicaImpl.java:154)
at 
org.apache.ignite.internal.replicator.ReplicaManager.handleReplicaRequest(ReplicaManager.java:451)
at 
org.apache.ignite.internal.replicator.ReplicaManager.onReplicaMessageReceived(ReplicaManager.java:382)
at 
org.apache.ignite.internal.network.DefaultMessagingService.sendToSelf(DefaultMessagingService.java:390)
at 
org.apache.ignite.internal.network.DefaultMessagingService.invoke0(DefaultMessagingService.java:318)
at 
org.apache.ignite.internal.network.DefaultMessagingService.invoke(DefaultMessagingService.java:247)
at 
org.apache.ignite.internal.network.wrapper.JumpToExecutorByConsistentIdAfterSend.invoke(JumpToExecutorByConsistentIdAfterSend.java:97)
at 
org.apache.ignite.internal.network.MessagingService.invoke(MessagingService.java:198)
at 
org.apache.ignite.internal.replicator.ReplicaService.sendToReplica(ReplicaService.java:140)
at 
org.apache.ignite.internal.replicator.ReplicaService.invoke(ReplicaService.java:275)
at 
org.apache.ignite.internal.table.distributed.storage.InternalTableImpl.lambda$evaluateReadOnlyPrimaryNode$19(InternalTableImpl.java:770)
at 
java.base/java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:1106)
at 
java.base/java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2241)
at 
org.apache.ignite.internal.table.distributed.storage.InternalTableImpl.evaluateReadOnlyPrimaryNode(InternalTableImpl.java:761)
at 
org.apache.ignite.internal.table.distributed.storage.InternalTableImpl.get(InternalTableImpl.java:857)
at 
org.apache.ignite.internal.table.KeyValueBinaryViewImpl.lambda$getAsync$1(KeyValueBinaryViewImpl.java:120)
at 
org.apache.ignite.internal.table.AbstractTableView.lambda$withSchemaSync$2(AbstractTableView.java:143)
at 
java.base/java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:1106)
at 
java.base/java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2241)
at 
org.apache.ignite.internal.table.AbstractTableView.withSchemaSync(AbstractTableView.java:143)
at 
org.apache.ignite.internal.table.AbstractTableView.withSchemaSync(AbstractTableView.java:134)
at 
org.apache.ignite.internal.table.AbstractTableView.doOperation(AbstractTableView.java:112)
at 
org.apache.ignite.internal.table.KeyValueBinaryViewImpl.getAsync(KeyValueBinaryViewImpl.java:117)
at 
org.apache.ignite.internal.table.KeyValueBinaryViewImpl.get(KeyValueBinaryViewImpl.java:104)
at 
org.apache.ignite.internal.table.KeyValueBinaryViewImpl.get(KeyValueBinaryViewImpl.java:74)
at 
org.apache.ignite.internal.table.PublicApiThreadingKeyValueView.lambda$get$0(PublicApiThreadingKeyValueView.java:57)
at 
org.apache.ignite.internal.thread.PublicApiThreading.executeWithRole(Pub

[jira] [Updated] (IGNITE-23314) PartitionReplicaListener#ensureReplicaIsPrimary does not require to use hybrid timestamp

2024-09-30 Thread Vladislav Pyatkov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-23314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladislav Pyatkov updated IGNITE-23314:
---
Description: 
h3. Motivation
The main reason is to avoid contention on the hybrid clock. The algorithm does 
not require a hybrid timestamp to check whether the replica is primary. We did 
this to roll back a transaction quickly in case the primary replica changes 
just as the request is being transported through the network.
{noformat}
hybridClock:nowLong-:java.lang.Exception: 
at 
org.apache.ignite.internal.util.IgniteUtils.dumpStack(IgniteUtils.java:682)
at 
org.apache.ignite.internal.hlc.HybridClockImpl.nowLong(HybridClockImpl.java:76)
at 
org.apache.ignite.internal.hlc.HybridClockImpl.now(HybridClockImpl.java:108)
at 
org.apache.ignite.internal.hlc.ClockServiceImpl.now(ClockServiceImpl.java:43)
at 
org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.ensureReplicaIsPrimary(PartitionReplicaListener.java:3535)
at 
org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.lambda$invoke$1(PartitionReplicaListener.java:470)
at 
org.apache.ignite.internal.tracing.Instrumentation.measure(Instrumentation.java:259)
at 
org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.invoke(PartitionReplicaListener.java:470)
at 
org.apache.ignite.internal.replicator.ReplicaImpl.processRequest(ReplicaImpl.java:154)
at 
org.apache.ignite.internal.replicator.ReplicaManager.handleReplicaRequest(ReplicaManager.java:451)
at 
org.apache.ignite.internal.replicator.ReplicaManager.onReplicaMessageReceived(ReplicaManager.java:382)
at 
org.apache.ignite.internal.network.DefaultMessagingService.sendToSelf(DefaultMessagingService.java:390)
at 
org.apache.ignite.internal.network.DefaultMessagingService.invoke0(DefaultMessagingService.java:318)
at 
org.apache.ignite.internal.network.DefaultMessagingService.invoke(DefaultMessagingService.java:247)
at 
org.apache.ignite.internal.network.wrapper.JumpToExecutorByConsistentIdAfterSend.invoke(JumpToExecutorByConsistentIdAfterSend.java:97)
at 
org.apache.ignite.internal.network.MessagingService.invoke(MessagingService.java:198)
at 
org.apache.ignite.internal.replicator.ReplicaService.sendToReplica(ReplicaService.java:140)
at 
org.apache.ignite.internal.replicator.ReplicaService.invoke(ReplicaService.java:275)
at 
org.apache.ignite.internal.table.distributed.storage.InternalTableImpl.lambda$evaluateReadOnlyPrimaryNode$19(InternalTableImpl.java:770)
at 
java.base/java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:1106)
at 
java.base/java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2241)
at 
org.apache.ignite.internal.table.distributed.storage.InternalTableImpl.evaluateReadOnlyPrimaryNode(InternalTableImpl.java:761)
at 
org.apache.ignite.internal.table.distributed.storage.InternalTableImpl.get(InternalTableImpl.java:857)
at 
org.apache.ignite.internal.table.KeyValueBinaryViewImpl.lambda$getAsync$1(KeyValueBinaryViewImpl.java:120)
at 
org.apache.ignite.internal.table.AbstractTableView.lambda$withSchemaSync$2(AbstractTableView.java:143)
at 
java.base/java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:1106)
at 
java.base/java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2241)
at 
org.apache.ignite.internal.table.AbstractTableView.withSchemaSync(AbstractTableView.java:143)
at 
org.apache.ignite.internal.table.AbstractTableView.withSchemaSync(AbstractTableView.java:134)
at 
org.apache.ignite.internal.table.AbstractTableView.doOperation(AbstractTableView.java:112)
at 
org.apache.ignite.internal.table.KeyValueBinaryViewImpl.getAsync(KeyValueBinaryViewImpl.java:117)
at 
org.apache.ignite.internal.table.KeyValueBinaryViewImpl.get(KeyValueBinaryViewImpl.java:104)
at 
org.apache.ignite.internal.table.KeyValueBinaryViewImpl.get(KeyValueBinaryViewImpl.java:74)
at 
org.apache.ignite.internal.table.PublicApiThreadingKeyValueView.lambda$get$0(PublicApiThreadingKeyValueView.java:57)
at 
org.apache.ignite.internal.thread.PublicApiThreading.executeWithRole(PublicApiThreading.java:144)
at 
org.apache.ignite.internal.thread.PublicApiThreading.execUserSyncOperation(PublicApiThreading.java:102)
at 
org.apache.ignite.internal.table.PublicApiThreadingViewBase.executeSyncOp(PublicApiThreadingViewBase.java:107)
at 
org.apache.ignite.internal.table.PublicApiThreadingKeyValueView.get(PublicApiThreadingKeyValueView.java:57)
at 
org.apache.ignite.internal.restart.RestartProofKeyValueView.lambda$get$0(RestartProofKeyValueView.java

[jira] [Updated] (IGNITE-23314) PartitionReplicaListener#ensureReplicaIsPrimary does not require to use hybrid timestamp

2024-09-30 Thread Vladislav Pyatkov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-23314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladislav Pyatkov updated IGNITE-23314:
---
Description: 
h3. Motivation
The main reason is to avoid contention on the hybrid clock. The algorithm does 
not require a hybrid timestamp to check whether the replica is primary. We did 
this to roll back a transaction quickly in case the primary replica changes 
just as the request is being transported through the network.
{noformat}
hybridClock:nowLong-:java.lang.Exception: 
at 
org.apache.ignite.internal.util.IgniteUtils.dumpStack(IgniteUtils.java:682)
at 
org.apache.ignite.internal.hlc.HybridClockImpl.nowLong(HybridClockImpl.java:76)
at 
org.apache.ignite.internal.hlc.HybridClockImpl.now(HybridClockImpl.java:108)
at 
org.apache.ignite.internal.hlc.ClockServiceImpl.now(ClockServiceImpl.java:43)
at 
org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.ensureReplicaIsPrimary(PartitionReplicaListener.java:3535)
at 
org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.lambda$invoke$1(PartitionReplicaListener.java:470)
at 
org.apache.ignite.internal.tracing.Instrumentation.measure(Instrumentation.java:259)
at 
org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.invoke(PartitionReplicaListener.java:470)
at 
org.apache.ignite.internal.replicator.ReplicaImpl.processRequest(ReplicaImpl.java:154)
at 
org.apache.ignite.internal.replicator.ReplicaManager.handleReplicaRequest(ReplicaManager.java:451)
at 
org.apache.ignite.internal.replicator.ReplicaManager.onReplicaMessageReceived(ReplicaManager.java:382)
at 
org.apache.ignite.internal.network.DefaultMessagingService.sendToSelf(DefaultMessagingService.java:390)
at 
org.apache.ignite.internal.network.DefaultMessagingService.invoke0(DefaultMessagingService.java:318)
at 
org.apache.ignite.internal.network.DefaultMessagingService.invoke(DefaultMessagingService.java:247)
at 
org.apache.ignite.internal.network.wrapper.JumpToExecutorByConsistentIdAfterSend.invoke(JumpToExecutorByConsistentIdAfterSend.java:97)
at 
org.apache.ignite.internal.network.MessagingService.invoke(MessagingService.java:198)
at 
org.apache.ignite.internal.replicator.ReplicaService.sendToReplica(ReplicaService.java:140)
at 
org.apache.ignite.internal.replicator.ReplicaService.invoke(ReplicaService.java:275)
at 
org.apache.ignite.internal.table.distributed.storage.InternalTableImpl.lambda$evaluateReadOnlyPrimaryNode$19(InternalTableImpl.java:770)
at 
java.base/java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:1106)
at 
java.base/java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2241)
at 
org.apache.ignite.internal.table.distributed.storage.InternalTableImpl.evaluateReadOnlyPrimaryNode(InternalTableImpl.java:761)
at 
org.apache.ignite.internal.table.distributed.storage.InternalTableImpl.get(InternalTableImpl.java:857)
at 
org.apache.ignite.internal.table.KeyValueBinaryViewImpl.lambda$getAsync$1(KeyValueBinaryViewImpl.java:120)
at 
org.apache.ignite.internal.table.AbstractTableView.lambda$withSchemaSync$2(AbstractTableView.java:143)
at 
java.base/java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:1106)
at 
java.base/java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2241)
at 
org.apache.ignite.internal.table.AbstractTableView.withSchemaSync(AbstractTableView.java:143)
at 
org.apache.ignite.internal.table.AbstractTableView.withSchemaSync(AbstractTableView.java:134)
at 
org.apache.ignite.internal.table.AbstractTableView.doOperation(AbstractTableView.java:112)
at 
org.apache.ignite.internal.table.KeyValueBinaryViewImpl.getAsync(KeyValueBinaryViewImpl.java:117)
at 
org.apache.ignite.internal.table.KeyValueBinaryViewImpl.get(KeyValueBinaryViewImpl.java:104)
at 
org.apache.ignite.internal.table.KeyValueBinaryViewImpl.get(KeyValueBinaryViewImpl.java:74)
at 
org.apache.ignite.internal.table.PublicApiThreadingKeyValueView.lambda$get$0(PublicApiThreadingKeyValueView.java:57)
at 
org.apache.ignite.internal.thread.PublicApiThreading.executeWithRole(PublicApiThreading.java:144)
at 
org.apache.ignite.internal.thread.PublicApiThreading.execUserSyncOperation(PublicApiThreading.java:102)
at 
org.apache.ignite.internal.table.PublicApiThreadingViewBase.executeSyncOp(PublicApiThreadingViewBase.java:107)
at 
org.apache.ignite.internal.table.PublicApiThreadingKeyValueView.get(PublicApiThreadingKeyValueView.java:57)
at 
org.apache.ignite.internal.restart.RestartProofKeyValueView.lambda$get$0(RestartProofKeyValueView.java

[jira] [Created] (IGNITE-23314) PartitionReplicaListener#ensureReplicaIsPrimary does not require to use hybrid timestamp

2024-09-30 Thread Vladislav Pyatkov (Jira)
Vladislav Pyatkov created IGNITE-23314:
--

 Summary: PartitionReplicaListener#ensureReplicaIsPrimary does not 
require to use hybrid timestamp
 Key: IGNITE-23314
 URL: https://issues.apache.org/jira/browse/IGNITE-23314
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladislav Pyatkov


h3. Motivation
The main reason is to avoid contention on the hybrid clock. The algorithm does 
not require a hybrid timestamp to check whether the replica is primary. We did 
this to roll back a transaction quickly in case the primary replica changes 
just as the request is being transported through the network.

h3. Definition of done
Use the uncontended method (HybridClockSyncImpl#currentTime) to determine an 
astronomical timestamp when the replica request is received.
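A minimal sketch under the assumption that HybridClockSyncImpl#currentTime is an 
uncontended read of the current astronomical time (the lease-check shape and 
accessor names are illustrative):
{code}
// ensureReplicaIsPrimary-style check: an astronomical timestamp is enough to
// validate the lease interval, so the contended clock.now() is not required.
boolean leaseStillValid(ReplicaMeta primary, HybridClockSyncImpl clock) {
    HybridTimestamp current = clock.currentTime(); // plain read, no CAS

    return current.compareTo(primary.getExpirationTime()) < 0;
}
{code}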



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-23313) Avoid using fulfilled hybrid time in SchemaVersions#schemaVersionAtNow

2024-09-30 Thread Vladislav Pyatkov (Jira)
Vladislav Pyatkov created IGNITE-23313:
--

 Summary: Avoid using fulfilled hybrid time in 
SchemaVersions#schemaVersionAtNow
 Key: IGNITE-23313
 URL: https://issues.apache.org/jira/browse/IGNITE-23313
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladislav Pyatkov


h3. Motivation
Contention is the main issue with single-node performance. The hybrid clock is a 
natural point of contention in our algorithm. Hence, reducing the number of 
HybridClock#now invocations increases performance.
{code}
hybridClock:nowLong-:java.lang.Exception: 
at 
org.apache.ignite.internal.util.IgniteUtils.dumpStack(IgniteUtils.java:682)
at 
org.apache.ignite.internal.hlc.HybridClockImpl.nowLong(HybridClockImpl.java:76)
at 
org.apache.ignite.internal.hlc.HybridClockImpl.now(HybridClockImpl.java:108)
at 
org.apache.ignite.internal.hlc.ClockServiceImpl.now(ClockServiceImpl.java:43)
at 
org.apache.ignite.internal.table.distributed.schema.SchemaVersionsImpl.schemaVersionAtNow(SchemaVersionsImpl.java:86)
at 
org.apache.ignite.internal.table.AbstractTableView.withSchemaSync(AbstractTableView.java:139)
at 
org.apache.ignite.internal.table.AbstractTableView.withSchemaSync(AbstractTableView.java:134)
at 
org.apache.ignite.internal.table.AbstractTableView.doOperation(AbstractTableView.java:112)
at 
org.apache.ignite.internal.table.KeyValueBinaryViewImpl.getAsync(KeyValueBinaryViewImpl.java:117)
at 
org.apache.ignite.internal.table.KeyValueBinaryViewImpl.get(KeyValueBinaryViewImpl.java:104)
at 
org.apache.ignite.internal.table.KeyValueBinaryViewImpl.get(KeyValueBinaryViewImpl.java:74)
at 
org.apache.ignite.internal.table.PublicApiThreadingKeyValueView.lambda$get$0(PublicApiThreadingKeyValueView.java:57)
at 
org.apache.ignite.internal.thread.PublicApiThreading.executeWithRole(PublicApiThreading.java:144)
at 
org.apache.ignite.internal.thread.PublicApiThreading.execUserSyncOperation(PublicApiThreading.java:102)
at 
org.apache.ignite.internal.table.PublicApiThreadingViewBase.executeSyncOp(PublicApiThreadingViewBase.java:107)
at 
org.apache.ignite.internal.table.PublicApiThreadingKeyValueView.get(PublicApiThreadingKeyValueView.java:57)
at 
org.apache.ignite.internal.restart.RestartProofKeyValueView.lambda$get$0(RestartProofKeyValueView.java:58)
at 
org.apache.ignite.internal.restart.RestartProofApiObject.lambda$attached$0(RestartProofApiObject.java:46)
at 
org.apache.ignite.internal.restart.IgniteAttachmentLock.attached(IgniteAttachmentLock.java:59)
at 
org.apache.ignite.internal.restart.RestartProofApiObject.attached(RestartProofApiObject.java:46)
at 
org.apache.ignite.internal.restart.RestartProofKeyValueView.get(RestartProofKeyValueView.java:58)
at 
org.apache.ignite.internal.benchmark.SelectBenchmark.kvGet(SelectBenchmark.java:175)
at 
org.apache.ignite.internal.benchmark.jmh_generated.SelectBenchmark_kvGet_jmhTest.kvGet_avgt_jmhStub(SelectBenchmark_kvGet_jmhTest.java:238)
at 
org.apache.ignite.internal.benchmark.jmh_generated.SelectBenchmark_kvGet_jmhTest.kvGet_AverageTime(SelectBenchmark_kvGet_jmhTest.java:177)
at jdk.internal.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at 
org.openjdk.jmh.runner.BenchmarkHandler$BenchmarkTask.call(BenchmarkHandler.java:527)
at 
org.openjdk.jmh.runner.BenchmarkHandler$BenchmarkTask.call(BenchmarkHandler.java:504)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at 
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
{code}
Above is the call stack for determining the schema version in the case of an
implicit transaction. But if the transaction is already created at this point,
we can use the transaction start time.

h3. Implementation notes
# In the case above, we ought to create an implicit transaction instance before
this place (it could be difficult, but it will be needed in the future).
# We can use a fast, non-contended method of getting timestamps based on the
current astronomical time. It could be the method
HybridClockSyncImpl#currentTime.

h3. Definition of done
Do not use hybrid time in the schemaVersionAtNow method because there is no 
reason to capture this timestamp.
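
A sketch of the shortcut under assumed names (the transaction accessor and the
fast clock read are illustrative, not the merged API):
{code:java}
CompletableFuture<Integer> schemaVersionFor(@Nullable InternalTransaction tx, int tableId) {
    HybridTimestamp ts = (tx != null)
            ? tx.startTimestamp()        // captured once, when the tx began
            : clockService.current();    // hypothetical non-contended read
    return schemaVersions.schemaVersionAt(ts, tableId);
}
{code}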



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (IGNITE-23132) java.lang.NullPointerException: null in LogId.compareTo when handling vote request

2024-09-25 Thread Vladislav Pyatkov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-23132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17884741#comment-17884741
 ] 

Vladislav Pyatkov commented on IGNITE-23132:


Merged 12166027fbdd5d3ae0506def0607f27d7d07fa2e

> java.lang.NullPointerException: null in LogId.compareTo when handling vote
> request
> 
>
> Key: IGNITE-23132
> URL: https://issues.apache.org/jira/browse/IGNITE-23132
> Project: Ignite
>  Issue Type: Bug
>Reporter: Mirza Aliev
>Assignee: Mirza Aliev
>Priority: Major
>  Labels: ignite-3
> Attachments: _Integration_Tests_Module_Table_27645.log.zip
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> TC main has started to detect a NullPointerException in build logs. This
> happens in {{ItEstimatedSizeTest#testEstimatedSize}}; it seems to happen
> when the node has already been stopped but still handles a vote request with
> already cleared state. We need to figure out and fix the NPE.
> {noformat}
> [2024-08-30T13:58:43,614][ERROR][%iest_tes_1%JRaft-Request-Processor-13][RpcRequestProcessor]
>  handleRequest RequestVoteRequestImpl [groupId=12_part_10, lastLogIndex=193, 
> lastLogTerm=1, peerId=iest_tes_1, preVote=false, serverId=iest_tes_2, term=2] 
> failed
> java.lang.NullPointerException: null
>   at org.apache.ignite.raft.jraft.entity.LogId.compareTo(LogId.java:91) 
> ~[ignite-raft-3.0.0-SNAPSHOT.jar:?]
>   at 
> org.apache.ignite.raft.jraft.core.NodeImpl.handleRequestVoteRequest(NodeImpl.java:2071)
>  ~[ignite-raft-3.0.0-SNAPSHOT.jar:?]
>   at 
> org.apache.ignite.raft.jraft.rpc.impl.core.RequestVoteRequestProcessor.processRequest0(RequestVoteRequestProcessor.java:52)
>  ~[ignite-raft-3.0.0-SNAPSHOT.jar:?]
>   at 
> org.apache.ignite.raft.jraft.rpc.impl.core.RequestVoteRequestProcessor.processRequest0(RequestVoteRequestProcessor.java:29)
>  ~[ignite-raft-3.0.0-SNAPSHOT.jar:?]
>   at 
> org.apache.ignite.raft.jraft.rpc.impl.core.NodeRequestProcessor.processRequest(NodeRequestProcessor.java:55)
>  ~[ignite-raft-3.0.0-SNAPSHOT.jar:?]
>   at 
> org.apache.ignite.raft.jraft.rpc.RpcRequestProcessor.handleRequest(RpcRequestProcessor.java:49)
>  ~[ignite-raft-3.0.0-SNAPSHOT.jar:?]
>   at 
> org.apache.ignite.raft.jraft.rpc.RpcRequestProcessor.handleRequest(RpcRequestProcessor.java:29)
>  ~[ignite-raft-3.0.0-SNAPSHOT.jar:?]
>   at 
> org.apache.ignite.raft.jraft.rpc.impl.IgniteRpcServer$RpcMessageHandler.lambda$onReceived$0(IgniteRpcServer.java:181)
>  ~[ignite-raft-3.0.0-SNAPSHOT.jar:?]
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>  [?:?]
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>  [?:?]
>   at java.base/java.lang.Thread.run(Thread.java:834) [?:?]
> {noformat}
> https://ci.ignite.apache.org/buildConfiguration/ApacheIgnite3xGradle_Test_RunAllTests/8439410?hideProblemsFromDependencies=false&hideTestsFromDependencies=false&expandCode+Inspection=true&expandBuildChangesSection=true&expandBuildDeploymentsSection=false&expandBuildProblemsSection=true&logFilter=debug&logView=flowAware



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-23257) Switch to partition-operation thread to handle client messages

2024-09-23 Thread Vladislav Pyatkov (Jira)
Vladislav Pyatkov created IGNITE-23257:
--

 Summary: Switch to partition-operation thread to handle client
messages
 Key: IGNITE-23257
 URL: https://issues.apache.org/jira/browse/IGNITE-23257
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladislav Pyatkov


h3. Motivation
Too much activity goes into the srv-worker thread:
{noformat}
channelReadMark:node_3344-srv-worker-3 0.0 1862800 1862800
Here is hidden 0.6 us
getPacker:node_3344-srv-worker-3 0.5 1863400 1863900
Here is hidden 0.4 us
readOpId:node_3344-srv-worker-3 0.1 1864300 1864400
Here is hidden 0.7 us
readReqId:node_3344-srv-worker-3 0.1 1865100 1865200
Here is hidden 0.4 us
processOperation:12:node_3344-srv-worker-3 0.0 1865600 1865600
Here is hidden 0.5 us
readTx:node_3344-srv-worker-3 0.1 1866100 1866200
Here is hidden 0.3 us
readTuple:node_3344-srv-worker-3 1.4 1866500 1867900
Here is hidden 1.7 us
readSchemaVersion:node_3344-srv-worker-3 0.1 1869600 1869700
Here is hidden 0.4 us
marshal:node_3344-srv-worker-3 3.4 1870100 1873500
Here is hidden 0.4 us
startImplicitTx:node_3344-srv-worker-3 1.0 1873900 1874900
Here is hidden 0.3 us
awaitPrimaryReplica:node_3344-srv-worker-3 1.0 1875200 1876200
Here is hidden 0.4 us
prepareRequest:node_3344-srv-worker-3 0.2 1876600 1876800
Here is hidden 0.3 us
sendToReplica:node_3344:node_3344-srv-worker-3 0.1 1877100 1877200
Here is hidden 1.0 us
onRequestReceived:node_3344-srv-worker-3 0.0 1878200 1878200
Here is hidden 9.0 us
{noformat}
It takes about 20 microseconds. This behavior does not allow the client to push 
other operations.

h3. Definition of done
Switch from the srv-worker to a partition-operation thread just after the
message is received.
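
A minimal sketch of the hand-off, assuming a striped partition executor is
available at this layer (names are illustrative):
{code:java}
void onMessageReceived(ClientMessageContext msg) {
    // Decode only what is needed for routing while still on the srv-worker...
    int partitionId = msg.partitionId(); // hypothetical accessor

    // ...then immediately hop to the partition-operation pool, freeing the
    // srv-worker to read the next request from the socket.
    partitionOperationsExecutor(partitionId).execute(() -> processOperation(msg));
}
{code}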



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (IGNITE-23199) Investigate GET scalability issues

2024-09-23 Thread Vladislav Pyatkov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-23199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladislav Pyatkov reassigned IGNITE-23199:
--

Assignee: Vladislav Pyatkov

> Investigate GET scalability issues
> --
>
> Key: IGNITE-23199
> URL: https://issues.apache.org/jira/browse/IGNITE-23199
> Project: Ignite
>  Issue Type: Test
>Reporter: Alexey Scherbakov
>Assignee: Vladislav Pyatkov
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0
>
>
> Current benchmark results show that GETs do not scale well, especially under
> high concurrency.
> This should be investigated.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-23199) Investigate GET scalability issues

2024-09-23 Thread Vladislav Pyatkov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-23199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladislav Pyatkov updated IGNITE-23199:
---
Issue Type: Test  (was: Improvement)

> Investigate GET scalability issues
> --
>
> Key: IGNITE-23199
> URL: https://issues.apache.org/jira/browse/IGNITE-23199
> Project: Ignite
>  Issue Type: Test
>Reporter: Alexey Scherbakov
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0
>
>
> Current benchmark results show that GETs do not scale well, especially under
> high concurrency.
> This should be investigated.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-23255) Fastest way to use Placement driver for RO transaction

2024-09-23 Thread Vladislav Pyatkov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-23255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladislav Pyatkov updated IGNITE-23255:
---
Summary: Fastest way to use Placement driver for RO transaction  (was: 
Faser way to use Placement driver for RO transaction)

> Fastest way to use Placement driver for RO transaction
> --
>
> Key: IGNITE-23255
> URL: https://issues.apache.org/jira/browse/IGNITE-23255
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Vladislav Pyatkov
>Priority: Major
>  Labels: ignite-3
>
> h3. Motivation
> We know the _CompletableFuture#orTimeout_ method is not fast. For this
> reason, we added a method _LeasePlacementDriver#getCurrentPrimaryReplica_
> which can return a primary replica immediately if the information has already
> been cached.
> This approach is still not used in methods for RO transactions:
> _InternalTableImpl#evaluateReadOnlyPrimaryNode_
> _InternalTableImpl#evaluateReadOnlyRecipientNode_
> h3. Definition of done
> Enrich the methods with a getCurrentPrimaryReplica invocation before waiting
> directly (we have to check this in a benchmark; if the benchmark does not
> show reduced latency, only the next point needs to be implemented).
> Enforce the awaitPrimaryReplica method in PD to use getCurrentPrimaryReplica
> before waiting.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-23255) Faser way to use Placement driver for RO transaction

2024-09-23 Thread Vladislav Pyatkov (Jira)
Vladislav Pyatkov created IGNITE-23255:
--

 Summary: Faser way to use Placement driver for RO transaction
 Key: IGNITE-23255
 URL: https://issues.apache.org/jira/browse/IGNITE-23255
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladislav Pyatkov


h3. Motivation
We know the _CompletableFuture#orTimeout_ method is not fast. For this reason,
we added a method _LeasePlacementDriver#getCurrentPrimaryReplica_ which can
return a primary replica immediately if the information has already been cached.
This approach is still not used in methods for RO transactions:
_InternalTableImpl#evaluateReadOnlyPrimaryNode_
_InternalTableImpl#evaluateReadOnlyRecipientNode_

h3. Definition of done
Enrich the methods with a getCurrentPrimaryReplica invocation before waiting
directly (we have to check this in a benchmark; if the benchmark does not show
reduced latency, only the next point needs to be implemented).
Enforce the awaitPrimaryReplica method in PD to use getCurrentPrimaryReplica
before waiting.
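
A sketch of the proposed fast path (both method names come from this ticket;
the surrounding shape, nullability, and timeout value are assumed):
{code:java}
CompletableFuture<ReplicaMeta> primaryReplica(ReplicationGroupId groupId, HybridTimestamp now) {
    ReplicaMeta cached = placementDriver.getCurrentPrimaryReplica(groupId, now);

    if (cached != null) {
        // Cache hit: no future chaining and, crucially, no orTimeout call.
        return CompletableFuture.completedFuture(cached);
    }

    // Miss: fall back to the slower, future-based wait (timeout is illustrative).
    return placementDriver.awaitPrimaryReplica(groupId, now, 10, TimeUnit.SECONDS);
}
{code}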



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-23246) Lease agreements might leak

2024-09-23 Thread Vladislav Pyatkov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-23246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladislav Pyatkov updated IGNITE-23246:
---
Description: 
h3. Motivation
Agreements are used to negotiate a lease, but unlike leases, the agreements are
stored only in memory.
{code}
/** Lease agreements which are in the process of negotiation. */
private final Map<ReplicationGroupId, LeaseAgreement> leaseToNegotiate;
{code}
These agreements exist only in the PD active actor, but in the case where the
active actor does not change for a long time, we would accumulate agreements
for removed replication groups.
Such an agreement is useless but consumes a lot of memory.

h3. Definition of done
Agreements should be deleted either when the replication group stops or when
the agreement expires.
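
A minimal sketch of the requested cleanup (field and accessor names are
hypothetical):
{code:java}
void onReplicationGroupStopped(ReplicationGroupId groupId) {
    leaseToNegotiate.remove(groupId); // negotiation is pointless for a stopped group
}

void evictExpiredAgreements(long nowMillis) {
    // Periodically drop agreements whose negotiation window has passed.
    leaseToNegotiate.values().removeIf(agreement -> agreement.expirationTime() < nowMillis);
}
{code}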

> Lease agreements might leak
> ---
>
> Key: IGNITE-23246
> URL: https://issues.apache.org/jira/browse/IGNITE-23246
> Project: Ignite
>  Issue Type: Bug
>Reporter: Vladislav Pyatkov
>Priority: Major
>  Labels: ignite-3
>
> h3. Motivation
> Agreements are used to negotiate a lease, but unlike leases, the agreements
> are stored only in memory.
> {code}
> /** Lease agreements which are in the process of negotiation. */
> private final Map<ReplicationGroupId, LeaseAgreement> leaseToNegotiate;
> {code}
> These agreements exist only in the PD active actor, but in the case where the
> active actor does not change for a long time, we would accumulate agreements
> for removed replication groups.
> Such an agreement is useless but consumes a lot of memory.
> h3. Definition of done
> Agreements should be deleted either when the replication group stops or when
> the agreement expires.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-23246) Lease agreements might leak

2024-09-20 Thread Vladislav Pyatkov (Jira)
Vladislav Pyatkov created IGNITE-23246:
--

 Summary: Lease agreements might leak
 Key: IGNITE-23246
 URL: https://issues.apache.org/jira/browse/IGNITE-23246
 Project: Ignite
  Issue Type: Bug
Reporter: Vladislav Pyatkov






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-23245) Placement driver has to provide only alive nodes

2024-09-20 Thread Vladislav Pyatkov (Jira)
Vladislav Pyatkov created IGNITE-23245:
--

 Summary: Placement driver has to provide only alive nodes
 Key: IGNITE-23245
 URL: https://issues.apache.org/jira/browse/IGNITE-23245
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladislav Pyatkov


h3. Motivation
LeasePlacementDriver returns only alive nodes. That guarantee is convenient to
use. But the AssignmentsPlacementDriver interface returns assignments as they
are in MS, and nodes in this result can be down.
Currently, the only pattern for using it is:
{code:java}
CompletableFuture<TokenizedAssignments> assignmentsFuture =
        placementDriver.getAssignments(tableGroupId, now);

assertThat(assignmentsFuture, willCompleteSuccessfully());

Set<Assignment> assignments = assignmentsFuture.join()
        .nodes()
        .stream()
        .filter(node -> logicalTopology.nodes.contains(node))
        .collect(toSet());
{code}
It is not that complicated for the PD to filter nodes internally because it has
a TopologyTracker.
h3. Definition of done

Rename both methods of AssignmentsPlacementDriver to getAliveAssignment
(getAliveAssignments) and change their behavior.

Only alive nodes have to be returned from the PD interface.
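
For illustration, a sketch of the internal filtering this enables (the
TopologyTracker accessor is assumed), replacing the caller-side pattern shown
above:
{code:java}
Set<Assignment> aliveNodes(TokenizedAssignments tokenizedAssignments) {
    Set<String> alive = topologyTracker.currentTopologyConsistentIds(); // hypothetical

    return tokenizedAssignments.nodes().stream()
            .filter(assignment -> alive.contains(assignment.consistentId()))
            .collect(Collectors.toSet());
}
{code}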



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (IGNITE-23207) Optimize read-only tx inflights for implicit gets and getAlls

2024-09-20 Thread Vladislav Pyatkov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-23207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17883248#comment-17883248
 ] 

Vladislav Pyatkov commented on IGNITE-23207:


As I understand it, the issue that is resolved by using inflights does not help
for SQL. SQL creates a transaction on one node and invokes operations on others.
Hence, we cannot linearize an operation and the closing of the transaction.
If the issue is confirmed, we *must create a ticket* for the SQL team.

> Optimize read-only tx inflights for implicit gets and getAlls
> -
>
> Key: IGNITE-23207
> URL: https://issues.apache.org/jira/browse/IGNITE-23207
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Denis Chudov
>Priority: Major
>  Labels: ignite-3
>
> *Motivation* 
> We need inflights counter for read-only transactions to make sure that tx 
> resources (specifically, cursors) will be cleaned up on remote nodes no 
> earlier than user finishes the tx. Seems that it's not necessary for implicit 
> read-only txns. Removing the usage of txn context may make them significantly 
> faster.
> *Definition of done*
> Inflights are not used for implicit read-only txns.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (IGNITE-23207) Optimize read-only tx inflights for implicit gets and getAlls

2024-09-20 Thread Vladislav Pyatkov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-23207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17883241#comment-17883241
 ] 

Vladislav Pyatkov commented on IGNITE-23207:


I do not think we can remove inflights right now, because a cursor is not a
thread-safe object even in an implicit transaction. What makes me doubt making
the cursor thread-safe is the fact that it is worse in all cases except
implicit transactions.
On the other hand, we can get rid of the inflight map in all cases (RW txns, RO
cursors):
{code:java}
TransactionInflights#txCtxMap
{code}
Just add a counter to the Ignite transaction. This is possible because the
transaction is passed to every operation.
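
A minimal sketch of such a counter (hypothetical fields on the transaction
object, not the current TransactionInflights code):
{code:java}
import java.util.concurrent.atomic.AtomicInteger;

class TxInflightsSketch {
    private final AtomicInteger inflights = new AtomicInteger();
    private volatile boolean finishing;

    boolean tryBeginOperation() {
        if (finishing) {
            return false; // no new operations once finish has started
        }
        // Note: a real implementation would fold 'finishing' and the counter
        // into one atomic state to close the race between check and increment.
        inflights.incrementAndGet();
        return true;
    }

    void endOperation() {
        inflights.decrementAndGet(); // finish() waits for this to reach zero
    }
}
{code}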

> Optimize read-only tx inflights for implicit gets and getAlls
> -
>
> Key: IGNITE-23207
> URL: https://issues.apache.org/jira/browse/IGNITE-23207
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Denis Chudov
>Priority: Major
>  Labels: ignite-3
>
> *Motivation* 
> We need inflights counter for read-only transactions to make sure that tx 
> resources (specifically, cursors) will be cleaned up on remote nodes no 
> earlier than user finishes the tx. Seems that it's not necessary for implicit 
> read-only txns. Removing the usage of txn context may make them significantly 
> faster.
> *Definition of done*
> Inflights are not used for implicit read-only txns.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (IGNITE-23225) Add logging to LogManagerImpl

2024-09-19 Thread Vladislav Pyatkov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-23225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882935#comment-17882935
 ] 

Vladislav Pyatkov commented on IGNITE-23225:


Merged 5d690bd7ed0ab932b3101f3103eaebf59ee32cf9

> Add logging to LogManagerImpl
> -
>
> Key: IGNITE-23225
> URL: https://issues.apache.org/jira/browse/IGNITE-23225
> Project: Ignite
>  Issue Type: Bug
>Reporter: Mirza Aliev
>Assignee: Mirza Aliev
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We have an idea why NPE happens in 
> https://issues.apache.org/jira/browse/IGNITE-23132, seems like 
> {{LastLogIdClosure}} from 
> {{org.apache.ignite.raft.jraft.storage.impl.LogManagerImpl#getLastLogId}} is 
> completed when {{LogManagerImpl.stopped}}, so 
> {{org.apache.ignite.raft.jraft.storage.impl.LogManagerImpl#offerEvent}} leads 
> to completing {{LastLogIdClosure}}  with {{RaftError.ESTOP}} and 
> {{org.apache.ignite.raft.jraft.storage.impl.LogManagerImpl.LastLogIdClosure#lastLogId}}
>  is not set. To check that, we want to add some logging and monitor TC.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (IGNITE-23081) Lease configuration

2024-09-17 Thread Vladislav Pyatkov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-23081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882332#comment-17882332
 ] 

Vladislav Pyatkov commented on IGNITE-23081:


Meregd 9d0fac024f84a909a85421138112560470083f15

> Lease configuration
> ---
>
> Key: IGNITE-23081
> URL: https://issues.apache.org/jira/browse/IGNITE-23081
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Vladislav Pyatkov
>Assignee:  Kirill Sizov
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> h3. Motivation
> A lease is a validity interval for the primary replica. Leases are
> distributed by a placement driver for a particular period. This period is
> hardcoded in the lease updater and cannot be configured.
> Another interval is an interval that is given by the placement driver to a 
> replica to apply a lease.
> {code:java}
> LeaseUpdater#LEASE_INTERVAL
> LeaseUpdater#longLeaseInterval
> {code}
> h3. Definition of done
> Both intervals have to be configurable through cluster configuration.
> *Implementation notes*
> Need to create a configuration extension that is a subclass of 
> ClusterConfigurationSchema where this configuration will be added to. Also 
> need better names for the provided properties.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (IGNITE-23081) Lease configuration

2024-09-17 Thread Vladislav Pyatkov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-23081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882332#comment-17882332
 ] 

Vladislav Pyatkov edited comment on IGNITE-23081 at 9/17/24 9:31 AM:
-

Merged 9d0fac024f84a909a85421138112560470083f15


was (Author: v.pyatkov):
Meregd 9d0fac024f84a909a85421138112560470083f15

> Lease configuration
> ---
>
> Key: IGNITE-23081
> URL: https://issues.apache.org/jira/browse/IGNITE-23081
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Vladislav Pyatkov
>Assignee:  Kirill Sizov
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> h3. Motivation
> A lease is a validity interval for the primary replica. Leases are
> distributed by a placement driver for a particular period. This period is
> hardcoded in the lease updater and cannot be configured.
> Another interval is an interval that is given by the placement driver to a 
> replica to apply a lease.
> {code:java}
> LeaseUpdater#LEASE_INTERVAL
> LeaseUpdater#longLeaseInterval
> {code}
> h3. Definition of done
> Both intervals have to be configurable through cluster configuration.
> *Implementation notes*
> Need to create a configuration extension that is a subclass of 
> ClusterConfigurationSchema where this configuration will be added to. Also 
> need better names for the provided properties.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (IGNITE-23213) Lease agreement can be overridden, but MS not updated

2024-09-16 Thread Vladislav Pyatkov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-23213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882172#comment-17882172
 ] 

Vladislav Pyatkov commented on IGNITE-23213:


If we fix the issue, a modification of the lease interval ought to work stably
on TC.
https://ci.ignite.apache.org/buildConfiguration/ApacheIgnite3xGradle_Test_RunNetTests/8471370?expandBuildDeploymentsSection=false&hideTestsFromDependencies=false&hideProblemsFromDependencies=false&expandBuildTestsSection=true


> Lease agreement can be overridden, but MS not updated
> -
>
> Key: IGNITE-23213
> URL: https://issues.apache.org/jira/browse/IGNITE-23213
> Project: Ignite
>  Issue Type: Bug
>Reporter: Vladislav Pyatkov
>Priority: Major
>  Labels: ignite-3
>
> h3. Motivation
> The placement driver creates an agreement to start negotiation about the
> primary replica lease. Then a lease is created after a replica sends an
> acknowledgement of this agreement.
> We assume this process is linearized through the MS. But agreements are not
> stored in MS and might be updated in a local lease update process, so we can
> apply a lease that does not match the current agreement:
> {noformat}
> Caused by: 
> org.apache.ignite.internal.replicator.exception.PrimaryReplicaMissException: 
> IGN-REP-6 TraceId:2f960fa5-b393-40f4-8066-9f9f4b539f07 The primary replica 
> has changed [txId=0191ea41-2f04-0004-2a17-e67f0001, 
> expectedEnlistmentConsistencyToken=113129031393542406, 
> currentEnlistmentConsistencyToken=113129031360315925].
>   at 
> org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.lambda$applyUpdateAllCommand$124(PartitionReplicaListener.java:2891)
>   at 
> java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1072)
>   at 
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
>   at 
> java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073)
>   at 
> org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.lambda$applyCmdWithRetryOnSafeTimeReorderException$117(PartitionReplicaListener.java:2683)
>   at 
> java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
>   at 
> java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
>   at 
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
>   at 
> java.base/java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:610)
>   at 
> java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:649)
>   at 
> java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478)
>   ... 3 more
> {noformat}
> h3. Definition of done
> The lease should match the agreement that was sent to the remote replica.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-23213) Lease agreement can be overridden, but MS not updated

2024-09-16 Thread Vladislav Pyatkov (Jira)
Vladislav Pyatkov created IGNITE-23213:
--

 Summary: Lease agreement can be overridden, but MS not updated
 Key: IGNITE-23213
 URL: https://issues.apache.org/jira/browse/IGNITE-23213
 Project: Ignite
  Issue Type: Bug
Reporter: Vladislav Pyatkov


h3. Motivation
The placement driver creates an agreement to start negotiation about the primary
replica lease. Then a lease is created after a replica sends an acknowledgement
of this agreement.
We assume this process is linearized through the MS. But agreements are not
stored in MS and might be updated in a local lease update process, so we can
apply a lease that does not match the current agreement:
{noformat}
Caused by: 
org.apache.ignite.internal.replicator.exception.PrimaryReplicaMissException: 
IGN-REP-6 TraceId:2f960fa5-b393-40f4-8066-9f9f4b539f07 The primary replica has 
changed [txId=0191ea41-2f04-0004-2a17-e67f0001, 
expectedEnlistmentConsistencyToken=113129031393542406, 
currentEnlistmentConsistencyToken=113129031360315925].
  at 
org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.lambda$applyUpdateAllCommand$124(PartitionReplicaListener.java:2891)
  at 
java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1072)
  at 
java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
  at 
java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073)
  at 
org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.lambda$applyCmdWithRetryOnSafeTimeReorderException$117(PartitionReplicaListener.java:2683)
  at 
java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
  at 
java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
  at 
java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
  at 
java.base/java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:610)
  at 
java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:649)
  at 
java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478)
  ... 3 more
{noformat}

h3. Definition of done
The lease should match the agreement that was sent to the remote replica.
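
A sketch of the proposed guard (accessor names are hypothetical): before
publishing a lease, re-check it against the agreement that was actually sent
to the replica.
{code:java}
boolean matchesAgreement(Lease candidate, LeaseAgreement agreement) {
    Lease negotiated = agreement.getLease(); // hypothetical accessor

    // Publish only a lease whose holder and start time are the ones the
    // remote replica acknowledged; otherwise restart negotiation.
    return negotiated.getLeaseholder().equals(candidate.getLeaseholder())
            && negotiated.getStartTime().equals(candidate.getStartTime());
}
{code}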



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (IGNITE-23125) AssertionError: Unexpected primary replica state STOPPED in ItRestartPartitionsCommandTest.testRestartAllPartitions

2024-09-13 Thread Vladislav Pyatkov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-23125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17881528#comment-17881528
 ] 

Vladislav Pyatkov commented on IGNITE-23125:


Merged 2c4900ca7cb77834015638fee2da78ebe68e7369

> AssertionError: Unexpected primary replica state STOPPED in 
> ItRestartPartitionsCommandTest.testRestartAllPartitions
> ---
>
> Key: IGNITE-23125
> URL: https://issues.apache.org/jira/browse/IGNITE-23125
> Project: Ignite
>  Issue Type: Bug
>Reporter: Denis Chudov
>Assignee: Denis Chudov
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> This assertion happens when ReplicaStateManager gets an event notification 
> about a primary replica being elected when the replica is already stopped.
> Usually, the replica is reserved as primary at the moment this notification is
> received (and there is a clear HB), and cannot be stopped due to an assignments change 
> if it is reserved as primary. But it can be stopped because of replica 
> restart (which happens in this test) and this assertion doesn't take into 
> account this reason.
> It seems that this assertion is invalid and can be removed.
> Another solution may be adding a special "restarting" flag to know that the
> primary is being restarted while asserting.
> See ReplicaManager.ReplicaStateManager#onPrimaryElected.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (IGNITE-22833) Test DistributionZoneCausalityDataNodesTest.testEmptyDataNodesOnZoneCreationBeforeTopologyEventAndZoneInitialisation is flaky

2024-09-12 Thread Vladislav Pyatkov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-22833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17881374#comment-17881374
 ] 

Vladislav Pyatkov commented on IGNITE-22833:


Merged cc2323d72d1873e016da53f09c5e13818be4e5da

> Test 
> DistributionZoneCausalityDataNodesTest.testEmptyDataNodesOnZoneCreationBeforeTopologyEventAndZoneInitialisation
>  is flaky
> -
>
> Key: IGNITE-22833
> URL: https://issues.apache.org/jira/browse/IGNITE-22833
> Project: Ignite
>  Issue Type: Bug
>Reporter: Vladislav Pyatkov
>Assignee: Vladislav Pyatkov
>Priority: Major
>  Labels: ignite-3
> Attachments: 
> _Run_Unit_Tests_auto-generated_Run_Unit_Tests_3_30973.log.zip
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> {noformat}
> [10:10:14] : [:ignite-distribution-zones:test] 
> DistributionZoneCausalityDataNodesTest > 
> testEmptyDataNodesOnZoneCreationBeforeTopologyEventAndZoneInitialisation(int, 
> int) > [1] 1, 1 STANDARD_ERROR
> [10:10:14] : [:ignite-distribution-zones:test] 
> [2024-07-23T10:10:14,100][WARN 
> ][%test%metastorage-watch-executor-2][UpdateLogImpl] Unable to process 
> catalog event
> [10:10:14] : [:ignite-distribution-zones:test] 
> java.lang.NullPointerException: null
> [10:10:14] : [:ignite-distribution-zones:test]  at 
> org.apache.ignite.internal.distributionzones.DistributionZoneManager.initDataNodesAndTriggerKeysInMetaStorage(DistributionZoneManager.java:527)
>  ~[ignite-distribution-zones-3.0.0-SNAPSHOT.jar:?]
> [10:10:14] : [:ignite-distribution-zones:test]  at 
> org.apache.ignite.internal.distributionzones.DistributionZoneManager.onCreateZone(DistributionZoneManager.java:456)
>  ~[ignite-distribution-zones-3.0.0-SNAPSHOT.jar:?]
> [10:10:14] : [:ignite-distribution-zones:test]  at 
> org.apache.ignite.internal.distributionzones.DistributionZoneManager.lambda$registerCatalogEventListenersOnStartManagerBusy$37(DistributionZoneManager.java:1396)
>  ~[ignite-distribution-zones-3.0.0-SNAPSHOT.jar:?]
> [10:10:14] : [:ignite-distribution-zones:test]  at 
> org.apache.ignite.internal.util.IgniteUtils.inBusyLock(IgniteUtils.java:832) 
> ~[ignite-core-3.0.0-SNAPSHOT.jar:?]
> [10:10:14] : [:ignite-distribution-zones:test]  at 
> org.apache.ignite.internal.distributionzones.DistributionZoneManager.lambda$registerCatalogEventListenersOnStartManagerBusy$38(DistributionZoneManager.java:1395)
>  ~[ignite-distribution-zones-3.0.0-SNAPSHOT.jar:?]
> [10:10:14] : [:ignite-distribution-zones:test]  at 
> org.apache.ignite.internal.event.AbstractEventProducer.fireEvent(AbstractEventProducer.java:88)
>  ~[ignite-core-3.0.0-SNAPSHOT.jar:?]
> [10:10:14] : [:ignite-distribution-zones:test]  at 
> org.apache.ignite.internal.catalog.CatalogManagerImpl.access$000(CatalogManagerImpl.java:81)
>  ~[ignite-catalog-3.0.0-SNAPSHOT.jar:?]
> [10:10:14] : [:ignite-distribution-zones:test]  at 
> org.apache.ignite.internal.catalog.CatalogManagerImpl$OnUpdateHandlerImpl.handle(CatalogManagerImpl.java:577)
>  ~[ignite-catalog-3.0.0-SNAPSHOT.jar:?]
> [10:10:14] : [:ignite-distribution-zones:test]  at 
> org.apache.ignite.internal.catalog.CatalogManagerImpl$OnUpdateHandlerImpl.handle(CatalogManagerImpl.java:544)
>  ~[ignite-catalog-3.0.0-SNAPSHOT.jar:?]
> [10:10:14] : [:ignite-distribution-zones:test]  at 
> org.apache.ignite.internal.catalog.storage.UpdateLogImpl$UpdateListener.onUpdate(UpdateLogImpl.java:320)
>  ~[ignite-catalog-3.0.0-SNAPSHOT.jar:?]
> [10:10:14] : [:ignite-distribution-zones:test]  at 
> org.apache.ignite.internal.metastorage.server.Watch.onUpdate(Watch.java:67) 
> ~[ignite-metastorage-3.0.0-SNAPSHOT.jar:?]
> [10:10:14] : [:ignite-distribution-zones:test]  at 
> org.apache.ignite.internal.metastorage.server.WatchProcessor.notifyWatches(WatchProcessor.java:245)
>  ~[ignite-metastorage-3.0.0-SNAPSHOT.jar:?]
> [10:10:14] : [:ignite-distribution-zones:test]  at 
> org.apache.ignite.internal.metastorage.server.WatchProcessor.lambda$notifyWatches$4(WatchProcessor.java:193)
>  ~[ignite-metastorage-3.0.0-SNAPSHOT.jar:?]
> [10:10:14] : [:ignite-distribution-zones:test]  at 
> java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1072)
>  [?:?]
> [10:10:14] : [:ignite-distribution-zones:test]  at 
> java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478)
>  [?:?]
> [10:10:14] : [:ignite-distribution-zones:test]  at 
> java.base/java.util

[jira] [Commented] (IGNITE-23075) Call failure processor on timeout worker crash

2024-09-12 Thread Vladislav Pyatkov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-23075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17881364#comment-17881364
 ] 

Vladislav Pyatkov commented on IGNITE-23075:


Merged da5703bc0bd5aef3fce855aa6002e6bb49ca2865

> Call failure processor on timeout worker crash
> --
>
> Key: IGNITE-23075
> URL: https://issues.apache.org/jira/browse/IGNITE-23075
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Vladislav Pyatkov
>Assignee: Denis Chudov
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> h3. Motivation
> For the majority of the cluster transaction operation, we use a Timeout 
> worker. It is a single thread that completes futures with a timeout 
> exception. But if the thread stops (due to an unhandled exception) no more 
> operations can time out.
> h3. Definition of done
> The failure processor has to be called in the catch block of the timeout
> worker.
> {code:java}
> } catch (Throwable t) {
>     failureProcessor.process(new FailureContext(SYSTEM_WORKER_TERMINATION, t));
>
>     throw new IgniteInternalException(t);
> }
> {code}
> *Implementation notes*
> It would be also nice to add tests for timeout worker.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (IGNITE-22980) Lock manager may fail and lock waiter simultaneously

2024-09-12 Thread Vladislav Pyatkov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-22980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17881297#comment-17881297
 ] 

Vladislav Pyatkov commented on IGNITE-22980:


Merged 46113b5d4db7034ff559402215e26665918bec62

> Lock manager may fail and lock waiter simultaneously
> 
>
> Key: IGNITE-22980
> URL: https://issues.apache.org/jira/browse/IGNITE-22980
> Project: Ignite
>  Issue Type: Bug
>Reporter: Vladislav Pyatkov
>Assignee: Vladislav Pyatkov
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> h3. Motivation
> The behavior was hardly predictable or planned. But currently, we can acquire a 
> lock:
> {code:java}
> private void lock() {
> lockMode = intendedLockMode;
> intendedLockMode = null;
> intendedLocks.clear();
> }
> {code}
> and made the waiter fail:
> {code:java}
> private void fail(LockException e) {
> ex = e;
> }
> {code}
> without restriction (assertion checking or explicit prohibition).
> Scenario:
>  * tx1 tries to acquire a lock and finds conflicting transaction tx2;
>  * lock manager tries to check the state and coordinator of tx2;
>  * coordinator of tx2 has left, so TxRecoveryMessage is sent;
>  * the primary replica of commit partition of tx2 is on the same node, so 
> TxRecoveryMessage is sent locally. It also triggers the tx recovery, so tx2 
> is finished and tx cleanup is performed locally. All of this happens in the 
> same thread, and during txn cleanup the locks of tx2 are released;
>  * the release of locks of tx2 allows the conflicting waiter of tx1 to 
> acquire a lock;
>  * the processing of conflicting transaction continues and #fail is called on 
> the same waiter.
> There is also another problem: tx recovery shouldn't happen within 
> synchronized block of HeapLockManager. It can be moved to another pool, and 
> this also won't allow the tx recovery, which releases the locks, to grant 
> lock for waiter of tx1.
> h3. Definition of done
> * Only one method can be applied to a lock attempt, either lock() or fail(), 
> but not both. Do not forget, a retry attempt may be successful even though 
> the previous attempt failed. Also, there are cases of lock upgrade: S-lock 
> can be taken, but attempt to upgrade it to X-lock can fail, there will be 
> another lock future and it will be completed exceptionally, meanwhile S-lock 
> would be still active;
>  * tx recovery is not executed synchronously within synchronized block of 
> HeapLockManager.
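
A sketch of the single-outcome rule from the first point (the waiter internals
are hypothetical): each lock attempt resolves at most once, so a recovery path
releasing conflicting locks cannot fail a waiter that was already granted.
{code:java}
import java.util.concurrent.atomic.AtomicReference;

class LockAttemptSketch {
    private enum Outcome { GRANTED, FAILED }

    private final AtomicReference<Outcome> outcome = new AtomicReference<>();

    boolean lock() {
        return outcome.compareAndSet(null, Outcome.GRANTED); // no-op if already failed
    }

    boolean fail(RuntimeException e) {
        return outcome.compareAndSet(null, Outcome.FAILED);  // no-op if already granted
    }
}
{code}
A retry creates a new attempt object, so a failed attempt does not prevent a
later successful one, matching the first point above.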



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (IGNITE-22842) The client returns to the fork-join pool after handling operations on the server side

2024-09-12 Thread Vladislav Pyatkov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladislav Pyatkov reassigned IGNITE-22842:
--

Assignee: Vladislav Pyatkov

> The client returns to the fork-join pool after handling operations on the 
> server side
> -
>
> Key: IGNITE-22842
> URL: https://issues.apache.org/jira/browse/IGNITE-22842
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Vladislav Pyatkov
>Assignee: Vladislav Pyatkov
>Priority: Major
>  Labels: ignite-3
>
> h3. Motivation
> In synchronous operation, we return the result to the fork join pool before 
> providing the result to the client in their pool. Of course, the trick costs
> several microseconds.
> {noformat}
> loadSchemaAndReadData:ForkJoinPool.commonPool-worker-9 0.2 9950400 9950600
> Here is hidden 5.2 us
> kvClientPutEndMark:org.apache.ignite.internal.benchmark.InsertBenchmark.kvThinInsert-jmh-worker-1
>  0.0 9955800 9955800
> {noformat}
> The issue is similar to IGNITE-22838, but about the type of operation that
> is started on the client side.
> h3. Definition of done
> We need to develop a strategy that does not involve the extra thread in 
> synchronous operation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (IGNITE-22833) Test DistributionZoneCausalityDataNodesTest.testEmptyDataNodesOnZoneCreationBeforeTopologyEventAndZoneInitialisation is flaky

2024-09-12 Thread Vladislav Pyatkov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-22833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17881205#comment-17881205
 ] 

Vladislav Pyatkov commented on IGNITE-22833:


[~alapin] Please review.

> Test 
> DistributionZoneCausalityDataNodesTest.testEmptyDataNodesOnZoneCreationBeforeTopologyEventAndZoneInitialisation
>  is flaky
> -
>
> Key: IGNITE-22833
> URL: https://issues.apache.org/jira/browse/IGNITE-22833
> Project: Ignite
>  Issue Type: Bug
>Reporter: Vladislav Pyatkov
>Assignee: Vladislav Pyatkov
>Priority: Major
>  Labels: ignite-3
> Attachments: 
> _Run_Unit_Tests_auto-generated_Run_Unit_Tests_3_30973.log.zip
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {noformat}
> [10:10:14] : [:ignite-distribution-zones:test] 
> DistributionZoneCausalityDataNodesTest > 
> testEmptyDataNodesOnZoneCreationBeforeTopologyEventAndZoneInitialisation(int, 
> int) > [1] 1, 1 STANDARD_ERROR
> [10:10:14] : [:ignite-distribution-zones:test] 
> [2024-07-23T10:10:14,100][WARN 
> ][%test%metastorage-watch-executor-2][UpdateLogImpl] Unable to process 
> catalog event
> [10:10:14] : [:ignite-distribution-zones:test] 
> java.lang.NullPointerException: null
> [10:10:14] : [:ignite-distribution-zones:test]  at 
> org.apache.ignite.internal.distributionzones.DistributionZoneManager.initDataNodesAndTriggerKeysInMetaStorage(DistributionZoneManager.java:527)
>  ~[ignite-distribution-zones-3.0.0-SNAPSHOT.jar:?]
> [10:10:14] : [:ignite-distribution-zones:test]  at 
> org.apache.ignite.internal.distributionzones.DistributionZoneManager.onCreateZone(DistributionZoneManager.java:456)
>  ~[ignite-distribution-zones-3.0.0-SNAPSHOT.jar:?]
> [10:10:14] : [:ignite-distribution-zones:test]  at 
> org.apache.ignite.internal.distributionzones.DistributionZoneManager.lambda$registerCatalogEventListenersOnStartManagerBusy$37(DistributionZoneManager.java:1396)
>  ~[ignite-distribution-zones-3.0.0-SNAPSHOT.jar:?]
> [10:10:14] : [:ignite-distribution-zones:test]  at 
> org.apache.ignite.internal.util.IgniteUtils.inBusyLock(IgniteUtils.java:832) 
> ~[ignite-core-3.0.0-SNAPSHOT.jar:?]
> [10:10:14] : [:ignite-distribution-zones:test]  at 
> org.apache.ignite.internal.distributionzones.DistributionZoneManager.lambda$registerCatalogEventListenersOnStartManagerBusy$38(DistributionZoneManager.java:1395)
>  ~[ignite-distribution-zones-3.0.0-SNAPSHOT.jar:?]
> [10:10:14] : [:ignite-distribution-zones:test]  at 
> org.apache.ignite.internal.event.AbstractEventProducer.fireEvent(AbstractEventProducer.java:88)
>  ~[ignite-core-3.0.0-SNAPSHOT.jar:?]
> [10:10:14] : [:ignite-distribution-zones:test]  at 
> org.apache.ignite.internal.catalog.CatalogManagerImpl.access$000(CatalogManagerImpl.java:81)
>  ~[ignite-catalog-3.0.0-SNAPSHOT.jar:?]
> [10:10:14] : [:ignite-distribution-zones:test]  at 
> org.apache.ignite.internal.catalog.CatalogManagerImpl$OnUpdateHandlerImpl.handle(CatalogManagerImpl.java:577)
>  ~[ignite-catalog-3.0.0-SNAPSHOT.jar:?]
> [10:10:14] : [:ignite-distribution-zones:test]  at 
> org.apache.ignite.internal.catalog.CatalogManagerImpl$OnUpdateHandlerImpl.handle(CatalogManagerImpl.java:544)
>  ~[ignite-catalog-3.0.0-SNAPSHOT.jar:?]
> [10:10:14] : [:ignite-distribution-zones:test]  at 
> org.apache.ignite.internal.catalog.storage.UpdateLogImpl$UpdateListener.onUpdate(UpdateLogImpl.java:320)
>  ~[ignite-catalog-3.0.0-SNAPSHOT.jar:?]
> [10:10:14] : [:ignite-distribution-zones:test]  at 
> org.apache.ignite.internal.metastorage.server.Watch.onUpdate(Watch.java:67) 
> ~[ignite-metastorage-3.0.0-SNAPSHOT.jar:?]
> [10:10:14] : [:ignite-distribution-zones:test]  at 
> org.apache.ignite.internal.metastorage.server.WatchProcessor.notifyWatches(WatchProcessor.java:245)
>  ~[ignite-metastorage-3.0.0-SNAPSHOT.jar:?]
> [10:10:14] : [:ignite-distribution-zones:test]  at 
> org.apache.ignite.internal.metastorage.server.WatchProcessor.lambda$notifyWatches$4(WatchProcessor.java:193)
>  ~[ignite-metastorage-3.0.0-SNAPSHOT.jar:?]
> [10:10:14] : [:ignite-distribution-zones:test]  at 
> java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1072)
>  [?:?]
> [10:10:14] : [:ignite-distribution-zones:test]  at 
> java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478)
>  [?:?]
> [10:10:14] : [:ignite-distribution-zones:test]  at 
> java.base/java.util.concurrent.ThreadPoolE

[jira] [Updated] (IGNITE-22833) Test DistributionZoneCausalityDataNodesTest.testEmptyDataNodesOnZoneCreationBeforeTopologyEventAndZoneInitialisation is flaky

2024-09-12 Thread Vladislav Pyatkov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladislav Pyatkov updated IGNITE-22833:
---
Reviewer: Alexander Lapin

> Test 
> DistributionZoneCausalityDataNodesTest.testEmptyDataNodesOnZoneCreationBeforeTopologyEventAndZoneInitialisation
>  is flaky
> -
>
> Key: IGNITE-22833
> URL: https://issues.apache.org/jira/browse/IGNITE-22833
> Project: Ignite
>  Issue Type: Bug
>Reporter: Vladislav Pyatkov
>Assignee: Vladislav Pyatkov
>Priority: Major
>  Labels: ignite-3
> Attachments: 
> _Run_Unit_Tests_auto-generated_Run_Unit_Tests_3_30973.log.zip
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {noformat}
> [10:10:14] : [:ignite-distribution-zones:test] 
> DistributionZoneCausalityDataNodesTest > 
> testEmptyDataNodesOnZoneCreationBeforeTopologyEventAndZoneInitialisation(int, 
> int) > [1] 1, 1 STANDARD_ERROR
> [10:10:14] : [:ignite-distribution-zones:test] 
> [2024-07-23T10:10:14,100][WARN 
> ][%test%metastorage-watch-executor-2][UpdateLogImpl] Unable to process 
> catalog event
> [10:10:14] : [:ignite-distribution-zones:test] 
> java.lang.NullPointerException: null
> [10:10:14] : [:ignite-distribution-zones:test]  at 
> org.apache.ignite.internal.distributionzones.DistributionZoneManager.initDataNodesAndTriggerKeysInMetaStorage(DistributionZoneManager.java:527)
>  ~[ignite-distribution-zones-3.0.0-SNAPSHOT.jar:?]
> [10:10:14] : [:ignite-distribution-zones:test]  at 
> org.apache.ignite.internal.distributionzones.DistributionZoneManager.onCreateZone(DistributionZoneManager.java:456)
>  ~[ignite-distribution-zones-3.0.0-SNAPSHOT.jar:?]
> [10:10:14] : [:ignite-distribution-zones:test]  at 
> org.apache.ignite.internal.distributionzones.DistributionZoneManager.lambda$registerCatalogEventListenersOnStartManagerBusy$37(DistributionZoneManager.java:1396)
>  ~[ignite-distribution-zones-3.0.0-SNAPSHOT.jar:?]
> [10:10:14] : [:ignite-distribution-zones:test]  at 
> org.apache.ignite.internal.util.IgniteUtils.inBusyLock(IgniteUtils.java:832) 
> ~[ignite-core-3.0.0-SNAPSHOT.jar:?]
> [10:10:14] : [:ignite-distribution-zones:test]  at 
> org.apache.ignite.internal.distributionzones.DistributionZoneManager.lambda$registerCatalogEventListenersOnStartManagerBusy$38(DistributionZoneManager.java:1395)
>  ~[ignite-distribution-zones-3.0.0-SNAPSHOT.jar:?]
> [10:10:14] : [:ignite-distribution-zones:test]  at 
> org.apache.ignite.internal.event.AbstractEventProducer.fireEvent(AbstractEventProducer.java:88)
>  ~[ignite-core-3.0.0-SNAPSHOT.jar:?]
> [10:10:14] : [:ignite-distribution-zones:test]  at 
> org.apache.ignite.internal.catalog.CatalogManagerImpl.access$000(CatalogManagerImpl.java:81)
>  ~[ignite-catalog-3.0.0-SNAPSHOT.jar:?]
> [10:10:14] : [:ignite-distribution-zones:test]  at 
> org.apache.ignite.internal.catalog.CatalogManagerImpl$OnUpdateHandlerImpl.handle(CatalogManagerImpl.java:577)
>  ~[ignite-catalog-3.0.0-SNAPSHOT.jar:?]
> [10:10:14] : [:ignite-distribution-zones:test]  at 
> org.apache.ignite.internal.catalog.CatalogManagerImpl$OnUpdateHandlerImpl.handle(CatalogManagerImpl.java:544)
>  ~[ignite-catalog-3.0.0-SNAPSHOT.jar:?]
> [10:10:14] : [:ignite-distribution-zones:test]  at 
> org.apache.ignite.internal.catalog.storage.UpdateLogImpl$UpdateListener.onUpdate(UpdateLogImpl.java:320)
>  ~[ignite-catalog-3.0.0-SNAPSHOT.jar:?]
> [10:10:14] : [:ignite-distribution-zones:test]  at 
> org.apache.ignite.internal.metastorage.server.Watch.onUpdate(Watch.java:67) 
> ~[ignite-metastorage-3.0.0-SNAPSHOT.jar:?]
> [10:10:14] : [:ignite-distribution-zones:test]  at 
> org.apache.ignite.internal.metastorage.server.WatchProcessor.notifyWatches(WatchProcessor.java:245)
>  ~[ignite-metastorage-3.0.0-SNAPSHOT.jar:?]
> [10:10:14] : [:ignite-distribution-zones:test]  at 
> org.apache.ignite.internal.metastorage.server.WatchProcessor.lambda$notifyWatches$4(WatchProcessor.java:193)
>  ~[ignite-metastorage-3.0.0-SNAPSHOT.jar:?]
> [10:10:14] : [:ignite-distribution-zones:test]  at 
> java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1072)
>  [?:?]
> [10:10:14] : [:ignite-distribution-zones:test]  at 
> java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478)
>  [?:?]
> [10:10:14] : [:ignite-distribution-zones:test]  at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>  [?

[jira] [Commented] (IGNITE-23117) Substitute systemUTC with currentTimeMillis in HybridClockImpl

2024-09-09 Thread Vladislav Pyatkov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-23117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17880211#comment-17880211
 ] 

Vladislav Pyatkov commented on IGNITE-23117:


I ran the patch 8 times, and it failed in two tests across all runs.
Neither failure is connected with this change. Therefore, I am sure this PR is
ready for review.

> Substitute systemUTC with currentTimeMillis in HybridClockImpl
> --
>
> Key: IGNITE-23117
> URL: https://issues.apache.org/jira/browse/IGNITE-23117
> Project: Ignite
>  Issue Type: Improvement
>Affects Versions: 3.0
>Reporter: Alexander Lapin
>Assignee: Vladislav Pyatkov
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> h3. Motivation
> For more details please see https://issues.apache.org/jira/browse/IGNITE-23049
> h3. Definition of Done
>  * It's required to substitute current 
> org.apache.ignite.internal.hlc.HybridClockImpl#currentTime 
> {code:java}
> private static long currentTime() {
> return systemUTC().instant().toEpochMilli() << LOGICAL_TIME_BITS_SIZE;
> }{code}
> with 
> {code:java}
> private static long currentTime() {
> return System.currentTimeMillis() << LOGICAL_TIME_BITS_SIZE;
> }{code}
>  * And adjust mocks in HybridClockTest.
>  * And of course ensure that currentTimeMillis based implementation is stable 
> by multiple TC runs.
> h3. Implementation Notes
> The only non-trivial part here is mock adjustments in the test.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-23082) Reduce the leases object in Meta storage

2024-08-27 Thread Vladislav Pyatkov (Jira)
Vladislav Pyatkov created IGNITE-23082:
--

 Summary: Reduce the leases object in Meta storage
 Key: IGNITE-23082
 URL: https://issues.apache.org/jira/browse/IGNITE-23082
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladislav Pyatkov


h3. Motivation
Currently, we store all leases in one MS entry. In the case where we have many
leases, that object becomes too large.

h3. Definition of done
Limit the object size of one MS entry.
Probably, the only way to reach this is to divide the full lease set.
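
One possible shape, as a sketch only (the sharding scheme is entirely
hypothetical): spread the lease set over several MS keys so that no single
entry grows with the total number of replication groups.
{code:java}
ByteArray leasesKey(ReplicationGroupId groupId, int shardCount) {
    int shard = Math.floorMod(groupId.hashCode(), shardCount);
    return ByteArray.fromString("placementdriver.leases." + shard);
}
{code}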



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-23081) Lease configuration

2024-08-27 Thread Vladislav Pyatkov (Jira)
Vladislav Pyatkov created IGNITE-23081:
--

 Summary: Lease configuration
 Key: IGNITE-23081
 URL: https://issues.apache.org/jira/browse/IGNITE-23081
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladislav Pyatkov


h3. Motivation
A lease is a validity interval for the primary replica. Leases are distributed
by a placement driver for a particular period. This period is hardcoded in the
lease updater and cannot be configured.
Another interval is an interval that is given by the placement driver to a 
replica to apply a lease.
{code:java}
LeaseUpdater#LEASE_INTERVAL
LeaseUpdater#longLeaseInterval
{code}

h3. Definition of done
Both intervals have to be configurable through cluster configuration.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (IGNITE-20869) CompletableFuture with orTimeout has noticeable performance impact

2024-08-27 Thread Vladislav Pyatkov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-20869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17877023#comment-17877023
 ] 

Vladislav Pyatkov commented on IGNITE-20869:


Merged ecbe35e102fad3cdd7608aa8f0c3845862c0ed46

> CompletableFuture with orTimeout has noticeable performance impact
> --
>
> Key: IGNITE-20869
> URL: https://issues.apache.org/jira/browse/IGNITE-20869
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Kirill Gusakov
>Assignee: Vladislav Pyatkov
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
> Attachments: screenshot-1.png
>
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> h3. Motivation
> [This line|https://github.com/apache/ignite-3/blob/82c74598b5006ea3e4e86da744a68022dd799c89/modules/network/src/main/java/org/apache/ignite/network/DefaultMessagingService.java#L254-L255]
> of code costs us nearly 3us of the 26us of the whole query (~10%). The
> reason is the orTimeout(...) call.
> A simple switch to a plain new CompletableFuture() gives us a 10% boost,
> from 26us to 23us, in the 1-node fsync=false run of SelectBenchmark.kvGet.
> {code}
> responseFuture.orTimeout(timeout, TimeUnit.MILLISECONDS);
> {code}
> The code spends 3 microseconds or more.
> The note above is absolutely correct and appropriate for the put operation as
> well.
> Next, I want to note all the places where `orTimeout` is used on the hot
> path:
> * _LeaseTracker#awaitPrimaryReplica_ Although at this point the future is
> usually already completed, we call `orTimeout` anyway.
> * _DefaultMessagingService#invoke0_ It is not a network timeout (despite the
> fact that it is set in the network layer), but this is an important timeout
> for application code.
> * The _IgniteRpcClient#invokeAsync_ Raft retry timeout is set here.
> * _TcpClientChannel#serviceAsync_ Thin client operation timeout.
> h3. Implementation notes
> I compared an `orTimeout` invocation with a timeout worker implementation
> based on a dedicated thread and a concurrent queue:
> {code}
> Thread t = new Thread(() -> {
> TimeoutObject o;
> while (true) {
> o = queue.poll();
> if (o == null) {
> try {
> Thread.sleep(200);
> } catch (InterruptedException e) {
> throw new RuntimeException(e);
> }
> continue;
> }
> if (System.currentTimeMillis() > o.timeout) {
> o.onTimeout();
> } else {
> queue.add(o);
> }
> }
> }, "timout-worker");
> {code}
> !screenshot-1.png|height=250,width=250!
> h3. Definition of done
> `orTimeout` method is not used in the operation hot (put/get e.t.c) path.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-23076) Start single timeout worker thread for several clients in one JVM

2024-08-26 Thread Vladislav Pyatkov (Jira)
Vladislav Pyatkov created IGNITE-23076:
--

 Summary: Start single timeout worker thread for several clients in
one JVM
 Key: IGNITE-23076
 URL: https://issues.apache.org/jira/browse/IGNITE-23076
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladislav Pyatkov


h3. Motivation
Currently, each client starts own timeout worker:
{code:java}
this.timeoutWorker = new TimeoutWorker(
        log,
        cfg.getAddress().getHostString() + cfg.getAddress().getPort(),
        "TcpClientChannel-timeout-worker",
        pendingReqs,
        // Client-facing future will fail with a timeout, but internal ClientRequestFuture will stay in the map -
        // otherwise we'll fail with "protocol breakdown" error when a late response arrives from the server.
        false
);

new IgniteThread(timeoutWorker).start();
{code}
but when several clients are started in one JVM, a single timeout worker would 
be enough.

h3. Definition of done
All clients started in one JVM share a single timeout worker.
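
A minimal sketch of one possible shape for this, where clients register the requests they are waiting on in a JVM-wide registry scanned by a single daemon thread; the names here (SharedTimeoutWorker, TimeoutObject, register/unregister) are illustrative assumptions, not the actual client API:
{code:java}
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative JVM-wide worker: clients register their pending-request views
// instead of each client starting a private thread.
final class SharedTimeoutWorker {
    // Minimal shape of a tracked request; stands in for ClientRequestFuture.
    // onTimeout() is expected to be idempotent, since completed requests may
    // still be visible to the scan until they are removed.
    interface TimeoutObject {
        long deadlineMillis();

        void onTimeout();
    }

    private static final Set<Iterable<? extends TimeoutObject>> CLIENTS = ConcurrentHashMap.newKeySet();

    private static volatile Thread worker;

    static synchronized void register(Iterable<? extends TimeoutObject> pendingReqs) {
        CLIENTS.add(pendingReqs);

        if (worker == null) {
            // Lazily start the single shared thread on first registration.
            worker = new Thread(SharedTimeoutWorker::scanLoop, "shared-timeout-worker");
            worker.setDaemon(true);
            worker.start();
        }
    }

    static void unregister(Iterable<? extends TimeoutObject> pendingReqs) {
        CLIENTS.remove(pendingReqs);
    }

    private static void scanLoop() {
        while (!Thread.currentThread().isInterrupted()) {
            long now = System.currentTimeMillis();

            for (Iterable<? extends TimeoutObject> reqs : CLIENTS) {
                for (TimeoutObject o : reqs) {
                    if (now > o.deadlineMillis()) {
                        o.onTimeout();
                    }
                }
            }

            try {
                Thread.sleep(200); // Same coarse resolution as the per-client worker.
            } catch (InterruptedException e) {
                return;
            }
        }
    }
}
{code}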



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-23075) Call failure processor during a timeout worker crash

2024-08-26 Thread Vladislav Pyatkov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-23075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladislav Pyatkov updated IGNITE-23075:
---
Labels: ignite-3  (was: )

> Call failure processor during a timeout worker crash
> 
>
> Key: IGNITE-23075
> URL: https://issues.apache.org/jira/browse/IGNITE-23075
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Vladislav Pyatkov
>Priority: Major
>  Labels: ignite-3
>
> h3. Motivation
> For the majority of the cluster transaction operations, we use a timeout 
> worker. It is a single thread that completes futures with a timeout 
> exception. But if the thread stops (due to an unhandled exception), no more 
> operations can time out.
> h3. Definition of done
> The failure processor has to be called in the catch block of the timeout 
> worker.
> {code:java}
> } catch (Throwable t) {
>   failureProcessor.process(new FailureContext(SYSTEM_WORKER_TERMINATION, t));
>   throw new IgniteInternalException(t);
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-23075) Call failure processor during a timeout worker crash

2024-08-26 Thread Vladislav Pyatkov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-23075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladislav Pyatkov updated IGNITE-23075:
---
Description: 
h3. Motivation
For the majority of the cluster transaction operations, we use a timeout worker. 
It is a single thread that completes futures with a timeout exception. But if 
the thread stops (due to an unhandled exception), no more operations can time 
out.

h3. Definition of done
The failure processor has to be called in the catch block of the timeout worker.

{code:java}
} catch (Throwable t) {
  failureProcessor.process(new FailureContext(SYSTEM_WORKER_TERMINATION, t));

  throw new IgniteInternalException(t);
}
{code}
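
For illustration, a minimal sketch of where this catch block could sit in the worker's run loop; the loop body and the names expireOverdueFutures and sleepIntervalMillis are assumptions, not the actual TimeoutWorker code:
{code:java}
// Illustrative run loop: any unexpected error escaping the loop is reported
// to the failure processor before rethrowing, so the node learns that
// timeouts will no longer fire.
@Override
public void run() {
    try {
        while (!Thread.currentThread().isInterrupted()) {
            expireOverdueFutures(); // Completes overdue futures with a timeout exception.

            Thread.sleep(sleepIntervalMillis);
        }
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // Normal shutdown path, not a failure.
    } catch (Throwable t) {
        failureProcessor.process(new FailureContext(SYSTEM_WORKER_TERMINATION, t));

        throw new IgniteInternalException(t);
    }
}
{code}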


  was:
h3. Motivation
For the majority of the cluster transaction operations, we use a timeout worker. 
It is a single thread that completes futures with a timeout exception. But if 
the thread stops (due to an unhandled exception), no more operations can time 
out.

h3.
The failure processor has to be called in the catch block of the timeout worker.

{code:java}
} catch (Throwable t) {
  failureProcessor.process(new FailureContext(SYSTEM_WORKER_TERMINATION, t));

  throw new IgniteInternalException(t);
}
{code}



> Call failure processor during a timeout worker crash
> 
>
> Key: IGNITE-23075
> URL: https://issues.apache.org/jira/browse/IGNITE-23075
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Vladislav Pyatkov
>Priority: Major
>
> h3. Motivation
> For the majority of the cluster transaction operations, we use a timeout 
> worker. It is a single thread that completes futures with a timeout 
> exception. But if the thread stops (due to an unhandled exception), no more 
> operations can time out.
> h3. Definition of done
> The failure processor has to be called in the catch block of the timeout 
> worker.
> {code:java}
> } catch (Throwable t) {
>   failureProcessor.process(new FailureContext(SYSTEM_WORKER_TERMINATION, t));
>   throw new IgniteInternalException(t);
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-23075) Call failure processor during a timeout worker crash

2024-08-26 Thread Vladislav Pyatkov (Jira)
Vladislav Pyatkov created IGNITE-23075:
--

 Summary: Call failure processor during a timeout worker crash
 Key: IGNITE-23075
 URL: https://issues.apache.org/jira/browse/IGNITE-23075
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladislav Pyatkov


h3. Motivation
For the majority of the cluster transaction operations, we use a timeout worker. 
It is a single thread that completes futures with a timeout exception. But if 
the thread stops (due to an unhandled exception), no more operations can time 
out.

h3. Definition of done
The failure processor has to be called in the catch block of the timeout worker.

{code:java}
} catch (Throwable t) {
  failureProcessor.process(new FailureContext(SYSTEM_WORKER_TERMINATION, t));

  throw new IgniteInternalException(t);
}
{code}




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (IGNITE-20869) CompletableFuture with orTimeout has noticeable performance impact

2024-08-23 Thread Vladislav Pyatkov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-20869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17876246#comment-17876246
 ] 

Vladislav Pyatkov edited comment on IGNITE-20869 at 8/23/24 12:18 PM:
--

I prepared a benchmark and ran it on my local laptop:
{noformat}
Benchmark                     (useFutureEmbeddedTimeout)  Mode  Cnt   Score    Error  Units
FeatureTimeoutBenchmark.test                       false  avgt   20   1,565 ±  0,065  us/op
FeatureTimeoutBenchmark.test                        true  avgt   19  34,812 ± 40,596  us/op
{noformat}
It seems like an implementation based on a concurrent collection is 20 times 
better.
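
For context, a minimal JMH sketch of what such a comparison can look like; the class name matches the output above, but the body, the parameter handling, and the enqueue-only stand-in for the worker thread are assumptions:
{code:java}
package org.apache.ignite.internal.benchmark;

import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
public class FeatureTimeoutBenchmark {
    @Param({"false", "true"})
    public boolean useFutureEmbeddedTimeout;

    private final Queue<CompletableFuture<Object>> queue = new ConcurrentLinkedQueue<>();

    @Benchmark
    public Object test() {
        CompletableFuture<Object> fut = new CompletableFuture<>();

        if (useFutureEmbeddedTimeout) {
            // JDK path: registers a cancellation task with the internal delayer.
            fut.orTimeout(10_000, TimeUnit.MILLISECONDS);
        } else {
            // Worker path: hand the future to a concurrent queue; a dedicated
            // thread would poll and expire it.
            queue.add(fut);
            queue.poll(); // Polled here only to keep the queue bounded in the sketch.
        }

        fut.complete(null);

        return fut.join();
    }
}
{code}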


was (Author: v.pyatkov):
I prepared a benchmark and ran it on my local laptop:

{noformat}
Benchmark                     (useFutureEmbeddedTimeout)  Mode  Cnt   Score    Error  Units
FeatureTimeoutBenchmark.test                       false  avgt   20   1,565 ±  0,065  us/op
FeatureTimeoutBenchmark.test                        true  avgt   19  34,812 ± 40,596  us/op
{noformat}

It seems like an implementation based on a concurrent collection is 20 times 
better.

> CompletableFuture with orTimeout has noticeable performance impact
> --
>
> Key: IGNITE-20869
> URL: https://issues.apache.org/jira/browse/IGNITE-20869
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Kirill Gusakov
>Assignee: Vladislav Pyatkov
>Priority: Major
>  Labels: ignite-3
> Attachments: screenshot-1.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> h3. Motivation
> [This line|https://github.com/apache/ignite-3/blob/82c74598b5006ea3e4e86da744a68022dd799c89/modules/network/src/main/java/org/apache/ignite/network/DefaultMessagingService.java#L254-L255] 
> of code costs us nearly 3 us of the 26 us of the whole query (~10%). The 
> reason is the orTimeout(...) call.
> A simple switch to a plain new CompletableFuture() gives us a ~10% boost, 
> from 26 us to 23 us, in the 1-node fsync=false run of SelectBenchmark.kvGet.
> {code}
> responseFuture.orTimeout(timeout, TimeUnit.MILLISECONDS);
> {code}
> The code spends 3 microseconds or more.
> The note above is absolutely correct and appropriate for the put operation as 
> well.
> Next, I want to note all the places where `orTimeout` is used on the hot 
> path:
> * _LeaseTracker#awaitPrimaryReplica_ Although the future is usually already 
> completed at this point, we call `orTimeout` anyway.
> * _DefaultMessagingService#invoke0_ It is not a network timeout (despite the 
> fact that it is set in the network layer), but it is an important timeout 
> for application code.
> * _IgniteRpcClient#invokeAsync_ The Raft retry timeout is set here.
> * _TcpClientChannel#serviceAsync_ Thin client operation timeout.
> h3. Implementation notes
> I compared an `orTimeout` invocation with a timeout worker implementation 
> based on a dedicated thread and a concurrent queue:
> {code}
> Thread t = new Thread(() -> {
> TimeoutObject o;
> while (true) {
> o = queue.poll();
> if (o == null) {
> try {
> Thread.sleep(200);
> } catch (InterruptedException e) {
> throw new RuntimeException(e);
> }
> continue;
> }
> if (System.currentTimeMillis() > o.timeout) {
> o.onTimeout();
> } else {
> queue.add(o);
> }
> }
> }, "timout-worker");
> {code}
> !screenshot-1.png|height=250,width=250!
> h3. Definition of done
> The `orTimeout` method is not used on the hot path of operations (put/get, etc.).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (IGNITE-20869) CompletableFuture with orTimeout has noticeable performance impact

2024-08-23 Thread Vladislav Pyatkov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-20869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17876246#comment-17876246
 ] 

Vladislav Pyatkov edited comment on IGNITE-20869 at 8/23/24 12:18 PM:
--

I prepared a benchmark and ran it on my local laptop:

{noformat}
Benchmark                     (useFutureEmbeddedTimeout)  Mode  Cnt   Score    Error  Units
FeatureTimeoutBenchmark.test                       false  avgt   20   1,565 ±  0,065  us/op
FeatureTimeoutBenchmark.test                        true  avgt   19  34,812 ± 40,596  us/op
{noformat}

It seems like an implementation based on a concurrent collection is 20 times 
better.


was (Author: v.pyatkov):
I prepared a benchmark and ran it on my local laptop:
Benchmark                     (useFutureEmbeddedTimeout)  Mode  Cnt   Score    Error  Units
FeatureTimeoutBenchmark.test                       false  avgt   20   1,565 ±  0,065  us/op
FeatureTimeoutBenchmark.test                        true  avgt   19  34,812 ± 40,596  us/op

It seems like an implementation based on a concurrent collection is 20 times 
better.

> CompletableFuture with orTimeout has noticeable performance impact
> --
>
> Key: IGNITE-20869
> URL: https://issues.apache.org/jira/browse/IGNITE-20869
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Kirill Gusakov
>Assignee: Vladislav Pyatkov
>Priority: Major
>  Labels: ignite-3
> Attachments: screenshot-1.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> h3. Motivation
> [This line|https://github.com/apache/ignite-3/blob/82c74598b5006ea3e4e86da744a68022dd799c89/modules/network/src/main/java/org/apache/ignite/network/DefaultMessagingService.java#L254-L255] 
> of code costs us nearly 3 us of the 26 us of the whole query (~10%). The 
> reason is the orTimeout(...) call.
> A simple switch to a plain new CompletableFuture() gives us a ~10% boost, 
> from 26 us to 23 us, in the 1-node fsync=false run of SelectBenchmark.kvGet.
> {code}
> responseFuture.orTimeout(timeout, TimeUnit.MILLISECONDS);
> {code}
> The code spends 3 microseconds or more.
> The note above is absolutely correct and appropriate for the put operation as 
> well.
> Next, I want to note all the places where `orTimeout` is used on the hot 
> path:
> * _LeaseTracker#awaitPrimaryReplica_ Although the future is usually already 
> completed at this point, we call `orTimeout` anyway.
> * _DefaultMessagingService#invoke0_ It is not a network timeout (despite the 
> fact that it is set in the network layer), but it is an important timeout 
> for application code.
> * _IgniteRpcClient#invokeAsync_ The Raft retry timeout is set here.
> * _TcpClientChannel#serviceAsync_ Thin client operation timeout.
> h3. Implementation notes
> I compared an `orTimeout` invocation with a timeout worker implementation 
> based on a dedicated thread and a concurrent queue:
> {code}
> Thread t = new Thread(() -> {
> TimeoutObject o;
> while (true) {
> o = queue.poll();
> if (o == null) {
> try {
> Thread.sleep(200);
> } catch (InterruptedException e) {
> throw new RuntimeException(e);
> }
> continue;
> }
> if (System.currentTimeMillis() > o.timeout) {
> o.onTimeout();
> } else {
> queue.add(o);
> }
> }
> }, "timout-worker");
> {code}
> !screenshot-1.png|height=250,width=250!
> h3. Definition of done
> The `orTimeout` method is not used on the hot path of operations (put/get, etc.).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (IGNITE-20869) CompletableFuture with orTimeout has noticeable performance impact

2024-08-23 Thread Vladislav Pyatkov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-20869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17876246#comment-17876246
 ] 

Vladislav Pyatkov commented on IGNITE-20869:


I prepared a benchmark and ran it on my local laptop:
Benchmark                     (useFutureEmbeddedTimeout)  Mode  Cnt   Score    Error  Units
FeatureTimeoutBenchmark.test                       false  avgt   20   1,565 ±  0,065  us/op
FeatureTimeoutBenchmark.test                        true  avgt   19  34,812 ± 40,596  us/op

It seems like an implementation based on a concurrent collection is 20 times 
better.

> CompletableFuture with orTimeout has noticeable performance impact
> --
>
> Key: IGNITE-20869
> URL: https://issues.apache.org/jira/browse/IGNITE-20869
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Kirill Gusakov
>Assignee: Vladislav Pyatkov
>Priority: Major
>  Labels: ignite-3
> Attachments: screenshot-1.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> h3. Motivation
> [This line|https://github.com/apache/ignite-3/blob/82c74598b5006ea3e4e86da744a68022dd799c89/modules/network/src/main/java/org/apache/ignite/network/DefaultMessagingService.java#L254-L255] 
> of code costs us nearly 3 us of the 26 us of the whole query (~10%). The 
> reason is the orTimeout(...) call.
> A simple switch to a plain new CompletableFuture() gives us a ~10% boost, 
> from 26 us to 23 us, in the 1-node fsync=false run of SelectBenchmark.kvGet.
> {code}
> responseFuture.orTimeout(timeout, TimeUnit.MILLISECONDS);
> {code}
> The code spends 3 microseconds or more.
> The note above is absolutely correct and appropriate for the put operation as 
> well.
> Next, I want to note all the places where `orTimeout` is used on the hot 
> path:
> * _LeaseTracker#awaitPrimaryReplica_ Although the future is usually already 
> completed at this point, we call `orTimeout` anyway.
> * _DefaultMessagingService#invoke0_ It is not a network timeout (despite the 
> fact that it is set in the network layer), but it is an important timeout 
> for application code.
> * _IgniteRpcClient#invokeAsync_ The Raft retry timeout is set here.
> * _TcpClientChannel#serviceAsync_ Thin client operation timeout.
> h3. Implementation notes
> I compared an `orTimeout` invocation with a timeout worker implementation 
> based on a dedicated thread and a concurrent queue:
> {code}
> Thread t = new Thread(() -> {
> TimeoutObject o;
> while (true) {
> o = queue.poll();
> if (o == null) {
> try {
> Thread.sleep(200);
> } catch (InterruptedException e) {
> throw new RuntimeException(e);
> }
> continue;
> }
> if (System.currentTimeMillis() > o.timeout) {
> o.onTimeout();
> } else {
> queue.add(o);
> }
> }
> }, "timout-worker");
> {code}
> !screenshot-1.png|height=250,width=250!
> h3. Definition of done
> The `orTimeout` method is not used on the hot path of operations (put/get, etc.).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (IGNITE-20869) CompletableFuture with orTimeout has noticeable performance impact

2024-08-19 Thread Vladislav Pyatkov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladislav Pyatkov reassigned IGNITE-20869:
--

Assignee: Vladislav Pyatkov

> CompletableFuture with orTimeout has noticeable performance impact
> --
>
> Key: IGNITE-20869
> URL: https://issues.apache.org/jira/browse/IGNITE-20869
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Kirill Gusakov
>Assignee: Vladislav Pyatkov
>Priority: Major
>  Labels: ignite-3
> Attachments: screenshot-1.png
>
>
> h3. Motivation
> [This line|https://github.com/apache/ignite-3/blob/82c74598b5006ea3e4e86da744a68022dd799c89/modules/network/src/main/java/org/apache/ignite/network/DefaultMessagingService.java#L254-L255] 
> of code costs us nearly 3 us of the 26 us of the whole query (~10%). The 
> reason is the orTimeout(...) call.
> A simple switch to a plain new CompletableFuture() gives us a ~10% boost, 
> from 26 us to 23 us, in the 1-node fsync=false run of SelectBenchmark.kvGet.
> {code}
> responseFuture.orTimeout(timeout, TimeUnit.MILLISECONDS);
> {code}
> The code spends 3 microseconds or more.
> The note above is absolutely correct and appropriate for the put operation as 
> well.
> Next, I want to note all the places where `orTimeout` is used on the hot 
> path:
> * _LeaseTracker#awaitPrimaryReplica_ Although the future is usually already 
> completed at this point, we call `orTimeout` anyway.
> * _DefaultMessagingService#invoke0_ It is not a network timeout (despite the 
> fact that it is set in the network layer), but it is an important timeout 
> for application code.
> * _IgniteRpcClient#invokeAsync_ The Raft retry timeout is set here.
> * _TcpClientChannel#serviceAsync_ Thin client operation timeout.
> h3. Implementation notes
> I compared an `orTimeout` invocation with a timeout worker implementation 
> based on a dedicated thread and a concurrent queue:
> {code}
> Thread t = new Thread(() -> {
> TimeoutObject o;
> while (true) {
> o = queue.poll();
> if (o == null) {
> try {
> Thread.sleep(200);
> } catch (InterruptedException e) {
> throw new RuntimeException(e);
> }
> continue;
> }
> if (System.currentTimeMillis() > o.timeout) {
> o.onTimeout();
> } else {
> queue.add(o);
> }
> }
> }, "timout-worker");
> {code}
> !screenshot-1.png|height=250,width=250!
> h3. Definition of done
> The `orTimeout` method is not used on the hot path of operations (put/get, etc.).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (IGNITE-22599) Exception inside RAFT listener does not invoke failure handler

2024-08-16 Thread Vladislav Pyatkov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-22599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17874165#comment-17874165
 ] 

Vladislav Pyatkov commented on IGNITE-22599:


Merged 9672234bf81b35ee367138238d5585293d910fcc

> Exception inside RAFT listener does not invoke failure handler
> -
>
> Key: IGNITE-22599
> URL: https://issues.apache.org/jira/browse/IGNITE-22599
> Project: Ignite
>  Issue Type: Bug
>Reporter: Vladislav Pyatkov
>Assignee: Mirza Aliev
>Priority: Major
>  Labels: ignite-3
> Attachments: 
> poc-tester-SERVER-192.168.1.41-id-0-2024-06-27-09-14-17-client.log.2
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> h3. Motivation
> An exception that is thrown while handling a RAFT command cannot be recovered 
> from, because the RAFT storage is already corrupted. This is a reason to call 
> the failure handler:
> {code:java}
> failureProcessor.process(new FailureContext(CRITICAL_ERROR, err));
> {code}
> Currently, I do not see any FH invocation in the log, although many exceptions 
> have been thrown.
> h3. Definition of done
> FH is invoked in the case where the RAFT listener fails.
> h3. Implementation Notes
>  * The failure handler should be invoked in case of critical exceptions during 
> onWrite processing, onRead processing, snapshotting, and snapshot installation.
>  * While testing, it's required to test all CMG, MG, and Partition listeners.
>  * To ease testing, it is worth implementing a test failure handler in order 
> to detect that it was triggered (see the sketch below).
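> For the last point, a minimal sketch of such a test failure handler; the 
> class shape is an assumption rather than the actual test API:
> {code:java}
> import java.util.concurrent.CountDownLatch;
> import java.util.concurrent.TimeUnit;
> 
> // Illustrative test double: records the first reported failure and lets
> // the test await it, so a test can assert that FH was triggered.
> class AwaitableFailureHandler {
>     private final CountDownLatch fired = new CountDownLatch(1);
> 
>     private volatile Throwable error;
> 
>     void onFailure(Throwable t) {
>         error = t;
>         fired.countDown();
>     }
> 
>     boolean awaitFailure(long timeout, TimeUnit unit) throws InterruptedException {
>         return fired.await(timeout, unit);
>     }
> 
>     Throwable error() {
>         return error;
>     }
> }
> {code}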



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (IGNITE-22599) Exception inside RAFT listener does not invoke failure handler

2024-08-16 Thread Vladislav Pyatkov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-22599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17874163#comment-17874163
 ] 

Vladislav Pyatkov commented on IGNITE-22599:


LGTM

> Exception inside RAFT listener does not invoke failure handler
> -
>
> Key: IGNITE-22599
> URL: https://issues.apache.org/jira/browse/IGNITE-22599
> Project: Ignite
>  Issue Type: Bug
>Reporter: Vladislav Pyatkov
>Assignee: Mirza Aliev
>Priority: Major
>  Labels: ignite-3
> Attachments: 
> poc-tester-SERVER-192.168.1.41-id-0-2024-06-27-09-14-17-client.log.2
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> h3. Motivation
> An exception that is thrown while handling a RAFT command cannot be recovered 
> from, because the RAFT storage is already corrupted. This is a reason to call 
> the failure handler:
> {code:java}
> failureProcessor.process(new FailureContext(CRITICAL_ERROR, err));
> {code}
> Currently, I do not see any FH invocation in the log, although many exceptions 
> have been thrown.
> h3. Definition of done
> FH is invoked in the case where the RAFT listener fails.
> h3. Implementation Notes
>  * The failure handler should be invoked in case of critical exceptions during 
> onWrite processing, onRead processing, snapshotting, and snapshot installation.
>  * While testing, it's required to test all CMG, MG, and Partition listeners.
>  * To ease testing, it is worth implementing a test failure handler in order 
> to detect that it was triggered.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (IGNITE-22850) The same set of leases writes to Meta storage

2024-08-14 Thread Vladislav Pyatkov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-22850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17873503#comment-17873503
 ] 

Vladislav Pyatkov commented on IGNITE-22850:


Merged 2c4232f51f4832240f1f69a15b50e5cd92ff333d

> The same set of leases writes to Meta storage
> -
>
> Key: IGNITE-22850
> URL: https://issues.apache.org/jira/browse/IGNITE-22850
> Project: Ignite
>  Issue Type: Bug
>Reporter: Vladislav Pyatkov
>Assignee: Vladislav Pyatkov
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
> Attachments: Lease_updater.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> h3. Motivation
> We are trying to reduce the load on Meta storage. Therefore, adding extra 
> entities to the metastorage is undesirable. Here, we can simply not write 
> leases if nothing has changed for them.
> h3. Definition of done
> Write leases to Meta storage only if they have been updated.
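> A minimal sketch of the guard, assuming the updater caches the previously 
> written serialized lease batch; the names (LeaseBatchWriter, writeIfChanged, 
> metaStoragePut) are illustrative, not the actual LeaseUpdater API:
> {code:java}
> import java.util.Arrays;
> import java.util.function.BiConsumer;
> 
> // Illustrative "write only if changed" guard for the lease updater.
> final class LeaseBatchWriter {
>     private byte[] lastWritten;
> 
>     void writeIfChanged(byte[] serializedLeases, BiConsumer<String, byte[]> metaStoragePut) {
>         if (Arrays.equals(serializedLeases, lastWritten)) {
>             return; // Same lease set as last time: skip the Meta storage write.
>         }
> 
>         metaStoragePut.accept("leases", serializedLeases);
>         lastWritten = serializedLeases;
>     }
> }
> {code}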



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-20869) CompletableFuture with orTimeout has noticeable performance impact

2024-08-14 Thread Vladislav Pyatkov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladislav Pyatkov updated IGNITE-20869:
---
Description: 
h3. Motivation
[This line|https://github.com/apache/ignite-3/blob/82c74598b5006ea3e4e86da744a68022dd799c89/modules/network/src/main/java/org/apache/ignite/network/DefaultMessagingService.java#L254-L255] 
of code costs us nearly 3 us of the 26 us of the whole query (~10%). The 
reason is the orTimeout(...) call.

A simple switch to a plain new CompletableFuture() gives us a ~10% boost, 
from 26 us to 23 us, in the 1-node fsync=false run of SelectBenchmark.kvGet.

{code}
responseFuture.orTimeout(timeout, TimeUnit.MILLISECONDS);
{code}
The code spends 3 microseconds or more.

The note above is absolutely correct and appropriate for the put operation as well.
Next, I want to note all the places where `orTimeout` is used on the hot 
path:
* _LeaseTracker#awaitPrimaryReplica_ Although the future is usually already 
completed at this point, we call `orTimeout` anyway.
* _DefaultMessagingService#invoke0_ It is not a network timeout (despite the 
fact that it is set in the network layer), but it is an important timeout for 
application code.
* _IgniteRpcClient#invokeAsync_ The Raft retry timeout is set here.
* _TcpClientChannel#serviceAsync_ Thin client operation timeout.

h3. Implementation notes
I compared an `orTimeout` invocation with a timeout worker implementation based 
on a dedicated thread and a concurrent queue:
{code}
Thread t = new Thread(() -> {
    TimeoutObject o;

    while (true) {
        o = queue.poll();

        if (o == null) {
            // Queue is empty: back off before polling again.
            try {
                Thread.sleep(200);
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }

            continue;
        }

        if (System.currentTimeMillis() > o.timeout) {
            o.onTimeout();
        } else {
            // Not expired yet: put it back for a later check.
            queue.add(o);
        }
    }
}, "timeout-worker");
{code}
!screenshot-1.png|height=250,width=250!

h3. Definition of done
The `orTimeout` method is not used on the hot path of operations (put/get, etc.).

  was:
h3. Motivation
[This line|https://github.com/apache/ignite-3/blob/82c74598b5006ea3e4e86da744a68022dd799c89/modules/network/src/main/java/org/apache/ignite/network/DefaultMessagingService.java#L254-L255] 
of code costs us nearly 3 us of the 26 us of the whole query (~10%). The 
reason is the orTimeout(...) call.

A simple switch to a plain new CompletableFuture() gives us a ~10% boost, 
from 26 us to 23 us, in the 1-node fsync=false run of SelectBenchmark.kvGet.

{code}
responseFuture.orTimeout(timeout, TimeUnit.MILLISECONDS);
{code}
The code spends 3 microseconds or more.

The note above is absolutely correct and appropriate for the put operation as well.
Next, I want to note all the places where `orTimeout` is used on the hot 
path:
* _LeaseTracker#awaitPrimaryReplica_ Although the future is usually already 
completed at this point, we call `orTimeout` anyway.
* _DefaultMessagingService#invoke0_ It is not a network timeout (despite the 
fact that it is set in the network layer), but it is an important timeout for 
application code.
* _IgniteRpcClient#invokeAsync_ The Raft retry timeout is set here.
* _TcpClientChannel#serviceAsync_ Thin client operation timeout.

h3. Implementation notes
I compared an `orTimeout` invocation with a timeout worker implementation based 
on a dedicated thread and a concurrent queue:
{code}
Thread t = new Thread(() -> {
TimeoutObject o;

while (true) {
o = queue.poll();

if (o == null) {
try {
Thread.sleep(200);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}

continue;
}

if (System.currentTimeMillis() > o.timeout) {
o.onTimeout();
} else {
queue.add(o);
}

}
}, "timout-worker");
{code}
 !screenshot-1.png! 

h3. Definition of done
The `orTimeout` method is not used on the hot path of operations (put/get, etc.).


> CompletableFuture with orTimeout has noticeable performance impact
> --
>
> Key: IGNITE-20869
> URL: https://issues.apache.org/jira/browse/IGNITE-20869
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Kirill Gusakov
>Priority: Major
>  Labels: ignite-3
> Attachments: screenshot-1.png
>
>
> h3. Motivation
> [This 
> |https://github.com/apache/ignite-3/blob/82c74598b5006ea3e4e86da7

[jira] [Updated] (IGNITE-20869) CompletableFuture with orTimeout has noticeable performance impact

2024-08-14 Thread Vladislav Pyatkov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladislav Pyatkov updated IGNITE-20869:
---
Attachment: screenshot-1.png

> CompletableFuture with orTimeout has noticeable performance impact
> --
>
> Key: IGNITE-20869
> URL: https://issues.apache.org/jira/browse/IGNITE-20869
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Kirill Gusakov
>Priority: Major
>  Labels: ignite-3
> Attachments: screenshot-1.png
>
>
> h3. Motivation
> [This line|https://github.com/apache/ignite-3/blob/82c74598b5006ea3e4e86da744a68022dd799c89/modules/network/src/main/java/org/apache/ignite/network/DefaultMessagingService.java#L254-L255] 
> of code costs us nearly 3 us of the 26 us of the whole query (~10%). The 
> reason is the orTimeout(...) call.
> A simple switch to a plain new CompletableFuture() gives us a ~10% boost, 
> from 26 us to 23 us, in the 1-node fsync=false run of SelectBenchmark.kvGet.
> {code}
> responseFuture.orTimeout(timeout, TimeUnit.MILLISECONDS);
> {code}
> The code spends 3 microseconds or more.
> The note above is absolutely correct and appropriate for the put operation as 
> well.
> Next, I want to note all the places where `orTimeout` is used on the hot 
> path:
> * _LeaseTracker#awaitPrimaryReplica_ Although the future is usually already 
> completed at this point, we call `orTimeout` anyway.
> * _DefaultMessagingService#invoke0_ It is not a network timeout (despite the 
> fact that it is set in the network layer), but it is an important timeout 
> for application code.
> * _IgniteRpcClient#invokeAsync_ The Raft retry timeout is set here.
> * _TcpClientChannel#serviceAsync_ Thin client operation timeout.
> h3. Definition of done
> The `orTimeout` method is not used on the hot path of operations (put/get, etc.).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-20869) CompletableFuture with orTimeout has noticeable performance impact

2024-08-14 Thread Vladislav Pyatkov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladislav Pyatkov updated IGNITE-20869:
---
Description: 
h3. Motivation
[This line|https://github.com/apache/ignite-3/blob/82c74598b5006ea3e4e86da744a68022dd799c89/modules/network/src/main/java/org/apache/ignite/network/DefaultMessagingService.java#L254-L255] 
of code costs us nearly 3 us of the 26 us of the whole query (~10%). The 
reason is the orTimeout(...) call.

A simple switch to a plain new CompletableFuture() gives us a ~10% boost, 
from 26 us to 23 us, in the 1-node fsync=false run of SelectBenchmark.kvGet.

{code}
responseFuture.orTimeout(timeout, TimeUnit.MILLISECONDS);
{code}
The code spends 3 microseconds or more.

The note above is absolutely correct and appropriate for the put operation as well.
Next, I want to note all the places where `orTimeout` is used on the hot 
path:
* _LeaseTracker#awaitPrimaryReplica_ Although the future is usually already 
completed at this point, we call `orTimeout` anyway.
* _DefaultMessagingService#invoke0_ It is not a network timeout (despite the 
fact that it is set in the network layer), but it is an important timeout for 
application code.
* _IgniteRpcClient#invokeAsync_ The Raft retry timeout is set here.
* _TcpClientChannel#serviceAsync_ Thin client operation timeout.

h3. Implementation notes
I compared an `orTimeout` invocation with a timeout worker implementation based 
on a dedicated thread and a concurrent queue:
{code}
Thread t = new Thread(() -> {
TimeoutObject o;

while (true) {
o = queue.poll();

if (o == null) {
try {
Thread.sleep(200);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}

continue;
}

if (System.currentTimeMillis() > o.timeout) {
o.onTimeout();
} else {
queue.add(o);
}

}
}, "timout-worker");
{code}
 !screenshot-1.png! 

h3. Definition of done
The `orTimeout` method is not used on the hot path of operations (put/get, etc.).

  was:
h3. Motivation
[This line|https://github.com/apache/ignite-3/blob/82c74598b5006ea3e4e86da744a68022dd799c89/modules/network/src/main/java/org/apache/ignite/network/DefaultMessagingService.java#L254-L255] 
of code costs us nearly 3 us of the 26 us of the whole query (~10%). The 
reason is the orTimeout(...) call.

A simple switch to a plain new CompletableFuture() gives us a ~10% boost, 
from 26 us to 23 us, in the 1-node fsync=false run of SelectBenchmark.kvGet.

{code}
responseFuture.orTimeout(timeout, TimeUnit.MILLISECONDS);
{code}
The code spends 3 microseconds or more.

The note above is absolutely correct and appropriate for the put operation as well.
Next, I want to note all the places where `orTimeout` is used on the hot 
path:
* _LeaseTracker#awaitPrimaryReplica_ Although the future is usually already 
completed at this point, we call `orTimeout` anyway.
* _DefaultMessagingService#invoke0_ It is not a network timeout (despite the 
fact that it is set in the network layer), but it is an important timeout for 
application code.
* _IgniteRpcClient#invokeAsync_ The Raft retry timeout is set here.
* _TcpClientChannel#serviceAsync_ Thin client operation timeout.

h3. Definition of done
The `orTimeout` method is not used on the hot path of operations (put/get, etc.).


> CompletableFuture with orTimeout has noticeable performance impact
> --
>
> Key: IGNITE-20869
> URL: https://issues.apache.org/jira/browse/IGNITE-20869
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Kirill Gusakov
>Priority: Major
>  Labels: ignite-3
> Attachments: screenshot-1.png
>
>
> h3. Motivation
> [This line|https://github.com/apache/ignite-3/blob/82c74598b5006ea3e4e86da744a68022dd799c89/modules/network/src/main/java/org/apache/ignite/network/DefaultMessagingService.java#L254-L255] 
> of code costs us nearly 3 us of the 26 us of the whole query (~10%). The 
> reason is the orTimeout(...) call.
> A simple switch to a plain new CompletableFuture() gives us a ~10% boost, 
> from 26 us to 23 us, in the 1-node fsync=false run of SelectBenchmark.kvGet.
> {code}
> responseFuture.orTimeout(timeout, TimeUnit.MILLISECONDS);
> {code}
> The code spends 3 microseconds or more.
> The note above is absolutely correct and appropriate for the put operation as 
> well.
> Next, I want to note all the places where `orTimeout` is used on the hot 
> path:
> * _LeaseTracker#awaitPrimaryReplica_ Although the future is usually already 
> completed at this point, we call `orTimeout` anyway.
> * _DefaultMessagingS

[jira] [Updated] (IGNITE-20869) CompletableFuture with orTimeout has noticeable performance impact

2024-08-14 Thread Vladislav Pyatkov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladislav Pyatkov updated IGNITE-20869:
---
Description: 
h3. Motivation
[This line|https://github.com/apache/ignite-3/blob/82c74598b5006ea3e4e86da744a68022dd799c89/modules/network/src/main/java/org/apache/ignite/network/DefaultMessagingService.java#L254-L255] 
of code costs us nearly 3 us of the 26 us of the whole query (~10%). The 
reason is the orTimeout(...) call.

A simple switch to a plain new CompletableFuture() gives us a ~10% boost, 
from 26 us to 23 us, in the 1-node fsync=false run of SelectBenchmark.kvGet.

{code}
responseFuture.orTimeout(timeout, TimeUnit.MILLISECONDS);
{code}
The code spends 3 microseconds or more.

The note above is absolutely correct and appropriate for the put operation as well.
Next, I want to note all the places where `orTimeout` is used on the hot 
path:
* _LeaseTracker#awaitPrimaryReplica_ Although the future is usually already 
completed at this point, we call `orTimeout` anyway.
* _DefaultMessagingService#invoke0_ It is not a network timeout (despite the 
fact that it is set in the network layer), but it is an important timeout for 
application code.
* _IgniteRpcClient#invokeAsync_ The Raft retry timeout is set here.
* _TcpClientChannel#serviceAsync_ Thin client operation timeout.

h3. Definition of done
The `orTimeout` method is not used on the hot path of operations (put/get, etc.).

  was:
h3. Motivation
[This line|https://github.com/apache/ignite-3/blob/82c74598b5006ea3e4e86da744a68022dd799c89/modules/network/src/main/java/org/apache/ignite/network/DefaultMessagingService.java#L254-L255] 
of code costs us nearly 3 us of the 26 us of the whole query (~10%). The 
reason is the orTimeout(...) call.

A simple switch to a plain new CompletableFuture() gives us a ~10% boost, 
from 26 us to 23 us, in the 1-node fsync=false run of SelectBenchmark.kvGet.

{code}
responseFuture.orTimeout(timeout, TimeUnit.MILLISECONDS);
{code}
The code spends 3 microseconds or more.

The note above is absolutely correct and also appropriate for the put operation 
as well.
Next, I want to note all the places where `orTimeout` is used on the hot 
path:
* _LeaseTracker#awaitPrimaryReplica_ Although the future is usually already 
completed at this point, we call `orTimeout` anyway.
* _DefaultMessagingService#invoke0_ It is not a network timeout (despite the 
fact that it is set in the network layer), but it is an important timeout for 
application code.
* _IgniteRpcClient#invokeAsync_ The Raft retry timeout is set here.
* _TcpClientChannel#serviceAsync_ Thin client operation timeout.

h3. Definition of done


> CompletableFuture with orTimeout has noticeable performance impact
> --
>
> Key: IGNITE-20869
> URL: https://issues.apache.org/jira/browse/IGNITE-20869
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Kirill Gusakov
>Priority: Major
>  Labels: ignite-3
>
> h3. Motivation
> [This line|https://github.com/apache/ignite-3/blob/82c74598b5006ea3e4e86da744a68022dd799c89/modules/network/src/main/java/org/apache/ignite/network/DefaultMessagingService.java#L254-L255] 
> of code costs us nearly 3 us of the 26 us of the whole query (~10%). The 
> reason is the orTimeout(...) call.
> A simple switch to a plain new CompletableFuture() gives us a ~10% boost, 
> from 26 us to 23 us, in the 1-node fsync=false run of SelectBenchmark.kvGet.
> {code}
> responseFuture.orTimeout(timeout, TimeUnit.MILLISECONDS);
> {code}
> The code spends 3 microseconds or more.
> The note above is absolutely correct and appropriate for the put operation as 
> well.
> Next, I want to note all the places where `orTimeout` is used on the hot 
> path:
> * _LeaseTracker#awaitPrimaryReplica_ Although the future is usually already 
> completed at this point, we call `orTimeout` anyway.
> * _DefaultMessagingService#invoke0_ It is not a network timeout (despite the 
> fact that it is set in the network layer), but it is an important timeout 
> for application code.
> * _IgniteRpcClient#invokeAsync_ The Raft retry timeout is set here.
> * _TcpClientChannel#serviceAsync_ Thin client operation timeout.
> h3. Definition of done
> The `orTimeout` method is not used on the hot path of operations (put/get, etc.).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-20869) CompletableFuture with orTimeout has noticeable performance impact

2024-08-14 Thread Vladislav Pyatkov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladislav Pyatkov updated IGNITE-20869:
---
Description: 
h3. Motivation
[This line|https://github.com/apache/ignite-3/blob/82c74598b5006ea3e4e86da744a68022dd799c89/modules/network/src/main/java/org/apache/ignite/network/DefaultMessagingService.java#L254-L255] 
of code costs us nearly 3 us of the 26 us of the whole query (~10%). The 
reason is the orTimeout(...) call.

A simple switch to a plain new CompletableFuture() gives us a ~10% boost, 
from 26 us to 23 us, in the 1-node fsync=false run of SelectBenchmark.kvGet.

{code}
responseFuture.orTimeout(timeout, TimeUnit.MILLISECONDS);
{code}
The code spends 3 microseconds or more.

The note above is absolutely correct and also appropriate for the put operation 
as well.
Next, I want to note all the places where `orTimeout` is used on the hot 
path:
* _LeaseTracker#awaitPrimaryReplica_ Although the future is usually already 
completed at this point, we call `orTimeout` anyway.
* _DefaultMessagingService#invoke0_ It is not a network timeout (despite the 
fact that it is set in the network layer), but it is an important timeout for 
application code.
* _IgniteRpcClient#invokeAsync_ The Raft retry timeout is set here.
* _TcpClientChannel#serviceAsync_ Thin client operation timeout.

h3. Definition of done

  was:
[This line|https://github.com/apache/ignite-3/blob/82c74598b5006ea3e4e86da744a68022dd799c89/modules/network/src/main/java/org/apache/ignite/network/DefaultMessagingService.java#L254-L255] 
of code costs us nearly 3 us of the 26 us of the whole query (~10%). The 
reason is the orTimeout(...) call.

A simple switch to a plain new CompletableFuture() gives us a ~10% boost, 
from 26 us to 23 us, in the 1-node fsync=false run of SelectBenchmark.kvGet.


> CompletableFuture with orTimeout has noticeable performance impact
> --
>
> Key: IGNITE-20869
> URL: https://issues.apache.org/jira/browse/IGNITE-20869
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Kirill Gusakov
>Priority: Major
>  Labels: ignite-3
>
> h3. Motivation
> [This line|https://github.com/apache/ignite-3/blob/82c74598b5006ea3e4e86da744a68022dd799c89/modules/network/src/main/java/org/apache/ignite/network/DefaultMessagingService.java#L254-L255] 
> of code costs us nearly 3 us of the 26 us of the whole query (~10%). The 
> reason is the orTimeout(...) call.
> A simple switch to a plain new CompletableFuture() gives us a ~10% boost, 
> from 26 us to 23 us, in the 1-node fsync=false run of SelectBenchmark.kvGet.
> {code}
> responseFuture.orTimeout(timeout, TimeUnit.MILLISECONDS);
> {code}
> The code spends 3 microseconds or more.
> The note above is absolutely correct and also appropriate for the put 
> operation as well.
> Next, I want to note all the places where `orTimeout` is used on the hot 
> path:
> * _LeaseTracker#awaitPrimaryReplica_ Although the future is usually already 
> completed at this point, we call `orTimeout` anyway.
> * _DefaultMessagingService#invoke0_ It is not a network timeout (despite the 
> fact that it is set in the network layer), but it is an important timeout 
> for application code.
> * _IgniteRpcClient#invokeAsync_ The Raft retry timeout is set here.
> * _TcpClientChannel#serviceAsync_ Thin client operation timeout.
> h3. Definition of done



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (IGNITE-22838) Operation returns to partition pool after it is handled in RAFT

2024-08-13 Thread Vladislav Pyatkov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-22838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17873193#comment-17873193
 ] 

Vladislav Pyatkov commented on IGNITE-22838:


Merged 2486390b2b237a37eb335ba61cc8c97796f18505

> Operation returns to partition pool after it is handled in RAFT
> ---
>
> Key: IGNITE-22838
> URL: https://issues.apache.org/jira/browse/IGNITE-22838
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Vladislav Pyatkov
>Assignee: Vladislav Pyatkov
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> h3. Motivation
> After executing a synchronous embedded insert operation, we return to the 
> partition-operations pool before handing control back to the client pool. We 
> just lose several microseconds instead of returning to the client pool 
> directly.
> {noformat}
> finishFullTx:%node_3344%partition-operations-2 1.3 5476200 5477500
> Here is hidden 5.5 us
> kvPutEndMark:org.apache.ignite.internal.benchmark.InsertBenchmark.kvInsert-jmh-worker-1
>  0.0 5483000 5483000
> {noformat}
> h3. Definition of done
> We need to develop a strategy that does not involve an extra thread in 
> synchronous operations.
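> One conceivable strategy, sketched below: the caller declares that it will 
> block on the result, so the future is completed directly in the producing 
> (partition) thread instead of hopping through an extra executor. The names 
> here are illustrative, not the actual implementation:
> {code:java}
> import java.util.concurrent.CompletableFuture;
> import java.util.concurrent.Executor;
> 
> // Illustrative delivery policy: synchronous callers are parked in get()
> // anyway, so completing in the producing thread saves the thread switch;
> // asynchronous callers still get their callbacks off internal threads.
> final class ResultDelivery {
>     static <T> void deliver(CompletableFuture<T> userFuture, T result, boolean syncCaller, Executor asyncExecutor) {
>         if (syncCaller) {
>             userFuture.complete(result);
>         } else {
>             asyncExecutor.execute(() -> userFuture.complete(result));
>         }
>     }
> }
> {code}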



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-22850) The same set of leases writes to Meta storage

2024-08-13 Thread Vladislav Pyatkov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladislav Pyatkov updated IGNITE-22850:
---
Fix Version/s: 3.0.0-beta2
 Reviewer: Mirza Aliev

> The same set of leases writes to Meta storage
> -
>
> Key: IGNITE-22850
> URL: https://issues.apache.org/jira/browse/IGNITE-22850
> Project: Ignite
>  Issue Type: Bug
>Reporter: Vladislav Pyatkov
>Assignee: Vladislav Pyatkov
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
> Attachments: Lease_updater.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> h3. Motivation
> We are trying to reduce the load on Meta storage. Therefore, adding extra 
> entities to the metastorage is undesirable. Here, we can simply not write 
> leases if nothing has changed for them.
> h3. Definition of done
> Write leases to Meta storage only if they have been updated.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-22980) Lock manager may fail and lock waiter simultaneously

2024-08-13 Thread Vladislav Pyatkov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladislav Pyatkov updated IGNITE-22980:
---
Description: 
h3. Motivation
This behavior was hardly predicted or planned, but currently we can acquire a 
lock:
{code:java}
private void lock() {
lockMode = intendedLockMode;

intendedLockMode = null;

intendedLocks.clear();
}
{code}
and make the waiter fail:
{code:java}
private void fail(LockException e) {
ex = e;
}
{code}
without any restriction (an assertion check or an explicit prohibition).

h3. Definition of done
Only one of the methods, either lock() or fail(), can be applied to a lock 
attempt, not both.
Do not forget that a retry attempt may succeed even though the previous 
attempt failed.
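
A minimal sketch of one way to make the two outcomes mutually exclusive; the class and names are illustrative, not the HeapLockManager code:
{code:java}
// Illustrative guard: the first outcome of a lock attempt wins, the other
// trips an assertion. A retry starts a fresh attempt, so a failed attempt
// does not poison later ones.
final class LockAttemptOutcome {
    private enum State { PENDING, LOCKED, FAILED }

    private State state = State.PENDING;

    private Exception failure;

    synchronized void lock() {
        assert state != State.FAILED : "Attempt has already failed: " + failure;

        state = State.LOCKED;
    }

    synchronized void fail(Exception e) {
        assert state != State.LOCKED : "Attempt has already succeeded";

        state = State.FAILED;
        failure = e;
    }

    synchronized void resetForRetry() {
        state = State.PENDING;
        failure = null;
    }
}
{code}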



  was:
h3. Motivation
This behavior was hardly predicted or planned, but currently we can acquire a 
lock:
{code:java}
private void lock() {
IgniteUtils.dumpStack(Loggers.forClass(HeapLockManager.class), 
"Lock is taken " + intendedLockMode);

lockMode = intendedLockMode;

intendedLockMode = null;

intendedLocks.clear();
}
{code}
and make the waiter fail:
{code:java}
private void fail(LockException e) {
ex = e;
}
{code}
without any restriction (an assertion check or an explicit prohibition).

h3. Definition of done
Only one of the methods, either lock() or fail(), can be applied to a lock 
attempt, not both.
Do not forget that a retry attempt may succeed even though the previous 
attempt failed.




> Lock manager may fail and lock waiter simultaneously
> 
>
> Key: IGNITE-22980
> URL: https://issues.apache.org/jira/browse/IGNITE-22980
> Project: Ignite
>  Issue Type: Bug
>Reporter: Vladislav Pyatkov
>Priority: Major
>  Labels: ignite-3
>
> h3. Motivation
> This behavior was hardly predicted or planned, but currently we can acquire a 
> lock:
> {code:java}
> private void lock() {
> lockMode = intendedLockMode;
> intendedLockMode = null;
> intendedLocks.clear();
> }
> {code}
> and make the waiter fail:
> {code:java}
> private void fail(LockException e) {
> ex = e;
> }
> {code}
> without any restriction (an assertion check or an explicit prohibition).
> h3. Definition of done
> Only one of the methods, either lock() or fail(), can be applied to a lock 
> attempt, not both.
> Do not forget that a retry attempt may succeed even though the previous 
> attempt failed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-22980) Lock manager may fail and lock waiter simultaneously

2024-08-13 Thread Vladislav Pyatkov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladislav Pyatkov updated IGNITE-22980:
---
Description: 
h3. Motivation
This behavior was hardly predicted or planned, but currently we can acquire a 
lock:
{code:java}
private void lock() {
IgniteUtils.dumpStack(Loggers.forClass(HeapLockManager.class), 
"Lock is taken " + intendedLockMode);

lockMode = intendedLockMode;

intendedLockMode = null;

intendedLocks.clear();
}
{code}
and make the waiter fail:
{code:java}
private void fail(LockException e) {
ex = e;
}
{code}
without any restriction (an assertion check or an explicit prohibition).

h3. Definition of done
Only one of the methods, either lock() or fail(), can be applied to a lock 
attempt, not both.
Do not forget that a retry attempt may succeed even though the previous 
attempt failed.



  was:
h3. Motivation
This behavior was hardly predicted or planned, but currently we can acquire a 
lock:
{code:java}
private void lock() {
IgniteUtils.dumpStack(Loggers.forClass(HeapLockManager.class), 
"Lock is taken " + intendedLockMode);

lockMode = intendedLockMode;

intendedLockMode = null;

intendedLocks.clear();
}
{code}
and make the waiter fail:
{code:java}
private void fail(LockException e) {
ex = e;
}
{code}
without any restriction (an assertion check or an explicit prohibition).

h3. Definition of done
Only one of the methods, either lock() or fail(), can be applied to a lock 
attempt, not both.




> Lock manager may fail and lock waiter simultaneously
> 
>
> Key: IGNITE-22980
> URL: https://issues.apache.org/jira/browse/IGNITE-22980
> Project: Ignite
>  Issue Type: Bug
>Reporter: Vladislav Pyatkov
>Priority: Major
>  Labels: ignite-3
>
> h3. Motivation
> This behavior was hardly predicted or planned, but currently we can acquire a 
> lock:
> {code:java}
> private void lock() {
> IgniteUtils.dumpStack(Loggers.forClass(HeapLockManager.class), 
> "Lock is taken " + intendedLockMode);
> lockMode = intendedLockMode;
> intendedLockMode = null;
> intendedLocks.clear();
> }
> {code}
> and make the waiter fail:
> {code:java}
> private void fail(LockException e) {
> ex = e;
> }
> {code}
> without any restriction (an assertion check or an explicit prohibition).
> h3. Definition of done
> Only one of the methods, either lock() or fail(), can be applied to a lock 
> attempt, not both.
> Do not forget that a retry attempt may succeed even though the previous 
> attempt failed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (IGNITE-22980) Lock manager may fail and lock waiter simultaneously

2024-08-13 Thread Vladislav Pyatkov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-22980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17873171#comment-17873171
 ] 

Vladislav Pyatkov commented on IGNITE-22980:


This issue was reproduced in 
ItTransactionRecoveryTest#testRecoveryIsTriggeredOnce. The reason is that we 
can acquire a lock in the same thread where we try to fail the waiter.
{noformat}
[2024-08-12T22:18:20,774][WARN ][main][HeapLockManager] Dumping stack
 java.lang.Exception: Lock is taken S
at 
org.apache.ignite.internal.util.IgniteUtils.dumpStack(IgniteUtils.java:666) 
~[main/:?]
at 
org.apache.ignite.internal.tx.impl.HeapLockManager$WaiterImpl.lock(HeapLockManager.java:908)
 ~[main/:?]
at 
org.apache.ignite.internal.tx.impl.HeapLockManager$LockState.isWaiterReadyToNotify(HeapLockManager.java:480)
 ~[main/:?]
at 
org.apache.ignite.internal.tx.impl.HeapLockManager$LockState.unlockCompatibleWaiters(HeapLockManager.java:596)
 ~[main/:?]
at 
org.apache.ignite.internal.tx.impl.HeapLockManager$LockState.release(HeapLockManager.java:577)
 ~[main/:?]
at 
org.apache.ignite.internal.tx.impl.HeapLockManager$LockState.tryRelease(HeapLockManager.java:519)
 ~[main/:?]
at 
org.apache.ignite.internal.tx.impl.HeapLockManager.releaseAll(HeapLockManager.java:220)
 ~[main/:?]
at 
org.apache.ignite.internal.tx.impl.TxCleanupRequestHandler.releaseTxLocks(TxCleanupRequestHandler.java:173)
 ~[main/:?]
at 
org.apache.ignite.internal.tx.impl.TxCleanupRequestHandler.lambda$processTxCleanup$2(TxCleanupRequestHandler.java:144)
 ~[main/:?]
at 
java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
 ~[?:?]
at 
java.base/java.util.concurrent.CompletableFuture.uniWhenCompleteStage(CompletableFuture.java:883)
 ~[?:?]
at 
java.base/java.util.concurrent.CompletableFuture.whenComplete(CompletableFuture.java:2251)
 ~[?:?]
at 
org.apache.ignite.internal.tx.impl.TxCleanupRequestHandler.processTxCleanup(TxCleanupRequestHandler.java:143)
 ~[main/:?]
at 
org.apache.ignite.internal.tx.impl.TxCleanupRequestHandler.lambda$start$0(TxCleanupRequestHandler.java:108)
 ~[main/:?]
at 
org.apache.ignite.internal.network.TrackableNetworkMessageHandler.onReceived(TrackableNetworkMessageHandler.java:52)
 ~[main/:?]
at 
org.apache.ignite.internal.network.DefaultMessagingService.sendToSelf(DefaultMessagingService.java:370)
 ~[main/:?]
at 
org.apache.ignite.internal.network.DefaultMessagingService.invoke0(DefaultMessagingService.java:298)
 ~[main/:?]
at 
org.apache.ignite.internal.network.DefaultMessagingService.invoke(DefaultMessagingService.java:226)
 ~[main/:?]
at 
org.apache.ignite.internal.network.wrapper.JumpToExecutorByConsistentIdAfterSend.invoke(JumpToExecutorByConsistentIdAfterSend.java:97)
 ~[main/:?]
at 
org.apache.ignite.internal.network.MessagingService.invoke(MessagingService.java:198)
 ~[main/:?]
at 
org.apache.ignite.internal.tx.impl.TxMessageSender.cleanup(TxMessageSender.java:131)
 ~[main/:?]
at 
org.apache.ignite.internal.tx.impl.TxCleanupRequestSender.sendCleanupMessageWithRetries(TxCleanupRequestSender.java:240)
 ~[main/:?]
at 
org.apache.ignite.internal.tx.impl.TxCleanupRequestSender.cleanupPartitions(TxCleanupRequestSender.java:226)
 ~[main/:?]
at 
org.apache.ignite.internal.tx.impl.TxCleanupRequestSender.cleanup(TxCleanupRequestSender.java:165)
 ~[main/:?]
at 
org.apache.ignite.internal.tx.impl.TxManagerImpl.cleanup(TxManagerImpl.java:833)
 ~[main/:?]
at 
org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.lambda$finishAndCleanup$65(PartitionReplicaListener.java:1738)
 ~[main/:?]
at 
java.base/java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:1106)
 ~[?:?]
at 
java.base/java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2235)
 ~[?:?]
at 
org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.finishAndCleanup(PartitionReplicaListener.java:1737)
 ~[main/:?]
at 
org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.processTxFinishAction(PartitionReplicaListener.java:1657)
 ~[main/:?]
at 
org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.processOperationRequest(PartitionReplicaListener.java:795)
 ~[main/:?]
at 
org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.processOperationRequestWithTxRwCounter(PartitionReplicaListener.java:4045)
 ~[main/:?]
at 
org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.lambda$processRequest$6(PartitionReplicaListener.java:522)
 ~[main/:?]
at 
java.base/java.util.concurrent.CompletableFuture.uniComposeStage(Completable

[jira] [Created] (IGNITE-22980) Lock manager may fail and lock waiter simultaneously

2024-08-13 Thread Vladislav Pyatkov (Jira)
Vladislav Pyatkov created IGNITE-22980:
--

 Summary: Lock manager may fail and lock waiter simultaneously
 Key: IGNITE-22980
 URL: https://issues.apache.org/jira/browse/IGNITE-22980
 Project: Ignite
  Issue Type: Bug
Reporter: Vladislav Pyatkov


h3. Motivation
This behavior was hardly predicted or planned, but currently we can acquire a 
lock:
{code:java}
private void lock() {
IgniteUtils.dumpStack(Loggers.forClass(HeapLockManager.class), 
"Lock is taken " + intendedLockMode);

lockMode = intendedLockMode;

intendedLockMode = null;

intendedLocks.clear();
}
{code}
and make the waiter fail:
{code:java}
private void fail(LockException e) {
ex = e;
}
{code}
without any restriction (an assertion check or an explicit prohibition).

h3. Definition of done
Only one of the methods, either lock() or fail(), can be applied to a lock 
attempt, not both.





--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (IGNITE-22850) The same set of leases writes to Meta storage

2024-08-13 Thread Vladislav Pyatkov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladislav Pyatkov reassigned IGNITE-22850:
--

Assignee: Vladislav Pyatkov

> The same set of leases writes to Meta storage
> -
>
> Key: IGNITE-22850
> URL: https://issues.apache.org/jira/browse/IGNITE-22850
> Project: Ignite
>  Issue Type: Bug
>Reporter: Vladislav Pyatkov
>Assignee: Vladislav Pyatkov
>Priority: Major
>  Labels: ignite-3
> Attachments: Lease_updater.patch
>
>
> h3. Motivation
> We are trying to reduce the load on Meta storage. Therefore, adding extra 
> entities to the metastorage is undesirable. Here, we can simply not write 
> leases if nothing has changed for them.
> h3. Definition of done
> Write leases to Meta storage only if they have been updated.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (IGNITE-22837) Invocation of the local raft client happens in a different pool

2024-08-09 Thread Vladislav Pyatkov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-22837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17872428#comment-17872428
 ] 

Vladislav Pyatkov commented on IGNITE-22837:


Merged 513d54d637b64215b7fc04e3992aeb80faee9f66

> Invocation of the local raft client happens in a different pool
> ---
>
> Key: IGNITE-22837
> URL: https://issues.apache.org/jira/browse/IGNITE-22837
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Vladislav Pyatkov
>Assignee: Vladislav Pyatkov
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> h3. Motivation
> In our architecture, we invoke the RAFT client from the same node where 
> the RAFT node is located. In this case, we do not need to change threads (the 
> only case where we need it is an invocation in the network thread).
> {noformat}
> resolvePeer:org.apache.ignite.internal.benchmark.InsertBenchmark.kvInsert-jmh-worker-1
>  0.301 782499 782800
> Here is hidden 9.3 us
> deserializedCommand:%node_3344%JRaft-Request-Processor-2 0.1 792100 792200
> {noformat}
> h3. Definition of done
> Get rid of changing threads in local RAFT node invocation. 
> We already have the same logic for the replica manager 
> (ReplicaManager#onReplicaMessageReceived).
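> A minimal sketch of the local-bypass idea, mirroring the ReplicaManager 
> approach mentioned above; the dispatcher shape is an assumption for 
> illustration:
> {code:java}
> import java.util.concurrent.CompletableFuture;
> import java.util.function.Function;
> 
> // Illustrative dispatch: when the RAFT node lives on the same Ignite node
> // as the client, hand the message straight to the local handler instead of
> // going through the network stack and its thread pool.
> final class LocalRaftDispatch {
>     static <M, R> CompletableFuture<R> invoke(
>             String targetConsistentId,
>             String localConsistentId,
>             M message,
>             Function<M, CompletableFuture<R>> localHandler,
>             Function<M, CompletableFuture<R>> networkSend) {
>         if (targetConsistentId.equals(localConsistentId)) {
>             return localHandler.apply(message); // Same node: no thread switch.
>         }
> 
>         return networkSend.apply(message);
>     }
> }
> {code}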



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (IGNITE-22838) Operation returns to partition pool after it is handled in RAFT

2024-08-09 Thread Vladislav Pyatkov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladislav Pyatkov reassigned IGNITE-22838:
--

Assignee: Vladislav Pyatkov

> Operation returns to partition pool after it is handled in RAFT
> ---
>
> Key: IGNITE-22838
> URL: https://issues.apache.org/jira/browse/IGNITE-22838
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Vladislav Pyatkov
>Assignee: Vladislav Pyatkov
>Priority: Major
>  Labels: ignite-3
>
> h3. Motivation
> After executing the synchronous embedded insert operation, we are returning 
> to the partition-operations pool to return control to the client pool. We are 
> just losing several microseconds instead of returning to the client pool 
> directly.
> {noformat}
> finishFullTx:%node_3344%partition-operations-2 1.3 5476200 5477500
> Here is hidden 5.5 us
> kvPutEndMark:org.apache.ignite.internal.benchmark.InsertBenchmark.kvInsert-jmh-worker-1
>  0.0 5483000 5483000
> {noformat}
> h3. Definition of done
> We need to develop a strategy that does not involve the extra thread in 
> synchronous operation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-22837) Invocation of the local raft client happens in a different pool

2024-08-08 Thread Vladislav Pyatkov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladislav Pyatkov updated IGNITE-22837:
---
Reviewer: Roman Puchkovskiy

> Invocation of the local raft client happens in a different pool
> ---
>
> Key: IGNITE-22837
> URL: https://issues.apache.org/jira/browse/IGNITE-22837
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Vladislav Pyatkov
>Assignee: Vladislav Pyatkov
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> h3. Motivation
> In our architecture, we invoke the RAFT client from the same node where 
> the RAFT node is. In this case, we do not need to change threads (the only 
> case where a switch happens is an invocation in the network thread).
> {noformat}
> resolvePeer:org.apache.ignite.internal.benchmark.InsertBenchmark.kvInsert-jmh-worker-1
>  0.301 782499 782800
> Here is hidden 9.3 us
> deserializedCommand:%node_3344%JRaft-Request-Processor-2 0.1 792100 792200
> {noformat}
> h3. Definition of done
> Get rid of changing threads in local RAFT node invocation. 
> We already have the same logic for the replica manager 
> (ReplicaManager#onReplicaMessageReceived).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (IGNITE-22837) Invocation of the local raft client happens in a different pool

2024-08-08 Thread Vladislav Pyatkov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-22837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17872055#comment-17872055
 ] 

Vladislav Pyatkov commented on IGNITE-22837:


In the result above, we no longer have a thread switch between the operations 
(resolvePeer and deserializedCommand).
But we still have an unexpectedly long time interval (about 3 microseconds). 
The issue is in this code:
{code}
CompletableFuture responseFuture = new CompletableFuture()
        .orTimeout(timeout, TimeUnit.MILLISECONDS);
{code}
Setting a timeout on the CompletableFuture may take about 3 microseconds. 
It has to be optimized soon in another ticket.
Moreover, I am not sure the timeout is necessary for a local network 
invocation.
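
If it is not, one possible shape of the fix is a hedged sketch like this 
(isLocalPeer and targetPeer are hypothetical names, not the actual API): arm the 
timeout only for genuinely remote invocations, since a local call cannot lose 
its response on the network.
{code:java}
CompletableFuture<Object> responseFuture = new CompletableFuture<>();

// Sketch: skip the ~3 us spent in orTimeout() when the target peer is local.
if (!isLocalPeer(targetPeer)) {
    responseFuture = responseFuture.orTimeout(timeout, TimeUnit.MILLISECONDS);
}
{code}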

> Invocation of the local raft client happens in a different pool
> ---
>
> Key: IGNITE-22837
> URL: https://issues.apache.org/jira/browse/IGNITE-22837
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Vladislav Pyatkov
>Assignee: Vladislav Pyatkov
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> h3. Motivation
> In our architecture, we invoke the RAFT client from the same node where 
> the RAFT node is. In this case, we do not need to change threads (the only 
> case where a switch happens is an invocation in the network thread).
> {noformat}
> resolvePeer:org.apache.ignite.internal.benchmark.InsertBenchmark.kvInsert-jmh-worker-1
>  0.301 782499 782800
> Here is hidden 9.3 us
> deserializedCommand:%node_3344%JRaft-Request-Processor-2 0.1 792100 792200
> {noformat}
> h3. Definition of done
> Get rid of changing threads in local RAFT node invocation. 
> We already have the same logic for the replica manager 
> (ReplicaManager#onReplicaMessageReceived).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (IGNITE-22837) Invocation of the local raft client happens in a different pool

2024-08-08 Thread Vladislav Pyatkov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-22837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17872029#comment-17872029
 ] 

Vladislav Pyatkov commented on IGNITE-22837:


Load from client
{code}
resolvePeer:%node_3344%partition-operations-17 0.3 6500100 6500400
Here is hidden 3.9 us
deserializedCommand:%node_3344%partition-operations-17 0.2 6504300 6504500
{code}

Embedded mode
{code}
resolvePeer:org.apache.ignite.internal.benchmark.InsertBenchmark.kvInsert-jmh-worker-1
 0.2 2350800 2351000
Here is hidden 2.7 us
deserializedCommand:org.apache.ignite.internal.benchmark.InsertBenchmark.kvInsert-jmh-worker-1
 0.1 2353700 2353800
{code}

> Invocation of the local raft client happens in a different pool
> ---
>
> Key: IGNITE-22837
> URL: https://issues.apache.org/jira/browse/IGNITE-22837
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Vladislav Pyatkov
>Assignee: Vladislav Pyatkov
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> h3. Motivation
> In our architecture, we invoke the RAFT client from the same node where 
> the RAFT node is. In this case, we do not need to change threads (the only 
> case where a switch happens is an invocation in the network thread).
> {noformat}
> resolvePeer:org.apache.ignite.internal.benchmark.InsertBenchmark.kvInsert-jmh-worker-1
>  0.301 782499 782800
> Here is hidden 9.3 us
> deserializedCommand:%node_3344%JRaft-Request-Processor-2 0.1 792100 792200
> {noformat}
> h3. Definition of done
> Get rid of changing threads in local RAFT node invocation. 
> We already have the same logic for the replica manager 
> (ReplicaManager#onReplicaMessageReceived).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-22837) Invocation of the local raft client happens in a different pool

2024-08-07 Thread Vladislav Pyatkov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladislav Pyatkov updated IGNITE-22837:
---
Description: 
h3. Motivation
In our architecture, we invoke the RAFT client from the same node where the 
RAFT node is. In this case, we do not need to change threads (the only case 
where a switch happens is an invocation in the network thread).
{noformat}
resolvePeer:org.apache.ignite.internal.benchmark.InsertBenchmark.kvInsert-jmh-worker-1
 0.301 782499 782800
Here is hidden 9.3 us
deserializedCommand:%node_3344%JRaft-Request-Processor-2 0.1 792100 792200
{noformat}
h3. Definition of done
Get rid of changing threads in local RAFT node invocation. 
We already have the same logic for the replica manager 
(ReplicaManager#onReplicaMessageReceived).

  was:
h3. Motivation
In our architecture, we invoke the RAFT client from the same node where the 
RAFT node is. In this case, we do not need to change threads (the only case 
where a switch happens is an invocation in the network thread).
{noformat}
onBeforeApplyCmd:%node_3344%JRaft-Request-Processor-3 0.2 5415400 5415600
Here is hidden 6.1 us
RaftNodeOnEvent:%node_3344%JRaft-NodeImpl-Disruptor_stripe_2-0 0.0 5421700 
5421700
{noformat}
h3. Definition of done
Get rid of changing threads in local RAFT node invocation. 
We already have the same logic for the replica manager 
(ReplicaManager#onReplicaMessageReceived).


> Invocation of the local raft client happens in a different pool
> ---
>
> Key: IGNITE-22837
> URL: https://issues.apache.org/jira/browse/IGNITE-22837
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Vladislav Pyatkov
>Assignee: Vladislav Pyatkov
>Priority: Major
>  Labels: ignite-3
>
> h3. Motivation
> In our architecture, we invoke the RAFT client from the same node where 
> the RAFT node is. In this case, we do not need to change threads (the only 
> case where a switch happens is an invocation in the network thread).
> {noformat}
> resolvePeer:org.apache.ignite.internal.benchmark.InsertBenchmark.kvInsert-jmh-worker-1
>  0.301 782499 782800
> Here is hidden 9.3 us
> deserializedCommand:%node_3344%JRaft-Request-Processor-2 0.1 792100 792200
> {noformat}
> h3. Definition of done
> Get rid of changing threads in local RAFT node invocation. 
> We already have the same logic for the replica manager 
> (ReplicaManager#onReplicaMessageReceived).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (IGNITE-22837) Invocation of the local raft client happens in a different pool

2024-08-06 Thread Vladislav Pyatkov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladislav Pyatkov reassigned IGNITE-22837:
--

Assignee: Vladislav Pyatkov

> Invocation of the local raft client happens in a different pool
> ---
>
> Key: IGNITE-22837
> URL: https://issues.apache.org/jira/browse/IGNITE-22837
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Vladislav Pyatkov
>Assignee: Vladislav Pyatkov
>Priority: Major
>  Labels: ignite-3
>
> h3. Motivation
> In our architecture, we invoke the RAFT client from the same node where 
> the RAFT node is. In this case, we do not need to change threads (the only 
> case where a switch happens is an invocation in the network thread).
> {noformat}
> onBeforeApplyCmd:%node_3344%JRaft-Request-Processor-3 0.2 5415400 5415600
> Here is hidden 6.1 us
> RaftNodeOnEvent:%node_3344%JRaft-NodeImpl-Disruptor_stripe_2-0 0.0 5421700 
> 5421700
> {noformat}
> h3. Definition of done
> Get rid of changing threads in local RAFT node invocation. 
> We already have the same logic for the replica manager 
> (ReplicaManager#onReplicaMessageReceived).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (IGNITE-22848) ItDmlTest#scanExecutedWithinGivenTransaction: make this test correct

2024-07-26 Thread Vladislav Pyatkov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-22848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17869077#comment-17869077
 ] 

Vladislav Pyatkov commented on IGNITE-22848:


Merged 72bf4c09a6138e0e9e73d44d5dde1a60334db1a7

> ItDmlTest#scanExecutedWithinGivenTransaction: make this test correct
> ---
>
> Key: IGNITE-22848
> URL: https://issues.apache.org/jira/browse/IGNITE-22848
> Project: Ignite
>  Issue Type: Bug
>Reporter: Vladislav Pyatkov
>Assignee: Vladislav Pyatkov
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> h3. Motivation
> We have to get rid of disabled tests. Unfortunately, this test should be 
> rewritten.
> Right now, it would fail with:
> {noformat}
> Failed to acquire a lock due to a possible deadlock...
> {noformat}
> h3. Definition of done
> The test passes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (IGNITE-22848) ItDmlTest#scanExecutedWithinGivenTransaction: make this test correct

2024-07-26 Thread Vladislav Pyatkov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladislav Pyatkov reassigned IGNITE-22848:
--

Assignee: Vladislav Pyatkov

> ItDmlTest#scanExecutedWithinGivenTransaction: make this test correct
> ---
>
> Key: IGNITE-22848
> URL: https://issues.apache.org/jira/browse/IGNITE-22848
> Project: Ignite
>  Issue Type: Bug
>Reporter: Vladislav Pyatkov
>Assignee: Vladislav Pyatkov
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> h3. Motivation
> We have to get rid of disabled tests. Unfortunately, this test should be 
> rewritten.
> Right now, it would fail with:
> {noformat}
> Failed to acquire a lock due to a possible deadlock...
> {noformat}
> h3. Definition of done
> The test passes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-22848) ItDmlTest#scanExecutedWithinGivenTransaction: make this test correct

2024-07-26 Thread Vladislav Pyatkov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladislav Pyatkov updated IGNITE-22848:
---
Reviewer: Konstantin Orlov

> ItDmlTest#scanExecutedWithinGivenTransaction: make this test correct
> ---
>
> Key: IGNITE-22848
> URL: https://issues.apache.org/jira/browse/IGNITE-22848
> Project: Ignite
>  Issue Type: Bug
>Reporter: Vladislav Pyatkov
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> h3. Motivation
> We have to get rid of disabled tests. Unfortunately, this test should be 
> rewritten.
> Right now, it would fail with:
> {noformat}
> Failed to acquire a lock due to a possible deadlock...
> {noformat}
> h3. Definition of done
> The test passes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (IGNITE-22850) The same set of leases is written to Meta storage

2024-07-26 Thread Vladislav Pyatkov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-22850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17868952#comment-17868952
 ] 

Vladislav Pyatkov edited comment on IGNITE-22850 at 7/26/24 2:13 PM:
-

Also, I remember I had an issue in the test 
(ItDisasterRecoveryReconfigurationTest).
Please look at this file revision to fix it.
https://github.com/gridgain/apache-ignite-3/blob/8453245c185ee47a63fbe47423ca2c0635671677/modules/table/src/integrationTest/java/org/apache/ignite/internal/disaster/ItDisasterRecoveryReconfigurationTest.java


was (Author: v.pyatkov):
Also, I remember I had an issue in the test 
(ItDisasterRecoveryReconfigurationTest).
Please look at this file to fix it.
https://github.com/gridgain/apache-ignite-3/blob/8453245c185ee47a63fbe47423ca2c0635671677/modules/table/src/integrationTest/java/org/apache/ignite/internal/disaster/ItDisasterRecoveryReconfigurationTest.java

> The same set of leases is written to Meta storage
> -
>
> Key: IGNITE-22850
> URL: https://issues.apache.org/jira/browse/IGNITE-22850
> Project: Ignite
>  Issue Type: Bug
>Reporter: Vladislav Pyatkov
>Priority: Major
>  Labels: ignite-3
> Attachments: Lease_updater.patch
>
>
> h3. Motivation
> We are trying to reduce the load on Meta storage. Therefore, adding extra 
> entities to the Meta storage is undesirable. Here, we can simply skip writing 
> leases when nothing has changed for them.
> h3. Definition of done
> Write leases to Meta storage only if they have been updated.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (IGNITE-22850) The same set of leases is written to Meta storage

2024-07-26 Thread Vladislav Pyatkov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-22850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17868952#comment-17868952
 ] 

Vladislav Pyatkov commented on IGNITE-22850:


Also, I remember I had an issue in the test 
(ItDisasterRecoveryReconfigurationTest).
Please look at this file to fix it.
https://github.com/gridgain/apache-ignite-3/blob/8453245c185ee47a63fbe47423ca2c0635671677/modules/table/src/integrationTest/java/org/apache/ignite/internal/disaster/ItDisasterRecoveryReconfigurationTest.java

> The same set of leases is written to Meta storage
> -
>
> Key: IGNITE-22850
> URL: https://issues.apache.org/jira/browse/IGNITE-22850
> Project: Ignite
>  Issue Type: Bug
>Reporter: Vladislav Pyatkov
>Priority: Major
>  Labels: ignite-3
> Attachments: Lease_updater.patch
>
>
> h3. Motivation
> We are trying to reduce the load on Meta storage. Therefore, adding extra 
> entities to the Meta storage is undesirable. Here, we can simply skip writing 
> leases when nothing has changed for them.
> h3. Definition of done
> Write leases to Meta storage only if they have been updated.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-22850) The same set of leases is written to Meta storage

2024-07-26 Thread Vladislav Pyatkov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladislav Pyatkov updated IGNITE-22850:
---
Labels: ignite-3  (was: )

> The same set of leases is written to Meta storage
> -
>
> Key: IGNITE-22850
> URL: https://issues.apache.org/jira/browse/IGNITE-22850
> Project: Ignite
>  Issue Type: Bug
>Reporter: Vladislav Pyatkov
>Priority: Major
>  Labels: ignite-3
> Attachments: Lease_updater.patch
>
>
> h3. Motivation
> We are trying to reduce the load on Meta storage. Therefore, adding extra 
> entities to the Meta storage is undesirable. Here, we can simply skip writing 
> leases when nothing has changed for them.
> h3. Definition of done
> Write leases to Meta storage only if they have been updated.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-22850) The same set of leases is written to Meta storage

2024-07-26 Thread Vladislav Pyatkov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladislav Pyatkov updated IGNITE-22850:
---
Attachment: Lease_updater.patch

> The same set of leases is written to Meta storage
> -
>
> Key: IGNITE-22850
> URL: https://issues.apache.org/jira/browse/IGNITE-22850
> Project: Ignite
>  Issue Type: Bug
>Reporter: Vladislav Pyatkov
>Priority: Major
> Attachments: Lease_updater.patch
>
>
> h3. Motivation
> We are trying to reduce the load on Meta storage. Therefore, adding extra 
> entities to the Meta storage is undesirable. Here, we can simply skip writing 
> leases when nothing has changed for them.
> h3. Definition of done
> Write leases to Meta storage only if they have been updated.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-22850) The same set of leases is written to Meta storage

2024-07-26 Thread Vladislav Pyatkov (Jira)
Vladislav Pyatkov created IGNITE-22850:
--

 Summary: The same set of leases is written to Meta storage
 Key: IGNITE-22850
 URL: https://issues.apache.org/jira/browse/IGNITE-22850
 Project: Ignite
  Issue Type: Bug
Reporter: Vladislav Pyatkov


h3. Motivation
We are trying to reduce the load on Meta storage. Therefore, adding extra 
entities to the Meta storage is undesirable. Here, we can simply skip writing 
leases when nothing has changed for them.

h3. Definition of done
Write leases to Meta storage only if they have been updated.
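
A hedged sketch of the idea (the names and the write API are hypothetical, and 
the attached Lease_updater.patch may do this differently): remember the last 
written batch and skip the put when the serialized leases are identical.
{code:java}
import java.util.Arrays;

class LeasePublisherSketch {
    private byte[] lastWrittenLeases;

    void publishLeases(byte[] renewedLeases, MetaStorageWriter writer) {
        if (Arrays.equals(renewedLeases, lastWrittenLeases)) {
            return; // The same set of leases: no extra Meta storage write.
        }

        writer.put("leases", renewedLeases); // Hypothetical write API.

        lastWrittenLeases = renewedLeases;
    }

    // Hypothetical stand-in for the Meta storage write path.
    interface MetaStorageWriter {
        void put(String key, byte[] value);
    }
}
{code}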



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-22848) ItDmlTest#scanExecutedWithinGivenTransaction: make this test correct

2024-07-26 Thread Vladislav Pyatkov (Jira)
Vladislav Pyatkov created IGNITE-22848:
--

 Summary: ItDmlTest#scanExecutedWithinGivenTransaction: make this 
test correct
 Key: IGNITE-22848
 URL: https://issues.apache.org/jira/browse/IGNITE-22848
 Project: Ignite
  Issue Type: Bug
Reporter: Vladislav Pyatkov


h3. Motivation
We have to get rid of disabled tests. Unfortunately, this test should be 
rewritten.
Right now, it would fail with:
{noformat}
Failed to acquire a lock due to a possible deadlock...
{noformat}

h3. Definition of done
The test passes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-22261) Deadlock on configuration application in NodeImpl when disruptors are full

2024-07-26 Thread Vladislav Pyatkov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladislav Pyatkov updated IGNITE-22261:
---
Reviewer: Mirza Aliev

> Deadlock on configuration application in NodeImpl when disruptors are full
> --
>
> Key: IGNITE-22261
> URL: https://issues.apache.org/jira/browse/IGNITE-22261
> Project: Ignite
>  Issue Type: Bug
>Reporter: Roman Puchkovskiy
>Assignee: Vladislav Pyatkov
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> # NodeImpl#executeApplyingTasks() takes NodeImpl.writeLock and calls 
> LogManager.appendEntries()
>  # LogManager tries to enqueue a task to diskQueue which is full, hence it 
> blocks until a task gets consumed from diskQueue
>  # diskQueue is consumed by StableClosureEventHandler
>  # StableClosureEventHandler tries to enqueue a task to 
> FSMCallerImpl#taskQueue, which is also full, so this also blocks until a task 
> gets consumed from FSMCallerImpl#taskQueue
>  # FSMCallerImpl#taskQueue is consumed by ApplyTaskHandler
>  # ApplyTaskHandler calls NodeImpl#onConfigurationChangeDone(), which tries 
> to take NodeImpl#writeLock
> As a result, there is a deadlock: 
> NodeImpl#writeLock->LogManager#diskQueue->FSMCallerImpl#taskQueue->NodeImpl#writeLock
>  (disruptors are used as blocking queues in JRaft, so, when full, they act 
> like locks).
> This was caught by ItNodeTest#testNodeTaskOverload() which uses extremely 
> short disruptors (2 items max each).
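
A self-contained illustration of that cycle with plain java.util.concurrent 
instead of the JRaft classes: a producer blocks on a full bounded queue while 
holding the write lock that the queue's consumer needs to make progress.
{code:java}
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class QueueUnderLockDeadlock {
    private final ReentrantReadWriteLock nodeLock = new ReentrantReadWriteLock();

    // Stands in for LogManager#diskQueue; deliberately tiny, like in ItNodeTest.
    private final BlockingQueue<Runnable> diskQueue = new ArrayBlockingQueue<>(2);

    // Analogue of NodeImpl#executeApplyingTasks(): enqueues while holding writeLock.
    void executeApplyingTasks(Runnable task) throws InterruptedException {
        nodeLock.writeLock().lock();
        try {
            diskQueue.put(task); // Blocks when the queue is full.
        } finally {
            nodeLock.writeLock().unlock();
        }
    }

    // Analogue of the handler chain ending in NodeImpl#onConfigurationChangeDone():
    // the consumer needs the same writeLock, so when the producer blocks above,
    // neither side can proceed.
    void consumeLoop() throws InterruptedException {
        while (true) {
            Runnable task = diskQueue.take();

            nodeLock.writeLock().lock();
            try {
                task.run();
            } finally {
                nodeLock.writeLock().unlock();
            }
        }
    }
}
{code}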



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-22838) Operation returns to partition pool after it is handled in RAFT

2024-07-26 Thread Vladislav Pyatkov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladislav Pyatkov updated IGNITE-22838:
---
Summary: Operation returns to partition pool after it is handled in RAFT  
(was: Operation returns to partition pool after it is handled in ARFT)

> Operation returns to partition pool after it is handled in RAFT
> ---
>
> Key: IGNITE-22838
> URL: https://issues.apache.org/jira/browse/IGNITE-22838
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Vladislav Pyatkov
>Priority: Major
>  Labels: ignite-3
>
> h3. Motivation
> After executing the synchronous embedded insert operation, we are returning 
> to the partition-operations pool to return control to the client pool. We are 
> just losing several microseconds instead of returning to the client pool 
> directly.
> {noformat}
> finishFullTx:%node_3344%partition-operations-2 1.3 5476200 5477500
> Here is hidden 5.5 us
> kvPutEndMark:org.apache.ignite.internal.benchmark.InsertBenchmark.kvInsert-jmh-worker-1
>  0.0 5483000 5483000
> {noformat}
> h3. Definition of done
> We need to develop a strategy that does not involve the extra thread in 
> synchronous operation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-22843) Writing into RAFT log is too long

2024-07-25 Thread Vladislav Pyatkov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladislav Pyatkov updated IGNITE-22843:
---
Summary: Writing into RAFT log is too long  (was: Writing into ARFT log is 
too long)

> Writing into RAFT log is too long
> -
>
> Key: IGNITE-22843
> URL: https://issues.apache.org/jira/browse/IGNITE-22843
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Vladislav Pyatkov
>Priority: Major
>  Labels: ignite-3
>
> h3. Motivation
> We are using RocksDB as RAFT log storage. Writing to the log takes significantly 
> longer than writing to a memory-mapped buffer (as we used in Ignite 2).
> {noformat}
> appendLogEntry 0.8 6493700 6494500
> Here is hidden 0.5 us
> flushLog 20.1 6495000 6515100
> Here is hidden 2.8 us
> {noformat}
> h3. Definition of done
> We should find a way to implement faster log storage.
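
For comparison, a rough java.nio sketch of the memory-mapped append path 
mentioned above (illustrative only: this is not the Ignite 2 implementation, 
and it ignores rollover, recovery, and concurrency):
{code:java}
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MappedLogSketch {
    private final MappedByteBuffer buf;

    MappedLogSketch(Path file, int capacity) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, capacity);
        }
    }

    void append(byte[] entry) {
        buf.putInt(entry.length); // Length-prefixed record.
        buf.put(entry);
    }

    void flush() {
        buf.force(); // Flush the dirty pages to disk.
    }
}
{code}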



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-22845) Tuple marshaller works too long for schema with several fields

2024-07-25 Thread Vladislav Pyatkov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladislav Pyatkov updated IGNITE-22845:
---
Description: 
h3. Motivation
The tuple serializer takes a long time to serialize a simple object. 
{noformat}
kvPutMark 0.2 6429200 6429400
Here is hidden 1.7 us
kvMarshal 9.6 6431100 6440700
Here is hidden 0.2 us
{noformat}
h3. Definition of done
Optimize the tuple serialization to be close to the Ignite 2 binary object.

  was:
h3. Motivation
The tuple serializer takes a long time to serialize a simple object. 
{noformat}
kvPutMark 0.2 6429200 6429400
Here is hidden 1.7 us
kvMarshal 9.6 6431100 6440700
Here is hidden 0.2 us
{noformat}


> Tuple marshaller works too long for schema with several fields
> --
>
> Key: IGNITE-22845
> URL: https://issues.apache.org/jira/browse/IGNITE-22845
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Vladislav Pyatkov
>Priority: Major
>  Labels: ignite-3
>
> h3. Motivation
> The tuple serializer takes a long time to serialize a simple object. 
> {noformat}
> kvPutMark 0.2 6429200 6429400
> Here is hidden 1.7 us
> kvMarshal 9.6 6431100 6440700
> Here is hidden 0.2 us
> {noformat}
> h3. Definition of done
> Optimize the tuple serialization to be close to the Ignite 2 binary object.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-22845) Tuple marshaller works too long for schema with several fields

2024-07-25 Thread Vladislav Pyatkov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladislav Pyatkov updated IGNITE-22845:
---
Description: 
h3. Motivation
The tuple serializer takes a long time to serialize a simple object. 
{noformat}
kvPutMark 0.2 6429200 6429400
Here is hidden 1.7 us
kvMarshal 9.6 6431100 6440700
Here is hidden 0.2 us
{noformat}

> Tuple marshaller works too long for schema with several fields
> --
>
> Key: IGNITE-22845
> URL: https://issues.apache.org/jira/browse/IGNITE-22845
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Vladislav Pyatkov
>Priority: Major
>  Labels: ignite-3
>
> h3. Motivation
> The tuple serializer takes a long time to serialize a simple object. 
> {noformat}
> kvPutMark 0.2 6429200 6429400
> Here is hidden 1.7 us
> kvMarshal 9.6 6431100 6440700
> Here is hidden 0.2 us
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-22842) The client returns to the fork-join pool after handling operations on the server side

2024-07-25 Thread Vladislav Pyatkov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladislav Pyatkov updated IGNITE-22842:
---
Description: 
h3. Motivation
In a synchronous operation, we return the result to the fork-join pool before 
providing it to the client in their pool. Of course, this trick costs 
several microseconds.
{noformat}
loadSchemaAndReadData:ForkJoinPool.commonPool-worker-9 0.2 9950400 9950600
Here is hidden 5.2 us
kvClientPutEndMark:org.apache.ignite.internal.benchmark.InsertBenchmark.kvThinInsert-jmh-worker-1
 0.0 9955800 9955800
{noformat}
The issue is similar to IGNITE-22838 but concerns the type of operation that 
is started on the client side.
h3. Definition of done
We need to develop a strategy that does not involve the extra thread in 
synchronous operation.

  was:TBD


> The client returns to the fork-join pool after handling operations on the 
> server side
> -
>
> Key: IGNITE-22842
> URL: https://issues.apache.org/jira/browse/IGNITE-22842
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Vladislav Pyatkov
>Priority: Major
>  Labels: ignite-3
>
> h3. Motivation
> In a synchronous operation, we return the result to the fork-join pool before 
> providing it to the client in their pool. Of course, this trick costs 
> several microseconds.
> {noformat}
> loadSchemaAndReadData:ForkJoinPool.commonPool-worker-9 0.2 9950400 9950600
> Here is hidden 5.2 us
> kvClientPutEndMark:org.apache.ignite.internal.benchmark.InsertBenchmark.kvThinInsert-jmh-worker-1
>  0.0 9955800 9955800
> {noformat}
> The issue is similar to IGNITE-22838 but concerns the type of operation that 
> is started on the client side.
> h3. Definition of done
> We need to develop a strategy that does not involve the extra thread in 
> synchronous operation.
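
A hedged sketch of the direction (deserialize stands in for 
loadSchemaAndReadData-style post-processing; this is not the actual client 
code): on the synchronous path, avoid *Async continuations with the default 
executor, so the result does not bounce through ForkJoinPool.commonPool before 
the blocked caller wakes up.
{code:java}
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

class SyncCompletionSketch {
    static <T, R> R syncCall(CompletableFuture<T> response, Function<T, R> deserialize) {
        // thenApplyAsync(deserialize) would hop through ForkJoinPool.commonPool;
        // thenApply(deserialize) runs on the completing thread (or inline in the
        // caller if the future is already complete), so join() resumes directly.
        return response.thenApply(deserialize).join();
    }
}
{code}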



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-22838) Operation returns to partition pool after it is handled in ARFT

2024-07-25 Thread Vladislav Pyatkov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladislav Pyatkov updated IGNITE-22838:
---
Description: 
h3. Motivation
After executing the synchronous embedded insert operation, we are returning to 
the partition-operations pool to return control to the client pool. We are just 
losing several microseconds instead of returning to the client pool directly.
{noformat}
finishFullTx:%node_3344%partition-operations-2 1.3 5476200 5477500
Here is hidden 5.5 us
kvPutEndMark:org.apache.ignite.internal.benchmark.InsertBenchmark.kvInsert-jmh-worker-1
 0.0 5483000 5483000
{noformat}
h3. Definition of done
We need to develop a strategy that does not involve the extra thread in 
synchronous operation.

  was:
h3. Motivation
After executing the synchronous embedded insert operation, we are returning to 
the partition-operations pool to return control to the client pool. We are just 
losing several microseconds instead of returning to the client pool directly.
{noformat}
finishFullTx:%node_3344%partition-operations-2 1.3 5476200 5477500
Here is hidden 5.5 us
kvPutEndMark:org.apache.ignite.internal.benchmark.InsertBenchmark.kvInsert-jmh-worker-1
 0.0 5483000 5483000

{noformat}
h3. Definition of done
We need to develop a strategy that does not involve the extra thread in 
synchronous operation.


> Operation returns to partition pool after it is handled in ARFT
> ---
>
> Key: IGNITE-22838
> URL: https://issues.apache.org/jira/browse/IGNITE-22838
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Vladislav Pyatkov
>Priority: Major
>  Labels: ignite-3
>
> h3. Motivation
> After executing the synchronous embedded insert operation, we are returning 
> to the partition-operations pool to return control to the client pool. We are 
> just losing several microseconds instead of returning to the client pool 
> directly.
> {noformat}
> finishFullTx:%node_3344%partition-operations-2 1.3 5476200 5477500
> Here is hidden 5.5 us
> kvPutEndMark:org.apache.ignite.internal.benchmark.InsertBenchmark.kvInsert-jmh-worker-1
>  0.0 5483000 5483000
> {noformat}
> h3. Definition of done
> We need to develop a strategy that does not involve the extra thread in 
> synchronous operation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-22838) Operation returns to partition pool after it is handled in ARFT

2024-07-25 Thread Vladislav Pyatkov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladislav Pyatkov updated IGNITE-22838:
---
Description: 
h3. Motivation
After executing the synchronous embedded insert operation, we are returning to 
the partition pool to return control to the client pool. We are just losing 
several microseconds instead of returning to the client pool directly.
{noformat}
finishFullTx:%node_3344%partition-operations-2 1.3 5476200 5477500
Here is hidden 5.5 us
kvPutEndMark:org.apache.ignite.internal.benchmark.InsertBenchmark.kvInsert-jmh-worker-1
 0.0 5483000 5483000

{noformat}
h3. Definition of done
We need to develop a strategy that does not involve the extra thread in 
synchronous operation.

  was:
h3. Motivation
After executing the synchronous embedded insert operation, we are returning to 
the partition pool to return control to the client pool. We are just losing 
several microseconds instead of returning to the client pool directly.
{noformat}
RaftNodeOnEvent:%node_3344%JRaft-NodeImpl-Disruptor_stripe_2-0 0.0 5421700 
5421700
Here is hidden 2.8 us
LogManagerOnEvent:%node_3344%JRaft-LogManager-Disruptor_stripe_2-0 0.1 5424500 
5424600
{noformat}
h3. Definition of done
We need to develop a strategy that does not involve the extra thread in 
synchronous operation.


> Operation returns to partition pool after it is handled in ARFT
> ---
>
> Key: IGNITE-22838
> URL: https://issues.apache.org/jira/browse/IGNITE-22838
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Vladislav Pyatkov
>Priority: Major
>  Labels: ignite-3
>
> h3. Motivation
> After executing the synchronous embedded insert operation, we are returning 
> to the partition pool to return control to the client pool. We are just 
> losing several microseconds instead of returning to the client pool directly.
> {noformat}
> finishFullTx:%node_3344%partition-operations-2 1.3 5476200 5477500
> Here is hidden 5.5 us
> kvPutEndMark:org.apache.ignite.internal.benchmark.InsertBenchmark.kvInsert-jmh-worker-1
>  0.0 5483000 5483000
> {noformat}
> h3. Definition of done
> We need to develop a strategy that does not involve the extra thread in 
> synchronous operation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-22838) Operation returns to partition pool after it is handled in ARFT

2024-07-25 Thread Vladislav Pyatkov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladislav Pyatkov updated IGNITE-22838:
---
Description: 
h3. Motivation
After executing the synchronous embedded insert operation, we are returning to 
the partition-operations pool to return control to the client pool. We are just 
losing several microseconds instead of returning to the client pool directly.
{noformat}
finishFullTx:%node_3344%partition-operations-2 1.3 5476200 5477500
Here is hidden 5.5 us
kvPutEndMark:org.apache.ignite.internal.benchmark.InsertBenchmark.kvInsert-jmh-worker-1
 0.0 5483000 5483000

{noformat}
h3. Definition of done
We need to develop a strategy that does not involve the extra thread in 
synchronous operation.

  was:
h3. Motivation
After executing the synchronous embedded insert operation, we are returning to 
the partition pool to return control to the client pool. We are just losing 
several microseconds instead of returning to the client pool directly.
{noformat}
finishFullTx:%node_3344%partition-operations-2 1.3 5476200 5477500
Here is hidden 5.5 us
kvPutEndMark:org.apache.ignite.internal.benchmark.InsertBenchmark.kvInsert-jmh-worker-1
 0.0 5483000 5483000

{noformat}
h3. Definition of done
We need to develop a strategy that does not involve the extra thread in 
synchronous operation.


> Operation returns to partition pool after it is handled in ARFT
> ---
>
> Key: IGNITE-22838
> URL: https://issues.apache.org/jira/browse/IGNITE-22838
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Vladislav Pyatkov
>Priority: Major
>  Labels: ignite-3
>
> h3. Motivation
> After executing the synchronous embedded insert operation, we are returning 
> to the partition-operations pool to return control to the client pool. We are 
> just losing several microseconds instead of returning to the client pool 
> directly.
> {noformat}
> finishFullTx:%node_3344%partition-operations-2 1.3 5476200 5477500
> Here is hidden 5.5 us
> kvPutEndMark:org.apache.ignite.internal.benchmark.InsertBenchmark.kvInsert-jmh-worker-1
>  0.0 5483000 5483000
> {noformat}
> h3. Definition of done
> We need to develop a strategy that does not involve the extra thread in 
> synchronous operation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-22838) Operation returns to partition pool after it is handled in ARFT

2024-07-25 Thread Vladislav Pyatkov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladislav Pyatkov updated IGNITE-22838:
---
Description: 
h3. Motivation
After executing the synchronous embedded insert operation, we are returning to 
the partition pool to return control to the client pool. We are just losing 
several microseconds instead of returning to the client pool directly.
{noformat}
RaftNodeOnEvent:%node_3344%JRaft-NodeImpl-Disruptor_stripe_2-0 0.0 5421700 
5421700
Here is hidden 2.8 us
LogManagerOnEvent:%node_3344%JRaft-LogManager-Disruptor_stripe_2-0 0.1 5424500 
5424600
{noformat}
h3. Definition of done
We need to develop a strategy that does not involve the extra thread in 
synchronous operation.

  was:
h3. Motivation
After executing the synchronous insert operation, we are returning to the 
partition pool to return control to the client pool. We just lose several 
microseconds.


> Operation returns to partition pool after it is handled in ARFT
> ---
>
> Key: IGNITE-22838
> URL: https://issues.apache.org/jira/browse/IGNITE-22838
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Vladislav Pyatkov
>Priority: Major
>  Labels: ignite-3
>
> h3. Motivation
> After executing the synchronous embedded insert operation, we are returning 
> to the partition pool to return control to the client pool. We are just 
> losing several microseconds instead of returning to the client pool directly.
> {noformat}
> RaftNodeOnEvent:%node_3344%JRaft-NodeImpl-Disruptor_stripe_2-0 0.0 5421700 
> 5421700
> Here is hidden 2.8 us
> LogManagerOnEvent:%node_3344%JRaft-LogManager-Disruptor_stripe_2-0 0.1 
> 5424500 5424600
> {noformat}
> h3. Definition of done
> We need to develop a strategy that does not involve the extra thread in 
> synchronous operation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-22838) Operation returns to partition pool after it is handled in ARFT

2024-07-25 Thread Vladislav Pyatkov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladislav Pyatkov updated IGNITE-22838:
---
Description: 
h3. Motivation
After executing the synchronous insert operation, we are returning to the 
partition pool to return control to the client pool. We just lose several 
microseconds.

  was:TBD


> Operation returns to partition pool after it is handled in ARFT
> ---
>
> Key: IGNITE-22838
> URL: https://issues.apache.org/jira/browse/IGNITE-22838
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Vladislav Pyatkov
>Priority: Major
>  Labels: ignite-3
>
> h3. Motivation
> After executing the synchronous insert operation, we are returning to the 
> partition pool to return control to the client pool. We just lose several 
> microseconds.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-22837) Invocation of the local raft client happens in a different pool

2024-07-25 Thread Vladislav Pyatkov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladislav Pyatkov updated IGNITE-22837:
---
Description: 
h3. Motivation
In our architecture, we invoke the RAFT client from the same node where the 
RAFT node is. In this case, we do not need to change threads (the only case 
where a switch happens is an invocation in the network thread).
{noformat}
onBeforeApplyCmd:%node_3344%JRaft-Request-Processor-3 0.2 5415400 5415600
Here is hidden 6.1 us
RaftNodeOnEvent:%node_3344%JRaft-NodeImpl-Disruptor_stripe_2-0 0.0 5421700 
5421700
{noformat}
h3. Definition of done
Get rid of changing threads in local RAFT node invocation. 
We already have the same logic for the replica manager 
(ReplicaManager#onReplicaMessageReceived).

  was:TBD


> Invocation of the local raft client happens in a different pool
> ---
>
> Key: IGNITE-22837
> URL: https://issues.apache.org/jira/browse/IGNITE-22837
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Vladislav Pyatkov
>Priority: Major
>  Labels: ignite-3
>
> h3. Motivation
> In our architecture, we invoke the RAFT client from the same node where 
> the RAFT node is. In this case, we do not need to change threads (the only 
> case where a switch happens is an invocation in the network thread).
> {noformat}
> onBeforeApplyCmd:%node_3344%JRaft-Request-Processor-3 0.2 5415400 5415600
> Here is hidden 6.1 us
> RaftNodeOnEvent:%node_3344%JRaft-NodeImpl-Disruptor_stripe_2-0 0.0 5421700 
> 5421700
> {noformat}
> h3. Definition of done
> Get rid of changing threads in local RAFT node invocation. 
> We already have the same logic for the replica manager 
> (ReplicaManager#onReplicaMessageReceived).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-22843) Writing into ARFT log is too long

2024-07-25 Thread Vladislav Pyatkov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladislav Pyatkov updated IGNITE-22843:
---
Description: 
h3. Motivation
We are using RocksDB as RAFT log storage. Writing to the log takes significantly 
longer than writing to a memory-mapped buffer (as we used in Ignite 2).
{noformat}
appendLogEntry 0.8 6493700 6494500
Here is hidden 0.5 us
flushLog 20.1 6495000 6515100
Here is hidden 2.8 us
{noformat}
h3. Definition of done
We should find a way to implement faster log storage.

  was:
Motivation
We are using RocksDB as RAFT log storage. Writing to the log takes significantly 
longer than writing to a memory-mapped buffer (as we used in Ignite 2).
{noformat}
appendLogEntry 0.8 6493700 6494500
Here is hidden 0.5 us
flushLog 20.1 6495000 6515100
Here is hidden 2.8 us
{noformat}

Definition of done
We should find a way to implement faster log storage.


> Writing into ARFT log is too long
> -
>
> Key: IGNITE-22843
> URL: https://issues.apache.org/jira/browse/IGNITE-22843
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Vladislav Pyatkov
>Priority: Major
>  Labels: ignite-3
>
> h3. Motivation
> We are using RocksDB as RAFT log storage. Writing to the log takes significantly 
> longer than writing to a memory-mapped buffer (as we used in Ignite 2).
> {noformat}
> appendLogEntry 0.8 6493700 6494500
> Here is hidden 0.5 us
> flushLog 20.1 6495000 6515100
> Here is hidden 2.8 us
> {noformat}
> h3. Definition of done
> We should find a way to implement faster log storage.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-22835) Latency penalty for using disruptor threads

2024-07-25 Thread Vladislav Pyatkov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladislav Pyatkov updated IGNITE-22835:
---
Description: 
h3. Motivation
All three disruptors are part of the RAFT implementation. Each time a command 
has to be replicated, all the disruptors are used. This noticeably hurts the 
entire operation's latency.
{noformat}
onBeforeApplyCmd:%node_3344%JRaft-Request-Processor-3 0.2 5415400 5415600
Here is hidden 6.1 us
RaftNodeOnEvent:%node_3344%JRaft-NodeImpl-Disruptor_stripe_2-0 0.0 5421700 
5421700
--
RaftNodeOnEvent:%node_3344%JRaft-NodeImpl-Disruptor_stripe_2-0 0.0 5421700 
5421700
Here is hidden 2.8 us
LogManagerOnEvent:%node_3344%JRaft-LogManager-Disruptor_stripe_2-0 0.1 5424500 
5424600
--
flushLog:%node_3344%JRaft-LogManager-Disruptor_stripe_2-0 18.7 5426400 5445100
Here is hidden 2.9 us
FSMCallerOnEvent:%node_3344%JRaft-FSMCaller-Disruptor_stripe_1-0 0.1 5448000 
5448100
{noformat}
h3. Definition of done
* We can try to tune the disruptor wait policy, depending on the case.
* Maybe we can reduce the number of disruptor queues.

  was:
h3. Motivation
All three disruptors are part of the RAFT implementation. Each time a command 
has to be replicated, all the disruptors are used. This noticeably hurts the 
entire operation's latency.
h3. Definition of done
* We can try to tune the disruptor wait policy, depending on the case.
* Maybe we can reduce the number of disruptor queues.


> Latency penalty for using disruptor threads
> ---
>
> Key: IGNITE-22835
> URL: https://issues.apache.org/jira/browse/IGNITE-22835
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Vladislav Pyatkov
>Priority: Major
>  Labels: ignite-3
>
> h3. Motivation
> All three disruptors are part of the RAFT implementation. Each time a command 
> has to be replicated, all the disruptors are used. This noticeably hurts the 
> entire operation's latency.
> {noformat}
> onBeforeApplyCmd:%node_3344%JRaft-Request-Processor-3 0.2 5415400 5415600
> Here is hidden 6.1 us
> RaftNodeOnEvent:%node_3344%JRaft-NodeImpl-Disruptor_stripe_2-0 0.0 5421700 
> 5421700
> --
> RaftNodeOnEvent:%node_3344%JRaft-NodeImpl-Disruptor_stripe_2-0 0.0 5421700 
> 5421700
> Here is hidden 2.8 us
> LogManagerOnEvent:%node_3344%JRaft-LogManager-Disruptor_stripe_2-0 0.1 
> 5424500 5424600
> --
> flushLog:%node_3344%JRaft-LogManager-Disruptor_stripe_2-0 18.7 5426400 5445100
> Here is hidden 2.9 us
> FSMCallerOnEvent:%node_3344%JRaft-FSMCaller-Disruptor_stripe_1-0 0.1 5448000 
> 5448100
> {noformat}
> h3. Definition of done
> * We can try to tune the disruptor wait policy, depending on the case.
> * Maybe we can reduce the number of disruptor queues.
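
For the wait-policy option, a sketch against the plain LMAX Disruptor API 
(JRaft vendors its own copy of the disruptor, so the exact types may differ): 
the strategy is chosen at construction time and trades CPU for hand-off latency.
{code:java}
import com.lmax.disruptor.YieldingWaitStrategy;
import com.lmax.disruptor.dsl.Disruptor;
import com.lmax.disruptor.dsl.ProducerType;
import java.util.concurrent.ThreadFactory;

class WaitPolicySketch {
    // Hypothetical event type carrying a replicated command.
    static class CommandEvent {
        Object command;
    }

    Disruptor<CommandEvent> create(ThreadFactory threads) {
        // YieldingWaitStrategy spins/yields for low hand-off latency; the default
        // BlockingWaitStrategy saves CPU but adds wake-up latency per hand-off.
        return new Disruptor<>(CommandEvent::new, 1024, threads,
                ProducerType.MULTI, new YieldingWaitStrategy());
    }
}
{code}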



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

