[jira] [Commented] (IGNITE-14197) Checkpoint thread can't take checkpoint write lock because it waits for parked threads to complete their work

2022-03-16 Thread Anton Kalashnikov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-14197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17507667#comment-17507667
 ] 

Anton Kalashnikov commented on IGNITE-14197:


That is actually a good question. I remember that we discussed it, but I can't 
find the decision about closing it. Perhaps we expected to fix this problem in 
a different ticket, but I don't see a linked ticket here either.

[~sergey-chugunov] or [~ibessonov], can you check whether this task is still 
relevant? If it is, we can reopen the PR.

> Checkpoint thread can't take checkpoint write lock because it waits for 
> parked threads to complete their work
> -
>
> Key: IGNITE-14197
> URL: https://issues.apache.org/jira/browse/IGNITE-14197
> Project: Ignite
>  Issue Type: Bug
>Reporter: Anton Kalashnikov
>Assignee: Anton Kalashnikov
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When write throttling is enabled and the node parks, for example, a data 
> streamer thread, the parked thread still holds the checkpoint read lock, 
> which leads to long pauses while waiting for the checkpoint lock:
> [2020-07-23 07:09:21,614][INFO 
> ][db-checkpoint-thread-#371][GridCacheDatabaseSharedManager] Checkpoint 
> started [checkpointId=f964c8f2-daa5-41b2-80ef-944326f26f8a, 
> startPtr=FileWALPointer [idx=56913, fileOff=10362905, len=41972], 
> checkpointBeforeLockTime=1983ms, *checkpointLockWait=812117ms*, 
> checkpointListenersExecuteTime=90ms, checkpointLockHoldTime=93ms, 
> walCpRecordFsyncDuration=123ms, writeCheckpointEntryDuration=4ms, 
> splitAndSortCpPagesDuration=4155ms, pages=10516815, reason='too big size of 
> WAL without checkpoint']
> All operations at this moment are blocked.
> Sometimes, it can lead to a complete disaster:
> Parking thread=data-streamer-stripe-47-#144 for timeout(ms)=*21278855*
> {quote}“data-streamer-stripe-78-#175” #209 prio=5 os_prio=0 
> tid=0x7f6161d6a800 nid=0xf932 waiting on condition [0x7f5c292d1000]
> java.lang.Thread.State: TIMED_WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:338)
> at org.apache.ignite.internal.processors.cache.persistence.pagemem.PagesWriteSpeedBasedThrottle.doPark(PagesWriteSpeedBasedThrottle.java:244)
> at org.apache.ignite.internal.processors.cache.persistence.pagemem.PagesWriteSpeedBasedThrottle.onMarkDirty(PagesWriteSpeedBasedThrottle.java:227)
> at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.writeUnlockPage(PageMemoryImpl.java:1730)
> at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.writeUnlock(PageMemoryImpl.java:491)
> at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.writeUnlock(PageMemoryImpl.java:483)
> at org.apache.ignite.internal.processors.cache.persistence.tree.util.PageHandler.writeUnlock(PageHandler.java:394)
> at org.apache.ignite.internal.processors.cache.persistence.tree.util.PageHandler.writePage(PageHandler.java:369)
> at org.apache.ignite.internal.processors.cache.persistence.DataStructure.write(DataStructure.java:296)
> at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.access$11300(BPlusTree.java:98)
> at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Put.tryInsert(BPlusTree.java:3864)
> at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Put.access$7100(BPlusTree.java:3544)
> at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Invoke.onNotFound(BPlusTree.java:4103)
> at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Invoke.access$5800(BPlusTree.java:3894)
> at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invokeDown(BPlusTree.java:2022)
> at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invokeDown(BPlusTree.java:1997)
> at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invoke(BPlusTree.java:1904)
> at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke0(IgniteCacheOffheapManagerImpl.java:1662)
> at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke(IgniteCacheOffheapManagerImpl.java:1645)
> at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.invoke(GridCacheOffheapManager.java:2473)
> at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.invoke(IgniteCacheOffheapManagerImpl.java:436)
> at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.storeValue(GridCacheMapEntry.java:4306)
> at 

[jira] [Created] (IGNITE-14197) Checkpoint thread can't take checkpoint write lock because it waits for parked threads to complete their work

2021-02-17 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-14197:
--

 Summary: Checkpoint thread can't take checkpoint write lock 
because it waits for parked threads to complete their work
 Key: IGNITE-14197
 URL: https://issues.apache.org/jira/browse/IGNITE-14197
 Project: Ignite
  Issue Type: Bug
Reporter: Anton Kalashnikov
Assignee: Anton Kalashnikov


When write throttling is enabled and the node parks, for example, a data 
streamer thread, the parked thread still holds the checkpoint read lock, 
which leads to long pauses while waiting for the checkpoint lock:
[2020-07-23 07:09:21,614][INFO 
][db-checkpoint-thread-#371][GridCacheDatabaseSharedManager] Checkpoint started 
[checkpointId=f964c8f2-daa5-41b2-80ef-944326f26f8a, startPtr=FileWALPointer 
[idx=56913, fileOff=10362905, len=41972], checkpointBeforeLockTime=1983ms, 
*checkpointLockWait=812117ms*, checkpointListenersExecuteTime=90ms, 
checkpointLockHoldTime=93ms, walCpRecordFsyncDuration=123ms, 
writeCheckpointEntryDuration=4ms, splitAndSortCpPagesDuration=4155ms, 
pages=10516815, reason='too big size of WAL without checkpoint']
All operations at this moment are blocked.
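
A minimal sketch of why this happens, assuming the checkpoint lock behaves like a plain java.util.concurrent ReentrantReadWriteLock. The class and method names below are hypothetical and are not Ignite code. A thread that is parked while holding the read lock keeps the write lock unavailable for the whole parking time:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.LockSupport;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-in for the checkpoint lock; not Ignite code. */
class CheckpointLockSketch {
    private final ReentrantReadWriteLock checkpointLock = new ReentrantReadWriteLock();

    /** Starts a "data streamer" thread that is throttled (parked) while holding the read lock. */
    Thread startThrottledWriter(long parkMillis, CountDownLatch readLockTaken) {
        Thread t = new Thread(() -> {
            checkpointLock.readLock().lock();
            try {
                readLockTaken.countDown();
                long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(parkMillis);
                long left;
                // parkNanos may return early, so loop until the deadline.
                while ((left = deadline - System.nanoTime()) > 0)
                    LockSupport.parkNanos(left);
            } finally {
                checkpointLock.readLock().unlock();
            }
        }, "data-streamer-sketch");
        t.start();
        return t;
    }

    /** The "checkpoint thread" side: tries to take the write lock within the timeout. */
    boolean tryBeginCheckpoint(long timeoutMillis) {
        try {
            if (!checkpointLock.writeLock().tryLock(timeoutMillis, TimeUnit.MILLISECONDS))
                return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        checkpointLock.writeLock().unlock();
        return true;
    }

    /** Returns {write lock acquired while the writer is parked, write lock acquired after it finished}. */
    static boolean[] demo() {
        CheckpointLockSketch sketch = new CheckpointLockSketch();
        CountDownLatch taken = new CountDownLatch(1);
        Thread writer = sketch.startThrottledWriter(500, taken);
        try {
            taken.await();
            boolean whileParked = sketch.tryBeginCheckpoint(50);
            writer.join();
            boolean afterRelease = sketch.tryBeginCheckpoint(1000);
            return new boolean[] {whileParked, afterRelease};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

In this sketch the write lock becomes available only after the parked thread wakes up and releases the read lock, which mirrors the long checkpointLockWait in the log above.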

Sometimes, it can lead to a complete disaster:
Parking thread=data-streamer-stripe-47-#144 for timeout(ms)=*21278855*
{quote}“data-streamer-stripe-78-#175” #209 prio=5 os_prio=0 
tid=0x7f6161d6a800 nid=0xf932 waiting on condition [0x7f5c292d1000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:338)
at org.apache.ignite.internal.processors.cache.persistence.pagemem.PagesWriteSpeedBasedThrottle.doPark(PagesWriteSpeedBasedThrottle.java:244)
at org.apache.ignite.internal.processors.cache.persistence.pagemem.PagesWriteSpeedBasedThrottle.onMarkDirty(PagesWriteSpeedBasedThrottle.java:227)
at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.writeUnlockPage(PageMemoryImpl.java:1730)
at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.writeUnlock(PageMemoryImpl.java:491)
at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.writeUnlock(PageMemoryImpl.java:483)
at org.apache.ignite.internal.processors.cache.persistence.tree.util.PageHandler.writeUnlock(PageHandler.java:394)
at org.apache.ignite.internal.processors.cache.persistence.tree.util.PageHandler.writePage(PageHandler.java:369)
at org.apache.ignite.internal.processors.cache.persistence.DataStructure.write(DataStructure.java:296)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.access$11300(BPlusTree.java:98)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Put.tryInsert(BPlusTree.java:3864)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Put.access$7100(BPlusTree.java:3544)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Invoke.onNotFound(BPlusTree.java:4103)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Invoke.access$5800(BPlusTree.java:3894)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invokeDown(BPlusTree.java:2022)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invokeDown(BPlusTree.java:1997)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invoke(BPlusTree.java:1904)
at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke0(IgniteCacheOffheapManagerImpl.java:1662)
at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke(IgniteCacheOffheapManagerImpl.java:1645)
at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.invoke(GridCacheOffheapManager.java:2473)
at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.invoke(IgniteCacheOffheapManagerImpl.java:436)
at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.storeValue(GridCacheMapEntry.java:4306)
at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.initialValue(GridCacheMapEntry.java:3441)
at org.apache.ignite.internal.processors.cache.GridCacheEntryEx.initialValue(GridCacheEntryEx.java:770)
at org.apache.ignite.internal.processors.datastreamer.DataStreamerImpl$IsolatedUpdater.receive(DataStreamerImpl.java:2278)
at org.apache.ignite.internal.processors.datastreamer.DataStreamerUpdateJob.call(DataStreamerUpdateJob.java:139)
at org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:7104)
at org.apache.ignite.internal.processors.closure.GridClosureProcessor$2.body(GridClosureProcessor.java:966)
at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119)
at org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:559)
at 

[jira] [Commented] (IGNITE-13761) Implement Segmented-LRU and CLOCK page replacement algorithms

2021-02-16 Thread Anton Kalashnikov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17285289#comment-17285289
 ] 

Anton Kalashnikov commented on IGNITE-13761:


[~alex_pl] thanks for your changes. It looks good to me too.

> Implement Segmented-LRU and CLOCK page replacement algorithms
> -
>
> Key: IGNITE-13761
> URL: https://issues.apache.org/jira/browse/IGNITE-13761
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Aleksey Plekhanov
>Assignee: Aleksey Plekhanov
>Priority: Major
>  Labels: iep-62
> Attachments: GetBenchmark.zip, PutBenchmark.zip
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> See IEP-62 for more details.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-14139) Incorrect initialize checkpoint-runner-cpu thread pool

2021-02-11 Thread Anton Kalashnikov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-14139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283024#comment-17283024
 ] 

Anton Kalashnikov commented on IGNITE-14139:


[~v.pyatkov] it looks good to me. [~ibessonov], can you help with the merge, please?

> Incorrect initialize checkpoint-runner-cpu thread pool
> --
>
> Key: IGNITE-14139
> URL: https://issues.apache.org/jira/browse/IGNITE-14139
> Project: Ignite
>  Issue Type: Bug
>Reporter: Vladislav Pyatkov
>Assignee: Vladislav Pyatkov
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The first initialization of the checkpoint thread pool for CPU is incorrect.
> Look at the constructor of {{CheckpointWorkflow}}.
> First, we initialize the pool:
> {code:java}
> this.checkpointCollectPagesInfoPool = initializeCheckpointPool();
> {code}
> and only afterwards do we set the size of the pool:
> {code:java}
> this.checkpointCollectInfoThreads = checkpointCollectInfoThreads;
> {code}
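
A reduced sketch of this ordering problem, under the assumption that the pool initializer reads the thread-count field. The classes below are hypothetical and only mimic the shape of the constructor; they are not the actual CheckpointWorkflow code. In Java an int field holds 0 until it is assigned, so a pool created before the assignment sees 0 threads:

```java
/** Hypothetical reduction of the buggy constructor order. */
class BuggyWorkflowSketch {
    private int collectInfoThreads; // holds the default value 0 until assigned
    final int poolSize;

    BuggyWorkflowSketch(int collectInfoThreads) {
        this.poolSize = initializePool();             // reads the field too early
        this.collectInfoThreads = collectInfoThreads; // assigned only afterwards
    }

    /** A real implementation would create an executor of this size. */
    private int initializePool() {
        return collectInfoThreads;
    }
}

/** Same class with the two constructor statements swapped. */
class FixedWorkflowSketch {
    private int collectInfoThreads;
    final int poolSize;

    FixedWorkflowSketch(int collectInfoThreads) {
        this.collectInfoThreads = collectInfoThreads; // size is known first
        this.poolSize = initializePool();             // pool sees the configured size
    }

    private int initializePool() {
        return collectInfoThreads;
    }
}
```

With a configured value of 4, the first variant ends up with a pool size of 0 and the second with 4.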





[jira] [Commented] (IGNITE-14055) Deadlock in timeoutObjectProcessor between 'send message' & 'handshake timeout'

2021-02-08 Thread Anton Kalashnikov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-14055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17280843#comment-17280843
 ] 

Anton Kalashnikov commented on IGNITE-14055:


[~ibessonov] can you take a look and merge please?

> Deadlock in timeoutObjectProcessor between 'send message' & 'handshake 
> timeout'
> ---
>
> Key: IGNITE-14055
> URL: https://issues.apache.org/jira/browse/IGNITE-14055
> Project: Ignite
>  Issue Type: Bug
>Reporter: Anton Kalashnikov
>Assignee: Anton Kalashnikov
>Priority: Major
> Attachments: StartServerWithTxPuts (1).java, freeze (1).sh
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The cluster hangs after JVM pauses on one of the server nodes.
>  Scenario:
>  1. Start three server nodes with put operations using StartServerWithTxPuts.
>  2. Emulate jvm freezes on one server node by running the attached script:
>  {{*sh freeze.sh *}}
>  3. Wait until the script has finished.
> Result:
>  The cluster hangs on tx put operations.
> The first server node continuously prints:
> {noformat}
> [2020-11-03 09:36:01,719][INFO ][grid-nio-worker-tcp-comm-3-#26%TcpCommunicationSpi%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/127.0.0.1:47100, rmtAddr=/127.0.0.1:57714]
> [2020-11-03 09:36:01,720][INFO ][grid-nio-worker-tcp-comm-3-#26%TcpCommunicationSpi%][TcpCommunicationSpi] Received incoming connection from remote node while connecting to this node, rejecting [locNode=5defd32f-5bdb-4b9e-8a6e-5ee268edac42, locNodeOrder=1, rmtNode=07583a9d-36c8-4100-a69c-8cbd26ca82c9, rmtNodeOrder=3]
> [2020-11-03 09:36:01,922][INFO ][grid-nio-worker-tcp-comm-0-#23%TcpCommunicationSpi%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/127.0.0.1:47100, rmtAddr=/127.0.0.1:57716]
> [2020-11-03 09:36:01,922][INFO ][grid-nio-worker-tcp-comm-0-#23%TcpCommunicationSpi%][TcpCommunicationSpi] Received incoming connection from remote node while connecting to this node, rejecting [locNode=5defd32f-5bdb-4b9e-8a6e-5ee268edac42, locNodeOrder=1, rmtNode=07583a9d-36c8-4100-a69c-8cbd26ca82c9, rmtNodeOrder=3]
> [2020-11-03 09:36:02,124][INFO ][grid-nio-worker-tcp-comm-1-#24%TcpCommunicationSpi%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/127.0.0.1:47100, rmtAddr=/127.0.0.1:57718]
> [2020-11-03 09:36:02,125][INFO ][grid-nio-worker-tcp-comm-1-#24%TcpCommunicationSpi%][TcpCommunicationSpi] Received incoming connection from remote node while connecting to this node, rejecting [locNode=5defd32f-5bdb-4b9e-8a6e-5ee268edac42, locNodeOrder=1, rmtNode=07583a9d-36c8-4100-a69c-8cbd26ca82c9, rmtNodeOrder=3]
> [2020-11-03 09:36:02,326][INFO ][grid-nio-worker-tcp-comm-2-#25%TcpCommunicationSpi%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/127.0.0.1:47100, rmtAddr=/127.0.0.1:57720]
> [2020-11-03 09:36:02,327][INFO ][grid-nio-worker-tcp-comm-2-#25%TcpCommunicationSpi%][TcpCommunicationSpi] Received incoming connection from remote node while connecting to this node, rejecting [locNode=5defd32f-5bdb-4b9e-8a6e-5ee268edac42, locNodeOrder=1, rmtNode=07583a9d-36c8-4100-a69c-8cbd26ca82c9, rmtNodeOrder=3]
> [2020-11-03 09:36:02,528][INFO ][grid-nio-worker-tcp-comm-3-#26%TcpCommunicationSpi%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/127.0.0.1:47100, rmtAddr=/127.0.0.1:57722]
> [2020-11-03 09:36:02,529][INFO ][grid-nio-worker-tcp-comm-3-#26%TcpCommunicationSpi%][TcpCommunicationSpi] Received incoming connection from remote node while connecting to this node, rejecting [locNode=5defd32f-5bdb-4b9e-8a6e-5ee268edac42, locNodeOrder=1, rmtNode=07583a9d-36c8-4100-a69c-8cbd26ca82c9, rmtNodeOrder=3]
> {noformat}
>  The second node prints long running transactions in prepared state ignoring 
> the default tx timeout:
>  
> {noformat}
> [2020-11-03 09:36:46,199][WARN 
> ][sys-#83%56b4f715-82d6-4d63-ba99-441ffcd673b4%][diagnostic] >>> Future 
> [startTime=09:33:08.496, curTime=09:36:46.181, fut=GridNearTxFinishFuture 
> [futId=425decc8571-4ce98554-8c56-4daf-a7a9-5b9bff52fa08, tx=GridNearTxLocal 
> [mappings=IgniteTxMappingsSingleImpl [mapping=GridDistributedTxMapping 
> [entries=LinkedHashSet [IgniteTxEntry [txKey=IgniteTxKey 
> [key=KeyCacheObjectImpl [part=833, val=833, hasValBytes=true], 
> cacheId=-923393186], val=TxEntryValueHolder [val=CacheObjectByteArrayImpl 
> [arrLen=1048576], op=CREATE], prevVal=TxEntryValueHolder [val=null, op=NOOP], 
> oldVal=TxEntryValueHolder [val=null, op=NOOP], entryProcessorsCol=null, 
> ttl=-1, conflictExpireTime=-1, conflictVer=null, explicitVer=null, 
> dhtVer=null, filters=CacheEntryPredicate[] [], filtersPassed=false, 
> filtersSet=true, 

[jira] [Updated] (IGNITE-14110) Create networking module

2021-02-02 Thread Anton Kalashnikov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-14110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kalashnikov updated IGNITE-14110:
---
Labels: iep-66  (was: )

> Create networking module
> 
>
> Key: IGNITE-14110
> URL: https://issues.apache.org/jira/browse/IGNITE-14110
> Project: Ignite
>  Issue Type: Sub-task
>Reporter: Anton Kalashnikov
>Assignee: Anton Kalashnikov
>Priority: Major
>  Labels: iep-66
>
> A networking module needs to be created with some API and a simple 
> implementation that can be improved further.





[jira] [Created] (IGNITE-14110) Create networking module

2021-02-02 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-14110:
--

 Summary: Create networking module
 Key: IGNITE-14110
 URL: https://issues.apache.org/jira/browse/IGNITE-14110
 Project: Ignite
  Issue Type: Sub-task
Reporter: Anton Kalashnikov
Assignee: Anton Kalashnikov


A networking module needs to be created with some API and a simple 
implementation that can be improved further.





[jira] [Updated] (IGNITE-14091) Implement messaging service

2021-01-28 Thread Anton Kalashnikov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-14091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kalashnikov updated IGNITE-14091:
---
Labels: iep-66 ignite-3  (was: )

> Implement messaging service
> ---
>
> Key: IGNITE-14091
> URL: https://issues.apache.org/jira/browse/IGNITE-14091
> Project: Ignite
>  Issue Type: Sub-task
>Reporter: Anton Kalashnikov
>Priority: Major
>  Labels: iep-66, ignite-3
>
> The ability to send messages to and receive messages from network members 
> needs to be implemented:
>  * there is a requirement to be able to send idempotent messages with very 
> weak guarantees:
>  ** no delivery guarantees required;
>  ** multiple copies of the same message might be sent;
>  ** no need to have any kind of acknowledgement;
>  * there is another requirement for common use:
>  ** a message must be sent exactly once, with an acknowledgement that it has 
> actually been received (not necessarily processed);
>  ** messages must be received in the same order they were sent.
> These types of messages might utilize the current recovery protocol with acks 
> every 32 (or so) messages. This setting must be flexible enough so that we 
> won't get OOM in big topologies.
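
The "ack every 32 (or so) messages" idea could be sketched as follows. This is only an illustration under assumed names (UnackedBuffer, AckCounter and the ackThreshold parameter are hypothetical and do not come from the ticket or from Ignite): the receiver acknowledges once per ackThreshold messages, and the sender may forget everything up to the acknowledged id, which bounds the memory it needs per connection.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Sender side: ids of messages that were sent but not yet acknowledged. */
class UnackedBuffer {
    private final Deque<Long> unackedIds = new ArrayDeque<>();
    private long nextId = 1;

    /** Registers a message as sent and returns its sequence id. */
    long onSend() {
        long id = nextId++;
        unackedIds.addLast(id);
        return id;
    }

    /** Drops every buffered message whose id is not greater than ackedId. */
    void onAck(long ackedId) {
        while (!unackedIds.isEmpty() && unackedIds.peekFirst() <= ackedId)
            unackedIds.removeFirst();
    }

    int size() {
        return unackedIds.size();
    }
}

/** Receiver side: asks for an acknowledgement once per ackThreshold messages. */
class AckCounter {
    private final int ackThreshold;
    private int sinceLastAck;

    AckCounter(int ackThreshold) {
        if (ackThreshold < 1)
            throw new IllegalArgumentException("ackThreshold must be positive");
        this.ackThreshold = ackThreshold;
    }

    /** Returns the id to acknowledge, or -1 if no acknowledgement is due yet. */
    long onReceive(long id) {
        if (++sinceLastAck < ackThreshold)
            return -1;
        sinceLastAck = 0;
        return id;
    }
}
```

A smaller threshold means more ack traffic but a smaller buffer on every sender, which is the trade-off behind keeping this setting configurable for big topologies.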





[jira] [Updated] (IGNITE-14092) Design network address resolver

2021-01-28 Thread Anton Kalashnikov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-14092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kalashnikov updated IGNITE-14092:
---
Labels: iep-66 ignite-3  (was: )

> Design network address resolver
> ---
>
> Key: IGNITE-14092
> URL: https://issues.apache.org/jira/browse/IGNITE-14092
> Project: Ignite
>  Issue Type: Sub-task
>Reporter: Anton Kalashnikov
>Priority: Major
>  Labels: iep-66, ignite-3
>
> A network address resolver/IP finder/discovery needs to be designed, which 
> would help to choose the right IP/port for a connection. Perhaps we don't need 
> such a service at all, but that should be explicitly agreed.





[jira] [Updated] (IGNITE-14081) Networking module

2021-01-28 Thread Anton Kalashnikov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-14081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kalashnikov updated IGNITE-14081:
---
Labels: iep-66 ignite-3  (was: ignite-3)

> Networking module
> -
>
> Key: IGNITE-14081
> URL: https://issues.apache.org/jira/browse/IGNITE-14081
> Project: Ignite
>  Issue Type: New Feature
>Reporter: Anton Kalashnikov
>Priority: Major
>  Labels: iep-66, ignite-3
>






[jira] [Updated] (IGNITE-14090) Networking API

2021-01-28 Thread Anton Kalashnikov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kalashnikov updated IGNITE-14090:
---
Labels: iep-66 ignite-3  (was: )

> Networking API
> --
>
> Key: IGNITE-14090
> URL: https://issues.apache.org/jira/browse/IGNITE-14090
> Project: Ignite
>  Issue Type: Sub-task
>Reporter: Anton Kalashnikov
>Priority: Major
>  Labels: iep-66, ignite-3
>
> A convenient public API needs to be designed for the networking module, one 
> that allows getting information about network members and sending messages 
> to and receiving messages from them.
> Draft:
> {noformat}
> public interface NetworkService {
>     static NetworkService create(NetworkConfiguration cfg);
>
>     void shutdown() throws ???;
>
>     NetworkMember localMember();
>
>     Collection remoteMembers();
>
>     void weakSend(NetworkMember member, Message msg);
>
>     Future guaranteedSend(NetworkMember member, Message msg);
>
>     void listenMembers(MembershipListener lsnr);
>
>     void listenMessages(Consumer lsnr);
> }
>
> public interface MembershipListener {
>     void onAppeared(NetworkMember member);
>     void onDisappeared(NetworkMember member);
>     void onAcceptedByGroup(List remoteMembers);
> }
>
> public interface NetworkMember {
>     UUID id();
> }
> {noformat}





[jira] [Updated] (IGNITE-14089) Override scalecube internal message by custom one

2021-01-28 Thread Anton Kalashnikov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-14089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kalashnikov updated IGNITE-14089:
---
Labels: iep-66 ignite-3  (was: )

> Override scalecube internal message by custom one
> -
>
> Key: IGNITE-14089
> URL: https://issues.apache.org/jira/browse/IGNITE-14089
> Project: Ignite
>  Issue Type: Sub-task
>Reporter: Anton Kalashnikov
>Priority: Major
>  Labels: iep-66, ignite-3
>
> The networking module contains some custom logic, such as a specific 
> handshake and message recovery, which requires its own messages, but at the 
> same time the default scalecube behaviour must keep working correctly. So 
> one layer of logic needs to be implemented on top of the other.





[jira] [Updated] (IGNITE-14088) Implement scalecube transport API over netty

2021-01-28 Thread Anton Kalashnikov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-14088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kalashnikov updated IGNITE-14088:
---
Labels: iep-66 ignite-3  (was: )

> Implement scalecube transport API over netty
> 
>
> Key: IGNITE-14088
> URL: https://issues.apache.org/jira/browse/IGNITE-14088
> Project: Ignite
>  Issue Type: Sub-task
>Reporter: Anton Kalashnikov
>Priority: Major
>  Labels: iep-66, ignite-3
>
> scalecube has its own netty inside, but the idea is to integrate our extended 
> netty into it. This will help us to support more features, such as our own 
> handshake, marshalling, etc.





[jira] [Updated] (IGNITE-14085) Implement message recovery protocol over handshake

2021-01-28 Thread Anton Kalashnikov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-14085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kalashnikov updated IGNITE-14085:
---
Labels: iep-66 ignite-3  (was: )

> Implement message recovery protocol over handshake
> --
>
> Key: IGNITE-14085
> URL: https://issues.apache.org/jira/browse/IGNITE-14085
> Project: Ignite
>  Issue Type: Sub-task
>Reporter: Anton Kalashnikov
>Priority: Major
>  Labels: iep-66, ignite-3
>
> The central idea of the recovery protocol is the same as in the current 
> implementation, so a similar idea with the recovery descriptor needs to be 
> implemented. This means that information about the last sent/received 
> messages should be exchanged during the handshake and, according to this 
> information, messages which were not received should be sent one more time.
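
A minimal sketch of such a recovery descriptor, assuming messages carry increasing sequence ids. The class name and methods are hypothetical and simplified; they are not the existing Ignite implementation. The sender keeps what it has sent, and during the handshake the peer reports the last id it received, so only the tail after that id is sent again:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical per-connection recovery state kept by the sending side. */
class RecoveryDescriptorSketch {
    /** Sent messages that may still need to be resent, ordered by sequence id. */
    private final TreeMap<Long, String> sent = new TreeMap<>();
    private long lastSentId;

    /** Remembers the payload until the peer confirms it and returns its id. */
    long send(String payload) {
        sent.put(++lastSentId, payload);
        return lastSentId;
    }

    /**
     * Called during the handshake with the last id the peer has received.
     * Forgets everything up to that id and returns the rest in sending order.
     */
    List<String> onHandshake(long peerLastReceivedId) {
        sent.headMap(peerLastReceivedId, true).clear();
        return new ArrayList<>(sent.values());
    }
}
```

For example, if "a", "b", "c" and "d" were sent and the peer reports 2 as the last received id after a reconnect, only "c" and "d" are sent again.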





[jira] [Updated] (IGNITE-14084) Integrate direct marshalling to networking

2021-01-28 Thread Anton Kalashnikov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-14084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kalashnikov updated IGNITE-14084:
---
Labels: iep-66 ignite-3  (was: )

> Integrate direct marshalling to networking
> --
>
> Key: IGNITE-14084
> URL: https://issues.apache.org/jira/browse/IGNITE-14084
> Project: Ignite
>  Issue Type: Sub-task
>Reporter: Anton Kalashnikov
>Priority: Major
>  Labels: iep-66, ignite-3
>
> Direct marshalling can be extracted from ignite2.x and integrated into 
> ignite3.0. It helps to avoid extra data copying while sending/receiving 
> messages.
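
As a rough illustration of the idea (a simplified, hypothetical sketch, not the ignite2.x code): with direct marshalling a message writes its fields straight into the network buffer and remembers how far it got, so no intermediate byte array of the whole message is needed and writing can resume when the next buffer becomes available.

```java
import java.nio.ByteBuffer;

/** Hypothetical message that writes itself directly into a network buffer. */
class DirectMessageSketch {
    private final long id;
    private final int payload;
    /** Index of the next field to write; kept between calls. */
    private int state;

    DirectMessageSketch(long id, int payload) {
        this.id = id;
        this.payload = payload;
    }

    /**
     * Writes as many fields as fit into the buffer. Returns true once the
     * whole message has been written, false if more buffer space is needed.
     */
    boolean writeTo(ByteBuffer buf) {
        switch (state) {
            case 0:
                if (buf.remaining() < Long.BYTES)
                    return false;
                buf.putLong(id);
                state++;
                // fall through to the next field
            case 1:
                if (buf.remaining() < Integer.BYTES)
                    return false;
                buf.putInt(payload);
                state++;
        }
        return true;
    }
}
```

If the first buffer has room only for the id, writeTo returns false after writing it, and a second call with a fresh buffer writes just the remaining field.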





[jira] [Updated] (IGNITE-14086) Implement retry of establishing connection if it was lost

2021-01-28 Thread Anton Kalashnikov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-14086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kalashnikov updated IGNITE-14086:
---
Labels: iep-66 ignite-3  (was: )

> Implement retry of establishing connection if it was lost
> -
>
> Key: IGNITE-14086
> URL: https://issues.apache.org/jira/browse/IGNITE-14086
> Project: Ignite
>  Issue Type: Sub-task
>Reporter: Anton Kalashnikov
>Priority: Major
>  Labels: iep-66, ignite-3
>
> Retrying to establish the connection needs to be implemented. It is not clear 
> which way of implementing it is better, because the current implementation is 
> too difficult to configure (number of retries, several properties for retry 
> time). So a better way to configure it needs to be thought out and then 
> implemented.
> Perhaps scalecube (gossip protocol) already does all the work and we should do 
> nothing here. This needs to be rechecked.
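
One possible shape for a simpler configuration, given purely as an assumption for discussion (the ticket does not choose a design, and ReconnectPolicySketch with its parameters is hypothetical): a single policy object with a maximum number of attempts and a delay that doubles up to a cap.

```java
import java.time.Duration;

/** Hypothetical reconnect policy: capped exponential backoff with a retry limit. */
class ReconnectPolicySketch {
    private final int maxAttempts;
    private final Duration initialDelay;
    private final Duration maxDelay;

    ReconnectPolicySketch(int maxAttempts, Duration initialDelay, Duration maxDelay) {
        if (maxAttempts < 1)
            throw new IllegalArgumentException("maxAttempts must be positive");
        this.maxAttempts = maxAttempts;
        this.initialDelay = initialDelay;
        this.maxDelay = maxDelay;
    }

    /** Whether another attempt is allowed after the given number of failures. */
    boolean shouldRetry(int failedAttempts) {
        return failedAttempts < maxAttempts;
    }

    /** Delay before the given attempt (1-based): doubles each time, capped at maxDelay. */
    Duration delayBefore(int attempt) {
        if (attempt < 1 || attempt > maxAttempts)
            throw new IllegalArgumentException("attempt out of range: " + attempt);
        long cap = maxDelay.toMillis();
        long millis = initialDelay.toMillis();
        for (int i = 1; i < attempt && millis < cap; i++)
            millis *= 2;
        return Duration.ofMillis(Math.min(millis, cap));
    }
}
```

With 5 attempts, an initial delay of 100 ms and a cap of 1 s, the delays are 100, 200, 400, 800 and 1000 ms.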





[jira] [Updated] (IGNITE-14083) Add SSL support to networking

2021-01-28 Thread Anton Kalashnikov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-14083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kalashnikov updated IGNITE-14083:
---
Labels: iep-66 ignite-3  (was: )

> Add SSL support to networking
> -
>
> Key: IGNITE-14083
> URL: https://issues.apache.org/jira/browse/IGNITE-14083
> Project: Ignite
>  Issue Type: Sub-task
>Reporter: Anton Kalashnikov
>Priority: Major
>  Labels: iep-66, ignite-3
>
> The ability to establish an SSL connection needs to be added. It looks like 
> this should not be a problem, but at the very least a configuration needs to 
> be designed that allows managing SSL (path to certificate, password, etc.).





[jira] [Updated] (IGNITE-14082) Implementation of handshake for new connection

2021-01-28 Thread Anton Kalashnikov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-14082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kalashnikov updated IGNITE-14082:
---
Labels: iep-66 ignite-3  (was: )

> Implementation of handshake for new connection
> --
>
> Key: IGNITE-14082
> URL: https://issues.apache.org/jira/browse/IGNITE-14082
> Project: Ignite
>  Issue Type: Sub-task
>Reporter: Anton Kalashnikov
>Priority: Major
>  Labels: iep-66, ignite-3
>
> The handshake needs to be implemented after netty establishes the connection. 
> Perhaps it makes sense to use netty handlers. During the handshake, the 
> instanceId needs to be exchanged between the two endpoints.
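
A sketch of what the exchanged handshake payload could look like, assuming the instanceId is a UUID (an assumption; the ticket does not state its type) and leaving the netty handler wiring out. The class and method names are hypothetical. Each endpoint would send the encoded id right after the connection is established and decode the one it receives:

```java
import java.nio.ByteBuffer;
import java.util.UUID;

/** Hypothetical wire format of the handshake: the 16 bytes of the sender's instance id. */
class HandshakeMessageSketch {
    static final int SIZE = 2 * Long.BYTES;

    /** Encodes the local instance id into a buffer that is ready to be written. */
    static ByteBuffer encode(UUID instanceId) {
        ByteBuffer buf = ByteBuffer.allocate(SIZE);
        buf.putLong(instanceId.getMostSignificantBits());
        buf.putLong(instanceId.getLeastSignificantBits());
        buf.flip();
        return buf;
    }

    /** Decodes the remote instance id; fails if the handshake bytes are incomplete. */
    static UUID decode(ByteBuffer buf) {
        if (buf.remaining() < SIZE)
            throw new IllegalArgumentException("incomplete handshake message");
        long mostSigBits = buf.getLong();
        long leastSigBits = buf.getLong();
        return new UUID(mostSigBits, leastSigBits);
    }
}
```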





[jira] [Updated] (IGNITE-14081) Networking module

2021-01-28 Thread Anton Kalashnikov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-14081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kalashnikov updated IGNITE-14081:
---
Labels: ignite-3  (was: )

> Networking module
> -
>
> Key: IGNITE-14081
> URL: https://issues.apache.org/jira/browse/IGNITE-14081
> Project: Ignite
>  Issue Type: New Feature
>Reporter: Anton Kalashnikov
>Priority: Major
>  Labels: ignite-3
>






[jira] [Created] (IGNITE-14092) Design network address resolver

2021-01-28 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-14092:
--

 Summary: Design network address resolver
 Key: IGNITE-14092
 URL: https://issues.apache.org/jira/browse/IGNITE-14092
 Project: Ignite
  Issue Type: Sub-task
Reporter: Anton Kalashnikov


A network address resolver/IP finder/discovery needs to be designed, which 
would help to choose the right IP/port for a connection. Perhaps we don't need 
such a service at all, but that should be explicitly agreed.





[jira] [Created] (IGNITE-14091) Implement messaging service

2021-01-28 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-14091:
--

 Summary: Implement messaging service
 Key: IGNITE-14091
 URL: https://issues.apache.org/jira/browse/IGNITE-14091
 Project: Ignite
  Issue Type: Sub-task
Reporter: Anton Kalashnikov


The ability to send messages to and receive messages from network members 
needs to be implemented:
 * there is a requirement to be able to send idempotent messages with very 
weak guarantees:

 ** no delivery guarantees required;

 ** multiple copies of the same message might be sent;

 ** no need to have any kind of acknowledgement;

 * there is another requirement for common use:

 ** a message must be sent exactly once, with an acknowledgement that it has 
actually been received (not necessarily processed);

 ** messages must be received in the same order they were sent.
These types of messages might utilize the current recovery protocol with acks 
every 32 (or so) messages. This setting must be flexible enough so that we 
won't get OOM in big topologies.





[jira] [Updated] (IGNITE-14090) Networking API

2021-01-28 Thread Anton Kalashnikov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kalashnikov updated IGNITE-14090:
---
Description: 
A convenient public API needs to be designed for the networking module, one 
that allows getting information about network members and sending messages 
to and receiving messages from them.

Draft:
{noformat}
public interface NetworkService {
    static NetworkService create(NetworkConfiguration cfg);

    void shutdown() throws ???;

    NetworkMember localMember();

    Collection remoteMembers();

    void weakSend(NetworkMember member, Message msg);

    Future guaranteedSend(NetworkMember member, Message msg);

    void listenMembers(MembershipListener lsnr);

    void listenMessages(Consumer lsnr);
}

public interface MembershipListener {
    void onAppeared(NetworkMember member);
    void onDisappeared(NetworkMember member);
    void onAcceptedByGroup(List remoteMembers);
}

public interface NetworkMember {
    UUID id();
}

{noformat}

  was:
It needs to design convinient public API for networking module which allow to 
get information about network members and send/receive messages from them.

Draft:
{noformat}
public interface NetworkService {
static NetworkService create(NetworkConfiguration cfg);void shutdown() 
throws ???;NetworkMember localMember();

Collection remoteMembers();

void weakSend(NetworkMember member, Message msg);Future 
guaranteedSend(NetworkMember member, Message msg);

void listenMembers(MembershipListener lsnr);

void listenMessages(Consumer lsnr);
}

public interface MembershipListener {
void onAppeared(NetworkMember member);
void onDisappeared(NetworkMember member);
void onAcceptedByGroup(List remoteMembers);
}

public interface NetworkMember {
UUID id();
}

{noformat}


> Networking API
> --
>
> Key: IGNITE-14090
> URL: https://issues.apache.org/jira/browse/IGNITE-14090
> Project: Ignite
>  Issue Type: Sub-task
>Reporter: Anton Kalashnikov
>Priority: Major
>
> A convenient public API needs to be designed for the networking module, 
> allowing us to get information about network members and to send/receive 
> messages to/from them.
> Draft:
> {noformat}
> public interface NetworkService {
>     static NetworkService create(NetworkConfiguration cfg);
> 
>     void shutdown() throws ???;
> 
>     NetworkMember localMember();
> 
>     Collection remoteMembers();
> 
>     void weakSend(NetworkMember member, Message msg);
> 
>     Future guaranteedSend(NetworkMember member, Message msg);
> 
>     void listenMembers(MembershipListener lsnr);
> 
>     void listenMessages(Consumer lsnr);
> }
> 
> public interface MembershipListener {
>     void onAppeared(NetworkMember member);
>     void onDisappeared(NetworkMember member);
>     void onAcceptedByGroup(List remoteMembers);
> }
> 
> public interface NetworkMember {
>     UUID id();
> }
> {noformat}





[jira] [Updated] (IGNITE-14090) Networking API

2021-01-28 Thread Anton Kalashnikov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kalashnikov updated IGNITE-14090:
---
Description: 
A convenient public API needs to be designed for the networking module, 
allowing us to get information about network members and to send/receive 
messages to/from them.

Draft:
{noformat}
public interface NetworkService {
    static NetworkService create(NetworkConfiguration cfg);

    void shutdown() throws ???;

    NetworkMember localMember();

    Collection remoteMembers();

    void weakSend(NetworkMember member, Message msg);

    Future guaranteedSend(NetworkMember member, Message msg);

    void listenMembers(MembershipListener lsnr);

    void listenMessages(Consumer lsnr);
}

public interface MembershipListener {
    void onAppeared(NetworkMember member);
    void onDisappeared(NetworkMember member);
    void onAcceptedByGroup(List remoteMembers);
}

public interface NetworkMember {
    UUID id();
}

{noformat}

  was:
It needs to design convinient public API for networking module which allow to 
get information about network members and send/receive messages from them.

Draft:

{noformat}

public interface NetworkService \{ static NetworkService 
create(NetworkConfiguration cfg); void shutdown() throws ???; NetworkMember 
localMember(); Collection remoteMembers(); void 
weakSend(NetworkMember member, Message msg); Future 
guaranteedSend(NetworkMember member, Message msg); void 
listenMembers(MembershipListener lsnr); void 
listenMessages(Consumer lsnr); } public interface 
MembershipListener \{ void onAppeared(NetworkMember member); void 
onDisappeared(NetworkMember member); void onAcceptedByGroup(List 
remoteMembers); } public interface NetworkMember \{ UUID id(); }

{noformat}


> Networking API
> --
>
> Key: IGNITE-14090
> URL: https://issues.apache.org/jira/browse/IGNITE-14090
> Project: Ignite
>  Issue Type: Sub-task
>Reporter: Anton Kalashnikov
>Priority: Major
>
> A convenient public API needs to be designed for the networking module, 
> allowing us to get information about network members and to send/receive 
> messages to/from them.
> Draft:
> {noformat}
> public interface NetworkService {
>     static NetworkService create(NetworkConfiguration cfg);
> 
>     void shutdown() throws ???;
> 
>     NetworkMember localMember();
> 
>     Collection remoteMembers();
> 
>     void weakSend(NetworkMember member, Message msg);
> 
>     Future guaranteedSend(NetworkMember member, Message msg);
> 
>     void listenMembers(MembershipListener lsnr);
> 
>     void listenMessages(Consumer lsnr);
> }
> 
> public interface MembershipListener {
>     void onAppeared(NetworkMember member);
>     void onDisappeared(NetworkMember member);
>     void onAcceptedByGroup(List remoteMembers);
> }
> 
> public interface NetworkMember {
>     UUID id();
> }
> {noformat}





[jira] [Created] (IGNITE-14090) Networking API

2021-01-28 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-14090:
--

 Summary: Networking API
 Key: IGNITE-14090
 URL: https://issues.apache.org/jira/browse/IGNITE-14090
 Project: Ignite
  Issue Type: Sub-task
Reporter: Anton Kalashnikov


A convenient public API needs to be designed for the networking module, 
allowing us to get information about network members and to send/receive 
messages to/from them.

Draft:

{noformat}

public interface NetworkService {
    static NetworkService create(NetworkConfiguration cfg);

    void shutdown() throws ???;

    NetworkMember localMember();

    Collection remoteMembers();

    void weakSend(NetworkMember member, Message msg);

    Future guaranteedSend(NetworkMember member, Message msg);

    void listenMembers(MembershipListener lsnr);

    void listenMessages(Consumer lsnr);
}

public interface MembershipListener {
    void onAppeared(NetworkMember member);
    void onDisappeared(NetworkMember member);
    void onAcceptedByGroup(List remoteMembers);
}

public interface NetworkMember {
    UUID id();
}

{noformat}





[jira] [Created] (IGNITE-14089) Override scalecube internal message by custom one

2021-01-28 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-14089:
--

 Summary: Override scalecube internal message by custom one
 Key: IGNITE-14089
 URL: https://issues.apache.org/jira/browse/IGNITE-14089
 Project: Ignite
  Issue Type: Sub-task
Reporter: Anton Kalashnikov


The networking module has some custom logic, such as a specific handshake and 
message recovery, which requires its own specific messages, but at the same 
time the default scalecube behaviour should keep working correctly. So one 
logic needs to be implemented on top of the other.





[jira] [Created] (IGNITE-14088) Implement scalecube transport API over netty

2021-01-28 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-14088:
--

 Summary: Implement scalecube transport API over netty
 Key: IGNITE-14088
 URL: https://issues.apache.org/jira/browse/IGNITE-14088
 Project: Ignite
  Issue Type: Sub-task
Reporter: Anton Kalashnikov


scalecube has its own netty inside, but the idea is to integrate our extended 
netty into it. This will help us support more features, such as our own 
handshake, marshalling, etc.





[jira] [Created] (IGNITE-14086) Implement retry of establishing connection if it was lost

2021-01-28 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-14086:
--

 Summary: Implement retry of establishing connection if it was lost
 Key: IGNITE-14086
 URL: https://issues.apache.org/jira/browse/IGNITE-14086
 Project: Ignite
  Issue Type: Sub-task
Reporter: Anton Kalashnikov


A retry of establishing the connection needs to be implemented. It is not clear 
which way of implementing it is better, because the current implementation is 
too difficult to configure (number of retries, several retry-time properties). 
So a better way to configure it needs to be thought out and then implemented.

Perhaps scalecube (gossip protocol) already does all this work and we should do 
nothing here. This needs to be rechecked.
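
One possible way to keep the retry configuration small is a single policy object with exponential backoff. This is a sketch under assumptions, not a decided design: `RetryPolicy`, its three parameters and the `connect` helper are hypothetical names, and whether scalecube already covers this is still open as noted above.

```java
import java.util.concurrent.Callable;

/** Hypothetical retry policy: one object instead of several separate properties. */
final class RetryPolicy {
    private final int maxAttempts;
    private final long initialDelayMs;
    private final long maxDelayMs;

    RetryPolicy(int maxAttempts, long initialDelayMs, long maxDelayMs) {
        if (maxAttempts < 1 || initialDelayMs < 0 || maxDelayMs < initialDelayMs)
            throw new IllegalArgumentException("Invalid retry settings");
        this.maxAttempts = maxAttempts;
        this.initialDelayMs = initialDelayMs;
        this.maxDelayMs = maxDelayMs;
    }

    /** Delay after the given failed attempt (1-based), doubling up to maxDelayMs. */
    long delayMs(int attempt) {
        long delay = initialDelayMs;
        for (int i = 1; i < attempt && delay < maxDelayMs; i++)
            delay = delay > maxDelayMs / 2 ? maxDelayMs : delay * 2;
        return Math.min(delay, maxDelayMs);
    }

    /** Calls the connector until it succeeds or maxAttempts is exhausted. */
    <T> T connect(Callable<T> connector) {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return connector.call();
            } catch (Exception e) {
                last = e; // remember the failure and retry
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMs(attempt));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while waiting to reconnect", e);
                }
            }
        }
        throw new IllegalStateException("Connection failed after " + maxAttempts + " attempts", last);
    }
}
```

For example, `new RetryPolicy(5, 10, 50)` waits 10, 20, 40 and then 50 ms between its five attempts.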





[jira] [Created] (IGNITE-14085) Implement message recovery protocol over handshake

2021-01-28 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-14085:
--

 Summary: Implement message recovery protocol over handshake
 Key: IGNITE-14085
 URL: https://issues.apache.org/jira/browse/IGNITE-14085
 Project: Ignite
  Issue Type: Sub-task
Reporter: Anton Kalashnikov


The central idea of the recovery protocol is the same as in the current 
implementation, so a similar idea with the recovery descriptor needs to be 
implemented. This means that information about the last sent/received messages 
should be exchanged during the handshake and, according to this information, 
messages which were not received should be sent one more time.
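
A minimal sketch of that idea, assuming each side reports in its handshake how many messages it has received so far. The class `RecoveryDescriptor` here is a simplified illustration with hypothetical method names and a `String` payload, not the existing Ignite 2.x class.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Simplified, hypothetical per-connection recovery state. */
final class RecoveryDescriptor {
    private final Deque<String> unacked = new ArrayDeque<>();
    private long ackedCnt; // messages the remote side confirmed as received
    private long rcvCnt;   // messages received from the remote side

    /** Remembers a sent message until the remote side confirms it. */
    void onSent(String msg) {
        unacked.addLast(msg);
    }

    /** Counts a message received from the remote side. */
    void onReceived() {
        rcvCnt++;
    }

    /** Value this side would put into its own handshake message. */
    long receivedCount() {
        return rcvCnt;
    }

    /**
     * Applies the remote handshake: drops messages the remote side already
     * has and returns the remaining ones, in send order, to be sent again.
     */
    List<String> onHandshake(long remoteRcvCnt) {
        while (ackedCnt < remoteRcvCnt && !unacked.isEmpty()) {
            unacked.removeFirst();
            ackedCnt++;
        }
        return new ArrayList<>(unacked);
    }
}
```

If five messages were sent and the reconnecting peer reports three as received, `onHandshake(3)` returns the fourth and fifth message for resending.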





[jira] [Created] (IGNITE-14084) Integrate direct marshalling to networking

2021-01-28 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-14084:
--

 Summary: Integrate direct marshalling to networking
 Key: IGNITE-14084
 URL: https://issues.apache.org/jira/browse/IGNITE-14084
 Project: Ignite
  Issue Type: Sub-task
Reporter: Anton Kalashnikov


Direct marshalling can be extracted from Ignite 2.x and integrated into Ignite 
3.0. It helps to avoid an extra data copy when sending/receiving messages.





[jira] [Created] (IGNITE-14083) Add SSL support to networking

2021-01-28 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-14083:
--

 Summary: Add SSL support to networking
 Key: IGNITE-14083
 URL: https://issues.apache.org/jira/browse/IGNITE-14083
 Project: Ignite
  Issue Type: Sub-task
Reporter: Anton Kalashnikov


The ability to establish an SSL connection needs to be added. It looks like it 
should not be a problem, but at the very least a configuration needs to be 
designed that allows managing SSL (path to certificate, password, etc.).





[jira] [Created] (IGNITE-14082) Implementation of handshake for new connection

2021-01-28 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-14082:
--

 Summary: Implementation of handshake for new connection
 Key: IGNITE-14082
 URL: https://issues.apache.org/jira/browse/IGNITE-14082
 Project: Ignite
  Issue Type: Sub-task
Reporter: Anton Kalashnikov


The handshake needs to be implemented after netty establishes the connection. 
Perhaps it makes sense to use netty handlers. During the handshake, instanceId 
needs to be exchanged between the endpoints.
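
As a rough illustration of what each endpoint could put on the wire, the sketch below encodes and decodes a handshake message carrying an instanceId. Everything here is an assumption made for the example: the ticket does not define the type of instanceId (a `UUID` is assumed), and the `HandshakeMessage` class, the `MAGIC` marker and the 20-byte layout are hypothetical. It uses only `java.nio.ByteBuffer`, so it is independent of how the netty handlers end up being wired.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

/** Hypothetical handshake payload: a marker followed by the sender's instanceId. */
final class HandshakeMessage {
    /** Arbitrary marker chosen for this sketch (ASCII "IGNT"). */
    static final int MAGIC = 0x49474E54;

    /** Marker (4 bytes) plus the two halves of the UUID (8 bytes each). */
    static final int SIZE = Integer.BYTES + 2 * Long.BYTES;

    /** Serializes the local instanceId into a buffer that is ready for reading. */
    static ByteBuffer encode(UUID instanceId) {
        ByteBuffer buf = ByteBuffer.allocate(SIZE);
        buf.putInt(MAGIC);
        buf.putLong(instanceId.getMostSignificantBits());
        buf.putLong(instanceId.getLeastSignificantBits());
        buf.flip();
        return buf;
    }

    /** Reads the remote instanceId, rejecting short or unexpected input. */
    static UUID decode(ByteBuffer buf) {
        if (buf.remaining() < SIZE)
            throw new IllegalArgumentException("Incomplete handshake message");
        if (buf.getInt() != MAGIC)
            throw new IllegalArgumentException("Unexpected handshake marker");
        long most = buf.getLong();
        long least = buf.getLong();
        return new UUID(most, least);
    }
}
```

Each endpoint would send `encode(localId)` right after the connection is established and run `decode` on the first bytes it receives from the other side.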





[jira] [Created] (IGNITE-14081) Networking module

2021-01-28 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-14081:
--

 Summary: Networking module
 Key: IGNITE-14081
 URL: https://issues.apache.org/jira/browse/IGNITE-14081
 Project: Ignite
  Issue Type: New Feature
Reporter: Anton Kalashnikov








[jira] [Updated] (IGNITE-14055) Deadlock in timeoutObjectProcessor between 'send message' & 'handshake timeout'

2021-01-25 Thread Anton Kalashnikov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-14055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kalashnikov updated IGNITE-14055:
---
Summary: Deadlock in timeoutObjectProcessor between 'send message' & 
'handshake timeout'  (was: Deadlock in timeoutObjectProcessor between 'send 
messag'e & 'handshake timeout')

> Deadlock in timeoutObjectProcessor between 'send message' & 'handshake 
> timeout'
> ---
>
> Key: IGNITE-14055
> URL: https://issues.apache.org/jira/browse/IGNITE-14055
> Project: Ignite
>  Issue Type: Bug
>Reporter: Anton Kalashnikov
>Assignee: Anton Kalashnikov
>Priority: Major
> Attachments: StartServerWithTxPuts (1).java, freeze (1).sh
>
>
> The cluster hangs after JVM pauses on one of the server nodes.
>  Scenario:
>  1. Start three server nodes with put operations using StartServerWithTxPuts.
>  2. Emulate JVM freezes on one server node by running the attached script:
>  {{*sh freeze.sh *}}
>  3. Wait until the script has finished.
> Result:
>  The cluster hangs on tx put operations.
> The first server node continuously prints:
> {noformat}
> [2020-11-03 09:36:01,719][INFO ][grid-nio-worker-tcp-comm-3-#26%TcpCommunicationSpi%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/127.0.0.1:47100, rmtAddr=/127.0.0.1:57714]
> [2020-11-03 09:36:01,720][INFO ][grid-nio-worker-tcp-comm-3-#26%TcpCommunicationSpi%][TcpCommunicationSpi] Received incoming connection from remote node while connecting to this node, rejecting [locNode=5defd32f-5bdb-4b9e-8a6e-5ee268edac42, locNodeOrder=1, rmtNode=07583a9d-36c8-4100-a69c-8cbd26ca82c9, rmtNodeOrder=3]
> [2020-11-03 09:36:01,922][INFO ][grid-nio-worker-tcp-comm-0-#23%TcpCommunicationSpi%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/127.0.0.1:47100, rmtAddr=/127.0.0.1:57716]
> [2020-11-03 09:36:01,922][INFO ][grid-nio-worker-tcp-comm-0-#23%TcpCommunicationSpi%][TcpCommunicationSpi] Received incoming connection from remote node while connecting to this node, rejecting [locNode=5defd32f-5bdb-4b9e-8a6e-5ee268edac42, locNodeOrder=1, rmtNode=07583a9d-36c8-4100-a69c-8cbd26ca82c9, rmtNodeOrder=3]
> [2020-11-03 09:36:02,124][INFO ][grid-nio-worker-tcp-comm-1-#24%TcpCommunicationSpi%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/127.0.0.1:47100, rmtAddr=/127.0.0.1:57718]
> [2020-11-03 09:36:02,125][INFO ][grid-nio-worker-tcp-comm-1-#24%TcpCommunicationSpi%][TcpCommunicationSpi] Received incoming connection from remote node while connecting to this node, rejecting [locNode=5defd32f-5bdb-4b9e-8a6e-5ee268edac42, locNodeOrder=1, rmtNode=07583a9d-36c8-4100-a69c-8cbd26ca82c9, rmtNodeOrder=3]
> [2020-11-03 09:36:02,326][INFO ][grid-nio-worker-tcp-comm-2-#25%TcpCommunicationSpi%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/127.0.0.1:47100, rmtAddr=/127.0.0.1:57720]
> [2020-11-03 09:36:02,327][INFO ][grid-nio-worker-tcp-comm-2-#25%TcpCommunicationSpi%][TcpCommunicationSpi] Received incoming connection from remote node while connecting to this node, rejecting [locNode=5defd32f-5bdb-4b9e-8a6e-5ee268edac42, locNodeOrder=1, rmtNode=07583a9d-36c8-4100-a69c-8cbd26ca82c9, rmtNodeOrder=3]
> [2020-11-03 09:36:02,528][INFO ][grid-nio-worker-tcp-comm-3-#26%TcpCommunicationSpi%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/127.0.0.1:47100, rmtAddr=/127.0.0.1:57722]
> [2020-11-03 09:36:02,529][INFO ][grid-nio-worker-tcp-comm-3-#26%TcpCommunicationSpi%][TcpCommunicationSpi] Received incoming connection from remote node while connecting to this node, rejecting [locNode=5defd32f-5bdb-4b9e-8a6e-5ee268edac42, locNodeOrder=1, rmtNode=07583a9d-36c8-4100-a69c-8cbd26ca82c9, rmtNodeOrder=3]
> {noformat}
>  The second node prints long running transactions in prepared state ignoring 
> the default tx timeout:
>  
> {noformat}
> [2020-11-03 09:36:46,199][WARN 
> ][sys-#83%56b4f715-82d6-4d63-ba99-441ffcd673b4%][diagnostic] >>> Future 
> [startTime=09:33:08.496, curTime=09:36:46.181, fut=GridNearTxFinishFuture 
> [futId=425decc8571-4ce98554-8c56-4daf-a7a9-5b9bff52fa08, tx=GridNearTxLocal 
> [mappings=IgniteTxMappingsSingleImpl [mapping=GridDistributedTxMapping 
> [entries=LinkedHashSet [IgniteTxEntry [txKey=IgniteTxKey 
> [key=KeyCacheObjectImpl [part=833, val=833, hasValBytes=true], 
> cacheId=-923393186], val=TxEntryValueHolder [val=CacheObjectByteArrayImpl 
> [arrLen=1048576], op=CREATE], prevVal=TxEntryValueHolder [val=null, op=NOOP], 
> oldVal=TxEntryValueHolder [val=null, op=NOOP], entryProcessorsCol=null, 
> ttl=-1, conflictExpireTime=-1, conflictVer=null, explicitVer=null, 
> dhtVer=null, filters=CacheEntryPredicate[] [], 

[jira] [Updated] (IGNITE-14055) Deadlock in timeoutObjectProcessor between 'send messag'e & 'handshake timeout'

2021-01-25 Thread Anton Kalashnikov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-14055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kalashnikov updated IGNITE-14055:
---
Attachment: StartServerWithTxPuts (1).java

> Deadlock in timeoutObjectProcessor between 'send messag'e & 'handshake 
> timeout'
> ---
>
> Key: IGNITE-14055
> URL: https://issues.apache.org/jira/browse/IGNITE-14055
> Project: Ignite
>  Issue Type: Bug
>Reporter: Anton Kalashnikov
>Assignee: Anton Kalashnikov
>Priority: Major
> Attachments: StartServerWithTxPuts (1).java, freeze (1).sh
>
>
> The cluster hangs after JVM pauses on one of the server nodes.
>  Scenario:
>  1. Start three server nodes with put operations using StartServerWithTxPuts.
>  2. Emulate JVM freezes on one server node by running the attached script:
>  {{*sh freeze.sh *}}
>  3. Wait until the script has finished.
> Result:
>  The cluster hangs on tx put operations.
> The first server node continuously prints:
> {noformat}
> [2020-11-03 09:36:01,719][INFO ][grid-nio-worker-tcp-comm-3-#26%TcpCommunicationSpi%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/127.0.0.1:47100, rmtAddr=/127.0.0.1:57714]
> [2020-11-03 09:36:01,720][INFO ][grid-nio-worker-tcp-comm-3-#26%TcpCommunicationSpi%][TcpCommunicationSpi] Received incoming connection from remote node while connecting to this node, rejecting [locNode=5defd32f-5bdb-4b9e-8a6e-5ee268edac42, locNodeOrder=1, rmtNode=07583a9d-36c8-4100-a69c-8cbd26ca82c9, rmtNodeOrder=3]
> [2020-11-03 09:36:01,922][INFO ][grid-nio-worker-tcp-comm-0-#23%TcpCommunicationSpi%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/127.0.0.1:47100, rmtAddr=/127.0.0.1:57716]
> [2020-11-03 09:36:01,922][INFO ][grid-nio-worker-tcp-comm-0-#23%TcpCommunicationSpi%][TcpCommunicationSpi] Received incoming connection from remote node while connecting to this node, rejecting [locNode=5defd32f-5bdb-4b9e-8a6e-5ee268edac42, locNodeOrder=1, rmtNode=07583a9d-36c8-4100-a69c-8cbd26ca82c9, rmtNodeOrder=3]
> [2020-11-03 09:36:02,124][INFO ][grid-nio-worker-tcp-comm-1-#24%TcpCommunicationSpi%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/127.0.0.1:47100, rmtAddr=/127.0.0.1:57718]
> [2020-11-03 09:36:02,125][INFO ][grid-nio-worker-tcp-comm-1-#24%TcpCommunicationSpi%][TcpCommunicationSpi] Received incoming connection from remote node while connecting to this node, rejecting [locNode=5defd32f-5bdb-4b9e-8a6e-5ee268edac42, locNodeOrder=1, rmtNode=07583a9d-36c8-4100-a69c-8cbd26ca82c9, rmtNodeOrder=3]
> [2020-11-03 09:36:02,326][INFO ][grid-nio-worker-tcp-comm-2-#25%TcpCommunicationSpi%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/127.0.0.1:47100, rmtAddr=/127.0.0.1:57720]
> [2020-11-03 09:36:02,327][INFO ][grid-nio-worker-tcp-comm-2-#25%TcpCommunicationSpi%][TcpCommunicationSpi] Received incoming connection from remote node while connecting to this node, rejecting [locNode=5defd32f-5bdb-4b9e-8a6e-5ee268edac42, locNodeOrder=1, rmtNode=07583a9d-36c8-4100-a69c-8cbd26ca82c9, rmtNodeOrder=3]
> [2020-11-03 09:36:02,528][INFO ][grid-nio-worker-tcp-comm-3-#26%TcpCommunicationSpi%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/127.0.0.1:47100, rmtAddr=/127.0.0.1:57722]
> [2020-11-03 09:36:02,529][INFO ][grid-nio-worker-tcp-comm-3-#26%TcpCommunicationSpi%][TcpCommunicationSpi] Received incoming connection from remote node while connecting to this node, rejecting [locNode=5defd32f-5bdb-4b9e-8a6e-5ee268edac42, locNodeOrder=1, rmtNode=07583a9d-36c8-4100-a69c-8cbd26ca82c9, rmtNodeOrder=3]
> {noformat}
>  The second node prints long running transactions in prepared state ignoring 
> the default tx timeout:
>  
> {noformat}
> [2020-11-03 09:36:46,199][WARN 
> ][sys-#83%56b4f715-82d6-4d63-ba99-441ffcd673b4%][diagnostic] >>> Future 
> [startTime=09:33:08.496, curTime=09:36:46.181, fut=GridNearTxFinishFuture 
> [futId=425decc8571-4ce98554-8c56-4daf-a7a9-5b9bff52fa08, tx=GridNearTxLocal 
> [mappings=IgniteTxMappingsSingleImpl [mapping=GridDistributedTxMapping 
> [entries=LinkedHashSet [IgniteTxEntry [txKey=IgniteTxKey 
> [key=KeyCacheObjectImpl [part=833, val=833, hasValBytes=true], 
> cacheId=-923393186], val=TxEntryValueHolder [val=CacheObjectByteArrayImpl 
> [arrLen=1048576], op=CREATE], prevVal=TxEntryValueHolder [val=null, op=NOOP], 
> oldVal=TxEntryValueHolder [val=null, op=NOOP], entryProcessorsCol=null, 
> ttl=-1, conflictExpireTime=-1, conflictVer=null, explicitVer=null, 
> dhtVer=null, filters=CacheEntryPredicate[] [], filtersPassed=false, 
> filtersSet=true, entry=GridDhtDetachedCacheEntry 
> [super=GridDistributedCacheEntry [super=GridCacheMapEntry 
> 

[jira] [Updated] (IGNITE-14055) Deadlock in timeoutObjectProcessor between 'send messag'e & 'handshake timeout'

2021-01-25 Thread Anton Kalashnikov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-14055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kalashnikov updated IGNITE-14055:
---
Attachment: freeze (1).sh

> Deadlock in timeoutObjectProcessor between 'send messag'e & 'handshake 
> timeout'
> ---
>
> Key: IGNITE-14055
> URL: https://issues.apache.org/jira/browse/IGNITE-14055
> Project: Ignite
>  Issue Type: Bug
>Reporter: Anton Kalashnikov
>Assignee: Anton Kalashnikov
>Priority: Major
> Attachments: StartServerWithTxPuts (1).java, freeze (1).sh
>
>
> The cluster hangs after JVM pauses on one of the server nodes.
>  Scenario:
>  1. Start three server nodes with put operations using StartServerWithTxPuts.
>  2. Emulate JVM freezes on one server node by running the attached script:
>  {{*sh freeze.sh *}}
>  3. Wait until the script has finished.
> Result:
>  The cluster hangs on tx put operations.
> The first server node continuously prints:
> {noformat}
> [2020-11-03 09:36:01,719][INFO ][grid-nio-worker-tcp-comm-3-#26%TcpCommunicationSpi%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/127.0.0.1:47100, rmtAddr=/127.0.0.1:57714]
> [2020-11-03 09:36:01,720][INFO ][grid-nio-worker-tcp-comm-3-#26%TcpCommunicationSpi%][TcpCommunicationSpi] Received incoming connection from remote node while connecting to this node, rejecting [locNode=5defd32f-5bdb-4b9e-8a6e-5ee268edac42, locNodeOrder=1, rmtNode=07583a9d-36c8-4100-a69c-8cbd26ca82c9, rmtNodeOrder=3]
> [2020-11-03 09:36:01,922][INFO ][grid-nio-worker-tcp-comm-0-#23%TcpCommunicationSpi%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/127.0.0.1:47100, rmtAddr=/127.0.0.1:57716]
> [2020-11-03 09:36:01,922][INFO ][grid-nio-worker-tcp-comm-0-#23%TcpCommunicationSpi%][TcpCommunicationSpi] Received incoming connection from remote node while connecting to this node, rejecting [locNode=5defd32f-5bdb-4b9e-8a6e-5ee268edac42, locNodeOrder=1, rmtNode=07583a9d-36c8-4100-a69c-8cbd26ca82c9, rmtNodeOrder=3]
> [2020-11-03 09:36:02,124][INFO ][grid-nio-worker-tcp-comm-1-#24%TcpCommunicationSpi%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/127.0.0.1:47100, rmtAddr=/127.0.0.1:57718]
> [2020-11-03 09:36:02,125][INFO ][grid-nio-worker-tcp-comm-1-#24%TcpCommunicationSpi%][TcpCommunicationSpi] Received incoming connection from remote node while connecting to this node, rejecting [locNode=5defd32f-5bdb-4b9e-8a6e-5ee268edac42, locNodeOrder=1, rmtNode=07583a9d-36c8-4100-a69c-8cbd26ca82c9, rmtNodeOrder=3]
> [2020-11-03 09:36:02,326][INFO ][grid-nio-worker-tcp-comm-2-#25%TcpCommunicationSpi%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/127.0.0.1:47100, rmtAddr=/127.0.0.1:57720]
> [2020-11-03 09:36:02,327][INFO ][grid-nio-worker-tcp-comm-2-#25%TcpCommunicationSpi%][TcpCommunicationSpi] Received incoming connection from remote node while connecting to this node, rejecting [locNode=5defd32f-5bdb-4b9e-8a6e-5ee268edac42, locNodeOrder=1, rmtNode=07583a9d-36c8-4100-a69c-8cbd26ca82c9, rmtNodeOrder=3]
> [2020-11-03 09:36:02,528][INFO ][grid-nio-worker-tcp-comm-3-#26%TcpCommunicationSpi%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/127.0.0.1:47100, rmtAddr=/127.0.0.1:57722]
> [2020-11-03 09:36:02,529][INFO ][grid-nio-worker-tcp-comm-3-#26%TcpCommunicationSpi%][TcpCommunicationSpi] Received incoming connection from remote node while connecting to this node, rejecting [locNode=5defd32f-5bdb-4b9e-8a6e-5ee268edac42, locNodeOrder=1, rmtNode=07583a9d-36c8-4100-a69c-8cbd26ca82c9, rmtNodeOrder=3]
> {noformat}
>  The second node prints long running transactions in prepared state ignoring 
> the default tx timeout:
>  
> {noformat}
> [2020-11-03 09:36:46,199][WARN 
> ][sys-#83%56b4f715-82d6-4d63-ba99-441ffcd673b4%][diagnostic] >>> Future 
> [startTime=09:33:08.496, curTime=09:36:46.181, fut=GridNearTxFinishFuture 
> [futId=425decc8571-4ce98554-8c56-4daf-a7a9-5b9bff52fa08, tx=GridNearTxLocal 
> [mappings=IgniteTxMappingsSingleImpl [mapping=GridDistributedTxMapping 
> [entries=LinkedHashSet [IgniteTxEntry [txKey=IgniteTxKey 
> [key=KeyCacheObjectImpl [part=833, val=833, hasValBytes=true], 
> cacheId=-923393186], val=TxEntryValueHolder [val=CacheObjectByteArrayImpl 
> [arrLen=1048576], op=CREATE], prevVal=TxEntryValueHolder [val=null, op=NOOP], 
> oldVal=TxEntryValueHolder [val=null, op=NOOP], entryProcessorsCol=null, 
> ttl=-1, conflictExpireTime=-1, conflictVer=null, explicitVer=null, 
> dhtVer=null, filters=CacheEntryPredicate[] [], filtersPassed=false, 
> filtersSet=true, entry=GridDhtDetachedCacheEntry 
> [super=GridDistributedCacheEntry [super=GridCacheMapEntry 
> 

[jira] [Updated] (IGNITE-14055) Deadlock in timeoutObjectProcessor between 'send messag'e & 'handshake timeout'

2021-01-25 Thread Anton Kalashnikov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-14055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kalashnikov updated IGNITE-14055:
---
Description: 
The cluster hangs after JVM pauses on one of the server nodes.
 Scenario:
 1. Start three server nodes with put operations using StartServerWithTxPuts.
 2. Emulate JVM freezes on one server node by running the attached script:
 {{*sh freeze.sh *}}
 3. Wait until the script has finished.

Result:
 The cluster hangs on tx put operations.

The first server node continuously prints:

{noformat}
[2020-11-03 09:36:01,719][INFO ][grid-nio-worker-tcp-comm-3-#26%TcpCommunicationSpi%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/127.0.0.1:47100, rmtAddr=/127.0.0.1:57714]
[2020-11-03 09:36:01,720][INFO ][grid-nio-worker-tcp-comm-3-#26%TcpCommunicationSpi%][TcpCommunicationSpi] Received incoming connection from remote node while connecting to this node, rejecting [locNode=5defd32f-5bdb-4b9e-8a6e-5ee268edac42, locNodeOrder=1, rmtNode=07583a9d-36c8-4100-a69c-8cbd26ca82c9, rmtNodeOrder=3]
[2020-11-03 09:36:01,922][INFO ][grid-nio-worker-tcp-comm-0-#23%TcpCommunicationSpi%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/127.0.0.1:47100, rmtAddr=/127.0.0.1:57716]
[2020-11-03 09:36:01,922][INFO ][grid-nio-worker-tcp-comm-0-#23%TcpCommunicationSpi%][TcpCommunicationSpi] Received incoming connection from remote node while connecting to this node, rejecting [locNode=5defd32f-5bdb-4b9e-8a6e-5ee268edac42, locNodeOrder=1, rmtNode=07583a9d-36c8-4100-a69c-8cbd26ca82c9, rmtNodeOrder=3]
[2020-11-03 09:36:02,124][INFO ][grid-nio-worker-tcp-comm-1-#24%TcpCommunicationSpi%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/127.0.0.1:47100, rmtAddr=/127.0.0.1:57718]
[2020-11-03 09:36:02,125][INFO ][grid-nio-worker-tcp-comm-1-#24%TcpCommunicationSpi%][TcpCommunicationSpi] Received incoming connection from remote node while connecting to this node, rejecting [locNode=5defd32f-5bdb-4b9e-8a6e-5ee268edac42, locNodeOrder=1, rmtNode=07583a9d-36c8-4100-a69c-8cbd26ca82c9, rmtNodeOrder=3]
[2020-11-03 09:36:02,326][INFO ][grid-nio-worker-tcp-comm-2-#25%TcpCommunicationSpi%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/127.0.0.1:47100, rmtAddr=/127.0.0.1:57720]
[2020-11-03 09:36:02,327][INFO ][grid-nio-worker-tcp-comm-2-#25%TcpCommunicationSpi%][TcpCommunicationSpi] Received incoming connection from remote node while connecting to this node, rejecting [locNode=5defd32f-5bdb-4b9e-8a6e-5ee268edac42, locNodeOrder=1, rmtNode=07583a9d-36c8-4100-a69c-8cbd26ca82c9, rmtNodeOrder=3]
[2020-11-03 09:36:02,528][INFO ][grid-nio-worker-tcp-comm-3-#26%TcpCommunicationSpi%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/127.0.0.1:47100, rmtAddr=/127.0.0.1:57722]
[2020-11-03 09:36:02,529][INFO ][grid-nio-worker-tcp-comm-3-#26%TcpCommunicationSpi%][TcpCommunicationSpi] Received incoming connection from remote node while connecting to this node, rejecting [locNode=5defd32f-5bdb-4b9e-8a6e-5ee268edac42, locNodeOrder=1, rmtNode=07583a9d-36c8-4100-a69c-8cbd26ca82c9, rmtNodeOrder=3]
{noformat}

 The second node prints long running transactions in prepared state ignoring 
the default tx timeout:

 
{noformat}
[2020-11-03 09:36:46,199][WARN 
][sys-#83%56b4f715-82d6-4d63-ba99-441ffcd673b4%][diagnostic] >>> Future 
[startTime=09:33:08.496, curTime=09:36:46.181, fut=GridNearTxFinishFuture 
[futId=425decc8571-4ce98554-8c56-4daf-a7a9-5b9bff52fa08, tx=GridNearTxLocal 
[mappings=IgniteTxMappingsSingleImpl [mapping=GridDistributedTxMapping 
[entries=LinkedHashSet [IgniteTxEntry [txKey=IgniteTxKey 
[key=KeyCacheObjectImpl [part=833, val=833, hasValBytes=true], 
cacheId=-923393186], val=TxEntryValueHolder [val=CacheObjectByteArrayImpl 
[arrLen=1048576], op=CREATE], prevVal=TxEntryValueHolder [val=null, op=NOOP], 
oldVal=TxEntryValueHolder [val=null, op=NOOP], entryProcessorsCol=null, ttl=-1, 
conflictExpireTime=-1, conflictVer=null, explicitVer=null, dhtVer=null, 
filters=CacheEntryPredicate[] [], filtersPassed=false, filtersSet=true, 
entry=GridDhtDetachedCacheEntry [super=GridDistributedCacheEntry 
[super=GridCacheMapEntry [key=KeyCacheObjectImpl [part=833, val=833, 
hasValBytes=true], val=null, ver=GridCacheVersion [topVer=0, order=0, 
nodeOrder=0], hash=833, extras=null, flags=0]]], prepared=0, locked=false, 
nodeId=07583a9d-36c8-4100-a69c-8cbd26ca82c9, locMapped=false, expiryPlc=null, 
transferExpiryPlc=false, flags=0, partUpdateCntr=0, serReadVer=null, 
xidVer=GridCacheVersion [topVer=215865159, order=1604385188157, nodeOrder=2]]], 
explicitLock=false, queryUpdate=false, dhtVer=null, last=false, nearEntries=0, 
clientFirst=false, node=07583a9d-36c8-4100-a69c-8cbd26ca82c9]], 
nearLocallyMapped=false, colocatedLocallyMapped=false, needCheckBackup=null, 
hasRemoteLocks=false, trackTimeout=false, 

[jira] [Created] (IGNITE-14055) Deadlock in timeoutObjectProcessor between 'send messag'e & 'handshake timeout'

2021-01-25 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-14055:
--

 Summary: Deadlock in timeoutObjectProcessor between 'send messag'e 
& 'handshake timeout'
 Key: IGNITE-14055
 URL: https://issues.apache.org/jira/browse/IGNITE-14055
 Project: Ignite
  Issue Type: Bug
Reporter: Anton Kalashnikov
Assignee: Anton Kalashnikov


The cluster hangs after JVM pauses on one of the server nodes.
Scenario:
1. Start three server nodes with put operations using StartServerWithTxPuts.
2. Emulate JVM freezes on one server node by running the attached script:
{{*sh freeze.sh  *}}
3. Wait until the script has finished.

Result:
The cluster hangs on tx put operations.

The first server node continuously prints:
{noformat}
[2020-11-03 09:36:01,719][INFO ][grid-nio-worker-tcp-comm-3-#26%TcpCommunicationSpi%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/127.0.0.1:47100, rmtAddr=/127.0.0.1:57714]
[2020-11-03 09:36:01,720][INFO ][grid-nio-worker-tcp-comm-3-#26%TcpCommunicationSpi%][TcpCommunicationSpi] Received incoming connection from remote node while connecting to this node, rejecting [locNode=5defd32f-5bdb-4b9e-8a6e-5ee268edac42, locNodeOrder=1, rmtNode=07583a9d-36c8-4100-a69c-8cbd26ca82c9, rmtNodeOrder=3]
[2020-11-03 09:36:01,922][INFO ][grid-nio-worker-tcp-comm-0-#23%TcpCommunicationSpi%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/127.0.0.1:47100, rmtAddr=/127.0.0.1:57716]
[2020-11-03 09:36:01,922][INFO ][grid-nio-worker-tcp-comm-0-#23%TcpCommunicationSpi%][TcpCommunicationSpi] Received incoming connection from remote node while connecting to this node, rejecting [locNode=5defd32f-5bdb-4b9e-8a6e-5ee268edac42, locNodeOrder=1, rmtNode=07583a9d-36c8-4100-a69c-8cbd26ca82c9, rmtNodeOrder=3]
[2020-11-03 09:36:02,124][INFO ][grid-nio-worker-tcp-comm-1-#24%TcpCommunicationSpi%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/127.0.0.1:47100, rmtAddr=/127.0.0.1:57718]
[2020-11-03 09:36:02,125][INFO ][grid-nio-worker-tcp-comm-1-#24%TcpCommunicationSpi%][TcpCommunicationSpi] Received incoming connection from remote node while connecting to this node, rejecting [locNode=5defd32f-5bdb-4b9e-8a6e-5ee268edac42, locNodeOrder=1, rmtNode=07583a9d-36c8-4100-a69c-8cbd26ca82c9, rmtNodeOrder=3]
[2020-11-03 09:36:02,326][INFO ][grid-nio-worker-tcp-comm-2-#25%TcpCommunicationSpi%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/127.0.0.1:47100, rmtAddr=/127.0.0.1:57720]
[2020-11-03 09:36:02,327][INFO ][grid-nio-worker-tcp-comm-2-#25%TcpCommunicationSpi%][TcpCommunicationSpi] Received incoming connection from remote node while connecting to this node, rejecting [locNode=5defd32f-5bdb-4b9e-8a6e-5ee268edac42, locNodeOrder=1, rmtNode=07583a9d-36c8-4100-a69c-8cbd26ca82c9, rmtNodeOrder=3]
[2020-11-03 09:36:02,528][INFO ][grid-nio-worker-tcp-comm-3-#26%TcpCommunicationSpi%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/127.0.0.1:47100, rmtAddr=/127.0.0.1:57722]
[2020-11-03 09:36:02,529][INFO ][grid-nio-worker-tcp-comm-3-#26%TcpCommunicationSpi%][TcpCommunicationSpi] Received incoming connection from remote node while connecting to this node, rejecting [locNode=5defd32f-5bdb-4b9e-8a6e-5ee268edac42, locNodeOrder=1, rmtNode=07583a9d-36c8-4100-a69c-8cbd26ca82c9, rmtNodeOrder=3]
{noformat}
The second node prints long running transactions in prepared state ignoring the 
default tx timeout:

{noformat}
[2020-11-03 09:36:46,199][WARN 
][sys-#83%56b4f715-82d6-4d63-ba99-441ffcd673b4%][diagnostic] >>> Future 
[startTime=09:33:08.496, curTime=09:36:46.181, fut=GridNearTxFinishFuture 
[futId=425decc8571-4ce98554-8c56-4daf-a7a9-5b9bff52fa08, tx=GridNearTxLocal 
[mappings=IgniteTxMappingsSingleImpl [mapping=GridDistributedTxMapping 
[entries=LinkedHashSet [IgniteTxEntry [txKey=IgniteTxKey 
[key=KeyCacheObjectImpl [part=833, val=833, hasValBytes=true], 
cacheId=-923393186], val=TxEntryValueHolder [val=CacheObjectByteArrayImpl 
[arrLen=1048576], op=CREATE], prevVal=TxEntryValueHolder [val=null, op=NOOP], 
oldVal=TxEntryValueHolder [val=null, op=NOOP], entryProcessorsCol=null, ttl=-1, 
conflictExpireTime=-1, conflictVer=null, explicitVer=null, dhtVer=null, 
filters=CacheEntryPredicate[] [], filtersPassed=false, filtersSet=true, 
entry=GridDhtDetachedCacheEntry [super=GridDistributedCacheEntry 
[super=GridCacheMapEntry [key=KeyCacheObjectImpl [part=833, val=833, 
hasValBytes=true], val=null, ver=GridCacheVersion [topVer=0, order=0, 
nodeOrder=0], hash=833, extras=null, flags=0]]], prepared=0, locked=false, 
nodeId=07583a9d-36c8-4100-a69c-8cbd26ca82c9, locMapped=false, expiryPlc=null, 
transferExpiryPlc=false, flags=0, partUpdateCntr=0, serReadVer=null, 
xidVer=GridCacheVersion [topVer=215865159, order=1604385188157, nodeOrder=2]]], 
explicitLock=false, queryUpdate=false, 

[jira] [Commented] (IGNITE-13836) Multiple property roots support

2021-01-22 Thread Anton Kalashnikov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-13836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17270095#comment-17270095
 ] 

Anton Kalashnikov commented on IGNITE-13836:


[~sergeychugunov], it looks good to me.

> Multiple property roots support
> ---
>
> Key: IGNITE-13836
> URL: https://issues.apache.org/jira/browse/IGNITE-13836
> Project: Ignite
>  Issue Type: Sub-task
>Reporter: Anton Kalashnikov
>Assignee: Sergey Chugunov
>Priority: Major
> Fix For: 3.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Right now, Configurator is able to manage only one root. It looks like it is 
> not enough. The current idea is to provide the ability to maintain multiple 
> property roots, which allows other modules to create their own roots as 
> needed.
> ex.:
>  * indexing.query.bufferSize
>  * persistence.pageSize
> NB! There is no local/cluster root because it looks like local/cluster 
> shouldn't be there at all. Perhaps it should be a storage-specific feature 
> rather than a property-path-specific one.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-13912) Incorrect calculation of WAL segments that should be deleted from WAL archive

2021-01-19 Thread Anton Kalashnikov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-13912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17267807#comment-17267807
 ] 

Anton Kalashnikov commented on IGNITE-13912:


[~ktkale...@gridgain.com] thanks for the changes, they look good to me. 
[~sergeychugunov] can you help with the merge, please?

> Incorrect calculation of WAL segments that should be deleted from WAL archive
> -
>
> Key: IGNITE-13912
> URL: https://issues.apache.org/jira/browse/IGNITE-13912
> Project: Ignite
>  Issue Type: Bug
>  Components: persistence
>Reporter: Kirill Tkalenko
>Assignee: Kirill Tkalenko
>Priority: Critical
> Fix For: 2.10
>
> Attachments: wal_usage_dec12.PNG, wal_usage_dec22nd_binary.PNG
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, the WAL segments that should be deleted from the WAL archive are 
> calculated incorrectly. We delete only those segments whose total size should 
> not exceed *DataStorageConfiguration#maxWalArchiveSize * 
> IGNITE_THRESHOLD_WAL_ARCHIVE_SIZE_PERCENTAGE*, but it should be up to 
> *DataStorageConfiguration#maxWalArchiveSize * 
> IGNITE_THRESHOLD_WAL_ARCHIVE_SIZE_PERCENTAGE*. As a result, 
> *DataStorageConfiguration#maxWalArchiveSize* is exceeded.





[jira] [Created] (IGNITE-13972) Clear the item id before moving the page to the reuse bucket

2021-01-11 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-13972:
--

 Summary: Clear the item id before moving the page to the reuse  
bucket
 Key: IGNITE-13972
 URL: https://issues.apache.org/jira/browse/IGNITE-13972
 Project: Ignite
  Issue Type: Task
Reporter: Anton Kalashnikov


There is an assert, 'Incorrectly recycled pageId in reuse 
bucket:' (org.apache.ignite.internal.processors.cache.persistence.freelist.PagesList#takeEmptyPage), 
that sometimes fails. The reason is not clear because the same condition is 
checked before the page is put into the reuse bucket. (Perhaps we have more 
than one link to this page?)

The idea is to reset the item id to 1 before putting the page into the reuse 
bucket, in order to reduce the number of possible states that can break this 
assert. This already holds for all data pages, but the item id can still be 
greater than 1 if it is not a data page (e.g. an inner page).

After that, we can change this assert from checking the range to checking for 
equality to 1, which should theoretically help us detect the problem faster.

It may also be a good idea to set itemId to an impossible value (e.g. 0 or 
255). Then we can add an assert on every take from the free list that checks 
that itemId is greater than 0; if it is not, we have a link to a reuse bucket 
page from a bucket which is not the reuse bucket, which is a bug.
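
The first idea above (reset the item id to 1 on recycle, then assert equality 
on take) could be sketched roughly as follows. This is only an illustration, 
not the actual PagesList code: the bit layout (item id in the top 8 bits of a 
64-bit page id) and the names PageIdSketch, itemId, withItemId, recycle and 
takeFromReuseBucket are assumptions made for the example.

```java
/** Hypothetical helpers; the item id living in the top byte is an assumed layout. */
final class PageIdSketch {
    private static final int ITEM_ID_SHIFT = 56;
    private static final long ITEM_ID_MASK = 0xFFL << ITEM_ID_SHIFT;

    /** Extracts the assumed 8-bit item id from the page id. */
    static int itemId(long pageId) {
        return (int) ((pageId & ITEM_ID_MASK) >>> ITEM_ID_SHIFT);
    }

    /** Returns the same page id with the item id field replaced. */
    static long withItemId(long pageId, int itemId) {
        if (itemId < 0 || itemId > 0xFF)
            throw new IllegalArgumentException("Item id out of range: " + itemId);

        return (pageId & ~ITEM_ID_MASK) | ((long) itemId << ITEM_ID_SHIFT);
    }

    /** Resets the item id to 1 before the page goes to the reuse bucket. */
    static long recycle(long pageId) {
        return withItemId(pageId, 1);
    }

    /** Equality check instead of a range check when the page is taken back. */
    static long takeFromReuseBucket(long pageId) {
        if (itemId(pageId) != 1)
            throw new IllegalStateException("Incorrectly recycled pageId in reuse bucket: " +
                Long.toHexString(pageId));

        return pageId;
    }
}
```

With a single allowed value, any page id that reaches the reuse bucket through 
a path that skipped recycle() fails the check immediately.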





[jira] [Commented] (IGNITE-13831) Move WAL archive cleanup from checkpoint to rollover

2020-12-23 Thread Anton Kalashnikov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-13831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17254078#comment-17254078
 ] 

Anton Kalashnikov commented on IGNITE-13831:


[~ktkale...@gridgain.com] thanks for your changes. It looks good to me.

> Move WAL archive cleanup from checkpoint to rollover
> 
>
> Key: IGNITE-13831
> URL: https://issues.apache.org/jira/browse/IGNITE-13831
> Project: Ignite
>  Issue Type: Improvement
>  Components: persistence
>Reporter: Kirill Tkalenko
>Assignee: Kirill Tkalenko
>Priority: Major
> Fix For: 2.10
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Users expect *DataStorageConfiguration#maxWalArchiveSize* to mean that the WAL 
> archive will not exceed this value, but this is not the case.
> It seems that the chance of exceeding the WAL archive limit will be lower if 
> we clean the archive when switching to a new segment rather than at the end 
> of the checkpoint. It is proposed to move the archive cleanup to 
> *FileWriteAheadLogManager#rollOver* when the 
> *DataStorageConfiguration#maxWalArchiveSize* is reached.
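
A rough sketch of the proposal, assuming a much simplified model: the archive 
is just a queue of segment sizes, and cleanup runs at the moment a segment is 
rolled over. The class WalArchiveSketch and its methods are hypothetical, and 
real concerns such as segment reservation are deliberately left out.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical model: cleanup is triggered by rollover, not by checkpoint end. */
final class WalArchiveSketch {
    private final long maxWalArchiveSize;
    private final Deque<Long> archivedSegmentSizes = new ArrayDeque<>();
    private long archiveSize;

    WalArchiveSketch(long maxWalArchiveSize) {
        this.maxWalArchiveSize = maxWalArchiveSize;
    }

    /**
     * Archives the finished segment and immediately deletes the oldest segments
     * while the limit is exceeded. The newest segment is always kept (an assumption).
     *
     * @return Number of segments deleted by this rollover.
     */
    int rollOver(long finishedSegmentSize) {
        archivedSegmentSizes.addLast(finishedSegmentSize);
        archiveSize += finishedSegmentSize;

        int deleted = 0;

        while (archiveSize > maxWalArchiveSize && archivedSegmentSizes.size() > 1) {
            archiveSize -= archivedSegmentSizes.removeFirst();
            deleted++;
        }

        return deleted;
    }

    long archiveSize() {
        return archiveSize;
    }
}
```

Because the check runs on every segment switch, the archive can exceed the 
limit by at most one segment in this model, instead of growing until the next 
checkpoint finishes.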





[jira] [Commented] (IGNITE-13856) Superlinear performance of DirectByteBufferStreamImplV2.writeMessage(msg, writer)

2020-12-23 Thread Anton Kalashnikov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-13856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17254073#comment-17254073
 ] 

Anton Kalashnikov commented on IGNITE-13856:


[~kazakov], thanks for your effort. Now it looks good to me.

> Superlinear performance of DirectByteBufferStreamImplV2.writeMessage(msg, 
> writer)
> -
>
> Key: IGNITE-13856
> URL: https://issues.apache.org/jira/browse/IGNITE-13856
> Project: Ignite
>  Issue Type: Improvement
>  Components: binary
>Affects Versions: 2.9
>Reporter: Ilya Kazakov
>Assignee: Ilya Kazakov
>Priority: Major
> Attachments: LongStringSQL.java
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> {code:java}
> @Override public void writeMessage(Message msg, MessageWriter writer) {
>     if (msg != null) {
>         if (buf.hasRemaining()) {
>             try {
>                 writer.beforeInnerMessageWrite();
>                 writer.setCurrentWriteClass(msg.getClass());
>                 lastFinished = msg.writeTo(buf, writer);
>             }
>             finally {
>                 writer.afterInnerMessageWrite(lastFinished);
>             }
>         }
>     }
> }{code}
> This method may invoke msg.writeTo() multiple times. If msg is a 
> GridH2String, it will call val.getBytes() on every invocation of writeTo(), 
> leading to spikes in CPU and RAM usage.
> We should change this module to make sure that all serialization happens only 
> once.
>  
> A reproducer is attached. If we increase the string size 10 times, the 
> execution time increases by more than 10 times. 
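
To illustrate "serialization happens only once", here is a minimal sketch that 
encodes the string on the first writeTo() call and then resumes from a saved 
offset on later calls. It is not the Ignite Message API: CachedStringWriter, 
its fields and the encodeCalls counter are hypothetical names used only for 
this example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical writer that encodes its value once and writes it in chunks. */
final class CachedStringWriter {
    private final String val;

    /** Encoded form, created lazily on the first write. */
    private byte[] bytes;

    /** How many bytes have already been written. */
    private int off;

    /** Counts encodings, only to make the effect observable. */
    int encodeCalls;

    CachedStringWriter(String val) {
        this.val = val;
    }

    /**
     * Writes as much as fits into the buffer.
     *
     * @return {@code true} once the whole value has been written.
     */
    boolean writeTo(ByteBuffer buf) {
        if (bytes == null) {
            bytes = val.getBytes(StandardCharsets.UTF_8);
            encodeCalls++;
        }

        int len = Math.min(buf.remaining(), bytes.length - off);

        buf.put(bytes, off, len);
        off += len;

        return off == bytes.length;
    }
}
```

With the cached array the total work is linear in the string length, no matter 
how many times writeTo() has to be re-invoked because the buffer filled up.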





[jira] [Commented] (IGNITE-13720) Defragmentation parallelism implementation

2020-12-22 Thread Anton Kalashnikov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-13720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17253565#comment-17253565
 ] 

Anton Kalashnikov commented on IGNITE-13720:


[~sergeychugunov] can you take a look, please?

> Defragmentation parallelism implementation
> --
>
> Key: IGNITE-13720
> URL: https://issues.apache.org/jira/browse/IGNITE-13720
> Project: Ignite
>  Issue Type: Sub-task
>  Components: persistence
>Reporter: Anton Kalashnikov
>Assignee: Anton Kalashnikov
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Defragmentation is executed in a single thread right now. It makes sense to 
> execute the defragmentation of partitions of one group in parallel.
> Several parameters will be added to the defragmentation configuration:
>  * checkpointThreadPoolSize - the size of thread pool which would be used by 
> checkpointer for writing defragmented pages to disk.
>  * executionThreadPoolSize - the size of the thread pool which shows how many 
> partitions maximum can be defragmented at the same time.





[jira] [Commented] (IGNITE-13856) Superlinear performance of DirectByteBufferStreamImplV2.writeMessage(msg, writer)

2020-12-21 Thread Anton Kalashnikov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-13856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17252778#comment-17252778
 ] 

Anton Kalashnikov commented on IGNITE-13856:


[~kazakov], could you take a look at one more comment in the PR?

> Superlinear performance of DirectByteBufferStreamImplV2.writeMessage(msg, 
> writer)
> -
>
> Key: IGNITE-13856
> URL: https://issues.apache.org/jira/browse/IGNITE-13856
> Project: Ignite
>  Issue Type: Improvement
>  Components: binary
>Affects Versions: 2.9
>Reporter: Ilya Kazakov
>Assignee: Ilya Kazakov
>Priority: Major
> Attachments: LongStringSQL.java
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> {code:java}
> @Override public void writeMessage(Message msg, MessageWriter writer) {
>     if (msg != null) {
>         if (buf.hasRemaining()) {
>             try {
>                 writer.beforeInnerMessageWrite();
>                 writer.setCurrentWriteClass(msg.getClass());
>                 lastFinished = msg.writeTo(buf, writer);
>             }
>             finally {
>                 writer.afterInnerMessageWrite(lastFinished);
>             }
>         }
>     }
> }{code}
> This method may invoke msg.writeTo() multiple times. If msg is a 
> GridH2String, it will call val.getBytes() on every invocation of writeTo(), 
> leading to spikes in CPU and RAM usage.
> We should change this module to make sure that all serialization happens only 
> once.
>  
> A reproducer is attached. If we increase the string size 10 times, the 
> execution time increases by more than 10 times. 





[jira] [Commented] (IGNITE-13856) Superlinear performance of DirectByteBufferStreamImplV2.writeMessage(msg, writer)

2020-12-18 Thread Anton Kalashnikov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-13856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17251827#comment-17251827
 ] 

Anton Kalashnikov commented on IGNITE-13856:


[~kazakov] I left a couple of comments in the PR. The major one is about using 
the map for caching. I believe you can use a simple byte array instead (you 
can look at the usage of arrOff).

> Superlinear performance of DirectByteBufferStreamImplV2.writeMessage(msg, 
> writer)
> -
>
> Key: IGNITE-13856
> URL: https://issues.apache.org/jira/browse/IGNITE-13856
> Project: Ignite
>  Issue Type: Improvement
>  Components: binary
>Affects Versions: 2.9
>Reporter: Ilya Kazakov
>Assignee: Ilya Kazakov
>Priority: Major
> Attachments: LongStringSQL.java
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {code:java}
> @Override public void writeMessage(Message msg, MessageWriter writer) {
>     if (msg != null) {
>         if (buf.hasRemaining()) {
>             try {
>                 writer.beforeInnerMessageWrite();
>                 writer.setCurrentWriteClass(msg.getClass());
>                 lastFinished = msg.writeTo(buf, writer);
>             }
>             finally {
>                 writer.afterInnerMessageWrite(lastFinished);
>             }
>         }
>     }
> }{code}
> This method may invoke msg.writeTo() multiple times. If msg is a 
> GridH2String, it will call val.getBytes() on every invocation of writeTo(), 
> leading to spikes in CPU and RAM usage.
> We should change this module to make sure that all serialization happens only 
> once.
>  
> A reproducer is attached. If we increase the string size 10 times, the 
> execution time increases by more than 10 times. 





[jira] [Commented] (IGNITE-13190) Core defragmentation functions

2020-12-17 Thread Anton Kalashnikov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-13190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17250972#comment-17250972
 ] 

Anton Kalashnikov commented on IGNITE-13190:


[~timonin.maksim] thanks for pointing this out. It really looks suspicious; 
perhaps we lost some changes during the merge. I'll check this out.

> Core defragmentation functions
> --
>
> Key: IGNITE-13190
> URL: https://issues.apache.org/jira/browse/IGNITE-13190
> Project: Ignite
>  Issue Type: Sub-task
>Reporter: Sergey Chugunov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: IEP-47
> Fix For: 2.10
>
>  Time Spent: 20h 50m
>  Remaining Estimate: 0h
>
> The following set of functions covering the defragmentation happy case is needed:
>  * Initialization of defragmentation manager when node is started in 
> maintenance mode.
>  * Information about partition files is gathered by defrag mgr.
>  * For each partition file corresponding file of defragmented partition is 
> created and initialized.
>  * Keys are transferred from old partitions to new partitions.
>  * Checkpointer is aware of new partition files and flushes defragmented 
> memory to new partition files.
>  
> Neither fault-tolerance code nor index defragmentation mappings are needed in 
> this task.





[jira] [Commented] (IGNITE-13848) Premature update SegmentReservationStorage#minReserveIdx during truncate of segments

2020-12-17 Thread Anton Kalashnikov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-13848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17250967#comment-17250967
 ] 

Anton Kalashnikov commented on IGNITE-13848:


[~ktkale...@gridgain.com] LGTM

> Premature update SegmentReservationStorage#minReserveIdx during truncate of  
> segments
> -
>
> Key: IGNITE-13848
> URL: https://issues.apache.org/jira/browse/IGNITE-13848
> Project: Ignite
>  Issue Type: Bug
>  Components: persistence
>Reporter: Kirill Tkalenko
>Assignee: Kirill Tkalenko
>Priority: Major
> Fix For: 2.10
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> A premature update of *SegmentReservationStorage#minReserveIdx* was found in 
> *FileWriteAheadLogManager#truncate*, which leaves the segments in the archive 
> in a wrong state.





[jira] [Commented] (IGNITE-13847) Make GridEncryptionManager#onWalSegmentRemoved async

2020-12-17 Thread Anton Kalashnikov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-13847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17250944#comment-17250944
 ] 

Anton Kalashnikov commented on IGNITE-13847:


[~ktkale...@gridgain.com] LGTM

> Make GridEncryptionManager#onWalSegmentRemoved async
> 
>
> Key: IGNITE-13847
> URL: https://issues.apache.org/jira/browse/IGNITE-13847
> Project: Ignite
>  Issue Type: Improvement
>  Components: persistence
>Reporter: Kirill Tkalenko
>Assignee: Kirill Tkalenko
>Priority: Major
>  Labels: IEP-18
> Fix For: 2.10
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When implementing IGNITE-13831, I ran into a deadlock.
> When *FileWriteAheadLogManager#rollOver* is executed, it begins to clean the 
> WAL archive because *DataStorageConfiguration#maxWalArchiveSize* has been 
> reached. After a segment is deleted, 
> *GridEncryptionManager#onWalSegmentRemoved* is executed; it wants to write to 
> the metastore, but it will not succeed because it will wait for 
> *FileWriteAheadLogManager#rollOver*.
> I suggest making *GridEncryptionManager#onWalSegmentRemoved* asynchronous, 
> running it in a separate pool, for example, like 
> *CacheGroupPageScanner#singleExecSvc*.
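
The suggestion can be sketched like this, under strong simplifications: the 
"rollover" and the "metastore write" are modelled as two code paths that need 
the same monitor, and the callback is handed to a single-thread executor 
instead of being awaited inside rollOver(). AsyncSegmentRemovalSketch and all 
of its members are hypothetical; only the general technique (fire-and-forget 
hand-off to a separate pool) is the point.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical model of an asynchronous "segment removed" callback. */
final class AsyncSegmentRemovalSketch {
    private final Object walLock = new Object();
    private final ExecutorService singleExecSvc = Executors.newSingleThreadExecutor();

    /** Stands in for writes to the metastore. */
    final AtomicInteger metastoreWrites = new AtomicInteger();

    /** Rolls over and schedules the callback without waiting for it. */
    void rollOver(long removedSegmentIdx) {
        synchronized (walLock) {
            // The segment switch and archive cleanup would happen here.
            singleExecSvc.execute(() -> onWalSegmentRemoved(removedSegmentIdx));
        }
    }

    /** Runs on the separate pool, after rollOver() has released the lock. */
    private void onWalSegmentRemoved(long segmentIdx) {
        synchronized (walLock) {
            metastoreWrites.incrementAndGet();
        }
    }

    /** Stops the pool and waits for the queued callbacks to finish. */
    boolean shutdownAndWait() {
        singleExecSvc.shutdown();

        try {
            return singleExecSvc.awaitTermination(5, TimeUnit.SECONDS);
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();

            return false;
        }
    }
}
```

rollOver() never blocks on the callback, so the callback is free to wait for 
the rollover to finish before it touches the shared state.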





[jira] [Created] (IGNITE-13843) Wrapper/Converter for primitive configuration

2020-12-11 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-13843:
--

 Summary: Wrapper/Converter for primitive configuration 
 Key: IGNITE-13843
 URL: https://issues.apache.org/jira/browse/IGNITE-13843
 Project: Ignite
  Issue Type: Sub-task
Reporter: Anton Kalashnikov


Do we need the ability to use a complex type such as InternetAddress as a 
wrapper for some string property?





[jira] [Created] (IGNITE-13842) Creating the new configuration on old cluster

2020-12-11 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-13842:
--

 Summary: Creating the new configuration on old cluster
 Key: IGNITE-13842
 URL: https://issues.apache.org/jira/browse/IGNITE-13842
 Project: Ignite
  Issue Type: Sub-task
Reporter: Anton Kalashnikov


Do we need the ability to create a new configuration/property on the working 
cluster? 





[jira] [Created] (IGNITE-13841) Cluster bootstrapping

2020-12-11 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-13841:
--

 Summary: Cluster bootstrapping 
 Key: IGNITE-13841
 URL: https://issues.apache.org/jira/browse/IGNITE-13841
 Project: Ignite
  Issue Type: Sub-task
Reporter: Anton Kalashnikov


What should cluster bootstrapping look like? What is the format of the files? 
What is the right moment for applying the configuration? What is the state of 
the cluster before applying it?





[jira] [Created] (IGNITE-13840) Rethink API of Init*, change* classes

2020-12-11 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-13840:
--

 Summary: Rethink API of Init*, change* classes
 Key: IGNITE-13840
 URL: https://issues.apache.org/jira/browse/IGNITE-13840
 Project: Ignite
  Issue Type: Sub-task
Reporter: Anton Kalashnikov


Right now, the API of the Init*, change* classes looks too heavy and contains a 
lot of boilerplate code. We need to think about how to simplify it.





[jira] [Commented] (IGNITE-13815) Remove ability to delete segments from the middle of WAL archive

2020-12-11 Thread Anton Kalashnikov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-13815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17247819#comment-17247819
 ] 

Anton Kalashnikov commented on IGNITE-13815:


[~ktkale...@gridgain.com], it looks good to me. I just want to propose 
renaming the two methods incMinReserveIndex and incMinLockIndex to something 
without 'inc', because 'inc' is associated with increment, so a delta (or 
nothing) is expected as the parameter. In your case, the parameter is not a 
delta but an absolute value, which means it should be 'set' rather than 'inc'. 
So maybe it is better to rename them to setMinReserveIndex or just 
minReserveIndex. In my opinion, it is ok if any other restriction, like 
'setting only a value which is greater than the current one', is described in 
the javadoc rather than in the name, because it is impossible to make the name 
that informative anyway.
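
For illustration, a setter named after the value it takes, with the "only 
forward" restriction documented in the javadoc, might look like the sketch 
below. SegmentBoundsSketch, the initial value of -1 and the boolean return are 
assumptions for the example, not the code from the PR.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical holder for the minimum reserved segment index. */
final class SegmentBoundsSketch {
    private final AtomicLong minReserveIdx = new AtomicLong(-1);

    /**
     * Sets the minimum reserved index to the given absolute value.
     * The index only moves forward: a value that is not greater than
     * the current one is ignored.
     *
     * @param absIdx Absolute segment index (not a delta).
     * @return {@code true} if the index was updated.
     */
    boolean minReserveIndex(long absIdx) {
        long cur;

        do {
            cur = minReserveIdx.get();

            if (absIdx <= cur)
                return false;
        }
        while (!minReserveIdx.compareAndSet(cur, absIdx));

        return true;
    }

    /** @return Current minimum reserved index. */
    long minReserveIndex() {
        return minReserveIdx.get();
    }
}
```

The name tells the caller that the argument is the new value, and the javadoc 
carries the monotonicity rule that would not fit into a method name.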

> Remove ability to delete segments from the middle of WAL archive
> 
>
> Key: IGNITE-13815
> URL: https://issues.apache.org/jira/browse/IGNITE-13815
> Project: Ignite
>  Issue Type: Improvement
>  Components: persistence
>Reporter: Kirill Tkalenko
>Assignee: Kirill Tkalenko
>Priority: Major
> Fix For: 2.10
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> At the moment we have the option to delete segments from the middle of the 
> archive via the *FileWriteAheadLogManager#truncate*. This creates gaps in the 
> archive and makes it invalid.
> It should be possible to delete segments sequentially up to the upper 
> boundary. It has also been found that there is no protection against segment 
> deletion, which may be needed for a binary recovery.
> We also need to get rid of the physical check when reserving segments through 
> the *FileWriteAheadLogManager#reserve*.





[jira] [Created] (IGNITE-13837) Configuration initialization

2020-12-10 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-13837:
--

 Summary: Configuration initialization
 Key: IGNITE-13837
 URL: https://issues.apache.org/jira/browse/IGNITE-13837
 Project: Ignite
  Issue Type: Sub-task
Reporter: Anton Kalashnikov


We need to think about what the first initialization of a node/cluster should 
look like. What is the format of the initial properties (JSON, HOCON, etc.)? 
How should they be handled?





[jira] [Created] (IGNITE-13836) Multiple property roots support

2020-12-10 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-13836:
--

 Summary: Multiple property roots support
 Key: IGNITE-13836
 URL: https://issues.apache.org/jira/browse/IGNITE-13836
 Project: Ignite
  Issue Type: Sub-task
Reporter: Anton Kalashnikov


Right now, Configurator is able to manage only one root. It looks like it is 
not enough. The current idea is to provide the ability to maintain multiple 
property roots, which allows other modules to create their own roots as needed.

ex.:
 * indexing.query.bufferSize
 * persistence.pageSize

NB! There is no local/cluster root because it looks like local/cluster 
shouldn't be there at all. Perhaps it should be a storage-specific feature 
rather than a property-path-specific one.
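
A minimal sketch of the idea, assuming each module registers its own root by 
name and a property path is resolved by its first segment. ConfigRootsSketch 
and its methods are hypothetical and are not the Configurator API; the values 
used with it are illustrative only.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical registry that keeps one property tree per named root. */
final class ConfigRootsSketch {
    private final Map<String, Map<String, Object>> roots = new HashMap<>();

    /** Lets a module create its own root, for example "indexing" or "persistence". */
    void registerRoot(String rootName) {
        if (roots.putIfAbsent(rootName, new HashMap<>()) != null)
            throw new IllegalStateException("Root already registered: " + rootName);
    }

    /** Stores a value under a path such as "persistence.pageSize". */
    void set(String path, Object val) {
        String[] parts = split(path);

        rootOf(parts[0]).put(parts[1], val);
    }

    /** Reads a value by its full path, or returns null if it was never set. */
    Object get(String path) {
        String[] parts = split(path);

        return rootOf(parts[0]).get(parts[1]);
    }

    private Map<String, Object> rootOf(String rootName) {
        Map<String, Object> root = roots.get(rootName);

        if (root == null)
            throw new IllegalArgumentException("Unknown root: " + rootName);

        return root;
    }

    /** Splits a path into the root name and the rest of the path. */
    private static String[] split(String path) {
        int dot = path.indexOf('.');

        if (dot <= 0 || dot == path.length() - 1)
            throw new IllegalArgumentException("Expected <root>.<property>: " + path);

        return new String[] {path.substring(0, dot), path.substring(dot + 1)};
    }
}
```

Here "indexing.query.bufferSize" resolves to the "indexing" root and 
"persistence.pageSize" to the "persistence" root, and neither module needs to 
know about the other.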





[jira] [Commented] (IGNITE-13786) PDS defragmentation can inflate index size

2020-12-10 Thread Anton Kalashnikov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-13786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17247220#comment-17247220
 ] 

Anton Kalashnikov commented on IGNITE-13786:


[~ibessonov] changes look good to me. [~agoncharuk] can you also take a look at 
the changes and then merge them (if everything is ok)?

> PDS defragmentation can inflate index size
> --
>
> Key: IGNITE-13786
> URL: https://issues.apache.org/jira/browse/IGNITE-13786
> Project: Ignite
>  Issue Type: Sub-task
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> For huge caches it is possible that defragmentation will lead to a bigger 
> index size.
> The reason is that we only append new data to index trees and never insert 
> into the middle, which leads to under-utilization of B+Tree page space.
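
The effect can be reproduced with a toy model. The sketch below assumes a 
simple "split the full leaf in half" policy and tracks only leaf sizes; the 
real B+Tree implementation may split differently, and AppendOnlyFillSketch is 
a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of leaf utilization when keys arrive in ascending order only. */
final class AppendOnlyFillSketch {
    /**
     * Appends {@code keyCount} ascending keys and splits a leaf in half on overflow.
     *
     * @return Average leaf fill factor in the range (0, 1].
     */
    static double averageLeafFill(int keyCount, int leafCapacity) {
        List<Integer> leafSizes = new ArrayList<>();

        leafSizes.add(0);

        for (int i = 0; i < keyCount; i++) {
            int last = leafSizes.size() - 1;

            // An ascending key always lands in the rightmost leaf.
            int size = leafSizes.get(last) + 1;

            if (size > leafCapacity) {
                int left = size / 2;

                // The left half is never touched again, so it stays half empty.
                leafSizes.set(last, left);
                leafSizes.add(size - left);
            }
            else
                leafSizes.set(last, size);
        }

        long total = 0;

        for (int size : leafSizes)
            total += size;

        return (double) total / ((long) leafSizes.size() * leafCapacity);
    }
}
```

In this model the fill factor for a long ascending run stays close to one 
half, because every leaf left behind by a split never receives another key.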





[jira] [Commented] (IGNITE-13709) Control.sh API - status

2020-12-08 Thread Anton Kalashnikov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-13709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17245960#comment-17245960
 ] 

Anton Kalashnikov commented on IGNITE-13709:


[~ibessonov] it looks good to me. [~sergeychugunov] can you help with the merge, 
please?

> Control.sh API - status
> ---
>
> Key: IGNITE-13709
> URL: https://issues.apache.org/jira/browse/IGNITE-13709
> Project: Ignite
>  Issue Type: Sub-task
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: IEP-47
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> _Prerequisites:_ command can be sent to nodes in maintenance mode and in 
> normal operations as well.
>  
> _Command output:_
>  # For node in normal operations:
> defragmentation is scheduled for caches: 
>  # For node in maintenance mode executing defragmentation:
> defragmentation is completed for the caches:
>     cache0 - size before/after: 200GB/150GB, time took: 15 mins 42 secs
> defragmentation is in progress for cache:
>     cache1 - partitions processed/all: 177/512, time elapsed: 7 mins 11 secs
> awaiting defragmentation: cache2, cache3, cache4.
>  # For node in maintenance mode for other reason:
> no defragmentation is scheduled for the node, the node is in maintenance to 
> perform tasks: 





[jira] [Commented] (IGNITE-13775) U.ReentrantReadWriteLockTracer improper realization.

2020-12-04 Thread Anton Kalashnikov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-13775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17243944#comment-17243944
 ] 

Anton Kalashnikov commented on IGNITE-13775:


[~zstan], now changes look good to me. Waiting for TC...

> U.ReentrantReadWriteLockTracer improper realization.
> 
>
> Key: IGNITE-13775
> URL: https://issues.apache.org/jira/browse/IGNITE-13775
> Project: Ignite
>  Issue Type: Improvement
>  Components: general
>Affects Versions: 2.9
>Reporter: Stanilovsky Evgeny
>Assignee: Stanilovsky Evgeny
>Priority: Major
> Attachments: image-2020-12-01-13-51-39-048.png, screenshot-1.png
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> ReentrantReadWriteLockTracer accepts ReentrantReadWriteLock as a delegate and 
> stores delegates for readLock and writeLock. But 
> ReentrantReadWriteLock#isWriteLockedByCurrentThread uses sync object to 
> evaluate the result instead of writeLock, and ReentrantReadWriteLockTracer 
> has its own sync object.
> As a result, if ReentrantReadWriteLockTracer is used to create checkpoint 
> lock (when IGNITE_PDS_LOG_CP_READ_LOCK_HOLDERS=true), 
> GridCacheDatabaseSharedManager#checkpointLockIsHeldByThread doesn't work 
> correctly: it returns false when checkpoint lock is acquired.
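
The behaviour described above can be reproduced with the JDK classes alone. 
The sketch below is a simplified stand-in for the tracer, not the Ignite 
class: DelegatingRwLockSketch only forwards readLock() and writeLock() to a 
delegate, and both class names are hypothetical.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Simplified stand-in for the tracer: hands out the delegate's lock objects. */
final class DelegatingRwLockSketch extends ReentrantReadWriteLock {
    private final ReentrantReadWriteLock delegate;

    DelegatingRwLockSketch(ReentrantReadWriteLock delegate) {
        this.delegate = delegate;
    }

    @Override public ReentrantReadWriteLock.ReadLock readLock() {
        return delegate.readLock();
    }

    @Override public ReentrantReadWriteLock.WriteLock writeLock() {
        return delegate.writeLock();
    }
}

final class DelegatingRwLockDemo {
    /** @return {wrapper's answer, delegate's answer} while the write lock is held. */
    static boolean[] probe() {
        ReentrantReadWriteLock delegate = new ReentrantReadWriteLock();
        DelegatingRwLockSketch tracer = new DelegatingRwLockSketch(delegate);

        tracer.writeLock().lock();

        try {
            return new boolean[] {
                // Consults the wrapper's own, never-locked sync object.
                tracer.isWriteLockedByCurrentThread(),
                // Consults the sync object that was actually locked.
                delegate.isWriteLockedByCurrentThread()
            };
        }
        finally {
            tracer.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        boolean[] res = probe();

        System.out.println("wrapper=" + res[0] + ", delegate=" + res[1]);
    }
}
```

The wrapper reports false while the delegate reports true. One plausible fix, 
which is an assumption here and not necessarily what the PR does, is to 
override isWriteLockedByCurrentThread() in the wrapper and forward it to the 
delegate as well.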





[jira] [Commented] (IGNITE-13697) Control.sh API - schedule & cancel

2020-12-03 Thread Anton Kalashnikov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-13697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17243288#comment-17243288
 ] 

Anton Kalashnikov commented on IGNITE-13697:


[~ibessonov] changes look good to me.

> Control.sh API - schedule & cancel
> --
>
> Key: IGNITE-13697
> URL: https://issues.apache.org/jira/browse/IGNITE-13697
> Project: Ignite
>  Issue Type: Sub-task
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: IEP-47
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
>  
>  From original draft by [~sergeychugunov]:
>   
>  Schedule
>  *control.sh defragmentation schedule nodes 
> nodeConsistentId0[,nodeConsistentId1] [caches 
> cacheName0,cacheName1,cacheName2]*
>   
>  Optional list of caches is passed to perform defragmentation for a 
> particular set of caches. By default all caches are defragmented.
>   
>  _Prerequisites_: command is sent to node in normal operations, node in 
> maintenance mode should not accept it
> _Command output:_
>  Defragmentation is successfully scheduled on nodes , on next 
> restart the following caches will be defragmented: .
>  Cancel
>  *control.sh defragmentation cancel nodeHost nodePort [cache cacheName0]*
> _Prerequisites_: command is sent to node in maintenance mode or in normal mode
> _Command output:_
>  Defragmentation is already completed for caches: 
>  Defragmentation is cancelled for caches: ; all intermediate 
> files are cleaned up.
>   
>  *Note:* Caches list for cancel command will not be implemented here.





[jira] [Created] (IGNITE-13720) Defragmentation parallelism implementation

2020-11-18 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-13720:
--

 Summary: Defragmentation parallelism implementation
 Key: IGNITE-13720
 URL: https://issues.apache.org/jira/browse/IGNITE-13720
 Project: Ignite
  Issue Type: Sub-task
  Components: persistence
Reporter: Anton Kalashnikov
Assignee: Anton Kalashnikov


Defragmentation is executed in a single thread right now. It makes sense to 
execute the defragmentation of partitions of one group in parallel.

Several parameters will be added to the defragmentation configuration:
 * checkpointThreadPoolSize - the size of the thread pool used by the 
checkpointer for writing defragmented pages to disk.
 * executionThreadPoolSize - the size of the thread pool that limits how many 
partitions can be defragmented at the same time.
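
The idea of defragmenting the partitions of one group in parallel could be 
sketched roughly as below. This is only an illustration under assumptions: 
ParallelDefragSketch, defragmentPartition and defragmentGroup are hypothetical 
names, not Ignite API, and the per-partition work is a stub. Only the 
executionThreadPoolSize parameter name comes from the ticket.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical illustration; none of these names are Ignite API. */
final class ParallelDefragSketch {
    /** Stand-in for the real per-partition work; returns the id it processed. */
    static int defragmentPartition(int partId) {
        return partId;
    }

    /**
     * Runs the per-partition work for one group, with at most
     * executionThreadPoolSize partitions in flight at the same time.
     * Returns processed partition ids in submission order.
     */
    static List<Integer> defragmentGroup(List<Integer> partIds, int executionThreadPoolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(executionThreadPoolSize);

        try {
            List<Future<Integer>> futs = new ArrayList<>();

            for (int partId : partIds) {
                Callable<Integer> task = () -> defragmentPartition(partId);

                futs.add(pool.submit(task));
            }

            List<Integer> done = new ArrayList<>();

            for (Future<Integer> fut : futs)
                done.add(fut.get()); // Rethrows a failure of the partition task.

            return done;
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();

            throw new IllegalStateException("Interrupted while waiting for partitions", e);
        }
        catch (ExecutionException e) {
            throw new IllegalStateException("Partition task failed", e.getCause());
        }
        finally {
            pool.shutdown();
        }
    }
}
```

A fixed-size pool is used here because it gives a hard upper bound on 
concurrently processed partitions; a separate pool for the checkpointer 
(checkpointThreadPoolSize) is not shown.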





[jira] [Commented] (IGNITE-13681) Non markers checkpoint implementation

2020-11-10 Thread Anton Kalashnikov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-13681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17229319#comment-17229319
 ] 

Anton Kalashnikov commented on IGNITE-13681:


[~sergey-chugunov] can you, please, help with review and merge?

> Non markers checkpoint implementation
> -
>
> Key: IGNITE-13681
> URL: https://issues.apache.org/jira/browse/IGNITE-13681
> Project: Ignite
>  Issue Type: Sub-task
>Reporter: Anton Kalashnikov
>Assignee: Anton Kalashnikov
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> A new, simpler version of checkpoint needs to be implemented. The main 
> differences compared to the current checkpoint:
> * It doesn't perform any write operations to the WAL.
> * It doesn't create checkpoint markers.
> * It should be possible to configure a checkpoint listener only on a 
> specific data region.
> This checkpoint will be helpful for defragmentation and for recovery (it is 
> not possible to use the current checkpoint during recovery right now).





[jira] [Commented] (IGNITE-13684) Prepare PageStore/B+Tree to usage outside of standart lifecycle

2020-11-10 Thread Anton Kalashnikov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-13684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17229318#comment-17229318
 ] 

Anton Kalashnikov commented on IGNITE-13684:


[~ibessonov] changes look good to me. Could you just remove the useless TODOs 
that I pointed out in the pull request?

[~sergey-chugunov] can you help with merge please?

> Prepare PageStore/B+Tree to usage outside of standart lifecycle
> ---
>
> Key: IGNITE-13684
> URL: https://issues.apache.org/jira/browse/IGNITE-13684
> Project: Ignite
>  Issue Type: Sub-task
>Reporter: Anton Kalashnikov
>Assignee: Ivan Bessonov
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Right now, PageStore and some other classes responsible for persistence are 
> too tightly coupled with many other dependencies, which prevents using them 
> under different initial conditions (e.g. defragmentation). So some places 
> need to be refactored to improve this situation.
> Changes are:
> * static constant for cache group meta page;
> * PageStore allocation tracker replaced with a more generic LongConsumer to 
> decouple it from metrics framework;
> * PageReadWriteManager added to basically allow having same cache group in 
> different data regions;
> * several methods and fields exposed as internally public/protected API;
> * several inner classes refactored so that they become static classes;
> * PageIOResolver interface created and used to make data structure more 
> flexible;
> * InsertLast interface for B+Tree added that will optimize comparisons on 
> inserts. Unused for now;
> * All this code doesn't affect existing behavior.
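
The LongConsumer item from the list above could look roughly like this. It is 
a sketch under assumptions: PageStoreSketch and allocatePage are hypothetical 
names, not the real Ignite classes; only the use of a plain 
java.util.function.LongConsumer callback instead of a metrics-specific type is 
taken from the description.

```java
import java.util.function.LongConsumer;

/** Hypothetical stand-in for a page store; not the Ignite class. */
final class PageStoreSketch {
    /** Callback notified about the number of newly allocated pages. */
    private final LongConsumer allocatedTracker;

    /** Number of pages allocated so far. */
    private long pages;

    PageStoreSketch(LongConsumer allocatedTracker) {
        this.allocatedTracker = allocatedTracker;
    }

    /** Allocates one page, reports the delta to the callback, returns the page index. */
    long allocatePage() {
        allocatedTracker.accept(1);

        return pages++;
    }
}
```

With this shape a caller can pass a metric updater such as 
`new AtomicLong()::addAndGet`, or a no-op lambda when no metrics framework is 
available (for example, outside of the standard lifecycle).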





[jira] [Commented] (IGNITE-13684) Prepare PageStore/B+Tree to usage outside of standart lifecycle

2020-11-10 Thread Anton Kalashnikov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-13684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17229316#comment-17229316
 ] 

Anton Kalashnikov commented on IGNITE-13684:


Benchmarks look good:
||Benchmark||master - operation||this branch - operation||diff %||
|tx_put|4.00|43723.30|0.90%|
|atomic_put_all_bs_10|67685.50|67149.10|-0.79%|
|atomic_put_get|79930.80|78607.00|-1.66%|
|tx_put_get|25010.20|24600.10|-1.64%|
|sql_query|58618.50|59074.10|0.78%|
|atomic_put_random_value|160666.00|157411.00|-2.03%|
|sql_query_put|104990.00|103457.00|-1.46%|

> Prepare PageStore/B+Tree to usage outside of standart lifecycle
> ---
>
> Key: IGNITE-13684
> URL: https://issues.apache.org/jira/browse/IGNITE-13684
> Project: Ignite
>  Issue Type: Sub-task
>Reporter: Anton Kalashnikov
>Assignee: Ivan Bessonov
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Right now, PageStore and some other classes responsible for persistence are 
> too tightly coupled with many other dependencies, which prevents using them 
> under different initial conditions (e.g. defragmentation). So some places 
> need to be refactored to improve this situation.
> Changes are:
> * static constant for cache group meta page;
> * PageStore allocation tracker replaced with a more generic LongConsumer to 
> decouple it from metrics framework;
> * PageReadWriteManager added to basically allow having same cache group in 
> different data regions;
> * several methods and fields exposed as internally public/protected API;
> * several inner classes refactored so that they become static classes;
> * PageIOResolver interface created and used to make data structure more 
> flexible;
> * InsertLast interface for B+Tree added that will optimize comparisons on 
> inserts. Unused for now;
> * All this code doesn't affect existing behavior.





[jira] [Updated] (IGNITE-13684) Prepare PageStore/B+Tree to usage outside of standart lifecycle

2020-11-10 Thread Anton Kalashnikov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-13684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kalashnikov updated IGNITE-13684:
---
Description: 
Right now, PageStore and some other classes responsible for persistence are 
too tightly coupled with many other dependencies, which prevents using them 
under different initial conditions (e.g. defragmentation). So some places need 
to be refactored to improve this situation.

Changes are:

* static constant for cache group meta page;
* PageStore allocation tracker replaced with a more generic LongConsumer to 
decouple it from metrics framework;
* PageReadWriteManager added to basically allow having same cache group in 
different data regions;
* several methods and fields exposed as internally public/protected API;
* several inner classes refactored so that they become static classes;
* PageIOResolver interface created and used to make data structure more 
flexible;
* InsertLast interface for B+Tree added that will optimize comparisons on 
inserts. Unused for now;
* All this code doesn't affect existing behavior.

  was:
Right now, Ignite has a static pageIo resolver, which does not allow 
substituting a different implementation if needed. So the current 
implementation needs to be rewritten to make that possible.

Changes are:

* static constant for cache group meta page;
* PageStore allocation tracker replaced with a more generic LongConsumer to 
decouple it from metrics framework;
* PageReadWriteManager added to basically allow having same cache group in 
different data regions;
* several methods and fields exposed as internally public/protected API;
* several inner classes refactored so that they become static classes;
* PageIOResolver interface created and used to make data structure more 
flexible;
* InsertLast interface for B+Tree added that will optimize comparisons on 
inserts. Unused for now;
* All this code doesn't affect existing behavior.


> Prepare PageStore/B+Tree to usage outside of standart lifecycle
> ---
>
> Key: IGNITE-13684
> URL: https://issues.apache.org/jira/browse/IGNITE-13684
> Project: Ignite
>  Issue Type: Sub-task
>Reporter: Anton Kalashnikov
>Assignee: Ivan Bessonov
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Right now, PageStore and some other classes responsible for persistence are 
> too tightly coupled with many other dependencies, which prevents using them 
> under different initial conditions (e.g. defragmentation). So some places 
> need to be refactored to improve this situation.
> Changes are:
> * static constant for cache group meta page;
> * PageStore allocation tracker replaced with a more generic LongConsumer to 
> decouple it from metrics framework;
> * PageReadWriteManager added to basically allow having same cache group in 
> different data regions;
> * several methods and fields exposed as internally public/protected API;
> * several inner classes refactored so that they become static classes;
> * PageIOResolver interface created and used to make data structure more 
> flexible;
> * InsertLast interface for B+Tree added that will optimize comparisons on 
> inserts. Unused for now;
> * All this code doesn't affect existing behavior.





[jira] [Updated] (IGNITE-13684) Prepare PageStore/B+Tree to usage outside of standart lifecycle

2020-11-10 Thread Anton Kalashnikov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-13684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kalashnikov updated IGNITE-13684:
---
Summary: Prepare PageStore/B+Tree to usage outside of standart lifecycle  
(was: Rewrite PageIo resolver from static to explicit dependency)

> Prepare PageStore/B+Tree to usage outside of standart lifecycle
> ---
>
> Key: IGNITE-13684
> URL: https://issues.apache.org/jira/browse/IGNITE-13684
> Project: Ignite
>  Issue Type: Sub-task
>Reporter: Anton Kalashnikov
>Assignee: Ivan Bessonov
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Right now, Ignite has a static pageIo resolver, which does not allow 
> substituting a different implementation if needed. So the current 
> implementation needs to be rewritten to make that possible.
> Changes are:
> * static constant for cache group meta page;
> * PageStore allocation tracker replaced with a more generic LongConsumer to 
> decouple it from metrics framework;
> * PageReadWriteManager added to basically allow having same cache group in 
> different data regions;
> * several methods and fields exposed as internally public/protected API;
> * several inner classes refactored so that they become static classes;
> * PageIOResolver interface created and used to make data structure more 
> flexible;
> * InsertLast interface for B+Tree added that will optimize comparisons on 
> inserts. Unused for now;
> * All this code doesn't affect existing behavior.





[jira] [Updated] (IGNITE-13684) Rewrite PageIo resolver from static to explicit dependency

2020-11-10 Thread Anton Kalashnikov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-13684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kalashnikov updated IGNITE-13684:
---
Description: 
Right now, Ignite has a static pageIo resolver, which does not allow 
substituting a different implementation if needed. So the current 
implementation needs to be rewritten to make that possible.

Changes are:

* static constant for cache group meta page;
* PageStore allocation tracker replaced with a more generic LongConsumer to 
decouple it from metrics framework;
* PageReadWriteManager added to basically allow having same cache group in 
different data regions;
* several methods and fields exposed as internally public/protected API;
* several inner classes refactored so that they become static classes;
* PageIOResolver interface created and used to make data structure more 
flexible;
* InsertLast interface for B+Tree added that will optimize comparisons on 
inserts. Unused for now;
* All this code doesn't affect existing behavior.

  was: Right now, Ignite has a static pageIo resolver, which does not allow 
substituting a different implementation if needed. So the current 
implementation needs to be rewritten to make that possible.


> Rewrite PageIo resolver from static to explicit dependency
> --
>
> Key: IGNITE-13684
> URL: https://issues.apache.org/jira/browse/IGNITE-13684
> Project: Ignite
>  Issue Type: Sub-task
>Reporter: Anton Kalashnikov
>Assignee: Ivan Bessonov
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Right now, Ignite has a static pageIo resolver, which does not allow 
> substituting a different implementation if needed. So the current 
> implementation needs to be rewritten to make that possible.
> Changes are:
> * static constant for cache group meta page;
> * PageStore allocation tracker replaced with a more generic LongConsumer to 
> decouple it from metrics framework;
> * PageReadWriteManager added to basically allow having same cache group in 
> different data regions;
> * several methods and fields exposed as internally public/protected API;
> * several inner classes refactored so that they become static classes;
> * PageIOResolver interface created and used to make data structure more 
> flexible;
> * InsertLast interface for B+Tree added that will optimize comparisons on 
> inserts. Unused for now;
> * All this code doesn't affect existing behavior.





[jira] [Commented] (IGNITE-13682) Add generic to maintenance mode feature

2020-11-09 Thread Anton Kalashnikov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-13682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17228459#comment-17228459
 ] 

Anton Kalashnikov commented on IGNITE-13682:


[~sergey-chugunov] Can you please review and merge this?

> Add generic to maintenance mode feature
> ---
>
> Key: IGNITE-13682
> URL: https://issues.apache.org/jira/browse/IGNITE-13682
> Project: Ignite
>  Issue Type: Sub-task
>Reporter: Anton Kalashnikov
>Assignee: Anton Kalashnikov
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> MaintenanceAction has no generic type parameter right now, which leads to 
> parameterization problems.





[jira] [Updated] (IGNITE-13682) Add generic to maintenance mode feature

2020-11-09 Thread Anton Kalashnikov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-13682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kalashnikov updated IGNITE-13682:
---
Summary: Add generic to maintenance mode feature  (was: Added generic to 
maintenance mode feature)

> Add generic to maintenance mode feature
> ---
>
> Key: IGNITE-13682
> URL: https://issues.apache.org/jira/browse/IGNITE-13682
> Project: Ignite
>  Issue Type: Sub-task
>Reporter: Anton Kalashnikov
>Assignee: Anton Kalashnikov
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> MaintenanceAction has no generic type parameter right now, which leads to 
> parameterization problems.





[jira] [Created] (IGNITE-13684) Rewrite PageIo resolver from static to explicit dependency

2020-11-06 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-13684:
--

 Summary: Rewrite PageIo resolver from static to explicit dependency
 Key: IGNITE-13684
 URL: https://issues.apache.org/jira/browse/IGNITE-13684
 Project: Ignite
  Issue Type: Sub-task
Reporter: Anton Kalashnikov
Assignee: Ivan Bessonov


Right now, Ignite has a static pageIo resolver, which does not allow 
substituting a different implementation if needed. So the current 
implementation needs to be rewritten to make that possible.
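
Moving from a static resolver to an explicit dependency could look roughly 
like the sketch below. All names here (IoResolverSketch, TreeSketch, 
describePage) are hypothetical and only illustrate the direction; they are not 
the real Ignite types or signatures.

```java
/** Hypothetical resolver abstraction; not the real Ignite interface. */
interface IoResolverSketch {
    /** Maps a page type code to the name of the IO that handles it. */
    String resolve(int pageType);
}

/** Hypothetical data structure that receives its resolver explicitly. */
final class TreeSketch {
    /** Resolver passed in by the caller instead of a static lookup. */
    private final IoResolverSketch ioResolver;

    TreeSketch(IoResolverSketch ioResolver) {
        this.ioResolver = ioResolver;
    }

    /** Delegates to whichever resolver this instance was created with. */
    String describePage(int pageType) {
        return ioResolver.resolve(pageType);
    }
}
```

Because the resolver is a constructor argument, two instances in the same JVM 
can use different implementations, which a single static resolver cannot 
offer.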





[jira] [Created] (IGNITE-13683) Added MVCC validation to ValidateIndexesClosure

2020-11-06 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-13683:
--

 Summary: Added MVCC validation to ValidateIndexesClosure
 Key: IGNITE-13683
 URL: https://issues.apache.org/jira/browse/IGNITE-13683
 Project: Ignite
  Issue Type: Sub-task
Reporter: Anton Kalashnikov
Assignee: Semyon Danilov


MVCC indexes validation should be added to ValidateIndexesClosure





[jira] [Created] (IGNITE-13682) Added generic to maintenance mode feature

2020-11-06 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-13682:
--

 Summary: Added generic to maintenance mode feature
 Key: IGNITE-13682
 URL: https://issues.apache.org/jira/browse/IGNITE-13682
 Project: Ignite
  Issue Type: Sub-task
Reporter: Anton Kalashnikov
Assignee: Anton Kalashnikov


MaintenanceAction has no generic type parameter right now, which leads to 
parameterization problems.
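
One reading of the problem is that, without a type parameter, callers have to 
cast the result of an action. A sketch of a parameterized variant, assuming 
hypothetical names (MaintenanceActionSketch, CheckFilesActionSketch) that are 
not the real Ignite interface:

```java
/** Hypothetical generic action; the real interface may differ. */
interface MaintenanceActionSketch<T> {
    /** Executes the action and returns a typed result. */
    T execute();

    /** Short name of the action. */
    String name();
}

/** Example action whose result is a Boolean, so callers need no cast. */
final class CheckFilesActionSketch implements MaintenanceActionSketch<Boolean> {
    @Override public Boolean execute() {
        return Boolean.TRUE; // Placeholder result for illustration only.
    }

    @Override public String name() {
        return "check_files";
    }
}
```

With the type parameter in place, `Boolean res = action.execute();` compiles 
without a cast or an unchecked warning.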





[jira] [Created] (IGNITE-13681) Non markers checkpoint implementation

2020-11-06 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-13681:
--

 Summary: Non markers checkpoint implementation
 Key: IGNITE-13681
 URL: https://issues.apache.org/jira/browse/IGNITE-13681
 Project: Ignite
  Issue Type: Sub-task
Reporter: Anton Kalashnikov
Assignee: Anton Kalashnikov


A new, simpler version of checkpoint needs to be implemented. The main 
differences compared to the current checkpoint:
* It doesn't perform any write operations to the WAL.
* It doesn't create checkpoint markers.
* It should be possible to configure a checkpoint listener only on a specific 
data region.
This checkpoint will be helpful for defragmentation and for recovery (it is not 
possible to use the current checkpoint during recovery right now).





[jira] [Commented] (IGNITE-13366) Special mode for maintenance of Ignite node. Employing Maintenance Mode for clearing corrupted PDS files.

2020-10-14 Thread Anton Kalashnikov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-13366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213750#comment-17213750
 ] 

Anton Kalashnikov commented on IGNITE-13366:


[~sergeychugunov] ok, let's implement it a little later. LGTM.

> Special mode for maintenance of Ignite node. Employing Maintenance Mode for 
> clearing corrupted PDS files.
> -
>
> Key: IGNITE-13366
> URL: https://issues.apache.org/jira/browse/IGNITE-13366
> Project: Ignite
>  Issue Type: New Feature
>  Components: persistence
>Affects Versions: 2.8.1
>Reporter: Sergey Chugunov
>Assignee: Sergey Chugunov
>Priority: Critical
>  Labels: IEP-53
> Fix For: 2.10
>
>   Original Estimate: 168h
>  Time Spent: 1h 40m
>  Remaining Estimate: 166h 20m
>
> If a node with persistence is stopped while WAL is disabled for a cache (no 
> matter whether because of rebalancing in progress or by explicit user 
> request), on the next node start all data files of that cache are removed 
> automatically and unconditionally.
> This behavior may be unexpected for users as they may not understand all 
> consequences of disabling WAL locally (for rebalancing) or globally (via 
> IgniteCluster API call). Also it is not smart enough as there is no point in 
> deleting consistent data files.
> We should change this behavior to the following: no automatic deletions 
> whatsoever. If data files are consistent (equivalent to: no checkpoint was 
> running when node was stopped) start up normally. If data files are 
> corrupted, don't let the node start.





[jira] [Assigned] (IGNITE-12489) Error during purges by expiration: Unknown page type

2020-10-13 Thread Anton Kalashnikov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kalashnikov reassigned IGNITE-12489:
--

Assignee: (was: Anton Kalashnikov)

> Error during purges by expiration: Unknown page type
> 
>
> Key: IGNITE-12489
> URL: https://issues.apache.org/jira/browse/IGNITE-12489
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.7, 2.7.6
>Reporter: Ruslan Kamashev
>Priority: Blocker
> Fix For: 2.10
>
>
> {{*logger*}}
> {code:java}
> org.apache.ignite.internal.processors.cache.GridCacheIoManager
> {code}
> {{*message*}}
> {code:java}
> Failed to process message [senderId=969d56ba-4b46-40cf-886e-ac445cf6a95d, 
> messageType=class 
> o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicUpdateRequest]{code}
> {{*thread*}}
> {code:java}
> sys-stripe-19-#20{code}
> {{*trace*}}
> {code:java}
> java.lang.IllegalStateException: Unknown page type: 1 pageId: 00010303117d
>   at 
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.io(BPlusTree.java:5058)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.access$200(BPlusTree.java:90)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$AbstractForwardCursor.nextPage(BPlusTree.java:5330)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$ForwardCursor.next(BPlusTree.java:5566)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.purgeExpiredInternal(GridCacheOffheapManager.java:2232)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.purgeExpired(GridCacheOffheapManager.java:2157)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager.expire(GridCacheOffheapManager.java:845)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheTtlManager.expire(GridCacheTtlManager.java:207)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheUtils.unwindEvicts(GridCacheUtils.java:888)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessageProcessed(GridCacheIoManager.java:1103)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1076)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:581)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:380)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:306)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:101)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:295)
>   at 
> org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1569)
>   at 
> org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1197)
>   at 
> org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:127)
>   at 
> org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1093)
>   at 
> org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:505)
>   at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
>   at java.lang.Thread.run(Thread.java:748)
>   Dec 23, 2019 @ 18:28:28.457 {code}





[jira] [Commented] (IGNITE-13366) Special mode for maintenance of Ignite node. Employing Maintenance Mode for clearing corrupted PDS files.

2020-10-12 Thread Anton Kalashnikov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-13366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212401#comment-17212401
 ] 

Anton Kalashnikov commented on IGNITE-13366:


In general, it looks good to me. But I have several questions:
* I noticed that you rewrite the file when a new record is added. Did you 
consider a copy-on-write approach with a temp file?
* Your maintenanceId is a UUID right now. Maybe it would be better to use 
something more human-readable?
* You start the autoAction (mntcProcessor.prepareAndExecuteMaintenance();) 
before discovery is started. I don't have the right answer for it, but are 
you sure it is the right place for it? Don't we want to call this method in 
another thread (not started one) after the node has entirely started?
* Do we want to add some version for the maintenance record store file? Maybe 
we should add it to the name of the file?
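
The copy-on-write approach with a temp file from the first question could be 
sketched as follows. This is an illustration only: TempFileRewriteSketch, the 
".tmp" suffix and the file name are assumptions, not what the pull request 
does. The sketch also does not fsync the temp file before the move, which a 
durable implementation would likely need.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration; file layout and names are assumptions. */
final class TempFileRewriteSketch {
    /** Writes all records to a sibling temp file, then moves it over the target. */
    static void rewrite(Path target, List<String> records) {
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");

        try {
            Files.write(tmp, records, StandardCharsets.UTF_8);

            // Whether an existing target is replaced by ATOMIC_MOVE is
            // implementation specific; POSIX rename semantics replace it.
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Rewrites a file twice in a temp directory and returns its final content. */
    static List<String> roundTrip() {
        try {
            Path dir = Files.createTempDirectory("mntc-sketch");
            Path target = dir.resolve("maintenance_records");

            rewrite(target, Arrays.asList("record-1"));
            rewrite(target, Arrays.asList("record-1", "record-2"));

            return Files.readAllLines(target, StandardCharsets.UTF_8);
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The point of the design is that a crash in the middle of writing leaves the 
previous file intact, since readers only ever see the old file or the fully 
written new one.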

> Special mode for maintenance of Ignite node. Employing Maintenance Mode for 
> clearing corrupted PDS files.
> -
>
> Key: IGNITE-13366
> URL: https://issues.apache.org/jira/browse/IGNITE-13366
> Project: Ignite
>  Issue Type: New Feature
>  Components: persistence
>Affects Versions: 2.8.1
>Reporter: Sergey Chugunov
>Assignee: Sergey Chugunov
>Priority: Critical
>  Labels: IEP-53
> Fix For: 2.10
>
>   Original Estimate: 168h
>  Time Spent: 1h 40m
>  Remaining Estimate: 166h 20m
>
> If a node with persistence is stopped while WAL is disabled for a cache (no 
> matter whether because of rebalancing in progress or by explicit user 
> request), on the next node start all data files of that cache are removed 
> automatically and unconditionally.
> This behavior may be unexpected for users as they may not understand all 
> consequences of disabling WAL locally (for rebalancing) or globally (via 
> IgniteCluster API call). Also it is not smart enough as there is no point in 
> deleting consistent data files.
> We should change this behavior to the following: no automatic deletions 
> whatsoever. If data files are consistent (equivalent to: no checkpoint was 
> running when node was stopped) start up normally. If data files are 
> corrupted, don't let the node start.





[jira] [Commented] (IGNITE-13500) Checkpoint read lock fail if it is taking under write lock during the stopping node

2020-10-09 Thread Anton Kalashnikov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-13500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17211041#comment-17211041
 ] 

Anton Kalashnikov commented on IGNITE-13500:


[~sergeychugunov] please, take a look at these changes. They should fix the 
problem with BasicIndexTest#testInlineSizeChange

> Checkpoint read lock fail if it is taking under write lock during the 
> stopping node
> ---
>
> Key: IGNITE-13500
> URL: https://issues.apache.org/jira/browse/IGNITE-13500
> Project: Ignite
>  Issue Type: Bug
>Reporter: Anton Kalashnikov
>Assignee: Anton Kalashnikov
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> org.apache.ignite.internal.processors.cache.index.BasicIndexTest#testDynamicIndexesDropWithPersistence
> {noformat}
> [2020-09-30 
> 15:09:26,085][ERROR][db-checkpoint-thread-#371%index.BasicIndexTest0%][Checkpointer]
>  Runtime error caught during grid runnable execution: GridWorker 
> [name=db-checkpoint-thread, igniteInstanceName=index.BasicIndexTest0, 
> finished=false, heartbeatTs=1601467766063, hashCode=963964001, 
> interrupted=false, runner=db-checkpoint-thread-#371%index.BasicIndexTest0%]
> class org.apache.ignite.IgniteException: Failed to perform cache update: node 
> is stopping.
>   at 
> org.apache.ignite.internal.processors.cache.persistence.checkpoint.Checkpointer.doCheckpoint(Checkpointer.java:396)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.checkpoint.Checkpointer.body(Checkpointer.java:263)
>   at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: class org.apache.ignite.IgniteException: Failed to perform cache 
> update: node is stopping.
>   at 
> org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointTimeoutLock.checkpointReadLock(CheckpointTimeoutLock.java:128)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.checkpointReadLock(GridCacheDatabaseSharedManager.java:1298)
>   at 
> org.apache.ignite.internal.processors.localtask.DurableBackgroundTasksProcessor.removeDurableBackgroundTask(DurableBackgroundTasksProcessor.java:245)
>   at 
> org.apache.ignite.internal.processors.localtask.DurableBackgroundTasksProcessor.onMarkCheckpointBegin(DurableBackgroundTasksProcessor.java:277)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointWorkflow.markCheckpointBegin(CheckpointWorkflow.java:274)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.checkpoint.Checkpointer.doCheckpoint(Checkpointer.java:387)
>   ... 3 more
> Caused by: class org.apache.ignite.internal.NodeStoppingException: Failed to 
> perform cache update: node is stopping.
>   ... 9 more
> {noformat}





[jira] [Created] (IGNITE-13569) disable archiving + walCompactionEnabled probably broke reading from wal on server restart

2020-10-09 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-13569:
--

 Summary: disable archiving + walCompactionEnabled probably broke 
reading from wal on server restart
 Key: IGNITE-13569
 URL: https://issues.apache.org/jira/browse/IGNITE-13569
 Project: Ignite
  Issue Type: Bug
Reporter: Anton Kalashnikov
Assignee: Anton Kalashnikov


* Start a cluster with 4 server nodes
* Preload
* Start 4 clients 
* Start transactional loading
* Wait 10 sec
While loading:
For node in server nodes:
   Kill -9 node
   Wait 20 sec
   Return node back
   Wait 20 sec

Wal + Wal_archive - lab40, lab41 - 
/storage/hdd/aromantsov/GG-18739

It looks like the node can't read all the WAL files that were generated before 
the node was started again

{noformat}
[12:50:27,001][SEVERE][wal-file-compressor-%null%-1-#71][FileWriteAheadLogManager]
 Compression of WAL segment [idx=0] was skipped due to unexpected error
class org.apache.ignite.IgniteCheckedException: WAL archive segment is missing: 
/storage/ssd/aromantsov/tiden/snapshots-190514-121520/test_pitr/node_1_1/.wal
at 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileCompressorWorker.body0(FileWriteAheadLogManager.java:2076)
at 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileCompressorWorker.body(FileWriteAheadLogManager.java:2054)
at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
at java.lang.Thread.run(Thread.java:748)
[12:50:27,001][SEVERE][wal-file-compressor-%null%-0-#69][FileWriteAheadLogManager]
 Compression of WAL segment [idx=2] was skipped due to unexpected error
class org.apache.ignite.IgniteCheckedException: WAL archive segment is missing: 
/storage/ssd/aromantsov/tiden/snapshots-190514-121520/test_pitr/node_1_1/0002.wal
at 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileCompressorWorker.body0(FileWriteAheadLogManager.java:2076)
at 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileCompressorWorker.access$4800(FileWriteAheadLogManager.java:2019)
at 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileCompressor.body(FileWriteAheadLogManager.java:1995)
at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
at java.lang.Thread.run(Thread.java:748)
[12:50:27,001][SEVERE][wal-file-compressor-%null%-3-#73][FileWriteAheadLogManager]
 Compression of WAL segment [idx=3] was skipped due to unexpected error
class org.apache.ignite.IgniteCheckedException: WAL archive segment is missing: 
/storage/ssd/aromantsov/tiden/snapshots-190514-121520/test_pitr/node_1_1/0003.wal
at 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileCompressorWorker.body0(FileWriteAheadLogManager.java:2076)
at 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileCompressorWorker.body(FileWriteAheadLogManager.java:2054)
at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
at java.lang.Thread.run(Thread.java:748)
[12:50:27,001][SEVERE][wal-file-compressor-%null%-2-#72][FileWriteAheadLogManager]
 Compression of WAL segment [idx=1] was skipped due to unexpected error
class org.apache.ignite.IgniteCheckedException: WAL archive segment is missing: 
/storage/ssd/aromantsov/tiden/snapshots-190514-121520/test_pitr/node_1_1/0001.wal
at 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileCompressorWorker.body0(FileWriteAheadLogManager.java:2076)
at 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileCompressorWorker.body(FileWriteAheadLogManager.java:2054)
at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
at java.lang.Thread.run(Thread.java:748)
[12:50:27,002][SEVERE][wal-file-compressor-%null%-1-#71][FileWriteAheadLogManager]
 Compression of WAL segment [idx=4] was skipped due to unexpected error
class org.apache.ignite.IgniteCheckedException: WAL archive segment is missing: 
/storage/ssd/aromantsov/tiden/snapshots-190514-121520/test_pitr/node_1_1/0004.wal
at 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileCompressorWorker.body0(FileWriteAheadLogManager.java:2076)
at 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileCompressorWorker.body(FileWriteAheadLogManager.java:2054)
at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
at java.lang.Thread.run(Thread.java:748)
[12:50:27,002][SEVERE][wal-file-compressor-%null%-0-#69][FileWriteAheadLogManager]
 

[jira] [Commented] (IGNITE-13565) Potential further bugs with DurableBackgroundTasks.

2020-10-09 Thread Anton Kalashnikov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-13565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17210785#comment-17210785
 ] 

Anton Kalashnikov commented on IGNITE-13565:


In my opinion, this is not a potential bug; it is already a bug. It looks like 
if a DurableBackgroundTask finishes but its status isn't updated in the 
metastore, it leads to data corruption. Finishing a DurableBackgroundTask and 
changing its status in the metastore is not an atomic operation, so nobody can 
guarantee that the node doesn't fail between these two actions. Perhaps we need 
to add some atomic operation for detecting that the DurableBackgroundTask has 
finished (maybe we should write something to the WAL).
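
One way to picture the suggestion above is a toy model in which the task appends a "finished" record to a log as the last step of its own work, and recovery consults that log before re-running anything. This is only a sketch under stated assumptions: the names DurableTaskRecoverySketch, logFinished, markCompleted and tasksToRerun are hypothetical and are not Ignite APIs, the in-memory list merely stands in for the WAL, and the sketch assumes the log record can be written atomically with the task's final change.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model only: none of these types or methods are Ignite classes (all names are hypothetical). */
final class DurableTaskRecoverySketch {
    /** Stand-in for a write-ahead log: names of tasks whose work has finished. */
    private final List<String> log = new ArrayList<>();

    /** Stand-in for the metastore: task name -> "completed" flag. */
    private final Map<String, Boolean> metastore = new HashMap<>();

    /** Registers a task that has not completed yet. */
    void register(String task) {
        metastore.put(task, false);
    }

    /** Step 1: the task appends a "finished" record as the last action of its own work. */
    void logFinished(String task) {
        log.add(task);
    }

    /** Step 2: the status update, which never happens if the node fails right after step 1. */
    void markCompleted(String task) {
        metastore.put(task, true);
    }

    /** On restart, a task is re-run only if neither the metastore nor the log says it finished. */
    Set<String> tasksToRerun() {
        Set<String> finishedInLog = new HashSet<>(log);
        Set<String> res = new HashSet<>();

        for (Map.Entry<String, Boolean> e : metastore.entrySet()) {
            if (!e.getValue() && !finishedInLog.contains(e.getKey()))
                res.add(e.getKey());
        }

        return res;
    }
}
```

In this model a crash between logFinished and markCompleted no longer causes the task to run twice, because recovery treats the log record as the source of truth.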

> Potential further bugs with DurableBackgroundTasks.
> ---
>
> Key: IGNITE-13565
> URL: https://issues.apache.org/jira/browse/IGNITE-13565
> Project: Ignite
>  Issue Type: Bug
>  Components: general
>Affects Versions: 2.8.1
>Reporter: Stanilovsky Evgeny
>Priority: Major
>
> After some code refactoring [1] we got a problem with a simple test: 
> org.apache.ignite.internal.processors.cache.index.BasicIndexTest#testInlineSizeChange
> between the
> {noformat}
> execSql(cache, "drop index \"idx1\"");
> {noformat}
> and
> {noformat}
> ig0 = startGrid(0);
> {noformat}
> operations. It seems [2] will fix it, but the problem could potentially happen 
> again (check the attached stacks). In short, an already completed durable task 
> has not updated its
> {noformat}
> DurableBackgroundTask#complete
> {noformat}
> status in the metastore, so once the cluster is running again this task can 
> still try to run once more, with undefined behavior. [~Denis Chudov], 
> [~makedonskaya], please take a look.
> [1] https://issues.apache.org/jira/browse/IGNITE-13207
> [2] https://issues.apache.org/jira/browse/IGNITE-13500
> {noformat}
> 2020-10-09 11:42:41,982][INFO ][test-runner-#1%index.BasicIndexTest%][root] 
> >>> Stopping grid [name=index.BasicIndexTest0, 
> id=161e62a2-1a5d-46b0-892d-2e0274e0]
> [2020-10-09 
> 11:42:41,999][ERROR][db-checkpoint-thread-#61%index.BasicIndexTest0%][root] 
> Critical system error detected. Will be handled accordingly to configured 
> handler [hnd=StopNodeFailureHandler [super=AbstractFailureHandler 
> [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, 
> SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext 
> [type=CRITICAL_ERROR, err=class o.a.i.IgniteException: Failed to perform 
> cache update: node is stopping.]]
> class org.apache.ignite.IgniteException: Failed to perform cache update: node 
> is stopping.
>   at 
> org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointTimeoutLock.checkpointReadLock(CheckpointTimeoutLock.java:125)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.checkpointReadLock(GridCacheDatabaseSharedManager.java:1297)
>   at 
> org.apache.ignite.internal.processors.localtask.DurableBackgroundTasksProcessor.removeDurableBackgroundTask(DurableBackgroundTasksProcessor.java:245)
>   at 
> org.apache.ignite.internal.processors.localtask.DurableBackgroundTasksProcessor.onMarkCheckpointBegin(DurableBackgroundTasksProcessor.java:277)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointWorkflow.markCheckpointBegin(CheckpointWorkflow.java:274)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.checkpoint.Checkpointer.doCheckpoint(Checkpointer.java:387)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.checkpoint.Checkpointer.body(Checkpointer.java:263)
>   at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
>   at java.lang.Thread.run(Thread.java:748)
> ...
> starting grid and ...
> java.lang.AssertionError: calculatedOffset=49152, allocated=45056, 
> headerSize=4096, 
> cfgFile=/work/repo/apache-ignite/work/db/index_BasicIndexTest0/cache-default/index.bin
> >>> +---+
> >>> Ignite ver. 2.10.0-SNAPSHOT#20201009-sha1:DEV
> >>> +---+
>   at 
> org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore.read(FilePageStore.java:492)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.read(FilePageStoreManager.java:554)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.read(FilePageStoreManager.java:538)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.acquirePage(PageMemoryImpl.java:884)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.acquirePage(PageMemoryImpl.java:710)
>   at 
> 

[jira] [Created] (IGNITE-13562) Prototype dynamic configuration

2020-10-08 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-13562:
--

 Summary: Prototype dynamic configuration
 Key: IGNITE-13562
 URL: https://issues.apache.org/jira/browse/IGNITE-13562
 Project: Ignite
  Issue Type: Sub-task
Reporter: Anton Kalashnikov
Assignee: Semyon Danilov


The main goal is to add a new configuration module with a framework that 
allows us to create dynamic properties (node-local and cluster-wide?).

The framework should provide the following:
* A way to describe the schema rules from which public and private property 
classes are generated
* Generation of public and private classes from the schema
* A public POJO view (update/insert/get) for interacting with 
properties in a type-safe way 
* Conversion of properties from HOCON to the internal representation







--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13511) Unified configuration

2020-10-02 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-13511:
--

 Summary: Unified configuration
 Key: IGNITE-13511
 URL: https://issues.apache.org/jira/browse/IGNITE-13511
 Project: Ignite
  Issue Type: New Feature
Reporter: Anton Kalashnikov


https://cwiki.apache.org/confluence/display/IGNITE/IEP-55+Unified+Configuration





[jira] [Commented] (IGNITE-12489) Error during purges by expiration: Unknown page type

2020-09-30 Thread Anton Kalashnikov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17204722#comment-17204722
 ] 

Anton Kalashnikov commented on IGNITE-12489:


[~xdang] Can you please provide more details about your 
configuration (mostly the CacheConfiguration) and your load profile (what types 
of requests you have)? Also, it would help a lot if you are able to write a 
reproducer for this scenario (though I suppose that's not easy to do).

> Error during purges by expiration: Unknown page type
> 
>
> Key: IGNITE-12489
> URL: https://issues.apache.org/jira/browse/IGNITE-12489
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.7, 2.7.6
>Reporter: Ruslan Kamashev
>Assignee: Anton Kalashnikov
>Priority: Blocker
> Fix For: 2.10
>
>
> {{*logger*}}
> {code:java}
> org.apache.ignite.internal.processors.cache.GridCacheIoManager
> {code}
> {{*message*}}
> {code:java}
> Failed to process message [senderId=969d56ba-4b46-40cf-886e-ac445cf6a95d, 
> messageType=class 
> o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicUpdateRequest]{code}
> {{*thread*}}
> {code:java}
> sys-stripe-19-#20{code}
> {{*trace*}}
> {code:java}
> java.lang.IllegalStateException: Unknown page type: 1 pageId: 00010303117d
>   at 
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.io(BPlusTree.java:5058)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.access$200(BPlusTree.java:90)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$AbstractForwardCursor.nextPage(BPlusTree.java:5330)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$ForwardCursor.next(BPlusTree.java:5566)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.purgeExpiredInternal(GridCacheOffheapManager.java:2232)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.purgeExpired(GridCacheOffheapManager.java:2157)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager.expire(GridCacheOffheapManager.java:845)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheTtlManager.expire(GridCacheTtlManager.java:207)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheUtils.unwindEvicts(GridCacheUtils.java:888)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessageProcessed(GridCacheIoManager.java:1103)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1076)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:581)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:380)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:306)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:101)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:295)
>   at 
> org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1569)
>   at 
> org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1197)
>   at 
> org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:127)
>   at 
> org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1093)
>   at 
> org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:505)
>   at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
>   at java.lang.Thread.run(Thread.java:748)
>   Dec 23, 2019 @ 18:28:28.457 {code}





[jira] [Assigned] (IGNITE-13500) Checkpoint read lock fail if it is taking under write lock during the stopping node

2020-09-30 Thread Anton Kalashnikov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-13500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kalashnikov reassigned IGNITE-13500:
--

Assignee: Anton Kalashnikov

> Checkpoint read lock fail if it is taking under write lock during the 
> stopping node
> ---
>
> Key: IGNITE-13500
> URL: https://issues.apache.org/jira/browse/IGNITE-13500
> Project: Ignite
>  Issue Type: Bug
>Reporter: Anton Kalashnikov
>Assignee: Anton Kalashnikov
>Priority: Major
>
> org.apache.ignite.internal.processors.cache.index.BasicIndexTest#testDynamicIndexesDropWithPersistence
> {noformat}
> [2020-09-30 
> 15:09:26,085][ERROR][db-checkpoint-thread-#371%index.BasicIndexTest0%][Checkpointer]
>  Runtime error caught during grid runnable execution: GridWorker 
> [name=db-checkpoint-thread, igniteInstanceName=index.BasicIndexTest0, 
> finished=false, heartbeatTs=1601467766063, hashCode=963964001, 
> interrupted=false, runner=db-checkpoint-thread-#371%index.BasicIndexTest0%]
> class org.apache.ignite.IgniteException: Failed to perform cache update: node 
> is stopping.
>   at 
> org.apache.ignite.internal.processors.cache.persistence.checkpoint.Checkpointer.doCheckpoint(Checkpointer.java:396)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.checkpoint.Checkpointer.body(Checkpointer.java:263)
>   at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: class org.apache.ignite.IgniteException: Failed to perform cache 
> update: node is stopping.
>   at 
> org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointTimeoutLock.checkpointReadLock(CheckpointTimeoutLock.java:128)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.checkpointReadLock(GridCacheDatabaseSharedManager.java:1298)
>   at 
> org.apache.ignite.internal.processors.localtask.DurableBackgroundTasksProcessor.removeDurableBackgroundTask(DurableBackgroundTasksProcessor.java:245)
>   at 
> org.apache.ignite.internal.processors.localtask.DurableBackgroundTasksProcessor.onMarkCheckpointBegin(DurableBackgroundTasksProcessor.java:277)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointWorkflow.markCheckpointBegin(CheckpointWorkflow.java:274)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.checkpoint.Checkpointer.doCheckpoint(Checkpointer.java:387)
>   ... 3 more
> Caused by: class org.apache.ignite.internal.NodeStoppingException: Failed to 
> perform cache update: node is stopping.
>   ... 9 more
> {noformat}





[jira] [Created] (IGNITE-13500) Checkpoint read lock fail if it is taking under write lock during the stopping node

2020-09-30 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-13500:
--

 Summary: Checkpoint read lock fail if it is taking under write 
lock during the stopping node
 Key: IGNITE-13500
 URL: https://issues.apache.org/jira/browse/IGNITE-13500
 Project: Ignite
  Issue Type: Bug
Reporter: Anton Kalashnikov


org.apache.ignite.internal.processors.cache.index.BasicIndexTest#testDynamicIndexesDropWithPersistence

{noformat}
[2020-09-30 
15:09:26,085][ERROR][db-checkpoint-thread-#371%index.BasicIndexTest0%][Checkpointer]
 Runtime error caught during grid runnable execution: GridWorker 
[name=db-checkpoint-thread, igniteInstanceName=index.BasicIndexTest0, 
finished=false, heartbeatTs=1601467766063, hashCode=963964001, 
interrupted=false, runner=db-checkpoint-thread-#371%index.BasicIndexTest0%]
class org.apache.ignite.IgniteException: Failed to perform cache update: node 
is stopping.
at 
org.apache.ignite.internal.processors.cache.persistence.checkpoint.Checkpointer.doCheckpoint(Checkpointer.java:396)
at 
org.apache.ignite.internal.processors.cache.persistence.checkpoint.Checkpointer.body(Checkpointer.java:263)
at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
at java.lang.Thread.run(Thread.java:748)
Caused by: class org.apache.ignite.IgniteException: Failed to perform cache 
update: node is stopping.
at 
org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointTimeoutLock.checkpointReadLock(CheckpointTimeoutLock.java:128)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.checkpointReadLock(GridCacheDatabaseSharedManager.java:1298)
at 
org.apache.ignite.internal.processors.localtask.DurableBackgroundTasksProcessor.removeDurableBackgroundTask(DurableBackgroundTasksProcessor.java:245)
at 
org.apache.ignite.internal.processors.localtask.DurableBackgroundTasksProcessor.onMarkCheckpointBegin(DurableBackgroundTasksProcessor.java:277)
at 
org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointWorkflow.markCheckpointBegin(CheckpointWorkflow.java:274)
at 
org.apache.ignite.internal.processors.cache.persistence.checkpoint.Checkpointer.doCheckpoint(Checkpointer.java:387)
... 3 more
Caused by: class org.apache.ignite.internal.NodeStoppingException: Failed to 
perform cache update: node is stopping.
... 9 more
{noformat}
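
The stack above shows a checkpoint listener asking for the checkpoint read lock while the checkpointer thread already holds the write lock, and the request being rejected because the node is stopping. A minimal sketch of one possible guard follows, assuming the lock is backed by java.util.concurrent.locks.ReentrantReadWriteLock; the class CheckpointLockSketch and its methods are hypothetical stand-ins, not the Ignite implementation and not necessarily the fix applied in this ticket.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Toy model only: a hypothetical stand-in for the checkpoint lock, not Ignite code. */
final class CheckpointLockSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    /** Set when the node begins to stop. */
    private volatile boolean stopping;

    void stop() {
        stopping = true;
    }

    void writeLock() {
        lock.writeLock().lock();
    }

    void writeUnlock() {
        lock.writeLock().unlock();
    }

    /**
     * Acquires the read lock unless the current thread already holds the write lock.
     *
     * @return true if the read lock was really taken and must be released later.
     */
    boolean readLock() {
        // A thread that owns the write lock is already protected, so skip the
        // "node is stopping" check and the acquisition altogether.
        if (lock.isWriteLockedByCurrentThread())
            return false;

        if (stopping)
            throw new IllegalStateException("Failed to perform cache update: node is stopping.");

        lock.readLock().lock();

        return true;
    }

    /** Releases the read lock only if readLock() reported that it was taken. */
    void readUnlock(boolean acquired) {
        if (acquired)
            lock.readLock().unlock();
    }
}
```

With this guard, a listener invoked under the write lock during node stop gets a no-op read lock instead of an exception, while ordinary threads are still rejected once the node is stopping.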





[jira] [Updated] (IGNITE-13207) Checkpointer code refactoring: Splitting GridCacheDatabaseSharedManager ant Checkpointer

2020-09-23 Thread Anton Kalashnikov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-13207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kalashnikov updated IGNITE-13207:
---
Description: 
The main goal of this ticket is to make it possible to reuse all or part 
of the checkpoint classes in a different way (e.g., a lightweight checkpoint 
during defragmentation).

What was done in this ticket:

New classes:
* CheckpointReadWriteLock - encapsulates the checkpoint-specific lock logic
* CheckpointTimeoutLock - a read lock with a timeout that is able to trigger a 
new checkpoint if needed
* CheckpointMakersStorage - encapsulates work with the checkpoint 
markers: writing to/reading from disk and caching the actual markers
* CheckpointWorkflow - encapsulates the checkpoint steps, such as checkpoint 
begin and checkpoint end
* CheckpointManager - the entry point of the checkpoint. It is responsible for 
consistent initialization of all checkpoint-related components and provides 
an API for interacting with them.
* WorkProgressDispatcher - an interface for managing a worker's heartbeat

Renamed classes:
* DbCheckpointListener -> CheckpointListener (it was also moved to the 
checkpoint package)
* WriteCheckpointPages -> CheckpointPagesWriter
* DbCheckpointContextImpl -> CheckpointContextImpl

Logical changes:
* asyncRunner (the checkpoint runner thread pool) was replaced by two pools: 
checkpointWritePagesPool (an IO-bound thread pool for writing dirty pages to 
disk) and checkpointCollectPagesInfoPool (a CPU-bound thread pool for 
collecting dirty pages from memory)
* The method afterCheckpointEnd was added to CheckpointListener

  was:
The main goal of this ticket is to make it possible to reuse all or part 
of the checkpoint classes in a different way (e.g., a lightweight checkpoint 
during defragmentation).

What was done in this ticket:

New classes:
* CheckpointReadWriteLock - encapsulates the checkpoint-specific lock logic
* CheckpointTimeoutLock - a read lock with a timeout that is able to trigger a 
new checkpoint if needed
* CheckpointStorage - encapsulates work with the checkpoint markers: 
writing to/reading from disk and caching the actual markers
* CheckpointProcess - encapsulates the checkpoint steps, such as checkpoint 
begin and checkpoint end
* CheckpointManager - the entry point of the checkpoint. It is responsible for 
consistent initialization of all checkpoint-related components and provides 
an API for interacting with them.
* WorkProgressDispatcher - an interface for managing a worker's heartbeat

Renamed classes:
* DbCheckpointListener -> CheckpointListener (it was also moved to the 
checkpoint package)
* WriteCheckpointPages -> CheckpointPagesWriter
* DbCheckpointContextImpl -> CheckpointContextImpl

Logical changes:
* asyncRunner (the checkpoint runner thread pool) was replaced by two pools: 
checkpointWritePagesPool (an IO-bound thread pool for writing dirty pages to 
disk) and checkpointCollectPagesInfoPool (a CPU-bound thread pool for 
collecting dirty pages from memory)
* The method afterCheckpointEnd was added to CheckpointListener


> Checkpointer code refactoring: Splitting GridCacheDatabaseSharedManager ant 
> Checkpointer
> 
>
> Key: IGNITE-13207
> URL: https://issues.apache.org/jira/browse/IGNITE-13207
> Project: Ignite
>  Issue Type: Sub-task
>Reporter: Anton Kalashnikov
>Assignee: Anton Kalashnikov
>Priority: Major
>  Labels: IEP-47
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The main goal of this ticket is to make it possible to reuse all or 
> part of the checkpoint classes in a different way (e.g., a lightweight 
> checkpoint during defragmentation).
> What was done in this ticket:
> New classes:
> * CheckpointReadWriteLock - encapsulates the checkpoint-specific lock logic
> * CheckpointTimeoutLock - a read lock with a timeout that is able to trigger a 
> new checkpoint if needed
> * CheckpointMakersStorage - encapsulates work with the checkpoint 
> markers: writing to/reading from disk and caching the actual markers
> * CheckpointWorkflow - encapsulates the checkpoint steps, such as checkpoint 
> begin and checkpoint end
> * CheckpointManager - the entry point of the checkpoint. It is responsible for 
> consistent initialization of all checkpoint-related components and 
> provides an API for interacting with them.
> * WorkProgressDispatcher - an interface for managing a worker's heartbeat
> Renamed classes:
> * DbCheckpointListener -> CheckpointListener (it was also moved to the 
> checkpoint package)
> * WriteCheckpointPages -> CheckpointPagesWriter
> * DbCheckpointContextImpl -> CheckpointContextImpl
> Logical changes:
> * asyncRunner (the checkpoint runner thread pool) was replaced by two pools: 
> checkpointWritePagesPool (an IO-bound thread pool for writing dirty pages to 
> disk) and 

[jira] [Commented] (IGNITE-13435) Fixing some unrecorded issues command warm-up control.sh

2020-09-17 Thread Anton Kalashnikov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-13435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17197571#comment-17197571
 ] 

Anton Kalashnikov commented on IGNITE-13435:


[~ktkale...@gridgain.com] LGTM.

> Fixing some unrecorded issues command warm-up control.sh
> 
>
> Key: IGNITE-13435
> URL: https://issues.apache.org/jira/browse/IGNITE-13435
> Project: Ignite
>  Issue Type: Bug
>  Components: control.sh
>Reporter: Kirill Tkalenko
>Assignee: Kirill Tkalenko
>Priority: Major
>  Labels: IEP-40
> Fix For: 2.10
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Unrecorded problems:
> * When parsing arguments for the warm-up command, subsequent arguments may be 
> skipped, such as auto-confirmation "--yes";
> * Processing requests for jetty;
> * Authorization.





[jira] [Updated] (IGNITE-13207) Checkpointer code refactoring: Splitting GridCacheDatabaseSharedManager ant Checkpointer

2020-09-11 Thread Anton Kalashnikov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-13207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kalashnikov updated IGNITE-13207:
---
Description: 
The main goal of this ticket is to make it possible to reuse all or part 
of the checkpoint classes in a different way (e.g., a lightweight checkpoint 
during defragmentation).

What was done in this ticket:

New classes:
* CheckpointReadWriteLock - encapsulates the checkpoint-specific lock logic
* CheckpointTimeoutLock - a read lock with a timeout that is able to trigger a 
new checkpoint if needed
* CheckpointStorage - encapsulates work with the checkpoint markers: 
writing to/reading from disk and caching the actual markers
* CheckpointProcess - encapsulates the checkpoint steps, such as checkpoint 
begin and checkpoint end
* CheckpointManager - the entry point of the checkpoint. It is responsible for 
consistent initialization of all checkpoint-related components and provides 
an API for interacting with them.
* WorkProgressDispatcher - an interface for managing a worker's heartbeat

Renamed classes:
* DbCheckpointListener -> CheckpointListener (it was also moved to the 
checkpoint package)
* WriteCheckpointPages -> CheckpointPagesWriter
* DbCheckpointContextImpl -> CheckpointContextImpl

Logical changes:
* asyncRunner (the checkpoint runner thread pool) was replaced by two pools: 
checkpointWritePagesPool (an IO-bound thread pool for writing dirty pages to 
disk) and checkpointCollectPagesInfoPool (a CPU-bound thread pool for 
collecting dirty pages from memory)
* The method afterCheckpointEnd was added to CheckpointListener

> Checkpointer code refactoring: Splitting GridCacheDatabaseSharedManager ant 
> Checkpointer
> 
>
> Key: IGNITE-13207
> URL: https://issues.apache.org/jira/browse/IGNITE-13207
> Project: Ignite
>  Issue Type: Sub-task
>Reporter: Anton Kalashnikov
>Assignee: Anton Kalashnikov
>Priority: Major
>  Labels: IEP-47
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The main goal of this ticket is to make it possible to reuse all or 
> part of the checkpoint classes in a different way (e.g., a lightweight 
> checkpoint during defragmentation).
> What was done in this ticket:
> New classes:
> * CheckpointReadWriteLock - encapsulates the checkpoint-specific lock logic
> * CheckpointTimeoutLock - a read lock with a timeout that is able to trigger a 
> new checkpoint if needed
> * CheckpointStorage - encapsulates work with the checkpoint markers: 
> writing to/reading from disk and caching the actual markers
> * CheckpointProcess - encapsulates the checkpoint steps, such as checkpoint 
> begin and checkpoint end
> * CheckpointManager - the entry point of the checkpoint. It is responsible for 
> consistent initialization of all checkpoint-related components and 
> provides an API for interacting with them.
> * WorkProgressDispatcher - an interface for managing a worker's heartbeat
> Renamed classes:
> * DbCheckpointListener -> CheckpointListener (it was also moved to the 
> checkpoint package)
> * WriteCheckpointPages -> CheckpointPagesWriter
> * DbCheckpointContextImpl -> CheckpointContextImpl
> Logical changes:
> * asyncRunner (the checkpoint runner thread pool) was replaced by two pools: 
> checkpointWritePagesPool (an IO-bound thread pool for writing dirty pages to 
> disk) and checkpointCollectPagesInfoPool (a CPU-bound thread pool for 
> collecting dirty pages from memory)
> * The method afterCheckpointEnd was added to CheckpointListener
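
The split of the single runner pool into a CPU-bound pool and an IO-bound pool, described under "Logical changes", can be illustrated roughly as follows. This is a sketch under stated assumptions, not the Ignite checkpointer: CheckpointPoolsSketch and its checkpoint method are hypothetical, pages are plain integers, the pool sizes are arbitrary, and the "write" step only counts pages instead of touching disk.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Toy model only: hypothetical names, not the Ignite checkpointer. */
final class CheckpointPoolsSketch {
    /** Sorts each region's dirty page ids on one pool, "writes" them on another, returns pages written. */
    static int checkpoint(List<List<Integer>> dirtyPagesByRegion) {
        // CPU-bound work: size the pool by the number of cores (an assumption).
        int cpus = Math.max(1, Runtime.getRuntime().availableProcessors());
        ExecutorService collectPool = Executors.newFixedThreadPool(cpus);

        // IO-bound work: the size 4 is arbitrary and would be tuned to the storage.
        ExecutorService writePool = Executors.newFixedThreadPool(4);

        try {
            List<Future<List<Integer>>> collected = new ArrayList<>();

            for (List<Integer> region : dirtyPagesByRegion) {
                Callable<List<Integer>> collect = () -> {
                    List<Integer> sorted = new ArrayList<>(region);
                    Collections.sort(sorted);
                    return sorted;
                };

                collected.add(collectPool.submit(collect));
            }

            List<Future<Integer>> writes = new ArrayList<>();

            for (Future<List<Integer>> f : collected) {
                List<Integer> pages = f.get();

                // Stand-in for writing the pages to disk: just report how many there were.
                Callable<Integer> write = () -> pages.size();

                writes.add(writePool.submit(write));
            }

            int written = 0;

            for (Future<Integer> w : writes)
                written += w.get();

            return written;
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        catch (ExecutionException e) {
            throw new IllegalStateException(e);
        }
        finally {
            collectPool.shutdown();
            writePool.shutdown();
        }
    }
}
```

Keeping the two kinds of work on separate pools means slow disk writes cannot starve the threads that collect and sort pages, and each pool can be sized for its own bottleneck.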





[jira] [Commented] (IGNITE-13362) Stop warm-up via control.sh

2020-09-10 Thread Anton Kalashnikov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-13362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17193618#comment-17193618
 ] 

Anton Kalashnikov commented on IGNITE-13362:


[~ktkale...@gridgain.com] thanks for your changes. The code looks good to me.

> Stop warm-up via control.sh
> ---
>
> Key: IGNITE-13362
> URL: https://issues.apache.org/jira/browse/IGNITE-13362
> Project: Ignite
>  Issue Type: New Feature
>Reporter: Kirill Tkalenko
>Assignee: Kirill Tkalenko
>Priority: Major
>  Labels: IEP-40
> Fix For: 2.10
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> At the moment, stopping warm-up via "control.sh" is not possible because 
> messages from "control.sh" are processed after "discovery", while warm-up 
> runs before it. 
> It is necessary to process messages from "control.sh" before warming 
> up and to implement a command for "control.sh".





[jira] [Commented] (IGNITE-13345) Warming up strategy

2020-08-24 Thread Anton Kalashnikov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17183251#comment-17183251
 ] 

Anton Kalashnikov commented on IGNITE-13345:


[~sergey-chugunov] can you help with the merge, please.

> Warming up strategy
> ---
>
> Key: IGNITE-13345
> URL: https://issues.apache.org/jira/browse/IGNITE-13345
> Project: Ignite
>  Issue Type: New Feature
>Reporter: Kirill Tkalenko
>Assignee: Kirill Tkalenko
>Priority: Major
>  Labels: IEP-40
> Fix For: 2.10
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Summary of 
> [Dev-list|http://apache-ignite-developers.2346864.n4.nabble.com/DISCUSSION-Cache-warmup-td48582.html]
> # Adding a marker interface 
> *org.apache.ignite.configuration.WarmUpConfiguration*;
> # Adding a configuration to
> ## 
> *org.apache.ignite.configuration.DataRegionConfiguration#setWarmUpConfiguration*
> ## 
> *org.apache.ignite.configuration.DataStorageConfiguration#setDefaultWarmUpConfiguration*
> # Add an internal warm-up interface that will start in [1] after [2] (after 
> recovery);
> {code:java}
> package org.apache.ignite.internal.processors.cache.warmup;
>
> import org.apache.ignite.IgniteCheckedException;
> import org.apache.ignite.configuration.WarmUpConfiguration;
> import org.apache.ignite.internal.GridKernalContext;
> import org.apache.ignite.internal.processors.cache.persistence.DataRegion;
>
> /**
>  * Interface for warming up.
>  */
> public interface WarmUpStrategy<T extends WarmUpConfiguration> {
>     /**
>      * Returns configuration class for mapping to strategy.
>      *
>      * @return Configuration class.
>      */
>     Class<T> configClass();
>
>     /**
>      * Warm up.
>      *
>      * @param kernalCtx Kernal context.
>      * @param cfg Warm-up configuration.
>      * @param region Data region.
>      * @throws IgniteCheckedException If failed.
>      */
>     void warmUp(GridKernalContext kernalCtx, T cfg, DataRegion region) throws IgniteCheckedException;
>
>     /**
>      * Closing warm up.
>      *
>      * @throws IgniteCheckedException If failed.
>      */
>     void close() throws IgniteCheckedException;
> }
> {code}
> # Adding an internal plugin extension for adding custom strategies;
> {code:java}
> package org.apache.ignite.internal.processors.cache.warmup;
>
> import java.util.Collection;
> import org.apache.ignite.plugin.Extension;
>
> /**
>  * Interface for getting warm-up strategies from plugins.
>  */
> public interface WarmUpStrategySupplier extends Extension {
>     /**
>      * Getting warm-up strategies.
>      *
>      * @return Warm-up strategies.
>      */
>     Collection<WarmUpStrategy<?>> strategies();
> }
> {code}
> # Adding strategies:
> ## Without implementation, for the possibility of disabling the warm-up: NoOP
> ## Loading everything while there is RAM with priority to indexes: LoadAll
> # Add a command to "control.sh", to stop current warm-up and cancel all 
> others: --warm-up stop in IGNITE-13362
> [1] - 
> org.apache.ignite.internal.processors.cache.GridCacheProcessor.CacheRecoveryLifecycle#afterLogicalUpdatesApplied
> [2] - 
> org.apache.ignite.internal.processors.cache.GridCacheProcessor.CacheRecoveryLifecycle#restorePartitionStates
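
To make the mapping between a configuration class and its strategy more concrete, here is a rough, self-contained sketch of a no-op strategy and a lookup by configuration class. It is an illustration only: the nested interfaces are simplified stand-ins declared inside the hypothetical WarmUpSketch class (they drop GridKernalContext, DataRegion and IgniteCheckedException so the example compiles without Ignite), and the index and warmUp helpers are assumptions rather than part of the proposal.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

/** Toy model only: simplified stand-ins with hypothetical names, not the Ignite interfaces. */
final class WarmUpSketch {
    /** Marker interface, mirroring the idea of a warm-up configuration. */
    interface WarmUpConfiguration { }

    /** Simplified strategy: the data region is represented by its name only. */
    interface WarmUpStrategy<T extends WarmUpConfiguration> {
        Class<T> configClass();

        void warmUp(T cfg, String regionName);
    }

    /** Configuration that selects the no-op strategy. */
    static final class NoOpConfiguration implements WarmUpConfiguration { }

    /** Strategy without implementation, used to disable warm-up. */
    static final class NoOpStrategy implements WarmUpStrategy<NoOpConfiguration> {
        @Override public Class<NoOpConfiguration> configClass() {
            return NoOpConfiguration.class;
        }

        @Override public void warmUp(NoOpConfiguration cfg, String regionName) {
            // Intentionally empty.
        }
    }

    /** Indexes strategies by the configuration class they declare. */
    static Map<Class<?>, WarmUpStrategy<?>> index(Collection<WarmUpStrategy<?>> strategies) {
        Map<Class<?>, WarmUpStrategy<?>> byCfg = new HashMap<>();

        for (WarmUpStrategy<?> s : strategies)
            byCfg.put(s.configClass(), s);

        return byCfg;
    }

    /** Finds the strategy registered for the configuration's class and runs it. */
    static <T extends WarmUpConfiguration> void warmUp(Map<Class<?>, WarmUpStrategy<?>> byCfg, T cfg, String regionName) {
        @SuppressWarnings("unchecked")
        WarmUpStrategy<T> s = (WarmUpStrategy<T>)byCfg.get(cfg.getClass());

        if (s == null)
            throw new IllegalArgumentException("No warm-up strategy for " + cfg.getClass().getName());

        s.warmUp(cfg, regionName);
    }
}
```

The configClass method is what lets a registry pick a strategy from the configuration object alone, which is why strategies contributed by plugins can be added without changing the lookup code.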





[jira] [Commented] (IGNITE-13345) Warming up strategy

2020-08-24 Thread Anton Kalashnikov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17183243#comment-17183243
 ] 

Anton Kalashnikov commented on IGNITE-13345:


[~ktkale...@gridgain.com] LGTM.

> Warming up strategy
> ---
>
> Key: IGNITE-13345
> URL: https://issues.apache.org/jira/browse/IGNITE-13345
> Project: Ignite
>  Issue Type: New Feature
>Reporter: Kirill Tkalenko
>Assignee: Kirill Tkalenko
>Priority: Major
>  Labels: IEP-40
> Fix For: 2.10
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Summary of 
> [Dev-list|http://apache-ignite-developers.2346864.n4.nabble.com/DISCUSSION-Cache-warmup-td48582.html]
> # Adding a marker interface 
> *org.apache.ignite.configuration.WarmUpConfiguration*;
> # Adding a configuration to
> ## 
> *org.apache.ignite.configuration.DataRegionConfiguration#setWarmUpConfiguration*
> ## 
> *org.apache.ignite.configuration.DataStorageConfiguration#setDefaultWarmUpConfiguration*
> # Add an internal warm-up interface that will start in [1] after [2] (after 
> recovery);
> {code:java}
> package org.apache.ignite.internal.processors.cache.warmup;
> import org.apache.ignite.IgniteCheckedException;
> import org.apache.ignite.configuration.WarmUpConfiguration;
> import org.apache.ignite.internal.GridKernalContext;
> import org.apache.ignite.internal.processors.cache.persistence.DataRegion;
> /**
>  * Interface for warming up.
>  */
> public interface WarmUpStrategy<T extends WarmUpConfiguration> {
>     /**
>      * Returns configuration class for mapping to strategy.
>      *
>      * @return Configuration class.
>      */
>     Class<T> configClass();
>
>     /**
>      * Warm up.
>      *
>      * @param kernalCtx Kernal context.
>      * @param cfg Warm-up configuration.
>      * @param region Data region.
>      * @throws IgniteCheckedException If failed.
>      */
>     void warmUp(GridKernalContext kernalCtx, T cfg, DataRegion region) throws IgniteCheckedException;
>
>     /**
>      * Closing warm up.
>      *
>      * @throws IgniteCheckedException If failed.
>      */
>     void close() throws IgniteCheckedException;
> }
> {code}
> # Adding an internal plugin extension for adding custom strategies;
> {code:java}
> package org.apache.ignite.internal.processors.cache.warmup;
>  
> import java.util.Collection;
> import org.apache.ignite.plugin.Extension;
>  
> /**
>  * Interface for getting warm-up strategies from plugins.
>  */
> public interface WarmUpStrategySupplier extends Extension {
>     /**
>      * Getting warm-up strategies.
>      *
>      * @return Warm-up strategies.
>      */
>     Collection strategies();
> }
> {code}
> # Adding strategies:
> ## Without implementation, for the possibility of disabling the warm-up: NoOP
> ## Loading everything while there is RAM with priority to indexes: LoadAll
> # Add a command to "control.sh", to stop current warm-up and cancel all 
> others: --warm-up stop in IGNITE-13362
> [1] - 
> org.apache.ignite.internal.processors.cache.GridCacheProcessor.CacheRecoveryLifecycle#afterLogicalUpdatesApplied
> [2] - 
> org.apache.ignite.internal.processors.cache.GridCacheProcessor.CacheRecoveryLifecycle#restorePartitionStates
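
As a rough illustration of the plan above, the sketch below shows how a NoOp strategy might be looked up by its configuration class. It is not Ignite code: `SketchWarmUpConfig`, `SketchRegion`, `SketchStrategy`, `NoOpConfig`, `NoOpStrategy`, `StrategyRegistry` and `WarmUpSketch` are hypothetical stand-ins, and the kernal context and checked exceptions from the proposed interface are left out to keep the example self-contained.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the WarmUpConfiguration marker interface. */
interface SketchWarmUpConfig { }

/** Hypothetical stand-in for DataRegion; only the region name is kept. */
final class SketchRegion {
    final String name;

    SketchRegion(String name) {
        this.name = name;
    }
}

/** Simplified strategy contract (assumption: no kernal context, no checked exceptions). */
interface SketchStrategy<T extends SketchWarmUpConfig> {
    Class<T> configClass();

    void warmUp(T cfg, SketchRegion region);

    void close();
}

/** Configuration that selects the no-op strategy, i.e. disables warm-up. */
final class NoOpConfig implements SketchWarmUpConfig { }

/** Strategy without implementation. */
final class NoOpStrategy implements SketchStrategy<NoOpConfig> {
    @Override public Class<NoOpConfig> configClass() {
        return NoOpConfig.class;
    }

    @Override public void warmUp(NoOpConfig cfg, SketchRegion region) {
        // Intentionally empty: nothing is loaded into the region.
    }

    @Override public void close() {
        // Nothing to stop.
    }
}

/** Maps a configuration class to the strategy that handles it. */
final class StrategyRegistry {
    private final Map<Class<?>, SketchStrategy<?>> byCfgCls = new HashMap<>();

    void register(SketchStrategy<?> strategy) {
        byCfgCls.put(strategy.configClass(), strategy);
    }

    @SuppressWarnings("unchecked")
    <T extends SketchWarmUpConfig> SketchStrategy<T> find(T cfg) {
        return (SketchStrategy<T>) byCfgCls.get(cfg.getClass());
    }
}

final class WarmUpSketch {
    /** Registers the no-op strategy, resolves it by configuration, and runs it. */
    static String run() {
        StrategyRegistry registry = new StrategyRegistry();
        registry.register(new NoOpStrategy());

        NoOpConfig cfg = new NoOpConfig();
        SketchStrategy<NoOpConfig> strategy = registry.find(cfg);

        strategy.warmUp(cfg, new SketchRegion("default"));
        strategy.close();

        return strategy.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Keying the registry by configuration class is one way to read "configuration class for mapping to strategy"; strategies supplied by a plugin extension could be registered the same way.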





[jira] [Commented] (IGNITE-13367) meta --remove command usage improvements

2020-08-20 Thread Anton Kalashnikov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-13367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181119#comment-17181119
 ] 

Anton Kalashnikov commented on IGNITE-13367:


[~sergeychugunov] yes, I took a look at it already. It looks good to me.

> meta --remove command usage improvements
> 
>
> Key: IGNITE-13367
> URL: https://issues.apache.org/jira/browse/IGNITE-13367
> Project: Ignite
>  Issue Type: Improvement
>  Components: control.sh
>Reporter: Sergey Chugunov
>Assignee: Sergey Chugunov
>Priority: Major
> Fix For: 2.10
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Command for removing metadata has the following issues:
> # In the 'Type not found' scenario it prints long stack traces to the console 
> instead of short information about the requested type.
> # When used, it registers some internal classes which are not supposed to go 
> through the binary metadata registration protocol.





[jira] [Commented] (IGNITE-13368) Speed base throttling unexpectedly degraded to zero

2020-08-19 Thread Anton Kalashnikov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-13368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17180371#comment-17180371
 ] 

Anton Kalashnikov commented on IGNITE-13368:


[~sergey-chugunov] can you take a look at it?

> Speed base throttling unexpectedly degraded to zero
> ---
>
> Key: IGNITE-13368
> URL: https://issues.apache.org/jira/browse/IGNITE-13368
> Project: Ignite
>  Issue Type: Bug
>Reporter: Anton Kalashnikov
>Assignee: Anton Kalashnikov
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> New test failure in master PagesWriteThrottleSmokeTest.testThrottle 
> https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=2808794487465215609=%3Cdefault%3E=testDetails
> Throttling degraded to zero.





[jira] [Created] (IGNITE-13368) Speed base throttling unexpectedly degraded to zero

2020-08-18 Thread Anton Kalashnikov (Jira)
Anton Kalashnikov created IGNITE-13368:
--

 Summary: Speed base throttling unexpectedly degraded to zero
 Key: IGNITE-13368
 URL: https://issues.apache.org/jira/browse/IGNITE-13368
 Project: Ignite
  Issue Type: Bug
Reporter: Anton Kalashnikov
Assignee: Anton Kalashnikov


New test failure in master PagesWriteThrottleSmokeTest.testThrottle 
https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=2808794487465215609=%3Cdefault%3E=testDetails

Throttling degraded to zero.





[jira] [Commented] (IGNITE-13151) Checkpointer code refactoring: extracting classes from GridCacheDatabaseSharedManager

2020-08-13 Thread Anton Kalashnikov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-13151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17177015#comment-17177015
 ] 

Anton Kalashnikov commented on IGNITE-13151:


[~agura] Thanks for your comment, but these are not actually new classes: all of 
them were extracted mostly from GridCacheDatabaseSharedManager with minimal 
changes. I agree that the javadocs are not perfect, and I have improved them a 
little, but I suggest leaving further improvement to my next task because these 
classes will be changed there. The same goes for naming: I kept the old names to 
make the review easier, but I will most likely find more suitable names for 
them later.

[~sergey-chugunov] can you recheck these changes (there are not many changes 
since the last time) and merge them to master?

> Checkpointer code refactoring: extracting classes from 
> GridCacheDatabaseSharedManager
> -
>
> Key: IGNITE-13151
> URL: https://issues.apache.org/jira/browse/IGNITE-13151
> Project: Ignite
>  Issue Type: Sub-task
>  Components: persistence
>Reporter: Sergey Chugunov
>Assignee: Anton Kalashnikov
>Priority: Major
>  Labels: IEP-47
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The checkpointer is at the center of the Ignite persistence subsystem: the more 
> people from the community understand it, the more stable and efficient it 
> becomes.
> However, for now the checkpointer code sits inside the 
> GridCacheDatabaseSharedManager class and is entangled with this higher-level 
> and more general component.
> To take a step toward a more modular checkpointer, we need to do two things:
>  # Move checkpointer code outside database manager to a separate class. 
> (That's what this ticket is about.)
>  # Create a well-defined API of checkpointer that will allow us to create new 
> implementations of checkpointer in the future. An example of this is new 
> checkpointer implementation needed for defragmentation feature purposes. 
> (Should be done in a separate ticket)





[jira] [Commented] (IGNITE-13013) Thick client must not open server sockets when used by serverless functions

2020-08-06 Thread Anton Kalashnikov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-13013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17172149#comment-17172149
 ] 

Anton Kalashnikov commented on IGNITE-13013:


[~ibessonov] LGTM. [~agoncharuk] can you help with the merge, please?

> Thick client must not open server sockets when used by serverless functions
> ---
>
> Key: IGNITE-13013
> URL: https://issues.apache.org/jira/browse/IGNITE-13013
> Project: Ignite
>  Issue Type: Improvement
>  Components: networking
>Affects Versions: 2.8
>Reporter: Denis A. Magda
>Assignee: Ivan Bessonov
>Priority: Critical
> Fix For: 2.10
>
> Attachments: image-2020-07-30-18-42-01-266.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> A thick client fails to start if being used inside of a serverless function 
> such as AWS Lambda or Azure Functions. Cloud providers prohibit opening 
> network ports to accept connections on the function's end. In short, the 
> function can only connect to a remote address.
> To reproduce, you can follow this tutorial and swap the thin client (used in 
> the tutorial) with the thick one: 
> https://www.gridgain.com/docs/tutorials/serverless/azure_functions_tutorial
> The thick client needs to support a mode when the communication SPI doesn't 
> create a server socket if the client is used for serverless computing. This 
> improvement looks like an extra task of this initiative: 
> https://issues.apache.org/jira/browse/IGNITE-12438





[jira] [Commented] (IGNITE-13013) Thick client must not open server sockets when used by serverless functions

2020-08-04 Thread Anton Kalashnikov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-13013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17170832#comment-17170832
 ] 

Anton Kalashnikov commented on IGNITE-13013:


[~ibessonov] the new changes also LGTM, but I have one note: instead of using 
the magic number 0 (when a socket is not needed), it may be better to use a 
well-described constant (like PORT_DISABLED or RECEIVED_SOCKET_DISABLE), or at 
least to add a comment to this number, because when I see a port equal to 0 it 
is not obvious which behaviour I should expect.
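
A minimal sketch of this suggestion, assuming the "no server socket" value is 0 as in the reviewed change. `PORT_DISABLED` is one of the names floated above; `CommPortSettings` and `isServerSocketNeeded` are hypothetical and do not refer to existing Ignite code.

```java
/** Hypothetical holder for the communication port setting. */
final class CommPortSettings {
    /** Port value meaning "do not open a server socket" (assumed to be 0 here). */
    static final int PORT_DISABLED = 0;

    private final int localPort;

    CommPortSettings(int localPort) {
        if (localPort < 0 || localPort > 65535)
            throw new IllegalArgumentException("Port out of range: " + localPort);

        this.localPort = localPort;
    }

    /** Reads as intent at the call site instead of a bare comparison with 0. */
    boolean isServerSocketNeeded() {
        return localPort != PORT_DISABLED;
    }

    public static void main(String[] args) {
        System.out.println(new CommPortSettings(PORT_DISABLED).isServerSocketNeeded());
        System.out.println(new CommPortSettings(47100).isServerSocketNeeded());
    }
}
```

With the constant in place, a check such as `port == 0` turns into `port == PORT_DISABLED`, which documents the expected behaviour where it is used.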

> Thick client must not open server sockets when used by serverless functions
> ---
>
> Key: IGNITE-13013
> URL: https://issues.apache.org/jira/browse/IGNITE-13013
> Project: Ignite
>  Issue Type: Improvement
>  Components: networking
>Affects Versions: 2.8
>Reporter: Denis A. Magda
>Assignee: Ivan Bessonov
>Priority: Critical
> Fix For: 2.10
>
> Attachments: image-2020-07-30-18-42-01-266.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> A thick client fails to start if being used inside of a serverless function 
> such as AWS Lambda or Azure Functions. Cloud providers prohibit opening 
> network ports to accept connections on the function's end. In short, the 
> function can only connect to a remote address.
> To reproduce, you can follow this tutorial and swap the thin client (used in 
> the tutorial) with the thick one: 
> https://www.gridgain.com/docs/tutorials/serverless/azure_functions_tutorial
> The thick client needs to support a mode when the communication SPI doesn't 
> create a server socket if the client is used for serverless computing. This 
> improvement looks like an extra task of this initiative: 
> https://issues.apache.org/jira/browse/IGNITE-12438





[jira] [Commented] (IGNITE-13098) TcpCommunicationSpi split to independent classes

2020-07-31 Thread Anton Kalashnikov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-13098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17168963#comment-17168963
 ] 

Anton Kalashnikov commented on IGNITE-13098:


[~ivan.glukos] [~mstepachev], I took a look at it. LGTM.

> TcpCommunicationSpi split to independent classes
> 
>
> Key: IGNITE-13098
> URL: https://issues.apache.org/jira/browse/IGNITE-13098
> Project: Ignite
>  Issue Type: Bug
> Environment: TcpCommunicationSpi split to independent classes
>Reporter: Stepachev Maksim
>Assignee: Stepachev Maksim
>Priority: Major
> Fix For: 2.10
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> h2. Description
> This ticket describes requirements for TcpCommunicationSpi refactoring. The 
> main goal is to split the class without changing behavior and public API.
> *Actual problem:*
> Currently TcpCommunicationSpi has over 5K lines and includes about 15+ inner 
> classes like:
>  # ShmemAcceptWorker
>  # SHMemHandshakeClosure
>  # ShmemWorker
>  # CommunicationDiscoveryEventListener
>  # CommunicationWorker
>  # ConnectFuture
>  # ConnectGateway
>  # ConnectionKey
>  # ConnectionPolicy
>  # DisconnectedSessionInfo
>  # FirstConnectionPolicy
>  # HandshakeTimeoutObject
>  # RoundRobinConnectionPolicy
>  # TcpCommunicationConnectionCheckFuture
>  # TcpCommunicationSpiMBeanImpl
> In addition, it contains the logic of the client connection life cycle, the nio 
> server handler, and the handshake handler.
> The classes above have cyclic dependencies and high coupling. The whole 
> mechanism works because classes have access to each other via parent class 
> references. As a result, initialization of a class isn't consistent. By 
> consistent I mean that a class created via constructor is ready to be used. All 
> of the classes work with context and share properties everywhere.
> Many methods of TcpCommunicationSpi don’t have a single responsibility. 
> An example is getNodeAttribute: it makes a client reservation, takes the IP 
> address of the node, and provides attributes.
> It works fine and we usually don’t have reasons to change anything. But if 
> you want to create a test that has a little different behavior than a 
> blocking message, you can't mock or change the behavior of inner classes. For 
> example, test covering change in the handshake process. Some people make test 
> methods in public API like "closeConnections" or "openSocketChannel" because 
> the current design isn't fine for it. It also takes a lot of time for test 
> development for minor changes.
> *Solution:*
> The scope of work is big, and the communication SPI is a place that should be 
> changed carefully. I recommend doing this refactoring step by step.
>  * The first idea is to split the parent class into independent classes and 
> move them to the internal package. We should achieve SOLID when it’s done.
>  * Extract spread logic to appropriate classes like ClientPool, 
> HandshakeHandler, etc.
>  * Make a common transfer object for TCSpi configuration.
>  * Make dependencies direct if it is possible.
>  * Initialize all dependencies in one place.
>  * Make child classes context-free.
>  * Try to make classes more testable.
>  * Use the idea of dependency injection without a framework for it.
> *Benefits:*
> With the ability to write truly JUnit-style tests and cover functionality 
> with better testing, we get an easier way to develop new features and 
> optimizations needed in such low-level components as TcpCommunicationSpi.
> Examples of features that improve usability of Apache Ignite a lot are: 
> inverse communication connection with optimizations and connection 
> multiplexing. Both of the features could be used in environments with 
> restricted network connectivity (e.g. when connections between nodes could be 
> established only in one direction).
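
The "dependency injection without a framework" idea from the description can be sketched roughly as below. This is only an illustration under assumptions: `HandshakeStep`, `PooledClients` and `CommWiringSketch` are hypothetical names, not the classes produced by this ticket. The point is that an object is fully usable once its constructor returns, and that a test can pass in its own collaborator.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical handshake collaborator; a test can replace it with a stub. */
interface HandshakeStep {
    boolean handshake(String nodeId);
}

/** Hypothetical client pool that receives its dependency through the constructor. */
final class PooledClients {
    private final HandshakeStep handshake;

    private final Map<String, String> clients = new HashMap<>();

    PooledClients(HandshakeStep handshake) {
        // After construction the object is ready to be used: no late init, no parent reference.
        this.handshake = Objects.requireNonNull(handshake, "handshake");
    }

    /** Returns a client handle for the node, creating it after a successful handshake. */
    String reserve(String nodeId) {
        if (!handshake.handshake(nodeId))
            throw new IllegalStateException("Handshake failed for node " + nodeId);

        return clients.computeIfAbsent(nodeId, id -> "client-" + id);
    }
}

final class CommWiringSketch {
    /** Single place where all dependencies are created and connected. */
    static PooledClients wire() {
        HandshakeStep alwaysOk = nodeId -> true;

        return new PooledClients(alwaysOk);
    }

    public static void main(String[] args) {
        System.out.println(wire().reserve("node-1"));
    }
}
```

A test covering a change in the handshake process could construct `PooledClients` with a failing `HandshakeStep` directly, without adding test-only methods to the public API.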





[jira] [Commented] (IGNITE-13013) Thick client must not open server sockets when used by serverless functions

2020-07-31 Thread Anton Kalashnikov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-13013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17168832#comment-17168832
 ] 

Anton Kalashnikov commented on IGNITE-13013:


[~ibessonov] thanks for your changes. LGTM.

> Thick client must not open server sockets when used by serverless functions
> ---
>
> Key: IGNITE-13013
> URL: https://issues.apache.org/jira/browse/IGNITE-13013
> Project: Ignite
>  Issue Type: Improvement
>  Components: networking
>Affects Versions: 2.8
>Reporter: Denis A. Magda
>Assignee: Ivan Bessonov
>Priority: Critical
> Fix For: 2.10
>
> Attachments: image-2020-07-30-18-42-01-266.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> A thick client fails to start if being used inside of a serverless function 
> such as AWS Lambda or Azure Functions. Cloud providers prohibit opening 
> network ports to accept connections on the function's end. In short, the 
> function can only connect to a remote address.
> To reproduce, you can follow this tutorial and swap the thin client (used in 
> the tutorial) with the thick one: 
> https://www.gridgain.com/docs/tutorials/serverless/azure_functions_tutorial
> The thick client needs to support a mode when the communication SPI doesn't 
> create a server socket if the client is used for serverless computing. This 
> improvement looks like an extra task of this initiative: 
> https://issues.apache.org/jira/browse/IGNITE-12438





[jira] [Commented] (IGNITE-13269) Waiting for completion of operations on indexes before cache stop

2020-07-22 Thread Anton Kalashnikov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-13269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17162841#comment-17162841
 ] 

Anton Kalashnikov commented on IGNITE-13269:


[~ktkale...@gridgain.com] LGTM. [~sergey-chugunov] Can you help with merge, 
please?

> Waiting for completion of operations on indexes before cache stop
> -
>
> Key: IGNITE-13269
> URL: https://issues.apache.org/jira/browse/IGNITE-13269
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Kirill Tkalenko
>Assignee: Kirill Tkalenko
>Priority: Major
> Fix For: 2.10
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, there is no waiting for completion of operations on indexes when 
> a cache is stopped. Because of this, there may be errors, for example, when 
> restarting the node:
> {code:java}
>   Suppressed: java.lang.AssertionError: Release pinned page: FullPageId 
> [pageId=000206bfc352, effectivePageId=06bfc352, grpId=-782612924]
>   at 
> org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl$PagePool.releaseFreePage(PageMemoryImpl.java:1902)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl$PagePool.access$2100(PageMemoryImpl.java:1773)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl$ClearSegmentRunnable.run(PageMemoryImpl.java:2878)
>   ... 3 common frames omitted
> {code}
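
One generic way to make a stop wait for in-flight operations is a read-write lock: operations hold the read lock while they run, and the stop takes the write lock. The sketch below only illustrates that pattern; it is not the fix made in this ticket, and `IndexOpGate`, `runIndexOperation` and `stop` are hypothetical names.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical gate that lets a stop wait for running index operations. */
final class IndexOpGate {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    /** Guarded by the lock. */
    private boolean stopped;

    /** Runs the operation unless the gate is already stopped; returns whether it ran. */
    boolean runIndexOperation(Runnable op) {
        lock.readLock().lock();

        try {
            if (stopped)
                return false;

            op.run();

            return true;
        }
        finally {
            lock.readLock().unlock();
        }
    }

    /** Blocks until all running operations release the read lock, then rejects new ones. */
    void stop() {
        lock.writeLock().lock();

        try {
            stopped = true;
        }
        finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        IndexOpGate gate = new IndexOpGate();
        CountDownLatch started = new CountDownLatch(1);
        AtomicBoolean finished = new AtomicBoolean();

        Thread worker = new Thread(() -> gate.runIndexOperation(() -> {
            started.countDown();

            try {
                Thread.sleep(100);
            }
            catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }

            finished.set(true);
        }));

        worker.start();
        started.await();

        // Blocks here until the operation above has completed.
        gate.stop();

        System.out.println("Operation finished before stop returned: " + finished.get());

        worker.join();
    }
}
```

After `stop()` returns, no operation is still touching pages, so memory could be released without running into a pinned page.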





[jira] [Commented] (IGNITE-11942) IGFS and Hadoop Accelerator Discontinuation

2020-07-09 Thread Anton Kalashnikov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-11942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154421#comment-17154421
 ] 

Anton Kalashnikov commented on IGNITE-11942:


[~agoncharuk] can you take a look at these changes?
[~kuaw26], [~vsisko] can you take a look at the web-console part?

> IGFS and Hadoop Accelerator Discontinuation
> ---
>
> Key: IGNITE-11942
> URL: https://issues.apache.org/jira/browse/IGNITE-11942
> Project: Ignite
>  Issue Type: Task
>Reporter: Denis A. Magda
>Assignee: Anton Kalashnikov
>Priority: Blocker
> Fix For: 2.9
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The community has voted for the following decision:
> * IGFS and In-Memory Hadoop Accelerator components are to be discontinued and 
> no longer supported by the community 
> * The existing source code of IGFS and In-Memory Hadoop Accelerator is to be 
> removed from Ignite master. Before that, a special branch like 
> "ignite-igfs-and-hadoop-accelerator" to be forked off the master in order to 
> preserve the sources in Git history for those who might need it. 
> The voting thread:
> http://apache-ignite-developers.2346864.n4.nabble.com/VOTE-Complete-Discontinuation-of-IGFS-and-Hadoop-Accelerator-td42405.html
> Once the changes are made for Ignite 2.8, please contact Denis Magda to 
> update the public documentation.





[jira] [Commented] (IGNITE-13013) Thick client must not open server sockets when used by serverless functions

2020-07-08 Thread Anton Kalashnikov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-13013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17153620#comment-17153620
 ] 

Anton Kalashnikov commented on IGNITE-13013:


[~dmagda], I think I agree that client-to-client connectivity is not so useful 
in Ignite. So I have another solution, which looks pretty easy to implement. We 
can add the possibility to set the communication port to -1 (this means the 
server socket shouldn't be opened). When the user sets this port to -1, we also 
set forceClientToServer to true. We can also add validation when establishing a 
communication connection: if we see that it is a client-to-client connection 
but the remote client doesn't support such connections, we notify the user 
about it (an exception is thrown). As I understand, this scenario mostly 
corresponds to compute.

In conclusion, expected changes:
* Setting communication port to -1 is allowed
* If the communication port is set to -1, forceClientToServer will be set to true
* If the client tries to establish a connection with another client whose port 
equals -1, an exception will be thrown.
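
The three expected changes can be sketched as follows. This is a hedged illustration of the rules only: `ClientCommConfig`, `NO_SERVER_SOCKET` and `ConnectionValidator` are hypothetical names, and the real configuration and connection code in Ignite is not shown here.

```java
/** Hypothetical view of a node's communication settings. */
final class ClientCommConfig {
    /** Proposed port value meaning "the server socket is not opened". */
    static final int NO_SERVER_SOCKET = -1;

    final boolean client;

    final int port;

    final boolean forceClientToServer;

    ClientCommConfig(boolean client, int port, boolean forceClientToServer) {
        // Rule 1: -1 is an allowed port value; anything below it is rejected.
        if (port < NO_SERVER_SOCKET)
            throw new IllegalArgumentException("Invalid port: " + port);

        this.client = client;
        this.port = port;

        // Rule 2: a port of -1 implies forceClientToServer = true.
        this.forceClientToServer = forceClientToServer || port == NO_SERVER_SOCKET;
    }
}

/** Hypothetical validation performed before a communication connection is opened. */
final class ConnectionValidator {
    /** Rule 3: a client cannot connect to another client that has no server socket. */
    static void checkCanConnect(ClientCommConfig local, ClientCommConfig remote) {
        if (local.client && remote.client && remote.port == ClientCommConfig.NO_SERVER_SOCKET)
            throw new IllegalStateException("Remote client does not accept incoming connections");
    }

    public static void main(String[] args) {
        ClientCommConfig serverless = new ClientCommConfig(true, ClientCommConfig.NO_SERVER_SOCKET, false);
        ClientCommConfig otherClient = new ClientCommConfig(true, 47100, false);

        System.out.println("forceClientToServer: " + serverless.forceClientToServer);

        try {
            checkCanConnect(otherClient, serverless);
        }
        catch (IllegalStateException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```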

> Thick client must not open server sockets when used by serverless functions
> ---
>
> Key: IGNITE-13013
> URL: https://issues.apache.org/jira/browse/IGNITE-13013
> Project: Ignite
>  Issue Type: Improvement
>  Components: networking
>Affects Versions: 2.8
>Reporter: Denis A. Magda
>Priority: Critical
> Fix For: 2.9
>
>
> A thick client fails to start if being used inside of a serverless function 
> such as AWS Lambda or Azure Functions. Cloud providers prohibit opening 
> network ports to accept connections on the function's end. In short, the 
> function can only connect to a remote address.
> To reproduce, you can follow this tutorial and swap the thin client (used in 
> the tutorial) with the thick one: 
> https://www.gridgain.com/docs/tutorials/serverless/azure_functions_tutorial
> The thick client needs to support a mode when the communication SPI doesn't 
> create a server socket if the client is used for serverless computing. This 
> improvement looks like an extra task of this initiative: 
> https://issues.apache.org/jira/browse/IGNITE-12438




