[jira] [Created] (IGNITE-14527) CVE-2021-2816[3,4,5] in Jetty
Alexander Belyak created IGNITE-14527:

Summary: CVE-2021-2816[3,4,5] in Jetty
Key: IGNITE-14527
URL: https://issues.apache.org/jira/browse/IGNITE-14527
Project: Ignite
Issue Type: Task
Components: integrations
Reporter: Alexander Belyak
Assignee: Alexander Belyak

Vulnerabilities found:
[https://nvd.nist.gov/vuln/detail/CVE-2021-28163]
[https://nvd.nist.gov/vuln/detail/CVE-2021-28164]
[https://nvd.nist.gov/vuln/detail/CVE-2021-28165]

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-11783) Open file limit for deb distribution
Alexander Belyak created IGNITE-11783:

Summary: Open file limit for deb distribution
Key: IGNITE-11783
URL: https://issues.apache.org/jira/browse/IGNITE-11783
Project: Ignite
Issue Type: Bug
Components: persistence
Affects Versions: 2.7
Environment: ubuntu-16.04
Reporter: Alexander Belyak

Steps to reproduce:
1) Install Ignite from the deb package on Ubuntu 16.04
2) Start with persistence
3) Create 5 caches (or one with 4000+ partitions)

Error text:
{noformat}
[18:29:44,369][INFO][exchange-worker-#43][GridCacheDatabaseSharedManager] Restoring partition state for local groups [cntPartStateWal=0, lastCheckpointId=bd24ff23-da6f-46e5-bafd-b643db3870d4]
[18:29:51,864][SEVERE][exchange-worker-#43][] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED]]], failureCtx=FailureContext [type=CRITICAL_ERROR, err=class o.a.i.i.processors.cache.persistence.StorageException: Failed to initialize partition file: /usr/share/apache-ignite/work/db/node00-f49af718-48da-4186-b664-62aca736bdc9/cache-SQL_PUBLIC_VERTEX_TBL/part-913.bin]]
class org.apache.ignite.internal.processors.cache.persistence.StorageException: Failed to initialize partition file: /usr/share/apache-ignite/work/db/node00-f49af718-48da-4186-b664-62aca736bdc9/cache-SQL_PUBLIC_VERTEX_TBL/part-913.bin
    at org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore.init(FilePageStore.java:444)
    at org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore.ensure(FilePageStore.java:650)
    at org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.ensure(FilePageStoreManager.java:712)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restorePartitionStates(GridCacheDatabaseSharedManager.java:2472)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.applyLastUpdates(GridCacheDatabaseSharedManager.java:2419)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restoreState(GridCacheDatabaseSharedManager.java:1628)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.beforeExchange(GridCacheDatabaseSharedManager.java:1302)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.distributedExchange(GridDhtPartitionsExchangeFuture.java:1453)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:806)
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:2667)
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2539)
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.nio.file.FileSystemException: /usr/share/apache-ignite/work/db/node00-f49af718-48da-4186-b664-62aca736bdc9/cache-SQL_PUBLIC_VERTEX_TBL/part-913.bin: Too many open files
    at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
    at sun.nio.fs.UnixFileSystemProvider.newAsynchronousFileChannel(UnixFileSystemProvider.java:196)
    at java.nio.channels.AsynchronousFileChannel.open(AsynchronousFileChannel.java:248)
    at java.nio.channels.AsynchronousFileChannel.open(AsynchronousFileChannel.java:301)
    at org.apache.ignite.internal.processors.cache.persistence.file.AsyncFileIO.<init>(AsyncFileIO.java:57)
    at org.apache.ignite.internal.processors.cache.persistence.file.AsyncFileIOFactory.create(AsyncFileIOFactory.java:53)
    at org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore.init(FilePageStore.java:416)
    ... 12 more
{noformat}

It happens because the systemd service description (/etc/systemd/system/apache-ignite@.service) does not contain:
{noformat}
LimitNOFILE=50
(possibly with)
LimitNPROC=50
{noformat}
see: https://fredrikaverpil.github.io/2016/04/27/systemd-and-resource-limits/

Possibly, the installation script should also add:
* "fs.file-max = 2097152" to "/etc/sysctl.conf"
* into /etc/security/limits.conf:
{noformat}
* hard nofile 50
* soft nofile 50
root hard nofile 50
root
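The systemd part of the proposed fix could look like the following drop-in unit. This is a sketch only: the drop-in path and the concrete limit values below are illustrative assumptions, not values taken from the report.

```ini
# /etc/systemd/system/apache-ignite@.service.d/limits.conf (hypothetical path)
# Raise per-process open-file and process limits for the Ignite service.
# The values are placeholders; size them to the deployment.
[Service]
LimitNOFILE=500000
LimitNPROC=500000
```

After adding a drop-in like this, `systemctl daemon-reload` and a restart of the service are needed for the new limits to take effect.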
[jira] [Created] (IGNITE-8407) Wrong memory size printing in IgniteCacheDatabaseSharedManager
Alexander Belyak created IGNITE-8407:

Summary: Wrong memory size printing in IgniteCacheDatabaseSharedManager
Key: IGNITE-8407
URL: https://issues.apache.org/jira/browse/IGNITE-8407
Project: Ignite
Issue Type: Bug
Components: general
Affects Versions: 2.4
Reporter: Alexander Belyak

In checkDataRegionSize, regCfg sizes are printed in SI format (based on 1000, not 1024). We need to fix this, and any other usages of getInitialSize()/getMaxSize() with U.readableSize(..., true):
{noformat}
throw new IgniteCheckedException("DataRegion must have size more than 10MB (use " +
    "DataRegionConfiguration.initialSize and .maxSize properties to set correct size in bytes) " +
    "[name=" + regCfg.getName() +
    ", initialSize=" + U.readableSize(regCfg.getInitialSize(), true) +
    ", maxSize=" + U.readableSize(regCfg.getMaxSize(), true) + "]"
{noformat}
should be replaced with
{noformat}
throw new IgniteCheckedException("DataRegion must have size more than 10MB (use " +
    "DataRegionConfiguration.initialSize and .maxSize properties to set correct size in bytes) " +
    "[name=" + regCfg.getName() +
    ", initialSize=" + U.readableSize(regCfg.getInitialSize(), false) +
    ", maxSize=" + U.readableSize(regCfg.getMaxSize(), false) + "]"
{noformat}
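For illustration, here is a self-contained sketch of the SI-versus-binary distinction. The helper below is a stand-in written for this example, not Ignite's actual U.readableSize (whose exact output format may differ): with si=true a 10 MiB region prints as roughly "10.5 MB", which is confusing next to a 10 MB minimum.

```java
import java.util.Locale;

// Sketch of 1000-based (SI) vs 1024-based (binary) human-readable size
// formatting, mirroring what a helper like U.readableSize(bytes, si) is
// assumed to do.
public class ReadableSize {
    static final String[] SI_UNITS  = {"B", "KB", "MB", "GB", "TB"};
    static final String[] BIN_UNITS = {"B", "KiB", "MiB", "GiB", "TiB"};

    static String readableSize(long bytes, boolean si) {
        long base = si ? 1000L : 1024L;
        String[] units = si ? SI_UNITS : BIN_UNITS;
        int i = 0;
        double v = bytes;
        // Divide down until the value fits the largest applicable unit.
        while (v >= base && i < units.length - 1) { v /= base; i++; }
        return String.format(Locale.US, "%.1f %s", v, units[i]);
    }

    public static void main(String[] args) {
        long tenMib = 10L * 1024 * 1024; // exactly 10 MiB
        System.out.println(readableSize(tenMib, true));  // SI view: 10.5 MB
        System.out.println(readableSize(tenMib, false)); // binary view: 10.0 MiB
    }
}
```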
[jira] [Created] (IGNITE-8288) ScanQuery ignore readFromBackups
Alexander Belyak created IGNITE-8288:

Summary: ScanQuery ignore readFromBackups
Key: IGNITE-8288
URL: https://issues.apache.org/jira/browse/IGNITE-8288
Project: Ignite
Issue Type: Bug
Reporter: Alexander Belyak

1) Create partitioned cache on
[jira] [Created] (IGNITE-8286) ScanQuery ignore setLocal with non local partition
Alexander Belyak created IGNITE-8286:

Summary: ScanQuery ignore setLocal with non local partition
Key: IGNITE-8286
URL: https://issues.apache.org/jira/browse/IGNITE-8286
Project: Ignite
Issue Type: Bug
Affects Versions: 2.4
Reporter: Alexander Belyak

1) Create a partitioned cache on a cluster of 2+ nodes
2) Select some partition N; the local node should not be the OWNER of partition N
3) Execute: cache.query(new ScanQuery<>().setLocal(true).setPartition(N))

Expected result: an empty result (probably with logging something like "Trying to execute local query with non local partition N"), or even throwing an exception.
Actual result: the query executes (with ScanQueryFallbackClosableIterator) on a remote node.

The problem is that we execute a local query on a remote node. The same behaviour can occur if we get an empty node list from GridCacheQueryAdapter.node() for any reason, for example if we run a "local" query from a non-data node for the given cache (see GridDiscoveryManager.cacheAffinityNode(ClusterNode node, String cacheName) in GridCacheQueryAdapter.executeScanQuery()).
[jira] [Created] (IGNITE-8119) NPE on clear DB and unclear WAL/WAL_ARCHIVE
Alexander Belyak created IGNITE-8119:

Summary: NPE on clear DB and unclear WAL/WAL_ARCHIVE
Key: IGNITE-8119
URL: https://issues.apache.org/jira/browse/IGNITE-8119
Project: Ignite
Issue Type: Bug
Components: general
Affects Versions: 2.4
Reporter: Alexander Belyak
Attachments: ClearTestP.java

1) Start a grid (1 node will be enough), activate it and populate some data
2) Stop the node and clear the db folder
3) Start the grid and activate it

Expected result: an error about an inconsistent storage configuration, whether or not the node with such a store is started.
Actual result: the exchange-worker on the node stops with an NPE; this can prevent the whole cluster from completing any PME operations.
{noformat}
Failed to reinitialize local partitions (preloading will be stopped): GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=1, minorTopVer=1], ...
java.lang.NullPointerException
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.applyUpdate(GridCacheDatabaseSharedManager.java:2354)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.applyLastUpdates(GridCacheDatabaseSharedManager.java:2099)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restoreState(GridCacheDatabaseSharedManager.java:1325)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.beforeExchange(GridCacheDatabaseSharedManager.java:1113)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.distributedExchange(GridDhtPartitionsExchangeFuture.java:1063)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:661)
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2329)
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
    at java.lang.Thread.run(Thread.java:748)
{noformat}
[jira] [Created] (IGNITE-8105) Close() while auto activate
Alexander Belyak created IGNITE-8105:

Summary: Close() while auto activate
Key: IGNITE-8105
URL: https://issues.apache.org/jira/browse/IGNITE-8105
Project: Ignite
Issue Type: Bug
Components: general
Affects Versions: 2.4
Reporter: Alexander Belyak

1) Start one node, activate it, fill some data
2) Close the node
3) Start the node and, right after start, call close()

Expected result: the node starts and closes correctly (maybe with auto activate/deactivate, maybe without).
Actual result: the node starts and throws java.nio.channels.ClosedByInterruptException during activation, because the close() call closes the checkpoint file channel.

The exception is:
{noformat}
[2018-04-02 19:57:27,831][ERROR][exchange-worker-#94%srv1%][GridCachePartitionExchangeManager] Failed to wait for completion of partition map exchange (preloading will not start): GridDhtPartitionsExchangeFuture [firstDiscoEvt=DiscoveryCustomEvent [customMsg=null, affTopVer=AffinityTopologyVersion [topVer=1, minorTopVer=1], super=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=a9fa729f-613f-496b-8e7c-e53142817226, addrs=[0:0:0:0:0:0:0:1%lo, 10.0.3.1, 10.38.184.66, 10.42.1.107, 127.0.0.1, 172.17.0.1], sockAddrs=[/10.38.184.66:47500, /172.17.0.1:47500, /0:0:0:0:0:0:0:1%lo:47500, /10.0.3.1:47500, /10.42.1.107:47500, /127.0.0.1:47500], discPort=47500, order=1, intOrder=1, lastExchangeTime=1522673846477, loc=true, ver=2.4.0#19700101-sha1:, isClient=false], topVer=1, nodeId8=a9fa729f, msg=null, type=DISCOVERY_CUSTOM_EVT, tstamp=1522673847732]], crd=TcpDiscoveryNode [id=a9fa729f-613f-496b-8e7c-e53142817226, addrs=[0:0:0:0:0:0:0:1%lo, 10.0.3.1, 10.38.184.66, 10.42.1.107, 127.0.0.1, 172.17.0.1], sockAddrs=[/10.38.184.66:47500, /172.17.0.1:47500, /0:0:0:0:0:0:0:1%lo:47500, /10.0.3.1:47500, /10.42.1.107:47500, /127.0.0.1:47500], discPort=47500, order=1, intOrder=1, lastExchangeTime=1522673846477, loc=true, ver=2.4.0#19700101-sha1:, isClient=false], exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=1, minorTopVer=1],
discoEvt=DiscoveryCustomEvent [customMsg=null, affTopVer=AffinityTopologyVersion [topVer=1, minorTopVer=1], super=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=a9fa729f-613f-496b-8e7c-e53142817226, addrs=[0:0:0:0:0:0:0:1%lo, 10.0.3.1, 10.38.184.66, 10.42.1.107, 127.0.0.1, 172.17.0.1], sockAddrs=[/10.38.184.66:47500, /172.17.0.1:47500, /0:0:0:0:0:0:0:1%lo:47500, /10.0.3.1:47500, /10.42.1.107:47500, /127.0.0.1:47500], discPort=47500, order=1, intOrder=1, lastExchangeTime=1522673846477, loc=true, ver=2.4.0#19700101-sha1:, isClient=false], topVer=1, nodeId8=a9fa729f, msg=null, type=DISCOVERY_CUSTOM_EVT, tstamp=1522673847732]], nodeId=a9fa729f, evt=DISCOVERY_CUSTOM_EVT], added=true, initFut=GridFutureAdapter [ignoreInterrupts=false, state=DONE, res=false, hash=791289709], init=false, lastVer=null, partReleaseFut=PartitionReleaseFuture [topVer=AffinityTopologyVersion [topVer=1, minorTopVer=1], futures=[ExplicitLockReleaseFuture [topVer=AffinityTopologyVersion [topVer=1, minorTopVer=1], futures=[]], TxReleaseFuture [topVer=AffinityTopologyVersion [topVer=1, minorTopVer=1], futures=[]], AtomicUpdateReleaseFuture [topVer=AffinityTopologyVersion [topVer=1, minorTopVer=1], futures=[]], DataStreamerReleaseFuture [topVer=AffinityTopologyVersion [topVer=1, minorTopVer=1], futures=[, exchActions=null, affChangeMsg=null, initTs=1522673847742, centralizedAff=false, changeGlobalStateE=class o.a.i.IgniteCheckedException: Failed to read checkpoint pointer from marker file: /tmp/test/srv1/db/cons_srv1/cp/1522673842894-84776dc9-6fac-4aa0-804c-f56cbee68c12-START.bin, done=true, state=CRD, evtLatch=0, remaining=[], super=GridFutureAdapter [ignoreInterrupts=false, state=DONE, res=class o.a.i.IgniteCheckedException: Failed to read checkpoint pointer from marker file: /tmp/test/srv1/db/cons_srv1/cp/1522673842894-84776dc9-6fac-4aa0-804c-f56cbee68c12-START.bin, hash=1311860231]] class org.apache.ignite.IgniteCheckedException: Failed to read checkpoint pointer from marker file: 
/tmp/test/srv1/db/cons_srv1/cp/1522673842894-84776dc9-6fac-4aa0-804c-f56cbee68c12-START.bin
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readPointer(GridCacheDatabaseSharedManager.java:1794)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readCheckpointStatus(GridCacheDatabaseSharedManager.java:1764)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restoreState(GridCacheDatabaseSharedManager.java:1321)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.beforeExchange(GridCacheDatabaseSharedManager.java:1114)
[jira] [Created] (IGNITE-8103) Node with BLT is not allowed to join cluster without one
Alexander Belyak created IGNITE-8103:

Summary: Node with BLT is not allowed to join cluster without one
Key: IGNITE-8103
URL: https://issues.apache.org/jira/browse/IGNITE-8103
Project: Ignite
Issue Type: Improvement
Components: general
Affects Versions: 2.4
Reporter: Alexander Belyak

1) Start a cluster of 2-3 nodes, activate it, fill some data
2) Stop the cluster, clear the LFS on the first node
3) Start the cluster from the first node (or start all nodes synchronously)

Expected result: ?
Actual result: "Node with set up BaselineTopology is not allowed to join cluster without one: cons_srv2"

From a technical point of view this is expected behaviour: the first node, with cleared storage, becomes the grid coordinator and rejects any connection attempt from nodes with a different baseline. But it's bad for usability: if we always start all nodes together and want to clear the storage on one node for some reason, we have to define a start sequence.
[jira] [Created] (IGNITE-8066) Reset wal segment idx
Alexander Belyak created IGNITE-8066:

Summary: Reset wal segment idx
Key: IGNITE-8066
URL: https://issues.apache.org/jira/browse/IGNITE-8066
Project: Ignite
Issue Type: New Feature
Components: general
Affects Versions: 2.4
Reporter: Alexander Belyak

1) On activation the grid reads checkpoint status with segment idx=7742:
{noformat}
2018-03-21 02:34:04.465[INFO ][exchange-worker-#152%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture] Successfully activated caches [nodeId=9c0c2e76-fb7f-46df-8b0b-3379d0c91db9, client=false, topVer=AffinityTopologyVersion [topVer=161, minorTopVer=1]]
2018-03-21 02:34:04.479[INFO ][exchange-worker-#152%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture] Finished waiting for partition release future [topVer=AffinityTopologyVersion [topVer=161, minorTopVer=1], waitTime=0ms, futInfo=NA]
2018-03-21 02:34:04.487[INFO ][exchange-worker-#152%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.p.GridCacheDatabaseSharedManager] Read checkpoint status [startMarker=/gridgain/ssd/data/10_126_1_172_47500/cp/1521587060132-aafbf88b-f783-40e8-8e3c-ef60cd383e21-START.bin, endMarker=/gridgain/ssd/data/10_126_1_172_47500/cp/1521587060132-aafbf88b-f783-40e8-8e3c-ef60cd383e21-END.bin]
2018-03-21 02:34:04.488[INFO ][exchange-worker-#152%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.p.GridCacheDatabaseSharedManager] Applying lost cache updates since last checkpoint record [lastMarked=FileWALPointer [idx=7742, fileOff=1041057120, len=1470746], lastCheckpointId=aafbf88b-f783-40e8-8e3c-ef60cd383e21]
{noformat}
2) but right after that (with only two metrics messages in the log in between) it writes a checkpoint with WAL segment idx=0:
{noformat}
2018-03-21 02:35:21.875[INFO ][exchange-worker-#152%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.p.GridCacheDatabaseSharedManager] Finished applying WAL changes [updatesApplied=0, time=77388ms]
2018-03-21 02:35:22.386[INFO ][db-checkpoint-thread-#243%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.p.GridCacheDatabaseSharedManager] Checkpoint started [checkpointId=8cf946e6-a718-4388-8bef-c76bf79d93cd, startPtr=FileWALPointer [idx=0, fileOff=77196029, len=450864], checkpointLockWait=0ms, checkpointLockHoldTime=422ms, pages=16379, reason='node started']
2018-03-21 02:35:25.934[INFO ][db-checkpoint-thread-#243%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.p.GridCacheDatabaseSharedManager] Checkpoint finished [cpId=8cf946e6-a718-4388-8bef-c76bf79d93cd, pages=16379, markPos=FileWALPointer [idx=0, fileOff=77196029, len=450864], walSegmentsCleared=0, markDuration=508ms, pagesWrite=155ms, fsync=3391ms, total=4054ms]
{noformat}
Then we get an AssertionError while trying to archive WAL segment 0 when lastArchivedIdx=7742.
[jira] [Created] (IGNITE-7995) Assertion on GridDhtPartitionDemandMessage
Alexander Belyak created IGNITE-7995:

Summary: Assertion on GridDhtPartitionDemandMessage
Key: IGNITE-7995
URL: https://issues.apache.org/jira/browse/IGNITE-7995
Project: Ignite
Issue Type: New Feature
Components: general
Affects Versions: 2.4
Reporter: Alexander Belyak

After applying a new baseline topology we get:
{noformat}
Failed processing message [sender=..., msg=GridDhtPartitionDemandMessage [updateSeq=10524, timeout=1, workerId=-1, topVer=AffinityTopologyVersion [topVer=170, minorTopVer=1], partCnt=1, super=GridCacheGroupIdMessage [grpId=-1029020343]]]
java.lang.AssertionError: partCntr=5338946, reservations=Map []
{noformat}
from GridCacheOffheapManager.rebalanceIterator:704.
[jira] [Created] (IGNITE-7951) Add metrics for remains to evict keys/partitions
Alexander Belyak created IGNITE-7951:

Summary: Add metrics for remains to evict keys/partitions
Key: IGNITE-7951
URL: https://issues.apache.org/jira/browse/IGNITE-7951
Project: Ignite
Issue Type: New Feature
Components: general
Affects Versions: 2.4
Reporter: Alexander Belyak

We need to add metrics for the number of keys/partitions remaining to be evicted, to indicate the total amount of eviction work. In some cases eviction is synchronous, and it's critically important to know how many keys still need to be evicted before the exchange process ends and the cluster becomes operational again. In other cases we just want to know what is happening in the cluster right now (background eviction without workload) and when the cluster will become 100% healthy.
[jira] [Created] (IGNITE-7892) Remove acquisition of any locks from toString methods
Alexander Belyak created IGNITE-7892:

Summary: Remove acquisition of any locks from toString methods
Key: IGNITE-7892
URL: https://issues.apache.org/jira/browse/IGNITE-7892
Project: Ignite
Issue Type: Wish
Components: general
Affects Versions: 2.4
Reporter: Alexander Belyak

In org.apache.ignite.internal.processors.cache.GridCacheMapEntry we have a thread-safe toString() method that can lead to hangs of monitoring threads such as grid-timeout-worker if they try to dump LongRunningOperations while an entry is locked. I think toString() methods never need to be thread-safe and may throw ConcurrentModificationException or print inconsistent data, so we should remove synchronization from every toString() method in the codebase. If some "consistent" string representation is needed, let's add consistentToString() methods or use external synchronization.
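A minimal sketch of the proposal (the class and method names here are hypothetical, not from the Ignite codebase): the plain toString() never takes a lock and accepts possibly inconsistent output, while callers that need a consistent view opt in via a separate, explicitly synchronized method.

```java
// Hypothetical sketch: lock-free toString() plus an opt-in consistent variant.
public class EntrySketch {
    private volatile Object key = "k";
    private volatile Object val = "v";

    // Never blocks: may observe key and val from different moments in time.
    @Override public String toString() {
        return "Entry [key=" + key + ", val=" + val + "]";
    }

    // Opt-in consistency: only callers who ask for it pay for the lock
    // (mutators would need to synchronize on the same monitor).
    public synchronized String consistentToString() {
        return toString();
    }

    public static void main(String[] args) {
        System.out.println(new EntrySketch());
    }
}
```

The point of the split is that monitoring threads dumping state always use the non-blocking variant and can never hang on an entry lock.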
[jira] [Created] (IGNITE-7776) Check calculated values in javadoc
Alexander Belyak created IGNITE-7776:

Summary: Check calculated values in javadoc
Key: IGNITE-7776
URL: https://issues.apache.org/jira/browse/IGNITE-7776
Project: Ignite
Issue Type: Bug
Components: documentation
Affects Versions: 2.3, 2.2, 2.1, 2.0
Reporter: Alexander Belyak
Assignee: Alexander Belyak

We have two issues with calculated values in javadoc:
1) wrong numbers, for example: #\{5 * 1024 * 102 * 1024}
2) int overflow, for example: #\{5 * 1024 * 1024 * 1024}
Need to check as many places as possible.
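The second problem can be reproduced in a few lines of plain Java: every factor is an int, so the whole constant expression is evaluated in 32-bit arithmetic and silently wraps; suffixing one factor with L keeps the math in long.

```java
// Demonstrates the int-overflow issue from the report: 5 * 1024 * 1024 * 1024
// wraps around in int arithmetic, while 5L * ... stays correct in long.
public class OverflowDemo {
    public static void main(String[] args) {
        int overflowed = 5 * 1024 * 1024 * 1024;  // int math: wraps around
        long correct   = 5L * 1024 * 1024 * 1024; // long math: 5 GiB

        System.out.println(overflowed); // 1073741824, not 5368709120
        System.out.println(correct);    // 5368709120
    }
}
```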
[jira] [Created] (IGNITE-7765) walSegmentSize can be negative in config
Alexander Belyak created IGNITE-7765:

Summary: walSegmentSize can be negative in config
Key: IGNITE-7765
URL: https://issues.apache.org/jira/browse/IGNITE-7765
Project: Ignite
Issue Type: Bug
Components: general
Affects Versions: 2.2
Reporter: Alexander Belyak

The grid uses the default (64 MB) DataStorageConfiguration.walSegmentSize without any warning if a negative value is specified, for example if the XML specifies something that overflows int.
[jira] [Created] (IGNITE-7760) Handle FS hangs
Alexander Belyak created IGNITE-7760:

Summary: Handle FS hangs
Key: IGNITE-7760
URL: https://issues.apache.org/jira/browse/IGNITE-7760
Project: Ignite
Issue Type: Improvement
Components: general
Affects Versions: 2.2, 2.1, 2.0, 1.9, 1.8, 1.7, 1.6
Reporter: Alexander Belyak

We need to handle hangs of FS operations, for example copying WAL into the WAL archive (especially if the WAL archive is mounted as a network file system volume).
[jira] [Created] (IGNITE-7684) Ignore IGNITE_USE_ASYNC_FILE_IO_FACTORY in FileWriteAheadLogManager
Alexander Belyak created IGNITE-7684:

Summary: Ignore IGNITE_USE_ASYNC_FILE_IO_FACTORY in FileWriteAheadLogManager
Key: IGNITE-7684
URL: https://issues.apache.org/jira/browse/IGNITE-7684
Project: Ignite
Issue Type: Improvement
Components: general
Affects Versions: 2.4
Reporter: Alexander Belyak

If IGNITE_USE_ASYNC_FILE_IO_FACTORY is specified and IGNITE_WAL_MMAP is not, we get:
{noformat}
java.lang.UnsupportedOperationException: AsynchronousFileChannel doesn't support mmap.
    at org.apache.ignite.internal.processors.cache.persistence.file.AsyncFileIO.map(AsyncFileIO.java:173)
    at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.restoreWriteHandle(FileWriteAheadLogManager.java:1068)
    at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.resumeLogging(FileWriteAheadLogManager.java:552)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readCheckpointAndRestoreMemory(GridCacheDatabaseSharedManager.java:714)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onClusterStateChangeRequest(GridDhtPartitionsExchangeFuture.java:841)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:595)
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2329)
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
    at java.lang.Thread.run(Thread.java:748)
{noformat}
[jira] [Created] (IGNITE-7608) Sort keys in putAll/removeAll methods
Alexander Belyak created IGNITE-7608:

Summary: Sort keys in putAll/removeAll methods
Key: IGNITE-7608
URL: https://issues.apache.org/jira/browse/IGNITE-7608
Project: Ignite
Issue Type: Improvement
Components: general
Affects Versions: 2.2, 2.1, 2.0
Environment: We need to sort keys in cache putAll/removeAll operations to avoid deadlocks there.
Reporter: Alexander Belyak
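The idea behind the improvement can be sketched in plain Java (a hypothetical illustration, not Ignite's implementation): if every batch operation acquires its per-key locks in one canonical key order, two concurrent putAll calls over overlapping key sets can never wait on each other's keys in opposite order, which removes the classic lock-ordering deadlock.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch of the proposed fix: normalize the key order of a batch operation
// before acquiring per-key locks, so concurrent putAll/removeAll calls
// always lock keys in the same (natural) order.
public class SortedBatch {
    // Hypothetical helper: returns the batch entries in sorted key order.
    static <K extends Comparable<K>, V> Map<K, V> sortForLocking(Map<K, V> batch) {
        return new TreeMap<>(batch); // TreeMap iterates keys in natural order
    }

    public static void main(String[] args) {
        Map<String, Integer> batch = Map.of("b", 2, "a", 1, "c", 3);
        // Locks would then be taken in the order a, b, c for every caller.
        System.out.println(sortForLocking(batch).keySet());
    }
}
```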
[jira] [Created] (IGNITE-7565) Remove IgniteSet from heap
Alexander Belyak created IGNITE-7565:

Summary: Remove IgniteSet from heap
Key: IGNITE-7565
URL: https://issues.apache.org/jira/browse/IGNITE-7565
Project: Ignite
Issue Type: Bug
Components: general
Affects Versions: 2.2
Reporter: Alexander Belyak

IgniteSet stores all data both in durable memory and in the Java heap. That's bad for big clusters and big sets, so we need to remove the values from the heap.
[jira] [Created] (IGNITE-7564) Document IgniteSet memory consumption
Alexander Belyak created IGNITE-7564:

Summary: Document IgniteSet memory consumption
Key: IGNITE-7564
URL: https://issues.apache.org/jira/browse/IGNITE-7564
Project: Ignite
Issue Type: Bug
Components: documentation
Affects Versions: 2.2
Reporter: Alexander Belyak

We need to document the on-heap memory consumption of IgniteSet collections (all values are stored in durable memory AND in the Java heap).
[jira] [Created] (IGNITE-7478) Too many HistoryAffinityAssignments in HistAffAssignmentsCache
Alexander Belyak created IGNITE-7478:

Summary: Too many HistoryAffinityAssignments in HistAffAssignmentsCache
Key: IGNITE-7478
URL: https://issues.apache.org/jira/browse/IGNITE-7478
Project: Ignite
Issue Type: Bug
Affects Versions: 2.4
Reporter: Alexander Belyak

Got trouble with GC: found over 26k instances of org.apache.ignite.internal.processors.affinity.HistoryAffinityAssignment, holding about 12 GB of ArrayList->Object[]->ArrayList->Object[], but can't find ClusterNode objects there!
[jira] [Created] (IGNITE-7448) destroy*() API for datastructures
Alexander Belyak created IGNITE-7448:

Summary: destroy*() API for datastructures
Key: IGNITE-7448
URL: https://issues.apache.org/jira/browse/IGNITE-7448
Project: Ignite
Issue Type: Bug
Affects Versions: 2.3, 2.2, 2.1, 2.0, 1.9, 1.8, 1.7, 1.6
Environment: In the public API we have the ignite.destroyCache(String) and ignite.destroyCaches(Collection) methods to destroy caches by name(s) and ignite.services().cancel(String) to undeploy services, but no methods for data structures like:
* destroyAtomicSequence()
* destroyAtomicLong()
* destroyAtomicReference()
* destroyAtomicStamped()
* destroyQueue()
* destroySet()
* destroySemaphore()
* destroyCountDownLatch()
Reporter: Alexander Belyak
[jira] [Created] (IGNITE-7385) Fix GridToStringBuilder
Alexander Belyak created IGNITE-7385:

Summary: Fix GridToStringBuilder
Key: IGNITE-7385
URL: https://issues.apache.org/jira/browse/IGNITE-7385
Project: Ignite
Issue Type: Bug
Reporter: Alexander Belyak
Assignee: Semen Boikov

Need to review and merge ignite-7195-hotfix.
[jira] [Created] (IGNITE-7246) MarshallerContextImpl.putAtIndex
Alexander Belyak created IGNITE-7246:

Summary: MarshallerContextImpl.putAtIndex
Key: IGNITE-7246
URL: https://issues.apache.org/jira/browse/IGNITE-7246
Project: Ignite
Issue Type: Bug
Components: general
Affects Versions: 2.4
Reporter: Alexander Belyak
Priority: Minor

1) putAtIndex in org.apache.ignite.internal.MarshallerContextImpl contains code for out-of-order insertion, but it doesn't work (it only adds to the tail of the allCaches collection). Test:
{panel}
public static void main(String[] args) {
    ArrayList<ConcurrentMap> all = new ArrayList<>();
    ConcurrentMap m0 = new ConcurrentHashMap<>();
    ConcurrentMap m1 = new ConcurrentHashMap<>();
    putAtIndex(m1, all, (byte)1, all.size());
    putAtIndex(m0, all, (byte)0, all.size());
    System.out.println(all.get(0) == m0);
    System.out.println(all.get(1) == m1);
    System.out.println(all.size());
}
{panel}
2) The Collection interface is unordered (javadoc: "Some are ordered and others unordered"), so it's better to use the List interface;
3) putAtIndex is called only from the getCacheFor(byte) method, within a synchronized block, so it can get the size of allCaches by itself.
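A hypothetical, simplified version of what an index-respecting putAtIndex should do (not the real MarshallerContextImpl code): grow the list with null padding up to the target index, then set the element there, so inserting index 1 before index 0 still lands each map at its own slot.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Simplified sketch of an index-respecting putAtIndex: insertion order
// does not matter because each cache is placed at its own index.
public class PutAtIndexDemo {
    static void putAtIndex(ConcurrentMap<Integer, String> cache,
                           List<ConcurrentMap<Integer, String>> list,
                           int idx) {
        while (list.size() <= idx)
            list.add(null);       // pad so that idx is addressable
        list.set(idx, cache);     // place the cache at its own slot
    }

    public static void main(String[] args) {
        List<ConcurrentMap<Integer, String>> all = new ArrayList<>();
        ConcurrentMap<Integer, String> m0 = new ConcurrentHashMap<>();
        ConcurrentMap<Integer, String> m1 = new ConcurrentHashMap<>();

        putAtIndex(m1, all, 1);   // insert out of order first
        putAtIndex(m0, all, 0);

        System.out.println(all.get(0) == m0); // true
        System.out.println(all.get(1) == m1); // true
        System.out.println(all.size());       // 2
    }
}
```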
[jira] [Created] (IGNITE-7146) Assertion in GridCacheTxFinishSync
Alexander Belyak created IGNITE-7146:

Summary: Assertion in GridCacheTxFinishSync
Key: IGNITE-7146
URL: https://issues.apache.org/jira/browse/IGNITE-7146
Project: Ignite
Issue Type: Bug
Affects Versions: 2.1
Reporter: Alexander Belyak

Got an assertion error in an otherwise clean log:
{noformat}
2017-12-07 17:24:10.358 [ERROR][sys-#2376%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.GridClosureProcessor] Closure execution failed with error.
java.lang.AssertionError: null
    at org.apache.ignite.internal.processors.cache.distributed.GridCacheTxFinishSync$TxFinishSync.onSend(GridCacheTxFinishSync.java:250)
    at org.apache.ignite.internal.processors.cache.distributed.GridCacheTxFinishSync$ThreadFinishSync.onSend(GridCacheTxFinishSync.java:163)
    at org.apache.ignite.internal.processors.cache.distributed.GridCacheTxFinishSync.onFinishSend(GridCacheTxFinishSync.java:70)
    at org.apache.ignite.internal.processors.cache.transactions.IgniteTxManager.beforeFinishRemote(IgniteTxManager.java:1522)
    at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxFinishFuture.finish(GridNearTxFinishFuture.java:750)
    at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxFinishFuture.finish(GridNearTxFinishFuture.java:690)
    at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxFinishFuture.finish(GridNearTxFinishFuture.java:430)
    at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.rollbackNearTxLocalAsync(GridNearTxLocal.java:3314)
    at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.access$4900(GridNearTxLocal.java:122)
    at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal$26.run(GridNearTxLocal.java:4130)
    at org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6685)
    at org.apache.ignite.internal.processors.closure.GridClosureProcessor$1.body(GridClosureProcessor.java:827)
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
2017-12-07 17:24:10.358 [ERROR][sys-#2376%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.GridClosureProcessor] Runtime error caught during grid runnable execution: GridWorker [name=closure-proc-worker, igniteInstanceName=DPL_GRID%DplGridNodeName, finished=false, hashCode=1220995949, interrupted=false, runner=sys-#2376%DPL_GRID%DplGridNodeName%]
{noformat}
[jira] [Created] (IGNITE-7130) Simplify message related code in communication
Alexander Belyak created IGNITE-7130: Summary: Simplify message related code in communication Key: IGNITE-7130 URL: https://issues.apache.org/jira/browse/IGNITE-7130 Project: Ignite Issue Type: Improvement Components: general Affects Versions: 2.1 Reporter: Alexander Belyak Priority: Minor All auto-generated code in org.apache.ignite.plugin.extensions.communication.Message implementations produced by MessageCodeGenerator should be annotated with a link to the generator. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (IGNITE-7078) *Names() API for datastructures
Alexander Belyak created IGNITE-7078: Summary: *Names() API for datastructures Key: IGNITE-7078 URL: https://issues.apache.org/jira/browse/IGNITE-7078 Project: Ignite Issue Type: Wish Components: general Affects Versions: 2.3, 2.2, 2.1, 2.0, 1.9, 1.8, 1.7, 1.6 Reporter: Alexander Belyak In the public API we have the ignite.cacheNames() method to get all cache names and ignite.services().serviceDescriptors() to get services, but there are no such methods for data structures, e.g.: * atomicSequenceNames() * atomicLongNames() * atomicReferenceNames() * atomicStampedNames() * queueNames() * setNames() * semaphoreNames() * countDownLatchNames() -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (IGNITE-7076) NPE while stopping with GridDhtLockFuture
Alexander Belyak created IGNITE-7076: Summary: NPE while stopping with GridDhtLockFuture Key: IGNITE-7076 URL: https://issues.apache.org/jira/browse/IGNITE-7076 Project: Ignite Issue Type: Bug Affects Versions: 2.1 Reporter: Alexander Belyak Priority: Minor Got an NPE after the "Stopped cache" message {noformat} 2017-11-29 08:18:20.994 [ERROR][grid-timeout-worker-#119%DPL_GRID%DplGridNodeName%][o.a.i.i.p.t.GridTimeoutProcessor] Error when executing timeout callback: LockTimeoutObject [] java.lang.NullPointerException: null at org.apache.ignite.internal.processors.cache.GridCacheContext.loadPreviousValue(GridCacheContext.java:1446) at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture.loadMissingFromStore(GridDhtLockFuture.java:1030) at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture.onComplete(GridDhtLockFuture.java:731) at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture.access$900(GridDhtLockFuture.java:82) at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture$LockTimeoutObject.onTimeout(GridDhtLockFuture.java:1133) at org.apache.ignite.internal.processors.timeout.GridTimeoutProcessor$TimeoutWorker.body(GridTimeoutProcessor.java:163) at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110) at java.lang.Thread.run(Thread.java:748) {noformat} because GridCacheContext.java:1446 tries to read the cacheCfg local variable, but cacheCfg was zeroed out while the cache was stopping. The probability of this error would be significantly lowered if GridDhtLockFuture.LockTimeoutObject.onTimeout passed the actual value of the nodeStopping flag (GridDhtLockFuture:1133) instead of a hardcoded false. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (IGNITE-6977) Wrong initial BitSet size in GridPartitionStateMap
Alexander Belyak created IGNITE-6977: Summary: Wrong initial BitSet size in GridPartitionStateMap Key: IGNITE-6977 URL: https://issues.apache.org/jira/browse/IGNITE-6977 Project: Ignite Issue Type: Bug Components: general Affects Versions: 2.1 Reporter: Alexander Belyak In the constructor of org.apache.ignite.internal.util.GridPartitionStateMap(int parts) { states = new BitSet(parts); } we initialize the BitSet with parts bits, but use BITS bits (a private static final int BITS) for each partition state. As a result the long[] inside the BitSet gets a hard-to-predict size (depending on access order it can be exactly as needed, or almost twice as big with at least one extra array copy). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
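The sizing mismatch can be observed directly with java.util.BitSet. A minimal standalone sketch (BITS = 3 is an assumed per-partition state width for illustration, not necessarily Ignite's actual constant):

```java
import java.util.BitSet;

public class BitSetSizing {
    /** Assumed bits per partition state (hypothetical value for illustration). */
    static final int BITS = 3;

    /** Size in bits that BitSet(nbits) pre-allocates (rounded up to a multiple of 64). */
    static int preallocated(int nbits) {
        return new BitSet(nbits).size();
    }

    /** Size in bits after touching the highest bit the map may actually use. */
    static int afterUse(int parts) {
        BitSet states = new BitSet(parts); // what the constructor does today
        states.set(parts * BITS - 1);      // highest bit needed for the last state
        return states.size();
    }

    public static void main(String[] args) {
        int parts = 1024;
        // Only 1024 bits are pre-allocated, but 3072 are needed, so the
        // backing long[] must grow at least once, copying the array.
        System.out.println(preallocated(parts));        // 1024
        System.out.println(afterUse(parts));            // 3072
        // Passing parts * BITS up front allocates the right size immediately.
        System.out.println(preallocated(parts * BITS)); // 3072
    }
}
```

Initializing with `new BitSet(parts * BITS)` would make the backing array size exact and avoid the extra copy.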
[jira] [Created] (IGNITE-6962) Reduce ExchangeHistory memory consumption
Alexander Belyak created IGNITE-6962: Summary: Reduce ExchangeHistory memory consumption Key: IGNITE-6962 URL: https://issues.apache.org/jira/browse/IGNITE-6962 Project: Ignite Issue Type: Bug Affects Versions: 2.1 Reporter: Alexander Belyak GridDhtPartitionExchangeManager$ExchangeFutureSet stores the huge GridDhtPartitionsFullMessage, with an IgniteDhtPartitionCountersMap2 for each cache group holding two long[partCount] arrays. If we have a big grid (100+ nodes) with a large number of cache groups and partitions in CachePartitionFullCountersMap(long[] initialUpdCntrs; long[] updCntrs;) *<2
[jira] [Created] (IGNITE-6958) Reduce FilePageStore allocation on start
Alexander Belyak created IGNITE-6958: Summary: Reduce FilePageStore allocation on start Key: IGNITE-6958 URL: https://issues.apache.org/jira/browse/IGNITE-6958 Project: Ignite Issue Type: Bug Components: general Affects Versions: 2.1 Reporter: Alexander Belyak On cache start Ignite creates a FilePageStore for every partition in the CacheGroup, even if a given partition is never assigned to the particular node. See the FilePageStoreManager.initForCache method. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (IGNITE-6898) Datastreamers can lead to OOM on server side
Alexander Belyak created IGNITE-6898: Summary: Datastreamers can lead to OOM on server side Key: IGNITE-6898 URL: https://issues.apache.org/jira/browse/IGNITE-6898 Project: Ignite Issue Type: Bug Security Level: Public (Viewable by anyone) Components: general Affects Versions: 2.1 Reporter: Alexander Belyak If a grid server node processes many data streamers at the same time (from many clients, with many cache backups and persistence, i.e. if processing takes some time), it can lead to an OutOfMemoryError in the server JVM. To fix this we can: 1) specify the buffer size in bytes instead of entries 2) use pageMemory to store streamer buffers I hit this problem on a 16 server node grid with a 45g heap each and 15 clients with 2 data streamers each, with these settings: autoFlushFrequency=0 allowOverwrite=false perNodeParallelOperations=8 perNodeBufferSize=1 Each client has a 64g heap. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (IGNITE-6866) Allocate offheap on client
Alexander Belyak created IGNITE-6866: Summary: Allocate offheap on client Key: IGNITE-6866 URL: https://issues.apache.org/jira/browse/IGNITE-6866 Project: Ignite Issue Type: Bug Security Level: Public (Viewable by anyone) Components: general Affects Versions: 2.1 Reporter: Alexander Belyak Often a client uses the same config file as a server, and Ignite starts offheap memory for the client too... but never uses it. How it happens: 1) The default memory configuration for a server is created in the IgnitionEx.initializeConfiguration() method: if (!myCfg.isClientMode() && myCfg.getMemoryConfiguration() == null) so if the ignite configuration already contains a memoryConfiguration, it stays there. 2) The IgniteCacheDatabaseSharedManager.onActivate method bails out only on: if (cctx.kernalContext().clientNode() && cctx.kernalContext().config().getMemoryConfiguration() == null) return; So if the ignite configuration contains a memory configuration, it will be allocated. Why this is not good: 1) Memory allocation consumes virtual memory (the OS doesn't really allocate memory before the first access to it), and if the overcommit_memory strategy is set to OVERCOMMIT_NEVER it can block starting a client node (maybe the first or second one) on the same host (see: /proc/sys/vm/overcommit_memory and /proc/sys/vm/overcommit_ratio) 2) In IgniteKernal.checkPhysicalRam() we use the maxSize of offheap memory and log a warning about memory overuse The only good news: often in a memory configuration only maxSize is really big, while initialSize is just about 256Mb, so each client actually allocates not that much RAM. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (IGNITE-6832) Handle IO errors while checkpointing
Alexander Belyak created IGNITE-6832: Summary: Handle IO errors while checkpointing Key: IGNITE-6832 URL: https://issues.apache.org/jira/browse/IGNITE-6832 Project: Ignite Issue Type: Bug Security Level: Public (Viewable by anyone) Affects Versions: 2.1 Reporter: Alexander Belyak If we get an IO error (like "No space left on device") during checkpointing (GridCacheDatabaseSharedManager$WriteCheckpointPages:2509), the node doesn't stop (unlike when the same error occurs while writing the WAL), and clients will get "Long running cache futures". We must stop the node in this case! Even better: add some internal health check and stop the node anyway if the check fails a few times (to be done in a separate issue). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (IGNITE-6825) Unhandled interruption in GridH2Table
Alexander Belyak created IGNITE-6825: Summary: Unhandled interruption in GridH2Table Key: IGNITE-6825 URL: https://issues.apache.org/jira/browse/IGNITE-6825 Project: Ignite Issue Type: Bug Security Level: Public (Viewable by anyone) Affects Versions: 2.1 Reporter: Alexander Belyak Priority: Blocker In the GridH2Table.lock(ses, excl, force) method we: 1) put the session into the sessions map; 2) add the lock to the H2 session locks; 3) try to lock(excl). But if the thread is interrupted in GridH2Table.lock(excl):277 while inside lock.lockInterruptibly(), the session with its lock stays alive in the GridH2Table sessions map although no lock was actually acquired; when the session later tries to unlock all acquired locks it will try to unlock this one too, and we get an exception: {noformat} [ERROR][pub-#3855%DPL_GRID%DplGridNodeName%][o.a.i.i.p.q.h.t.GridMapQueryExecutor] Failed to run map query on local node. org.apache.ignite.IgniteCheckedException: Failed to execute SQL query. at org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.executeSqlQuery(IgniteH2Indexing.java:970) at org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.executeSqlQueryWithTimer(IgniteH2Indexing.java:1029) at org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.executeSqlQueryWithTimer(IgniteH2Indexing.java:1008) at org.apache.ignite.internal.processors.query.h2.twostep.GridMapQueryExecutor.onQueryRequest0(GridMapQueryExecutor.java:660) at org.apache.ignite.internal.processors.query.h2.twostep.GridMapQueryExecutor.onQueryRequest(GridMapQueryExecutor.java:506) at org.apache.ignite.internal.processors.query.h2.twostep.GridMapQueryExecutor.onMessage(GridMapQueryExecutor.java:206) at org.apache.ignite.internal.processors.query.h2.twostep.GridReduceQueryExecutor$1.applyx(GridReduceQueryExecutor.java:145) at org.apache.ignite.internal.processors.query.h2.twostep.GridReduceQueryExecutor$1.applyx(GridReduceQueryExecutor.java:143) at org.apache.ignite.internal.util.lang.IgniteInClosure2X.apply(IgniteInClosure2X.java:38) 
at org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.send(IgniteH2Indexing.java:2066) at org.apache.ignite.internal.processors.query.h2.twostep.GridReduceQueryExecutor.send(GridReduceQueryExecutor.java:1273) at org.apache.ignite.internal.processors.query.h2.twostep.GridReduceQueryExecutor.query(GridReduceQueryExecutor.java:733) at org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing$8.iterator(IgniteH2Indexing.java:1214) at org.apache.ignite.internal.processors.cache.QueryCursorImpl.iterator(QueryCursorImpl.java:95) at org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing$9.iterator(IgniteH2Indexing.java:1256) at org.apache.ignite.internal.processors.cache.QueryCursorImpl.iterator(QueryCursorImpl.java:95) at com.sbt.dpl.gridgain.collection.dataselectors.executor.cachequery.IgniteCacheQueryExecutor.iterator(IgniteCacheQueryExecutor.java:131) at com.sbt.dpl.gridgain.collection.dataselectors.executor.cachequery.impl.SqlQueryExecutor.iterator(SqlQueryExecutor.java:58) at com.sbt.dpl.gridgain.collection.dataselectors.executor.cachequery.impl.SqlQueryExecutor.iterator(SqlQueryExecutor.java:23) at com.sbt.dpl.gridgain.collection.dataselectors.impl.H2IndexesDataSelector.binaryIterator(H2IndexesDataSelector.java:142) at com.sbt.dpl.gridgain.collection.dataselectors.AbstractDataSelector.getIterator(AbstractDataSelector.java:110) at com.sbt.dpl.gridgain.collection.dataselectors.IndexesSwitchSelectDataSelector.getIterator(IndexesSwitchSelectDataSelector.java:106) at com.sbt.dpl.gridgain.collection.base.GGAbstractCollectionWithDataSelector.iterator(GGAbstractCollectionWithDataSelector.java:390) at ru.sbt.deposit_pf_api.comparators.EntityService.findDepositByProduct(EntityService.java:846) at ru.sbt.deposit_pf_api.comparators.EntityService.findDepositByProduct(EntityService.java:807) at ru.sbt.deposit_pf_api.comparators.EntityService.getDepositByObjectInner(EntityService.java:1350) at 
ru.sbt.deposit_pf_api.comparators.EntityService.getDepositByObject(EntityService.java:1169) at ru.sbt.deposit_pf_api.comparators.EntityService.getGroupingObject(EntityService.java:1098) at ru.sbt.deposit_pf_api.comparators.UnknownClassMapFunction$FindUnknownMapFunctionPredicate.apply(UnknownClassMapFunction.java:183) at ru.sbt.deposit_pf_api.comparators.UnknownClassMapFunction$FindUnknownMapFunctionPredicate.apply(UnknownClassMapFunction.java:1) at ru.sbt.deposit_pf_api.CollectionUtils.filter(CollectionUtils.java:55) at
[jira] [Created] (IGNITE-6817) CME in GridCacheIoManager.cacheHandlers access
Alexander Belyak created IGNITE-6817: Summary: CME in GridCacheIoManager.cacheHandlers access Key: IGNITE-6817 URL: https://issues.apache.org/jira/browse/IGNITE-6817 Project: Ignite Issue Type: Bug Security Level: Public (Viewable by anyone) Components: general Affects Versions: 2.1 Reporter: Alexander Belyak Got an exception: {noformat} java.util.ConcurrentModificationException: null at java.util.HashMap$HashIterator.nextNode(HashMap.java:1437) at java.util.HashMap$EntryIterator.next(HashMap.java:1471) at java.util.HashMap$EntryIterator.next(HashMap.java:1469) at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:355) at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:304) at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:99) at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:293) at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1562) at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1190) at org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:126) at org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1097) at org.apache.ignite.internal.util.StripedExecutor$Stripe.run(StripedExecutor.java:505) at java.lang.Thread.run(Thread.java:748) {noformat} because in GridCacheIoManager.handleMessage access to GridCacheIoManager.cacheHandlers is protected by GridCacheIoManager.rw.readLock, but in GridCacheIoManager.addHandler the same collection is modified without acquiring rw.writeLock, and idxClsHandlers is just a HashMap in the GridCacheIoManager.MessageHandlers class. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
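The fix direction is simply to hold the write lock on every mutation of the shared map. A minimal sketch of the intended locking discipline (the class and field names here are illustrative, not Ignite's actual code):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Illustrative handler registry: readers and writers share one ReadWriteLock. */
public class HandlerRegistry<K, V> {
    private final Map<K, V> handlers = new HashMap<>();
    private final ReadWriteLock rw = new ReentrantReadWriteLock();

    /** Mutations must hold the write lock, otherwise concurrent iteration throws CME. */
    public void addHandler(K key, V handler) {
        rw.writeLock().lock();
        try {
            handlers.put(key, handler);
        }
        finally {
            rw.writeLock().unlock();
        }
    }

    /** Iteration under the read lock is now safe against concurrent addHandler calls. */
    public int handlerCount() {
        rw.readLock().lock();
        try {
            int cnt = 0;
            for (K key : handlers.keySet())
                cnt++;
            return cnt;
        }
        finally {
            rw.readLock().unlock();
        }
    }
}
```

A read lock only excludes writers that also take the lock; a plain HashMap mutated outside the lock (as in addHandler above before the fix) still invalidates iterators held by readers.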
[jira] [Created] (IGNITE-6798) Ignite start without WAL with no exceptions
Alexander Belyak created IGNITE-6798: Summary: Ignite start without WAL with no exceptions Key: IGNITE-6798 URL: https://issues.apache.org/jira/browse/IGNITE-6798 Project: Ignite Issue Type: Bug Security Level: Public (Viewable by anyone) Affects Versions: 2.1 Reporter: Alexander Belyak Priority: Critical Ignite starts without any WAL log files. Steps to reproduce: 1) Start a node with persistence (WAL_MODE != NONE) 2) Create a cache with some data 3) Stop the node 4) Delete the WAL 5) Start the node Expected: If the last checkpoint was finished - start with an error in the log If the last checkpoint wasn't finished - the LFS can be corrupted, so maybe we shouldn't start at all (with some message like "if you really want to start with a possibly corrupt database, just remove the last CP_start marker") Actual: Start without any errors/warnings. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (IGNITE-6797) Handle IO errors in LFS files
Alexander Belyak created IGNITE-6797: Summary: Handle IO errors in LFS files Key: IGNITE-6797 URL: https://issues.apache.org/jira/browse/IGNITE-6797 Project: Ignite Issue Type: Bug Security Level: Public (Viewable by anyone) Affects Versions: 2.1 Reporter: Alexander Belyak Priority: Minor If a thread is interrupted during an IO operation on an LFS file (for example, a page read), the JVM closes the FileChannel of that file and marks it as closed by interrupt. If the next thread tries to load any page from the closed file it gets a ClosedChannelException, but PageMemoryImpl first registers the page in the segment's FullPageIdTable loadedPages and doesn't clear it after the IO error, so a third thread will find the empty page there and throw an "Unknown page type: 0" IgniteCheckedException. To fix it we should try to restore the FileChannel after the first ClosedChannelException, and stop the node if we get any other exception or an error while reopening after a ClosedChannelException in FilePageStore. Read from closed channel exception: {noformat} at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at com.google.common.eventbus.EventSubscriber.handleEvent(EventSubscriber.java:74) at com.google.common.eventbus.EventBus.dispatch(EventBus.java:322) at com.google.common.eventbus.AsyncEventBus.access$001(AsyncEventBus.java:34) at com.google.common.eventbus.AsyncEventBus$1.run(AsyncEventBus.java:117) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: org.apache.ignite.IgniteCheckedException: Runtime failure on lookup row: org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$SearchRow@5678e76a at 
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.findOne(BPlusTree.java:1070) at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.find(IgniteCacheOffheapManagerImpl.java:1476) at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.find(GridCacheOffheapManager.java:1276) at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.read(IgniteCacheOffheapManagerImpl.java:406) at org.apache.ignite.internal.processors.cache.GridCacheAdapter.getAllAsync0(GridCacheAdapter.java:1902) at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheAdapter.getDhtAllAsync(GridDhtCacheAdapter.java:780) at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtGetSingleFuture.getAsync(GridDhtGetSingleFuture.java:360) at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtGetSingleFuture.map0(GridDhtGetSingleFuture.java:254) at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtGetSingleFuture.map(GridDhtGetSingleFuture.java:237) at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtGetSingleFuture.init(GridDhtGetSingleFuture.java:161) at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheAdapter.getDhtSingleAsync(GridDhtCacheAdapter.java:878) at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheAdapter.processNearSingleGetRequest(GridDhtCacheAdapter.java:892) at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTransactionalCacheAdapter$2.apply(GridDhtTransactionalCacheAdapter.java:131) at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTransactionalCacheAdapter$2.apply(GridDhtTransactionalCacheAdapter.java:129) at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1060) at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:579) at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:378) at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:304) at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:99) at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:293) at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1562) at
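The reopen-once strategy proposed above can be sketched as follows. This is a simplified, self-contained sketch under assumed names; Ignite's real FilePageStore read path is more involved:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedChannelException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Sketch of a page store that survives one interrupt-induced channel close. */
public class ReopeningPageStore {
    private final Path file;
    private FileChannel ch;

    public ReopeningPageStore(Path file) throws IOException {
        this.file = file;
        this.ch = FileChannel.open(file, StandardOpenOption.READ);
    }

    /** Reads bytes at the given offset, reopening the channel once if it was closed. */
    public void readPage(long off, ByteBuffer dst) throws IOException {
        try {
            ch.read(dst, off);
        }
        catch (ClosedChannelException e) {
            // First failure: assume close-by-interrupt, reopen and retry once.
            // Clear the interrupt flag so the reopened channel isn't killed too.
            Thread.interrupted();

            ch = FileChannel.open(file, StandardOpenOption.READ);

            try {
                ch.read(dst, off);
            }
            catch (ClosedChannelException again) {
                // Second failure: give up and escalate (node stop in Ignite's case).
                throw new IOException("Failed to reopen page store file: " + file, again);
            }
        }
    }
}
```

Note that an interrupted read actually throws ClosedByInterruptException (a subclass of ClosedChannelException) and leaves the interrupt status set, which is why clearing it before the retry matters in this sketch.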
[jira] [Created] (IGNITE-6759) URL not validated in http rest API
Alexander Belyak created IGNITE-6759: Summary: URL not validated in http rest API Key: IGNITE-6759 URL: https://issues.apache.org/jira/browse/IGNITE-6759 Project: Ignite Issue Type: Bug Security Level: Public (Viewable by anyone) Components: general Affects Versions: 2.2, 2.1, 2.0 Reporter: Alexander Belyak Fix For: 3.0 In the http rest API I can send: curl "http://localhost:8080/ignite?cmd=get" {"successStatus":1,"sessionToken":null,"error":"Failed to handle request: [req=CACHE_GET, err=Failed to find mandatory parameter in request: key]","response":null} and curl "http://localhost:8080/ignite2/2/2/2/2/?cmd=get" {"successStatus":1,"sessionToken":null,"error":"Failed to handle request: [req=CACHE_GET, err=Failed to find mandatory parameter in request: key]","response":null} with the same result, i.e. we don't validate the whole request URL (only the /ignite prefix is mandatory). Btw, it's a REST antipattern to use a single URL to do everything (setting fix version to 3.0 to be able to change the API): http://www.restapitutorial.com/lessons/restfulresourcenaming.html -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (IGNITE-6758) Slow memory releasing while deactivation
Alexander Belyak created IGNITE-6758: Summary: Slow memory releasing while deactivation Key: IGNITE-6758 URL: https://issues.apache.org/jira/browse/IGNITE-6758 Project: Ignite Issue Type: Bug Security Level: Public (Viewable by anyone) Components: general Affects Versions: 2.1 Reporter: Alexander Belyak Priority: Minor Fix For: 2.4 It takes about 1 minute to fill each page with 0 and release it to the page pool in PageMemoryImpl.ClearSegmentRunnable() from GridCacheDatabaseSharedManager.onCacheGroupsStopped(). When we have 100M+ pages in hundreds of GB of page cache it takes quite long for GridUnsafe.setMemory to zero them, and in the logs we get lots of "Failed to wait for partition map exchange [topVer=AffinityTopologyVersion [topVer=56, minorTopVer=1], node=3676f020-0bf0-4145-861e-689c96d7e853]. Dumping pending objects that might be the cause: " without any cause or progress indicator. So a full grid reboot takes a longer downtime, with unnecessary warnings. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (IGNITE-6750) Return "wrong command" error in http rest api
Alexander Belyak created IGNITE-6750: Summary: Return "wrong command" error in http rest api Key: IGNITE-6750 URL: https://issues.apache.org/jira/browse/IGNITE-6750 Project: Ignite Issue Type: Bug Security Level: Public (Viewable by anyone) Components: general Affects Versions: 2.2, 2.1, 2.0, 1.9 Reporter: Alexander Belyak Priority: Minor Fix For: 2.4 If I make a mistake in the command name, for example curl "http://localhost:8080/ignite?cmd=wrongcmd" I get no error message and nothing is logged in the ignite log (even in IGNITE_QUIET=false mode); only by getting the response code curl -I "http://localhost:8080/ignite?cmd=wrongcmd" HTTP/1.1 400 Bad Request Date: Wed, 25 Oct 2017 10:03:06 GMT Content-Type: application/json; charset=UTF-8 Content-Length: 0 Server: Jetty(9.2.11.v20150529) can I see something, but without the root cause. We need to: 1) return an error text curl "http://localhost:8080/ignite?cmd=wrongcmd" {"successStatus":1,"sessionToken":null,"error":"Failed to handle request: [req=UNKNOWN, err=Failed to find command: wrongcmd]","response":null} as usual: curl "http://localhost:8080/ignite?cmd=get" {"successStatus":1,"sessionToken":null,"error":"Failed to handle request: [req=CACHE_GET, err=Failed to find mandatory parameter in request: key]","response":null} 2) set the status code in the http response to 400 ( http://www.restapitutorial.com/httpstatuscodes.html ) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (IGNITE-6749) Illegal comparison in NodeOrderComparator
Alexander Belyak created IGNITE-6749: Summary: Illegal comparison in NodeOrderComparator Key: IGNITE-6749 URL: https://issues.apache.org/jira/browse/IGNITE-6749 Project: Ignite Issue Type: Bug Security Level: Public (Viewable by anyone) Components: general Affects Versions: 2.1 Reporter: Alexander Belyak Fix For: 2.4 In the org.apache.ignite.internal.cluster.NodeOrderComparator.compare method the code {panel} Object consId1 = n1.consistentId(); Object consId2 = n2.consistentId(); if (consId1 instanceof Comparable && consId2 instanceof Comparable) { return ((Comparable)consId1).compareTo(consId2); } {panel} checks only that consId1 and consId2 are Comparable, but they may not be comparable to each other. For example: String and UUID are both Comparable, but UUID.compareTo(String) throws a ClassCastException. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
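A guard that avoids the cross-type ClassCastException would compare via the raw Comparable cast only when both ids share a class. An illustrative sketch (not the actual NodeOrderComparator code; the string fallback is one possible tie-breaking choice):

```java
import java.util.UUID;

public class ConsistentIdCompare {
    /**
     * Compares two consistent ids, falling back to a string comparison when
     * the raw Comparable cast would be cross-type and throw ClassCastException.
     */
    @SuppressWarnings({"unchecked", "rawtypes"})
    public static int compareIds(Object consId1, Object consId2) {
        if (consId1 instanceof Comparable && consId2 instanceof Comparable
            && consId1.getClass() == consId2.getClass())
            return ((Comparable)consId1).compareTo(consId2);

        // Different types (e.g. String vs UUID): compare a stable string form.
        return consId1.toString().compareTo(consId2.toString());
    }

    public static void main(String[] args) {
        UUID uuid = UUID.fromString("00000000-0000-0000-0000-000000000001");

        // The raw cast would throw here: ((Comparable)uuid).compareTo("node-1")
        System.out.println(compareIds(uuid, "node-1")); // completes without an exception

        System.out.println(compareIds("a", "b") < 0); // true
    }
}
```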
[jira] [Created] (IGNITE-6616) WebConsole cache config parse
Alexander Belyak created IGNITE-6616: Summary: WebConsole cache config parse Key: IGNITE-6616 URL: https://issues.apache.org/jira/browse/IGNITE-6616 Project: Ignite Issue Type: Bug Components: wizards Affects Versions: 2.1 Reporter: Alexander Belyak Priority: Minor Fix For: 2.4 1) Go to /monitoring/dashboard 2) Press the Start cache button 3) Add (without quotes in value) 4) Press the Start button Expected result: a warning about the xml format Actual result: an "Are you sure you want to start cache with name: ?" message -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (IGNITE-6606) Web console agent download
Alexander Belyak created IGNITE-6606: Summary: Web console agent download Key: IGNITE-6606 URL: https://issues.apache.org/jira/browse/IGNITE-6606 Project: Ignite Issue Type: Improvement Components: wizards Affects Versions: 2.1 Reporter: Alexander Belyak Assignee: Alexey Kuznetsov Priority: Minor Fix For: 2.4 To connect the web console to an ignite cluster I must use the web agent, but at first it's not obvious where to get it. 1) The documentation ( https://apacheignite-tools.readme.io/docs/getting-started ) says "Ignite Web Agent zip ships with ignite-web-agent.{sh|bat} script". It's wrong. 2) On the web console cluster configuration screen I see big red buttons "Save project" and "Save and download projects", but to download the web agent I must find a small "Download agent" link at the bottom (near Feedback and the Apache Ignite logo - the wrong place). Moreover, the agent configuration contains one parameter from the cluster configuration (IGNITE_JETTY_PORT), so the download link should be cluster-wide, not web-console-wide. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (IGNITE-6604) Log exchange progress
Alexander Belyak created IGNITE-6604: Summary: Log exchange progress Key: IGNITE-6604 URL: https://issues.apache.org/jira/browse/IGNITE-6604 Project: Ignite Issue Type: Improvement Components: general Affects Versions: 2.1 Reporter: Alexander Belyak Priority: Minor Sometimes the exchange process hangs (because of errors, OOMs, deadlocks, etc.), and sometimes it requires significant time to finish (finishing eviction, long Full GC, etc.). We need some logging that shows progress, because an exchange often blocks the whole cluster and the support team wants to know what is happening and how long it will continue. The main point is to simplify troubleshooting - just grep for a standard message/logging class, for example: "Exchange progress: evicting partition " or "Exchange progress: waiting for response from nodes". -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (IGNITE-6578) Too many diagnostic: Found long running cache future
Alexander Belyak created IGNITE-6578: Summary: Too many diagnostic: Found long running cache future Key: IGNITE-6578 URL: https://issues.apache.org/jira/browse/IGNITE-6578 Project: Ignite Issue Type: Bug Components: general Affects Versions: 2.1 Reporter: Alexander Belyak Priority: Critical Got about 100Mb of the message: [WARN][grid-timeout-worker-...][o.apache.ignite.internal.diagnostic] Found long running cache future - several identical messages per millisecond! We can lose logs to rotation and can't read the logs without pre-filtering! -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (IGNITE-6559) Wrong JMX CacheClusterMetrics
Alexander Belyak created IGNITE-6559: Summary: Wrong JMX CacheClusterMetrics Key: IGNITE-6559 URL: https://issues.apache.org/jira/browse/IGNITE-6559 Project: Ignite Issue Type: Bug Reporter: Alexander Belyak In JMX org.apache.ignite.internal.processors.cache.CacheClusterMetrics I see: 1) the same values as in CacheLocalMetrics: same Size, KeySize (cluster metrics must represent cluster-wide numbers, right?) 2) zero in CacheClusterMetrics.TotalPartitionsCount (it must contain the real partition count in the cluster) and cacheConfiguration.partitions in CacheLocalMetrics.TotalPartitionsCount (it must contain the real partition count owned by the local node) 3) zero in all rebalancing* keys in CacheClusterMetrics -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (IGNITE-6544) Can't switch WalMode from LOG_ONLY/BACKGROUND to DEFAULT
Alexander Belyak created IGNITE-6544: Summary: Can't switch WalMode from LOG_ONLY/BACKGROUND to DEFAULT Key: IGNITE-6544 URL: https://issues.apache.org/jira/browse/IGNITE-6544 Project: Ignite Issue Type: Bug Affects Versions: 2.1 Reporter: Alexander Belyak Fix For: 2.4 To reproduce: 1) Start ignite with persistence with LOG_ONLY/BACKGROUND log mode 2) Stop and start with DEFAULT log mode Exception is: {noformat} Exception in thread "main" class org.apache.ignite.IgniteException: Failed to start processor: GridProcessorAdapter [] at org.apache.ignite.internal.util.IgniteUtils.convertException(IgniteUtils.java:966) at org.apache.ignite.Ignition.start(Ignition.java:325) at org.apache.ignite.examples.datagrid.CacheApiExample.main(CacheApiExample.java:59) Caused by: class org.apache.ignite.IgniteCheckedException: Failed to start processor: GridProcessorAdapter [] at org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1813) at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:931) at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:1904) at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1646) at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1074) at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:594) at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:519) at org.apache.ignite.Ignition.start(Ignition.java:322) ... 
1 more Caused by: class org.apache.ignite.IgniteCheckedException: Failed to initialize WAL log segment (WAL segment size change is not supported):/tmp/s1/wal/0_0_0_0_0_0_0_1_lo_10_0_3_1_10_42_1_107_127_0_0_1_172_17_0_1_47500/.wal at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.checkFiles(FileWriteAheadLogManager.java:1420) at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.checkOrPrepareFiles(FileWriteAheadLogManager.java:934) at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.start0(FileWriteAheadLogManager.java:274) at org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter.start(GridCacheSharedManagerAdapter.java:61) at org.apache.ignite.internal.processors.cache.GridCacheProcessor.start(GridCacheProcessor.java:614) at org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1810) ... 8 more {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (IGNITE-6477) Add cache index metric to represent index size
Alexander Belyak created IGNITE-6477: Summary: Add cache index metric to represent index size Key: IGNITE-6477 URL: https://issues.apache.org/jira/browse/IGNITE-6477 Project: Ignite Issue Type: Bug Components: general Affects Versions: 2.1, 2.0, 1.9, 1.8 Reporter: Alexander Belyak Priority: Minor Fix For: 2.2 Currently there is no way to estimate the space used by a particular cache index. Let's add a metric for it!
[jira] [Created] (IGNITE-6451) AssertionError: null in GridCacheIoManager.onSend on stop
Alexander Belyak created IGNITE-6451: Summary: AssertionError: null in GridCacheIoManager.onSend on stop Key: IGNITE-6451 URL: https://issues.apache.org/jira/browse/IGNITE-6451 Project: Ignite Issue Type: Bug Components: general Affects Versions: 1.8 Reporter: Alexander Belyak Priority: Minor If a node is stopped while a message is being sent (after GridCacheIoManager.onSend has already checked whether the grid is stopping), we get an AssertionError, for example: {noformat}
java.lang.AssertionError: null
at org.apache.ignite.internal.processors.cache.GridCacheMessage.marshalCollection(GridCacheMessage.java:481) ~[ignite-core-1.10.3.ea15-SNAPSHOT.jar:2.0.0-SNAPSHOT]
at org.apache.ignite.internal.processors.cache.query.GridCacheQueryResponse.prepareMarshal(GridCacheQueryResponse.java:134) ~[ignite-core-1.10.3.ea15-SNAPSHOT.jar:2.0.0-SNAPSHOT]
at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onSend(GridCacheIoManager.java:917) [ignite-core-1.10.3.ea15-SNAPSHOT.jar:2.0.0-SNAPSHOT]
{noformat} I think we need a more reliable approach to stopping the grid: ideally, we should stop all activity as the first step of shutdown and move to the next step only after that. Alternatively, we could add checks throughout the code, e.g. after each cctx = ctx.getCacheContext(cacheId) do if (cctx == null && ...kernalContext().isStopping()) return false; // <= handle parallel stop here to correctly cancel the operation. This is important because nobody can trust a database that prints assertion errors in its logs!
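The suggested guard can be sketched in plain Java (hypothetical names, not Ignite's actual classes): a missing context during a parallel stop is treated as a normal cancellation, and only asserted against while the node is running.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicBoolean;

public class StopAwareRegistry {
    private final AtomicBoolean stopping = new AtomicBoolean();
    private final ConcurrentMap<Integer, Object> contexts = new ConcurrentHashMap<>();

    public void register(int cacheId, Object cctx) {
        contexts.put(cacheId, cctx);
    }

    public void stop() {
        stopping.set(true); // flag first, so concurrent senders observe it
        contexts.clear();
    }

    /**
     * Returns false to cancel the operation when the context disappeared
     * because of a parallel stop, instead of failing an assertion.
     */
    public boolean onSend(int cacheId) {
        Object cctx = contexts.get(cacheId);
        if (cctx == null) {
            if (stopping.get())
                return false; // parallel stop: cancel quietly
            throw new AssertionError("Context missing while node is running");
        }
        return true; // proceed with marshalling and send
    }
}
```

The point of the pattern is that every lookup site distinguishes "stopping" from "genuinely broken" before asserting.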
[jira] [Created] (IGNITE-6062) IllegalArgumentException thrown while getHeapMemoryUsage()
Alexander Belyak created IGNITE-6062: Summary: IllegalArgumentException thrown while getHeapMemoryUsage() Key: IGNITE-6062 URL: https://issues.apache.org/jira/browse/IGNITE-6062 Project: Ignite Issue Type: Bug Components: general Affects Versions: 1.8 Reporter: Alexander Belyak Fix For: 1.8 In org.apache.ignite.internal.managers.discovery.GridDiscoveryManager we can't simply call getHeapMemoryUsage(): private static final MemoryMXBean mem = ManagementFactory.getMemoryMXBean(); mem.getHeapMemoryUsage().getCommitted(); because of https://bugs.openjdk.java.net/browse/JDK-6870537 The call should be wrapped to catch IllegalArgumentException. We also need to audit the whole codebase and use the wrapped version of getHeapMemoryUsage() everywhere. In version 2.1 this is already fixed.
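A minimal sketch of such a wrapper (hypothetical helper class, JDK stdlib only): return a sentinel when the MXBean throws, since JDK-6870537 means a pool can transiently report used > committed during a concurrent resize.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

public class SafeHeap {
    private static final MemoryMXBean MEM = ManagementFactory.getMemoryMXBean();

    /**
     * Committed heap size in bytes, or -1 when getHeapMemoryUsage() throws
     * IllegalArgumentException (JDK-6870537).
     */
    public static long committedHeap() {
        try {
            return MEM.getHeapMemoryUsage().getCommitted();
        }
        catch (IllegalArgumentException ignored) {
            return -1L; // caller treats this as "metric unavailable right now"
        }
    }
}
```

Call sites then skip or retry the metric instead of propagating the exception into discovery processing.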
[jira] [Created] (IGNITE-5755) Wrong msg: calculation of memory policy size
Alexander Belyak created IGNITE-5755: Summary: Wrong msg: calculation of memory policy size Key: IGNITE-5755 URL: https://issues.apache.org/jira/browse/IGNITE-5755 Project: Ignite Issue Type: Bug Components: general Affects Versions: 2.1 Reporter: Alexander Belyak Priority: Trivial Fix For: 2.3 In PageMemoryNoStoreImpl: {noformat} throw new IgniteOutOfMemoryException("Not enough memory allocated " + "(consider increasing memory policy size or enabling evictions) " + "[policyName=" + memoryPolicyCfg.getName() + ", size=" + U.readableSize(memoryPolicyCfg.getMaxSize(), true) + "]" {noformat} the usage of U.readableSize is wrong: we should use the non-SI multiplier (1024 instead of 1000). The correct code is: {noformat} throw new IgniteOutOfMemoryException("Not enough memory allocated " + "(consider increasing memory policy size or enabling evictions) " + "[policyName=" + memoryPolicyCfg.getName() + ", size=" + U.readableSize(memoryPolicyCfg.getMaxSize(), false) + "]" {noformat}
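To illustrate the difference, here is a self-contained analogue of U.readableSize (the real Ignite helper takes the same boolean si flag; the exact formatting below is an assumption for illustration):

```java
import java.util.Locale;

public class ReadableSize {
    /** si=true uses powers of 1000 (kB, MB, ...); si=false uses powers of 1024 (KiB, MiB, ...). */
    public static String readableSize(long bytes, boolean si) {
        int unit = si ? 1000 : 1024;
        if (bytes < unit)
            return bytes + " B";
        // Largest power of the unit that fits into the value.
        int exp = (int) (Math.log(bytes) / Math.log(unit));
        String prefix = "KMGTPE".charAt(exp - 1) + (si ? "" : "i");
        return String.format(Locale.ROOT, "%.1f %sB", bytes / Math.pow(unit, exp), prefix);
    }
}
```

For a memory policy of 104857600 bytes the SI form prints "104.9 MB", while the binary form prints the expected "100.0 MiB", which is why the second argument to U.readableSize should be false here.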
[jira] [Created] (IGNITE-5733) Activate/deactivate cluster through http-rest api
Alexander Belyak created IGNITE-5733: Summary: Activate/deactivate cluster through http-rest api Key: IGNITE-5733 URL: https://issues.apache.org/jira/browse/IGNITE-5733 Project: Ignite Issue Type: Improvement Affects Versions: 2.0 Reporter: Alexander Belyak Priority: Minor Fix For: 2.1 We need to add commands to get and set the cluster active flag to the HTTP REST API.
[jira] [Created] (IGNITE-5709) Node stopped on OutOfMemoryException with persistence
Alexander Belyak created IGNITE-5709: Summary: Node stopped on OutOfMemoryException with persistence Key: IGNITE-5709 URL: https://issues.apache.org/jira/browse/IGNITE-5709 Project: Ignite Issue Type: Bug Components: persistence Affects Versions: 2.1 Reporter: Alexander Belyak Priority: Critical Under long heavy (100%) load, a node with configured persistence can stop with an "org.apache.ignite.internal.mem.OutOfMemoryException: Failed to find a page for eviction" exception. In my test it failed after 23 hours of 100% load while expiring outdated entries (via CreatedExpiryPolicy).
[jira] [Created] (IGNITE-5631) Can't write value greater than wal segment
Alexander Belyak created IGNITE-5631: Summary: Can't write value greater than wal segment Key: IGNITE-5631 URL: https://issues.apache.org/jira/browse/IGNITE-5631 Project: Ignite Issue Type: Bug Components: persistence Affects Versions: 2.1 Reporter: Alexander Belyak Priority: Minor Fix For: 2.1 Steps to reproduce: insert a value greater than the WAL segment size. Expected behavior: the value is written across a few WAL segments. Current behavior: infinite writing of the WAL archive. For the test I used a 256 KB WAL segment size and a value built from a String of 10M length.
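The repro configuration can be sketched as a Spring XML fragment (the bean and property names follow Ignite 2.1's PersistentStoreConfiguration, which is an assumption here):

```xml
<bean class="org.apache.ignite.configuration.PersistentStoreConfiguration">
    <!-- 256 KB segments as in the report; the default segment size is much larger -->
    <property name="walSegmentSize" value="262144"/>
</bean>
```

With segments this small, any multi-megabyte value is guaranteed to exceed a single segment, which makes the archive loop easy to hit.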
[jira] [Created] (IGNITE-5445) ServerImpl can't process NodeFailedMessage about itself
Alexander Belyak created IGNITE-5445: Summary: ServerImpl can't process NodeFailedMessage about itself Key: IGNITE-5445 URL: https://issues.apache.org/jira/browse/IGNITE-5445 Project: Ignite Issue Type: Bug Components: general Affects Versions: 1.9 Reporter: Alexander Belyak Priority: Minor Fix For: 2.1 If for some reason (a GC pause or heavy load) a node gets a NodeLeft(FAILED) message about itself, it can't handle it correctly, because it calls TcpDiscoveryNodesRing.removeNode with the local node id and gets an assertion error. I think the node should correctly detect such an event and fire something like a "segmented" event instead.
[jira] [Created] (IGNITE-5390) Bug in IgniteCacheTxStoreSessionWriteBehindCoalescingTest
Alexander Belyak created IGNITE-5390: Summary: Bug in IgniteCacheTxStoreSessionWriteBehindCoalescingTest Key: IGNITE-5390 URL: https://issues.apache.org/jira/browse/IGNITE-5390 Project: Ignite Issue Type: Bug Environment: 1.9.3 Reporter: Alexander Belyak Assignee: Alexander Belyak Priority: Trivial IgniteCacheTxStoreSessionWriteBehindCoalescingTest overrides the cacheConfiguration method from IgniteCacheStoreSessionWriteBehindAbstractTest to switch TestStore to TestNonCoalescingStore. But IgniteCacheStoreSessionWriteBehindAbstractTest.getConfiguration explicitly sets cacheStoreFactory to TestStore for ccfg1. This needs to be removed.
[jira] [Created] (IGNITE-5184) Collect write behind batch with out of order
Alexander Belyak created IGNITE-5184: Summary: Collect write behind batch with out of order Key: IGNITE-5184 URL: https://issues.apache.org/jira/browse/IGNITE-5184 Project: Ignite Issue Type: Improvement Reporter: Alexander Belyak Currently the write-behind flusher tries to batch cache operations only in their natural order, i.e. if the cache has "insert1, update2, delete3, insert4, delete5" operations, they will be split into 4 batch operations: 1) insert1, update2 2) delete3 3) insert4 4) delete5 It is even worse if we have two flush threads, which can pick up the operations as thread 1: 1) insert1 2) delete3 3) delete5 and thread 2: 1) update2 2) insert4, so we get 5 "batch" operations against the store. Since we already don't have a real historical order in the write-behind cache (with insert key1=1, delete key2, update key1=3 the store will get writeAll(key1=3) and then deleteAll(key2) operations), it would be better if the flusher tried to skip cache entries with a different operation type, i.e. process the first example as: 1) insert1, update2, insert4 (skip delete3 and process it later) 2) delete3, delete5 (process the delete3 operation)
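The proposed reordering can be sketched as a plain-Java model (hypothetical names; insert/update are treated as one WRITE operation type, and a later entry is only pulled forward past skipped entries when their keys do not conflict):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class BatchPlanner {
    public enum Op { WRITE, DELETE }

    public static final class Entry {
        final String key;
        final Op op;
        public Entry(String key, Op op) { this.key = key; this.op = op; }
    }

    /**
     * Greedily groups same-operation entries into one batch, deferring entries
     * of the other type. A later same-op entry joins the current batch only if
     * no skipped entry touched its key, because reordering across an operation
     * on the same key would change the outcome.
     */
    public static List<List<Entry>> plan(List<Entry> queue) {
        List<List<Entry>> batches = new ArrayList<>();
        List<Entry> pending = new ArrayList<>(queue);
        while (!pending.isEmpty()) {
            Op op = pending.get(0).op;
            List<Entry> batch = new ArrayList<>();
            List<Entry> rest = new ArrayList<>();
            Set<String> blocked = new HashSet<>(); // keys touched by skipped entries
            for (Entry e : pending) {
                if (e.op == op && !blocked.contains(e.key))
                    batch.add(e);
                else {
                    rest.add(e);
                    blocked.add(e.key);
                }
            }
            batches.add(batch);
            pending = rest;
        }
        return batches;
    }
}
```

On the example above (WRITE k1, WRITE k2, DELETE k3, WRITE k4, DELETE k5) this yields two batches, [insert1, update2, insert4] and [delete3, delete5], instead of four.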
[jira] [Created] (IGNITE-5062) Support new parameters in .Net
Alexander Belyak created IGNITE-5062: Summary: Support new parameters in .Net Key: IGNITE-5062 URL: https://issues.apache.org/jira/browse/IGNITE-5062 Project: Ignite Issue Type: Bug Reporter: Alexander Belyak Assignee: Pavel Tupitsyn We need to support the new values and remove the old ones. In TcpDiscoverySpi: remove maxMissedHeartbeats, remove maxMissedClientHeartbeats, remove heartbeatFrequency, rename hbFreq to metricsUpdateFrequency. In IgniteConfiguration: add clientFailureDetectionTimeout (a long bounded between metricsUpdateFrequency and Integer.MAX_VALUE).
[jira] [Created] (IGNITE-5060) Check configuration parameters on the Integer overflowing
Alexander Belyak created IGNITE-5060: Summary: Check configuration parameters on the Integer overflowing Key: IGNITE-5060 URL: https://issues.apache.org/jira/browse/IGNITE-5060 Project: Ignite Issue Type: Bug Reporter: Alexander Belyak Time-related configuration parameters use the long data type (with values in ms), but the standard java.net.Socket class expects an int for soTimeout, and long timeouts from the configuration are usually narrowed with a plain (int) cast, which overflows if the configured timeout > Integer.MAX_VALUE. We need to add a configuration check for: * IgniteConfiguration.failureDetectionTimeout * IgniteConfiguration.clientFailureDetectionTimeout * TcpDiscoverySpi.ackTimeout * TcpDiscoverySpi.netTimeout
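A minimal sketch of such a check (hypothetical helper; java.net.Socket#setSoTimeout does take an int in milliseconds):

```java
public class TimeoutClamp {
    /**
     * Converts a long millisecond timeout from the configuration to the int
     * expected by java.net.Socket#setSoTimeout. A plain (int) cast of a value
     * above Integer.MAX_VALUE wraps to a negative number, which Socket rejects.
     */
    public static int toSoTimeout(long timeoutMs) {
        if (timeoutMs < 0)
            throw new IllegalArgumentException("Negative timeout: " + timeoutMs);
        return timeoutMs > Integer.MAX_VALUE ? Integer.MAX_VALUE : (int) timeoutMs;
    }
}
```

Validating once at configuration time (rather than clamping at every call site) would also work; the key point is that the raw cast must not reach the socket.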
[jira] [Created] (IGNITE-5043) Support CacheConfiguration.writeBehindCoalescing in .Net
Alexander Belyak created IGNITE-5043: Summary: Support CacheConfiguration.writeBehindCoalescing in .Net Key: IGNITE-5043 URL: https://issues.apache.org/jira/browse/IGNITE-5043 Project: Ignite Issue Type: Bug Reporter: Alexander Belyak Assignee: Pavel Tupitsyn Please support the new parameter CacheConfiguration.writeBehindCoalescing in .Net.
[jira] [Created] (IGNITE-5042) Add internal ring wide msg for status check
Alexander Belyak created IGNITE-5042: Summary: Add internal ring wide msg for status check Key: IGNITE-5042 URL: https://issues.apache.org/jira/browse/IGNITE-5042 Project: Ignite Issue Type: Improvement Reporter: Alexander Belyak The Ignite cluster probably needs a special ring message for fast node status checks, because the metrics update message is currently too heavy: it requires unmarshalling/marshalling on each node as it travels the ring, and in a big cluster this can take a lot of time. The new ring status check message must work with the keep-binary approach.
[jira] [Created] (IGNITE-5015) In TcpCommunicationSpi use IgniteConfiguration.clientFailureDetectionTimeout
Alexander Belyak created IGNITE-5015: Summary: In TcpCommunicationSpi use IgniteConfiguration.clientFailureDetectionTimeout Key: IGNITE-5015 URL: https://issues.apache.org/jira/browse/IGNITE-5015 Project: Ignite Issue Type: Bug Reporter: Alexander Belyak We need to use the new IgniteConfiguration.clientFailureDetectionTimeout in TcpCommunicationSpi when interacting with client nodes.
[jira] [Created] (IGNITE-5005) WriteBehindStore - split flusher's to different classes by writeCoalescing
Alexander Belyak created IGNITE-5005: Summary: WriteBehindStore - split flusher's to different classes by writeCoalescing Key: IGNITE-5005 URL: https://issues.apache.org/jira/browse/IGNITE-5005 Project: Ignite Issue Type: Improvement Reporter: Alexander Belyak GridCacheWriteBehindStore.Flusher contains too many if statements because its behavior depends too heavily on the writeCoalescing flag. This class needs to be split into two separate classes (with one abstract base Flusher class).
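The split could look like this sketch (hypothetical names; the real Flusher is an inner class of GridCacheWriteBehindStore): the coalescing decision is made once at construction instead of being re-checked inside every method.

```java
// Abstract base: shared flushing contract, no writeCoalescing branching inside.
abstract class Flusher {
    abstract void flush();

    /** Behavior is chosen once, when the store is created. */
    static Flusher create(boolean writeCoalescing) {
        return writeCoalescing ? new CoalescingFlusher() : new NonCoalescingFlusher();
    }
}

class CoalescingFlusher extends Flusher {
    @Override void flush() {
        // merge duplicate keys before writing the batch
    }
}

class NonCoalescingFlusher extends Flusher {
    @Override void flush() {
        // preserve the per-key operation sequence
    }
}
```

Each subclass then holds only the state its strategy needs, which removes the scattered if (writeCoalescing) branches.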
[jira] [Created] (IGNITE-5004) GridCacheWriteBehindStore - remove StatefulValue
Alexander Belyak created IGNITE-5004: Summary: GridCacheWriteBehindStore - remove StatefulValue Key: IGNITE-5004 URL: https://issues.apache.org/jira/browse/IGNITE-5004 Project: Ignite Issue Type: Improvement Affects Versions: 1.9 Reporter: Alexander Belyak If writeCoalescing=false, GridCacheWriteBehindStore doesn't need to create a StatefulValue for each KV entry. We need to implement the write-behind store without this wrapper at all (if it is possible to solve the ABA problem in the cache maps) or with a thinner wrapper without unnecessary synchronization/state.
[jira] [Created] (IGNITE-5003) Parallel write same key in CacheWriteBehindStore
Alexander Belyak created IGNITE-5003: Summary: Parallel write same key in CacheWriteBehindStore Key: IGNITE-5003 URL: https://issues.apache.org/jira/browse/IGNITE-5003 Project: Ignite Issue Type: Improvement Affects Versions: 1.9 Reporter: Alexander Belyak Currently GridCacheWriteBehindStore.updateCache waits for the writeLock in StatefulValue and, moreover, calls waitForFlush() if the value is in the pending (flushing) state. We need to remove this waiting.
[jira] [Created] (IGNITE-4999) Use one thread pool to flush all CacheWriteBehindStore
Alexander Belyak created IGNITE-4999: Summary: Use one thread pool to flush all CacheWriteBehindStore Key: IGNITE-4999 URL: https://issues.apache.org/jira/browse/IGNITE-4999 Project: Ignite Issue Type: Improvement Affects Versions: 1.9 Reporter: Alexander Belyak Currently we create dedicated flusher threads for each CacheWriteBehindStore, so we can't create many caches with this mechanism (too many threads). We should use a single thread pool for all CacheWriteBehindStore instances (as is done for TTL).
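A sketch of the shared pool (stdlib only, hypothetical names): every store schedules its periodic flush task on one ScheduledExecutorService instead of owning threads, so the thread count no longer grows with the number of caches.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class SharedFlushPool {
    // One scheduler shared by every write-behind store.
    private static final ScheduledExecutorService POOL =
        Executors.newScheduledThreadPool(
            Runtime.getRuntime().availableProcessors(),
            r -> {
                Thread t = new Thread(r, "wb-flusher");
                t.setDaemon(true); // do not keep the JVM alive for flush threads
                return t;
            });

    /** Each store registers its flush task; cancelling the future detaches the store. */
    public static ScheduledFuture<?> schedule(Runnable flushTask, long periodMs) {
        return POOL.scheduleWithFixedDelay(flushTask, periodMs, periodMs, TimeUnit.MILLISECONDS);
    }
}
```

Stopping a cache becomes a future cancellation rather than a thread join, which also simplifies shutdown ordering.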
[jira] [Created] (IGNITE-4940) GridCacheWriteBehindStore loses more data than necessary
Alexander Belyak created IGNITE-4940: Summary: GridCacheWriteBehindStore loses more data than necessary Key: IGNITE-4940 URL: https://issues.apache.org/jira/browse/IGNITE-4940 Project: Ignite Issue Type: Bug Affects Versions: 1.9 Reporter: Alexander Belyak Priority: Minor Unnecessary data loss happens when the underlying store slows down or throws errors while new data is being put into the cache: 1) A writer adds a new cache entry and checks the cache size 2) If cache size > criticalSize (by default criticalSize = 1.5 * cacheSize), the writer will try to flush a single value synchronously At this point we have: N flusher threads trying to flush data in batch mode, and 1+ writer threads trying to flush single values. Both writers and flushers use the updateStore procedure, but if updateStore gets an exception from the underlying store, it checks the cache size, and if it is greater than criticalCacheSize it logs a cache overflow event and returns true (as if the data had been successfully stored). The data is then removed from the write-behind cache. Moreover, we can lose not only a single value but one or more whole batches if flusher threads get a store exception on an overflowed cache. Reproducer: {panel}
/**
 * Tests that cache would keep values if underlying store fails.
 *
 * @throws Exception If failed.
 */
private void testStoreFailure(boolean writeCoalescing) throws Exception {
    delegate.setShouldFail(true);

    initStore(2, writeCoalescing);

    Set<Integer> exp;

    try {
        Thread timer = new Thread(new Runnable() {
            @Override public void run() {
                try {
                    U.sleep(FLUSH_FREQUENCY * 2);
                }
                catch (IgniteInterruptedCheckedException e) {
                    assertTrue("Timer was interrupted", false);
                }

                delegate.setShouldFail(false);
            }
        });

        timer.start();

        exp = runPutGetRemoveMultithreaded(10, 10);

        timer.join();

        info(">>> There are " + store.getWriteBehindErrorRetryCount() + " entries in RETRY state");

        // Despite that we set shouldFail flag to false, flush thread may just have caught an exception.
        // If we move store to the stopping state right away, this value will be lost. That's why this sleep
        // is inserted here to let all exception handlers in write-behind store exit.
        U.sleep(1000);
    }
    finally {
        shutdownStore();
    }

    Map<Integer, String> map = delegate.getMap();

    Collection<Integer> extra = new HashSet<>(map.keySet());

    extra.removeAll(exp);

    assertTrue("The underlying store contains extra keys: " + extra, extra.isEmpty());

    Collection<Integer> missing = new HashSet<>(exp);

    missing.removeAll(map.keySet());

    assertTrue("Missing keys in the underlying store: " + missing, missing.isEmpty());

    for (Integer key : exp)
        assertEquals("Invalid value for key " + key, "val" + key, map.get(key));
}
{panel} Solution: test the cache size before inserting a new value, either a) with some kind of synchronization that prevents cacheSize from growing beyond criticalCacheSize (a strong restriction), or b) by removing the cache size test from updateStore; then the cache can exceed criticalCacheSize only at a single point, when there is a race on updateCache. I prefer (b) because of the lower synchronization pressure (the cache may store 1 or 2 extra elements).
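Option (b) can be sketched as a self-contained model (hypothetical names): the size check moves to admission time, a store failure keeps the value for retry instead of dropping it, and a bounded overshoot from the admission race is accepted deliberately.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class OverflowPolicy {
    private final AtomicInteger size = new AtomicInteger();
    private final int criticalSize;
    private final ConcurrentLinkedQueue<String> retryQueue = new ConcurrentLinkedQueue<>();

    public OverflowPolicy(int criticalSize) {
        this.criticalSize = criticalSize;
    }

    /**
     * Admission check done by the writer before inserting. The check-then-act
     * race can let the cache exceed criticalSize by at most one entry per
     * concurrent writer, which option (b) accepts.
     */
    public boolean tryAdd() {
        if (size.get() >= criticalSize)
            return false;
        size.incrementAndGet(); // racy by design: bounded overshoot
        return true;
    }

    /** A store failure keeps the value for retry instead of pretending it was written. */
    public void onStoreError(String key) {
        retryQueue.add(key);
    }

    public void onFlushed() {
        size.decrementAndGet();
    }

    public int retryCount() {
        return retryQueue.size();
    }
}
```

The essential change versus the reported behavior is that updateStore never returns "success" for a value the store rejected; overflow is handled only at admission.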
[jira] [Created] (IGNITE-4934) GridCacheWriteBehindStore breaks if the store backend throws a single exception
Alexander Belyak created IGNITE-4934: Summary: GridCacheWriteBehindStore breaks if the store backend throws a single exception Key: IGNITE-4934 URL: https://issues.apache.org/jira/browse/IGNITE-4934 Project: Ignite Issue Type: Bug Components: cache Affects Versions: 1.9 Reporter: Alexander Belyak Assignee: Alexey Dmitriev Priority: Critical If a flusher in GridCacheWriteBehindStore gets a runtime exception from the underlying CacheStore, it stops working entirely. All future operations with GridCacheWriteBehindStore are then performed by the writer threads (in the flushSingleValue procedure) without batching and without write-behind, and moreover with a deadlock if a writer tries to override a key in the pending state that was being processed by the broken flusher thread. Reproducer: GridCacheWriteBehindStoreMultithreadedSelfTest.testStoreFailure with exp = runPutGetRemoveMultithreaded(10, 10); changed to exp = runPutGetRemoveMultithreaded(10, 500); This test should be changed accordingly.
[jira] [Created] (IGNITE-4022) IgniteServices doesn't throw an exception if there are no server nodes
Alexander Belyak created IGNITE-4022: Summary: IgniteServices doesn't throw an exception if there are no server nodes Key: IGNITE-4022 URL: https://issues.apache.org/jira/browse/IGNITE-4022 Project: Ignite Issue Type: Bug Affects Versions: 1.8 Reporter: Alexander Belyak If you call the deployNodeSingleton method but there are no server nodes in the IgniteServices base ClusterGroup, you will never know about it and can't find the deployed service instance. We should probably print these errors to the logs as well.