[ https://issues.apache.org/jira/browse/IGNITE-7751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16396852#comment-16396852 ]
Ivan Rakov commented on IGNITE-7751:
------------------------------------

[https://github.com/apache/ignite/pull/3611]

https://ci.ignite.apache.org/viewLog.html?buildId=1135034&tab=buildResultsDiv&buildTypeId=IgniteTests24Java8_RunAll

> Pages Write Throttle mode doesn't protect from checkpoint buffer overflow
> -------------------------------------------------------------------------
>
>                 Key: IGNITE-7751
>                 URL: https://issues.apache.org/jira/browse/IGNITE-7751
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 2.3
>            Reporter: Ivan Rakov
>            Assignee: Ivan Rakov
>            Priority: Critical
>             Fix For: 2.5
>
>
> Even with write throttling enabled, the checkpoint buffer can still
> overflow. The chance of overflow increases with the number of writing
> threads. Example stack trace:
> {noformat}
> 2018-02-17 21:00:14.777 [ERROR][sys-stripe-12-#13%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.dht.GridDhtTxRemote] Commit failed.
> org.apache.ignite.internal.transactions.IgniteTxHeuristicCheckedException: Commit produced a runtime exception (all transaction entries will be invalidated): GridDhtTxRemote[id=06db48da161-00000000-07c5-23f5-0000-000000000005, concurrency=OPTIMISTIC, isolation=SERIALIZABLE, state=COMMITTING, invalidate=false, rollbackOnly=false, nodeId=da415868-d9b3-48a5-9b56-0706ae60dd3b, duration=60]
>     at org.apache.ignite.internal.processors.cache.distributed.GridDistributedTxRemoteAdapter.commitIfLocked(GridDistributedTxRemoteAdapter.java:739)
>     at org.apache.ignite.internal.processors.cache.distributed.GridDistributedTxRemoteAdapter.commitRemoteTx(GridDistributedTxRemoteAdapter.java:813)
>     at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.finish(IgniteTxHandler.java:1319)
>     at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.processDhtTxFinishRequest(IgniteTxHandler.java:1231)
>     at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.access$600(IgniteTxHandler.java:97)
>     at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$7.apply(IgniteTxHandler.java:213)
>     at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$7.apply(IgniteTxHandler.java:211)
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1060)
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:579)
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:378)
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:304)
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:99)
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:293)
>     at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1555)
>     at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1183)
>     at org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:126)
>     at org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1090)
>     at org.apache.ignite.internal.util.StripedExecutor$Stripe.run(StripedExecutor.java:499)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.ignite.IgniteException: Runtime failure on row: Row@9f0a081[ key: 4694439661580364888, val: com.sbt.bm.ucp.common.dpl.model.party.DUserInfo_DPL_PROXY [idHash=1290746929, hash=400782371, colocationKey=16678, lastChangeDate=1518890414661, userFullName=null, partition_DPL_id=6, bankInfo_DPL_id=4694439661580364888, bankInfo_DPL_colocationKey=16678, ownerId=null, infoFlowChannel_DPL_colocationKey=0, userLogin=reloading, uid=1102030258731339432, isDeleted=false, infoFlowChannel_DPL_id=0, sourceSystem_DPL_id=65, id=4694439661580364888, colocationId=1102030258828706483], ver: GridCacheVersion [topVer=130360309, order=1519034613156, nodeOrder=5] ][ 1102030258731339432, reloading, 4694439661580364888, 0, null, 65, 4694439661580364888, FALSE, 6 ]
>     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.doPut(BPlusTree.java:2102)
>     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.putx(BPlusTree.java:2049)
>     at org.apache.ignite.internal.processors.query.h2.database.H2TreeIndex.putx(H2TreeIndex.java:247)
>     at org.apache.ignite.internal.processors.query.h2.opt.GridH2Table.addToIndex(GridH2Table.java:536)
>     at org.apache.ignite.internal.processors.query.h2.opt.GridH2Table.update(GridH2Table.java:468)
>     at org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.store(IgniteH2Indexing.java:595)
>     at org.apache.ignite.internal.processors.query.GridQueryProcessor.store(GridQueryProcessor.java:1865)
>     at org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager.store(GridCacheQueryManager.java:407)
>     at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.finishUpdate(IgniteCacheOffheapManagerImpl.java:1343)
>     at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke(IgniteCacheOffheapManagerImpl.java:1207)
>     at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.invoke(GridCacheOffheapManager.java:1356)
>     at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.invoke(IgniteCacheOffheapManagerImpl.java:345)
>     at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.storeValue(GridCacheMapEntry.java:3527)
>     at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.innerSet(GridCacheMapEntry.java:1039)
>     at org.apache.ignite.internal.processors.cache.distributed.GridDistributedTxRemoteAdapter.commitIfLocked(GridDistributedTxRemoteAdapter.java:609)
>     ... 18 common frames omitted
> Caused by: org.apache.ignite.IgniteException: Failed to allocate temporary buffer for checkpoint (increase checkpointPageBufferSize configuration property)
>     at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.postWriteLockPage(PageMemoryImpl.java:1293)
>     at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.writeLockPage(PageMemoryImpl.java:1276)
>     at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.writeLock(PageMemoryImpl.java:398)
>     at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.writeLock(PageMemoryImpl.java:393)
>     at org.apache.ignite.internal.processors.cache.persistence.tree.util.PageHandler.writeLock(PageHandler.java:398)
>     at org.apache.ignite.internal.processors.cache.persistence.tree.util.PageHandler.writePage(PageHandler.java:326)
>     at org.apache.ignite.internal.processors.cache.persistence.DataStructure.write(DataStructure.java:262)
>     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.access$11100(BPlusTree.java:82)
>     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Put.tryInsert(BPlusTree.java:2922)
>     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Put.access$7600(BPlusTree.java:2610)
>     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.putDown(BPlusTree.java:2348)
>     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.putDown(BPlusTree.java:2329)
>     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.putDown(BPlusTree.java:2329)
>     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.putDown(BPlusTree.java:2329)
>     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.doPut(BPlusTree.java:2069)
>     ... 32 common frames omitted{noformat}
> The problem is that throttling based on checkpoint buffer usage is applied
> only to pages that are present in the current checkpoint:
> {noformat}
> if (isPageInCheckpoint) {
>     int checkpointBufLimit = pageMemory.checkpointBufferPagesSize() * 2 / 3;
>     shouldThrottle = pageMemory.checkpointBufferPagesCount() > checkpointBufLimit;
> }{noformat}
> On the other hand, the backoff counter is cleared whenever throttling is not
> applied, which can happen for a page that is not in the checkpoint. A write
> to such a page therefore resets the backoff accumulated for checkpoint pages:
> {noformat}
> if (shouldThrottle) {
>     int throttleLevel = exponentialBackoffCntr.getAndIncrement();
>     LockSupport.parkNanos((long)(STARTING_THROTTLE_NANOS * Math.pow(BACKOFF_RATIO, throttleLevel)));
> }
> else
>     exponentialBackoffCntr.set(0);{noformat}
> Possible solution: introduce two separate backoff counters, one for pages
> that are in the checkpoint and one for pages that are not.
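
The possible solution named in the issue description (two separate backoff counters for pages in / not in checkpoint) could look roughly like the sketch below. It is not the code from the pull request. The class name SplitBackoffThrottle, the method names, and the two constant values are assumptions made for illustration; only the constant names and the backoff formula are taken from the snippets quoted in the issue.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.LockSupport;

/** Hypothetical sketch: one exponential backoff counter per page category. */
class SplitBackoffThrottle {
    /** Placeholder values: the issue names these constants but does not state their values. */
    private static final long STARTING_THROTTLE_NANOS = 4_000;
    private static final double BACKOFF_RATIO = 1.05;

    /** Backoff level for pages that belong to the current checkpoint. */
    private final AtomicInteger inCheckpointBackoffCntr = new AtomicInteger();

    /** Backoff level for pages that are not in the current checkpoint. */
    private final AtomicInteger notInCheckpointBackoffCntr = new AtomicInteger();

    /**
     * Updates only the counter of the page's own category and returns
     * the time to park in nanoseconds (0 means no throttling).
     */
    public long nextParkNanos(boolean isPageInCheckpoint, boolean shouldThrottle) {
        AtomicInteger cntr = isPageInCheckpoint ? inCheckpointBackoffCntr : notInCheckpointBackoffCntr;

        if (!shouldThrottle) {
            // Resets this category only; the other counter keeps its level.
            cntr.set(0);

            return 0;
        }

        int throttleLevel = cntr.getAndIncrement();

        return (long)(STARTING_THROTTLE_NANOS * Math.pow(BACKOFF_RATIO, throttleLevel));
    }

    /** Parks the calling thread for the computed time, if any. */
    public void throttle(boolean isPageInCheckpoint, boolean shouldThrottle) {
        long nanos = nextParkNanos(isPageInCheckpoint, shouldThrottle);

        if (nanos > 0)
            LockSupport.parkNanos(nanos);
    }
}
```

With this split, an unthrottled write to a page outside the checkpoint no longer clears the backoff level reached by writes to checkpoint pages, so consecutive throttled writes to checkpoint pages keep escalating their park time.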