[jira] [Commented] (IGNITE-12096) Ignite memory metrics incorrect on cache usage contraction
[ https://issues.apache.org/jira/browse/IGNITE-12096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17015067#comment-17015067 ]

Colin Cassidy commented on IGNITE-12096:

I have been looking at the Ignite code, in particular AbstractFreeList and PagesList, and debugging the occurrence of this error. I believe that I have some understanding of the problem and also a simple potential fix, but I would like to verify this.

The problem occurs because fillFactor is calculated in DataRegionMetricsImpl as fillFactor = totalAllocated - freeSpace. When entries are removed from the cache in the supplied example, the value of totalAllocated is not reduced. In Ignite 2.6, freeSpace increased to compensate, but from 2.7 this does not happen.

The reason is that the freeSpace calculation appears to deliberately exclude the REUSE_BUCKET, which is the final entry (index 255) in the bucket list.
[https://github.com/gridgain/gridgain/blob/master/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/freelist/AbstractFreeList.java#L394]

When the test is run in Ignite 2.6, cache entry removal results in a large increment to bucket 251. This brings fillFactor close to zero. From Ignite 2.7, it is the REUSE_BUCKET that is incremented, but this does not contribute to freeSpace.

The difference in behaviour appears to be caused by the following change, which appears to mark pages for recycling:
[https://github.com/gridgain/gridgain/blame/master/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/freelist/AbstractFreeList.java#L641]
[https://github.com/gridgain/gridgain/commit/47da5df328a18d0d55ba534b1af541b72df76901]

My proposed fix is to change AbstractFreeList:394 to include the REUSE_BUCKET in the freeSpace calculation, i.e.
_for (int b = BUCKETS - 1; b > 0; b--) {_
instead of
_for (int b = BUCKETS - 2; b > 0; b--) {_

I can confirm that this fixes the metrics reporting for the test, but I would like to understand the reasoning behind excluding the REUSE_BUCKET in the first place. Is there some reason why pages that are marked for recycling cannot be included in the free list? If so, is there a way we can avoid these pages being left in an apparently permanent limbo?

Do you agree with my proposed change, [~sergey-chugunov] and [~jokser]?

Regards,
Colin.

> Ignite memory metrics incorrect on cache usage contraction
> ----------------------------------------------------------
>
>                 Key: IGNITE-12096
>                 URL: https://issues.apache.org/jira/browse/IGNITE-12096
>             Project: Ignite
>          Issue Type: Bug
>          Components: cache
>    Affects Versions: 2.7
>            Reporter: Colin Cassidy
>            Priority: Critical
>
> When using the Ignite metrics API to measure available memory, the usage
> figures appear to be accurate while memory is being consumed - but when
> memory is freed the metrics do not drop. They appear to report that memory
> has not been freed up, even though it has.
> A reliable estimate of memory consumption is very important for solutions
> that don't use native persistence - as this is the only controllable way of
> avoiding a critical OOM condition.
> Reproducer below. This affects Ignite 2.7+.
> {{import org.apache.ignite.failure.NoOpFailureHandler; }}
> {{import org.junit.Test; }}
> {{public class MemoryTest2 { }}
> {{    private static final String CACHE_NAME = "cache"; }}
> {{    private static final String DEFAULT_MEMORY_REGION = "Default_Region"; }}
> {{    private static final long MEM_SIZE = 100L * 1024 * 1024; }}
> {{    @Test }}
> {{    public void testOOM() throws InterruptedException { }}
> {{        try (Ignite ignite = startIgnite("IgniteMemoryMonitorTest1")) { }}
> {{            fillDataRegion(ignite); }}
> {{            CacheConfiguration cfg = new CacheConfiguration<>(CACHE_NAME); }}
> {{            cfg.setStatisticsEnabled(true); }}
> {{            IgniteCache cache = ignite.getOrCreateCache(cfg); }}
> {{            // Clear all entries from the cache to free up memory }}
> {{            memUsed(ignite); }}
> {{            cache.clear(); }}
> {{            cache.removeAll(); }}
> {{            cache.put("Key", "Value"); }}
> {{            memUsed(ignite); }}
> {{            cache.destroy(); }}
> {{            Thread.sleep(5000); }}
> {{            // Should now report close to 0% but reports 59% still }}
> {{            memUsed(ignite); }}
> {{        } }}
> {{    } }}
> {{ }}
> {{    private Ignite startIgnite(String instanceName) { }}
> {{        IgniteConfiguration cfg =
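
The free-space accounting described in the comment above can be illustrated with a toy model. This is only a sketch, not the Ignite implementation: the class and method names, the 4096-byte page size, the per-bucket free-space step (page size divided by bucket count) and the normalisation of the fill factor to a 0..1 ratio are all assumptions made for illustration. Only the bucket count implied by index 255, the bucket 251 observation and the two loop bounds come from the comment.

```java
/** Toy model of the free-list accounting discussed above. Not Ignite code. */
final class FreeSpaceSketch {
    /** The comment places REUSE_BUCKET at index 255, the last bucket. */
    static final int BUCKETS = 256;
    static final int REUSE_BUCKET = BUCKETS - 1;

    /** Hypothetical page size, chosen only for this illustration. */
    static final int PAGE_SIZE = 4096;

    /** Assumption: a page in bucket b has roughly b * (PAGE_SIZE / BUCKETS) free bytes. */
    static final int BYTES_PER_BUCKET_STEP = PAGE_SIZE / BUCKETS;

    /** Sums free bytes over the buckets, optionally starting at the reuse bucket. */
    static long freeSpace(long[] pagesPerBucket, boolean includeReuseBucket) {
        int start = includeReuseBucket ? BUCKETS - 1 : BUCKETS - 2;
        long free = 0;
        for (int b = start; b > 0; b--)
            free += pagesPerBucket[b] * (long) b * BYTES_PER_BUCKET_STEP;
        return free;
    }

    /** Assumption: fill factor expressed as the used share of allocated bytes. */
    static float fillFactor(long totalAllocated, long freeSpace) {
        return totalAllocated == 0 ? 0f : (float) (totalAllocated - freeSpace) / totalAllocated;
    }

    public static void main(String[] args) {
        long allocatedPages = 10_000; // hypothetical; unchanged by entry removal
        long totalAllocated = allocatedPages * PAGE_SIZE;

        // 2.6-like case: emptied pages land in bucket 251, which is counted.
        long[] bucket251 = new long[BUCKETS];
        bucket251[251] = allocatedPages;

        // 2.7-like case: emptied pages land in the reuse bucket.
        long[] reuse = new long[BUCKETS];
        reuse[REUSE_BUCKET] = allocatedPages;

        System.out.println("bucket 251, reuse bucket excluded: "
            + fillFactor(totalAllocated, freeSpace(bucket251, false)));
        System.out.println("reuse bucket, excluded from sum:   "
            + fillFactor(totalAllocated, freeSpace(reuse, false)));
        System.out.println("reuse bucket, included in sum:     "
            + fillFactor(totalAllocated, freeSpace(reuse, true)));
    }
}
```

In this model, pages parked in the reuse bucket leave the fill factor at its maximum when the loop starts at BUCKETS - 2, and bring it close to zero when the loop starts at BUCKETS - 1, which matches the behaviour reported in the comment.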
[jira] [Commented] (IGNITE-12096) Ignite memory metrics incorrect on cache usage contraction
[ https://issues.apache.org/jira/browse/IGNITE-12096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914570#comment-16914570 ]

Colin Cassidy commented on IGNITE-12096:

getOffheapUsedSize seems to be equivalent to getTotalAllocatedPages() * getPageSize(). That is to say, it doesn't take account of the fill factor either, so the value doesn't come down on cache purge, even before 2.7.

I'm happy to multiply by the fill factor, but the problem seems to be that the fill factor behaves differently from 2.7 onwards.

Thanks for the link to the discussion - I'll take a look.

> Ignite memory metrics incorrect on cache usage contraction
> ----------------------------------------------------------
>
>                 Key: IGNITE-12096
>                 URL: https://issues.apache.org/jira/browse/IGNITE-12096
>             Project: Ignite
>          Issue Type: Bug
>          Components: cache
>    Affects Versions: 2.7
>            Reporter: Colin Cassidy
>            Priority: Critical
>
> When using the Ignite metrics API to measure available memory, the usage
> figures appear to be accurate while memory is being consumed - but when
> memory is freed the metrics do not drop. They appear to report that memory
> has not been freed up, even though it has.
> Reproducer below. This affects Ignite 2.7+.
> {{import org.apache.ignite.DataRegionMetrics; }}
> {{import org.apache.ignite.Ignite; }}
> {{import org.apache.ignite.IgniteCache; }}
> {{import org.apache.ignite.Ignition; }}
> {{import org.apache.ignite.configuration.CacheConfiguration; }}
> {{import org.apache.ignite.configuration.DataRegionConfiguration; }}
> {{import org.apache.ignite.configuration.DataStorageConfiguration; }}
> {{import org.apache.ignite.configuration.IgniteConfiguration; }}
> {{import org.apache.ignite.failure.NoOpFailureHandler; }}
> {{import org.junit.Test; }}
> {{ }}
> {{public class MemoryTest2 { }}
> {{    private static final String CACHE_NAME = "cache"; }}
> {{    private static final String DEFAULT_MEMORY_REGION = "Default_Region"; }}
> {{    private static final long MEM_SIZE = 100L * 1024 * 1024; }}
> {{ }}
> {{    @Test }}
> {{    public void testOOM() throws InterruptedException { }}
> {{        try (Ignite ignite = startIgnite("IgniteMemoryMonitorTest1")) { }}
> {{            fillDataRegion(ignite); }}
> {{            CacheConfiguration cfg = new CacheConfiguration<>(CACHE_NAME); }}
> {{            cfg.setStatisticsEnabled(true); }}
> {{            IgniteCache cache = ignite.getOrCreateCache(cfg); }}
> {{            // Clear all entries from the cache to free up memory }}
> {{            memUsed(ignite); }}
> {{            cache.clear(); }}
> {{            cache.removeAll(); }}
> {{            cache.put("Key", "Value"); }}
> {{            memUsed(ignite); }}
> {{            cache.destroy(); }}
> {{            Thread.sleep(5000); }}
> {{            // Should now report close to 0% but reports 59% still }}
> {{            memUsed(ignite); }}
> {{        } }}
> {{    } }}
> {{ }}
> {{    private Ignite startIgnite(String instanceName) { }}
> {{        IgniteConfiguration cfg = new IgniteConfiguration(); }}
> {{        cfg.setIgniteInstanceName(instanceName); }}
> {{        cfg.setDataStorageConfiguration(createDataStorageConfiguration()); }}
> {{        cfg.setFailureHandler(new NoOpFailureHandler()); }}
> {{        return Ignition.start(cfg); }}
> {{    } }}
> {{ }}
> {{    private DataStorageConfiguration createDataStorageConfiguration() { }}
> {{        return new DataStorageConfiguration() }}
> {{            .setDefaultDataRegionConfiguration( }}
> {{                new DataRegionConfiguration() }}
> {{                    .setName(DEFAULT_MEMORY_REGION) }}
> {{                    .setInitialSize(MEM_SIZE) }}
> {{                    .setMaxSize(MEM_SIZE) }}
> {{                    .setMetricsEnabled(true)); }}
> {{    } }}
> {{ }}
> {{    private void fillDataRegion(Ignite ignite) { }}
> {{        byte[] megabyte = new byte[1024 * 1024]; }}
> {{        IgniteCache cache = ignite.getOrCreateCache(CACHE_NAME); }}
> {{        for (int i = 0; i < 50; i++) { }}
> {{            cache.put(i, megabyte); }}
> {{            memUsed(ignite); }}
> {{        } }}
> {{    } }}
> {{ }}
> {{    private void memUsed(Ignite ignite) { }}
> {{        DataRegionConfiguration defaultDataRegionCfg = ignite.configuration() }}
> {{            .getDataStorageConfiguration() }}
> {{            .getDefaultDataRegionConfiguration(); }}
> {{        String regionName = defaultDataRegionCfg.getName(); }}
> {{        DataRegionMetrics metrics = ignite.dataRegionMetrics(regionName); }}
> {{        float usedMem = metrics.getPagesFillFactor() * }}
> {{            metrics.getTotalAllocatedPages() * metrics.getPageSize(); }}
> {{        float pctUsed = 100 * usedMem / defaultDataRegionCfg.getMaxSize(); }}
> {{        System.out.println("Memory used: " + pctUsed + "%"); }}
> {{    } }}
> {{} }}

--
This message was sent by Atlassian Jira
(v8.3.2#803003)
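
The arithmetic behind memUsed() in the reproducer, and the comment's point that allocated pages multiplied by page size ignores the fill factor, can be isolated in a small sketch with no Ignite dependency. The class and method names are hypothetical, and the page size and allocated-page count are made-up inputs; only the 100 MiB region size and the formula come from the reproducer.

```java
/** Pure-arithmetic sketch of the used-memory estimate from the reproducer. Not Ignite code. */
final class MemoryUsageSketch {
    /** fillFactor * totalAllocatedPages * pageSize, as a percentage of the region's maximum size. */
    static double usedPercent(float fillFactor, long totalAllocatedPages, int pageSize, long maxSize) {
        double usedBytes = (double) fillFactor * totalAllocatedPages * pageSize;
        return 100.0 * usedBytes / maxSize;
    }

    /** Allocated bytes with no fill factor applied; this never drops while pages stay allocated. */
    static long allocatedBytes(long totalAllocatedPages, int pageSize) {
        return totalAllocatedPages * pageSize;
    }

    public static void main(String[] args) {
        long maxSize = 100L * 1024 * 1024; // MEM_SIZE from the reproducer
        int pageSize = 4096;               // hypothetical page size
        long allocatedPages = 15_000;      // hypothetical; stays constant after entries are removed

        // With constant allocated pages, the estimate can only fall if the fill factor falls.
        System.out.println("allocated bytes:         " + allocatedBytes(allocatedPages, pageSize));
        System.out.println("used %, fill factor 1.0: " + usedPercent(1.0f, allocatedPages, pageSize, maxSize));
        System.out.println("used %, fill factor 0.0: " + usedPercent(0.0f, allocatedPages, pageSize, maxSize));
    }
}
```

Because the allocated-page count does not shrink when entries are removed, the estimate depends entirely on the fill factor being updated, which is why a fill factor that stays high after a purge makes the region look full.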
[jira] [Commented] (IGNITE-12096) Ignite memory metrics incorrect on cache usage contraction
[ https://issues.apache.org/jira/browse/IGNITE-12096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914551#comment-16914551 ]

Denis Magda commented on IGNITE-12096:

Colin, how about this metric, which is designed to return the actual size of a region: DataRegionMetricsMXBean.getOffheapUsedSize? It has been available since Ignite 2.5: https://issues.apache.org/jira/browse/IGNITE-8078

Plus, we might need to restart the discussion here if the metric doesn't suit your needs: http://apache-ignite-developers.2346864.n4.nabble.com/Memory-usage-per-cache-td28470.html
[jira] [Commented] (IGNITE-12096) Ignite memory metrics incorrect on cache usage contraction
[ https://issues.apache.org/jira/browse/IGNITE-12096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914510#comment-16914510 ]

Colin Cassidy commented on IGNITE-12096:

Thanks for the response. Some observations:

* I'm using the technique recommended to me by GG support, but I am happy to be corrected.
* The memory usage calculation page recommends using DataStorageMetrics. This returns null for me even with setMetricsEnabled(true), presumably because I am not using native persistence.
* The above code worked fine up to Ignite 2.6, so I assume some change to the purging logic has caused this.
* The DataRegion allocatedSize appears not to account for the fill factor, so this value doesn't drop on cache purge even in Ignite 2.6.
* My entries are 1MB each, comfortably larger than the page size, so I expect fragmentation to be minimal.
* Memory is not reported as free even when my cache is destroyed.
* If I remove all entries from the cache and then write them back again, the memory usage stays static. It doesn't drop at any point, even if the cache was close to full. The memory must be reclaimed at some point before it is reused - or is the problem that they are overwritten and never actually purged?

Prior to 2.7, I would see the fill factor drop to near 0, indicating that the pages are still allocated but are now considered to be empty.

For many use cases, it's important to have a timely and reasonably accurate estimate of memory usage, because in a pure in-memory configuration (no native persistence) there is no other way to avoid an OOM condition. OOM is considered a critical error and causes the node to stop. Although this can be overridden, I am told that this is not a good idea.
[jira] [Commented] (IGNITE-12096) Ignite memory metrics incorrect on cache usage contraction
[ https://issues.apache.org/jira/browse/IGNITE-12096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914311#comment-16914311 ]

Denis Magda commented on IGNITE-12096:

I'm not sure that is the recommended way to calculate the used space. Memory cleaning can be deferred until the compaction process kicks off: https://apacheignite.readme.io/docs/memory-defragmentation

Try adjusting the way you do the calculation and see if there is any change: https://apacheignite.readme.io/docs/memory-metrics#section-memory-usage-calculation

But I still believe that we need to wait for the next compaction round to purge deleted entries from memory. [~DmitriyGovorukhin] does that sound correct?