keith-ratcliffe opened a new issue #1973:
URL: https://github.com/apache/accumulo/issues/1973
**Bug Description**
It appears that we've encountered the same behavior as observed in #774
while testing our rel/2.0.1 deployment in AWS. Over a couple of days of
continuous query testing, our table scans sporadically failed with the error
below, roughly once or twice per hour:
```
[scan.LookupTask] WARN : exception while doing multi-scan
java.lang.IllegalStateException: Block was evicted
    at org.apache.accumulo.core.file.blockfile.cache.lru.CachedBlock.recordSize(CachedBlock.java:159)
    at org.apache.accumulo.core.file.blockfile.cache.lru.LruBlockCache.cacheBlock(LruBlockCache.java:238)
    at org.apache.accumulo.core.file.blockfile.cache.lru.LruBlockCache.cacheBlock(LruBlockCache.java:263)
    at org.apache.accumulo.core.file.blockfile.cache.lru.SynchronousLoadingBlockCache.getBlock(SynchronousLoadingBlockCache.java:133)
    at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getMetaBlock(CachableBlockFile.java:409)
    at org.apache.accumulo.core.file.rfile.MultiLevelIndex$Reader.getIndexBlock(MultiLevelIndex.java:808)
    at org.apache.accumulo.core.file.rfile.MultiLevelIndex$Reader.access$100(MultiLevelIndex.java:585)
    at org.apache.accumulo.core.file.rfile.MultiLevelIndex$Reader$Node.lookup(MultiLevelIndex.java:628)
    at org.apache.accumulo.core.file.rfile.MultiLevelIndex$Reader$Node.access$400(MultiLevelIndex.java:591)
    at org.apache.accumulo.core.file.rfile.MultiLevelIndex$Reader.lookup(MultiLevelIndex.java:817)
    at org.apache.accumulo.core.file.rfile.RFile$LocalityGroupReader._seek(RFile.java:997)
    at org.apache.accumulo.core.file.rfile.RFile$LocalityGroupReader.seek(RFile.java:895)
    at org.apache.accumulo.core.iterators.system.LocalityGroupIterator.seek(LocalityGroupIterator.java:281)
    at org.apache.accumulo.core.file.rfile.RFile$Reader.seek(RFile.java:1437)
    at org.apache.accumulo.server.problems.ProblemReportingIterator.seek(ProblemReportingIterator.java:103)
    at org.apache.accumulo.core.iterators.system.MultiIterator.seek(MultiIterator.java:104)
    at org.apache.accumulo.core.iterators.system.StatsIterator.seek(StatsIterator.java:63)
    at org.apache.accumulo.core.iterators.system.DeletingIterator.seek(DeletingIterator.java:73)
    at org.apache.accumulo.core.iterators.ServerSkippingIterator.seek(ServerSkippingIterator.java:52)
    at org.apache.accumulo.core.iterators.system.ColumnFamilySkippingIterator.seek(ColumnFamilySkippingIterator.java:127)
    at org.apache.accumulo.core.iterators.SynchronizedServerFilter.seek(SynchronizedServerFilter.java:56)
    at org.apache.accumulo.core.iterators.WrappingIterator.seek(WrappingIterator.java:99)
    at org.apache.accumulo.core.iterators.user.VersioningIterator.seek(VersioningIterator.java:82)
    at aquery.iterators.Query.get_pageid(Query.java:453)
    at aquery.iterators.Query.next(Query.java:351)
    at org.apache.accumulo.core.iterators.system.SourceSwitchingIterator.readNext(SourceSwitchingIterator.java:174)
    at org.apache.accumulo.core.iterators.system.SourceSwitchingIterator.next(SourceSwitchingIterator.java:145)
    at org.apache.accumulo.tserver.tablet.Tablet.lookup(Tablet.java:641)
    at org.apache.accumulo.tserver.tablet.Tablet.lookup(Tablet.java:770)
    at org.apache.accumulo.tserver.scan.LookupTask.run(LookupTask.java:116)
    at org.apache.accumulo.tserver.session.ScanSession$ScanMeasurer.run(ScanSession.java:54)
    at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
    at java.lang.Thread.run(Thread.java:748)
```
**Versions:**
- **Accumulo:** 2.0.1
- **OS:** CentOS 7.7.1908
- **Java:** OpenJDK 1.8.0_272
- **Hadoop:** 3.0.0-cdh6.3.1
- **ZK:** 3.4.14
**Cluster Environment:**
- 12-node m5.2xlarge ec2 cluster; 10 tservers; 3 ZKs; Hadoop HA namenodes
- LRU cache config:
```
default | tserver.cache.data.size ........................... | 10%
site | @override ...................................... | 2G
system | @override ...................................... | 4G
default | tserver.cache.index.size .......................... | 25%
site | @override ...................................... | 2G
system | @override ...................................... | 1G
default | tserver.cache.manager.class ....................... | org.apache.accumulo.core.file.blockfile.cache.lru.LruBlockCacheManager
```
**Other relevant facts:**
- As best we can tell, the errors began to surface several hours after
we switched `tserver.cache.manager.class` (via the Accumulo shell) from
*[accumulo.ohc.OhcCacheManager](https://github.com/keith-turner/accumulo-ohc)*
back to *LruBlockCacheManager*
- Accumulo was not restarted after that config change was made in the shell
- At the time of the errors, the cluster was under medium-heavy load from
concurrent user queries/scans and live ingest
- After a recent full restart of Accumulo, the errors seem to have ceased

There is no evidence in the logs of this error having occurred prior to this
instance (going back over a month). Several hours have elapsed since the
full restart of Accumulo, with no recurrence so far.