keith-turner opened a new issue, #3144: URL: https://github.com/apache/accumulo/issues/3144
**Describe the bug**

While working on #3143 and running [these tests](https://gist.github.com/keith-turner/f2159111b025e600a6e0abbaba1d92f3) I saw the following exception in the scan server that caused a scan to fail with a server-side error.

```
2022-12-29T15:03:16,443 [tserver.ScanServer] INFO : RFFS 169171 extent not found in metadata table 1;000139510<
2022-12-29T15:03:16,443 [tserver.ScanServer] ERROR: Error starting scan
org.apache.accumulo.core.tabletserver.thrift.NotServingTabletException: null
    at org.apache.accumulo.tserver.ScanServer.reserveFilesInner(ScanServer.java:503) ~[accumulo-tserver-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
    at org.apache.accumulo.tserver.ScanServer.reserveFiles(ScanServer.java:641) ~[accumulo-tserver-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
    at org.apache.accumulo.tserver.ScanServer.startMultiScan(ScanServer.java:884) ~[accumulo-tserver-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
    at jdk.internal.reflect.GeneratedMethodAccessor3.invoke(Unknown Source) ~[?:?]
    at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
    at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
    at org.apache.accumulo.core.trace.TraceUtil.lambda$wrapService$0(TraceUtil.java:202) ~[accumulo-core-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
    at com.sun.proxy.$Proxy34.startMultiScan(Unknown Source) ~[?:?]
    at org.apache.accumulo.core.tabletserver.thrift.TabletScanClientService$Processor$startMultiScan.getResult(TabletScanClientService.java:855) ~[accumulo-core-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
    at org.apache.accumulo.core.tabletserver.thrift.TabletScanClientService$Processor$startMultiScan.getResult(TabletScanClientService.java:831) ~[accumulo-core-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:40) ~[libthrift-0.17.0.jar:0.17.0]
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:40) ~[libthrift-0.17.0.jar:0.17.0]
    at org.apache.thrift.TMultiplexedProcessor.process(TMultiplexedProcessor.java:147) ~[libthrift-0.17.0.jar:0.17.0]
    at org.apache.accumulo.server.rpc.TimedProcessor.process(TimedProcessor.java:54) ~[accumulo-server-base-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
    at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:492) ~[libthrift-0.17.0.jar:0.17.0]
    at org.apache.accumulo.server.rpc.CustomNonBlockingServer$CustomFrameBuffer.invoke(CustomNonBlockingServer.java:129) ~[accumulo-server-base-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
    at org.apache.thrift.server.Invocation.run(Invocation.java:18) ~[libthrift-0.17.0.jar:0.17.0]
    at org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52) ~[accumulo-core-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
    at org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52) ~[accumulo-core-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
    at java.lang.Thread.run(Thread.java:829) ~[?:?]
2022-12-29T15:03:16,443 [thrift.ProcessFunction] ERROR: Internal error processing startMultiScan
org.apache.accumulo.core.tabletserver.thrift.NotServingTabletException: null
    at org.apache.accumulo.tserver.ScanServer.reserveFilesInner(ScanServer.java:503) ~[accumulo-tserver-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
    at org.apache.accumulo.tserver.ScanServer.reserveFiles(ScanServer.java:641) ~[accumulo-tserver-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
    at org.apache.accumulo.tserver.ScanServer.startMultiScan(ScanServer.java:884) ~[accumulo-tserver-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
    at jdk.internal.reflect.GeneratedMethodAccessor3.invoke(Unknown Source) ~[?:?]
    at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
    at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
    at org.apache.accumulo.core.trace.TraceUtil.lambda$wrapService$0(TraceUtil.java:202) ~[accumulo-core-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
    at com.sun.proxy.$Proxy34.startMultiScan(Unknown Source) ~[?:?]
    at org.apache.accumulo.core.tabletserver.thrift.TabletScanClientService$Processor$startMultiScan.getResult(TabletScanClientService.java:855) ~[accumulo-core-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
    at org.apache.accumulo.core.tabletserver.thrift.TabletScanClientService$Processor$startMultiScan.getResult(TabletScanClientService.java:831) ~[accumulo-core-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:40) ~[libthrift-0.17.0.jar:0.17.0]
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:40) ~[libthrift-0.17.0.jar:0.17.0]
    at org.apache.thrift.TMultiplexedProcessor.process(TMultiplexedProcessor.java:147) ~[libthrift-0.17.0.jar:0.17.0]
    at org.apache.accumulo.server.rpc.TimedProcessor.process(TimedProcessor.java:54) ~[accumulo-server-base-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
    at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:492) ~[libthrift-0.17.0.jar:0.17.0]
    at org.apache.accumulo.server.rpc.CustomNonBlockingServer$CustomFrameBuffer.invoke(CustomNonBlockingServer.java:129) ~[accumulo-server-base-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
    at org.apache.thrift.server.Invocation.run(Invocation.java:18) ~[libthrift-0.17.0.jar:0.17.0]
    at org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52) ~[accumulo-core-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
    at org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52) ~[accumulo-core-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
    at java.lang.Thread.run(Thread.java:829) ~[?:?]
```

Looking into this, the ScanServer does not properly handle the case where a tablet is not found in the metadata table during a batch scan. For normal scans, when a tablet is not found the Thrift RPC throws a NotServingTabletException.
Batch scans process multiple tablets in a single RPC and do not throw this exception. A batch scan RPC can process a subset of the requested tablets, so the RPC returns a list of the extents that it did not process. The scan server does not do this; it throws the NotServingTabletException, which is not declared on the startMultiScan RPC and therefore ends up looking like a server-side error to the client. This is where the [problem happens](https://github.com/apache/accumulo/blob/f7a62bf46bff438c1f77c62d6261f33fa9c0beb3/server/tserver/src/main/java/org/apache/accumulo/tserver/ScanServer.java#L884) in the scan server code. For batch scans, the scan server needs to return the list of failed tablets for the ones it could not find.

**To Reproduce**

Run the tests mentioned earlier for a while.
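
To illustrate the behavior described above, here is a minimal, self-contained sketch of a batch scan start that reports missing tablets as failures instead of throwing. It is not the Accumulo code: `BatchScanSketch`, `StartResult`, and `startBatchScan` are hypothetical names, extents are modeled as plain strings, and a `Set` stands in for the metadata table lookup.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch only; none of these types or methods are Accumulo APIs.
public class BatchScanSketch {

  /** Outcome of starting a batch scan: extents that will be scanned and extents to retry. */
  public static final class StartResult {
    public final List<String> reserved = new ArrayList<>();
    public final List<String> failed = new ArrayList<>();
  }

  /**
   * Checks each requested extent against a stand-in for the metadata table. An extent that
   * is missing is recorded in the failed list rather than aborting the whole request with
   * an exception, so the caller can retry only those extents.
   */
  public static StartResult startBatchScan(List<String> requested, Set<String> metadata) {
    StartResult result = new StartResult();
    for (String extent : requested) {
      if (metadata.contains(extent)) {
        result.reserved.add(extent);
      } else {
        result.failed.add(extent);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Assumed example data: one extent present in the metadata stand-in, one missing.
    Set<String> metadata = new HashSet<>(Arrays.asList("1;000200000;000139510"));
    List<String> requested = Arrays.asList("1;000139510<", "1;000200000;000139510");

    StartResult result = startBatchScan(requested, metadata);
    System.out.println("reserved=" + result.reserved);
    System.out.println("failed=" + result.failed);
  }
}
```

In this sketch the client would receive the `failed` list alongside the scan results and look those extents up again, which mirrors how the issue describes a batch scan RPC reporting extents it did not process.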
