keith-turner opened a new issue, #3144:
URL: https://github.com/apache/accumulo/issues/3144

   **Describe the bug**
   
   While working on #3143 and running [these tests](https://gist.github.com/keith-turner/f2159111b025e600a6e0abbaba1d92f3), I saw the following exception in the scan server, which caused a scan to fail with a server-side error.
   
   ```
   2022-12-29T15:03:16,443 [tserver.ScanServer] INFO : RFFS 169171 extent not found in metadata table 1;000139510<
   2022-12-29T15:03:16,443 [tserver.ScanServer] ERROR: Error starting scan
   org.apache.accumulo.core.tabletserver.thrift.NotServingTabletException: null
        at org.apache.accumulo.tserver.ScanServer.reserveFilesInner(ScanServer.java:503) ~[accumulo-tserver-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.apache.accumulo.tserver.ScanServer.reserveFiles(ScanServer.java:641) ~[accumulo-tserver-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.apache.accumulo.tserver.ScanServer.startMultiScan(ScanServer.java:884) ~[accumulo-tserver-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at jdk.internal.reflect.GeneratedMethodAccessor3.invoke(Unknown Source) ~[?:?]
        at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
        at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
        at org.apache.accumulo.core.trace.TraceUtil.lambda$wrapService$0(TraceUtil.java:202) ~[accumulo-core-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at com.sun.proxy.$Proxy34.startMultiScan(Unknown Source) ~[?:?]
        at org.apache.accumulo.core.tabletserver.thrift.TabletScanClientService$Processor$startMultiScan.getResult(TabletScanClientService.java:855) ~[accumulo-core-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.apache.accumulo.core.tabletserver.thrift.TabletScanClientService$Processor$startMultiScan.getResult(TabletScanClientService.java:831) ~[accumulo-core-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:40) ~[libthrift-0.17.0.jar:0.17.0]
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:40) ~[libthrift-0.17.0.jar:0.17.0]
        at org.apache.thrift.TMultiplexedProcessor.process(TMultiplexedProcessor.java:147) ~[libthrift-0.17.0.jar:0.17.0]
        at org.apache.accumulo.server.rpc.TimedProcessor.process(TimedProcessor.java:54) ~[accumulo-server-base-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:492) ~[libthrift-0.17.0.jar:0.17.0]
        at org.apache.accumulo.server.rpc.CustomNonBlockingServer$CustomFrameBuffer.invoke(CustomNonBlockingServer.java:129) ~[accumulo-server-base-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.apache.thrift.server.Invocation.run(Invocation.java:18) ~[libthrift-0.17.0.jar:0.17.0]
        at org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52) ~[accumulo-core-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
        at org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52) ~[accumulo-core-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at java.lang.Thread.run(Thread.java:829) ~[?:?]
   2022-12-29T15:03:16,443 [thrift.ProcessFunction] ERROR: Internal error processing startMultiScan
   org.apache.accumulo.core.tabletserver.thrift.NotServingTabletException: null
        at org.apache.accumulo.tserver.ScanServer.reserveFilesInner(ScanServer.java:503) ~[accumulo-tserver-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.apache.accumulo.tserver.ScanServer.reserveFiles(ScanServer.java:641) ~[accumulo-tserver-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.apache.accumulo.tserver.ScanServer.startMultiScan(ScanServer.java:884) ~[accumulo-tserver-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at jdk.internal.reflect.GeneratedMethodAccessor3.invoke(Unknown Source) ~[?:?]
        at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
        at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
        at org.apache.accumulo.core.trace.TraceUtil.lambda$wrapService$0(TraceUtil.java:202) ~[accumulo-core-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at com.sun.proxy.$Proxy34.startMultiScan(Unknown Source) ~[?:?]
        at org.apache.accumulo.core.tabletserver.thrift.TabletScanClientService$Processor$startMultiScan.getResult(TabletScanClientService.java:855) ~[accumulo-core-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.apache.accumulo.core.tabletserver.thrift.TabletScanClientService$Processor$startMultiScan.getResult(TabletScanClientService.java:831) ~[accumulo-core-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:40) ~[libthrift-0.17.0.jar:0.17.0]
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:40) ~[libthrift-0.17.0.jar:0.17.0]
        at org.apache.thrift.TMultiplexedProcessor.process(TMultiplexedProcessor.java:147) ~[libthrift-0.17.0.jar:0.17.0]
        at org.apache.accumulo.server.rpc.TimedProcessor.process(TimedProcessor.java:54) ~[accumulo-server-base-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:492) ~[libthrift-0.17.0.jar:0.17.0]
        at org.apache.accumulo.server.rpc.CustomNonBlockingServer$CustomFrameBuffer.invoke(CustomNonBlockingServer.java:129) ~[accumulo-server-base-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.apache.thrift.server.Invocation.run(Invocation.java:18) ~[libthrift-0.17.0.jar:0.17.0]
        at org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52) ~[accumulo-core-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
        at org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52) ~[accumulo-core-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at java.lang.Thread.run(Thread.java:829) ~[?:?]
   ``` 
   
   Looking into this, the ScanServer does not properly handle the case where a tablet is not found in the metadata table during a batch scan. For normal scans, when a tablet is not found, the thrift RPC throws a NotServingTabletException. Batch scans process multiple tablets in a single RPC and do not throw this exception. Because a batch scan RPC may process only a subset of the requested tablets, the RPC instead returns a list of the extents that it did not process. The scan server does not do this; it throws the NotServingTabletException, which is not declared on the startMultiScan RPC, so the failure ends up looking like a server-side error to the client. This is where the [problem happens](https://github.com/apache/accumulo/blob/f7a62bf46bff438c1f77c62d6261f33fa9c0beb3/server/tserver/src/main/java/org/apache/accumulo/tserver/ScanServer.java#L884) in the scan server code. For batch scans, the scan server needs to report the tablets it could not find in the returned list of failed tablets.
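   A minimal, self-contained sketch of the two failure-reporting styles described above, under simplifying assumptions: `Extent`, `Reservation`, `reserveFiles`, `reserveSingle`, and the local `NotServingTabletException` class are all hypothetical stand-ins (not Accumulo's actual `ScanServer` code or its thrift exception), and a plain `Set` models the metadata table lookup. The single-tablet path fails by throwing, while the multi-tablet path collects missing extents into a failed list instead of aborting the whole RPC. Records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class MultiScanSketch {

  /** Hypothetical stand-in for a tablet extent (table id + end row). */
  record Extent(String tableId, String endRow) {}

  /** Hypothetical batch result: extents that were reserved and extents reported as failed. */
  record Reservation(List<Extent> reserved, List<Extent> failed) {}

  /** Local stand-in for the thrift NotServingTabletException; NOT the Accumulo class. */
  static class NotServingTabletException extends Exception {
    NotServingTabletException(Extent extent) {
      super("not serving " + extent);
    }
  }

  /** Single-tablet scan: a tablet missing from the metadata snapshot is reported by throwing. */
  static Extent reserveSingle(Extent extent, Set<Extent> metadataExtents)
      throws NotServingTabletException {
    if (!metadataExtents.contains(extent)) {
      throw new NotServingTabletException(extent);
    }
    return extent;
  }

  /**
   * Batch scan: partition the requested extents instead of throwing on the first missing one.
   * metadataExtents is a hypothetical snapshot of the extents present in the metadata table.
   */
  static Reservation reserveFiles(List<Extent> requested, Set<Extent> metadataExtents) {
    List<Extent> reserved = new ArrayList<>();
    List<Extent> failed = new ArrayList<>();
    for (Extent extent : requested) {
      if (metadataExtents.contains(extent)) {
        reserved.add(extent);
      } else {
        // The behavior described in this issue throws here, failing the whole RPC.
        failed.add(extent);
      }
    }
    return new Reservation(reserved, failed);
  }

  public static void main(String[] args) {
    Extent present = new Extent("1", "000100000");
    Extent missing = new Extent("1", "000139510");
    Set<Extent> metadata = Set.of(present);

    Reservation r = reserveFiles(List.of(present, missing), metadata);
    System.out.println("batch reserved=" + r.reserved() + " failed=" + r.failed());

    try {
      reserveSingle(missing, metadata);
    } catch (NotServingTabletException e) {
      System.out.println("single scan: " + e.getMessage());
    }
  }
}
```

   The batch path never throws for a missing tablet, so the extents the server could not find come back to the caller as data rather than as an undeclared exception.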
   
   **To Reproduce**
   
   Run the tests mentioned earlier for a while.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.