cshannon commented on PR #3134: URL: https://github.com/apache/accumulo/pull/3134#issuecomment-1373735941
@ctubbsii and @dlmarion - Ok so I finally figured out what happened. Everything was working with the maxFrameSize setting being configured on the transport factory until version 0.17.0. The test I just added passes fine without modification to TServerUtils with version 0.16.0. The reason it broke in 0.17.0 was due to the change that is part of https://github.com/apache/thrift/pull/2533 . Specifically this commit introduced the maxFrameSize option as part of the constructor https://github.com/apache/thrift/commit/66ac7b46fab85f175aec601cb48ea05408a1c186 and is necessary to configure. My fix works but this behavior change is obviously a breaking change and I'm not sure whether or not it's something that was intended or should maybe be addressed in Thrift itself at some point. ### TLDR version: My commit in this PR fixes the issue and it only applies to version 2.1.0 since no other version uses Thrift 0.17.0. So this is a new regression and does not apply to previous versions, like 1.10. Also, when we configure the settings for our custom non blocking server we do still need to keep our existing maxFrameSize settings on both the transport factory configuration and the `maxReadBufferBytes` property on the options as shown here otherwise it breaks further downstream, even if we set the constructor args. https://github.com/apache/accumulo/blob/706612f859d6e68891d487d624eda9ecf3fea7f9/server/base/src/main/java/org/apache/accumulo/server/rpc/TServerUtils.java#L274-L276 ### Detailed Version: #### Why it breaks The root cause is the changes inside of `AbstractNonblockingServer`. This commit changed how max frame size is checked during reads, here is version [0.16.0](https://github.com/apache/thrift/blob/2a93df80f27739ccabb5b885cb12a8dc7595ecdf/lib/java/src/org/apache/thrift/server/AbstractNonblockingServer.java#L355). Previously, before 0.17.0, frame size was compared against `MAX_READ_BUFFER_BYTES` which comes from `maxReadBufferBytes` and is set by us inside [TServerUtils](https://github.com/apache/accumulo/blob/706612f859d6e68891d487d624eda9ecf3fea7f9/server/base/src/main/java/org/apache/accumulo/server/rpc/TServerUtils.java#L276) But in version 0.17.0, this has been refactored now and instead of reading the value from `MAX_READ_BUFFER_BYTES`, this is read from `trans_.getMaxFrameSize()` as shown here in version [0.17.0](https://github.com/apache/thrift/blob/4d493e867b349f3475203ef9848353b315203c51/lib/java/src/main/java/org/apache/thrift/server/AbstractNonblockingServer.java#L335). This value is [configured](https://github.com/apache/thrift/blob/4d493e867b349f3475203ef9848353b315203c51/lib/java/src/main/java/org/apache/thrift/transport/TNonblockingServerSocket.java#L137) by the new constructor argument which we were not setting so it was always the default value and caused the failure. #### Why our existing config doesn't work Now, what confused me initially is that we also set the value on the configuration of the factory so I was unsure why there were two ways to configure the setting and why ours didn't work. It turns out the reason is they are different configuration objects used for different things. The `maxFrameSize` that is now set by the constructor argument is applied to the `trans_` transport which is passed to `FrameBuffer` as a [constructor arg ](https://github.com/apache/thrift/blob/4d493e867b349f3475203ef9848353b315203c51/lib/java/src/main/java/org/apache/thrift/server/AbstractNonblockingServer.java#L288)and configured back in `TNonblockingServerSocket` on line 137. However, the `maxFrameSize` set on the configuration for the factory we create is applied to different configurations and transports as the factory that we configure doesn't get used to create `trans_` transport but is used to create the `inTrans_` and `outTrans_` transport objects in FrameBuffer as shown [here](https://github.com/apache/thrift/blob/4d493e867b349f3475203ef9848353b315203c51/lib/java/src/main/java/org/apache/thrift/server/AbstractNonblockingServer.java#L295-L296). So there are multiple transports configured in `FrameBuffer` and the constructor arg configuration applies to one of them and the factory configuration applies to the others. #### Why we need to keep existing config as well As I said above, we still need to keep the existing configuration on the transport factory. The new constructor setting will allow the read to get further but if we don't set the factory config then we get errors downstream [here](https://github.com/apache/thrift/blob/4d493e867b349f3475203ef9848353b315203c51/lib/java/src/main/java/org/apache/thrift/transport/layered/TFramedTransport.java#L141) when reading a frame. Furthermore, I think we need to keep setting the `maxFrameSize` configuration on `options.maxReadBufferBytes` because if the frame size is larger than that value then thrift just hangs and spins forever with no error (which is even worse). This happened when I was testing not setting it. This is because it's [waiting](https://github.com/apache/thrift/blob/2a93df80f27739ccabb5b885cb12a8dc7595ecdf/lib/java/src/org/apache/thrift/server/AbstractNonblockingServer.java#L363) for more memory to free up but it won't ever be free since the frame to be read is less than the configured max frame size so it tries to read it but it is larger than the max read bytes so it never finishes. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
