[ https://issues.apache.org/jira/browse/GEODE-9825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bill Burcham reassigned GEODE-9825: ----------------------------------- Assignee: Bill Burcham > Disparate socket-buffer-size Results in "IOException: Unknown header byte" > and Hangs > ------------------------------------------------------------------------------------ > > Key: GEODE-9825 > URL: https://issues.apache.org/jira/browse/GEODE-9825 > Project: Geode > Issue Type: Bug > Components: messaging > Affects Versions: 1.12.4, 1.15.0 > Reporter: Bill Burcham > Assignee: Bill Burcham > Priority: Major > Labels: pull-request-available > > GEODE-9141 introduced a bug that causes {{IOException: "Unknown header > byte..."}} and hangs if members are configured with different > {{socket-buffer-size}} settings. > h2. Reproduction > To reproduce this bug turn off TLS and set socket-buffer-size on sender to be > 64KB and set socket-buffer-size on receiver to be 32KB. See associated PR for > an example. > h2. Analysis > In {{{}Connection.processInputBuffer(){}}}. When that method has read all the > messages it can from the current input buffer, it then considers whether the > buffer needs expansion. If it does then: > {code:java} > inputBuffer = inputSharing.expandReadBufferIfNeeded(allocSize); {code} > Is executed and the method returns. The caller then expects to be able to > _write_ bytes into {{{}inputBuffer{}}}. > The problem, it seems, is that > {{ByteBufferSharingInternalImpl.expandReadBufferIfNeeded()}} does not leave > the the {{ByteBuffer}} in the proper state. It leaves the buffer ready to be > _read_ not written. > Before the changes for GEODE-9141 were introduced, the line of code > referenced above used to be this snippet in > {{Connection.compactOrResizeBuffer(int messageLength)}} (that method has > since been removed): > {code:java} > // need a bigger buffer > logger.info("Allocating larger network read buffer, new size is {} old > size was {}.", > allocSize, oldBufferSize); > ByteBuffer oldBuffer = inputBuffer; > inputBuffer = getBufferPool().acquireDirectReceiveBuffer(allocSize); > if (oldBuffer != null) { > int oldByteCount = oldBuffer.remaining(); > inputBuffer.put(oldBuffer); > inputBuffer.position(oldByteCount); > getBufferPool().releaseReceiveBuffer(oldBuffer); > } {code} > Notice how this method leaves {{inputBuffer}} ready to be _written_ to. > But the code inside > {{ByteBufferSharingInternalImpl.expandReadBufferIfNeeded()}} is doing > something like: > {code:java} > newBuffer.clear(); > newBuffer.put(existing); > newBuffer.flip(); > releaseBuffer(type, existing); > return newBuffer; {code} > A solution (shown in the associated PR) is to do add logic after the call to > {{expandReadBufferIfNeeded(allocSize)}} to leave the buffer in a _writeable_ > state: > {code:java} > inputBuffer = inputSharing.expandReadBufferIfNeeded(allocSize); > // we're returning to the caller (done == true) so make buffer writeable > inputBuffer.position(inputBuffer.limit()); > inputBuffer.limit(inputBuffer.capacity()); {code} > h2. Resolution > When this ticket is complete the bug will be fixed and > {{P2PMessagingConcurrencyDUnitTest}} will be enhanced to test at least these > combinations: > [security, sender/locator socket-buffer-size, receiver socket-buffer-size] > [TLS, (default), (default)] this is what the test currently does > [no TLS, 64 * 1024, 32 * 1024] *new: this illustrates this bug* > [no TLS, (default), (default)] *new* > We might want to mix in conserve-sockets true/false in there too while we're > at it (the test currently holds it at true). -- This message was sent by Atlassian Jira (v8.20.1#820001)