Yeah, I saw this a lot when I wasn't closing Thrift connections, but I also
saw it when the client would close prematurely and not return the transport
to the Thrift transport pool.

In one case I hadn't finished the work in a thread but kept opening new
Thrift connections, since the thread would be 'time sliced' for IO. In that
case I opened too many sockets (fds). Maybe you're hitting the max open files
limit because a transport isn't being returned in the middle of a work unit?
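
To illustrate the pattern I mean, here is a minimal sketch (not Accumulo's or Thrift's actual pool code). FakeTransport, Pool, and doWork are hypothetical names made up for this example; the only point is that the transport goes back to the pool in a finally block, so a failed work unit can't leak it:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

public class TransportPoolSketch {
    /** Stand-in for a pooled transport; hypothetical, not a real Thrift class. */
    static final class FakeTransport {
        final int id;
        FakeTransport(int id) { this.id = id; }
    }

    /** Fixed-size pool; the bound is what keeps the open-socket count from growing. */
    static final class Pool {
        private final BlockingQueue<FakeTransport> idle;

        Pool(int size) {
            idle = new ArrayBlockingQueue<>(size);
            for (int i = 0; i < size; i++) {
                idle.add(new FakeTransport(i));
            }
        }

        FakeTransport borrow() throws InterruptedException {
            // Wait briefly, then fail loudly instead of opening yet another socket.
            FakeTransport t = idle.poll(1, TimeUnit.SECONDS);
            if (t == null) {
                throw new IllegalStateException("pool exhausted: a transport was not returned");
            }
            return t;
        }

        void giveBack(FakeTransport t) { idle.add(t); }

        int available() { return idle.size(); }
    }

    /** One work unit: borrow, use, and always return, even when the call fails. */
    static void doWork(Pool pool, boolean fail) throws InterruptedException {
        FakeTransport t = pool.borrow();
        try {
            if (fail) {
                throw new RuntimeException("simulated RPC failure");
            }
        } finally {
            pool.giveBack(t); // without this, every failed unit leaks one transport
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Pool pool = new Pool(2);
        for (int i = 0; i < 10; i++) {
            try {
                doWork(pool, i % 2 == 0);
            } catch (RuntimeException e) {
                // A retry loop would go here; the transport is already back in the pool.
            }
        }
        System.out.println("available after 10 units: " + pool.available());
    }
}
```

If you drop the finally block in doWork, the third failing unit in this sketch would find the pool empty, which is roughly the situation I ran into with real sockets.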

On Tue, Aug 30, 2016, 6:12 PM Christopher <ctubb...@apache.org> wrote:

> Thrift is not happy on some replication ITs I've run lately. I had one test
> timeout after 40 minutes... and it never finished. The symptom is lots of
> client side messages about failure to open transport, and the server side
> messages were (and both were occurring a *lot*, indicating indefinite
> retries):
>
> 2016-08-30 19:48:13,476 [rpc.CustomNonBlockingServer$CustomFrameBuffer] WARN : Got an IOException in internalRead!
> java.io.IOException: Connection reset by peer
>         at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>         at sun.nio.ch.IOUtil.read(IOUtil.java:197)
>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:384)
>         at org.apache.thrift.transport.TNonblockingSocket.read(TNonblockingSocket.java:142)
>         at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.internalRead(AbstractNonblockingServer.java:539)
>         at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.read(AbstractNonblockingServer.java:338)
>         at org.apache.thrift.server.AbstractNonblockingServer$AbstractSelectThread.handleRead(AbstractNonblockingServer.java:203)
>         at org.apache.thrift.server.TNonblockingServer$SelectAcceptThread.select(TNonblockingServer.java:203)
>         at org.apache.thrift.server.TNonblockingServer$SelectAcceptThread.run(TNonblockingServer.java:154)
>
> I saw one comment on a mailing list somewhere that indicated this might be
> caused by a client side handling of a custom Thrift Exception, not properly
> closing the connection. It's possible we're doing something badly before we
> retry. I think more investigation is needed before I file a JIRA (not even
> sure what to file it against, right now... because I'm not sure what
> component is even at fault).
>
> In the meantime, has anybody seen this? Does anybody have any insight into
> this? This is all on a single node, running ITs. There really shouldn't be
> any "network" problems which would cause a TCP reset from external to the
> test and Accumulo itself, since it's all localhost.
>
