[ https://issues.apache.org/jira/browse/THRIFT-4591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
James E. King III updated THRIFT-4591:
--------------------------------------
    Summary: Incompatibility using non-blocking server and frame transport on C++ side?  (was: C++ TFramedTransport fails miserably on partial message read)

> Incompatibility using non-blocking server and frame transport on C++ side?
> --------------------------------------------------------------------------
>
>                 Key: THRIFT-4591
>                 URL: https://issues.apache.org/jira/browse/THRIFT-4591
>             Project: Thrift
>          Issue Type: Bug
>          Components: C++ - Library
>    Affects Versions: 0.11.0
>            Reporter: allen_lee
>            Assignee: James E. King III
>            Priority: Blocker
>         Attachments: 9090.pcap, 9090_1.pcap
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> (jking): C++ TFramedTransport reads the frame size and then attempts to read
> the message. If it gets only part of the message, it returns the partial
> read, and the upper layer cannot decode the message. If read is then called
> again, it tries to read a frame size again, even though it may be in the
> middle of a message payload that the underlying transport had not yet
> received. It's amazing to see this in code that's been around so long!
>
> Original Bug report:
> 1) Implement a Thrift server with TNonblockingServer in C++.
> 2) Implement a Thrift client with the Lua library and choose the framed
> transport.
> 3) Calling the remote interface fails and prints "TTransportException:0:
> Default (unknown)", and the server shows a "TConnection::workSocket():
> THRIFT_EAGAIN (unavailable resources)" error.
> 4) Investigating the fault with tcpdump: attachment 9090.pcap shows that the
> frame message does not contain the frame size field; the correct case, in
> attachment 9090_1.pcap, shows that the frame message contains 4 bytes
> (00 00 00 25) before the protocol id field.
> 5) Digging into the fault to find the root cause, I found a fault in the
> TFramedTransport:flush function in the TFramedTransport.lua file. The
> original implementation is:
> -----
> function TFramedTransport:flush()
>   if self.doWrite == false then
>     return self.trans:flush()
>   end
>   -- If the write fails we still want wBuf to be clear
>   local tmp = self.wBuf
>   self.wBuf = ''
>   local frame_len_buf = libluabpack.bpack("i", string.len(tmp))
>   self.trans:write(frame_len_buf)
>   self.trans:write(tmp)
>   self.trans:flush()
> end
> -----
> which sends the frame size field and the rest of the message content
> separately.
> ----------------------
> (jking) Analysis of original report: the proposed change makes the sender
> send once, but it shouldn't matter if the size is sent separately from the
> payload. The root cause is in the receiver, in this case the C++ library.
> This issue may not be limited to the C++ implementation; we need a test that
> inserts a pause between sending a frame size and sending the payload, to see
> what happens on all the implementations.
> We're not going to merge the Lua client fix, as it doubles the memory
> requirements to send, despite reducing the write() count from 2 to 1.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
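
The receiver-side behaviour described in the analysis can be illustrated with a small, language-neutral sketch in Python. This is not the Thrift C++ code and not the fix that was applied. ChunkedTransport, read_exact and read_frame are hypothetical names, and the 4-byte big-endian size prefix is assumed from the "00 00 00 25" bytes in the capture. The sketch uses a blocking-style loop; a non-blocking server would instead have to keep the partially filled buffer and resume when more data arrives.

```python
import struct


class ChunkedTransport:
    """Hypothetical stand-in for a socket: each read() returns at most one
    pending chunk, which mimics partial reads and a pause between the
    frame size and the payload."""

    def __init__(self, chunks):
        # Drop empty chunks so that b"" only ever means "no more data".
        self._chunks = [bytes(c) for c in chunks if c]

    def read(self, n):
        if not self._chunks:
            return b""
        head = self._chunks[0]
        out, rest = head[:n], head[n:]
        if rest:
            self._chunks[0] = rest
        else:
            self._chunks.pop(0)
        return out


def read_exact(trans, n):
    """Keep reading until exactly n bytes have arrived."""
    buf = bytearray()
    while len(buf) < n:
        piece = trans.read(n - len(buf))
        if not piece:
            raise EOFError("transport closed in the middle of a frame")
        buf += piece
    return bytes(buf)


def read_frame(trans):
    """Read one frame: a 4-byte big-endian size, then the whole payload."""
    (size,) = struct.unpack("!i", read_exact(trans, 4))
    if size < 0:
        raise ValueError("negative frame size")
    return read_exact(trans, size)


if __name__ == "__main__":
    payload = bytes(range(0x25))  # 37 bytes, as in the 00 00 00 25 header
    wire = struct.pack("!i", len(payload)) + payload
    # The size arrives in two pieces and the payload in two more.
    trans = ChunkedTransport([wire[:2], wire[2:4], wire[4:10], wire[10:]])
    assert read_frame(trans) == payload
```

Because read_frame never returns until the whole payload is buffered, the next call always starts at a frame boundary, regardless of how the sender split its writes.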