[ 
https://issues.apache.org/jira/browse/THRIFT-4591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James E. King III updated THRIFT-4591:
--------------------------------------
    Summary: Incompatibility using non-blocking server and frame transport on 
C++ side?  (was: C++ TFramedTransport fails miserably on partial message read)

> Incompatibility using non-blocking server and frame transport on C++ side?
> --------------------------------------------------------------------------
>
>                 Key: THRIFT-4591
>                 URL: https://issues.apache.org/jira/browse/THRIFT-4591
>             Project: Thrift
>          Issue Type: Bug
>          Components: C++ - Library
>    Affects Versions: 0.11.0
>            Reporter: allen_lee
>            Assignee: James E. King III
>            Priority: Blocker
>         Attachments: 9090.pcap, 9090_1.pcap
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> (jking): C++ TFramedTransport reads the frame size then attempts to read the 
> message.  If it only gets part of the message it returns the partial read, 
> and the upper layer will not be able to decode the message, further read may 
> be called again, when it will go and try to read a frame size again, but it 
> could be in the middle of message payload the underlying transport hadn't yet 
> received.  It's amazing to see this in code that's been around so long!
> Original Bug report:
> 1) realize thrift server with TNonblockingServer via c++;
> 2) realize thrift client via lua lib and choose frame transport.
> 3) call remote interface failed with "TTransportException:0: Default 
> (unknown)" print, and the server show "TConnection::workSocket(): 
> THRIFT_EAGAIN (unavailable resources)" error.
> 4)investigate this fault with tcpdump tool, attachment 9090.pcap show the 
> frame msg doesnot contains frame size field, the rifht situation of 
> attachment 9090_1.pcap show the frame msg contains 4 bytes (00 00 00 25) 
> before protocol id field.
> 5) dig into the fault and tried to find root cause, then i found there is an 
> fault in TFramedTransport:flush function in TFramedTransport.lua file. the 
> original realization is:
> -----
> function TFramedTransport:flush()
>   if self.doWrite == false then
>     return self.trans:flush()
>   end
>   -- If the write fails we still want wBuf to be clear
>   local tmp = self.wBuf
>   self.wBuf = ''
>   local frame_len_buf = libluabpack.bpack("i", string.len(tmp))
>   self.trans:write(frame_len_buf)
>   self.trans:write(tmp)
>   self.trans:flush()
> end
> -----
> which send frame size file and reset msg content independently.
>  ----------------------
> (jking) Analysis of original report: it fixes the sender to send once, but it 
> shouldn't matter if the size is sent separately from the payload.  It's the 
> receiver where the root cause is, in this case the C++ library.  This issue 
> may not be limited to the C++ implementation, but we need a test to insert a 
> pause between sending a frame size and sending the payload and see what 
> happens on all the implementations.
> We're not going to merge the lua client fix as it doubles the memory 
> requirements to send, despite reducing the write() count from 2 to 1.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to