[ 
https://issues.apache.org/jira/browse/THRIFT-4591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James E. King III updated THRIFT-4591:
--------------------------------------
    Description: 
1) realize thrift server with TNonblockingServer via c++;

2) realize thrift client via lua lib and choose frame transport.

3) call remote interface failed with "TTransportException:0: Default (unknown)" 
print, and the server show "TConnection::workSocket(): THRIFT_EAGAIN 
(unavailable resources)" error.

4)investigate this fault with tcpdump tool, attachment 9090.pcap show the frame 
msg doesnot contains frame size field, the rifht situation of attachment 
9090_1.pcap show the frame msg contains 4 bytes (00 00 00 25) before protocol 
id field.

5) dig into the fault and tried to find root cause, then i found there is an 
fault in TFramedTransport:flush function in TFramedTransport.lua file. the 
original realization is:

-----

function TFramedTransport:flush()
  if self.doWrite == false then
    return self.trans:flush()
  end

  -- If the write fails we still want wBuf to be clear
  local tmp = self.wBuf
  self.wBuf = ''
  local frame_len_buf = libluabpack.bpack("i", string.len(tmp))
  self.trans:write(frame_len_buf)
  self.trans:write(tmp)
  self.trans:flush()
end

-----

which send frame size file and reset msg content independently.

  was:
(jking): C++ TFramedTransport reads the frame size then attempts to read the 
message.  If it only gets part of the message it returns the partial read, and 
the upper layer will not be able to decode the message, further read may be 
called again, when it will go and try to read a frame size again, but it could 
be in the middle of message payload the underlying transport hadn't yet 
received.  It's amazing to see this in code that's been around so long!

Original Bug report:

1) realize thrift server with TNonblockingServer via c++;

2) realize thrift client via lua lib and choose frame transport.

3) call remote interface failed with "TTransportException:0: Default (unknown)" 
print, and the server show "TConnection::workSocket(): THRIFT_EAGAIN 
(unavailable resources)" error.

4)investigate this fault with tcpdump tool, attachment 9090.pcap show the frame 
msg doesnot contains frame size field, the rifht situation of attachment 
9090_1.pcap show the frame msg contains 4 bytes (00 00 00 25) before protocol 
id field.

5) dig into the fault and tried to find root cause, then i found there is an 
fault in TFramedTransport:flush function in TFramedTransport.lua file. the 
original realization is:

-----

function TFramedTransport:flush()
  if self.doWrite == false then
    return self.trans:flush()
  end

  -- If the write fails we still want wBuf to be clear
  local tmp = self.wBuf
  self.wBuf = ''
  local frame_len_buf = libluabpack.bpack("i", string.len(tmp))
  self.trans:write(frame_len_buf)
  self.trans:write(tmp)
  self.trans:flush()
end

-----

which send frame size file and reset msg content independently.

 ----------------------

(jking) Analysis of original report: it fixes the sender to send once, but it 
shouldn't matter if the size is sent separately from the payload.  It's the 
receiver where the root cause is, in this case the C++ library.  This issue may 
not be limited to the C++ implementation, but we need a test to insert a pause 
between sending a frame size and sending the payload and see what happens on 
all the implementations.

We're not going to merge the lua client fix as it doubles the memory 
requirements to send, despite reducing the write() count from 2 to 1.


> Incompatibility using non-blocking server and frame transport on C++ side?
> --------------------------------------------------------------------------
>
>                 Key: THRIFT-4591
>                 URL: https://issues.apache.org/jira/browse/THRIFT-4591
>             Project: Thrift
>          Issue Type: Bug
>          Components: C++ - Library
>    Affects Versions: 0.11.0
>            Reporter: allen_lee
>            Assignee: James E. King III
>            Priority: Blocker
>         Attachments: 9090.pcap, 9090_1.pcap
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> 1) realize thrift server with TNonblockingServer via c++;
> 2) realize thrift client via lua lib and choose frame transport.
> 3) call remote interface failed with "TTransportException:0: Default 
> (unknown)" print, and the server show "TConnection::workSocket(): 
> THRIFT_EAGAIN (unavailable resources)" error.
> 4)investigate this fault with tcpdump tool, attachment 9090.pcap show the 
> frame msg doesnot contains frame size field, the rifht situation of 
> attachment 9090_1.pcap show the frame msg contains 4 bytes (00 00 00 25) 
> before protocol id field.
> 5) dig into the fault and tried to find root cause, then i found there is an 
> fault in TFramedTransport:flush function in TFramedTransport.lua file. the 
> original realization is:
> -----
> function TFramedTransport:flush()
>   if self.doWrite == false then
>     return self.trans:flush()
>   end
>   -- If the write fails we still want wBuf to be clear
>   local tmp = self.wBuf
>   self.wBuf = ''
>   local frame_len_buf = libluabpack.bpack("i", string.len(tmp))
>   self.trans:write(frame_len_buf)
>   self.trans:write(tmp)
>   self.trans:flush()
> end
> -----
> which send frame size file and reset msg content independently.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to