Hello,
Some thoughts on framing. The TFramedTransport frame buffer is not replaced
with a new frame from the network until the existing frame is exhausted.
The code below returns bytes until the frame buffer is empty; only then
does readFrame() load another frame.
TFramedTransport.java:
    public int read(byte[] buf, int off, int len) throws TTransportException {
      if (readBuffer_ != null) {
        int got = readBuffer_.read(buf, off, len);
        if (got > 0) {
          return got;
        }
      }
      // Read another frame of data
      readFrame();
      return readBuffer_.read(buf, off, len);
    }
Because of this, reading multi-frame messages should be seamless. The
trick is writing multi-frame messages.
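To make the read side concrete, here is a minimal stdlib-only sketch of the same drain-then-reload pattern. It is not the Thrift class: FramedReaderSketch, readAll() and frame() are hypothetical names, a ByteBuffer stands in for the network, and the frame header is assumed to be a 4-byte big-endian length.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for a framed read path; NOT the Thrift class.
class FramedReaderSketch {
  private final ByteBuffer wire;      // stands in for the network
  private byte[] frame = new byte[0]; // current frame body
  private int pos = 0;                // read position within the frame

  FramedReaderSketch(byte[] wireBytes) {
    this.wire = ByteBuffer.wrap(wireBytes);
  }

  // Serve bytes from the current frame; load the next one only when empty.
  int read(byte[] buf, int off, int len) {
    if (pos >= frame.length) {
      readFrame();
    }
    int got = Math.min(len, frame.length - pos);
    System.arraycopy(frame, pos, buf, off, got);
    pos += got;
    return got;
  }

  // Assumed layout: 4-byte big-endian length, then that many body bytes.
  // Throws BufferUnderflowException if the wire has no more frames.
  private void readFrame() {
    int size = wire.getInt();
    frame = new byte[size];
    wire.get(frame);
    pos = 0;
  }

  // Caller-side loop: keeps reading until len bytes arrive, so the
  // frame boundaries are invisible to whoever consumes the message.
  byte[] readAll(int len) {
    byte[] out = new byte[len];
    int total = 0;
    while (total < len) {
      total += read(out, total, len - total);
    }
    return out;
  }

  // Test helper: wrap each payload in its own length-prefixed frame.
  static byte[] frame(byte[]... payloads) {
    int total = 0;
    for (byte[] p : payloads) {
      total += 4 + p.length;
    }
    ByteBuffer out = ByteBuffer.allocate(total);
    for (byte[] p : payloads) {
      out.putInt(p.length).put(p);
    }
    return out.array();
  }
}
```

Reading an 11-byte message that was sent as a 6-byte frame followed by a 5-byte frame returns the 11 bytes in order; the reader never needs to know where the frame boundary was.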
If you code directly to a protocol (like TBinaryProtocol) and serialize
your own structs, flushing along the way before the max frame size is
exceeded, you can send a multi-frame message. The problem is that no one
does this in practice. Instead, the compiler generates a struct for the
parameters inbound to a server (mySvc_args) and another struct for the
return data (mySvc_result), and structs write themselves in one atomic go.
You can see this in the generated code for any method; it will have a name
like myMeth_argsStandardScheme.write().
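As a rough illustration of "flushing before the max frame size is exceeded", here is a stdlib-only sketch of a writer that does that bookkeeping itself. FramedWriterSketch and its methods are hypothetical names, not Thrift API, and the 4-byte big-endian length header is an assumption of the sketch. Generated struct code does nothing like this, which is the point of the paragraph above.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a framed write path; NOT the Thrift class.
class FramedWriterSketch {
  private final int maxFrameSize;
  private final ByteArrayOutputStream pending = new ByteArrayOutputStream();
  private final ByteArrayOutputStream wire = new ByteArrayOutputStream();
  private final List<Integer> frameSizes = new ArrayList<>();

  FramedWriterSketch(int maxFrameSize) {
    if (maxFrameSize <= 0) {
      throw new IllegalArgumentException("maxFrameSize must be positive");
    }
    this.maxFrameSize = maxFrameSize;
  }

  // Buffer data, flushing whenever the pending frame is full so that
  // no single frame body ever exceeds maxFrameSize.
  void write(byte[] data) {
    int off = 0;
    while (off < data.length) {
      if (pending.size() == maxFrameSize) {
        flush();
      }
      int n = Math.min(data.length - off, maxFrameSize - pending.size());
      pending.write(data, off, n);
      off += n;
    }
  }

  // Emit one frame: assumed 4-byte big-endian length, then the body.
  // This sketch skips the frame entirely when nothing is pending.
  void flush() {
    if (pending.size() == 0) {
      return;
    }
    byte[] body = pending.toByteArray();
    byte[] header = ByteBuffer.allocate(4).putInt(body.length).array();
    wire.write(header, 0, header.length);
    wire.write(body, 0, body.length);
    frameSizes.add(body.length);
    pending.reset();
  }

  List<Integer> frameSizes() {
    return frameSizes;
  }

  byte[] wireBytes() {
    return wire.toByteArray();
  }
}
```

With a max frame size of 4, writing 10 bytes and then calling flush() produces three frames instead of one oversized frame.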
Calling the framed transport flush() method writes the frame size and
bytes. The flush call on the client send side is made in the parent
class for service clients:
TServiceClient.java:
    protected void sendBase(String methodName, TBase args) throws TException {
      oprot_.writeMessageBegin(new TMessage(methodName, TMessageType.CALL, ++seqid_));
      args.write(oprot_);
      oprot_.writeMessageEnd();
      oprot_.getTransport().flush();
    }
On the server result send side the flush is in ProcessFunction.java.
These are the principal flush points in normal Thrift service processing.
To enable flush calls during struct serialization, so that a struct can be
broken up across multiple frames, the compiler would need to generate code
that knows when to flush within the struct serialization logic
(non-trivial). This may be out of scope for an RPC framework.
Some folks who need to return large amounts of data create an
application layer protocol for returning data in pages which can be
individually requested (1 of 3, 2 of 3, 3 of 3). Others use a non-Thrift
side-channel mechanism (custom, FTP, UDT, ...) explicitly designed for
large data transfer. So while there are some options for transfers larger
than the max frame size, they all involve some work.
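For the paging option, a minimal sketch of the idea in plain Java follows. PagingSketch, Page, getPage() and fetchAll() are hypothetical names for illustration only. In a real service getPage() would be an RPC method defined in your IDL and each page would travel in its own frame; here it is a local call so the sketch stays self-contained.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical application-level paging; not part of Thrift.
class PagingSketch {
  // One page of a larger payload, labelled "index of total".
  static final class Page {
    final int index; // 1-based page number
    final int total; // total number of pages
    final byte[] data;

    Page(int index, int total, byte[] data) {
      this.index = index;
      this.total = total;
      this.data = data;
    }
  }

  // Number of pages needed; an empty payload is still one (empty) page.
  static int pageCount(int payloadLength, int pageSize) {
    if (pageSize <= 0) {
      throw new IllegalArgumentException("pageSize must be positive");
    }
    return Math.max(1, (payloadLength + pageSize - 1) / pageSize);
  }

  // "Server side": return the requested page of the payload.
  static Page getPage(byte[] payload, int pageSize, int index) {
    int total = pageCount(payload.length, pageSize);
    if (index < 1 || index > total) {
      throw new IllegalArgumentException("no such page: " + index);
    }
    int from = (index - 1) * pageSize;
    int to = Math.min(from + pageSize, payload.length);
    return new Page(index, total, Arrays.copyOfRange(payload, from, to));
  }

  // "Client side": fetch page 1 to learn the total, then the rest.
  static byte[] fetchAll(byte[] serverPayload, int pageSize) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    Page first = getPage(serverPayload, pageSize, 1);
    out.write(first.data, 0, first.data.length);
    for (int i = 2; i <= first.total; i++) {
      Page p = getPage(serverPayload, pageSize, i);
      out.write(p.data, 0, p.data.length);
    }
    return out.toByteArray();
  }
}
```

The page size would be chosen comfortably below the configured max frame size so that each page, plus protocol overhead, fits in a single frame.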
Cheers,
Randy
On 5/12/2013 11:27 AM, John R. Frank (JIRA) wrote:
[
https://issues.apache.org/jira/browse/THRIFT-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13655598#comment-13655598
]
John R. Frank edited comment on THRIFT-1324 at 5/12/13 6:26 PM:
----------------------------------------------------------------
Having just gone through such a debugging process as a result of this issue,
I'd be happy to contribute a patch.
First, am I reading line 142 in TFramedTransport.java correctly?
{code}
readBuffer_.reset(buff);
{code}
which calls this in TMemoryInputTransport.java
{code}
public void reset(byte[] buf, int offset, int length) {
  buf_ = buf;
  pos_ = offset;
  endPos_ = offset + length;
}
{code}
That seems to *replace* the buffer, instead of gathering together multiple
frames until detecting the end, e.g. by seeing the client send a frame of less
than the max length.
If that understanding is correct, then why do some uses of TFramedTransport
have both a max frame size and also a max message size? For example, cassandra
has
{code:borderStyle=solid}
# Frame size for thrift (maximum field length).
thrift_framed_transport_size_in_mb: 1500
# The max length of a thrift message, including all fields and
# internal thrift overhead.
thrift_max_message_length_in_mb: 1600
{code}
which implies that a message could be made of >1 frame. Maybe that's just
specific to cassandra's use of thrift?
Would there be any interest in a transport that allowed chunking of large
messages?
TFramedTransport should enforce frame size limits on writes
-----------------------------------------------------------
Key: THRIFT-1324
URL: https://issues.apache.org/jira/browse/THRIFT-1324
Project: Thrift
Issue Type: Bug
Components: Java - Library
Reporter: Jim Ancona
Fix For: 1.0
Currently TFramedTransport only enforces the maximum frame size when it
receives a frame larger than its configured maxLength_ value, so there is no
way to enforce a maximum frame size on the client. Because servers typically
deal with oversized frames by silently dropping them (see THRIFT-1323),
problems caused by oversized frames can be very hard to diagnose. Enforcing the
maximum frame size on writes would enable clients to detect the frame size
mismatch, assuming the client and server are configured with the same value.
Note that the exception thrown in this case should not be a generic
TTransportException--it should be either a subclass or a new
TTransportException.type_ value so that clients can distinguish the frame too
large error. This is important because most other TTransportException causes
reflect transient conditions where retry may be appropriate, but a too-large
frame will never succeed if retried.
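A minimal sketch of the write-side check this issue asks for, in plain Java with no Thrift dependency. Every name here is hypothetical: TransportExceptionSketch stands in for TTransportException, FrameTooLargeSketchException for the proposed distinguishable subclass, and the 4-byte length header is an assumption of the sketch. It is not a patch against the library.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

// Stand-in for a generic transport error (hypothetical, unchecked for brevity).
class TransportExceptionSketch extends RuntimeException {
  TransportExceptionSketch(String message) {
    super(message);
  }
}

// Distinct subclass so callers can tell "never retry" from transient errors.
class FrameTooLargeSketchException extends TransportExceptionSketch {
  final int frameSize;
  final int maxLength;

  FrameTooLargeSketchException(int frameSize, int maxLength) {
    super("frame size " + frameSize + " exceeds max length " + maxLength);
    this.frameSize = frameSize;
    this.maxLength = maxLength;
  }
}

// Hypothetical framed writer that enforces the limit on flush().
class BoundedFrameWriterSketch {
  private final int maxLength;
  private final ByteArrayOutputStream pending = new ByteArrayOutputStream();
  private final ByteArrayOutputStream wire = new ByteArrayOutputStream();

  BoundedFrameWriterSketch(int maxLength) {
    this.maxLength = maxLength;
  }

  void write(byte[] data) {
    pending.write(data, 0, data.length);
  }

  // Refuse to send an oversized frame instead of letting the peer drop it.
  void flush() {
    int size = pending.size();
    if (size > maxLength) {
      pending.reset(); // discard the frame; nothing reaches the wire
      throw new FrameTooLargeSketchException(size, maxLength);
    }
    byte[] header = ByteBuffer.allocate(4).putInt(size).array();
    wire.write(header, 0, header.length);
    pending.writeTo(wireSink());
    pending.reset();
  }

  // Indirection kept so the body copy cannot throw a checked exception.
  private java.io.OutputStream wireSink() {
    return new java.io.OutputStream() {
      @Override
      public void write(int b) {
        wire.write(b);
      }
    };
  }

  int bytesOnWire() {
    return wire.size();
  }
}
```

A caller can then catch FrameTooLargeSketchException separately and skip the retry loop it would use for transient transport failures.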
--
Randy Abernethy
Managing Partner, RX-M, LLC
Cell: +1-415-624-6447
San Francisco: +1-415-800-2922
Tokyo: +81-50-5532-8040
www.rx-m.com
@rxmllc