On Sun, Dec 4, 2011 at 8:04 AM, Emmanuel Lecharny <[email protected]> wrote:

> Posted on the wrong mailing list... Forwarding there.
>
> Hi Chad,
>
>
>
> On 12/4/11 1:25 AM, Chad Beaulac wrote:
>
>>>>  A single algorithm can handle large data pipes and provide extremely
>>>>  low latency for variable, small and large message sizes at the same
>>>>  time.
>>>>
>>>  AFAIU, it's not because you use a big buffer that you will put some
>>>  strain when dealing with small messages: the buffer will only contain a
>>>  few useful bytes, and that's it. In any case, this buffer won't be
>>>  allocated every time we read from the channel, so it's just a container.
>>>  But it's way better to have a big buffer when dealing with big messages,
>>>  because then you'll have fewer round trips between the read and the
>>>  processing. But the condition, as you said, is that you don't read the
>>>  channel until there are no more bytes to read. You just read *once*, get
>>>  what you get, and go feed the processing part of your application with
>>>  these bytes.
>>>
>>>  The write has exactly the same kind of issue, as you said: don't pound
>>>  the channel, give the other channels the opportunity to be written to as
>>>  well...
>>>
>>>
>>  The write has the same sort of issue, but it can be handled more
>>  optimally in a different manner. The use case is slightly different
>>  because it's the client producer code driving the algorithm instead of
>>  the Selector.
>>  Producer Side
>>  - Use a queue of ByteBuffers as a send queue.
>>  - When the selector signals that a send is possible, block on the queue
>>  and loop over the output queue, sending until SocketChannel.write(src)
>>  returns less than src.remaining() or returns 0, or an exception is
>>  caught.
>>  - This is a fair algorithm when dealing with multiple selectors because
>>  the amount of time the sending thread will spend inside the "send"
>>  method is bounded by how much data is in the outputQueue, and nothing
>>  can put data into the queue while it is being drained to send data out.
>>
>
> Right, but there are some cases where one session has a lot to write
> while the other sessions are waiting, as the thread is busy flushing all
> its data. This is why I proposed to chunk the writes into small chunks
> (well, small does not mean 1kb here).
>
>
This won't work when you have channels with very large data pipes and
channels with small data pipes in the same selector. It will end up being
inefficient for the large data pipe channel.  Chunking the writes is
unnecessary and will consume extra resources.
Yes, the other sessions will be waiting when you're writing for one
channel. This is true for the entire algorithm.
An example of the fairness of the algorithm is as follows:
Consider a selector with two channels in it that you're writing to.
Channel-1 is a 300Mb/second stream.
Channel-2 is a 2Mb/second stream.
To be fair, the system will need to spend a lot more time writing data for
channel-1. Chunking the data creates overhead at the TCP layer that is best
avoided. Let the TCP layer figure out how it wants to segment TCP packets.
If you have 40MB to write, just call channel1.write(outputBuffer). It is ok
that output for channel-2 is waiting while you're writing for channel-1.
The call to write will either succeed immediately and write all of the
data, write only a portion of it, or fail because the socket is closed or
some other error occurred. In the first case, you'll look in the output
queue for more data; the queue is synchronized, so nobody can put more
data into it while this write is occurring.
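To make that concrete, here is a minimal sketch of the drain loop I have in
mind. This is plain NIO, not MINA code, and the Flusher class and
outputQueue name are just placeholders:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.SocketChannel;
import java.util.Queue;

// Hypothetical helper: drains one session's output queue when the selector
// reports OP_WRITE for its channel.
final class Flusher {
    void flush(SocketChannel channel, SelectionKey key,
               Queue<ByteBuffer> outputQueue) throws IOException {
        synchronized (outputQueue) {       // producers wait here while we drain
            while (true) {
                ByteBuffer head = outputQueue.peek();
                if (head == null) {
                    // Queue fully drained: stop asking for OP_WRITE until a
                    // producer enqueues more data.
                    key.interestOps(key.interestOps() & ~SelectionKey.OP_WRITE);
                    return;
                }
                channel.write(head);
                if (head.hasRemaining()) {
                    // Partial (or zero-byte) write: the TCP send window is
                    // full. Stay registered for OP_WRITE and give the selector
                    // loop back to the other sessions.
                    return;
                }
                outputQueue.poll();        // buffer fully written, try the next
            }
        }
    }
}

The synchronized block is what bounds the flush: only data that was already
queued when the write event fired gets written before control returns to
the selector loop.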


> If we have more than one selector, it's still the same issue, as a
> session will always use the same selector.
>
>
Not sure why you'll need more than one selector.


>
>

>>  Consumer Side
>>  - Use a ByteBuffer(64k) as a container to receive data into.
>>  - Only call SocketChannel.read(inputBuffer) once for the channel that's
>>  ready to read.
>>  - Create a new ByteBuffer for the size read. Copy the inputBuffer into
>>  the new ByteBuffer. Give the new ByteBuffer to the session to process.
>>
> Not sure we want to copy the ByteBuffer. It could be an option, but if we
> can avoid this copy, that would be cool.
>
>>  Rewind the input ByteBuffer. An alternative to creating a new ByteBuffer
>>  every time for the size read is to allow client code to specify a custom
>>  ByteBuffer factory. This allows client code to pre-allocate memory and
>>  create a ring buffer or something like that.
>>
>>  I use these algorithms in C++ (using ACE - Adaptive Communications
>>  Environment) and Java. The algorithm is basically the same in C++ and
>> Java
>>  and handles protocols with a lot of small messages, variable message size
>>  protocols and large data block sizes.
>>
> I bet it's pretty much the same kind of algorithm; ACE and MINA are based
> on the same logic.
>
> Thanks for your input. I guess I have to put it down somewhere so that
> we have a clear algorithm described before starting to implement anything!
>
>
Agreed :-)
Do you have a Git repo set up for Mina3? I'll help you write some of it if
you like. With unit tests, of course. ;-)
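In the meantime, here is a rough sketch of the consumer side described
above. Again this is plain NIO, not MINA code; the ByteBufferFactory
interface and the 64k figure are placeholders for whatever we settle on:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;

// Hypothetical read path: one read() per readiness event, then hand a
// right-sized copy of the bytes to the session.
final class Reader {

    // Pluggable allocation strategy so client code can pre-allocate memory
    // (ring buffer, pooled slabs, ...).
    interface ByteBufferFactory {
        ByteBuffer newBuffer(int size);
    }

    // Reused container, allocated once per selector thread.
    private final ByteBuffer inputBuffer = ByteBuffer.allocate(64 * 1024);

    ByteBuffer readOnce(SocketChannel channel, ByteBufferFactory factory)
            throws IOException {
        inputBuffer.clear();
        int n = channel.read(inputBuffer);   // read *once*, take what we get
        if (n <= 0) {
            return null;                     // nothing read, or end of stream
        }
        inputBuffer.flip();
        ByteBuffer message = factory.newBuffer(n);
        message.put(inputBuffer);            // copy only the bytes received
        message.flip();
        return message;                      // hand this to the session
    }
}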


>
>
>>
>>
>>>>  On the Producer side:
>>>>  Application code should determine the block sizes that are pushed onto
>>>>  the output queue. Logic would be as previously stated:
>>>>  - write until there's nothing left to write, unregister for the write
>>>>  event, return to event processing
>>>>
>>>  This is what we do. I'm afraid that it may be a bit annoying for the
>>>  other sessions waiting to send data. At some point, it could be better
>>>  to write only a limited number of bytes, then give back control to the
>>>  selector, and be woken up when the selector sets the OP_WRITE flag
>>>  again (which will be during the next loop anyway, or maybe a later
>>>  one).
>>>
>>>>  - write until the channel is congestion controlled, stay registered
>>>>  for the write event, return to event processing
>>>>
>>>  And what about a third option: write until the buffer we have prepared
>>>  is empty, even if the channel is not full? That means even if the
>>>  producer has prepared a, say, 1Mb block of data to write, it will be
>>>  written in 16 blocks of 64Kb, even if the channel can absorb more.
>>>
>>>  Does it make sense?
>>>
>>>
>>  No, it doesn't make sense to me. Let the TCP layer handle optimizing how
>>  large chunks of data are handled. If the client puts a ByteBuffer of 1MB
>>  or 20MB or whatever onto the outputQueue, call
>>  SocketChannel.write(outputByteBuffer). Don't chunk it up.
>>
> But then, while we push all that data into the channel, we may have the
> other sessions waiting until it's done (unless the channel is full, and
> we can switch to the next session).
>
Correct, and that is what you want.


> So, do you mean that the underlying layer will not allow us to push, say,
> 20M without informing the session that it's full? In other words, there
> is a limited size that can be pushed, and we don't have to take care of
> this limit ourselves?


>
Sort of. If the TCP send window (OS layer) has less room in it than
outputBuffer.remaining(), the write will only write a portion of the
outputBuffer. Consider this the CONGESTION_CONTROLLED state. If the TCP
send window is full when you try to write, the write will return 0. The
algorithm should never see this case, because you should always stop
trying to write when only a portion of the outputBuffer is written. And
always continue trying to write when an entire outputBuffer is written and
there are more outputBuffers to write in the output queue.
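For completeness, here is a sketch of the enqueue side under the same
assumptions (the Sender class and the outputQueue/selector names are
placeholders): the producer appends the whole buffer, however large, and
makes sure OP_WRITE is set; the drain loop does the rest.

import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.Queue;

// Hypothetical producer-side enqueue: no chunking. The whole buffer goes on
// the queue and the selector thread writes as much as the TCP send window
// allows on each OP_WRITE event.
final class Sender {
    void send(ByteBuffer data, Queue<ByteBuffer> outputQueue,
              SelectionKey key, Selector selector) {
        synchronized (outputQueue) {   // excluded while a drain is in progress
            outputQueue.add(data);
            key.interestOps(key.interestOps() | SelectionKey.OP_WRITE);
        }
        selector.wakeup();             // in case the selector is blocked in select()
    }
}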

>
> --
> Regards,
> Cordialement,
> Emmanuel Lécharny
> www.iktek.com
>
