Hi Asitha,

On Fri, Feb 13, 2015 at 2:16 PM, Asitha Nanayakkara <[email protected]> wrote:

> Hi Asanka,
>
> On Fri, Feb 13, 2015 at 1:46 PM, Asanka Abeyweera <[email protected]>
> wrote:
>
>> Hi Asitha,
>>
>>
>> On Fri, Feb 13, 2015 at 1:30 PM, Asitha Nanayakkara <[email protected]>
>> wrote:
>>
>>> Hi Asanka,
>>>
>>> On Fri, Feb 13, 2015 at 1:07 PM, Asanka Abeyweera <[email protected]>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Fri, Feb 13, 2015 at 12:38 PM, Asitha Nanayakkara <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Asanka,
>>>>>
>>>>> On Fri, Feb 13, 2015 at 10:22 AM, Asanka Abeyweera <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi Asitha,
>>>>>>
>>>>>> I don't think we need to write a custom batch processor for this. For
>>>>>> me it is an additional maintenance headache, it reduces readability,
>>>>>> and we might have to change our custom processor implementation when we
>>>>>> upgrade Disruptor :). Therefore I'm -1 on writing a custom processor
>>>>>> for this. I think it's OK to add the batching logic to the content
>>>>>> reading handler. This is just my idea; I might have missed some details
>>>>>> in understanding this.
>>>>>>
>>>>>
>>>>> I'm OK with dropping custom batch processors and having the batching
>>>>> logic in the event handler.
>>>>>
>>>>>
>>>>
>>> When batching, we need to ensure that the DeliveryEventHandler (which
>>> comes after the content readers) won't process messages until the batched
>>> contents are read from the DB. If we use the current event handler, at
>>> each event it will update the sequence barrier to the next one, allowing
>>> the delivery handler to process the following slots in the ring buffer.
>>> But in this scenario we may still be in the process of batching those
>>> events and may not yet have read the content from the DB. To ensure that
>>> we have batched and read the content before the DeliveryEventHandler
>>> processes that slot, we need a batch processor. And since we are using
>>> concurrent batch processors to read content with custom batching logic,
>>> we needed a custom batch processor here, similar to what we have in
>>> inbound event handling with the Disruptor. Sorry, I forgot the whole
>>> thing before. Please correct me if I'm wrong or if there is a better way
>>> to do this.
>>>
>>>
>> This does not happen if we use the default batch processor.
>> "sequence.set(nextSequence - 1L)" is called after processing the onEvent
>> call with endOfBatch set to true. Therefore the above scenario won't
>> happen.
>>
>> Source location:
>> https://github.com/LMAX-Exchange/disruptor/blob/2.10.4/code/src/main/com/lmax/disruptor/BatchEventProcessor.java#L117
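The endOfBatch behaviour discussed here can be sketched as follows. This is a minimal, self-contained illustration only: the `EventHandler` interface is a local stand-in for Disruptor's `com.lmax.disruptor.EventHandler` (the real one also declares `throws Exception`), and `ContentEvent` and `readContentBatch` are hypothetical names, not from the Andes code base.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal stand-in for com.lmax.disruptor.EventHandler, so the sketch
// compiles without the Disruptor jar.
interface EventHandler<T> {
    void onEvent(T event, long sequence, boolean endOfBatch);
}

// Hypothetical event carrying the id of a message whose content must be read.
class ContentEvent {
    final long messageId;
    ContentEvent(long messageId) { this.messageId = messageId; }
}

// Collects events and performs the content read only when the Disruptor
// signals endOfBatch. Since the default BatchEventProcessor advances its
// sequence only after this onEvent call returns, a downstream
// DeliveryEventHandler cannot see these slots before the batched content
// read has completed.
class BatchingContentReadHandler implements EventHandler<ContentEvent> {
    private final List<ContentEvent> batch = new ArrayList<>();
    int batchesRead = 0;   // counters for illustration only
    int eventsRead = 0;

    @Override
    public void onEvent(ContentEvent event, long sequence, boolean endOfBatch) {
        batch.add(event);
        if (endOfBatch) {
            readContentBatch(batch);
            batch.clear();
        }
    }

    // Placeholder for a single DB round trip reading content for all ids.
    private void readContentBatch(List<ContentEvent> events) {
        batchesRead++;
        eventsRead += events.size();
    }
}
```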
>>
>
> Yes, I agree, the default batch processor can be used in this scenario.
> The idea behind writing a custom batch processor was to integrate our
> custom concurrent batching logic. Yes, we can move the custom batching
> logic to the event handler and use the default Disruptor. The initial
> idea was to keep the batching logic in the batch processor and the logic
> for handling batched events in the event handler.
>

If the requirement is to separate the batching logic from the handler, what
if we write a handler with the batching logic, and inside that handler call
our batch content reading handler?
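That separation could look something like the sketch below: a generic wrapper that owns the batching logic and hands each completed batch to an inner handler (for us, the content reading handler). All names here are illustrative, not from the Andes code base.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Generic wrapper keeping the batching logic in one place; the inner
// batchConsumer (e.g. the content reading handler) only sees whole batches.
class BatchingHandler<T> {
    private final List<T> batch = new ArrayList<>();
    private final int maxBatchSize;
    private final Consumer<List<T>> batchConsumer;

    BatchingHandler(int maxBatchSize, Consumer<List<T>> batchConsumer) {
        this.maxBatchSize = maxBatchSize;
        this.batchConsumer = batchConsumer;
    }

    // Mirrors EventHandler#onEvent(event, sequence, endOfBatch): flush either
    // when the Disruptor reports the end of the available batch or when our
    // own size limit is reached.
    void onEvent(T event, long sequence, boolean endOfBatch) {
        batch.add(event);
        if (endOfBatch || batch.size() >= maxBatchSize) {
            batchConsumer.accept(new ArrayList<>(batch));
            batch.clear();
        }
    }
}
```

This keeps the default BatchEventProcessor untouched while still giving the inner handler batch-at-a-time semantics.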


>
>
>>
>>
>>
>>>
>>>>>> What I understood about the batching mechanism is that if we have two
>>>>>> parallel readers, one will batch odd sequences and the other will
>>>>>> batch even sequences. Can't we batch neighboring ones together? i.e.
>>>>>> when there are two parallel readers, sequences 1 and 2 are done by one
>>>>>> handler, and 3 and 4 by the other. In this mechanism, if we have 5
>>>>>> items to batch, 5 readers, and a batch size of five, only one handler
>>>>>> will do the batching. But in the current implementation all 5 readers
>>>>>> will be involved in batching (each handler does one item).
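The round-robin split described above can be sketched as follows (illustrative only; the actual Andes assignment logic may differ). With N parallel readers, reader `ordinal` claims every sequence where `sequence % N == ordinal`, so consecutive sequences land on different readers and each reader's batch is sparse.

```java
// Round-robin (modulo) sharding of ring-buffer sequences across readers.
class ShardedReader {
    final int ordinal;     // this reader's index, 0 <= ordinal < numReaders
    final int numReaders;  // total number of parallel readers

    ShardedReader(int ordinal, int numReaders) {
        this.ordinal = ordinal;
        this.numReaders = numReaders;
    }

    // True if this reader is responsible for the given sequence.
    boolean claims(long sequence) {
        return sequence % numReaders == ordinal;
    }
}
```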
>>>>>>
>>>>>
>>>>> This is a probable improvement I thought of having in inbound event
>>>>> batching as well. But at high message rates, where we need the batched
>>>>> performance, this type of sparse batching doesn't happen. Yes, I agree
>>>>> that the mentioned approach would batch events much better in all
>>>>> scenarios.
>>>>>
>>>>>
>>>>> BTW, any ideas on batching using content chunks rather than content
>>>>> length? This would give much better control over the batching process.
>>>>
>>>> What is batching using content length?
>>>>
>>>
>>> Currently, what we can retrieve from the metadata is the content length
>>> of a message. (To get the number of chunks we would need to get the
>>> chunk size from a reliable source.) Therefore we have used the content
>>> length of each message, aggregating the value until we meet a specified
>>> maximum aggregate content length, to batch messages. This is suboptimal:
>>> we have no guarantee of how many message chunks will be received from
>>> the DB in one call, since that depends on the message sizes. I think a
>>> better approach would be to batch by content chunks, where we have a
>>> guarantee of the maximum number of chunks that will be requested in one
>>> DB query. Any ideas on this?
>>>
>> Yes, +1 for batching using content chunks. Can we get the number of
>> chunks for a message ID from AndesMetadata or from any other place?
>>
>
> I couldn't find a way to get the number of chunks in a message from the
> metadata. The only information we can get is the content length, through
> StorableMessageMetaData#getContentSize(). If we can get the current
> content chunk size reliably
> (org.wso2.andes.configuration.qpid.ServerConfiguration has the default
> chunk size, but this value can change) we can derive the chunk count per
> message. Or we might need to add the content chunk count to the metadata
> when persisting messages.
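If a reliable chunk size were available, the derivation mentioned here is a simple ceiling division over the content length. A minimal sketch (method and class names are hypothetical, and the chunk size must come from the same configuration used at write time, which is the caveat above):

```java
// Derive the number of fixed-size chunks needed to hold a message's content.
class ChunkMath {
    // Ceiling division: e.g. 513 bytes at a 512-byte chunk size -> 2 chunks.
    static int chunkCount(long contentLength, int chunkSize) {
        return (int) ((contentLength + chunkSize - 1) / chunkSize);
    }
}
```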
>
> Thanks,
> Asitha
>
>
>>> Thanks,
>>> Asitha
>>>
>>>
>>>>
>>>>> Thanks
>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Feb 13, 2015 at 5:18 AM, Asitha Nanayakkara <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Pamod,
>>>>>>>
>>>>>>> The branch with the parallel read implementation:
>>>>>>> https://github.com/asitha/andes/tree/parrallel-readers
>>>>>>>
>>>>>>>
>>>>>>> We can configure the maximum content size to batch, meaning the
>>>>>>> average content size of a batch. For smaller messages, setting a high
>>>>>>> content size will lead to loading a lot of message chunks.
>>>>>>>
>>>>>>> The property can be added to broker.xml:
>>>>>>>
>>>>>>> performanceTuning/delivery/contentReadBatchSize
>>>>>>>
>>>>>>> @Asanka Please take a look for any issues or improvements. It would
>>>>>>> be better if we could batch by content chunk count, I guess.
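If added under that path, the broker.xml entry might look like the fragment below. The element placement follows the property path given above, but the exact structure and the example value are assumptions, not taken from the actual Andes configuration.

```xml
<performanceTuning>
    <delivery>
        <!-- Maximum aggregate content length (bytes) to batch per DB read;
             the value here is only an example. -->
        <contentReadBatchSize>65536</contentReadBatchSize>
    </delivery>
</performanceTuning>
```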
>>>>>>>
>>>>>>> --
>>>>>>> *Asitha Nanayakkara*
>>>>>>> Software Engineer
>>>>>>> WSO2, Inc. http://wso2.com/
>>>>>>> Mob: + 94 77 85 30 682
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Asanka Abeyweera
>>>>>> Software Engineer
>>>>>> WSO2 Inc.
>>>>>>
>>>>>> Phone: +94 712228648
>>>>>> Blog: a5anka.github.io
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> *Asitha Nanayakkara*
>>>>> Software Engineer
>>>>> WSO2, Inc. http://wso2.com/
>>>>> Mob: + 94 77 85 30 682
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Asanka Abeyweera
>>>> Software Engineer
>>>> WSO2 Inc.
>>>>
>>>> Phone: +94 712228648
>>>> Blog: a5anka.github.io
>>>>
>>>
>>>
>>>
>>> --
>>> *Asitha Nanayakkara*
>>> Software Engineer
>>> WSO2, Inc. http://wso2.com/
>>> Mob: + 94 77 85 30 682
>>>
>>>
>>
>>
>> --
>> Asanka Abeyweera
>> Software Engineer
>> WSO2 Inc.
>>
>> Phone: +94 712228648
>> Blog: a5anka.github.io
>>
>
>
>
> --
> *Asitha Nanayakkara*
> Software Engineer
> WSO2, Inc. http://wso2.com/
> Mob: + 94 77 85 30 682
>
>


-- 
Asanka Abeyweera
Software Engineer
WSO2 Inc.

Phone: +94 712228648
Blog: a5anka.github.io
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
