[ 
https://issues.apache.org/jira/browse/S4-40?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185037#comment-13185037
 ] 

Matthieu Morel commented on S4-40:
----------------------------------

I pushed an implementation of a solution to branch S4-40. See:
https://git-wip-us.apache.org/repos/asf?p=incubator-s4.git;a=commitdiff;h=29a22ef0380fec4c355583ef082cdc45711b914e

The implementation uses a configurable thread pool for handling serialization
tasks, whose results are passed as futures to the storage backend processing
thread. The CheckpointingCoordinator class prevents race conditions between
event processing and serialization by using permits.
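To make that description more concrete, here is a minimal Java sketch of the same pipeline shape: a configurable serialization pool that returns futures, a single storage thread that consumes them, and per-PE permits. It assumes that "permits" can be modeled as one `java.util.concurrent.Semaphore` per PE key; `CheckpointPipelineSketch`, `CounterPE`, and the in-memory `storage` map are hypothetical stand-ins, not the actual `CheckpointingCoordinator` API in the commit.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: class and method names are illustrative, not S4's API.
public class CheckpointPipelineSketch {

    // Hypothetical PE holding mutable state that is updated by events.
    static final class CounterPE {
        final String key;
        long count; // guarded by this PE's permit
        CounterPE(String key) { this.key = key; }
        byte[] serialize() {
            return (key + "=" + count).getBytes(StandardCharsets.UTF_8);
        }
    }

    // Assumption: one permit per PE key, so a PE is either processing an
    // event or being serialized, never both at the same time.
    private final ConcurrentHashMap<String, Semaphore> permits = new ConcurrentHashMap<>();
    // Configurable pool running serialization tasks off the event thread.
    private final ExecutorService serializers;
    // Single thread that waits on each future and writes to "stable storage".
    private final ExecutorService storageThread = Executors.newSingleThreadExecutor();
    // Stand-in for the storage backend (in memory, for illustration only).
    final ConcurrentHashMap<String, byte[]> storage = new ConcurrentHashMap<>();

    CheckpointPipelineSketch(int serializerThreads) {
        this.serializers = Executors.newFixedThreadPool(serializerThreads);
    }

    private Semaphore permitFor(String key) {
        return permits.computeIfAbsent(key, k -> new Semaphore(1));
    }

    // Runs on the event processing thread.
    void processEvent(CounterPE pe) throws InterruptedException {
        Semaphore permit = permitFor(pe.key);
        permit.acquire(); // waits only if this PE is being serialized right now
        try {
            pe.count++;
        } finally {
            permit.release();
        }
        checkpoint(pe); // hands off work; never serializes on this thread
    }

    private void checkpoint(CounterPE pe) {
        Future<byte[]> serialized = serializers.submit(() -> {
            Semaphore permit = permitFor(pe.key);
            permit.acquire(); // holds off event processing for this PE only
            try {
                return pe.serialize();
            } finally {
                permit.release();
            }
        });
        // The storage thread consumes futures in submission order.
        storageThread.submit(() -> {
            try {
                storage.put(pe.key, serialized.get());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } catch (ExecutionException e) {
                e.printStackTrace(); // a real system would log and retry
            }
        });
    }

    void shutdown() throws InterruptedException {
        serializers.shutdown();
        serializers.awaitTermination(10, TimeUnit.SECONDS);
        storageThread.shutdown();
        storageThread.awaitTermination(10, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        CheckpointPipelineSketch pipeline = new CheckpointPipelineSketch(4);
        CounterPE pe = new CounterPE("pe-1");
        for (int i = 0; i < 1000; i++) {
            pipeline.processEvent(pe);
        }
        pipeline.shutdown();
        System.out.println(new String(pipeline.storage.get("pe-1"), StandardCharsets.UTF_8));
    }
}
```

Because the permits in this sketch are per PE, serializing one PE never blocks events for other PEs; an event for the same PE waits only while that PE is being serialized. And since the storage thread handles futures in submission order, the last write for a PE comes from the last checkpoint requested, which was serialized after that PE's last update.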

The latency improvements are quite noticeable. I performed some evaluations to
validate the approach, and here are some results, which I believe make the case
for this change (the coordination mechanism was also tested for race
conditions):
* environment: 
** development machine
** 1 S4 node
** 1 PE prototype with basic state
** 1000 keys (=1000 PE instances)
** checkpointing triggered after every 1 event
** large queues (no events dropped)
** serializer with artificially increased serialization time (busy wait) 
* scenario: 
** inject 100000 events (no delay between events, except during a small warmup phase)
** measure time for processing all of them
** measure time for persisting everything to stable storage
** crash and recover, and check that recovery succeeds
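The scenario relies on a serializer whose serialization time is artificially increased with a busy wait. A minimal Java sketch of such a benchmark-only decorator follows; the `Serializer` interface, `BusyWaitSerializer`, and the 2 ms delay are hypothetical and only illustrate the idea, not the serializer API actually used in S4.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a benchmark-only serializer; names are illustrative.
public class SlowSerializerSketch {

    // Hypothetical minimal serializer interface (not S4's actual API).
    interface Serializer {
        byte[] serialize(Object o);
    }

    // Decorator that spins for a fixed time before delegating, to mimic PEs
    // whose state is expensive to serialize. Busy-waiting (rather than
    // sleeping) keeps the calling thread occupied, as CPU-bound
    // serialization would.
    static final class BusyWaitSerializer implements Serializer {
        private final Serializer delegate;
        private final long delayNanos;

        BusyWaitSerializer(Serializer delegate, long delayNanos) {
            this.delegate = delegate;
            this.delayNanos = delayNanos;
        }

        @Override
        public byte[] serialize(Object o) {
            long deadline = System.nanoTime() + delayNanos;
            // Compare via subtraction so the check stays correct if nanoTime overflows.
            while (System.nanoTime() - deadline < 0) {
                Thread.onSpinWait(); // Java 9+ hint; the loop still burns CPU
            }
            return delegate.serialize(o);
        }
    }

    public static void main(String[] args) {
        Serializer base = o -> String.valueOf(o).getBytes(StandardCharsets.UTF_8);
        // 2 ms per call is an arbitrary value chosen for illustration.
        Serializer slow = new BusyWaitSerializer(base, 2_000_000L);
        long start = System.nanoTime();
        byte[] bytes = slow.serialize("pe-state");
        long elapsedMicros = (System.nanoTime() - start) / 1_000;
        System.out.println(bytes.length + " bytes after " + elapsedMicros + " us");
    }
}
```

With a wrapper like this and checkpointing after every event, a design that serializes on the event processing thread stalls that thread for the full delay on each event, whereas offloading serialization to a pool moves that cost off the event path.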

Results:
* Previous version takes ~205000 ms to process all events
* Improved version takes ~22000 ms to process all events

That is almost a 10-fold performance improvement in that scenario.
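For reference, the ratio of the two measured processing times, and the corresponding approximate throughput over the 100000 injected events (approximate, since both times are rounded and include the short warmup phase), work out as:

```latex
\begin{aligned}
\text{speedup} &= \frac{205000\ \text{ms}}{22000\ \text{ms}} \approx 9.3 \\
\text{throughput}_{\text{previous}} &\approx \frac{100000\ \text{events}}{205\ \text{s}} \approx 488\ \text{events/s} \\
\text{throughput}_{\text{improved}} &\approx \frac{100000\ \text{events}}{22\ \text{s}} \approx 4545\ \text{events/s}
\end{aligned}
```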


                
> Checkpointing: offload serialization of PEs from the event processing thread
> ----------------------------------------------------------------------------
>
>                 Key: S4-40
>                 URL: https://issues.apache.org/jira/browse/S4-40
>             Project: Apache S4
>          Issue Type: Improvement
>    Affects Versions: 0.4
>            Reporter: Matthieu Morel
>            Assignee: Matthieu Morel
>             Fix For: 0.4
>
>
> Checkpointing preserves PE state by serializing PE data and then storing the
> serialized data on stable storage.
> Currently, the storage operation is asynchronous, but the serialization of
> PEs is performed by the event processing thread. Some PEs have complex data
> structures that take time to serialize, adding latency to event processing.
> This affects the overall response time and can also lead to queue overflows
> and event loss.
> Therefore, we must perform the serialization asynchronously and make sure to
> prevent any race condition (PE data being updated by normal event processing
> while the PE is being serialized on another thread).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
