Roland,

I took all your suggestions to heart and changed my test as follows:

- Configured my Akka fork-join executor (FJE) to use a single thread and throughput=100
- Ran my test for 1 million iterations instead of 10,000
- Removed all logging and profiling code
- Used Akka 2.3.9 instead of 2.4-SNAPSHOT
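For anyone wanting to reproduce the setup, the single-thread/throughput=100 configuration corresponds roughly to the following dispatcher stanza in application.conf (the dispatcher name is my own; actors are bound to it via Props.withDispatcher):

```hocon
# Hypothetical dispatcher stanza -- "rxmongo-dispatcher" is a made-up name.
# The parallelism settings pin the fork-join pool to a single thread.
rxmongo-dispatcher {
  type = Dispatcher
  executor = "fork-join-executor"
  fork-join-executor {
    parallelism-min = 1
    parallelism-factor = 1.0
    parallelism-max = 1
  }
  # Process up to 100 messages per actor before yielding the thread.
  throughput = 100
}
```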

This seems to have made very little difference. Since all my instrumentation is 
removed, all I have now is the data reported by the YCSB benchmark, as follows.

Legacy Mongo Java Driver Results:
YCSB Client 0.1
Command line: -db com.yahoo.ycsb.db.MongoDbClient -s -P workloads/workloada 
-threads 1 -load
Loading workload...
Starting test.
new database url = localhost:27017/ycsb
2015-02-24 15:06:12:797 0 sec: 0 operations; 
mongo connection created with localhost:27017/ycsb
[OVERALL], RunTime(ms), 193841.0
[OVERALL], Throughput(ops/sec), 5158.8673190914205
[INSERT], Operations, 1000000
[INSERT], AverageLatency(us), 191.429134
[INSERT], MinLatency(us), 78
[INSERT], MaxLatency(us), 2231484
[INSERT], 95thPercentileLatency(ms), 0
[INSERT], 99thPercentileLatency(ms), 0
[INSERT], Return=0, 1000000
My RxMongo Results:
YCSB Client 0.1
Command line: -db com.reactific.ycsb.RxMongoClient -s -P workloads/workloada 
-load -threads 1
Loading workload...
Starting test.
2015-02-24 14:32:16:302 0 sec: 0 operations; 
rxmongo connection created with 
mongodb://localhost:27017/ycsb?minPoolSize=2&maxPoolSize=2
[OVERALL], RunTime(ms), 305149.0
[OVERALL], Throughput(ops/sec), 3277.08758671993
[INSERT], Operations, 1000000
[INSERT], AverageLatency(us), 300.902736
[INSERT], MinLatency(us), 186
[INSERT], MaxLatency(us), 2756796
[INSERT], 95thPercentileLatency(ms), 0
[INSERT], 99thPercentileLatency(ms), 0
[INSERT], Return=0, 1000000
RxMongo is still well behind the legacy driver: about 57% higher average insert 
latency and about 36% lower throughput. While I think there is room for 
improvement here, I don’t have many more cycles to spend on performance 
analysis unless hAkkers have more suggestions of things to try, as I am out of 
ideas.  

As an aside, and as a lark, I increased the number of threads to 10 in the 
client (load generator), in the Akka FJE, and in the number of actors connected 
to mongo. For the legacy java driver, the results were these:
[OVERALL], RunTime(ms), 196111.0
[OVERALL], Throughput(ops/sec), 5099.153030681604
[INSERT], Operations, 1000000
[INSERT], AverageLatency(us), 1939.049101
while the RxMongo driver performed like this:
[OVERALL], RunTime(ms), 232262.0
[OVERALL], Throughput(ops/sec), 4305.482601544807
[INSERT], Operations, 1000000
[INSERT], AverageLatency(us), 2262.050283
I note from these results that:

- The discrepancy in average latency narrowed from about 57% to about 17%.
- The throughput of the legacy driver declined slightly (1.2%), while the 
throughput of RxMongo increased significantly (31.4%). 
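Just to double-check the arithmetic, the percentage deltas can be recomputed directly from the [OVERALL] and [INSERT] figures quoted above (a quick sketch in Python):

```python
# Figures copied from the YCSB output above (latencies in us, throughput in ops/sec).
legacy_1t_ops, rx_1t_ops = 5158.8673190914205, 3277.08758671993
legacy_10t_ops, rx_10t_ops = 5099.153030681604, 4305.482601544807
legacy_1t_lat, rx_1t_lat = 191.429134, 300.902736
legacy_10t_lat, rx_10t_lat = 1939.049101, 2262.050283

# Latency gap between RxMongo and the legacy driver, single-threaded vs. 10 threads.
lat_gap_1t = (rx_1t_lat - legacy_1t_lat) / legacy_1t_lat * 100
lat_gap_10t = (rx_10t_lat - legacy_10t_lat) / legacy_10t_lat * 100

# Throughput change for each driver when moving from 1 thread to 10.
legacy_delta = (legacy_10t_ops - legacy_1t_ops) / legacy_1t_ops * 100
rx_delta = (rx_10t_ops - rx_1t_ops) / rx_1t_ops * 100

print(f"{lat_gap_1t:.1f} {lat_gap_10t:.1f} {legacy_delta:.1f} {rx_delta:.1f}")
# -> 57.2 16.7 -1.2 31.4
```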

This gives me some hope that, with correct tuning for a workload and at higher 
levels of load, an Akka-based mongo driver might just perform well, even if it 
suffers in corner cases like a single thread. Since my goal is to make this 
work on top of akka-streams, I am going to finish the RxMongo implementation 
while waiting for akka-streams to be released, and then come back to comparing 
the performance of Akka IO vs. akka-streams vs. the legacy java driver after 
akka-streams 1.0 is released. So, unless others have further input, this thread 
will go idle for a couple of months (probably a good thing!) :)

My thanks and best regards to everyone who helped with this analysis.

Reid.  

> On Feb 24, 2015, at 11:55 AM, Reid Spencer <[email protected]> wrote:
> 
> Hi Roland,
> 
> 
>> On Feb 24, 2015, at 5:23 AM, Roland Kuhn <[email protected]> wrote:
>> 
>> Hi Reid,
>> 
>> currently I don’t have the bandwidth to fully understand what you are doing,
> 
> I appreciate any response at all. Thank you!
> 
>> but the numbers you quote here sound suspiciously like the “ping–pong 
>> problem”: if your thread pool is too big and messages always hop threads 
>> even if they should not, then you will incur the full CPU wake-up latency 
>> (which is of the order of 100µs) for each and every message send.
> 
> Okay, that seems plausible and it gives me a new area of inquiry to 
> investigate. 
> 
>> The ping–pong benchmark (with one ping and one pong each) gets roughly 1000x 
>> faster by using only one thread for the two actors—and the reason is that 
>> these two actors cannot be active concurrently anyway.
> 
> I will try configuring my executor with a single thread and see if that makes 
> it fly. 
> 
>> 
>> Another thing I notice: 10,000 repetitions is by far not enough to benchmark 
>> anything on the JVM. Proper inlining and optimizations require a lot more 
>> than that.
> 
> That’s good to know. I can certainly try this with a million or more. 
> 
>> The other question is how much overhead your way of tracing introduces, 
>> because generating a log message per Actor message send will obviously 
>> double the message-sending overhead.
> 
> Also a good point. I’ll repeat my experiments without that overhead.
> 
>> 
>> Regards,
>> 
>> Roland
> 

-- 
>>>>>>>>>>      Read the docs: http://akka.io/docs/
>>>>>>>>>>      Check the FAQ: 
>>>>>>>>>> http://doc.akka.io/docs/akka/current/additional/faq.html
>>>>>>>>>>      Search the archives: https://groups.google.com/group/akka-user
--- 
You received this message because you are subscribed to the Google Groups "Akka 
User List" group.