Re: Some more benchmarks

2014-07-02 Thread Stefan Guggisberg
On Tue, Jul 1, 2014 at 8:37 PM, Jukka Zitting  wrote:
> Hi,
>
> On Tue, Jul 1, 2014 at 9:38 AM, Jukka Zitting  wrote:
>> I also tried including MongoMK results, but the benchmark got stuck in
>> ConcurrentReadTest. I'll re-try today and will file a bug if I can
>> reproduce the problem.
>
> I guess it was a transient problem. Here are the results with
> Oak-Mongo included:
>
> Summary (90%, lower is better)
>
> Benchmark                Jackrabbit  Oak-Mongo  Oak-Tar
> -------------------------------------------------------
> ReadPropertyTest                 45          4        4
> SetPropertyTest                1179       2398      119
> SmallFileReadTest                47          9        7
> SmallFileWriteTest              182        530       43
> ConcurrentReadTest             1201       1247      710
> ConcurrentReadWriteTest        1900       2321      775
> ConcurrentWriteReadTest        1009        354      108
> ConcurrentWriteTest             532        553      101

wow, very impressive, congrats!

cheers
stefan

>
> I updated the gist at
> https://gist.github.com/jukka/078bd524aa0ba36b184b with full details.
>
> The general message here is to use TarMK for maximum single-node
> performance and MongoMK for scalability and throughput across multiple
> cluster nodes.
>
> BR,
>
> Jukka Zitting


Re: Some more benchmarks

2014-07-01 Thread Jukka Zitting
Hi,

On Tue, Jul 1, 2014 at 9:38 AM, Jukka Zitting  wrote:
> I also tried including MongoMK results, but the benchmark got stuck in
> ConcurrentReadTest. I'll re-try today and will file a bug if I can
> reproduce the problem.

I guess it was a transient problem. Here are the results with
Oak-Mongo included:

Summary (90%, lower is better)

Benchmark                Jackrabbit  Oak-Mongo  Oak-Tar
-------------------------------------------------------
ReadPropertyTest                 45          4        4
SetPropertyTest                1179       2398      119
SmallFileReadTest                47          9        7
SmallFileWriteTest              182        530       43
ConcurrentReadTest             1201       1247      710
ConcurrentReadWriteTest        1900       2321      775
ConcurrentWriteReadTest        1009        354      108
ConcurrentWriteTest             532        553      101

I updated the gist at
https://gist.github.com/jukka/078bd524aa0ba36b184b with full details.

The general message here is to use TarMK for maximum single-node
performance and MongoMK for scalability and throughput across multiple
cluster nodes.

BR,

Jukka Zitting


Re: Some more benchmarks

2014-07-01 Thread Jukka Zitting
Hi,

I'm resurrecting this thread with some new findings. I re-ran many of
the benchmarks we've been following, pitting Jackrabbit 2.8.0 against
Oak 1.0.1 with TarMK. The results look pretty nice:

Summary (90%, lower is better)

Benchmark                Jackrabbit  Oak-Tar
--------------------------------------------
ReadPropertyTest                 45        4
SetPropertyTest                1179      119
SmallFileReadTest                47        7
SmallFileWriteTest              182       43
ConcurrentReadTest             1201      710
ConcurrentReadWriteTest        1900      775
ConcurrentWriteReadTest        1009      108
ConcurrentWriteTest             532      101

See https://gist.github.com/jukka/078bd524aa0ba36b184b for details.

I also tried including MongoMK results, but the benchmark got stuck in
ConcurrentReadTest. I'll re-try today and will file a bug if I can
reproduce the problem.

BR,

Jukka Zitting


Re: Some more benchmarks

2013-09-28 Thread Jukka Zitting
Hi,

On Tue, Sep 24, 2013 at 11:19 PM, Jukka Zitting  wrote:
> On Tue, Sep 24, 2013 at 10:47 PM, Jukka Zitting  
> wrote:
>> The concurrent read and read/write test cases look like more attention
>> is needed on the test code, as it's currently hard to interpret the
>> results. I'll see what I can do there.
>
> It turns out most of the reported time was going to login() calls (see
> OAK-634). I refactored the tests in revision 1526092 so that the login
> calls won't affect the performance measurements.

There were a few other systemic issues with the concurrency benchmarks
(like the background threads running at lower priority and doing less
work). I made some further improvements, and the numbers now look like
this:

# ConcurrentReadTest         90%
Jackrabbit                  4132
Oak-Default                 2031
Oak-Mongo                   2116
Oak-Segment                 2258
Oak-Tar                     2580

# ConcurrentReadWriteTest    90%
Jackrabbit                  3192
Oak-Default                 3600
Oak-Mongo                   4596
Oak-Segment                 2605
Oak-Tar                     2876

# ConcurrentWriteReadTest    90%
Jackrabbit                  2770
Oak-Default                  875
Oak-Mongo                   1243
Oak-Segment                  565
Oak-Tar                      405

# ConcurrentWriteTest        90%
Jackrabbit                   597
Oak-Default                 2141
Oak-Mongo                   1166
Oak-Segment                  558
Oak-Tar                      348

Full details in https://gist.github.com/jukka/6748243.

BR,

Jukka Zitting


Re: Some more benchmarks

2013-09-24 Thread Jukka Zitting
Hi,

On Tue, Sep 24, 2013 at 10:47 PM, Jukka Zitting  wrote:
> The concurrent read and read/write test cases look like more attention
> is needed on the test code, as it's currently hard to interpret the
> results. I'll see what I can do there.

It turns out most of the reported time was going to login() calls (see
OAK-634). I refactored the tests in revision 1526092 so that the login
calls won't affect the performance measurements. As a result the
numbers look much better:

# ConcurrentReadTest         90%
Jackrabbit                   447
Oak-Default                  286
Oak-Mongo                    240
Oak-Segment                  245
Oak-Tar                      252

# ConcurrentReadWriteTest    90%
Jackrabbit                   383
Oak-Default                  263
Oak-Mongo                    270
Oak-Segment                  280
Oak-Tar                      268
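
As a rough illustration of the refactoring described above (keeping login() out of the timed section), here is a minimal sketch. It is not the actual oak-run test code: the TimedSection class, its timeBody method and the injectable clock are hypothetical names chosen for this example, and the "session" string merely stands in for a real login.

```java
import java.util.function.Consumer;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical sketch, not the actual oak-run benchmark code. */
final class TimedSection {

    /**
     * Runs the setup step first, then measures only the body.
     * Returns the clock difference observed around the body alone.
     */
    static <S> long timeBody(LongSupplier clock, Supplier<S> setup, Consumer<S> body) {
        S state = setup.get();           // e.g. a login; deliberately not timed
        long start = clock.getAsLong();
        body.accept(state);              // only this part is measured
        return clock.getAsLong() - start;
    }

    public static void main(String[] args) {
        long nanos = TimedSection.<String>timeBody(
                System::nanoTime,
                () -> "session",                 // stand-in for an expensive login
                session -> session.length());    // stand-in for the reads being measured
        System.out.println("measured body: " + nanos + " ns");
    }
}
```

Passing the clock in as a parameter is only a convenience for this sketch: it lets the separation of setup time and measured time be checked with a fake clock.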

BR,

Jukka Zitting


Re: Some more benchmarks

2013-09-24 Thread Jukka Zitting
Hi,

On Tue, Sep 24, 2013 at 8:08 PM, Jukka Zitting  wrote:
> I'll add a few more benchmarks to my test script.

Here are results for three more benchmarks:

# SetPropertyTest            90%
Jackrabbit                   740
Oak-Default                  916
Oak-Mongo                   5973
Oak-Segment                 2386
Oak-Tar                      728

# ConcurrentReadTest         90%
Jackrabbit                 25656
Oak-Default                32840
Oak-Mongo                  33928
Oak-Segment                35522
Oak-Tar                    35354

# ConcurrentReadWriteTest    90%
Jackrabbit                 19280
Oak-Default                39289
Oak-Mongo                  48078
Oak-Segment                37384
Oak-Tar                    37165

The SetPropertyTest measures extremely small transactions (single
property change), which makes the networking overhead in Oak-Mongo and
Oak-Segment more prominent. Other than that the results are fairly
good.

The concurrent read and read/write test cases look like more attention
is needed on the test code, as it's currently hard to interpret the
results. I'll see what I can do there.

BR,

Jukka Zitting


Re: Some more benchmarks

2013-09-24 Thread Jukka Zitting
Hi,

Updating this thread with the latest numbers. No major changes on
these benchmarks:

# ReadPropertyTest       90%
Jackrabbit                48
Oak-Default               39
Oak-Mongo                 41
Oak-Segment               41
Oak-Tar                   42

# SmallFileReadTest      90%
Jackrabbit                91
Oak-Default               19
Oak-Mongo                 19
Oak-Segment               18
Oak-Tar                   18

# SmallFileWriteTest     90%
Jackrabbit               386
Oak-Default              425
Oak-Mongo                963
Oak-Segment              180
Oak-Tar                   88

Full details in https://gist.github.com/jukka/6693063.

I'll add a few more benchmarks to my test script.

BR,

Jukka Zitting


Re: Some more benchmarks

2013-07-04 Thread Jukka Zitting
Hi,

On Thu, Jul 4, 2013 at 1:12 PM, Jukka Zitting  wrote:
> On Wed, Jul 3, 2013 at 5:34 PM, Thomas Mueller  wrote:
>> I think it's because all binaries are loaded from the backend (no
>> caching). I bumped the blob cache size from 8 MB to 16 MB, let's see if
>> this helps.
>
> Yes, that could be it (I'll run the tests again soon to confirm).

Indeed the numbers look great now:

  # SmallFileReadTest      min   10%   50%   90%   max     N
  Oak-Mongo                 14    15    16    17    56  3790

BR,

Jukka Zitting


Re: Some more benchmarks

2013-07-04 Thread Jukka Zitting
Hi,

On Wed, Jul 3, 2013 at 5:34 PM, Thomas Mueller  wrote:
> I think it's because all binaries are loaded from the backend (no
> caching). I bumped the blob cache size from 8 MB to 16 MB, let's see if
> this helps.

Yes, that could be it (I'll run the tests again soon to confirm). The
working set of the test is about 10MB in size and scanned linearly, so
each iteration would end up flushing a cache that's less than 10MB in
size.
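
To illustrate the effect described above, here is a minimal sketch of a linear scan over a working set that is slightly larger than the cache. It assumes an LRU eviction policy, which this thread does not confirm for the Oak blob cache. The LruScanDemo class is hypothetical and not Oak code, and its entry counts simply stand in for megabytes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration, not Oak code: an LRU cache that is scanned linearly. */
final class LruScanDemo {

    /** Minimal LRU cache built on LinkedHashMap's access-order mode. */
    static final class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int capacity;

        LruCache(int capacity) {
            super(16, 0.75f, true); // true = iterate in access order (least recent first)
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > capacity; // evict the least recently used entry when full
        }
    }

    /** Scans keys 0..workingSet-1 in order, `passes` times, and returns the number of cache hits. */
    static int countHits(int capacity, int workingSet, int passes) {
        LruCache<Integer, byte[]> cache = new LruCache<>(capacity);
        int hits = 0;
        for (int pass = 0; pass < passes; pass++) {
            for (int key = 0; key < workingSet; key++) {
                if (cache.get(key) != null) {
                    hits++;
                } else {
                    cache.put(key, new byte[0]); // stand-in for loading a blob from the backend
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Working set of 10 entries, scanned five times.
        System.out.println("capacity 8:  " + countHits(8, 10, 5) + " hits");
        System.out.println("capacity 16: " + countHits(16, 10, 5) + " hits");
    }
}
```

With a capacity of 8 and a working set of 10, every lookup misses, because each entry is evicted shortly before the scan returns to it. Once the capacity exceeds the working set, every pass after the first is served from the cache.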

BR,

Jukka Zitting


Re: Some more benchmarks

2013-07-03 Thread Thomas Mueller
Hi,

>I don't know what's dragging the performance in the SmallFileReadTest,

I think it's because all binaries are loaded from the backend (no
caching). I bumped the blob cache size from 8 MB to 16 MB, let's see if
this helps.

Regards,
Thomas



Re: Some more benchmarks

2013-07-03 Thread Jukka Zitting
Hi,

On Wed, Jul 3, 2013 at 11:54 AM, Jukka Zitting  wrote:
> On Wed, Jul 3, 2013 at 11:22 AM, Thomas Mueller  wrote:
>> I usually look at "N" first :-)
>
> It's also a good measure.

Actually not that good, as only the lower limit on the amount of time
over which those N iterations happen is defined, so it's for example
not possible to compute an accurate mean execution time from the
reported N. The N figure also covers the before/afterTest() methods,
which are not included in the other statistics and which aren't really
within the scope of the functionality that a benchmark intends to
measure. The reason I originally included N in the output was to give
an idea about the statistical significance of the other figures.

Perhaps we should replace the median (50%) or the 10th percentile (not
a very useful figure) with a more exactly calculated mean execution
time, as that would better represent the information for which N
currently only acts as a rough proxy.
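
To make the trade-off between these figures concrete, here is a minimal sketch that computes percentiles and the mean from a set of per-iteration timings. The TimingStats class is a hypothetical helper, not part of oak-run. The nearest-rank percentile definition is an assumption (the thread does not say how oak-run computes its percentiles), and the sample data is made up.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of oak-run: summary statistics over iteration times. */
final class TimingStats {

    /** Nearest-rank percentile of the samples, for p in (0, 100]. */
    static long percentile(long[] samples, double p) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    /** Arithmetic mean of the samples. */
    static double mean(long[] samples) {
        long sum = 0;
        for (long s : samples) {
            sum += s;
        }
        return (double) sum / samples.length;
    }

    public static void main(String[] args) {
        // Nine fast iterations and one slow outlier (milliseconds, made-up data).
        long[] ms = {10, 10, 11, 11, 12, 12, 12, 13, 13, 500};
        System.out.println("50%  = " + percentile(ms, 50));
        System.out.println("90%  = " + percentile(ms, 90));
        System.out.println("max  = " + percentile(ms, 100));
        System.out.println("mean = " + mean(ms));
    }
}
```

On this sample the single 500 ms outlier pulls the mean far above the 90th percentile, which is why the two figures answer different questions: the percentile describes typical iterations, while the mean reflects total time spent.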

BR,

Jukka Zitting


Re: Some more benchmarks

2013-07-03 Thread Jukka Zitting
Hi,

On Wed, Jul 3, 2013 at 11:22 AM, Thomas Mueller  wrote:
>> I've only included the 90th percentile
>
> I usually look at "N" first :-)

It's also a good measure. I like the 90th percentile better as it
filters out outliers that may otherwise weigh pretty heavily on the
total or average execution time. Of course, as you note below, it's
good to pay attention also to such cases.

> There is one strange result: SmallFileWriteTest; Oak-Segment: 90%=257,
> max=14763 - Maybe the warmup phase is too short, or the test isn't that
> great?

Good catch. I ran the benchmark a few more times, and the max time is
always pretty high. It shouldn't be about the warmup phase, as the
time in this test should be governed by the blob I/O. I'll try to
figure out what's causing such worst case behavior.

> As for SmallFileReadTest and SmallFileWriteTest with Oak-Mongo: I think I
> know what the problem is; it doesn't seem to be related to BLOB handling
> at all (actually performance is the same without the BLOB), but partially
> related to the "split documents" that should be added in the near future.
> Also, it seems to be partially related to what the test does (repeatedly
> adding and removing the same nodes).

Right. The semantics of the SmallFileWriteTest should be the same if
the test root name was different for each test iteration, which should
avoid the split document edge case. I adjusted the test (see patch
below), and the numbers do look a bit better but not radically so:

  # SmallFileWriteTest     min   10%   50%   90%   max     N
  Oak-Mongo                577   591   708  1198  1585    33

I don't know what's dragging the performance in the SmallFileReadTest,
as there the nodes are created just once at the beginning of the
benchmark.

BR,

Jukka Zitting


diff --git a/oak-run/src/main/java/org/apache/jackrabbit/oak/benchmark/SmallFileWriteTest.java b/oak-run/src/main/java/org/apache/jackrabbit/oak/benchmark/SmallFileWriteTest.java
index 7d15b00..c5f2ec8 100644
--- a/oak-run/src/main/java/org/apache/jackrabbit/oak/benchmark/SmallFileWriteTest.java
+++ b/oak-run/src/main/java/org/apache/jackrabbit/oak/benchmark/SmallFileWriteTest.java
@@ -32,6 +32,8 @@ public class SmallFileWriteTest extends AbstractTest {

     private Node root;

+    private long count = 0;
+
     @Override
     public void beforeSuite() throws RepositoryException {
         session = loginWriter();
@@ -39,7 +41,7 @@ public class SmallFileWriteTest extends AbstractTest {

     @Override
     public void beforeTest() throws RepositoryException {
-        root = session.getRootNode().addNode("SmallFileWriteTest", "nt:folder");
+        root = session.getRootNode().addNode("SmallFileWriteTest" + count++, "nt:folder");
         session.save();
     }


Re: Some more benchmarks

2013-07-03 Thread Thomas Mueller
Hi,

Thanks a lot!

> I've only included the 90th percentile

I usually look at "N" first :-)

There is one strange result: SmallFileWriteTest; Oak-Segment: 90%=257,
max=14763 - Maybe the warmup phase is too short, or the test isn't that
great?

As for SmallFileReadTest and SmallFileWriteTest with Oak-Mongo: I think I
know what the problem is; it doesn't seem to be related to BLOB handling
at all (actually performance is the same without the BLOB), but partially
related to the "split documents" that should be added in the near future.
Also, it seems to be partially related to what the test does (repeatedly
adding and removing the same nodes).

Regards,
Thomas


On 7/2/13 10:11 PM, "Jukka Zitting"  wrote:

>Hi,
>
>On Fri, May 31, 2013 at 3:14 PM, Jukka Zitting 
>wrote:
>> On Fri, Apr 26, 2013 at 2:12 PM, Jukka Zitting
>> wrote:
>>> On Wed, Mar 27, 2013 at 11:41 AM, Jukka Zitting
>>> wrote:
>>>> Here's a few more simple benchmark results to show where we are:
>>>
>>> Updated numbers with latest Oak:
>>
>> And another one:
>
>Here we go again:
>
># ReadPropertyTest       90%
>Jackrabbit                48
>Oak-Default               38
>Oak-Mongo                 39
>Oak-Segment               41
>Oak-Tar                   40
>
># SmallFileReadTest      90%
>Jackrabbit                94
>Oak-Default              258
>Oak-Mongo                421
>Oak-Segment               23
>Oak-Tar                   20
>
># SmallFileWriteTest     90%
>Jackrabbit               424
>Oak-Default              349
>Oak-Mongo               1376
>Oak-Segment              257
>Oak-Tar                  116
>
>For simplicity I've only included the 90th percentile figure (smaller is
>better). See https://gist.github.com/jukka/5912460 for the full
>details.
>
>The ReadPropertyTest figures were again lagging behind those of
>Jackrabbit, but my changes earlier today got us back to the same
>range. However, we've still regressed somewhat from the level we
>reached in early June.
>
>BR,
>
>Jukka Zitting



Re: Some more benchmarks

2013-07-02 Thread Jukka Zitting
Hi,

On Fri, May 31, 2013 at 3:14 PM, Jukka Zitting  wrote:
> On Fri, Apr 26, 2013 at 2:12 PM, Jukka Zitting  
> wrote:
>> On Wed, Mar 27, 2013 at 11:41 AM, Jukka Zitting  
>> wrote:
>>> Here's a few more simple benchmark results to show where we are:
>>
>> Updated numbers with latest Oak:
>
> And another one:

Here we go again:

# ReadPropertyTest       90%
Jackrabbit                48
Oak-Default               38
Oak-Mongo                 39
Oak-Segment               41
Oak-Tar                   40

# SmallFileReadTest      90%
Jackrabbit                94
Oak-Default              258
Oak-Mongo                421
Oak-Segment               23
Oak-Tar                   20

# SmallFileWriteTest     90%
Jackrabbit               424
Oak-Default              349
Oak-Mongo               1376
Oak-Segment              257
Oak-Tar                  116

For simplicity I've only included the 90th percentile figure (smaller is
better). See https://gist.github.com/jukka/5912460 for the full
details.

The ReadPropertyTest figures were again lagging behind those of
Jackrabbit, but my changes earlier today got us back to the same
range. However, we've still regressed somewhat from the level we
reached in early June.

BR,

Jukka Zitting


Re: Some more benchmarks

2013-06-06 Thread Jukka Zitting
Hi,

On Fri, May 31, 2013 at 3:14 PM, Jukka Zitting  wrote:
> It looks like we have a performance regression in ReadPropertyTest.
> Quick profiling shows a lot of the time seems to be going to
> MemoryNodeBuilder$ConnectedHead.update(), which is weird since we're
> only reading and thus the related MNB head should be unconnected. I'll
> investigate.

Revision 1490258 fixed the issue. The updated results are:

Apache Jackrabbit Oak 0.9-SNAPSHOT
# ReadPropertyTest       min   10%   50%   90%   max     N
Jackrabbit                40    41    41    42    97  1448
Oak-Default               11    12    12    14    19  4804
Oak-Mongo                 17    18    18    20   123  3128
Oak-Segment               94    94    96    98   136   622
Oak-Tar                   11    11    12    13    17  5121

BR,

Jukka Zitting


Re: Some more benchmarks

2013-06-03 Thread Thomas Mueller
Hi,

>At least the TarMK should have no problems with the 100 child nodes
>(see the Wikipedia import test results :-).

Yes, I also thought 100 child nodes shouldn't be a problem. The profiling
data I have so far doesn't show a clear bottleneck. I really wonder what
the problem is in this case.

Regards,
Thomas



Re: Some more benchmarks

2013-06-03 Thread Jukka Zitting
Hi,

On Mon, Jun 3, 2013 at 12:51 PM, Thomas Mueller  wrote:
> I was not talking about differences in hardware. I know using different
> hardware will result in different numbers.
>
> I was worried that the results would be different if you run one test
> alone versus if you run all tests. That would indicate a problem in the
> benchmark (framework) itself.
>
> But luckily, that doesn't seem to be the case.

OK, good. The fixture code tries to make sure that the previous
repository instance is fully shut down before starting a new one, and
the warm-up period built into the test suite should take care of any
remaining startup artifacts.

> Especially the SmallFileWriteTest seems slow with Oak. The problem doesn't
> seem to be actual blob handling; the profiling result shows the bottleneck
> is with the (few) nodes. If I change the blob size to 0 (that is, 100
> nodes with the same zero-length blob each, instead of 100 nodes with 10 KB
> each), I get basically the same result, with both MongoMK and the Oak-Tar.
> Maybe it's a first sign of slow "many child nodes"?

At least the TarMK should have no problems with the 100 child nodes
(see the Wikipedia import test results :-). Instead I assume (though
haven't profiled in detail) that much of the time is going to the
still unoptimized getEffectiveNodeType() calls in
NodeImpl.internalSetProperty(). Optimizing that is on my TODO.

BR,

Jukka Zitting


Re: Some more benchmarks

2013-06-03 Thread Thomas Mueller
Hi,

I was not talking about differences in hardware. I know using different
hardware will result in different numbers.

I was worried that the results would be different if you run one test
alone versus if you run all tests. That would indicate a problem in the
benchmark (framework) itself.

But luckily, that doesn't seem to be the case. I updated the code and ran
the tests again, and now the results are different. It seems there were
recent changes that improved SmallFileWriteTest for Oak-Tar about 5
times; that's great. The results I got now are:

https://gist.github.com/anonymous/5697099


Especially the SmallFileWriteTest seems slow with Oak. The problem doesn't
seem to be actual blob handling; the profiling result shows the bottleneck
is with the (few) nodes. If I change the blob size to 0 (that is, 100
nodes with the same zero-length blob each, instead of 100 nodes with 10 KB
each), I get basically the same result, with both MongoMK and the Oak-Tar.
Maybe it's a first sign of slow "many child nodes"?

Regards,
Thomas






On 6/3/13 10:27 AM, "Jukka Zitting"  wrote:

>Hi,
>
>On Mon, Jun 3, 2013 at 11:09 AM, Thomas Mueller  wrote:
>> What's a bit weird is that when I run the tests separately I get different numbers:
>
>The results depend on the hardware you're using, so in general numbers
>from two different environments are not directly comparable.
>
>> In your case, the N was 304 versus 3574 (more than 10 times different),
>>in
>> my case it was 528 versus 1085 (factor 2).
>
>Even relative numbers across fixtures can be different depending on
>the varying IO/CPU/memory access costs on different environments. For
>example an SSD disk will reduce the relative advantage of the TarMK
>that "cheats" by mmapping the entire repository to memory.
>
>> How did you run the test? I will try the same command line and post my
>> results.
>
>I'm using an EC2 m1.medium instance to keep the environment stable over
>time. It would be nice to keep track of results also on different
>hardware.
>
>The command line I've used so far is simply:
>
>$ java -jar oak-run-*.jar benchmark \
>  ReadPropertyTest SmallFileReadTest SmallFileWriteTest \
>  Jackrabbit Oak-Default Oak-Mongo Oak-Segment Oak-Tar
>
>BR,
>
>Jukka Zitting



Re: Some more benchmarks

2013-06-03 Thread Jukka Zitting
Hi,

On Mon, Jun 3, 2013 at 11:09 AM, Thomas Mueller  wrote:
> What's a bit weird is that when I run the tests separately I get different numbers:

The results depend on the hardware you're using, so in general numbers
from two different environments are not directly comparable.

> In your case, the N was 304 versus 3574 (more than 10 times different), in
> my case it was 528 versus 1085 (factor 2).

Even relative numbers across fixtures can be different depending on
the varying IO/CPU/memory access costs on different environments. For
example an SSD disk will reduce the relative advantage of the TarMK
that "cheats" by mmapping the entire repository to memory.

> How did you run the test? I will try the same command line and post my
> results.

I'm using an EC2 m1.medium instance to keep the environment stable over
time. It would be nice to keep track of results also on different
hardware.

The command line I've used so far is simply:

$ java -jar oak-run-*.jar benchmark \
  ReadPropertyTest SmallFileReadTest SmallFileWriteTest \
  Jackrabbit Oak-Default Oak-Mongo Oak-Segment Oak-Tar

BR,

Jukka Zitting


Re: Some more benchmarks

2013-06-03 Thread Thomas Mueller
Hi,

What's a bit weird is that when I run the tests separately I get different numbers:

java -mx1g -jar target/oak-run-0.9-SNAPSHOT.jar benchmark \
  SmallFileReadTest Oak-Tar
# SmallFileReadTest      min   10%   50%   90%   max     N
Oak-Tar                   53    54    55    57    72  1085

java -mx1g -jar target/oak-run-0.9-SNAPSHOT.jar benchmark \
  SmallFileReadTest Oak-Mongo
# SmallFileReadTest      min   10%   50%   90%   max     N
Oak-Mongo                102   104   113   122   310   528

In your case, the N was 304 versus 3574 (more than 10 times different), in
my case it was 528 versus 1085 (factor 2).

How did you run the test? I will try the same command line and post my
results.

Regards,
Thomas





On 5/31/13 2:14 PM, "Jukka Zitting"  wrote:

>Hi,
>
>On Fri, Apr 26, 2013 at 2:12 PM, Jukka Zitting 
>wrote:
>> On Wed, Mar 27, 2013 at 11:41 AM, Jukka Zitting
>> wrote:
>>> Here's a few more simple benchmark results to show where we are:
>>
>> Updated numbers with latest Oak:
>
>And another one:
>
>Apache Jackrabbit Oak 0.9-SNAPSHOT
># ReadPropertyTest       min   10%   50%   90%   max     N
>Jackrabbit                41    41    42    43    90  1428
>Oak-Default               58    58    59    60    69  1018
>Oak-Mongo                 66    67    67    68    74   889
>Oak-Segment              278   279   281   285   321   213
>Oak-Tar                  114   114   115   117   136   520
># SmallFileReadTest      min   10%   50%   90%   max     N
>Jackrabbit                56    57    61    84   194   895
>Oak-Default               57    57    59   304   353   594
>Oak-Mongo                148   148   158   406   479   304
>Oak-Segment               33    33    36    37    73  1701
>Oak-Tar                   15    15    16    18    31  3574
># SmallFileWriteTest     min   10%   50%   90%   max     N
>Jackrabbit               184   196   248   444  2084   115
>Oak-Default              136   138   181   433  1789   162
>Oak-Mongo                595   617   795  1020  1075    31
>Oak-Segment              156   161   172   225   660   100
>Oak-Tar                  101   102   108   116   270   167
>
>(also available at https://gist.github.com/jukka/5684506 if the above
>gets mangled with a variable-width font)
>
>It looks like we have a performance regression in ReadPropertyTest.
>Quick profiling shows a lot of the time seems to be going to
>MemoryNodeBuilder$ConnectedHead.update(), which is weird since we're
>only reading and thus the related MNB head should be unconnected. I'll
>investigate.
>
>BR,
>
>Jukka Zitting



RE: Some more benchmarks

2013-05-31 Thread Michael C. Moore
Great, thanks!

-----Original Message-----
From: Jukka Zitting [mailto:jukka.zitt...@gmail.com] 
Sent: Friday, May 31, 2013 9:28 AM
To: Oak devs
Subject: Re: Some more benchmarks

Hi,

On Fri, May 31, 2013 at 3:52 PM, Michael C. Moore  wrote:
> Can you briefly explain the test results or point me to a wiki or link that 
> has the explanation?

I just committed the description to a README file, see 
https://github.com/apache/jackrabbit-oak/blob/trunk/oak-run/README.md

BR,

Jukka Zitting


Re: Some more benchmarks

2013-05-31 Thread Jukka Zitting
Hi,

On Fri, May 31, 2013 at 3:52 PM, Michael C. Moore  wrote:
> Can you briefly explain the test results or point me to a wiki or link that 
> has the explanation?

I just committed the description to a README file, see
https://github.com/apache/jackrabbit-oak/blob/trunk/oak-run/README.md

BR,

Jukka Zitting


RE: Some more benchmarks

2013-05-31 Thread Michael C. Moore
Hi Jukka,

First, thanks for the information.  I think you explained what the numbers mean 
(seconds, milliseconds, etc.) in a previous email, but I can't locate it.

Can you briefly explain the test results or point me to a wiki or link that has 
the explanation?

Thanks,
Michael

-----Original Message-----
From: Jukka Zitting [mailto:jukka.zitt...@gmail.com] 
Sent: Friday, May 31, 2013 8:14 AM
To: Oak devs
Subject: Re: Some more benchmarks

Hi,

On Fri, Apr 26, 2013 at 2:12 PM, Jukka Zitting  wrote:
> On Wed, Mar 27, 2013 at 11:41 AM, Jukka Zitting  
> wrote:
>> Here's a few more simple benchmark results to show where we are:
>
> Updated numbers with latest Oak:

And another one:

Apache Jackrabbit Oak 0.9-SNAPSHOT
# ReadPropertyTest       min   10%   50%   90%   max     N
Jackrabbit                41    41    42    43    90  1428
Oak-Default               58    58    59    60    69  1018
Oak-Mongo                 66    67    67    68    74   889
Oak-Segment              278   279   281   285   321   213
Oak-Tar                  114   114   115   117   136   520
# SmallFileReadTest      min   10%   50%   90%   max     N
Jackrabbit                56    57    61    84   194   895
Oak-Default               57    57    59   304   353   594
Oak-Mongo                148   148   158   406   479   304
Oak-Segment               33    33    36    37    73  1701
Oak-Tar                   15    15    16    18    31  3574
# SmallFileWriteTest     min   10%   50%   90%   max     N
Jackrabbit               184   196   248   444  2084   115
Oak-Default              136   138   181   433  1789   162
Oak-Mongo                595   617   795  1020  1075    31
Oak-Segment              156   161   172   225   660   100
Oak-Tar                  101   102   108   116   270   167

(also available at https://gist.github.com/jukka/5684506 if the above gets 
mangled with a variable-width font)

It looks like we have a performance regression in ReadPropertyTest.
Quick profiling shows a lot of the time seems to be going to 
MemoryNodeBuilder$ConnectedHead.update(), which is weird since we're only 
reading and thus the related MNB head should be unconnected. I'll investigate.

BR,

Jukka Zitting


Re: Some more benchmarks

2013-05-31 Thread Jukka Zitting
Hi,

On Fri, Apr 26, 2013 at 2:12 PM, Jukka Zitting  wrote:
> On Wed, Mar 27, 2013 at 11:41 AM, Jukka Zitting  
> wrote:
>> Here's a few more simple benchmark results to show where we are:
>
> Updated numbers with latest Oak:

And another one:

Apache Jackrabbit Oak 0.9-SNAPSHOT
# ReadPropertyTest       min   10%   50%   90%   max     N
Jackrabbit                41    41    42    43    90  1428
Oak-Default               58    58    59    60    69  1018
Oak-Mongo                 66    67    67    68    74   889
Oak-Segment              278   279   281   285   321   213
Oak-Tar                  114   114   115   117   136   520
# SmallFileReadTest      min   10%   50%   90%   max     N
Jackrabbit                56    57    61    84   194   895
Oak-Default               57    57    59   304   353   594
Oak-Mongo                148   148   158   406   479   304
Oak-Segment               33    33    36    37    73  1701
Oak-Tar                   15    15    16    18    31  3574
# SmallFileWriteTest     min   10%   50%   90%   max     N
Jackrabbit               184   196   248   444  2084   115
Oak-Default              136   138   181   433  1789   162
Oak-Mongo                595   617   795  1020  1075    31
Oak-Segment              156   161   172   225   660   100
Oak-Tar                  101   102   108   116   270   167

(also available at https://gist.github.com/jukka/5684506 if the above
gets mangled with a variable-width font)

It looks like we have a performance regression in ReadPropertyTest.
Quick profiling shows a lot of the time seems to be going to
MemoryNodeBuilder$ConnectedHead.update(), which is weird since we're
only reading and thus the related MNB head should be unconnected. I'll
investigate.

BR,

Jukka Zitting


Re: Some more benchmarks

2013-04-29 Thread Jukka Zitting
Hi,

On Mon, Apr 29, 2013 at 10:17 AM, Lukas Eder  wrote:
> Are there any test results available with respect to ACL, comparing
> Jackrabbit with Oak?

Not yet. See the o.a.j.oak.benchmark package in oak-run for some of
the existing benchmarks I've been using so far. It should be fairly
straightforward to use one of the existing classes as a baseline for
building a simple ACL benchmark.

BR,

Jukka Zitting


Re: Some more benchmarks

2013-04-29 Thread Lukas Eder
Hello,

I'm interested in estimating performance (and load) impacts of ACL
checking on read access. I'm specifically interested in a comparison where
paths like /a, /a/b, /a/b/c, /a/.../y/z are accessed, and ACL has to be
evaluated "upwards" on the path. Since such a test is more high-level and
may suffer from many side-effects, it's probably more of a load test than
a performance test.

Are there any test results available with respect to ACL, comparing
Jackrabbit with Oak?
Are there any load test results available comparing Jackrabbit with Oak?
Can you point me to the code of these benchmarks?

Cheers
Lukas

On 4/26/13 1:12 PM, "Jukka Zitting"  wrote:

>Hi,
>
>On Wed, Mar 27, 2013 at 11:41 AM, Jukka Zitting 
>wrote:
>> Here's a few more simple benchmark results to show where we are:
>
>Updated numbers with latest Oak:
>
># ReadPropertyTest       min   10%   50%   90%   max     N
>Jackrabbit                34    35    37    60   110  1333
>Oak-Default                8     9     9    20    76  4972
>Oak-Mongo                 10    10    11    34    38  4501
>Oak-Segment               13    13    14    37    44  3482
># SmallFileReadTest      min   10%   50%   90%   max     N
>Jackrabbit                50    52    76   117   622   764
>Oak-Default               51    53    77   390   496   483
>Oak-Mongo                159   160   184   517   657   259
>Oak-Segment               15    16    17    40    86  2813
># SmallFileWriteTest     min   10%   50%   90%   max     N
>Jackrabbit               181   200   250   469  1088   105
>Oak-Default              169   180   232   429   923   107
>Oak-Mongo                698   727   886  1051  1066    26
>Oak-Segment              221   247   262   337   651    77
>
>Overall that's pretty nice progress. Apart from a few exceptions,
>we're now better (sometimes significantly so) or on par with
>Jackrabbit 2.x in these benchmarks.
>
>BR,
>
>Jukka Zitting



Re: Some more benchmarks

2013-04-26 Thread Jukka Zitting
Hi,

On Wed, Mar 27, 2013 at 11:41 AM, Jukka Zitting  wrote:
> Here's a few more simple benchmark results to show where we are:

Updated numbers with latest Oak:

# ReadPropertyTest       min   10%   50%   90%   max     N
Jackrabbit                34    35    37    60   110  1333
Oak-Default                8     9     9    20    76  4972
Oak-Mongo                 10    10    11    34    38  4501
Oak-Segment               13    13    14    37    44  3482
# SmallFileReadTest      min   10%   50%   90%   max     N
Jackrabbit                50    52    76   117   622   764
Oak-Default               51    53    77   390   496   483
Oak-Mongo                159   160   184   517   657   259
Oak-Segment               15    16    17    40    86  2813
# SmallFileWriteTest     min   10%   50%   90%   max     N
Jackrabbit               181   200   250   469  1088   105
Oak-Default              169   180   232   429   923   107
Oak-Mongo                698   727   886  1051  1066    26
Oak-Segment              221   247   262   337   651    77

Overall that's pretty nice progress. Apart from a few exceptions,
we're now better (sometimes significantly so) or on par with
Jackrabbit 2.x in these benchmarks.

BR,

Jukka Zitting


Re: Some more benchmarks

2013-04-02 Thread Angela Schreiber

oh... already fixed it before lunch and committed modifications
before re-checking mails (using OAK-690 and an appropriate comment).

thanks anyway... test passed and i assume it's fine. the internal method
AuthorizableImpl#getTree will now return the Tree associated
with the authorizable as long as it has not been disconnected and
throw IllegalStateException otherwise to avoid odd behavior.

angela

On 4/2/13 12:46 PM, Michael Dürig wrote:



On 28.3.13 15:19, Angela Schreiber wrote:

hi michael


With the resolution of OAK-690, I made tree instances stable across save
and refresh operations.


does that mean that the AuthorizableImpl could hold a Tree instance
instead of re-accessing it over and over again using a lookup by id?
if that was the case, could you please create an issue asking for
that refactoring such that we don't forget it? the fix should be
fairly trivial as the Tree gets passed to the constructor for
validation but is currently not kept as field.


https://issues.apache.org/jira/browse/OAK-733

Michael



regards
angela


Re: Some more benchmarks

2013-04-02 Thread Michael Dürig



On 28.3.13 15:19, Angela Schreiber wrote:

hi michael


With the resolution of OAK-690, I made tree instances stable across save
and refresh operations.


does that mean that the AuthorizableImpl could hold a Tree instance
instead of re-accessing it over and over again using a lookup by id?
if that was the case, could you please create an issue asking for
that refactoring such that we don't forget it? the fix should be
fairly trivial as the Tree gets passed to the constructor for
validation but is currently not kept as field.


https://issues.apache.org/jira/browse/OAK-733

Michael



regards
angela


Re: Some more benchmarks

2013-03-28 Thread Michael Dürig



On 27.3.13 14:41, Jukka Zitting wrote:

Drilling down to the NodeDelegate.getProperty() method, we have the
following distribution of time:

 NodeDelegate.getProperty()
   95% NodeDelegate.getChildLocation()
    5% TreeImpl.internalGetProperty() via NodeLocation.getProperty()

See why I haven't been too excited about the Location concept...


This is caused by the getChildLocation taking a relative path instead of 
relying on the client to navigate the hierarchy as necessary. This 
effectively duplicates the effort of interpreting paths: once in 
NodeDelegate.getChildLocation() and once in TreeLocation.getChild(). See 
OAK-426 for some discussion.


I did a quick check and changed TreeLocation.getChild() to only take a 
name instead of a relative path:


# ReadPropertyTest      min 10% 50% 90% max    N
Oak-Default (before)     12  13  14  15 128 4166
Oak-Default (after)       8   8   8  10  99 6896

As said earlier, I suggest to change TreeLocation.getChild() to only 
take names, not relative paths.
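
A rough sketch of what name-only navigation could look like. The
Location class below is a hypothetical, simplified stand-in and not
Oak's TreeLocation; it ignores ".." and properties. The relative path
is split exactly once, in the client-side resolve() helper, and
getChild() never has to interpret a path:

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch only; Location is not Oak's TreeLocation. */
final class LocationSketch {

    static final class Location {
        private final Map<String, Location> children = new HashMap<>();
        final String name;

        Location(String name) {
            this.name = name;
        }

        /** Adds (or returns the existing) child with the given name. */
        Location addChild(String childName) {
            return children.computeIfAbsent(childName, Location::new);
        }

        /** Takes a single name, never a path; returns null if there is no such child. */
        Location getChild(String childName) {
            return children.get(childName);
        }
    }

    /** Client-side navigation: the relative path is parsed once, here. */
    static Location resolve(Location start, String relativePath) {
        Location current = start;
        for (String element : relativePath.split("/")) {
            if (current == null) {
                return null; // an intermediate element did not exist
            }
            if (element.isEmpty() || element.equals(".")) {
                continue; // skip empty segments and "." (".." is not handled in this sketch)
            }
            current = current.getChild(element);
        }
        return current;
    }
}
```

For example, after root.addChild("a").addChild("b"), the call
LocationSketch.resolve(root, "a/b") returns the location named "b",
while resolve(root, "a/x") returns null.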


Michael


Re: Some more benchmarks

2013-03-28 Thread Michael Dürig



On 27.3.13 14:41, Jukka Zitting wrote:

Profiling the getProperty calls shows the following distribution of time spent:

 NodeImpl.getProperty()
   61% NodeDelegate.getProperty() via perform()
   31% ItemImpl.isStale() via checkStatus()
    8% other stuff

The status check would be an obvious area of improvement, especially
since we're dealing with a read-only session that's never refreshed.


With the resolution of OAK-690, I made tree instances stable across save 
and refresh operations. There is thus no need any more for trees to be 
re-loaded in ItemDelegate and I removed the respective logic already.


These changes improve the situation somewhat and might open some 
additional room for optimizing the status checks (especially in the case 
of read only sessions).


# ReadPropertyTest      min 10% 50% 90% max    N
Jackrabbit                8   8   9  10 131 6623
Oak-Default              22  22  23  24  42 2559
Oak-Default (OAK-690)    15  15  16  17  43 3654

The second to last line is without the changes done for OAK-690 while 
the last line includes those changes.


Michael


Re: Some more benchmarks

2013-03-27 Thread Jukka Zitting
Hi,

On Wed, Mar 27, 2013 at 11:41 AM, Jukka Zitting  wrote:
> Here's a few more simple benchmark results to show where we are:

Some notes to help read and produce benchmark results like these.

The oak-run jar that you can find under oak-run/target has a
"benchmark" mode that produces these results. It can be invoked like
this:

$ java -jar oak-run/target/oak-run-*.jar benchmark [options] [testcases] [fixtures]

The following benchmark options (with default values) are currently supported:

--host localhost   - MongoDB host
--port 27101       - MongoDB port
--cache 100        - cache size (in MB)
--wikipedia        - Wikipedia dump

These options are passed to the test cases and repository fixtures
that need them. For example the Wikipedia dump option is needed by the
WikipediaImport test case and the MongoDB address information by the
MongoMK and SegmentMK -based repository fixtures. The cache setting
controls the bundle cache size in Jackrabbit, the KernelNodeState
cache size in MongoMK and the default H2 MK, and the segment cache
size in SegmentMK.

You can use extra JVM options like -Xmx settings to better control the
benchmark environment. It's also possible to attach the JVM to a
profiler to better understand benchmark results. For example, I'm
using "-agentlib:hprof=cpu=samples,depth=100" as a basic profiling
tool, whose results can be processed with "perl analyze-hprof.pl
java.hprof.txt" to produce somewhat easier-to-read top-down and
bottom-up summaries of how the execution time is distributed across
the benchmarked codebase.

The test case names like ReadPropertyTest, SmallFileReadTest and
SmallFileWriteTest indicate the specific test case being run. You can
specify one or more test cases in the benchmark command line, and
oak-run will execute each benchmark in sequence. The benchmark code is
located under org.apache.jackrabbit.oak.benchmark in the oak-run
component. Each test case tries to exercise some tightly scoped aspect
of the repository. You might remember many of these tests from the
Jackrabbit benchmark reports like
http://people.apache.org/~jukka/jackrabbit/report-2011-09-27/report.html
that I used to produce earlier.

Finally the benchmark runner supports the following repository fixtures:

Jackrabbit   - Jackrabbit with the default embedded Derby bundle PM
Oak-Memory   - Oak with the default MK using in-memory storage
Oak-Default  - Oak with the default MK using embedded H2 database
Oak-Mongo    - Oak with the new MongoMK
Oak-Segment  - Oak with the SegmentMK

Once started, the benchmark runner will execute each listed test case
against all the listed repository fixtures. After starting up the
repository and preparing the test environment, the test case is first
executed a few times to warm up caches before measurements are
started. Then the test case is run repeatedly for one minute (or at
least 10 times) and the number of milliseconds used by each execution
is recorded. Once done, the following statistics are computed and
reported:

min - minimum time (in ms) taken by a test run
10% - time (in ms) within which the fastest 10% of test runs completed
50% - time (in ms) taken by the median test run
90% - time (in ms) within which the fastest 90% of test runs completed
max - maximum time (in ms) taken by a test run
N   - total number of test runs in one minute (or more)
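
The measurement procedure and the statistics above can be sketched
roughly as follows. This is an illustration, not the oak-run code: the
class BenchmarkSketch and its method names are made up, and the
nearest-rank percentile definition is an assumption (the actual runner
may interpolate differently):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Illustrative sketch only; not the oak-run benchmark runner. */
final class BenchmarkSketch {

    /**
     * Runs the test case a few times unmeasured to warm up caches, then
     * repeats it until minDurationMs has elapsed and at least minRuns
     * runs were recorded. Returns the time of each measured run in ms.
     */
    static long[] measure(Runnable testCase, int warmupRuns, long minDurationMs, int minRuns) {
        for (int i = 0; i < warmupRuns; i++) {
            testCase.run(); // warm-up, result discarded
        }
        List<Long> times = new ArrayList<>();
        long end = System.currentTimeMillis() + minDurationMs;
        while (System.currentTimeMillis() < end || times.size() < minRuns) {
            long start = System.nanoTime();
            testCase.run();
            times.add((System.nanoTime() - start) / 1_000_000);
        }
        long[] result = new long[times.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = times.get(i);
        }
        return result;
    }

    /** Nearest-rank percentile (an assumption): the time within which p% of the runs completed. */
    static long percentile(long[] sortedTimes, int p) {
        int rank = (int) (((long) p * sortedTimes.length + 99) / 100); // ceil(p * n / 100)
        return sortedTimes[Math.max(rank, 1) - 1];
    }

    /** Formats "min 10% 50% 90% max N"; requires at least one recorded run. */
    static String summary(long[] times) {
        long[] sorted = times.clone();
        Arrays.sort(sorted);
        return String.format(Locale.ROOT, "%d %d %d %d %d %d",
                sorted[0], percentile(sorted, 10), percentile(sorted, 50),
                percentile(sorted, 90), sorted[sorted.length - 1], sorted.length);
    }
}
```

With the defaults described above, a call would look like
BenchmarkSketch.summary(BenchmarkSketch.measure(test, 5, 60_000, 10)),
where the warm-up count of 5 is an arbitrary placeholder for "a few
times".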

The most useful of these numbers is probably the 90% figure, as it
shows the time under which the majority of test runs completed and
thus what kind of performance could reasonably be expected in a normal
usage scenario. However, the reason why all these different numbers
are reported, instead of just the 90% one, is that often seeing the
distribution of time across test runs can be helpful in identifying
things like whether a bigger cache might help.

Finally, and most importantly, like in all benchmarking, the numbers
produced by these tests should be taken with a large dose of salt.
They DO NOT directly indicate the kind of application performance you
could expect with (the current state of) Oak. Instead they are
designed to isolate implementation-level bottlenecks and to help
measure and profile the performance of specific, isolated features.

BR,

Jukka Zitting


Re: Some more benchmarks

2013-03-27 Thread Jukka Zitting
Hi,

On Wed, Mar 27, 2013 at 1:54 PM, Jukka Zitting  wrote:
> Quick benchmarking of the Oak-Default run shows
> NamePathMapperImpl.getOakPath() calling JcrPathParser.validate()
> taking about 20% of the time in this test.

Updated numbers after the latest OAK-108 change:

# ReadPropertyTest   min 10% 50% 90% max    N
before                56  58  61 120 132  802
after                 53  54  55  56  72 1089

Profiling the getProperty calls shows the following distribution of time spent:

NodeImpl.getProperty()
  61% NodeDelegate.getProperty() via perform()
  31% ItemImpl.isStale() via checkStatus()
   8% other stuff

The status check would be an obvious area of improvement, especially
since we're dealing with a read-only session that's never refreshed.

Drilling down to the NodeDelegate.getProperty() method, we have the
following distribution of time:

NodeDelegate.getProperty()
  95% NodeDelegate.getChildLocation()
   5% TreeImpl.internalGetProperty() via NodeLocation.getProperty()

See why I haven't been too excited about the Location concept...

BR,

Jukka Zitting


Re: Some more benchmarks

2013-03-27 Thread Jukka Zitting
Hi,

On Wed, Mar 27, 2013 at 2:32 PM, Michael Dürig  wrote:
> That's right. The easiest thing is to try it out, remove pre-emptive path
> validation and see what breaks. I have the vague memory that there were some
> overly picky TCK tests which required us to put this upfront validation in.
> However, a lot of time has passed since then so it might be a good idea to
> have another look.

Yep, I'm on it. Expect a patch in OAK-108 once I'm through those issues.

BR,

Jukka Zitting


Re: Some more benchmarks

2013-03-27 Thread Michael Dürig



On 27.3.13 12:21, Jukka Zitting wrote:

Hi,

On Wed, Mar 27, 2013 at 2:12 PM, Michael Dürig  wrote:

IIUC you propose to not validate paths in the read case but rely on the
downstream code to fail. Might be worth a try. However we'd need different
path parsing then for the read and the write case since circumventing path
validation for the write case is most certainly not the right thing to do.


We already have the NameValidator that ensures that all (non-hidden)
names stored in the repository are valid. As a consequence also all
existing repository paths are valid.


That's right. The easiest thing is to try it out, remove pre-emptive 
path validation and see what breaks. I have the vague memory that there 
were some overly picky TCK tests which required us to put this upfront 
validation in. However, a lot of time has passed since then so it might be 
a good idea to have another look.


Michael


Re: Some more benchmarks

2013-03-27 Thread Jukka Zitting
Hi,

On Wed, Mar 27, 2013 at 2:12 PM, Michael Dürig  wrote:
> IIUC you propose to not validate paths in the read case but rely on the
> downstream code to fail. Might be worth a try. However we'd need different
> path parsing then for the read and the write case since circumventing path
> validation for the write case is most certainly not the right thing to do.

We already have the NameValidator that ensures that all (non-hidden)
names stored in the repository are valid. As a consequence also all
existing repository paths are valid.

BR,

Jukka Zitting


Re: Some more benchmarks

2013-03-27 Thread Michael Dürig



On 27.3.13 11:54, Jukka Zitting wrote:

Hi,




Do we need to explicitly validate all paths that get passed to us?
Especially in cases like getProperty(), where in the vast majority of
the cases the given path matches an existing property (whose path by
definition is valid), it would make more sense to skip such validation
entirely, or at least postpone it to the rare cases where a matching
property was not found.


FWIW, the relevant discussion is here: 
https://issues.apache.org/jira/browse/OAK-108


IIUC you propose to not validate paths in the read case but rely on the 
downstream code to fail. Might be worth a try. However we'd need 
different path parsing then for the read and the write case since 
circumventing path validation for the write case is most certainly not 
the right thing to do.


Michael



BR,

Jukka Zitting



Re: Some more benchmarks

2013-03-27 Thread Jukka Zitting
Hi,

On Wed, Mar 27, 2013 at 11:41 AM, Jukka Zitting  wrote:
> # ReadPropertyTest   min 10% 50% 90% max    N
> Jackrabbit            31  31  33  92 121 1470
> Oak-Default           56  58  61 120 132  802

Quick benchmarking of the Oak-Default run shows
NamePathMapperImpl.getOakPath() calling JcrPathParser.validate()
taking about 20% of the time in this test.

Do we need to explicitly validate all paths that get passed to us?
Especially in cases like getProperty(), where in the vast majority of
the cases the given path matches an existing property (whose path by
definition is valid), it would make more sense to skip such validation
entirely, or at least postpone it to the rare cases where a matching
property was not found.
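
To make the idea concrete, here is a minimal sketch of postponing
validation to the miss case. Everything in it is hypothetical: the
class LazyValidationSketch, the map-based store, and the placeholder
validity rule (non-empty, no "//") stand in for the real name/path
mapping and JcrPathParser logic:

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch only; not Oak's path handling. */
final class LazyValidationSketch {

    private final Map<String, String> properties = new HashMap<>();
    int validations = 0; // counts how often validation actually ran

    /** Writes are always validated up front, so every stored path is valid. */
    void setProperty(String path, String value) {
        validate(path);
        properties.put(path, value);
    }

    /** Reads validate only on a miss, to tell "malformed" apart from "not found". */
    String getProperty(String path) {
        String value = properties.get(path);
        if (value != null) {
            return value; // hit: the path matches a stored path, which is valid by construction
        }
        validate(path);
        return null;
    }

    /** Placeholder rule (an assumption), standing in for real JCR path validation. */
    private void validate(String path) {
        validations++;
        if (path.isEmpty() || path.contains("//")) {
            throw new IllegalArgumentException("Invalid path: " + path);
        }
    }
}
```

In this sketch a successful getProperty() call costs one map lookup
and no validation, which is the common case described above; only a
miss pays for the check.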

BR,

Jukka Zitting


Re: Some more benchmarks

2013-03-27 Thread Thomas Mueller
Hi,

Thanks! The SmallFileWriteTest is quite slow for Oak-Mongo: 6 times slower
than Jackrabbit on average, as far as I see. It seems to be, at least
partially, a problem of the AbstractBlobStore. I will have a look.

Regards,
Thomas




On 3/27/13 10:41 AM, "Jukka Zitting"  wrote:

>Hi,
>
>Here's a few more simple benchmark results to show where we are:
>
>Apache Jackrabbit Oak 0.7-SNAPSHOT
># ReadPropertyTest    min  10%  50%  90%  max    N
>Jackrabbit             31   31   33   92  121 1470
>Oak-Default            56   58   61  120  132  802
>Oak-Mongo              56   58   61  120  127  802
>Oak-Segment           113  118  131  184  195  399
># SmallFileReadTest   min  10%  50%  90%  max    N
>Jackrabbit             42   43   63  128  288  799
>Oak-Default            57   61  104  416  542  397
>Oak-Mongo             108  124  190  476  616  269
>Oak-Segment            35   36   43  104  124 1134
># SmallFileWriteTest  min  10%  50%  90%  max    N
>Jackrabbit            143  170  249  393 1539  115
>Oak-Default           502  571  792 1103 1851   69
>Oak-Mongo            1192 1333 1695 2267 2824   21
>Oak-Segment           366  379  458  558 6036  101
>
>BR,
>
>Jukka Zitting



Some more benchmarks

2013-03-27 Thread Jukka Zitting
Hi,

Here's a few more simple benchmark results to show where we are:

Apache Jackrabbit Oak 0.7-SNAPSHOT
# ReadPropertyTest    min  10%  50%  90%  max    N
Jackrabbit             31   31   33   92  121 1470
Oak-Default            56   58   61  120  132  802
Oak-Mongo              56   58   61  120  127  802
Oak-Segment           113  118  131  184  195  399
# SmallFileReadTest   min  10%  50%  90%  max    N
Jackrabbit             42   43   63  128  288  799
Oak-Default            57   61  104  416  542  397
Oak-Mongo             108  124  190  476  616  269
Oak-Segment            35   36   43  104  124 1134
# SmallFileWriteTest  min  10%  50%  90%  max    N
Jackrabbit            143  170  249  393 1539  115
Oak-Default           502  571  792 1103 1851   69
Oak-Mongo            1192 1333 1695 2267 2824   21
Oak-Segment           366  379  458  558 6036  101

BR,

Jukka Zitting