Re: Some more benchmarks

2013-03-27 Thread Jukka Zitting
Hi,

On Wed, Mar 27, 2013 at 11:41 AM, Jukka Zitting  wrote:
> Here are a few more simple benchmark results to show where we are:

Some notes to help read and produce benchmark results like these.

The oak-run jar that you can find under oak-run/target has a
"benchmark" mode that produces these results. It can be invoked like
this:

$ java -jar oak-run/target/oak-run-*.jar benchmark [options] [testcases] [fixtures]

The following benchmark options (with default values) are currently supported:

--host localhost    - MongoDB host
--port 27101        - MongoDB port
--cache 100         - cache size (in MB)
--wikipedia         - Wikipedia dump

These options are passed to the test cases and repository fixtures
that need them. For example the Wikipedia dump option is needed by the
WikipediaImport test case and the MongoDB address information by the
MongoMK- and SegmentMK-based repository fixtures. The cache setting
controls the bundle cache size in Jackrabbit, the KernelNodeState
cache size in MongoMK and the default H2 MK, and the segment cache
size in SegmentMK.
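To make the option handling concrete, here is a minimal sketch of how options with these defaults could be parsed, with everything that is not an option treated as a test case or fixture name. `BenchmarkOptions` and its fields are hypothetical names for illustration; this is not oak-run's actual parser, and the defaults simply mirror the list above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder for the benchmark options listed above; the defaults
// mirror the values given in this mail. This is not oak-run's actual parser.
final class BenchmarkOptions {
    String host = "localhost";  // --host: MongoDB host
    int port = 27101;           // --port: MongoDB port (default as listed above)
    int cacheMb = 100;          // --cache: cache size in MB
    String wikipedia = null;    // --wikipedia: Wikipedia dump (no default)
    final List<String> names = new ArrayList<>(); // test case and fixture names

    static BenchmarkOptions parse(String... args) {
        BenchmarkOptions o = new BenchmarkOptions();
        for (int i = 0; i < args.length; i++) {
            switch (args[i]) {
                case "--host":      o.host = value(args, ++i); break;
                case "--port":      o.port = Integer.parseInt(value(args, ++i)); break;
                case "--cache":     o.cacheMb = Integer.parseInt(value(args, ++i)); break;
                case "--wikipedia": o.wikipedia = value(args, ++i); break;
                default:            o.names.add(args[i]); // e.g. ReadPropertyTest, Oak-Mongo
            }
        }
        return o;
    }

    // Returns the value following an option, failing clearly if it is missing.
    private static String value(String[] args, int i) {
        if (i >= args.length) {
            throw new IllegalArgumentException("missing value for " + args[i - 1]);
        }
        return args[i];
    }

    // The single cache setting in bytes; the same value would be applied to
    // whichever cache the chosen fixture uses.
    long cacheBytes() {
        return cacheMb * 1024L * 1024L;
    }
}
```

For example, `BenchmarkOptions.parse("--cache", "256", "ReadPropertyTest", "Oak-Mongo")` keeps the default host and port, raises the cache to 256 MB, and leaves one test case and one fixture name to run.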

You can use extra JVM options like -Xmx settings to better control the
benchmark environment. It's also possible to attach the JVM to a
profiler to better understand benchmark results. For example, I'm
using "-agentlib:hprof=cpu=samples,depth=100" as a basic profiling
tool, whose results can be processed with "perl analyze-hprof.pl
java.hprof.txt" to produce somewhat easier-to-read top-down and
bottom-up summaries of how the execution time is distributed across
the benchmarked codebase.

The test case names like ReadPropertyTest, SmallFileReadTest and
SmallFileWriteTest indicate the specific test case being run. You can
specify one or more test cases in the benchmark command line, and
oak-run will execute each benchmark in sequence. The benchmark code is
located under org.apache.jackrabbit.oak.benchmark in the oak-run
component. Each test case tries to exercise some tightly scoped aspect
of the repository. You might remember many of these tests from the
Jackrabbit benchmark reports like
http://people.apache.org/~jukka/jackrabbit/report-2011-09-27/report.html
that I used to produce earlier.

Finally the benchmark runner supports the following repository fixtures:

Jackrabbit   - Jackrabbit with the default embedded Derby bundle PM
Oak-Memory   - Oak with the default MK using in-memory storage
Oak-Default  - Oak with the default MK using an embedded H2 database
Oak-Mongo    - Oak with the new MongoMK
Oak-Segment  - Oak with the SegmentMK

Once started, the benchmark runner will execute each listed test case
against all the listed repository fixtures. After starting up the
repository and preparing the test environment, the test case is first
executed a few times to warm up caches before measurements are
started. Then the test case is run repeatedly for one minute (or at
least 10 times) and the number of milliseconds used by each execution
is recorded. Once done, the following statistics are computed and
reported:

min - minimum time (in ms) taken by a test run
10% - time (in ms) within which the fastest 10% of test runs completed
50% - time (in ms) taken by the median test run
90% - time (in ms) within which the fastest 90% of test runs completed
max - maximum time (in ms) taken by a test run
N   - total number of test runs in one minute (or more)
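The measurement procedure and the statistics above can be sketched as follows. This is a minimal illustration, not oak-run's code: `BenchmarkSketch` is a hypothetical name, and the nearest-rank percentile definition is an assumption (the runner may compute percentiles differently).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of the measurement procedure described above; the
// nearest-rank percentile definition is an assumption, not necessarily the
// one oak-run uses.
final class BenchmarkSketch {

    // Warm up, then keep running until durationMs has elapsed AND at least
    // minRuns runs were recorded, storing the milliseconds of each run.
    static List<Long> measure(Runnable test, int warmupRuns, long durationMs, int minRuns) {
        for (int i = 0; i < warmupRuns; i++) {
            test.run(); // warm-up runs are not recorded
        }
        List<Long> times = new ArrayList<>();
        long deadline = System.currentTimeMillis() + durationMs;
        while (System.currentTimeMillis() < deadline || times.size() < minRuns) {
            long start = System.nanoTime();
            test.run();
            times.add((System.nanoTime() - start) / 1_000_000L);
        }
        return times;
    }

    // Nearest-rank percentile: the smallest recorded time such that at least
    // `percent` percent of the runs took no longer than it.
    static long percentile(List<Long> sorted, int percent) {
        int rank = (percent * sorted.size() + 99) / 100; // ceil(percent * n / 100)
        return sorted.get(Math.max(rank, 1) - 1);
    }

    static String summarize(List<Long> times) {
        List<Long> s = new ArrayList<>(times);
        Collections.sort(s);
        return String.format(Locale.ROOT, "min=%d 10%%=%d 50%%=%d 90%%=%d max=%d N=%d",
                s.get(0), percentile(s, 10), percentile(s, 50),
                percentile(s, 90), s.get(s.size() - 1), s.size());
    }
}
```

Calling `BenchmarkSketch.summarize(BenchmarkSketch.measure(task, 5, 60_000, 10))` would correspond to the "one minute or at least 10 runs" setup; the five warm-up runs are a guess, since the mail only says "a few times".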

The most useful of these numbers is probably the 90% figure, as it
shows the time within which the vast majority of test runs completed
and thus what kind of performance could reasonably be expected in a
normal usage scenario. The other figures are reported as well because
seeing the distribution of times across test runs is often helpful in
identifying things like whether a bigger cache might help.

Finally, and most importantly, like in all benchmarking, the numbers
produced by these tests should be taken with a large dose of salt.
They DO NOT directly indicate the kind of application performance you
could expect with (the current state of) Oak. Instead they are
designed to isolate implementation-level bottlenecks and to help
measure and profile the performance of specific, isolated features.

BR,

Jukka Zitting


Re: [Still Failing] apache/jackrabbit-oak#1043 (trunk - c3551b9)

2013-03-27 Thread Jukka Zitting
Hi,

On Wed, Mar 27, 2013 at 3:13 PM, Thomas Mueller  wrote:
> I'm looking at the problem, it seems it was one of my commits that broke
> the (integrationTesting) build.

In revision 1461616 I disabled the MongoMK-based TCK tests so we won't
get bombarded by Travis notifications while the problem is being
resolved. Feel free to revert that commit once the tests pass again.

BR,

Jukka Zitting


Re: ConstraintViolationException on heavy load

2013-03-27 Thread Jukka Zitting
Hi,

On Wed, Mar 27, 2013 at 4:28 PM, Tudor Rogoz  wrote:
> I just made a pull, and is not reproducing on my side, too. It was probably
> fixed somehow in the latest commits.

OK, good to know!

BR,

Jukka Zitting


Re: Some more benchmarks

2013-03-27 Thread Jukka Zitting
Hi,

On Wed, Mar 27, 2013 at 1:54 PM, Jukka Zitting  wrote:
> Quick benchmarking of the Oak-Default run shows
> NamePathMapperImpl.getOakPath() calling JcrPathParser.validate()
> taking about 20% of the time in this test.

Updated numbers after the latest OAK-108 change:

# ReadPropertyTest      min   10%   50%   90%   max      N
before                   56    58    61   120   132    802
after                    53    54    55    56    72   1089

Profiling the getProperty calls shows the following distribution of time spent:

NodeImpl.getProperty()
  61% NodeDelegate.getProperty() via perform()
  31% ItemImpl.isStale() via checkStatus()
   8% other stuff

The status check would be an obvious area of improvement, especially
since we're dealing with a read-only session that's never refreshed.
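One way the repeated status check could be made cheaper for a session that is never refreshed is to remember the session revision at which an item was last verified, and to skip the expensive check until that revision changes. This is a minimal sketch assuming the session's view changes only when its revision counter advances; `SessionState`, `CachedItem` and the injected staleness check are hypothetical names, not Oak's `ItemImpl` API.

```java
import java.util.function.BooleanSupplier;

// Hypothetical session state: the revision only advances on refresh or save.
final class SessionState {
    private long revision = 0;
    long revision() { return revision; }
    void refresh() { revision++; }
}

// Hypothetical item that caches the outcome of its (expensive) staleness
// check per session revision, so a read-only, never-refreshed session pays
// for the check only once per item.
final class CachedItem {
    private final SessionState session;
    private final BooleanSupplier expensiveIsStale; // stands in for the real check
    private long checkedAt = -1;                    // revision of the last successful check

    CachedItem(SessionState session, BooleanSupplier expensiveIsStale) {
        this.session = session;
        this.expensiveIsStale = expensiveIsStale;
    }

    void checkStatus() {
        long rev = session.revision();
        if (rev == checkedAt) {
            return; // assumption: the session's view cannot change without a new revision
        }
        if (expensiveIsStale.getAsBoolean()) {
            throw new IllegalStateException("item is stale");
        }
        checkedAt = rev;
    }
}
```

The trade-off is one extra field per item in exchange for turning the per-call check into a single `long` comparison on the common read path.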

Drilling down to the NodeDelegate.getProperty() method, we have the
following distribution of time:

NodeDelegate.getProperty()
  95% NodeDelegate.getChildLocation()
   5% TreeImpl.internalGetProperty() via NodeLocation.getProperty()

See why I haven't been too excited about the Location concept...

BR,

Jukka Zitting


Re: Some more benchmarks

2013-03-27 Thread Jukka Zitting
Hi,

On Wed, Mar 27, 2013 at 2:32 PM, Michael Dürig  wrote:
> That's right. The easiest thing is to try it out, remove pre-emptive path
> validation and see what breaks. I have the vague memory that there were some
> overly picky TCK tests which required us to put this upfront validation in.
> However, a lot of time has passed since then so it might be a good idea to
> have another look.

Yep, I'm on it. Expect a patch in OAK-108 once I'm through those issues.

BR,

Jukka Zitting


Re: Some more benchmarks

2013-03-27 Thread Michael Dürig

On 27.3.13 12:21, Jukka Zitting wrote:
> Hi,
>
> On Wed, Mar 27, 2013 at 2:12 PM, Michael Dürig  wrote:
>> IIUC you propose to not validate paths in the read case but rely on the
>> downstream code to fail. Might be worth a try. However we'd need different
>> path parsing then for the read and the write case since circumventing path
>> validation for the write case is most certainly not the right thing to do.
>
> We already have the NameValidator that ensures that all (non-hidden)
> names stored in the repository are valid. As a consequence also all
> existing repository paths are valid.

That's right. The easiest thing is to try it out, remove pre-emptive 
path validation and see what breaks. I have the vague memory that there 
were some overly picky TCK tests which required us to put this upfront 
validation in. However, a lot of time has passed since then so it might be 
a good idea to have another look.


Michael


Re: Some more benchmarks

2013-03-27 Thread Jukka Zitting
Hi,

On Wed, Mar 27, 2013 at 2:12 PM, Michael Dürig  wrote:
> IIUC you propose to not validate paths in the read case but rely on the
> downstream code to fail. Might be worth a try. However we'd need different
> path parsing then for the read and the write case since circumventing path
> validation for the write case is most certainly not the right thing to do.

We already have the NameValidator that ensures that all (non-hidden)
names stored in the repository are valid. As a consequence also all
existing repository paths are valid.
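The reasoning here is that checking names at write time makes read-time path validation redundant: if every name is checked on commit, every path assembled from stored names passes the same per-segment check. A minimal sketch of that argument, using a deliberately simplified name rule (rejecting empty names, `.`, `..`, and the characters `/ [ ] | *`); `NameCheck` is hypothetical and is neither the full JCR name grammar nor Oak's `NameValidator`.

```java
import java.util.List;

// Hypothetical, simplified name check applied to every name at commit time.
// It is NOT the full JCR name grammar nor Oak's NameValidator; it only
// illustrates that validated names compose into valid paths.
final class NameCheck {
    static boolean isValidName(String name) {
        if (name.isEmpty() || name.equals(".") || name.equals("..")) {
            return false;
        }
        for (char c : name.toCharArray()) {
            if (c == '/' || c == '[' || c == ']' || c == '|' || c == '*') {
                return false;
            }
        }
        return true;
    }

    // Commit hook: reject the whole change set if any new name is invalid.
    static void validateCommit(List<String> newNames) {
        for (String n : newNames) {
            if (!isValidName(n)) {
                throw new IllegalArgumentException("invalid name: " + n);
            }
        }
    }

    // An absolute path is valid if every segment is a valid name; the -1
    // limit keeps trailing empty segments so "/a/" is rejected.
    static boolean isValidPath(String path) {
        if (!path.startsWith("/")) return false;
        if (path.equals("/")) return true;
        for (String segment : path.substring(1).split("/", -1)) {
            if (!isValidName(segment)) return false;
        }
        return true;
    }
}
```

Because `isValidPath` is defined segment by segment, any path built only from names that passed `validateCommit` is valid by construction. That is the property the read path would rely on when it skips up-front validation.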

BR,

Jukka Zitting


Re: Some more benchmarks

2013-03-27 Thread Michael Dürig

On 27.3.13 11:54, Jukka Zitting wrote:
> Hi,
>
> Do we need to explicitly validate all paths that get passed to us?
> Especially in cases like getProperty(), where in the vast majority of
> the cases the given path matches an existing property (whose path by
> definition is valid), it would make more sense to skip such validation
> entirely, or at least postpone it to the rare cases where a matching
> property was not found.

FWIW, the relevant discussion is here:
https://issues.apache.org/jira/browse/OAK-108

IIUC you propose to not validate paths in the read case but rely on the
downstream code to fail. Might be worth a try. However we'd need
different path parsing then for the read and the write case since
circumventing path validation for the write case is most certainly not
the right thing to do.

Michael

> BR,
>
> Jukka Zitting



Re: Some more benchmarks

2013-03-27 Thread Jukka Zitting
Hi,

On Wed, Mar 27, 2013 at 11:41 AM, Jukka Zitting  wrote:
> # ReadPropertyTest      min   10%   50%   90%   max      N
> Jackrabbit               31    31    33    92   121   1470
> Oak-Default              56    58    61   120   132    802

Quick benchmarking of the Oak-Default run shows
NamePathMapperImpl.getOakPath() calling JcrPathParser.validate()
taking about 20% of the time in this test.

Do we need to explicitly validate all paths that get passed to us?
Especially in cases like getProperty(), where in the vast majority of
the cases the given path matches an existing property (whose path by
definition is valid), it would make more sense to skip such validation
entirely, or at least postpone it to the rare cases where a matching
property was not found.
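The "postpone validation" idea can be sketched as follows: look the property up first, and validate the path only when nothing matches, to decide between reporting an invalid path and reporting a missing property. This assumes, as argued elsewhere in this thread, that every stored name is already valid. `LazyNode` and its simplified path check are hypothetical, only direct property names are handled, and standard Java exceptions stand in for JCR's `RepositoryException` and `PathNotFoundException`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical node with a flat property map. Validation runs only on the
// rare "no match" path; a hit needs none because stored names are valid.
final class LazyNode {
    private final Map<String, String> properties = new HashMap<>();
    int validations = 0; // counts how often the (expensive) validation ran

    void set(String name, String value) {
        properties.put(name, value);
    }

    String getProperty(String relPath) {
        String value = properties.get(relPath);
        if (value != null) {
            return value; // fast path: no validation needed
        }
        validations++;
        if (!isValidRelativePath(relPath)) {
            throw new IllegalArgumentException("invalid path: " + relPath);
        }
        throw new NoSuchElementException("no such property: " + relPath);
    }

    // Simplified stand-in check; the real JCR path grammar is richer.
    private static boolean isValidRelativePath(String p) {
        return !p.isEmpty() && !p.startsWith("/")
                && p.chars().noneMatch(c -> c == '[' || c == ']' || c == '*' || c == '|');
    }
}
```

Every successful lookup skips the validation cost entirely, while callers passing a malformed path still get a distinct "invalid path" error rather than a plain "not found".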

BR,

Jukka Zitting


Re: ConstraintViolationException on heavy load

2013-03-27 Thread Jukka Zitting
Hi,

On Tue, Mar 26, 2013 at 4:22 PM, Tudor Rogoz  wrote:
> The attached tests (from the initial mail) will reproduce the issue on a
> single mongo instance.

I tried running the test (without modifications, accessing a local
non-clustered mongod instance) a few times (killing it after ten or so
minutes) but couldn't yet reproduce the problem. Do you still see it
with the latest trunk?

BR,

Jukka Zitting


Re: SetPropertyTest progress

2013-03-27 Thread Jukka Zitting
Hi,

On Wed, Mar 27, 2013 at 12:25 PM, Thomas Mueller  wrote:
> Yes, that's a problem. The plan is to split documents once they are too
> large (if they contain many properties, large properties, or many
> updates). This is partially implemented, but disabled currently. When
> enabled, it shouldn't be a problem.

Cool. The SetPropertyTest is a rather extreme edge case, with its rapid
sequence of thousands of changes to a single node. That pattern shouldn't
occur too often in practice (only something like a page-view counter
might behave similarly), but it's good to be prepared for when something
like that eventually does happen.

BR,

Jukka Zitting


Re: [Broken] apache/jackrabbit-oak#1037 (trunk - 84933ec)

2013-03-27 Thread Jukka Zitting
Hi,

On Wed, Mar 27, 2013 at 11:49 AM, Travis-CI  wrote:
> View the full build log and details: 
> https://travis-ci.org/apache/jackrabbit-oak/builds/5833892

Looks like the newly enabled MongoMK-based TCK tests captured their
first regression.

BR,

Jukka Zitting


Re: Some more benchmarks

2013-03-27 Thread Thomas Mueller
Hi,

Thanks! The SmallFileWriteTest is quite slow for Oak-Mongo: 6 times slower
than Jackrabbit on average, as far as I can see. It seems to be, at least
partially, a problem of the AbstractBlobStore. I will have a look.
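The "6 times slower" estimate can be checked against the Jackrabbit and Oak-Mongo rows of the SmallFileWriteTest results quoted below. This small calculation (the class name is made up for illustration) compares the median and 90% times and the number of runs completed within the minute.

```java
import java.util.Locale;

// Arithmetic check of "about 6 times slower" for SmallFileWriteTest,
// using the Jackrabbit and Oak-Mongo rows of the results table.
final class WriteRatio {
    // {median ms, 90% ms, runs completed}
    static final double[] JACKRABBIT = {249, 393, 115};
    static final double[] OAK_MONGO = {1695, 2267, 21};

    static double[] ratios() {
        return new double[] {
            OAK_MONGO[0] / JACKRABBIT[0], // median time ratio
            OAK_MONGO[1] / JACKRABBIT[1], // 90% time ratio
            JACKRABBIT[2] / OAK_MONGO[2], // throughput ratio (runs per minute)
        };
    }

    public static void main(String[] args) {
        double[] r = ratios();
        // prints: median 6.8x, 90% 5.8x, runs 5.5x
        System.out.printf(Locale.ROOT, "median %.1fx, 90%% %.1fx, runs %.1fx%n", r[0], r[1], r[2]);
    }
}
```

The three measures land between roughly 5.5x and 6.8x, which is consistent with the rough "6 times" figure.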

Regards,
Thomas




On 3/27/13 10:41 AM, "Jukka Zitting"  wrote:

>Hi,
>
>Here are a few more simple benchmark results to show where we are:
>
>Apache Jackrabbit Oak 0.7-SNAPSHOT
># ReadPropertyTest      min   10%   50%   90%   max      N
>Jackrabbit               31    31    33    92   121   1470
>Oak-Default              56    58    61   120   132    802
>Oak-Mongo                56    58    61   120   127    802
>Oak-Segment             113   118   131   184   195    399
># SmallFileReadTest     min   10%   50%   90%   max      N
>Jackrabbit               42    43    63   128   288    799
>Oak-Default              57    61   104   416   542    397
>Oak-Mongo               108   124   190   476   616    269
>Oak-Segment              35    36    43   104   124   1134
># SmallFileWriteTest    min   10%   50%   90%   max      N
>Jackrabbit              143   170   249   393  1539    115
>Oak-Default             502   571   792  1103  1851     69
>Oak-Mongo              1192  1333  1695  2267  2824     21
>Oak-Segment             366   379   458   558  6036    101
>
>BR,
>
>Jukka Zitting



Re: SetPropertyTest progress

2013-03-27 Thread Thomas Mueller
Hi,

Thanks a lot!

>PS. This benchmark is unfortunately particularly ill-suited for use
>with the MongoMK, as it ends up blowing up the revision history of a
>single node:
>
># SetPropertyTest         min     10%     50%     90%     max    N
>Oak-Mongo               31442   32098   58855   92934   93871   10

Yes, that's a problem. The plan is to split documents once they are too
large (if they contain many properties, large properties, or many
updates). This is partially implemented, but disabled currently. When
enabled, it shouldn't be a problem.
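The document-splitting plan can be illustrated with a toy model that covers only the "many updates" trigger. Once a node's revision history exceeds a threshold, older revisions move to a separate document so that reads of the current value stay cheap. `NodeDocument`, `MAX_REVISIONS` and the "move everything but the newest revision" policy are hypothetical choices for illustration, not MongoMK's actual split implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

// Hypothetical in-memory model of a node document that keeps a revision
// history for one frequently updated property (e.g. a page-view counter).
final class NodeDocument {
    static final int MAX_REVISIONS = 100; // hypothetical split threshold

    final TreeMap<Long, String> revisions = new TreeMap<>();            // revision -> value
    final List<TreeMap<Long, String>> splitOff = new ArrayList<>();     // "previous" documents

    void update(long revision, String value) {
        revisions.put(revision, value);
        if (revisions.size() > MAX_REVISIONS) {
            split();
        }
    }

    // Move all but the newest revision into a separate document so the main
    // document stays small; reading the current value never touches it.
    private void split() {
        Long newest = revisions.lastKey();
        splitOff.add(new TreeMap<>(revisions.headMap(newest, false)));
        revisions.keySet().retainAll(Collections.singleton(newest));
    }

    String currentValue() {
        return revisions.isEmpty() ? null : revisions.lastEntry().getValue();
    }
}
```

With this policy, the main document never holds more than `MAX_REVISIONS` entries, no matter how many updates a single node receives. The full history is still preserved across the split-off documents.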

Regards,
Thomas





Some more benchmarks

2013-03-27 Thread Jukka Zitting
Hi,

Here are a few more simple benchmark results to show where we are:

Apache Jackrabbit Oak 0.7-SNAPSHOT
# ReadPropertyTest      min   10%   50%   90%   max      N
Jackrabbit               31    31    33    92   121   1470
Oak-Default              56    58    61   120   132    802
Oak-Mongo                56    58    61   120   127    802
Oak-Segment             113   118   131   184   195    399
# SmallFileReadTest     min   10%   50%   90%   max      N
Jackrabbit               42    43    63   128   288    799
Oak-Default              57    61   104   416   542    397
Oak-Mongo               108   124   190   476   616    269
Oak-Segment              35    36    43   104   124   1134
# SmallFileWriteTest    min   10%   50%   90%   max      N
Jackrabbit              143   170   249   393  1539    115
Oak-Default             502   571   792  1103  1851     69
Oak-Mongo              1192  1333  1695  2267  2824     21
Oak-Segment             366   379   458   558  6036    101

BR,

Jukka Zitting