Re: Some more benchmarks
Hi,

On Wed, Mar 27, 2013 at 11:41 AM, Jukka Zitting wrote:
> Here's a few more simple benchmark results to show where we are:

Some notes to help read and produce benchmark results like these.

The oak-run jar that you can find under oak-run/target has a "benchmark" mode that produces these results. It can be invoked like this:

    $ java -jar oak-run/target/oak-run-*.jar benchmark [options] [testcases] [fixtures]

The following benchmark options (with default values) are currently supported:

    --host localhost  - MongoDB host
    --port 27017      - MongoDB port
    --cache 100       - cache size (in MB)
    --wikipedia       - Wikipedia dump

These options are passed to the test cases and repository fixtures that need them. For example, the Wikipedia dump option is needed by the WikipediaImport test case, and the MongoDB address information by the MongoMK- and SegmentMK-based repository fixtures. The cache setting controls the bundle cache size in Jackrabbit, the KernelNodeState cache size in MongoMK and the default H2 MK, and the segment cache size in SegmentMK.

You can use extra JVM options like -Xmx settings to better control the benchmark environment. It's also possible to attach the JVM to a profiler to better understand benchmark results. For example, I'm using "-agentlib:hprof=cpu=samples,depth=100" as a basic profiling tool, whose results can be processed with "perl analyze-hprof.pl java.hprof.txt" to produce somewhat easier-to-read top-down and bottom-up summaries of how the execution time is distributed across the benchmarked codebase.

Test case names like ReadPropertyTest, SmallFileReadTest and SmallFileWriteTest indicate the specific test case being run. You can specify one or more test cases on the benchmark command line, and oak-run will execute each benchmark in sequence. The benchmark code is located under org.apache.jackrabbit.oak.benchmark in the oak-run component. Each test case tries to exercise some tightly scoped aspect of the repository.
You might remember many of these tests from the Jackrabbit benchmark reports like http://people.apache.org/~jukka/jackrabbit/report-2011-09-27/report.html that I used to produce earlier.

The benchmark runner also supports the following repository fixtures:

    Jackrabbit   - Jackrabbit with the default embedded Derby bundle PM
    Oak-Memory   - Oak with the default MK using in-memory storage
    Oak-Default  - Oak with the default MK using an embedded H2 database
    Oak-Mongo    - Oak with the new MongoMK
    Oak-Segment  - Oak with the SegmentMK

Once started, the benchmark runner will execute each listed test case against all the listed repository fixtures. After starting up the repository and preparing the test environment, the test case is first executed a few times to warm up caches before measurements are started. Then the test case is run repeatedly for one minute (or at least 10 times) and the number of milliseconds used by each execution is recorded. Once done, the following statistics are computed and reported:

    min - minimum time (in ms) taken by a test run
    10% - time (in ms) within which the fastest 10% of test runs completed
    50% - time (in ms) taken by the median test run
    90% - time (in ms) within which the fastest 90% of test runs completed
    max - maximum time (in ms) taken by a test run
    N   - total number of test runs in one minute (or more)

The most useful of these numbers is probably the 90% figure, as it shows the time under which the majority of test runs completed and thus what kind of performance could reasonably be expected in a normal usage scenario. However, all these numbers are reported instead of just the 90% one because seeing the distribution of times across test runs often helps identify things like whether a bigger cache might help.

Finally, and most importantly, as in all benchmarking, the numbers produced by these tests should be taken with a large grain of salt.
They DO NOT directly indicate the kind of application performance you could expect with (the current state of) Oak. Instead they are designed to isolate implementation-level bottlenecks and to help measure and profile the performance of specific, isolated features. BR, Jukka Zitting
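A minimal sketch of the measurement procedure described in the message above (warm-up runs, then timed runs for one minute or at least 10 times, then min/10%/50%/90%/max/N). The `BenchmarkCase` interface, the `BenchmarkRunnerSketch` class, the warm-up count of 5 and the nearest-rank percentile method are all assumptions for illustration; oak-run's actual classes and its exact percentile calculation may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical per-run hooks of a benchmark test case (not oak-run's actual API). */
interface BenchmarkCase {
    void beforeTest() throws Exception; // prepares one run; not timed
    void runTest() throws Exception;    // the operation being measured
    void afterTest() throws Exception;  // cleans up after one run; not timed
}

final class BenchmarkRunnerSketch {
    static final int WARMUP_RUNS = 5;          // assumption: "a few" warm-up runs
    static final int MIN_RUNS = 10;            // run at least 10 times...
    static final long MEASURE_MILLIS = 60_000; // ...or for one minute, whichever takes longer

    /** Warms up, then times runs until the budget is used up and at least MIN_RUNS were timed. */
    static long[] measure(BenchmarkCase test, long budgetMillis) throws Exception {
        for (int i = 0; i < WARMUP_RUNS; i++) {
            runOnce(test); // warm up caches; timings discarded
        }
        List<Long> times = new ArrayList<>();
        long deadline = System.currentTimeMillis() + budgetMillis;
        while (System.currentTimeMillis() < deadline || times.size() < MIN_RUNS) {
            times.add(runOnce(test));
        }
        long[] result = new long[times.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = times.get(i);
        }
        return result;
    }

    /** Times a single runTest() call in milliseconds, excluding setup and cleanup. */
    private static long runOnce(BenchmarkCase test) throws Exception {
        test.beforeTest();
        long start = System.nanoTime();
        test.runTest();
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        test.afterTest();
        return elapsedMillis;
    }

    /** Returns {min, 10%, 50%, 90%, max, N} for the recorded run times. */
    static long[] summarize(long[] millis) {
        long[] sorted = millis.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return new long[] {
            sorted[0], percentile(sorted, 10), percentile(sorted, 50),
            percentile(sorted, 90), sorted[n - 1], n
        };
    }

    /** Nearest-rank percentile: the smallest time with at least p% of runs at or below it. */
    static long percentile(long[] sorted, int p) {
        int rank = (p * sorted.length + 99) / 100; // ceil(p * n / 100) in integer arithmetic
        return sorted[Math.max(rank, 1) - 1];
    }
}
```

For example, `summarize(measure(someCase, MEASURE_MILLIS))` would yield one row of the result tables, where `someCase` is whatever hypothetical `BenchmarkCase` implementation you want to time.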
Re: [Still Failing] apache/jackrabbit-oak#1043 (trunk - c3551b9)
Hi, On Wed, Mar 27, 2013 at 3:13 PM, Thomas Mueller wrote: > I'm looking at the problem, it seems it was one of my commits that broke > the (integrationTesting) build. In revision 1461616 I disabled the MongoMK-based TCK tests so we won't get bombarded by Travis notifications while the problem is being resolved. Feel free to revert that commit once the tests pass again. BR, Jukka Zitting
Re: ConstraintViolationException on heavy load
Hi, On Wed, Mar 27, 2013 at 4:28 PM, Tudor Rogoz wrote: > I just made a pull, and is not reproducing on my side, too. It was probably > fixed somehow in the latest commits. OK, good to know! BR, Jukka Zitting
Re: Some more benchmarks
Hi,

On Wed, Mar 27, 2013 at 1:54 PM, Jukka Zitting wrote:
> Quick benchmarking of the Oak-Default run shows
> NamePathMapperImpl.getOakPath() calling JcrPathParser.validate()
> taking about 20% of the time in this test.

Updated numbers after the latest OAK-108 change:

# ReadPropertyTest     min   10%   50%   90%   max      N
before                  56    58    61   120   132    802
after                   53    54    55    56    72   1089

Profiling the getProperty() calls shows the following distribution of time spent:

NodeImpl.getProperty()
  61%  NodeDelegate.getProperty() via perform()
  31%  ItemImpl.isStale() via checkStatus()
   8%  other stuff

The status check would be an obvious area of improvement, especially since we're dealing with a read-only session that's never refreshed. Drilling down to the NodeDelegate.getProperty() method, we have the following distribution of time:

NodeDelegate.getProperty()
  95%  NodeDelegate.getChildLocation()
   5%  TreeImpl.internalGetProperty() via NodeLocation.getProperty()

See why I haven't been too excited about the Location concept...

BR,
Jukka Zitting
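One way the status check could be made cheaper for a session that's never refreshed, as a hedged sketch: if the session bumps a counter whenever items might become stale (assumed here to be refresh, save and every transient modification), an item only needs the expensive staleness check when that counter has moved since its last successful check. `SessionState`, `CachedStatusItem` and the counter are hypothetical names, not Oak's actual session or item classes, and the sketch ignores concurrency.

```java
/** Hypothetical session state with a counter that changes whenever items might become stale. */
final class SessionState {
    private long revision = 0;

    long revision() {
        return revision;
    }

    /** Assumed to be called on refresh(), save() and every transient modification. */
    void changed() {
        revision++;
    }
}

/** Hypothetical item that skips the staleness check while the session is unchanged. */
final class CachedStatusItem {
    private final SessionState session;
    private long checkedAt = -1; // session revision at the last successful check
    int expensiveChecks = 0;     // counts real checks, for illustration only

    CachedStatusItem(SessionState session) {
        this.session = session;
    }

    void checkStatus() {
        long current = session.revision();
        if (current == checkedAt) {
            return; // nothing could have changed since the last successful check
        }
        if (isStale()) {
            throw new IllegalStateException("item is stale");
        }
        checkedAt = current;
    }

    /** Stand-in for a costly staleness lookup; this sketch never reports an item as stale. */
    private boolean isStale() {
        expensiveChecks++;
        return false;
    }
}
```

In a read-only benchmark session like ReadPropertyTest, the counter never moves, so only the first call per item would pay for the lookup.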
Re: Some more benchmarks
Hi, On Wed, Mar 27, 2013 at 2:32 PM, Michael Dürig wrote: > That's right. The easiest thing is to try it out, remove pre-emptive path > validation and see what breaks. I have the vague memory that there were some > overly picky TCK tests which required us to put this upfront validation in. > However, a lot of time has passed since then so it might be a good idea to > have another look. Yep, I'm on it. Expect a patch in OAK-108 once I'm through those issues. BR, Jukka Zitting
Re: Some more benchmarks
On 27.3.13 12:21, Jukka Zitting wrote:
> Hi,
>
> On Wed, Mar 27, 2013 at 2:12 PM, Michael Dürig wrote:
>> IIUC you propose to not validate paths in the read case but rely on the
>> downstream code to fail. Might be worth a try. However we'd need different
>> path parsing then for the read and the write case since circumventing path
>> validation for the write case is most certainly not the right thing to do.
>
> We already have the NameValidator that ensures that all (non-hidden) names
> stored in the repository are valid. As a consequence also all existing
> repository paths are valid.

That's right. The easiest thing is to try it out, remove pre-emptive path validation and see what breaks. I have the vague memory that there were some overly picky TCK tests which required us to put this upfront validation in. However, a lot of time has passed since then so it might be a good idea to have another look.

Michael
Re: Some more benchmarks
Hi, On Wed, Mar 27, 2013 at 2:12 PM, Michael Dürig wrote: > IIUC you propose to not validate paths in the read case but rely on the > downstream code to fail. Might be worth a try. However we'd need different > path parsing then for the read and the write case since circumventing path > validation for the write case is most certainly not the right thing to do. We already have the NameValidator that ensures that all (non-hidden) names stored in the repository are valid. As a consequence also all existing repository paths are valid. BR, Jukka Zitting
Re: Some more benchmarks
On 27.3.13 11:54, Jukka Zitting wrote:
> Hi,
>
> Do we need to explicitly validate all paths that get passed to us?
> Especially in cases like getProperty(), where in the vast majority of
> the cases the given path matches an existing property (whose path by
> definition is valid), it would make more sense to skip such validation
> entirely, or at least postpone it to the rare cases where a matching
> property was not found.

FWIW, the relevant discussion is here: https://issues.apache.org/jira/browse/OAK-108

IIUC you propose to not validate paths in the read case but rely on the downstream code to fail. Might be worth a try. However we'd need different path parsing then for the read and the write case since circumventing path validation for the write case is most certainly not the right thing to do.

Michael

> BR,
>
> Jukka Zitting
Re: Some more benchmarks
Hi,

On Wed, Mar 27, 2013 at 11:41 AM, Jukka Zitting wrote:
> # ReadPropertyTest     min   10%   50%   90%   max      N
> Jackrabbit              31    31    33    92   121   1470
> Oak-Default             56    58    61   120   132    802

Quick benchmarking of the Oak-Default run shows NamePathMapperImpl.getOakPath() calling JcrPathParser.validate() taking about 20% of the time in this test.

Do we need to explicitly validate all paths that get passed to us? Especially in cases like getProperty(), where in the vast majority of the cases the given path matches an existing property (whose path by definition is valid), it would make more sense to skip such validation entirely, or at least postpone it to the rare cases where a matching property was not found.

BR,
Jukka Zitting
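A minimal sketch of the proposal, assuming a simple map-backed node: writes are always validated (the role the thread assigns to NameValidator), so any name found on a read is valid by construction, and validation only runs on a miss to tell an invalid name apart from an absent one. `LazyValidationNode` and its simplified name check are hypothetical; real JCR name and path validation is considerably more involved.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical node illustrating eager validation on write and lazy validation on read. */
final class LazyValidationNode {
    private final Map<String, String> properties = new HashMap<>();
    int validations = 0; // how often the (supposedly expensive) check actually ran

    /** Write path: always validate first, so only valid names are ever stored. */
    void setProperty(String name, String value) {
        validate(name);
        properties.put(name, value);
    }

    /**
     * Read path: look the name up first. A hit needs no validation because stored names were
     * validated on write; only a miss pays for validation, to distinguish "invalid" from "absent".
     */
    String getProperty(String name) {
        String value = properties.get(name);
        if (value != null) {
            return value;
        }
        validate(name);
        return null; // valid name, but no such property
    }

    /** Simplified stand-in for real name validation: rejects empty names and a few illegal characters. */
    private void validate(String name) {
        validations++;
        if (name.isEmpty()) {
            throw new IllegalArgumentException("empty name");
        }
        for (char c : new char[] {'/', '[', ']', '*', '|'}) {
            if (name.indexOf(c) >= 0) {
                throw new IllegalArgumentException("invalid name: " + name);
            }
        }
    }
}
```

This keeps the write path strict, as Michael asks for in the thread, while the common read case of an existing property skips validation entirely.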
Re: ConstraintViolationException on heavy load
Hi, On Tue, Mar 26, 2013 at 4:22 PM, Tudor Rogoz wrote: > The attached tests (from the initial mail) will reproduce the issue on a > single mongo instance. I tried running the test (without modifications, accessing a local non-clustered mongod instance) a few times (killing it after ten or so minutes) but couldn't yet reproduce the problem. Do you still see it with the latest trunk? BR, Jukka Zitting
Re: SetPropertyTest progress
Hi, On Wed, Mar 27, 2013 at 12:25 PM, Thomas Mueller wrote: > Yes, that's a problem. The plan is to split documents once they are too > large (if they contain many properties, large properties, or many > updates). This is partially implemented, but disabled currently. When > enabled, it shouldn't be a problem. Cool. The SetPropertyTest is a rather extreme edge case with its rapid sequence of thousands of changes to a single node, which shouldn't occur too often in practice (only something like a page-view counter might behave similarly), but it's good to be prepared when something like that eventually does happen. BR, Jukka Zitting
Re: [Broken] apache/jackrabbit-oak#1037 (trunk - 84933ec)
Hi, On Wed, Mar 27, 2013 at 11:49 AM, Travis-CI wrote: > View the full build log and details: > https://travis-ci.org/apache/jackrabbit-oak/builds/5833892 Looks like the newly enabled MongoMK-based TCK tests captured their first regression. BR, Jukka Zitting
Re: Some more benchmarks
Hi,

Thanks! The SmallFileWriteTest is quite slow for Oak-Mongo: 6 times slower than Jackrabbit on average, as far as I can see. It seems to be, at least partially, a problem of the AbstractBlobStore. I will have a look.

Regards,
Thomas

On 3/27/13 10:41 AM, "Jukka Zitting" wrote:
>Hi,
>
>Here's a few more simple benchmark results to show where we are:
>
>Apache Jackrabbit Oak 0.7-SNAPSHOT
># ReadPropertyTest     min   10%   50%   90%   max      N
>Jackrabbit              31    31    33    92   121   1470
>Oak-Default             56    58    61   120   132    802
>Oak-Mongo               56    58    61   120   127    802
>Oak-Segment            113   118   131   184   195    399
># SmallFileReadTest    min   10%   50%   90%   max      N
>Jackrabbit              42    43    63   128   288    799
>Oak-Default             57    61   104   416   542    397
>Oak-Mongo              108   124   190   476   616    269
>Oak-Segment             35    36    43   104   124   1134
># SmallFileWriteTest   min   10%   50%   90%   max      N
>Jackrabbit             143   170   249   393  1539    115
>Oak-Default            502   571   792  1103  1851     69
>Oak-Mongo             1192  1333  1695  2267  2824     21
>Oak-Segment            366   379   458   558  6036    101
>
>BR,
>
>Jukka Zitting
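As a rough check of the "6 times slower" estimate, using the SmallFileWriteTest numbers from Jukka's results (times in ms; N is the number of runs completed in the measurement period):

```latex
\frac{t_{50}^{\text{Oak-Mongo}}}{t_{50}^{\text{Jackrabbit}}} = \frac{1695}{249} \approx 6.8,
\qquad
\frac{t_{90}^{\text{Oak-Mongo}}}{t_{90}^{\text{Jackrabbit}}} = \frac{2267}{393} \approx 5.8,
\qquad
\frac{N^{\text{Jackrabbit}}}{N^{\text{Oak-Mongo}}} = \frac{115}{21} \approx 5.5
```

So the estimate holds roughly whether one compares medians, 90% figures or the number of completed runs.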
Re: SetPropertyTest progress
Hi,

Thanks a lot!

>PS. This benchmark is unfortunately particularly ill-suited for use
>with the MongoMK, as it ends up blowing up the revision history of a
>single node:
>
># SetPropertyTest        min     10%     50%     90%     max    N
>Oak-Mongo              31442   32098   58855   92934   93871   10

Yes, that's a problem. The plan is to split documents once they are too large (if they contain many properties, large properties, or many updates). This is partially implemented, but currently disabled. Once enabled, it shouldn't be a problem.

Regards,
Thomas
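To illustrate the general idea of splitting, a hedged sketch under assumed names: a node document keeps its revision history, and once the number of entries exceeds a threshold, all but the newest entry are moved into a separate "previous" document, so the main document stays small however many updates a node receives. `SplittingDocument` and its revision-count threshold are hypothetical; the plan described above also considers the number and size of properties, which this sketch ignores, and it does not reflect the actual MongoMK implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical node document that moves old revisions out once it grows too large. */
final class SplittingDocument {
    private final int maxRevisions; // illustrative threshold on the number of revision entries
    private final LinkedHashMap<Long, String> revisions = new LinkedHashMap<>(); // oldest first
    private final List<Map<Long, String>> previousDocuments = new ArrayList<>();

    SplittingDocument(int maxRevisions) {
        this.maxRevisions = maxRevisions;
    }

    /** Records a new value for the given (increasing) revision, splitting when over the limit. */
    void update(long revision, String value) {
        revisions.put(revision, value);
        if (revisions.size() > maxRevisions) {
            split();
        }
    }

    /** Moves every entry except the newest into a new, read-only "previous" document. */
    private void split() {
        List<Map.Entry<Long, String>> entries = new ArrayList<>(revisions.entrySet());
        Map<Long, String> old = new LinkedHashMap<>();
        for (Map.Entry<Long, String> e : entries.subList(0, entries.size() - 1)) {
            old.put(e.getKey(), e.getValue());
        }
        Map.Entry<Long, String> newest = entries.get(entries.size() - 1);
        long newestRevision = newest.getKey(); // read before clearing the map
        String newestValue = newest.getValue();
        revisions.clear();
        revisions.put(newestRevision, newestValue);
        previousDocuments.add(Collections.unmodifiableMap(old));
    }

    /** Number of revision entries kept in the main document. */
    int size() {
        return revisions.size();
    }

    int previousDocumentCount() {
        return previousDocuments.size();
    }

    /** Looks up the value written at a revision, consulting split-off documents if needed. */
    String valueAt(long revision) {
        String value = revisions.get(revision);
        if (value != null) {
            return value;
        }
        for (Map<Long, String> previous : previousDocuments) {
            value = previous.get(revision);
            if (value != null) {
                return value;
            }
        }
        return null;
    }
}
```

With a threshold of 10, a node updated 25 times keeps only its 5 most recent entries in the main document, while older revisions remain reachable through the split-off documents.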
Some more benchmarks
Hi,

Here's a few more simple benchmark results to show where we are:

Apache Jackrabbit Oak 0.7-SNAPSHOT
# ReadPropertyTest     min   10%   50%   90%   max      N
Jackrabbit              31    31    33    92   121   1470
Oak-Default             56    58    61   120   132    802
Oak-Mongo               56    58    61   120   127    802
Oak-Segment            113   118   131   184   195    399
# SmallFileReadTest    min   10%   50%   90%   max      N
Jackrabbit              42    43    63   128   288    799
Oak-Default             57    61   104   416   542    397
Oak-Mongo              108   124   190   476   616    269
Oak-Segment             35    36    43   104   124   1134
# SmallFileWriteTest   min   10%   50%   90%   max      N
Jackrabbit             143   170   249   393  1539    115
Oak-Default            502   571   792  1103  1851     69
Oak-Mongo             1192  1333  1695  2267  2824     21
Oak-Segment            366   379   458   558  6036    101

BR,
Jukka Zitting