Hi Marcel,

Thanks for your input, it is really appreciated. Yes, I can put all the code on GitHub (I will need a day or two).

I forgot to mention that I ran the tests with the following settings:

-Xms1024m -Xmx2048m -Doak.queryLimitInMemory=500000 -Doak.queryLimitReads=100000 -Dupdate.limit=250000 -Doak.fastQuerySize=true

As for the data, there is one issue: in the second Jackrabbit 2 run, 100 files were uploaded as opposed to 1000. No, I did not mix up the other results; I ran these tests about 10 times and the results were pretty consistent. I ran them on my local laptop, so I would assume you would get better results with a dedicated machine.

"In contrast to Jackrabbit 2, a move of a large subtree is an expensive operation in Oak"

So should I avoid moving a large number of items using Oak? If we are using Oak, should we avoid operations with a large number of items in general? As an FYI, there are other benefits for us in moving to Oak, but our application executes JCR operations with a large number of items quite often, so I am worried about the performance.

The move method is pretty simple - should I be doing it differently?

public static long moveNodes(Session session, Node node, String newNodeName) throws Exception {
    long start = System.currentTimeMillis();
    session.move(node.getPath(), "/" + newNodeName);
    session.save();
    return System.currentTimeMillis() - start;
}

Thanks,
Domenic

-----Original Message-----
From: Marcel Reutegger [mailto:mreut...@adobe.com]
Sent: Wednesday, March 30, 2016 4:42 AM
To: oak-dev@jackrabbit.apache.org
Subject: Re: Jackrabbit 2.10 vs Oak 1.2.7

Hi,

On 29/03/16 14:55, "Domenic DiTano" wrote:
>Sending the data again, I hope this makes it clearer. I do not mind
>sharing the source, assuming you just want the code that does the
>creating, deleting etc. of nodes (attached). How I created the document
>stores is in the previous email, but if you want I can send that also.

yes, I'm just interested in the test code. Can you please make it
available, e.g. over github?
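Since large change sets are what push Oak's DocumentNodeStore into branch staging, one mitigation worth testing is committing in batches so each save() stays below update.limit. A minimal sketch of the batching logic; the helper class and batch size are illustrative assumptions, and the JCR calls are shown only as comments because they need a live Session:

```java
import java.util.ArrayList;
import java.util.List;

class BatchedSave {
    // Split a list of items (e.g. node paths) into batches so each
    // session.save() commits at most batchSize changes, keeping the
    // change set under the -Dupdate.limit threshold.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            batches.add(items.subList(i, Math.min(i + batchSize, items.size())));
        }
        return batches;
    }

    /*
     * Usage with a JCR session (requires a running repository):
     *
     * for (List<String> batch : partition(paths, 1000)) {
     *     for (String path : batch) {
     *         session.getNode(path).setProperty("status", "updated");
     *     }
     *     session.save();  // small change set per save
     * }
     */
}
```

Note this only helps operations that decompose into independent changes (creates, updates, deletes of individual nodes); a single Session.move() of a whole subtree is one atomic operation and cannot be split this way.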
some comments on the results:

In contrast to Jackrabbit 2, a move of a large subtree is an expensive
operation in Oak. With Jackrabbit 2, both the content update and the
index update are rather cheap when a subtree is moved. With Oak, the
cost depends on the number of items you move.

Some of the results for Jackrabbit 2 with 10k nodes are better than with
just 1k. Did you mix up the numbers?

As mentioned before, you can potentially get a speedup with Oak when you
tweak update.limit for the large change sets with 10k nodes.

Regards
 Marcel

>All milliseconds...
>
>Oak:
>Create 1000 (Mysql,PostGress,Mongo) 3444,2483,8497
>Query 1000 (Mysql,PostGress,Mongo) 2,19,2
>Upload 100 files (Mysql,PostGress,Mongo) 1455,1130,845
>Move 1000 (Mysql,PostGress,Mongo) 96349,2404,14428
>Copied 1000 (Mysql,PostGress,Mongo) 2246,556,4432
>Delete 1000 (Mysql,PostGress,Mongo) 92923,1523,7667
>Update 1000 (Mysql,PostGress,Mongo) 48647,1055,4640
>Read 1000 (Mysql,PostGress,Mongo) 98,111,142
>
>Jackrabbit 2:
>Create 1000 (Mysql) 3022
>Query 1000 (Mysql) 143
>Upload 100 files (Mysql) 1105
>Move 1000 (Mysql) 16
>Copied 1000 (Mysql) 764
>Delete 1000 (Mysql) 1481
>Update 1000 (Mysql) 1139
>Read 1000 (Mysql) 12
>
>Oak:
>Create 10000 (Mysql,PostGress,Mongo) 31250,16475,342192
>Query 10000 (Mysql,PostGress,Mongo) 4,16,2
>Upload 100 files (Mysql,PostGress,Mongo) 1146,605,753
>Move 10000 (Mysql,PostGress,Mongo) 741474,30339,406259
>Copied 10000 (Mysql,PostGress,Mongo) 20755,7615,43670
>Delete 10000 (Mysql,PostGress,Mongo) 728737,24461,43670
>Update 10000 (Mysql,PostGress,Mongo) 374387,12453,41053
>Read 10000 (Mysql,PostGress,Mongo) 2216,2989,968
>
>Jackrabbit 2:
>Create 10000 (Mysql) 8507
>Query 10000 (Mysql) 94
>Upload 100 files (Mysql) 744
>Move 10000 (Mysql) 14
>Copied 10000 (Mysql) 489
>Delete 10000 (Mysql) 824
>Update 10000 (Mysql) 987
>Read 10000 (Mysql) 8
>
>On Tue, Mar 29, 2016 at 8:28 AM, Marcel Reutegger <mreut...@adobe.com>
>wrote:
>
>Hi Domenic,
>
>the number of test cases does not match
the results you provided, i.e.
>the column headers do not match the data columns. Can you please clarify
>how the results map to the test cases?
>
>also, do you mind sharing the test code? I'd like to better understand
>what the tests do.
>
>Regards
> Marcel
>
>On 29/03/16 14:04, "Domenic DiTano" wrote:
>
>>Sorry, those images did not come through; posting the email again with
>>the raw data:
>>
>>I work on a web application that has Jackrabbit 2.10 embedded, and we
>>wanted to try upgrading to Oak. Our current configuration for
>>Jackrabbit 2.10 is the FileDataStore along with MySql for the
>>persistence store. We wrote some test cases to measure the
>>performance of Jackrabbit 2.10 vs. the latest Oak 1.2. In the case of
>>Jackrabbit 2.10, we used our current application configuration:
>>FileDataStore along with MySql. In the case of Oak we tried many
>>configurations, but the one we settled on was a DocumentNodeStore with
>>a FileDataStore backend. We tried all 3 RDB options (Mongo, PostGress,
>>MySql). All test cases used the same code, which is standard JCR 2.0
>>code. The test cases did the following:
>>
>>. create 1000 & 10,000 nodes
>>. move 1000 & 10,000 nodes
>>. copy 1000 & 10,000 nodes
>>. delete 1000 & 10,000 nodes
>>. upload 100 files
>>. read 1 property on 1000 & 10,000 nodes
>>. update 1 property on 1000 & 10,000 nodes
>>
>>The results were as follows (all results in milliseconds):
>>
>>Oak tests ran with the creation, move, copy, delete, update, and read
>>of 1000 nodes:
>>
>>Create 1000 Nodes,Query Properties,Upload 100,Move 1000,Copied 1000,Delete 1000,Update 1000,Read 1000
>>MySql:3444,2,1445,96349,2246,92923,48647,98
>>Postgress:2483,19,1130,2404,556,1523,1055,111
>>Mongo:8497,2,845,14428,4432,7667,4640,142
>>
>>Postgress seems to perform well overall.
>>
>>In the case of Jackrabbit 2.10 (tests ran with the creation, move,
>>copy, delete, update, and read of 1000 nodes):
>>
>>Create 1000 Nodes,Query Properties,Upload 100,Move 1000,Copied 1000,Delete 1000,Update 1000,Read 1000
>>MySql:3022,143,1105,16,764,1481,1139,12
>>
>>Jackrabbit 2.10 performs slightly better than Oak.
>>
>>The next set of tests was run with Oak with the creation, move, copy,
>>delete, update, and read of 10000 nodes:
>>
>>Create 10000 Nodes,Query Properties,Upload 100,Move 10000,Copied 10000,Delete 10000,Update 10000,Read 10000
>>MySql:31250,4,1146,741474,20755,728737,374387,2216
>>Postgress:16475,16,605,30339,7615,24461,12453,2989
>>Mongo:342192,2,753,406259,321040,43670,41053,968
>>
>>Postgress once again performed well. Mongo and MySql did not do well
>>on moves, deletes, and updates. Querying also did well, as indexes
>>had been created.
>>
>>In the case of Jackrabbit 2.10 (tests ran with the creation, move,
>>copy, delete, update, and read of 10000 nodes):
>>
>>Create 10000 Nodes,Query Properties,Upload 100,Move 10000,Copied 10000,Delete 10000,Update 10000,Read 10000
>>MySql:8507,94,744,14,489,824,987,8
>>
>>Jackrabbit 2.10 performed much better than Oak in general.
>>
>>Based on the results I have a few questions/comments:
>>
>>. Are these fair comparisons between Jackrabbit and Oak? In our
>>application it is very possible to create 1-10,000 nodes in a user
>>session.
>>. Should I have assumed Oak would outperform Jackrabbit 2.10?
>>. I understand MySql is experimental, but Mongo is not - I would
>>assume Mongo would perform as well as, if not better than, Postgress.
>>. The performance bottlenecks seem to be at the JDBC level for
>>MySql. I made some configuration changes which helped performance, but
>>the changes would make MySql fail any ACID tests.
>>
>>Just a few notes:
>>
>>The same JCR code was used for creating, moving, deleting, etc. of nodes.
>>The JCR code was used for all the tests.
The tests were all run on
>>the same machine.
>>
>>Used the DocumentMK.Builder for all three stores:
>>
>>Mongo:
>>    DocumentNodeStore storeD = new DocumentMK.Builder()
>>        .setPersistentCache("D:\\ekm-oak\\Mongo,size=1024,binary=0")
>>        .setMongoDB(db)
>>        .setBlobStore(new DataStoreBlobStore(fds))
>>        .getNodeStore();
>>
>>MySql:
>>    RDBOptions options = new RDBOptions().tablePrefix(prefix).dropTablesOnClose(false);
>>    DocumentNodeStore storeD = new DocumentMK.Builder()
>>        .setBlobStore(new DataStoreBlobStore(fds))
>>        .setClusterId(1)
>>        .memoryCacheSize(64 * 1024 * 1024)
>>        .setPersistentCache("D:\\ekm-oak\\MySql,size=1024,binary=0")
>>        .setRDBConnection(RDBDataSourceFactory.forJdbcUrl(url, userName, password), options)
>>        .getNodeStore();
>>
>>PostGres:
>>    RDBOptions options = new RDBOptions().tablePrefix(prefix).dropTablesOnClose(false);
>>    DocumentNodeStore storeD = new DocumentMK.Builder()
>>        .setAsyncDelay(0)
>>        .setBlobStore(new DataStoreBlobStore(fds))
>>        .setClusterId(1)
>>        .memoryCacheSize(64 * 1024 * 1024)
>>        .setPersistentCache("D:\\ekm-oak\\postGress,size=1024,binary=0")
>>        .setRDBConnection(RDBDataSourceFactory.forJdbcUrl(url, userName, password), options)
>>        .getNodeStore();
>>
>>The repository was created the same way for all three:
>>
>>    Repository repository = new Jcr(new Oak(storeD))
>>        .with(new LuceneIndexEditorProvider())
>>        .with(configureSearch())
>>        .createRepository();
>>
>>Any input is welcome.
>>
>>Thanks,
>>Domenic
>>
>>-----Original Message-----
>>From: Marcel Reutegger [mailto:mreut...@adobe.com]
>>Sent: Tuesday, March 29, 2016 4:41 AM
>>To: oak-dev@jackrabbit.apache.org
>>Subject: Re: Jackrabbit 2.10 vs Oak 1.2.7
>>
>>Hi,
>>
>>the graphs didn't make it through to the mailing list.
>>Can you please post raw numbers or a link to the graphs?
>>
>>Without access to more data, my guess is that Oak on DocumentNodeStore
>>is slower with the bigger change sets because it internally creates a
>>branch to stage changes when it reaches a given threshold.
This
>>introduces more traffic to the backend storage when save() is called,
>>because previously written data is retrieved again from the backend.
>>
>>Jackrabbit 2.10, on the other hand, keeps the entire change set in
>>memory until save() is called.
>>
>>You can increase the threshold for the DocumentNodeStore with a system
>>property: -Dupdate.limit=100000
>>
>>The default is 10'000.
>>
>>Regards
>> Marcel
>>
>>On 29/03/16 04:19, "Domenic DiTano" wrote:
>>
>>>Hello,
>>>
>>>I work on a web application that has Jackrabbit 2.10 embedded, and we
>>>wanted to try upgrading to Oak. Our current configuration for
>>>Jackrabbit 2.10 is the FileDataStore along with MySql for the
>>>persistence store.
>>>We wrote some test cases to measure the performance of Jackrabbit
>>>2.10 vs. the latest Oak 1.2. In the case of Jackrabbit 2.10, we used
>>>our current application configuration: FileDataStore along with MySql.
>>>In the case of Oak we tried many configurations, but the one we
>>>settled on was a DocumentNodeStore with a FileDataStore backend. We
>>>tried all 3 RDB options (Mongo, PostGress, MySql).
>>>All test cases used the same code, which is standard JCR 2.0 code.
>>>The test cases did the following:
>>>
>>>. create 1000 & 10,000 nodes
>>>. move 1000 & 10,000 nodes
>>>. copy 1000 & 10,000 nodes
>>>. delete 1000 & 10,000 nodes
>>>. upload 100 files
>>>. read 1 property on 1000 & 10,000 nodes
>>>. update 1 property on 1000 & 10,000 nodes
>>>
>>>The results were as follows (all results in milliseconds):
>>>
>>>Oak tests ran with the creation, move, copy, delete, update, and read
>>>of 1000 nodes:
>>>
>>>Postgress seems to perform well overall.
>>>
>>>In the case of Jackrabbit 2.10 (tests ran with the creation, move,
>>>copy, delete, update, and read of 1000 nodes):
>>>
>>>Jackrabbit 2.10 performs slightly better than Oak.
>>>
>>>The next set of tests was run with Oak with the creation, move,
>>>copy, delete, update, and read of 10000 nodes:
>>>
>>>Postgress once again performed well. Mongo and MySql did not do well
>>>on moves, deletes, and updates. Querying also did well, as indexes
>>>had been created.
>>>
>>>In the case of Jackrabbit 2.10 (tests ran with the creation, move,
>>>copy, delete, update, and read of 10000 nodes):
>>>
>>>Jackrabbit 2.10 performed much better than Oak in general.
>>>
>>>Based on the results I have a few questions/comments:
>>>
>>>. Are these fair comparisons between Jackrabbit and Oak? In our
>>>application it is very possible to create 1-10,000 nodes in a user
>>>session.
>>>. Should I have assumed Oak would outperform Jackrabbit 2.10?
>>>. I understand MySql is experimental, but Mongo is not - I would
>>>assume Mongo would perform as well as, if not better than, Postgress.
>>>. The performance bottlenecks seem to be at the JDBC level for MySql.
>>>I made some configuration changes which helped performance, but the
>>>changes would make MySql fail any ACID tests.
>>>
>>>Just a few notes:
>>>
>>>The same JCR code was used for creating, moving, deleting, etc. of
>>>nodes, and it was used for all the tests. The tests were all run on
>>>the same machine.
>>>
>>>Used the DocumentMK.Builder for all three stores:
>>>
>>>Mongo:
>>>    DocumentNodeStore storeD = new DocumentMK.Builder()
>>>        .setPersistentCache("D:\\ekm-oak\\Mongo,size=1024,binary=0")
>>>        .setMongoDB(db)
>>>        .setBlobStore(new DataStoreBlobStore(fds))
>>>        .getNodeStore();
>>>
>>>MySql:
>>>    RDBOptions options = new RDBOptions().tablePrefix(prefix).dropTablesOnClose(false);
>>>    DocumentNodeStore storeD = new DocumentMK.Builder()
>>>        .setBlobStore(new DataStoreBlobStore(fds))
>>>        .setClusterId(1)
>>>        .memoryCacheSize(64 * 1024 * 1024)
>>>        .setPersistentCache("D:\\ekm-oak\\MySql,size=1024,binary=0")
>>>        .setRDBConnection(RDBDataSourceFactory.forJdbcUrl(url, userName, password), options)
>>>        .getNodeStore();
>>>
>>>PostGres:
>>>    RDBOptions options = new RDBOptions().tablePrefix(prefix).dropTablesOnClose(false);
>>>    DocumentNodeStore storeD = new DocumentMK.Builder()
>>>        .setAsyncDelay(0)
>>>        .setBlobStore(new DataStoreBlobStore(fds))
>>>        .setClusterId(1)
>>>        .memoryCacheSize(64 * 1024 * 1024)
>>>        .setPersistentCache("D:\\ekm-oak\\postGress,size=1024,binary=0")
>>>        .setRDBConnection(RDBDataSourceFactory.forJdbcUrl(url, userName, password), options)
>>>        .getNodeStore();
>>>
>>>The repository was created the same way for all three:
>>>
>>>    Repository repository = new Jcr(new Oak(storeD))
>>>        .with(new LuceneIndexEditorProvider())
>>>        .with(configureSearch())
>>>        .createRepository();
>>>
>>>Any input is welcome.
>>>
>>>Thanks,
>>>Domenic
>
>--
>Domenic DiTano
>ANSYS, Inc.
>Tel: 1.724.514.3624
>
>domenic.dit...@ansys.com
>www.ansys.com
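As a footnote to the -Dupdate.limit suggestion in the thread: the same threshold can also be set programmatically, provided it is set before the document store is built (a sketch; the surrounding class is illustrative, and whether the property takes effect depends on it being set before Oak's relevant classes are initialized):

```java
class OakTuning {
    public static void main(String[] args) {
        // Equivalent to passing -Dupdate.limit=100000 on the command line.
        // Set it before constructing the DocumentNodeStore, since Oak reads
        // the property when the document store classes are initialized.
        System.setProperty("update.limit", "100000");

        // ... then build the DocumentNodeStore as in the snippets above ...

        System.out.println(System.getProperty("update.limit"));
    }
}
```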