[jira] Commented: (JCR-2493) Unit tests for persistence managers
[ https://issues.apache.org/jira/browse/JCR-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834142#action_12834142 ] Thomas Mueller commented on JCR-2493: - With the H2 database, changing the database URL is enough: manager.setUrl("jdbc:h2:mem:" + database.getPath()); The memory is automatically released when the connection is closed. Derby also supports in-memory databases now, but unfortunately it doesn't release the memory, and there is no nice way to do that manually: http://wiki.apache.org/db-derby/InMemoryBackEndPrimer I suggest changing only the H2 JDBC URL. Unit tests for persistence managers --- Key: JCR-2493 URL: https://issues.apache.org/jira/browse/JCR-2493 Project: Jackrabbit Content Repository Issue Type: Improvement Components: jackrabbit-core Reporter: Jukka Zitting Assignee: Jukka Zitting Priority: Minor Attachments: JCR-2493.patch Currently we only test our persistence managers indirectly via JCR-level test cases. The downside of this approach is that we can only test one persistence manager implementation at a time, and need separate build profiles to switch from one implementation to another. To ensure better coverage and consistent behaviour across all our persistence managers I implemented a simple unit test that works directly against the PersistenceManager interface. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
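Sketched in Java, the suggestion amounts to deriving an in-memory JDBC URL from the test database path, so no files are created and the memory is freed when the last connection closes. The setUrl/getPath calls are taken from the comment above; the helper class and method below are invented for illustration:

```java
// Hypothetical sketch: build the in-memory H2 URL used in the comment above.
public class H2TestUrl {

    // "jdbc:h2:mem:<name>" keeps the database entirely in memory;
    // it is discarded when the last connection to it is closed.
    static String inMemoryUrl(String databasePath) {
        return "jdbc:h2:mem:" + databasePath;
    }

    public static void main(String[] args) {
        // e.g. for a persistence manager test database:
        System.out.println(inMemoryUrl("target/pm-test/db"));
    }
}
```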
Jackrabbit Apache Derby Database 'Vacuum'
Hi, I have a Jackrabbit housekeeping routine that removes nodes from the Jackrabbit repository. I'm using the out-of-the-box 1.6 Jackrabbit web app deployment, which uses the embedded Apache Derby 10.2 database. Although I'm able to remove the nodes successfully, the space allocated to the data is not being returned to the file system. I have done this before on databases such as PostgreSQL and Oracle. I have found out how to compress the data in the Apache Derby database using the instructions in the manual - http://db.apache.org/derby/docs/10.2/ref/ref-single.html#rrefaltertablecompress. I now just need to know how to access the embedded Apache Derby database using the Apache Derby ij tool - http://db.apache.org/derby/docs/10.2/tools/tools-single.html. Does anyone have any details on how to do this? Regards George Sibley
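A sketch of one way to do this, with several assumptions flagged: in the default Jackrabbit configuration the embedded Derby workspace databases typically live under <repo-home>/workspaces/<name>/db (and the version store under <repo-home>/version/db), and because embedded Derby allows only one JVM to open a database at a time, the repository must be shut down before connecting with ij (java -cp derbytools.jar:derby.jar org.apache.derby.tools.ij, then connect 'jdbc:derby:/path/to/workspaces/default/db';). The helper below only builds the SYSCS_UTIL.SYSCS_COMPRESS_TABLE calls to paste into ij; the schema and table names are illustrative, not necessarily your repository's actual ones:

```java
// Hypothetical sketch: generate Derby compress calls for a set of tables.
// Schema/table names below are examples only - check your own database
// (e.g. via ij's "show tables;") before running anything.
public class DerbyCompress {

    static String compressCall(String schema, String table) {
        // third argument 1 = sequential rebuild (uses less memory)
        return "CALL SYSCS_UTIL.SYSCS_COMPRESS_TABLE('"
                + schema + "', '" + table + "', 1)";
    }

    public static void main(String[] args) {
        // example table names, not the guaranteed Jackrabbit schema:
        for (String table : new String[] {"DEFAULT_BUNDLE", "DEFAULT_REFS"}) {
            System.out.println(compressCall("APP", table) + ";");
        }
    }
}
```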
[jira] Created: (JCR-2495) Exclude tests instead skipping them
Exclude tests instead skipping them --- Key: JCR-2495 URL: https://issues.apache.org/jira/browse/JCR-2495 Project: Jackrabbit Content Repository Issue Type: Improvement Components: jackrabbit-jcr2spi Affects Versions: 2.1.0 Reporter: Michael Dürig Assignee: Michael Dürig jcr2spi tests run with the spi2jcr module by default, so they are configured to be skipped when jcr2spi is built. However, manually running a jcr2spi test like this: mvn -Dtest=MyTest -Dmaven.test.skip=false test does not work. The pom configuration seems to take precedence here. To fix this I propose to exclude all tests instead of skipping them, making it possible to manually execute tests like this: mvn -Dtest=MyTest test -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (JCR-2495) Exclude tests instead skipping them
[ https://issues.apache.org/jira/browse/JCR-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Dürig resolved JCR-2495. Resolution: Fixed Fix Version/s: 2.1.0 Fixed at revision 910462 Exclude tests instead skipping them --- Key: JCR-2495 URL: https://issues.apache.org/jira/browse/JCR-2495 Project: Jackrabbit Content Repository Issue Type: Improvement Components: jackrabbit-jcr2spi Affects Versions: 2.1.0 Reporter: Michael Dürig Assignee: Michael Dürig Fix For: 2.1.0 jcr2spi tests run with the spi2jcr module by default so they are configured to be skipped when jcr2spi is built. Manually running a jcr2spi test like this mvn -Dtest=MyTest -Dmaven.test.skip=false test does not work however. The pom configuration seems to take precedence here. To fix this I propose to exclude all test instead of skipping them making it possible to manually execute tests like this mvn -Dtest=MyTest test -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
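For reference, a minimal sketch of what such a pom change might look like (illustrative, not the actual committed configuration): Surefire's -Dtest parameter overrides configured excludes, but not maven.test.skip, which is why excluding works where skipping did not.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <!-- instead of <skip>true</skip>, which -Dtest cannot override: -->
    <excludes>
      <exclude>**/*</exclude>
    </excludes>
  </configuration>
</plugin>
```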
[jira] Created: (JCR-2496) Internal error in WorkspaceItemStateFactory#createDeepNodeState
Internal error in WorkspaceItemStateFactory#createDeepNodeState Key: JCR-2496 URL: https://issues.apache.org/jira/browse/JCR-2496 Project: Jackrabbit Content Repository Issue Type: Bug Affects Versions: 2.1.0 Reporter: Michael Dürig Assignee: Michael Dürig When WorkspaceItemStateFactory#createDeepNodeState receives the current entry as argument for anyParent, it throws a RepositoryException with the message "Internal error while getting deep itemState". This is incorrect (probably a leftover from JCR-1797) since any entry is valid as argument for anyParent. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (JCR-2496) Internal error in WorkspaceItemStateFactory#createDeepNodeState
[ https://issues.apache.org/jira/browse/JCR-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Dürig resolved JCR-2496. Resolution: Fixed Fix Version/s: 2.1.0 Fixed at revision 910470 Internal error in WorkspaceItemStateFactory#createDeepNodeState Key: JCR-2496 URL: https://issues.apache.org/jira/browse/JCR-2496 Project: Jackrabbit Content Repository Issue Type: Bug Affects Versions: 2.1.0 Reporter: Michael Dürig Assignee: Michael Dürig Fix For: 2.1.0 When WorkspaceItemStateFactory#createDeepNodeState receives the current entry as argument for anyParent, it throws RepositoryException with the message Internal error while getting deep itemState. This is incorrect (probably a leftover from JCR-1797) since any entry is valid as argument for anyParent. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (JCR-2497) Improve jcr2spi read performance
Improve jcr2spi read performance - Key: JCR-2497 URL: https://issues.apache.org/jira/browse/JCR-2497 Project: Jackrabbit Content Repository Issue Type: Improvement Components: jackrabbit-jcr2spi, jackrabbit-spi, jackrabbit-spi-commons Affects Versions: 2.1.0 Reporter: Michael Dürig Assignee: Michael Dürig There are several issues with jcr2spi which affect read performance: 1. The item cache is not hierarchy aware. See JCR-2442 2. Processing of batches from RepositoryService#getItemInfos is expensive. This is a reason for JCR-2461 3. Non-existing items always cause a network round trip. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (JCR-2498) Implement caching mechanism for ItemInfo batches
Implement caching mechanism for ItemInfo batches Key: JCR-2498 URL: https://issues.apache.org/jira/browse/JCR-2498 Project: Jackrabbit Content Repository Issue Type: Improvement Components: jackrabbit-jcr2spi, jackrabbit-spi Reporter: Michael Dürig Assignee: Michael Dürig Currently all ItemInfos returned by RepositoryService#getItemInfos are placed into the hierarchy right away. For big batch sizes this is prohibitively expensive. The overhead is so great (*) that it quickly outweighs the overhead of network round trips. Moreover, SPI implementations usually choose the batch in a way determined by the backing persistence store and not by the requirements of the consuming application on the JCR side. That is, many of the items in the batch might never actually be needed. I suggest implementing a cache for ItemInfo batches. Conceptually, such a cache would live inside jcr2spi right above the SPI API. The actual implementation would be provided by SPI implementations. This approach allows for fine-tuning cache/batch sizes to a given persistence store and network environment. This would also better separate different concerns: the purpose of the existing item cache is to optimize for the requirements of the consumer of the JCR API ('the application'). The new ItemInfo cache is to optimize for the specific network environment and backing persistence store. (*) Numbers follow -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
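A hypothetical sketch of the proposed cache, just to make the idea concrete (the class, method names, and eviction policy below are invented, not the actual SPI being proposed): jcr2spi would consult the cache before calling RepositoryService#getItemInfos, and the SPI implementation would supply a cache sized for its own store and network.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Invented illustration of an ItemInfo batch cache: keys are item paths,
// values stand in for ItemInfo. An access-ordered LinkedHashMap gives a
// simple LRU policy; a real implementation would be chosen by the SPI layer.
public class ItemInfoCache<V> {
    private final int capacity;
    private final Map<String, V> entries;

    public ItemInfoCache(final int capacity) {
        this.capacity = capacity;
        this.entries = new LinkedHashMap<String, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
                // evict least recently used entry once over capacity
                return size() > capacity;
            }
        };
    }

    // called when a batch arrives from RepositoryService#getItemInfos
    public void put(String path, V info) {
        entries.put(path, info);
    }

    // null on a miss; the caller would then fall back to the SPI call
    public V get(String path) {
        return entries.get(path);
    }
}
```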
[jira] Issue Comment Edited: (JCR-2497) Improve jcr2spi read performance
[ https://issues.apache.org/jira/browse/JCR-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834199#action_12834199 ] Michael Dürig edited comment on JCR-2497 at 2/16/10 1:18 PM: - Created JCR-2498 for 2) was (Author: mduerig): Created for 2) Improve jcr2spi read performance - Key: JCR-2497 URL: https://issues.apache.org/jira/browse/JCR-2497 Project: Jackrabbit Content Repository Issue Type: Improvement Components: jackrabbit-jcr2spi, jackrabbit-spi, jackrabbit-spi-commons Affects Versions: 2.1.0 Reporter: Michael Dürig Assignee: Michael Dürig There are several issues with jcr2spi which affect read performance: 1. The item cache is not hierarchy aware. See JCR-2442 2. Processing of batches from RepositoryService#getItemInfos is expensive. This is a reason for JCR-2461 3. Non-existing items always cause a network round trip. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: Jackrabbit 3: extracting same name sibling support from the core
Hi, A very simple implementation of my idea: http://h2database.com/p.html#e5e5d0fa3aabc42932e6065a37b1f6a8 The method hasSameNameSibling() is called for each remove(). If it turns out to be a performance problem we could add a hidden property in the first SNS node itself (only required there). Does anybody see any other obvious problems? Regards, Thomas
[jira] Created: (JCR-2499) Add simple banchmarking tools for jcr2spi read performance
Add simple banchmarking tools for jcr2spi read performance -- Key: JCR-2499 URL: https://issues.apache.org/jira/browse/JCR-2499 Project: Jackrabbit Content Repository Issue Type: Task Affects Versions: 2.1.0 Reporter: Michael Dürig Assignee: Michael Dürig -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (JCR-2499) Add simple banchmarking tools for jcr2spi read performance
[ https://issues.apache.org/jira/browse/JCR-2499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Dürig updated JCR-2499: --- Component/s: jackrabbit-jcr2spi Add simple banchmarking tools for jcr2spi read performance -- Key: JCR-2499 URL: https://issues.apache.org/jira/browse/JCR-2499 Project: Jackrabbit Content Repository Issue Type: Task Components: jackrabbit-jcr2spi Affects Versions: 2.1.0 Reporter: Michael Dürig Assignee: Michael Dürig -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (JCR-2492) Garbage Collector remove data for active node
[ https://issues.apache.org/jira/browse/JCR-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834205#action_12834205 ] Cédric Chantepie commented on JCR-2492: --- I'm still able to reproduce this trouble with the 42Gb datastore. I've been able to do it once with a smaller datastore; I will try to figure out what exactly its cause is. It seems that the jackrabbit-core used by my RAR is 1.4 (not 1.4.5), even if the other libs are 1.4.5. Getting jackrabbit-1.4 from SVN, I have some doubts about something in org.apache.jackrabbit.core.persistence.bundle.BundleDbPersistenceManager::getAllNodeIds : -- Statement stmt = connectionManager.executeStmt(sql, keys, false, maxCount + 10); With the + 10, an unlimited maxCount (0) is turned into 10, so as far as I understand, getAllNodeIds asks its connectionManager to get all nodes, but with a query whose result is limited to 10 rows. If I'm right, a GarbageCollector using getAllNodeIds from a given IterablePersistenceManager (scanPersistenceManagers) doesn't really get all nodes (due to the row limit), and so only some nodes are marked (date updated). Nodes not marked (not included in the retrieved rows) are then considered removable by the deleteUnused method of GarbageCollector. Garbage Collector remove data for active node - Key: JCR-2492 URL: https://issues.apache.org/jira/browse/JCR-2492 Project: Jackrabbit Content Repository Issue Type: Bug Affects Versions: core 1.4.5 Environment: Linux 2.6.x (gentoo or fedora), JDK 1.5 (sun or jrockit), JBoss 4.2.3.GA, Derby (10.4.1.3), PostgreSQL (8.1.11 or 8.0.3) * FileSystem = LocalFileSystem * custom AccessManager * PersistenceManager = PostgreSQLPersistenceManager * SearchIndex, textFilterClasses = * DataStore = FileDataStore (minLogRecord = 100) Reporter: Cédric Chantepie Priority: Critical When we use GarbageCollector on a 42Gb datastore, GarbageCollector erase all data. 
Back with node, none have any longer data : jcr:data was removed as data in datastore no longer exist. On some smaller test repository, this trouble does not occur. We will try to update Jackrabbit version, but at least it could be good to be sure what is really the trouble with GC in Jackrabbit 1.4.5 so that we can be sure that updating it will really fix that. Thanks. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
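The suspected bug above can be illustrated in a few lines. The "0 means unlimited" convention and the + 10 come from the comment; the method names here are invented for illustration, and the "safer" variant is just one way to preserve the convention, not necessarily what the later fix actually did:

```java
// Illustration of the suspected getAllNodeIds row-limit bug.
public class GetAllNodeIdsLimit {

    // buggy: maxCount == 0 means "no limit", but 0 + 10 silently
    // becomes a hard limit of 10 rows at the JDBC layer
    static int buggyLimit(int maxCount) {
        return maxCount + 10;
    }

    // one possible fix: leave the "unlimited" sentinel untouched
    static int saferLimit(int maxCount) {
        return maxCount == 0 ? 0 : maxCount + 10;
    }
}
```

With the buggy variant, a garbage-collection scan that asked for all node ids would mark at most 10 nodes; everything else would look unused and be deleted, which matches the reported symptom.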
[jira] Reopened: (JCR-2492) Garbage Collector remove data for active node
[ https://issues.apache.org/jira/browse/JCR-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cédric Chantepie reopened JCR-2492: --- Can be reproduced by reporter (me). Trying to make a testcase that can be uploaded there. Really need to figure out whether the cause was fixed by newer Jackrabbit revision, as this trouble makes datastore remove active data. Garbage Collector remove data for active node - Key: JCR-2492 URL: https://issues.apache.org/jira/browse/JCR-2492 Project: Jackrabbit Content Repository Issue Type: Bug Affects Versions: core 1.4.5 Environment: Linux 2.6.x (gentoo or fedora), JDK 1.5 (sun or jrockit), JBoss 4.2.3.GA, Derby (10.4.1.3), PostgreSQL (8.1.11 or 8.0.3) * FileSystem = LocalFileSystem * custom AccessManager * PersistenceManager = PostgreSQLPersistenceManager * SearchIndex, textFilterClasses = * DataStore = FileDataStore (minLogRecord = 100) Reporter: Cédric Chantepie Priority: Critical When we use GarbageCollector on a 42Gb datastore, GarbageCollector erase all data. Back with node, none have any longer data : jcr:data was removed as data in datastore no longer exist. On some smaller test repository, this trouble does not occur. We will try to update Jackrabbit version, but at least it could be good to be sure what is really the trouble with GC in Jackrabbit 1.4.5 so that we can be sure that updating it will really fix that. Thanks. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (JCR-2492) Garbage Collector remove data for active node
[ https://issues.apache.org/jira/browse/JCR-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cédric Chantepie updated JCR-2492: -- Affects Version/s: (was: core 1.4.5) 1.4 Garbage Collector remove data for active node - Key: JCR-2492 URL: https://issues.apache.org/jira/browse/JCR-2492 Project: Jackrabbit Content Repository Issue Type: Bug Affects Versions: 1.4 Environment: Linux 2.6.x (gentoo or fedora), JDK 1.5 (sun or jrockit), JBoss 4.2.3.GA, Derby (10.4.1.3), PostgreSQL (8.1.11 or 8.0.3) * FileSystem = LocalFileSystem * custom AccessManager * PersistenceManager = PostgreSQLPersistenceManager * SearchIndex, textFilterClasses = * DataStore = FileDataStore (minLogRecord = 100) Reporter: Cédric Chantepie Priority: Critical When we use GarbageCollector on a 42Gb datastore, GarbageCollector erase all data. Back with node, none have any longer data : jcr:data was removed as data in datastore no longer exist. On some smaller test repository, this trouble does not occur. We will try to update Jackrabbit version, but at least it could be good to be sure what is really the trouble with GC in Jackrabbit 1.4.5 so that we can be sure that updating it will really fix that. Thanks. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (JCR-2499) Add simple benchmarking tools for jcr2spi read performance
[ https://issues.apache.org/jira/browse/JCR-2499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Dürig updated JCR-2499: --- Summary: Add simple benchmarking tools for jcr2spi read performance (was: Add simple banchmarking tools for jcr2spi read performance) Add simple benchmarking tools for jcr2spi read performance -- Key: JCR-2499 URL: https://issues.apache.org/jira/browse/JCR-2499 Project: Jackrabbit Content Repository Issue Type: Task Components: jackrabbit-jcr2spi Affects Versions: 2.1.0 Reporter: Michael Dürig Assignee: Michael Dürig Fix For: 2.1.0 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (JCR-2499) Add simple benchmarking tools for jcr2spi read performance
[ https://issues.apache.org/jira/browse/JCR-2499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Dürig resolved JCR-2499. Resolution: Fixed Fix Version/s: 2.1.0 Fixed at revision 910515 Add simple benchmarking tools for jcr2spi read performance -- Key: JCR-2499 URL: https://issues.apache.org/jira/browse/JCR-2499 Project: Jackrabbit Content Repository Issue Type: Task Components: jackrabbit-jcr2spi Affects Versions: 2.1.0 Reporter: Michael Dürig Assignee: Michael Dürig Fix For: 2.1.0 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (JCR-2063) FileDataStore: garbage collection can delete files that are still needed
[ https://issues.apache.org/jira/browse/JCR-2063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834216#action_12834216 ] Thomas Mueller commented on JCR-2063: - A workaround for implementations where this is not fixed is:

gc.mark();
try {
    // sleep to ensure the last modified time is updated
    // even for file systems with a lower time resolution
    Thread.sleep(5000);
} catch (Exception e) {
    // cannot ignore, otherwise data that is in use may be deleted
    throw new RepositoryException("Interrupted");
}
gc.mark();

FileDataStore: garbage collection can delete files that are still needed Key: JCR-2063 URL: https://issues.apache.org/jira/browse/JCR-2063 Project: Jackrabbit Content Repository Issue Type: Bug Components: jackrabbit-core Reporter: Thomas Mueller Assignee: Thomas Mueller Fix For: 1.5.5 It looks like the FileDataStore garbage collection (both regular scan and persistence manager scan) can delete files that are still needed. Currently it looks like the reason is the last access time resolution of the operating system. This is 2 seconds for FAT and Mac OS X, 100 ns for NTFS, and 1 second for other file systems. That means files that are scanned at the very beginning are sometimes deleted, because they have a later last modified time than when the scan was started. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (JCR-2492) Garbage Collector remove data for active node
[ https://issues.apache.org/jira/browse/JCR-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Mueller resolved JCR-2492. - Resolution: Fixed There are other problems with version 1.4.x, see also JCR-1414 and specially JCR-2063, which was not backported to 1.4.x. See also the comment there for a workaround. Please re-open the bug if you can still reproduce it. Garbage Collector remove data for active node - Key: JCR-2492 URL: https://issues.apache.org/jira/browse/JCR-2492 Project: Jackrabbit Content Repository Issue Type: Bug Affects Versions: 1.4 Environment: Linux 2.6.x (gentoo or fedora), JDK 1.5 (sun or jrockit), JBoss 4.2.3.GA, Derby (10.4.1.3), PostgreSQL (8.1.11 or 8.0.3) * FileSystem = LocalFileSystem * custom AccessManager * PersistenceManager = PostgreSQLPersistenceManager * SearchIndex, textFilterClasses = * DataStore = FileDataStore (minLogRecord = 100) Reporter: Cédric Chantepie Priority: Critical When we use GarbageCollector on a 42Gb datastore, GarbageCollector erase all data. Back with node, none have any longer data : jcr:data was removed as data in datastore no longer exist. On some smaller test repository, this trouble does not occur. We will try to update Jackrabbit version, but at least it could be good to be sure what is really the trouble with GC in Jackrabbit 1.4.5 so that we can be sure that updating it will really fix that. Thanks. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (JCR-2493) Unit tests for persistence managers
[ https://issues.apache.org/jira/browse/JCR-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved JCR-2493. Resolution: Fixed Fix Version/s: 2.1.0 Patch committed in revision 910526. Good idea about in-memory databases. I updated the H2 JDBC URLs. Unit tests for persistence managers --- Key: JCR-2493 URL: https://issues.apache.org/jira/browse/JCR-2493 Project: Jackrabbit Content Repository Issue Type: Improvement Components: jackrabbit-core Reporter: Jukka Zitting Assignee: Jukka Zitting Priority: Minor Fix For: 2.1.0 Attachments: JCR-2493.patch Currently we only test our persistence managers indirectly via JCR-level test cases. The downside of this approach is that we can only test one persistence manager implementation at a time, and need separate build profiles to switch from one implementation to another. To ensure better coverage and consistent behaviour across all our persistence managers I implemented a simple unit test that works directly against the PersistenceManager interface. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (JCR-2483) Out of memory error while adding a new host due to large number of revisions
[ https://issues.apache.org/jira/browse/JCR-2483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting updated JCR-2483: --- Affects Version/s: 1.6.0 Fix Version/s: (was: 1.6.0) Out of memory error while adding a new host due to large number of revisions Key: JCR-2483 URL: https://issues.apache.org/jira/browse/JCR-2483 Project: Jackrabbit Content Repository Issue Type: Improvement Components: clustering Affects Versions: 1.6.0 Environment: MySQL DB. 512 MB memory allocated to java app. Reporter: aasoj Attachments: patch In a cluster deployment, revisions are saved in Journal Table in the DB. After a while a huge number of revisions can get created (around 70 k in our test). When a new host is added to the cluster, it tries to read all the revisions and hence the following error: Caused by: java.lang.OutOfMemoryError: Java heap space at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2931) at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2871) at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3414) at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:910) at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1405) at com.mysql.jdbc.MysqlIO.readSingleRowSet(MysqlIO.java:2816) at com.mysql.jdbc.MysqlIO.getResultSet(MysqlIO.java:467) at com.mysql.jdbc.MysqlIO.readResultsForQueryOrUpdate(MysqlIO.java:2510) at com.mysql.jdbc.MysqlIO.readAllResults(MysqlIO.java:1746) at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2135) at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2542) at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1734) at com.mysql.jdbc.PreparedStatement.execute(PreparedStatement.java:995) at org.apache.jackrabbit.core.journal.DatabaseJournal.getRecords(DatabaseJournal.java:460) at org.apache.jackrabbit.core.journal.AbstractJournal.doSync(AbstractJournal.java:201) at org.apache.jackrabbit.core.journal.AbstractJournal.sync(AbstractJournal.java:188) at 
org.apache.jackrabbit.core.cluster.ClusterNode.sync(ClusterNode.java:329) at org.apache.jackrabbit.core.cluster.ClusterNode.start(ClusterNode.java:270) This can also happen to an existing host in the cluster when the number of revisions returned is very high. Possible solutions: 1. Cleaning old revisions using a janitor thread: This may be good for new hosts, but it will fail in a scenario where the sync delay is high (a few hours) and the number of updates is high on existing hosts in the cluster 2. Increase the memory allocated to the Java process: This is not always a feasible option 3. Limit the number of updates read from the DB in any cycle. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
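Option 3 can be sketched as reading the journal in bounded chunks, advancing a revision cursor, instead of one unbounded SELECT. The table and column names below are illustrative, not the actual Jackrabbit journal schema; separately, with the MySQL Connector/J driver, calling setFetchSize(Integer.MIN_VALUE) on the statement makes the driver stream rows instead of buffering the whole result set in memory, which addresses the same OutOfMemoryError:

```java
// Hypothetical sketch of chunked journal reads (names are illustrative).
public class JournalChunks {

    // build one bounded query; the caller loops, passing the last
    // REVISION_ID it saw, until a chunk comes back short or empty
    static String chunkQuery(long afterRevision, int limit) {
        return "SELECT REVISION_ID, REVISION_DATA FROM JOURNAL"
             + " WHERE REVISION_ID > " + afterRevision
             + " ORDER BY REVISION_ID LIMIT " + limit;
    }
}
```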
[jira] Commented: (JCR-2492) Garbage Collector remove data for active node
[ https://issues.apache.org/jira/browse/JCR-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834234#action_12834234 ] Cédric Chantepie commented on JCR-2492: --- I think the main cause for this trouble is here: http://svn.apache.org/viewvc/jackrabbit/branches/1.4/jackrabbit-core/src/main/java/org/apache/jackrabbit/core/persistence/bundle/BundleDbPersistenceManager.java?p2=%2Fjackrabbit%2Fbranches%2F1.4%2Fjackrabbit-core%2Fsrc%2Fmain%2Fjava%2Forg%2Fapache%2Fjackrabbit%2Fcore%2Fpersistence%2Fbundle%2FBundleDbPersistenceManager.java&p1=%2Fjackrabbit%2Fbranches%2F1.4%2Fjackrabbit-core%2Fsrc%2Fmain%2Fjava%2Forg%2Fapache%2Fjackrabbit%2Fcore%2Fpersistence%2Fbundle%2FBundleDbPersistenceManager.java&r1=633844&r2=633843&view=diff&pathrev=633844 Garbage Collector remove data for active node - Key: JCR-2492 URL: https://issues.apache.org/jira/browse/JCR-2492 Project: Jackrabbit Content Repository Issue Type: Bug Affects Versions: 1.4 Environment: Linux 2.6.x (gentoo or fedora), JDK 1.5 (sun or jrockit), JBoss 4.2.3.GA, Derby (10.4.1.3), PostgreSQL (8.1.11 or 8.0.3) * FileSystem = LocalFileSystem * custom AccessManager * PersistenceManager = PostgreSQLPersistenceManager * SearchIndex, textFilterClasses = * DataStore = FileDataStore (minLogRecord = 100) Reporter: Cédric Chantepie Priority: Critical When we use GarbageCollector on a 42Gb datastore, GarbageCollector erase all data. Back with node, none have any longer data : jcr:data was removed as data in datastore no longer exist. On some smaller test repository, this trouble does not occur. We will try to update Jackrabbit version, but at least it could be good to be sure what is really the trouble with GC in Jackrabbit 1.4.5 so that we can be sure that updating it will really fix that. Thanks. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (JCR-1414) Data store garbage collection: inUse not correctly synchronized
[ https://issues.apache.org/jira/browse/JCR-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834241#action_12834241 ] Thomas Mueller commented on JCR-1414: - Revision 633844 also fixed a bug in BundleDbPersistenceManager, which caused data store garbage collection to delete almost all data when using a BundleDbPersistenceManager. See also JCR-2492. Data store garbage collection: inUse not correctly synchronized --- Key: JCR-1414 URL: https://issues.apache.org/jira/browse/JCR-1414 Project: Jackrabbit Content Repository Issue Type: Bug Components: jackrabbit-core Affects Versions: 1.4, core 1.4.1 Reporter: Thomas Mueller Assignee: Thomas Mueller Fix For: core 1.4.2 Access to the fields DbDataStore.inUse and FileDataStore.inUse is not synchronized. This is a problem when concurrently calling GarbageCollector.deleteUnused() and accessing the data store (a ConcurrentModificationException is thrown). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (JCR-2492) Garbage Collector remove data for active node
[ https://issues.apache.org/jira/browse/JCR-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834242#action_12834242 ] Thomas Mueller commented on JCR-2492: - Hi, I think you are right. I have added a comment in JCR-1414 about this, so I guess this makes it a duplicate of JCR-1414. A workaround is to disable the persistence manager scan using GarbageCollector.setPersistenceManagerScan(false); however, this will not solve the other problems of JCR-1414 and JCR-2063. Garbage Collector remove data for active node - Key: JCR-2492 URL: https://issues.apache.org/jira/browse/JCR-2492 Project: Jackrabbit Content Repository Issue Type: Bug Affects Versions: 1.4 Environment: Linux 2.6.x (gentoo or fedora), JDK 1.5 (sun or jrockit), JBoss 4.2.3.GA, Derby (10.4.1.3), PostgreSQL (8.1.11 or 8.0.3) * FileSystem = LocalFileSystem * custom AccessManager * PersistenceManager = PostgreSQLPersistenceManager * SearchIndex, textFilterClasses = * DataStore = FileDataStore (minLogRecord = 100) Reporter: Cédric Chantepie Priority: Critical When we use GarbageCollector on a 42Gb datastore, GarbageCollector erase all data. Back with node, none have any longer data : jcr:data was removed as data in datastore no longer exist. On some smaller test repository, this trouble does not occur. We will try to update Jackrabbit version, but at least it could be good to be sure what is really the trouble with GC in Jackrabbit 1.4.5 so that we can be sure that updating it will really fix that. Thanks. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (JCR-2492) Garbage Collector remove data for active node
[ https://issues.apache.org/jira/browse/JCR-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834247#action_12834247 ] Cédric Chantepie commented on JCR-2492: --- I will try using Jackrabbit 2.0.0 rather than the workaround for 1.4. Thanks, now it's clear. Garbage Collector remove data for active node - Key: JCR-2492 URL: https://issues.apache.org/jira/browse/JCR-2492 Project: Jackrabbit Content Repository Issue Type: Bug Affects Versions: 1.4 Environment: Linux 2.6.x (gentoo or fedora), JDK 1.5 (sun or jrockit), JBoss 4.2.3.GA, Derby (10.4.1.3), PostgreSQL (8.1.11 or 8.0.3) * FileSystem = LocalFileSystem * custom AccessManager * PersistenceManager = PostgreSQLPersistenceManager * SearchIndex, textFilterClasses = * DataStore = FileDataStore (minLogRecord = 100) Reporter: Cédric Chantepie Priority: Critical When we use GarbageCollector on a 42Gb datastore, GarbageCollector erase all data. Back with node, none have any longer data : jcr:data was removed as data in datastore no longer exist. On some smaller test repository, this trouble does not occur. We will try to update Jackrabbit version, but at least it could be good to be sure what is really the trouble with GC in Jackrabbit 1.4.5 so that we can be sure that updating it will really fix that. Thanks. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (JCR-2498) Implement caching mechanism for ItemInfo batches
[ https://issues.apache.org/jira/browse/JCR-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834250#action_12834250 ] Michael Dürig commented on JCR-2498: As promised, some numbers. All measurements were done using ReadPerformanceTest.java [1]. Batch size: 24340, 12170, 6085, 3043, 1521, 761, 380, 190, 95, 48, 24, 12, 6, 3, 1 ms per request: 20.2, 24.2, 17.4, 16.3, 7.3, 3.0, 2.5, 2.1, 2.0, 1.3, 1.3, 1.1, 1.0, 1.0, 1.1 The performance impact of large batches is clearly visible here. Without refresh operations [2] the picture remains similar but less pronounced: Batch size: 24340, 12170, 6085, 3043, 1521, 761, 380, 190, 95, 48, 24, 12, 6, 3, 1 ms per request: 5.1, 17.1, 16.3, 12.0, 6.0, 2.6, 2.7, 2.0, 2.0, 1.4, 1.4, 1.2, 1.0, 1.1, 1.3 [1] http://svn.apache.org/viewvc/jackrabbit/trunk/jackrabbit-jcr2spi/src/test/java/org/apache/jackrabbit/jcr2spi/benchmark/ReadPerformanceTest.java?revision=910523&view=markup&pathrev=910523 [2] See upcoming patch Implement caching mechanism for ItemInfo batches Key: JCR-2498 URL: https://issues.apache.org/jira/browse/JCR-2498 Project: Jackrabbit Content Repository Issue Type: Improvement Components: jackrabbit-jcr2spi, jackrabbit-spi Reporter: Michael Dürig Assignee: Michael Dürig Currently all ItemInfos returned by RepositoryService#getItemInfos are placed into the hierarchy right away. For big batch sizes this is prohibitively expensive. The overhead is so great (*), that it quickly outweighs the overhead of network round trips. Moreover, SPI implementations usually choose the batch in a way determined by the backing persistence store and not by the requirements of the consuming application on the JCR side. That is, many of the items in the batch might never be actually needed. I suggest to implement a cache for ItemInfo batches. Conceptually such a cache would live inside jcr2spi right above the SPI API. The actual implementation would be provided by SPI implementations. 
This approach allows for fine-tuning cache/batch sizes to a given persistence store and network environment. It would also better separate concerns: the purpose of the existing item cache is to optimize for the requirements of the consumer of the JCR API ('the application'), while the new ItemInfo cache is to optimize for the specific network environment and backing persistence store.

(*) Numbers follow

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
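The proposed split can be sketched roughly as follows. This is a hypothetical interface, not part of the actual SPI: it only illustrates the idea that jcr2spi reads through the cache while the SPI implementation supplies the concrete class and eviction policy. All names (ItemInfoCache, SimpleItemInfoCache, put, get) are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative cache contract: jcr2spi consults get() before triggering a
// network round trip, and feeds each entry of a fetched batch via put().
interface ItemInfoCache<K, V> {
    void put(K itemId, V info);
    V get(K itemId); // null means: not cached, fetch from the server
}

// A minimal, size-bounded map-backed implementation an SPI implementation
// might provide, tuned to its persistence store and network environment.
class SimpleItemInfoCache<K, V> implements ItemInfoCache<K, V> {
    private final int maxSize;
    private final Map<K, V> entries = new HashMap<K, V>();

    SimpleItemInfoCache(int maxSize) {
        this.maxSize = maxSize;
    }

    public void put(K itemId, V info) {
        if (entries.size() < maxSize) {
            entries.put(itemId, info);
        }
    }

    public V get(K itemId) {
        return entries.get(itemId);
    }
}
```

Because only the interface lives in jcr2spi, each SPI implementation can pick a cache size matching its typical batch size without changes on the JCR side.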
[jira] Commented: (JCR-2498) Implement caching mechanism for ItemInfo batches
[ https://issues.apache.org/jira/browse/JCR-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834251#action_12834251 ]

Michael Dürig commented on JCR-2498:
------------------------------------

Here's the patch mentioned in [2] above.

Index: src/test/java/org/apache/jackrabbit/jcr2spi/benchmark/ReadPerformanceTest.java
===================================================================
--- src/test/java/org/apache/jackrabbit/jcr2spi/benchmark/ReadPerformanceTest.java
+++ src/test/java/org/apache/jackrabbit/jcr2spi/benchmark/ReadPerformanceTest.java
@@ -136,7 +136,7 @@
         final List<Item> items = new ArrayList<Item>();
         for (int k = 0; k < count; k++) {
-            switch (rnd.nextInt(4)) {
+            switch (rnd.nextInt(3)) {
                 case 0: { // getItem
                     callables.add(new Callable<Long>() {
                         public Long call() throws Exception {

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
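The effect of the one-line patch can be sketched as follows; this is not the actual test code, only an illustration of what the changed bound controls. With rnd.nextInt(4) roughly every fourth step is a refresh; rnd.nextInt(3) restricts the draw to the three read operations, so the benchmark runs without refresh. Class and method names here are illustrative.

```java
import java.util.Random;

public class OperationPicker {
    // Picks one benchmark operation per step at random, mirroring the
    // switch statement in ReadPerformanceTest. When includeRefresh is
    // false the upper bound is 3, so "refresh" can never be drawn.
    public static String pickOperation(Random rnd, boolean includeRefresh) {
        switch (rnd.nextInt(includeRefresh ? 4 : 3)) {
            case 0:  return "getItem";
            case 1:  return "getNode";
            case 2:  return "getProperty";
            default: return "refresh"; // only reachable when includeRefresh is true
        }
    }
}
```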
[jira] Commented: (JCR-2498) Implement caching mechanism for ItemInfo batches
[ https://issues.apache.org/jira/browse/JCR-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834272#action_12834272 ]

Michael Dürig commented on JCR-2498:
------------------------------------

Some more numbers demonstrating the effect with JCR-2498-poc.patch applied. The 'new/old time' row gives the quotients of the request times with the patch applied vs. without it. The 'new/old rts' row gives the quotients of the network round trips with the patch applied vs. without it. The first measurement includes all operations (getItem, getNode, getProperty and refresh) as above.

Batch size:   24340, 12170, 6085, 3043, 1521, 761, 380, 190,  95,  48,  24, 12,   6,   3,   1
new/old time:   0.1,   0.1,  0.1,  0.1,  0.2, 0.3, 0.4, 0.5, 0.5, 0.7, 0.6,  1,   1, 1.1, 0.8
new/old rts:    2.1,   2.8,  1.8,  2.4,  1.8, 1.4, 1.3, 1.2,   1, 1.1,   1,  1, 0.9,   1, 0.9

Most obvious is the vast performance increase (up to a factor of 10) for reading items. However, this comes along with an increase in the number of network round trips. Three things should be noted here:

1. For realistic batch sizes the increase in the number of network round trips is not significant.
2. The additional network round trips are caused by the refresh operations. In the test scenario the number of refresh operations is unrealistically high (every fourth operation is a refresh).
3. The items in the batches of the test case are not realistically distributed across the items of the repository. That is, the items are randomly chosen from the repository. In practice, the items in a batch would be related to each other by some locality criteria. I assume that this would further mitigate the observed effect.
For completeness' sake, here is the same measurement as above but without refresh operations:

Batch size:   24340, 12170, 6085, 3043, 1521, 761, 380, 190,  95,  48,  24, 12, 6, 3,   1
new/old time:   0.2,     0,    0,  0.1,  0.1, 0.2, 0.4, 0.4, 0.6, 0.6, 0.7,  1, 1, 1, 1.1
new/old rts:      1,     1,  0.9,  0.9,  0.8, 0.9, 0.9, 0.9, 0.9,   1,   1,  1, 1, 1,   1

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
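The 'new/old' rows above are simple element-wise quotients: each entry is the measurement with the patch applied divided by the corresponding measurement without it, so values below 1 mean the patch improved that metric. A minimal sketch (the method name is illustrative, not taken from the test code):

```java
public class Quotients {
    // Element-wise ratio of two measurement series of equal length,
    // e.g. request times with the patch vs. without it.
    public static double[] quotients(double[] withPatch, double[] withoutPatch) {
        double[] q = new double[withPatch.length];
        for (int i = 0; i < q.length; i++) {
            q[i] = withPatch[i] / withoutPatch[i];
        }
        return q;
    }
}
```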
[jira] Commented: (JCR-2498) Implement caching mechanism for ItemInfo batches
[ https://issues.apache.org/jira/browse/JCR-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834324#action_12834324 ]

angela commented on JCR-2498:
-----------------------------

although i didn't look at the poc-patch in detail, based on our f2f discussion: looks reasonable to me :)

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (JCR-2426) Deadlock in lucene (Jackrabbit 1.4.4)
[ https://issues.apache.org/jira/browse/JCR-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Antonio Martinez updated JCR-2426:
----------------------------------

    Attachment: deadlock_2nd_setup.txt

Both setups have the same JVM version; see the thread dump in deadlock_2nd_setup.txt.

Deadlock in lucene (Jackrabbit 1.4.4)
-------------------------------------

                Key: JCR-2426
                URL: https://issues.apache.org/jira/browse/JCR-2426
            Project: Jackrabbit Content Repository
         Issue Type: Bug
         Components: indexing
   Affects Versions: core 1.4.4
           Reporter: Antonio Martinez
           Priority: Blocker
        Attachments: deadlock_2nd_setup.txt, deadlock_summary.txt

We get a deadlock in the lucene part of jackrabbit (see deadlock_summary.txt). This issue has been observed in two different production setups running Jackrabbit 1.4.4 in cluster configuration.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.