[jira] Commented: (JCR-2493) Unit tests for persistence managers

2010-02-16 Thread Thomas Mueller (JIRA)

[ 
https://issues.apache.org/jira/browse/JCR-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834142#action_12834142
 ] 

Thomas Mueller commented on JCR-2493:
-

With the H2 database, changing the database URL is enough:
manager.setUrl("jdbc:h2:mem:" + database.getPath());
The memory is automatically released when the connection is closed.
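The URL change described above can be sketched as a tiny helper (the method and class names below are illustrative, not Jackrabbit API; only the "jdbc:h2:mem:" prefix comes from the comment):

```java
// Sketch: build an in-memory H2 JDBC URL for a persistence-manager test.
// The "mem:" prefix keeps the database in RAM; H2 releases the memory
// when the last connection to that database is closed.
public class H2UrlDemo {

    /** Turn a database path/name into an in-memory H2 JDBC URL. */
    public static String inMemoryUrl(String databasePath) {
        return "jdbc:h2:mem:" + databasePath;
    }

    public static void main(String[] args) {
        // e.g. manager.setUrl(inMemoryUrl(database.getPath()));
        System.out.println(inMemoryUrl("pmtest"));
    }
}
```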

Derby also supports in-memory databases now, but unfortunately
it doesn't release the memory, and there is no nice way to do that manually:
http://wiki.apache.org/db-derby/InMemoryBackEndPrimer

I suggest only changing the H2 JDBC URL.

 Unit tests for persistence managers
 ---

 Key: JCR-2493
 URL: https://issues.apache.org/jira/browse/JCR-2493
 Project: Jackrabbit Content Repository
  Issue Type: Improvement
  Components: jackrabbit-core
Reporter: Jukka Zitting
Assignee: Jukka Zitting
Priority: Minor
 Attachments: JCR-2493.patch


 Currently we only test our persistence managers indirectly via JCR-level test 
 cases. The downside of this approach is that we can only test one persistence 
 manager implementation at a time, and need separate build profiles to switch 
 from one implementation to another. To ensure better coverage and consistent 
 behaviour across all our persistence managers I implemented a simple unit 
 test that works directly against the PersistenceManager interface.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Jackrabbit Apache Derby Database 'Vacuum'

2010-02-16 Thread george.sibley
Hi

I have a Jackrabbit housekeeping routine that removes nodes from the
Jackrabbit repository. I'm using the out-of-the-box 1.6 Jackrabbit web
app deployment, which is using the embedded Apache Derby 10.2 database.
Although I'm able to remove the nodes successfully, the space allocated
to the data is not being restored to the file system. I have done this
before on databases such as postgres and oracle. 

I have found out how to compress the data in the apache derby database
using the instructions stated in the manual -
http://db.apache.org/derby/docs/10.2/ref/ref-single.html#rrefaltertablecompress.

I just now need to know how I access the embedded apache derby database
using the apache derby ij tool -
http://db.apache.org/derby/docs/10.2/tools/tools-single.html.

Does anyone have any details on how to do this?
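For what it's worth, besides ij, the compress procedure can also be invoked over plain JDBC against the embedded database. The following is an untested sketch: the database path, schema, and table names are placeholders that depend on your workspace layout, and the repository must be shut down first, since embedded Derby allows only one JVM to open the database at a time:

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;

// Sketch: reclaim file-system space in an embedded Derby database by
// calling SYSCS_UTIL.SYSCS_COMPRESS_TABLE over JDBC.
public class DerbyCompress {

    /** Parameters: schema name, table name, sequential flag (non-zero = sequential). */
    static String compressCall() {
        return "CALL SYSCS_UTIL.SYSCS_COMPRESS_TABLE(?, ?, ?)";
    }

    public static void main(String[] args) throws Exception {
        // Placeholder database path and table; adjust to your deployment.
        try (Connection con = DriverManager.getConnection(
                "jdbc:derby:repository/workspaces/default/db");
             CallableStatement cs = con.prepareCall(compressCall())) {
            cs.setString(1, "APP");     // schema (placeholder)
            cs.setString(2, "BUNDLE");  // table (placeholder)
            cs.setShort(3, (short) 1);  // 1 = sequential (slower, less temp space)
            cs.execute();
        }
    }
}
```

Running this requires the Derby embedded driver on the classpath.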

Regards


George Sibley



[jira] Created: (JCR-2495) Exclude tests instead skipping them

2010-02-16 Thread JIRA
Exclude tests instead skipping them
---

 Key: JCR-2495
 URL: https://issues.apache.org/jira/browse/JCR-2495
 Project: Jackrabbit Content Repository
  Issue Type: Improvement
  Components: jackrabbit-jcr2spi
Affects Versions: 2.1.0
Reporter: Michael Dürig
Assignee: Michael Dürig


jcr2spi tests run with the spi2jcr module by default so they are configured to 
be skipped when jcr2spi is built. Manually running a jcr2spi test like this

mvn -Dtest=MyTest -Dmaven.test.skip=false test

does not work however. The pom configuration seems to take precedence here. 

To fix this I propose to exclude all tests instead of skipping them, making it 
possible to manually execute tests like this

mvn -Dtest=MyTest test


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (JCR-2495) Exclude tests instead skipping them

2010-02-16 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/JCR-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Dürig resolved JCR-2495.


   Resolution: Fixed
Fix Version/s: 2.1.0

Fixed at revision 910462

 Exclude tests instead skipping them
 ---

 Key: JCR-2495
 URL: https://issues.apache.org/jira/browse/JCR-2495
 Project: Jackrabbit Content Repository
  Issue Type: Improvement
  Components: jackrabbit-jcr2spi
Affects Versions: 2.1.0
Reporter: Michael Dürig
Assignee: Michael Dürig
 Fix For: 2.1.0


 jcr2spi tests run with the spi2jcr module by default so they are configured 
 to be skipped when jcr2spi is built. Manually running a jcr2spi test like this
 mvn -Dtest=MyTest -Dmaven.test.skip=false test
 does not work however. The pom configuration seems to take precedence here. 
 To fix this I propose to exclude all tests instead of skipping them, making it 
 possible to manually execute tests like this
 mvn -Dtest=MyTest test

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (JCR-2496) Internal error in WorkspaceItemStateFactory#createDeepNodeState

2010-02-16 Thread JIRA
Internal error in WorkspaceItemStateFactory#createDeepNodeState 


 Key: JCR-2496
 URL: https://issues.apache.org/jira/browse/JCR-2496
 Project: Jackrabbit Content Repository
  Issue Type: Bug
Affects Versions: 2.1.0
Reporter: Michael Dürig
Assignee: Michael Dürig


When WorkspaceItemStateFactory#createDeepNodeState receives the current entry 
as argument for anyParent, it throws a RepositoryException with the message 
"Internal error while getting deep itemState". This is incorrect (probably a 
leftover from JCR-1797) since any entry is valid as argument for anyParent. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (JCR-2496) Internal error in WorkspaceItemStateFactory#createDeepNodeState

2010-02-16 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/JCR-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Dürig resolved JCR-2496.


   Resolution: Fixed
Fix Version/s: 2.1.0

Fixed at revision 910470

 Internal error in WorkspaceItemStateFactory#createDeepNodeState 
 

 Key: JCR-2496
 URL: https://issues.apache.org/jira/browse/JCR-2496
 Project: Jackrabbit Content Repository
  Issue Type: Bug
Affects Versions: 2.1.0
Reporter: Michael Dürig
Assignee: Michael Dürig
 Fix For: 2.1.0


 When WorkspaceItemStateFactory#createDeepNodeState receives the current entry 
 as argument for anyParent, it throws a RepositoryException with the message 
 "Internal error while getting deep itemState". This is incorrect (probably a 
 leftover from JCR-1797) since any entry is valid as argument for anyParent. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (JCR-2497) Improve jcr2spi read performance

2010-02-16 Thread JIRA
Improve jcr2spi read performance 
-

 Key: JCR-2497
 URL: https://issues.apache.org/jira/browse/JCR-2497
 Project: Jackrabbit Content Repository
  Issue Type: Improvement
  Components: jackrabbit-jcr2spi, jackrabbit-spi, jackrabbit-spi-commons
Affects Versions: 2.1.0
Reporter: Michael Dürig
Assignee: Michael Dürig


There are several issues with jcr2spi which affect read performance:

1. The item cache is not hierarchy aware. See JCR-2442
2. Processing of batches from RepositoryService#getItemInfos is expensive. This 
is a reason for JCR-2461
3. Non-existing items always cause a network round trip. 


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (JCR-2498) Implement caching mechanism for ItemInfo batches

2010-02-16 Thread JIRA
Implement caching mechanism for ItemInfo batches


 Key: JCR-2498
 URL: https://issues.apache.org/jira/browse/JCR-2498
 Project: Jackrabbit Content Repository
  Issue Type: Improvement
  Components: jackrabbit-jcr2spi, jackrabbit-spi
Reporter: Michael Dürig
Assignee: Michael Dürig


Currently all ItemInfos returned by RepositoryService#getItemInfos are placed 
into the hierarchy right away. For big batch sizes this is prohibitively 
expensive. The overhead is so great (*), that it quickly outweighs the overhead 
of network round trips. Moreover, SPI implementations usually choose the batch 
in a way determined by the backing persistence store and not by the 
requirements of the consuming application on the JCR side. That is, many of the 
items in the batch might never be actually needed. 

I suggest implementing a cache for ItemInfo batches. Conceptually such a cache 
would live inside jcr2spi right above the SPI API. The actual implementation 
would be provided by SPI implementations. This approach allows for fine tuning 
cache/batch sizes to a given persistence store and network environment. This 
would also better separate different concerns: the purpose of the existing item 
cache is to optimize for the requirement of the consumer of the JCR API ('the 
application'). The new ItemInfo cache is to optimize for the specific network 
environment and backing persistence store. 
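A minimal sketch of what such a batch cache could look like (hypothetical class and method names; the real interface would live in jackrabbit-spi and the implementation would come from the SPI implementation, as proposed above):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of an ItemInfo batch cache sitting just above the SPI layer:
// entries from a getItemInfos() batch are kept by item path until the
// JCR side actually asks for them, instead of being pushed into the
// hierarchy right away. Plain LRU eviction via an access-ordered
// LinkedHashMap; the bound would be tuned per backend and network.
public class ItemInfoCache<V> {

    private final Map<String, V> cache;

    public ItemInfoCache(final int maxEntries) {
        this.cache = new LinkedHashMap<String, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Called while receiving a batch from RepositoryService#getItemInfos. */
    public void put(String path, V itemInfo) {
        cache.put(path, itemInfo);
    }

    /** Returns the cached info or null; a hit avoids a network round trip. */
    public V get(String path) {
        return cache.get(path);
    }

    public int size() {
        return cache.size();
    }
}
```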

(*) Numbers follow 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Issue Comment Edited: (JCR-2497) Improve jcr2spi read performance

2010-02-16 Thread JIRA

[ 
https://issues.apache.org/jira/browse/JCR-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834199#action_12834199
 ] 

Michael Dürig edited comment on JCR-2497 at 2/16/10 1:18 PM:
-

Created JCR-2498 for 2)

  was (Author: mduerig):
Created for 2)
  
 Improve jcr2spi read performance 
 -

 Key: JCR-2497
 URL: https://issues.apache.org/jira/browse/JCR-2497
 Project: Jackrabbit Content Repository
  Issue Type: Improvement
  Components: jackrabbit-jcr2spi, jackrabbit-spi, 
 jackrabbit-spi-commons
Affects Versions: 2.1.0
Reporter: Michael Dürig
Assignee: Michael Dürig

 There are several issues with jcr2spi which affect read performance:
 1. The item cache is not hierarchy aware. See JCR-2442
 2. Processing of batches from RepositoryService#getItemInfos is expensive. 
 This is a reason for JCR-2461
 3. Non-existing items always cause a network round trip. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Jackrabbit 3: extracting same name sibling support from the core

2010-02-16 Thread Thomas Müller
Hi,

A very simple implementation of my idea:

http://h2database.com/p.html#e5e5d0fa3aabc42932e6065a37b1f6a8

The method hasSameNameSibling() is called for each remove(). If
it turns out to be a performance problem we could add a hidden
property in the first SNS node itself (only required there).
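As a rough illustration of the check being discussed (this is not the linked implementation; the data structure and names below are invented for the sketch):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch: decide whether a node has a same-name sibling
// by counting how often its name occurs among the parent's ordered
// child names. In the proposal, a check like this runs on each remove().
public class SnsDemo {

    /** parent path -> ordered child names (duplicates = same-name siblings) */
    static final Map<String, List<String>> CHILDREN = new HashMap<>();

    static boolean hasSameNameSibling(String parentPath, String childName) {
        List<String> names = CHILDREN.get(parentPath);
        if (names == null) {
            return false;
        }
        int count = 0;
        for (String n : names) {
            if (n.equals(childName)) {
                count++;
            }
        }
        return count > 1;
    }
}
```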

Does anybody see any other obvious problems?

Regards,
Thomas


[jira] Created: (JCR-2499) Add simple banchmarking tools for jcr2spi read performance

2010-02-16 Thread JIRA
Add simple banchmarking tools for jcr2spi read performance
--

 Key: JCR-2499
 URL: https://issues.apache.org/jira/browse/JCR-2499
 Project: Jackrabbit Content Repository
  Issue Type: Task
Affects Versions: 2.1.0
Reporter: Michael Dürig
Assignee: Michael Dürig




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (JCR-2499) Add simple banchmarking tools for jcr2spi read performance

2010-02-16 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/JCR-2499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Dürig updated JCR-2499:
---

Component/s: jackrabbit-jcr2spi

 Add simple banchmarking tools for jcr2spi read performance
 --

 Key: JCR-2499
 URL: https://issues.apache.org/jira/browse/JCR-2499
 Project: Jackrabbit Content Repository
  Issue Type: Task
  Components: jackrabbit-jcr2spi
Affects Versions: 2.1.0
Reporter: Michael Dürig
Assignee: Michael Dürig



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (JCR-2492) Garbage Collector remove data for active node

2010-02-16 Thread JIRA

[ 
https://issues.apache.org/jira/browse/JCR-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834205#action_12834205
 ] 

Cédric Chantepie commented on JCR-2492:
---

I'm still able to reproduce this trouble with the 42Gb datastore.
I've been able to do it once with a smaller datastore; I will try to figure out 
what exactly its cause is.

It seems that jackrabbit-core used by my RAR is 1.4 (not 1.4.5), even if other 
libs are 1.4.5.

Getting jackrabbit-1.4 from SVN, I have some doubt about something in 
org.apache.jackrabbit.core.persistence.bundle.BundleDbPersistenceManager::getAllNodeIds:

Statement stmt = connectionManager.executeStmt(sql, keys, false, maxCount + 10);

With + 10, an infinite maxCount (0) is turned into 10, so as far as I understand, 
getAllNodeIds asks its connectionManager to get all nodes, but with a query 
whose result is limited to 10 rows.

If I'm right, the GarbageCollector using getAllNodeIds from a given 
IterablePersistenceManager (scanPersistenceManagers) doesn't really get all 
nodes (due to the row limit), and so only some nodes are marked (date updated). 
Nodes not marked (not included in the retrieved rows) are then considered 
removable by the deleteUnused method of GarbageCollector.
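The suspect arithmetic can be shown in isolation (a simplified stand-in, not the actual Jackrabbit code; only the "maxCount + 10" expression comes from the report):

```java
// Simplified illustration of the suspected bug: when maxCount == 0 is
// supposed to mean "no limit", adding a fudge factor of 10 turns it
// into a hard limit of 10 rows, so a "full" scan silently sees only
// the first 10 nodes and everything else looks unreferenced to the GC.
public class MaxCountDemo {

    /** Mirrors the "maxCount + 10" from the report. */
    static int effectiveLimit(int maxCount) {
        return maxCount + 10;
    }

    public static void main(String[] args) {
        // maxCount == 0 was meant to be unlimited...
        System.out.println(effectiveLimit(0)); // ...but becomes a limit of 10
    }
}
```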

 Garbage Collector remove data for active node
 -

 Key: JCR-2492
 URL: https://issues.apache.org/jira/browse/JCR-2492
 Project: Jackrabbit Content Repository
  Issue Type: Bug
Affects Versions: core 1.4.5
 Environment: Linux 2.6.x (gentoo or fedora), JDK 1.5 (sun or 
 jrockit), JBoss 4.2.3.GA, Derby (10.4.1.3), PostgreSQL (8.1.11 or 8.0.3)
 * FileSystem = LocalFileSystem
 * custom AccessManager
 * PersistenceManager = PostgreSQLPersistenceManager
 * SearchIndex, textFilterClasses = 
 * DataStore = FileDataStore (minLogRecord = 100)
Reporter: Cédric Chantepie
Priority: Critical

 When we use GarbageCollector on a 42Gb datastore, GarbageCollector erase all 
 data.
 Back with node, none have any longer data : jcr:data was removed as data in 
 datastore no longer exist.
 On some smaller test repository, this trouble does not occur.
 We will try to update Jackrabbit version, but at least it could be good to 
 be sure what is really the trouble with GC in Jackrabbit 1.4.5 so that we can 
 be sure that updating it will really fix that.
 Thanks.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Reopened: (JCR-2492) Garbage Collector remove data for active node

2010-02-16 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/JCR-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cédric Chantepie reopened JCR-2492:
---


Can be reproduced by reporter (me). Trying to make a testcase that can be 
uploaded here. Really need to figure out whether the cause was fixed by a newer 
Jackrabbit revision, as this trouble makes the datastore remove active data.

 Garbage Collector remove data for active node
 -

 Key: JCR-2492
 URL: https://issues.apache.org/jira/browse/JCR-2492
 Project: Jackrabbit Content Repository
  Issue Type: Bug
Affects Versions: core 1.4.5
 Environment: Linux 2.6.x (gentoo or fedora), JDK 1.5 (sun or 
 jrockit), JBoss 4.2.3.GA, Derby (10.4.1.3), PostgreSQL (8.1.11 or 8.0.3)
 * FileSystem = LocalFileSystem
 * custom AccessManager
 * PersistenceManager = PostgreSQLPersistenceManager
 * SearchIndex, textFilterClasses = 
 * DataStore = FileDataStore (minLogRecord = 100)
Reporter: Cédric Chantepie
Priority: Critical

 When we use GarbageCollector on a 42Gb datastore, GarbageCollector erase all 
 data.
 Back with node, none have any longer data : jcr:data was removed as data in 
 datastore no longer exist.
 On some smaller test repository, this trouble does not occur.
 We will try to update Jackrabbit version, but at least it could be good to 
 be sure what is really the trouble with GC in Jackrabbit 1.4.5 so that we can 
 be sure that updating it will really fix that.
 Thanks.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (JCR-2492) Garbage Collector remove data for active node

2010-02-16 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/JCR-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cédric Chantepie updated JCR-2492:
--

Affects Version/s: (was: core 1.4.5)
   1.4

 Garbage Collector remove data for active node
 -

 Key: JCR-2492
 URL: https://issues.apache.org/jira/browse/JCR-2492
 Project: Jackrabbit Content Repository
  Issue Type: Bug
Affects Versions: 1.4
 Environment: Linux 2.6.x (gentoo or fedora), JDK 1.5 (sun or 
 jrockit), JBoss 4.2.3.GA, Derby (10.4.1.3), PostgreSQL (8.1.11 or 8.0.3)
 * FileSystem = LocalFileSystem
 * custom AccessManager
 * PersistenceManager = PostgreSQLPersistenceManager
 * SearchIndex, textFilterClasses = 
 * DataStore = FileDataStore (minLogRecord = 100)
Reporter: Cédric Chantepie
Priority: Critical

 When we use GarbageCollector on a 42Gb datastore, GarbageCollector erase all 
 data.
 Back with node, none have any longer data : jcr:data was removed as data in 
 datastore no longer exist.
 On some smaller test repository, this trouble does not occur.
 We will try to update Jackrabbit version, but at least it could be good to 
 be sure what is really the trouble with GC in Jackrabbit 1.4.5 so that we can 
 be sure that updating it will really fix that.
 Thanks.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (JCR-2499) Add simple benchmarking tools for jcr2spi read performance

2010-02-16 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/JCR-2499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Dürig updated JCR-2499:
---

Summary: Add simple benchmarking tools for jcr2spi read performance  (was: 
Add simple banchmarking tools for jcr2spi read performance)

 Add simple benchmarking tools for jcr2spi read performance
 --

 Key: JCR-2499
 URL: https://issues.apache.org/jira/browse/JCR-2499
 Project: Jackrabbit Content Repository
  Issue Type: Task
  Components: jackrabbit-jcr2spi
Affects Versions: 2.1.0
Reporter: Michael Dürig
Assignee: Michael Dürig
 Fix For: 2.1.0




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (JCR-2499) Add simple benchmarking tools for jcr2spi read performance

2010-02-16 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/JCR-2499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Dürig resolved JCR-2499.


   Resolution: Fixed
Fix Version/s: 2.1.0

Fixed at revision 910515  


 Add simple benchmarking tools for jcr2spi read performance
 --

 Key: JCR-2499
 URL: https://issues.apache.org/jira/browse/JCR-2499
 Project: Jackrabbit Content Repository
  Issue Type: Task
  Components: jackrabbit-jcr2spi
Affects Versions: 2.1.0
Reporter: Michael Dürig
Assignee: Michael Dürig
 Fix For: 2.1.0




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (JCR-2063) FileDataStore: garbage collection can delete files that are still needed

2010-02-16 Thread Thomas Mueller (JIRA)

[ 
https://issues.apache.org/jira/browse/JCR-2063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834216#action_12834216
 ] 

Thomas Mueller commented on JCR-2063:
-

A workaround for implementations where this is not fixed is:

gc.mark();
try {
    // sleep to ensure the last modified time is updated,
    // even for file systems with a lower time resolution
    Thread.sleep(5000);
} catch (Exception e) {
    // can not ignore, otherwise data that is in use may be deleted
    throw new RepositoryException("Interrupted");
}
gc.mark();



 FileDataStore: garbage collection can delete files that are still needed
 

 Key: JCR-2063
 URL: https://issues.apache.org/jira/browse/JCR-2063
 Project: Jackrabbit Content Repository
  Issue Type: Bug
  Components: jackrabbit-core
Reporter: Thomas Mueller
Assignee: Thomas Mueller
 Fix For: 1.5.5


 It looks like the FileDataStore garbage collection (both regular scan and 
 persistence manager scan) can delete files that are still needed.
 Currently it looks like the reason is the last access time resolution of the 
 operating system. This is 2 seconds for FAT and Mac OS X, 100 ns for NTFS, and 1 
 second for other file systems. That means files that are scanned at the very 
 beginning are sometimes deleted, because they have a later last modified time 
 than when the scan was started.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (JCR-2492) Garbage Collector remove data for active node

2010-02-16 Thread Thomas Mueller (JIRA)

 [ 
https://issues.apache.org/jira/browse/JCR-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Mueller resolved JCR-2492.
-

Resolution: Fixed

There are other problems with version 1.4.x; see also JCR-1414 and especially 
JCR-2063, which was not backported to 1.4.x. See also the comment there for a 
workaround.

Please re-open the bug if you can still reproduce it.


 Garbage Collector remove data for active node
 -

 Key: JCR-2492
 URL: https://issues.apache.org/jira/browse/JCR-2492
 Project: Jackrabbit Content Repository
  Issue Type: Bug
Affects Versions: 1.4
 Environment: Linux 2.6.x (gentoo or fedora), JDK 1.5 (sun or 
 jrockit), JBoss 4.2.3.GA, Derby (10.4.1.3), PostgreSQL (8.1.11 or 8.0.3)
 * FileSystem = LocalFileSystem
 * custom AccessManager
 * PersistenceManager = PostgreSQLPersistenceManager
 * SearchIndex, textFilterClasses = 
 * DataStore = FileDataStore (minLogRecord = 100)
Reporter: Cédric Chantepie
Priority: Critical

 When we use GarbageCollector on a 42Gb datastore, GarbageCollector erase all 
 data.
 Back with node, none have any longer data : jcr:data was removed as data in 
 datastore no longer exist.
 On some smaller test repository, this trouble does not occur.
 We will try to update Jackrabbit version, but at least it could be good to 
 be sure what is really the trouble with GC in Jackrabbit 1.4.5 so that we can 
 be sure that updating it will really fix that.
 Thanks.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (JCR-2493) Unit tests for persistence managers

2010-02-16 Thread Jukka Zitting (JIRA)

 [ 
https://issues.apache.org/jira/browse/JCR-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jukka Zitting resolved JCR-2493.


   Resolution: Fixed
Fix Version/s: 2.1.0

Patch committed in revision 910526. Good idea about in-memory databases. I 
updated the H2 JDBC URLs.

 Unit tests for persistence managers
 ---

 Key: JCR-2493
 URL: https://issues.apache.org/jira/browse/JCR-2493
 Project: Jackrabbit Content Repository
  Issue Type: Improvement
  Components: jackrabbit-core
Reporter: Jukka Zitting
Assignee: Jukka Zitting
Priority: Minor
 Fix For: 2.1.0

 Attachments: JCR-2493.patch


 Currently we only test our persistence managers indirectly via JCR-level test 
 cases. The downside of this approach is that we can only test one persistence 
 manager implementation at a time, and need separate build profiles to switch 
 from one implementation to another. To ensure better coverage and consistent 
 behaviour across all our persistence managers I implemented a simple unit 
 test that works directly against the PersistenceManager interface.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (JCR-2483) Out of memory error while adding a new host due to large number of revisions

2010-02-16 Thread Jukka Zitting (JIRA)

 [ 
https://issues.apache.org/jira/browse/JCR-2483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jukka Zitting updated JCR-2483:
---

Affects Version/s: 1.6.0
Fix Version/s: (was: 1.6.0)

 Out of memory error while adding a new host due to large number of revisions
 

 Key: JCR-2483
 URL: https://issues.apache.org/jira/browse/JCR-2483
 Project: Jackrabbit Content Repository
  Issue Type: Improvement
  Components: clustering
Affects Versions: 1.6.0
 Environment: MySQL DB. 512 MB memory allocated to java app.
Reporter: aasoj
 Attachments: patch


 In a cluster deployment, revisions are saved in Journal Table in the DB. 
 After a while a huge number of revisions can get created (around 70 k in our 
 test). When a new host is added to the cluster, it tries to read all the 
 revisions and hence the following error:
 Caused by: java.lang.OutOfMemoryError: Java heap space
 at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2931)
 at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2871)
 at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3414)
 at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:910)
 at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1405)
 at com.mysql.jdbc.MysqlIO.readSingleRowSet(MysqlIO.java:2816)
 at com.mysql.jdbc.MysqlIO.getResultSet(MysqlIO.java:467)
 at 
 com.mysql.jdbc.MysqlIO.readResultsForQueryOrUpdate(MysqlIO.java:2510)
 at com.mysql.jdbc.MysqlIO.readAllResults(MysqlIO.java:1746)
 at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2135)
 at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2542)
 at 
 com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1734)
 at 
 com.mysql.jdbc.PreparedStatement.execute(PreparedStatement.java:995)
 at 
 org.apache.jackrabbit.core.journal.DatabaseJournal.getRecords(DatabaseJournal.java:460)
 at 
 org.apache.jackrabbit.core.journal.AbstractJournal.doSync(AbstractJournal.java:201)
 at 
 org.apache.jackrabbit.core.journal.AbstractJournal.sync(AbstractJournal.java:188)
 at 
 org.apache.jackrabbit.core.cluster.ClusterNode.sync(ClusterNode.java:329)
 at 
 org.apache.jackrabbit.core.cluster.ClusterNode.start(ClusterNode.java:270)
 This can also happen to an existing host in the cluster when the number of 
 revisions returned is very high.
 Possible solutions:
 1. Cleaning old revisions using Janitor thread: This may be good for new 
 hosts. But it will fail in a scenario when sync delay is high (few hours) and 
 number of updates is high in existing hosts in the cluster
 2. Increases memory allocated to Java process: This is not a feasible option 
 always
 3. Limit the number of updates read from the DB in any cycle.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (JCR-2492) Garbage Collector remove data for active node

2010-02-16 Thread JIRA

[ 
https://issues.apache.org/jira/browse/JCR-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834234#action_12834234
 ] 

Cédric Chantepie commented on JCR-2492:
---

I think the main cause of this trouble is here: 
http://svn.apache.org/viewvc/jackrabbit/branches/1.4/jackrabbit-core/src/main/java/org/apache/jackrabbit/core/persistence/bundle/BundleDbPersistenceManager.java?p2=%2Fjackrabbit%2Fbranches%2F1.4%2Fjackrabbit-core%2Fsrc%2Fmain%2Fjava%2Forg%2Fapache%2Fjackrabbit%2Fcore%2Fpersistence%2Fbundle%2FBundleDbPersistenceManager.java&p1=%2Fjackrabbit%2Fbranches%2F1.4%2Fjackrabbit-core%2Fsrc%2Fmain%2Fjava%2Forg%2Fapache%2Fjackrabbit%2Fcore%2Fpersistence%2Fbundle%2FBundleDbPersistenceManager.java&r1=633844&r2=633843&view=diff&pathrev=633844


 Garbage Collector remove data for active node
 -

 Key: JCR-2492
 URL: https://issues.apache.org/jira/browse/JCR-2492
 Project: Jackrabbit Content Repository
  Issue Type: Bug
Affects Versions: 1.4
 Environment: Linux 2.6.x (gentoo or fedora), JDK 1.5 (sun or 
 jrockit), JBoss 4.2.3.GA, Derby (10.4.1.3), PostgreSQL (8.1.11 or 8.0.3)
 * FileSystem = LocalFileSystem
 * custom AccessManager
 * PersistenceManager = PostgreSQLPersistenceManager
 * SearchIndex, textFilterClasses = 
 * DataStore = FileDataStore (minLogRecord = 100)
Reporter: Cédric Chantepie
Priority: Critical

 When we use GarbageCollector on a 42Gb datastore, GarbageCollector erase all 
 data.
 Back with node, none have any longer data : jcr:data was removed as data in 
 datastore no longer exist.
 On some smaller test repository, this trouble does not occur.
 We will try to update Jackrabbit version, but at least it could be good to 
 be sure what is really the trouble with GC in Jackrabbit 1.4.5 so that we can 
 be sure that updating it will really fix that.
 Thanks.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (JCR-1414) Data store garbage collection: inUse not correctly synchronized

2010-02-16 Thread Thomas Mueller (JIRA)

[ 
https://issues.apache.org/jira/browse/JCR-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834241#action_12834241
 ] 

Thomas Mueller commented on JCR-1414:
-

Revision 633844 also fixed a bug in BundleDbPersistenceManager, which caused 
data store garbage collection 
to delete almost all data when using a BundleDbPersistenceManager. See also 
JCR-2492.

 Data store garbage collection: inUse not correctly synchronized
 ---

 Key: JCR-1414
 URL: https://issues.apache.org/jira/browse/JCR-1414
 Project: Jackrabbit Content Repository
  Issue Type: Bug
  Components: jackrabbit-core
Affects Versions: 1.4, core 1.4.1
Reporter: Thomas Mueller
Assignee: Thomas Mueller
 Fix For: core 1.4.2


 Access to the fields DbDataStore.inUse and FileDataStore.inUse is not 
 synchronized.
 This is a problem when concurrently calling GarbageCollector.deleteUnused() 
 and accessing the data store (ConcurrentModificationException is thrown).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (JCR-2492) Garbage Collector remove data for active node

2010-02-16 Thread Thomas Mueller (JIRA)

[ 
https://issues.apache.org/jira/browse/JCR-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834242#action_12834242
 ] 

Thomas Mueller commented on JCR-2492:
-

Hi,

I think you are right. I have added a comment to JCR-1414 about this.
So I guess this makes it a duplicate of JCR-1414.

A workaround is to disable the PersistenceManager scan using 
GarbageCollector.setPersistenceManagerScan(false), 
however this will not solve the other problems of JCR-1414 and JCR-2063.


 Garbage Collector remove data for active node
 -

 Key: JCR-2492
 URL: https://issues.apache.org/jira/browse/JCR-2492
 Project: Jackrabbit Content Repository
  Issue Type: Bug
Affects Versions: 1.4
 Environment: Linux 2.6.x (gentoo or fedora), JDK 1.5 (sun or 
 jrockit), JBoss 4.2.3.GA, Derby (10.4.1.3), PostgreSQL (8.1.11 or 8.0.3)
 * FileSystem = LocalFileSystem
 * custom AccessManager
 * PersistenceManager = PostgreSQLPersistenceManager
 * SearchIndex, textFilterClasses = 
 * DataStore = FileDataStore (minLogRecord = 100)
Reporter: Cédric Chantepie
Priority: Critical

 When we use GarbageCollector on a 42Gb datastore, GarbageCollector erase all 
 data.
 Back with node, none have any longer data : jcr:data was removed as data in 
 datastore no longer exist.
 On some smaller test repository, this trouble does not occur.
 We will try to update Jackrabbit version, but at least it could be good to 
 be sure what is really the trouble with GC in Jackrabbit 1.4.5 so that we can 
 be sure that updating it will really fix that.
 Thanks.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (JCR-2492) Garbage Collector remove data for active node

2010-02-16 Thread JIRA

[ 
https://issues.apache.org/jira/browse/JCR-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834247#action_12834247
 ] 

Cédric Chantepie commented on JCR-2492:
---

I will try Jackrabbit 2.0.0 rather than the workaround for 1.4.
Thanks, it's clear now.




[jira] Commented: (JCR-2498) Implement caching mechanism for ItemInfo batches

2010-02-16 Thread JIRA

[ 
https://issues.apache.org/jira/browse/JCR-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12834250#action_12834250
 ] 

Michael Dürig commented on JCR-2498:


As promised, some numbers. All measurements were done using 
ReadPerformanceTest.java [1]. 

Batch size:     24340, 12170, 6085, 3043, 1521, 761, 380, 190, 95, 48, 24, 12, 6, 3, 1
ms per request: 20.2, 24.2, 17.4, 16.3, 7.3, 3.0, 2.5, 2.1, 2.0, 1.3, 1.3, 1.1, 1.0, 1.0, 1.1

The performance impact of large batches is clearly visible here. Without 
refresh operations [2] the picture remains similar but less pronounced:

Batch size:     24340, 12170, 6085, 3043, 1521, 761, 380, 190, 95, 48, 24, 12, 6, 3, 1
ms per request: 5.1, 17.1, 16.3, 12.0, 6.0, 2.6, 2.7, 2.0, 2.0, 1.4, 1.4, 1.2, 1.0, 1.1, 1.3


[1] 
http://svn.apache.org/viewvc/jackrabbit/trunk/jackrabbit-jcr2spi/src/test/java/org/apache/jackrabbit/jcr2spi/benchmark/ReadPerformanceTest.java?revision=910523view=markuppathrev=910523

[2] See upcoming patch



 Implement caching mechanism for ItemInfo batches
 

 Key: JCR-2498
 URL: https://issues.apache.org/jira/browse/JCR-2498
 Project: Jackrabbit Content Repository
  Issue Type: Improvement
  Components: jackrabbit-jcr2spi, jackrabbit-spi
Reporter: Michael Dürig
Assignee: Michael Dürig

 Currently all ItemInfos returned by RepositoryService#getItemInfos are placed 
 into the hierarchy right away. For big batch sizes this is prohibitively 
 expensive. The overhead is so great (*) that it quickly outweighs the 
 cost of network round trips. Moreover, SPI implementations usually choose 
 the batch in a way determined by the backing persistence store, not by the 
 requirements of the consuming application on the JCR side. That is, many of 
 the items in a batch might never actually be needed. 
 I suggest implementing a cache for ItemInfo batches. Conceptually such a 
 cache would live inside jcr2spi, right above the SPI API. The actual 
 implementation would be provided by SPI implementations. This approach allows 
 fine-tuning cache/batch sizes to a given persistence store and network 
 environment. It would also better separate concerns: the purpose of the 
 existing item cache is to optimize for the requirements of the consumer of 
 the JCR API ('the application'), while the new ItemInfo cache optimizes for 
 the specific network environment and backing persistence store. 
 (*) Numbers follow 
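As a rough illustration of the proposal above, a cache of this kind could look like the sketch below. All names (ItemInfoBatchCache, its type parameters, maxSize) are hypothetical placeholders; the interface actually added to jackrabbit-spi may differ. The key point the sketch shows is that cached infos are held aside and only pulled into the hierarchy on demand:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of an ItemInfo batch cache living in jcr2spi,
// right above the SPI API. Keys and values are placeholders for
// item identifiers and ItemInfo objects.
public class ItemInfoBatchCache<K, V> {
    private final int maxSize;
    private final Map<K, V> entries = new HashMap<K, V>();

    public ItemInfoBatchCache(int maxSize) {
        this.maxSize = maxSize;
    }

    /** Cache an ItemInfo from a batch without building hierarchy entries. */
    public void put(K itemId, V info) {
        if (entries.size() < maxSize) {
            entries.put(itemId, info);
        }
    }

    /** Returns the cached info, or null so the caller falls back to the SPI. */
    public V get(K itemId) {
        return entries.get(itemId);
    }
}
```

An SPI implementation would size the cache to match its preferred batch size, so that a batch fetched for one item can serve subsequent lookups without further round trips.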




[jira] Commented: (JCR-2498) Implement caching mechanism for ItemInfo batches

2010-02-16 Thread JIRA

[ 
https://issues.apache.org/jira/browse/JCR-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12834251#action_12834251
 ] 

Michael Dürig commented on JCR-2498:


Here's the patch mentioned in [2] above. 

Index: 
src/test/java/org/apache/jackrabbit/jcr2spi/benchmark/ReadPerformanceTest.java
===
--- 
src/test/java/org/apache/jackrabbit/jcr2spi/benchmark/ReadPerformanceTest.java
+++ 
src/test/java/org/apache/jackrabbit/jcr2spi/benchmark/ReadPerformanceTest.java
@@ -136,7 +136,7 @@
 final List<Item> items = new ArrayList<Item>();
 
 for (int k = 0; k < count; k++) {
-switch (rnd.nextInt(4)) {
+switch (rnd.nextInt(3)) {
 case 0: { // getItem
 callables.add(new Callable<Long>() {
 public Long call() throws Exception {





[jira] Commented: (JCR-2498) Implement caching mechanism for ItemInfo batches

2010-02-16 Thread JIRA

[ 
https://issues.apache.org/jira/browse/JCR-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12834272#action_12834272
 ] 

Michael Dürig commented on JCR-2498:


Some more numbers demonstrating the effect with JCR-2498-poc.patch applied. The 
'new/old time' row gives the quotients of the request times with the patch 
applied vs. without the patch applied. The 'new/old rts' row gives the 
quotients of the network round trips with the patch applied vs. without the 
patch applied. 

The first measurement includes all operations (getItem, getNode, getProperty 
and refresh) as above. 

Batch size:   24340, 12170, 6085, 3043, 1521, 761, 380, 190, 95, 48, 24, 12, 6, 3, 1
new/old time: 0.1, 0.1, 0.1, 0.1, 0.2, 0.3, 0.4, 0.5, 0.5, 0.7, 0.6, 1, 1, 1.1, 0.8
new/old rts:  2.1, 2.8, 1.8, 2.4, 1.8, 1.4, 1.3, 1.2, 1, 1.1, 1, 1, 0.9, 1, 0.9

Most obvious is the vast performance increase (up to a factor of 10) for reading 
items. However, this comes with an increase in the number of network round 
trips. Three things should be noted here: 1. For realistic batch sizes the 
increase in round trips is not significant. 2. The additional round trips are 
caused by the refresh operations; in the test scenario the number of refresh 
operations is unrealistically high (every fourth operation is a refresh). 
3. The items in the test batches are not realistically distributed across the 
repository: they are chosen at random. In practice the items in a batch would 
be related to each other by some locality criterion, which I assume would 
further mitigate the observed effect. 

For completeness' sake, here is the same measurement as above but without 
refresh operations: 

Batch size:   24340, 12170, 6085, 3043, 1521, 761, 380, 190, 95, 48, 24, 12, 6, 3, 1
new/old time: 0.2, 0, 0, 0.1, 0.1, 0.2, 0.4, 0.4, 0.6, 0.6, 0.7, 1, 1, 1, 1.1
new/old rts:  1, 1, 0.9, 0.9, 0.8, 0.9, 0.9, 0.9, 0.9, 1, 1, 1, 1, 1, 1
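For clarity on how the ratio rows above are derived: each entry is the element-wise quotient of the measurement with the patch applied over the measurement without it, rounded to one decimal. A small self-contained sketch (the sample values are placeholders, not the measured numbers):

```java
import java.util.Arrays;

// Sketch: derive a 'new/old' ratio row from two measurement series,
// element by element, rounded to one decimal place.
public class Ratios {
    static double[] ratios(double[] withPatch, double[] withoutPatch) {
        double[] r = new double[withPatch.length];
        for (int i = 0; i < withPatch.length; i++) {
            r[i] = Math.round(withPatch[i] / withoutPatch[i] * 10) / 10.0;
        }
        return r;
    }

    public static void main(String[] args) {
        // Placeholder values: 2.0 ms with patch vs 20.0 ms without -> ratio 0.1
        double[] r = ratios(new double[] {2.0, 1.0}, new double[] {20.0, 1.0});
        System.out.println(Arrays.toString(r));
    }
}
```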





[jira] Commented: (JCR-2498) Implement caching mechanism for ItemInfo batches

2010-02-16 Thread angela (JIRA)

[ 
https://issues.apache.org/jira/browse/JCR-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12834324#action_12834324
 ] 

angela commented on JCR-2498:
-

Although I didn't look at the poc-patch in detail, based on our f2f 
discussion it looks reasonable to me :)






[jira] Updated: (JCR-2426) Deadlock in lucene (Jackrabbit 1.4.4)

2010-02-16 Thread Antonio Martinez (JIRA)

 [ 
https://issues.apache.org/jira/browse/JCR-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antonio Martinez updated JCR-2426:
--

Attachment: deadlock_2nd_setup.txt

Both setups use the same JVM version; the thread dump from the second setup 
is attached (see deadlock_2nd_setup.txt).

 Deadlock in lucene (Jackrabbit 1.4.4)
 -

 Key: JCR-2426
 URL: https://issues.apache.org/jira/browse/JCR-2426
 Project: Jackrabbit Content Repository
  Issue Type: Bug
  Components: indexing
Affects Versions: core 1.4.4
Reporter: Antonio Martinez
Priority: Blocker
 Attachments: deadlock_2nd_setup.txt, deadlock_summary.txt


 We get a deadlock in the Lucene part of Jackrabbit (see deadlock_summary.txt).
 This issue has been observed in two different production setups running 
 Jackrabbit 1.4.4 in a cluster configuration.
