Sling BOF at ApacheCon Atlanta

2007-08-30 Thread Felix Meschberger
Hi all,

Just wanted to spread the word that we are listed for a BOF at ApacheCon07 in
Atlanta. If you are interested in talking about the new Sling project
and its future directions, please show your interest by bumping up the
counter at [1]. Thanks.

Regards
Felix

[1] http://wiki.apache.org/apachecon/BirdsOfaFeatherUs07



RE: Database connections & queries

2007-08-30 Thread Martijn Hendriks
Hi,

 -Original Message-
 From: Jukka Zitting [mailto:[EMAIL PROTECTED] 

 On 8/29/07, Martijn Hendriks [EMAIL PROTECTED] wrote:
 That's a nice idea! But wouldn't it be confusing that one can get a
 Node object through the nextNode() method which does not exist in the
 repository anymore?
 
 The search index should always be in sync with the persistent 
 state, so such situations should not happen.

If we don't take a clustered setup into account, then indeed the search
index should always be in sync with the persistent state. The ScoreNode
objects in the LazyQueryResultImpl's resultNodes field are, however, in
general not in sync with the persistent state, as they are always loaded
during the construction of the LazyQueryResultImpl (because
SearchIndex.getResultFetchSize() currently returns Integer.MAX_VALUE).
Thus, after construction of the LazyQueryResultImpl, another thread could
remove nodes that are in the resultNodes array of the
LazyQueryResultImpl, with the behaviour described above as a consequence.
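The race described above can be modelled with a short, self-contained sketch. The class and method names below are illustrative only, not Jackrabbit's actual implementation: a result holds node IDs captured at query time, and entries whose node has since been deleted are skipped with a warning.

```java
import java.util.*;

// Simplified model of a lazily resolved query result: IDs are captured at
// query construction time, while another thread may delete nodes before the
// result is consumed. Hypothetical names, not Jackrabbit's actual classes.
public class LazyResultSketch {
    // Stands in for the persistent store (UUID -> node content).
    static final Map<String, String> STORE = new HashMap<>();

    // Resolves pre-fetched IDs, logging a warning and skipping entries whose
    // node has been removed since the result was constructed.
    static List<String> resolve(List<String> idsAtQueryTime) {
        List<String> nodes = new ArrayList<>();
        for (String id : idsAtQueryTime) {
            String node = STORE.get(id);
            if (node == null) {
                System.err.println("WARNING: node not found: " + id); // stale entry
            } else {
                nodes.add(node);
            }
        }
        return nodes;
    }

    public static void main(String[] args) {
        STORE.put("a", "nodeA");
        STORE.put("b", "nodeB");
        List<String> ids = Arrays.asList("a", "b"); // result built here
        STORE.remove("b");                          // concurrent delete
        System.out.println(resolve(ids));           // prints [nodeA]
    }
}
```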

Best wishes,

Martijn


Re: Database connections & queries

2007-08-30 Thread Jukka Zitting
Hi,

On 8/30/07, Martijn Hendriks [EMAIL PROTECTED] wrote:
 If we don't take a clustered setup into account, then indeed the search
 index should always be in sync with the persistent state. The ScoreNode
 objects in the LazyQueryResultImpls resultNodes field are, however, in
 general not in sync with the persistent state as they are always loaded
 during the construction of the LazyQueryResultImpl (because the
 Searchindex.getResultFetchSize() currently returns Integer.MAX_VALUE).
 Thus, after construction of the LazyQueryResultImpl another thread could
 remove nodes that are in the resultNodes array of the
 LazyQueryResultImpl with the behaviour described above as a consequence.

I don't see that as a big problem. It's roughly equivalent to the
following case:

Session sessionA = ...;
Session sessionB = ...;

Node node = sessionA.getRootNode().getNode("path/to/node");

sessionB.getRootNode().getNode("path/to/node").remove();
sessionB.save();

node.getProperty(...);

BR,

Jukka Zitting


RE: Database connections & queries

2007-08-30 Thread Martijn Hendriks
Hi,

I agree that it isn't a big problem; your example shows that roughly
equivalent behaviour can already occur.

We do have a very concrete problem which is related: an installation of
our product which uses Jackrabbit for persistence keeps logging lots of
warnings (in the order of thousands per day):

Aug 29, 2007 4:56:33 PM
org.apache.jackrabbit.core.query.lucene.LazyQueryResultImpl$LazyScoreNod
eIterator fetchNext
WARNING: Exception retrieving Node with UUID:
b11aa8a2-beed-4d24-95a0-592b6b193534: javax.jcr.ItemNotFoundException:
b11aa8a2-beed-4d24-95a0-592b6b193534

Re-building the search index does not help: the logging of these
warnings just continues. I just can't believe that these result from a
save that throws away many nodes while another thread loops over a query
result. Any thoughts on this?

Best regards,

Martijn

--

Martijn Hendriks
GX creative online development B.V.
 
t: 024 - 3888 261
f: 024 - 3888 621
e: [EMAIL PROTECTED]
 
Wijchenseweg 111
6538 SW Nijmegen
http://www.gx.nl/  



Re: Database connections & queries

2007-08-30 Thread Jukka Zitting
Hi,

On 8/30/07, Martijn Hendriks [EMAIL PROTECTED] wrote:
 We do have a very concrete problem which is related: an installation of
 our product which uses Jackrabbit for persistence keeps logging lots of
 warnings (in the order of thousands per day):

 Aug 29, 2007 4:56:33 PM
 org.apache.jackrabbit.core.query.lucene.LazyQueryResultImpl$LazyScoreNod
 eIterator fetchNext
 WARNING: Exception retrieving Node with UUID:
 b11aa8a2-beed-4d24-95a0-592b6b193534: javax.jcr.ItemNotFoundException:
 b11aa8a2-beed-4d24-95a0-592b6b193534

 Re-building the search index does not help: the logging of these
 warnings just continuous. I just can't believe that these result from a
 save that throws away many nodes while another thread loops over a query
 result. Any thoughts on this?

That seems worrisome... Could there be a bug in the search index
update code for deleting entries? Something like that could easily
have been ignored so far if the only effect is a warning in the log
due to the current approach of simply dropping the results for which
an exception is thrown.

BR,

Jukka Zitting


Re: Sling BOF at ApacheCon Atlanta

2007-08-30 Thread Alexandru Popescu ☀
That's cool, but unfortunately I will not be able to make it :-(.

./alex
--
.w( the_mindstorm )p.


[jira] Commented: (JCR-926) Global data store for binaries

2007-08-30 Thread Thomas Mueller (JIRA)

[ 
https://issues.apache.org/jira/browse/JCR-926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12523807
 ] 

Thomas Mueller commented on JCR-926:


Revision 571094: global data store: new in-memory data store and temp file 
BLOB.
The data store can now be tested; however, it is disabled by default. 
To enable it, set the system property org.jackrabbit.useDataStore to true
before starting the application: java -Dorg.jackrabbit.useDataStore=true ...
(This does not work, and I'm not sure why: mvn -Dorg.jackrabbit.useDataStore=true ...)

 Global data store for binaries
 --

 Key: JCR-926
 URL: https://issues.apache.org/jira/browse/JCR-926
 Project: Jackrabbit
  Issue Type: New Feature
  Components: core
Reporter: Jukka Zitting
 Attachments: dataStore.patch, DataStore.patch, DataStore2.patch, 
 dataStore3.patch, dataStore4.zip, dataStore5-garbageCollector.patch, 
 internalValue.patch, ReadWhileSaveTest.patch


 There are three main problems with the way Jackrabbit currently handles large 
 binary values:
 1) Persisting a large binary value blocks access to the persistence layer for 
 extended amounts of time (see JCR-314)
 2) At least two copies of binary streams are made when saving them through 
 the JCR API: one in the transient space, and one when persisting the value
 3) Versioning and copy operations on nodes or subtrees that contain large 
 binary values can quickly end up consuming excessive amounts of storage space.
 To solve these issues (and to get other nice benefits), I propose that we 
 implement a global data store concept in the repository. A data store is an 
 append-only set of binary values that uses short identifiers to identify and 
 access the stored binary values. The data store would trivially fit the 
 requirements of transient space and transaction handling due to the 
 append-only nature. An explicit mark-and-sweep garbage collection process 
 could be added to avoid concerns about storing garbage values.
 See the recent NGP value record discussion, especially [1], for more 
 background on this idea.
 [1] 
 http://mail-archives.apache.org/mod_mbox/jackrabbit-dev/200705.mbox/[EMAIL 
 PROTECTED]

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
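The append-only, identifier-addressed idea in the proposal can be sketched in a few lines. This is a toy model under stated assumptions (SHA-256 digests as identifiers, an in-memory map as storage), not the actual Jackrabbit implementation:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Toy model of an append-only data store: binary values are addressed by a
// short identifier derived from their content, identical binaries are stored
// once, and records are never modified in place.
public class DataStoreSketch {
    private final Map<String, byte[]> records = new HashMap<>();

    // Appends a binary value and returns its identifier (content digest).
    public String addRecord(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder id = new StringBuilder();
            for (byte b : md.digest(data)) id.append(String.format("%02x", b));
            records.putIfAbsent(id.toString(), data.clone()); // append-only: no overwrite
            return id.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public byte[] getRecord(String id) {
        return records.get(id);
    }

    public int size() { return records.size(); }

    public static void main(String[] args) {
        DataStoreSketch store = new DataStoreSketch();
        String id1 = store.addRecord("large binary".getBytes(StandardCharsets.UTF_8));
        String id2 = store.addRecord("large binary".getBytes(StandardCharsets.UTF_8));
        // Same content -> same identifier, stored once.
        System.out.println(id1.equals(id2) + " " + store.size()); // prints: true 1
    }
}
```

Because copying or versioning a node then only needs to copy the identifier, not the binary itself, this addresses problems 2) and 3) above directly.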



[jira] Commented: (JCR-926) Global data store for binaries

2007-08-30 Thread JIRA

[ 
https://issues.apache.org/jira/browse/JCR-926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12523809
 ] 

Claus Köll commented on JCR-926:


hi thomas,
first ... great work !
can you explain how to configure the datastore or is this feature not yet 
implemented ?
i mean will the datastore be configureable in the workspace.xml  .. i think,so 
each workspace can have its own
datastore to define different backup solutions ..
thanks
claus





[jira] Commented: (JCR-926) Global data store for binaries

2007-08-30 Thread Jukka Zitting (JIRA)

[ 
https://issues.apache.org/jira/browse/JCR-926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12523810
 ] 

Jukka Zitting commented on JCR-926:
---

Nice work!

 (this does not work, not sure why: mvn -Dorg.jackrabbit.useDataStore=true ...)

Maven probably forks a separate JVM instance for running the test suite.




Re: [VOTE] Approve the Sling project for incubation

2007-08-30 Thread David Nuescheler
+1

regards,
david


[jira] Commented: (JCR-1099) jcr2spi NodeEntryImpl.getPath() blows stack due to getIndex() calling itself

2007-08-30 Thread Julian Reschke (JIRA)

[ 
https://issues.apache.org/jira/browse/JCR-1099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12523824
 ] 

Julian Reschke commented on JCR-1099:
-

Could you try changing ItemDefinitionProviderImpl.getQNodeDefinition() to catch 
a ConstraintViolationException instead of RepositoryException? 


 jcr2spi NodeEntryImpl.getPath() blows stack due to getIndex() calling itself
 

 Key: JCR-1099
 URL: https://issues.apache.org/jira/browse/JCR-1099
 Project: Jackrabbit
  Issue Type: Bug
  Components: SPI
Affects Versions: 1.4
Reporter: David Rauschenbach
 Attachments: repository.xml


 The jcr2spi NodeEntryImpl class contains logic that causes getIndex() to call 
 itself.
 Calling code:
 Session sess = repo.login(creds);
 Node inboxNode = sess.getRootNode().getNode("Inbox");
 inboxNode.getPath(); <== blows stack
 Tracing reveals:
 1. NodeEntryImpl.getPath() ultimately calls getIndex()
 2. getIndex() calls NodeState.getDefinition()
 3. which calls ItemDefinitionProviderImpl.getQNodeDefinition(...)
 4. which catches a RepositoryException then calls 
 NodeEntryImpl.getWorkspaceId()
 5. which calls NodeEntryImpl.getWorkspaceIndex()
 6. which calls getIndex() (back to step 2, ad infinitum)
 Configuration:
 1. A configuration is loaded specifying in-memory persist manager
 2. Config is wrapped in TransientRepository
 3. that's wrapped in spi2jcr's RepositoryService using default 
 BatchReadConfig
 4. a jcr2spi provider is instantiated that directly couples to spi2jcr
 5. Node in question is created as follows:
 Session sess = repo.login(creds);
 sess.getRootNode().addNode("Inbox", "nt:folder");
 sess.save();
 I guess that's about it.
 David

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
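The call cycle in the trace above (getIndex() needs the node definition, and the error path of the definition lookup calls getIndex() again) can be illustrated with a re-entrancy guard. This is one generic way such a cycle is broken, sketched with illustrative names; it is not the actual jcr2spi fix:

```java
// Illustrative sketch, not the real jcr2spi classes: getIndex() needs the
// definition, and the fallback path for a failed definition lookup needs
// getIndex() again. A simple re-entrancy guard breaks the cycle.
public class RecursionSketch {
    private boolean resolvingIndex = false;

    int getIndex() {
        if (resolvingIndex) {
            // Guard: already inside getIndex(); fall back to a default
            // index instead of recursing forever (StackOverflowError).
            return 1;
        }
        resolvingIndex = true;
        try {
            return getDefinitionBasedIndex();
        } finally {
            resolvingIndex = false;
        }
    }

    // Stands in for the definition lookup whose error path calls back
    // into getIndex() (steps 2-6 in the trace).
    private int getDefinitionBasedIndex() {
        return getIndex(); // without the guard this would never terminate
    }

    public static void main(String[] args) {
        System.out.println(new RecursionSketch().getIndex()); // prints 1
    }
}
```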



[jira] Commented: (JCR-926) Global data store for binaries

2007-08-30 Thread JIRA

[ 
https://issues.apache.org/jira/browse/JCR-926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12523831
 ] 

Claus Köll commented on JCR-926:


ok, great work to both of you :-)
i think only one data store is not a good way.
we have jackrabbit running in a model 3 architecture with one repository and 
one workspace for each application.
the problem is not the backup solution; we would prefer incremental backups by 
third-party solutions.
at the moment we define different persistence managers for each workspace, 
pointing to different db servers.
we want to define different SAN storage places for each workspace application, 
because we must charge the storage volume to each application. if we have only 
one data store it is not possible to know how much
space each application consumes.
hope for a feature to define that.

one thing at the end ... what do you mean by 
 "Deleted only when no longer used (by the garbage collector)." 
are the files in the data store the permanent files or not ?
thanks
claus




Re: [VOTE] Approve the Sling project for incubation

2007-08-30 Thread Thomas Mueller
+1

Thomas


[jira] Commented: (JCR-926) Global data store for binaries

2007-08-30 Thread Thomas Mueller (JIRA)

[ 
https://issues.apache.org/jira/browse/JCR-926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12523835
 ] 

Thomas Mueller commented on JCR-926:


Hi,

 one repository and for each application one workspace.

Why not one repository for each application? That way you can limit the heap 
memory as well.

 the files in the datastore are the permanent files

If things get deleted, the space must eventually be reclaimed (unless you work 
for a hard drive company).

Thomas
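The "deleted only when no longer used" point can be made concrete with a small sketch of the mark-and-sweep garbage collection mentioned in the issue. Names and data shapes here are illustrative assumptions, not Jackrabbit's actual GC code: the mark phase collects identifiers still referenced by any workspace, and the sweep phase drops the rest so the space is reclaimed.

```java
import java.util.*;

// Toy mark-and-sweep over a data store: records survive only while some
// workspace still references their identifier.
public class GcSketch {
    // Removes every record whose identifier no workspace references.
    static void sweep(Map<String, byte[]> store,
                      Collection<Set<String>> referencedPerWorkspace) {
        Set<String> marked = new HashSet<>();
        for (Set<String> refs : referencedPerWorkspace) marked.addAll(refs); // mark
        store.keySet().retainAll(marked);                                    // sweep
    }

    public static void main(String[] args) {
        Map<String, byte[]> store = new HashMap<>();
        store.put("id-a", new byte[]{1});
        store.put("id-b", new byte[]{2});
        store.put("id-c", new byte[]{3}); // orphaned: referenced by no workspace

        Collection<Set<String>> workspaces = Arrays.asList(
                new HashSet<>(Arrays.asList("id-a")),
                new HashSet<>(Arrays.asList("id-a", "id-b")));
        sweep(store, workspaces);
        System.out.println(new TreeSet<>(store.keySet())); // prints [id-a, id-b]
    }
}
```

So the files in the data store are permanent only in the sense that they are never overwritten; unreferenced ones are eventually collected.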




[jira] Commented: (JCR-926) Global data store for binaries

2007-08-30 Thread JIRA

[ 
https://issues.apache.org/jira/browse/JCR-926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12523848
 ] 

Claus Köll commented on JCR-926:


the heap is not the problem ... we have a lot ;-)
i think using different workspaces is the better way for us ..
what benefits would we lose with a data store per workspace ?
i see the data store as an enhancement of the persistence manager, because the 
other properties are stored through the persistence manager 
per workspace, and now we would put everything from all workspaces into one 
data store ?
greets 
claus




[jira] Commented: (JCR-926) Global data store for binaries

2007-08-30 Thread Jukka Zitting (JIRA)

[ 
https://issues.apache.org/jira/browse/JCR-926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12523852
 ] 

Jukka Zitting commented on JCR-926:
---

A central idea of the *Global* Data Store is that it's global to the repository, 
especially to drive down the costs of versioning and other cross-workspace 
operations.

It would in principle be feasible to allow a workspace-specific data store to 
be configured, but that would make handling of cross-workspace operations 
considerably more complex. IMHO the benefits of workspace-local data stores 
wouldn't be worth the added complexity.

On a longer timescale I also believe Jackrabbit should be moving even more to 
centralized repository-global resource handling as that would for example help 
a lot in making things like versioning operations transactional.

As for features like per-workspace quota or backups, I think those would be 
best achieved by implementing the features in Jackrabbit instead of relying on 
the underlying storage mechanism.




[jira] Commented: (JCR-926) Global data store for binaries

2007-08-30 Thread Thomas Mueller (JIRA)

[ 
https://issues.apache.org/jira/browse/JCR-926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12523853
 ] 

Thomas Mueller commented on JCR-926:


As far as I understand, one (important) use case is to use one workspace for 
'authoring' and another for 'production'. The workspaces contain mostly the 
same data (maybe 90% is the same). Having a data store for each workspace would 
mean having to copy all large files. Having one data store saves you 50% of the 
space (for large objects). Also you can move data from one workspace to the 
other very quickly (because the files don't have to be copied, only the 
identifiers). Also cloning of a workspace is very fast for the same reasons.

 i think to use different workspaces is for us the better way .. 
Do you know about the blob store? If not, you should try it out; it sounds 
like it would be exactly what you need. The blob store is already available.





RE: Database connections & queries

2007-08-30 Thread Martijn Hendriks
 That seems worrisome... Could there be a bug in the search 
 index update code for deleting entries? Something like that 
 could easily have been ignored so far if the only effect is a 
 warning in the log due to the current approach of simply 
 dropping the results for which an exception is thrown.

There's an issue with Lucene 2.0 (see
http://issues.apache.org/jira/browse/LUCENE-669 and
http://issues.apache.org/jira/browse/LUCENE-750) which might cause
IOExceptions. This is logged very clearly in the SearchManager.onEvent
method, however, and I haven't seen such messages.

The problem is that it happens on only one installation to which we have
very little access. We are not able to reproduce it and debugging is
therefore quite problematic...

Best wishes,

Martijn


[jira] Assigned: (JCR-1096) Problems with custom nodes in journal

2007-08-30 Thread Dominique Pfister (JIRA)

 [ 
https://issues.apache.org/jira/browse/JCR-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominique Pfister reassigned JCR-1096:
--

Assignee: Dominique Pfister

 Problems with custom nodes in journal
 -

 Key: JCR-1096
 URL: https://issues.apache.org/jira/browse/JCR-1096
 Project: Jackrabbit
  Issue Type: Bug
  Components: clustering
Affects Versions: 1.3.1
Reporter: Raffaele Sena
Assignee: Dominique Pfister

 I have an application that uses custom node types and I am having problems in 
 a clustered configuration.
 Issue 1: the following definition in a nodetype is incorrectly read from the 
 journal:
   + * (nt:hierarchyNode) version
 The * is stored in the journal as _x002a_ since it should be a QName and it 
 gets escaped.
 When read, the code 
 ...core.nodetype.compact.CompactNodeTypeDefReader.doChildNodeDefinition does 
 the following test:
 if (currentTokenEquals('*')) {
 ndi.setName(ItemDef.ANY_NAME); 
 } else {
 ndi.setName(toQName(currentToken));
 }
 Since currentToken is _x002a_ and not *, toQName(currentToken) is called, but 
 it fails.
 I changed the test to:
 if (currentTokenEquals('*') || currentTokenEquals("_x002a_"))
 
 and that fixes the problem.
 Issue 2: when storing a nodeType in the journal, the superclass nt:base is not 
 stored, but when reading I get an error saying the node should be a subclass 
 of nt:base.
 The code in...core.nodetype.compact.CompactNodeTypeDefWriter.writeSupertypes 
 skips nt:base when writing the node.
 When reading the nodetype definition from the journal the following exception 
 is thrown:
 Unable to deliver node type operation: 
 [{http://namespace/app/repository/1.0}resource] all primary node types except 
 nt:base itself must be (directly or indirectly) derived from nt:base
 probably because nt:base is not re-added to the nodetype definition
  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
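The escaping at play in Issue 1 above can be illustrated with a few lines: characters that are not legal in a name (like '*') are written as _xHHHH_ using their hex code point, so '*' becomes _x002a_. The decoder below is a simplified sketch for illustration, not Jackrabbit's actual utility class; a reader comparing raw tokens must either decode first or, as in the workaround above, compare against the escaped form.

```java
// Simplified sketch of _xHHHH_ name escaping: decodes escaped sequences
// back to their characters, e.g. "_x002a_" -> "*".
public class NameEscapeSketch {
    static String decode(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            // An escape is exactly "_x" + four hex digits + "_".
            if (s.charAt(i) == '_' && i + 6 < s.length()
                    && s.charAt(i + 1) == 'x' && s.charAt(i + 6) == '_') {
                out.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
                i += 6; // skip the rest of the escape sequence
            } else {
                out.append(s.charAt(i));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("_x002a_")); // prints *
    }
}
```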



[jira] Commented: (JCR-1099) jcr2spi NodeEntryImpl.getPath() blows stack due to getIndex() calling itself

2007-08-30 Thread David Rauschenbach (JIRA)

[ 
https://issues.apache.org/jira/browse/JCR-1099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12523884
 ] 

David Rauschenbach commented on JCR-1099:
-

That wouldn't change the flow. I'm having a ConstraintViolationException 
thrown, so it'd get caught the same way. The exception message is "no matching 
child node definition found for {}Inbox". I guess that's my ultimate problem: 
my effective node type object has empty named and unnamed node definition 
arrays. But those values come straight from the in-memory transient 
repository, through my SPI bridge, so I don't see where the values could get 
dropped.

Nevertheless, if those arrays are empty (maybe due to a counterpart flaw in 
spi2jcr?), this flow still seems to be invalid, since an SPI can return 
anything, and it should not trigger a crash.

 jcr2spi NodeEntryImpl.getPath() blows stack due to getIndex() calling itself
 

 Key: JCR-1099
 URL: https://issues.apache.org/jira/browse/JCR-1099
 Project: Jackrabbit
  Issue Type: Bug
  Components: SPI
Affects Versions: 1.4
Reporter: David Rauschenbach
 Attachments: repository.xml


 The jcr2spi NodeEntryImpl class contains logic that causes getIndex() to call 
 itself.
 Calling code:
 Session sess = repo.login(creds);
 Node inboxNode = sess.getRootNode().getNode("Inbox");
 inboxNode.getPath(); // blows stack
 Tracing reveals:
 1. NodeEntryImpl.getPath() ultimately calls getIndex()
 2. getIndex() calls NodeState.getDefinition()
 3. which calls ItemDefinitionProviderImpl.getQNodeDefinition(...)
 4. which catches a RepositoryException then calls 
 NodeEntryImpl.getWorkspaceId()
 5. which calls NodeEntryImpl.getWorkspaceIndex()
 6. which calls getIndex() (back to step 2, ad infinitum)
 Configuration:
 1. A configuration is loaded specifying in-memory persist manager
 2. Config is wrapped in TransientRepository
 3. that's wrapped in spi2jcr's RepositoryService using default 
 BatchReadConfig
 4. a jcr2spi provider is instantiated that directly couples to spi2jcr
 5. Node in question is created as follows:
 Session sess = repo.login(creds);
 sess.getRootNode().addNode("Inbox", "nt:folder");
 sess.save();
 I guess that's about it.
 David
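
 The six-step cycle above can be boiled down to a minimal stand-alone sketch 
 (hypothetical stand-in methods, not jcr2spi code): the error-recovery path 
 re-enters the method that invoked it, so the stack grows without bound.

```java
// Minimal illustration of the reported cycle, with hypothetical stand-ins
// for getIndex()/getWorkspaceIndex(). A depth counter substitutes for the
// real stack exhaustion so the demo terminates deterministically.
public class RecursionDemo {
    static int depth = 0;

    static int getIndex() {
        if (++depth > 1_000) {
            // In the real report the JVM throws this itself.
            throw new StackOverflowError("simulated: unbounded recursion");
        }
        // getDefinition() failed, so fall back to the workspace state...
        return getWorkspaceIndex();
    }

    // ...which needs the index again: back to step 2, ad infinitum.
    static int getWorkspaceIndex() {
        return getIndex();
    }

    public static void main(String[] args) {
        try {
            getIndex();
        } catch (StackOverflowError e) {
            System.out.println("recursed " + depth + " times before blowing up");
        }
    }
}
```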

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (JCR-1099) jcr2spi NodeEntryImpl.getPath() blows stack due to getIndex() calling itself

2007-08-30 Thread Julian Reschke (JIRA)

[ 
https://issues.apache.org/jira/browse/JCR-1099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12523904
 ] 

Julian Reschke commented on JCR-1099:
-

Understood and agreed. I just wanted to make sure it's not a 
non-ConstraintViolationException that shouldn't have been caught in the first 
place.



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (JCR-1099) jcr2spi NodeEntryImpl.getPath() blows stack due to getIndex() calling itself

2007-08-30 Thread David Rauschenbach (JIRA)

[ 
https://issues.apache.org/jira/browse/JCR-1099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12523905
 ] 

David Rauschenbach commented on JCR-1099:
-

The bottom line, I think, is that in my case the jcr2spi bridge is unable to 
get around to making the getNodeDefinition SPI invocation, because the call to 
NodeEntry.getWorkspaceId()  at ItemDefinitionProviderImpl.java:90, which would 
provide the NodeId to use as the second argument, causes the recursion.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (JCR-1100) Support for dynamic mixins

2007-08-30 Thread Padraic Hannon (JIRA)

[ 
https://issues.apache.org/jira/browse/JCR-1100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12523960
 ] 

Padraic Hannon commented on JCR-1100:
-

I currently have this working for persistence using cglib. However, it requires 
moving away from POJOs and requires having interfaces defined so the proxy can 
be created dynamically. I can upload diffs with the code; however, I think the 
better solution would be to not have to resort to creating interfaces for 
everything.
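
As a rough illustration of the inspection step described above (a pure-Java 
sketch; the marker interfaces and mixin names are hypothetical stand-ins for 
OCM-mapped types), deriving a mixin set from the interfaces an object 
implements might look like:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical marker interfaces standing in for mixin-mapped OCM types.
interface Versionable {}
interface Lockable {}

public class MixinInspectionDemo {

    // Sketch of the "inspect the passed-in object" step: map each
    // implemented marker interface to a JCR mixin name.
    static List<String> mixinsFor(Object obj) {
        List<String> mixins = new ArrayList<>();
        for (Class<?> iface : obj.getClass().getInterfaces()) {
            if (iface == Versionable.class) mixins.add("mix:versionable");
            if (iface == Lockable.class) mixins.add("mix:lockable");
        }
        return mixins;
    }

    public static void main(String[] args) {
        class Document implements Versionable, Lockable {}
        System.out.println(mixinsFor(new Document()));
        // [mix:versionable, mix:lockable]
    }
}
```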

 Support for dynamic mixins
 --

 Key: JCR-1100
 URL: https://issues.apache.org/jira/browse/JCR-1100
 Project: Jackrabbit
  Issue Type: New Feature
  Components: jcr-mapping
Affects Versions: 1.3.1
Reporter: Padraic Hannon

 JCR allows one to add mixins to nodes dynamically. However, within the OCM 
 code one cannot readily add mixins to objects dynamically. This feature would 
 allow JCR nodes to be updated with a mixin, and the OCM to read such a node 
 and ensure the object is created correctly. Additionally, when an object is 
 passed in for storage, the OCM would inspect it and ensure all mixed-in 
 object fields are added to the class descriptor.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (JCR-1100) Support for dynamic mixins

2007-08-30 Thread Padraic Hannon (JIRA)
Support for dynamic mixins
--

 Key: JCR-1100
 URL: https://issues.apache.org/jira/browse/JCR-1100
 Project: Jackrabbit
  Issue Type: New Feature
  Components: jcr-mapping
Affects Versions: 1.3.1
Reporter: Padraic Hannon


JCR allows one to add mixins to nodes dynamically. However, within the OCM 
code one cannot readily add mixins to objects dynamically. This feature would 
allow JCR nodes to be updated with a mixin, and the OCM to read such a node 
and ensure the object is created correctly. Additionally, when an object is 
passed in for storage, the OCM would inspect it and ensure all mixed-in 
object fields are added to the class descriptor.



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.