Re: Embedding Groovy in oak-run for Oak Shell (OAK-1805)
On Fri, May 23, 2014 at 12:05 AM, Marcel Reutegger mreut...@adobe.com wrote: but the resulting jar file is indeed quite big. Do we really need all the jars we currently embed?

Yes, we are currently embedding quite a few jars. Looking at oak-run I see it embeds the following major types of deps:
1. JR2 jars (required for benchmark and also upgrade)
2. Lucene 3.6.x jars for JR2
3. H2 and related DBCP jars for RDB
4. Oak jars
5. Logback/jopt etc. required for standalone usage
6. Now Groovy

Alternatively we may want to consider a new module. E.g. oak-console with only the required jar files to run the console.

Might be better to go this way, as we anyway have to start using Lucene 4.x to allow, say, a command to dump the Lucene directory content. Given that oak-run is used for benchmark and upgrade, it has to package JR2 and Lucene 3.6.x. So for the pure Oak-related feature set we might require a new module. Chetan Mehrotra
Re: My repository is not indexing PDFs, what am I missing?
Hi Bertrand, This might be due to OAK-1462. We had to disable the LuceneIndexProvider from getting registered as an OSGi service, to handle a case where the LuceneIndexProvider was getting registered twice (one default and another for the aggregate case). Would try to resolve this by next week and then it should work fine. Chetan Mehrotra

On Wed, May 21, 2014 at 8:58 PM, Bertrand Delacretaz bdelacre...@apache.org wrote: Hi, I'm upgrading the OakSlingRepositoryManager used for Sling tests to Oak 1.0, and it's not indexing PDFs anymore - it used to with Oak 0.8. After uploading a text file to /tmp, the /jcr:root/foo//*[jcr:contains(.,'some word')] query finds it, but the same doesn't work with a PDF. My repository setup is in the OakSlingRepositoryManager [1] - am I missing something in there? -Bertrand [1] https://svn.apache.org/repos/asf/sling/trunk/bundles/jcr/oak-server/src/main/java/org/apache/sling/oak/server/OakSlingRepositoryManager.java
Re: How to activate a SecurityProvider
On Tue, May 20, 2014 at 7:36 PM, Galo Gimenez galo.gime...@gmail.com wrote: I am running an old version of Felix, maybe that is the problem? Looks like you are using an old version of SCR. Try running with a more recent version of SCR. Chetan Mehrotra
Re: svn commit: r1587286 - in /jackrabbit/oak/trunk: oak-core/pom.xml oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentNodeStoreService.java oak-parent/pom.xml
For the record, I have implemented a DataSource provider bundle [2] (based on the flow described below) as part of SLING-3574. That bundle can be used to configure a DataSource in an OSGi env (a sketch of the basic idea follows at the end of this mail). Chetan Mehrotra [1] https://issues.apache.org/jira/browse/SLING-3574 [2] https://github.com/chetanmeh/sling-datasource

On Tue, Apr 15, 2014 at 12:30 PM, Chetan Mehrotra chetan.mehro...@gmail.com wrote:
Register a DataSource where? The DataSource would be registered with the OSGi service registry.
Does this work in an OSGi context? Yes, it should work in an OSGi context. Would try to implement the approach by end of week if time permits.
How does it get the DataSource? Per JNDI? The DataSource would be obtained from the OSGi service registry, just like it currently obtains the BlobStore instance. Chetan Mehrotra

On Tue, Apr 15, 2014 at 12:00 PM, Julian Reschke julian.resc...@gmx.de wrote: On 2014-04-15 06:10, Chetan Mehrotra wrote: Hi Julian, On Tue, Apr 15, 2014 at 12:39 AM, resc...@apache.org wrote:
- <Embed-Dependency>commons-dbcp,commons-pool,h2,json-simple</Embed-Dependency>
+ <Embed-Dependency>commons-dbcp,commons-pool,h2,json-simple,postgresql,db2,db2-license</Embed-Dependency>
  <Embed-Transitive>true</Embed-Transitive>
I believe this is a temporary change and would not be required for the final implementation? It would be helpful if we add a TODO/FIXME there so that we remember to remove this later. OK. Instead of embedding all such drivers/dbcp/pool etc. within oak-core it would be better to decouple them. For example, one approach can be:
1. Have a bundle which embeds commons-dbcp and the required dependencies. It would be responsible for registering a DataSource. Register a DataSource where?
2. Driver bundles would be fragments with bundle #1 as host. With JDBC 4.0 the Driver classes are provided as part of META-INF/services/java.sql.Driver [1]. For such cases fragment bundles can be avoided by having #1 monitor for such drivers and register them programmatically. Does this work in an OSGi context?
3. DocumentNodeStoreService should only have a reference to a DataSource and use that. How does it get the DataSource? Per JNDI? Best regards, Julian
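For illustration, a minimal sketch of what such a provider bundle can do: build a pooled DataSource with commons-dbcp and publish it in the OSGi service registry. The 'datasource.name' property is an assumption for filtering, not the actual SLING-3574 contract:

import java.util.Dictionary;
import java.util.Hashtable;
import javax.sql.DataSource;
import org.apache.commons.dbcp.BasicDataSource;
import org.osgi.framework.BundleContext;

public class DataSourceRegistrar {
    public void register(BundleContext ctx, String driver, String url,
                         String user, String password) {
        // pooled DataSource backed by commons-dbcp
        BasicDataSource ds = new BasicDataSource();
        ds.setDriverClassName(driver);
        ds.setUrl(url);
        ds.setUsername(user);
        ds.setPassword(password);

        // publish it so a consumer like DocumentNodeStoreService can bind to it
        Dictionary<String, Object> props = new Hashtable<String, Object>();
        props.put("datasource.name", "oak"); // assumed property, used for filtering
        ctx.registerService(DataSource.class.getName(), ds, props);
    }
}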
Re: NodeStore and BlobStore configurations in OSGi
On Mon, May 19, 2014 at 3:29 PM, Marc Pfaff pfa...@adobe.com wrote: For SegmentNodeStore my research ends up in FileBlobStore and for the DocumentNodeStore it appears to be the MongoBlobStore. Is that correct?

SegmentNodeStore does not use a BlobStore by default. Instead all the binary content is stored as part of the segment data itself. So the following points can be noted for BlobStore:
1. SegmentNodeStore does not require a BlobStore by default
2. DocumentNodeStore uses MongoBlobStore by default
3. Both can be configured to use a BlobStore via OSGi config

* I was only able to find the FileBlobStoreService that registers the FileBlobStore as an OSGi service. I was not able to find more BlobStore implementations to be exposed in OSGi. Are there any more? And how about the MongoBlobStore in particular?

MongoBlobStore is not configured as an explicit service; instead it is used as the default fallback option if no other BlobStore is configured. As it just requires the Mongo connection details, it is currently configured along with MongoDocumentStore in DocumentNodeStore. There is also AbstractDataStoreService, which wraps a JR2 DataStore as a BlobStore and configures and registers it with OSGi. It currently supports FileDataStore and S3DataStore. Note that FileDataStore is currently preferred over FileBlobStore.

The DocumentNodeStoreService references the same blob store service as the SegmentNodeStoreService. As I'm not able to find the MongoBlobStore exposed as service, does that mean the DocumentNodeStore uses the FileBlobStore

No. The MongoBlobStore is configured implicitly in org.apache.jackrabbit.oak.plugins.document.DocumentMK.Builder#setMongoDB(com.mongodb.DB, int). So unless a BlobStore is explicitly configured, DocumentNodeStore would use MongoBlobStore.

Both, the SegmentNodeStoreService and the DocumentNodeStoreService appear to check for 'custom blob store' property but both components do not expose such a property? And how would they select from a specific BlobStore service?

Here there is an assumption that the system has only one BlobStore registered with the OSGi service registry. If multiple BlobStore services are registered then you can specify a particular one by setting 'blobStore.target' to the required OSGi service filter (see section 112.6 of the OSGi Compendium on DS). Chetan Mehrotra
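To illustrate the last point, a sketch of pinning a specific BlobStore via ConfigurationAdmin; the pid is assumed from the service class name and the filter value is a placeholder:

import java.util.Dictionary;
import java.util.Hashtable;
import org.osgi.service.cm.Configuration;
import org.osgi.service.cm.ConfigurationAdmin;

public class BlobStoreBinding {
    public void bindToSpecificBlobStore(ConfigurationAdmin configAdmin) throws Exception {
        Configuration cfg = configAdmin.getConfiguration(
                "org.apache.jackrabbit.oak.plugins.segment.SegmentNodeStoreService", null);
        Dictionary<String, Object> props = new Hashtable<String, Object>();
        props.put("customBlobStore", Boolean.TRUE);
        // any valid OSGi service filter selecting the desired BlobStore service
        props.put("blobStore.target",
                "(service.pid=org.apache.jackrabbit.oak.plugins.blob.datastore.FileDataStore)");
        cfg.update(props);
    }
}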
Re: NodeStore and BlobStore configurations in OSGi
On Mon, May 19, 2014 at 8:10 PM, Marc Pfaff pfa...@adobe.com wrote: SegmentNodeStore.getBlob() does not seem to be used when reading binaries through JCR

Yes, when reading via JCR the read is handled via the Blob itself, i.e. SegmentBlob in this case. A SegmentBlob gets created when the JCR property is accessed: SegmentNodeState.getProperty -> SegmentPropertyState.getValue -> SegmentPropertyState#getValue(Segment, RecordId, Type<T>)

Here there is an assumption that system has only one BlobStore registered with OSGi Service Registry. If multiple BlobStore services are registered then you can possibly specify a specific one by configuring the 'blobStore.target' to the required OSGi service filter (DS 112.6 of OSGi Compendium) Assuming same is true for NodeStore services.

I did not get the question. Btw, I have updated the docs at [1] (the update should reflect on GitHub in a couple of hours). Chetan Mehrotra [1] https://github.com/apache/jackrabbit-oak/blob/trunk/oak-doc/src/site/markdown/blobstore.md
Re: How to activate a SecurityProvider
SecurityProvider should get registered. Are you running Felix with the WebConsole? What's the status of the 'org.apache.jackrabbit.oak.security.SecurityProviderImpl' component? Chetan Mehrotra

On Sat, May 17, 2014 at 1:30 AM, Galo Gimenez galo.gime...@gmail.com wrote: Hello, I am setting up Oak on a Felix container, and the RepositoryManager reference to the SecurityProvider does not get satisfied. Looking at the documentation I do not see a way to fix this. I have noticed that the Sling project has a very different way to set up the repository; should I follow that model, or is there something I am missing that makes the SecurityProvider service not register? -- Galo
Lucene blob size different in trunk and 1.0 branch
Hi, As part of [1] the Lucene blob size was changed to 16 KB (from 32 KB) to ensure that Lucene blobs are not made part of the FileDataStore when SegmentMK is used. However this revision was not merged to the 1.0 branch. This miss also affects the caching logic in the DataStore (OAK-1726), as there it was assumed that Lucene blobs would be less than 16 KB, hence it only caches binaries up to 16 KB. However in the 1.0 branch the Lucene blobs are of size 32 KB, which breaks this assumption, so Lucene blobs would not be cached in memory. This can be fixed via the config setting 'maxCachedBinarySize'. Changing the blob size to 16 KB now in 1.0 would cause upgrade issues. So should the change be reverted in trunk? Chetan Mehrotra [1] http://svn.apache.org/viewvc?view=revision&revision=r1587430 [2] http://svn.apache.org/viewvc/jackrabbit/oak/branches/1.0/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/OakDirectory.java?view=markup
Re: buildbot failure in ASF Buildbot on oak-trunk-win7
Failure in ObservationTest:

Tests run: 110, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 179.557 sec <<< FAILURE!
observationDispose[4](org.apache.jackrabbit.oak.jcr.observation.ObservationTest) Time elapsed: 7.138 sec <<< FAILURE!
java.lang.AssertionError
    at org.junit.Assert.fail(Assert.java:92)
    at org.junit.Assert.assertTrue(Assert.java:43)
    at org.junit.Assert.assertFalse(Assert.java:68)
    at org.junit.Assert.assertFalse(Assert.java:79)
    at org.apache.jackrabbit.oak.jcr.observation.ObservationTest.observationDispose(ObservationTest.java:467)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

Chetan Mehrotra

On Tue, May 13, 2014 at 12:10 PM, build...@apache.org wrote: The Buildbot has detected a new failure on builder oak-trunk-win7 while building ASF Buildbot. Full details are available at: http://ci.apache.org/builders/oak-trunk-win7/builds/67 Buildbot URL: http://ci.apache.org/ Buildslave for this Build: bb-win7 Build Reason: scheduler Build Source Stamp: [branch jackrabbit/oak/trunk] 1594128 Blamelist: chetanm BUILD FAILED: failed compile sincerely, -The Buildbot
Re: [VOTE] Release Apache Jackrabbit Oak 1.0.0
[X] +1 Release this package as Apache Jackrabbit Oak 1.0.0 All tests passed and all checks OK. Chetan Mehrotra On Mon, May 12, 2014 at 2:15 PM, Davide Giannella giannella.dav...@gmail.com wrote: [X] +1 Release this package as Apache Jackrabbit Oak 1.0.0 Davide
Re: svn commit: r1560611 - in /jackrabbit/oak/trunk: oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/mongomk/util/ oak-core/src/test/java/org/apache/jackrabbit/oak/plugins/mongomk/ oak-jcr/sr
On Fri, Apr 25, 2014 at 11:04 PM, Jukka Zitting jukka.zitt...@gmail.com wrote: The credentials in any case need to be valid for the database that holds the repository, so I don't see why we couldn't use it for this purpose. As per the docs, the database name tells the Mongo driver the name of the DB in which the user details are stored. Typically in a SQL database the admin user tables are managed in a dedicated schema; probably a similar scheme is followed on the Mongo side. We can probably modify the logic to use the db name present as part of the URL if no db name is explicitly provided via 'oak.mongo.db'. Chetan Mehrotra
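A sketch of the suggested fallback, assuming the Mongo Java driver's MongoClientURI; the 'oak' default is illustrative:

import com.mongodb.MongoClientURI;

public class MongoDbName {
    // prefer an explicit 'oak.mongo.db', else fall back to the db in the URI
    static String dbName(String mongoUri) {
        String db = System.getProperty("oak.mongo.db");
        if (db == null || db.isEmpty()) {
            MongoClientURI uri = new MongoClientURI(mongoUri);
            db = (uri.getDatabase() != null) ? uri.getDatabase() : "oak";
        }
        return db;
    }
}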
Adding ProviderType and ConsumerType annotation to interfaces in exported packages
As part of OAK-1741 I was changing the version of exported packages to 1.0.0. Looking at the interfaces which are part of exported packages, I do not see usage of the ConsumerType/ProviderType annotations [1]. In brief and simple terms: interfaces which are expected to be implemented by users of the Oak API (like org.apache.jackrabbit.oak.plugins.observation.EventHandler) should be marked with the ConsumerType annotation, so that bundles implementing them keep the regular consumer import range [1.0,2). All other interfaces, which are supposed to be provided (implemented) by Oak itself, should be marked with ProviderType; bnd then generates the strict import range [1.0,1.1) only for bundles that implement them, while plain API consumers keep the relaxed range [1.0,2). This would help us evolve the API more easily in future. Currently we have the following interfaces as part of exported packages [2]. Looking at the list I believe most are of ProviderType, i.e. provided by Oak and not implemented by Oak API users. Some, like org.apache.jackrabbit.oak.plugins.observation.EventHandler, are of ConsumerType as we require the API users to implement them. Should we add the required annotations for the 1.0 release? If yes, can team members look into the list and set the right type? Chetan Mehrotra [1] https://github.com/osgi/design/raw/master/rfcs/rfc0197/rfc-0197-OSGiPackageTypeAnnotations.pdf [2] https://issues.apache.org/jira/browse/OAK-1741?focusedCommentId=13979465#comment-13979465
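A sketch of the two cases using the bnd annotations available at the time (RFC 197 standardizes them); the interfaces are reduced to empty bodies for brevity:

import aQute.bnd.annotation.ConsumerType;
import aQute.bnd.annotation.ProviderType;

// implemented by users of the Oak API; implementing bundles stay on the
// regular consumer import range [1.0,2)
@ConsumerType
interface EventHandler {
}

// implemented by Oak itself; a bundle implementing this is treated as a
// provider by bnd and gets the strict import range [1.0,1.1)
@ProviderType
interface NodeStore {
}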
Oak CI notifications not coming
Hi, I was checking the CI status for Oak trunk and it seems builds are not getting triggered at [1] and [2]. Do we have to enable them somehow? Chetan Mehrotra [1] https://travis-ci.org/apache/jackrabbit-oak/builds [2] http://ci.apache.org/builders/oak-trunk/
Re: plugin.document not exported in OSGi bundle
The preferred approach is to instantiate it via OSGi configuration. So in your OSGi env, create a configuration for the pid 'org.apache.jackrabbit.oak.plugins.document.DocumentNodeStoreService' [1]. This would activate the DocumentNodeStoreService [2] component, which would then register a DocumentNodeStore against the NodeStore interface. Chetan Mehrotra [1] http://jackrabbit.apache.org/oak/docs/osgi_config.html#DocumentNodeStore [2] https://github.com/apache/jackrabbit-oak/blob/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentNodeStoreService.java On Tue, Apr 22, 2014 at 11:23 PM, Galo Gimenez galo.gime...@gmail.com wrote: Hello, I noticed org.apache.jackrabbit.oak.plugins.document.DocumentMK is not exported in the OSGi bundle; is there a way to get Oak with the DocumentMK instantiated in OSGi? -- Galo
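For illustration, creating such a configuration programmatically; the 'mongouri' and 'db' property names are assumptions, check the pid's metatype for the exact names:

import java.util.Dictionary;
import java.util.Hashtable;
import org.osgi.service.cm.Configuration;
import org.osgi.service.cm.ConfigurationAdmin;

public class DocumentNodeStoreConfig {
    public void create(ConfigurationAdmin configAdmin) throws Exception {
        Configuration cfg = configAdmin.getConfiguration(
                "org.apache.jackrabbit.oak.plugins.document.DocumentNodeStoreService", null);
        Dictionary<String, Object> props = new Hashtable<String, Object>();
        props.put("mongouri", "mongodb://localhost:27017"); // assumed property names
        props.put("db", "oak");
        cfg.update(props); // activates the component, which registers a NodeStore
    }
}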
Review currently exported package version for 1.0 release
Hi Team, As part of OAK-1741 [1] I have captured details about the current exported packages from the various bundles provided as part of Oak. Currently some packages are exported at 0.18, some at 0.16, and some are exported at the bundle version. Should we bump all of them to 1.0.0 for the 1.0 release and ensure they are consistent from there on? It would also be helpful to review the list once, i.e. check whether each package export is required; e.g. oak-solr-osgi exports quite a bit that is probably not required. Chetan Mehrotra [1] https://issues.apache.org/jira/browse/OAK-1741
Re: Using Lucene indexes for property queries
Should we let the user decide whether it's OK to use an asynchronous index for this case

+1 for that. It has been the case with JR2 (I may be wrong here). And when a user is searching for, say, some asset via DAM in Adobe CQ, they would be OK if the result is not for the latest head. A small lag should be acceptable. This would enable scenarios where traversal would be too costly and Lucene can still be used to provide the required results in much less time. Chetan Mehrotra

On Mon, Apr 14, 2014 at 2:33 PM, Thomas Mueller muel...@adobe.com wrote: Hi, In theory, the Lucene index could be used quite easily. As far as I see, we would only need to change the cost function of the Lucene index (return a reasonable cost even if there is no full-text constraint). One problem might be: the Lucene index is asynchronous, and the user might expect the result to be up-to-date. The user knows this already for full-text constraints, but not for property constraints. Should we let the user decide whether it's OK to use an asynchronous index for this case? For example by specifying an option in the query (for example similar to the order by, at the very end of the query, option async)? So a query that can use an asynchronous index would look like this: //*[@prop = 'x'] option async or //*[@prop = 'x'] order by @otherProperty option async or select [jcr:path] from [nt:base] as a where [prop] > 1 option async Regards, Thomas

On 14/04/14 06:54, Chetan Mehrotra chetan.mehro...@gmail.com wrote: Hi, In JR2 I believe Lucene was used for all types of queries and not only for full text searches. In Oak we have our own property indexes for handling queries involving constraints on properties. This I believe provides a more accurate result, as it's built on top of the MVCC support, so the results obtained are consistent with the session state/revision. However this involves creating an index for each property to be queried. And the way property indexes are currently stored, they consume quite a bit of space (at least in DocumentNodeStore). In comparison Lucene stores the index content in quite a compact form. In quite a few cases (like a user-choice-based query builder) it might not be known in advance which property the user would use. As we already have all string properties indexed in Lucene, would it be possible to use Lucene for performing such queries? Or allow the user to choose which type of index he wants to use depending on the use case. Chetan Mehrotra
Re: svn commit: r1587286 - in /jackrabbit/oak/trunk: oak-core/pom.xml oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentNodeStoreService.java oak-parent/pom.xml
Hi Julian, On Tue, Apr 15, 2014 at 12:39 AM, resc...@apache.org wrote:
- <Embed-Dependency>commons-dbcp,commons-pool,h2,json-simple</Embed-Dependency>
+ <Embed-Dependency>commons-dbcp,commons-pool,h2,json-simple,postgresql,db2,db2-license</Embed-Dependency>
  <Embed-Transitive>true</Embed-Transitive>

I believe this is a temporary change and would not be required for the final implementation? It would be helpful if we add a TODO/FIXME there so that we remember to remove this later. Instead of embedding all such drivers/dbcp/pool etc. within oak-core it would be better to decouple them. For example, one approach can be:
1. Have a bundle which embeds commons-dbcp and the required dependencies. It would be responsible for registering a DataSource.
2. Driver bundles would be fragments with bundle #1 as host. With JDBC 4.0 the Driver classes are advertised via META-INF/services/java.sql.Driver [1]. For such cases fragment bundles can be avoided by having #1 monitor for such drivers and register them programmatically (see the sketch below).
3. DocumentNodeStoreService should only have a reference to a DataSource and use that.
Chetan Mehrotra [1] http://docs.oracle.com/javase/7/docs/api/java/sql/DriverManager.html
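A sketch of the programmatic variant in point 2: read a bundle's META-INF/services/java.sql.Driver entry and instantiate the drivers. The 'drivers' registry passed in is an assumed structure:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.sql.Driver;
import java.util.Map;
import org.osgi.framework.Bundle;

public class DriverScanner {
    // collect the JDBC 4.0 drivers advertised by a bundle into our own registry
    public void scan(Bundle bundle, Map<String, Driver> drivers) throws Exception {
        URL url = bundle.getEntry("META-INF/services/java.sql.Driver");
        if (url == null) {
            return; // bundle does not advertise any driver
        }
        BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream(), "UTF-8"));
        try {
            String line;
            while ((line = in.readLine()) != null) {
                String className = line.trim();
                if (!className.isEmpty() && !className.startsWith("#")) {
                    // load via the owning bundle, not via java.sql.DriverManager
                    drivers.put(className, (Driver) bundle.loadClass(className).newInstance());
                }
            }
        } finally {
            in.close();
        }
    }
}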
Using Lucene indexes for property queries
Hi, In JR2 I believe Lucene was used for all types of queries and not only for full text searches. In Oak we have our own property indexes for handling queries involving constraints on properties. This I believe provides a more accurate result, as it's built on top of the MVCC support, so the results obtained are consistent with the session state/revision. However this involves creating an index for each property to be queried. And the way property indexes are currently stored, they consume quite a bit of space (at least in DocumentNodeStore). In comparison Lucene stores the index content in quite a compact form. In quite a few cases (like a user-choice-based query builder) it might not be known in advance which property the user would use. As we already have all string properties indexed in Lucene, would it be possible to use Lucene for performing such queries? Or allow the user to choose which type of index he wants to use depending on the use case. Chetan Mehrotra
Re: jackrabbit-oak build #4073: Errored
"I'm sorry but your test run exceeded 50.0 minutes." The build failure is due to a timeout. Chetan Mehrotra On Thu, Apr 10, 2014 at 11:49 AM, Travis CI ju...@apache.org wrote: Build Update for apache/jackrabbit-oak - Build: #4073 Status: Errored Duration: 3002 seconds Commit: a653c0f168842a5d9b1de8072fdcc5f6d216ad12 (trunk) Author: Chetan Mehrotra Message: OAK-1716 - Enable passing of a execution context to runTest in multi threaded runs Exposed a protected method `prepareThreadExecutionContext` which subclasses can override to return a context instance which would be used by that thread of execution git-svn-id: https://svn.apache.org/repos/asf/jackrabbit/oak/trunk@1586218 13f79535-47bb-0310-9956-ffa450edef68 View the changeset: https://github.com/apache/jackrabbit-oak/compare/2371ef73a4cd...a653c0f16884 View the full build log and details: https://travis-ci.org/apache/jackrabbit-oak/builds/22666739 -- sent by Jukka's Travis notification gateway
Re: Slow full text query performance and Lucene Index handling in Oak
On Wed, Apr 9, 2014 at 12:25 PM, Marcel Reutegger mreut...@adobe.com wrote: Since the Lucene index is in any case updated asynchronously, it should be fine for us to ignore the base NodeState of the current session and instead use an IndexSearcher based on the last state as updated by the async indexer. This would allow us to reuse the IndexSearcher over multiple queries. I was also wondering if it makes sense to share it across multiple sessions performing a query to reduce the number of index readers that may be open at the same time. however, this will likely also reduce concurrency because we synchronize access to a single session.

I tried an approach where I used a custom SearcherManager based on the Lucene SearcherManager [2]. It obtains the root NodeState directly from the NodeStore. As the NodeStore can be accessed concurrently, it should not have any impact on session concurrency. With this change there is a slight improvement:

Oak-Tar          1  39  40  40  44  64  1459
Oak-Tar (shared) 1  32  33  34  36  61  1738

So it did not give much of a boost (at least with the approach taken). As I do not have much understanding of Lucene internals, can someone review the approach taken [1] and see if there are any major issues with it? Chetan Mehrotra [1] https://issues.apache.org/jira/secure/attachment/12639366/OAK-1702-shared-indexer.patch [2] https://lucene.apache.org/core/3_6_0/api/all/org/apache/lucene/search/SearcherManager.html
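For reference, the stock Lucene 4.x SearcherManager pattern the patch emulates (this is not the Oak patch itself; directory setup is elided):

import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.SearcherFactory;
import org.apache.lucene.search.SearcherManager;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;

public class SharedSearcher {
    private final SearcherManager manager;

    public SharedSearcher(Directory directory) throws Exception {
        // one manager shared by all queries, instead of a searcher per query
        this.manager = new SearcherManager(directory, new SearcherFactory());
    }

    public TopDocs search(Query query) throws Exception {
        manager.maybeRefresh();              // e.g. after the async indexer commits
        IndexSearcher searcher = manager.acquire();
        try {
            return searcher.search(query, 10);
        } finally {
            manager.release(searcher);       // never close an acquired searcher
        }
    }
}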
Re: Slow full text query performance and Lucene Index handling in Oak
On Wed, Apr 9, 2014 at 3:00 PM, Alex Parvulescu alex.parvule...@gmail.com wrote: - the patch assumes that there is and will be a single lucene index directly under the root node, which may not necessarily be the case. I agree this assumption holds now, but I would not introduce any changes that take away this flexibility.

That is not a problem per se, as the IndexReader starts with a count of 1, so it would never go to zero. The problem appears to be somewhere else. I modified the code to use a shared IndexSearcher and a native FSDirectory, and still the performance improvement was marginal. The problem occurs because org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndex#query [1] currently does an eager initialization of the cursor, while the test case only fetches the first result. Compared to this, the JR2 version does a lazy evaluation. If I put a break in the loop (exit after the first result) the results are much better:

Oak-Tar (break, shared searcher, fs) 1  2  2  3  3  170  23204
Oak-Tar (break)                      1  5  5  5  6   90  10593
Jackrabbit                           1  4  4  5  6  231  11385

Now I am not sure if this is a problem with the use case chosen. Or perhaps the Lucene index cursor management should be improved, as in many cases there would be multiple results but the client code only makes use of the first few. Chetan Mehrotra [1] https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LuceneIndex.java#L381-L409
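A sketch of the lazy alternative using Lucene's searchAfter, fetching hits in small batches and stopping as soon as the caller stops consuming. The ':path' field name and the callback interface are assumptions:

import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

public class LazyHits {
    interface PathCallback {
        boolean handle(String path); // return false to stop fetching
    }

    static void stream(IndexSearcher searcher, Query query, PathCallback callback)
            throws Exception {
        final int batchSize = 50;
        TopDocs batch = searcher.search(query, batchSize);
        while (batch.scoreDocs.length > 0) {
            for (ScoreDoc sd : batch.scoreDocs) {
                String path = searcher.doc(sd.doc).get(":path");
                if (!callback.handle(path)) {
                    return; // caller had enough; remaining hits never fetched
                }
            }
            ScoreDoc last = batch.scoreDocs[batch.scoreDocs.length - 1];
            batch = searcher.searchAfter(last, query, batchSize);
        }
    }
}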
Re: Slow full text query performance and Lucene Index handling in Oak
On Wed, Apr 9, 2014 at 5:14 PM, Jukka Zitting jukka.zitt...@gmail.com wrote: Is that a common use case? To better simulate a normal usage scenario I'd make the benchmark fetch up to N results (where N is configurable, with default something like 20) and access the path and the title property of the matching nodes.

I changed the logic of the benchmark in http://svn.apache.org/r1585962. With that JR2 slows down a bit:

# FullTextSearchTest  C  min  10%  50%  90%  max  N
Oak-Tar               1   34   35   36   39   60  1639
Jackrabbit            1    5    5    6    7   68  10038

Profiling the result shows that quite a bit of time goes in org.apache.lucene.codecs.compressing.LZ4.decompress() (40%). This I think is part of Lucene 4.x and not present in 3.x. Any idea if I can disable compression? Chetan Mehrotra
Re: Slow full text query performance and Lucene Index handling in Oak
Current update:
1. Tommaso provided a patch (OAK-1702) to disable compression, and that also helps quite a bit.
2. Currently we store the full tokenized text in the Lucene index [1]. This makes fetching of doc fields slower; on disabling the storage the numbers improve quite a bit. The storage was added as part of OAK-319 for supporting MLT.

# FullTextSearchTest      C  min  10%  50%  90%  max  N
Oak-Tar (codec)           1    9    9   10   12   41  5664
Oak-Tar (codec, mlt off)  1    7    8    8   10   21  6921

Would look further. Chetan Mehrotra [1] https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/FieldFactory.java#L44

On Wed, Apr 9, 2014 at 7:15 PM, Alex Parvulescu alex.parvule...@gmail.com wrote: Aside from the compression issue, there was another one related to the 'order by' clause. I saw Collections.sort taking up as much as 23% of the profile. I removed the order by temporarily so it doesn't get in the way of the Lucene stuff, but I think the QueryEngine should skip ordering results in this case.

On Wed, Apr 9, 2014 at 3:31 PM, Tommaso Teofili tommaso.teof...@gmail.com wrote: I'm looking into the Lucene codecs right now. Tommaso

2014-04-09 15:20 GMT+02:00 Alex Parvulescu alex.parvule...@gmail.com: Profiling the result shows that quite a bit of time goes in org.apache.lucene.codecs.compressing.LZ4.decompress() (40%). This I think is part of Lucene 4.x and not present in 3.x. Any idea if I can disable compression? +1 I noticed that too, we should try to disable compression and compare results. alex

On Wed, Apr 9, 2014 at 3:16 PM, Chetan Mehrotra chetan.mehro...@gmail.com wrote: On Wed, Apr 9, 2014 at 5:14 PM, Jukka Zitting jukka.zitt...@gmail.com wrote: Is that a common use case? To better simulate a normal usage scenario I'd make the benchmark fetch up to N results (where N is configurable, with default something like 20) and access the path and the title property of the matching nodes. I changed the logic of the benchmark in http://svn.apache.org/r1585962. With that JR2 slows down a bit: # FullTextSearchTest C min 10% 50% 90% max N / Oak-Tar 1 34 35 36 39 60 1639 / Jackrabbit 1 5 5 6 7 68 10038 / Profiling the result shows that quite a bit of time goes in org.apache.lucene.codecs.compressing.LZ4.decompress() (40%). This I think is part of Lucene 4.x and not present in 3.x. Any idea if I can disable compression? Chetan Mehrotra
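The storage distinction in point 2 boils down to the Field.Store flag; a sketch with the stock Lucene 4.x API (the ':fulltext' field name is assumed):

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;

public class FulltextField {
    static Document create(String text, boolean storeForMlt) {
        Document doc = new Document();
        // Store.YES keeps the full tokenized text in the index (needed for MLT,
        // OAK-319); Store.NO indexes it only, giving smaller stored data and
        // faster document loads
        doc.add(new TextField(":fulltext", text,
                storeForMlt ? Field.Store.YES : Field.Store.NO));
        return doc;
    }
}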
Slow full text query performance and Lucene Index handling in Oak
Hi, As part of OAK-1702 [1] I have added a benchmark to compare the performance of full text query search with JR2. Based on the approach taken (which might be wrong) I get the following numbers:

Apache Jackrabbit Oak 0.21.0-SNAPSHOT
# FullTextSearchTest  C  min  10%  50%  90%  max  N
Oak-Mongo             1   58   71  101  119  287  610
Oak-Mongo-FDS         1   50   51   52   58  184  1106
Oak-Tar               1   39   40   40   44   64  1459
Oak-Tar-FDS           1   53   54   55   64  197  1030
Jackrabbit            1    4    4    5    6  231  11385

This shows that JR2 performs a lot better for full text queries, and subsequent queries are quite a bit faster once Lucene has warmed up. Looking at the current usage of Lucene in Oak and the way we store and access the Lucene indexes [2], I have a couple of doubts:

1. Multiple IndexSearcher instances - The current impl would create a new IndexSearcher for every Lucene query, as the OakDirectory used is bound to the NodeState of the executing JCR session. Compared to this, in JR2 we probably had a singleton IndexSearcher which was shared across all the query execution paths. This would potentially cause performance issues, as Lucene is effectively used in a stateless way and has to perform initialization for every call. As per [3] the IndexSearcher must be shared.

2. Index access - Currently we have a custom OakDirectory which provides access to Lucene indexes stored in the NodeStore. Even with SegmentStore, which has memory-mapped files, the random access used by Lucene would probably be a lot slower with OakDirectory in comparison to the default Lucene MMapDirectory. For small setups where the Lucene index can be accommodated on each node, I think it would be better if the index is accessed from the file system.

Are the above concerns valid, and should we re-look at how we are using Lucene in Oak? Chetan Mehrotra [1] https://issues.apache.org/jira/browse/OAK-1702 [2] https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/OakDirectory.java [3] http://wiki.apache.org/lucene-java/ImproveSearchingSpeed
Re: svn commit: r1577449 - in /jackrabbit/oak/trunk/oak-core/src: main/java/org/apache/jackrabbit/oak/plugins/segment/ main/java/org/apache/jackrabbit/oak/plugins/segment/file/ main/java/org/apache/ja
On Wed, Apr 2, 2014 at 11:36 AM, Jukka Zitting jukka.zitt...@gmail.com wrote: I consider this an unfortunate recent development. Not sure. There are some deployment scenarios where a shared FileDataStore is a must-have requirement, and thus we need to support cases where blobs can be stored separately from the node data. Yes it adds to the complexity of backup, but if such a feature is required then that cost has to be paid. Default setups currently do not use a FileDataStore or BlobStore with SegmentNodeStore, so per the defaults the original design is still honored. Chetan Mehrotra
Re: svn commit: r1583285 - /jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/value/ValueImpl.java
On Wed, Apr 2, 2014 at 12:18 PM, Jukka Zitting jukka.zitt...@gmail.com wrote: The getContentIdentity() method has a specific contract and the return value should generally not be interpreted as a referenceable identifier. Ack. If you need a method that exposes the blobId, it would be best to add a separate method for that. But note that not all Blob implementations have a blobId like BlobStoreBlob does. For now there is no strong requirement for that; if the need arises I would follow up this way. Chetan Mehrotra
Re: Question regarding missing _lastRev recovery - OAK-1295
The lease time is set to 1 minute. Would it be ok to check this every minute, from every node?

Adding to that, the default time intervals are:
- asyncDelay = 1 sec - the background operations are performed every second per cluster node. If nothing changes we would fire 1 query/sec/cluster node to check the head revision.
- cluster lease time = 1 min - the time after which a cluster lease is renewed.

So we need to decide the time interval for the job detecting the recovery condition. Chetan Mehrotra

On Wed, Apr 2, 2014 at 4:31 PM, Amit Jain am...@ieee.org wrote: Hi, 1) a cluster node starts up and sees it didn't shut down properly. I'm not sure this information is available, but remember we discussed this once. Yes, this case has been taken care of in the startup. this check could be done in the background operations thread on a regular basis. probably depending on the lease interval. The lease time is set to 1 minute. Would it be ok to check this every minute, from every node? Thanks Amit

On Wed, Apr 2, 2014 at 4:14 PM, Marcel Reutegger mreut...@adobe.com wrote: Hi, I think the recovery should be triggered automatically by the system when: 1) a cluster node starts up and sees it didn't shut down properly. I'm not sure this information is available, but remember we discussed this once. 2) a cluster node sees a lease timeout of another cluster node and initiates the recovery for the failed cluster node. this check could be done in the background operations thread on a regular basis. probably depending on the lease interval. In addition it would probably also be useful to have the recovery operation available as a command in oak-run. that way you can manually trigger it from the command line. WDYT? Regards Marcel

How do we expose the _lastRev recovery operation? This would need to check all the cluster nodes' info and run recovery for those nodes which need it. 1. We either have a scheduled job which checks all the nodes and runs the recovery - what should be the interval to trigger the job? 2. Or, if we want it run only when triggered manually, expose an appropriate MBean. Thanks Amit
Re: svn commit: r1577449 - in /jackrabbit/oak/trunk/oak-core/src: main/java/org/apache/jackrabbit/oak/plugins/segment/ main/java/org/apache/jackrabbit/oak/plugins/segment/file/ main/java/org/apache/ja
@Chetan: why would the configs not be stored in the repo? I do not see how this relates to non-OSGi environments

Well, that's the basic config required to configure DocumentNodeStore/SegmentNodeStore; these configs cannot be stored as content. Other settings, like the security-related config, are currently not read from the NodeStore and in an OSGi env are provided by the OSGi ConfigAdmin. And yet other settings, like index definitions, are currently stored as content. Chetan Mehrotra

On Wed, Apr 2, 2014 at 3:49 PM, Michael Marth mma...@adobe.com wrote: On 02 Apr 2014, at 08:06, Jukka Zitting jukka.zitt...@gmail.com wrote: That design gets broken if components start storing data separately in the repository folder. Agree with that design principle, but the (shared) file system DS is a valid exception IMO (same for the S3 DS). Later we would probably store the config files when using Oak outside of a std OSGi env, like with PojoSR. @Chetan: why would the configs not be stored in the repo? I do not see how this relates to non-OSGi environments
Re: jackrabbit-oak build #3994: Broken
Test case failure on oak-solr:

Failed tests:
testOffsetAndLimit(org.apache.jackrabbit.core.query.LimitAndOffsetTest): expected:<1> but was:<0>
testOffsetAndLimitWithGetSize(org.apache.jackrabbit.core.query.LimitAndOffsetTest): expected:<2> but was:<0>

Chetan Mehrotra

On Wed, Apr 2, 2014 at 6:04 PM, Travis CI ju...@apache.org wrote: Build Update for apache/jackrabbit-oak - Build: #3994 Status: Broken Duration: 2194 seconds Commit: 0e0a47ec387626e494a65dd143e3a25a3d004abe (trunk) Author: Julian Reschke Message: OAK-1533 - remove JDBC URL specific constructors from -core git-svn-id: https://svn.apache.org/repos/asf/jackrabbit/oak/trunk@1583981 13f79535-47bb-0310-9956-ffa450edef68 View the changeset: https://github.com/apache/jackrabbit-oak/compare/1402e7db17c0...0e0a47ec3876 View the full build log and details: https://travis-ci.org/apache/jackrabbit-oak/builds/22096647 -- sent by Jukka's Travis notification gateway
Re: svn commit: r1583994 - in /jackrabbit/oak/trunk/oak-core/src: main/java/org/apache/jackrabbit/oak/plugins/blob/datastore/OakFileDataStore.java test/java/org/apache/jackrabbit/oak/plugins/blob/data
On Wed, Apr 2, 2014 at 6:30 PM, Jukka Zitting jukka.zitt...@gmail.com wrote: The inUse map is in FileDataStore for a reason.

Ack. What I have understood from the blob GC logic in Oak is that it relies on the blob's last-modified value to distinguish actively used blobs. So when performing GC, only those blobs are considered whose lastModified value is older than, say, 1 day; only these blobs are candidates for deletion. This ensures that blobs created in the transient space are not considered for GC. The current logic does make the assumption that 1 day is sufficient time, and is hence not foolproof. However the current impl of inUse would probably only work for a single-node system and would fail for the shared DataStore scenario, as it's in-memory state and it's hard to determine the inUse state for the whole cluster. For supporting such a case we would have to rely on the lastModified time interval to distinguish actively used blobs. Regards, Chetan Mehrotra
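Restating that guard as code, purely as a sketch; the BlobRecord shape and method names are assumptions:

import java.util.ArrayList;
import java.util.List;

public class BlobGcCandidates {
    interface BlobRecord { // assumed shape
        long getLastModified();
        String getId();
    }

    // only blobs last modified more than maxAgeMillis ago may be deleted,
    // so blobs still sitting in some transient space are left alone
    static List<String> candidates(Iterable<BlobRecord> blobs, long maxAgeMillis) {
        long cutOff = System.currentTimeMillis() - maxAgeMillis;
        List<String> result = new ArrayList<String>();
        for (BlobRecord b : blobs) {
            if (b.getLastModified() < cutOff) {
                result.add(b.getId());
            }
        }
        return result;
    }
}

With the current defaults described above, maxAgeMillis would be on the order of TimeUnit.DAYS.toMillis(1).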
Re: svn commit: r1583325 - in /jackrabbit/oak/trunk: oak-auth-external/pom.xml oak-core/pom.xml oak-jcr/pom.xml oak-mk-perf/pom.xml oak-mk/pom.xml oak-run/pom.xml oak-upgrade/pom.xml
Might be simpler to define the version in oak-parent. Chetan Mehrotra

On Mon, Mar 31, 2014 at 6:57 PM, resc...@apache.org wrote:
Author: reschke
Date: Mon Mar 31 13:27:46 2014
New Revision: 1583325
URL: http://svn.apache.org/r1583325
Log: use the latest H2 DB throughout
Modified:
jackrabbit/oak/trunk/oak-auth-external/pom.xml (binary files - no diff available)
jackrabbit/oak/trunk/oak-core/pom.xml
jackrabbit/oak/trunk/oak-jcr/pom.xml
jackrabbit/oak/trunk/oak-mk-perf/pom.xml
jackrabbit/oak/trunk/oak-mk/pom.xml
jackrabbit/oak/trunk/oak-run/pom.xml
jackrabbit/oak/trunk/oak-upgrade/pom.xml

Each listed pom gets the same version bump for its h2 dependency (some entries additionally carry <optional>true</optional> or <scope>test</scope>):

    <dependency>
      <groupId>com.h2database</groupId>
      <artifactId>h2</artifactId>
-     <version>1.3.158</version>
+     <version>1.3.175</version>
    </dependency>
Re: svn commit: r1583325 - in /jackrabbit/oak/trunk: oak-auth-external/pom.xml oak-core/pom.xml oak-jcr/pom.xml oak-mk-perf/pom.xml oak-mk/pom.xml oak-run/pom.xml oak-upgrade/pom.xml
You can define the version in the dependencyManagement section and that would be inherited by the child projects. For example, there are entries for junit, easymock etc. In the child project you then only define the groupId and artifactId. Chetan Mehrotra On Mon, Mar 31, 2014 at 7:22 PM, Julian Reschke julian.resc...@gmx.de wrote: On 2014-03-31 15:31, Chetan Mehrotra wrote: Might be simpler to define the version in oak-parent Chetan Mehrotra ... Likely. I quickly looked at oak-parent and couldn't see any test dependencies over there, so decided to leave it alone for now... Best regards, Julian
Re: svn commit: r1583285 - /jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/value/ValueImpl.java
+1. I have also missed having a clean way to get the blobId from a Blob, so adding this method would be useful in other cases also. Chetan Mehrotra

On Tue, Apr 1, 2014 at 8:05 AM, Jukka Zitting jukka.zitt...@gmail.com wrote: Hi, On Mon, Mar 31, 2014 at 3:25 PM, Michael Dürig mdue...@apache.org wrote: 2nd try: http://svn.apache.org/r1583413 That's more correct, but has horrible performance with any implementation (including BlobStoreBlob and SegmentBlob) that doesn't precompute the hash. As mentioned earlier, a better alternative would be to add an explicit method for this and let the implementations decide what the best identifier would be. For BlobStoreBlob that would likely be:

public String getContentIdentity() {
    return blobId;
}

And for SegmentBlob:

public String getContentIdentity() {
    return getRecordId().toString();
}

BR, Jukka Zitting
Re: [DocumentNodeStore] Clarify behaviour for Commit.getModified
I think the intention of the method is to return a value in seconds with a five second resolution.

Makes sense. Changed the logic to use seconds and also fixed the method name/constants to reflect that. Chetan Mehrotra

On Fri, Mar 28, 2014 at 3:28 PM, Marcel Reutegger mreut...@adobe.com wrote: Hi, the fix looks good, but why do you want to convert the seconds to milliseconds again at the end? I think the intention of the method is to return a value in seconds with a five second resolution. we definitely need to add javadoc :-/ Regards Marcel

-Original Message- From: Chetan Mehrotra [mailto:chetan.mehro...@gmail.com] Sent: Donnerstag, 27. März 2014 18:05 To: oak-dev@jackrabbit.apache.org Subject: [DocumentNodeStore] Clarify behaviour for Commit.getModified

Hi, Currently Commit.getModified has the following impl:

public static long getModified(long timestamp) {
    // 5 second resolution
    return timestamp / 1000 / 5;
}

The result when treated as a timestamp causes the time to be set to 0, i.e. 1970. I intend to fix this with (looking at the comment):

public static long getModified(long timestamp) {
    long timeInSec = TimeUnit.MILLISECONDS.toSeconds(timestamp);
    timeInSec = timeInSec - timeInSec % 5;
    return TimeUnit.SECONDS.toMillis(timeInSec);
}

Would that be the correct approach? Chetan Mehrotra
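A sketch of the seconds-based variant the thread converged on; the method name is illustrative and the actual name in Oak may differ:

import java.util.concurrent.TimeUnit;

public class ModifiedTime {
    // 5 second resolution, value kept in seconds rather than milliseconds
    public static long getModifiedInSecs(long timestampMillis) {
        long timeInSec = TimeUnit.MILLISECONDS.toSeconds(timestampMillis);
        return timeInSec - timeInSec % 5;
    }
}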
Re: jackrabbit-oak build #3922: Errored
Travis says "I'm sorry but your test run exceeded 50.0 minutes. One possible solution is to split up your test run." - Chetan Mehrotra On Fri, Mar 28, 2014 at 4:49 PM, Travis CI ju...@apache.org wrote: Build Update for apache/jackrabbit-oak - Build: #3922 Status: Errored Duration: 3002 seconds Commit: 4690d169b8469689436d14e3cadfe8f56621f99f (trunk) Author: Marcel Reutegger Message: OAK-1341 - DocumentNodeStore: Implement revision garbage collection (WIP) JavaDoc git-svn-id: https://svn.apache.org/repos/asf/jackrabbit/oak/trunk@1582676 13f79535-47bb-0310-9956-ffa450edef68 View the changeset: https://github.com/apache/jackrabbit-oak/compare/67f784241281...4690d169b846 View the full build log and details: https://travis-ci.org/apache/jackrabbit-oak/builds/21746766 -- sent by Jukka's Travis notification gateway
Remove SynchronizedDocumentStoreWrapper
I see two similar classes:
- org.apache.jackrabbit.oak.plugins.document.rdb.SynchronizedDocumentStoreWrapper
- org.apache.jackrabbit.oak.plugins.document.util.SynchronizingDocumentStoreWrapper
These are not being used anywhere, and I am not sure what purpose they serve. Should they be removed? Further, can we implement such wrappers via proxies, as the current approach increases the work whenever new methods are added to DocumentStore? Chetan Mehrotra
Re: Remove SynchronizedDocumentStoreWrapper
On Thu, Mar 27, 2014 at 2:30 PM, Julian Reschke julian.resc...@gmx.de wrote: I created the first one, and use it occasionally for debugging. (it's similar to the Logging*Wrapper. Please do not remove.

OK, but there is already one present in util. Are they different? If not, we should keep only one.

Example? Something like:

import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

public class SynchronizedDocumentStoreWrapper2 {

    public static DocumentStore create(DocumentStore documentStore) {
        return (DocumentStore) Proxy.newProxyInstance(
                SynchronizedDocumentStoreWrapper2.class.getClassLoader(),
                new Class<?>[] {DocumentStore.class},
                new DocumentStoreProxy(documentStore));
    }

    private static class DocumentStoreProxy implements InvocationHandler {
        private final DocumentStore delegate;
        private final Object lock = new Object();

        private DocumentStoreProxy(DocumentStore delegate) {
            this.delegate = delegate;
        }

        @Override
        public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
            // serialize every DocumentStore call on a single lock
            synchronized (lock) {
                return method.invoke(delegate, args);
            }
        }
    }
}

Chetan Mehrotra
Re: Remove SynchronizedDocumentStoreWrapper
On Thu, Mar 27, 2014 at 4:00 PM, Julian Reschke julian.resc...@gmx.de wrote: We can kill the one in rdb (I didn't see the other one when I added it). Will do. Chetan Mehrotra
Re: AbstractBlobStoreTest
On Thu, Mar 27, 2014 at 10:08 PM, Julian Reschke julian.resc...@gmx.de wrote: if there's a reason not to It might affect tests related to GC, as the GC logic would then clean more than the expected set of blobs. Chetan Mehrotra
[DocumentNodeStore] Clarify behaviour for Commit.getModified
Hi, Currently Commit.getModified has the following impl:

public static long getModified(long timestamp) {
    // 5 second resolution
    return timestamp / 1000 / 5;
}

The result when treated as a timestamp causes the time to be set to 0, i.e. 1970. I intend to fix this with (looking at the comment):

public static long getModified(long timestamp) {
    long timeInSec = TimeUnit.MILLISECONDS.toSeconds(timestamp);
    timeInSec = timeInSec - timeInSec % 5;
    return TimeUnit.SECONDS.toMillis(timeInSec);
}

Would that be the correct approach? Chetan Mehrotra
Re: jackrabbit-oak build #3838: Broken
I fixed that issue yesterday, but the build is currently failing in the rat check:

Warning: org.apache.xerces.parsers.SAXParser: Property 'http://www.oracle.com/xml/jaxp/properties/entityExpansionLimit' is not recognized.
[INFO] Rat check: Summary of files. Unapproved: 1 unknown: 1 generated: 0 approved: 1031 licence.
[INFO]
[INFO] Reactor Summary:

rat.txt points to new files that get created under oak-core (oaknodes.trace.db), which look like H2 db files. Probably the test case needs to be adjusted to create these files in the target folder. Chetan Mehrotra

On Tue, Mar 25, 2014 at 6:20 PM, Chetan Mehrotra chetan.mehro...@gmail.com wrote: On Tue, Mar 25, 2014 at 5:49 PM, Michael Dürig mdue...@apache.org wrote: CacheInvalidationIT Looking into it Chetan Mehrotra
Re: jackrabbit-oak build #3838: Broken
Fixed the RDBDocumentStore to create the file in the target folder. However the current approach would cause issues in a test env. Will start a separate thread on that. Chetan Mehrotra

On Wed, Mar 26, 2014 at 12:52 PM, Chetan Mehrotra chetan.mehro...@gmail.com wrote: I fixed that issue yesterday, but the build is currently failing in the rat check: Warning: org.apache.xerces.parsers.SAXParser: Property 'http://www.oracle.com/xml/jaxp/properties/entityExpansionLimit' is not recognized. [INFO] Rat check: Summary of files. Unapproved: 1 unknown: 1 generated: 0 approved: 1031 licence. [INFO] [INFO] Reactor Summary: rat.txt points to new files that get created under oak-core (oaknodes.trace.db), which look like H2 db files. Probably the test case needs to be adjusted to create these files in the target folder. Chetan Mehrotra On Tue, Mar 25, 2014 at 6:20 PM, Chetan Mehrotra chetan.mehro...@gmail.com wrote: On Tue, Mar 25, 2014 at 5:49 PM, Michael Dürig mdue...@apache.org wrote: CacheInvalidationIT Looking into it Chetan Mehrotra
Re: Request for feedback: OSGi Configuration for Query Limits (OAK-1571)
Patch looks fine to me. Probably we can collapse QueryIndexProvider and QueryEngineSettings into a single QueryEngineContext and pass that along down to Root (a hypothetical sketch follows below).

So: is it worth it to have the 100 KB source code overhead just to make things configurable separately for each Oak instance?

I think there are a couple of benefits:
* Isolation between multiple Oak instances running in the same JVM (minor)
* It opens up the possibility of session-specific settings. If we later require, say, JR2-compatible behaviour for some session, then those settings can be overlaid via session attributes
* It allows changing the settings at runtime via a GUI, as some of these settings do not require a repository restart and can affect the next query that gets executed. That would be a major win

So this effort now would enable incremental improvements in the QueryEngine in future!

The Whiteboard is per Oak instance, right?

For the OSGi case, yes. Chetan Mehrotra

On Wed, Mar 26, 2014 at 2:23 PM, Thomas Mueller muel...@adobe.com wrote: Hi, I'm trying to make some query settings (limits on the number of nodes read) configurable via OSGi. So far, I have a patch of about 100 KB, and this is just wiring together the components (no OSGi / Whiteboard so far). I wonder, is there an easier way to do it? With system properties, it's just a few lines of code. The disadvantage is that all Oak instances in the same JVM use the same settings, but with OSGi configuration I guess in reality it's not much different. So: is it worth it to have the 100 KB source code overhead just to make things configurable separately for each Oak instance? If not, how could it be implemented? The Whiteboard is per Oak instance, right? Regards, Thomas
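A hypothetical shape for the suggested QueryEngineContext; nothing like this exists yet, it only illustrates collapsing the two parameters into one (the QueryEngineSettings package is assumed, as the class is introduced by the patch under review):

import org.apache.jackrabbit.oak.spi.query.QueryIndexProvider;
// package of QueryEngineSettings assumed
import org.apache.jackrabbit.oak.query.QueryEngineSettings;

public class QueryEngineContext {
    private final QueryIndexProvider indexProvider;
    private final QueryEngineSettings settings;

    public QueryEngineContext(QueryIndexProvider indexProvider,
                              QueryEngineSettings settings) {
        this.indexProvider = indexProvider;
        this.settings = settings;
    }

    public QueryIndexProvider getIndexProvider() {
        return indexProvider;
    }

    public QueryEngineSettings getSettings() {
        return settings;
    }
}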
Re: jackrabbit-oak build #3838: Broken
On Tue, Mar 25, 2014 at 5:49 PM, Michael Dürig mdue...@apache.org wrote: CacheInvalidationIT Looking into it Chetan Mehrotra
Re: jackrabbit-oak build #3809: Broken
My fault. Looking into it Chetan Mehrotra On Mon, Mar 24, 2014 at 11:58 AM, Travis CI ju...@apache.org wrote: Build Update for apache/jackrabbit-oak - Build: #3809 Status: Broken Duration: 444 seconds Commit: afb6c5335b46067a3ea43ce69c987a46d9a3fd38 (trunk) Author: Chetan Mehrotra Message: OAK-1586 - Implement checkpoint support in DocumentNodeStore Initial implementation which stores the checkpoint data as part of NODES collection -- Using Clock for determining current time to simplify testing -- Custom Clock can be specified via Builder git-svn-id: https://svn.apache.org/repos/asf/jackrabbit/oak/trunk@1580769 13f79535-47bb-0310-9956-ffa450edef68 View the changeset: https://github.com/apache/jackrabbit-oak/compare/6b79fda2e9f2...afb6c5335b46 View the full build log and details: https://travis-ci.org/apache/jackrabbit-oak/builds/21404793 -- sent by Jukka's Travis notification gateway
Re: friendly reminder about license headers
Roger that! For Java files the IDE takes care of them. Probably we can just exclude test/resources from the rat plugin? Most of the missing headers are probably reported there. Chetan Mehrotra On Thu, Mar 20, 2014 at 2:50 PM, Alex Parvulescu alex.parvule...@gmail.com wrote: Yes boys and girls, files need licence headers! Please check new files before committing them; in the last 2 days I found 3 occurrences, probably more than the entire last month put together. When in doubt, run your builds with the pedantic profile activated (mvn clean install -PintegrationTesting,pedantic). your friendly release manager
Re: svn commit: r1578423 - in /jackrabbit/oak/trunk/oak-core: ./ src/main/java/org/apache/jackrabbit/oak/plugins/document/rdb/
It would be better if we make the DataSource part of the Builder itself and remove all such logic from RDB. Or better, for testing purposes or even in the normal case, the RDBDocumentStore could be instantiated outside and passed to the Builder. Then the main code would not have to worry about creating a DataSource and handling all the possible config options that come with that. Chetan Mehrotra

On Mon, Mar 17, 2014 at 8:38 PM, resc...@apache.org wrote:
Author: reschke
Date: Mon Mar 17 15:08:55 2014
New Revision: 1578423
URL: http://svn.apache.org/r1578423
Log: OAK-1533 - add dbcp BasicDataSource (WIP)
Added: jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/rdb/RDBDataSourceFactory.java (with props)
Modified:
jackrabbit/oak/trunk/oak-core/pom.xml
jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/rdb/RDBBlobStore.java
jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/rdb/RDBDocumentStore.java

oak-core/pom.xml gains a dbcp dependency:

+    <dependency>
+      <groupId>commons-dbcp</groupId>
+      <artifactId>commons-dbcp</artifactId>
+      <version>1.4</version>
+    </dependency>

RDBBlobStore.java switches from DriverManager to the new factory (the same change in both constructors):

-        Connection connection = DriverManager.getConnection(jdbcurl, "sa", "");
-        initialize(connection);
+        DataSource ds = RDBDataSourceFactory.forJdbcUrl(jdbcurl, "sa", "");
+        initialize(ds.getConnection());

-        Connection connection = DriverManager.getConnection(jdbcurl, username, password);
-        initialize(connection);
+        DataSource ds = RDBDataSourceFactory.forJdbcUrl(jdbcurl, username, password);
+        initialize(ds.getConnection());

Added: jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/rdb/RDBDataSourceFactory.java
URL: http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/rdb/RDBDataSourceFactory.java?rev=1578423&view=auto
--- jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/rdb/RDBDataSourceFactory.java (added)
+++ jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/rdb
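A sketch of the suggested decoupling; the setRDBConnection builder method is an assumption here, only forJdbcUrl appears in the commit:

import javax.sql.DataSource;
import org.apache.jackrabbit.oak.plugins.document.DocumentMK;
import org.apache.jackrabbit.oak.plugins.document.rdb.RDBDataSourceFactory;

public class RdbSetup {
    static DocumentMK create(String jdbcUrl, String user, String password) {
        // the caller owns DataSource creation and all its config options
        DataSource ds = RDBDataSourceFactory.forJdbcUrl(jdbcUrl, user, password);
        // the store only ever sees a ready-made DataSource
        return new DocumentMK.Builder()
                .setRDBConnection(ds)
                .open();
    }
}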
Re: failing oak build
Looking into it. Chetan Mehrotra On Tue, Mar 18, 2014 at 2:10 PM, Alex Parvulescu alex.parvule...@gmail.com wrote: Hi, The Oak build is failing Tests in error: testEmptyIdentifier(org.apache.jackrabbit.oak.plugins.document.blob.ds.MongoDataStoreBlobStoreTest): String index out of range: 2 The 0.19 release is really close, it is really important to get the build stable again. best, alex
Re: svn commit: r1578943 - in /jackrabbit/oak/trunk/oak-core/src: main/java/org/apache/jackrabbit/oak/ main/java/org/apache/jackrabbit/oak/plugins/backup/ main/java/org/apache/jackrabbit/oak/plugins/b
I have updated the ~/.subversion/config file with those setting but I use git-svn for my local development. Do I need to tweak any git specific setting for EOL handling? Currently my autocrlf = input in ~/.gitconfig Chetan Mehrotra On Tue, Mar 18, 2014 at 8:38 PM, Jukka Zitting jukka.zitt...@gmail.com wrote: Hi, On Tue, Mar 18, 2014 at 11:01 AM, resc...@apache.org wrote: jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/RepositoryManager.java (props changed) jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/backup/FileStoreBackupRestore.java (props changed) jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/backup/FileStoreBackupRestoreMBean.java (props changed) jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/blob/BlobGC.java (props changed) jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/blob/BlobGCMBean.java (props changed) jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/spi/state/RevisionGC.java (props changed) jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/spi/state/RevisionGCMBean.java (props changed) jackrabbit/oak/trunk/oak-core/src/test/java/org/apache/jackrabbit/oak/plugins/index/property/OrderDirectionEnumTest.java (props changed) jackrabbit/oak/trunk/oak-core/src/test/java/org/apache/jackrabbit/oak/plugins/index/property/OrderedIndexCostTest.java (props changed) jackrabbit/oak/trunk/oak-core/src/test/java/org/apache/jackrabbit/oak/plugins/index/property/ValuePathTupleTest.java (props changed) jackrabbit/oak/trunk/oak-core/src/test/java/org/apache/jackrabbit/oak/spi/state/AbstractRebaseDiffTest.java (props changed) People who added these files, please check your eol-style settings: http://www.apache.org/dev/svn-eol-style.txt BR, Jukka Zitting
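For reference, a sketch of the two settings in play here - the svn auto-props that stamp svn:eol-style on newly added files, and the git side for git-svn checkouts (the file patterns are illustrative; enable-auto-props is the stock svn config knob):

    # ~/.subversion/config
    [miscellany]
    enable-auto-props = yes

    [auto-props]
    *.java = svn:eol-style=native
    *.xml = svn:eol-style=native

    # ~/.gitconfig - for git-svn working copies; "input" converts CRLF
    # to LF on commit and leaves files untouched on checkout
    [core]
        autocrlf = input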
Using DBCursor in MongoDocumentStore
Hi, I was looking into the support for Version GC (OAK-1341). For that I had a look at BlobReferenceIterator#loadBatchQuery which currently paginates over all Documents which have binaries. The DBCursor returned by Mongo [1] implements the Iterator contract and would lazily fetch the next batch. So I had two queries 1. Should we expose the Iterator as part of the DocumentStore API and rely on Mongo to paginate. Probably the ResultSet in the DB case can also easily support the Iterator pattern 2. Cache handling - Currently fetching documents in such a pattern for GC handling would also overflow the cache. So should we pass an extra flag to not cache docs from such calls Chetan Mehrotra [1] http://api.mongodb.org/java/2.0/com/mongodb/DBCursor.html
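For illustration, a minimal sketch of the lazy batching behaviour with the 2.x Java driver (the collection and field names here are made up, not the actual MongoDocumentStore schema):

    import java.net.UnknownHostException;
    import com.mongodb.*;

    void iterateBinaryDocs() throws UnknownHostException {
        DB db = new MongoClient("localhost").getDB("oak");
        DBCollection nodes = db.getCollection("nodes");
        // DBCursor implements Iterator<DBObject>; batches are fetched
        // from the server lazily as the client iterates
        DBCursor cursor = nodes.find(new BasicDBObject("hasBinary", true)).batchSize(100);
        try {
            while (cursor.hasNext()) {
                DBObject doc = cursor.next();
                // collect blob references from doc; only the current
                // batch is held in memory at any time
            }
        } finally {
            cursor.close();
        }
    }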
Build currently failing in SegmentMicroKernelFixture
Hi Team, I was trying to run the IT tests locally before I push in my changes related to DataStore. However it seems that the test cases are currently failing even without my change.

- Apache CI - http://ci.apache.org/builders/oak-trunk/ - There has been no build post rev 1576949 (current 1577399)
- Travis CI - https://travis-ci.org/apache/jackrabbit-oak/builds - Failing because the updated JR SNAPSHOTs for 2.8 are not present, causing a compilation failure

On running the tests locally the following test fails for me

Tests run: 116, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 10.488 sec <<< FAILURE!
testLargeBlob[1](org.apache.jackrabbit.mk.test.MicroKernelIT) Time elapsed: 2.589 sec <<< FAILURE!
java.lang.AssertionError: data does not match expected:<24> but was:<48>
    at org.junit.Assert.fail(Assert.java:93)
    at org.junit.Assert.failNotEquals(Assert.java:647)
    at org.junit.Assert.assertEquals(Assert.java:128)
    at org.junit.Assert.assertEquals(Assert.java:472)
    at org.apache.jackrabbit.mk.test.MicroKernelIT.testBlobs(MicroKernelIT.java:1382)

On debugging it seems to be a failure in SegmentMicroKernelFixture, and it comes only when SegmentNodeStore is used with MemoryStore Chetan Mehrotra
Re: Using other JVM supported language like Groovy for testcase
On Fri, Mar 14, 2014 at 1:36 PM, Davide Giannella giannella.dav...@gmail.com wrote: Personally I don't mind but it would complicate the life of someone who doesn't know groovy and has to maintain the code later on. I understand the concern here. However Groovy is much easier for a Java developer to understand compared to other languages like Scala. Also for now I am proposing to use it in a small module like oak-pojosr as an experiment. After some time we can weigh the benefits and then decide if wider usage should be encouraged or not. Chetan Mehrotra
Review of patches related to BlobStore related work
Hi, As part of the work related to enabling usage of custom BlobStore implementations I have broken the work down into patches for * OAK-1502 - Make DataStores available to NodeStores [1] * OAK-805 - Support for existing Jackrabbit 2.x DataStores [2] * OAK-1333 - SegmentMK: Support for Blobs in external storage [3] To see the full changes together (to get a better picture) you can have a look at the GitHub fork [4] and the last 3 commits there. As these changes need to be part of the next release (by 13 March), it would be helpful if they can be reviewed soon! Chetan Mehrotra [1] https://issues.apache.org/jira/browse/OAK-1502 [2] https://issues.apache.org/jira/browse/OAK-805 [3] https://issues.apache.org/jira/browse/OAK-1333 [4] https://github.com/chetanmeh/jackrabbit-oak/commits/OAK-1502
Using other JVM supported language like Groovy for testcase
Just testing the waters here :) I need to write a couple of integration test cases related to checking OSGi configuration support (OAK-1502) in the oak-pojosr module. It would be much faster and more convenient for me if I can write these test cases in Groovy. So would it be fine if I use the Groovy language for writing test cases only in the oak-pojosr module? Chetan Mehrotra
Re: [DISCUSS] - OSGi deployment for Oak Lucene / Solr indexers
A couple of things to try * Specify the package versions via package-info.java * Inline the classes instead of embedding the jars This would enable maven-bundle-plugin to see the required package-info.java files for versions and also the SCR generated files. Also, can you share your project, say on GitHub? It would be easier for me to try some options Chetan Mehrotra On Wed, Mar 12, 2014 at 3:55 PM, Tommaso Teofili tommaso.teof...@gmail.com wrote: update on this: I've tried the oak-fulltext approach and I found two issues: 1. exported packages with semantic versioning from oak-lucene and oak-solr get lost when packing everything together unless they're explicitly specified (by hand) in the oak-fulltext maven-bundle-plugin configuration, it can be done but can be tedious (and it's error prone) 2. OSGi services exported by oak-lucene and oak-solr don't get exported by oak-fulltext as maven-scr-plugin can look into src/main/java or classes but I don't know if / how it could work with embedded jars. Any suggestions? Regards, Tommaso 2014-03-11 9:00 GMT+01:00 Tommaso Teofili tommaso.teof...@gmail.com: if there're no other objections / comments I'll go with the last suggested approach of having oak-lucene and oak-solr not embedding anything and having the oak-fulltext bundle embedding everything needed to make Lucene and Solr indexers working in OSGi (lucene-*, oak-lucene, solr-*, oak-solr-*, etc.) until we (eventually) get to proper semantic versioning in Lucene / Solr. As a side effect I don't think it would make sense to keep oak-solr-embedded and oak-solr-remote as separate artifacts so I'd merge them with oak-solr-core in one oak-solr bundle. Regards, Tommaso 2014-03-10 18:18 GMT+01:00 Tommaso Teofili tommaso.teof...@gmail.com: ah ok, thanks for clarifying. Regards, Tommaso 2014-03-10 18:10 GMT+01:00 Jukka Zitting jukka.zitt...@gmail.com: Hi, On Mon, Mar 10, 2014 at 1:01 PM, Tommaso Teofili tommaso.teof...@gmail.com wrote: ok, so (in OSGi env) we would have oak-solr and oak-fulltext as fragments of oak-lucene (being the fragment host) No, that's not what I meant. The proposed oak-fulltext bundle would contain all of oak-lucene, oak-solr, and the Lucene/Solr dependencies. No need for fragment bundles in this case. BR, Jukka Zitting
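For the inlining option, a sketch of what the oak-fulltext POM instruction could look like - maven-bundle-plugin supports an inline=true clause on Embed-Dependency that unpacks the dependency's classes and resources into the bundle itself (artifact names are illustrative):

    <plugin>
      <groupId>org.apache.felix</groupId>
      <artifactId>maven-bundle-plugin</artifactId>
      <configuration>
        <instructions>
          <!-- inline=true unpacks the classes/resources into this bundle,
               so package-info files and OSGI-INF descriptors stay visible -->
          <Embed-Dependency>oak-lucene;inline=true,oak-solr-core;inline=true</Embed-Dependency>
        </instructions>
      </configuration>
    </plugin>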
Re: buildbot failure in ASF Buildbot on oak-trunk
There were a couple of different commits, and the issues highlighted in the build failure have been addressed in a later changelist. The test cases passed on my local setup. Would wait for further reports Chetan Mehrotra On Tue, Mar 11, 2014 at 1:17 PM, build...@apache.org wrote: The Buildbot has detected a new failure on builder oak-trunk while building ASF Buildbot. Full details are available at: http://ci.apache.org/builders/oak-trunk/builds/4628 Buildbot URL: http://ci.apache.org/ Buildslave for this Build: osiris_ubuntu Build Reason: scheduler Build Source Stamp: [branch jackrabbit/oak/trunk] 1576207 Blamelist: chetanm BUILD FAILED: failed compile sincerely, -The Buildbot
Re: svn commit: r1576236 - /jackrabbit/oak/trunk/oak-pojosr/pom.xml
Thanks Julian! Missed that part while porting the project from GitHub Chetan Mehrotra

On Tue, Mar 11, 2014 at 2:32 PM, resc...@apache.org wrote:

Author: reschke
Date: Tue Mar 11 09:02:43 2014
New Revision: 1576236
URL: http://svn.apache.org/r1576236

Log: OAK-1522 - fix POM

Modified:
    jackrabbit/oak/trunk/oak-pojosr/pom.xml

Modified: jackrabbit/oak/trunk/oak-pojosr/pom.xml
URL: http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-pojosr/pom.xml?rev=1576236&r1=1576235&r2=1576236&view=diff
==============================================================================
--- jackrabbit/oak/trunk/oak-pojosr/pom.xml (original)
+++ jackrabbit/oak/trunk/oak-pojosr/pom.xml Tue Mar 11 09:02:43 2014
@@ -24,6 +24,7 @@
     <groupId>org.apache.jackrabbit</groupId>
     <artifactId>oak-parent</artifactId>
     <version>0.19-SNAPSHOT</version>
+    <relativePath>../oak-parent/pom.xml</relativePath>
   </parent>
   <artifactId>oak-pojosr</artifactId>
Re: [DISCUSS] - OSGi deployment for Oak Lucene / Solr indexers
My vote would be to go for #3 move the OSGi services we need for Solr in Oak into oak-solr-osgi (as a fragment cannot run OSGi components/services) Need not be. A host bundle can allow DS components to be picked up from a fragment bundle. So the required logic can live in the respective fragment bundle. However I would prefer if it's done in the following way A - oak-lucene - Becomes the host bundle. Assuming it's always required, though some of its services might not be required in all cases B - oak-solr-* - Remain independent, but as fragment bundles. However they do not embed any jars C - lucene-fragment - This fragment bundle embeds the lucene related jars D - solr-fragment - This fragment embeds the solr related jars All the fragments specify oak-lucene as the host bundle. The reason for not embedding the Lucene and Solr related dependencies in the Oak bundles is to enable usage of the same bundle jars in a non OSGi env without adding to size. As embedded jars (which are not inlined) would not be usable in a non OSGi env, a user would have to add such jars in addition to the embedded ones, thus adding to size Chetan Mehrotra On Mon, Mar 10, 2014 at 2:37 PM, Tommaso Teofili tommaso.teof...@gmail.com wrote: Hi all, The main issue we currently have with our full text indexers is that Lucene and Solr are not OSGi ready, missing proper MANIFEST entries so that they cannot directly be installed in e.g. Apache Felix, don't have semantic version information, etc. Currently oak-lucene embeds the Lucene dependencies it needs, also exporting its Lucene packages so that they can be used by oak-solr-core (as they're needed by Solr) which exposes its embedded Solr packages so that oak-solr-[embedded|remote] can use them. While this should work, there are some concerns raised in OAK-1442 [2]: - one issue is that with the current approach we have a problem if a Lucene / Solr package changes in a backward incompatible way while we don't properly upgrade the semantic version of the Oak package(s) using that, which in the end would mean that we in Oak would have to track changes in Lucene / Solr and I think I can assume we don't want to - the other issue relates more to how we package such Lucene and Solr artifacts to use them, as the suggestion is to just wrap o.a.lucene-* and o.a.solr-* in two wrapping bundles which can be installed inside an OSGi container. in OAK-1475 [1] we discussed a bit some different approaches for the deployment of Oak Lucene and Solr indexers in an OSGi environment, currently we have the following options: 1. package oak-lucene and oak-solr-* in a single bundle (e.g. called oak-fulltext), with their Lucene and Solr dependencies embedded, the Lucene indexer OSGi services would be already available, the Solr ones would need to be configured in order to start the Solr indexer. 2. package and export Lucene stuff in a oak-lucene-osgi bundle, package and export Solr stuff in oak-solr-osgi bundle, avoid oak-lucene and oak-solr-core to export Lucene and Solr packages. 3. merge oak-solr-* in a single oak-solr bundle which embeds the Solr dependencies (but doesn't export them) to be a fragment of oak-lucene (so that they share the classloader and therefore oak-solr can use Lucene stuff in oak-lucene), move the OSGi services we need for Solr in Oak into oak-solr-osgi (as a fragment cannot run OSGi components/services) Not yet discussed options: 4. remove the exported contents from oak-lucene and oak-solr, merge oak-solr-* together and duplicate the Lucene dependencies to be embedded in oak-solr.
Options 1 and 4 are the simplest ones; the only disadvantage is that packaging is heavy (in 4 we have duplicated embedded dependencies for Lucene in oak-lucene and oak-solr). Option 2 is the most OSGi oriented, even if it has the unlikely, but yet possible, issue with semantic versioning. Option 3 is the smartest and trickiest one: no duplication of dependencies, full OSGi, but a bit more complicated as it uses OSGi fragments. My preferences go to 3 and 4. As this should be fixed for 0.19 please share your comments, Regards, Tommaso [1] : https://issues.apache.org/jira/browse/OAK-1475 [2] : https://issues.apache.org/jira/browse/OAK-1442?focusedCommentId=13908472&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13908472
Re: Dependency on json-simple in oak-core
Thanks Julian. Created OAK-1521 to track this Chetan Mehrotra On Mon, Mar 10, 2014 at 1:07 PM, Julian Reschke julian.resc...@gmx.de wrote: On 2014-03-10 05:06, Chetan Mehrotra wrote: Currently oak-core has a dependency on com.googlecode.json-simple:json-simple:1.1.1 jar. Looking at compile time usage I see it being referred only from RDBDocumentStore Would it be possible to use classes from 'org.apache.jackrabbit.oak.commons.json' package and remove the required dependency on this library Chetan Mehrotra Yes, I can have a look.
Re: Using Oak with PojoSR - Make use of OSGi features in POJO env
Created https://issues.apache.org/jira/browse/OAK-1522 to track this. Kindly review the patch Chetan Mehrotra On Wed, Mar 5, 2014 at 1:24 AM, Jukka Zitting jukka.zitt...@gmail.com wrote: Hi, On Tue, Mar 4, 2014 at 3:11 AM, Chetan Mehrotra chetan.mehro...@gmail.com wrote: 1. Configure components - Currently various sub modules are using different ways to configure. Some is being done via system properties, Security module has its own way of configuring, DataStore support has its own That's why I brought up the new DataStore config mechanism as somewhat troublesome. Security configuration has the same problem. System properties are currently used in places where a quick and simple way is needed to tweak options during testing. If a particular setting needs to be available in production, it should be made configurable through a setter or a constructor argument. Then it is easy to control the setting through OSGi service properties or by whichever framework or configuration mechanism is being used. 2. Expose extension points - Exposing extension points in non OSGi env becomes tricky. We faced some issues around LoginModule extensiblity earlier We have plenty of extension points (CommitHook, QueryIndexProvider, etc.) that work just fine. The problem with LoginModules is caused by the tricky way JAAS works, not by any inherent flaw in our extension mechanism. PojoSR [1] provides support for using OSGi constructs outside of OSGi framework I have implemented a POC [2] which makes use of PojoSR to configure Oak in pojo env. So far basic stuff seems to work as expected. Would be worthwhile to investigate further here and possibly provide a PojoSR based RespositoryFactory? Looks nice! Having alternative ways of configuring and running Oak is useful, as it helps us spot problems like the one you brought up about the security configuration. BR, Jukka Zitting
Re: Dependency on json-simple in oak-core
and org.apache.jackrabbit.oak.commons.json does not only seem to introduce a JSOP dependency (aren't we getting rid of that?), but also the JSON support doesn't appear to support data types... (JsonObject.getProperties returns a String/String map). In that case can we use the JSON library from json.org, as that's already widely used in projects like Sling and the Felix WebConsole. So we would not be adding a new dependency there Chetan Mehrotra On Mon, Mar 10, 2014 at 6:23 PM, Julian Reschke julian.resc...@gmx.de wrote: On 2014-03-10 10:59, Chetan Mehrotra wrote: Thanks Julian. Created OAK-1521 to track this I have looked at this, and org.apache.jackrabbit.oak.commons.json does not only seem to introduce a JSOP dependency (aren't we getting rid of that?), but also the JSON support doesn't appear to support data types... (JsonObject.getProperties returns a String/String map). Best regards, Julian
Queries related to various BlobStore implementations
Currently we have the following implementations of BlobStore

1. org.apache.jackrabbit.oak.plugins.blob.db.DbBlobStore
2. org.apache.jackrabbit.oak.plugins.blob.cloud.CloudBlobStore
3. org.apache.jackrabbit.oak.spi.blob.FileBlobStore
4. org.apache.jackrabbit.oak.spi.blob.MemoryBlobStore
5. org.apache.jackrabbit.oak.plugins.document.mongo.MongoBlobStore
6. org.apache.jackrabbit.oak.plugins.document.mongo.gridfs.MongoGridFSBlobStore
7. org.apache.jackrabbit.oak.plugins.blob.datastore.DataStoreBlobStore
8. org.apache.jackrabbit.oak.plugins.document.rdb.RDBBlobStore

And then for DataStore we have

a) org.apache.jackrabbit.core.data.db.DbDataStore
b) org.apache.jackrabbit.aws.ext.ds.S3DataStore
c) org.apache.jackrabbit.core.data.FileDataStore

Now, based on the above list, a couple of queries

Q1 - What is the difference between RDBBlobStore and DbBlobStore? Should we have only one implementation for storage in a database?
Q2 - For a system which is getting upgraded from JR2 to Oak, would it
2.1 Continue to use its existing DataStore implementation
2.2 Migrate all the content first, then switch to one of the BlobStore implementations
2.3 Use both DataStore and BlobStore together
Q3 - Just for the record and to confirm: we would be preferring S3DataStore over CloudBlobStore for now
Q4 - Should we remove some of the unused BlobStore impls like MongoGridFSBlobStore? They can be resurrected if the need is felt for them

Chetan Mehrotra
Re: Dependency on json-simple in oak-core
Looking at its widespread usage I think it's correct and maintained. So +1 for json.org Chetan Mehrotra On Mon, Mar 10, 2014 at 6:47 PM, Julian Reschke julian.resc...@gmx.de wrote: On 2014-03-10 13:58, Chetan Mehrotra wrote: and org.apache.jackrabbit.oak.commons.json does not only seem to introduce a JSOP dependency (aren't we getting rid of that?), but also the JSON support doesn't appear to support data types... (JsonObject.getProperties returns a String/String map). In that case can we use the JSON library from json.org as that's widely used in projects like Sling and the Felix WebConsole already. So we would not be adding a new dependency there Chetan Mehrotra FWIW, the oak-blob POM also references json-simple, but it doesn't appear to be used. I really don't care what JSON lib we use, but it needs to be correct, maintained, and fast. If json.org fulfills these requirements then I'm more than happy to switch. Best regards, Julian
Using Oak Run with complex setup and configurations
Currently oak-run (and to some extent oak-upgrade) both instantiate the Repository (or NodeStore) instance outside of an OSGi env. So far some level of customization is supported. Looking for some guidance on how to support a bit more complex configuration, like using different DataStores and providing config to those DataStores. The current way of configuring a custom BlobStore relies on system properties to achieve the same [1]. As part of OAK-1502 I need to refactor the way the BlobStores are configured. I would like to configure them via OSGi configuration and use SCR annotations for the same. With such a change supporting oak-run and oak-upgrade becomes tricky. One option is to expose all such config options as command line options, but then that would require duplicate work for handling configuration. Another way is to switch to the proposed PojoSR based setup [2]. So one would provide all the config in a single json file, for example oak-config.json

{
  "properties": {
    "oak.documentMK.revisionAge": 7
  },
  "configs": {
    "org.apache.jackrabbit.oak.plugins.document.DocumentNodeStoreService": {
      "db": "test",
      "mongouri": "mongodb://localhost:27017"
    },
    "org.apache.jackrabbit.oak.security.user.UserConfigurationImpl": {
      "usersPath": "/home/users",
      "groupsPath": "/home/groups"
    }
  }
}

And on the command line we pass on that file as one of the arguments

java -jar oak-run-*.jar --config oak-config.json

Chetan Mehrotra [1] https://github.com/apache/jackrabbit-oak/blob/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/blob/BlobStoreHelper.java#L52-52 [2] http://markmail.org/thread/26g2fqbda3uyahmm
Dependency on json-simple in oak-core
Currently oak-core has a dependency on com.googlecode.json-simple:json-simple:1.1.1 jar. Looking at compile time usage I see it being referred only from RDBDocumentStore Would it be possible to use classes from 'org.apache.jackrabbit.oak.commons.json' package and remove the required dependency on this library Chetan Mehrotra
Proposal for configuring Oak using PojoSR and not miss good old repository.xml for configuration!
Hi, Till JR2 the repository was configured via repository.xml, which served two purposes 1. Support for DI 2. Configuring the service properties. From the end user perspective this proved useful from a usability point of view. 1. One can see what all components are involved 2. It is easy to specify all config properties in one place. The xml can be documented also 3. JR2 used to warn if property names were not correct. So good support for handling misconfiguration. From the developer perspective the repository.xml constrained us in the ways the repository components can be configured. With Oak we have done away with the repository.xml approach. The repository components can now be assembled programmatically. In OSGi the repository configures itself through std OSGi constructs. For the non OSGi case, well, we have the Jcr class which provides a ready repository with suitable defaults. However configuring properties still involves quite some work and so far has to be done in code. IMHO having configuration in a plain xml/yaml etc file is pretty useful to an end user and simplifies administration.

Proposal - To enable ease of both configuration and modularity we already support OSGi. With PojoSR (as explained in the other mail [1]) we can support both

1. Have a PojoSR based RepositoryFactory to provide access to the Repository instance
2. Have a custom xml way to capture OSGi configuration. Instead of config scattered across multiple property files we can consolidate all such config in a single file. For some examples see [2] [3]

As implemented with [4], when the repository starts with the default Segment store configured it creates the following directory layout

repo-home
|__ /bundles
|__ /content
|__ /config

The config folder can be replaced with a single xml. The bundles folder is created by PojoSR as some bundles like ConfigAdmin use the bundle data folder to store some state.

Benefits
1. The repository can easily be extended via std OSGi constructs if users use DS and put their bundle on the flat classpath
2. For simple setups users can directly register their services programmatically with the PojoServiceRegistry
3. Configuration is managed in a central place and can easily be edited
4. For more adventurous usage users can also enable the Felix WebConsole when deploying an Oak based application in Tomcat, or even use the Felix Jetty bundle! Most of the webconsole would work

Thoughts? Chetan Mehrotra PS: PojoSR is now being moved to Apache Felix (FELIX-4445) [1] http://markmail.org/thread/2vnktbuq2jd2ovs5 [2] https://docs.jboss.org/author/display/AS7/OSGi+subsystem+configuration#OSGisubsystemconfiguration-cas [3] https://access.redhat.com/site/documentation/en-US/JBoss_Fuse/6.0/html/Deploying_into_the_Container/files/DeployCamel-OsgiConfigProps.html [4] https://github.com/chetanmeh/oak-pojosr
Using Oak with PojoSR - Make use of OSGi features in POJO env
Hi, Earlier we had some discussion around different ways to configure Oak. Oak currently supports running and configuring itself in an OSGi env. It can easily be used in a non OSGi env also by programmatically configuring and assembling the required components. However this poses problems around the best way to 1. Configure components - Currently various sub modules are using different ways to configure. Some is being done via system properties, the Security module has its own way of configuring, DataStore support has its own 2. Expose extension points - Exposing extension points in a non OSGi env becomes tricky. We faced some issues around LoginModule extensibility earlier. PojoSR [1] provides support for using OSGi constructs outside of an OSGi framework. I have implemented a POC [2] which makes use of PojoSR to configure Oak in a pojo env. So far basic stuff seems to work as expected. Would it be worthwhile to investigate further here and possibly provide a PojoSR based RepositoryFactory? Chetan Mehrotra [1] https://code.google.com/p/pojosr/ [2] https://github.com/chetanmeh/oak-pojosr
Re: buildbot failure in ASF Buildbot on oak-trunk-win7
Build failed due to --- Failed tests: concurrentObservers(org.apache.jackrabbit.oak.spi.commit.BackgroundObserverTest) Tests run: 1579, Failures: 1, Errors: 0, Skipped: 70 --- This should not be related to my changes and looks like an intermittent issue. Chetan Mehrotra On Mon, Mar 3, 2014 at 11:33 AM, build...@apache.org wrote: The Buildbot has detected a new failure on builder oak-trunk-win7 while building ASF Buildbot. Full details are available at: http://ci.apache.org/builders/oak-trunk-win7/builds/4725 Buildbot URL: http://ci.apache.org/ Buildslave for this Build: bb-win7 Build Reason: scheduler Build Source Stamp: [branch jackrabbit/oak/trunk] 1573450 Blamelist: chetanm BUILD FAILED: failed compile sincerely, -The Buildbot
Re: Efficient import of binary data into Oak
On Tue, Feb 18, 2014 at 2:32 PM, Jukka Zitting jukka.zitt...@gmail.com wrote: Good point. That use case would probably be best handled with a specific InputStream subclass like suggested by Felix for files. That mode is fine for a case like FileInputStream but not for a case like S3 where the underlying data is just a url and the DataStore needs to make use of that. Instead what I was referring to is the ability to pass custom Binary implementations

- FileBinary implements javax.jcr.Binary
  - File object
  - boolean flag indicating ownership of the file. If true it indicates that the client is transferring ownership of the file to Oak. In such a case the Oak DataStore can simply move the file instance to its storage area
- S3Binary implements javax.jcr.Binary
  - Signed S3 url which can be used by S3DataStore to perform a Copy operation

This binary is then passed to the JCR API and the DataStore makes use of that and chooses an efficient path depending on the type of Binary. A couple of points to note

- Such an API would introduce a strong coupling with the underlying DataStore
- But such an API would then be targeted at a very specific usecase where the code needs to import a high amount of binary data and is written keeping in mind the underlying DataStore being used, such that it can use the most optimized path

Chetan Mehrotra
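To make the idea concrete, a hypothetical sketch of such Binary variants (these interfaces do not exist in Oak; all names are illustrative):

    import java.io.File;
    import javax.jcr.Binary;

    // Hypothetical: a Binary backed by a local file
    public interface FileBinary extends Binary {
        File getFile();
        // true if the client transfers ownership, allowing the
        // DataStore to move (rather than copy) the file
        boolean isFileOwner();
    }

    // Hypothetical: a Binary backed by an existing S3 object
    public interface S3Binary extends Binary {
        // signed S3 url usable by S3DataStore for a server-side copy
        String getSignedUrl();
    }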
Re: Efficient import of binary data into Oak
On Tue, Feb 18, 2014 at 3:48 PM, Jukka Zitting jukka.zitt...@gmail.com wrote: Something like S3InputStream.getURL() should work just fine for that use case: That would also work, with the caveat that the in-between layers do not decorate the InputStream in any form. Chetan Mehrotra
Efficient import of binary data into Oak
Hi, Currently in a Sling based application where a user uploads a file to the JCR the following sequence of steps is executed

1. User uploads the file via an HTTP request, mostly using a multi-part form data based upload
2. Sling uses Commons File Upload to parse the multi-part request, which uses a DiskFileItemFactory and writes the binary content to a temporary file (for file sizes above 256 KB) [1]
3. Later the servlet would access the JCR Session and create a Binary value by extracting the InputStream
4. The file content would then be spooled into the BlobStore

Effect of different blobstores

Now depending on the type of BlobStore one of the following code flows would happen

A - JR2 DataStores - The inputstream would be copied to a file
B - S3DataStore - The AWS SDK would create a temporary file and then that file content would be streamed to S3
C - Segment - Content from the InputStream would be stored as part of various segments
D - MongoBlobStore - Content from the InputStream would be pushed to remote mongo via multiple remote calls

Things to note in the above sequence
1. Uploaded content is copied twice.
2. The whole content is spooled via InputStream through the JVM heap

Possible areas of improvement
1. If the BlobStore is finally using some file (on the same hard disk, not NFS) then it might be better to *move* the file which was created in the upload. This would help the local FileDataStore and S3DataStore
2. Avoid spooling via InputStream if possible. Spooling via IS is slow [3]. Though in most cases we use an efficient buffered copy which is marginally slower than NIO based variants, avoiding moving byte[] might reduce pressure on GC (probably!)

Changes required

If we can have a way to create JCR Binary implementations which enable the DataStore/BlobStore to efficiently transfer content then that would help. For example for a File based DS the Binary created can keep a reference to the source File object and that Binary is used in the JCR API. Eventually the FileDataStore can treat it in a different way and move the file. Another example is S3DataStore - In some cases the file has already been transferred to S3 using other options. And the user wants to transfer the S3 file from its bucket to our bucket. So a Binary implementation which can just wrap the S3 url would enable the S3DataStore to transfer the content without streaming all the content again [4]

Any thoughts on the best way to enable users of Oak to create Binaries via other means (compared to the current mode which only enables via InputStream) and enable the DataStores to make use of such binaries?

Chetan Mehrotra [1] https://github.com/apache/sling/blob/trunk/bundles/engine/src/main/java/org/apache/sling/engine/impl/parameters/ParameterSupport.java#L190 [2] http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/model/PutObjectRequest.html [3] http://www.baptiste-wicht.com/2010/08/file-copy-in-java-benchmark/3/ [4] http://stackoverflow.com/questions/9664904/best-way-to-move-files-between-s3-buckets
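For the file-backed case, the move optimization itself is cheap to sketch with NIO (paths and method names are illustrative; Oak's FileDataStore exposes no such hook today):

    import java.io.IOException;
    import java.nio.file.*;

    // Adopt the upload's temp file instead of spooling its content:
    // a rename on the same filesystem is O(1) regardless of file size.
    void adoptUpload(Path uploadedTmpFile, Path dataStoreTarget) throws IOException {
        Files.createDirectories(dataStoreTarget.getParent());
        Files.move(uploadedTmpFile, dataStoreTarget, StandardCopyOption.ATOMIC_MOVE);
    }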
Re: Use LoginModuleProxy (was: Make Whiteboard accessible through ContentRepository)
On Thu, Feb 13, 2014 at 11:49 PM, Tobias Bocanegra tri...@apache.org wrote: this callback would rely on some sort of (non-osgi) list of pre-configured factories, right? eg some sort of LoginModuleFactoryRegistryImpl that is added to the authentication configuration? Sort of. As Jukka mentioned, the code which configures Oak programmatically in a non OSGi env can also provide a list of LoginModuleFactory instances as part of the Security setup. In an OSGi env get them via DS. this has one caveat, that you could register 1 factory per class (if you are using the factories class name as identifier). otherwise you would need to introduce some kind of factory pid. Yup. For completeness and to cover all cases you can have some factory property (similar to a service property in OSGi). And that can be passed to the LoginModuleFactory as an argument so that it can instantiate the right LM impl Chetan Mehrotra
Re: Make Whiteboard accessible through ContentRepository
On Thu, Feb 13, 2014 at 12:45 PM, Tobias Bocanegra tri...@apache.org wrote: I don't quite follow. can you give an example of what would be in the jaas.conf and where you instantiate the ProxyLoginModule ? A rough sketch would be ...

jaas.config

oakAuth {
    org.apache.jackrabbit.oak.security.ProxyLoginModule REQUIRED
        loginModuleFactoryClass=org.apache.jackrabbit.oak.security.LdapLoginModuleFactory
        authIdentity="{USERNAME}"
        useSSL=false
        debug=true;
};

public class ProxyLoginModule implements LoginModule {

    private LoginModule delegate;

    public void initialize(Subject subject, CallbackHandler callbackHandler,
            Map<String, ?> sharedState, Map<String, ?> options) {
        // obtain the factory provider from the outer layer via a callback
        LMFactoryProviderCallBack lmfcb = new LMFactoryProviderCallBack();
        callbackHandler.handle(new Callback[] {lmfcb});
        LoginModuleFactory factory = lmfcb.getLoginModuleFactoryProvider()
                .getFactory((String) options.get("loginModuleFactoryClass"));
        delegate = factory.createLoginModule();
        delegate.initialize(subject, callbackHandler, sharedState, options);
    }

    ... // use delegate for the other operations
}

The flow would involve the following steps

1. User mentions the ProxyLoginModule in the jaas entry and provides the factory class name in the config. The JAAS logic would instantiate the Proxy LM
2. Oak provides a callback using which the Proxy LM can obtain the factory
3. Upon init the proxy would initialize the delegate from the factory
4. The delegate is used for later calls
5. The LM if required can still use the config from jaas, or it is configured via the factory itself

Note here I preferred using the callback to let the LM access the outer layer services instead of using a custom config. The custom config mode works fine in the standalone case where the application is the sole user of the JAAS system. Hence it works fine for a Karaf/OSGi env. But that might not work properly in an app server env where the app server itself uses jaas. So to avoid interfering, in embedded mode the callback should be preferred.

Chetan Mehrotra
Re: Failure in compile trunk
I checked the code and it looks like classes from Felix JAAS are not being used, as after removing the entry oak-core compiled successfully. The entry is now removed. Let's see if the next build passes. Also I would start the release of the Felix JAAS module Chetan Mehrotra On Tue, Feb 11, 2014 at 2:18 PM, Michael Dürig mdue...@apache.org wrote: On 11.2.14 9:43 , Davide Giannella wrote: Good morning everyone, it seems that the commit r1566802 introduced a dependency to an artifact that is not available on maven repo. http://markmail.org/message/6gsaop3mfxccxkjn [ERROR] Failed to execute goal on project oak-core: Could not resolve dependencies for project org.apache.jackrabbit:oak-core:bundle:0.17-SNAPSHOT: Could not find artifact org.apache.felix:org.apache.felix.jaas:jar:0.0.1-R1560269 in central (http://repo.maven.apache.org/maven2) Michael
Re: Failure in compile trunk
For now I have implemented a workaround with [1]. This should get us moving forward. This would be fixed with a properly released version of the JAAS bundle by the end of this week once the release is approved in the Felix project Chetan Mehrotra [1] http://svn.apache.org/r1567089 On Tue, Feb 11, 2014 at 5:21 PM, Julian Reschke julian.resc...@gmx.de wrote: On 2014-02-11 10:27, Amit Jain wrote: Its also referenced from oak-auth-external/pom.xml Thanks Amit ... Fails here as well. Can we revert that change for now? Best regards, Julian
Re: Make Whiteboard accessible through ContentRepository
To address the same problem with Felix JAAS support one can make use of LoginModuleFactory [1]. Its job is to create LoginModule instances. One example of this is JdbcLoginModuleFactory [2]. It creates a JdbcLoginModule and passes on the DataSource to the LoginModule instances. So instead of the LoginModule _looking up_ the DataSource service it is provided with the DataSource instance. The factory itself was _provided_ with the DataSource reference via DI (via Declarative Services). To implement a similar approach in the non OSGi world the following approach can be used

1. Have a ProxyLoginModule like [3]. The JAAS Config would refer to this class and would be able to create it
2. Have a LoginModuleFactory LMF (custom one) which is referred to in the JAAS Config
3. One can register a custom LMF implementation with Oak and it would be passed on to the SecurityProvider
4. The ProxyLoginModule determines the type of LoginModuleFactory and obtains it via a Callback
5. The LMF obtained is used to create the LoginModule instance

Now the same LoginModule (where most of the logic resides) can be shared between the OSGi and non OSGi worlds. Further you can even share the LoginModuleFactory (if using non OSGi stuff only). For the OSGi case the LMF would be managed via DS and its dependencies provided via DS. For the non OSGi case the host application would wire up the LMF with its dependencies (via setters) and then register it with Oak.

Chetan Mehrotra [1] http://svn.apache.org/repos/asf/felix/trunk/jaas/src/main/java/org/apache/felix/jaas/LoginModuleFactory.java [2] http://svn.apache.org/repos/asf/felix/trunk/examples/jaas/lm-jdbc/src/main/java/org/apache/felix/example/jaas/jdbc/JdbcLoginModuleFactory.java [3] http://svn.apache.org/repos/asf/felix/trunk/jaas/src/main/java/org/apache/felix/jaas/boot/ProxyLoginModule.java

On Wed, Feb 12, 2014 at 12:07 PM, Tobias Bocanegra tri...@apache.org wrote: Hi, On Tue, Feb 11, 2014 at 4:38 PM, Tobias Bocanegra tri...@apache.org wrote: On Tue, Feb 11, 2014 at 1:59 PM, Jukka Zitting jukka.zitt...@gmail.com wrote: On Mon, Feb 10, 2014 at 2:25 AM, Felix Meschberger fmesc...@adobe.com wrote: This thread indeed raises the question, why Oak has to come up with something (the Whiteboard) that is almost but not quite like OSGi instead of going all the way through ? This is a misunderstanding; we're not trying to reinvent OSGi. The Whiteboard interface is *only* an abstraction of the whiteboard pattern described in http://www.osgi.org/wiki/uploads/Links/whiteboard.pdf, and is used only for those cases in Oak where that pattern is useful. When running in an OSGi environment, the Whiteboard simply leverages the existing OSGi functionality. The Whiteboard in Oak is not a generic service registry, and is not supposed to become one. but then, why does the Whiteboard interface has a register() method? This indicates to me, that there is a global service registry behind that can be used by all other users of the whiteboard. Also, using the whiteboard in the tests make them very easy configurable and simulates a bit better how OSGi will work. for example in the ExternalLoginModuleTest, I can just do: whiteboard = oak.getWhiteboard(); whiteboard.register(SyncManager.class, new SyncManagerImpl(whiteboard), Collections.emptyMap()); whiteboard.register(ExternalIdentityProviderManager.class, new ExternalIDPManagerImpl(whiteboard), Collections.emptyMap()); If I need to register them with the login module, this would not work today, without hard wiring all possible services to the SecurityProvider.
addendum: If I want to achieve the same without the whiteboard, I would need to: * Invent a new interface that allows to pass services/helpers to the login modules. eg a LoginModuleService interface * Create some sort of LoginModuleServiceProviderConfiguration * Create an implementation of the above that deals with OSGi but also can be used statically * Add the LoginModuleServiceProviderConfiguration to ServiceProvider.getConfiguration() * Add the interface to my ExternalIDPManagerImpl and SyncManagerImpl * in the login module, retrieve the LoginModuleServiceProviderConfiguration from the SecurityProvider, then find a service for SyncManager and ExternalIdentityProviderManager * in the non-osgi case, I would need to initialize the LoginModuleServiceProviderConfigurationImpl myself and add the 2 services and add the config to the securityprovider. The additional work is that I need to re-invent some sort of service registry for the LoginModuleServiceProviderConfigurationImpl and extend the SecurityProvider with another configuration. Also, I limit the interfaces to be used in the LoginModules to the ones implementing the LoginModuleService interface. you might say, this is more robust - I say, this is just more complicated and has tighter coupling. regards, toby
Re: svn commit: r1561926 - /jackrabbit/oak/tags/jackrabbit-oak-core-0.15.2/pom.xml
On Tue, Jan 28, 2014 at 7:41 AM, tri...@apache.org wrote: + org.apache.jackrabbit.oak;version=0.15.0, + org.apache.jackrabbit.oak.api;version=0.15.0, + org.apache.jackrabbit.oak.api.jmx;version=0.15.0, We should version packages independently of the Oak release version. Probably a better way would be to 1. Add explicit package-info.java files with the correct bnd version annotations. Then we need not explicitly list the package names in pom.xml 2. Until we have a 1.0 release we can set the package version to 1.0.0 and not change it even if an API changes, considering that pre 1.0 builds are to be considered unstable wrt API. Post 1.0 we take care to bump the version as per the OSGi Semantic Versioning guidelines [1] Chetan Mehrotra [1] http://www.osgi.org/wiki/uploads/Links/SemanticVersioning.pdf
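A sketch of such a package-info.java with the bnd Version annotation (the package chosen here is just for illustration):

    // e.g. oak-core/src/main/java/org/apache/jackrabbit/oak/api/package-info.java
    @Version("1.0.0")
    package org.apache.jackrabbit.oak.api;

    import aQute.bnd.annotation.Version;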
Re: svn commit: r1560611 - in /jackrabbit/oak/trunk: oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/mongomk/util/ oak-core/src/test/java/org/apache/jackrabbit/oak/plugins/mongomk/ oak-jcr/sr
On Thu, Jan 23, 2014 at 2:35 PM, thom...@apache.org wrote: +mongo = new MongoClient(mongoURI); +db = mongo.getDB(mongoURI.getDatabase()); The database referred to in the URI is used by Mongo for determining the user database against which credentials, if passed, need to be validated. From [1]:

/database - Optional. The name of the database to authenticate if the connection string includes authentication credentials in the form of username:password@. If /database is not specified and the connection string includes credentials, the driver will authenticate to the admin database.

So far we do not make use of credentials, so the database field can be used. But it is better to manage it in a separate way. Chetan Mehrotra [1] http://docs.mongodb.org/manual/reference/connection-string/
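A sketch of keeping the two concerns separate with the 2.x Java driver (URI and database names are illustrative):

    import com.mongodb.DB;
    import com.mongodb.MongoClient;
    import com.mongodb.MongoClientURI;

    // the database segment of the URI is only used to authenticate the credentials
    MongoClientURI uri = new MongoClientURI("mongodb://user:pass@localhost:27017/admin");
    MongoClient mongo = new MongoClient(uri);
    // the data lives in a separately configured database
    DB db = mongo.getDB("oak");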
Re: Package deployment check for locking leading to large no of small branch commits
The patch works great!! With patch applied I get similar timings as I got by disabling session refresh in LockOperation. Chetan Mehrotra On Wed, Dec 18, 2013 at 2:47 PM, Michael Dürig mdue...@apache.org wrote: On 18.12.13 10:12 , Marcel Reutegger wrote: We could rebase the branch and then rebase the in memory changes on top of the rebased branch. This would get us rid of the branch commit. But wouldn't get us rid of the rebase operation on the persisted branch. So I'm not too sure this will gain us a lot. hmm, you are right. this may still result in a commit on the branch for the rebase. it should be possible to avoid it when the rebase is actually a no-op because there were no changes by other sessions. I think this is already the case. See org.apache.jackrabbit.oak.spi.state.AbstractNodeStoreBranch.Persisted#rebase For the other branch states rebasing is already entirely in memory. I quickly hacked together a POC patch for above approach. Chetan agreed to do a quick check with that. Michael Regards Marcel
Re: Package deployment check for locking leading to large no of small branch commits
Created OAK-1294 to track this Chetan Mehrotra [1] https://issues.apache.org/jira/browse/OAK-1294 On Wed, Dec 18, 2013 at 3:15 PM, Chetan Mehrotra chetan.mehro...@gmail.com wrote: The patch works great!! With patch applied I get similar timings as I got by disabling session refresh in LockOperation. Chetan Mehrotra On Wed, Dec 18, 2013 at 2:47 PM, Michael Dürig mdue...@apache.org wrote: On 18.12.13 10:12 , Marcel Reutegger wrote: We could rebase the branch and then rebase the in memory changes on top of the rebased branch. This would get us rid of the branch commit. But wouldn't get us rid of the rebase operation on the persisted branch. So I'm not too sure this will gain us a lot. hmm, you are right. this may still result in a commit on the branch for the rebase. it should be possible to avoid it when the rebase is actually a no-op because there were no changes by other sessions. I think this is already the case. See org.apache.jackrabbit.oak.spi.state.AbstractNodeStoreBranch.Persisted#rebase For the other branch states rebasing is already entirely in memory. I quickly hacked together a POC patch for above approach. Chetan agreed to do a quick check with that. Michael Regards Marcel
Re: svn commit: r1547017 - in /jackrabbit/oak/trunk/oak-core/src: main/java/org/apache/jackrabbit/oak/plugins/mongomk/MongoNodeStore.java test/java/org/apache/jackrabbit/oak/plugins/mongomk/Background
Hi Marcel, Probably the below code can be simplified using Lists.partition(list, size) [1]

-// update if this is the last path or
-// revision is not equal to last revision
-if (i + 1 >= paths.size() || size == ids.size()) {
+// call update if any of the following is true:
+// - this is the last path
+// - revision is not equal to last revision (size of ids didn't change)
+// - the update limit is reached
+if (i + 1 >= paths.size()
+        || size == ids.size()
+        || ids.size() >= BACKGROUND_MULTI_UPDATE_LIMIT) {
     store.update(Collection.NODES, ids, updateOp);
     for (String id : ids) {
         unsavedLastRevisions.remove(Utils.getPathFromId(id));

Chetan Mehrotra [1] http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/collect/Lists.html#partition(java.util.List, int)
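For illustration, how the batching might collapse with Guava (identifiers are borrowed from the surrounding diff, so treat this as a sketch rather than a drop-in patch):

    import java.util.ArrayList;
    import java.util.List;
    import com.google.common.collect.Lists;

    // split the paths into fixed-size batches and fire one update per batch
    for (List<String> batch : Lists.partition(paths, BACKGROUND_MULTI_UPDATE_LIMIT)) {
        List<String> ids = new ArrayList<String>();
        for (String path : batch) {
            ids.add(Utils.getIdFromPath(path));
        }
        store.update(Collection.NODES, ids, updateOp);
        for (String id : ids) {
            unsavedLastRevisions.remove(Utils.getPathFromId(id));
        }
    }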
Re: Running background operation on a single node in a Oak cluster
On Wed, Nov 20, 2013 at 11:41 PM, Jukka Zitting jukka.zitt...@gmail.com wrote: Yes, sounds like a good candidate. The only additional bit we'd need is a timestamp that allows indexers on other cluster nodes to automatically resume processing if an active indexing task dies for whatever reason without a chance to clear the flag. Implemented such a logic with OAK-1246 Chetan Mehrotra
Running background operation on a single node in a Oak cluster
Hi, Oak currently executes various tasks in the background, like Async Index Update, Blob store garbage collection etc. While running multiple instances of Oak in a cluster it would be desirable to run some jobs only on one node in the cluster, particularly Async Index Update. Currently we schedule these jobs through WhiteboardUtils [1]. The default implementation schedules them using an executor. However when we run in a Sling based system the scheduling is handled via Sling Commons Scheduler. Some time back it added support for running a task on only one node in a cluster [2]. This can be done by just adding an extra property to the service registration. Would it make sense to make use of this feature to run the Async Index Update only on one node? It would also help in avoiding conflicts while concurrently updating the index related data in a cluster. For non Sling based systems we would let the owning system provide this support by providing the right implementation of the Whiteboard Chetan Mehrotra [1] https://github.com/apache/jackrabbit-oak/blob/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/spi/whiteboard/WhiteboardUtils.java#L31 [2] https://issues.apache.org/jira/browse/SLING-2979
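A sketch of what such a registration could look like through the Whiteboard; note the scheduler property names are assumptions based on Sling Commons Scheduler conventions (SLING-2979), not verified constants, and a non-Sling Whiteboard would simply ignore them:

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.jackrabbit.oak.spi.whiteboard.Registration;
    import org.apache.jackrabbit.oak.spi.whiteboard.Whiteboard;

    // register a background task with a hint that a Sling-backed Whiteboard
    // can translate into "do not run concurrently in the cluster"
    static Registration scheduleOnSingleNode(Whiteboard whiteboard, Runnable task) {
        Map<String, Object> props = new HashMap<String, Object>();
        props.put("scheduler.period", 5L);          // run every 5 seconds
        props.put("scheduler.concurrent", false);   // assumed SLING-2979 property
        return whiteboard.register(Runnable.class, task, props);
    }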
Re: Segment Store and unbounded instances of MappedByteBuffer causing high RAM usage
On Tue, Nov 19, 2013 at 11:29 PM, Jukka Zitting jukka.zitt...@gmail.com wrote: Hmm, I wonder if we do have a problem here or if the system is just working as designed. That's my thinking as well, and hence I wanted to discuss this on the DL first. The system here was being subjected to some DAM related tests and quite a few binary files were added to it. The only cause of concern was that the system became quite slow in responding to users. The GUI was very slow and accessing the Sling server running on Oak was also very slow. So we need to control some aspect probably. What aspect should be controlled I am not sure. I would keep an eye on this, and if such a problem is reported to me again I would try to collect more data and come up with a test for it. Chetan Mehrotra
Segment Store and unbounded instances of MappedByteBuffer causing high RAM usage
Hi, On some systems running on Oak (on Windows) where lots of nodes get created we are seeing high RAM usage. The Java process memory (shown directly under task manager; see the note on Working Set below) remains within the expected range, but the system starts paging memory quite a bit and slows down. On checking the runtime state the following things can be observed

* FileStore has 186 TarFiles and each TarFile refers to a 256MB MappedByteBuffer, taking around 46 GB of memory
* The JVM process constantly shows 50% CPU usage. Checking further indicates that AsynchUpdate is running and Lucene indexing is being performed
* Windows Task Manager shows 7.66 GB (total 8 GB) overall memory usage with the Java process only showing 1.4GB usage. No other process accounts for such high usage
* Checking the Working Set [1] shows 7 GB memory being used by the Java process alone

From my basic understanding of memory mapped file usage it should not cause such a resource crunch and the OS should be able to manage the memory. However probably in Oak's case all these TarFiles are getting accessed frequently, which leads to these memory mapped pages being held in physical memory. Should we put an upper cap on the number of TarFiles opened? If any other data needs to be collected to determine the problem cause further then let me know. Chetan Mehrotra [1] http://msdn.microsoft.com/en-us/library/windows/desktop/cc441804(v=vs.85).aspx
Re: Strategies around storing blobs in Mongo
To close this thread: On Wed, Oct 30, 2013 at 7:52 PM, Jukka Zitting jukka.zitt...@gmail.com wrote: So AFAICT the worry about a write blocking all concurrent reads is unfounded unless it shows up in a benchmark. I tried to measure the effect of such a scenario in OAK-1153 [1] and from the results obtained there does not appear to be much change whether the collections are managed in the same database or different ones. So for now this is nothing to worry about. On a side note - the number of reads and writes performed drops considerably when accessing a remote Mongo server. See the results at [1] for more details Chetan Mehrotra [1] https://issues.apache.org/jira/browse/OAK-1153
Re: Strategies around storing blobs in Mongo
Open questions are, what is the write thoughput for one shard, does the write lock also block reads (I guess not), does the write As Ian mentioned above write locks block all reads. So even adding a 2 MB chunk on a sharded system over remote connection would block read for that complete duration. So at minimum we should be avoiding that. Chetan Mehrotra On Wed, Oct 30, 2013 at 2:40 PM, Ian Boston i...@tfd.co.uk wrote: On 30 October 2013 07:55, Thomas Mueller muel...@adobe.com wrote: Hi, as Mongo maintains a global exclusive write locks on a per database level I think this is not necessarily a huge problem. As far as I understand, it limits write concurrency within one shard only, so it does not block scalability. Open questions are, what is the write thoughput for one shard, does the write lock also block reads (I guess not), does the write lock cause high latency for other writes because binaries are big. This information would be extremely useful for all those looking to Oak to address use cases where the repository access is between 20 and 60% write. To answer one of your questions According to [1] write locks do block reads within the scope of the lock. Other information from [1]. Write locks are exclusive and global. Write locks block read locks being established. (and obviously read locks block write locks being established) Read locks are concurrent and shared. Pre 2.2 a write lock was scoped to the mongod process. Post 2.2 a write lock is scoped to the database within the mondod process. All locks are scoped to a shard. IIUC, the lock behaviour is identical to that in JR2 except for the scope. Ian 1 http://docs.mongodb.org/manual/faq/concurrency/#how-granular-are-locks-in-mongodb I think it would make sense to have a simple benchmark (concurrent writing / reading of binaries), so that we can test which strategy is best, and possibly play around with different strategies (split binaries into smaller / larger chunks, use different write concerns, use more shards,...). Regards, Thomas On 10/30/13 7:50 AM, Chetan Mehrotra chetan.mehro...@gmail.com wrote: Hi, Currently we are storing blobs by breaking them into small chunks and then storing those chunks in MongoDB as part of blobs collection. This approach would cause issues as Mongo maintains a global exclusive write locks on a per database level [1]. So even writing multiple small chunks of say 2 MB each would lead to write lock contention. Mongo also provides GridFS[2]. However it also uses a similar strategy like we are currently using and such a support is built into the Driver. For server they are just collection entries. So to minimize contentions for write locks for uses cases where big assets are being stored in Oak we can opt for following strategies 1. Store the blobs collection in a different database. As Mongo write locks [1] are taken per db level then storing the blobs in different db would allow the read/write of node data (majority usecase) to continue. 2. For more asset/binary heavy usecase use a separate database server itself to server the binaries. 3. Bring back the JR2 DataStore implementation and just save metadata related to binaries in Mongo. We already have S3 based implementation there and they would continue to work with Oak also Chetan Mehrotra [1] http://docs.mongodb.org/manual/faq/concurrency/#how-granular-are-locks-in- mongodb [2] http://docs.mongodb.org/manual/core/gridfs/
Re: Strategies around storing blobs in Mongo
sounds reasonable. what is the impact of such a design when it comes to map-reduce features? I was thinking that we could use it e.g. for garbage collection, but I don't know if this is still an option when data is spread across multiple databases.

Would investigate that aspect further.

connecting to a second server would add quite some complexity to

Yup. Option was just provided for completeness sake. And something like this would probably never be required.

that was one of my initial thoughts as well, but I was wondering what the impact of such a deployment is on data store garbage collection.

Probably we can make a shadow node for the binary in the blob collection and keep the binary content within the DataStore itself. Stuff like garbage collection would be performed on the shadow node and the logic would use results from that to perform the actual deletions.

Chetan Mehrotra

On Wed, Oct 30, 2013 at 1:13 PM, Marcel Reutegger mreut...@adobe.com wrote: Hi, Currently we are storing blobs by breaking them into small chunks and then storing those chunks in MongoDB as part of blobs collection. This approach would cause issues as Mongo maintains a global exclusive write locks on a per database level [1]. So even writing multiple small chunks of say 2 MB each would lead to write lock contention. so far we observed high lock content primarily when there are a lot of updates. inserts were not that big of a problem, because you can batch them. it would probably be good to have a test to see how big the impact is when blogs come into play. Mongo also provides GridFS[2]. However it also uses a similar strategy like we are currently using and such a support is built into the Driver. For server they are just collection entries. So to minimize contentions for write locks for uses cases where big assets are being stored in Oak we can opt for following strategies 1. Store the blobs collection in a different database. As Mongo write locks [1] are taken per db level then storing the blobs in different db would allow the read/write of node data (majority usecase) to continue. sounds reasonable. what is the impact of such a design when it comes to map-reduce features? I was thinking that we could use it e.g. for garbage collection, but I don't know if this is still an option when data is spread across multiple databases. 2. For more asset/binary heavy usecase use a separate database server itself to server the binaries. connecting to a second server would add quite some complexity to the system. wouldn't it be easier to just leverage standard mongodb sharding to distribute the load? 3. Bring back the JR2 DataStore implementation and just save metadata related to binaries in Mongo. We already have S3 based implementation there and they would continue to work with Oak also that was one of my initial thoughts as well, but I was wondering what the impact of such a deployment is on data store garbage collection. regards marcel Chetan Mehrotra [1] http://docs.mongodb.org/manual/faq/concurrency/#how-granular-are-locks-in-mongodb [2] http://docs.mongodb.org/manual/core/gridfs/
Re: [MongoMK] flag document with children
I have implemented the above logic as part of OAK-1117 [1]. With this in place the number of calls made to Mongo on restarts of Adobe CQ goes down from 42000 to 25000, significantly reducing the startup time when Mongo is remote!!

regards Chetan

[1] https://issues.apache.org/jira/browse/OAK-1117

Chetan Mehrotra

On Thu, Oct 24, 2013 at 3:23 PM, Chetan Mehrotra chetan.mehro...@gmail.com wrote:

I am trying to prototype an approach. Would come up with a patch for this soon.

So far I was going with the reverse approach, whereby when I fetch a node I retrieve some extra child rows [1] in the same call to determine if it has any children. But given that the number of reads would far exceed the number of writes, it would be better to perform the extra update call. I would try to come up with a patch for this

regards Chetan

[1] by adding an or clause to fetch nodes with id say ^2:/foo/.* to fetch child nodes for a parent with id 1:/foo.

Chetan Mehrotra

On Thu, Oct 24, 2013 at 3:08 PM, Thomas Mueller muel...@adobe.com wrote:

Hi,

Yes, you are right. It should be relatively easy to implement (low risk).

Regards, Thomas

On 10/24/13 10:12 AM, Marcel Reutegger mreut...@adobe.com wrote:

The disadvantage is, when a node is added, either: - then the parent needs to be checked whether it already has this flag set (if it is in the cache), or

I'd say a parent node is likely in the cache because oak will read it first before it is able to add a child.

- the parent needs to be updated to set the flag

that's correct. though you only have to do it when it isn't set already. and the check should be cheap in most cases, because the node is in the cache.

regards marcel
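For reference, a minimal sketch of the update performed on the parent when a child is added. The _children field name and the cache lookup are assumptions; the actual OAK-1117 patch may differ:

    import com.mongodb.BasicDBObject;
    import com.mongodb.DBCollection;

    public class ChildFlagUpdater {
        private final DBCollection nodes;

        public ChildFlagUpdater(DBCollection nodes) {
            this.nodes = nodes;
        }

        // called when a child of parentId is created; cachedFlag is the flag
        // value from the (very likely cached) parent document, or null
        public void childAdded(String parentId, Boolean cachedFlag) {
            if (Boolean.TRUE.equals(cachedFlag)) {
                return; // flag already set, no extra call needed
            }
            nodes.update(new BasicDBObject("_id", parentId),
                    new BasicDBObject("$set", new BasicDBObject("_children", true)));
        }
    }

On read, a parent document without the flag can be answered as "no children" without issuing the child query at all, which is where the saved calls come from.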
Reduce number of calls from Oak to Mongo DB on restarts
Hi,

Trying to restart an application like Adobe CQ running on Oak and MongoDB on a remote system takes a considerable amount of time. In a typical restart the number of such calls is around 23000 (reduced from 42000 with OAK-1117). I am trying to analyze the nature of the calls and also the cache utilization to see if these can be reduced in OAK-1119. Looking at the various logs, the following things stand out

* The number of queries made to fetch children is 18000 out of a total of 23000. Such queries also populate the doc cache, hence the number of explicit find queries for individual docs is quite low (~400)
* The utilization of nodeChildrenCache is quite poor
* The number of updates is low, yet we see cache entries for the same path at different revisions
* Checking the entries of the nodeCache and nodeChildrenCache, which use a path@revision key, shows that there are quite a few entries at different revisions for the same path

Based on the above we can look into the following aspects

A - Caching strategy for nodeCache and nodeChildrenCache - I think the current logic caches a node at the revision it is asked for and not at the revision at which it actually exists. For example, if a client asks for node /foo/bar at revision 100 and the actual latest valid revision of /foo/bar is 70, we still cache it at key /foo/bar@100. If a different client asks for revision 105 and the revision of /foo/bar is still 70, we would make a new cache entry at 105.

B - Using a persistent cache to manage restarts - No matter what we do we would still make a large number of calls on restart, as the complete state is managed in a remote system. Most of the state we would read again would not have changed. So it might be better if we persist the doc cache (an L3 cache; L2 might be an off-heap cache) say using MapDB [1] or the H2 database. If we can find a valid node at a given revision in L3 then we serve it from there. Upon restart we can check that the modCount for a document has not changed (we can check multiple docs using an in clause) and then serve it from L3.

C - Maintain an approximate estimate of childCounts in the parent - In addition we can make an approximate estimate of the childCount and eagerly fetch child nodes if the number is small. We can possibly make use of the primaryType of a node to make a better guess.

Thoughts?

Chetan Mehrotra

[1] http://www.mapdb.org/
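A sketch of the modCount validation from option B. The _modCount field and the shape of the persisted cache are assumptions:

    import com.mongodb.BasicDBObject;
    import com.mongodb.DBCollection;
    import com.mongodb.DBCursor;
    import com.mongodb.DBObject;

    import java.util.List;
    import java.util.Map;

    public class L3CacheValidator {

        // drop persisted cache entries whose _modCount changed on the server;
        // a single $in query validates a whole batch of documents in one call
        public void validate(DBCollection nodes, Map<String, Long> cachedModCounts,
                             List<String> ids) {
            DBObject query = new BasicDBObject("_id", new BasicDBObject("$in", ids));
            DBObject fields = new BasicDBObject("_modCount", 1); // fetch only _modCount
            DBCursor cursor = nodes.find(query, fields);
            try {
                while (cursor.hasNext()) {
                    DBObject doc = cursor.next();
                    String id = (String) doc.get("_id");
                    long remote = ((Number) doc.get("_modCount")).longValue();
                    Long local = cachedModCounts.get(id);
                    if (local == null || local != remote) {
                        cachedModCounts.remove(id); // stale, must be re-read
                    }
                }
            } finally {
                cursor.close();
            }
        }
    }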
Re: [MongoMK] flag document with children
I am trying to prototype an approach. Would come up with a patch for this soon.

So far I was going with the reverse approach, whereby when I fetch a node I retrieve some extra child rows [1] in the same call to determine if it has any children. But given that the number of reads would far exceed the number of writes, it would be better to perform the extra update call. I would try to come up with a patch for this

regards Chetan

[1] by adding an or clause to fetch nodes with id say ^2:/foo/.* to fetch child nodes for a parent with id 1:/foo.

Chetan Mehrotra

On Thu, Oct 24, 2013 at 3:08 PM, Thomas Mueller muel...@adobe.com wrote:

Hi,

Yes, you are right. It should be relatively easy to implement (low risk).

Regards, Thomas

On 10/24/13 10:12 AM, Marcel Reutegger mreut...@adobe.com wrote:

The disadvantage is, when a node is added, either: - then the parent needs to be checked whether it already has this flag set (if it is in the cache), or

I'd say a parent node is likely in the cache because oak will read it first before it is able to add a child.

- the parent needs to be updated to set the flag

that's correct. though you only have to do it when it isn't set already. and the check should be cheap in most cases, because the node is in the cache.

regards marcel
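A sketch of the "reverse approach" from [1]: one $or query that fetches the parent document plus a few potential children via a regex on the id, assuming the depth:path id convention:

    import com.mongodb.BasicDBObject;
    import com.mongodb.DBCollection;
    import com.mongodb.DBCursor;
    import com.mongodb.DBObject;

    import java.util.Arrays;
    import java.util.regex.Pattern;

    public class ParentWithChildrenFetch {

        // e.g. path = "/foo", depth = 1 fetches "1:/foo" plus ids matching ^2:/foo/
        public void fetch(DBCollection nodes, String path, int depth) {
            DBObject self = new BasicDBObject("_id", depth + ":" + path);
            DBObject children = new BasicDBObject("_id",
                    Pattern.compile("^" + (depth + 1) + ":" + Pattern.quote(path) + "/"));
            DBCursor cursor = nodes
                    .find(new BasicDBObject("$or", Arrays.asList(self, children)))
                    .limit(5); // parent plus a few children answers "has children?"
            try {
                while (cursor.hasNext()) {
                    System.out.println(cursor.next().get("_id"));
                }
            } finally {
                cursor.close();
            }
        }
    }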
Re: Oak JCR Observation scalability aspects and concerns
On Mon, Oct 21, 2013 at 6:47 PM, Jukka Zitting jukka.zitt...@gmail.com wrote:

-1 This introduces the problem where a single JCR event listener can block or slow down all other listeners.

That can be mitigated to an extent by using some sort of blacklist (OAK-1084). However the current approach of each listener pulling in the diff at its own pace is more robust in handling such cases.

I'm not convinced by the assumption here that the observation listeners put undue pressure on the underlying MK or its caching. Do we have some data to prove this point? My reasoning is that if in any case we have a single (potentially multiplexed as suggested) listener that wants to read all the changed nodes, then those nodes will still need to be accessed from the MK and placed in the cache. If another listener does the same thing, they'll most likely find the items in the cache and not repeat the MK accesses. The end result is that the main performance cost goes to the first listener and any additional ones will come mostly for free, thus the claimed performance benefit of multiplexing observers is IMHO questionable.

Agreed (and also mentioned earlier) that the current approach does not cause multiple calls to the MK, as in most cases the NodeState would be found in the cache. However, due to the access pattern, i.e. the same node state being fetched multiple times, such entries in the cache would get higher priority and occupy memory which would otherwise have been used to cache NodeStates for the *latest* revision. This is just an observation and I currently do not have any numbers which indicate that this would cause a significant performance issue, and further, such things are hard to measure.

Chetan Mehrotra
Re: Oak JCR Observation scalability aspects and concerns
On Mon, Oct 21, 2013 at 11:39 PM, Jukka Zitting jukka.zitt...@gmail.com wrote:

3) The Observer mechanism allows a listener to look at repository changes in variable granularity and frequency depending on application needs and current repository load. Thus an Oak Observer can potentially process orders of magnitude more changes than a JCR event listener that needs to look at each individual changed item.

+1. I think in the Sling case it would make sense for it to be implemented as an Observer. I had a look at some of the listener implementations in [1] and I think they can be easily moved to Sling OSGi events

Chetan Mehrotra

[1] https://gist.github.com/chetanmeh/7081328/raw/listeners-list-filtered.txt
Oak JCR Observation scalability aspects and concerns
Duerig, Carsten Ziegeler, Chetan Mehrotra [1] https://gist.github.com/chetanmeh/7081328 [2] https://gist.github.com/chetanmeh/7081328/raw/listeners-list-filtered.txt [3] https://git.corp.adobe.com/gist/chetanm/863/raw/listerners-per-path.txt [4] https://cwiki.apache.org/confluence/display/SLING/Observation+usage+patterns
IndexEditor and Commit Failure
Hi,

Currently the various IndexEditors (Lucene, Property and Solr) are invoked as part of the CommitHook.processCommit call whenever a JCR Session is saved. In case the commit fails, would it leave the index in an inconsistent state?

For the PropertyIndex I think it would be fine, as the index content is part of the same commit and hence would not be committed. But for the other indexes the index data would have been saved (a sort of 2 phase commit) and it would not be possible to roll it back, leaving them with data which has not been committed. Moreover, such a commit failure could occur *after* a proper commit has been done, so the changes done to the index state as part of the failed commit would overwrite the changes done as part of the successful commit.

Should the IndexEditors not work as part of a PostCommitHook so that they always work on properly committed content?

Chetan Mehrotra
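To illustrate why the synchronous PropertyIndex is safe here, a simplified sketch (not the actual Oak code; the index layout below is invented): the index mutation goes into the same NodeBuilder as the content change, so a failed commit discards both together:

    import org.apache.jackrabbit.oak.spi.state.NodeBuilder;

    public class SyncIndexSketch {

        // called while processing a commit; root is the builder for the commit
        // being applied, so nothing is persisted if the commit later fails
        void indexProperty(NodeBuilder root, String indexedValue, String nodePath) {
            NodeBuilder entry = root.child("oak:index")
                    .child("resourceType")  // invented index name
                    .child(indexedValue);
            entry.setProperty("path", nodePath);
            // no external store is touched: commit failure means index rollback
        }
    }

An index that writes to an external system (Lucene directory, Solr server) inside processCommit has no such rollback, which is the inconsistency raised in this mail.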
Re: IndexEditor and Commit Failure
The lucene index is asynchronous

Okie, I missed that part completely, i.e. OAK-763. Yup, with that used for such indexers this problem would not be observed. Thanks for the pointer Alex!!

Chetan Mehrotra
Full text indexing with Solr
Hi,

When Oak uses Solr, do we send the complete binary to Solr for full text indexing, or do we extract the content on the Oak side and send the extracted content? And if we send the complete binary content, do we send it inline, or is it first uploaded to Solr and a reference to it passed?

Chetan Mehrotra
NodeType index returns cost as zero when count is zero
Hi,

While trying to execute a query like

select [jcr:path], [jcr:score], * from [nt:unstructured] as a where [sling:resourceType] = 'dam/smartcollection' and isdescendantnode(a, '/content/dam')

where an index does exist for sling:resourceType, the explain shows that it is using the NodeType index for jcr:primaryType

[nt:unstructured] as [a] /* Filter(query=explain select [jcr:path], [jcr:score], * from [nt:unstructured] as a where [sling:resourceType] = 'dam/smartcollection' and isdescendantnode(a, '/content/dam') , path=/content/dam//*, property=[sling:resourceType=dam/smartcollection]) where ([a].[sling:resourceType] = cast('dam/smartcollection' as string)) and (isdescendantnode([a], [/content/dam])) */

The problem I think is the way the cost is determined in [1]. The current implementation returns the cost as zero if the count returned by the IndexStoreStrategy is zero. This forces the query engine to use this index. I think it should return a value less than Double.POSITIVE_INFINITY but greater than zero, probably MAX_COST, to indicate that it can participate but there is some cost

-return store.count(indexMeta, encode(value), MAX_COST);
+long count = store.count(indexMeta, encode(value), MAX_COST);
+return (count == 0) ? MAX_COST : count;

With the logic changed as above the plan changes to

[nt:unstructured] as [a] /* property sling:resourceType=dam/smartcollection where ([a].[sling:resourceType] = cast('dam/smartcollection' as string)) and (isdescendantnode([a], [/content/dam])) */

Any pointers?

Chetan Mehrotra

[1] https://github.com/apache/jackrabbit-oak/blob/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/index/property/PropertyIndexLookup.java#L110
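The proposed change, expanded into a self-contained sketch (IndexStore is a stand-in for the real IndexStoreStrategy collaborators; only the zero-count handling matters here):

    public class PropertyIndexCostSketch {

        static final double MAX_COST = 100;

        // stand-in for the store.count(...) call in PropertyIndexLookup
        interface IndexStore {
            long count(String indexMeta, String encodedValue, double max);
        }

        double getCost(IndexStore store, String indexMeta, String encodedValue) {
            long count = store.count(indexMeta, encodedValue, MAX_COST);
            // returning the raw count meant an empty index advertised a cost of
            // zero, which forced the query engine to select it; an empty index
            // should instead look expensive but still usable
            return (count == 0) ? MAX_COST : count;
        }
    }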
Re: Providing details with CommitFailedException and security considerations
The best thing is probably to entirely remove the message from the exception/logs and replace it with a text that explains how to find the conflict information (i.e. in the transient space).

Not sure that is always easily possible, as at times there are multiple layers involved and a user does not have access to the actual session. If the information is somehow available via the logs it is far more accessible. So I would turn it into a debug level log.

regards Chetan

Chetan Mehrotra

On Thu, Sep 12, 2013 at 1:01 PM, Marcel Reutegger mreut...@adobe.com wrote:

Hi,

I would turn the log into a debug message, because it can be very confusing. conflicts happen during regular repository usage and shouldn't log warnings. if needed one could still enable debug logging to get more information on what happens during a conflict.

regards marcel

-----Original Message-----
From: Michael Dürig [mailto:mdue...@apache.org]
Sent: Donnerstag, 12. September 2013 09:15
To: oak-dev@jackrabbit.apache.org
Subject: Re: Providing details with CommitFailedException and security considerations

Hi Chetan,

The best thing is probably to entirely remove the message from the exception/logs and replace it with a text that explains how to find the conflict information (i.e. in the transient space).

Michael

On 12.9.13 6:56, Chetan Mehrotra wrote:

Hi,

As part of OAK-943 I had updated the ConflictValidator [1] to provide more details around commit failures. However exposing such details as part of the exception was considered risky from a security aspect, and it was decided to log a warning instead.

Now in some cases the upper layers do expect a CommitFailedException and have the required logic to retry the commit in case of failure. In such cases these warning logs cause confusion. So I am not sure what is the best thing to do. Should I turn the log to debug level or make the details part of the exception message?

Making it part of the warn level would cause issues, as such situations are not very repetitive and users typically run the system at INFO level. If I make it part of the exception message then at most it would expose the presence of some property names (not their values). And in most cases the exception is not exposed to the end user and is logged to the system logs. So probably we can make it part of the exception message itself

[1] https://github.com/apache/jackrabbit-oak/blob/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/commit/ConflictValidator.java#L90

Chetan Mehrotra
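A small sketch of the debug-level variant discussed here (ConflictException is a stand-in for CommitFailedException; all names are illustrative): only property names, never values, are logged, and only when DEBUG is enabled:

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    public class ConflictReporter {

        private static final Logger log = LoggerFactory.getLogger(ConflictReporter.class);

        void onConflict(String path, String propertyName) throws ConflictException {
            if (log.isDebugEnabled()) {
                // property names only, never values, to limit what is exposed
                log.debug("Commit failed: conflict at {} on property {}", path, propertyName);
            }
            // upper layers catch this and retry; the message stays generic
            throw new ConflictException("Commit failed due to a conflict at " + path);
        }

        static class ConflictException extends Exception {
            ConflictException(String msg) {
                super(msg);
            }
        }
    }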
Exception thrown in Item.save
Hi,

After refreshing the build which has the changes done in OAK-993, I am getting the following exception (detailed one at [1]):

--
Caused by: javax.jcr.UnsupportedRepositoryOperationException: OakUnsupported: Failed to save subtree at /content/usergenerated/content/geometrixx-outdoors/en/socialforum-vpktx/jcr:content/forum/1_ciot/wufg-init_topic_vpktx1. There are transient modifications outside that subtree.
---

Now from OAK-993 I understand that such saves would cause an issue. Can some background be provided on

1. What might be the cause of the issue
2. Is this a change in behavior from JR2
3. If as a user I see such an issue then what should I change ... or what am I doing which is causing this issue

[1] https://paste.apache.org/mAL7

Chetan Mehrotra
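For anyone hitting the same exception, a hedged sketch of the pattern that triggers it (paths are illustrative): Item.save() persists only that item's subtree, and per OAK-993 Oak supports it only when the session carries no other transient changes:

    import javax.jcr.Node;
    import javax.jcr.RepositoryException;
    import javax.jcr.Session;

    public class SubtreeSaveExample {

        void demo(Session session) throws RepositoryException {
            Node a = session.getNode("/content/a");
            a.setProperty("title", "A");

            // a second transient change OUTSIDE the subtree of 'a'
            session.getNode("/content/b").setProperty("title", "B");

            // JR2 tolerated this; Oak throws UnsupportedRepositoryOperationException
            // because transient modifications exist outside /content/a:
            // a.save();

            // Item.save() is deprecated since JCR 2.0 anyway; saving the whole
            // session avoids the problem
            session.save();
        }
    }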