Re: Embedding Groovy in oak-run for Oak Shell (OAK-1805)

2014-05-22 Thread Chetan Mehrotra
On Fri, May 23, 2014 at 12:05 AM, Marcel Reutegger mreut...@adobe.com wrote:
 but the
 resulting jar file is indeed quite big. Do we really need
 all the jars we currently embed?

Yes, currently we are embedding quite a few jars. Looking at oak-run I
see it embeds the following major types of dependencies:
1. JR2 jars (required for benchmark and also upgrade)
2. Lucene 3.6.x jars for JR2
3. H2 and related DBCP jars for RDB
4. Oak jars
5. Logback/jopt etc required for standalone usage
6. Now groovy

 Alternatively we may
 want to consider a new module. E.g. oak-console with only
 the required jar files to run the console.

It might be better to go this way, as we anyway have to start using Lucene
4.x to allow, say, a command to dump the Lucene directory content. Given
that oak-run would be used for benchmark and upgrade, it has to package JR2
and Lucene 3.6.x. So for the pure Oak-related feature set we might require
a new module.

Chetan Mehrotra


Re: My repository is not indexing PDFs, what am I missing?

2014-05-21 Thread Chetan Mehrotra
Hi Bertrand,

This might be due to OAK-1462. We had to disable the
LuceneIndexProvider from getting registered as an OSGi service, to
handle a case where the LuceneIndexProvider was getting registered
twice (one default and another for the Aggregate case). I would try to
resolve this by next week and then it should work fine.
Chetan Mehrotra


On Wed, May 21, 2014 at 8:58 PM, Bertrand Delacretaz
bdelacre...@apache.org wrote:
 Hi,

 I'm upgrading the OakSlingRepositoryManager used for Sling tests to
 Oak 1.0, and it's not indexing PDFs anymore - it used to with oak 0.8.

 After uploading a text file to /tmp, the
 /jcr:root/foo//*[jcr:contains(.,'some word')] query finds it, but the
 same doesn't work with a PDF.

 My repository setup is in the OakSlingRepositoryManager [1] - am I
 missing something in there?

 -Bertrand

 [1] 
 https://svn.apache.org/repos/asf/sling/trunk/bundles/jcr/oak-server/src/main/java/org/apache/sling/oak/server/OakSlingRepositoryManager.java


Re: How to activate a SecurityProvider

2014-05-20 Thread Chetan Mehrotra
On Tue, May 20, 2014 at 7:36 PM, Galo Gimenez galo.gime...@gmail.com wrote:
 I am running an old version of Felix, maybe that is the problem?

Looks like you are using an old version of SCR. Try running with a
more recent version of SCR.

Chetan Mehrotra


Re: svn commit: r1587286 - in /jackrabbit/oak/trunk: oak-core/pom.xml oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentNodeStoreService.java oak-parent/pom.xml

2014-05-19 Thread Chetan Mehrotra
For the record, I have implemented a DataSource provider bundle [2]
(based on the above flow) as part of SLING-3574 [1]. That bundle can be
used to configure a DataSource in an OSGi env.

Chetan Mehrotra
[1] https://issues.apache.org/jira/browse/SLING-3574
[2] https://github.com/chetanmeh/sling-datasource

On Tue, Apr 15, 2014 at 12:30 PM, Chetan Mehrotra
chetan.mehro...@gmail.com wrote:
 Register a DataSource where?

 DataSource would be registered with the OSGi ServiceRegistry

 Does this work in an OSGi context?

 Yes it should work in OSGi context. Would try to implement the
 approach by end of week if time permits

 How does it get the DataSource? Per JNDI?

 The DataSource would be obtained from OSGi service registry just like
 it currently obtains the BlobStore instance
 Chetan Mehrotra


 On Tue, Apr 15, 2014 at 12:00 PM, Julian Reschke julian.resc...@gmx.de 
 wrote:
 On 2014-04-15 06:10, Chetan Mehrotra wrote:

 Hi Julian,

 On Tue, Apr 15, 2014 at 12:39 AM,  resc...@apache.org wrote:

 - <Embed-Dependency>commons-dbcp,commons-pool,h2,json-simple</Embed-Dependency>
 + <Embed-Dependency>commons-dbcp,commons-pool,h2,json-simple,postgresql,db2,db2-license</Embed-Dependency>
   <Embed-Transitive>true</Embed-Transitive>


 I believe this is a temporary change and would not be required for
 final implementation? Would be helpful if we add a TODO/FIXME there
 such that we remember to remove this later


 OK.


 Instead of embedding all such types of drivers/dbcp/pool etc within
 oak-core it would be better to decouple them. For example one approach
 can be

 1. Have a bundle which embeds common-dbcp and required dependencies.
 It would be responsible for registering a DataSource


 Register a DataSource where?


 2. Driver bundle would be fragments to the bundle #1 as host. With
 JDBC 4.0 the Driver classes are provided as part of
 META-INF/services/java.sql.Driver [1]. For such cases fragment bundles
 can be avoided by having #1 monitor for such drivers and register them
 programatically


 Does this work in an OSGi context?


 3. DocumentNodeStoreService should only have a reference to DataSource
 and use that


 How does it get the DataSource? Per JNDI?

 Best regards, Julian



Re: NodeStore and BlobStore configurations in OSGi

2014-05-19 Thread Chetan Mehrotra
On Mon, May 19, 2014 at 3:29 PM, Marc Pfaff pfa...@adobe.com wrote:
 For SegmentNodeStore my research
 ends up in FileBlobStore and for the DocumentNodeStore it appears to be
 the MongoBlobStore. Is that correct?

SegmentNodeStore does not use a BlobStore by default. Instead, all the
binary content is stored as part of the segment data itself. So the
following points can be noted for BlobStore:

1. SegmentNodeStore does not require BlobStore by default
2. DocumentNodeStore uses MongoBlobStore by default
3. Both can be configured to use a BlobStore via OSGi config

* I was only able to find the FileBlobStoreService that registers the
 FileBlobStore as an OSGi service. I was not able to find more BlobStore
 implementations to be exposed in OSGi. Are there any more? And how about
 the MongoBlobStore in particular?

MongoBlobStore is not configured as an explicit service; instead it is
used as the default fallback option if no other BlobStore is
configured. As it just requires the Mongo connection details, it is
currently configured along with the MongoDocumentStore in
DocumentNodeStore.

There is an AbstractDataStoreService which wraps a JR2 DataStore as a
BlobStore and configures and registers it with OSGi. It currently
supports FileDataStore and S3DataStore.

Note that FileDataStore is currently preferred over FileBlobStore.

 The DocumentNodeStoreService references the  same blob store service as
 the SegmentNodeStoreService. As I'm not able to find the MongoBlobStore
 exposed as service, does that mean the DocumentNodeStore uses the
 FileBlobStore

No. The MongoBlobStore is configured implicitly in
org.apache.jackrabbit.oak.plugins.document.DocumentMK.Builder#setMongoDB(com.mongodb.DB,
int). So unless a BlobStore is explicitly configured DocumentNodeStore
would use MongoBlobStore.
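
A minimal sketch of that implicit wiring (assuming Oak ~1.0 and
mongo-java-driver 2.x APIs; connection details are illustrative):

import com.mongodb.DB;
import com.mongodb.MongoClient;
import org.apache.jackrabbit.oak.plugins.document.DocumentMK;
import org.apache.jackrabbit.oak.plugins.document.DocumentNodeStore;

// Unless a BlobStore is set explicitly, setMongoDB() wires up a MongoBlobStore internally.
DB db = new MongoClient("localhost", 27017).getDB("oak");
DocumentNodeStore store = new DocumentMK.Builder()
        .setMongoDB(db)                // implicitly configures the MongoBlobStore
        // .setBlobStore(myBlobStore)  // an explicit BlobStore would take precedence
        .getNodeStore();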

 Both, the SegmentNodeStoreService and the DocumentNodeStoreService
 appear to check for 'custom blob store' property but both components do
 not expose such a property? And how would they select from a specific
 BlobStore service?

Here there is an assumption that the system has only one BlobStore
registered with the OSGi Service Registry. If multiple BlobStore
services are registered then you can select a specific one by setting
'blobStore.target' to the required OSGi service filter (DS, section
112.6 of the OSGi Compendium).

Chetan Mehrotra


Re: NodeStore and BlobStore configurations in OSGi

2014-05-19 Thread Chetan Mehrotra
On Mon, May 19, 2014 at 8:10 PM, Marc Pfaff pfa...@adobe.com wrote:
 SegmentNodeStore.getBlob() does not seem to be used
 when reading binaries through JCR

Yes, when reading via JCR the read is handled via the Blob itself, i.e.
a SegmentBlob in this case. A SegmentBlob gets created on JCR property
access: SegmentNodeState.getProperty -> SegmentPropertyState.getValue ->
SegmentPropertyState#getValue(Segment, RecordId, Type<T>)

Here there is an assumption that the system has only one BlobStore
registered with the OSGi Service Registry. If multiple BlobStore
services are registered then you can select a specific one by setting
'blobStore.target' to the required OSGi service filter (DS, section
112.6 of the OSGi Compendium).
 Assuming same is true for NodeStore services.

Sorry, I did not get the question. Could you elaborate?

Btw, I have updated the docs at [1] (the update should reflect on
GitHub in a couple of hours).

Chetan Mehrotra
[1] 
https://github.com/apache/jackrabbit-oak/blob/trunk/oak-doc/src/site/markdown/blobstore.md


Re: How to activate a SecurityProvider

2014-05-19 Thread Chetan Mehrotra
The SecurityProvider should get registered. Do you have Felix running
with the WebConsole? What is the status of the
'org.apache.jackrabbit.oak.security.SecurityProviderImpl' component?
Chetan Mehrotra


On Sat, May 17, 2014 at 1:30 AM, Galo Gimenez galo.gime...@gmail.com wrote:
 Hello,


 I am setting up Oak on a Felix container , and the RepositoryManager
 reference to the SecurityProvider does not get satisfied, by looking at the
 documentation I do not see a way to fix this.

 I have noticed that the Sling project has a very different way to setup the
 repository, should I follow that model , or there is something I missing
 that makes the SecurityProvider service not to register.


 -- Galo


Lucene blob size different in trunk and 1.0 branch

2014-05-14 Thread Chetan Mehrotra
Hi,

As part of [1] the Lucene blob size was changed to 16kb (from 32 kb)
to ensure that Lucene blobs are not made part of FileDataStore when
SegmentMK is used. However this revision was not merged to 1.0 branch.

This miss also affects the caching logic in the DataStore (OAK-1726):
there it was assumed that Lucene blobs would be less than 16 kb, hence
it only caches binaries up to 16 kb. However, in the 1.0 branch the
Lucene blobs are of size 32 kb, which breaks this assumption, and the
Lucene blobs would not be cached in memory. This can be fixed via the
config setting 'maxCachedBinarySize'.

Changing it to 16 kb now in 1.0 would cause upgrade issues.

So should the change be reverted in trunk?

Chetan Mehrotra
[1] http://svn.apache.org/viewvc?view=revisionrevision=r1587430
[2] 
http://svn.apache.org/viewvc/jackrabbit/oak/branches/1.0/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/OakDirectory.java?view=markup


Re: buildbot failure in ASF Buildbot on oak-trunk-win7

2014-05-13 Thread Chetan Mehrotra
Failure in ObservationTest


Tests run: 110, Failures: 1, Errors: 0, Skipped: 0, Time elapsed:
179.557 sec  FAILURE!
observationDispose[4](org.apache.jackrabbit.oak.jcr.observation.ObservationTest)
 Time elapsed: 7.138 sec   FAILURE!
java.lang.AssertionError
at org.junit.Assert.fail(Assert.java:92)
at org.junit.Assert.assertTrue(Assert.java:43)
at org.junit.Assert.assertFalse(Assert.java:68)
at org.junit.Assert.assertFalse(Assert.java:79)
at 
org.apache.jackrabbit.oak.jcr.observation.ObservationTest.observationDispose(ObservationTest.java:467)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

Chetan Mehrotra


On Tue, May 13, 2014 at 12:10 PM,  build...@apache.org wrote:
 The Buildbot has detected a new failure on builder oak-trunk-win7 while 
 building ASF Buildbot.
 Full details are available at:
  http://ci.apache.org/builders/oak-trunk-win7/builds/67

 Buildbot URL: http://ci.apache.org/

 Buildslave for this Build: bb-win7

 Build Reason: scheduler
 Build Source Stamp: [branch jackrabbit/oak/trunk] 1594128
 Blamelist: chetanm

 BUILD FAILED: failed compile

 sincerely,
  -The Buildbot





Re: [VOTE] Release Apache Jackrabbit Oak 1.0.0

2014-05-12 Thread Chetan Mehrotra
[X] +1 Release this package as Apache Jackrabbit Oak 1.0.0

All tests passed and all checks OK.
Chetan Mehrotra


On Mon, May 12, 2014 at 2:15 PM, Davide Giannella
giannella.dav...@gmail.com wrote:
 [X] +1 Release this package as Apache Jackrabbit Oak 1.0.0
 Davide





Re: svn commit: r1560611 - in /jackrabbit/oak/trunk: oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/mongomk/util/ oak-core/src/test/java/org/apache/jackrabbit/oak/plugins/mongomk/ oak-jcr/sr

2014-05-04 Thread Chetan Mehrotra
On Fri, Apr 25, 2014 at 11:04 PM, Jukka Zitting jukka.zitt...@gmail.com wrote:
 The credentials in any case need to be valid for the database that
 holds the repository, so I don't see why we couldn't use it for this
 purpose.

As per the docs, the database name tells the Mongo driver the name of
the DB in which the user details are stored. Typically in SQL databases
the admin user tables are managed in a dedicated schema. Probably a
similar scheme is followed on the Mongo side as well.

Probably we can modify the logic to use the DB name present as part of
the URL if no DB name is explicitly provided via 'oak.mongo.db'.
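
A rough sketch of that fallback (hypothetical code; 'oak.mongo.db' is
read as a system property here and mongo-java-driver's MongoClientURI
is assumed for parsing the URL):

import com.mongodb.MongoClientURI;

String mongoUri = "mongodb://user:pwd@localhost:27017/aem-author"; // illustrative
String explicitDb = System.getProperty("oak.mongo.db");            // explicit setting wins
String uriDb = new MongoClientURI(mongoUri).getDatabase();         // db name embedded in the URL
String dbName = explicitDb != null ? explicitDb
        : (uriDb != null ? uriDb : "oak");                         // "oak" assumed as the last-resort default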


Chetan Mehrotra


Adding ProviderType and ConsumerType annotation to interfaces in exported packages

2014-04-24 Thread Chetan Mehrotra
As part of OAK-1741 I was changing the version of the exported packages to
1.0.0. Looking at the interfaces which are part of the exported packages, I
do not see any usage of the ConsumerType/ProviderType annotations [1].

In brief and simple terms, the interfaces which are expected to be
implemented by users of the Oak API (like
org.apache.jackrabbit.oak.plugins.observation.EventHandler) should be
marked with the ConsumerType annotation. This enables the bnd tool to
generate package import instructions with the stricter range [1.0,1.1).

All other interfaces, which are supposed to be provided by Oak, we
should mark with ProviderType. This enables bnd to generate the
package import instructions with the relaxed range [1.0,2) for our API
consumers. This would help us evolve the API more easily in the future.

Currently we have the following interfaces as part of the exported
packages [2]. Looking at the list I believe most are of ProviderType,
i.e. provided by Oak and not implemented by Oak API users.

Some, like org.apache.jackrabbit.oak.plugins.observation.EventHandler,
are of ConsumerType as we require the API users to implement them.
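
For illustration, a sketch of how the annotations would be applied (the
annotation package depends on the bnd version in use; the interface
names here are made up):

import aQute.bnd.annotation.ConsumerType;
import aQute.bnd.annotation.ProviderType;

@ProviderType   // implemented by Oak itself: consumers get the relaxed import range [1.0,2)
interface SomeOakProvidedService {
    void doSomething();
}

@ConsumerType   // implemented by Oak API users: imports get the strict range [1.0,1.1)
interface SomeUserImplementedCallback {
    void onEvent();
}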

Should we add the required annotations for the 1.0 release?

If yes, then can team members look into the list and set the right type?
Chetan Mehrotra
[1] 
https://github.com/osgi/design/raw/master/rfcs/rfc0197/rfc-0197-OSGiPackageTypeAnnotations.pdf

[2] 
https://issues.apache.org/jira/browse/OAK-1741?focusedCommentId=13979465#comment-13979465


Oak CI notifications not coming

2014-04-24 Thread Chetan Mehrotra
Hi,

I was checking the CI status for Oak trunk and it seems builds are not
running at [1] and [2].

Do we have to enable it somehow?

Chetan Mehrotra
[1] https://travis-ci.org/apache/jackrabbit-oak/builds
[2] http://ci.apache.org/builders/oak-trunk/


Re: plugin.document not exported in OSGi bundle

2014-04-22 Thread Chetan Mehrotra
The preferred approach is to instantiate it via OSGi configuration. So in
your OSGi env create a configuration for the PID
'org.apache.jackrabbit.oak.plugins.document.DocumentNodeStoreService'
[1]. This would activate the DocumentNodeStoreService [2] component,
which would then register a DocumentNodeStore against the NodeStore
interface.
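
A minimal sketch of creating such a configuration programmatically via
ConfigurationAdmin (property names and values are illustrative; a config
file per [1] works just as well):

import java.util.Dictionary;
import java.util.Hashtable;
import org.osgi.service.cm.Configuration;
import org.osgi.service.cm.ConfigurationAdmin;

// 'configAdmin' is an assumed reference to the ConfigurationAdmin service
Configuration cfg = configAdmin.getConfiguration(
        "org.apache.jackrabbit.oak.plugins.document.DocumentNodeStoreService", null);
Dictionary<String, Object> props = new Hashtable<String, Object>();
props.put("mongouri", "mongodb://localhost:27017");  // illustrative connection settings
props.put("db", "oak");
cfg.update(props);                                    // activates DocumentNodeStoreService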

Chetan Mehrotra
[1] http://jackrabbit.apache.org/oak/docs/osgi_config.html#DocumentNodeStore
[2] 
https://github.com/apache/jackrabbit-oak/blob/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentNodeStoreService.java

On Tue, Apr 22, 2014 at 11:23 PM, Galo Gimenez galo.gime...@gmail.com wrote:
 Hello,

 I noticed org.apache.jackrabbit.oak.plugins.document.DocumentMK is not
 exported in the OSGi bundle, is there a way to get Oak with the DocumentMK
 instantiated in OSGi.


 -- Galo


Review currently exported package version for 1.0 release

2014-04-17 Thread Chetan Mehrotra
Hi Team,

As part of OAK-1741 [1] I have captured details about the currently exported
packages from the various bundles provided as part of Oak.

Currently some packages are exported at 0.18, some at 0.16, and some are
exported at the bundle version.

Should we bump all of them to 1.0.0 for the 1.0 release and ensure
they are consistent from there on?

It would also be helpful to review the list once to check whether each
package export is really required; for example, oak-solr-osgi exports
quite a bit that is probably not required.

Chetan Mehrotra
[1] https://issues.apache.org/jira/browse/OAK-1741


Re: Using Lucene indexes for property queries

2014-04-14 Thread Chetan Mehrotra
 Should we let the
user decide whether it's OK to use an asynchronous index for this case

+1 for that. It has been the case with JR2 (I may be wrong here). And
when a user is searching for, say, some asset via DAM in Adobe CQ, he
would be OK if the result is not for the latest head. A small lag should
be acceptable. This would enable scenarios where traversal would be too
costly and Lucene can still be used to provide the required results in
much less time.
Chetan Mehrotra


On Mon, Apr 14, 2014 at 2:33 PM, Thomas Mueller muel...@adobe.com wrote:
 Hi,

 In theory, the Lucene index could be used quite easily. As far as I see,
 we would only need to change the cost function of the Lucene index (return
 a reasonable cost even if there is no full-text constraint).

 One problem might be: the Lucene index is asynchronous, and the user might
 expect the result to be up-to-date. The user knows this already for
 full-text constraints, but not for property constraints. Should we let the
 user decide whether it's OK to use an asynchronous index for this case?
 For example by specifying an option in the query (for example similar to
 the order by, at the very end of the query, option async)? So a query
 that can use an asynchronous index would look like this:

   //*[@prop = 'x'] option async
 or
   //*[@prop = 'x'] order by @otherProperty option async
 or
    select [jcr:path] from [nt:base] as a where [prop] > 1 option async


 Regards,
 Thomas






 On 14/04/14 06:54, Chetan Mehrotra chetan.mehro...@gmail.com wrote:

Hi,

In JR2 I believe Lucene was used for all types of queries and not only
for full text searches. In Oak we have our own PropertyIndexes for
handling queries involving constraints on properties. This I believe
provides a more accurate result as its built on top of mvcc support so
results obtained are consistent with session state/revision.

However this involves creating a index for property to be queried. And
the way currently property indexes are stored they consume quite a bit
of state (at least in DocumentNodeStore). In comparison Lucene stores
the index content in quite compact form.

In quite a few cases (like user choice based query builder) it might
not be known in advance which property the user would use. As we
already have all string property indexed in Lucene. Would it be
possible to use Lucene for performing such queries? Or allow the user
to choose which types of index he wants to use depending on the
usecase.

Chetan Mehrotra



Re: svn commit: r1587286 - in /jackrabbit/oak/trunk: oak-core/pom.xml oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentNodeStoreService.java oak-parent/pom.xml

2014-04-14 Thread Chetan Mehrotra
Hi Julian,

On Tue, Apr 15, 2014 at 12:39 AM,  resc...@apache.org wrote:
 - <Embed-Dependency>commons-dbcp,commons-pool,h2,json-simple</Embed-Dependency>
 + <Embed-Dependency>commons-dbcp,commons-pool,h2,json-simple,postgresql,db2,db2-license</Embed-Dependency>
   <Embed-Transitive>true</Embed-Transitive>

I believe this is a temporary change and would not be required for the
final implementation? It would be helpful to add a TODO/FIXME there so
that we remember to remove it later.

Instead of embedding all such drivers/dbcp/pool dependencies within
oak-core it would be better to decouple them. For example, one approach
could be:

1. Have a bundle which embeds commons-dbcp and the required dependencies.
It would be responsible for registering a DataSource (a sketch follows
after the list).
2. Driver bundles would be fragments with bundle #1 as the host. With
JDBC 4.0 the Driver classes are provided as part of
META-INF/services/java.sql.Driver [1]. For such cases fragment bundles
can be avoided by having #1 monitor for such drivers and register them
programmatically.
3. DocumentNodeStoreService should only have a reference to a DataSource
and use that.
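
A rough sketch of the idea in #1: a bundle activator that creates a
commons-dbcp DataSource and registers it with the OSGi service registry
(class name, config handling and the marker property are hypothetical):

import java.util.Hashtable;
import javax.sql.DataSource;
import org.apache.commons.dbcp.BasicDataSource;
import org.osgi.framework.BundleActivator;
import org.osgi.framework.BundleContext;

public class DataSourceProviderActivator implements BundleActivator {

    private BasicDataSource dataSource;

    @Override
    public void start(BundleContext context) throws Exception {
        dataSource = new BasicDataSource();         // pooling DataSource from commons-dbcp
        dataSource.setUrl("jdbc:h2:mem:oaknodes");  // illustrative; would come from config
        dataSource.setUsername("sa");
        dataSource.setPassword("");
        Hashtable<String, Object> props = new Hashtable<String, Object>();
        props.put("datasource.name", "oak");        // hypothetical marker property
        context.registerService(DataSource.class.getName(), dataSource, props);
    }

    @Override
    public void stop(BundleContext context) throws Exception {
        if (dataSource != null) {
            dataSource.close();                     // closes the underlying pool
        }
    }
}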

Chetan Mehrotra
[1] http://docs.oracle.com/javase/7/docs/api/java/sql/DriverManager.html


Using Lucene indexes for property queries

2014-04-13 Thread Chetan Mehrotra
Hi,

In JR2 I believe Lucene was used for all types of queries and not only
for full-text searches. In Oak we have our own property indexes for
handling queries involving constraints on properties. This, I believe,
provides a more accurate result as it is built on top of the MVCC
support, so the results obtained are consistent with the session
state/revision.

However, this involves creating an index for each property to be queried.
And the way property indexes are currently stored, they consume quite a
bit of storage (at least in DocumentNodeStore). In comparison, Lucene
stores the index content in quite a compact form.
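
For context, a sketch of what such a property index definition looks
like (assuming the standard oak:index layout; 'foo' and 'rootBuilder'
are illustrative):

import java.util.Collections;
import org.apache.jackrabbit.oak.api.Type;
import org.apache.jackrabbit.oak.spi.state.NodeBuilder;

// 'rootBuilder' is an assumed NodeBuilder for the repository root;
// this definition serves queries like //*[@foo = 'bar']
Iterable<String> names = Collections.singletonList("foo");
NodeBuilder index = rootBuilder.child("oak:index").child("foo");
index.setProperty("jcr:primaryType", "oak:QueryIndexDefinition", Type.NAME);
index.setProperty("type", "property");
index.setProperty("propertyNames", names, Type.NAMES);
index.setProperty("reindex", true);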

In quite a few cases (like a user-choice based query builder) it might
not be known in advance which property the user would use. As we
already have all string properties indexed in Lucene, would it be
possible to use Lucene for performing such queries? Or allow the user
to choose which type of index he wants to use depending on the
use case.

Chetan Mehrotra


Re: jackrabbit-oak build #4073: Errored

2014-04-10 Thread Chetan Mehrotra
 I'm sorry but your test run exceeded 50.0 minutes.

The build failure is due to a timeout.

Chetan Mehrotra


On Thu, Apr 10, 2014 at 11:49 AM, Travis CI ju...@apache.org wrote:
 Build Update for apache/jackrabbit-oak
 -

 Build: #4073
 Status: Errored

 Duration: 3002 seconds
 Commit: a653c0f168842a5d9b1de8072fdcc5f6d216ad12 (trunk)
 Author: Chetan Mehrotra
 Message: OAK-1716 - Enable passing of a execution context to runTest in multi 
 threaded runs

 Exposed a protected method `prepareThreadExecutionContext`
 which subclasses can override to return a context instance which would
 be used by that thread of execution

 git-svn-id: https://svn.apache.org/repos/asf/jackrabbit/oak/trunk@1586218 
 13f79535-47bb-0310-9956-ffa450edef68

 View the changeset: 
 https://github.com/apache/jackrabbit-oak/compare/2371ef73a4cd...a653c0f16884

 View the full build log and details: 
 https://travis-ci.org/apache/jackrabbit-oak/builds/22666739

 --
 sent by Jukka's Travis notification gateway


Re: Slow full text query performance and Lucene Index handling in Oak

2014-04-09 Thread Chetan Mehrotra
On Wed, Apr 9, 2014 at 12:25 PM, Marcel Reutegger mreut...@adobe.com wrote:
 Since the Lucene index is in any case updated asynchronously, it
 should be fine for us to ignore the base NodeState of the current
 session and instead use an IndexSearcher based on the last state as
 updated by the async indexer. This would allow us to reuse the
 IndexSearcher over multiple queries.

 I was also wondering if it makes sense to share it across multiple
 sessions performing a query to reduce the number of index readers
 that may be open at the same time. however, this will likely also
 reduce concurrency because we synchronize access to a single
 session.

I tried one approach [1] where I used a custom SearcherManager based on
Lucene's SearcherManager [2]. It obtains the root NodeState directly from
the NodeStore. As the NodeStore can be accessed concurrently, this should
not have any impact on session concurrency.
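
The sharing idea, roughly (Lucene 4.x SearcherManager API; 'directory'
is an assumed Directory instance and the refresh wiring is simplified):

import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.SearcherFactory;
import org.apache.lucene.search.SearcherManager;
import org.apache.lucene.store.Directory;

// one shared manager per index, created once against the index directory
SearcherManager manager = new SearcherManager(directory, new SearcherFactory());

IndexSearcher searcher = manager.acquire();   // per query: borrow the shared searcher
try {
    // run the Lucene query against 'searcher'
} finally {
    manager.release(searcher);                // always release the borrowed searcher
}

// after the async indexer has updated the index:
manager.maybeRefresh();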

With this change there is a slight improvement:

# FullTextSearchTest       C   min   10%   50%   90%   max      N
Oak-Tar                    1    39    40    40    44    64   1459
Oak-Tar (Shared)           1    32    33    34    36    61   1738

So it did not give much of a boost (at least with the approach taken). As
I do not have much understanding of Lucene internals, can someone review
the approach taken and see if there are any major issues with it?


Chetan Mehrotra
[1] 
https://issues.apache.org/jira/secure/attachment/12639366/OAK-1702-shared-indexer.patch
[2] 
https://lucene.apache.org/core/3_6_0/api/all/org/apache/lucene/search/SearcherManager.html


Re: Slow full text query performance and Lucene Index handling in Oak

2014-04-09 Thread Chetan Mehrotra
On Wed, Apr 9, 2014 at 3:00 PM, Alex Parvulescu
alex.parvule...@gmail.com wrote:
  - the patch assumes that there is and will be a single lucene index
 directly under the root node, which may not necessarily be the case. I
 agree this assumption holds now, but I would not introduce any changes that
 take away this flexibility.

That is not a problem per se, as the IndexReader starts with a reference
count of 1, so it would never go to zero.

The problem appears to be somewhere else: I modified the code to
use a shared IndexSearcher and a native FSDirectory, and still the
performance improvement was marginal.

The problem is occurring because
org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndex#query [1]
currently does an eager initialization of the cursor while the test case
only fetches the first result. Compared to this, the JR2 version does
lazy evaluation. If I put a break in the loop (exit after the first
result) the results are much better:

# FullTextSearchTest                   C   min   10%   50%   90%   max       N
Oak-Tar (break, shared searcher, fs)   1     2     2     3     3   170   23204
Oak-Tar (break)                        1     5     5     5     6    90   10593
Jackrabbit                             1     4     4     5     6   231   11385

Now I am not sure if this is a problem with the use case taken, or
whether the Lucene index cursor management should be improved, as in
many cases there would be multiple results but the client code only
makes use of the initial few.

Chetan Mehrotra
[1] 
https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LuceneIndex.java#L381-L409


Re: Slow full text query performance and Lucene Index handling in Oak

2014-04-09 Thread Chetan Mehrotra
On Wed, Apr 9, 2014 at 5:14 PM, Jukka Zitting jukka.zitt...@gmail.com wrote:
 Is that a common use case? To better simulate a normal usage scenario
 I'd make the benchmark fetch up to N results (where N is configurable,
 with default something like 20) and access the path and the title
 property of the matching nodes.

I changed the logic of the benchmark in http://svn.apache.org/r1585962.
With that, JR2 slows down a bit:

# FullTextSearchTest       C   min   10%   50%   90%   max       N
Oak-Tar                    1    34    35    36    39    60    1639
Jackrabbit                 1     5     5     6     7    68   10038

Profiling the result shows that quite a bit of time (40%) goes into
org.apache.lucene.codecs.compressing.LZ4.decompress(). This I think is
part of Lucene 4.x and not present in 3.x. Any idea how I can
disable compression?

Chetan Mehrotra


Re: Slow full text query performance and Lucene Index handling in Oak

2014-04-09 Thread Chetan Mehrotra
Current update:

1. Tommaso provided a patch (OAK-1702) to disable compression, and that
also helps quite a bit.
2. Currently we are storing the full tokenized text in the Lucene index
[1]. This causes fetching of the doc fields to be slower. On
disabling the storage the numbers improve quite a bit. The storage was
added as part of OAK-319 for supporting MLT (see the illustration after
the numbers below).

# FullTextSearchTest        C   min   10%   50%   90%   max      N
Oak-Tar (codec)             1     9     9    10    12    41   5664
Oak-Tar (codec, mlt off)    1     7     8     8    10    21   6921
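
For illustration, the difference boils down to whether the aggregated
full text is stored in the index or only indexed (a sketch against the
Lucene 4.x API; 'text' is an assumed string and the ':fulltext' field
name follows FieldFactory [1]):

import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;

// current behaviour: tokenized text is both indexed and stored (needed for MLT, OAK-319)
Field storedFullText  = new TextField(":fulltext", text, Field.Store.YES);

// candidate when MLT is not needed: index only, keeping the stored documents small
Field indexedFullText = new TextField(":fulltext", text, Field.Store.NO);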

Would look further

Chetan Mehrotra
[1] 
https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/FieldFactory.java#L44

On Wed, Apr 9, 2014 at 7:15 PM, Alex Parvulescu
alex.parvule...@gmail.com wrote:
 Aside from the compression issue, there was another one related to the
 'order by' clause. I saw Collections.sort taking up as far as 23% of the
 perf.

 I removed the order by temporarily so it doesn't get in the way of the
 Lucene stuff, but I think the QueryEngine should skip ordering results in
 this case.




 On Wed, Apr 9, 2014 at 3:31 PM, Tommaso Teofili
 tommaso.teof...@gmail.comwrote:

 I'm looking into the Lucene codecs right now.

 Tommaso


 2014-04-09 15:20 GMT+02:00 Alex Parvulescu alex.parvule...@gmail.com:

  Profiling the result shows that quite a bit of time goes in
  org.apache.lucene.codecs.compressing.LZ4.decompress() (40%). This I
  think is part of Lucene 4.x and not present in 3.x. Any idea if I can
  disable compression?
 
  +1 I noticed that too, we should try to disable compression and compare
  results.
 
  alex
 
 
  On Wed, Apr 9, 2014 at 3:16 PM, Chetan Mehrotra
  chetan.mehro...@gmail.comwrote:
 
   On Wed, Apr 9, 2014 at 5:14 PM, Jukka Zitting jukka.zitt...@gmail.com
 
   wrote:
Is that a common use case? To better simulate a normal usage scenario
I'd make the benchmark fetch up to N results (where N is
 configurable,
with default something like 20) and access the path and the title
property of the matching nodes.
  
   I changed the logic of benchmark in http://svn.apache.org/r1585962.
   With that JR2 slows down a bit
  
   # FullTextSearchTest   C min 10% 50% 90%
 max   N
   Oak-Tar1  34  35  36  39
  601639
   Jackrabbit 1   5   5   6   7
  68   10038
  
   Profiling the result shows that quite a bit of time goes in
   org.apache.lucene.codecs.compressing.LZ4.decompress() (40%). This I
   think is part of Lucene 4.x and not present in 3.x. Any idea if I can
   disable compression?
  
   Chetan Mehrotra
  
 



Slow full text query performance and Lucene Index handling in Oak

2014-04-08 Thread Chetan Mehrotra
Hi,

As part of OAK-1702 [1] I have added a benchmark to compare the
performance of full-text query search with JR2.

Based on the approach taken (which might be wrong) I get the following
numbers:

Apache Jackrabbit Oak 0.21.0-SNAPSHOT
# FullTextSearchTest       C   min   10%   50%   90%   max       N
Oak-Mongo                  1    58    71   101   119   287     610
Oak-Mongo-FDS              1    50    51    52    58   184    1106
Oak-Tar                    1    39    40    40    44    64    1459
Oak-Tar-FDS                1    53    54    55    64   197    1030
Jackrabbit                 1     4     4     5     6   231   11385

This shows that JR2 performs a lot better for full-text queries, and
subsequent queries are quite a bit faster once Lucene has warmed up.

Looking at the current usage of Lucene in Oak and the way we store and
access the Lucene indexes [2], I have a couple of doubts:

1. Multiple IndexSearcher instances - The current impl would create a new
IndexSearcher for every Lucene query, as the OakDirectory used is bound
to the NodeState of the executing JCR session. Compared to this, in JR2
we probably had a singleton IndexSearcher which was shared across all
query execution paths. This would potentially cause performance issues,
as Lucene is effectively used in a stateless way and has to perform
initialization for every call. As per [3], the IndexSearcher should be
shared.

2. Index Access - Currently we have a custom OakDirectory which provides
access to Lucene indexes stored in the NodeStore. Even with the
SegmentStore, which uses memory-mapped files, the random access used by
Lucene would probably be a lot slower with OakDirectory in comparison to
the default Lucene MMapDirectory. For small setups, where the Lucene
index can be accommodated on each node, I think it would be better if
the index were accessed from the file system (see the sketch below).
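
A sketch of that file-system alternative (Lucene 4.x API; the local path
is illustrative, and keeping such a copy in sync with the repository is
exactly the open question):

import java.io.File;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

// FSDirectory.open() picks MMapDirectory on 64-bit platforms
Directory dir = FSDirectory.open(new File("/path/to/local/index-copy"));
IndexReader reader = DirectoryReader.open(dir);
IndexSearcher searcher = new IndexSearcher(reader);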

Are the above concerns valid and should we relook into how we are
using Lucene in Oak?

Chetan Mehrotra
[1] https://issues.apache.org/jira/browse/OAK-1702
[2] 
https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/OakDirectory.java
[3] http://wiki.apache.org/lucene-java/ImproveSearchingSpeed


Re: svn commit: r1577449 - in /jackrabbit/oak/trunk/oak-core/src: main/java/org/apache/jackrabbit/oak/plugins/segment/ main/java/org/apache/jackrabbit/oak/plugins/segment/file/ main/java/org/apache/ja

2014-04-02 Thread Chetan Mehrotra
On Wed, Apr 2, 2014 at 11:36 AM, Jukka Zitting jukka.zitt...@gmail.com wrote:
 I consider this an unfortunate recent development.

Not sure. There are some deployment scenarios where a shared
FileDataStore is a must, and thus we need to support cases where blobs
can be stored separately from the node data. Yes, it adds to the
complexity of backup, but if such a feature is required then that cost
has to be paid.

Default setups currently do not use a FileDataStore or BlobStore with
SegmentNodeStore, so with the defaults the original design is still
honored.
Chetan Mehrotra


Re: svn commit: r1583285 - /jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/value/ValueImpl.java

2014-04-02 Thread Chetan Mehrotra
On Wed, Apr 2, 2014 at 12:18 PM, Jukka Zitting jukka.zitt...@gmail.com wrote:
 The getContentIdentity() method has a specific contract and the return
 value should generally not be interpreted as a referenceable
 identifier.

Ack.

 If you need a method that exposes the blobId, it would be best to add
 a separate method for that. But note that not all Blob implementations
 have a blobId like in BlobStoreBlob.

For now there is no strong requirement for that. If the need arises I
would follow up this way.
Chetan Mehrotra


Re: Question regarding missing _lastRev recovery - OAK-1295

2014-04-02 Thread Chetan Mehrotra
 The lease time is set to 1 minute. Would it be ok to check this every
minute, from every node?

Adding to that, the default time intervals are:

- asyncDelay = 1 sec - the background operations are performed every 1
sec per cluster node. If nothing changes we would fire 1 query/sec per
cluster node to check the head revision.

- cluster lease time = 1 min - this is the time after which a cluster
lease would be renewed.

So we need to decide the time interval for the job that detects the
recovery condition.
Chetan Mehrotra


On Wed, Apr 2, 2014 at 4:31 PM, Amit Jain am...@ieee.org wrote:
 Hi,

 1) a cluster node starts up and sees it didn't shut down properly. I'm
 not
 sure this information is available, but remember we discussed this once.

 Yes, this case has been taken care of in the startup.

  this check could be done in the
 background operations thread on a regular basis. probably depending on
 the lease interval.

 The lease time is set to 1 minute. Would it be ok to check this every
 minute, from every node?

 Thanks
 Amit


 On Wed, Apr 2, 2014 at 4:14 PM, Marcel Reutegger mreut...@adobe.com wrote:

 Hi,

 I think the recovery should be triggered automatically by the system when:

 1) a cluster node starts up and sees it didn't shut down properly. I'm not
 sure this information is available, but remember we discussed this once.

 2) a cluster node sees a lease timeout of another cluster node and
 initiates
 the recovery for the failed cluster node. this check could be done in the
 background operations thread on a regular basis. probably depending on
 the lease interval.

 In addition it would probably also be useful to have the recovery operation
 available as a command in oak-run. that way you can manually trigger it
 from
 the command line. WDYT?

 Regards
  Marcel

  How do we expose _lastRev recovery operation? This would need to check
  all
  the cluster nodes info and run recovery for those nodes which need
  recovery.
 
  1. We either have a scheduled job which checks all the nodes and run the
  recovery. What should be the interval to trigger the job?
  2. Or if we want it run only when triggered manually, then expose an
  appropriate MBean.
 
 
  Thanks
  Amit



Re: svn commit: r1577449 - in /jackrabbit/oak/trunk/oak-core/src: main/java/org/apache/jackrabbit/oak/plugins/segment/ main/java/org/apache/jackrabbit/oak/plugins/segment/file/ main/java/org/apache/ja

2014-04-02 Thread Chetan Mehrotra
 @Chetan: why would the configs not be stored in the repo? I do not see how 
 this relates to non-OSGi environments

Well, that's the basic config required to configure
DocumentNodeStore/SegmentNodeStore. These configs cannot be stored as
content.

Other settings, like the security-related config, are currently not read
from the NodeStore and in an OSGi env are provided by the OSGi
ConfigAdmin.

And other settings, like index definitions, are currently stored as
content.


On Wed, Apr 2, 2014 at 3:49 PM, Michael Marth mma...@adobe.com wrote:

 On 02 Apr 2014, at 08:06, Jukka Zitting 
 jukka.zitt...@gmail.commailto:jukka.zitt...@gmail.com wrote:

 That design gets broken if components
 start storing data separately in the repository folder.

 Agree with that design principle, but the (shared) file system DS is a valid 
 exception IMO (same for the S3 DS).

 Later we would probably store the config files when using Oak outside
 of std OSGi env like with PojoSR

 @Chetan: why would the configs not be stored in the repo? I do not see how 
 this relates to non-OSGi environments


Re: jackrabbit-oak build #3994: Broken

2014-04-02 Thread Chetan Mehrotra
Test case failure on oak-solr

Failed tests:
testOffsetAndLimit(org.apache.jackrabbit.core.query.LimitAndOffsetTest):
expected:<1> but was:<0>

testOffsetAndLimitWithGetSize(org.apache.jackrabbit.core.query.LimitAndOffsetTest):
expected:<2> but was:<0>
Chetan Mehrotra


On Wed, Apr 2, 2014 at 6:04 PM, Travis CI ju...@apache.org wrote:
 Build Update for apache/jackrabbit-oak
 -

 Build: #3994
 Status: Broken

 Duration: 2194 seconds
 Commit: 0e0a47ec387626e494a65dd143e3a25a3d004abe (trunk)
 Author: Julian Reschke
 Message: OAK-1533 - remove JDBC URL specific constructors from -core

 git-svn-id: https://svn.apache.org/repos/asf/jackrabbit/oak/trunk@1583981 
 13f79535-47bb-0310-9956-ffa450edef68

 View the changeset: 
 https://github.com/apache/jackrabbit-oak/compare/1402e7db17c0...0e0a47ec3876

 View the full build log and details: 
 https://travis-ci.org/apache/jackrabbit-oak/builds/22096647

 --
 sent by Jukka's Travis notification gateway


Re: svn commit: r1583994 - in /jackrabbit/oak/trunk/oak-core/src: main/java/org/apache/jackrabbit/oak/plugins/blob/datastore/OakFileDataStore.java test/java/org/apache/jackrabbit/oak/plugins/blob/data

2014-04-02 Thread Chetan Mehrotra
On Wed, Apr 2, 2014 at 6:30 PM, Jukka Zitting jukka.zitt...@gmail.com wrote:
 The inUse map is in FileDataStore for a reason.

Ack. What I have understood from the blob GC logic in Oak is that it
relies on the blob's last modified value to distinguish the actively used
blobs. So for performing GC only those blobs would be considered whose
lastModified value is older than, say, 1 day. Only these blobs would be
candidates for deletion. This ensures that blobs created in the transient
space are not considered for GC.

So the current logic does make the assumption that 1 day is sufficient
time and hence is not foolproof. However, the current impl of inUse would
probably only work for a single-node system and would fail for the shared
DataStore scenario, as it is in-memory state and it is hard to determine
the inUse state for the whole cluster. For supporting such cases we would
have to rely on the lastModified time interval to distinguish the
actively used blobs.

regards
Chetan

Chetan Mehrotra


Re: svn commit: r1583325 - in /jackrabbit/oak/trunk: oak-auth-external/pom.xml oak-core/pom.xml oak-jcr/pom.xml oak-mk-perf/pom.xml oak-mk/pom.xml oak-run/pom.xml oak-upgrade/pom.xml

2014-03-31 Thread Chetan Mehrotra
Might be simpler to define the version in oak-parent
Chetan Mehrotra


On Mon, Mar 31, 2014 at 6:57 PM,  resc...@apache.org wrote:
 Author: reschke
 Date: Mon Mar 31 13:27:46 2014
 New Revision: 1583325

 URL: http://svn.apache.org/r1583325
 Log:
 use the latest H2 DB throughout

 Modified:
 jackrabbit/oak/trunk/oak-auth-external/pom.xml
 jackrabbit/oak/trunk/oak-core/pom.xml
 jackrabbit/oak/trunk/oak-jcr/pom.xml
 jackrabbit/oak/trunk/oak-mk-perf/pom.xml
 jackrabbit/oak/trunk/oak-mk/pom.xml
 jackrabbit/oak/trunk/oak-run/pom.xml
 jackrabbit/oak/trunk/oak-upgrade/pom.xml

 Modified: jackrabbit/oak/trunk/oak-auth-external/pom.xml
 URL: 
 http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-auth-external/pom.xml?rev=1583325r1=1583324r2=1583325view=diff
 ==
 Binary files - no diff available.

 Modified: jackrabbit/oak/trunk/oak-core/pom.xml
 URL: 
 http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-core/pom.xml?rev=1583325r1=1583324r2=1583325view=diff
 ==
 --- jackrabbit/oak/trunk/oak-core/pom.xml (original)
 +++ jackrabbit/oak/trunk/oak-core/pom.xml Mon Mar 31 13:27:46 2014
 @@ -275,7 +275,7 @@
   <dependency>
     <groupId>com.h2database</groupId>
     <artifactId>h2</artifactId>
 -   <version>1.3.158</version>
 +   <version>1.3.175</version>
     <optional>true</optional>
   </dependency>
   <dependency>

 Modified: jackrabbit/oak/trunk/oak-jcr/pom.xml
 URL: 
 http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-jcr/pom.xml?rev=1583325r1=1583324r2=1583325view=diff
 ==
 --- jackrabbit/oak/trunk/oak-jcr/pom.xml (original)
 +++ jackrabbit/oak/trunk/oak-jcr/pom.xml Mon Mar 31 13:27:46 2014
 @@ -294,7 +294,7 @@
   <dependency>
     <groupId>com.h2database</groupId>
     <artifactId>h2</artifactId>
 -   <version>1.3.158</version>
 +   <version>1.3.175</version>
     <scope>test</scope>
   </dependency>
   <dependency>

 Modified: jackrabbit/oak/trunk/oak-mk-perf/pom.xml
 URL: 
 http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-mk-perf/pom.xml?rev=1583325r1=1583324r2=1583325view=diff
 ==
 --- jackrabbit/oak/trunk/oak-mk-perf/pom.xml (original)
 +++ jackrabbit/oak/trunk/oak-mk-perf/pom.xml Mon Mar 31 13:27:46 2014
 @@ -111,7 +111,7 @@
   <dependency>
     <groupId>com.h2database</groupId>
     <artifactId>h2</artifactId>
 -   <version>1.3.158</version>
 +   <version>1.3.175</version>
   </dependency>
   <dependency>
     <groupId>com.cedarsoft.commons</groupId>

 Modified: jackrabbit/oak/trunk/oak-mk/pom.xml
 URL: 
 http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-mk/pom.xml?rev=1583325r1=1583324r2=1583325view=diff
 ==
 --- jackrabbit/oak/trunk/oak-mk/pom.xml (original)
 +++ jackrabbit/oak/trunk/oak-mk/pom.xml Mon Mar 31 13:27:46 2014
 @@ -114,7 +114,7 @@
   <dependency>
     <groupId>com.h2database</groupId>
     <artifactId>h2</artifactId>
 -   <version>1.3.158</version>
 +   <version>1.3.175</version>
     <optional>true</optional>
   </dependency>


 Modified: jackrabbit/oak/trunk/oak-run/pom.xml
 URL: 
 http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-run/pom.xml?rev=1583325r1=1583324r2=1583325view=diff
 ==
 --- jackrabbit/oak/trunk/oak-run/pom.xml (original)
 +++ jackrabbit/oak/trunk/oak-run/pom.xml Mon Mar 31 13:27:46 2014
 @@ -142,7 +142,7 @@
   <dependency>
     <groupId>com.h2database</groupId>
     <artifactId>h2</artifactId>
 -   <version>1.3.158</version>
 +   <version>1.3.175</version>
   </dependency>
   <dependency>
     <groupId>org.mongodb</groupId>

 Modified: jackrabbit/oak/trunk/oak-upgrade/pom.xml
 URL: 
 http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-upgrade/pom.xml?rev=1583325r1=1583324r2=1583325view=diff
 ==
 --- jackrabbit/oak/trunk/oak-upgrade/pom.xml (original)
 +++ jackrabbit/oak/trunk/oak-upgrade/pom.xml Mon Mar 31 13:27:46 2014
 @@ -95,7 +95,7 @@
   <dependency>
     <groupId>com.h2database</groupId>
     <artifactId>h2</artifactId>
 -   <version>1.3.158</version>
 +   <version>1.3.175</version>
     <scope>test</scope>
   </dependency>
   <dependency>




Re: svn commit: r1583325 - in /jackrabbit/oak/trunk: oak-auth-external/pom.xml oak-core/pom.xml oak-jcr/pom.xml oak-mk-perf/pom.xml oak-mk/pom.xml oak-run/pom.xml oak-upgrade/pom.xml

2014-03-31 Thread Chetan Mehrotra
You can define the version in the dependencyManagement section of
oak-parent and it would be inherited by the child projects. E.g. there
are entries for junit, easymock etc. In the child project you then only
define the groupId and artifactId.
Chetan Mehrotra


On Mon, Mar 31, 2014 at 7:22 PM, Julian Reschke julian.resc...@gmx.de wrote:
 On 2014-03-31 15:31, Chetan Mehrotra wrote:

 Might be simpler to define the version in oak-parent
 Chetan Mehrotra
 ...


 Likely. I quickly looked at oak-parent and couldn't see any test
 dependencies over there, so decided to lave it alone for now...

 Best regards, Julian


Re: svn commit: r1583285 - /jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/value/ValueImpl.java

2014-03-31 Thread Chetan Mehrotra
+1. I have also missed having a clean way to get the blobId from a Blob.
So adding this method would be useful in other cases also.
Chetan Mehrotra


On Tue, Apr 1, 2014 at 8:05 AM, Jukka Zitting jukka.zitt...@gmail.com wrote:
 Hi,

 On Mon, Mar 31, 2014 at 3:25 PM, Michael Dürig mdue...@apache.org wrote:
 2nd try: http://svn.apache.org/r1583413

 That's more correct, but has horrible performance with any
 implementation (including BlobStoreBlob and SegmentBlob) that doesn't
 precompute the hash.

 As mentioned earlier, a better alternative would be to add an explicit
 method for this and let the implementations decide what the best
 identifier would be.

 For BlobStoreBlob that would likely be:

 public String getContentIdentity() {
 return blobId;
 }

 And for SegmentBlob:

 public String getContentIdentity() {
 return getRecordId().toString();
 }

 BR,

 Jukka Zitting


Re: [DocumentNodeStore] Clarify behaviour for Commit.getModified

2014-03-28 Thread Chetan Mehrotra
 I think the intention of the method is to return a value in
seconds with a five second resolution.

Makes sense. Changed the logic to use seconds and also fixed the method
names/constants to reflect that.
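
For reference, a sketch of the seconds-based variant being discussed
(hypothetical rendering: 5 second resolution, value kept in seconds):

import java.util.concurrent.TimeUnit;

public static long getModifiedInSecs(long timestampMillis) {
    long timeInSec = TimeUnit.MILLISECONDS.toSeconds(timestampMillis);
    return timeInSec - timeInSec % 5;  // round down to 5 second resolution, stay in seconds
}
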
Chetan Mehrotra


On Fri, Mar 28, 2014 at 3:28 PM, Marcel Reutegger mreut...@adobe.com wrote:
 Hi,

 the fix looks good, but why do you want to convert the
 seconds to milliseconds again at the end?

 I think the intention of the method is to return a value in
 seconds with a five second resolution.

 we definitively need to add javadoc :-/

 Regards
  Marcel

 -Original Message-
 From: Chetan Mehrotra [mailto:chetan.mehro...@gmail.com]
 Sent: Donnerstag, 27. März 2014 18:05
 To: oak-dev@jackrabbit.apache.org
 Subject: [DocumentNodeStore] Clarify behaviour for Commit.getModified

 Hi,

 Currently Commit.getModified has following impl

 -
 public static long getModified(long timestamp) {
 // 5 second resolution
 return timestamp / 1000 / 5;
  }
 -

 The result when treated as timestamp cause the time to set to 0 i.e. 1970

 I intend to fix this with (looking at comment)

 -
 public static long getModified(long timestamp) {
 long timeInSec = TimeUnit.MILLISECONDS.toSeconds(timestamp);
 timeInSec = timeInSec - timeInSec % 5;
 return TimeUnit.SECONDS.toMillis(timeInSec);
 }
 -

 Would that be correct approach?

 Chetan Mehrotra


Re: jackrabbit-oak build #3922: Errored

2014-03-28 Thread Chetan Mehrotra
Travis says


I'm sorry but your test run exceeded 50.0 minutes.

One possible solution is to split up your test run.
-


Chetan Mehrotra


On Fri, Mar 28, 2014 at 4:49 PM, Travis CI ju...@apache.org wrote:
 Build Update for apache/jackrabbit-oak
 -

 Build: #3922
 Status: Errored

 Duration: 3002 seconds
 Commit: 4690d169b8469689436d14e3cadfe8f56621f99f (trunk)
 Author: Marcel Reutegger
 Message: OAK-1341 - DocumentNodeStore: Implement revision garbage collection 
 (WIP)

 JavaDoc

 git-svn-id: https://svn.apache.org/repos/asf/jackrabbit/oak/trunk@1582676 
 13f79535-47bb-0310-9956-ffa450edef68

 View the changeset: 
 https://github.com/apache/jackrabbit-oak/compare/67f784241281...4690d169b846

 View the full build log and details: 
 https://travis-ci.org/apache/jackrabbit-oak/builds/21746766

 --
 sent by Jukka's Travis notification gateway


Remove SynchronizedDocumentStoreWrapper

2014-03-27 Thread Chetan Mehrotra
I see two similar classes

- 
org.apache.jackrabbit.oak.plugins.document.rdb.SynchronizedDocumentStoreWrapper
- 
org.apache.jackrabbit.oak.plugins.document.util.SynchronizingDocumentStoreWrapper

And these are not being used anywhere. Further, I am not sure what
purpose they would serve. Should they be removed?

Also, can we implement such wrappers via proxies? The current approach
increases the work needed whenever new methods are added to
DocumentStore.

Chetan Mehrotra


Re: Remove SynchronizedDocumentStoreWrapper

2014-03-27 Thread Chetan Mehrotra
On Thu, Mar 27, 2014 at 2:30 PM, Julian Reschke julian.resc...@gmx.de wrote:
 I created the first one, and use it occasionally for debugging. (it's
 similar to the Logging*Wrapper. Please do not remove.

OK, but there is already one present in util. Are they different? If
not we should keep only one.

 Example?

Something like:

import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

public class SynchronizedDocumentStoreWrapper2 {

    public static DocumentStore create(DocumentStore documentStore) {
        return (DocumentStore) Proxy.newProxyInstance(
                SynchronizedDocumentStoreWrapper2.class.getClassLoader(),
                new Class[] {DocumentStore.class},
                new DocumentStoreProxy(documentStore));
    }

    private static class DocumentStoreProxy implements InvocationHandler {
        private final DocumentStore delegate;
        private final Object lock = new Object();

        private DocumentStoreProxy(DocumentStore delegate) {
            this.delegate = delegate;
        }

        @Override
        public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
            // serialize all calls to the underlying DocumentStore
            synchronized (lock) {
                return method.invoke(delegate, args);
            }
        }
    }
}

Chetan Mehrotra


Re: Remove SynchronizedDocumentStoreWrapper

2014-03-27 Thread Chetan Mehrotra
On Thu, Mar 27, 2014 at 4:00 PM, Julian Reschke julian.resc...@gmx.de wrote:
 We can kill the one in rdb (I didn't see the other one when I added it).

Would do

Chetan Mehrotra


Re: AbstractBlobStoreTest

2014-03-27 Thread Chetan Mehrotra
On Thu, Mar 27, 2014 at 10:08 PM, Julian Reschke julian.resc...@gmx.de wrote:
 if there's a reason not to

It might affect tests related to GC, as the GC logic would clean more
than the expected set of blobs.


Chetan Mehrotra


[DocumentNodeStore] Clarify behaviour for Commit.getModified

2014-03-27 Thread Chetan Mehrotra
Hi,

Currently Commit.getModified has following impl

-
public static long getModified(long timestamp) {
// 5 second resolution
return timestamp / 1000 / 5;
 }
-

The result when treated as timestamp cause the time to set to 0 i.e. 1970

I intend to fix this with (looking at comment)

-
public static long getModified(long timestamp) {
long timeInSec = TimeUnit.MILLISECONDS.toSeconds(timestamp);
timeInSec = timeInSec - timeInSec % 5;
return TimeUnit.SECONDS.toMillis(timeInSec);
}
-

Would that be correct approach?

Chetan Mehrotra


Re: jackrabbit-oak build #3838: Broken

2014-03-26 Thread Chetan Mehrotra
I fixed that issue yesterday but the build is currently failing in the rat check:

Warning:  org.apache.xerces.parsers.SAXParser: Property
'http://www.oracle.com/xml/jaxp/properties/entityExpansionLimit' is
not recognized.

[INFO] Rat check: Summary of files. Unapproved: 1 unknown: 1
generated: 0 approved: 1031 licence.

[INFO] 

[INFO] Reactor Summary:

rat.txt points to two new files that get created under
oak-core/oaknodes.trace.db, which look like H2 db files. Probably the
test case needs to be adjusted to create these files in the target folder.
Chetan Mehrotra


On Tue, Mar 25, 2014 at 6:20 PM, Chetan Mehrotra
chetan.mehro...@gmail.com wrote:
 On Tue, Mar 25, 2014 at 5:49 PM, Michael Dürig mdue...@apache.org wrote:
 CacheInvalidationIT


 Looking into it
 Chetan Mehrotra


Re: jackrabbit-oak build #3838: Broken

2014-03-26 Thread Chetan Mehrotra
Fixed the RDBDocumentStore to create the file in the target folder.
However, the current approach would cause issues in a test env. I will
start a separate thread on that.


Chetan Mehrotra


On Wed, Mar 26, 2014 at 12:52 PM, Chetan Mehrotra
chetan.mehro...@gmail.com wrote:
 I fixed that issue yesterday but build is currently failing in rat check

 Warning:  org.apache.xerces.parsers.SAXParser: Property
 'http://www.oracle.com/xml/jaxp/properties/entityExpansionLimit' is
 not recognized.

 [INFO] Rat check: Summary of files. Unapproved: 1 unknown: 1
 generated: 0 approved: 1031 licence.

 [INFO] 
 

 [INFO] Reactor Summary:

 rat.txt points to two new files that get created under
 oak-core/oaknodes.trace.db which look like H2 db files. Probably the
 testcase need to be adjusted to create these files in target folder
 Chetan Mehrotra


 On Tue, Mar 25, 2014 at 6:20 PM, Chetan Mehrotra
 chetan.mehro...@gmail.com wrote:
 On Tue, Mar 25, 2014 at 5:49 PM, Michael Dürig mdue...@apache.org wrote:
 CacheInvalidationIT


 Looking into it
 Chetan Mehrotra


Re: Request for feedback: OSGi Configuration for Query Limits (OAK-1571)

2014-03-26 Thread Chetan Mehrotra
The patch looks fine to me. Probably we can collapse QueryIndexProvider
and QueryEngineSettings into a single QueryEngineContext and pass that
along till Root.

 So: is it worth it to have the 100 KB source code overhead just to make 
 things configurable separately for each Oak instance?
I think there are a couple of benefits:

* Isolation between multiple Oak instances running on the same JVM (minor)

* It opens up the possibility of having session-specific settings. So if
later we require, say, JR2-compatible behaviour for some session, then
those settings can be overlaid via session attributes.

* It allows changing the settings at runtime via a GUI, as some of these
settings would not require a repository restart and can affect the next
query that gets executed. That would be a major win.

So this effort now would enable incremental improvements in the
QueryEngine in the future!

 The Whiteboard is per Oak instance, right?
For the OSGi case, yes.
Chetan Mehrotra


On Wed, Mar 26, 2014 at 2:23 PM, Thomas Mueller muel...@adobe.com wrote:
 Hi,

 I'm trying to make some query settings (limits on the number of nodes read) 
 configurable via OSGi. So far, I have a patch of about 100 KB, and this is 
 just wiring together the components (no OSGi / Whiteboard so far).

 I wonder, is there an easier way to do it? With system properties, it's just 
 a few lines of code. The disadvantage is that all Oak instances in the same 
 JVM use the same settings, but with OSGi configuration I guess in reality 
 it's not much different. So: is it worth it to have the 100 KB source code 
 overhead just to make things configurable separately for each Oak instance? 
 If not, how could it be implemented? The Whiteboard is per Oak instance, 
 right?

 Regards,
 Thomas




Re: jackrabbit-oak build #3838: Broken

2014-03-25 Thread Chetan Mehrotra
On Tue, Mar 25, 2014 at 5:49 PM, Michael Dürig mdue...@apache.org wrote:
 CacheInvalidationIT


Looking into it
Chetan Mehrotra


Re: jackrabbit-oak build #3809: Broken

2014-03-24 Thread Chetan Mehrotra
My fault. Looking into it
Chetan Mehrotra


On Mon, Mar 24, 2014 at 11:58 AM, Travis CI ju...@apache.org wrote:
 Build Update for apache/jackrabbit-oak
 -

 Build: #3809
 Status: Broken

 Duration: 444 seconds
 Commit: afb6c5335b46067a3ea43ce69c987a46d9a3fd38 (trunk)
 Author: Chetan Mehrotra
 Message: OAK-1586 - Implement checkpoint support in DocumentNodeStore

 Initial implementation which stores the checkpoint data as part of NODES 
 collection

 -- Using Clock for determining current time to simplify testing
 -- Custom Clock can be specified via Builder

 git-svn-id: https://svn.apache.org/repos/asf/jackrabbit/oak/trunk@1580769 
 13f79535-47bb-0310-9956-ffa450edef68

 View the changeset: 
 https://github.com/apache/jackrabbit-oak/compare/6b79fda2e9f2...afb6c5335b46

 View the full build log and details: 
 https://travis-ci.org/apache/jackrabbit-oak/builds/21404793

 --
 sent by Jukka's Travis notification gateway


Re: friendly reminder about license headers

2014-03-20 Thread Chetan Mehrotra
Roger that!

For Java files the IDE takes care of them. Probably we can just exclude
test/resources from the rat plugin? Most of the missing headers are
probably reported in there.
Chetan Mehrotra


On Thu, Mar 20, 2014 at 2:50 PM, Alex Parvulescu
alex.parvule...@gmail.com wrote:
 Yes boys and girls, files need licence headers!

 Please check new files before committing them, last 2 days I found 3
 occurrences, probably more than the entire last month put together.

 When in doubt, run your builds with the pedantic profile activated.
 (mvn clean install -PintegrationTesting,pedantic)


 your friendly release manager


Re: svn commit: r1578423 - in /jackrabbit/oak/trunk/oak-core: ./ src/main/java/org/apache/jackrabbit/oak/plugins/document/rdb/

2014-03-18 Thread Chetan Mehrotra
It would be better if we made the DataSource part of the Builder itself
and removed all such logic from RDB. Or better, for testing purposes or
even in the normal case, the RDBDocumentStore could be instantiated
outside and passed into the Builder. Then the main code would not have to
worry about creating a DataSource and handling all the possible config
options that come with that.


On Mon, Mar 17, 2014 at 8:38 PM,  resc...@apache.org wrote:
 Author: reschke
 Date: Mon Mar 17 15:08:55 2014
 New Revision: 1578423

 URL: http://svn.apache.org/r1578423
 Log:
 OAK-1533 - add dbcp BasicDataSource (WIP)

 Added:
 
 jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/rdb/RDBDataSourceFactory.java
(with props)
 Modified:
 jackrabbit/oak/trunk/oak-core/pom.xml
 
 jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/rdb/RDBBlobStore.java
 
 jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/rdb/RDBDocumentStore.java

 Modified: jackrabbit/oak/trunk/oak-core/pom.xml
 URL: 
 http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-core/pom.xml?rev=1578423r1=1578422r2=1578423view=diff
 ==
 --- jackrabbit/oak/trunk/oak-core/pom.xml (original)
 +++ jackrabbit/oak/trunk/oak-core/pom.xml Mon Mar 17 15:08:55 2014
 @@ -276,6 +276,11 @@
      <version>1.3.158</version>
      <optional>true</optional>
    </dependency>
 +  <dependency>
 +    <groupId>commons-dbcp</groupId>
 +    <artifactId>commons-dbcp</artifactId>
 +    <version>1.4</version>
 +  </dependency>

    <!-- Logging -->
    <dependency>

 Modified: 
 jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/rdb/RDBBlobStore.java
 URL: 
 http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/rdb/RDBBlobStore.java?rev=1578423r1=1578422r2=1578423view=diff
 ==
 --- 
 jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/rdb/RDBBlobStore.java
  (original)
 +++ 
 jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/rdb/RDBBlobStore.java
  Mon Mar 17 15:08:55 2014
 @@ -19,7 +19,6 @@ package org.apache.jackrabbit.oak.plugin
  import java.io.Closeable;
  import java.io.IOException;
  import java.sql.Connection;
 -import java.sql.DriverManager;
  import java.sql.PreparedStatement;
  import java.sql.ResultSet;
  import java.sql.SQLException;
 @@ -29,14 +28,14 @@ import java.util.Iterator;

  import javax.sql.DataSource;

 -import com.google.common.collect.AbstractIterator;
 -
  import org.apache.jackrabbit.mk.api.MicroKernelException;
 -import org.apache.jackrabbit.oak.plugins.blob.CachingBlobStore;
  import org.apache.jackrabbit.oak.commons.StringUtils;
 +import org.apache.jackrabbit.oak.plugins.blob.CachingBlobStore;
  import org.slf4j.Logger;
  import org.slf4j.LoggerFactory;

 +import com.google.common.collect.AbstractIterator;
 +
  public class RDBBlobStore extends CachingBlobStore implements Closeable {
  /**
   * Creates a {@linkplain RDBBlobStore} instance using an embedded H2
 @@ -45,8 +44,8 @@ public class RDBBlobStore extends Cachin
  public RDBBlobStore() {
  try {
  String jdbcurl = jdbc:h2:mem:oaknodes;
 -Connection connection = DriverManager.getConnection(jdbcurl, 
 sa, );
 -initialize(connection);
 +DataSource ds = RDBDataSourceFactory.forJdbcUrl(jdbcurl, sa, 
 );
 +initialize(ds.getConnection());
  } catch (Exception ex) {
  throw new MicroKernelException(initializing RDB blob store, 
 ex);
  }
 @@ -58,8 +57,8 @@ public class RDBBlobStore extends Cachin
   */
  public RDBBlobStore(String jdbcurl, String username, String password) {
  try {
 -Connection connection = DriverManager.getConnection(jdbcurl, 
 username, password);
 -initialize(connection);
 +DataSource ds = RDBDataSourceFactory.forJdbcUrl(jdbcurl, 
 username, password);
 +initialize(ds.getConnection());
  } catch (Exception ex) {
  throw new MicroKernelException(initializing RDB blob store, 
 ex);
  }

 Added: 
 jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/rdb/RDBDataSourceFactory.java
 URL: 
 http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/rdb/RDBDataSourceFactory.java?rev=1578423view=auto
 ==
 --- 
 jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/rdb/RDBDataSourceFactory.java
  (added)
 +++ 
 jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/rdb

Re: failing oak build

2014-03-18 Thread Chetan Mehrotra
Looking into it.
Chetan Mehrotra


On Tue, Mar 18, 2014 at 2:10 PM, Alex Parvulescu
alex.parvule...@gmail.com wrote:
 Hi,

 The Oak build is failing

 Tests in error:
 testEmptyIdentifier(org.apache.jackrabbit.oak.plugins.document.blob.ds.MongoDataStoreBlobStoreTest):
 String index out of range: 2

 The 0.19 release is really close, it is really important to get the build
 stable again.


 best,
 alex


Re: svn commit: r1578943 - in /jackrabbit/oak/trunk/oak-core/src: main/java/org/apache/jackrabbit/oak/ main/java/org/apache/jackrabbit/oak/plugins/backup/ main/java/org/apache/jackrabbit/oak/plugins/b

2014-03-18 Thread Chetan Mehrotra
I have updated the ~/.subversion/config file with those settings, but I
use git-svn for my local development. Do I need to tweak any git-specific
setting for EOL handling? Currently autocrlf = input in my
~/.gitconfig
Chetan Mehrotra


On Tue, Mar 18, 2014 at 8:38 PM, Jukka Zitting jukka.zitt...@gmail.com wrote:
 Hi,

 On Tue, Mar 18, 2014 at 11:01 AM,  resc...@apache.org wrote:
 
 jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/RepositoryManager.java
(props changed)
 
 jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/backup/FileStoreBackupRestore.java
(props changed)
 
 jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/backup/FileStoreBackupRestoreMBean.java
(props changed)
 
 jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/blob/BlobGC.java
(props changed)
 
 jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/blob/BlobGCMBean.java
(props changed)
 
 jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/spi/state/RevisionGC.java
(props changed)
 
 jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/spi/state/RevisionGCMBean.java
(props changed)
 
 jackrabbit/oak/trunk/oak-core/src/test/java/org/apache/jackrabbit/oak/plugins/index/property/OrderDirectionEnumTest.java
(props changed)
 
 jackrabbit/oak/trunk/oak-core/src/test/java/org/apache/jackrabbit/oak/plugins/index/property/OrderedIndexCostTest.java
(props changed)
 
 jackrabbit/oak/trunk/oak-core/src/test/java/org/apache/jackrabbit/oak/plugins/index/property/ValuePathTupleTest.java
(props changed)
 
 jackrabbit/oak/trunk/oak-core/src/test/java/org/apache/jackrabbit/oak/spi/state/AbstractRebaseDiffTest.java
(props changed)

 People who added these files, please check your eol-style settings:
 http://www.apache.org/dev/svn-eol-style.txt

 BR,

 Jukka Zitting


Using DBCursor in MongoDocumentStore

2014-03-18 Thread Chetan Mehrotra
Hi,

I was looking into the support for Version GC (OAK-1341). For that I
had a look at BlobReferenceIterator#loadBatchQuery which currently
does a pagination over all Documents which have binaries. The DBCursor
returned by Mongo [1] implements the Iterator contract and would
lazily fetch the next batch.

So I had two queries

1. Should we expose the Iterator as part of the DocumentStore API and rely
on Mongo to paginate? The ResultSet in the DB case could probably also
easily support the Iterator pattern

2. Cache handling - Currently fetching documents in such a pattern for GC
would also overflow the cache. So should we pass an extra
flag to not cache docs from such calls?

Chetan Mehrotra
[1] http://api.mongodb.org/java/2.0/com/mongodb/DBCursor.html
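
For reference, a rough sketch of what option 1 could look like when relying on the
2.x driver's lazy batching (the "nodes" collection and "_bin" flag field names are
assumptions used only for illustration):

import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;
import com.mongodb.DBObject;
import com.mongodb.QueryBuilder;

// DBCursor implements Iterator<DBObject> and pulls the next batch from the
// server lazily, so no explicit pagination logic is needed on our side.
DBCollection nodes = db.getCollection("nodes");           // db obtained elsewhere
DBObject query = QueryBuilder.start("_bin").is(1).get();  // binary flag field assumed
DBCursor cursor = nodes.find(query).batchSize(100);
try {
    while (cursor.hasNext()) {
        DBObject doc = cursor.next();
        String id = (String) doc.get("_id");
        // hand the document to the GC logic without putting it in the doc cache
    }
} finally {
    cursor.close();
}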


Build currently failing in SegmentMicroKernelFixture

2014-03-14 Thread Chetan Mehrotra
Hi Team,

I was trying to run the IT tests locally before pushing in my changes
related to the DataStore. However it seems that the testcases are currently
failing even without my change.

- Apache CI - http://ci.apache.org/builders/oak-trunk/ - There has
been no build post rev 1576949 (current 1577399)
- Travis CI - https://travis-ci.org/apache/jackrabbit-oak/builds
Failing because the updated JR 2.8 SNAPSHOTs are not present, causing a
compilation failure

On running the test locally following tests fails for me

Tests run: 116, Failures: 1, Errors: 0, Skipped: 0, Time elapsed:
10.488 sec  FAILURE!
testLargeBlob[1](org.apache.jackrabbit.mk.test.MicroKernelIT)  Time
elapsed: 2.589 sec   FAILURE!
java.lang.AssertionError: data does not match expected:<24> but was:<48>
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.failNotEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:128)
at org.junit.Assert.assertEquals(Assert.java:472)
at 
org.apache.jackrabbit.mk.test.MicroKernelIT.testBlobs(MicroKernelIT.java:1382)

On debugging it seems to be a failure in SegmentMicroKernelFixture and
comes only when SegmentNodeStore is used with MemoryStore

Chetan Mehrotra


Re: Using other JVM supported language like Groovy for testcase

2014-03-14 Thread Chetan Mehrotra
On Fri, Mar 14, 2014 at 1:36 PM, Davide Giannella
giannella.dav...@gmail.com wrote:
 Personally I don't mind but it would complicate the life of someone who
 doesn't know groovy and has to maintain the code later on.

I understand the concern here. However Groovy is much easier for a Java
developer to understand compared to other languages like
Scala. Also for now I am proposing to use it in a small module like
oak-pojosr as an experiment.

After some time we can weigh the benefits and then decide if wider
usage should be encouraged or not.

Chetan Mehrotra


Review of patches related to BlobStore related work

2014-03-13 Thread Chetan Mehrotra
Hi,

As part of the work related to enabling usage of custom BlobStore
implementations I have broken the changes down into the following patches

* OAK-1502 - Make DataStores available to NodeStores [1]
* OAK-805 - Support for existing Jackrabbit 2.x DataStores [2]
* OAK-1333 - SegmentMK: Support for Blobs in external storage [3]

To see full changes together (to get a better picture) you can have a
look at Github fork [4] and the last 3 commits there.

As these changes need to be part of the next release (by 13 March), it
would be helpful if they can be reviewed soon!

Chetan Mehrotra
[1] https://issues.apache.org/jira/browse/OAK-1502
[2] https://issues.apache.org/jira/browse/OAK-805
[3] https://issues.apache.org/jira/browse/OAK-1333
[4] https://github.com/chetanmeh/jackrabbit-oak/commits/OAK-1502


Using other JVM supported language like Groovy for testcase

2014-03-13 Thread Chetan Mehrotra
Just testing the waters here :)

I need to write a couple of integration testcases related to checking
OSGi configuration support (OAK-1502) in the oak-pojosr module. It would
be much faster and more convenient for me if I could write these
testcases in Groovy.

So would it be fine if I can use Groovy language for writing testcase
only in oak-pojosr module?

Chetan Mehrotra


Re: [DISCUSS] - OSGi deployment for Oak Lucene / Solr indexers

2014-03-12 Thread Chetan Mehrotra
Couple of things to try

* Specify the packages versions via package-info
* Inline the classes instead of embedding the jars

This would enable maven-bundle-plugin to see required
package-info.java file for versions and also the SCR generated files.

Also, can you share your project, say on GitHub? It would be easier for me
to try some options.
Chetan Mehrotra


On Wed, Mar 12, 2014 at 3:55 PM, Tommaso Teofili
tommaso.teof...@gmail.com wrote:
 update on this:
 I've tried the oak-fulltext approach and I found two issues:
 1. exported packages with semantic versioning from oak-lucene and oak-solr
 get lost when packing everything together unless they're explicitly
 specified (by hand) in the oak-fulltext maven-bundle-plugin configuration,
 it can be done but can be tedious (and it's error prone)
 2. OSGi services exported by oak-lucene and oak-solr don't get exported by
 oak-fulltext as maven-scr-plugin can look into src/main/java or classes but
 don't know if / how it could work with embedded jars.

 Any suggestions?
 Regards,
 Tommaso



 2014-03-11 9:00 GMT+01:00 Tommaso Teofili tommaso.teof...@gmail.com:

 if there're no other objections / comments I'll go with the last suggested
 approach of having oak-lucene and oak-solr not embedding anything and
 having the oak-fulltext bundle embedding everything needed to make Lucene
 and Solr indexers working in OSGi (lucene-*, oak-lucene, solr-*,
 oak-solr-*, etc.) until we (eventually) get to proper semantic versioning
 in Lucene / Solr.

 As a side effect I don't think it would make sense to keep
 oak-solr-embedded and oak-solr-remote as separate artifacts so I'd merge
 them with oak-solr-core in one oak-solr bundle.

 Regards,
 Tommaso


 2014-03-10 18:18 GMT+01:00 Tommaso Teofili tommaso.teof...@gmail.com:

 ah ok, thanks for clarifying.
 Regards,
 Tommaso


 2014-03-10 18:10 GMT+01:00 Jukka Zitting jukka.zitt...@gmail.com:

 Hi,

 On Mon, Mar 10, 2014 at 1:01 PM, Tommaso Teofili
 tommaso.teof...@gmail.com wrote:
  ok, so (in OSGi env) we would have oak-solr and oak-fulltext as
 fragments
  of oak-lucene (being the fragment host)

 No, that's not what I meant. The proposed oak-fulltext bundle would
 contain all of oak-lucene, oak-solr, and the Lucene/Solr dependencies.
 No need for fragment bundles in this case.

 BR,

 Jukka Zitting






Re: buildbot failure in ASF Buildbot on oak-trunk

2014-03-11 Thread Chetan Mehrotra
There were a couple of different commits and the issues highlighted in the
build failure have been addressed in a later changelist. The testcases passed on
my local setup. I will wait for further reports.
Chetan Mehrotra


On Tue, Mar 11, 2014 at 1:17 PM,  build...@apache.org wrote:
 The Buildbot has detected a new failure on builder oak-trunk while building 
 ASF Buildbot.
 Full details are available at:
  http://ci.apache.org/builders/oak-trunk/builds/4628

 Buildbot URL: http://ci.apache.org/

 Buildslave for this Build: osiris_ubuntu

 Build Reason: scheduler
 Build Source Stamp: [branch jackrabbit/oak/trunk] 1576207
 Blamelist: chetanm

 BUILD FAILED: failed compile

 sincerely,
  -The Buildbot





Re: svn commit: r1576236 - /jackrabbit/oak/trunk/oak-pojosr/pom.xml

2014-03-11 Thread Chetan Mehrotra
Thanks Julian! Missed that part while porting the project from github
Chetan Mehrotra


On Tue, Mar 11, 2014 at 2:32 PM,  resc...@apache.org wrote:
 Author: reschke
 Date: Tue Mar 11 09:02:43 2014
 New Revision: 1576236

 URL: http://svn.apache.org/r1576236
 Log:
 OAK-1522 - fix POM

 Modified:
 jackrabbit/oak/trunk/oak-pojosr/pom.xml

 Modified: jackrabbit/oak/trunk/oak-pojosr/pom.xml
 URL: 
 http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-pojosr/pom.xml?rev=1576236r1=1576235r2=1576236view=diff
 ==
 --- jackrabbit/oak/trunk/oak-pojosr/pom.xml (original)
 +++ jackrabbit/oak/trunk/oak-pojosr/pom.xml Tue Mar 11 09:02:43 2014
 @@ -24,6 +24,7 @@
  groupIdorg.apache.jackrabbit/groupId
  artifactIdoak-parent/artifactId
  version0.19-SNAPSHOT/version
 +relativePath../oak-parent/pom.xml/relativePath
/parent

artifactIdoak-pojosr/artifactId




Re: [DISCUSS] - OSGi deployment for Oak Lucene / Solr indexers

2014-03-10 Thread Chetan Mehrotra
My vote would be to go for #3

move the OSGi services we need for Solr in Oak into
oak-solr-osgi (as a fragment cannot run OSGi components/services)

It need not be. The host bundle can allow DS components to be picked up from a
fragment bundle. So the required logic can live in the respective fragment
bundle.

However I would prefer if it is done in the following way

A - oak-lucene - Becomes the host bundle, assuming it is always
required, though some of its services might not be required in all
cases
B - oak-solr-* - Remain independent but become fragment bundles. However they
do not embed any jars
C - lucene-fragment - This fragment bundle embeds the Lucene related jars
D - solr-fragment - This fragment embeds the Solr related jars

All the fragments specify oak-lucene as the host bundle

The reason for not embedding the Lucene and Solr related dependencies in the
Oak bundles is to enable usage of the same bundle jars in a non OSGi env
without adding to their size. As embedded jars (which are not inlined) would
not be usable in a non OSGi env, a user would have to add such jars in
addition to the embedded ones, thus adding to size.

Chetan Mehrotra


On Mon, Mar 10, 2014 at 2:37 PM, Tommaso Teofili
tommaso.teof...@gmail.com wrote:
 Hi all,

 The main issue we currently have with our full text indexers is that Lucene
 and Solr are not OSGi ready, missing proper MANIFEST entries so that they
 cannot directly be installed in e.g. Apache Felix, don't have semantic
 version information, etc.

 Currently oak-lucene embeds the Lucene dependencies it needs, also
 exporting its Lucene packages so that they can be used by oak-solr-core (as
 they're needed by Solr) which exposes its embedded Solr packages so that
 oak-solr-[embedded|remote] can use them.

 While this should work, there are some concerns raised in OAK-1442 [2]:
 - one issue is that with the current approach we have a problem if a Lucene
 / Solr package changes in a backward incompatible way while we don't
 properly upgrade the semantic version of the Oak package(s) using that,
 which in the end would mean that we in Oak would have to track changes in
 Lucene / Solr and I think I can assume we don't want to
 - the other issue relates more to how we package such Lucene and Solr
 artifacts to use them, as the suggestion is to just wrap o.a.lucene-* and
 o.a.solr-* in two wrapping bundles which can be installed inside an OSGi
 container.

 in OAK-1475 [1] we discussed a bit some different approaches for the
 deployment of Oak Lucene and Solr indexers in an OSGi environment,
 currently we have the following options:

 1. package oak-lucene and oak-solr-* in a single bundle (e.g. called
 oak-fulltext), with their Lucene and Solr dependencies embedded, the Lucene
 indexer OSGi services would be already available, the Solr ones would need
 to be configured in order to start the Solr indexer.
 2. package and export Lucene stuff in a oak-lucene-osgi bundle, package and
 export Solr stuff in oak-solr-osgi bundle, avoid oak-lucene and
 oak-solr-core to export Lucene and Solr packages.
 3. merge oak-solr-* in a single oak-solr bundle which embeds the Solr
 dependencies (but doesn't export them) to be a fragment of oak-lucene (so
 that they share the classloader and therefore oak-solr can use Lucene stuff
 in oak-lucene), move the OSGi services we need for Solr in Oak into
 oak-solr-osgi (as a fragment cannot run OSGi components/services)

 Not yet discussed options:

 4. remove the exported contents from oak-lucene and oak-solr, merge
 oak-solr-* together and duplicate the Lucene dependencies to be embedded in
 oak-solr.

 Options 1 and 4 are the simplest ones, the only disadvantage is packaging
 is heavy (in 4 we have duplicated embedded dependencies for Lucene in
 oak-lucene and oak-solr)
 Option 2 is the most OSGi oriented, even if has the unlikely, but yet
 possible, issue with semantic versioning.
 Option 3 is the smartest and trickiest one, no duplication of dependencies,
 full OSGi, but a bit more complicated as it uses OSGi fragments.

 My preferences go to 3 and 4.

 As this should be fixed for 0.19 please share your comments,
 Regards,
 Tommaso

 [1] : https://issues.apache.org/jira/browse/OAK-1475
 [2] :
 https://issues.apache.org/jira/browse/OAK-1442?focusedCommentId=13908472page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13908472


Re: Dependency on json-simple in oak-core

2014-03-10 Thread Chetan Mehrotra
Thanks Julian. Created OAK-1521 to track this
Chetan Mehrotra


On Mon, Mar 10, 2014 at 1:07 PM, Julian Reschke julian.resc...@gmx.de wrote:
 On 2014-03-10 05:06, Chetan Mehrotra wrote:

 Currently oak-core has a dependency on
 com.googlecode.json-simple:json-simple:1.1.1 jar. Looking at compile
 time usage I see it being referred only from RDBDocumentStore

 Would it be possible to use classes from
 'org.apache.jackrabbit.oak.commons.json' package and remove the
 required dependency on this library

 Chetan Mehrotra


 Yes, I can have a look.




Re: Using Oak with PojoSR - Make use of OSGi features in POJO env

2014-03-10 Thread Chetan Mehrotra
Created https://issues.apache.org/jira/browse/OAK-1522 to track this.
Kindly review the patch
Chetan Mehrotra


On Wed, Mar 5, 2014 at 1:24 AM, Jukka Zitting jukka.zitt...@gmail.com wrote:
 Hi,

 On Tue, Mar 4, 2014 at 3:11 AM, Chetan Mehrotra
 chetan.mehro...@gmail.com wrote:
 1. Configure components - Currently various sub modules are using
 different ways to configure. Some is being done via system properties,
 Security module has its own way of configuring, DataStore support has
 its own

 That's why I brought up the new DataStore config mechanism as somewhat
 troublesome. Security configuration has the same problem.

 System properties are currently used in places where a quick and
 simple way is needed to tweak options during testing. If a particular
 setting needs to be available in production, it should be made
 configurable through a setter or a constructor argument. Then it is
 easy to control the setting through OSGi service properties or by
 whichever framework or configuration mechanism is being used.

 2. Expose extension points - Exposing extension points in non OSGi env
 becomes tricky. We faced some issues around LoginModule extensiblity
 earlier

 We have plenty of extension points (CommitHook, QueryIndexProvider,
 etc.) that work just fine. The problem with LoginModules is caused by
 the tricky way JAAS works, not by any inherent flaw in our extension
 mechanism.

 PojoSR [1] provides support for using OSGi constructs outside of OSGi 
 framework

 I have implemented a POC [2] which makes use of PojoSR to configure
 Oak in pojo env. So far basic stuff seems to work as expected.

 Would be worthwhile to investigate further here and possibly provide a
 PojoSR based RespositoryFactory?

 Looks nice! Having alternative ways of configuring and running Oak is
 useful, as it helps us spot problems like the one you brought up about
 the security configuration.

 BR,

 Jukka Zitting


Re: Dependency on json-simple in oak-core

2014-03-10 Thread Chetan Mehrotra
 and org.apache.jackrabbit.oak.commons.json does not only seem to introduce a 
 JSOP dependency (aren't we getting rid of that?), but also the JSON support 
 doesn't appear to support data types... (JsonObject.getProperties returns a 
 String/String map).

In that case can we use the JSON library from json.org, as that is already widely
used in projects like Sling and the Felix WebConsole? So we would not
be adding a new dependency there.
Chetan Mehrotra


On Mon, Mar 10, 2014 at 6:23 PM, Julian Reschke julian.resc...@gmx.de wrote:
 On 2014-03-10 10:59, Chetan Mehrotra wrote:

 Thanks Julian. Created OAK-1521 to track this


 I have looked at this, and org.apache.jackrabbit.oak.commons.json does not
 only seem to introduce a JSOP dependency (aren't we getting rid of that?),
 but also the JSON support doesn't appear to support data types...
 (JsonObject.getProperties returns a String/String map).

 Best regards, Julian



Queries related to various BlobStore implementations

2014-03-10 Thread Chetan Mehrotra
Currently we have following implementations of BlobStore

1. org.apache.jackrabbit.oak.plugins.blob.db.DbBlobStore
2. org.apache.jackrabbit.oak.plugins.blob.cloud.CloudBlobStore
3. org.apache.jackrabbit.oak.spi.blob.FileBlobStore
4. org.apache.jackrabbit.oak.spi.blob.MemoryBlobStore
5. org.apache.jackrabbit.oak.plugins.document.mongo.MongoBlobStore
6. org.apache.jackrabbit.oak.plugins.document.mongo.gridfs.MongoGridFSBlobStore
7. org.apache.jackrabbit.oak.plugins.blob.datastore.DataStoreBlobStore
8. org.apache.jackrabbit.oak.plugins.document.rdb.RDBBlobStore

And then for DataStore we have

a) org.apache.jackrabbit.core.data.db.DbDataStore
b) org.apache.jackrabbit.aws.ext.ds.S3DataStore
c) org.apache.jackrabbit.core.data.FileDataStore

Now based on above list couple of Queries

Q1 - What is the difference between RDBBlobStore and DbBlobStore?
Should we have only one implementation for storage in DataBase

Q2 - For a system which is being upgraded from JR2 to Oak, would it
2.1 Continue to use its existing DataStore implementation
2.2 Migrate all the content first and then switch to one of the BlobStore
implementations
2.3 Use both the DataStore and a BlobStore together

Q3. Just for the record and to confirm we would be preferring
S3DataStore over CloudBlobStore for now

Q4. Should we remove some of the unused BlobStore implementations like
MongoGridFSBlobStore? They can be resurrected later if the need is felt for
them.

Chetan Mehrotra


Re: Dependency on json-simple in oak-core

2014-03-10 Thread Chetan Mehrotra
Looking at its widespread usage I think it is correct and maintained. So
+1 for json.org
Chetan Mehrotra


On Mon, Mar 10, 2014 at 6:47 PM, Julian Reschke julian.resc...@gmx.de wrote:
 On 2014-03-10 13:58, Chetan Mehrotra wrote:

 and org.apache.jackrabbit.oak.commons.json does not only seem to
 introduce a JSOP dependency (aren't we getting rid of that?), but also the
 JSON support doesn't appear to support data types...
 (JsonObject.getProperties returns a String/String map).


 In that case can we use the JSON library from json.org as thats widely
 used in projects like Sling Felix WebConsole already. So we would not
 be adding a new dependency there
 Chetan Mehrotra


 FWIW, we the oak-blob POM also references json-simple, but it doesn't appear
 to be used.

 I really don't care what JSON lib we use, but it needs to be correct,
 maintained, and fast. If json.org fulfills these requirements then I'm more
 than happy to switch.

 Best regards, Julian



Using Oak Run with complex setup and configurations

2014-03-10 Thread Chetan Mehrotra
Currently oak-run (and to some extent oak-upgrade) both instantiate the
Repository (or NodeStore) instance outside of an OSGi env. So far some
level of customization is supported.

I am looking for some guidance on how to support a bit more complex
configuration, like using different DataStores and providing config to
those DataStores.

Current way of configuring custom BlobStore relies on system
properties to achieve the same. As part of OAK-1502 I need to refactor
the way the BlobStores are configured. I would like to configure them
via OSGi configuration and use SCR annotation for the same

With such a change supporting oak-run and oak-upgrade becomes tricky.

One option is to expose all such config options as command line
options but then that would require duplicate work for handling
configuration.

Another way is to switch to proposed PojoSR based setup [2]. So one
would provide all the config in a single json file for example

oak-config.json
{
  properties: {
oak.documentMK.revisionAge: 7
  },
  configs: {
org.apache.jackrabbit.oak.plugins.document.DocumentNodeStoreService: {
  db: test,
  mongouri: mongodb://localhost:27017
},
org.apache.jackrabbit.oak.security.user.UserConfigurationImpl: {
  usersPath: /home/users,
  groupsPath: /home/groups
}
  }
}

And on command line we pass on that file as one of the arguments

java -jar oak-run-*.jar --config oak-config.json

Chetan Mehrotra
[1] 
https://github.com/apache/jackrabbit-oak/blob/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/blob/BlobStoreHelper.java#L52-52
[2] http://markmail.org/thread/26g2fqbda3uyahmm


Dependency on json-simple in oak-core

2014-03-09 Thread Chetan Mehrotra
Currently oak-core has a dependency on
com.googlecode.json-simple:json-simple:1.1.1 jar. Looking at compile
time usage I see it being referred only from RDBDocumentStore

Would it be possible to use classes from
'org.apache.jackrabbit.oak.commons.json' package and remove the
required dependency on this library

Chetan Mehrotra


Proposal for configuring Oak using PojoSR and not miss good old repository.xml for configuration!

2014-03-06 Thread Chetan Mehrotra
Hi,

Till JR2 the repository was configured via repository.xml which served
two purposes

1. Support for DI
2. Configuring the service properties

From the end user perspective this proved useful from a usability
point of view.

1. One can see what all components are involved
2. Easy to specify all config properties at one place. The xml can be
documented also
3. JR2 used to warn if property names are not correct. So good support
for handling misconfiguration

From the developer perspective the repository.xml constrained the
ways in which the repository components could be configured.

With Oak we have done away with the repository.xml approach. The
repository components can now be assembled programatically. In OSGi
repository configures itself through std OSGi constructs.

For the non OSGi case we have the Jcr class which provides a ready
repository with suitable defaults. However configuring properties
still involves quite a bit of work and so far has to be done in code.

IMHO having configuration in a file (xml/yaml etc.) is pretty useful to an
end user and simplifies administration.

Proposal
-

To enable ease of both configuration and modularity we already support
OSGi. With PojoSR (as explained in the other mail [1]) we can support both

1. Have a PojoSR based RepositoryFactory to provide access to the
Repository instance
2. Have a custom xml way to capture the OSGi configuration. Instead of
config scattered across multiple property files we can consolidate all
such config in a single file. For some examples see [2] [3]

As implemented with [4] when repository starts with default Segment
store configured it creates following directory layout

repo-home
  |__ /bundles
  |__/content
  |__/config

The config folder can be replaced with a single xml. The bundles folder is
created by PojoSR as some bundles like ConfigAdmin use the bundle data
folder to store some state.

Benefits
1. The repository can easily be extended via std OSGi constructs if users
use DS and put their bundles on the flat classpath
2. For a simple setup users can directly register their services
programmatically with the PojoServiceRegistry
3. Configuration is managed in a central place and can easily be edited
4. For more adventurous usage users can also enable the Felix
WebConsole when deploying an Oak based application in Tomcat, or even
use the Felix Jetty bundle! Most of the webconsole would work

Thoughts?

Chetan Mehrotra

PS: PojoSR is now being moved to Apache Felix (FELIX-4445)

[1] http://markmail.org/thread/2vnktbuq2jd2ovs5
[2] 
https://docs.jboss.org/author/display/AS7/OSGi+subsystem+configuration#OSGisubsystemconfiguration-cas
[3] 
https://access.redhat.com/site/documentation/en-US/JBoss_Fuse/6.0/html/Deploying_into_the_Container/files/DeployCamel-OsgiConfigProps.html
[4] https://github.com/chetanmeh/oak-pojosr


Using Oak with PojoSR - Make use of OSGi features in POJO env

2014-03-04 Thread Chetan Mehrotra
Hi,

Earlier we had some discussion around different ways to configure Oak.
Oak currently supports running and configuring itself in an OSGi env. It
can easily be used in a non OSGi env as well, by programmatically configuring
and assembling the required components.

However this poses problems around the best way to

1. Configure components - Currently various sub modules are using
different ways to configure. Some is being done via system properties,
Security module has its own way of configuring, DataStore support has
its own

2. Expose extension points - Exposing extension points in non OSGi env
becomes tricky. We faced some issues around LoginModule extensiblity
earlier

PojoSR [1] provides support for using OSGi constructs outside of an OSGi framework

I have implemented a POC [2] which makes use of PojoSR to configure
Oak in pojo env. So far basic stuff seems to work as expected.

Would be worthwhile to investigate further here and possibly provide a
PojoSR based RespositoryFactory?

Chetan Mehrotra
[1] https://code.google.com/p/pojosr/
[2] https://github.com/chetanmeh/oak-pojosr
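
For reference, the basic bootstrap looks roughly like this (a sketch assuming the
PojoSR launch API; error handling omitted):

import java.util.HashMap;
import java.util.Map;
import java.util.ServiceLoader;

import de.kalpatec.pojosr.framework.launch.ClasspathScanner;
import de.kalpatec.pojosr.framework.launch.PojoServiceRegistry;
import de.kalpatec.pojosr.framework.launch.PojoServiceRegistryFactory;

// Scan the flat classpath for "bundles" and start a PojoSR registry over them.
Map<String, Object> config = new HashMap<String, Object>();
config.put(PojoServiceRegistryFactory.BUNDLE_DESCRIPTORS,
        new ClasspathScanner().scanForBundles());

PojoServiceRegistry registry = ServiceLoader
        .load(PojoServiceRegistryFactory.class)
        .iterator().next()
        .newPojoServiceRegistry(config);

// The registry behaves like a BundleContext: services can be registered and
// looked up, so DS/ConfigAdmin based wiring works in a plain JVM as well.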


Re: buildbot failure in ASF Buildbot on oak-trunk-win7

2014-03-02 Thread Chetan Mehrotra
Build failed due to

---
Failed tests:
concurrentObservers(org.apache.jackrabbit.oak.spi.commit.BackgroundObserverTest)

Tests run: 1579, Failures: 1, Errors: 0, Skipped: 70
---

This should not be related to my changes and looks like an intermittent issue.


Chetan Mehrotra


On Mon, Mar 3, 2014 at 11:33 AM,  build...@apache.org wrote:
 The Buildbot has detected a new failure on builder oak-trunk-win7 while 
 building ASF Buildbot.
 Full details are available at:
  http://ci.apache.org/builders/oak-trunk-win7/builds/4725

 Buildbot URL: http://ci.apache.org/

 Buildslave for this Build: bb-win7

 Build Reason: scheduler
 Build Source Stamp: [branch jackrabbit/oak/trunk] 1573450
 Blamelist: chetanm

 BUILD FAILED: failed compile

 sincerely,
  -The Buildbot





Re: Efficient import of binary data into Oak

2014-02-18 Thread Chetan Mehrotra
On Tue, Feb 18, 2014 at 2:32 PM, Jukka Zitting jukka.zitt...@gmail.com wrote:
 Good point. That use case would probably be best handled with a
 specific InputStream subclass like suggested by Felix for files.

That mode is fine for a case like FileInputStream but not for a case like
S3 where the underlying data is just a URL and the DataStore needs to make use
of that. Instead what I was referring to is the ability to pass custom
Binary implementations

- FileBinary implements javax.jcr.Binary
  - File object
  - boolean flag indicating ownership of the file. If true it
    indicates that the client is transferring the ownership of the file to
    Oak. In such a case the Oak DataStore can simply move the file
    instance to its storage area

- S3Binary implements javax.jcr.Binary
  - Signed S3 URL which can be used by the S3DataStore to perform a Copy operation

Such a Binary is then passed through the JCR API and the DataStore makes use of
it, choosing an efficient path depending on the type of Binary
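
A rough sketch of what such a FileBinary could look like (this class does not
exist in Oak; it only illustrates the proposal, and an S3Binary would similarly
just wrap the signed URL instead of a File):

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;

import javax.jcr.Binary;
import javax.jcr.RepositoryException;

public class FileBinary implements Binary {

    private final File file;
    private final boolean ownershipTransferred; // if true the DataStore may move the file

    public FileBinary(File file, boolean ownershipTransferred) {
        this.file = file;
        this.ownershipTransferred = ownershipTransferred;
    }

    public File getFile() {
        return file;
    }

    public boolean isOwnershipTransferred() {
        return ownershipTransferred;
    }

    @Override
    public InputStream getStream() throws RepositoryException {
        try {
            // fallback for stores which cannot simply move the file
            return new FileInputStream(file);
        } catch (IOException e) {
            throw new RepositoryException(e);
        }
    }

    @Override
    public int read(byte[] b, long position) throws IOException, RepositoryException {
        RandomAccessFile raf = new RandomAccessFile(file, "r");
        try {
            raf.seek(position);
            return raf.read(b);
        } finally {
            raf.close();
        }
    }

    @Override
    public long getSize() throws RepositoryException {
        return file.length();
    }

    @Override
    public void dispose() {
        // nothing to clean up
    }
}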

A couple of points to note
- Such an API would introduce a strong coupling with the underlying DataStore
- But such an API would be targeted at a very specific usecase
where the code needs to import a large amount of binary data and is
written with the underlying DataStore in mind, so that
it can use the most optimized path

Chetan Mehrotra


Re: Efficient import of binary data into Oak

2014-02-18 Thread Chetan Mehrotra
On Tue, Feb 18, 2014 at 3:48 PM, Jukka Zitting jukka.zitt...@gmail.com wrote:
 Something like S3InputStream.getURL() should work just fine for that use case:

That would also work, with the caveat that the in-between layers do not
decorate the InputStream in any form.




Chetan Mehrotra


Efficient import of binary data into Oak

2014-02-17 Thread Chetan Mehrotra
Hi,

Currently in a Sling based application where a user uploads a file to
the JCR following sequence of steps are executed

1. User uploads file via HTTP request mostly using Multi-Part form
data based upload

2. Sling uses Commons File Upload to parse the multi-part request
which uses a DiskFileItemFactory and writes the binary content to a
temporary file (for file size > 256 KB) [1]

3. Later the servlet would access the JCR Session and create a Binary
value by extracting the InputStream

4. The file content would then be spooled into the BlobStore

Effect of different blobstore


Now depending on the type of BlobStore one of the following code flow
would happen

A - JR2 DataStores - The inputstream would be copied to file
B - S3DataStore - The AWS SDK would be creating a temporary file and
then that file content would be streamed back to the S3
C - Segment - Content from InputStream would be stored as part of
various segments
D - MongoBlobStore - Content from InputStream would be pushed to
remote mongo via multiple remote calls

Things to note in above sequence

1. Uploaded content is copied twice.
2. The whole content is spooled via InputStream through JVM Heap

Possible areas of Improvement


1. If the BlobStore is finally using some File (on same hard disk not
NFS) then it might be better to *move* the file which was created in
upload. This would help local FileDataStore and S3DataStore

2. Avoid spooling via InputStream if possible. Spooling via IS is slow
[3]. Though in most cases we use efficient buffered copy which is
marginally slower than NIO based variants. However avoiding moving
byte[] might reduce pressure on GC (probably!)

Changes required


If we can have a way to create JCR Binary implementations which
enables DataStore/BlobStore to efficiently transfer content then that
would help.

For example for File based DS the Binary created can keep a reference
to the source File object and that Binary is used in JCR API.
Eventually the FileDataStore can treat it in a different way and move
the file.

Another example is S3DataStore - In some cases the file has already
been transferred to S3 using other options. And the user wants to
transfer the S3 file from its bucket to our bucket. So a Binary
implementation which can just wrap the S3 url would enable the
S3DataStore to transfer the content without streaming all content
again [4]

Any thoughts on the best way to enable users of Oak to create Binaries
via other means (compared to current mode which only enables via
InputStream) and enable the DataStores to make use of such binaries?

Chetan Mehrotra

[1]https://github.com/apache/sling/blob/trunk/bundles/engine/src/main/java/org/apache/sling/engine/impl/parameters/ParameterSupport.java#L190
[2] 
http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/model/PutObjectRequest.html
[3] http://www.baptiste-wicht.com/2010/08/file-copy-in-java-benchmark/3/
[4] 
http://stackoverflow.com/questions/9664904/best-way-to-move-files-between-s3-buckets


Re: Use LoginModuleProxy (was: Make Whiteboard accessible through ContentRepository)

2014-02-14 Thread Chetan Mehrotra
On Thu, Feb 13, 2014 at 11:49 PM, Tobias Bocanegra tri...@apache.org wrote:
 this callback would rely on some sort of (non-osgi) list of
 pre-configured factories, right?
 eg some sort of LoginModuleFactoryRegistryImpl that is added to the
 authentication configuration?

Sort of. As Jukka mentioned, the code which configures Oak
programmatically in a non OSGi env can also provide a list of
LoginModuleFactory instances as part of the security setup. In an OSGi env we
get them via DS.

 this has one caveat, that you could register 1 factory per class (if
 you are using the factories class name as identifier). otherwise you
 would need to introduce some kind of factory pid.

Yup. For completeness and to cover all cases you can have some factory
property (similar to a service property in OSGi). That can then be passed
to the LoginModuleFactory as an argument so that it can instantiate the right LM
impl.

Chetan Mehrotra


Re: Make Whiteboard accessible through ContentRepository

2014-02-13 Thread Chetan Mehrotra
On Thu, Feb 13, 2014 at 12:45 PM, Tobias Bocanegra tri...@apache.org wrote:
 I don't quite follow. can you give an example of what would be in the
 jaas.conf and where you instantiate the ProxyLoginModule ?

A rough sketch would be ...

jaas.config


oakAuth {
org.apache.jackrabbit.oak.security.ProxyLoginModule REQUIRED

loginModuleFactoryClass=org.apache.jackrabbit.oak.security.LdapLoginModuleFactory
authIdentity={USERNAME}
useSSL=false
debug=true;
};


public class ProxyLoginModule implements LoginModule {
    private LoginModule delegate;

    public void initialize(Subject subject, CallbackHandler callbackHandler,
            Map<String, ?> sharedState, Map<String, ?> options) {
        LMFactoryProviderCallBack lmfcb = new LMFactoryProviderCallBack();
        callbackHandler.handle(new Callback[]{lmfcb});
        LoginModuleFactory factory = lmfcb.getLoginModuleFactoryProvider()
                .getFactory((String) options.get("loginModuleFactoryClass"));
        delegate = factory.createLoginModule();
        delegate.initialize(subject, callbackHandler, sharedState, options);
    }

    ...
    // Use delegate for other operations
}

The flow would involve following steps

1. The user mentions the ProxyLoginModule in the jaas entry and provides the
factory class name in the config. The JAAS logic would instantiate
the Proxy LM
2. Oak provides a callback using which the Proxy LM can obtain the factory
3. Upon init the proxy would initialize the delegate from the factory
4. The delegate is used for later calls
5. The LM, if required, can still use the config from jaas, or it is
configured via the factory itself

Note that here I preferred using the callback to let the LM access the outer
layer services instead of using a custom config.

The custom config mode works fine in the standalone case where the
application is the sole user of the JAAS system. Hence it works fine for a
Karaf/OSGi env. But it might not work properly in an app server env
where the app server itself uses JAAS. So to avoid interfering in embedded
mode the callback should be preferred.

Chetan Mehrotra


Re: Failure in compile trunk

2014-02-11 Thread Chetan Mehrotra
I checked the code and it looks like classes from Felix JAAS are not
being used, as after removing the entry oak-core compiled successfully.
The entry is now removed.

Let's see if the next build passes. I would also start the release of the Felix JAAS module.
Chetan Mehrotra


On Tue, Feb 11, 2014 at 2:18 PM, Michael Dürig mdue...@apache.org wrote:

 On 11.2.14 9:43 , Davide Giannella wrote:

 Good morning everyone,

 it seems that the commit r1566802 introduced a dependency to an artifact
 that is not available on maven repo.

 http://markmail.org/message/6gsaop3mfxccxkjn


 ERROR] Failed to execute goal on project oak-core: Could not resolve
 dependencies for project
 org.apache.jackrabbit:oak-core:bundle:0.17-SNAPSHOT: Could not find artifact
 org.apache.felix:org.apache.felix.jaas:jar:0.0.1-R1560269 in central
 (http://repo.maven.apache.org/maven2)

 Michael


Re: Failure in compile trunk

2014-02-11 Thread Chetan Mehrotra
For now I have implemented a workaround with [1]. This should get us
moving forward.

This would be fixed with a properly released version of the JAAS bundle by the end
of this week, once the release is approved in the Felix project.

Chetan Mehrotra
[1] http://svn.apache.org/r1567089


On Tue, Feb 11, 2014 at 5:21 PM, Julian Reschke julian.resc...@gmx.de wrote:
 On 2014-02-11 10:27, Amit Jain wrote:

 Its also referenced from oak-auth-external/pom.xml

 Thanks
 Amit
 ...


 Fails here as well. Can we revert that change for now?

 Best regards, Julian


Re: Make Whiteboard accessible through ContentRepository

2014-02-11 Thread Chetan Mehrotra
To address the same problem with Felix Jaas support one can make use
of LoginModuleFactory [1]. Its job is to create LoginModule instances.

One example of this is JdbcLoginModuleFactory [2]. It creates
JdbcLoginModule and passes on the DataSource to the LoginModule
instances. So instead of LoginModule _looking up_ the DataSource
service it is provided with the DataSource instance. The factory
itself was _provided_ with DataSource reference via DI (via
Declarative Services)

To implement a similar approach in non OSGi world following approach can be used

1. Have a ProxyLoginModule like [3]. The JAAS config would refer to
this class and would be able to create it
2. Have a custom LoginModuleFactory (LMF) which is referred to in
the JAAS config
3. One can register custom LMF implementations with Oak and they
would be passed on to the SecurityProvider
4. The ProxyLoginModule determines the type of LoginModuleFactory and
obtains it via a Callback
5. The LMF obtained is used to create the LoginModule instance

Now the same LoginModule (where most of the logic resides) can be shared
between the OSGi and non OSGi worlds. Further you can even share the
LoginModuleFactory (if it uses non OSGi stuff only).

For the OSGi case the LMF would be managed via DS and its dependencies
provided via DS.
For the non OSGi case the host application would wire up the LMF with its
dependencies (via setters) and then register it with Oak.

Chetan Mehrotra
[1] 
http://svn.apache.org/repos/asf/felix/trunk/jaas/src/main/java/org/apache/felix/jaas/LoginModuleFactory.java
[2] 
http://svn.apache.org/repos/asf/felix/trunk/examples/jaas/lm-jdbc/src/main/java/org/apache/felix/example/jaas/jdbc/JdbcLoginModuleFactory.java
[3] 
http://svn.apache.org/repos/asf/felix/trunk/jaas/src/main/java/org/apache/felix/jaas/boot/ProxyLoginModule.java
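
A small sketch of what such a factory could look like in the non OSGi case
(JdbcLoginModule and its constructor are assumed, following the example in [2]):

import javax.security.auth.spi.LoginModule;
import javax.sql.DataSource;

import org.apache.felix.jaas.LoginModuleFactory;

public class JdbcLoginModuleFactory implements LoginModuleFactory {

    private DataSource dataSource;

    // In OSGi this reference would be injected via Declarative Services;
    // in the non OSGi case the host application calls the setter before
    // registering the factory with Oak.
    public void setDataSource(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    @Override
    public LoginModule createLoginModule() {
        return new JdbcLoginModule(dataSource); // assumed LoginModule impl
    }
}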


On Wed, Feb 12, 2014 at 12:07 PM, Tobias Bocanegra tri...@apache.org wrote:
 Hi,


 On Tue, Feb 11, 2014 at 4:38 PM, Tobias Bocanegra tri...@apache.org wrote:
 On Tue, Feb 11, 2014 at 1:59 PM, Jukka Zitting jukka.zitt...@gmail.com 
 wrote:
 On Mon, Feb 10, 2014 at 2:25 AM, Felix Meschberger fmesc...@adobe.com 
 wrote:
 This thread indeed raises the question, why Oak has to come up with 
 something (the Whiteboard)
 that is almost but not quite like OSGi instead of going all the way 
 through ?

 This is a misunderstanding; we're not trying to reinvent OSGi.

 The Whiteboard interface is *only* an abstraction of the whiteboard
 pattern described in
 http://www.osgi.org/wiki/uploads/Links/whiteboard.pdf, and is used
 only for those cases in Oak where that pattern is useful. When running
 in an OSGi environment, the Whiteboard simply leverages the existing
 OSGi functionality.

 The Whiteboard in Oak is not a generic service registry, and is not
 supposed to become one.
 but then, why does the Whiteboard interface has a register() method?
 This indicates to me, that there is a global service registry behind
 that can be used by all other users of the whiteboard.

 Also, using the whiteboard in the tests make them very easy
 configurable and simulates a bit better how OSGi will work. for
 example in the ExternalLoginModuleTest, I can just do:

 whiteboard = oak.getWhiteboard();
 whiteboard.register(SyncManager.class, new
 SyncManagerImpl(whiteboard), Collections.emptyMap());
 whiteboard.register(ExternalIdentityProviderManager.class, new
 ExternalIDPManagerImpl(whiteboard), Collections.emptyMap());

 If I need to register them with the login module, this would not work
 today, without hard wiring all possible services to the
 SecurityProvider.

 addendum: If I want to achieve the same without the whiteboard, I would need 
 to:

 * Invent a new interface that allows to pass services/helpers to the
 login modules. eg a LoginModuleService interface
 * Create some sort of LoginModuleServiceProviderConfiguration
 * Create an implementation of the above that deals with OSGi but also
 can be used statically
 * Add the LoginModuleServiceProviderConfiguration to
 ServiceProvider.getConfiguration()
 * Add the interface to my ExternalIDPManagerImpl and SyncManagerImpl
 * in the login module, retrieve the
 LoginModuleServiceProviderConfiguration from the SecurityProvider,
 then find a service for SyncManager and
 ExternalIdentityProviderManager

 * in the non-osgi case, I would need to initialize the
 LoginModuleServiceProviderConfigurationImpl myself and add the 2
 services and add the config to the securityprovider.

 The additional work is that I need to re-invent some sort of service
 registry for the LoginModuleServiceProviderConfigurationImpl and
 extend the SecurityProvider with another configuration. Also, I limit
 the interfaces to be used in the LoginModules to the ones implementing
 the LoginModuleService interface. you might say, this is more robust -
 I say, this is just more complicated and has tighter coupling.

 regards, toby


Re: svn commit: r1561926 - /jackrabbit/oak/tags/jackrabbit-oak-core-0.15.2/pom.xml

2014-01-28 Thread Chetan Mehrotra
On Tue, Jan 28, 2014 at 7:41 AM,  tri...@apache.org wrote:
 +  org.apache.jackrabbit.oak;version=0.15.0,
 +  org.apache.jackrabbit.oak.api;version=0.15.0,
 +  org.apache.jackrabbit.oak.api.jmx;version=0.15.0,

We should version packages independently of the Oak release version.
Probably a better way would be to

1. Add explicit package-info.java files with correct bnd version
annotations. Then we need not explicitly list the package names in
pom.xml
2. Until we have a 1.0 release we can set the package version to
1.0.0 and not change it even if an API changes, considering that
pre 1.0 builds are to be considered unstable wrt API

Post 1.0 we take care to bump the version as per the OSGi Semantic Versioning
guidelines [1]

Chetan Mehrotra
[1] http://www.osgi.org/wiki/uploads/Links/SemanticVersioning.pdf
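
For illustration, such a package-info.java would simply carry the bnd version
annotation, e.g. (package name chosen as an example):

// package-info.java of an exported package
@aQute.bnd.annotation.Version("1.0.0")
package org.apache.jackrabbit.oak.api;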


Re: svn commit: r1560611 - in /jackrabbit/oak/trunk: oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/mongomk/util/ oak-core/src/test/java/org/apache/jackrabbit/oak/plugins/mongomk/ oak-jcr/sr

2014-01-23 Thread Chetan Mehrotra
On Thu, Jan 23, 2014 at 2:35 PM,  thom...@apache.org wrote:
 +mongo = new MongoClient(mongoURI);
 +db = mongo.getDB(mongoURI.getDatabase());

The database referred to in the URI is used by Mongo to determine the
user database against which credentials, if passed, need to be
validated.

--
/database
Optional. The name of the database to authenticate if the connection
string includes authentication credentials in the form of
username:password@. If /database is not specified and the connection
string includes credentials, the driver will authenticate to the admin
database.
--

So far we do not make use of credentials, so the database field can be
used for this. But it would be better to manage it in a separate way.

Chetan Mehrotra
[1] http://docs.mongodb.org/manual/reference/connection-string/


Re: Package deployment check for locking leading to large no of small branch commits

2013-12-18 Thread Chetan Mehrotra
The patch works great!!

With the patch applied I get similar timings to what I got by disabling the session
refresh in LockOperation.
Chetan Mehrotra


On Wed, Dec 18, 2013 at 2:47 PM, Michael Dürig mdue...@apache.org wrote:


 On 18.12.13 10:12 , Marcel Reutegger wrote:

 We could rebase the branch and then rebase the in memory changes on top
 of the rebased branch. This would get us rid of the branch commit. But
 wouldn't get us rid of the rebase operation on the persisted branch. So
 I'm not too sure this will gain us a lot.


 hmm, you are right. this may still result in a commit on the branch for
 the
 rebase. it should be possible to avoid it when the rebase is actually a
 no-op because there were no changes by other sessions.


 I think this is already the case. See
 org.apache.jackrabbit.oak.spi.state.AbstractNodeStoreBranch.Persisted#rebase

 For the other branch states rebasing is already entirely in memory. I
 quickly hacked together a POC patch for above approach. Chetan agreed to do
 a quick check with that.

 Michael


 Regards
   Marcel




Re: Package deployment check for locking leading to large no of small branch commits

2013-12-18 Thread Chetan Mehrotra
Created OAK-1294 to track this

Chetan Mehrotra
[1] https://issues.apache.org/jira/browse/OAK-1294

On Wed, Dec 18, 2013 at 3:15 PM, Chetan Mehrotra
chetan.mehro...@gmail.com wrote:
 The patch works great!!

 With patch applied I get similar timings as I got by disabling session
 refresh in LockOperation.
 Chetan Mehrotra


 On Wed, Dec 18, 2013 at 2:47 PM, Michael Dürig mdue...@apache.org wrote:


 On 18.12.13 10:12 , Marcel Reutegger wrote:

 We could rebase the branch and then rebase the in memory changes on top
 of the rebased branch. This would get us rid of the branch commit. But
 wouldn't get us rid of the rebase operation on the persisted branch. So
 I'm not too sure this will gain us a lot.


 hmm, you are right. this may still result in a commit on the branch for
 the
 rebase. it should be possible to avoid it when the rebase is actually a
 no-op because there were no changes by other sessions.


 I think this is already the case. See
 org.apache.jackrabbit.oak.spi.state.AbstractNodeStoreBranch.Persisted#rebase

 For the other branch states rebasing is already entirely in memory. I
 quickly hacked together a POC patch for above approach. Chetan agreed to do
 a quick check with that.

 Michael


 Regards
   Marcel




Re: svn commit: r1547017 - in /jackrabbit/oak/trunk/oak-core/src: main/java/org/apache/jackrabbit/oak/plugins/mongomk/MongoNodeStore.java test/java/org/apache/jackrabbit/oak/plugins/mongomk/Background

2013-12-02 Thread Chetan Mehrotra
Hi Marcel,

Probably the code below can be simplified using Lists.partition(list, size) [1]

 -// update if this is the last path or
 -// revision is not equal to last revision
 -if (i + 1 >= paths.size() || size == ids.size()) {
 +// call update if any of the following is true:
 +// - this is the last path
 +// - revision is not equal to last revision (size of ids didn't 
 change)
 +// - the update limit is reached
 +if (i + 1 >= paths.size()
 +|| size == ids.size()
 +|| ids.size() >= BACKGROUND_MULTI_UPDATE_LIMIT) {
  store.update(Collection.NODES, ids, updateOp);
  for (String id : ids) {
  unsavedLastRevisions.remove(Utils.getPathFromId(id));



Chetan Mehrotra
[1] 
http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/collect/Lists.html#partition(java.util.List,
int)
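
A rough sketch of the batching with Lists.partition (it ignores the "revision
changed" flush condition from the quoted code; variable names are taken from the
diff above and Lists comes from Guava):

for (List<String> batch : Lists.partition(ids, BACKGROUND_MULTI_UPDATE_LIMIT)) {
    store.update(Collection.NODES, batch, updateOp);
    for (String id : batch) {
        unsavedLastRevisions.remove(Utils.getPathFromId(id));
    }
}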


Re: Running background operation on a single node in a Oak cluster

2013-12-02 Thread Chetan Mehrotra
On Wed, Nov 20, 2013 at 11:41 PM, Jukka Zitting jukka.zitt...@gmail.com wrote:
 Yes, sounds like a good candidate. The only additional bit we'd need
 is a timestamp that allows indexers on other cluster nodes to
 automatically resume processing if an active indexing task dies for
 whatever reason without a chance to clear the flag.

Implemented such logic in OAK-1246

Chetan Mehrotra


Running background operation on a single node in a Oak cluster

2013-11-20 Thread Chetan Mehrotra
Hi,

Oak currently executes various tasks in the background, like the async index
update, blob store garbage collection etc. While running multiple
instances of Oak in a cluster it would be desirable to run some jobs
only on one node in the cluster, particularly the async index update.

Currently we schedule these jobs through WhiteboardUtils [1]. The
default implementation schedules them using an executor. However when we
run it in a Sling based system the scheduling is handled via Sling
Commons Scheduler. Some time back it added support for running a task
on only one node in a cluster [2]. This can be done by just adding an
extra property to the service registration.

Would it make sense to make use of this feature to run the async index
update only on one node? It would also help in avoiding conflicts
while concurrently updating the index related data in a cluster.

For non Sling based systems we would let the owning system provide this
support by providing the right implementation of the Whiteboard

Chetan Mehrotra

[1] 
https://github.com/apache/jackrabbit-oak/blob/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/spi/whiteboard/WhiteboardUtils.java#L31
[2] https://issues.apache.org/jira/browse/SLING-2979
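
For illustration, a sketch of what such a registration could look like via the
Whiteboard (the scheduler.* property names and values are assumptions based on
SLING-2979 / Sling Commons Scheduler; the default executor based whiteboard would
simply ignore the extra properties; whiteboard and asyncIndexUpdateTask are
obtained elsewhere):

import java.util.HashMap;
import java.util.Map;

Map<String, Object> props = new HashMap<String, Object>();
props.put("scheduler.period", 5L);        // run every 5 seconds
props.put("scheduler.concurrent", false);
props.put("scheduler.runOn", "SINGLE");   // only one node in the cluster runs the job

whiteboard.register(Runnable.class, asyncIndexUpdateTask, props);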


Re: Segment Store and unbounded instances of MappedByteBuffer causing high RAM usage

2013-11-19 Thread Chetan Mehrotra
On Tue, Nov 19, 2013 at 11:29 PM, Jukka Zitting jukka.zitt...@gmail.com wrote:
 Hmm, I wonder if we do have a problem here or if the system is just
 working as designed.

That's my thinking as well and hence I wanted to discuss this on the DL
first. The system here was being subjected to some DAM related tests
and quite a few binary files were added to it. The only cause of
concern was that the system became quite slow in responding to users. The GUI
was very slow and accessing the Sling server running on Oak was also
very slow. So we probably need to control some aspect; which aspect
should be controlled I am not sure.

I will keep an eye on this, and if such a problem is reported to me again I
would try to collect more data and come up with a test for
it.

Chetan Mehrotra


Segment Store and unbounded instances of MappedByteBuffer causing high RAM usage

2013-11-18 Thread Chetan Mehrotra
Hi,

On some systems running on Oak (on Windows) where lots of nodes get
created we are seeing high RAM usage. The Java process memory (shown
directly under task manager. See note on Working Set below) remains
within expected range but system starts paging memory quite a bit and
slows down.

On checking the runtime state the following things can be observed
* The FileStore has 186 TarFiles and each TarFile refers to a 256MB
MappedByteBuffer, taking around 46 GB of memory
* The JVM process constantly shows 50% CPU usage. Checking further
indicates that the async index update is running and Lucene indexing is being
performed
* Windows Task Manager shows 7.66 GB (total 8 GB) overall memory
usage with the Java process only showing 1.4GB usage. No other process
accounts for such high usage
* Checking the Working Set [1] shows 7 GB of memory being used by the Java
process alone

From my basic understanding of memory mapped file usage it should
not cause such a resource crunch and the OS should be able to manage the
memory. However, probably in the Oak case all these TarFiles are getting
accessed frequently, which leads to these memory mapped pages being held in
physical memory.

Should we put an upper cap on the number of TarFiles kept open?

If any other data needs to be collected to determine the cause of the problem
further, then let me know.


Chetan Mehrotra
[1] 
http://msdn.microsoft.com/en-us/library/windows/desktop/cc441804(v=vs.85).aspx


Re: Strategies around storing blobs in Mongo

2013-11-07 Thread Chetan Mehrotra
To close this thread

On Wed, Oct 30, 2013 at 7:52 PM, Jukka Zitting jukka.zitt...@gmail.com wrote:
 So AFAICT the worry about a write blocking all concurrent reads is
 unfounded unless it shows up in a benchmark.

I tried to measure the effect of such a scenario in OAK-1153 [1] and from
the results obtained there does not appear to be much change whether the
collections are managed in the same database or in different ones. So for now this
is nothing to worry about.

On a side note - the number of reads and writes performed drops considerably
when accessing a remote Mongo server. See the results at [1] for more
details.

Chetan Mehrotra
[1] https://issues.apache.org/jira/browse/OAK-1153


Re: Strategies around storing blobs in Mongo

2013-10-30 Thread Chetan Mehrotra
  Open questions are, what is the write thoughput for one
 shard, does the write lock also block reads (I guess not), does the write

As Ian mentioned above, write locks block all reads. So even adding a 2
MB chunk on a sharded system over a remote connection would block reads
for that complete duration. So at a minimum we should be avoiding that.
Chetan Mehrotra


On Wed, Oct 30, 2013 at 2:40 PM, Ian Boston i...@tfd.co.uk wrote:
 On 30 October 2013 07:55, Thomas Mueller muel...@adobe.com wrote:
 Hi,

 as Mongo maintains a global exclusive write locks on a per database level

 I think this is not necessarily a huge problem. As far as I understand, it
 limits write concurrency within one shard only, so it does not block
 scalability. Open questions are, what is the write thoughput for one
 shard, does the write lock also block reads (I guess not), does the write
 lock cause high latency for other writes because binaries are big.



 This information would be extremely useful for all those looking to
 Oak to address use cases where the repository access is between 20 and
 60% write.

 To answer one of your questions: according to [1], write locks do block
 reads within the scope of the lock.

 Other information from [1].
 Write locks are exclusive and global.
 Write locks block read locks being established.
 (and obviously read locks block write locks being established)
 Read locks are concurrent and shared.
 Pre 2.2 a write lock was scoped to the mongod process.
 Post 2.2 a write lock is scoped to the database within the mongod process.
 All locks are scoped to a shard.
 IIUC, the lock behaviour is identical to that in JR2 except for the scope.

 Ian

 1 
 http://docs.mongodb.org/manual/faq/concurrency/#how-granular-are-locks-in-mongodb




 I think it would make sense to have a simple benchmark (concurrent writing
 / reading of binaries), so that we can test which strategy is best, and
 possibly play around with different strategies (split binaries into
 smaller / larger chunks, use different write concerns, use more
 shards,...).

 Regards,
 Thomas



 On 10/30/13 7:50 AM, Chetan Mehrotra chetan.mehro...@gmail.com wrote:

Hi,

Currently we are storing blobs by breaking them into small chunks and
then storing those chunks in MongoDB as part of the blobs collection. This
approach would cause issues as Mongo maintains a global exclusive
write lock on a per database level [1]. So even writing multiple
small chunks of say 2 MB each would lead to write lock contention.

Mongo also provides GridFS [2]. However it uses a strategy similar to
what we are currently using, and the support is built into the
driver. For the server they are just collection entries.

So to minimize contention for write locks for use cases where big
assets are being stored in Oak we can opt for the following strategies

1. Store the blobs collection in a different database. As Mongo write
locks [1] are taken at the db level, storing the blobs in a different
db would allow the read/write of node data (the majority use case) to
continue.

2. For more asset/binary heavy use cases, use a separate database server
itself to serve the binaries.

3. Bring back the JR2 DataStore implementation and just save metadata
related to binaries in Mongo. We already have an S3 based implementation
there and it would continue to work with Oak also.

Chetan Mehrotra
[1] http://docs.mongodb.org/manual/faq/concurrency/#how-granular-are-locks-in-mongodb
[2] http://docs.mongodb.org/manual/core/gridfs/



Re: Strategies around storing blobs in Mongo

2013-10-30 Thread Chetan Mehrotra
 sounds reasonable. what is the impact of such a design when it comes
 to map-reduce features? I was thinking that we could use it e.g. for
 garbage collection, but I don't know if this is still an option when data
 is spread across multiple databases.

Would investigate that aspect further

 connecting to a second server would add quite some complexity to
Yup. The option was just provided for completeness' sake, and something
like this would probably never be required.

 that was one of my initial thoughts as well, but I was wondering what
 the impact of such a deployment is on data store garbage collection.

Probably we can create a shadow node for the binary in the blob
collection and keep the binary content within the DataStore itself.
Garbage collection would then be performed on the shadow nodes, and the
logic would use the results from that to perform the actual deletions.
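
To make that a bit more concrete, a shadow entry could look roughly like the
following (purely illustrative; the collection and field names and the use of
the plain Mongo Java driver are assumptions, not an actual Oak design):

import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.MongoClient;

public class ShadowBlobSketch {
    public static void main(String[] args) throws Exception {
        MongoClient client = new MongoClient("localhost", 27017);
        try {
            DB db = client.getDB("oak");
            DBCollection blobs = db.getCollection("blobs");
            // Only metadata goes to Mongo; the actual bytes stay in the DataStore (FS, S3, ...).
            BasicDBObject shadow = new BasicDBObject("_id", "a1b2c3d4e5")   // content identifier
                    .append("dataStoreId", "a1b2c3d4e5")                    // DataStore record id
                    .append("length", 2097152L)
                    .append("lastAccessed", System.currentTimeMillis());
            blobs.insert(shadow);
            // A garbage collection pass would iterate these shadow documents, mark the
            // ones still referenced from node data and delete the DataStore records for
            // the rest.
        } finally {
            client.close();
        }
    }
}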


Chetan Mehrotra


On Wed, Oct 30, 2013 at 1:13 PM, Marcel Reutegger mreut...@adobe.com wrote:
 Hi,

 Currently we are storing blobs by breaking them into small chunks and
 then storing those chunks in MongoDB as part of the blobs collection. This
 approach would cause issues as Mongo maintains a global exclusive
 write lock on a per database level [1]. So even writing multiple
 small chunks of say 2 MB each would lead to write lock contention.

 so far we observed high lock contention primarily when there are a lot of
 updates. inserts were not that big of a problem, because you can batch
 them. it would probably be good to have a test to see how big the
 impact is when blobs come into play.

 Mongo also provides GridFS [2]. However it uses a strategy similar to
 what we are currently using, and the support is built into the
 driver. For the server they are just collection entries.

 So to minimize contention for write locks for use cases where big
 assets are being stored in Oak we can opt for the following strategies

 1. Store the blobs collection in a different database. As Mongo write
 locks [1] are taken at the db level, storing the blobs in a different
 db would allow the read/write of node data (the majority use case) to
 continue.

 sounds reasonable. what is the impact of such a design when it comes
 to map-reduce features? I was thinking that we could use it e.g. for
 garbage collection, but I don't know if this is still an option when data
 is spread across multiple databases.

 2. For more asset/binary heavy use cases, use a separate database server
 itself to serve the binaries.

 connecting to a second server would add quite some complexity to
 the system. wouldn't it be easier to just leverage standard mongodb
 sharding to distribute the load?

 3. Bring back the JR2 DataStore implementation and just save metadata
 related to binaries in Mongo. We already have an S3 based implementation
 there and it would continue to work with Oak also.

 that was one of my initial thoughts as well, but I was wondering what
 the impact of such a deployment is on data store garbage collection.

 regards
  marcel

 Chetan Mehrotra
 [1] http://docs.mongodb.org/manual/faq/concurrency/#how-granular-are-locks-in-mongodb
 [2] http://docs.mongodb.org/manual/core/gridfs/


Re: [MongoMK] flag document with children

2013-10-25 Thread Chetan Mehrotra
I have implemented the above logic as part of OAK-1117 [1]. With this
in place the number of calls made to Mongo on restarts of Adobe CQ goes
down from 42000 to 25000, significantly reducing the startup time when
Mongo is remote!

regards
Chetan
[1] https://issues.apache.org/jira/browse/OAK-1117
Chetan Mehrotra


On Thu, Oct 24, 2013 at 3:23 PM, Chetan Mehrotra
chetan.mehro...@gmail.com wrote:
 I am trying to prototype an approach and would come up with a patch for
 this soon. So far I was going with the reverse approach whereby, when I
 fetch a node, I retrieve some extra child rows [1] in the same call to
 determine if it has any children.

 But given that the number of reads would far exceed the number of writes,
 it would be better to perform the extra update call. I would try to come
 up with a patch for this.

 regards
 Chetan
 [1] by adding an or clause with an id pattern like ^2:/foo/.* to
 fetch the child nodes for a parent with id 1:/foo.
 Chetan Mehrotra


 On Thu, Oct 24, 2013 at 3:08 PM, Thomas Mueller muel...@adobe.com wrote:
 Hi,

 Yes, you are right. It should be relatively easy to implement (low risk).

 Regards,
 Thomas


 On 10/24/13 10:12 AM, Marcel Reutegger mreut...@adobe.com wrote:

 The disadvantage is, when a node is added, either:

 - then the parent needs to be checked whether it already has this flag set
 (if it is in the cache), or

I'd say a parent node is likely in the cache because oak will read it
first before
it is able to add a child.

 - the parent needs to be updated to set the flag

that's correct. though you only have to do it when it isn't set already.
and
the check should be cheap in most cases, because the node is in the cache.

regards
 marcel




Reduce number of calls from Oak to Mongo DB on restarts

2013-10-25 Thread Chetan Mehrotra
Hi,

Trying to restart an application like Adobe CQ running on Oak and
Mongo DB on a remote system takes a considerable amount of time. In a
typical restart the number of calls made to Mongo is around 23000 (reduced
from 42000 with OAK-1117). I am trying to analyze the nature of the calls
and also the cache utilization to see if these can be reduced in OAK-1119.

Looking at the various logs, the following things stand out

* The number of queries made to fetch children is 18000 out of the total
23000. Such queries also populate the doc cache, hence the number of
explicit find queries for individual docs is quite low (~400)
* The utilization of nodeChildrenCache is quite poor
* The number of updates is low, yet we see cache entries for the same path
at different revisions
* Checking the entries of the nodeCache and nodeChildrenCache, which
use a path@revision key, shows that there are quite a few entries at
different revisions for the same path.

Based on the above we can look into the following aspects

A - Caching strategy for nodeCache and nodeChildrenCache - I think the
current logic caches a node at the revision it is asked for and not
at the revision at which it actually exists. For example, if a client asks
for /foo/bar at revision 100 and the actual latest valid revision for
/foo/bar is 70, we still cache it at key /foo/bar@100. If a different
client asks for revision 105 and the revision of /foo/bar is still 70, we
would make a new cache entry at 105.
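
A rough sketch of the idea (hypothetical names, not the actual DocumentMK
code): resolve the revision at which the node last changed and use that for
the cache key, so that requests at revision 100 and 105 both hit the
/foo/bar@70 entry:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class RevisionNormalizingCacheSketch {
    private final Map<String, Object> nodeCache = new ConcurrentHashMap<String, Object>();

    Object getNode(String path, long requestedRevision) {
        // Normalize the key to the revision at which the node actually last changed,
        // e.g. requests for /foo/bar@100 and @105 both map to /foo/bar@70.
        long lastChanged = resolveLastChangedRevision(path, requestedRevision);
        String key = path + "@" + lastChanged;
        Object node = nodeCache.get(key);
        if (node == null) {
            node = readNodeFromStore(path, lastChanged);
            nodeCache.put(key, node);
        }
        return node;
    }

    // Placeholders for the real lookups against the document store.
    private long resolveLastChangedRevision(String path, long upTo) { return 70; }
    private Object readNodeFromStore(String path, long revision) { return new Object(); }
}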

B - Using a persistent cache to manage restarts -
No matter what we do we would still make a large number of calls on
restart as the complete state is managed in a remote system. Most of the
state we would read again would not have changed. So it might be better
to persist the doc cache (an L3 cache; L2 might be an off-heap cache),
say using MapDB [1] or an H2 database. If we can find a valid node at a
given revision in L3 then we serve it from there.

Upon restart we can check whether the modCount for a document has not
changed (we can check multiple docs using an in clause) and then serve
it from L3.
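
As a sketch (assuming the documents carry a modification counter like
_modCount; everything else here, including the collection handling, is just
illustrative), the validation on startup could be a single batched query:

import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;
import com.mongodb.DBObject;
import java.util.Arrays;

public class PersistentCacheValidationSketch {
    static void revalidate(DBCollection nodes) {
        // One query for a batch of cached documents, reading only _id and _modCount.
        DBObject query = new BasicDBObject("_id",
                new BasicDBObject("$in", Arrays.asList("1:/foo", "1:/bar", "1:/baz")));
        DBObject fields = new BasicDBObject("_modCount", 1);
        DBCursor cursor = nodes.find(query, fields);
        try {
            while (cursor.hasNext()) {
                DBObject doc = cursor.next();
                // Compare with the _modCount stored in the persistent (L3) cache; only
                // documents whose counter changed need to be read in full again.
                System.out.println(doc.get("_id") + " -> " + doc.get("_modCount"));
            }
        } finally {
            cursor.close();
        }
    }
}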

C - Maintain an approximate estimate of childCounts in the parent
In addition we can make an approximate estimate of the childCount and
eagerly fetch the child nodes if the number is small. We can possibly make
use of the primaryType of the node to make a better guess.

Thoughts?

Chetan Mehrotra
[1] http://www.mapdb.org/


Re: [MongoMK] flag document with children

2013-10-24 Thread Chetan Mehrotra
I am trying to prototype an approach and would come up with a patch for
this soon. So far I was going with the reverse approach whereby, when I
fetch a node, I retrieve some extra child rows [1] in the same call to
determine if it has any children.

But given that the number of reads would far exceed the number of writes,
it would be better to perform the extra update call. I would try to come
up with a patch for this.

regards
Chetan
[1] by adding an or clause with an id pattern like ^2:/foo/.* to
fetch the child nodes for a parent with id 1:/foo.
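
A rough illustration of such a query against the nodes collection (sketch
only, with a hard-coded limit; not the actual DocumentStore code):

import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;
import com.mongodb.DBObject;
import java.util.Arrays;
import java.util.regex.Pattern;

public class FetchNodeWithChildrenSketch {
    static void fetch(DBCollection nodes) {
        // Fetch the parent document 1:/foo and a few of its children in one call;
        // the regex matches child ids like 2:/foo/bar (the ^2:/foo/.* pattern from [1]).
        DBObject query = new BasicDBObject("$or", Arrays.asList(
                new BasicDBObject("_id", "1:/foo"),
                new BasicDBObject("_id", Pattern.compile("^2:/foo/"))));
        DBCursor cursor = nodes.find(query).limit(5);
        try {
            while (cursor.hasNext()) {
                System.out.println(cursor.next().get("_id"));
            }
        } finally {
            cursor.close();
        }
    }
}
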
Chetan Mehrotra


On Thu, Oct 24, 2013 at 3:08 PM, Thomas Mueller muel...@adobe.com wrote:
 Hi,

 Yes, you are right. It should be relatively easy to implement (low risk).

 Regards,
 Thomas


 On 10/24/13 10:12 AM, Marcel Reutegger mreut...@adobe.com wrote:

 The disadvantage is, when a node is added, either:

 - then the parent needs to be checked whether it already has this flag set
 (if it is in the cache), or

I'd say a parent node is likely in the cache because oak will read it
first before
it is able to add a child.

 - the parent needs to be updated to set the flag

that's correct. though you only have to do it when it isn't set already.
and
the check should be cheap in most cases, because the node is in the cache.

regards
 marcel




Re: Oak JCR Observation scalability aspects and concerns

2013-10-22 Thread Chetan Mehrotra
On Mon, Oct 21, 2013 at 6:47 PM, Jukka Zitting jukka.zitt...@gmail.com wrote:
 -1 This introduces the problem where a single JCR event listener can
 block or slow down all other listeners.

That can be mitigated to an extent by using some sort of blacklist
(OAK-1084). However, the current approach of each listener pulling in the
diff at its own pace is more robust for handling such cases.


 I'm not convinced by the assumption here that the observation
 listeners put undue pressure on the underlying MK or its caching. Do
 we have some data to prove this point? My reasoning is that if in any
 case we have a single (potentially multiplexed as suggested) listener
 that wants to read all the changed nodes, then those nodes will still
 need to be accessed from the MK and placed in the cache. If another
 listener does the same thing, they'll most likely find the items in
 the cache and not repeat the MK accesses. The end result is that the
 main performance cost goes to the first listener and any additional
 ones will come mostly for free, thus the claimed performance benefit
 of multiplexing observers is IMHO questionable.


Agreed (and also mentioned earlier) that the current approach does cause
multiple calls to the MK, but in most cases the NodeState would be found in
the cache. However, due to the access pattern, i.e. the same node state
being fetched multiple times, such entries in the cache would get higher
priority and occupy memory which would otherwise have been used to cache
NodeStates for the *latest* revision.

This is just an observation; I currently do not have any numbers
which indicate that this would cause a significant performance issue, and
such things are hard to measure.

Chetan Mehrotra


Re: Oak JCR Observation scalability aspects and concerns

2013-10-22 Thread Chetan Mehrotra
On Mon, Oct 21, 2013 at 11:39 PM, Jukka Zitting jukka.zitt...@gmail.com wrote:
 3) The Observer mechanism allows a listener to look at repository
 changes in variable granularity and frequency depending on application
 needs and current repository load. Thus an Oak Observer can
 potentially process orders of magnitude more changes than a JCR event
 listener that needs to look at each individual changed item.

+1

I think in Sling's case it would make sense for it to be implemented as
an Observer. I had a look at some of the listener
implementations in [1] and I think they can easily be moved
to Sling OSGi events.

Chetan Mehrotra
[1] https://gist.github.com/chetanmeh/7081328/raw/listeners-list-filtered.txt


Oak JCR Observation scalability aspects and concerns

2013-10-21 Thread Chetan Mehrotra
 Duerig, Carsten Ziegeler, Chetan Mehrotra
[1] https://gist.github.com/chetanmeh/7081328
[2] https://gist.github.com/chetanmeh/7081328/raw/listeners-list-filtered.txt
[3] https://git.corp.adobe.com/gist/chetanm/863/raw/listerners-per-path.txt
[4] https://cwiki.apache.org/confluence/display/SLING/Observation+usage+patterns


IndexEditor and Commit Failure

2013-10-16 Thread Chetan Mehrotra
Hi,

Currently the various IndexEditors (Lucene, Property and Solr) are
invoked as part of the CommitHook.processCommit call whenever a JCR
session is saved.

In case the commit fails, would it leave the index in an inconsistent state?

For the PropertyIndex I think it would be fine, as the index content is
also part of the same commit and hence would not be committed. But for
the other indexes the index data would have been saved (a sort of 2-phase
commit) and it would not be possible to roll them back, leaving them
with data which has not been committed.

Moreover, such a commit failure would occur *after* a proper commit
has been done, so the changes made to the index state as part of the failed
commit would overwrite the changes made as part of the successful commit.

Should the IndexEditors not work as part of a PostCommitHook so that they
always work on properly committed content?

Chetan Mehrotra


Re: IndexEditor and Commit Failure

2013-10-16 Thread Chetan Mehrotra
 The lucene index is asynchronous

OK, I missed that part completely, i.e. OAK-763. Yup, with that used
for such indexers this problem would not be observed.

Thanks for the pointer Alex!!

Chetan Mehrotra


Full text indexing with Solr

2013-09-24 Thread Chetan Mehrotra
Hi,

When Oak uses Solr, do we send the complete binary to Solr for
full-text indexing or do we extract the content on the Oak side and send
the extracted content?

And if we send the complete binary content, do we send it inline or is it
first uploaded to Solr and a reference to it passed?

Chetan Mehrotra


NodeType index returns cost as zero when count is zero

2013-09-23 Thread Chetan Mehrotra
Hi,

While trying to execute a query like

select [jcr:path], [jcr:score], * from [nt:unstructured] as a where
[sling:resourceType] = 'dam/smartcollection' and isdescendantnode(a,
'/content/dam')

where an index does exist for sling:resourceType, the explain output shows
that it is using the NodeType index for jcr:primaryType

[nt:unstructured] as [a] /* Filter(query=explain select [jcr:path],
[jcr:score], *
from [nt:unstructured] as a
where [sling:resourceType] = 'dam/smartcollection'
and isdescendantnode(a, '/content/dam')
, path=/content/dam//*,
property=[sling:resourceType=dam/smartcollection]) where
([a].[sling:resourceType] = cast('dam/smartcollection' as string)) and
(isdescendantnode([a], [/content/dam])) */

The problem, I think, is the way the cost is determined in [1]. The current
implementation returns the cost as zero if the count returned by the
IndexStoreStrategy is zero. This forces the query engine to use this
index. I think it should return a value less than
Double.POSITIVE_INFINITY but greater than zero, probably MAX_COST, to
indicate that it can participate but that there is some cost:

-return store.count(indexMeta, encode(value), MAX_COST);
+long count = store.count(indexMeta, encode(value), MAX_COST);
+return (count == 0) ? MAX_COST : count;

With the logic changed as above, the plan changes to

[nt:unstructured] as [a] /* property
sling:resourceType=dam/smartcollection where ([a].[sling:resourceType]
= cast('dam/smartcollection' as string)) and (isdescendantnode([a],
[/content/dam])) */

Any pointers?

Chetan Mehrotra
[1] 
https://github.com/apache/jackrabbit-oak/blob/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/index/property/PropertyIndexLookup.java#L110


Re: Providing details with CommitFailedException and security considerations

2013-09-12 Thread Chetan Mehrotra
The best thing is probably to entirely remove the message from the 
exception/logs and replace it with a text that explains how to find the 
conflict information (i.e. in the transient space).

Not sure that is always easily possible, as at times there are
multiple layers involved and a user does not have access to the actual
session. If the information is somehow available via the logs it is far
more accessible.

So I would turn it into a debug-level log.

regards
Chetan
Chetan Mehrotra


On Thu, Sep 12, 2013 at 1:01 PM, Marcel Reutegger mreut...@adobe.com wrote:
 Hi,

 I would turn the log into a debug message, because it can be very confusing.
 conflicts happen during regular repository usage and shouldn't log warnings.
 if needed one could still enable debug logging to get more information on
 what happens during a conflict.

 regards
  marcel

 -Original Message-
 From: Michael Dürig [mailto:mdue...@apache.org]
 Sent: Donnerstag, 12. September 2013 09:15
 To: oak-dev@jackrabbit.apache.org
 Subject: Re: Providing details with CommitFailedException and security
 considerations


 Hi Chetan,

 The best thing is probably to entirely remove the message from the
 exception/logs and replace it with a text that explains how to find the
 conflict information (i.e. in the transient space).

 Michael

 On 12.9.13 6:56 , Chetan Mehrotra wrote:
  Hi,
 
  As part of OAK-943 I had updated the ConflictValidator [1] to provide
  more details around commit failures. However, exposing such details as
  part of the exception was considered risky from a security perspective
  and it was decided to log a warning instead.
 
  Now in some cases the upper layers do expect a CommitFailedException and
  have the required logic to retry the commit in case of failure. In such
  cases these warning logs cause confusion.
 
  So I am not sure what the best thing to do is. Should I turn the log to
  debug level or make the details part of the exception message?
 
  Making it part of the warn level would cause issues, as such situations
  are not very repetitive and users typically run the system at INFO level.
 
  If I make it part of the exception message then at most it would expose
  the presence of some property names (not their values). And in most cases
  the exception is not exposed to the end user and is logged to the system
  logs. So probably we can make it part of the exception message itself.
 
 
  [1] https://github.com/apache/jackrabbit-oak/blob/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/commit/ConflictValidator.java#L90
 
  Chetan Mehrotra
 



Exception thrown in Item.save

2013-09-10 Thread Chetan Mehrotra
Hi,

After refreshing to a build which has the changes done in OAK-993 I am
getting the following exception (detailed one at [1]).

--
Caused by: javax.jcr.UnsupportedRepositoryOperationException:
OakUnsupported: Failed to save subtree at
/content/usergenerated/content/geometrixx-outdoors/en/socialforum-vpktx/jcr:content/forum/1_ciot/wufg-init_topic_vpktx1.
There are transient modifications outside that subtree.
---

Now from OAK-993 I understand that such saves would cause an issue. Can
some background be provided on:

1. What might be the cause of the issue
2. Is this a change in behavior from JR2
3. If as a user I see such an issue then what should I change ... or what
am I doing that is causing this issue
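
To make the question concrete, here is a minimal, hypothetical snippet
(paths made up) of the pattern I believe is being hit -- a subtree save
while the session has pending changes elsewhere:

import javax.jcr.Node;
import javax.jcr.Session;

public class SubtreeSaveSketch {
    static void update(Session session) throws Exception {
        Node topic = session.getNode("/content/usergenerated/forum/topic-a");
        topic.setProperty("approved", true);

        // an unrelated transient change elsewhere in the same session
        session.getNode("/content/usergenerated/forum/topic-b").setProperty("views", 42L);

        // JCR 1.0 style subtree save: with OAK-993 this would hit the exception above
        // because of the pending change under topic-b; session.save() would persist both.
        topic.save();
    }
}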

[1] https://paste.apache.org/mAL7

Chetan Mehrotra

