Re: MissingLastRevSeeker

2014-08-26 Thread Amit Jain
Hi Julian,

The LastRevRecoveryAgent is executed in two places:
1. On DocumentNodeStore startup, where the MissingLastRevSeeker is used to
get potential candidates for recovery.
2. At regular intervals defined by the property
'lastRevRecoveryJobIntervalInSecs' in the DocumentNodeStoreService (default
60 seconds). In short, MissingLastRevSeeker will rarely be called
in this case.
In more detail: in this case a less expensive query is executed to find
all the stale clusterNodes for which recovery is to be performed. If
there are clusterNodes that have shut down unexpectedly and their
'leaseEndTime' has not expired, then MissingLastRevSeeker will check all
potential candidates.

 Proposal: if this code *is* used regularly, we'll need an API so that
 DocumentStore implementations other than Mongo can optimize the query.
+1, since it will be executed on every startup. RDBDocumentStore already
maintains an index on the _modified property, so optimized querying is
possible.

Thanks
Amit


On Mon, Aug 25, 2014 at 7:36 PM, Julian Reschke julian.resc...@gmx.de
wrote:

 Hi there,

 it appears that the MissingLastRevSeeker (oak-core), when run, will be
 very slow on large repos, unless they use a MongoDocumentStore (which has a
 special-cased query).

 Question: when will this code execute? I've seen it occasionally during
 benchmarking, but it doesn't seem to happen always.

 Proposal: if this code *is* used regularly, we'll need an API so that
 DocumentStore implementations other than Mongo can optimize the query.

 Best regards, Julian



Re: MissingLastRevSeeker

2014-08-26 Thread Julian Reschke

On 2014-08-26 08:03, Amit Jain wrote:

Hi Julian,

The LastRevRecoveryAgent is executed at 2 places
1. On DocumentNodeStore startup where the MissingLastRevSeeker is used to
get potential candidates for recovery.
  2. At regular intervals defined by the property
'lastRevRecoveryJobIntervalInSecs' in the DocumentNodeStoreService (default
60 seconds). Short description is that MissingLastRevSeeker will be called
rarely in this case.
Long description - In this case a less expensive query is executed to find
out all the stale clusterNodes for which recovery is to be performed. If
there are clusterNodes that have unexpectedly shutdown and their
'leaseEndTime' has not expired then MissingLastRevSeeker will check all
potential candidates.


Proposal: if this code *is* used regularly, we'll need an API so that

DocumentStore implementations other than Mongo can optimize the query.
+1. Since, It will be executed on every startup. RDBDocumentStore already
maintains the index on _modified property so, optimized querying is
possible.

Thanks
Amit


OK, so can we put what's needed into the DocumentStore API, or 
alternatively have an extension interface, that both MongoDocumentStore 
and RDBDocumentStore could implement?


Best regards, Julian


[VOTE] Release Apache Jackrabbit Oak 1.0.5

2014-08-26 Thread Thomas Mueller
A candidate for the Jackrabbit Oak 1.0.5 release is available at:

https://dist.apache.org/repos/dist/dev/jackrabbit/oak/1.0.5/

The release candidate is a zip archive of the sources in:


https://svn.apache.org/repos/asf/jackrabbit/oak/tags/jackrabbit-oak-1.0.5/

The SHA1 checksum of the archive is
2cd71913fe66ba9491ee7edb4e82469e228412c9.

A staged Maven repository is available for review at:

https://repository.apache.org/

The command for running automated checks against this release candidate is:

$ sh check-release.sh oak 1.0.5 2cd71913fe66ba9491ee7edb4e82469e228412c9
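
For anyone who wants to compare the published SHA1 checksum by hand, here is a minimal Java sketch. It is not what check-release.sh does internally (that script is not shown in this thread), and the class and method names are made up for illustration; it only uses the standard java.security.MessageDigest API.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper; not part of Oak or of check-release.sh.
class Sha1Check {

    // Returns the lowercase hex SHA-1 digest of the given bytes.
    static String sha1Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    // True if the digest of the data equals the published checksum.
    static boolean matches(byte[] data, String expectedHex) {
        return sha1Hex(data).equalsIgnoreCase(expectedHex.trim());
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.out.println("usage: java Sha1Check <archive> <expected-sha1>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(matches(data, args[1]) ? "checksum ok" : "checksum MISMATCH");
    }
}
```

The sketch reads the whole archive into memory, which is fine for a source zip of this size; a streaming digest would be the better choice for very large files.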

Please vote on releasing this package as Apache Jackrabbit Oak 1.0.5.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 Jackrabbit PMC votes are cast.

[ ] +1 Release this package as Apache Jackrabbit Oak 1.0.5
[ ] -1 Do not release this package because...

My vote is +1

Regards
Thomas




Re: [VOTE] Release Apache Jackrabbit Oak 1.0.5

2014-08-26 Thread Michael Dürig



On 26.8.14 8:42 , Thomas Mueller wrote:

Please vote on releasing this package as Apache Jackrabbit Oak 1.0.5.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 Jackrabbit PMC votes are cast.

 [X] +1 Release this package as Apache Jackrabbit Oak 1.0.5


Michael


Re: [VOTE] Release Apache Jackrabbit Oak 1.0.5

2014-08-26 Thread Alex Parvulescu
+1 all checks ok


On Tue, Aug 26, 2014 at 8:42 AM, Thomas Mueller muel...@adobe.com wrote:

 A candidate for the Jackrabbit Oak 1.0.5 release is available at:

 https://dist.apache.org/repos/dist/dev/jackrabbit/oak/1.0.5/

 The release candidate is a zip archive of the sources in:


 https://svn.apache.org/repos/asf/jackrabbit/oak/tags/jackrabbit-oak-1.0.5/

 The SHA1 checksum of the archive is
 2cd71913fe66ba9491ee7edb4e82469e228412c9.

 A staged Maven repository is available for review at:

 https://repository.apache.org/

 The command for running automated checks against this release candidate is:

 $ sh check-release.sh oak 1.0.5 2cd71913fe66ba9491ee7edb4e82469e228412c9

 Please vote on releasing this package as Apache Jackrabbit Oak 1.0.5.
 The vote is open for the next 72 hours and passes if a majority of at
 least three +1 Jackrabbit PMC votes are cast.

 [ ] +1 Release this package as Apache Jackrabbit Oak 1.0.5
 [ ] -1 Do not release this package because...

 My vote is +1

 Regards
 Thomas





Re: JCR API implementation transparency

2014-08-26 Thread Michael Dürig



On 26.8.14 7:14 , Tobias Bocanegra wrote:

IMO, this should work, even if the value is not a ValueImpl. In this
case, it should fall back to the API methods to read the binary.
WDYT?


Ack. This is most likely a regression introduced with OAK-1164.

Michael


Re: [DISCUSS] supporting faceting in Oak query engine

2014-08-26 Thread Tommaso Teofili
Hi Laurie,

2014-08-25 18:43 GMT+02:00 Laurie Byrum lby...@adobe.com:

 Hi Tommaso,
 I am happy to see this thread!


;-)



 Questions:
 Do you expect to want to support hierarchical or pivoted facets soonish?


I would say 'why not' if we have a valid use case.


 If so, does that influence this decision?


I think so; in particular, it would influence the way it may be implemented.


 Do you know how ACLs will come into play with your facet implementation?


Not yet. I think that's one of the open points we should take care of (e.g.
Lukas mentioned that HippoCMS used 'virtual nodes' for them). Each 'term' in
the facet should be properly checked, but doing this kind of check at such a
fine granularity would be costly, so we need to come up with a solution that
is both correct from a security point of view and performant.


 If so, does that influence this decision? :-)


yes, I think so :)

Any suggestions and / or feedback would be highly welcome, especially from
potential users of this feature so that we properly tackle your
requirements (if any).

Thanks and regards,
Tommaso



 Thanks!
 Laurie



 On 8/25/14 7:08 AM, Tommaso Teofili tommaso.teof...@gmail.com wrote:

 Hi all,
 
 since this has been asked every now and then [1] and since I think it's a
 pretty useful and common feature for search engines nowadays I'd like to
 discuss the introduction of facets [2] for the Oak query engine.
 
 Pros: having facets in search results usually helps filtering (drill down)
 the results before browsing all of them, so the main usage would be for
 client code.
 
 Impact: probably change / addition in both the JCR and Oak APIs to support
 returning other than just nodes (a NodeIterator and a Cursor
 respectively).
 
 Right now a couple of ideas on how we could do that come to my mind, both
 based on the approach of having an Oak index for them:
 1. a (multivalued) property index for facets, meaning we would store the
 facets in the repository, so that we would run a query against it to have
 the facets of an originating query.
 2. a dedicated QueryIndex implementation, eventually leveraging Lucene
 faceting capabilities, which could use the Lucene index we already have,
 together with a sidecar index [3].
 
 What do you think?
 Regards,
 Tommaso
 
 [1] : http://markmail.org/search/?q=oak%20faceting#query:oak%20faceting%20list%3Aorg.apache.jackrabbit.oak-dev+page:1+state:facets
 [2] : http://en.wikipedia.org/wiki/Faceted_search
 [3] : http://lucene.apache.org/core/4_0_0/facet/org/apache/lucene/facet/doc-files/userguide.html
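
To make the facet discussion above a bit more concrete, here is a small Java sketch of what "facets of a query" means: count how often each value of a property occurs among the hits, skipping hits the caller may not read. It is an assumption-laden illustration only. The Hit type, the canRead predicate and the per-hit access check are hypothetical and do not correspond to any Oak or Lucene API, and the thread leaves open how access control should really be handled.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

// Hypothetical illustration; none of these types exist in Oak.
class FacetSketch {

    // A query hit: the node path plus the values of the faceted property.
    static final class Hit {
        final String path;
        final List<String> values;

        Hit(String path, List<String> values) {
            this.path = path;
            this.values = values;
        }
    }

    // Counts how often each value occurs among the hits the caller may read.
    static Map<String, Integer> countFacets(List<Hit> hits, Predicate<String> canRead) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Hit hit : hits) {
            if (!canRead.test(hit.path)) {
                continue; // hidden by access control, must not leak into the counts
            }
            for (String value : hit.values) {
                counts.merge(value, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
                new Hit("/content/a", Arrays.asList("oak", "jcr")),
                new Hit("/content/b", Arrays.asList("oak")),
                new Hit("/secret/c", Arrays.asList("oak")));
        // Assumed policy for the demo: only /content is readable.
        System.out.println(countFacets(hits, path -> path.startsWith("/content/")));
    }
}
```

Checking every hit like this is exactly the fine-grained cost mentioned in the reply above; the sketch shows the correctness requirement, not a performant design.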




Re: [VOTE] Release Apache Jackrabbit Oak 1.0.5

2014-08-26 Thread Tommaso Teofili
+1

Tommaso


2014-08-26 8:42 GMT+02:00 Thomas Mueller muel...@adobe.com:

 A candidate for the Jackrabbit Oak 1.0.5 release is available at:

 https://dist.apache.org/repos/dist/dev/jackrabbit/oak/1.0.5/

 The release candidate is a zip archive of the sources in:


 https://svn.apache.org/repos/asf/jackrabbit/oak/tags/jackrabbit-oak-1.0.5/

 The SHA1 checksum of the archive is
 2cd71913fe66ba9491ee7edb4e82469e228412c9.

 A staged Maven repository is available for review at:

 https://repository.apache.org/

 The command for running automated checks against this release candidate is:

 $ sh check-release.sh oak 1.0.5 2cd71913fe66ba9491ee7edb4e82469e228412c9

 Please vote on releasing this package as Apache Jackrabbit Oak 1.0.5.
 The vote is open for the next 72 hours and passes if a majority of at
 least three +1 Jackrabbit PMC votes are cast.

 [ ] +1 Release this package as Apache Jackrabbit Oak 1.0.5
 [ ] -1 Do not release this package because...

 My vote is +1

 Regards
 Thomas





Re: [DISCUSS] supporting faceting in Oak query engine

2014-08-26 Thread Tommaso Teofili
2014-08-25 19:02 GMT+02:00 Lukas Smith sm...@pooteeweet.org:

 Aloha,


Aloha!



 you should definitely talk to the HippoCMS developers. They forked
 Jackrabbit 2.x to add faceting as virtual nodes. They ran into some
 performance issues but I am sure they still have valuable feedback on
 this.


Cool, thanks for letting us know. If you or anyone else (from Hippo) would
like to give some more insight into the pros and cons of such an approach,
that would be very good.

Regards,
Tommaso



 regards,
 Lukas Kahwe Smith

  On 25 Aug 2014, at 18:43, Laurie Byrum lby...@adobe.com wrote:
 
  Hi Tommaso,
  I am happy to see this thread!
 
  Questions:
  Do you expect to want to support hierarchical or pivoted facets soonish?
  If so, does that influence this decision?
  Do you know how ACLs will come into play with your facet implementation?
  If so, does that influence this decision? :-)
 
  Thanks!
  Laurie
 
 
 
  On 8/25/14 7:08 AM, Tommaso Teofili tommaso.teof...@gmail.com
 wrote:
 
  Hi all,
 
  since this has been asked every now and then [1] and since I think it's
 a
  pretty useful and common feature for search engine nowadays I'd like to
  discuss introduction of facets [2] for the Oak query engine.
 
  Pros: having facets in search results usually helps filtering (drill
 down)
  the results before browsing all of them, so the main usage would be for
  client code.
 
  Impact: probably change / addition in both the JCR and Oak APIs to
 support
  returning other than just nodes (a NodeIterator and a Cursor
  respectively).
 
  Right now a couple of ideas on how we could do that come to my mind,
 both
  based on the approach of having an Oak index for them:
  1. a (multivalued) property index for facets, meaning we would store the
  facets in the repository, so that we would run a query against it to
 have
  the facets of an originating query.
  2. a dedicated QueryIndex implementation, eventually leveraging Lucene
  faceting capabilities, which could use the Lucene index we already
 have,
  together with a sidecar index [3].
 
  What do you think?
  Regards,
  Tommaso
 
  [1] : http://markmail.org/search/?q=oak%20faceting#query:oak%20faceting%20list%3Aorg.apache.jackrabbit.oak-dev+page:1+state:facets
  [2] : http://en.wikipedia.org/wiki/Faceted_search
  [3] : http://lucene.apache.org/core/4_0_0/facet/org/apache/lucene/facet/doc-files/userguide.html
 



Re: MissingLastRevSeeker

2014-08-26 Thread Amit Jain
Hi,

 OK, so can we put what's needed into the DocumentStore API, or
alternatively have an extension interface, that both MongoDocumentStore and
RDBDocumentStore could implement?

It would make sense to add a generic method that queries on a particular
property (possibly limited to indexed ones only), like the one below, to the
DocumentStore interface.
<T extends Document> List<T> queryProperty(Collection<T> collection,
   String indexedProperty,
   String fromKey,
   String toKey,
   int limit);
Thoughts?

Thanks
Amit

On Tue, Aug 26, 2014 at 12:03 PM, Julian Reschke 
julian.resc...@greenbytes.de wrote:

 On 2014-08-26 08:03, Amit Jain wrote:

 Hi Julian,

 The LastRevRecoveryAgent is executed at 2 places
 1. On DocumentNodeStore startup where the MissingLastRevSeeker is used to
 get potential candidates for recovery.
   2. At regular intervals defined by the property
 'lastRevRecoveryJobIntervalInSecs' in the DocumentNodeStoreService
 (default
 60 seconds). Short description is that MissingLastRevSeeker will be called
 rarely in this case.
 Long description - In this case a less expensive query is executed to find
 out all the stale clusterNodes for which recovery is to be performed. If
 there are clusterNodes that have unexpectedly shutdown and their
 'leaseEndTime' has not expired then MissingLastRevSeeker will check all
 potential candidates.

  Proposal: if this code *is* used regularly, we'll need an API so that

 DocumentStore implementations other than Mongo can optimize the query.
 +1. Since, It will be executed on every startup. RDBDocumentStore already
 maintains the index on _modified property so, optimized querying is
 possible.

 Thanks
 Amit


 OK, so can we put what's needed into the DocumentStore API, or
 alternatively have an extension interface, that both MongoDocumentStore and
 RDBDocumentStore could implement?

 Best regards, Julian



Re: [VOTE] Release Apache Jackrabbit Oak 1.0.5

2014-08-26 Thread Davide Giannella
On 26/08/2014 07:42, Thomas Mueller wrote:
 [X] +1 Release this package as Apache Jackrabbit Oak 1.0.5

Davide




Re: MissingLastRevSeeker

2014-08-26 Thread Marcel Reutegger
Hi,

I would only add it if really necessary. We already have a very
similar method:

/**
 * Get a list of documents where the key is greater than a start value and
 * less than an end value. The returned documents are immutable.
 *
 * @param <T> the document type
 * @param collection the collection
 * @param fromKey the start value (excluding)
 * @param toKey the end value (excluding)
 * @param indexedProperty the name of the indexed property (optional)
 * @param startValue the minimum value of the indexed property
 * @param limit the maximum number of entries to return
 * @return the list (possibly empty)
 */
@Nonnull
<T extends Document> List<T> query(Collection<T> collection,
   String fromKey,
   String toKey,
   String indexedProperty,
   long startValue,
   int limit);

Can't we use this method to at least narrow down the query to the
lower bound? I think for the purpose of the last rev seeker, this
should be sufficient.
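
As a rough illustration of the lower-bound idea, the following Java sketch emulates the semantics described in the Javadoc above with an in-memory map: keys strictly between fromKey and toKey, indexed property at least startValue, at most limit results. Everything here is a stand-in. The class is hypothetical, the keys and timestamps are invented, and reading "minimum value" as an inclusive bound is an assumption taken from the Javadoc wording, not from the Oak implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory stand-in for a document store; hypothetical, not Oak code.
class LowerBoundQuerySketch {

    // Documents sorted by key; each maps a property name to a numeric value.
    private final TreeMap<String, Map<String, Long>> docs = new TreeMap<>();

    void put(String key, String property, long value) {
        docs.computeIfAbsent(key, k -> new HashMap<>()).put(property, value);
    }

    // Keys strictly between fromKey and toKey whose indexed property
    // is at least startValue, up to 'limit' entries.
    List<String> query(String fromKey, String toKey,
                       String indexedProperty, long startValue, int limit) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Map<String, Long>> e
                : docs.subMap(fromKey, false, toKey, false).entrySet()) {
            if (result.size() >= limit) {
                break;
            }
            Long value = e.getValue().get(indexedProperty);
            if (value != null && value >= startValue) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        LowerBoundQuerySketch store = new LowerBoundQuerySketch();
        store.put("1:/a", "_modified", 100L);
        store.put("1:/b", "_modified", 250L);
        store.put("1:/c", "_modified", 300L);
        // Only documents modified at or after 200 remain candidates.
        System.out.println(store.query("0", "9", "_modified", 200L, 10));
    }
}
```

With the lower bound in place, a seeker would only have to inspect documents modified since some known point in time rather than the whole collection, which is the narrowing suggested here.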

Regards
 Marcel



On 26/08/14 10:18, Amit Jain am...@ieee.org wrote:

Hi,

 OK, so can we put what's needed into the DocumentStore API, or
alternatively have an extension interface, that both MongoDocumentStore
and
RDBDocumentStore could implement?

It would make sense to add a generic method which queries on a particular
property(possibly limiting to only indexed ones), like below, to the
DocumentStore interface.
<T extends Document> List<T> queryProperty(Collection<T> collection,
   String indexedProperty,
   String fromKey,
   String toKey,
   int limit);
Thoughts?

Thanks
Amit

On Tue, Aug 26, 2014 at 12:03 PM, Julian Reschke 
julian.resc...@greenbytes.de wrote:

 On 2014-08-26 08:03, Amit Jain wrote:

 Hi Julian,

 The LastRevRecoveryAgent is executed at 2 places
 1. On DocumentNodeStore startup where the MissingLastRevSeeker is used
to
 get potential candidates for recovery.
   2. At regular intervals defined by the property
 'lastRevRecoveryJobIntervalInSecs' in the DocumentNodeStoreService
 (default
 60 seconds). Short description is that MissingLastRevSeeker will be
called
 rarely in this case.
 Long description - In this case a less expensive query is executed to
find
 out all the stale clusterNodes for which recovery is to be performed.
If
 there are clusterNodes that have unexpectedly shutdown and their
 'leaseEndTime' has not expired then MissingLastRevSeeker will check all
 potential candidates.

  Proposal: if this code *is* used regularly, we'll need an API so that

 DocumentStore implementations other than Mongo can optimize the query.
 +1. Since, It will be executed on every startup. RDBDocumentStore
already
 maintains the index on _modified property so, optimized querying is
 possible.

 Thanks
 Amit


 OK, so can we put what's needed into the DocumentStore API, or
 alternatively have an extension interface, that both MongoDocumentStore
and
 RDBDocumentStore could implement?

 Best regards, Julian




Re: MissingLastRevSeeker

2014-08-26 Thread Amit Jain
Hi,

I was proposing the additional method for cases where we want to query
indexed properties other than _id, as needed in MongoBlobReferenceIterator
and MongoMissingLastRevSeeker.

But,
 Can't we use this method to at least narrow down the query to the
 lower bound? I think for the purpose of the last rev seeker, this
 should be sufficient.
Yes, this should speed up the query compared to what we have currently. So
for now we can make this change and see whether further improvement is
necessary. I will create a JIRA issue to track this.



On Tue, Aug 26, 2014 at 2:37 PM, Marcel Reutegger mreut...@adobe.com
wrote:

 Hi,

 I would only add it if really necessary. We already have a very
 similar method:

 /**
  * Get a list of documents where the key is greater than a start value and
  * less than an end value. The returned documents are immutable.
  *
  * @param <T> the document type
  * @param collection the collection
  * @param fromKey the start value (excluding)
  * @param toKey the end value (excluding)
  * @param indexedProperty the name of the indexed property (optional)
  * @param startValue the minimum value of the indexed property
  * @param limit the maximum number of entries to return
  * @return the list (possibly empty)
  */
 @Nonnull
 <T extends Document> List<T> query(Collection<T> collection,
    String fromKey,
    String toKey,
    String indexedProperty,
    long startValue,
    int limit);

 Can't we use this method to at least narrow down the query to the
 lower bound? I think for the purpose of the last rev seeker, this
 should be sufficient.

 Regards
  Marcel



 On 26/08/14 10:18, Amit Jain am...@ieee.org wrote:

 Hi,
 
  OK, so can we put what's needed into the DocumentStore API, or
 alternatively have an extension interface, that both MongoDocumentStore
 and
 RDBDocumentStore could implement?
 
 It would make sense to add a generic method which queries on a particular
 property(possibly limiting to only indexed ones), like below, to the
 DocumentStore interface.
 <T extends Document> List<T> queryProperty(Collection<T> collection,
    String indexedProperty,
    String fromKey,
    String toKey,
    int limit);
 Thoughts?
 
 Thanks
 Amit
 
 On Tue, Aug 26, 2014 at 12:03 PM, Julian Reschke 
 julian.resc...@greenbytes.de wrote:
 
  On 2014-08-26 08:03, Amit Jain wrote:
 
  Hi Julian,
 
  The LastRevRecoveryAgent is executed at 2 places
  1. On DocumentNodeStore startup where the MissingLastRevSeeker is used
 to
  get potential candidates for recovery.
2. At regular intervals defined by the property
  'lastRevRecoveryJobIntervalInSecs' in the DocumentNodeStoreService
  (default
  60 seconds). Short description is that MissingLastRevSeeker will be
 called
  rarely in this case.
  Long description - In this case a less expensive query is executed to
 find
  out all the stale clusterNodes for which recovery is to be performed.
 If
  there are clusterNodes that have unexpectedly shutdown and their
  'leaseEndTime' has not expired then MissingLastRevSeeker will check all
  potential candidates.
 
   Proposal: if this code *is* used regularly, we'll need an API so that
 
  DocumentStore implementations other than Mongo can optimize the query.
  +1. Since, It will be executed on every startup. RDBDocumentStore
 already
  maintains the index on _modified property so, optimized querying is
  possible.
 
  Thanks
  Amit
 
 
  OK, so can we put what's needed into the DocumentStore API, or
  alternatively have an extension interface, that both MongoDocumentStore
 and
  RDBDocumentStore could implement?
 
  Best regards, Julian
 




Re: reindex improvements

2014-08-26 Thread Nicolas Peltier
Hi Davide, 

this would be nice indeed! wouldn’t that be “indexPath”, not “re-indexPath” ?

Nicolas
On 26 Aug 2014, at 12:04, Davide Giannella dav...@apache.org wrote:

 Hello team,
 
 when we trigger a reindex by setting `reindex=true` on the index
 definition, the algorithm scans the whole repository and feeds every
 node to the specified index as modified/added.
 
 While this works with small repositories, it doesn't really scale to
 big ones.
 
 To take an extreme example, say we have a repository with 2 million nodes
 and only 1 node with the required property. The reindex will keep going
 until all 2 million nodes have been scanned. And with very active
 repositories where we change a lot of nodes, manually or not, we could
 virtually have an endless reindexing.
 
 Based on my experience with content repositories, clients are normally
 interested in querying only parts of them, for example /content.
 
 I was thinking that it could add good value if we added an
 additional property to the index definition: reindexPaths (multivalue,
 String).
 
 When this property is specified, the reindex will happen only on those
 paths, in the order they are specified, and it could potentially make
 the content indexed so far available to the query engine for
 returning partial results each time a path is completed.
 
 A single path could be just a path or a glob/regex. I'm for using a Java
 regex as it gives the end user a lot of power for fine-tuning, but on the
 other hand regex evaluation is pretty slow...
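
One way to picture the plain-path versus regex trade-off is the Java sketch below. It is purely hypothetical: neither the class nor the "regex:" prefix convention exists in Oak, and both are assumptions made for illustration. Plain entries are matched with a cheap prefix test, and regex entries are compiled once up front so that the per-node cost is only the match itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical matcher for a 'reindexPaths' property; not an Oak API.
class ReindexPathMatcher {

    private final List<String> prefixes = new ArrayList<>();
    private final List<Pattern> patterns = new ArrayList<>();

    // Entries starting with "regex:" (an assumed convention) are compiled
    // once; all other entries are treated as plain subtree paths.
    ReindexPathMatcher(List<String> reindexPaths) {
        for (String entry : reindexPaths) {
            if (entry.startsWith("regex:")) {
                patterns.add(Pattern.compile(entry.substring("regex:".length())));
            } else {
                prefixes.add(entry);
            }
        }
    }

    // True if the node path is covered by any configured entry.
    boolean matches(String path) {
        for (String prefix : prefixes) {
            if (prefix.equals("/") || path.equals(prefix) || path.startsWith(prefix + "/")) {
                return true;
            }
        }
        for (Pattern pattern : patterns) {
            if (pattern.matcher(path).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        ReindexPathMatcher matcher = new ReindexPathMatcher(
                Arrays.asList("/content", "regex:/etc/[^/]+/tags"));
        System.out.println(matcher.matches("/content/foo"));
        System.out.println(matcher.matches("/etc/site/tags"));
        System.out.println(matcher.matches("/var/audit"));
    }
}
```

A plain subtree path has a second advantage that a regex lacks: the traversal can skip everything outside the subtree, whereas a regex generally has to be evaluated against every node path.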
 
 thoughts?
 
 Cheers
 Davide
 
 
 



Re: [VOTE] Release Apache Jackrabbit Oak 1.0.5

2014-08-26 Thread Julian Reschke

On 2014-08-26 08:42, Thomas Mueller wrote:
 ...

Please vote on releasing this package as Apache Jackrabbit Oak 1.0.5.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 Jackrabbit PMC votes are cast.

 [ ] +1 Release this package as Apache Jackrabbit Oak 1.0.5
 [ ] -1 Do not release this package because...
...


[X] +1 Release this package as Apache Jackrabbit Oak 1.0.5


Re: [DISCUSS] supporting faceting in Oak query engine

2014-08-26 Thread Chetan Mehrotra
This looks useful Tommaso. With OAK-2005 we should be able to support
multiple LuceneIndexes and manage them easily.

If we can abstract all this out and just expose the facet information
as virtual nodes, that would simplify things for end users. We could
probably have a read-only NodeStore implementation that exposes the faceted
data bound to a system path. Otherwise we would need to expose the Lucene
API and OakDirectory.
Chetan Mehrotra


On Tue, Aug 26, 2014 at 1:28 PM, Tommaso Teofili
tommaso.teof...@gmail.com wrote:
 2014-08-25 19:02 GMT+02:00 Lukas Smith sm...@pooteeweet.org:

 Aloha,


 Aloha!



 you should definitely talk to the HippoCMS developers. They forked
 Jackrabbit 2.x to add facetting as virtual nodes. They ran into some
 performance issues but I am sure they still have value-able feedback on
 this.


 Cool, thanks for letting us know, if you or any other (from Hippo) would
 like to give some more insight on pros and cons of such an approach that'd
 be very good.

 Regards,
 Tommaso



 regards,
 Lukas Kahwe Smith

  On 25 Aug 2014, at 18:43, Laurie Byrum lby...@adobe.com wrote:
 
  Hi Tommaso,
  I am happy to see this thread!
 
  Questions:
  Do you expect to want to support hierarchical or pivoted facets soonish?
  If so, does that influence this decision?
  Do you know how ACLs will come into play with your facet implementation?
  If so, does that influence this decision? :-)
 
  Thanks!
  Laurie
 
 
 
  On 8/25/14 7:08 AM, Tommaso Teofili tommaso.teof...@gmail.com
 wrote:
 
  Hi all,
 
  since this has been asked every now and then [1] and since I think it's
 a
  pretty useful and common feature for search engine nowadays I'd like to
  discuss introduction of facets [2] for the Oak query engine.
 
  Pros: having facets in search results usually helps filtering (drill
 down)
  the results before browsing all of them, so the main usage would be for
  client code.
 
  Impact: probably change / addition in both the JCR and Oak APIs to
 support
  returning other than just nodes (a NodeIterator and a Cursor
  respectively).
 
  Right now a couple of ideas on how we could do that come to my mind,
 both
  based on the approach of having an Oak index for them:
  1. a (multivalued) property index for facets, meaning we would store the
  facets in the repository, so that we would run a query against it to
 have
  the facets of an originating query.
  2. a dedicated QueryIndex implementation, eventually leveraging Lucene
  faceting capabilities, which could use the Lucene index we already
 have,
  together with a sidecar index [3].
 
  What do you think?
  Regards,
  Tommaso
 
  [1] : http://markmail.org/search/?q=oak%20faceting#query:oak%20faceting%20list%3Aorg.apache.jackrabbit.oak-dev+page:1+state:facets
  [2] : http://en.wikipedia.org/wiki/Faceted_search
  [3] : http://lucene.apache.org/core/4_0_0/facet/org/apache/lucene/facet/doc-files/userguide.html
 



Re: MissingLastRevSeeker

2014-08-26 Thread Julian Reschke

On 2014-08-26 08:03, Amit Jain wrote:

Hi Julian,

The LastRevRecoveryAgent is executed at 2 places
1. On DocumentNodeStore startup where the MissingLastRevSeeker is used to
get potential candidates for recovery.


Sure? I've been logging it, and I don't see it called on every startup...


 ...


Best regards, Julian


Re: MissingLastRevSeeker

2014-08-26 Thread Julian Reschke

On 2014-08-26 11:32, Amit Jain wrote:

Hi,

I was proposing the additional method for cases where we want to query the
indexed properties other than _id like needed in MongoBlobReferenceIterator
and MongoMissingLastRevSeeker.

But,

Can't we use this method to at least narrow down the query to the
lower bound? I think for the purpose of the last rev seeker, this
should be sufficient.

Yes, this should speed up the query from what we have currently. So, right
now we can make this change and see if further improvement is necessary.
Will create a jira to track this.
...


+1.

I can take over, if you want...



Re: reindex improvements

2014-08-26 Thread Justin Edelson
Hi Davide,
So what would happen to the already-indexed content which wasn't in
one of the reindexPaths?

For example, let's say I'm building an index of a property called
keywords. In the repo, I have:

/content/foo@keywords=something
/content/bar/one@keywords=something
/content/bar/two@keywords=something

And then I trigger a reindex with reindexPaths = /content/bar.

Would //element(*)[@keywords='something'] still return /content/foo ?

Regards,
Justin


On Tue, Aug 26, 2014 at 6:04 AM, Davide Giannella dav...@apache.org wrote:
 Hello team,

 when we issue the reindex by changing the index definition with
 `reindex=true` the algorithm scan all the repository and issue the node
 modified/added to the specified index.

 While this works with small repositories it doesn't really scale with
 big ones.

 So for taking an extreme example, we have 2 millions node repository
 with only 1 node with the required property. The reindex will keep going
 for as long the 2m node have not been scanned. And with very active
 repositories where we changes a lot of nodes, manually or not, we could
 virtually have an endless reindexing.

 Based on my experience with content repositories normally clients are
 interested in querying only parts of it. For example /content.

 I was thinking that it could be a good added value if we could add an
 additional property to the index definition: reindexPaths (multivalue,
 String).

 When this property is specified, the reindex will happens only on those
 paths in the order as they are specified and it could potentially makes
 the currently indexed content available to the query engine for
 returning partial results when every path is completed.

 A single path could be just path or a glob/regex. I'm for using a java
 regex as it gives the end user a lot of power on fine tuning but on the
 other hand regex evaluation is pretty slow...

 thoughts?

 Cheers
 Davide





Re: reindex improvements

2014-08-26 Thread Davide Giannella
On 26/08/2014 11:27, Nicolas Peltier wrote:
 Hi Davide, 

 this would be nice indeed! wouldn’t that be “indexPath”, not “re-indexPath” ?

I'd rather keep a sort of namespace in the property naming. By calling it
`reindexPath` it should be clear that it is only related to reindexing, and
that if the index is global (under /oak:index) it will still index the
whole repository.

Any other opinions? I'm not yet convinced by my own idea. Something about
it smells to me. :)

Cheers
Davide




Re: reindex improvements

2014-08-26 Thread Davide Giannella
On 26/08/2014 14:13, Justin Edelson wrote:
 Hi Davide,
 So what would happen to the already-indexed content which wasn't in
 one of the reindexPaths?

 For example, let's say I'm building an index of a property called
 keywords. In the repo, I have:

 /content/foo@keywords=something
 /content/bar/one@keywords=something
 /content/bar/two@keywords=something

 And then I trigger a reindex with reindexPaths = /content/bar.

 Would //element(*)[@keywords='something'] still return /content/foo ?

In my idea, no.

Currently, when reindexing, the :index node, where the actual index is
stored, is deleted and recreated.

I would keep the same approach. I'm thinking of this as an advanced
feature that someone has to know how to use. So in the above example
I would specify either: /content or /content/bar, /content/foo.

It's a dangerous thing though. I can see it. :)
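
The effect described above can be sketched in a few lines of Java. This is a toy model under stated assumptions, not Oak code: the repository and the index are plain in-memory maps, the class and method names are invented, and clearing the map stands in for deleting and recreating the :index node.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model of a path-restricted reindex; hypothetical, not Oak code.
class RestrictedReindexSketch {

    // Repository content: node path -> value of the 'keywords' property.
    private final Map<String, String> repository = new LinkedHashMap<>();
    // Index content: property value -> paths of the nodes carrying it.
    private final Map<String, Set<String>> index = new HashMap<>();

    void addNode(String path, String keywords) {
        repository.put(path, keywords);
    }

    // Drops the whole index, then re-adds only nodes under the given paths.
    void reindex(List<String> reindexPaths) {
        index.clear(); // stands in for deleting and recreating the :index node
        for (Map.Entry<String, String> node : repository.entrySet()) {
            if (isUnderAny(node.getKey(), reindexPaths)) {
                index.computeIfAbsent(node.getValue(), v -> new TreeSet<>())
                        .add(node.getKey());
            }
        }
    }

    // Answers the query from the index only, as an index-backed query would.
    Set<String> query(String keywords) {
        return index.getOrDefault(keywords, Collections.<String>emptySet());
    }

    private static boolean isUnderAny(String path, List<String> roots) {
        for (String root : roots) {
            if (path.equals(root) || path.startsWith(root + "/")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        RestrictedReindexSketch repo = new RestrictedReindexSketch();
        repo.addNode("/content/foo", "something");
        repo.addNode("/content/bar/one", "something");
        repo.addNode("/content/bar/two", "something");
        repo.reindex(Arrays.asList("/content/bar"));
        // /content/foo is no longer found although the node still exists.
        System.out.println(repo.query("something"));
    }
}
```

In this model, /content/foo silently drops out of query results after the restricted reindex, which is the risk acknowledged above.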

D.




Re: reindex improvements

2014-08-26 Thread Justin Edelson
Hi,

On Tue, Aug 26, 2014 at 10:01 AM, Davide Giannella dav...@apache.org wrote:
 On 26/08/2014 14:13, Justin Edelson wrote:
 Hi Davide,
 So what would happen to the already-indexed content which wasn't in
 one of the reindexPaths?

 For example, let's say I'm building an index of a property called
 keywords. In the repo, I have:

 /content/foo@keywords=something
 /content/bar/one@keywords=something
 /content/bar/two@keywords=something

 And then I trigger a reindex with reindexPaths = /content/bar.

 Would //element(*)[@keywords='something'] still return /content/foo ?

 In my idea no.

 Currently when reindexing the :index node, where the actual index is
 stored, is deleted and recreated.

 I would keep the same approach. I'm thinking of this as an advanced
 feature that someone has to know how to use it. So in the above example
 I would specify either: /content or /content/bar, /content/foo.

 It's a dangerous thing though. I can see it. :)

In this case, I think Thomas's suggestion makes much more sense. Let's
just add a property to the QID which allows an index to be restricted
to particular paths.

Regards,
Justin


 D.