Hi Davide,

So what would happen to the already-indexed content that wasn't in one of the reindexPaths?
For example, let's say I'm building an index of a property called "keywords". In the repo, I have:

/content/foo@keywords=something
/content/bar/one@keywords=something
/content/bar/two@keywords=something

And then I trigger a reindex with reindexPaths = /content/bar. Would

//element(*)[@keywords='something']

still return /content/foo?

Regards,
Justin

On Tue, Aug 26, 2014 at 6:04 AM, Davide Giannella <dav...@apache.org> wrote:
> Hello team,
>
> when we issue the reindex by changing the index definition with
> `reindex=true`, the algorithm scans the whole repository and issues
> "node modified/added" to the specified index.
>
> While this works with small repositories, it doesn't really scale to
> big ones.
>
> To take an extreme example, say we have a 2-million-node repository
> with only 1 node that has the required property. The reindex will keep
> going until all 2M nodes have been scanned. And with very active
> repositories, where a lot of nodes change, manually or not, we could
> end up with a virtually endless reindex.
>
> In my experience with content repositories, clients are normally
> interested in querying only part of it, for example /content.
>
> I was thinking it would add value to have an additional property in
> the index definition: reindexPaths (multivalue, String).
>
> When this property is specified, the reindex will happen only on those
> paths, in the order they are specified, and it could potentially make
> the content indexed so far available to the query engine, returning
> partial results as each path is completed.
>
> A single path could be just a plain path or a glob/regex. I'm in favour
> of using a Java regex, as it gives the end user a lot of fine-tuning
> power, but on the other hand regex evaluation is pretty slow...
>
> thoughts?
>
> Cheers
> Davide
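To make the plain-path vs. regex trade-off concrete, here is a minimal sketch of how a reindexPaths filter might decide whether a node gets re-fed to the index. Everything here is hypothetical and not Oak API: the class name `ReindexPathFilter`, the method `includes`, and the `regex:` prefix used to tell regex entries apart from plain paths are all assumptions for illustration. Plain entries are treated as subtree roots and checked with cheap string comparisons; regex entries are compiled once with `java.util.regex.Pattern` and matched per node, which is where the cost Davide mentions would show up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of a reindexPaths filter (not Oak API).
 * Plain entries match the path itself and its whole subtree; entries
 * prefixed with "regex:" (an assumed convention) are full Java regexes.
 */
public class ReindexPathFilter {
    private final List<String> plainPaths = new ArrayList<>();
    private final List<Pattern> patterns = new ArrayList<>();

    public ReindexPathFilter(List<String> reindexPaths) {
        for (String entry : reindexPaths) {
            if (entry.startsWith("regex:")) {
                // Compile once up front; per-node matching is the costly part.
                patterns.add(Pattern.compile(entry.substring("regex:".length())));
            } else {
                plainPaths.add(entry);
            }
        }
    }

    /** Returns true if the node at this path should be (re)indexed. */
    public boolean includes(String path) {
        for (String root : plainPaths) {
            // Cheap string check: the root itself or anything below it.
            String prefix = root.endsWith("/") ? root : root + "/";
            if (path.equals(root) || path.startsWith(prefix)) {
                return true;
            }
        }
        for (Pattern p : patterns) {
            // Full-match semantics: the whole path must match the regex.
            if (p.matcher(path).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        ReindexPathFilter filter = new ReindexPathFilter(List.of("/content/bar"));
        System.out.println(filter.includes("/content/foo"));     // false
        System.out.println(filter.includes("/content/bar/one")); // true
        System.out.println(filter.includes("/content/barn"));    // false
    }
}
```

Note that a filter like this only decides which nodes are re-fed to the index; it does not by itself settle Justin's question of what happens to entries already indexed outside those paths (for example /content/foo).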