[jira] [Commented] (OAK-5937) Disable query where path restriction is not absolute

2017-11-20 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-5937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16259110#comment-16259110
 ] 

Chetan Mehrotra commented on OAK-5937:
--

[~tmueller] Should this be done? (See the last comment above.)

> Disable query where path restriction is not absolute
> 
>
> Key: OAK-5937
> URL: https://issues.apache.org/jira/browse/OAK-5937
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: query
>Reporter: Chetan Mehrotra
>Priority: Minor
> Fix For: 1.8
>
>
> Queries like the one below cannot be executed in a performant way. We should provide an 
> option to reject such queries:
> //content/foo/bar



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (OAK-5661) Make NRT indexing resilient against unbounded growth

2017-11-20 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-5661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra updated OAK-5661:
-
Fix Version/s: (was: 1.8)
   1.10

> Make NRT indexing resilient against unbounded growth
> 
>
> Key: OAK-5661
> URL: https://issues.apache.org/jira/browse/OAK-5661
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
> Fix For: 1.10
>
>
> NRT indexes for volatile indexes [1] can grow large if the async index update 
> runs into problems. For example, if it gets stuck for days, or if due to some bug 
> the index does not get updated (as in OAK-5649), the sizes can grow very large.
> For such cases we should add checks so that the system can ensure 
> that some cleanup is performed or that writes to the indexes are stopped. Such a 
> situation should also be flagged.
> [1] Indexes which see lots of additions and deletions. The effective index 
> size is small; however, if deletions are not applied (as is the case with 
> NRT), such indexes can grow large.





[jira] [Commented] (OAK-5927) Load excerpt lazily

2017-11-20 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-5927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16259111#comment-16259111
 ] 

Chetan Mehrotra commented on OAK-5927:
--

[~teofili] Can we do this for the 1.8 release?

> Load excerpt lazily
> ---
>
> Key: OAK-5927
> URL: https://issues.apache.org/jira/browse/OAK-5927
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene
>Reporter: Chetan Mehrotra
>  Labels: performance
> Fix For: 1.8
>
>
> Currently LucenePropertyIndex loads the excerpts eagerly, in batches, as part of the 
> loadDocs call. The load docs batch size doubles, starting from 50 (max 100k), 
> as more data is read. 
> We should look into ways to load the excerpt lazily, as and when the caller 
> asks for it.
> Note that currently excerpts are only loaded when the query requests an 
> excerpt, i.e. there is a not-null property restriction for {{rep:excerpt}}. 
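
One way to picture the lazy loading asked for above is a memoizing wrapper that defers the expensive load until the first access. This is only a sketch: LazyValue and the loader lambda are hypothetical names for illustration and are not part of LucenePropertyIndex.

```java
import java.util.function.Supplier;

// Hypothetical helper: wraps an expensive loader and runs it at most once,
// on the first call to get().
class LazyValue<T> implements Supplier<T> {
    private final Supplier<T> loader;
    private T value;
    private boolean loaded;

    LazyValue(Supplier<T> loader) {
        this.loader = loader;
    }

    @Override
    public synchronized T get() {
        if (!loaded) {
            value = loader.get(); // the expensive work happens here, once
            loaded = true;
        }
        return value;
    }
}

class LazyExcerptDemo {
    public static void main(String[] args) {
        int[] loads = {0};
        // The lambda stands in for whatever actually computes an excerpt.
        LazyValue<String> excerpt = new LazyValue<>(() -> {
            loads[0]++;
            return "excerpt text";
        });
        System.out.println("loads before access: " + loads[0]);
        excerpt.get();
        excerpt.get();
        System.out.println("loads after two accesses: " + loads[0]);
    }
}
```

With this shape, a result row that is never asked for its excerpt never pays the loading cost.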





[jira] [Updated] (OAK-5924) Prevent long running query from delaying refresh of index

2017-11-20 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra updated OAK-5924:
-
Fix Version/s: (was: 1.8)
   1.10

> Prevent long running query from delaying refresh of index
> -
>
> Key: OAK-5924
> URL: https://issues.apache.org/jira/browse/OAK-5924
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
> Fix For: 1.10
>
>
> Whenever the index gets updated, {{IndexTracker}} detects the changes, opens 
> new {{IndexNode}} instances and closes the old ones. This flow blocks until 
> all old IndexNode instances are closed.
> IndexNode close itself relies on a writer lock. It can happen that a long 
> running query, i.e. a query which is about to read a page of large size, is 
> currently executing on the old IndexNode instance. If this query is trying to load 
> 100k docs and is very slow (due to loading of excerpts), then it 
> prevents the IndexNode from getting closed. This in turn prevents 
> the index from seeing the latest data, so it becomes stale.
> To make query and indexing more resilient we should check whether the IndexNode 
> currently being used for a query is closing. If it is closing, the query should open a 
> fresh searcher.
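
The proposed check could look roughly like the sketch below. ClosableNode is a hypothetical stand-in, not Oak's IndexNode: readers hold a read lock while they work, close() needs the write lock, and a "closing" flag lets a long-running reader notice between pages that it should release the node and continue on a fresh one.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a node whose close() must wait for active readers.
class ClosableNode {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private volatile boolean closing;

    // Returns true (holding the read lock) if the node is usable,
    // false if it is closing and the caller should look for a fresh node.
    boolean acquire() {
        if (closing) {
            return false;
        }
        lock.readLock().lock();
        if (closing) {
            // close() was requested while we were waiting for the lock
            lock.readLock().unlock();
            return false;
        }
        return true;
    }

    void release() {
        lock.readLock().unlock();
    }

    // Long-running readers poll this between pages and release early if true.
    boolean isClosing() {
        return closing;
    }

    void close() {
        closing = true; // signal readers before waiting for them
        lock.writeLock().lock();
        try {
            // resources would be released here
        } finally {
            lock.writeLock().unlock();
        }
    }
}

class ClosableNodeDemo {
    public static void main(String[] args) {
        ClosableNode node = new ClosableNode();
        System.out.println("acquired before close: " + node.acquire());
        node.release();
        node.close();
        System.out.println("acquired after close: " + node.acquire());
    }
}
```

Setting the flag before taking the write lock is the design choice that matters here: a reader can see the request to close even while it still holds the read lock.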





[jira] [Updated] (OAK-5791) Reduce number of calls while adding a new node

2017-11-20 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra updated OAK-5791:
-
Fix Version/s: (was: 1.8)
   1.10

> Reduce number of calls while adding a new node 
> ---
>
> Key: OAK-5791
> URL: https://issues.apache.org/jira/browse/OAK-5791
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: documentmk
>Reporter: Chetan Mehrotra
> Fix For: 1.10
>
>
> Adding a new child node currently takes 2 remote calls. We should look into 
> reducing this to 1





[jira] [Updated] (OAK-5553) Index async index in a new lane without blocking the main lane

2017-11-20 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-5553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra updated OAK-5553:
-
Fix Version/s: (was: 1.8)
   1.10

> Index async index in a new lane without blocking the main lane
> --
>
> Key: OAK-5553
> URL: https://issues.apache.org/jira/browse/OAK-5553
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: indexing
>Reporter: Chetan Mehrotra
> Fix For: 1.10
>
>
> Currently, if an async index has to be reindexed for any reason (say, an update of the 
> index definition), this process blocks the indexing of other indexes on 
> that lane. 
> For example, suppose the "async" lane has 2 indexes, /oak:index/fooIndex and 
> /oak:index/barIndex, and fooIndex needs to be reindexed. In such a case 
> AsyncIndexUpdate currently works on the reindexing, and until that 
> completes the other index does not receive any update. If the reindexing takes, say, 1 
> day, then the other index lags behind by that much time. Note that NRT 
> indexing would help somewhat here.
> To improve this we can implement something similar to what was done for 
> property indexes in OAK-1456, i.e. provide a way where 
> # an admin can trigger a reindex of some async indexes
> # those indexes are moved to a different lane and then reindexed
> # post-reindexing logic then moves them back to their original lane
> Further, this task can then be performed on a non-leader node, as the indexes 
> would not be part of any active lane. We may also implement it as part of 
> oak-run.





[jira] [Updated] (OAK-5458) Test failure: RepositoryBootIT.repositoryLogin

2017-11-20 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-5458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra updated OAK-5458:
-
Fix Version/s: (was: 1.8)
   1.10

> Test failure: RepositoryBootIT.repositoryLogin
> --
>
> Key: OAK-5458
> URL: https://issues.apache.org/jira/browse/OAK-5458
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: continuous integration, examples
>Affects Versions: 1.4, 1.5.18
>Reporter: Hudson
>Assignee: Chetan Mehrotra
>  Labels: test-failure, ubuntu
> Fix For: 1.4.19, 1.6.7, 1.10
>
> Attachments: unit-tests-1379.log
>
>
> Jenkins CI failure: 
> https://builds.apache.org/job/Apache%20Jackrabbit%20Oak%20matrix/
> The build Apache Jackrabbit Oak matrix/Ubuntu Slaves=ubuntu,jdk=JDK 1.8 
> (latest),nsfixtures=SEGMENT_MK,profile=integrationTesting #1369 has failed.
> First failed run: [Apache Jackrabbit Oak matrix/Ubuntu Slaves=ubuntu,jdk=JDK 
> 1.8 (latest),nsfixtures=SEGMENT_MK,profile=integrationTesting 
> #1369|https://builds.apache.org/job/Apache%20Jackrabbit%20Oak%20matrix/Ubuntu%20Slaves=ubuntu,jdk=JDK%201.8%20(latest),nsfixtures=SEGMENT_MK,profile=integrationTesting/1369/]
>  [console 
> log|https://builds.apache.org/job/Apache%20Jackrabbit%20Oak%20matrix/Ubuntu%20Slaves=ubuntu,jdk=JDK%201.8%20(latest),nsfixtures=SEGMENT_MK,profile=integrationTesting/1369/console]





[jira] [Updated] (OAK-5121) review CommitInfo==null in BackgroundObserver with isExternal change

2017-11-20 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-5121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra updated OAK-5121:
-
Fix Version/s: (was: 1.8)
   1.10

> review CommitInfo==null in BackgroundObserver with isExternal change
> 
>
> Key: OAK-5121
> URL: https://issues.apache.org/jira/browse/OAK-5121
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: core
>Affects Versions: 1.5.13
>Reporter: Stefan Egli
>Assignee: Chetan Mehrotra
> Fix For: 1.10
>
> Attachments: OAK-5121.patch
>
>
> OAK-4898 changes CommitInfo to be never null. This is the case outside of the 
> BackgroundObserver - but in the BackgroundObserver itself it is explicitly 
> set to null when compacting. 
> Once OAK-4898 is committed this task is about reviewing the implications in 
> BackgroundObserver wrt compaction and CommitInfo==null





[jira] [Updated] (OAK-4643) Support multiple readers in suggester, spellcheck and faceted search

2017-11-20 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-4643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra updated OAK-4643:
-
Fix Version/s: (was: 1.8)
   1.10

> Support multiple readers in suggester, spellcheck and faceted search
> 
>
> Key: OAK-4643
> URL: https://issues.apache.org/jira/browse/OAK-4643
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene
>Reporter: Chetan Mehrotra
> Fix For: 1.10
>
>
> As part of OAK-4566 normal search has been modified to support multiple 
> readers. However, for suggester, spellcheck and faceted search the logic 
> still works on the assumption of a single reader. 
> Those parts need to be adapted to support multiple readers.





[jira] [Updated] (OAK-4647) Multiplexing support in PropertyIndexStats MBean

2017-11-20 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-4647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra updated OAK-4647:
-
Fix Version/s: (was: 1.8)
   1.10

> Multiplexing support in PropertyIndexStats MBean
> 
>
> Key: OAK-4647
> URL: https://issues.apache.org/jira/browse/OAK-4647
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: property-index
>Reporter: Chetan Mehrotra
>Priority: Minor
> Fix For: 1.10
>
>
> The {{PropertyIndexStats}} MBean added in OAK-4144 allows introspecting property 
> index content. It needs to be adapted to support the updated storage format 
> used when multiplexing is enabled.





[jira] [Updated] (OAK-4651) Clarify equality contract for CommitInfo

2017-11-20 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-4651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra updated OAK-4651:
-
Fix Version/s: (was: 1.8)

> Clarify equality contract for CommitInfo
> 
>
> Key: OAK-4651
> URL: https://issues.apache.org/jira/browse/OAK-4651
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: core
>Reporter: Chetan Mehrotra
>Priority: Minor
>
> Currently {{CommitInfo}} implements equals in a way that also checks equality of the 
> info content. With OAK-4640 the info map also holds a reference to a mutable 
> CommitContext. 
> Due to this, CommitInfo cannot be used as a key in a map. The purpose of this task 
> is to determine whether an equality check needs to be performed on 
> CommitInfo at all, and whether CommitContext should be included in that equality 
> check.
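
The map-key problem described above can be reproduced with a few lines of plain Java. The Info class below is a hypothetical stand-in for a type whose equals and hashCode depend on mutable content; it is not Oak's CommitInfo.

```java
import java.util.HashMap;
import java.util.Map;

class MutableKeyDemo {
    // Hypothetical key type: equality and hash are derived from mutable content.
    static final class Info {
        final Map<String, Object> content = new HashMap<>();

        @Override
        public boolean equals(Object o) {
            return o instanceof Info && content.equals(((Info) o).content);
        }

        @Override
        public int hashCode() {
            return content.hashCode();
        }
    }

    // Stores a value under a key, mutates the key, then looks it up again.
    static boolean lookupSurvivesMutation() {
        Info key = new Info();
        Map<Info, String> map = new HashMap<>();
        map.put(key, "value");
        key.content.put("changed", Boolean.TRUE); // hashCode of the key changes here
        return map.containsKey(key);
    }

    public static void main(String[] args) {
        System.out.println("found after mutation: " + lookupSurvivesMutation());
    }
}
```

The entry is still in the map, but it was filed under the old hash, so a lookup with the mutated key no longer finds it. Identity-based equality, or excluding the mutable part from equals and hashCode, avoids this.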





[jira] [Updated] (OAK-3598) Export cache related classes for usage in other oak bundle

2017-11-20 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-3598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra updated OAK-3598:
-
Fix Version/s: (was: 1.8)
   1.10

> Export cache related classes for usage in other oak bundle
> --
>
> Key: OAK-3598
> URL: https://issues.apache.org/jira/browse/OAK-3598
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: cache
>Reporter: Chetan Mehrotra
>  Labels: tech-debt
> Fix For: 1.10
>
>
> For OAK-3092 oak-lucene would need to access classes from the 
> {{org.apache.jackrabbit.oak.cache}} package. For now it is limited to 
> {{CacheStats}}, to expose the cache related statistics.
> This task is meant to determine the steps needed to export the package:
> * Update the pom.xml to export the package
> * Review current set of classes to see if they need to be reviewed





[jira] [Updated] (OAK-2787) Faster multi threaded indexing / text extraction for binary content

2017-11-20 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra updated OAK-2787:
-
Fix Version/s: (was: 1.8)
   1.10

> Faster multi threaded indexing / text extraction for binary content
> ---
>
> Key: OAK-2787
> URL: https://issues.apache.org/jira/browse/OAK-2787
> Project: Jackrabbit Oak
>  Issue Type: Wish
>  Components: lucene
>Reporter: Chetan Mehrotra
> Fix For: 1.10
>
>
> With Lucene based indexing the indexing process is single threaded. This 
> hampers the indexing of binary content, as on a multi processor system only a 
> single thread can be used to perform the indexing.
> [~ianeboston] suggested a possible approach [1] involving 2 phase indexing:
> # In the first phase, detect the nodes to be indexed and start the full text 
> extraction of the binary content. Post extraction, save the binary token 
> stream back to the node as hidden data. In this phase the node properties 
> can still be indexed, and a marker field would be added to indicate that the 
> fulltext index is still pending.
> # Later, in the 2nd phase, look for all such Lucene docs and update them with 
> the saved token stream.
> This would allow the text extraction logic to be decoupled from the Lucene 
> indexing logic.
> [1] http://markmail.org/thread/2w5o4bwqsosb6esu





[jira] [Updated] (OAK-3150) Update Lucene to 6.x series

2017-11-20 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra updated OAK-3150:
-
Fix Version/s: (was: 1.8)
   1.10

> Update Lucene to 6.x series
> ---
>
> Key: OAK-3150
> URL: https://issues.apache.org/jira/browse/OAK-3150
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene
>Reporter: Chetan Mehrotra
>Assignee: Tommaso Teofili
>  Labels: technical_debt
> Fix For: 1.10
>
>
> We should look into updating the Lucene version to 6.x. Java 8 is the minimum 
> Java version required
> Note this is to be done for trunk only





[jira] [Created] (OAK-6972) DefaultIndexReader closes suggest directory multiple times

2017-11-22 Thread Chetan Mehrotra (JIRA)
Chetan Mehrotra created OAK-6972:


 Summary: DefaultIndexReader closes suggest directory multiple times
 Key: OAK-6972
 URL: https://issues.apache.org/jira/browse/OAK-6972
 Project: Jackrabbit Oak
  Issue Type: Bug
  Components: lucene
Affects Versions: 1.7.11
Reporter: Chetan Mehrotra
Assignee: Chetan Mehrotra
Priority: Minor
 Fix For: 1.8, 1.7.12


With OAK-6895, DefaultIndexReader now closes the CopyOnReadDirectory used for the 
suggester multiple times. This later leads to an exception:

{noformat}
21.11.2017 13:53:52.750 *WARN* [oak-lucene-2162] org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexProviderService Error occurred in asynchronous processing
org.apache.lucene.store.AlreadyClosedException: this Directory is closed
    at org.apache.lucene.store.BaseDirectory.ensureOpen(BaseDirectory.java:66)
    at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:338)
    at org.apache.jackrabbit.oak.plugins.index.lucene.directory.LocalIndexFile.getFSDir(LocalIndexFile.java:125)
    at org.apache.jackrabbit.oak.plugins.index.lucene.directory.LocalIndexFile.<init>(LocalIndexFile.java:43)
    at org.apache.jackrabbit.oak.plugins.index.lucene.IndexCopier.deleteFile(IndexCopier.java:276)
    at org.apache.jackrabbit.oak.plugins.index.lucene.directory.CopyOnReadDirectory.removeDeletedFiles(CopyOnReadDirectory.java:315)
    at org.apache.jackrabbit.oak.plugins.index.lucene.directory.CopyOnReadDirectory.access$300(CopyOnReadDirectory.java:51)
    at org.apache.jackrabbit.oak.plugins.index.lucene.directory.CopyOnReadDirectory$2.run(CopyOnReadDirectory.java:278)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
{noformat}

As a fix
# Avoid closing the directory multiple times in DefaultIndexReader
# Make CopyOnReadDirectory resilient to multiple close calls
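
The second point, tolerating repeated close calls, is commonly handled with an atomic "closed" flag. The class below is a hypothetical illustration of that pattern and is not the actual CopyOnReadDirectory change.

```java
import java.io.Closeable;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical resource whose cleanup must run at most once.
class IdempotentCloseable implements Closeable {
    private final AtomicBoolean closed = new AtomicBoolean();
    private final AtomicInteger cleanupRuns = new AtomicInteger();

    @Override
    public void close() {
        // compareAndSet succeeds only for the first caller; later calls are no-ops.
        if (!closed.compareAndSet(false, true)) {
            return;
        }
        cleanupRuns.incrementAndGet(); // stand-in for the real cleanup work
    }

    int cleanupRuns() {
        return cleanupRuns.get();
    }

    public static void main(String[] args) {
        IdempotentCloseable dir = new IdempotentCloseable();
        dir.close();
        dir.close();
        System.out.println("cleanup runs: " + dir.cleanupRuns());
    }
}
```

Using compareAndSet rather than a plain boolean keeps the guard safe when close is called from more than one thread.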





[jira] [Resolved] (OAK-6972) DefaultIndexReader closes suggest directory multiple times

2017-11-22 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-6972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra resolved OAK-6972.
--
Resolution: Fixed

Done with 1816019






[jira] [Resolved] (OAK-6257) Move the NodeStoreFixtureProvider support to oak-run-commons

2017-11-22 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-6257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra resolved OAK-6257.
--
   Resolution: Fixed
Fix Version/s: 1.7.12

Moved the package to oak-run-commons with 1816028

> Move the NodeStoreFixtureProvider support to oak-run-commons
> 
>
> Key: OAK-6257
> URL: https://issues.apache.org/jira/browse/OAK-6257
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: run
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
>Priority: Minor
> Fix For: 1.8, 1.7.12
>
>
> OAK-6210 introduced a generic consistent way to construct NodeStore via cli 
> options. Those classes can be moved to oak-run-commons so that they can be 
> used in benchmark module





[jira] [Commented] (OAK-6917) Configuration presets for DocumentNodeStoreService

2017-11-23 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16264221#comment-16264221
 ] 

Chetan Mehrotra commented on OAK-6917:
--

Abstracting out the config logic was long overdue! So it is good we have it now.

Looking at the patch, it uses a Preset class. This class does not have config 
policy set to required, so it can happen (mostly in theory) that this component gets 
activated without config and then DNSS gets activated (as the Preset reference 
becomes satisfied). Later the config gets delivered by ConfigurationAdmin 
and this component gets reactivated, thus causing DNSS to get reactivated. 

Given DocumentNodeStoreServiceConfiguration is already using ConfigAdmin, maybe 
we can use it to fetch the config for the preset also, and thus avoid this dummy 
reference.

PS: I am not sure how SCR and ConfigAdmin interact for a component where config 
policy is not required. Would it ensure that if config is already present in 
ConfigAdmin, then that config is always used for the first activation post start?

> Configuration presets for DocumentNodeStoreService
> --
>
> Key: OAK-6917
> URL: https://issues.apache.org/jira/browse/OAK-6917
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: documentmk
>Reporter: Marcel Reutegger
>Assignee: Marcel Reutegger
>Priority: Minor
> Fix For: 1.8
>
> Attachments: OAK-6917-alternative-approach.patch, OAK-6917.patch, 
> OAK-6917.patch
>
>
> When Oak is deployed in an OSGi container, applications usually want to ship 
> a default configuration which is different from the defaults present in Oak. 
> E.g. an application may want to use a default cache size of 1G for the 
> DocumentNodeStoreService instead of the default 256M. Now if a user of the 
> application provides a custom configuration and does not specify the cache 
> size, the value for this configuration will flip back to the Oak default of 
> 256M.
> There should be a way to configure presets for the application that are 
> different from the Oak defaults and then allow a user to customize the 
> configuration while still respecting the presets.
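
The lookup order described above (user configuration first, then application preset, then Oak default) can be sketched as a simple three-level resolution. PresetConfig and the key name cacheSizeMB are made up for illustration; this is not the approach taken in the attached patches.

```java
import java.util.HashMap;
import java.util.Map;

class PresetConfig {
    // Resolves a key by precedence: user config, then preset, then built-in default.
    static Object resolve(String key,
                          Map<String, Object> user,
                          Map<String, Object> presets,
                          Map<String, Object> defaults) {
        if (user.containsKey(key)) {
            return user.get(key);
        }
        if (presets.containsKey(key)) {
            return presets.get(key);
        }
        return defaults.get(key);
    }

    public static void main(String[] args) {
        Map<String, Object> defaults = new HashMap<>();
        defaults.put("cacheSizeMB", 256);   // built-in default from the issue text
        Map<String, Object> presets = new HashMap<>();
        presets.put("cacheSizeMB", 1024);   // application preset of 1G
        Map<String, Object> user = new HashMap<>();
        user.put("someOtherOption", true);  // user config that omits the cache size

        System.out.println("cacheSizeMB = " + resolve("cacheSizeMB", user, presets, defaults));
    }
}
```

Because the user configuration does not mention the cache size, the preset value is used instead of falling back to the built-in default.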





[jira] [Commented] (OAK-3911) Integer overflow causing incorrect file handling in OakDirectory for file size more than 2 GB

2017-11-30 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16274062#comment-16274062
 ] 

Chetan Mehrotra commented on OAK-3911:
--

bq. If you run into this issue, reindexing is not needed. Simply upgrade Oak to 
a more recent version.

[~tmueller] I cannot confirm that, as the overflow may have caused issues while 
"chunking", i.e. during the write operation (flushBlob). If the calculations done in that 
method were incorrect, then persisted index file structures may be affected. So 
to be safe it would be better to reindex such indexes, i.e. indexes having index 
files over 2GB.

> Integer overflow causing incorrect file handling in OakDirectory for file 
> size more than 2 GB
> -
>
> Key: OAK-3911
> URL: https://issues.apache.org/jira/browse/OAK-3911
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
> Fix For: 1.0.27, 1.2.11, 1.3.15, 1.4
>
> Attachments: OAK-3911-v1.patch
>
>
> In a couple of cases we have seen a strange error related to an invalid seek. In 
> such cases it was seen that file sizes are greater than 2GB. A close 
> inspection of OakDirectory [1] shows that the following calls in loadBlob and 
> flushBlob are prone to integer overflow (thanks [~tmueller]):
> * {{int n = (int) Math.min(blobSize, length - index * blobSize);}}
> * {{int n = (int) Math.min(blobSize, length - i * blobSize);}}
> Above, {{blobSize}}, {{index}} and {{i}} are all {{int}}. The 
> product of two int values is itself an int, which can overflow.
> {noformat}Caused by: java.io.IOException: Invalid seek request
>   at org.apache.jackrabbit.oak.plugins.index.lucene.OakDirectory$OakIndexFile.seek(OakDirectory.java:288)
>   at org.apache.jackrabbit.oak.plugins.index.lucene.OakDirectory$OakIndexInput.seek(OakDirectory.java:418)
>   at org.apache.lucene.codecs.BlockTreeTermsReader.seekDir(BlockTreeTermsReader.java:223)
>   at org.apache.lucene.codecs.BlockTreeTermsReader.<init>(BlockTreeTermsReader.java:142)
> {noformat}
> [1] 
> https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/OakDirectory.java#L361
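
The overflow in the quoted expression can be reproduced in isolation. The sketch below assumes that length is a long and uses a 32 KiB blob size purely for illustration; BlobChunkMath and its method names are hypothetical and do not come from OakDirectory. Widening one operand to long before the multiplication is the usual remedy and is shown for comparison; it is not necessarily identical to the committed patch.

```java
class BlobChunkMath {
    // Same shape as the expression in the issue: index * blobSize is computed in int
    // and wraps around once the product reaches 2^31.
    static int chunkSizeOverflowing(int blobSize, long length, int index) {
        return (int) Math.min(blobSize, length - index * blobSize);
    }

    // Widening one operand to long makes the whole product a long.
    static int chunkSizeSafe(int blobSize, long length, int index) {
        return (int) Math.min(blobSize, length - (long) index * blobSize);
    }

    public static void main(String[] args) {
        int blobSize = 32 * 1024;          // illustrative value
        long length = (1L << 31) + 100;    // file just over 2 GB
        int index = 65536;                 // index * blobSize == 2^31

        System.out.println("overflowing: " + chunkSizeOverflowing(blobSize, length, index));
        System.out.println("safe:        " + chunkSizeSafe(blobSize, length, index));
    }
}
```

For the last blob of a file just over 2 GB, only 100 bytes remain, yet the overflowing variant reports a full blob because the wrapped, negative product is subtracted from the length.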



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (OAK-7021) Collect DocumentStore stats as part of status zip

2017-12-04 Thread Chetan Mehrotra (JIRA)
Chetan Mehrotra created OAK-7021:


 Summary: Collect DocumentStore stats as part of status zip
 Key: OAK-7021
 URL: https://issues.apache.org/jira/browse/OAK-7021
 Project: Jackrabbit Oak
  Issue Type: New Feature
  Components: documentmk
Reporter: Chetan Mehrotra
 Fix For: 1.8


Many times while investigating an issue we ask the customer to provide MongoDB 
stats. It would be useful to have an InventoryPrinter for DocumentStore 
which dumps various stats.

Some useful stats would be
* Number of entries in "nodes" collection
* Index memory size
* Node collection size on disk
* Number of cluster members which are active

See OAK-6179 for inventory printer related details





[jira] [Commented] (OAK-7021) Collect DocumentStore stats as part of status zip

2017-12-04 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16276542#comment-16276542
 ] 

Chetan Mehrotra commented on OAK-7021:
--

[~mreutegg] [~catholicon] [~julian.resc...@gmx.de] Probably we can add a 
generic method to DocumentStore

{noformat}
void dumpStats(PrintWriter printWriter)
{noformat}

And have an implementation for both stores. This can then be used by an 
InventoryPrinter implementation to include this info in the config zip. The same 
can also be exposed via JMX.
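
A rough sketch of how such a method might be used is given below. Only the dumpStats(PrintWriter) signature comes from the comment above; StatsDumper, InMemoryStoreStats, the stat names and the values are invented for illustration and are not Oak APIs.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical interface carrying the proposed method.
interface StatsDumper {
    void dumpStats(PrintWriter printWriter);
}

// Hypothetical store-side implementation that prints name=value lines.
class InMemoryStoreStats implements StatsDumper {
    private final Map<String, Object> stats = new LinkedHashMap<>();

    InMemoryStoreStats put(String name, Object value) {
        stats.put(name, value);
        return this;
    }

    @Override
    public void dumpStats(PrintWriter printWriter) {
        for (Map.Entry<String, Object> e : stats.entrySet()) {
            printWriter.printf("%s=%s%n", e.getKey(), e.getValue());
        }
    }
}

class StatsDumpDemo {
    // What a printer-style caller might do: hand the store a writer and collect the text.
    static String render(StatsDumper dumper) {
        StringWriter out = new StringWriter();
        PrintWriter pw = new PrintWriter(out);
        dumper.dumpStats(pw);
        pw.flush();
        return out.toString();
    }

    public static void main(String[] args) {
        StatsDumper store = new InMemoryStoreStats()
                .put("nodes.count", 42)            // made-up sample values
                .put("activeClusterMembers", 2);
        System.out.print(render(store));
    }
}
```

Writing to a caller-supplied PrintWriter keeps the store unaware of where the output ends up, so the same method could serve both a status zip and a JMX operation.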






[jira] [Commented] (OAK-3426) MultiplexingDocumentStore implementation

2017-12-04 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16277982#comment-16277982
 ] 

Chetan Mehrotra commented on OAK-3426:
--

I think this feature would still be useful for providing path-based sharding 
support for Oak deployments on DocumentNodeStore. It is something we can revisit as 
part of the 1.10 work.

> MultiplexingDocumentStore implementation
> 
>
> Key: OAK-3426
> URL: https://issues.apache.org/jira/browse/OAK-3426
> Project: Jackrabbit Oak
>  Issue Type: Story
>  Components: core, mongomk, rdbmk
>Reporter: Robert Munteanu
>
> Create a MultiplexingDocumentStore implementation ( see OAK-3401 for details 
> ). Feature is developed for now at 
> https://github.com/rombert/jackrabbit-oak/tree/features/docstore-multiplex





[jira] [Updated] (OAK-6619) Async indexer thread may get stuck in CopyOnWriteDirectory close method

2017-12-06 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-6619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra updated OAK-6619:
-
Fix Version/s: (was: 1.8)

> Async indexer thread may get stuck in CopyOnWriteDirectory close method
> ---
>
> Key: OAK-6619
> URL: https://issues.apache.org/jira/browse/OAK-6619
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
>Priority: Critical
> Attachments: status-threaddump-Sep-5.txt
>
>
> With copy-on-write mode enabled, at times it is seen that the async index thread 
> remains stuck in the CopyOnWriteDirectory#close method:
> {noformat}
> "async-index-update-async" prio=5 tid=0xb9e63 nid=0x timed_waiting
>    java.lang.Thread.State: TIMED_WAITING
>   at sun.misc.Unsafe.park(Native Method)
>   - waiting to lock <0x2504cd51> (a java.util.concurrent.CountDownLatch$Sync) owned by "null" tid=0x-1
>   at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>   at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>   at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>   at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
>   at org.apache.jackrabbit.oak.plugins.index.lucene.directory.CopyOnWriteDirectory.close(CopyOnWriteDirectory.java:221)
>   at org.apache.jackrabbit.oak.plugins.index.lucene.writer.DefaultIndexWriter.updateSuggester(DefaultIndexWriter.java:177)
>   at org.apache.jackrabbit.oak.plugins.index.lucene.writer.DefaultIndexWriter.close(DefaultIndexWriter.java:121)
>   at org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexEditorContext.closeWriter(LuceneIndexEditorContext.java:136)
>   at org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexEditor.leave(LuceneIndexEditor.java:154)
>   at org.apache.jackrabbit.oak.plugins.index.IndexUpdate.leave(IndexUpdate.java:357)
>   at org.apache.jackrabbit.oak.spi.commit.VisibleEditor.leave(VisibleEditor.java:60)
>   at org.apache.jackrabbit.oak.spi.commit.EditorDiff.process(EditorDiff.java:56)
>   at org.apache.jackrabbit.oak.plugins.index.AsyncIndexUpdate.updateIndex(AsyncIndexUpdate.java:727)
>   at org.apache.jackrabbit.oak.plugins.index.AsyncIndexUpdate.runWhenPermitted(AsyncIndexUpdate.java:572)
>   at org.apache.jackrabbit.oak.plugins.index.AsyncIndexUpdate.run(AsyncIndexUpdate.java:431)
>   - locked <0x3d542de5> (a org.apache.jackrabbit.oak.plugins.index.AsyncIndexUpdate)
>   at org.apache.sling.commons.scheduler.impl.QuartzJobExecutor.execute(QuartzJobExecutor.java:245)
>   at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}
> The thread is waiting on a latch and no other thread is going to release the 
> latch.
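
The stuck state described above can be reproduced in isolation. The following is a minimal sketch, not the CopyOnWriteDirectory code: the class and method names are hypothetical, and it only assumes that close() waits on a java.util.concurrent.CountDownLatch with a timeout, as the TIMED_WAITING frames in CountDownLatch.await suggest. Bounding the number of timed waits turns a silent hang into a result the caller can report.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration; not the actual CopyOnWriteDirectory implementation.
final class LatchWaitSketch {

    /**
     * Waits for the latch in bounded, timed steps.
     * Returns true if the latch reached zero, false if every attempt
     * timed out or the waiting thread was interrupted.
     */
    static boolean awaitBounded(CountDownLatch latch, int maxAttempts, long stepMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                if (latch.await(stepMillis, TimeUnit.MILLISECONDS)) {
                    return true;
                }
            } catch (InterruptedException e) {
                // Preserve the interrupt flag for the caller and give up.
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Nobody ever calls countDown(): an unbounded retry loop would hang here.
        CountDownLatch neverReleased = new CountDownLatch(1);
        System.out.println("released=" + awaitBounded(neverReleased, 3, 50));

        // Another thread releases the latch, so the wait completes.
        CountDownLatch released = new CountDownLatch(1);
        new Thread(released::countDown).start();
        System.out.println("released=" + awaitBounded(released, 3, 1000));
    }
}
```

Whether the real close() should give up, log, or fail the indexing cycle after the last attempt is a design decision this sketch does not settle.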





[jira] [Commented] (OAK-6619) Async indexer thread may get stuck in CopyOnWriteDirectory close method

2017-12-06 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-6619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16279935#comment-16279935
 ] 

Chetan Mehrotra commented on OAK-6619:
--

With the various fixes done this should not happen too often, so moving this to 1.10

> Async indexer thread may get stuck in CopyOnWriteDirectory close method
> ---
>
> Key: OAK-6619
> URL: https://issues.apache.org/jira/browse/OAK-6619
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
>Priority: Critical
> Fix For: 1.10
>
> Attachments: status-threaddump-Sep-5.txt
>
>
> With copy-on-write mode enabled, the async index thread is at times seen to 
> remain stuck in the CopyOnWriteDirectory#close method
> {noformat}
> "async-index-update-async" prio=5 tid=0xb9e63 nid=0x timed_waiting
>    java.lang.Thread.State: TIMED_WAITING
>   at sun.misc.Unsafe.park(Native Method)
>   - waiting to lock <0x2504cd51> (a java.util.concurrent.CountDownLatch$Sync) owned by "null" tid=0x-1
>   at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>   at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>   at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>   at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
>   at org.apache.jackrabbit.oak.plugins.index.lucene.directory.CopyOnWriteDirectory.close(CopyOnWriteDirectory.java:221)
>   at org.apache.jackrabbit.oak.plugins.index.lucene.writer.DefaultIndexWriter.updateSuggester(DefaultIndexWriter.java:177)
>   at org.apache.jackrabbit.oak.plugins.index.lucene.writer.DefaultIndexWriter.close(DefaultIndexWriter.java:121)
>   at org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexEditorContext.closeWriter(LuceneIndexEditorContext.java:136)
>   at org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexEditor.leave(LuceneIndexEditor.java:154)
>   at org.apache.jackrabbit.oak.plugins.index.IndexUpdate.leave(IndexUpdate.java:357)
>   at org.apache.jackrabbit.oak.spi.commit.VisibleEditor.leave(VisibleEditor.java:60)
>   at org.apache.jackrabbit.oak.spi.commit.EditorDiff.process(EditorDiff.java:56)
>   at org.apache.jackrabbit.oak.plugins.index.AsyncIndexUpdate.updateIndex(AsyncIndexUpdate.java:727)
>   at org.apache.jackrabbit.oak.plugins.index.AsyncIndexUpdate.runWhenPermitted(AsyncIndexUpdate.java:572)
>   at org.apache.jackrabbit.oak.plugins.index.AsyncIndexUpdate.run(AsyncIndexUpdate.java:431)
>   - locked <0x3d542de5> (a org.apache.jackrabbit.oak.plugins.index.AsyncIndexUpdate)
>   at org.apache.sling.commons.scheduler.impl.QuartzJobExecutor.execute(QuartzJobExecutor.java:245)
>   at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}
> The thread is waiting on a latch and no other thread is going to release the 
> latch.





[jira] [Updated] (OAK-6619) Async indexer thread may get stuck in CopyOnWriteDirectory close method

2017-12-06 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-6619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra updated OAK-6619:
-
Fix Version/s: 1.10

> Async indexer thread may get stuck in CopyOnWriteDirectory close method
> ---
>
> Key: OAK-6619
> URL: https://issues.apache.org/jira/browse/OAK-6619
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
>Priority: Critical
> Fix For: 1.10
>
> Attachments: status-threaddump-Sep-5.txt
>
>
> With copy-on-write mode enabled, the async index thread is at times seen to 
> remain stuck in the CopyOnWriteDirectory#close method
> {noformat}
> "async-index-update-async" prio=5 tid=0xb9e63 nid=0x timed_waiting
>    java.lang.Thread.State: TIMED_WAITING
>   at sun.misc.Unsafe.park(Native Method)
>   - waiting to lock <0x2504cd51> (a java.util.concurrent.CountDownLatch$Sync) owned by "null" tid=0x-1
>   at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>   at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>   at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>   at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
>   at org.apache.jackrabbit.oak.plugins.index.lucene.directory.CopyOnWriteDirectory.close(CopyOnWriteDirectory.java:221)
>   at org.apache.jackrabbit.oak.plugins.index.lucene.writer.DefaultIndexWriter.updateSuggester(DefaultIndexWriter.java:177)
>   at org.apache.jackrabbit.oak.plugins.index.lucene.writer.DefaultIndexWriter.close(DefaultIndexWriter.java:121)
>   at org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexEditorContext.closeWriter(LuceneIndexEditorContext.java:136)
>   at org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexEditor.leave(LuceneIndexEditor.java:154)
>   at org.apache.jackrabbit.oak.plugins.index.IndexUpdate.leave(IndexUpdate.java:357)
>   at org.apache.jackrabbit.oak.spi.commit.VisibleEditor.leave(VisibleEditor.java:60)
>   at org.apache.jackrabbit.oak.spi.commit.EditorDiff.process(EditorDiff.java:56)
>   at org.apache.jackrabbit.oak.plugins.index.AsyncIndexUpdate.updateIndex(AsyncIndexUpdate.java:727)
>   at org.apache.jackrabbit.oak.plugins.index.AsyncIndexUpdate.runWhenPermitted(AsyncIndexUpdate.java:572)
>   at org.apache.jackrabbit.oak.plugins.index.AsyncIndexUpdate.run(AsyncIndexUpdate.java:431)
>   - locked <0x3d542de5> (a org.apache.jackrabbit.oak.plugins.index.AsyncIndexUpdate)
>   at org.apache.sling.commons.scheduler.impl.QuartzJobExecutor.execute(QuartzJobExecutor.java:245)
>   at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}
> The thread is waiting on a latch and no other thread is going to release the 
> latch.





[jira] [Updated] (OAK-6597) rep:excerpt not working for content indexed by aggregation in lucene

2017-12-06 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra updated OAK-6597:
-
Fix Version/s: (was: 1.8)
   1.10

> rep:excerpt not working for content indexed by aggregation in lucene
> 
>
> Key: OAK-6597
> URL: https://issues.apache.org/jira/browse/OAK-6597
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Affects Versions: 1.6.1, 1.7.6
>Reporter: Dirk Rudolph
>Assignee: Chetan Mehrotra
>  Labels: excerpt
> Fix For: 1.10
>
> Attachments: excerpt-with-aggregation-test.patch
>
>
> I noticed that properties that got indexed due to an aggregation are not 
> considered for excerpts (highlighting) as they are not indexed as stored 
> fields.
> See the attached patch that implements a test for excerpts in 
> {{LuceneIndexAggregationTest2}}.
> It creates the following structure:
> {code}
> /content/foo [test:Page]
>  + bar (String)
>  - jcr:content [test:PageContent]
>   + bar (String)
> {code}
> where both strings (the _bar_ property at _foo_ and the _bar_ property at 
> _jcr:content_) contain different text. 
> Afterwards it queries for 2 terms ("tinc*" and "aliq*") that exist either in 
> _/content/foo/bar_ or in _/content/foo/jcr:content/bar_ but not in both. For the 
> former the excerpt is properly provided; for the latter it isn't.





[jira] [Commented] (OAK-6597) rep:excerpt not working for content indexed by aggregation in lucene

2017-12-06 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16279939#comment-16279939
 ] 

Chetan Mehrotra commented on OAK-6597:
--

Not sure on the approach to take here so moving this to 1.10

> rep:excerpt not working for content indexed by aggregation in lucene
> 
>
> Key: OAK-6597
> URL: https://issues.apache.org/jira/browse/OAK-6597
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Affects Versions: 1.6.1, 1.7.6
>Reporter: Dirk Rudolph
>Assignee: Chetan Mehrotra
>  Labels: excerpt
> Fix For: 1.10
>
> Attachments: excerpt-with-aggregation-test.patch
>
>
> I noticed that properties that got indexed due to an aggregation are not 
> considered for excerpts (highlighting) as they are not indexed as stored 
> fields.
> See the attached patch that implements a test for excerpts in 
> {{LuceneIndexAggregationTest2}}.
> It creates the following structure:
> {code}
> /content/foo [test:Page]
>  + bar (String)
>  - jcr:content [test:PageContent]
>   + bar (String)
> {code}
> where both strings (the _bar_ property at _foo_ and the _bar_ property at 
> _jcr:content_) contain different text. 
> Afterwards it queries for 2 terms ("tinc*" and "aliq*") that exist either in 
> _/content/foo/bar_ or in _/content/foo/jcr:content/bar_ but not in both. For the 
> former the excerpt is properly provided; for the latter it isn't.





[jira] [Created] (OAK-7039) IndexDefinition should provide names of indexed relative node names

2017-12-07 Thread Chetan Mehrotra (JIRA)
Chetan Mehrotra created OAK-7039:


 Summary: IndexDefinition should provide names of indexed relative 
node names
 Key: OAK-7039
 URL: https://issues.apache.org/jira/browse/OAK-7039
 Project: Jackrabbit Oak
  Issue Type: Technical task
  Components: lucene
Reporter: Chetan Mehrotra
Assignee: Chetan Mehrotra
Priority: Minor
 Fix For: 1.7.13, 1.8


To enable traversal in FlatFileStore we need to determine which child node 
names would be accessed explicitly, i.e. because they are part of an 
aggregate or of a relative property being indexed.

To support this, IndexDefinition should provide a method which returns all such 
relative node names
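
To illustrate what such a method might return, here is a minimal sketch. It is not the IndexDefinition API: the class name, the method name relativeNodeNames and the sample property paths are assumptions made for illustration. It assumes relative property names use "/" as separator and that the last segment is the property name, so every preceding segment is a child node name that would be accessed explicitly. Aggregate includes are not covered.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration; not the actual IndexDefinition implementation.
final class RelativeNodeNamesSketch {

    /** Collects every node name that appears in front of the property name. */
    static Set<String> relativeNodeNames(Collection<String> relativePropertyPaths) {
        Set<String> names = new TreeSet<>();
        for (String path : relativePropertyPaths) {
            String[] segments = path.split("/");
            // The last segment is the property itself; all earlier ones are nodes.
            for (int i = 0; i < segments.length - 1; i++) {
                if (!segments[i].isEmpty()) {
                    names.add(segments[i]);
                }
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Made-up sample paths; "foo" is a direct property and adds no node name.
        System.out.println(relativeNodeNames(Arrays.asList(
                "jcr:content/metadata/title", "jcr:content/bar", "foo")));
    }
}
```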






[jira] [Commented] (OAK-7042) Pin async indexer on cluster leader

2017-12-10 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16285527#comment-16285527
 ] 

Chetan Mehrotra commented on OAK-7042:
--

+1. Looks well tested!

> Pin async indexer on cluster leader
> ---
>
> Key: OAK-7042
> URL: https://issues.apache.org/jira/browse/OAK-7042
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene
>Reporter: Vikas Saurabh
>Assignee: Vikas Saurabh
>Priority: Blocker
> Fix For: 1.7.13, 1.8
>
>
> Currently, the async indexer creates some data locally to declare which index 
> files have been deleted from the indexer. This information is used by active 
> deletion's purge logic (to be invoked via JMX).
> Since the async indexer schedules its task to run as a singleton on the 
> cluster (not pinned to any node), executing active deletion via JMX 
> requires knowing where the async indexer is scheduled.
> To avoid that uncertainty, it's better to instead pin the async indexer on 
> the cluster leader and document that active deletion via JMX also needs to be 
> invoked on the cluster leader.





[jira] [Resolved] (OAK-7039) IndexDefinition should provide names of indexed relative node names

2017-12-10 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra resolved OAK-7039.
--
Resolution: Fixed

Done with 1817465

> IndexDefinition should provide names of indexed relative node names
> 
>
> Key: OAK-7039
> URL: https://issues.apache.org/jira/browse/OAK-7039
> Project: Jackrabbit Oak
>  Issue Type: Technical task
>  Components: lucene
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
>Priority: Minor
> Fix For: 1.7.13, 1.8
>
>
> To enable traversal in FlatFileStore we need to determine which child node 
> names would be accessed explicitly, i.e. because they are part of an 
> aggregate or of a relative property being indexed.
> To support this, IndexDefinition should provide a method which returns all 
> such relative node names





[jira] [Created] (OAK-7043) Collect SegmentStore stats as part of status zip

2017-12-11 Thread Chetan Mehrotra (JIRA)
Chetan Mehrotra created OAK-7043:


 Summary: Collect SegmentStore stats as part of status zip
 Key: OAK-7043
 URL: https://issues.apache.org/jira/browse/OAK-7043
 Project: Jackrabbit Oak
  Issue Type: New Feature
  Components: segment-tar
Reporter: Chetan Mehrotra
 Fix For: 1.7.13, 1.8


Often while investigating an issue we ask the customer to provide the size of 
the segmentstore and at times a listing of the segmentstore directory. It would 
be useful if there were an InventoryPrinter for SegmentStore which could include

* Size of segment store 
* Listing of segment store directory
* Possibly tail of journal.log
* Possibly some stats/info from index files stored in tar files
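
The first two items could be gathered roughly as follows. This is a minimal sketch using only java.nio.file; it does not implement the Felix InventoryPrinter interface, and the class name, method names and output format are assumptions for illustration. The segment store directory is passed in by the caller, and only regular files directly inside it are considered.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration; not an actual Oak or Felix component.
final class SegmentStoreStatsSketch {

    /** Sum of the sizes of all regular files directly inside dir. */
    static long totalSize(Path dir) {
        try (Stream<Path> files = Files.list(dir)) {
            return files.filter(Files::isRegularFile)
                    .mapToLong(SegmentStoreStatsSketch::size)
                    .sum();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** One "name size" entry per regular file, sorted by path. */
    static List<String> listing(Path dir) {
        try (Stream<Path> files = Files.list(dir)) {
            return files.filter(Files::isRegularFile)
                    .sorted()
                    .map(p -> p.getFileName() + " " + size(p))
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static long size(Path file) {
        try {
            return Files.size(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes the total size followed by the directory listing. */
    static void print(Path dir, PrintWriter out) {
        out.println("Total size (bytes): " + totalSize(dir));
        listing(dir).forEach(out::println);
        out.flush();
    }

    public static void main(String[] args) {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        print(dir, new PrintWriter(System.out));
    }
}
```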





[jira] [Commented] (OAK-7043) Collect SegmentStore stats as part of status zip

2017-12-11 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16285644#comment-16285644
 ] 

Chetan Mehrotra commented on OAK-7043:
--

[~mduerig] [~frm] Recently similar stats were added for DocumentStore 
(OAK-7021). I thought it would be good to have similar support for 
SegmentStore. So something we can target for 1.8.

Thoughts?

> Collect SegmentStore stats as part of status zip
> 
>
> Key: OAK-7043
> URL: https://issues.apache.org/jira/browse/OAK-7043
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: segment-tar
>Reporter: Chetan Mehrotra
> Fix For: 1.7.13, 1.8
>
>
> Often while investigating an issue we ask the customer to provide the size 
> of the segmentstore and at times a listing of the segmentstore directory. It 
> would be useful if there were an InventoryPrinter for SegmentStore which could include
> * Size of segment store 
> * Listing of segment store directory
> * Possibly tail of journal.log
> * Possibly some stats/info from index files stored in tar files





[jira] [Created] (OAK-7065) Analyze case around failure of indexing cycle leading to orphan files

2017-12-14 Thread Chetan Mehrotra (JIRA)
Chetan Mehrotra created OAK-7065:


 Summary: Analyze case around failure of indexing cycle leading to 
orphan files
 Key: OAK-7065
 URL: https://issues.apache.org/jira/browse/OAK-7065
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: lucene
Reporter: Chetan Mehrotra
Assignee: Chetan Mehrotra
 Fix For: 1.10


If an indexing cycle fails for some reason it may leave orphan files in the local 
directory. Later, in the next indexing cycle, Lucene would try to create files 
with the same name on local disk and this may fail on Windows, where such files 
may have been memory mapped and hence cannot be deleted.

We should analyze such a scenario and see if the system can handle the failure 
case properly





[jira] [Commented] (OAK-7029) RDBDocumentStore.getStats() implementation

2017-12-14 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16291266#comment-16291266
 ] 

Chetan Mehrotra commented on OAK-7029:
--

[~reschke] Can you also add some samples of the stats for a few of the DBs to get 
a sense of what the info looks like? Just the output of the status printer 
rendering should be fine

> RDBDocumentStore.getStats() implementation
> --
>
> Key: OAK-7029
> URL: https://issues.apache.org/jira/browse/OAK-7029
> Project: Jackrabbit Oak
>  Issue Type: Sub-task
>  Components: rdbmk
>Reporter: Marcel Reutegger
>Assignee: Julian Reschke
>Priority: Minor
> Fix For: 1.7.13, 1.8
>
> Attachments: OAK-7029.diff, OAK-7029.diff, OAK-7029.diff
>
>
> Proper implementation of DocumentStore.getStats() introduced with OAK-7021.





[jira] [Commented] (OAK-7066) Active deletion blob list files can grow too large due to inlined blobs

2017-12-14 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16291272#comment-16291272
 ] 

Chetan Mehrotra commented on OAK-7066:
--

bq. I think adding a method "isInline" would be better

+1 for such a method to Blob interface

> Active deletion blob list files can grow too large due to inlined blobs
> ---
>
> Key: OAK-7066
> URL: https://issues.apache.org/jira/browse/OAK-7066
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene
>Reporter: Vikas Saurabh
>Assignee: Vikas Saurabh
>
> This is a follow-up from OAK-7052 where we noticed that deleted blob list files 
> collected by active deletion logic can grow very large due to inlined blobs.
> One potential way (not sure how yet though) is to not actively delete inlined 
> blobs.
> Here are some stats which might help us take a call (based on raw numbers 
> collected at \[0])
> ||file-name||large_lines||large_size||small_lines||small_size||small_lines/total_lines||small_size/total_size||
> |blobs-1512664032264.txt|245301|3310224358|173096|35473656|0.413712335413495|0.010602766852107|
> |blobs-1512698405656.txt|370373|4443957885|256775|52997864|0.409432861142824|0.011785275852845|
> |blobs-1512987450004.txt|660669|6214740439|461168|92017554|0.411082893504137|0.014590309966251|
> |blobs-1513130410963.txt|569083|5490965583|406756|80124598|0.416826956085994|0.014382211631264|
> |blobs-1513216819447.txt|69876|1413561892|46238|9221956|0.398212101899857|0.006481628262061|
> \[0]:
> file sizes
> {noformat}
> repository/index/deleted-blobs$ ls -l blobs-151*
> -rw-r--r-- 1 root root 3369065620 Dec  8 01:59 blobs-1512664032264.txt
> -rw-r--r-- 1 root root 4532250073 Dec  9 01:59 blobs-1512698405656.txt
> -rw-r--r-- 1 root root 6370201955 Dec 13 01:59 blobs-1512987450004.txt
> -rw-r--r-- 1 root root 1916223582 Dec 13 11:52 blobs-1513130410963.txt
> {noformat}
> number of entries
> {noformat}
> repository/index/deleted-blobs$ wc -l blobs-151*
>  418397 blobs-1512664032264.txt
>  627148 blobs-1512698405656.txt
> 1121837 blobs-1512987450004.txt
>  308292 blobs-1513130410963.txt
> 2475674 total
> {noformat}
> number of entries and sizes split on threshold of 500 bytes of blob ids
> {noformat}
> repository/index/deleted-blobs$ for i in blobs-151*;do echo $i;awk 'BEGIN {FS="|"} {len = length($1); if (len > 500) {large++; largeSize+=len} else {small++; smallSize+=len}} END {print large, largeSize, small, smallSize}' $i;done
> blobs-1512664032264.txt
> 245301 3310224358 173096 35473656
> blobs-1512698405656.txt
> 370373 4443957885 256775 52997864
> blobs-1512987450004.txt
> 660669 6214740439 461168 92017554
> blobs-1513130410963.txt
> 569083 5490965583 406756 80124598
> blobs-1513216819447.txt
> 69876 1413561892 46238 9221956
> {noformat}
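
For readers less familiar with awk, the split performed by the one-liner above can be sketched in Java. This is only an illustration under assumptions: the class and method names are hypothetical, each entry is taken to be one line whose first "|"-separated field is the blob id, and the length is counted in Java chars (which equals bytes for plain ASCII ids). The sample lines in main are made up.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the awk one-liner; not part of Oak.
final class BlobIdSplitSketch {

    static final int THRESHOLD = 500;

    /** Returns {largeCount, largeSize, smallCount, smallSize} for the given lines. */
    static long[] split(Iterable<String> lines) {
        long[] stats = new long[4];
        for (String line : lines) {
            // Length of the first '|'-separated field, like length($1) with FS="|".
            int separator = line.indexOf('|');
            int len = separator >= 0 ? separator : line.length();
            if (len > THRESHOLD) {
                stats[0]++;
                stats[1] += len;
            } else {
                stats[2]++;
                stats[3] += len;
            }
        }
        return stats;
    }

    public static void main(String[] args) {
        char[] longId = new char[600];
        Arrays.fill(longId, 'a');
        // Made-up entries: one long (presumably inlined) id and one short id.
        List<String> lines = Arrays.asList(new String(longId) + "|rest", "abc#123|rest");
        System.out.println(Arrays.toString(split(lines)));
    }
}
```

The ratio columns in the table follow from these four numbers, e.g. small_lines/total_lines is smallCount / (largeCount + smallCount).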





[jira] [Commented] (OAK-7029) RDBDocumentStore.getStats() implementation

2017-12-14 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16292031#comment-16292031
 ] 

Chetan Mehrotra commented on OAK-7029:
--

If you deploy Oak in a Sling-based application then you can access the stats at 
/system/console/status-oak-document-store-stats.txt (or 
/system/console/status-oak-document-store-stats)

> RDBDocumentStore.getStats() implementation
> --
>
> Key: OAK-7029
> URL: https://issues.apache.org/jira/browse/OAK-7029
> Project: Jackrabbit Oak
>  Issue Type: Sub-task
>  Components: rdbmk
>Reporter: Marcel Reutegger
>Assignee: Julian Reschke
>Priority: Minor
> Fix For: 1.7.13, 1.8
>
> Attachments: OAK-7029.diff, OAK-7029.diff, OAK-7029.diff
>
>
> Proper implementation of DocumentStore.getStats() introduced with OAK-7021.





[jira] [Created] (OAK-7072) Add log message upon compacted call in GCMonitor in Lucene indexer

2017-12-17 Thread Chetan Mehrotra (JIRA)
Chetan Mehrotra created OAK-7072:


 Summary: Add log message upon compacted call in GCMonitor in 
Lucene indexer
 Key: OAK-7072
 URL: https://issues.apache.org/jira/browse/OAK-7072
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: lucene
Reporter: Chetan Mehrotra
Assignee: Chetan Mehrotra
Priority: Trivial
 Fix For: 1.7.13, 1.8


It would be useful to log a message confirming that the compacted method was 
invoked on the GCMonitor registered by LuceneIndexProviderService





[jira] [Commented] (OAK-6353) Use Document order traversal for reindexing performed on DocumentNodeStore setups

2017-12-17 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16294556#comment-16294556
 ] 

Chetan Mehrotra commented on OAK-6353:
--

With the new document order traversal based indexing, significant performance 
improvements were seen. 

For a large repo (255M Mongo docs, 66M nodes under /content and 4.2M 
assets), indexing earlier completed in 13.66 h. In comparison, document order 
based indexing completed in 3.469 h. 

With this, the initially planned implementation is done. Specific issues can 
later be opened for further improvements. Possible future enhancements:

# Prefetch the previous documents before doing the Mongo traversal - This may 
reduce the time to resolve the NodeDocument to NodeState
# Mongo query optimizations
## Avoid fetching nodes under hidden paths at all
## Only fetch those documents from Mongo which are under included paths - This 
can be done by using a JavaScript function
# Sorting optimization - Sort the batch in memory as nodes are being read and 
just write the sorted files

Also the documentation needs to be updated

> Use Document order traversal for reindexing performed on DocumentNodeStore 
> setups
> -
>
> Key: OAK-6353
> URL: https://issues.apache.org/jira/browse/OAK-6353
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: run
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
> Fix For: 1.8
>
> Attachments: OAK-6353-v1.patch, OAK-6353-v2.patch
>
>
> [~tmueller] suggested 
> [here|https://issues.apache.org/jira/browse/OAK-6246?focusedCommentId=16034442&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16034442]
>  that document order traversal can be faster compared to the current mode of path 
> based traversal. Initial tests indicate that such a traversal can be an order of 
> magnitude faster. 
> So this task is meant to implement such an approach and see if it can be a 
> viable indexing mode used for DocumentNodeStore based setups





[jira] [Resolved] (OAK-7072) Add log message upon compacted call in GCMonitor in Lucene indexer

2017-12-17 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra resolved OAK-7072.
--
Resolution: Fixed

Done with 1818528

> Add log message upon compacted call in GCMonitor in Lucene indexer
> --
>
> Key: OAK-7072
> URL: https://issues.apache.org/jira/browse/OAK-7072
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
>Priority: Trivial
> Fix For: 1.7.13, 1.8
>
>
> It would be useful to log a message confirming that the compacted method was 
> invoked on the GCMonitor registered by LuceneIndexProviderService





[jira] [Resolved] (OAK-6353) Use Document order traversal for reindexing performed on DocumentNodeStore setups

2017-12-17 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra resolved OAK-6353.
--
   Resolution: Fixed
Fix Version/s: 1.7.13

> Use Document order traversal for reindexing performed on DocumentNodeStore 
> setups
> -
>
> Key: OAK-6353
> URL: https://issues.apache.org/jira/browse/OAK-6353
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: run
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
> Fix For: 1.7.13, 1.8
>
> Attachments: OAK-6353-v1.patch, OAK-6353-v2.patch
>
>
> [~tmueller] suggested 
> [here|https://issues.apache.org/jira/browse/OAK-6246?focusedCommentId=16034442&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16034442]
>  that document order traversal can be faster compared to the current mode of path 
> based traversal. Initial tests indicate that such a traversal can be an order of 
> magnitude faster. 
> So this task is meant to implement such an approach and see if it can be a 
> viable indexing mode used for DocumentNodeStore based setups





[jira] [Comment Edited] (OAK-6353) Use Document order traversal for reindexing performed on DocumentNodeStore setups

2017-12-17 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16294556#comment-16294556
 ] 

Chetan Mehrotra edited comment on OAK-6353 at 12/18/17 6:44 AM:


With the new document order traversal based indexing, significant performance 
improvements were seen. 

For a large repo (255M Mongo docs, 66M nodes under /content and 4.2M 
assets), indexing earlier completed in 13.66 h. In comparison, document order 
based indexing completed in 3.469 h. 

With this, the initially planned implementation is done. Specific issues can 
later be opened for further improvements. Possible future enhancements:

# Prefetch the previous documents before doing the Mongo traversal - This may 
reduce the time to resolve the NodeDocument to NodeState
# Mongo query optimizations
## Avoid fetching nodes under hidden paths at all
## Only fetch those documents from Mongo which are under included paths - This 
can be done by using a JavaScript function
# Sorting optimization - Sort the batch in memory as nodes are being read and 
just write the sorted files

*Usage*

This mode can be enabled for Mongo-based setups via the CLI argument 
{{--doc-traversal-mode}}

This indexing mode requires quite a bit of local disk space to store all the 
NodeStates in JSON format. For a 200GB Mongo repo it required 100GB of local disk 
space to keep the NodeState JSON and to perform the external sort on it
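
The external sort and the suggested sorting optimization (sort each batch in memory, write sorted files) can be sketched as a generic external merge sort. This is not the oak-run implementation: all names are hypothetical, entries are assumed to be single lines of text, and plain String ordering stands in for whatever path-based ordering the indexer actually uses.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration; not the oak-run implementation.
final class ExternalSortSketch {

    /** An open sorted run together with its current first line. */
    private static final class Run implements Comparable<Run> {
        final BufferedReader reader;
        String head;

        Run(Path file) throws IOException {
            reader = Files.newBufferedReader(file, StandardCharsets.UTF_8);
            head = reader.readLine();
        }

        @Override
        public int compareTo(Run other) {
            return head.compareTo(other.head);
        }
    }

    /** Sorts single-line entries into 'out', buffering at most batchSize entries per run. */
    static void sort(Iterator<String> entries, int batchSize, Path workDir, Path out) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        try {
            // Phase 1: sort each batch in memory and write it as a sorted run file.
            List<Path> runFiles = new ArrayList<>();
            List<String> batch = new ArrayList<>();
            while (entries.hasNext()) {
                batch.add(entries.next());
                if (batch.size() == batchSize || !entries.hasNext()) {
                    Collections.sort(batch);
                    Path runFile = Files.createTempFile(workDir, "run-", ".txt");
                    runFiles.add(runFile);
                    Files.write(runFile, batch, StandardCharsets.UTF_8);
                    batch.clear();
                }
            }
            // Phase 2: k-way merge of the runs through a priority queue.
            PriorityQueue<Run> queue = new PriorityQueue<>();
            try (BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
                for (Path runFile : runFiles) {
                    Run run = new Run(runFile);
                    if (run.head != null) { queue.add(run); } else { run.reader.close(); }
                }
                while (!queue.isEmpty()) {
                    Run run = queue.poll();
                    writer.write(run.head);
                    writer.newLine();
                    run.head = run.reader.readLine();
                    if (run.head != null) { queue.add(run); } else { run.reader.close(); }
                }
            } finally {
                for (Run run : queue) { run.reader.close(); } // non-empty only on failure
            }
            for (Path runFile : runFiles) { Files.deleteIfExists(runFile); }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("external-sort");
        Path out = dir.resolve("sorted.txt");
        sort(Arrays.asList("/content/b", "/content/a/x", "/content/a", "/content").iterator(), 2, dir, out);
        Files.readAllLines(out, StandardCharsets.UTF_8).forEach(System.out::println);
    }
}
```

The run files and the merged output both live on local disk, which is one reason such a mode needs the extra disk space noted above.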

Also the documentation needs to be updated


was (Author: chetanm):
With the new document order traversal based indexing, significant performance 
improvements were seen. 

For a large repo (255M Mongo docs, 66M nodes under /content and 4.2M 
assets), indexing earlier completed in 13.66 h. In comparison, document order 
based indexing completed in 3.469 h. 

With this, the initially planned implementation is done. Specific issues can 
later be opened for further improvements. Possible future enhancements:

# Prefetch the previous documents before doing the Mongo traversal - This may 
reduce the time to resolve the NodeDocument to NodeState
# Mongo query optimizations
## Avoid fetching nodes under hidden paths at all
## Only fetch those documents from Mongo which are under included paths - This 
can be done by using a JavaScript function
# Sorting optimization - Sort the batch in memory as nodes are being read and 
just write the sorted files

Also the documentation needs to be updated

> Use Document order traversal for reindexing performed on DocumentNodeStore 
> setups
> -
>
> Key: OAK-6353
> URL: https://issues.apache.org/jira/browse/OAK-6353
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: run
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
> Fix For: 1.7.13, 1.8
>
> Attachments: OAK-6353-v1.patch, OAK-6353-v2.patch
>
>
> [~tmueller] suggested 
> [here|https://issues.apache.org/jira/browse/OAK-6246?focusedCommentId=16034442&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16034442]
>  that document order traversal can be faster compared to the current mode of path 
> based traversal. Initial tests indicate that such a traversal can be an order of 
> magnitude faster. 
> So this task is meant to implement such an approach and see if it can be a 
> viable indexing mode used for DocumentNodeStore based setups





[jira] [Created] (OAK-7073) Expose readOnly status for MongoDocumentStore

2017-12-18 Thread Chetan Mehrotra (JIRA)
Chetan Mehrotra created OAK-7073:


 Summary: Expose readOnly status for MongoDocumentStore
 Key: OAK-7073
 URL: https://issues.apache.org/jira/browse/OAK-7073
 Project: Jackrabbit Oak
  Issue Type: Technical task
  Components: mongomk
Reporter: Chetan Mehrotra
Assignee: Chetan Mehrotra
Priority: Minor
 Fix For: 1.7.13, 1.8


Provide a way to determine if the MongoDocumentStore is configured for readOnly 
access





[jira] [Resolved] (OAK-7073) Expose readOnly status for MongoDocumentStore

2017-12-18 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra resolved OAK-7073.
--
Resolution: Fixed

Done with 1818533

> Expose readOnly status for MongoDocumentStore
> -
>
> Key: OAK-7073
> URL: https://issues.apache.org/jira/browse/OAK-7073
> Project: Jackrabbit Oak
>  Issue Type: Technical task
>  Components: mongomk
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
>Priority: Minor
> Fix For: 1.7.13, 1.8
>
>
> Provide a way to determine if the MongoDocumentStore is configured for 
> readOnly access





[jira] [Commented] (OAK-6353) Use Document order traversal for reindexing performed on DocumentNodeStore setups

2017-12-18 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16294747#comment-16294747
 ] 

Chetan Mehrotra commented on OAK-6353:
--

There are some aspects which still need to be taken care of. See OAK-7074

> Use Document order traversal for reindexing performed on DocumentNodeStore 
> setups
> -
>
> Key: OAK-6353
> URL: https://issues.apache.org/jira/browse/OAK-6353
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: run
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
> Fix For: 1.7.13, 1.8
>
> Attachments: OAK-6353-v1.patch, OAK-6353-v2.patch
>
>
> [~tmueller] suggested 
> [here|https://issues.apache.org/jira/browse/OAK-6246?focusedCommentId=16034442&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16034442]
>  that document order traversal can be faster compared to current mode of path 
> based traversal. Initial tests indicate that such a traversal can be an order of 
> magnitude faster. 
> So this task is meant to implement such an approach and see if it can be a 
> viable indexing mode used for DocumentNodeStore based setups





[jira] [Created] (OAK-7074) Ensure that all Documents are read with document order traversal indexing

2017-12-18 Thread Chetan Mehrotra (JIRA)
Chetan Mehrotra created OAK-7074:


 Summary: Ensure that all Documents are read with document order 
traversal indexing
 Key: OAK-7074
 URL: https://issues.apache.org/jira/browse/OAK-7074
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: mongomk, run
Reporter: Chetan Mehrotra
Assignee: Chetan Mehrotra
 Fix For: 1.8


With OAK-6353, support was added for document order traversal indexing. In this 
mode we open a DB cursor and try to read all documents from it using document 
order traversal. Such a cursor may remain open for a long time (2-4 hrs) and it's 
possible that documents may get reordered by the Mongo storage engine. This 
raises 2 aspects that need to be thought about:

# Duplicate documents - The same document may appear more than once in the result set 
# Possibly missed documents - A document may get moved and miss becoming part 
of the cursor. 

Both these aspects would need to be handled





[jira] [Commented] (OAK-7074) Ensure that all Documents are read with document order traversal indexing

2017-12-18 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16296232#comment-16296232
 ] 

Chetan Mehrotra commented on OAK-7074:
--

With 1818634, sorting now uses distinct mode to avoid duplicates.

[~catholicon] mentioned in an offline discussion that for duplicates we just need 
to ensure NodeStateEntries are unique per path. For a given path it does not matter 
which entry is picked. Further, a document may appear more than once in a cursor 
traversal in one of the following cases:

# Document was updated - If a document gets updated then it may be moved around 
and thus may appear twice in natural order traversal. While sorting we can 
still pick any one of them, as the NodeState view for the checkpoint revision would be 
the same for both Mongo documents. 
# Document was moved due to internal design of Mongo - Mongo may move a document 
around without an update (say due to some compaction process). 
In that case we are not sure about the consistency guarantee of natural order traversal, 
i.e. is it possible that a document may not get reflected in the cursor result at all 
if Mongo is in use?

So based on #1 we just need to ensure that sorting removes any duplicates
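
For illustration, the de-duplication from #1 could look like the sketch below. This is not the code from 1818634. Entries are modelled as hypothetical (path, json) tuples, the function name is made up, and plain string ordering of paths is assumed for brevity, which may differ from the ordering the indexer actually uses.

```python
from itertools import groupby
from operator import itemgetter


def sort_distinct_by_path(entries):
    """Sort (path, node_state) pairs by path and keep one entry per path."""
    ordered = sorted(entries, key=itemgetter(0))
    # groupby only groups consecutive equal keys, so it must run on sorted input.
    # Which duplicate is kept does not matter; here it is the first one seen.
    return [next(group) for _, group in groupby(ordered, key=itemgetter(0))]


entries = [
    ("/content/b", '{"title":"B"}'),
    ("/content/a", '{"title":"A"}'),
    ("/content/b", '{"title":"B"}'),  # same document returned twice by the cursor
]
result = sort_distinct_by_path(entries)
```

Note that this only covers duplicates. A document that never shows up in the cursor (case #2) cannot be detected by the sort step.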

> Ensure that all Documents are read with document order traversal indexing
> -
>
> Key: OAK-7074
> URL: https://issues.apache.org/jira/browse/OAK-7074
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: mongomk, run
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
> Fix For: 1.8
>
>
> With OAK-6353, support was added for document order traversal indexing. In 
> this mode we open a DB cursor and try to read all documents from it using 
> document order traversal. Such a cursor may remain open for a long time (2-4 
> hrs) and it's possible that documents may get reordered by the Mongo storage 
> engine. This raises 2 aspects that need to be thought about: 
> # Duplicate documents - The same document may appear more than once in the result set 
> # Possibly missed documents - A document may get moved and miss becoming 
> part of the cursor. 
> Both these aspects would need to be handled





[jira] [Created] (OAK-7079) Enable oak-run indexing to connect to secondary node in Mongo replica set

2017-12-18 Thread Chetan Mehrotra (JIRA)
Chetan Mehrotra created OAK-7079:


 Summary: Enable oak-run indexing to connect to secondary node in 
Mongo replica set
 Key: OAK-7079
 URL: https://issues.apache.org/jira/browse/OAK-7079
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: mongomk, run
Reporter: Chetan Mehrotra
 Fix For: 1.10


With OAK-6353 support for document order traversal based indexing has been 
added. Currently it connects to Mongo primary. 

We need to test and validate whether it can be made to connect only to a Mongo 
secondary for the 2 cases below
# Pre created checkpoint - Here checkpoint is created already and then oak-run 
*only* connects to Mongo secondary
# Online indexing - Here oak-run would also create checkpoint. However it would 
need to be ensured that when it performs the document order traversal query 
that query is handled by Mongo secondary and oak-run logic ensures that 
secondary node has the created checkpoint





[jira] [Comment Edited] (OAK-6353) Use Document order traversal for reindexing performed on DocumentNodeStore setups

2017-12-18 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16294747#comment-16294747
 ] 

Chetan Mehrotra edited comment on OAK-6353 at 12/19/17 5:41 AM:


There are some aspects which still need to be taken care of. See OAK-7074 and 
OAK-7079


was (Author: chetanm):
There are some aspects which still need to be taken care of. See OAK-7074

> Use Document order traversal for reindexing performed on DocumentNodeStore 
> setups
> -
>
> Key: OAK-6353
> URL: https://issues.apache.org/jira/browse/OAK-6353
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: run
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
> Fix For: 1.7.13, 1.8
>
> Attachments: OAK-6353-v1.patch, OAK-6353-v2.patch
>
>
> [~tmueller] suggested 
> [here|https://issues.apache.org/jira/browse/OAK-6246?focusedCommentId=16034442&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16034442]
>  that document order traversal can be faster compared to current mode of path 
> based traversal. Initial tests indicate that such a traversal can be an order of 
> magnitude faster. 
> So this task is meant to implement such an approach and see if it can be a 
> viable indexing mode used for DocumentNodeStore based setups





[jira] [Commented] (OAK-6353) Use Document order traversal for reindexing performed on DocumentNodeStore setups

2017-12-19 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16297924#comment-16297924
 ] 

Chetan Mehrotra commented on OAK-6353:
--

Some performance numbers for reindexing a repo with 255M Mongo docs, 
66M nodes under /content and 4.2M assets

# Normal NodeStore traversal - 13.66 h

*Document Traversal*

A - Default setup 

# Total time - 3.469 h
## Time in dumping - 2.405 h
## Time in sorting - 39.87 min
###  Batch sorting - 19.13 min
###  Merging - 20.71 min
## Indexing 24 mins
# Space consumed
#* dumped json - 43.6 GB
#* chunked files - 43.6 GB
#* index size - 2.5 GB

{noformat}
2017-12-15 16:48:34 Proceeding to index [/oak:index/damAssetLucene2] upto 
checkpoint head {} 
2017-12-15 19:12:55 Dumped 65472172 nodestates in json format in 2.405 h 
2017-12-15 19:12:55 Compression enabled while sorting : false 
(oak.indexer.useZip) 
2017-12-15 19:12:55 Delete original dump from traversal : true 
(oak.indexer.deleteOriginal) 
2017-12-15 19:12:55 Max heap memory (GB) to be used for merge sort : 3 
(oak.indexer.maxSortMemoryInGB) 
2017-12-15 19:12:57 Sorting with memory 3.2 GB (estimated 12.6 GB) 
2017-12-15 19:32:05 Batch sorting done in 19.13 min with 29 files of size 43.6 
GB to merge 
2017-12-15 19:32:05 Removing the original file temp/flat-file-store/store.json 
2017-12-15 19:52:50 Merging of sorted files completed in 20.71 min 
2017-12-15 19:52:50 Sorting completed in 39.87 min 
2017-12-15 19:52:50 Estimated node count to be traversed for reindexing under / 
is [65472172] 
2017-12-15 20:16:35 Indexing report
- /oak:index/damAssetLucene2*(4407265)
2017-12-15 20:16:43 Indexing completed for indexes [/oak:index/damAssetLucene2] 
in 3.469 h (12488171 ms) 
{noformat}

B - Compression enabled in sorting

# Total time - 3.811 h
## Time in dumping - 2.929 h
## Time in sorting - 29.56 min
###  Batch sorting - 17.67 min
###  Merging - 11.87 min
## Indexing 24 mins
# Space consumed
#* dumped json - 43.6 GB
#* chunked files - 5.5 GB
#* index size - 2.5 GB

{noformat}
2017-12-19 10:56:00  Proceeding to index [/oak:index/damAssetLucene2] upto 
checkpoint head {} 
2017-12-19 13:51:50 oreBuilder - Dumped 65469575 nodestates in json format in 
2.929 h (43.6 GB) 
2017-12-19 13:51:50 oreBuilder - Compression enabled while sorting : true 
(oak.indexer.useZip) 
2017-12-19 13:51:50 oreBuilder - Delete original dump from traversal : true 
(oak.indexer.deleteOriginal) 
2017-12-19 13:51:50 oreBuilder - Max heap memory (GB) to be used for merge sort 
: 3 (oak.indexer.maxSortMemoryInGB) 
2017-12-19 13:51:52 Sorter - Sorting with memory 3.2 GB (estimated 12.6 GB) 
2017-12-19 14:09:32 Sorter - Batch sorting done in 17.67 min with 29 files of 
size 5.5 GB to merge 
2017-12-19 14:09:32 Sorter - Removing the original file 
temp/flat-file-store/store.json 
2017-12-19 14:21:25 Sorter - Merging of sorted files completed in 11.87 min 
2017-12-19 14:21:25 Sorter - Sorting completed in 29.56 min 
2017-12-19 14:21:26 Estimated node count to be traversed for reindexing under / 
is [65469575] 
2017-12-19 14:44:30 Indexing report
- /oak:index/damAssetLucene2*(4407265)
 2017-12-19 14:44:30 Reindexing completed 
2017-12-19 14:44:30 Switched the async lane for indexes at 
[/oak:index/damAssetLucene2] back to there original lanes 
2017-12-19 14:44:39 Indexing completed for indexes [/oak:index/damAssetLucene2] 
in 3.811 h (13718589 ms)
{noformat}
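
To put the two runs side by side, here is a small arithmetic sketch. It only uses the totals and chunked file sizes reported in this comment; the variable names are made up.

```python
normal_traversal_h = 13.66  # normal NodeStore traversal

runs = {
    "A": {"total_h": 3.469, "chunked_gb": 43.6},  # default setup
    "B": {"total_h": 3.811, "chunked_gb": 5.5},   # compression enabled in sorting
}

# Speedup of each document traversal run over the normal traversal
speedup = {name: round(normal_traversal_h / run["total_h"], 2)
           for name, run in runs.items()}

# How much smaller the chunked files get with compression
chunk_ratio = round(runs["A"]["chunked_gb"] / runs["B"]["chunked_gb"], 1)

print(speedup, chunk_ratio)
```

So both runs are roughly 3.6x to 3.9x faster than the normal traversal, and compression shrinks the chunked files by close to 8x.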

> Use Document order traversal for reindexing performed on DocumentNodeStore 
> setups
> -
>
> Key: OAK-6353
> URL: https://issues.apache.org/jira/browse/OAK-6353
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: run
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
> Fix For: 1.7.13, 1.8
>
> Attachments: OAK-6353-v1.patch, OAK-6353-v2.patch
>
>
> [~tmueller] suggested 
> [here|https://issues.apache.org/jira/browse/OAK-6246?focusedCommentId=16034442&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16034442]
>  that document order traversal can be faster compared to current mode of path 
> based traversal. Initial tests indicate that such a traversal can be an order of 
> magnitude faster. 
> So this task is meant to implement such an approach and see if it can be a 
> viable indexing mode used for DocumentNodeStore based setups





[jira] [Created] (OAK-7094) Log cli arguments and vm arguments passed to indexer command

2017-12-20 Thread Chetan Mehrotra (JIRA)
Chetan Mehrotra created OAK-7094:


 Summary: Log cli arguments and vm arguments passed to indexer 
command
 Key: OAK-7094
 URL: https://issues.apache.org/jira/browse/OAK-7094
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: run
Reporter: Chetan Mehrotra
Assignee: Chetan Mehrotra
Priority: Minor
 Fix For: 1.7.14, 1.8


It would be useful to also log the cli arguments to the indexing.log as that 
would help in analysing any customer reported issue





[jira] [Resolved] (OAK-7094) Log cli arguments and vm arguments passed to indexer command

2017-12-20 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra resolved OAK-7094.
--
Resolution: Fixed

Done with 1818746

> Log cli arguments and vm arguments passed to indexer command
> 
>
> Key: OAK-7094
> URL: https://issues.apache.org/jira/browse/OAK-7094
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: run
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
>Priority: Minor
> Fix For: 1.8, 1.7.14
>
>
> It would be useful to also log the cli arguments to the indexing.log as that 
> would help in analysing any customer reported issue





[jira] [Created] (OAK-7095) NodeStoreFixtureProvider should use BlobStore from DocumentNodeStore if no DataStore configured

2017-12-20 Thread Chetan Mehrotra (JIRA)
Chetan Mehrotra created OAK-7095:


 Summary: NodeStoreFixtureProvider should use BlobStore from 
DocumentNodeStore if no DataStore configured
 Key: OAK-7095
 URL: https://issues.apache.org/jira/browse/OAK-7095
 Project: Jackrabbit Oak
  Issue Type: Bug
  Components: run
Reporter: Chetan Mehrotra
Assignee: Chetan Mehrotra
 Fix For: 1.8, 1.7.14


NodeStoreFixtureProvider currently works fine for an explicitly configured 
BlobStore. However, for setups like Mongo where no external DataStore is 
configured and an implicit one is created, that BlobStore is not exposed.

So NodeStoreFixtureProvider should expose such a BlobStore





[jira] [Resolved] (OAK-7095) NodeStoreFixtureProvider should use BlobStore from DocumentNodeStore if no DataStore configured

2017-12-20 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra resolved OAK-7095.
--
Resolution: Fixed

Done with 1818751

> NodeStoreFixtureProvider should use BlobStore from DocumentNodeStore if no 
> DataStore configured
> ---
>
> Key: OAK-7095
> URL: https://issues.apache.org/jira/browse/OAK-7095
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: run
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
> Fix For: 1.8, 1.7.14
>
>
> NodeStoreFixtureProvider currently works fine for an explicitly configured 
> BlobStore. However, for setups like Mongo where no external DataStore is 
> configured and an implicit one is created, that BlobStore is not exposed.
> So NodeStoreFixtureProvider should expose such a BlobStore





[jira] [Created] (OAK-7097) DocumentStoreIndexer should clear the index state prior to indexing

2017-12-20 Thread Chetan Mehrotra (JIRA)
Chetan Mehrotra created OAK-7097:


 Summary: DocumentStoreIndexer should clear the index state prior 
to indexing
 Key: OAK-7097
 URL: https://issues.apache.org/jira/browse/OAK-7097
 Project: Jackrabbit Oak
  Issue Type: Bug
  Components: run
Reporter: Chetan Mehrotra
Assignee: Chetan Mehrotra
 Fix For: 1.8, 1.7.14


DocumentStoreIndexer currently implements part of the logic which is present 
in IndexUpdate. However, it misses 2 things:

# Removing the hidden index state
# Resetting the reindexing flag

Those should be implemented
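
The two missing steps can be sketched roughly as below. This is only a model and not the Oak code: the index definition is represented as a plain dict, the function name is hypothetical, and it assumes hidden nodes are the children whose names start with ":" (the child names used in the example are illustrative).

```python
def prepare_for_reindex(index_definition):
    """Drop hidden child nodes and reset the reindex flag (modifies in place)."""
    children = index_definition.get("children", {})
    # Collect names first so the dict is not modified while iterating over it
    for name in [n for n in children if n.startswith(":")]:
        del children[name]  # hidden index state from an earlier run
    index_definition.setdefault("properties", {})["reindex"] = False
    return index_definition


definition = {
    "properties": {"type": "lucene", "reindex": True},
    "children": {":data": {}, ":status": {}, "indexRules": {}},
}
prepare_for_reindex(definition)
```

After the call only the visible child remains and the reindex flag is false, so the indexer starts from a clean state.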





[jira] [Updated] (OAK-7097) DocumentStoreIndexer should clear the index state prior to indexing

2017-12-20 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra updated OAK-7097:
-
Affects Version/s: 1.7.13

> DocumentStoreIndexer should clear the index state prior to indexing
> ---
>
> Key: OAK-7097
> URL: https://issues.apache.org/jira/browse/OAK-7097
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: run
>Affects Versions: 1.7.13
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
> Fix For: 1.8, 1.7.14
>
>
> DocumentStoreIndexer currently implements part of the logic which is present 
> in IndexUpdate. However, it misses 2 things:
> # Removing the hidden index state
> # Resetting the reindexing flag
> Those should be implemented





[jira] [Created] (OAK-7098) Refcator common logic between IndexUpdate and DocumentStoreIndexer

2017-12-20 Thread Chetan Mehrotra (JIRA)
Chetan Mehrotra created OAK-7098:


 Summary: Refcator common logic between IndexUpdate and 
DocumentStoreIndexer
 Key: OAK-7098
 URL: https://issues.apache.org/jira/browse/OAK-7098
 Project: Jackrabbit Oak
  Issue Type: Task
  Components: indexing, run
Reporter: Chetan Mehrotra
 Fix For: 1.10


DocumentStoreIndexer implements an alternative way of indexing which differs 
from diff based indexing done by IndexUpdate. However, some part of the logic is 
common.

We should refactor and abstract it out so both can share the same logic





[jira] [Updated] (OAK-7098) Refactor common logic between IndexUpdate and DocumentStoreIndexer

2017-12-20 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra updated OAK-7098:
-
Summary: Refactor common logic between IndexUpdate and DocumentStoreIndexer 
 (was: Refcator common logic between IndexUpdate and DocumentStoreIndexer)

> Refactor common logic between IndexUpdate and DocumentStoreIndexer
> --
>
> Key: OAK-7098
> URL: https://issues.apache.org/jira/browse/OAK-7098
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: indexing, run
>Reporter: Chetan Mehrotra
> Fix For: 1.10
>
>
> DocumentStoreIndexer implements an alternative way of indexing which differs 
> from diff based indexing done by IndexUpdate. However, some part of the logic is 
> common.
> We should refactor and abstract it out so both can share the same logic





[jira] [Created] (OAK-7099) DocumentStoreIndexer should log estimate of ETA for dumping phase

2017-12-20 Thread Chetan Mehrotra (JIRA)
Chetan Mehrotra created OAK-7099:


 Summary: DocumentStoreIndexer should log estimate of ETA for 
dumping phase
 Key: OAK-7099
 URL: https://issues.apache.org/jira/browse/OAK-7099
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: run
Reporter: Chetan Mehrotra
Assignee: Chetan Mehrotra
Priority: Minor
 Fix For: 1.8, 1.7.14


DocumentStoreIndexer currently does not log ETA for dumping phase
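
One simple way to derive such an ETA is linear extrapolation from the progress so far. The sketch below assumes the total document count is known up front and the dump rate stays roughly constant; the function and parameter names are illustrative only.

```python
def estimate_eta_seconds(processed, total, elapsed_seconds):
    """Extrapolate the remaining time from the progress made so far."""
    if processed <= 0 or elapsed_seconds <= 0:
        return None  # not enough data to extrapolate yet
    remaining = max(total - processed, 0)
    # remaining / (processed / elapsed), written to avoid an intermediate rate
    return remaining * elapsed_seconds / processed


# 25M of 100M documents dumped after 30 minutes
eta = estimate_eta_seconds(processed=25_000_000, total=100_000_000,
                           elapsed_seconds=1800)
```

With a quarter of the documents done in 30 minutes, the estimate for the remainder is 90 minutes.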





[jira] [Resolved] (OAK-7097) DocumentStoreIndexer should clear the index state prior to indexing

2017-12-20 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra resolved OAK-7097.
--
Resolution: Fixed

Done with 1818758

> DocumentStoreIndexer should clear the index state prior to indexing
> ---
>
> Key: OAK-7097
> URL: https://issues.apache.org/jira/browse/OAK-7097
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: run
>Affects Versions: 1.7.13
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
> Fix For: 1.8, 1.7.14
>
>
> DocumentStoreIndexer currently implements part of the logic which is present 
> in IndexUpdate. However, it misses 2 things:
> # Removing the hidden index state
> # Resetting the reindexing flag
> Those should be implemented





[jira] [Resolved] (OAK-7099) DocumentStoreIndexer should log estimate of ETA for dumping phase

2017-12-20 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra resolved OAK-7099.
--
Resolution: Fixed

Done with 1818768

> DocumentStoreIndexer should log estimate of ETA for dumping phase
> -
>
> Key: OAK-7099
> URL: https://issues.apache.org/jira/browse/OAK-7099
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: run
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
>Priority: Minor
> Fix For: 1.8, 1.7.14
>
>
> DocumentStoreIndexer currently does not log ETA for dumping phase





[jira] [Created] (OAK-7102) Refactor DocumentIndexer logic to enable different sort approaches

2017-12-20 Thread Chetan Mehrotra (JIRA)
Chetan Mehrotra created OAK-7102:


 Summary: Refactor DocumentIndexer logic to enable different sort 
approaches
 Key: OAK-7102
 URL: https://issues.apache.org/jira/browse/OAK-7102
 Project: Jackrabbit Oak
  Issue Type: Task
  Components: run
Reporter: Chetan Mehrotra
Assignee: Chetan Mehrotra
 Fix For: 1.7.14, 1.8


DocumentStoreIndexer logic needs to be refactored to support plugging in 
different sort approaches



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (OAK-7103) Enable compression by default on DocumentStoreIndexer logic

2017-12-20 Thread Chetan Mehrotra (JIRA)
Chetan Mehrotra created OAK-7103:


 Summary: Enable compression by default on DocumentStoreIndexer 
logic
 Key: OAK-7103
 URL: https://issues.apache.org/jira/browse/OAK-7103
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: run
Reporter: Chetan Mehrotra
Assignee: Chetan Mehrotra
 Fix For: 1.7.14, 1.8


Tests indicate that enabling end-to-end compression reduces 
the sorting time by 14 mins (39.87 min to 26.44 min) and disk consumption by 
65GB (87 GB to 12.5 GB). Based on that we should enable compression by default, i.e.

# Create a compressed base store.json written by traversal
# Enable compression for intermediate files created while sorting
# Enable compression for the final sorted json file
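
As a rough illustration of why compressing a line-oriented dump pays off, the sketch below writes the same lines once as plain text and once gzip-compressed, then reads the compressed file back. The file names, the sample entries and the choice of gzip are assumptions made for the example, not details of the indexer.

```python
import gzip
import os
import tempfile

# Highly repetitive sample lines, one entry per line
lines = [f'/content/assets/{i}|{{"jcr:primaryType":"dam:Asset"}}\n'
         for i in range(10_000)]

with tempfile.TemporaryDirectory() as tmp:
    plain = os.path.join(tmp, "store.json")
    zipped = os.path.join(tmp, "store.json.gz")

    with open(plain, "w", encoding="utf-8") as f:
        f.writelines(lines)
    with gzip.open(zipped, "wt", encoding="utf-8") as f:
        f.writelines(lines)

    plain_size = os.path.getsize(plain)
    zipped_size = os.path.getsize(zipped)

    # Reading back is line based as well, so later stages can stream the file
    with gzip.open(zipped, "rt", encoding="utf-8") as f:
        restored = f.readlines()
```

The trade-off is extra CPU for compressing and decompressing against less disk I/O, so the gain depends on how repetitive the dumped data is.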





[jira] [Created] (OAK-7104) Support read and writing to compressed file in ExternalSort

2017-12-20 Thread Chetan Mehrotra (JIRA)
Chetan Mehrotra created OAK-7104:


 Summary: Support read and writing to compressed file in 
ExternalSort
 Key: OAK-7104
 URL: https://issues.apache.org/jira/browse/OAK-7104
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: commons
Reporter: Chetan Mehrotra
Assignee: Chetan Mehrotra
 Fix For: 1.7.14, 1.8


Currently ExternalSort only supports compression for intermediate files created 
in the merge phase. It would be good to also support reading from and writing to 
compressed files
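
A minimal sketch of what reading from and writing to compressed files means for the merge step: sorted chunk files are read as gzip streams and the merged result is written as gzip again. This is not the ExternalSort API; the function name, the file names and the use of gzip are assumptions for the example.

```python
import gzip
import heapq
import os
import tempfile
from contextlib import ExitStack


def merge_sorted_gz(chunk_paths, out_path):
    """k-way merge of sorted, gzip-compressed text files into one gzip file."""
    with ExitStack() as stack:
        readers = [stack.enter_context(gzip.open(p, "rt", encoding="utf-8"))
                   for p in chunk_paths]
        out = stack.enter_context(gzip.open(out_path, "wt", encoding="utf-8"))
        # heapq.merge pulls lines lazily, so no chunk is loaded fully into memory
        out.writelines(heapq.merge(*readers))


with tempfile.TemporaryDirectory() as tmp:
    chunks = []
    for n, batch in enumerate([["/a\n", "/c\n"], ["/b\n", "/d\n"]]):
        path = os.path.join(tmp, f"chunk-{n}.gz")
        with gzip.open(path, "wt", encoding="utf-8") as f:
            f.writelines(batch)
        chunks.append(path)

    merged_path = os.path.join(tmp, "sorted.gz")
    merge_sorted_gz(chunks, merged_path)

    with gzip.open(merged_path, "rt", encoding="utf-8") as f:
        merged = f.read().splitlines()
```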





[jira] [Updated] (OAK-7104) Support read and write to compressed file in ExternalSort

2017-12-20 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra updated OAK-7104:
-
Summary: Support read and write to compressed file in ExternalSort  (was: 
Support read and writing to compressed file in ExternalSort)

> Support read and write to compressed file in ExternalSort
> -
>
> Key: OAK-7104
> URL: https://issues.apache.org/jira/browse/OAK-7104
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: commons
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
> Fix For: 1.7.14, 1.8
>
>
> Currently ExternalSort only supports compression for intermediate files created 
> in the merge phase. It would be good to also support reading from and writing to 
> compressed files





[jira] [Resolved] (OAK-7104) Support read and write to compressed file in ExternalSort

2017-12-20 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra resolved OAK-7104.
--
   Resolution: Fixed
Fix Version/s: (was: 1.7.14)
   1.7.15

Done with http://svn.apache.org/viewvc?rev=1818878&view=rev

[~amjain] Please review the commit once

> Support read and write to compressed file in ExternalSort
> -
>
> Key: OAK-7104
> URL: https://issues.apache.org/jira/browse/OAK-7104
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: commons
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
> Fix For: 1.7.15, 1.8
>
>
> Currently ExternalSort only supports compression for intermediate files created 
> in the merge phase. It would be good to also support reading from and writing to 
> compressed files





[jira] [Resolved] (OAK-7103) Enable compression by default on DocumentStoreIndexer logic

2017-12-20 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra resolved OAK-7103.
--
   Resolution: Fixed
Fix Version/s: (was: 1.7.14)
   1.7.15

Done with 1818879

> Enable compression by default on DocumentStoreIndexer logic
> ---
>
> Key: OAK-7103
> URL: https://issues.apache.org/jira/browse/OAK-7103
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: run
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
> Fix For: 1.8, 1.7.15
>
>
> Tests indicate that enabling end-to-end compression 
> reduces the sorting time by 14 mins (39.87 min to 26.44 min) and disk 
> consumption by 65GB (87 GB to 12.5 GB). Based on that we should enable 
> compression by default, i.e.
> # Create a compressed base store.json written by traversal
> # Enable compression for intermediate files created while sorting
> # Enable compression for the final sorted json file





[jira] [Created] (OAK-7105) Implement a traverse with sort strategy for DocumentStoreIndexer

2017-12-20 Thread Chetan Mehrotra (JIRA)
Chetan Mehrotra created OAK-7105:


 Summary: Implement a traverse with sort strategy for 
DocumentStoreIndexer
 Key: OAK-7105
 URL: https://issues.apache.org/jira/browse/OAK-7105
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: run
Reporter: Chetan Mehrotra
Assignee: Chetan Mehrotra
 Fix For: 1.8, 1.7.15


Currently the DocumentStoreIndexer logic uses a StoreAndSortStrategy in which 
it first dumps all nodestates to a json file -> sorts them in batches -> merges 
the sorted files. Within the whole indexing run the sorting phase takes a 
significant amount of time (40 mins out of a 3 hr run).

Further, this approach can run into OOM when ExternalSort creates in-memory 
batches whose actual size exceeds the estimated size considerably. So we need 
to constantly tweak "oak.indexer.maxSortMemoryInGB" (currently set to 2 GB).

As an improvement we can make the following changes:

# Implement a traverse with sort strategy - Here, instead of first dumping all 
nodestates into a single big json file, we add them to an in-memory buffer and 
then at some stage sort the batch and save it to a file
# Use better memory checks - Use the approach implemented in GCBarrier, i.e. 
monitor the current memory usage and, if available memory goes below a certain 
threshold, trigger the batch sort
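
The proposed flow could be sketched as follows. This is a simplified model and not the indexer code: the class and method names are hypothetical, entries are plain text lines, and a fixed entry count stands in for the memory check (a real implementation would look at available heap instead).

```python
import heapq
import os
import tempfile


class TraverseWithSort:
    """Buffers entries in memory and spills sorted batches to disk."""

    def __init__(self, work_dir, max_buffered=1000):
        self.work_dir = work_dir
        self.max_buffered = max_buffered  # stand-in for a real memory check
        self.buffer = []
        self.batch_files = []

    def add(self, line):
        self.buffer.append(line)
        if len(self.buffer) >= self.max_buffered:
            self._spill()

    def _spill(self):
        """Sort the current buffer and write it out as one batch file."""
        if not self.buffer:
            return
        path = os.path.join(self.work_dir, f"batch-{len(self.batch_files)}.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.writelines(sorted(self.buffer))
        self.batch_files.append(path)
        self.buffer.clear()

    def merged(self):
        """Flush the last batch and return all lines in sorted order."""
        self._spill()
        files = [open(p, encoding="utf-8") for p in self.batch_files]
        try:
            return list(heapq.merge(*files))
        finally:
            for f in files:
                f.close()


with tempfile.TemporaryDirectory() as tmp:
    sorter = TraverseWithSort(tmp, max_buffered=2)
    for path in ["/d", "/a", "/c", "/b", "/e"]:
        sorter.add(path + "\n")
    batches_before_merge = len(sorter.batch_files)
    result = [line.strip() for line in sorter.merged()]
```

Compared to the store-and-sort flow, the single big unsorted dump is never written, so the separate batch sorting pass over it goes away and only the merge remains.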





[jira] [Resolved] (OAK-7102) Refactor DocumentIndexer logic to enable different sort approaches

2017-12-21 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra resolved OAK-7102.
--
Resolution: Fixed

Done with various commits in trunk

> Refactor DocumentIndexer logic to enable different sort approaches
> --
>
> Key: OAK-7102
> URL: https://issues.apache.org/jira/browse/OAK-7102
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: run
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
> Fix For: 1.7.14, 1.8
>
>
> DocumentStoreIndexer logic needs to be refactored to support plugging in 
> different sort approaches





[jira] [Commented] (OAK-7105) Implement a traverse with sort strategy for DocumentStoreIndexer

2017-12-21 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16299800#comment-16299800
 ] 

Chetan Mehrotra commented on OAK-7105:
--

Implemented the above flow with 1818896

> Implement a traverse with sort strategy for DocumentStoreIndexer
> 
>
> Key: OAK-7105
> URL: https://issues.apache.org/jira/browse/OAK-7105
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: run
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
> Fix For: 1.8, 1.7.15
>
>
> Currently the DocumentStoreIndexer logic uses a StoreAndSortStrategy in which 
> it first dumps all nodestates to a json file -> sorts them in batches -> merges 
> the sorted files. Within the whole indexing run the sorting phase takes a 
> significant amount of time (40 mins out of a 3 hr run).
> Further, this approach can run into OOM when ExternalSort creates in-memory 
> batches whose actual size exceeds the estimated size considerably. So we need 
> to constantly tweak "oak.indexer.maxSortMemoryInGB" (currently set to 2 GB).
> As an improvement we can make the following changes:
> # Implement a traverse with sort strategy - Here, instead of first dumping all 
> nodestates into a single big json file, we add them to an in-memory buffer and 
> then at some stage sort the batch and save it to a file
> # Use better memory checks - Use the approach implemented in GCBarrier, i.e. 
> monitor the current memory usage and, if available memory goes below a certain 
> threshold, trigger the batch sort





[jira] [Resolved] (OAK-7105) Implement a traverse with sort strategy for DocumentStoreIndexer

2017-12-21 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra resolved OAK-7105.
--
Resolution: Fixed

> Implement a traverse with sort strategy for DocumentStoreIndexer
> 
>
> Key: OAK-7105
> URL: https://issues.apache.org/jira/browse/OAK-7105
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: run
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
> Fix For: 1.8, 1.7.15
>
>
> Currently the DocumentStoreIndexer logic uses a StoreAndSortStrategy in which 
> it first dumps all nodestates to a json file -> sorts them in batches -> merges 
> the sorted files. Within the whole indexing run the sorting phase takes a 
> significant amount of time (40 mins out of a 3 hr run).
> Further, this approach can run into OOM when ExternalSort creates in-memory 
> batches whose actual size exceeds the estimated size considerably. So we need 
> to constantly tweak "oak.indexer.maxSortMemoryInGB" (currently set to 2 GB).
> As an improvement we can make the following changes:
> # Implement a traverse with sort strategy - Here, instead of first dumping all 
> nodestates into a single big json file, we add them to an in-memory buffer and 
> then at some stage sort the batch and save it to a file
> # Use better memory checks - Use the approach implemented in GCBarrier, i.e. 
> monitor the current memory usage and, if available memory goes below a certain 
> threshold, trigger the batch sort





[jira] [Commented] (OAK-7105) Implement a traverse with sort strategy for DocumentStoreIndexer

2017-12-21 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16299858#comment-16299858
 ] 

Chetan Mehrotra commented on OAK-7105:
--

Switched the default with 1818900

> Implement a traverse with sort strategy for DocumentStoreIndexer
> 
>
> Key: OAK-7105
> URL: https://issues.apache.org/jira/browse/OAK-7105
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: run
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
> Fix For: 1.8, 1.7.15
>
>
> Currently the DocumentStoreIndexer logic uses a StoreAndSortStrategy in which 
> it first dumps all nodestates to a json file -> sorts them in batches -> merges 
> the sorted files. In the whole indexing run the sorting phase takes a 
> significant amount of time (40 mins out of a 3 hr run).
> Further, this approach suffers from potential OOM while ExternalSort creates 
> in memory batches where the actual size of a batch exceeds the estimated size 
> considerably. So we need to constantly tweak the 
> "oak.indexer.maxSortMemoryInGB" (currently set to 2 GB)
> As an improvement we can make the following changes
> # Implement a traverse with sort strategy - Here instead of first dumping all 
> nodestate in a single big json we instead add them to an in memory buffer and 
> then at some stage sort the batch and save it to file
> # Use better memory checks - Use the approach as implemented in GCBarrier 
> i.e. monitor the current memory usage and if it goes below certain threshold 
> trigger the batch sort





[jira] [Created] (OAK-7106) Index Tooling for Oak 1.10

2017-12-21 Thread Chetan Mehrotra (JIRA)
Chetan Mehrotra created OAK-7106:


 Summary: Index Tooling for Oak 1.10
 Key: OAK-7106
 URL: https://issues.apache.org/jira/browse/OAK-7106
 Project: Jackrabbit Oak
  Issue Type: Epic
  Components: indexing, run
Reporter: Chetan Mehrotra
 Fix For: 1.10


Epic to track tooling work for Oak 1.10 release





[jira] [Resolved] (OAK-6460) Index related tooling

2017-12-21 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-6460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra resolved OAK-6460.
--
Resolution: Fixed

> Index related tooling
> -
>
> Key: OAK-6460
> URL: https://issues.apache.org/jira/browse/OAK-6460
> Project: Jackrabbit Oak
>  Issue Type: Epic
>  Components: indexing, run
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
> Fix For: 1.8
>
>
> To enable better management for indexing related operation specially around 
> reindexing indexes on large repository setup we should implement some tooling 
> as part of oak-run. This epic is meant to track all work done in this area





[jira] [Created] (OAK-7115) Compress NodeStateEntry when storing in in memory queue

2018-01-02 Thread Chetan Mehrotra (JIRA)
Chetan Mehrotra created OAK-7115:


 Summary: Compress NodeStateEntry when storing in in memory queue
 Key: OAK-7115
 URL: https://issues.apache.org/jira/browse/OAK-7115
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: run
Reporter: Chetan Mehrotra
Assignee: Chetan Mehrotra
Priority: Minor
 Fix For: 1.10


Currently TraverseWithSortStrategy stores the NodeStateEntry as json text in 
the in-memory queue. We can save memory by storing it as a byte array, 
possibly compressed, which would allow storing more entries in memory before 
sorting and saving them to file
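
A small Java sketch of the byte-array idea, with a hypothetical Entry class standing in for whatever the queue actually holds. For mostly-ASCII JSON, UTF-8 bytes take roughly one byte per character, whereas a String backed by a char array (as on Java 8) takes two; on JVMs with compact strings the gain would be smaller. No compression is attempted here.

```java
import java.nio.charset.StandardCharsets;

public class ByteEntrySketch {

    /** Hypothetical queue entry: the path plus the node state JSON kept as UTF-8 bytes. */
    public static final class Entry {
        private final String path;
        private final byte[] json;

        public Entry(String path, String jsonText) {
            this.path = path;
            this.json = jsonText.getBytes(StandardCharsets.UTF_8);
        }

        public String path() {
            return path;
        }

        /** Size of the stored JSON payload in bytes. */
        public int sizeInBytes() {
            return json.length;
        }

        /** Decodes the JSON text again, e.g. when writing the sorted batch to file. */
        public String jsonText() {
            return new String(json, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) {
        Entry entry = new Entry("/content/foo", "{\"jcr:primaryType\":\"nt:unstructured\"}");
        System.out.println(entry.path() + " uses " + entry.sizeInBytes() + " bytes for its JSON");
        System.out.println(entry.jsonText());
    }
}
```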







[jira] [Updated] (OAK-7115) Compress NodeStateEntry when storing in in-memory queue

2018-01-02 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra updated OAK-7115:
-
Summary: Compress NodeStateEntry when storing in in-memory queue  (was: 
Compress NodeStateEntry when storing in in memory queue)

> Compress NodeStateEntry when storing in in-memory queue
> ---
>
> Key: OAK-7115
> URL: https://issues.apache.org/jira/browse/OAK-7115
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: run
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
>Priority: Minor
> Fix For: 1.10
>
>
> Currently TraverseWithSortStrategy stores the NodeStateEntry as json text in 
> the in-memory queue. We can save memory by storing it as a byte array, 
> possibly compressed, which would allow storing more entries in memory before 
> sorting and saving them to file





[jira] [Created] (OAK-7116) Use JMX mode to reindex on SegmentNodeStore without requiring manual steps

2018-01-02 Thread Chetan Mehrotra (JIRA)
Chetan Mehrotra created OAK-7116:


 Summary: Use JMX mode to reindex on SegmentNodeStore without 
requiring manual steps
 Key: OAK-7116
 URL: https://issues.apache.org/jira/browse/OAK-7116
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: run
Reporter: Chetan Mehrotra
Assignee: Chetan Mehrotra
 Fix For: 1.10


oak-run indexing for SegmentNodeStore currently requires the following steps while 
performing indexing against a repository which is in use [1]

# Create checkpoint via MBean and pass it as part of cli args
# Perform actual indexing with read only access to repo
# Import the index via MBean operation 

As per current documented steps #1 and #3 are manual. This can potentially be 
simplified by directly using JMX operation from within oak-run as currently for 
accessing SegmentNodeStore oak-run needs to run on same host as actual 
application

*JMX Access*

JMX access can be done via following ways

# Application using Oak has JMX remote 
## Enabled and same info provided as cli args
## Not enabled - In such a case we can rely on 
### pid and [local 
connection|https://stackoverflow.com/questions/13252914/how-to-connect-to-a-local-jmx-server-by-knowing-the-process-id]
 
### Or query all running java processes via jmx and check if SegmentNodeStore repo 
path is same as one provided in cli
# Application using Oak

*Proposed Approach*

# Establish the JMX connection
# Create checkpoint using the JMX connection programmatically
# Perform indexing with read only access
# Import the index via JMX access

With this indexing support for SegmentNodeStore would be at par with 
DocumentNodeStore in terms of ease of use
[1] https://jackrabbit.apache.org/oak/docs/query/oak-run-indexing.html
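
The first two steps of the proposed approach can be sketched in Java with the standard javax.management API. Everything Oak-specific here is an assumption: the CheckpointDemoMBean interface, the object name and the createCheckpoint(long) operation are stand-ins registered on the local platform MBean server so that the example runs on its own. The connect method shows the usual remote JMX URL form for the case where JMX remote is enabled; the pid-based local attach variant is not shown.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.StandardMBean;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class JmxCheckpointSketch {

    /** Hypothetical management interface standing in for the application's checkpoint MBean. */
    public interface CheckpointDemoMBean {
        String createCheckpoint(long lifetimeMillis);
    }

    /** Step 1 (remote case): open a connector when JMX remote is enabled on host:port. */
    public static JMXConnector connect(String host, int port) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
        return JMXConnectorFactory.connect(url);
    }

    /** Step 2: invoke the checkpoint operation through any MBeanServerConnection. */
    public static String createCheckpoint(MBeanServerConnection connection,
                                          ObjectName name,
                                          long lifetimeMillis) throws Exception {
        return (String) connection.invoke(
                name,
                "createCheckpoint",
                new Object[] {lifetimeMillis},
                new String[] {"long"});
    }

    public static void main(String[] args) throws Exception {
        // Register the stand-in MBean locally so the sketch is self-contained.
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("example:type=CheckpointDemo");
        CheckpointDemoMBean impl = lifetime -> "checkpoint-valid-for-" + lifetime;
        server.registerMBean(new StandardMBean(impl, CheckpointDemoMBean.class), name);
        try {
            // MBeanServer is itself an MBeanServerConnection, so the same call
            // would work against connect(host, port).getMBeanServerConnection().
            System.out.println(createCheckpoint(server, name, 3_600_000L));
        } finally {
            server.unregisterMBean(name);
        }
    }
}
```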





[jira] [Commented] (OAK-7116) Use JMX mode to reindex on SegmentNodeStore without requiring manual steps

2018-01-02 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16307884#comment-16307884
 ] 

Chetan Mehrotra commented on OAK-7116:
--

[~catholicon] [~tmueller] [~dhasler] Thoughts?

> Use JMX mode to reindex on SegmentNodeStore without requiring manual steps
> --
>
> Key: OAK-7116
> URL: https://issues.apache.org/jira/browse/OAK-7116
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: run
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
> Fix For: 1.10
>
>
> oak-run indexing for SegmentNodeStore currently requires the following steps while 
> performing indexing against a repository which is in use [1]
> # Create checkpoint via MBean and pass it as part of cli args
> # Perform actual indexing with read only access to repo
> # Import the index via MBean operation 
> As per current documented steps #1 and #3 are manual. This can potentially be 
> simplified by directly using JMX operation from within oak-run as currently 
> for accessing SegmentNodeStore oak-run needs to run on same host as actual 
> application
> *JMX Access*
> JMX access can be done via following ways
> # Application using Oak has JMX remote 
> ## Enabled and same info provided as cli args
> ## Not enabled - In such a case we can rely on 
> ### pid and [local 
> connection|https://stackoverflow.com/questions/13252914/how-to-connect-to-a-local-jmx-server-by-knowing-the-process-id]
>  
> ### Or query all running java processes via jmx and check if SegmentNodeStore repo 
> path is same as one provided in cli
> # Application using Oak
> *Proposed Approach*
> # Establish the JMX connection
> # Create checkpoint using the JMX connection programmatically
> # Perform indexing with read only access
> # Import the index via JMX access
> With this indexing support for SegmentNodeStore would be at par with 
> DocumentNodeStore in terms of ease of use
> [1] https://jackrabbit.apache.org/oak/docs/query/oak-run-indexing.html





[jira] [Updated] (OAK-7116) Use JMX mode to reindex on SegmentNodeStore without requiring manual steps

2018-01-02 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra updated OAK-7116:
-
Description: 
oak-run indexing for SegmentNodeStore currently requires the following steps while 
performing indexing against a repository which is in use [1]

# Create checkpoint via MBean and pass it as part of cli args
# Perform actual indexing with read only access to repo
# Import the index via MBean operation 

As per current documented steps #1 and #3 are manual. This can potentially be 
simplified by directly using JMX operation from within oak-run as currently for 
accessing SegmentNodeStore oak-run needs to run on same host as actual 
application

*JMX Access*

JMX access can be done via following ways

# Application using Oak has JMX remote 
## Enabled and same info provided as cli args
## Not enabled - In such a case we can rely on 
### pid and [local 
connection|https://stackoverflow.com/questions/13252914/how-to-connect-to-a-local-jmx-server-by-knowing-the-process-id]
 or [attach|https://github.com/nickman/jmxlocal]
### Or query all running java processes via jmx and check if SegmentNodeStore repo 
path is same as one provided in cli
# Application using Oak

*Proposed Approach*

# Establish the JMX connection
# Create checkpoint using the JMX connection programmatically
# Perform indexing with read only access
# Import the index via JMX access

With this indexing support for SegmentNodeStore would be at par with 
DocumentNodeStore in terms of ease of use
[1] https://jackrabbit.apache.org/oak/docs/query/oak-run-indexing.html

  was:
oak-run indexing for SegmentNodeStore currently requires the following steps while 
performing indexing against a repository which is in use [1]

# Create checkpoint via MBean and pass it as part of cli args
# Perform actual indexing with read only access to repo
# Import the index via MBean operation 

As per current documented steps #1 and #3 are manual. This can potentially be 
simplified by directly using JMX operation from within oak-run as currently for 
accessing SegmentNodeStore oak-run needs to run on same host as actual 
application

*JMX Access*

JMX access can be done via following ways

# Application using Oak has JMX remote 
## Enabled and same info provided as cli args
## Not enabled - In such a case we can rely on 
### pid and [local 
connection|https://stackoverflow.com/questions/13252914/how-to-connect-to-a-local-jmx-server-by-knowing-the-process-id]
 
### Or query all running java processes via jmx and check if SegmentNodeStore repo 
path is same as one provided in cli
# Application using Oak

*Proposed Approach*

# Establish the JMX connection
# Create checkpoint using the JMX connection programmatically
# Perform indexing with read only access
# Import the index via JMX access

With this indexing support for SegmentNodeStore would be at par with 
DocumentNodeStore in terms of ease of use
[1] https://jackrabbit.apache.org/oak/docs/query/oak-run-indexing.html


> Use JMX mode to reindex on SegmentNodeStore without requiring manual steps
> --
>
> Key: OAK-7116
> URL: https://issues.apache.org/jira/browse/OAK-7116
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: run
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
> Fix For: 1.10
>
>
> oak-run indexing for SegmentNodeStore currently requires the following steps while 
> performing indexing against a repository which is in use [1]
> # Create checkpoint via MBean and pass it as part of cli args
> # Perform actual indexing with read only access to repo
> # Import the index via MBean operation 
> As per current documented steps #1 and #3 are manual. This can potentially be 
> simplified by directly using JMX operation from within oak-run as currently 
> for accessing SegmentNodeStore oak-run needs to run on same host as actual 
> application
> *JMX Access*
> JMX access can be done via following ways
> # Application using Oak has JMX remote 
> ## Enabled and same info provided as cli args
> ## Not enabled - In such a case we can rely on 
> ### pid and [local 
> connection|https://stackoverflow.com/questions/13252914/how-to-connect-to-a-local-jmx-server-by-knowing-the-process-id]
>  or [attach|https://github.com/nickman/jmxlocal]
> ### Or query all running java processes via jmx and check if SegmentNodeStore repo 
> path is same as one provided in cli
> # Application using Oak
> *Proposed Approach*
> # Establish the JMX connection
> # Create checkpoint using the JMX connection programmatically
> # Perform indexing with read only access
> # Import the index via JMX access
> With this indexing support for SegmentNodeStore would be at par with 
> DocumentNodeStore in terms of ease of use
> [1] https://jackrabbit.apache.org/oak/docs/query/oak-run-indexing.html




[jira] [Updated] (OAK-7115) Compress NodeStateEntry when storing in in-memory queue

2018-01-02 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra updated OAK-7115:
-
Attachment: OAK-7115-v1.patch

[patch|^OAK-7115-v1.patch] for the same. Perf test under run

> Compress NodeStateEntry when storing in in-memory queue
> ---
>
> Key: OAK-7115
> URL: https://issues.apache.org/jira/browse/OAK-7115
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: run
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
>Priority: Minor
> Fix For: 1.10
>
> Attachments: OAK-7115-v1.patch
>
>
> Currently TraverseWithSortStrategy stores the NodeStateEntry as json text in 
> the in-memory queue. We can save memory by storing it as a byte array, 
> possibly compressed, which would allow storing more entries in memory before 
> sorting and saving them to file





[jira] [Updated] (OAK-7115) Store NodeState json in bytes when storing in in-memory queue

2018-01-03 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra updated OAK-7115:
-
Summary: Store NodeState json in bytes when storing in in-memory queue  
(was: Compress NodeStateEntry when storing in in-memory queue)

> Store NodeState json in bytes when storing in in-memory queue
> -
>
> Key: OAK-7115
> URL: https://issues.apache.org/jira/browse/OAK-7115
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: run
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
>Priority: Minor
> Fix For: 1.10
>
> Attachments: OAK-7115-v1.patch
>
>
> Currently TraverseWithSortStrategy stores the NodeStateEntry as json text in 
> the in-memory queue. We can save memory by storing it as a byte array, 
> possibly compressed, which would allow storing more entries in memory before 
> sorting and saving them to file





[jira] [Resolved] (OAK-7115) Store NodeState json in bytes when storing in in-memory queue

2018-01-03 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra resolved OAK-7115.
--
   Resolution: Fixed
Fix Version/s: (was: 1.10)
   1.7.15
   1.8

Done with 1819936. It just stores the json in bytes but does not perform any 
compression. With this change the dumping time for 65M nodestates reduced from 
2.632h to 2.230h i.e. saving of 24 mins!

> Store NodeState json in bytes when storing in in-memory queue
> -
>
> Key: OAK-7115
> URL: https://issues.apache.org/jira/browse/OAK-7115
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: run
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
>Priority: Minor
> Fix For: 1.8, 1.7.15
>
> Attachments: OAK-7115-v1.patch
>
>
> Currently TraverseWithSortStrategy stores the NodeStateEntry as json text in 
> the in-memory queue. We can save memory by storing it as a byte array, 
> possibly compressed, which would allow storing more entries in memory before 
> sorting and saving them to file





[jira] [Created] (OAK-7122) Implement script to compare lucene indexes logically

2018-01-05 Thread Chetan Mehrotra (JIRA)
Chetan Mehrotra created OAK-7122:


 Summary: Implement script to compare lucene indexes logically
 Key: OAK-7122
 URL: https://issues.apache.org/jira/browse/OAK-7122
 Project: Jackrabbit Oak
  Issue Type: Task
  Components: run
Reporter: Chetan Mehrotra
Assignee: Chetan Mehrotra
 Fix For: 1.8


With Document Traversal based indexing we have implemented a newer indexing 
logic. To validate that the index produced by it is the same as one done by existing 
indexing flow we need to implement a script which can enable comparing the 
index content logically

This was recently discussed on lucene mailing list [1] and suggestion there was 
it can be done by un-inverting the index. So to enable that we need to 
implement a script which can 

# Open a Lucene index
# Map the Lucene Document to path of node
# For each document determine what all fields are associated with it (stored 
and non stored)
# Dump this content to a file sorted by path, with the field names on each 
line sorted by name

Then such dumps can be generated for old and new index and compared via simple 
text diff

[1] http://lucene.markmail.org/thread/wt22gk6aufs4uz55
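
The last step, producing a dump that a plain text diff can compare, can be sketched in Java without touching Lucene. The input map (node path to the set of field names found for that document) and the "path|field1,field2" line format are assumptions for illustration; how the map is filled by un-inverting the index is not shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class IndexDumpSketch {

    /**
     * Renders one line per path, with paths sorted and the field names on each
     * line sorted too, so two dumps of logically equal indexes are identical text.
     */
    public static List<String> dump(Map<String, Set<String>> fieldsByPath) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : new TreeMap<>(fieldsByPath).entrySet()) {
            String fields = String.join(",", new TreeSet<>(entry.getValue()));
            lines.add(entry.getKey() + "|" + fields);
        }
        return lines;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> fieldsByPath = new HashMap<>();
        fieldsByPath.put("/content/b", new HashSet<>(Arrays.asList("title", "author")));
        fieldsByPath.put("/content/a", new HashSet<>(Arrays.asList("jcr:title")));
        for (String line : dump(fieldsByPath)) {
            System.out.println(line);
        }
    }
}
```

Sorting both levels is what makes the comparison logical rather than physical: document order and field order inside the index no longer affect the output.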





[jira] [Commented] (OAK-7122) Implement script to compare lucene indexes logically

2018-01-05 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312866#comment-16312866
 ] 

Chetan Mehrotra commented on OAK-7122:
--

Implemented the script at [1]. Currently it builds up the structure in memory. 
If this proves to be problematic for large indexes we can look into building the 
structure on the file system

[1] 
https://github.com/chetanmeh/oak-console-scripts/tree/master/src/main/groovy/lucene

> Implement script to compare lucene indexes logically
> 
>
> Key: OAK-7122
> URL: https://issues.apache.org/jira/browse/OAK-7122
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: run
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
> Fix For: 1.8
>
>
> With Document Traversal based indexing we have implemented a newer indexing 
> logic. To validate that the index produced by it is the same as one done by 
> existing indexing flow we need to implement a script which can enable 
> comparing the index content logically
> This was recently discussed on lucene mailing list [1] and suggestion there 
> was it can be done by un-inverting the index. So to enable that we need to 
> implement a script which can 
> # Open a Lucene index
> # Map the Lucene Document to path of node
> # For each document determine what all fields are associated with it (stored 
> and non stored)
> # Dump this content to a file sorted by path, with the field names on each 
> line sorted by name
> Then such dumps can be generated for old and new index and compared via 
> simple text diff
> [1] http://lucene.markmail.org/thread/wt22gk6aufs4uz55





[jira] [Created] (OAK-7123) ChildNodeStateProvider does not return all immediate children

2018-01-05 Thread Chetan Mehrotra (JIRA)
Chetan Mehrotra created OAK-7123:


 Summary: ChildNodeStateProvider does not return all immediate 
children
 Key: OAK-7123
 URL: https://issues.apache.org/jira/browse/OAK-7123
 Project: Jackrabbit Oak
  Issue Type: Bug
  Components: run
Affects Versions: 1.7.14
Reporter: Chetan Mehrotra
Assignee: Chetan Mehrotra
 Fix For: 1.8, 1.7.15


Based on script implemented in OAK-7122 and running it against a test index it 
was observed that some of the relative fields were not getting indexed. This 
happens because the ChildNodeStateProvider#children does not handle the 
immediate children check properly. It would fail for case like

{noformat}
/a
/a/b
/a/b/c
/a/d
/a/d/e
{noformat}

Currently it would only report 'b' as a child of 'a', although 'd' is also an 
immediate child. 
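
A Java sketch of an immediate-children check over a sorted list of paths, using the example above. This is not the ChildNodeStateProvider code; the method name and the flat list input are assumptions. The point it illustrates is that deeper descendants such as /a/b/c have to be skipped rather than treated as the end of the children, otherwise /a/d is never reached.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ImmediateChildrenSketch {

    /** Returns the names of the direct children of parent found in sortedPaths. */
    public static List<String> children(List<String> sortedPaths, String parent) {
        String prefix = parent.endsWith("/") ? parent : parent + "/";
        List<String> names = new ArrayList<>();
        for (String path : sortedPaths) {
            if (!path.startsWith(prefix)) {
                continue; // the parent itself or an unrelated path
            }
            String relative = path.substring(prefix.length());
            // Immediate child: exactly one path segment below the parent.
            if (!relative.isEmpty() && relative.indexOf('/') < 0) {
                names.add(relative);
            }
            // Deeper descendants (e.g. /a/b/c) are skipped, not used as a stop signal.
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> paths = Arrays.asList("/a", "/a/b", "/a/b/c", "/a/d", "/a/d/e");
        System.out.println(children(paths, "/a")); // prints [b, d]
    }
}
```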





[jira] [Comment Edited] (OAK-7122) Implement script to compare lucene indexes logically

2018-01-05 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312866#comment-16312866
 ] 

Chetan Mehrotra edited comment on OAK-7122 at 1/5/18 10:24 AM:
---

Implemented the script at [1]. Currently it builds up the structure in memory. 
If this proves to be problematic for large indexes we can look into building the 
structure on the file system

*Usage*

{code}
java -DindexPath=/path/to/indexing-result/indexes/lucene/data \
-jar oak-run-*.jar \
console /path/to/segmentstore \
":load https://raw.githubusercontent.com/chetanmeh/oak-console-scripts/master/src/main/groovy/lucene/luceneIndexDumper.groovy"
{code}

[1] 
https://github.com/chetanmeh/oak-console-scripts/tree/master/src/main/groovy/lucene


was (Author: chetanm):
Implemented the script at [1]. Currently it builds up the structure in memory. 
If this proves to be problematic for large indexes we can look into building the 
structure on the file system

[1] 
https://github.com/chetanmeh/oak-console-scripts/tree/master/src/main/groovy/lucene

> Implement script to compare lucene indexes logically
> 
>
> Key: OAK-7122
> URL: https://issues.apache.org/jira/browse/OAK-7122
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: run
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
> Fix For: 1.8
>
>
> With Document Traversal based indexing we have implemented a newer indexing 
> logic. To validate that the index produced by it is the same as one done by 
> existing indexing flow we need to implement a script which can enable 
> comparing the index content logically
> This was recently discussed on lucene mailing list [1] and suggestion there 
> was it can be done by un-inverting the index. So to enable that we need to 
> implement a script which can 
> # Open a Lucene index
> # Map the Lucene Document to path of node
> # For each document determine what all fields are associated with it (stored 
> and non stored)
> # Dump this content to a file sorted by path, with the field names on each 
> line sorted by name
> Then such dumps can be generated for old and new index and compared via 
> simple text diff
> [1] http://lucene.markmail.org/thread/wt22gk6aufs4uz55





[jira] [Resolved] (OAK-7123) ChildNodeStateProvider does not return all immediate children

2018-01-05 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra resolved OAK-7123.
--
Resolution: Fixed

Done with 1820278

> ChildNodeStateProvider does not return all immediate children
> -
>
> Key: OAK-7123
> URL: https://issues.apache.org/jira/browse/OAK-7123
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: run
>Affects Versions: 1.7.14
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
> Fix For: 1.8, 1.7.15
>
>
> Based on script implemented in OAK-7122 and running it against a test index 
> it was observed that some of the relative fields were not getting indexed. 
> This happens because the ChildNodeStateProvider#children does not handle the 
> immediate children check properly. It would fail for case like
> {noformat}
> /a
> /a/b
> /a/b/c
> /a/d
> /a/d/e
> {noformat}
> Currently it would only report 'b' as a child of 'a', although 'd' is also an 
> immediate child. 





[jira] [Resolved] (OAK-7122) Implement script to compare lucene indexes logically

2018-01-05 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra resolved OAK-7122.
--
Resolution: Done

> Implement script to compare lucene indexes logically
> 
>
> Key: OAK-7122
> URL: https://issues.apache.org/jira/browse/OAK-7122
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: run
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
> Fix For: 1.8
>
>
> With Document Traversal based indexing we have implemented a newer indexing 
> logic. To validate that the index produced by it is the same as one done by 
> existing indexing flow we need to implement a script which can enable 
> comparing the index content logically
> This was recently discussed on lucene mailing list [1] and suggestion there 
> was it can be done by un-inverting the index. So to enable that we need to 
> implement a script which can 
> # Open a Lucene index
> # Map the Lucene Document to path of node
> # For each document determine what all fields are associated with it (stored 
> and non stored)
> # Dump this content to a file sorted by path, with the field names on each 
> line sorted by name
> Then such dumps can be generated for old and new index and compared via 
> simple text diff
> [1] http://lucene.markmail.org/thread/wt22gk6aufs4uz55





[jira] [Created] (OAK-7124) Support MemoryNodeStore with NodeStoreFixtureProvider

2018-01-05 Thread Chetan Mehrotra (JIRA)
Chetan Mehrotra created OAK-7124:


 Summary: Support MemoryNodeStore with NodeStoreFixtureProvider
 Key: OAK-7124
 URL: https://issues.apache.org/jira/browse/OAK-7124
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: run
Reporter: Chetan Mehrotra
Assignee: Chetan Mehrotra
 Fix For: 1.8, 1.7.15


At times we need to use oak-run console to just execute some script (like 
OAK-7122). Currently oak-run console requires access to a working repository. 
To support such cases we should enable use of MemoryNodeStore, so that the 
following command can be used

{noformat}
java -jar oak-run-*.jar console memory
{noformat}

The memory NodeStore can be used to play with NodeStore API. Or this can just 
be used to enable launch of groovy script





[jira] [Resolved] (OAK-7124) Support MemoryNodeStore with NodeStoreFixtureProvider

2018-01-05 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra resolved OAK-7124.
--
Resolution: Fixed

Done with 1820292

> Support MemoryNodeStore with NodeStoreFixtureProvider
> -
>
> Key: OAK-7124
> URL: https://issues.apache.org/jira/browse/OAK-7124
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: run
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
> Fix For: 1.8, 1.7.15
>
>
> At times we need to use oak-run console to just execute some script (like 
> OAK-7122). Currently oak-run console requires access to a working 
> repository. To support such cases we should enable use of MemoryNodeStore, 
> so that the following command can be used
> {noformat}
> java -jar oak-run-*.jar console memory
> {noformat}
> The memory NodeStore can be used to play with NodeStore API. Or this can just 
> be used to enable launch of groovy script





[jira] [Created] (OAK-7147) Oak run LuceneIndexer indexes excluded parent nodes

2018-01-11 Thread Chetan Mehrotra (JIRA)
Chetan Mehrotra created OAK-7147:


 Summary: Oak run LuceneIndexer indexes excluded parent nodes
 Key: OAK-7147
 URL: https://issues.apache.org/jira/browse/OAK-7147
 Project: Jackrabbit Oak
  Issue Type: Bug
  Components: indexing, run
Reporter: Chetan Mehrotra
Assignee: Chetan Mehrotra
Priority: Minor
 Fix For: 1.9.0, 1.10, 1.8.1


{{LuceneIndexer}} currently indexes parent nodes which are not included by 
includedPaths. This happens because the LuceneIndexer#index does not check for 
path filter result and proceeds to index any node handed to it by the 
DocumentStoreIndexer

As a fix it should check if the filter result is PathFilter.Result.INCLUDE
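
The check described above can be sketched in Java as follows. The Result enum is a local stand-in with the three outcomes I understand Oak's PathFilter.Result to have; the filter logic, the shouldIndex helper and the assumption that included paths are non-root absolute paths without a trailing slash are illustrative, not the actual LuceneIndexer code. Ancestors of an included path come back as TRAVERSE, so only an explicit INCLUDE check keeps them out of the index.

```java
import java.util.Arrays;
import java.util.List;

public class PathFilterCheckSketch {

    /** Local stand-in for the filter outcome; not the Oak class itself. */
    public enum Result { INCLUDE, TRAVERSE, EXCLUDE }

    /** Classifies a path against the included paths (assumed non-root, no trailing slash). */
    public static Result filter(String path, List<String> includedPaths) {
        boolean ancestorOfIncluded = false;
        String asPrefix = path.endsWith("/") ? path : path + "/";
        for (String included : includedPaths) {
            if (path.equals(included) || path.startsWith(included + "/")) {
                return Result.INCLUDE;
            }
            if (included.startsWith(asPrefix)) {
                ancestorOfIncluded = true; // must be traversed to reach the included subtree
            }
        }
        return ancestorOfIncluded ? Result.TRAVERSE : Result.EXCLUDE;
    }

    /** The guard the fix calls for: index only when the result is INCLUDE. */
    public static boolean shouldIndex(String path, List<String> includedPaths) {
        return filter(path, includedPaths) == Result.INCLUDE;
    }

    public static void main(String[] args) {
        List<String> included = Arrays.asList("/content/foo");
        for (String path : Arrays.asList("/", "/content", "/content/foo", "/content/foo/bar", "/etc")) {
            System.out.println(path + " -> " + filter(path, included)
                    + ", index=" + shouldIndex(path, included));
        }
    }
}
```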





[jira] [Commented] (OAK-7147) Oak run LuceneIndexer indexes excluded parent nodes

2018-01-11 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16323563#comment-16323563
 ] 

Chetan Mehrotra commented on OAK-7147:
--

Fixed with r1820947

> Oak run LuceneIndexer indexes excluded parent nodes
> ---
>
> Key: OAK-7147
> URL: https://issues.apache.org/jira/browse/OAK-7147
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: indexing, run
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
>Priority: Minor
> Fix For: 1.9.0, 1.10, 1.8.1
>
>
> {{LuceneIndexer}} currently indexes parent nodes which are not included by 
> includedPaths. This happens because the LuceneIndexer#index does not check 
> for path filter result and proceeds to index any node handed to it by the 
> DocumentStoreIndexer
> As a fix it should check if the filter result is PathFilter.Result.INCLUDE





[jira] [Updated] (OAK-7147) Oak run LuceneIndexer indexes excluded parent nodes

2018-01-11 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra updated OAK-7147:
-
Labels: candidate_oak_1_8  (was: )

> Oak run LuceneIndexer indexes excluded parent nodes
> ---
>
> Key: OAK-7147
> URL: https://issues.apache.org/jira/browse/OAK-7147
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: indexing, run
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
>Priority: Minor
>  Labels: candidate_oak_1_8
> Fix For: 1.9.0, 1.10
>
>
> {{LuceneIndexer}} currently indexes parent nodes which are not included by 
> includedPaths. This happens because the LuceneIndexer#index does not check 
> for path filter result and proceeds to index any node handed to it by the 
> DocumentStoreIndexer
> As a fix it should check if the filter result is PathFilter.Result.INCLUDE





[jira] [Updated] (OAK-7147) Oak run LuceneIndexer indexes excluded parent nodes

2018-01-11 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra updated OAK-7147:
-
Fix Version/s: (was: 1.8.1)

> Oak run LuceneIndexer indexes excluded parent nodes
> ---
>
> Key: OAK-7147
> URL: https://issues.apache.org/jira/browse/OAK-7147
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: indexing, run
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
>Priority: Minor
>  Labels: candidate_oak_1_8
> Fix For: 1.9.0, 1.10
>
>
> {{LuceneIndexer}} currently indexes parent nodes which are not included by
> includedPaths. This happens because LuceneIndexer#index does not check
> the path filter result and proceeds to index any node handed to it by the
> DocumentStoreIndexer.
> As a fix, it should check that the filter result is PathFilter.Result.INCLUDE.





[jira] [Resolved] (OAK-7147) Oak run LuceneIndexer indexes excluded parent nodes

2018-01-11 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra resolved OAK-7147.
--
Resolution: Fixed

> Oak run LuceneIndexer indexes excluded parent nodes
> ---
>
> Key: OAK-7147
> URL: https://issues.apache.org/jira/browse/OAK-7147
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: indexing, run
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
>Priority: Minor
>  Labels: candidate_oak_1_8
> Fix For: 1.9.0, 1.10
>
>
> {{LuceneIndexer}} currently indexes parent nodes which are not included by
> includedPaths. This happens because LuceneIndexer#index does not check
> the path filter result and proceeds to index any node handed to it by the
> DocumentStoreIndexer.
> As a fix, it should check that the filter result is PathFilter.Result.INCLUDE.





[jira] [Updated] (OAK-7147) Oak run LuceneIndexer indexes excluded parent nodes

2018-01-11 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra updated OAK-7147:
-
Affects Version/s: 1.8.0

> Oak run LuceneIndexer indexes excluded parent nodes
> ---
>
> Key: OAK-7147
> URL: https://issues.apache.org/jira/browse/OAK-7147
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: indexing, run
>Affects Versions: 1.8.0
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
>Priority: Minor
> Fix For: 1.9.0, 1.10, 1.8.1
>
>
> {{LuceneIndexer}} currently indexes parent nodes which are not included by
> includedPaths. This happens because LuceneIndexer#index does not check
> the path filter result and proceeds to index any node handed to it by the
> DocumentStoreIndexer.
> As a fix, it should check that the filter result is PathFilter.Result.INCLUDE.





[jira] [Updated] (OAK-7147) Oak run LuceneIndexer indexes excluded parent nodes

2018-01-11 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra updated OAK-7147:
-
   Labels:   (was: candidate_oak_1_8)
Fix Version/s: 1.8.1

Merge
* 1.8 - 1820948

> Oak run LuceneIndexer indexes excluded parent nodes
> ---
>
> Key: OAK-7147
> URL: https://issues.apache.org/jira/browse/OAK-7147
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: indexing, run
>Affects Versions: 1.8.0
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
>Priority: Minor
> Fix For: 1.9.0, 1.10, 1.8.1
>
>
> {{LuceneIndexer}} currently indexes parent nodes which are not included by
> includedPaths. This happens because LuceneIndexer#index does not check
> the path filter result and proceeds to index any node handed to it by the
> DocumentStoreIndexer.
> As a fix, it should check that the filter result is PathFilter.Result.INCLUDE.





[jira] [Commented] (OAK-7167) 1.0: oak-lucene uses packages from oak-core that are not exported

2018-01-17 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16328692#comment-16328692
 ] 

Chetan Mehrotra commented on OAK-7167:
--

Looks like we need to backport the test from OAK-2402, which was
introduced for the very same reason!

> 1.0: oak-lucene uses packages from oak-core that are not exported
> -
>
> Key: OAK-7167
> URL: https://issues.apache.org/jira/browse/OAK-7167
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.0.40
>Reporter: Julian Reschke
>Priority: Major
> Fix For: 1.0.41
>
>
> See comments in https://issues.apache.org/jira/browse/OAK-5299.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (OAK-7167) 1.0: oak-lucene uses packages from oak-core that are not exported

2018-01-17 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16328730#comment-16328730
 ] 

Chetan Mehrotra commented on OAK-7167:
--

{quote}just the test?{quote}

Yes

> 1.0: oak-lucene uses packages from oak-core that are not exported
> -
>
> Key: OAK-7167
> URL: https://issues.apache.org/jira/browse/OAK-7167
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.0.40
>Reporter: Julian Reschke
>Priority: Major
> Fix For: 1.0.41
>
>
> See comments in https://issues.apache.org/jira/browse/OAK-5299.





[jira] [Commented] (OAK-6254) DataStore: API to retrieve approximate storage size

2018-01-23 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-6254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16335543#comment-16335543
 ] 

Chetan Mehrotra commented on OAK-6254:
--

bq. The size could be stored in a file, and updated whenever datastore GC is 
run.

It may be better to store the unstructured data in the NodeStore itself, under
a specific node.

> DataStore: API to retrieve approximate storage size
> ---
>
> Key: OAK-6254
> URL: https://issues.apache.org/jira/browse/OAK-6254
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: blob
>Reporter: Thomas Mueller
>Priority: Major
> Fix For: 1.10
>
>
> The estimated size of the datastore (on disk) is needed to:
> * monitor growth over time, or growth caused by certain operations
> * monitor whether garbage collection is effective
> * avoid running out of disk space
> * estimate backup size
> * collect statistics (for example, if there are many repositories, to group
> them by size)
> Datastore size: we could use the following heuristic: read the file
> sizes in ./datastore/00/00 (if it exists) and multiply by 65536; or read
> ./datastore/00 and multiply by 256. That would give a rough estimate
> (within about 20% for repositories with datastore size > 50 GB).
> I think this is mainly important for the FileDataStore. For the S3 datastore,
> if there is a simple and fast S3 API to read the size, then that would be good
> as well, but if there is none, then returning "unknown" is fine for me.
> As for the API, I would use something like this:
> {{long getEstimatedStorageSize(int accuracyLevel)}} with accuracyLevel 1 for
> inaccurate (fastest), 2 for more accurate (slower), ..., 9 for precise
> (possibly very slow). This is similar to
> [java.util.zip.Deflater.setLevel|https://docs.oracle.com/javase/7/docs/api/java/util/zip/Deflater.html#setLevel(int)].
> I would expect it to take up to 1 second for accuracyLevel 1, up to 5 seconds
> for accuracyLevel 2, and possibly hours for level 9. With level 1, I would
> read files in 00/00, with level 2 - 8 I would read files in 00, and with
> level 9 I would read all the files. For level 1, I wouldn't stop; for level
> 2, if it takes more than 5 seconds, I would stop and return the current best
> estimate.
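
A rough sketch of the sampling heuristic proposed in the issue, assuming a FileDataStore-like layout with two levels of two-hex-digit directories under the datastore root. {{DataStoreSizeEstimator}}, {{sizeOfTree}} and the {{UNKNOWN}} value of -1 are hypothetical names chosen for illustration, not an existing Oak API. The 5 second time budget mentioned for level 2 is left out to keep the example short, and falling back from 00/00 to 00 when the former is missing is one possible reading of "if it exists".

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical estimator following the heuristic in the issue text; the class
// name, UNKNOWN constant and fallback behaviour are assumptions.
final class DataStoreSizeEstimator {

    // Returned when no estimate can be made (assumed convention).
    static final long UNKNOWN = -1L;

    private final Path root;

    DataStoreSizeEstimator(Path root) {
        this.root = root;
    }

    // Sums the sizes of all regular files below 'dir'.
    static long sizeOfTree(Path dir) throws IOException {
        try (Stream<Path> files = Files.walk(dir)) {
            return files.filter(Files::isRegularFile).mapToLong(p -> {
                try {
                    return Files.size(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }).sum();
        }
    }

    // accuracyLevel 1: sample 00/00 and multiply by 65536 (256 * 256);
    // levels 2 to 8: sample 00 and multiply by 256;
    // level 9: read every file.
    long getEstimatedStorageSize(int accuracyLevel) throws IOException {
        if (accuracyLevel < 1 || accuracyLevel > 9) {
            throw new IllegalArgumentException("accuracyLevel must be 1..9");
        }
        if (!Files.isDirectory(root)) {
            return UNKNOWN;
        }
        if (accuracyLevel == 9) {
            return sizeOfTree(root);
        }
        Path oneLevel = root.resolve("00");
        Path twoLevels = oneLevel.resolve("00");
        if (accuracyLevel == 1 && Files.isDirectory(twoLevels)) {
            return sizeOfTree(twoLevels) * 65536L;
        }
        if (Files.isDirectory(oneLevel)) {
            return sizeOfTree(oneLevel) * 256L;
        }
        return UNKNOWN;
    }
}
```

The multipliers follow from the directory fan-out: 00 is one of 256 first-level directories and 00/00 is one of 256 * 256 second-level directories, so the sample is scaled on the assumption that blobs are spread evenly across them.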





[jira] [Created] (OAK-7212) Document the document order traversal option

2018-01-28 Thread Chetan Mehrotra (JIRA)
Chetan Mehrotra created OAK-7212:


 Summary: Document the document order traversal option
 Key: OAK-7212
 URL: https://issues.apache.org/jira/browse/OAK-7212
 Project: Jackrabbit Oak
  Issue Type: Documentation
  Components: doc, run
Reporter: Chetan Mehrotra
Assignee: Chetan Mehrotra
 Fix For: 1.8.2


Document the doc-order-traversal option introduced with OAK-6353.




