[jira] [Closed] (OAK-1861) Limit memory usage of DocumentNodeStore.readChildren()

2014-07-22 Thread Marcel Reutegger (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcel Reutegger closed OAK-1861.
-


> Limit memory usage of DocumentNodeStore.readChildren()
> --
>
> Key: OAK-1861
> URL: https://issues.apache.org/jira/browse/OAK-1861
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: core, mongomk
>Reporter: Marcel Reutegger
>Assignee: Marcel Reutegger
> Fix For: 1.0.2, 1.1
>
>
> There is still a TODO in DocumentNodeStore.readChildren() about memory usage. 
> The name offset is already implemented and used when iterating over many 
> child nodes. But there are still cases where the readChildren() method itself 
> may use too much memory. This happens when there are a lot of documents for 
> deleted child nodes. The for loop inside readChildren() doubles the rawLimit 
> until it is able to fetch the requested number of nodes, and each retry starts 
> again with an empty list of children. This should be improved to continue 
> after the last returned document.
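A minimal sketch of the suggested improvement, assuming a hypothetical batch-read interface (readBatch and isDeleted are illustrative stand-ins, not the actual DocumentStore API): instead of doubling the limit and restarting with an empty list, the loop resumes after the last returned document, so memory stays bounded by the batch size.
{code}
import java.util.ArrayList;
import java.util.List;

class ResumableChildReader {

    interface BatchSource {
        /** Returns up to 'limit' document ids sorted after 'fromKey'. */
        List<String> readBatch(String fromKey, int limit);
        /** True if the id belongs to a deleted child node. */
        boolean isDeleted(String id);
    }

    static List<String> readChildren(BatchSource source, int wanted, int batchSize) {
        List<String> children = new ArrayList<>();
        String fromKey = "";                         // resume point, never reset
        while (children.size() < wanted) {
            List<String> batch = source.readBatch(fromKey, batchSize);
            if (batch.isEmpty()) {
                break;                               // no more documents
            }
            for (String id : batch) {
                if (children.size() >= wanted) {
                    break;
                }
                if (!source.isDeleted(id)) {
                    children.add(id);                // skip deleted-node documents
                }
            }
            fromKey = batch.get(batch.size() - 1);   // continue after last returned doc
        }
        return children;
    }
}
{code}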



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Closed] (OAK-1937) Add progress logs to MarkSweepGarbageCollector

2014-07-22 Thread Marcel Reutegger (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-1937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcel Reutegger closed OAK-1937.
-


> Add progress logs to MarkSweepGarbageCollector
> --
>
> Key: OAK-1937
> URL: https://issues.apache.org/jira/browse/OAK-1937
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: segmentmk
>Reporter: Alex Parvulescu
>Assignee: Alex Parvulescu
> Fix For: 1.0.2, 1.1
>
>
> The #mark phase of the GC can be quite long and the logs don't provide any 
> meaningful progress reports. I'd like to add a debug log on each batch save.
> Everything is already in place; this is just about adding a simple one-line 
> debug log.
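A minimal sketch of the kind of one-line debug described above, assuming slf4j (as used by Oak) and an illustrative onBatchSave hook; the field and method names are not the actual MarkSweepGarbageCollector internals.
{code}
import java.util.List;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class MarkPhaseProgress {

    private static final Logger log = LoggerFactory.getLogger(MarkPhaseProgress.class);

    private long total;

    /** Called after each batch of marked references is saved. */
    void onBatchSave(List<String> batch) {
        total += batch.size();
        // the simple one-line debug the issue asks for
        log.debug("Mark phase: saved batch of {} references, {} marked so far",
                batch.size(), total);
    }
}
{code}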



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Closed] (OAK-1462) Aggregate Index isn't usable in an osgi environment

2014-07-22 Thread Marcel Reutegger (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcel Reutegger closed OAK-1462.
-


> Aggregate Index isn't usable in an osgi environment
> ---
>
> Key: OAK-1462
> URL: https://issues.apache.org/jira/browse/OAK-1462
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: core, query
>Reporter: Alex Parvulescu
>Assignee: Chetan Mehrotra
> Fix For: 1.0.2, 1.1
>
>
> There are a few issues with the Aggregate Index in an OSGi env.
>  - First, the cost is the same as the wrapped index. This is not really about 
> OSGi, but about the fact that in an OSGi-enabled environment there will be 2 
> indices returning the same cost (e.g. lucene and aggregate lucene), so some 
> full-text queries may fail randomly.
>  - Second, there are no OSGi annotations on the Aggregate Index, so it is not 
> automatically enabled. I'm not sure how to enable that yet.
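On the second point, a hedged sketch of what registering the aggregate index as an OSGi component could look like, using the Felix SCR annotations Oak used at the time; the class name and the empty body are assumptions, not the actual fix.
{code}
import java.util.Collections;
import java.util.List;

import org.apache.felix.scr.annotations.Component;
import org.apache.felix.scr.annotations.Service;
import org.apache.jackrabbit.oak.spi.query.QueryIndex;
import org.apache.jackrabbit.oak.spi.query.QueryIndexProvider;
import org.apache.jackrabbit.oak.spi.state.NodeState;

@Component
@Service(QueryIndexProvider.class)
public class AggregateIndexProviderService implements QueryIndexProvider {

    @Override
    public List<? extends QueryIndex> getQueryIndexes(NodeState nodeState) {
        // a real implementation would wrap the full-text provider's indexes
        // with AggregateIndex; an empty list keeps this sketch compilable
        return Collections.emptyList();
    }
}
{code}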



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Closed] (OAK-1936) TarMK compaction map check should switch comparison sides

2014-07-22 Thread Marcel Reutegger (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcel Reutegger closed OAK-1936.
-


> TarMK compaction map check should switch comparison sides
> -
>
> Key: OAK-1936
> URL: https://issues.apache.org/jira/browse/OAK-1936
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: segmentmk
>Affects Versions: 1.0.1
>Reporter: Alex Parvulescu
>Assignee: Alex Parvulescu
> Fix For: 1.0.2, 1.1
>
>
> This issue affects the SegmentNodeState#equals call as it makes the 
> compaction map useless on account of not properly identifying compacted 
> states.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Closed] (OAK-1959) AsyncIndexUpdate unable to cope with missing checkpoint ref

2014-07-22 Thread Marcel Reutegger (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcel Reutegger closed OAK-1959.
-


> AsyncIndexUpdate unable to cope with missing checkpoint ref
> ---
>
> Key: OAK-1959
> URL: https://issues.apache.org/jira/browse/OAK-1959
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.0, 1.0.1
>Reporter: Alex Parvulescu
>Assignee: Alex Parvulescu
> Fix For: 1.0.2, 1.1
>
>
> The async index uses a checkpoint reference stored under the _:async_ hidden 
> node as a base for running the index diff.
> It might happen that this reference is stale (pointing to checkpoints that no 
> longer exist) so the async indexer logs a warning that it will reindex 
> everything and will start its work.
> The trouble is with the #mergeWithConcurrencyCheck which does not cope well 
> with this scenario. Even if the ref checkpoint is null, it will throw a 
> concurrent update exception, which is logged as the misleading debug message 
> _Concurrent update detected in the async index update_.
> Overall the code looks stuck in an endless reindexing loop.
> {code}
> *WARN* [pool-9-thread-1] 
> org.apache.jackrabbit.oak.plugins.index.AsyncIndexUpdate Failed to retrieve 
> previously indexed checkpoint 569d8847-ebb6-4832-a55f-2b0b1a32ae71; 
> re-running the initial async index update
> *DEBUG* [pool-9-thread-1] 
> org.apache.jackrabbit.oak.plugins.index.AsyncIndexUpdate Concurrent update 
> detected in the async index update
> {code}
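An illustrative guard for this scenario, not the actual AsyncIndexUpdate code: treat a vanished checkpoint the same as a missing one and fall back to a full reindex, rather than letting the merge fail as a supposed concurrent update.
{code}
class CheckpointGuard {

    interface CheckpointStore {
        /** Returns the checkpoint's root reference, or null if it no longer exists. */
        String retrieve(String checkpointRef);
    }

    static boolean needsFullReindex(CheckpointStore store, String checkpointRef) {
        if (checkpointRef == null) {
            return true;                       // never indexed before
        }
        if (store.retrieve(checkpointRef) == null) {
            // stale reference: rerun the initial update instead of failing
            // the merge with a misleading "concurrent update"
            return true;
        }
        return false;
    }
}
{code}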



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Closed] (OAK-1934) Optimize MutableTree.orderBefore for the common case

2014-07-22 Thread Marcel Reutegger (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcel Reutegger closed OAK-1934.
-


> Optimize MutableTree.orderBefore for the common case
> 
>
> Key: OAK-1934
> URL: https://issues.apache.org/jira/browse/OAK-1934
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: core
>Reporter: Jukka Zitting
>Assignee: Jukka Zitting
> Fix For: 1.0.2, 1.1
>
>
> After OAK-850 and OAK-1584 we settled on an {{orderBefore()}} implementation 
> that always recreates the child order list based on the names of the child 
> nodes that are present in a parent. This is a somewhat expensive operation 
> when there are lots of child nodes, as seen in JCR-3793.
> We could optimize the implementation further for the common case where the 
> child order list is in sync with the actual list of child nodes. For example 
> we could skip recreating the child order list when the name we're looking for 
> is already included in that list. Over time this approach should still detect 
> cases where the list becomes out of sync, and automatically repair the list 
> when that happens.
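A minimal sketch of the fast path described above (illustrative, not the MutableTree implementation): the child order list is rebuilt only when the requested name is missing from it, which covers the common case cheaply and repairs an out-of-sync list when one is detected.
{code}
import java.util.ArrayList;
import java.util.List;

class ChildOrder {

    static List<String> ensureContains(List<String> childOrder,
                                       Iterable<String> actualChildNames,
                                       String name) {
        if (childOrder.contains(name)) {
            return childOrder;                 // common case: list is in sync
        }
        // out of sync: recreate the order list from the actual child nodes
        List<String> rebuilt = new ArrayList<>();
        for (String child : actualChildNames) {
            rebuilt.add(child);
        }
        return rebuilt;
    }
}
{code}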



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Closed] (OAK-1890) Concurrent System Login: slowdown for high concurrency levels

2014-07-22 Thread Marcel Reutegger (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcel Reutegger closed OAK-1890.
-


> Concurrent System Login: slowdown for high concurrency levels
> -
>
> Key: OAK-1890
> URL: https://issues.apache.org/jira/browse/OAK-1890
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: jcr
>Reporter: angela
>Assignee: Michael Dürig
> Fix For: 1.0.2, 1.1
>
>
> output of running the system login/logout test with profiling enabled:
> {quote}
> $ java -Dprofile=true -Xmx2048M org.apache.jackrabbit.oak.run.Main benchmark 
> LoginSystemTest Oak-Tar --concurrency 1,2,4,8,10,15,20,50
> Apache Jackrabbit Oak 1.1-SNAPSHOT
> # LoginSystemTest           C     min     10%     50%     90%     max       N
> Oak-Tar                     1      12      13      19      24      42     266
> Oak-Tar                     2      12      15      20      24      32     496
> Oak-Tar                     4      20      23      30      37      60     660
> Oak-Tar                     8      41      67      75      85      95     532
> Oak-Tar                    10      77      90      96     113    5166     122
> Oak-Tar                    15     109     127    5559    5673    5701      27
> Oak-Tar                    20    5868    5874    5928    5943    5944      20
> Oak-Tar                    50   22116   22133   22151   22157   22162      50
> Profiler: top 5 stack trace(s) of 70414 ms:
> 1865/21120 (8%):
> at 
> org.apache.jackrabbit.stats.RepositoryStatisticsImpl.getOrCreateRecorder(RepositoryStatisticsImpl.java:99)
> at 
> org.apache.jackrabbit.stats.RepositoryStatisticsImpl.getCounter(RepositoryStatisticsImpl.java:80)
> at 
> org.apache.jackrabbit.oak.stats.StatisticManager.getCounter(StatisticManager.java:81)
> at 
> org.apache.jackrabbit.oak.jcr.session.SessionContext.getCounter(SessionContext.java:182)
> at 
> org.apache.jackrabbit.oak.jcr.session.SessionImpl.<init>(SessionImpl.java:89)
> at 
> org.apache.jackrabbit.oak.jcr.session.SessionContext.createSession(SessionContext.java:161)
> at 
> org.apache.jackrabbit.oak.jcr.session.SessionContext.getSession(SessionContext.java:141)
> at 
> org.apache.jackrabbit.oak.jcr.repository.RepositoryImpl.login(RepositoryImpl.java:260)
> at 
> org.apache.jackrabbit.oak.jcr.repository.RepositoryImpl.login(RepositoryImpl.java:195)
> at 
> org.apache.jackrabbit.oak.benchmark.LoginSystemTest$1.run(LoginSystemTest.java:54)
> at 
> org.apache.jackrabbit.oak.benchmark.LoginSystemTest$1.run(LoginSystemTest.java:51)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAsPrivileged(Subject.java:515)
> at 
> org.apache.jackrabbit.oak.benchmark.LoginSystemTest.runTest(LoginSystemTest.java:51)
> at 
> org.apache.jackrabbit.oak.benchmark.AbstractTest.execute(AbstractTest.java:279)
> at 
> org.apache.jackrabbit.oak.benchmark.LoginSystemTest.execute(LoginSystemTest.java:33)
> at 
> org.apache.jackrabbit.oak.benchmark.AbstractTest.execute(AbstractTest.java:288)
> at 
> org.apache.jackrabbit.oak.benchmark.AbstractTest.access$000(AbstractTest.java:42)
> at 
> org.apache.jackrabbit.oak.benchmark.AbstractTest$Executor.run(AbstractTest.java:215)
> 1704/21120 (8%):
> at java.lang.Throwable.fillInStackTrace(Native Method)
> at java.lang.Throwable.<init>(Throwable.java:196)
> at java.lang.Exception.<init>(Exception.java:41)
> at 
> org.apache.jackrabbit.oak.jcr.session.SessionStats.<init>(SessionStats.java:40)
> at 
> org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.<init>(SessionDelegate.java:154)
> at 
> org.apache.jackrabbit.oak.jcr.repository.RepositoryImpl$1.<init>(RepositoryImpl.java:271)
> at 
> org.apache.jackrabbit.oak.jcr.repository.RepositoryImpl.createSessionDelegate(RepositoryImpl.java:269)
> at 
> org.apache.jackrabbit.oak.jcr.repository.RepositoryImpl.login(RepositoryImpl.java:255)
> at 
> org.apache.jackrabbit.oak.jcr.repository.RepositoryImpl.login(RepositoryImpl.java:195)
> at 
> org.apache.jackrabbit.oak.benchmark.LoginSystemTest$1.run(LoginSystemTest.java:54)
> at 
> org.apache.jackrabbit.oak.benchmark.LoginSystemTest$1.run(LoginSystemTest.java:51)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAsPrivileged(Subject.java:515)
> at 
> org.apache.jackrabbit.oak.benchmark.LoginSystemTest.runTest(LoginSystemTest.java:51)
> at 
> org.apache.jackrabbit.oak.benchmark.AbstractTest.execute(AbstractTest.java:279)
> at 
> org.apache.jackrabbit.oak.benchmark.LoginSystemTest.execute(LoginSystemTest.java:33)
> at 
> org.apache.jackrabbit.oak.benchmark.AbstractTest.execute(AbstractTest.java:288)
> at 
> org.apache.jackrabbit.oak.benchmark.AbstractTest.access$000(AbstractTest.java:42)
> at 
>

[jira] [Closed] (OAK-1902) NodeTypeIndex is not conservative enough about its cost

2014-07-22 Thread Marcel Reutegger (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcel Reutegger closed OAK-1902.
-


> NodeTypeIndex is not conservative enough about its cost
> ---
>
> Key: OAK-1902
> URL: https://issues.apache.org/jira/browse/OAK-1902
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: query
>Affects Versions: 1.0, 1.0.1
>Reporter: Justin Edelson
>Assignee: Thomas Mueller
> Fix For: 1.0.2, 1.1
>
> Attachments: OAK-1902-b.patch, OAK-1902-c.patch, OAK-1902.patch
>
>
> NodeTypeIndexProvider derives its cost from PropertyIndexLookup. 
> PropertyIndexLookup has a hardcoded maximum cost (which actually isn't a 
> maximum cost, but is more the maximum number of nodes which will be read 
> during cost calculation).
> IMHO, these maximum costs should not be the same. In my experience with 
> JCR-based applications, the number of matches for a particular node type is 
> far greater than the number of matches for a regular property value.
> As a result, I would suggest that if the maximum cost is reached, a greater 
> penalty should be applied to a node type index than a regular property index.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Closed] (OAK-1917) FileNotFoundException during TarMK GC

2014-07-22 Thread Marcel Reutegger (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcel Reutegger closed OAK-1917.
-


> FileNotFoundException during TarMK GC
> -
>
> Key: OAK-1917
> URL: https://issues.apache.org/jira/browse/OAK-1917
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: core, segmentmk
>Affects Versions: 1.0.1
>Reporter: Jukka Zitting
>Assignee: Jukka Zitting
> Fix For: 1.0.2, 1.1
>
>
> When running garbage collection on a TarMK repository, it's in certain cases 
> possible for the following {{FileNotFoundException}} to occur:
> {noformat}
> java.io.FileNotFoundException: /path/to/dataNNb.tar (No such file or 
> directory)
> at java.io.RandomAccessFile.open(Native Method) ~[na:1.7.0_55]
> at java.io.RandomAccessFile.<init>(RandomAccessFile.java:241) 
> ~[na:1.7.0_55]
> at 
> org.apache.jackrabbit.oak.plugins.segment.file.TarReader.openFirstFileWithValidIndex(TarReader.java:186)
>  [oak-run-1.0.2-SNAPSHOT.jar:1.0.2-SNAPSHOT]
> at 
> org.apache.jackrabbit.oak.plugins.segment.file.TarReader.cleanup(TarReader.java:647)
>  [oak-run-1.0.2-SNAPSHOT.jar:1.0.2-SNAPSHOT]
> at 
> org.apache.jackrabbit.oak.plugins.segment.file.FileStore.flush(FileStore.java:375)
>  [oak-run-1.0.2-SNAPSHOT.jar:1.0.2-SNAPSHOT]
> at 
> org.apache.jackrabbit.oak.plugins.segment.file.FileStore.close(FileStore.java:465)
>  [oak-run-1.0.2-SNAPSHOT.jar:1.0.2-SNAPSHOT]
> at org.apache.jackrabbit.oak.run.Main.compact(Main.java:177) 
> [oak-run-1.0.2-SNAPSHOT.jar:1.0.2-SNAPSHOT]
> at org.apache.jackrabbit.oak.run.Main.main(Main.java:108) 
> [oak-run-1.0.2-SNAPSHOT.jar:1.0.2-SNAPSHOT]
> {noformat}
> I originally assumed this error to be some weird platform issue, based on 
> some online reports about a new file not being available for opening during a 
> brief period after it was created. However, the explanation for this issue is 
> more deterministic:
> If the tar file in question was created with an Oak 0.x version from before 
> OAK-1780, then it wouldn't contain the pre-compiled segment graph 
> information. Due to a slight bug in the OAK-1780 implementation, this would 
> prevent a tar file that's full of garbage from being simply removed. Instead 
> a new, empty tar file would get generated, and due to the lazy writing 
> implemented in OAK-631 that file would actually never get created. Thus the 
> FileNotFoundException.
> To fix this problem, we need to make sure that a tar file that's full of 
> garbage will get cleanly removed even if it doesn't contain a pre-compiled 
> segment graph.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Closed] (OAK-1903) PropertyIndex applies 10x penalty to is not null queries even when the node count is known

2014-07-22 Thread Marcel Reutegger (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-1903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcel Reutegger closed OAK-1903.
-


> PropertyIndex applies 10x penalty to is not null queries even when the node 
> count is known
> --
>
> Key: OAK-1903
> URL: https://issues.apache.org/jira/browse/OAK-1903
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: query
>Affects Versions: 1.0, 1.0.1
>Reporter: Justin Edelson
>Assignee: Thomas Mueller
> Fix For: 1.0.2
>
> Attachments: OAK-1903.patch
>
>
> ContentMirrorStoreStrategy has a bit of code which multiplies the cost of a 
> "is not null" query by 10. This code was added in r1501746, but there's 
> nothing in the related JIRA (OAK-894) which mentions this change.
> It appears to me that this multiplier should only take effect *if* the 
> maximum node count has been reached. Otherwise, this code vastly overstates 
> the cost of a query.
> In a recent test, a property on 45 nodes generated a cost of 452.
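A hedged sketch of the suggested change, with assumed names (the actual ContentMirrorStoreStrategy code differs): apply the 10x penalty only when the count estimate was capped, i.e. when the true node count is unknown.
{code}
class NotNullCost {

    static double cost(long countedNodes, long maxNodeCount, boolean isNotNullQuery) {
        double cost = countedNodes;
        if (isNotNullQuery && countedNodes >= maxNodeCount) {
            cost *= 10;    // penalty only when counting stopped at the cap
        }
        return cost;       // e.g. 45 fully counted nodes stay at cost 45
    }
}
{code}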



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Closed] (OAK-1958) Session.logout performance poor

2014-07-22 Thread Marcel Reutegger (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-1958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcel Reutegger closed OAK-1958.
-


> Session.logout performance poor
> ---
>
> Key: OAK-1958
> URL: https://issues.apache.org/jira/browse/OAK-1958
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: jcr
>Affects Versions: 1.0.1
> Environment: linux jdk1.7 b25 and b55.
>Reporter: Rob Ryan
>Assignee: Michael Dürig
>  Labels: Performance
> Fix For: 1.0.2, 1.1
>
> Attachments: oak-1958.diff, sessionConcurrencyTest.zip
>
>
> Problem:
> Session.logout was observed to take 14% of time in a performance test of a 
> reasonably real-world load.
> Method:
> Use the attached sling junit test case to run 8 concurrent instances of the 
> test. Profile with YourKit or similar and see >50% of the time taken by logout.
> Expected:
> Logout should be practically free.
> Solution:
> The attached patch avoids a bug in guava-15 (still present in guava-17, the 
> latest release) where the former use of addCallback triggered many 
> CancellationExceptions when sessions were quickly created and logged out.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Closed] (OAK-1931) MicroKernel.read() returns negative value

2014-07-22 Thread Marcel Reutegger (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-1931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcel Reutegger closed OAK-1931.
-


> MicroKernel.read() returns negative value
> -
>
> Key: OAK-1931
> URL: https://issues.apache.org/jira/browse/OAK-1931
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: mk
>Affects Versions: 1.0
>Reporter: Michael Dürig
>Assignee: Stefan Guggisberg
> Fix For: 1.0.2, 1.1
>
>
> The contract of {{MicroKernel#read}} states: "This method never returns 
> negative values.". However, AFAICS all of our implementations *do* return -1 
> under certain circumstances, and some test cases (e.g. 
> {{MicroKernelInputStreamTest}}) even rely on this.
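For illustration only (not the MicroKernel implementation): under the stated contract, end-of-blob would have to be signalled by a return value of 0, whereas InputStream-style callers, and per this issue the current implementations and tests, expect -1.
{code}
class ContractRead {

    /** A read that honours the "never returns negative values" contract. */
    static int read(byte[] blob, long pos, byte[] buff, int off, int len) {
        if (pos >= blob.length) {
            return 0;                           // EOF per the stated contract
        }
        int n = (int) Math.min(len, blob.length - pos);
        System.arraycopy(blob, (int) pos, buff, off, n);
        return n;
    }
}
{code}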



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Closed] (OAK-1932) TarMK compaction can create mixed segments

2014-07-22 Thread Marcel Reutegger (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcel Reutegger closed OAK-1932.
-


> TarMK compaction can create mixed segments
> --
>
> Key: OAK-1932
> URL: https://issues.apache.org/jira/browse/OAK-1932
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: core, segmentmk
>Affects Versions: 1.0.1
>Reporter: Jukka Zitting
>Assignee: Jukka Zitting
> Fix For: 1.0.2, 1.1
>
> Attachments: Compactor.java.patch, CompactorTest.java.patch
>
>
> As described in http://markmail.org/message/ujkqdlthudaortxf, commits that 
> occur while the compaction operation is running can make the compacted 
> segments contain references to older data segments, which prevents old data 
> from being reclaimed during cleanup.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Closed] (OAK-1703) Improve warning logged on concurrent Session access

2014-07-22 Thread Marcel Reutegger (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcel Reutegger closed OAK-1703.
-


> Improve warning logged on concurrent Session access
> ---
>
> Key: OAK-1703
> URL: https://issues.apache.org/jira/browse/OAK-1703
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: jcr
>Reporter: Michael Dürig
>Assignee: Michael Dürig
>  Labels: concurrency
> Fix For: 1.0.2, 1.1
>
>
> OAK-1601 introduced warnings that are logged when a session is accessed 
> concurrently from different threads. The modalities however differ from those 
> of Jackrabbit 2. The message 
> {code}
> Attempt to perform "sessionOperation" while another thread is concurrently 
> writing to "session". Blocking until the other thread is finished using this 
> session. Please review your code to avoid concurrent use of a session.
> {code}
> is logged for the current thread
> * in Jackrabbit 2, if the current thread attempts a write operation while 
> another thread already executes a write operation,
> * in Oak, if the current thread attempts a write operation while another 
> thread already executes any operation.
> We should make these warnings identical to those of Jackrabbit 2.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Closed] (OAK-1895) ClassCastException can occur if the TraversalIndex is cheaper than an OrderedIndex (or a different AdvancedQueryIndex impl)

2014-07-22 Thread Marcel Reutegger (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-1895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcel Reutegger closed OAK-1895.
-


> ClassCastException can occur if the TraversalIndex is cheaper than an 
> OrderedIndex (or a different AdvancedQueryIndex impl)
> ---
>
> Key: OAK-1895
> URL: https://issues.apache.org/jira/browse/OAK-1895
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: query
>Reporter: Justin Edelson
>Assignee: Justin Edelson
> Fix For: 1.0.2, 1.1
>
>
> Because the TraversalIndex is added last, the `bestPlan` variable will be 
> non-null if an OrderedIndex was usable for the query. If the TraversalIndex 
> ends up being cheaper, then the `bestIndex` variable is set to the 
> TraversalIndex, but the `bestPlan` remains set to a non-null value.
> Later, in SelectorImpl, the fact that the plan is non-null causes the index 
> to be cast to AdvancedQueryIndex which fails with a ClassCastException.
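An illustrative reconstruction of the fix, with simplified types (not the actual query engine code): bestPlan must be updated together with bestIndex, resetting it to null whenever a plain index such as the TraversalIndex wins.
{code}
import java.util.List;

class IndexSelection {

    interface QueryIndex { double getCost(); }
    interface IndexPlan { }
    interface AdvancedQueryIndex extends QueryIndex { IndexPlan getPlan(); }

    QueryIndex bestIndex;
    IndexPlan bestPlan;

    void select(List<QueryIndex> indexes) {
        double bestCost = Double.POSITIVE_INFINITY;
        for (QueryIndex index : indexes) {
            double cost = index.getCost();
            if (cost < bestCost) {
                bestCost = cost;
                bestIndex = index;
                // keep plan and index in sync: null for a plain index, so the
                // selector never casts a TraversalIndex to AdvancedQueryIndex
                bestPlan = (index instanceof AdvancedQueryIndex)
                        ? ((AdvancedQueryIndex) index).getPlan()
                        : null;
            }
        }
    }
}
{code}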



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Closed] (OAK-1927) TarMK compaction delays journal updates

2014-07-22 Thread Marcel Reutegger (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-1927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcel Reutegger closed OAK-1927.
-


> TarMK compaction delays journal updates
> ---
>
> Key: OAK-1927
> URL: https://issues.apache.org/jira/browse/OAK-1927
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: core, segmentmk
>Affects Versions: 1.0.1
>Reporter: Jukka Zitting
>Assignee: Jukka Zitting
>Priority: Critical
> Fix For: 1.0.2, 1.1
>
>
> The compaction operation gets currently invoked from the TarMK flush thread, 
> which is a bit troublesome as the operation can take a while, during which 
> the flush thread won't be able to persist the latest updates to the journal 
> file.
> To avoid this problem, the compaction operation should be performed in a 
> separate background thread.
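A minimal sketch of the proposed fix, with assumed names (not the actual FileStore code): compaction runs on its own single-threaded executor, so the flush thread keeps persisting journal updates while compaction is in progress.
{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class CompactionScheduler {

    private final ExecutorService compactionExecutor =
            Executors.newSingleThreadExecutor(r -> {
                Thread t = new Thread(r, "TarMK compaction thread");
                t.setDaemon(true);
                return t;
            });

    /** Schedules compaction off the flush thread. */
    void scheduleCompaction(Runnable compaction) {
        compactionExecutor.execute(compaction);
    }

    void shutdown() {
        compactionExecutor.shutdown();
    }
}
{code}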



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Closed] (OAK-1916) NodeStoreKernel doesn't handle array properties correctly

2014-07-22 Thread Marcel Reutegger (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-1916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcel Reutegger closed OAK-1916.
-


> NodeStoreKernel doesn't handle array properties correctly
> -
>
> Key: OAK-1916
> URL: https://issues.apache.org/jira/browse/OAK-1916
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: mk
>Reporter: Michael Dürig
>Assignee: Michael Dürig
> Fix For: 1.0.2
>
> Attachments: OAK-1916.patch
>
>
> {{NodeStoreKernel}} currently only supports array properties of type long. 
> For other types it will fail with an {{IllegalStateException}}. See also the 
> FIXME in the code.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Closed] (OAK-1899) Ordered index fails with old index content

2014-07-22 Thread Marcel Reutegger (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-1899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcel Reutegger closed OAK-1899.
-


> Ordered index fails with old index content
> --
>
> Key: OAK-1899
> URL: https://issues.apache.org/jira/browse/OAK-1899
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: query
>Reporter: Thomas Mueller
>Assignee: Davide Giannella
> Fix For: 1.0.2, 1.1
>
> Attachments: OAK-1899-r2.patch, OAK-1899.patch
>
>
> With the latest changes, the ordered index no longer works with old index 
> data. When running the latest Oak 1.0.2 snapshot against an Oak 1.0.0 
> repository with an existing ordered index, the index fails with the exception 
> below.
> As a workaround, the ordered index can be manually re-built. Either the index 
> re-build needs to be automatic, or the ordered index needs to work with the 
> old index content.
> {noformat}
> java.lang.IndexOutOfBoundsException: index (3) must be less than size (1)
> at 
> com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:306)
> at 
> com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:285)
> at 
> org.apache.jackrabbit.oak.plugins.segment.SegmentPropertyState.getValue(SegmentPropertyState.java:157)
> at 
> org.apache.jackrabbit.oak.plugins.index.property.strategy.OrderedContentMirrorStoreStrategy.getPropertyNext(OrderedContentMirrorStoreStrategy.java:1024)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Closed] (OAK-1898) Query: Incorrect cost calculation for traversal

2014-07-22 Thread Marcel Reutegger (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-1898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcel Reutegger closed OAK-1898.
-


> Query: Incorrect cost calculation for traversal
> ---
>
> Key: OAK-1898
> URL: https://issues.apache.org/jira/browse/OAK-1898
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: query
>Affects Versions: 1.0, 1.0.1
>Reporter: Thomas Mueller
>Assignee: Thomas Mueller
> Fix For: 1.0.2, 1.1
>
> Attachments: [OAK][TraversingIndex] Incorrect cost calculation_.pdf
>
>
> For queries of the following type, the estimated cost of traversal is very 
> low if the number of path elements is high:
> {noformat}
> /jcr:root/path/with/many/elements/in/it//*
> {noformat}
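An illustrative model of the problem, not the actual TraversingIndex formula: if the cost estimate is reduced by a fixed factor per path element, a deep path drives the estimate toward zero even though a descendant query under it may still have to traverse many nodes.
{code}
class TraversalCostModel {

    /** Assumed naive model: each path element divides the estimate by 10. */
    static double naiveCost(long totalNodes, int pathDepth) {
        double cost = totalNodes;
        for (int i = 0; i < pathDepth; i++) {
            cost /= 10;                  // deep paths end up looking "free"
        }
        return Math.max(cost, 1);
    }
}
{code}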



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Closed] (OAK-1894) PropertyIndex only considers the cost of a single indexed property

2014-07-22 Thread Marcel Reutegger (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcel Reutegger closed OAK-1894.
-


> PropertyIndex only considers the cost of a single indexed property
> --
>
> Key: OAK-1894
> URL: https://issues.apache.org/jira/browse/OAK-1894
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: query
>Reporter: Justin Edelson
>Assignee: Thomas Mueller
> Fix For: 1.0.2, 1.1
>
> Attachments: OAK-1894-advanced.patch, OAK-1894-old.patch, 
> OAK-1894.patch
>
>
> The existing PropertyIndex loops through the PropertyRestriction objects in 
> the Filter and essentially only calculates the cost of the first indexed 
> property. This isn't necessarily the first property in the query, since 
> Filter.propertyRestrictions is a HashMap with undefined iteration order.
> More confusingly, the plan for a query with multiple indexed properties 
> outputs *all* indexed properties, even though only the first one is used.
> For queries with multiple indexed properties, the cheapest property index 
> should be used in all three relevant places: when calculating the cost, when 
> executing the query, and when producing the plan.
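A minimal sketch of the suggested behaviour, with a stand-in cost function: pick the cheapest indexed property restriction once and reuse that single choice for the cost, the execution, and the plan output, instead of whatever entry a HashMap iteration happens to yield first.
{code}
import java.util.List;
import java.util.function.ToDoubleFunction;

class CheapestRestriction {

    static <R> R pickCheapest(List<R> restrictions, ToDoubleFunction<R> costOf) {
        R best = null;
        double bestCost = Double.POSITIVE_INFINITY;
        for (R restriction : restrictions) {
            double cost = costOf.applyAsDouble(restriction);
            if (cost < bestCost) {       // compare all, not just the first
                bestCost = cost;
                best = restriction;
            }
        }
        return best;                     // reuse for cost, execution and plan
    }
}
{code}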



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Closed] (OAK-1645) Route find queries to Mongo secondary in MongoDocumentStore

2014-07-22 Thread Marcel Reutegger (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-1645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcel Reutegger closed OAK-1645.
-


> Route find queries to Mongo secondary in MongoDocumentStore
> ---
>
> Key: OAK-1645
> URL: https://issues.apache.org/jira/browse/OAK-1645
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: mongomk
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
> Fix For: 1.0.2, 1.1
>
> Attachments: OAK-1645-1.patch, OAK-1645-2.patch
>
>
> Currently MongoDocumentStore routes all find queries to the primary. In some 
> cases it is possible to route the call to a secondary safely:
> *1. Make use of Max Age*
> The find call takes a maxAge parameter:
> {code}
> find(Collection collection, String key, int maxCacheAge)
> {code}
> If the maxAge is high then it's safe to route the call to a secondary, as the 
> caller explicitly does not require the latest version. This would be especially 
> useful for fetching split documents, as such docs are immutable. So the logic 
> can first check a secondary and, if the document is not found, fall back to 
> the primary.
> *2. Make use of modified time of parent*
> When fetching a path it's possible to check whether the parent exists in the 
> cache. If the parent is present in the cache we can make use of its 
> {{modified}} time. If the modified time is old, it indicates that the subtree 
> under it has also not been modified, so the call for such a child can be 
> routed to a secondary.
> In both cases we need a defined time interval for switching the logic to a 
> secondary call.
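A hedged sketch of idea 1 using the MongoDB Java driver's ReadPreference; the age threshold and the fallback-to-primary shape are assumptions, not the actual MongoDocumentStore change.
{code}
import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import com.mongodb.ReadPreference;

class RoutedFind {

    /** Assumed tuning knob: above this age, stale reads are acceptable. */
    private static final int SECONDARY_OK_AGE_MILLIS = 60 * 1000;

    static DBObject find(DBCollection nodes, String key, int maxCacheAge) {
        DBObject query = new BasicDBObject("_id", key);
        if (maxCacheAge > SECONDARY_OK_AGE_MILLIS) {
            // the caller tolerates stale data: try a secondary first
            DBObject doc = nodes.findOne(query, null, ReadPreference.secondaryPreferred());
            if (doc != null) {
                return doc;
            }
        }
        // not found on a secondary (or freshness required): ask the primary
        return nodes.findOne(query, null, ReadPreference.primary());
    }
}
{code}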



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Closed] (OAK-1897) Stale documents in MongoDocumentStore cache

2014-07-22 Thread Marcel Reutegger (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-1897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcel Reutegger closed OAK-1897.
-


> Stale documents in MongoDocumentStore cache
> ---
>
> Key: OAK-1897
> URL: https://issues.apache.org/jira/browse/OAK-1897
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: core, mongomk
>Affects Versions: 1.0
>Reporter: Marcel Reutegger
>Assignee: Marcel Reutegger
> Fix For: 1.0.2, 1.1
>
>
> MongoDocumentStore may put outdated documents into its cache when multiple 
> documents are read from MongoDB and at the same time one of those documents 
> is updated. This may happen e.g. in {{MongoDocumentStore.query()}}.
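An illustrative sketch of one way to avoid the race, not the actual fix: only replace a cached entry when the incoming copy is at least as new, using a stand-in modCount as the freshness marker.
{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class FreshnessAwareCache {

    static final class Doc {
        final String id;
        final long modCount;             // stand-in freshness marker
        Doc(String id, long modCount) {
            this.id = id;
            this.modCount = modCount;
        }
    }

    private final ConcurrentMap<String, Doc> cache = new ConcurrentHashMap<>();

    /** A stale copy read by a concurrent query can no longer clobber a newer one. */
    void put(Doc incoming) {
        cache.merge(incoming.id, incoming,
                (cached, fresh) -> fresh.modCount >= cached.modCount ? fresh : cached);
    }
}
{code}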



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Closed] (OAK-1807) ConstraintViolationException seen with multiple Oak/Mongo with ConcurrentCreateNodesTest

2014-07-22 Thread Marcel Reutegger (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcel Reutegger closed OAK-1807.
-


> ConstraintViolationException seen with multiple Oak/Mongo with 
> ConcurrentCreateNodesTest
> 
>
> Key: OAK-1807
> URL: https://issues.apache.org/jira/browse/OAK-1807
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: mongomk
>Reporter: Chetan Mehrotra
>Assignee: Marcel Reutegger
>Priority: Minor
>  Labels: concurrency
> Fix For: 1.0.2
>
>
> While running ConcurrentCreateNodesTest with 5 instances writing to the same 
> Mongo instance, the following exception is seen:
> {noformat}
> Exception in thread "Background job 
> org.apache.jackrabbit.oak.benchmark.ConcurrentCreateNodesTest$Writer@3f56e5ed"
>  java.lang.RuntimeException: javax.jcr.nodetype.ConstraintViolationException: 
> OakConstraint0001: /: The primary type rep:root does not exist
> at 
> org.apache.jackrabbit.oak.benchmark.ConcurrentCreateNodesTest$Writer.run(ConcurrentCreateNodesTest.java:111)
> at 
> org.apache.jackrabbit.oak.benchmark.AbstractTest$1.run(AbstractTest.java:481)
> Caused by: javax.jcr.nodetype.ConstraintViolationException: 
> OakConstraint0001: /: The primary type rep:root does not exist
> at 
> org.apache.jackrabbit.oak.api.CommitFailedException.asRepositoryException(CommitFailedException.java:225)
> at 
> org.apache.jackrabbit.oak.api.CommitFailedException.asRepositoryException(CommitFailedException.java:212)
> at 
> org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.newRepositoryException(SessionDelegate.java:679)
> at 
> org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.save(SessionDelegate.java:553)
> at 
> org.apache.jackrabbit.oak.jcr.session.SessionImpl$8.perform(SessionImpl.java:417)
> at 
> org.apache.jackrabbit.oak.jcr.session.SessionImpl$8.perform(SessionImpl.java:414)
> at 
> org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.perform(SessionDelegate.java:308)
> at 
> org.apache.jackrabbit.oak.jcr.session.SessionImpl.perform(SessionImpl.java:127)
> at 
> org.apache.jackrabbit.oak.jcr.session.SessionImpl.save(SessionImpl.java:414)
> at 
> org.apache.jackrabbit.oak.benchmark.ConcurrentCreateNodesTest$Writer.run(ConcurrentCreateNodesTest.java:100)
> ... 1 more
> Caused by: org.apache.jackrabbit.oak.api.CommitFailedException: 
> OakConstraint0001: /: The primary type rep:root does not exist
> at 
> org.apache.jackrabbit.oak.plugins.nodetype.TypeEditor.constraintViolation(TypeEditor.java:150)
> at 
> org.apache.jackrabbit.oak.plugins.nodetype.TypeEditor.getEffectiveType(TypeEditor.java:286)
> at 
> org.apache.jackrabbit.oak.plugins.nodetype.TypeEditor.<init>(TypeEditor.java:101)
> at 
> org.apache.jackrabbit.oak.plugins.nodetype.TypeEditorProvider.getRootEditor(TypeEditorProvider.java:85)
> at 
> org.apache.jackrabbit.oak.spi.commit.CompositeEditorProvider.getRootEditor(CompositeEditorProvider.java:80)
> at 
> org.apache.jackrabbit.oak.spi.commit.EditorHook.processCommit(EditorHook.java:53)
> at 
> org.apache.jackrabbit.oak.spi.commit.CompositeHook.processCommit(CompositeHook.java:60)
> at 
> org.apache.jackrabbit.oak.spi.commit.CompositeHook.processCommit(CompositeHook.java:60)
> at 
> org.apache.jackrabbit.oak.spi.state.AbstractNodeStoreBranch$InMemory.merge(AbstractNodeStoreBranch.java:498)
> at 
> org.apache.jackrabbit.oak.spi.state.AbstractNodeStoreBranch.merge(AbstractNodeStoreBranch.java:300)
> at 
> org.apache.jackrabbit.oak.plugins.document.DocumentNodeStoreBranch.merge(DocumentNodeStoreBranch.java:129)
> at 
> org.apache.jackrabbit.oak.plugins.document.DocumentRootBuilder.merge(DocumentRootBuilder.java:159)
> at 
> org.apache.jackrabbit.oak.plugins.document.DocumentNodeStore.merge(DocumentNodeStore.java:1275)
> at org.apache.jackrabbit.oak.core.MutableRoot.commit(MutableRoot.java:247)
> at 
> org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.commit(SessionDelegate.java:405)
> at 
> org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.save(SessionDelegate.java:551)
> ... 7 more
> {noformat}
> This has been reported by [~rogoz]



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Closed] (OAK-1921) Backup: "Attempt to read external blob" error

2014-07-22 Thread Marcel Reutegger (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-1921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcel Reutegger closed OAK-1921.
-


> Backup: "Attempt to read external blob" error
> -
>
> Key: OAK-1921
> URL: https://issues.apache.org/jira/browse/OAK-1921
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: segmentmk
>Affects Versions: 1.0, 1.0.1, 1.0.2
>Reporter: Thomas Mueller
>Assignee: Alex Parvulescu
> Fix For: 1.0.2, 1.1
>
> Attachments: FileStoreBackup.java.patch, 
> OAK-1921-generic-backup.patch, OAK-1921.patch
>
>
> I tried to backup a segmentstore (with an external BlobStore) using
> {noformat}
> java -mx8g -jar oak-run-1.0.2-SNAPSHOT.jar backup segmentstore s2
> {noformat}
> and got:
> {noformat}
> Attempt to read external blob with blobId
> [c184d2a3f1dbc709004a45ae6c5df7624c2ae653#32768] without specifying BlobStore
>   at 
> org.apache.jackrabbit.oak.plugins.segment.SegmentBlob.getReference(SegmentBlob.java:118)
>   at 
> org.apache.jackrabbit.oak.plugins.segment.SegmentWriter.writeBlob(SegmentWriter.java:706)
>   at 
> org.apache.jackrabbit.oak.plugins.segment.SegmentWriter.writeProperty(SegmentWriter.java:808)
>   at 
> org.apache.jackrabbit.oak.plugins.segment.SegmentWriter.writeProperty(SegmentWriter.java:796)
> {noformat}
> There are two options:
> 1) Adjust the backup code to work like compaction does, i.e. leave
> external blobs as-is and perhaps output a message that informs the
> user about the need to use a different mechanism to back up the
> BlobStore contents.
> 2) Add command line options for configuring the BlobStore to be used
> for accessing external blobs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (OAK-1941) RDB: decide on table layout

2014-07-22 Thread Marcel Reutegger (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070054#comment-14070054
 ] 

Marcel Reutegger commented on OAK-1941:
---

bq. what's the extensibility story with respect to properties added by the 
persistence?

So far we didn't have a need to introduce DocumentStore implementation-specific
data to a NodeDocument. But I think an implementation should be allowed to
store more data than required by the DocumentMK. This means we will have to
change the blacklist approach used on document split into a whitelist one.

> RDB: decide on table layout
> ---
>
> Key: OAK-1941
> URL: https://issues.apache.org/jira/browse/OAK-1941
> Project: Jackrabbit Oak
>  Issue Type: Sub-task
>  Components: rdbmk
>Reporter: Julian Reschke
> Fix For: 1.1
>
> Attachments: OAK-1941-cmodcount.diff
>
>
> The current approach is to serialize the Document using JSON, and then to 
> store either (a) the full JSON in a VARCHAR column, or, if that column isn't 
> wide enough, (b) to store it in a BLOB (optionally gzipped).
> For debugging purposes, the inline VARCHAR always gets populated with the 
> start of the JSON serialization.
> However, with Oracle we are limited to 4000 bytes (which may be way less 
> characters due to non-ASCII overhead), so many document instances will use 
> what was initially thought to be the exception case.
> Questions:
> 1) Do we stick with JSON or do we attempt a different serialization? It might 
> make sense both wrt length and performance. There might also be some code 
> to borrow from the off-heap serialization code.
> 2) Do we get rid of the "dual" strategy, and just always use the BLOB? The 
> indirection might make things more expensive, but then the total column width 
> would drop considerably. -- How can we do good benchmarks on this?
> (This all assumes that we stick with a model where all code is the same 
> between database types, except for the DDL statements; of course it's also 
> conceivable to add more vendor-specific special cases into the Java code.)
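For reference, a hedged sketch of the dual strategy described above as it could look on the write path; the table and column names and the 100-character debug prefix are assumptions, not the actual RDB schema.
{code}
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.Types;
import java.util.zip.GZIPOutputStream;

class DualColumnWriter {

    private static final int INLINE_LIMIT = 4000;   // Oracle's byte limit

    static void insert(Connection c, String id, String json) throws Exception {
        byte[] utf8 = json.getBytes(StandardCharsets.UTF_8);
        try (PreparedStatement ps = c.prepareStatement(
                "insert into NODES (ID, DATA, BDATA) values (?, ?, ?)")) {
            ps.setString(1, id);
            if (utf8.length <= INLINE_LIMIT) {
                ps.setString(2, json);                    // (a) fits inline
                ps.setNull(3, Types.BLOB);
            } else {
                ps.setString(2, json.substring(0, 100));  // debug prefix only
                ps.setBytes(3, gzip(utf8));               // (b) full doc as BLOB
            }
            ps.executeUpdate();
        }
    }

    private static byte[] gzip(byte[] data) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        }
        return bos.toByteArray();
    }
}
{code}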



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (OAK-1941) RDB: decide on table layout

2014-07-22 Thread Julian Reschke (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070082#comment-14070082
 ] 

Julian Reschke commented on OAK-1941:
-

[~mreutegg] _modcount is already an example. The DocumentMK doesn't even know 
what it is.

> RDB: decide on table layout
> ---
>
> Key: OAK-1941
> URL: https://issues.apache.org/jira/browse/OAK-1941
> Project: Jackrabbit Oak
>  Issue Type: Sub-task
>  Components: rdbmk
>Reporter: Julian Reschke
> Fix For: 1.1
>
> Attachments: OAK-1941-cmodcount.diff
>
>
> The current approach is to serialize the Document using JSON, and then to 
> store either (a) the full JSON in a VARCHAR column, or, if that column isn't 
> wide enough, (b) to store it in a BLOB (optionally gzipped).
> For debugging purposes, the inline VARCHAR always gets populated with the 
> start of the JSON serialization.
> However, with Oracle we are limited to 4000 bytes (which may be way less 
> characters due to non-ASCII overhead), so many document instances will use 
> what was initially thought to be the exception case.
> Questions:
> 1) Do we stick with JSON or do we attempt a different serialization? It might 
> make sense both wrt length and performance. There might also be some code 
> to borrow from the off-heap serialization code.
> 2) Do we get rid of the "dual" strategy, and just always use the BLOB? The 
> indirection might make things more expensive, but then the total column width 
> would drop considerably. -- How can we do good benchmarks on this?
> (This all assumes that we stick with a model where all code is the same 
> between database types, except for the DDL statements; of course it's also 
> conceivable to add more vendor-specific special cases into the Java code.)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (OAK-1980) Use index on non-root node

2014-07-22 Thread Marcel Reutegger (JIRA)
Marcel Reutegger created OAK-1980:
-

 Summary: Use index on non-root node
 Key: OAK-1980
 URL: https://issues.apache.org/jira/browse/OAK-1980
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: core
Reporter: Marcel Reutegger
 Fix For: 1.1


Oak is able to maintain indexes at any location in the hierarchy. However, the 
lookup in most index implementations only makes use of an index under the root 
node. There are various TODOs in the code regarding this, e.g. in 
PropertyIndex. Looking up an index along the filter path adds some additional 
cost, but should be within reasonable bounds.
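A minimal sketch of the lookup along the filter path, using the Oak SPI NodeState API; the traversal shape is illustrative, not the actual PropertyIndex change.
{code}
import java.util.ArrayList;
import java.util.List;

import org.apache.jackrabbit.oak.commons.PathUtils;
import org.apache.jackrabbit.oak.spi.state.NodeState;

class IndexNodeLookup {

    /** Collects oak:index nodes from the root down along the filter path. */
    static List<NodeState> collectIndexNodes(NodeState root, String filterPath) {
        List<NodeState> result = new ArrayList<>();
        NodeState node = root;
        result.add(node.getChildNode("oak:index"));      // today's only lookup
        for (String name : PathUtils.elements(filterPath)) {
            node = node.getChildNode(name);
            if (!node.exists()) {
                break;
            }
            result.add(node.getChildNode("oak:index"));  // index on a non-root node
        }
        return result;
    }
}
{code}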



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (OAK-1965) Support for constraints like: foo = 'X' OR bar = 'Y'

2014-07-22 Thread David Gonzalez (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071319#comment-14071319
 ] 

David Gonzalez commented on OAK-1965:
-

[~jukkaz] FYI - after installing this, my instance has been "restarting" for 6+ 
hours. When I enabled DEBUG on the org.jackrabbit logs, I see a wall of... 

{noformat}
22.07.2014 23:08:46.051 *DEBUG* [TarMK compaction thread 
[/xxx/repository/segmentstore], active since Tue Jul 22 02:00:00 EDT 2014, 
previous max duration 0ms] org.apache.jackrabbit.oak.plugins.segment.SegmentId 
Loading segment 4b993a00-2233-4c8c-a9fa-e9e4fab8a665
{noformat}

147 new segment store tar files (and counting) have appeared since the 
installation. This doesn't seem directly related, but it was an unexpected 
side effect of installing this jar.

> Support for constraints like: foo = 'X' OR bar = 'Y'
> 
>
> Key: OAK-1965
> URL: https://issues.apache.org/jira/browse/OAK-1965
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: core, query
>Reporter: Jukka Zitting
>Assignee: Jukka Zitting
> Fix For: 1.1
>
> Attachments: oak-core-1.0.3-OAK-1965-SNAPSHOT.jar
>
>
> Consider the following query statement:
> {noformat}
> SELECT * FROM [nt:base] WHERE [foo] = 'X' OR [bar] = 'Y'
> {noformat}
> Such a query could be fairly efficiently executed against a property index 
> that indexes the values of both "foo" and "bar" properties. However, the 
> query engine doesn't pass such OR constraints down to the index 
> implementations, so we currently can't leverage such an index for this query.
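For illustration, a sketch of why such a query is index-friendly: each disjunct can be answered by a separate property-index lookup and the results unioned; 'lookup' is a stand-in for the index lookup, not an Oak API.
{code}
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.BiFunction;

class OrConstraintUnion {

    /** Union of two property-index lookups, one per OR disjunct. */
    static Set<String> pathsFor(BiFunction<String, String, Set<String>> lookup) {
        Set<String> paths = new LinkedHashSet<>(lookup.apply("foo", "X"));
        paths.addAll(lookup.apply("bar", "Y"));
        return paths;
    }
}
{code}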



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (OAK-1926) UnmergedBranch state growing with empty BranchCommit leading to performance degradation

2014-07-22 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071411#comment-14071411
 ] 

Chetan Mehrotra commented on OAK-1926:
--

The following notes are based on a discussion with [~mreutegg] on this issue:

* DocumentNodeStore needs to keep track of UnmergedBranches to distinguish 
revisions which are part of a branch
* If a process terminates with some pending UnmergedBranches then that branch 
info remains present in the root document revision map and can only be removed 
if we do a garbage collection and remove all commits which were part of those 
branches. Without that we need to maintain the in-memory state
* Loading of unmerged branches was introduced in 
[1461193|http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-mongomk/src/main/java/org/apache/jackrabbit/mongomk/prototype/MongoMK.java?r1=1461193&r2=1461192&pathrev=1461193]
* Currently there are the following problems wrt unmerged branches:
** A - Checking whether a revision is part of a branch is costly - The way the 
check is currently implemented does not distinguish between in-memory alive 
branches and persisted unmerged branches. To simplify the check we distinguish 
between the two types, and for persisted unmerged branches we keep a set of 
such revs and first do a lookup there to confirm a rev is part of an unmerged 
branch before doing the actual check
** B - Tracking of branches which are not merged - An unmerged branch state 
would be persisted in two cases
*** Client did not merge the branch - In this case we can somehow figure out 
that a branch has gone out of scope (possibly via a WeakReference on 
DocumentNodeStoreBranch) and would not be merged. In such a case we know the 
commits done in that branch and can perform a cleanup
*** Oak process had a sudden exit - In this case branch commit info would be 
lost and we would have to resort to GC
** C - Unmerged Rev GC - Once we implement a full GC then such branch state can 
be collected in that GC

For now as part of this bug we would implement #C, as that should reduce the 
performance issue, and later we can go for #A and #B

> UnmergedBranch state growing with empty BranchCommit leading to performance 
> degradation
> ---
>
> Key: OAK-1926
> URL: https://issues.apache.org/jira/browse/OAK-1926
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: mongomk
>Affects Versions: 1.0.1
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
> Fix For: 1.1
>
>
> In some cluster deployment cases it has been seen that the in-memory state of 
> UnmergedBranches contains a large number of empty commits. E.g. in one of 
> the runs there were 750 entries in the UnmergedBranches and each Branch 
> had empty branch commits.
> If there is a large number of UnmergedBranches then read performance degrades, 
> as the logic for determining revision validity currently scans all branches.
> Below is some part of UnmergedBranch state
> {noformat}
> Branch 1
> 1 -> br146d2edb7a7-0-1 (true) (revision: "br146d2edb7a7-0-1", clusterId: 1, 
> time: "2014-06-25 05:08:52.903", branch: true)
> 2 -> br146d2f0450b-0-1 (true) (revision: "br146d2f0450b-0-1", clusterId: 1, 
> time: "2014-06-25 05:11:40.171", branch: true)
> Branch 2
> 1 -> br146d2ef1d08-0-1 (true) (revision: "br146d2ef1d08-0-1", clusterId: 1, 
> time: "2014-06-25 05:10:24.392", branch: true)
> Branch 3
> 1 -> br146d2ed26ca-0-1 (true) (revision: "br146d2ed26ca-0-1", clusterId: 1, 
> time: "2014-06-25 05:08:15.818", branch: true)
> 2 -> br146d2edfd0e-0-1 (true) (revision: "br146d2edfd0e-0-1", clusterId: 1, 
> time: "2014-06-25 05:09:10.670", branch: true)
> Branch 4
> 1 -> br146d2ecd85b-0-1 (true) (revision: "br146d2ecd85b-0-1", clusterId: 1, 
> time: "2014-06-25 05:07:55.739", branch: true)
> Branch 5
> 1 -> br146d2ec21a0-0-1 (true) (revision: "br146d2ec21a0-0-1", clusterId: 1, 
> time: "2014-06-25 05:07:08.960", branch: true)
> 2 -> br146d2ec8eca-0-1 (true) (revision: "br146d2ec8eca-0-1", clusterId: 1, 
> time: "2014-06-25 05:07:36.906", branch: true)
> Branch 6
> 1 -> br146d2eaf159-1-1 (true) (revision: "br146d2eaf159-1-1", clusterId: 1, 
> time: "2014-06-25 05:05:51.065", counter: 1, branch: true)
> Branch 7
> 1 -> br146d2e9a513-0-1 (true) (revision: "br146d2e9a513-0-1", clusterId: 1, 
> time: "2014-06-25 05:04:26.003", branch: true)
> {noformat}
> [~mreutegg] suggested that these branches might be for those revisions which 
> have resulted in a collision, and upon checking it indeed appears to be the 
> case (the value true in brackets above indicates that). Further, given the age 
> of such revisions, it looks like they get populated upon startup itself.
> *Fix*
> * Need to check why we need to populate the UnmergedBranch
> * Possibly implement some purge job which would remove such stale entries

[jira] [Commented] (OAK-1926) UnmergedBranch state growing with empty BranchCommit leading to performance degradation

2014-07-22 Thread Marcel Reutegger (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071420#comment-14071420
 ] 

Marcel Reutegger commented on OAK-1926:
---

bq. only be removed if we do a garbage collection and remove all commits which 
were part of those branches

Wouldn't it be sufficient to just remove the _revisions entries from the root 
document on startup? For readers those commits from branches that were never 
merged will appear as non-committed and will be ignored. 

> UnmergedBranch state growing with empty BranchCommit leading to performance 
> degradation
> ---
>
> Key: OAK-1926
> URL: https://issues.apache.org/jira/browse/OAK-1926
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: mongomk
>Affects Versions: 1.0.1
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
> Fix For: 1.1
>
>
> In some cluster deployment cases it has been seen that the in-memory state of 
> UnmergedBranches contains a large number of empty commits. E.g. in one of 
> the runs there were 750 entries in the UnmergedBranches and each Branch 
> had empty branch commits.
> If there is a large number of UnmergedBranches then read performance degrades, 
> as the logic for determining revision validity currently scans all branches.
> Below is some part of UnmergedBranch state
> {noformat}
> Branch 1
> 1 -> br146d2edb7a7-0-1 (true) (revision: "br146d2edb7a7-0-1", clusterId: 1, 
> time: "2014-06-25 05:08:52.903", branch: true)
> 2 -> br146d2f0450b-0-1 (true) (revision: "br146d2f0450b-0-1", clusterId: 1, 
> time: "2014-06-25 05:11:40.171", branch: true)
> Branch 2
> 1 -> br146d2ef1d08-0-1 (true) (revision: "br146d2ef1d08-0-1", clusterId: 1, 
> time: "2014-06-25 05:10:24.392", branch: true)
> Branch 3
> 1 -> br146d2ed26ca-0-1 (true) (revision: "br146d2ed26ca-0-1", clusterId: 1, 
> time: "2014-06-25 05:08:15.818", branch: true)
> 2 -> br146d2edfd0e-0-1 (true) (revision: "br146d2edfd0e-0-1", clusterId: 1, 
> time: "2014-06-25 05:09:10.670", branch: true)
> Branch 4
> 1 -> br146d2ecd85b-0-1 (true) (revision: "br146d2ecd85b-0-1", clusterId: 1, 
> time: "2014-06-25 05:07:55.739", branch: true)
> Branch 5
> 1 -> br146d2ec21a0-0-1 (true) (revision: "br146d2ec21a0-0-1", clusterId: 1, 
> time: "2014-06-25 05:07:08.960", branch: true)
> 2 -> br146d2ec8eca-0-1 (true) (revision: "br146d2ec8eca-0-1", clusterId: 1, 
> time: "2014-06-25 05:07:36.906", branch: true)
> Branch 6
> 1 -> br146d2eaf159-1-1 (true) (revision: "br146d2eaf159-1-1", clusterId: 1, 
> time: "2014-06-25 05:05:51.065", counter: 1, branch: true)
> Branch 7
> 1 -> br146d2e9a513-0-1 (true) (revision: "br146d2e9a513-0-1", clusterId: 1, 
> time: "2014-06-25 05:04:26.003", branch: true)
> {noformat}
> [~mreutegg] suggested that these branches might be for those revisions which 
> have resulted in a collision, and upon checking it indeed appears to be the 
> case (the value true in brackets above indicates that). Further, given the age 
> of such revisions, it looks like they get populated upon startup itself.
> *Fix*
> * Need to check why we need to populate the UnmergedBranch
> * Possibly implement some purge job which would remove such stale entries



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (OAK-1981) Implement full scale Revision GC for DocumentNodeStore

2014-07-22 Thread Chetan Mehrotra (JIRA)
Chetan Mehrotra created OAK-1981:


 Summary: Implement full scale Revision GC for DocumentNodeStore
 Key: OAK-1981
 URL: https://issues.apache.org/jira/browse/OAK-1981
 Project: Jackrabbit Oak
  Issue Type: New Feature
  Components: mongomk
Reporter: Chetan Mehrotra


So far we have implemented garbage collection in some form with OAK-1341. Those 
approaches help us remove quite a bit of garbage (mostly due to deleted nodes), 
but some part is still left.

However, a full GC is still not performed, due to which some of the old 
revision-related data cannot be GCed, like:
* Revision info present in revision maps of various commit roots
* Revisions related to unmerged branches (OAK-1926)
* Revision data created due to a property being modified by different cluster 
nodes

So having a tool which can perform the above GC would be helpful. For a start 
we can have an implementation which takes a brute-force approach and scans the 
whole repo (which would take quite a bit of time) and later we can evolve it, 
or allow system admins to determine to what level GC has to be done.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (OAK-1926) UnmergedBranch state growing with empty BranchCommit leading to performance degradation

2014-07-22 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071411#comment-14071411
 ] 

Chetan Mehrotra edited comment on OAK-1926 at 7/23/14 6:54 AM:
---

The following notes are based on a discussion with [~mreutegg] on this issue:

* DocumentNodeStore needs to keep track of UnmergedBranches to distinguish 
revisions which are part of a branch
* If a process terminates with some pending UnmergedBranches then that branch 
info remains present in the root document revision map and can only be removed 
if we do a garbage collection and remove all commits which were part of those 
branches. Without that we need to maintain the in-memory state
* Loading of unmerged branches was introduced in 
[1461193|http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-mongomk/src/main/java/org/apache/jackrabbit/mongomk/prototype/MongoMK.java?r1=1461193&r2=1461192&pathrev=1461193]
* Currently there are the following problems wrt unmerged branches:
** A - Checking whether a revision is part of a branch is costly - The way the 
check is currently implemented does not distinguish between in-memory alive 
branches and persisted unmerged branches. To simplify the check we distinguish 
between the two types, and for persisted unmerged branches we keep a set of 
such revs and first do a lookup there to confirm a rev is part of an unmerged 
branch before doing the actual check
** B - Tracking of branches which are not merged - An unmerged branch state 
would be persisted in two cases
*** Client did not merge the branch - In this case we can somehow figure out 
that a branch has gone out of scope (possibly via a WeakReference on 
DocumentNodeStoreBranch) and would not be merged. In such a case we know the 
commits done in that branch and can perform a cleanup
*** Oak process had a sudden exit - In this case branch commit info would be 
lost and we would have to resort to GC
** C - Unmerged Rev GC (OAK-1981) - Once we implement a full GC then such 
branch state can be collected in that GC

For now as part of this bug we would implement #C, as that should reduce the 
performance issue, and later we can go for #A and #B


was (Author: chetanm):
The following notes are based on a discussion with [~mreutegg] on this issue:

* DocumentNodeStore needs to keep track of UnmergedBranches to distinguish 
revisions which are part of a branch
* If a process terminates with some pending UnmergedBranches then that branch 
info remains present in the root document revision map and can only be removed 
if we do a garbage collection and remove all commits which were part of those 
branches. Without that we need to maintain the in-memory state
* Loading of unmerged branches was introduced in 
[1461193|http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-mongomk/src/main/java/org/apache/jackrabbit/mongomk/prototype/MongoMK.java?r1=1461193&r2=1461192&pathrev=1461193]
* Currently there are the following problems wrt unmerged branches:
** A - Checking whether a revision is part of a branch is costly - The way the 
check is currently implemented does not distinguish between in-memory alive 
branches and persisted unmerged branches. To simplify the check we distinguish 
between the two types, and for persisted unmerged branches we keep a set of 
such revs and first do a lookup there to confirm a rev is part of an unmerged 
branch before doing the actual check
** B - Tracking of branches which are not merged - An unmerged branch state 
would be persisted in two cases
*** Client did not merge the branch - In this case we can somehow figure out 
that a branch has gone out of scope (possibly via a WeakReference on 
DocumentNodeStoreBranch) and would not be merged. In such a case we know the 
commits done in that branch and can perform a cleanup
*** Oak process had a sudden exit - In this case branch commit info would be 
lost and we would have to resort to GC
** C - Unmerged Rev GC - Once we implement a full GC then such branch state can 
be collected in that GC

For now as part of this bug we would implement #C, as that should reduce the 
performance issue, and later we can go for #A and #B

> UnmergedBranch state growing with empty BranchCommit leading to performance 
> degradation
> ---
>
> Key: OAK-1926
> URL: https://issues.apache.org/jira/browse/OAK-1926
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: mongomk
>Affects Versions: 1.0.1
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
> Fix For: 1.1
>
>
> In some cluster deployment cases it has been seen that the in-memory state of 
> UnmergedBranches contains a large number of empty commits. E.g. in one of 
> the runs there were 750 entries in the UnmergedBranches and each Branch 
> had empty branch commits.
> If there are large number of UnmergedBranch