[jira] [Updated] (OAK-3071) Add a compound index for _modified + _id
[ https://issues.apache.org/jira/browse/OAK-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Marth updated OAK-3071:
-------------------------------
Labels: performance resilience (was: )

Add a compound index for _modified + _id

Key: OAK-3071
URL: https://issues.apache.org/jira/browse/OAK-3071
Project: Jackrabbit Oak
Issue Type: Improvement
Components: mongomk
Reporter: Chetan Mehrotra
Assignee: Chetan Mehrotra
Labels: performance, resilience
Fix For: 1.3.5

As explained in OAK-1966, the diff logic makes a call like

bq. db.nodes.find({ _id: { $gt: "3:/content/foo/01/", $lt: "3:/content/foo010" }, _modified: { $gte: 1405085300 } }).sort({_id:1})

For better and more deterministic query performance we need to create a compound index like \{_modified:1, _id:1\}. This index ensures that Mongo does not have to perform an object scan while evaluating such a query.

Care must be taken that the index is only created by default for fresh setups. For existing setups we should expose a JMX operation which a system administrator can invoke to create the required index during a maintenance window.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
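To illustrate the query shape the proposed \{_modified:1, _id:1\} index has to serve, here is a minimal sketch (not Oak code) that models the diff query from OAK-1966 over an in-memory document list; the document values are made up for illustration.

```python
# Sketch: the diff query filters an _id range AND a _modified lower bound,
# then sorts by _id. A compound index on (_modified, _id) lets MongoDB
# answer this filter from the index instead of scanning whole documents.
def diff_query(docs, id_gt, id_lt, modified_gte):
    """Return ids of docs in the (id_gt, id_lt) range modified since modified_gte."""
    matches = [d for d in docs
               if id_gt < d["_id"] < id_lt and d["_modified"] >= modified_gte]
    return sorted(d["_id"] for d in matches)

docs = [
    {"_id": "3:/content/foo/01/a", "_modified": 1405085400},  # matches
    {"_id": "3:/content/foo/01/b", "_modified": 1405085100},  # too old
    {"_id": "3:/content/bar/x",    "_modified": 1405085500},  # outside id range
]
result = diff_query(docs, "3:/content/foo/01/", "3:/content/foo010", 1405085300)
# result == ["3:/content/foo/01/a"]
```

Note that the `$lt` bound works because Oak's `_id` keys sort lexicographically, so "3:/content/foo010" is an upper bound for all children of /content/foo/01.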
[jira] [Updated] (OAK-2779) DocumentNodeStore should provide option to set initial cache size as percentage of MAX VM size
[ https://issues.apache.org/jira/browse/OAK-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Marth updated OAK-2779:
-------------------------------
Labels: performance resilience (was: )

DocumentNodeStore should provide option to set initial cache size as percentage of MAX VM size

Key: OAK-2779
URL: https://issues.apache.org/jira/browse/OAK-2779
Project: Jackrabbit Oak
Issue Type: Improvement
Components: mongomk
Affects Versions: 1.2
Reporter: Will McGauley
Labels: performance, resilience
Fix For: 1.3.5

Currently the DocumentNodeStore provides a way to configure various cache parameters, including the overall cache size and the distribution of that size across the various caches. The distribution is expressed as a percentage of the total cache size, which is very helpful, but the overall cache size itself can only be set as a literal value. It would be helpful to derive a good default from the available VM memory as a percentage instead of a literal value. That way the cache size would not need to be set by each customer, and a better initial experience would be achieved. I suggest that 25% of the max VM size would be a good starting point.
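A minimal sketch of the proposed default, assuming a hypothetical helper name; the 25% figure is the starting point suggested in the issue.

```python
# Sketch (assumed helper, not Oak code): derive the DocumentNodeStore cache
# size as a percentage of the maximum VM heap instead of a fixed literal.
def cache_size_from_heap(max_heap_bytes, percentage=25):
    """Default cache size as a percentage of the max VM size (25% suggested)."""
    if not 0 < percentage <= 100:
        raise ValueError("percentage must be in (0, 100]")
    return max_heap_bytes * percentage // 100

# e.g. a 4 GB heap yields a 1 GB cache with the suggested 25% default
size = cache_size_from_heap(4 * 1024**3)
```

In Java the heap bound would come from something like Runtime.getRuntime().maxMemory(); a literal override would still take precedence when configured.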
[jira] [Updated] (OAK-1575) DocumentNS: Implement refined conflict resolution for addExistingNode conflicts
[ https://issues.apache.org/jira/browse/OAK-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Marth updated OAK-1575:
-------------------------------
Labels: resilience (was: )

DocumentNS: Implement refined conflict resolution for addExistingNode conflicts

Key: OAK-1575
URL: https://issues.apache.org/jira/browse/OAK-1575
Project: Jackrabbit Oak
Issue Type: Sub-task
Components: mongomk
Reporter: Michael Dürig
Assignee: Marcel Reutegger
Labels: resilience
Fix For: 1.4

Implement refined conflict resolution for addExistingNode conflicts as defined in the parent issue for the document NS.
[jira] [Updated] (OAK-2242) provide a way to update the created timestamp of a NodeDocument
[ https://issues.apache.org/jira/browse/OAK-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Marth updated OAK-2242:
-------------------------------
Labels: performance (was: )

provide a way to update the created timestamp of a NodeDocument

Key: OAK-2242
URL: https://issues.apache.org/jira/browse/OAK-2242
Project: Jackrabbit Oak
Issue Type: Improvement
Components: mongomk
Affects Versions: 1.1.1
Reporter: Julian Reschke
Assignee: Julian Reschke
Labels: performance
Fix For: 1.4

Both the MongoDocumentStore and the RDBDocumentStore maintain a _modCount property, which uniquely identifies a version of a document in the persistence. Sometimes we read data from the persistence even though we might already have the document cached. This happens:

a) when the cached document is older than what the caller asked for
b) when running a query (for instance when looking up the children of a node)

In both cases we currently replace the cache entry with a newly built NodeDocument. It would make sense to re-use the existing document instead. (This would probably require modifying the created timestamp, but it would avoid the trouble of having to update the cache at all.)
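The re-use decision described above hinges on _modCount identifying a document version. A minimal sketch, with a hypothetical helper name, of how the cache could keep the existing instance when the freshly read copy is the same version:

```python
# Sketch (hypothetical helper, not Oak code): prefer the cached NodeDocument
# when _modCount shows the persisted document has not changed.
def pick_cache_entry(cached, fresh):
    """Return the document to keep in cache."""
    if cached is not None and cached["_modCount"] == fresh["_modCount"]:
        return cached   # same version: re-use the existing object
    return fresh        # newer version: replace the cache entry

old   = {"_id": "1:/foo", "_modCount": 7}
same  = {"_id": "1:/foo", "_modCount": 7}
newer = {"_id": "1:/foo", "_modCount": 8}
```

Keeping the identical object (rather than an equal copy) is what makes updating the created timestamp, instead of swapping cache entries, attractive.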
[jira] [Updated] (OAK-1557) Mark documents as deleted
[ https://issues.apache.org/jira/browse/OAK-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Marth updated OAK-1557:
-------------------------------
Fix Version/s: (was: 1.4) 1.3.6

Mark documents as deleted

Key: OAK-1557
URL: https://issues.apache.org/jira/browse/OAK-1557
Project: Jackrabbit Oak
Issue Type: Improvement
Components: mongomk
Reporter: Marcel Reutegger
Assignee: Chetan Mehrotra
Labels: performance, resilience
Fix For: 1.3.6

This is an improvement to make a certain use case more efficient. When there is a parent node with frequently added and removed child nodes, reading the current list of child nodes becomes inefficient, because the decision whether a node exists at a certain revision is made in the DocumentNodeStore and no filtering is done on the MongoDB side. So far we figured this would be solved automatically by the MVCC garbage collection, when documents for deleted nodes are removed. However, for locations in the repository where nodes are added and deleted again frequently (think of a temp folder), the issue pops up before the GC has had a chance to clean up. The Document should have an additional field, which is set when the node is deleted in the most recent revision. Based on this field the DocumentNodeStore can limit the query to MongoDB to documents that are not deleted.
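A minimal sketch of the proposed server-side filter; the issue does not name the new field, so "_deleted" here is an assumption for illustration only.

```python
# Sketch: with an extra field marking documents whose node is deleted in the
# most recent revision, the child query can drop deleted documents on the
# MongoDB side instead of in the DocumentNodeStore. Field name assumed.
def live_children(docs):
    """Keep only documents not flagged as deleted in their newest revision."""
    return [d for d in docs if not d.get("_deleted", False)]

docs = [
    {"_id": "2:/tmp/a", "_deleted": True},   # removed again, GC pending
    {"_id": "2:/tmp/b"},                     # still live
]
live = live_children(docs)
```

In MongoDB terms the equivalent filter would be an extra clause like { _deleted: { $ne: true } } on the child query.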
[jira] [Updated] (OAK-2242) provide a way to update the created timestamp of a NodeDocument
[ https://issues.apache.org/jira/browse/OAK-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Marth updated OAK-2242:
-------------------------------
Fix Version/s: (was: 1.4) 1.3.7

provide a way to update the created timestamp of a NodeDocument

Key: OAK-2242
URL: https://issues.apache.org/jira/browse/OAK-2242
Project: Jackrabbit Oak
Issue Type: Improvement
Components: mongomk
Affects Versions: 1.1.1
Reporter: Julian Reschke
Assignee: Julian Reschke
Labels: performance
Fix For: 1.3.7

Both the MongoDocumentStore and the RDBDocumentStore maintain a _modCount property, which uniquely identifies a version of a document in the persistence. Sometimes we read data from the persistence even though we might already have the document cached. This happens:

a) when the cached document is older than what the caller asked for
b) when running a query (for instance when looking up the children of a node)

In both cases we currently replace the cache entry with a newly built NodeDocument. It would make sense to re-use the existing document instead. (This would probably require modifying the created timestamp, but it would avoid the trouble of having to update the cache at all.)
[jira] [Updated] (OAK-2621) Too many reads for child nodes
[ https://issues.apache.org/jira/browse/OAK-2621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Marth updated OAK-2621:
-------------------------------
Fix Version/s: (was: 1.3.5) 1.3.7

Too many reads for child nodes

Key: OAK-2621
URL: https://issues.apache.org/jira/browse/OAK-2621
Project: Jackrabbit Oak
Issue Type: Improvement
Components: mongomk
Affects Versions: 1.0
Reporter: Marcel Reutegger
Labels: performance
Fix For: 1.3.7

The DocumentNodeStore issues a lot of reads when sibling nodes are deleted which are also indexed with a property index. The following calls become a hotspot:

{noformat}
at org.apache.jackrabbit.oak.plugins.document.mongo.MongoDocumentStore.query(MongoDocumentStore.java:406)
at org.apache.jackrabbit.oak.plugins.document.DocumentNodeStore.readChildDocs(DocumentNodeStore.java:846)
at org.apache.jackrabbit.oak.plugins.document.DocumentNodeStore.readChildren(DocumentNodeStore.java:788)
at org.apache.jackrabbit.oak.plugins.document.DocumentNodeStore.getChildren(DocumentNodeStore.java:753)
at org.apache.jackrabbit.oak.plugins.document.DocumentNodeState.getChildNodeCount(DocumentNodeState.java:194)
at org.apache.jackrabbit.oak.plugins.memory.ModifiedNodeState.getChildNodeCount(ModifiedNodeState.java:198)
at org.apache.jackrabbit.oak.plugins.memory.MutableNodeState.getChildNodeCount(MutableNodeState.java:265)
at org.apache.jackrabbit.oak.plugins.memory.MemoryNodeBuilder.getChildNodeCount(MemoryNodeBuilder.java:293)
at org.apache.jackrabbit.oak.plugins.index.property.strategy.ContentMirrorStoreStrategy.prune(ContentMirrorStoreStrategy.java:456)
{noformat}

I think the code triggering this issue is in {{ModifiedNodeState.getChildNodeCount()}}. It keeps track of already deleted children and requests {{max += deleted}}. The actual {{max}} is always 1 as requested from {{ContentMirrorStoreStrategy.prune()}}, but as more nodes get deleted, a higher {{max}} gets passed to {{DocumentNodeState.getChildNodeCount()}}. The DocumentNodeStore then checks whether it has the children in the cache, only to find out the cache entry has too few entries and it needs to fetch one more. It would be best to have a minimum number of child nodes to fetch from MongoDB in this case, e.g. when NodeState.getChildNodeEntries() is called, the DocumentNodeState fetches 100 children.
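The proposed fix amounts to rounding the fetch limit up to a minimum batch. A minimal sketch under that assumption; the names and the 100-child minimum are taken from the issue text, the helper itself is hypothetical:

```python
# Sketch: instead of growing the child fetch limit one entry at a time
# (max += deleted), round the requested limit up to a minimum batch, so the
# cache entry rarely turns out one entry too short on the next call.
MIN_CHILD_BATCH = 100   # minimum children to fetch per backend query

def child_fetch_limit(requested):
    """Fetch at least MIN_CHILD_BATCH children from MongoDB per query."""
    return max(requested, MIN_CHILD_BATCH)
```

With this, the first prune() call already populates a 100-entry children cache entry, and subsequent calls with max 2, 3, 4, ... are served from the cache.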
[jira] [Updated] (OAK-3071) Add a compound index for _modified + _id
[ https://issues.apache.org/jira/browse/OAK-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Marth updated OAK-3071:
-------------------------------
Fix Version/s: (was: 1.3.5) 1.3.7

Add a compound index for _modified + _id

Key: OAK-3071
URL: https://issues.apache.org/jira/browse/OAK-3071
Project: Jackrabbit Oak
Issue Type: Improvement
Components: mongomk
Reporter: Chetan Mehrotra
Assignee: Chetan Mehrotra
Labels: performance, resilience
Fix For: 1.3.7

As explained in OAK-1966, the diff logic makes a call like

bq. db.nodes.find({ _id: { $gt: "3:/content/foo/01/", $lt: "3:/content/foo010" }, _modified: { $gte: 1405085300 } }).sort({_id:1})

For better and more deterministic query performance we need to create a compound index like \{_modified:1, _id:1\}. This index ensures that Mongo does not have to perform an object scan while evaluating such a query.

Care must be taken that the index is only created by default for fresh setups. For existing setups we should expose a JMX operation which a system administrator can invoke to create the required index during a maintenance window.
[jira] [Updated] (OAK-2492) Flag Document having many children
[ https://issues.apache.org/jira/browse/OAK-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Marth updated OAK-2492:
-------------------------------
Fix Version/s: (was: 1.4) 1.3.7

Flag Document having many children

Key: OAK-2492
URL: https://issues.apache.org/jira/browse/OAK-2492
Project: Jackrabbit Oak
Issue Type: Improvement
Components: mongomk
Reporter: Chetan Mehrotra
Assignee: Chetan Mehrotra
Labels: performance
Fix For: 1.3.7

The current DocumentMK logic for diffing child nodes works as follows:

# Get the children for the _before_ revision, up to MANY_CHILDREN_THRESHOLD (which defaults to 50). Note that the current logic for fetching child nodes also adds the children's {{NodeDocument}}s to the {{Document}} cache and reads the complete Document for each child.
# Get the children for the _after_ revision with the same limit.
# If the child list is complete, do a direct diff on the fetched children.
# If the list is not complete, i.e. there are more children than the threshold, fall back to a query-based diff (also see OAK-1970).

So in cases where the number of children is large, all the work done in #1 above is wasted and should be avoided. To do that we can mark parent nodes which have many children with a special flag like {{_manyChildren}}. Once such nodes are marked, the diff logic can check the flag and skip the work done in #1.

This is similar to the way we mark nodes which have at least one child (OAK-1117).
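The proposed shortcut can be sketched as a tiny strategy choice; the {{_manyChildren}} flag name comes from the issue, the function itself is illustrative:

```python
# Sketch of the proposed diff shortcut: a parent flagged as having many
# children skips the bounded child fetch (step 1) and goes straight to the
# query-based diff.
MANY_CHILDREN_THRESHOLD = 50   # default bound from the issue

def diff_strategy(parent_doc):
    """Choose how to diff the children of parent_doc."""
    if parent_doc.get("_manyChildren", False):
        return "query-based-diff"      # skip fetching up to the threshold
    return "bounded-child-fetch"       # fetch up to MANY_CHILDREN_THRESHOLD
```

The flag only needs to be set once a child fetch overflows the threshold, much like the existing children-flag from OAK-1117.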
[jira] [Updated] (OAK-3018) Use batch-update in backgroundWrite
[ https://issues.apache.org/jira/browse/OAK-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Marth updated OAK-3018:
-------------------------------
Fix Version/s: (was: 1.3.5) 1.3.7

Use batch-update in backgroundWrite

Key: OAK-3018
URL: https://issues.apache.org/jira/browse/OAK-3018
Project: Jackrabbit Oak
Issue Type: Improvement
Components: mongomk
Reporter: Stefan Egli
Labels: performance
Fix For: 1.3.7

(From an earlier [post on the list|http://markmail.org/thread/mkrvhkfabit4osli])

DocumentNodeStore.backgroundWrite goes through the heavy work of updating the lastRev for all pending changes and does so in a hierarchical, depth-first manner. Unfortunately, if the pending changes all come from separate commits (which is not unlikely), the updates are sent as individual update calls to mongo (whenever the lastRev differs). If there are many changes, this results in many calls to mongo.

OAK-2066 is about extending the DocumentStore API with a batch-update method. That one, once available, should thus be used in {{backgroundWrite}} as well.
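A minimal sketch of the batching idea, assuming the OAK-2066 batch API accepts a list of per-document update operations; all names and the payload shape are illustrative, not the actual DocumentStore API:

```python
# Sketch: collect pending (path, lastRev) updates and build one batch
# payload instead of issuing one update call per document.
def batch_lastrev_updates(pending):
    """Group pending {path: lastRev} entries into a single batch payload."""
    return [{"_id": path, "set": {"_lastRev": rev}}
            for path, rev in sorted(pending.items())]

pending = {"1:/a": "r3-0-1", "1:/b": "r4-0-1", "0:/": "r4-0-1"}
batch = batch_lastrev_updates(pending)   # one round trip instead of three
```

The win is purely in round trips: the same updates reach mongo, but as one bulk operation per backgroundWrite pass.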
[jira] [Updated] (OAK-3066) Persistent cache for previous documents
[ https://issues.apache.org/jira/browse/OAK-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Marth updated OAK-3066:
-------------------------------
Fix Version/s: (was: 1.3.5) 1.3.7

Persistent cache for previous documents

Key: OAK-3066
URL: https://issues.apache.org/jira/browse/OAK-3066
Project: Jackrabbit Oak
Issue Type: Improvement
Components: core, mongomk
Reporter: Marcel Reutegger
Assignee: Vikas Saurabh
Labels: performance
Fix For: 1.3.7

Previous (aka split) documents contain old revisions and are immutable documents. Those documents should go into the persistent cache to reduce calls to the underlying DocumentStore.
[jira] [Closed] (OAK-3222) RDBDocumentStore: add missing RDBHelper support for JOURNAL table
[ https://issues.apache.org/jira/browse/OAK-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Davide Giannella closed OAK-3222.
---------------------------------
Bulk close for 1.0.19

RDBDocumentStore: add missing RDBHelper support for JOURNAL table

Key: OAK-3222
URL: https://issues.apache.org/jira/browse/OAK-3222
Project: Jackrabbit Oak
Issue Type: Sub-task
Components: rdbmk
Affects Versions: 1.2.3
Reporter: Julian Reschke
Assignee: Julian Reschke
Fix For: 1.2.4, 1.0.19
[jira] [Updated] (OAK-2744) Change default cache distribution ratio if persistent cache is enabled
[ https://issues.apache.org/jira/browse/OAK-2744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Marth updated OAK-2744:
-------------------------------
Fix Version/s: (was: 1.3.5) 1.3.7

Change default cache distribution ratio if persistent cache is enabled

Key: OAK-2744
URL: https://issues.apache.org/jira/browse/OAK-2744
Project: Jackrabbit Oak
Issue Type: Improvement
Components: mongomk
Reporter: Chetan Mehrotra
Assignee: Chetan Mehrotra
Labels: performance
Fix For: 1.3.7

By default the cache memory in DocumentNodeStore is distributed in the following ratio:

* nodeCache - 25%
* childrenCache - 10%
* docChildrenCache - 3%
* diffCache - 5%
* documentCache - is given the rest, i.e. 57%

However, we have lately found that with the persistent cache enabled we can lower the share allocated to the document cache. That reduces the time spent invalidating cache entries in periodic reads. We have been using the following ratios in a few setups and they are turning out well:

* nodeCachePercentage=35
* childrenCachePercentage=20
* diffCachePercentage=30
* docChildrenCachePercentage=10
* documentCache - is given the rest, i.e. 5%

We should use the above distribution by default if the persistent cache is found to be enabled.
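The two distributions above can be sketched as a single selection function; the percentages are the ones listed in the issue, the helper name is illustrative:

```python
# Sketch: pick the cache distribution based on whether the persistent cache
# is enabled; the document cache gets the remainder in both cases.
def cache_distribution(persistent_cache_enabled):
    """Return cache-name -> percentage of the total cache size."""
    if persistent_cache_enabled:
        dist = {"nodeCache": 35, "childrenCache": 20,
                "diffCache": 30, "docChildrenCache": 10}
    else:
        dist = {"nodeCache": 25, "childrenCache": 10,
                "diffCache": 5, "docChildrenCache": 3}
    dist["documentCache"] = 100 - sum(dist.values())
    return dist
```

Computing the document cache as the remainder keeps the five shares summing to 100% in both configurations.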
[jira] [Closed] (OAK-3180) Versioning: improve diagnostics when version history state is broken
[ https://issues.apache.org/jira/browse/OAK-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Davide Giannella closed OAK-3180.
---------------------------------
Bulk close for 1.0.19

Versioning: improve diagnostics when version history state is broken

Key: OAK-3180
URL: https://issues.apache.org/jira/browse/OAK-3180
Project: Jackrabbit Oak
Issue Type: Improvement
Components: core
Affects Versions: 1.2.3, 1.3.3, 1.0.18
Reporter: Julian Reschke
Assignee: Julian Reschke
Fix For: 1.0.19
Attachments: OAK-3180.diff

Users suffering from the problem described in OAK-3169 may encounter NPEs upon checkin(), as ReadWriteVersionManager.checkin() does not check the return value of getExistingBaseVersion() for null. Even if we can't fix the underlying problem easily, we should at least provide better diagnostics.
[jira] [Updated] (OAK-2836) Create diff cache entry for merged persisted branch
[ https://issues.apache.org/jira/browse/OAK-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Marth updated OAK-2836:
-------------------------------
Fix Version/s: (was: 1.3.5) 1.3.7

Create diff cache entry for merged persisted branch

Key: OAK-2836
URL: https://issues.apache.org/jira/browse/OAK-2836
Project: Jackrabbit Oak
Issue Type: Improvement
Components: core, mongomk
Reporter: Marcel Reutegger
Labels: performance
Fix For: 1.3.7

The diff cache is currently not populated with an entry when a persisted branch in the DocumentNodeStore is merged. This means the diff needs to be calculated later, which may affect performance when events are generated.
[jira] [Closed] (OAK-3189) CLONE - MissingLastRevSeeker non MongoDS may fail with OOM
[ https://issues.apache.org/jira/browse/OAK-3189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Davide Giannella closed OAK-3189.
---------------------------------
Bulk close for 1.0.19

CLONE - MissingLastRevSeeker non MongoDS may fail with OOM

Key: OAK-3189
URL: https://issues.apache.org/jira/browse/OAK-3189
Project: Jackrabbit Oak
Issue Type: Task
Components: core, rdbmk
Affects Versions: 1.0.18
Reporter: Julian Reschke
Assignee: Julian Reschke
Fix For: 1.0.19

(This clones OAK-2208, as that never made it into the 1.0 branch.)

This code currently has a hardwired optimization for MongoDB (returning an Iterable over a DBCursor). For all other persistences, a java List of all matching NodeDocuments will be built. I see two ways to address this:

1) Generalize the Mongo approach, where a query to the persistence can return a live iterator, or
2) Stick with the public DS API, but leverage paging (get N nodes at once, and then keep calling query() again with the right starting ID).

2) sounds simpler, but is not transactional; [~mreutegg] would that be sufficient?
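Option 2) above can be sketched as a paging generator; the in-memory list stands in for the DocumentStore query() call, and all names are illustrative:

```python
# Sketch of option 2): page through the query API by re-issuing the query
# with the last seen id, N documents at a time, instead of materialising
# every matching NodeDocument in one list (the OOM risk).
def paged_query(all_docs, batch_size):
    """Yield documents in id order, batch_size at a time (non-transactional)."""
    ordered = sorted(all_docs, key=lambda d: d["_id"])
    start = ""                                 # ids sort after the empty string
    while True:
        page = [d for d in ordered if d["_id"] > start][:batch_size]
        if not page:
            return
        yield from page
        start = page[-1]["_id"]                # next query starts after this id

docs = [{"_id": f"1:/n{i}"} for i in range(5)]
ids = [d["_id"] for d in paged_query(docs, 2)]
```

As the issue notes, this is not transactional: documents added or removed between pages may be seen or missed, which is the trade-off being asked about.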
[jira] [Updated] (OAK-2610) Annotate intermediate docs with property names
[ https://issues.apache.org/jira/browse/OAK-2610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Marth updated OAK-2610:
-------------------------------
Fix Version/s: (was: 1.3.5) 1.3.7

Annotate intermediate docs with property names

Key: OAK-2610
URL: https://issues.apache.org/jira/browse/OAK-2610
Project: Jackrabbit Oak
Issue Type: Improvement
Components: mongomk
Reporter: Marcel Reutegger
Assignee: Marcel Reutegger
Labels: performance
Fix For: 1.3.7

Reading through a ValueMap can be very inefficient if the changes of a given property are distributed sparsely across the previous documents. The current implementation has to scan through the entire set of previous documents to collect the changes. Intermediate documents should have additional information about what properties are present on referenced previous documents.
[jira] [Closed] (OAK-3221) JournalTest may fail on machine with slow I/O
[ https://issues.apache.org/jira/browse/OAK-3221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Davide Giannella closed OAK-3221.
---------------------------------
Bulk close for 1.0.19

JournalTest may fail on machine with slow I/O

Key: OAK-3221
URL: https://issues.apache.org/jira/browse/OAK-3221
Project: Jackrabbit Oak
Issue Type: Bug
Components: core, mongomk
Reporter: Marcel Reutegger
Assignee: Marcel Reutegger
Priority: Minor
Fix For: 1.0.19

[~reschke] reported a failure for JournalTest.lastRevRecoveryJournalTestWithConcurrency() on a test machine without an SSD. This test creates 200 threads running lastRev recovery concurrently. Each thread will create a map using MapFactory. The default implementation in 1.0 is backed by MapDB and therefore creates quite a bit of I/O. Even on my machine with an SSD the test takes 11 seconds to run.
[jira] [Updated] (OAK-3272) DocumentMK scalability improvements
[ https://issues.apache.org/jira/browse/OAK-3272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Marth updated OAK-3272:
-------------------------------
Fix Version/s: 1.3.8

DocumentMK scalability improvements

Key: OAK-3272
URL: https://issues.apache.org/jira/browse/OAK-3272
Project: Jackrabbit Oak
Issue Type: Improvement
Components: mongomk, rdbmk
Reporter: Michael Marth
Labels: scalability
Fix For: 1.3.8

Collector issue for tracking DocMK issues concerning scalability.
[jira] [Updated] (OAK-1769) Better cooperation for conflicting updates across cluster nodes
[ https://issues.apache.org/jira/browse/OAK-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Marth updated OAK-1769:
-------------------------------
Fix Version/s: (was: 1.3.6) 1.3.8

Better cooperation for conflicting updates across cluster nodes

Key: OAK-1769
URL: https://issues.apache.org/jira/browse/OAK-1769
Project: Jackrabbit Oak
Issue Type: Improvement
Components: core, mongomk
Reporter: Marcel Reutegger
Assignee: Marcel Reutegger
Labels: concurrency, scalability
Fix For: 1.3.8

Every now and then we see commit failures in a cluster when many sessions try to update the same property or perform some other conflicting update. The current implementation will retry the merge after a delay, but chances are some session on another cluster node has again changed the property in the meantime. This leads to yet another retry, until the limit is reached and the commit fails. The conflict logic is quite unfair, because it favors the winning session. The implementation should be improved to show fairer behavior across cluster nodes when there are conflicts caused by competing sessions.
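One common way to reduce lockstep collisions between competing cluster nodes is randomised backoff between merge retries. This is only a sketch of that general technique, not the fix the issue ends up implementing:

```python
# Sketch (one possible fairness tweak, not the implemented fix): a growing,
# jittered delay between merge retries, so competing cluster nodes stop
# colliding at the same instants.
import random

def retry_delay_ms(attempt, base_ms=50, max_ms=5000, rng=random.random):
    """Exponential backoff with jitter, capped at max_ms."""
    capped = min(base_ms * (2 ** attempt), max_ms)
    return capped * (0.5 + 0.5 * rng())   # 50-100% of the capped delay
```

Jitter alone does not make the scheme fair to the session that has already lost several retries; a fair scheme would additionally need some priority for long-waiting sessions, which is the harder part of this issue.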
[jira] [Updated] (OAK-2592) Commit does not ensure w:majority
[ https://issues.apache.org/jira/browse/OAK-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Marth updated OAK-2592:
-------------------------------
Fix Version/s: (was: 1.3.6) 1.3.8

Commit does not ensure w:majority

Key: OAK-2592
URL: https://issues.apache.org/jira/browse/OAK-2592
Project: Jackrabbit Oak
Issue Type: Bug
Components: core, mongomk
Reporter: Marcel Reutegger
Assignee: Marcel Reutegger
Labels: resilience, scalability
Fix For: 1.3.8

The MongoDocumentStore uses {{findAndModify()}} to commit a transaction. This operation does not allow an application specified write concern and always uses the MongoDB default write concern {{Acknowledged}}. This means a commit may not make it to a majority of a replica set when the primary fails. From a MongoDocumentStore perspective it may appear as if a write was successful and later reverted. See also the test in OAK-1641. To fix this, we'd probably have to change the MongoDocumentStore to avoid {{findAndModify()}} and use {{update()}} instead.
[jira] [Updated] (OAK-2106) Optimize reads from secondaries
[ https://issues.apache.org/jira/browse/OAK-2106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Marth updated OAK-2106:
-------------------------------
Fix Version/s: (was: 1.3.6) 1.3.8

Optimize reads from secondaries

Key: OAK-2106
URL: https://issues.apache.org/jira/browse/OAK-2106
Project: Jackrabbit Oak
Issue Type: Improvement
Components: core, mongomk
Reporter: Marcel Reutegger
Assignee: Marcel Reutegger
Labels: performance, scalability
Fix For: 1.3.8

OAK-1645 introduced support for reads from secondaries under certain conditions. The current implementation checks the _lastRev on a potentially cached parent document and reads from a secondary if it has not been modified in the last 6 hours. This timespan is somewhat arbitrary but reflects the assumption that the replication lag of a secondary shouldn't be more than 6 hours. This logic should be optimized to take the actual replication lag into account. MongoDB provides information about the replication lag with the command rs.status().
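The optimized check described above compares the document's age against the measured lag rather than a fixed 6 hours. A minimal sketch, with illustrative names and a hypothetical safety margin; times are in seconds:

```python
# Sketch: route a read to a secondary only if the document is older than the
# measured replication lag (from rs.status()) plus a safety margin, instead
# of the fixed 6 hour assumption.
def read_preference(last_modified, now, replication_lag, margin=60):
    """Return "secondary" when the secondary is guaranteed to have the doc."""
    age = now - last_modified
    return "secondary" if age > replication_lag + margin else "primary"
```

With a measured lag of a few seconds, far more reads qualify for secondaries than under the fixed 6 hour rule, while recently modified documents still go to the primary.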
[jira] [Updated] (OAK-2622) dynamic cache allocation
[ https://issues.apache.org/jira/browse/OAK-2622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Marth updated OAK-2622:
-------------------------------
Labels: performance resilience (was: resilience)

dynamic cache allocation

Key: OAK-2622
URL: https://issues.apache.org/jira/browse/OAK-2622
Project: Jackrabbit Oak
Issue Type: Improvement
Components: mongomk
Affects Versions: 1.0.12
Reporter: Stefan Egli
Labels: performance, resilience
Fix For: 1.3.5

At the moment mongoMk's various caches are configurable (OAK-2546), but otherwise static in terms of size. Different use cases might require different allocations for the sub-caches, though, and it might not always be possible to find a good configuration upfront for all use cases. We might be able to allocate the overall cache size to the different sub-caches dynamically, based for example on which cache is how heavily loaded, or how well each is performing.
[jira] [Updated] (OAK-3271) Improve DocumentMK performance
[ https://issues.apache.org/jira/browse/OAK-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Marth updated OAK-3271:
-------------------------------
Component/s: rdbmk, mongomk

Improve DocumentMK performance

Key: OAK-3271
URL: https://issues.apache.org/jira/browse/OAK-3271
Project: Jackrabbit Oak
Issue Type: Improvement
Components: mongomk, rdbmk
Reporter: Michael Marth
Labels: performance
Fix For: 1.3.7

Collector issue for DocMK performance improvements.
[jira] [Created] (OAK-3271) Improve DocumentMK performance
Michael Marth created OAK-3271:
-------------------------------
Summary: Improve DocumentMK performance
Key: OAK-3271
URL: https://issues.apache.org/jira/browse/OAK-3271
Project: Jackrabbit Oak
Issue Type: Improvement
Reporter: Michael Marth
Fix For: 1.3.7

Collector issue for DocMK performance improvements.
[jira] [Assigned] (OAK-3273) ColdStandby make sync start and end timestamp updates atomic
[ https://issues.apache.org/jira/browse/OAK-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alex Parvulescu reassigned OAK-3273:
------------------------------------
Assignee: Alex Parvulescu

ColdStandby make sync start and end timestamp updates atomic

Key: OAK-3273
URL: https://issues.apache.org/jira/browse/OAK-3273
Project: Jackrabbit Oak
Issue Type: Improvement
Components: tarmk-standby
Reporter: Valentin Olteanu
Assignee: Alex Parvulescu
Priority: Minor
Attachments: OAK-3273.patch

OAK-3113 introduced two fields in the ColdStandby MBean: SyncStartTimestamp and SyncEndTimestamp. This is much more useful than the old SecondsSinceLastSuccess; yet there are situations in which it's hard to interpret them, since they are updated independently:

- it's impossible to correlate the start with the end
- in case of a failure, the start still reflects the failed cycle

It would be even better if the two were updated atomically, to reflect the start and end of the last successful cycle.
[jira] [Commented] (OAK-2844) Introducing a simple document-based discovery-light service (to circumvent documentMk's eventual consistency delays)
[ https://issues.apache.org/jira/browse/OAK-2844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14709335#comment-14709335 ] Stefan Egli commented on OAK-2844: -- thx, just saw it on jenkins too and switched to memory store. can you pls try again? thx! Introducing a simple document-based discovery-light service (to circumvent documentMk's eventual consistency delays) Key: OAK-2844 URL: https://issues.apache.org/jira/browse/OAK-2844 Project: Jackrabbit Oak Issue Type: New Feature Components: mongomk Reporter: Stefan Egli Assignee: Stefan Egli Labels: resilience Fix For: 1.3.5 Attachments: InstanceStateChangeListener.java, OAK-2844.WIP-02.patch, OAK-2844.patch, OAK-2844.v3.patch, OAK-2844.v4.patch When running discovery.impl on a mongoMk-backed jcr repository, there are risks of hitting problems such as described in SLING-3432 pseudo-network-partitioning: this happens when a jcr-level heartbeat does not reach peers within the configured heartbeat timeout - it then treats that affected instance as dead, removes it from the topology, and continues with the remainings, potentially electing a new leader, running the risk of duplicate leaders. This happens when delays in mongoMk grow larger than the (configured) heartbeat timeout. These problems ultimately are due to the 'eventual consistency' nature of, not only mongoDB, but more so of mongoMk. The only alternative so far is to increase the heartbeat timeout to match the expected or measured delays that mongoMk can produce (under say given load/performance scenarios). Assuming that mongoMk will always carry a risk of certain delays and a maximum, reasonable (for discovery.impl timeout that is) maximum cannot be guaranteed, a better solution is to provide discovery with more 'real-time' like information and/or privileged access to mongoDb. 
Here's a summary of alternatives that have so far been floating around as a solution to circumvent eventual consistency: # expose existing (jmx) information about active 'clusterIds' - this has been proposed in SLING-4603. The pros: reuse of existing functionality. The cons: going via jmx, binding of exposed functionality as 'to be maintained API' # expose a plain mongo db/collection (via osgi injection) such that a higher (sling) level discovery could directly write heartbeats there. The pros: heartbeat latency would be minimal (assuming the collection is not sharded). The cons: exposes a mongo db/collection potentially also to anyone else, with the risk of opening up to unwanted possibilities # introduce a simple 'discovery-light' API to oak which solely provides information about which instances are active in a cluster. The implementation of this is not exposed. The pros: no need to expose a mongoDb/collection, allows any other jmx-functionality to remain unchanged. The cons: a new API that must be maintained This ticket is about the 3rd option, about a new mongo-based discovery-light service that is introduced to oak. The functionality in short: * it defines a 'local instance id' that is non-persisted, ie can change at each bundle activation. * it defines a 'view id' that uniquely identifies a particular incarnation of a 'cluster view/state' (which is: a list of active instance ids) * and it defines a list of active instance ids * the above attributes are passed to interested components via a listener that can be registered. that listener is called whenever the discovery-light notices the cluster view has changed. While the actual implementation could in fact be based on the existing {{getActiveClusterNodes()}} {{getClusterId()}} of the {{DocumentNodeStoreMBean}}, the suggestion is to not fiddle with that part, as that has dependencies to other logic. 
But instead, the suggestion is to create a dedicated, other, collection ('discovery') where heartbeats as well as the currentView are stored. Will attach a suggestion for an initial version of this for review. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
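The functionality bullets above can be sketched as a minimal contract. This is an illustrative shape only — the class and method names below are hypothetical stand-ins, not the API from the attached patches:

```java
// Hypothetical sketch of the proposed discovery-light contract:
// a cluster view carrying a view id, the local (non-persisted) instance id,
// and the list of active instance ids, delivered via a registrable listener.
import java.util.List;

public class DiscoveryLiteSketch {
    // Immutable snapshot of one incarnation of the cluster view/state.
    static final class ClusterView {
        final String viewId;            // uniquely identifies this view incarnation
        final int localInstanceId;      // non-persisted, may change per activation
        final List<Integer> activeIds;  // currently active instance ids
        ClusterView(String viewId, int localInstanceId, List<Integer> activeIds) {
            this.viewId = viewId;
            this.localInstanceId = localInstanceId;
            this.activeIds = activeIds;
        }
    }

    // Interested components register one of these; it is called whenever
    // the discovery-light service notices the cluster view has changed.
    interface InstanceStateChangeListener {
        void onViewChanged(ClusterView view);
    }

    // Helper used only to demonstrate the data carried by a view.
    public static String describe(ClusterView v) {
        return v.viewId + ":" + v.activeIds;
    }

    public static void main(String[] args) {
        ClusterView v = new ClusterView("view-1", 1, List.of(1, 2));
        assert describe(v).equals("view-1:[1, 2]");
    }
}
```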
[jira] [Commented] (OAK-2844) Introducing a simple document-based discovery-light service (to circumvent documentMk's eventual consistency delays)
[ https://issues.apache.org/jira/browse/OAK-2844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14709345#comment-14709345 ] Alex Parvulescu commented on OAK-2844: -- nope, looks like the fiesta still cannot start! {code} testLargeStartStopFiesta(org.apache.jackrabbit.oak.plugins.document.DocumentDiscoveryLiteServiceTest) Time elapsed: 0 sec ERROR! java.lang.NullPointerException at org.apache.jackrabbit.oak.plugins.document.DocumentDiscoveryLiteServiceTest.createMK(DocumentDiscoveryLiteServiceTest.java:1025) at org.apache.jackrabbit.oak.plugins.document.DocumentDiscoveryLiteServiceTest.createNodeStore(DocumentDiscoveryLiteServiceTest.java:594) at org.apache.jackrabbit.oak.plugins.document.DocumentDiscoveryLiteServiceTest.createInstance(DocumentDiscoveryLiteServiceTest.java:609) at org.apache.jackrabbit.oak.plugins.document.DocumentDiscoveryLiteServiceTest.testLargeStartStopFiesta(DocumentDiscoveryLiteServiceTest.java:931) {code} Introducing a simple document-based discovery-light service (to circumvent documentMk's eventual consistency delays) Key: OAK-2844 URL: https://issues.apache.org/jira/browse/OAK-2844 Project: Jackrabbit Oak Issue Type: New Feature Components: mongomk Reporter: Stefan Egli Assignee: Stefan Egli Labels: resilience Fix For: 1.3.5 Attachments: InstanceStateChangeListener.java, OAK-2844.WIP-02.patch, OAK-2844.patch, OAK-2844.v3.patch, OAK-2844.v4.patch When running discovery.impl on a mongoMk-backed jcr repository, there are risks of hitting problems such as described in SLING-3432 pseudo-network-partitioning: this happens when a jcr-level heartbeat does not reach peers within the configured heartbeat timeout - it then treats that affected instance as dead, removes it from the topology, and continues with the remainings, potentially electing a new leader, running the risk of duplicate leaders. This happens when delays in mongoMk grow larger than the (configured) heartbeat timeout. 
These problems ultimately are due to the 'eventual consistency' nature of, not only mongoDB, but more so of mongoMk. The only alternative so far is to increase the heartbeat timeout to match the expected or measured delays that mongoMk can produce (under say given load/performance scenarios). Assuming that mongoMk will always carry a risk of certain delays and a maximum, reasonable (for discovery.impl timeout that is) maximum cannot be guaranteed, a better solution is to provide discovery with more 'real-time' like information and/or privileged access to mongoDb. Here's a summary of alternatives that have so far been floating around as a solution to circumvent eventual consistency: # expose existing (jmx) information about active 'clusterIds' - this has been proposed in SLING-4603. The pros: reuse of existing functionality. The cons: going via jmx, binding of exposed functionality as 'to be maintained API' # expose a plain mongo db/collection (via osgi injection) such that a higher (sling) level discovery could directly write heartbeats there. The pros: heartbeat latency would be minimal (assuming the collection is not sharded). The cons: exposes a mongo db/collection potentially also to anyone else, with the risk of opening up to unwanted possibilities # introduce a simple 'discovery-light' API to oak which solely provides information about which instances are active in a cluster. The implementation of this is not exposed. The pros: no need to expose a mongoDb/collection, allows any other jmx-functionality to remain unchanged. The cons: a new API that must be maintained This ticket is about the 3rd option, about a new mongo-based discovery-light service that is introduced to oak. The functionality in short: * it defines a 'local instance id' that is non-persisted, ie can change at each bundle activation. 
* it defines a 'view id' that uniquely identifies a particular incarnation of a 'cluster view/state' (which is: a list of active instance ids) * and it defines a list of active instance ids * the above attributes are passed to interested components via a listener that can be registered. that listener is called whenever the discovery-light notices the cluster view has changed. While the actual implementation could in fact be based on the existing {{getActiveClusterNodes()}} {{getClusterId()}} of the {{DocumentNodeStoreMBean}}, the suggestion is to not fiddle with that part, as that has dependencies to other logic. But instead, the suggestion is to create a dedicated, other, collection ('discovery') where heartbeats as well as the currentView are stored.
[jira] [Commented] (OAK-3265) Test failures on trunk: NodeLocalNameTest, NodeNameTest
[ https://issues.apache.org/jira/browse/OAK-3265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14709143#comment-14709143 ] Marcel Reutegger commented on OAK-3265: --- I think this is caused by OAK-2634, but from what I can see, in some cases the tests are actually at fault. Test failures on trunk: NodeLocalNameTest, NodeNameTest --- Key: OAK-3265 URL: https://issues.apache.org/jira/browse/OAK-3265 Project: Jackrabbit Oak Issue Type: Bug Components: jcr Reporter: Michael Dürig Fix For: 1.3.5 Trunk's it fail for me: {noformat} testStringLiteralInvalidName(org.apache.jackrabbit.test.api.query.qom.NodeLocalNameTest) Time elapsed: 0.007 sec ERROR! javax.jcr.query.InvalidQueryException: java.lang.IllegalArgumentException: Not a valid JCR path: [node1 at org.apache.jackrabbit.oak.jcr.query.QueryManagerImpl.executeQuery(QueryManagerImpl.java:142) at org.apache.jackrabbit.oak.jcr.query.qom.QueryObjectModelImpl.execute(QueryObjectModelImpl.java:131) at org.apache.jackrabbit.test.api.query.qom.NodeLocalNameTest.testStringLiteralInvalidName(NodeLocalNameTest.java:68) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at junit.framework.TestCase.runTest(TestCase.java:168) at junit.framework.TestCase.runBare(TestCase.java:134) at junit.framework.TestResult$1.protect(TestResult.java:110) at junit.framework.TestResult.runProtected(TestResult.java:128) at junit.framework.TestResult.run(TestResult.java:113) at junit.framework.TestCase.run(TestCase.java:124) at org.apache.jackrabbit.test.AbstractJCRTest.run(AbstractJCRTest.java:464) at junit.framework.TestSuite.runTest(TestSuite.java:243) at junit.framework.TestSuite.run(TestSuite.java:238) at junit.framework.TestSuite.runTest(TestSuite.java:243) at 
junit.framework.TestSuite.run(TestSuite.java:238) at junit.framework.TestSuite.runTest(TestSuite.java:243) at junit.framework.TestSuite.run(TestSuite.java:238) at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189) at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165) at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75) Caused by: java.lang.IllegalArgumentException: Not a valid JCR path: [node1 at org.apache.jackrabbit.oak.spi.query.PropertyValues.getOakPath(PropertyValues.java:405) at org.apache.jackrabbit.oak.query.ast.NodeNameImpl.getName(NodeNameImpl.java:131) at org.apache.jackrabbit.oak.query.ast.NodeLocalNameImpl.restrict(NodeLocalNameImpl.java:89) at org.apache.jackrabbit.oak.query.ast.ComparisonImpl.restrict(ComparisonImpl.java:184) at org.apache.jackrabbit.oak.query.ast.AndImpl.restrict(AndImpl.java:153) at org.apache.jackrabbit.oak.query.ast.SelectorImpl.createFilter(SelectorImpl.java:389) at org.apache.jackrabbit.oak.query.ast.SelectorImpl.prepare(SelectorImpl.java:284) at org.apache.jackrabbit.oak.query.QueryImpl.prepare(QueryImpl.java:591) at 
org.apache.jackrabbit.oak.query.QueryEngineImpl.executeQuery(QueryEngineImpl.java:193) at org.apache.jackrabbit.oak.jcr.query.QueryManagerImpl.executeQuery(QueryManagerImpl.java:132) ... 32 more testURILiteral(org.apache.jackrabbit.test.api.query.qom.NodeLocalNameTest) Time elapsed: 0.005 sec ERROR! javax.jcr.query.InvalidQueryException: java.lang.IllegalArgumentException: Not a valid JCR path: http://example.com at
[jira] [Updated] (OAK-3247) DocumentNodeStore.retrieve() should not throw IllegalArgumentException
[ https://issues.apache.org/jira/browse/OAK-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Marth updated OAK-3247: --- Component/s: rdbmk mongomk DocumentNodeStore.retrieve() should not throw IllegalArgumentException -- Key: OAK-3247 URL: https://issues.apache.org/jira/browse/OAK-3247 Project: Jackrabbit Oak Issue Type: Improvement Components: core, mongomk, rdbmk Affects Versions: 1.3.3 Reporter: Julian Sedding Priority: Minor Labels: resilience Fix For: 1.3.5 {{DocumentNodeStore#retrieve(checkpoint)}} may throw an {{IllegalArgumentException}} via {{Revision.fromString(checkpoint)}}. The javadocs say that it returns a {{NodeState}} or {{null}}. The exception prevents recovery of {{AsyncIndexUpdate}} from a bad recorded checkpoint. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
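The fix the issue asks for amounts to catching the parse failure and honoring the documented contract. A minimal sketch, with a stand-in parser instead of Oak's actual {{Revision.fromString}}:

```java
// Sketch of the proposed behavior: retrieve(checkpoint) returns null for an
// unparseable checkpoint instead of letting IllegalArgumentException escape,
// so callers like AsyncIndexUpdate can recover from a bad recorded checkpoint.
// parseRevision is an illustrative stand-in, not Oak's Revision.fromString.
public class RetrieveSketch {
    static Long parseRevision(String checkpoint) {
        // stand-in parser: expects "r" followed by a hex number
        if (checkpoint == null || !checkpoint.startsWith("r")) {
            throw new IllegalArgumentException("not a revision: " + checkpoint);
        }
        return Long.parseLong(checkpoint.substring(1), 16);
    }

    /** Returns the parsed revision, or null if the checkpoint is invalid. */
    public static Long retrieve(String checkpoint) {
        try {
            return parseRevision(checkpoint);
        } catch (IllegalArgumentException e) {
            return null; // matches the javadoc contract: a value or null
        }
    }

    public static void main(String[] args) {
        assert retrieve("rff") == 255L;     // valid checkpoint parses
        assert retrieve("garbage") == null; // bad checkpoint yields null
    }
}
```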
[jira] [Updated] (OAK-3247) DocumentNodeStore.retrieve() should not throw IllegalArgumentException
[ https://issues.apache.org/jira/browse/OAK-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Marth updated OAK-3247: --- Labels: resilience (was: ) DocumentNodeStore.retrieve() should not throw IllegalArgumentException -- Key: OAK-3247 URL: https://issues.apache.org/jira/browse/OAK-3247 Project: Jackrabbit Oak Issue Type: Improvement Components: core, mongomk, rdbmk Affects Versions: 1.3.3 Reporter: Julian Sedding Priority: Minor Labels: resilience Fix For: 1.3.5 {{DocumentNodeStore#retrieve(checkpoint)}} may throw an {{IllegalArgumentException}} via {{Revision.fromString(checkpoint)}}. The javadocs say that it returns a {{NodeState}} or {{null}}. The exception prevents recovery of {{AsyncIndexUpdate}} from a bad recorded checkpoint. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (OAK-2929) Parent of unseen children must not be removable
[ https://issues.apache.org/jira/browse/OAK-2929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Marth updated OAK-2929: --- Fix Version/s: (was: 1.3.5) 1.3.6 Parent of unseen children must not be removable --- Key: OAK-2929 URL: https://issues.apache.org/jira/browse/OAK-2929 Project: Jackrabbit Oak Issue Type: Bug Components: core, mongomk Affects Versions: 1.0.13, 1.2 Reporter: Vikas Saurabh Assignee: Marcel Reutegger Priority: Minor Labels: concurrency, technical_debt Fix For: 1.3.6 Attachments: IgnoredTestCase.patch With OAK-2673, it's now possible to have hidden intermediate nodes created concurrently. So, a scenario like: {noformat} start - /:hidden N1 creates /:hidden/parent/node1 N2 creates /:hidden/parent/node2 {noformat} is allowed. But, if N2's creation of {{parent}} got persisted later than N1's, then N2 is currently able to delete {{parent}} even though {{node1}} exists. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (OAK-2126) retry strategy for failed JDBC requests
[ https://issues.apache.org/jira/browse/OAK-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Marth updated OAK-2126: --- Fix Version/s: (was: 1.3.5) 1.3.6 retry strategy for failed JDBC requests --- Key: OAK-2126 URL: https://issues.apache.org/jira/browse/OAK-2126 Project: Jackrabbit Oak Issue Type: Sub-task Components: rdbmk Reporter: Julian Reschke Labels: resilience Fix For: 1.3.6 Discussion: should we have a retry strategy for failed commits? Things to consider: - does this potentially interfere with other retry strategies (either on a lower layer or in the DocumentMK)? - what failure scenarios would it address? - how to test those? - how to configure it? - what would be good defaults? (number of retries, interval) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
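One possible shape for the retry strategy under discussion is a bounded retry loop with a configurable attempt count and interval. The sketch below is illustrative only — the defaults and names are placeholders, not values proposed by the issue:

```java
// Hedged sketch of a JDBC-commit retry strategy: retry a failing operation
// up to maxRetries times, sleeping intervalMs between attempts, then give up.
import java.util.concurrent.Callable;

public class RetrySketch {
    public static <T> T withRetry(Callable<T> op, int maxRetries, long intervalMs) {
        Exception last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return op.call();                 // success: return immediately
            } catch (Exception e) {
                last = e;                         // remember the latest failure
                try {
                    Thread.sleep(intervalMs);     // back off before retrying
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    break;                        // stop retrying if interrupted
                }
            }
        }
        throw new RuntimeException("all retries failed", last);
    }

    public static void main(String[] args) {
        int[] calls = {0};
        String r = withRetry(() -> {
            if (++calls[0] < 3) throw new IllegalStateException("transient");
            return "ok";
        }, 3, 1);
        assert r.equals("ok") && calls[0] == 3;
    }
}
```

Whether such a loop belongs at this layer is exactly the open question in the issue — a lower layer or the DocumentMK may already retry, and stacked retries multiply.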
[jira] [Updated] (OAK-3079) LastRevRecoveryAgent can update _lastRev of children but not the root
[ https://issues.apache.org/jira/browse/OAK-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Marth updated OAK-3079: --- Fix Version/s: (was: 1.4) 1.3.6 LastRevRecoveryAgent can update _lastRev of children but not the root - Key: OAK-3079 URL: https://issues.apache.org/jira/browse/OAK-3079 Project: Jackrabbit Oak Issue Type: Bug Components: core, mongomk Affects Versions: 1.3.2 Reporter: Stefan Egli Labels: resilience Fix For: 1.3.6 Attachments: NonRootUpdatingLastRevRecoveryTest.java As mentioned in [OAK-2131|https://issues.apache.org/jira/browse/OAK-2131?focusedCommentId=14616391page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14616391] there can be a situation wherein the LastRevRecoveryAgent updates some nodes in the tree but not the root. This seems to happen due to OAK-2131's change in the Commit.applyToCache (where paths to update are collected via tracker.track): in that code, paths which are non-root and for which no content has changed (and mind you, a content change includes adding _deleted, which happens by default for nodes with children) are not 'tracked', i.e. for those the _lastRev is not updated by subsequent backgroundUpdate operations - leaving them 'old/out-of-date'. This seems correct as per the description/intention of OAK-2131, where the last revision can be determined via the commitRoot of the parent. But it has the effect that the LastRevRecoveryAgent then finds those intermediate nodes to be updated whereas the root has already been updated (which is at first glance non-intuitive). I'll attach a test case to reproduce this. Perhaps this is a bug, perhaps it's ok. [~mreutegg] wdyt? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (OAK-3036) DocumentRootBuilder: revisit update.limit default
[ https://issues.apache.org/jira/browse/OAK-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Marth updated OAK-3036: --- Fix Version/s: (was: 1.3.5) 1.3.6 DocumentRootBuilder: revisit update.limit default - Key: OAK-3036 URL: https://issues.apache.org/jira/browse/OAK-3036 Project: Jackrabbit Oak Issue Type: Improvement Components: mongomk, rdbmk Reporter: Julian Reschke Labels: resilience Fix For: 1.3.6 update.limit decides whether a commit is persisted using a branch or not. The default is 1 (and can be overridden using the system property). A typical call pattern in JCR is to persist batches of ~1024 nodes. These translate to more than 1 changes (see PackageImportIT), due to JCR properties, and also indexing commit hooks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
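The decision update.limit controls can be stated in one line: commits with more pending changes than the limit are persisted through a branch. A toy sketch of that predicate (names illustrative; the actual default value under discussion is not reproduced here):

```java
// Sketch of the update.limit decision: whether a commit goes through a
// branch. The limit would normally come from a system property; here it is
// simply a parameter.
public class UpdateLimitSketch {
    /** true if the commit should be persisted using a branch. */
    public static boolean useBranch(int pendingChanges, int updateLimit) {
        return pendingChanges > updateLimit; // large commits use a branch
    }

    public static void main(String[] args) {
        assert useBranch(2000, 1000);   // over the limit: branch
        assert !useBranch(500, 1000);   // under the limit: direct commit
    }
}
```

The issue's point is that a typical JCR batch of ~1024 nodes expands into many more changes (properties, index updates), so the threshold must be chosen against the expanded count, not the node count.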
[jira] [Updated] (OAK-3070) Use a lower bound in VersionGC query to avoid checking unmodified once deleted docs
[ https://issues.apache.org/jira/browse/OAK-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Marth updated OAK-3070: --- Fix Version/s: (was: 1.3.5) 1.3.6 Use a lower bound in VersionGC query to avoid checking unmodified once deleted docs --- Key: OAK-3070 URL: https://issues.apache.org/jira/browse/OAK-3070 Project: Jackrabbit Oak Issue Type: Improvement Components: mongomk, rdbmk Reporter: Chetan Mehrotra Assignee: Vikas Saurabh Labels: performance, resilience Fix For: 1.3.6 Attachments: OAK-3070.patch As part of OAK-3062 [~mreutegg] suggested {quote} As a further optimization we could also limit the lower bound of the _modified range. The revision GC does not need to check documents with a _deletedOnce again if they were not modified after the last successful GC run. If they didn't change and were considered existing during the last run, then they must still exist in the current GC run. To make this work, we'd need to track the last successful revision GC run. {quote} Lowest last validated _modified can be possibly saved in settings collection and reused for next run -- This message was sent by Atlassian JIRA (v6.3.4#6332)
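The suggested optimization boils down to a range check on _modified: documents not modified since the last successful GC run need no re-validation. A minimal sketch under that assumption (names illustrative):

```java
// Sketch of the proposed lower bound for the VersionGC query: a once-deleted
// document only needs checking if it changed after the last successful GC run
// and is old enough to fall under the current run's upper bound.
public class VersionGcSketch {
    /** true if a doc with the given _modified value must be checked now. */
    public static boolean needsCheck(long modified, long lastSuccessfulGc, long upperBound) {
        // unchanged since the last successful run => already validated then
        return modified > lastSuccessfulGc && modified <= upperBound;
    }

    public static void main(String[] args) {
        assert !needsCheck(100, 200, 300); // validated in an earlier run
        assert needsCheck(250, 200, 300);  // changed since the last run
        assert !needsCheck(350, 200, 300); // too recent for this run
    }
}
```

Persisting `lastSuccessfulGc` (e.g. in the settings collection, as the issue suggests) is what makes the lower bound usable across runs.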
[jira] [Updated] (OAK-2622) dynamic cache allocation
[ https://issues.apache.org/jira/browse/OAK-2622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Marth updated OAK-2622: --- Fix Version/s: (was: 1.3.5) 1.3.6 dynamic cache allocation Key: OAK-2622 URL: https://issues.apache.org/jira/browse/OAK-2622 Project: Jackrabbit Oak Issue Type: Improvement Components: mongomk Affects Versions: 1.0.12 Reporter: Stefan Egli Labels: performance, resilience Fix For: 1.3.6 At the moment mongoMk's various caches are configurable (OAK-2546) but other than that static in terms of size. Different use-cases might require different allocations of the sub caches though. And it might not always be possible to find a good configuration upfront for all use cases. We might be able to come up with dynamically allocating the overall cache size to the different sub-caches, based on which cache is how heavily loaded or how well performing for example. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
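One naive policy for the dynamic allocation idea is to redistribute the total cache budget in proportion to each sub-cache's observed hit count. This is a toy policy for illustration, not a proposal from the issue:

```java
// Hedged sketch of dynamic cache allocation: split a total byte budget
// across sub-caches proportionally to their hit counts.
import java.util.LinkedHashMap;
import java.util.Map;

public class CacheAllocSketch {
    public static Map<String, Long> allocate(long totalBytes, Map<String, Long> hits) {
        long totalHits = hits.values().stream().mapToLong(Long::longValue).sum();
        Map<String, Long> alloc = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : hits.entrySet()) {
            // proportional share; equal split if nothing was measured yet
            alloc.put(e.getKey(), totalHits == 0 ? totalBytes / hits.size()
                    : totalBytes * e.getValue() / totalHits);
        }
        return alloc;
    }

    public static void main(String[] args) {
        Map<String, Long> hits = new LinkedHashMap<>();
        hits.put("nodeCache", 300L);
        hits.put("docCache", 100L);
        Map<String, Long> a = allocate(1000, hits);
        assert a.get("nodeCache") == 750L && a.get("docCache") == 250L;
    }
}
```

A real policy would also need damping (to avoid thrashing allocations) and minimum sizes per sub-cache — exactly the kind of tuning the issue says is hard to fix upfront.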
[jira] [Updated] (OAK-1575) DocumentNS: Implement refined conflict resolution for addExistingNode conflicts
[ https://issues.apache.org/jira/browse/OAK-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Marth updated OAK-1575: --- Fix Version/s: (was: 1.4) 1.3.6 DocumentNS: Implement refined conflict resolution for addExistingNode conflicts --- Key: OAK-1575 URL: https://issues.apache.org/jira/browse/OAK-1575 Project: Jackrabbit Oak Issue Type: Sub-task Components: mongomk Reporter: Michael Dürig Assignee: Marcel Reutegger Labels: resilience Fix For: 1.3.6 Implement refined conflict resolution for addExistingNode conflicts as defined in the parent issue for the document NS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (OAK-2920) RDBDocumentStore: fail init when database config seems to be inadequate
[ https://issues.apache.org/jira/browse/OAK-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Marth updated OAK-2920: --- Fix Version/s: (was: 1.3.5) 1.3.6 RDBDocumentStore: fail init when database config seems to be inadequate --- Key: OAK-2920 URL: https://issues.apache.org/jira/browse/OAK-2920 Project: Jackrabbit Oak Issue Type: Sub-task Components: rdbmk Reporter: Julian Reschke Priority: Minor Labels: resilience Fix For: 1.3.6 It has been suggested that the implementation should fail to start (rather than warn) when it detects a DB configuration that is likely to cause problems (such as wrt character encoding or collation sequences) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (OAK-2066) DocumentStore API: batch create, but no batch update
[ https://issues.apache.org/jira/browse/OAK-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Marth updated OAK-2066: --- Fix Version/s: (was: 1.3.5) 1.3.7 DocumentStore API: batch create, but no batch update Key: OAK-2066 URL: https://issues.apache.org/jira/browse/OAK-2066 Project: Jackrabbit Oak Issue Type: Improvement Components: mongomk Reporter: Julian Reschke Labels: performance Fix For: 1.3.7 The DocumentStore API currently has a call for creating many nodes at once. However, this will sometimes fail for large save operations in JCR, because in the DS persistence, JCR-deleted nodes are still present (with a deleted flag). This causes two subsequent sequences of 1) create test container 2) create many child nodes 3) remove test container to behave very differently, depending on whether the test container is created for the first time or not. (see CreateManyChildNodesTest) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (OAK-1322) Reduce calls to MongoDB
[ https://issues.apache.org/jira/browse/OAK-1322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Marth updated OAK-1322: --- Fix Version/s: (was: 1.3.5) 1.3.7 Reduce calls to MongoDB --- Key: OAK-1322 URL: https://issues.apache.org/jira/browse/OAK-1322 Project: Jackrabbit Oak Issue Type: Improvement Components: core, mongomk Reporter: Marcel Reutegger Assignee: Marcel Reutegger Labels: performance Fix For: 1.3.7 Attachments: OAK-1322-mreutegg.patch As discussed with Chetan offline we'd like to reduce the number of calls to MongoDB when content is added to the repository with a filevault package import. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (OAK-3259) Optimize NodeDocument.getNewestRevision()
[ https://issues.apache.org/jira/browse/OAK-3259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcel Reutegger resolved OAK-3259. --- Resolution: Fixed Introduced a new method {{NodeDocument.getAllChanges()}} which returns the {{_revisions}} and {{_commitRoot}} revisions in descending order. The implementation performs lazy loading of previous documents as needed. Implemented in trunk: http://svn.apache.org/r1697373 Optimize NodeDocument.getNewestRevision() - Key: OAK-3259 URL: https://issues.apache.org/jira/browse/OAK-3259 Project: Jackrabbit Oak Issue Type: Improvement Components: core, mongomk Reporter: Marcel Reutegger Assignee: Marcel Reutegger Labels: performance Fix For: 1.3.5 Most of the time NodeDocument.getNewestRevision() is able to quickly identify the newest revision, but sometimes the code falls to a more expensive calculation, which attempts to read through available {{_revisions}} and {{_commitRoot}} entries. If either of those maps are empty, the method will go through the entire revision history. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
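The core of the fix — merging the {{_revisions}} and {{_commitRoot}} keys into one descending iteration — can be illustrated with revisions simplified to longs (the real method uses Oak's revision comparator and lazily loads previous documents, which this sketch omits):

```java
// Sketch of the getAllChanges() idea: merge the revision keys from the
// _revisions and _commitRoot maps and return them in descending order.
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class AllChangesSketch {
    public static List<Long> allChanges(Set<Long> revisions, Set<Long> commitRoots) {
        TreeSet<Long> merged = new TreeSet<>(Comparator.reverseOrder());
        merged.addAll(revisions);   // changes committed on this document
        merged.addAll(commitRoots); // changes whose commit root is elsewhere
        return new ArrayList<>(merged);
    }

    public static void main(String[] args) {
        List<Long> r = allChanges(Set.of(3L, 1L), Set.of(2L));
        assert r.equals(List.of(3L, 2L, 1L)); // newest first
    }
}
```

Descending order lets getNewestRevision() stop at the first acceptable change instead of scanning the whole history.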
[jira] [Commented] (OAK-3273) ColdStandby JMX Status
[ https://issues.apache.org/jira/browse/OAK-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14709242#comment-14709242 ] Valentin Olteanu commented on OAK-3273: --- [~alex.parvulescu], could you please take a look and review the patch I've created for this issue? ColdStandby JMX Status --- Key: OAK-3273 URL: https://issues.apache.org/jira/browse/OAK-3273 Project: Jackrabbit Oak Issue Type: Improvement Components: tarmk-standby Reporter: Valentin Olteanu Priority: Minor Attachments: OAK-3273.patch OAK-3113 introduced two fields in the ColdStandby MBean: SyncStartTimestamp and SyncEndTimestamp. This is much more useful than the old SecondsSinceLastSuccess, yet, there are situations in which it's hard to interpret them since they are updated independently: - it's impossible to correlate the start with the end - in case of fail, the start still reflects the failed cycle It would be even better if the two would be updated atomically, to reflect the start and end of the last successful cycle. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (OAK-2929) Parent of unseen children must not be removable
[ https://issues.apache.org/jira/browse/OAK-2929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Marth updated OAK-2929: --- Priority: Minor (was: Major) Parent of unseen children must not be removable --- Key: OAK-2929 URL: https://issues.apache.org/jira/browse/OAK-2929 Project: Jackrabbit Oak Issue Type: Bug Components: core, mongomk Affects Versions: 1.0.13, 1.2 Reporter: Vikas Saurabh Assignee: Marcel Reutegger Priority: Minor Labels: concurrency, technical_debt Fix For: 1.3.5 Attachments: IgnoredTestCase.patch With OAK-2673, it's now possible to have hidden intermediate nodes created concurrently. So, a scenario like: {noformat} start - /:hidden N1 creates /:hiddent/parent/node1 N2 creates /:hidden/parent/node2 {noformat} is allowed. But, if N2's creation of {{parent}} got persisted later than that on N1, then N2 is currently able to delete {{parent}} even though there's {{node1}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (OAK-3018) Use batch-update in backgroundWrite
[ https://issues.apache.org/jira/browse/OAK-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Marth updated OAK-3018: --- Labels: performance (was: ) Use batch-update in backgroundWrite --- Key: OAK-3018 URL: https://issues.apache.org/jira/browse/OAK-3018 Project: Jackrabbit Oak Issue Type: Improvement Components: mongomk Reporter: Stefan Egli Labels: performance Fix For: 1.3.5 (From an earlier [post on the list|http://markmail.org/thread/mkrvhkfabit4osli]) The DocumentNodeStore.backgroundWrite goes through the heavy work of updating the lastRev for all pending changes and does so in a hierarchical-depth-first manner. Unfortunately, if the pending changes all come from separate commits (as does not sound so unlikely), the updates are sent in individual update calls to mongo (whenever the lastRev differs). Which, if there are many changes, results in many calls to mongo. OAK-2066 is about extending the DocumentStore API with a batch-update method. That one, once available, should thus be used in the {{backgroundWrite}} as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
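The batching the issue asks for can be pictured as grouping the pending lastRev updates by revision value, so one store call covers many paths. A sketch under that assumption (the batch-update API itself is what OAK-2066 would add; names here are illustrative):

```java
// Sketch of batched backgroundWrite: instead of one update call per path,
// group path -> lastRev entries by revision and issue one batch per group.
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class BackgroundWriteSketch {
    /** Groups pending path->lastRev updates into lastRev->[paths] batches. */
    public static Map<Long, List<String>> batches(Map<String, Long> pending) {
        Map<Long, List<String>> byRev = new TreeMap<>();
        for (Map.Entry<String, Long> e : pending.entrySet()) {
            byRev.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        return byRev; // one store call per batch instead of one per path
    }

    public static void main(String[] args) {
        Map<String, Long> pending = new TreeMap<>();
        pending.put("/a", 5L);
        pending.put("/a/b", 5L);
        pending.put("/c", 7L);
        Map<Long, List<String>> b = batches(pending);
        assert b.size() == 2 && b.get(5L).size() == 2;
    }
}
```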
[jira] [Commented] (OAK-3265) Test failures on trunk: NodeLocalNameTest, NodeNameTest
[ https://issues.apache.org/jira/browse/OAK-3265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14709162#comment-14709162 ] Marcel Reutegger commented on OAK-3265: --- Added the failing tests to the known issues list: http://svn.apache.org/r1697363 Test failures on trunk: NodeLocalNameTest, NodeNameTest --- Key: OAK-3265 URL: https://issues.apache.org/jira/browse/OAK-3265 Project: Jackrabbit Oak Issue Type: Bug Components: jcr Reporter: Michael Dürig Fix For: 1.3.5 Trunk's it fail for me: {noformat} testStringLiteralInvalidName(org.apache.jackrabbit.test.api.query.qom.NodeLocalNameTest) Time elapsed: 0.007 sec ERROR! javax.jcr.query.InvalidQueryException: java.lang.IllegalArgumentException: Not a valid JCR path: [node1 at org.apache.jackrabbit.oak.jcr.query.QueryManagerImpl.executeQuery(QueryManagerImpl.java:142) at org.apache.jackrabbit.oak.jcr.query.qom.QueryObjectModelImpl.execute(QueryObjectModelImpl.java:131) at org.apache.jackrabbit.test.api.query.qom.NodeLocalNameTest.testStringLiteralInvalidName(NodeLocalNameTest.java:68) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at junit.framework.TestCase.runTest(TestCase.java:168) at junit.framework.TestCase.runBare(TestCase.java:134) at junit.framework.TestResult$1.protect(TestResult.java:110) at junit.framework.TestResult.runProtected(TestResult.java:128) at junit.framework.TestResult.run(TestResult.java:113) at junit.framework.TestCase.run(TestCase.java:124) at org.apache.jackrabbit.test.AbstractJCRTest.run(AbstractJCRTest.java:464) at junit.framework.TestSuite.runTest(TestSuite.java:243) at junit.framework.TestSuite.run(TestSuite.java:238) at junit.framework.TestSuite.runTest(TestSuite.java:243) at 
junit.framework.TestSuite.run(TestSuite.java:238) at junit.framework.TestSuite.runTest(TestSuite.java:243) at junit.framework.TestSuite.run(TestSuite.java:238) at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189) at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165) at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75) Caused by: java.lang.IllegalArgumentException: Not a valid JCR path: [node1 at org.apache.jackrabbit.oak.spi.query.PropertyValues.getOakPath(PropertyValues.java:405) at org.apache.jackrabbit.oak.query.ast.NodeNameImpl.getName(NodeNameImpl.java:131) at org.apache.jackrabbit.oak.query.ast.NodeLocalNameImpl.restrict(NodeLocalNameImpl.java:89) at org.apache.jackrabbit.oak.query.ast.ComparisonImpl.restrict(ComparisonImpl.java:184) at org.apache.jackrabbit.oak.query.ast.AndImpl.restrict(AndImpl.java:153) at org.apache.jackrabbit.oak.query.ast.SelectorImpl.createFilter(SelectorImpl.java:389) at org.apache.jackrabbit.oak.query.ast.SelectorImpl.prepare(SelectorImpl.java:284) at org.apache.jackrabbit.oak.query.QueryImpl.prepare(QueryImpl.java:591) at 
org.apache.jackrabbit.oak.query.QueryEngineImpl.executeQuery(QueryEngineImpl.java:193) at org.apache.jackrabbit.oak.jcr.query.QueryManagerImpl.executeQuery(QueryManagerImpl.java:132) ... 32 more testURILiteral(org.apache.jackrabbit.test.api.query.qom.NodeLocalNameTest) Time elapsed: 0.005 sec ERROR! javax.jcr.query.InvalidQueryException: java.lang.IllegalArgumentException: Not a valid JCR path: http://example.com at
[jira] [Updated] (OAK-2492) Flag Document having many children
[ https://issues.apache.org/jira/browse/OAK-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Marth updated OAK-2492: --- Labels: performance (was: ) Flag Document having many children -- Key: OAK-2492 URL: https://issues.apache.org/jira/browse/OAK-2492 Project: Jackrabbit Oak Issue Type: Improvement Components: mongomk Reporter: Chetan Mehrotra Assignee: Chetan Mehrotra Labels: performance Fix For: 1.4 Current DocumentMK logic while performing a diff for child nodes works as below # Get children for _before_ revision upto MANY_CHILDREN_THRESHOLD (which defaults to 50). Further note that current logic of fetching children nodes also add children {{NodeDocument}} to {{Document}} cache and also reads the complete Document for those children # Get children for _after_ revision with limits as above # If the child list is complete then it does a direct diff on the fetched children # if the list is not complete i.e. number of children are more than the threshold then it for a query based diff (also see OAK-1970) So in those cases where number of children are large then all work done in #1 above is wasted and should be avoided. To do that we can mark those parent nodes which have many children via special flag like {{_manyChildren}}. One such nodes are marked the diff logic can check for the flag and skip the work done in #1 This is kind of similar to way we mark nodes which have at least one child (OAK-1117) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
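The short-circuit described above reduces to one predicate: if the parent carries the proposed flag, skip the bounded fetch in step #1 and go straight to the query-based diff. A sketch, with the flag and threshold as illustrative stand-ins:

```java
// Sketch of the proposed _manyChildren short-circuit in the diff logic:
// a flagged parent skips the bounded child fetch entirely.
public class ManyChildrenSketch {
    static final int MANY_CHILDREN_THRESHOLD = 50; // default named in the issue

    public static String diffStrategy(boolean manyChildrenFlag, int knownChildCount) {
        if (manyChildrenFlag || knownChildCount > MANY_CHILDREN_THRESHOLD) {
            return "query-diff";   // avoid the wasted bounded fetch (step #1)
        }
        return "fetch-and-diff";   // small child list: direct diff is cheap
    }

    public static void main(String[] args) {
        assert diffStrategy(true, 0).equals("query-diff");
        assert diffStrategy(false, 10).equals("fetch-and-diff");
    }
}
```

This mirrors the existing marker for "has at least one child" (OAK-1117): both are denormalized hints stored on the parent document so the diff can choose a strategy without fetching children first.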
[jira] [Created] (OAK-3272) DocumentMK scalability improvements
Michael Marth created OAK-3272: -- Summary: DocumentMK scalability improvements Key: OAK-3272 URL: https://issues.apache.org/jira/browse/OAK-3272 Project: Jackrabbit Oak Issue Type: Improvement Components: mongomk, rdbmk Reporter: Michael Marth Collector issue for tracking DocMK issues concerning scalability -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (OAK-3259) Optimize NodeDocument.getNewestRevision()
[ https://issues.apache.org/jira/browse/OAK-3259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcel Reutegger updated OAK-3259: -- Labels: performance (was: resilience) Optimize NodeDocument.getNewestRevision() - Key: OAK-3259 URL: https://issues.apache.org/jira/browse/OAK-3259 Project: Jackrabbit Oak Issue Type: Improvement Components: core, mongomk Reporter: Marcel Reutegger Assignee: Marcel Reutegger Labels: performance Fix For: 1.3.5 Most of the time NodeDocument.getNewestRevision() is able to quickly identify the newest revision, but sometimes the code falls back to a more expensive calculation, which attempts to read through the available {{_revisions}} and {{_commitRoot}} entries. If either of those maps is empty, the method will go through the entire revision history. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-2844) Introducing a simple document-based discovery-light service (to circumvent documentMk's eventual consistency delays)
[ https://issues.apache.org/jira/browse/OAK-2844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14709299#comment-14709299 ] Alex Parvulescu commented on OAK-2844: -- fyi this fails the trunk build on my machine for _DocumentDiscoveryLiteServiceTest_ Introducing a simple document-based discovery-light service (to circumvent documentMk's eventual consistency delays) Key: OAK-2844 URL: https://issues.apache.org/jira/browse/OAK-2844 Project: Jackrabbit Oak Issue Type: New Feature Components: mongomk Reporter: Stefan Egli Assignee: Stefan Egli Labels: resilience Fix For: 1.3.5 Attachments: InstanceStateChangeListener.java, OAK-2844.WIP-02.patch, OAK-2844.patch, OAK-2844.v3.patch, OAK-2844.v4.patch When running discovery.impl on a mongoMk-backed jcr repository, there is a risk of hitting problems such as the pseudo-network-partitioning described in SLING-3432: this happens when a jcr-level heartbeat does not reach peers within the configured heartbeat timeout - discovery.impl then treats the affected instance as dead, removes it from the topology, and continues with the remaining instances, potentially electing a new leader and running the risk of duplicate leaders. This happens when delays in mongoMk grow larger than the (configured) heartbeat timeout. These problems are ultimately due to the 'eventual consistency' nature not only of mongoDB, but more so of mongoMk. The only alternative so far is to increase the heartbeat timeout to match the expected or measured delays that mongoMk can produce (under given load/performance scenarios). Assuming that mongoMk will always carry a risk of certain delays, and that a reasonable maximum (reasonable for the discovery.impl timeout, that is) cannot be guaranteed, a better solution is to provide discovery with more 'real-time' information and/or privileged access to mongoDb. 
Here's a summary of alternatives that have so far been floating around as a solution to circumvent eventual consistency: # expose existing (jmx) information about active 'clusterIds' - this has been proposed in SLING-4603. The pros: reuse of existing functionality. The cons: going via jmx, binding of exposed functionality as 'to be maintained API' # expose a plain mongo db/collection (via osgi injection) such that a higher (sling) level discovery could directly write heartbeats there. The pros: heartbeat latency would be minimal (assuming the collection is not sharded). The cons: exposes a mongo db/collection potentially also to anyone else, with the risk of opening up to unwanted possibilities # introduce a simple 'discovery-light' API to oak which solely provides information about which instances are active in a cluster. The implementation of this is not exposed. The pros: no need to expose a mongoDb/collection, allows any other jmx-functionality to remain unchanged. The cons: a new API that must be maintained This ticket is about the 3rd option, about a new mongo-based discovery-light service that is introduced to oak. The functionality in short: * it defines a 'local instance id' that is non-persisted, ie can change at each bundle activation. * it defines a 'view id' that uniquely identifies a particular incarnation of a 'cluster view/state' (which is: a list of active instance ids) * and it defines a list of active instance ids * the above attributes are passed to interested components via a listener that can be registered. that listener is called whenever the discovery-light notices the cluster view has changed. While the actual implementation could in fact be based on the existing {{getActiveClusterNodes()}} {{getClusterId()}} of the {{DocumentNodeStoreMBean}}, the suggestion is to not fiddle with that part, as that has dependencies to other logic. 
Instead, the suggestion is to create a dedicated collection ('discovery') where the heartbeats as well as the currentView are stored. Will attach an initial version of this for review. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
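The proposed discovery-light contract (a non-persisted local instance id, a view id per incarnation of the cluster state, a list of active instance ids, and a listener notified on view changes) can be modeled as follows. This is an illustrative Python sketch of the ticket's description, not the attached patch; all class and method names are hypothetical.

```python
import uuid

class ClusterView:
    """One incarnation of the cluster state: a fresh view id, the local
    (non-persisted) instance id, and the active instance ids."""
    def __init__(self, local_id, active_ids):
        self.id = str(uuid.uuid4())          # 'view id' per incarnation
        self.local_instance_id = local_id    # can change at each activation
        self.active_instance_ids = sorted(active_ids)

class DiscoveryLite:
    """Minimal listener-based discovery-light service."""
    def __init__(self, local_id):
        self.local_id = local_id
        self.listeners = []
        self.view = None
    def register(self, listener):
        self.listeners.append(listener)
    def on_heartbeats(self, active_ids):
        # only build a new view (and notify) when the active set changed
        if self.view is None or sorted(active_ids) != self.view.active_instance_ids:
            self.view = ClusterView(self.local_id, active_ids)
            for listener in self.listeners:
                listener(self.view)
```

The key property is that listeners see a consistent (view id, member list) pair and are only called on actual changes, which is what a higher-level discovery layer needs to sidestep the heartbeat-timeout guessing game.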
[jira] [Created] (OAK-3274) DefaultSyncConfigImpl: add information to user.membershipExpTime about minimum expiration time
Konrad Windszus created OAK-3274: Summary: DefaultSyncConfigImpl: add information to user.membershipExpTime about minimum expiration time Key: OAK-3274 URL: https://issues.apache.org/jira/browse/OAK-3274 Project: Jackrabbit Oak Issue Type: Improvement Components: auth-external Affects Versions: 1.3.5 Reporter: Konrad Windszus Priority: Trivial The {{user.membershipExpTime}} property cannot have a value which is less than the value of {{user.expirationTime}}. Please add this information to the OSGi property description; otherwise it is hard to debug issues here. The reason why {{user.expirationTime}} must be less than or equal to {{user.membershipExpTime}} is in https://github.com/apache/jackrabbit-oak/blob/trunk/oak-auth-external/src/main/java/org/apache/jackrabbit/oak/spi/security/authentication/external/basic/DefaultSyncContext.java#L421. Since {{syncMembership}} is only called after the {{user.expirationTime}} guard, the membership cannot be updated more often than the user itself. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
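A minimal sketch of the effective behaviour described above, assuming the guard works as the linked DefaultSyncContext code suggests: because membership sync only runs after the user-expiration guard, the effective membership interval is the maximum of the two configured values. The function name is illustrative.

```python
def effective_membership_interval(expiration_ms, membership_exp_ms):
    """Membership can never be refreshed more often than the user itself,
    so a membershipExpTime below expirationTime is effectively rounded
    up to expirationTime."""
    return max(expiration_ms, membership_exp_ms)
```

For example, configuring a 1-minute membership expiration with a 1-hour user expiration still yields hourly membership syncs at best, which is exactly the surprise the OSGi property description should warn about.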
[jira] [Updated] (OAK-3259) Optimize NodeDocument.getNewestRevision()
[ https://issues.apache.org/jira/browse/OAK-3259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Marth updated OAK-3259: --- Labels: resilience (was: ) Optimize NodeDocument.getNewestRevision() - Key: OAK-3259 URL: https://issues.apache.org/jira/browse/OAK-3259 Project: Jackrabbit Oak Issue Type: Improvement Components: core, mongomk Reporter: Marcel Reutegger Assignee: Marcel Reutegger Labels: resilience Fix For: 1.3.5 Most of the time NodeDocument.getNewestRevision() is able to quickly identify the newest revision, but sometimes the code falls back to a more expensive calculation, which attempts to read through the available {{_revisions}} and {{_commitRoot}} entries. If either of those maps is empty, the method will go through the entire revision history. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (OAK-2986) RDB: switch to tomcat datasource implementation
[ https://issues.apache.org/jira/browse/OAK-2986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Marth updated OAK-2986: --- Labels: resilience (was: ) RDB: switch to tomcat datasource implementation Key: OAK-2986 URL: https://issues.apache.org/jira/browse/OAK-2986 Project: Jackrabbit Oak Issue Type: Sub-task Components: rdbmk Affects Versions: 1.2.2, 1.0.15 Reporter: Julian Reschke Assignee: Julian Reschke Labels: resilience Fix For: 1.3.5 Attachments: OAK-2986.diff, OAK-2986.diff See https://people.apache.org/~fhanik/jdbc-pool/jdbc-pool.html. In addition, this is the datasource used in Sling's datasource service, so it's closer to what people will use in practice. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (OAK-3273) ColdStandby make sync start and end timestamp updates atomic
[ https://issues.apache.org/jira/browse/OAK-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Parvulescu resolved OAK-3273. -- Resolution: Fixed Fix Version/s: 1.3.5 thanks for the patch [~volteanu]! applied at http://svn.apache.org/r1697383 ColdStandby make sync start and end timestamp updates atomic Key: OAK-3273 URL: https://issues.apache.org/jira/browse/OAK-3273 Project: Jackrabbit Oak Issue Type: Improvement Components: tarmk-standby Reporter: Valentin Olteanu Assignee: Alex Parvulescu Priority: Minor Fix For: 1.3.5 Attachments: OAK-3273.patch OAK-3113 introduced two fields in the ColdStandby MBean: SyncStartTimestamp and SyncEndTimestamp. This is much more useful than the old SecondsSinceLastSuccess, yet there are situations in which it is hard to interpret them, since they are updated independently: - it's impossible to correlate the start with the end - in case of failure, the start still reflects the failed cycle It would be even better if the two were updated atomically, to reflect the start and end of the last successful cycle. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
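The atomicity request can be sketched as follows: publish the (start, end) pair of the last successful cycle as a single unit, so a reader never pairs a start from one cycle with an end from another, and failed cycles leave the pair untouched. A hypothetical Python model, not the applied patch:

```python
import threading

class SyncStatus:
    """Start and end timestamps of the last *successful* sync cycle are
    swapped in as one pair under a lock, so they always correlate."""
    def __init__(self):
        self._last_success = (None, None)  # (start, end), replaced atomically
        self._lock = threading.Lock()
    def record_cycle(self, start, end, ok):
        if ok:
            with self._lock:
                self._last_success = (start, end)
        # failed cycles do not disturb the published pair
    def last_successful_cycle(self):
        with self._lock:
            return self._last_success
```

In Java the same effect could be had with a single volatile reference to an immutable pair object, which is the natural shape for an MBean attribute.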
[jira] [Created] (OAK-3275) DefaultSyncConfig: User membership expiration time not working under some circumstances
Konrad Windszus created OAK-3275: Summary: DefaultSyncConfig: User membership expiration time not working under some circumstances Key: OAK-3275 URL: https://issues.apache.org/jira/browse/OAK-3275 Project: Jackrabbit Oak Issue Type: Bug Components: auth-external Affects Versions: 1.3.5 Reporter: Konrad Windszus Currently the user expiration and the user membership expiration can be set independently of each other in the OSGi configuration for the {{DefaultSyncConfigImpl}}. In reality this independence does not hold: not only can the membership not be updated more often than the other user properties (compare with OAK-3274), but the property used to mark the last successful sync is also the same for both synchronisations (https://github.com/apache/jackrabbit-oak/blob/trunk/oak-auth-external/src/main/java/org/apache/jackrabbit/oak/spi/security/authentication/external/basic/DefaultSyncContext.java#L433 and https://github.com/apache/jackrabbit-oak/blob/trunk/oak-auth-external/src/main/java/org/apache/jackrabbit/oak/spi/security/authentication/external/basic/DefaultSyncContext.java#L422). That is a problem if e.g. the user expiration time is 10 minutes but the user membership expiration time is 1 hour: every 10 minutes the property {{rep:lastSynced}} would be updated to the current time, so the expiration check for the membership would never return true (https://github.com/apache/jackrabbit-oak/blob/trunk/oak-auth-external/src/main/java/org/apache/jackrabbit/oak/spi/security/authentication/external/basic/DefaultSyncContext.java#L433). Therefore memberships would never be updated! I suggest completely removing the user membership expiration time and having only one expiration time for both the user properties and the memberships. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
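The starvation described above can be reproduced with a small model. This Python simulation (all names hypothetical, not the DefaultSyncContext code itself) shares one rep:lastSynced-style timestamp between both expiration checks, mirroring the linked lines:

```python
def membership_syncs(user_exp, membership_exp, access_interval, total):
    """Count membership syncs over `total` time units when a user is
    accessed every `access_interval` units and a single shared
    last-synced timestamp drives both expiration checks."""
    last_synced = 0
    syncs = 0
    t = access_interval
    while t <= total:
        if t - last_synced >= user_exp:          # user-expiration guard
            if t - last_synced >= membership_exp:  # same timestamp reused
                syncs += 1
            last_synced = t  # shared property reset on every user sync
        t += access_interval
    return syncs
```

With a 10-unit user expiration and a 60-unit membership expiration, the membership branch never fires: the shared timestamp is refreshed every 10 units, so the elapsed time never reaches 60. Only when the two intervals coincide do memberships sync at all, which supports collapsing them into one setting.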
[jira] [Commented] (OAK-3235) Deadlock when closing a concurrently used FileStore
[ https://issues.apache.org/jira/browse/OAK-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14709147#comment-14709147 ] Alex Parvulescu commented on OAK-3235: -- patch looks good! bq. I'd rather not remove the synchronized from writeMapBucket() for now though. (We can discuss doing so but lets move it out of this issue). agreed, I would not introduce this change with this patch either. I would rather tackle this as a part of (a subtask of) OAK-1828 Deadlock when closing a concurrently used FileStore --- Key: OAK-3235 URL: https://issues.apache.org/jira/browse/OAK-3235 Project: Jackrabbit Oak Issue Type: Bug Components: segmentmk Affects Versions: 1.3.3 Reporter: Francesco Mari Assignee: Michael Dürig Priority: Critical Fix For: 1.3.5 Attachments: OAK-3235-01.patch A deadlock was detected while stopping the {{SegmentCompactionIT}} using the exposed MBean. {noformat} Found one Java-level deadlock: = pool-1-thread-23: waiting to lock monitor 0x7fa8cf1f0488 (object 0x0007a0081e48, a org.apache.jackrabbit.oak.plugins.segment.file.FileStore), which is held by main main: waiting to lock monitor 0x7fa8cc015ff8 (object 0x0007a011f750, a org.apache.jackrabbit.oak.plugins.segment.SegmentWriter), which is held by pool-1-thread-23 Java stack information for the threads listed above: === pool-1-thread-23: at org.apache.jackrabbit.oak.plugins.segment.file.FileStore.writeSegment(FileStore.java:948) - waiting to lock 0x0007a0081e48 (a org.apache.jackrabbit.oak.plugins.segment.file.FileStore) at org.apache.jackrabbit.oak.plugins.segment.SegmentWriter.flush(SegmentWriter.java:228) - locked 0x0007a011f750 (a org.apache.jackrabbit.oak.plugins.segment.SegmentWriter) at org.apache.jackrabbit.oak.plugins.segment.SegmentWriter.prepare(SegmentWriter.java:329) at org.apache.jackrabbit.oak.plugins.segment.SegmentWriter.writeListBucket(SegmentWriter.java:447) - locked 0x0007a011f750 (a org.apache.jackrabbit.oak.plugins.segment.SegmentWriter) at 
org.apache.jackrabbit.oak.plugins.segment.SegmentWriter.writeList(SegmentWriter.java:698) at org.apache.jackrabbit.oak.plugins.segment.SegmentWriter.writeNode(SegmentWriter.java:1190) at org.apache.jackrabbit.oak.plugins.segment.SegmentWriter$2.childNodeChanged(SegmentWriter.java:1135) at org.apache.jackrabbit.oak.plugins.memory.ModifiedNodeState.compareAgainstBaseState(ModifiedNodeState.java:400) at org.apache.jackrabbit.oak.plugins.segment.SegmentWriter.writeNode(SegmentWriter.java:1126) at org.apache.jackrabbit.oak.plugins.segment.SegmentWriter$2.childNodeChanged(SegmentWriter.java:1135) at org.apache.jackrabbit.oak.plugins.memory.ModifiedNodeState.compareAgainstBaseState(ModifiedNodeState.java:400) at org.apache.jackrabbit.oak.plugins.segment.SegmentWriter.writeNode(SegmentWriter.java:1126) at org.apache.jackrabbit.oak.plugins.segment.SegmentWriter$2.childNodeChanged(SegmentWriter.java:1135) at org.apache.jackrabbit.oak.plugins.memory.ModifiedNodeState.compareAgainstBaseState(ModifiedNodeState.java:400) at org.apache.jackrabbit.oak.plugins.segment.SegmentWriter.writeNode(SegmentWriter.java:1126) at org.apache.jackrabbit.oak.plugins.segment.SegmentWriter$2.childNodeChanged(SegmentWriter.java:1135) at org.apache.jackrabbit.oak.plugins.memory.ModifiedNodeState.compareAgainstBaseState(ModifiedNodeState.java:400) at org.apache.jackrabbit.oak.plugins.segment.SegmentWriter.writeNode(SegmentWriter.java:1126) at org.apache.jackrabbit.oak.plugins.segment.SegmentWriter$2.childNodeChanged(SegmentWriter.java:1135) at org.apache.jackrabbit.oak.plugins.memory.ModifiedNodeState.compareAgainstBaseState(ModifiedNodeState.java:400) at org.apache.jackrabbit.oak.plugins.segment.SegmentWriter.writeNode(SegmentWriter.java:1126) at org.apache.jackrabbit.oak.plugins.segment.SegmentWriter$2.childNodeChanged(SegmentWriter.java:1135) at org.apache.jackrabbit.oak.plugins.memory.ModifiedNodeState.compareAgainstBaseState(ModifiedNodeState.java:400) at 
org.apache.jackrabbit.oak.plugins.segment.SegmentWriter.writeNode(SegmentWriter.java:1126) at org.apache.jackrabbit.oak.plugins.segment.SegmentWriter$2.childNodeChanged(SegmentWriter.java:1135) at org.apache.jackrabbit.oak.plugins.memory.ModifiedNodeState.compareAgainstBaseState(ModifiedNodeState.java:400) at org.apache.jackrabbit.oak.plugins.segment.SegmentWriter.writeNode(SegmentWriter.java:1126) at
[jira] [Updated] (OAK-3070) Use a lower bound in VersionGC query to avoid checking unmodified once deleted docs
[ https://issues.apache.org/jira/browse/OAK-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Marth updated OAK-3070: --- Labels: performance resilience (was: ) Use a lower bound in VersionGC query to avoid checking unmodified once deleted docs --- Key: OAK-3070 URL: https://issues.apache.org/jira/browse/OAK-3070 Project: Jackrabbit Oak Issue Type: Improvement Components: mongomk, rdbmk Reporter: Chetan Mehrotra Assignee: Vikas Saurabh Labels: performance, resilience Fix For: 1.3.5 Attachments: OAK-3070.patch As part of OAK-3062 [~mreutegg] suggested {quote} As a further optimization we could also limit the lower bound of the _modified range. The revision GC does not need to check documents with a _deletedOnce again if they were not modified after the last successful GC run. If they didn't change and were considered existing during the last run, then they must still exist in the current GC run. To make this work, we'd need to track the last successful revision GC run. {quote} The lowest validated _modified value could be saved in the settings collection and reused for the next run -- This message was sent by Atlassian JIRA (v6.3.4#6332)
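The suggested optimisation amounts to adding a lower bound to the _modified range of the GC query, derived from the last successful run. A hedged Python sketch of how the bounds could be computed (field and function names illustrative; Oak's actual query API differs):

```python
def gc_query_bounds(now_ms, max_age_ms, last_success_ms=None):
    """Documents with _deletedOnce that were not modified since the last
    successful GC run cannot have become garbage, so the _modified range
    gets a lower bound instead of starting from zero."""
    upper = now_ms - max_age_ms            # usual garbage-age cutoff
    lower = last_success_ms if last_success_ms is not None else 0
    return {'_deletedOnce': True, '_modified': {'$gte': lower, '$lt': upper}}
```

On the first run (no recorded last success) the query degrades gracefully to the current full-range behaviour; afterwards the persisted checkpoint keeps shrinking the scanned range.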
[jira] [Updated] (OAK-3036) DocumentRootBuilder: revisit update.limit default
[ https://issues.apache.org/jira/browse/OAK-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Marth updated OAK-3036: --- Labels: resilience (was: ) DocumentRootBuilder: revisit update.limit default - Key: OAK-3036 URL: https://issues.apache.org/jira/browse/OAK-3036 Project: Jackrabbit Oak Issue Type: Improvement Components: mongomk, rdbmk Reporter: Julian Reschke Labels: resilience Fix For: 1.3.5 update.limit decides whether a commit is persisted using a branch or not. The default is 1 (and can be overridden using the system property). A typical call pattern in JCR is to persist batches of ~1024 nodes. These translate to more than 1 changes (see PackageImportIT), due to JCR properties, and also indexing commit hooks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
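A sketch of the decision update.limit controls, and why a typical JCR batch can exceed it. The change-counting formula and the limit value used below are illustrative assumptions for this model, not Oak's exact accounting or default:

```python
def change_count(nodes, props_per_node):
    # rough model: each node and each of its JCR properties becomes a
    # change; commit hooks (e.g. indexing) would add even more on top
    return nodes * (1 + props_per_node)

def persists_via_branch(pending_changes, update_limit):
    """update.limit decides whether a commit is staged on a branch
    instead of being applied directly."""
    return pending_changes > update_limit
```

Under this model, a batch of ~1024 nodes with a handful of properties each easily produces changes in the five-digit range, which is why the default value of update.limit deserves revisiting.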
[jira] [Updated] (OAK-3079) LastRevRecoveryAgent can update _lastRev of children but not the root
[ https://issues.apache.org/jira/browse/OAK-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Marth updated OAK-3079: --- Labels: resilience (was: ) LastRevRecoveryAgent can update _lastRev of children but not the root - Key: OAK-3079 URL: https://issues.apache.org/jira/browse/OAK-3079 Project: Jackrabbit Oak Issue Type: Bug Components: core, mongomk Affects Versions: 1.3.2 Reporter: Stefan Egli Labels: resilience Fix For: 1.4 Attachments: NonRootUpdatingLastRevRecoveryTest.java As mentioned in [OAK-2131|https://issues.apache.org/jira/browse/OAK-2131?focusedCommentId=14616391page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14616391] there can be a situation wherein the LastRevRecoveryAgent updates some nodes in the tree but not the root. This seems to happen due to OAK-2131's change in Commit.applyToCache (where paths to update are collected via tracker.track): in that code, paths which are non-root and for which no content has changed (and mind you, a content change includes adding _deleted, which happens by default for nodes with children) are not 'tracked', i.e. for those the _lastRev is not updated by subsequent backgroundUpdate operations - leaving them 'old/out-of-date'. This seems correct as per the description/intention of OAK-2131, where the last revision can be determined via the commitRoot of the parent. But it has the effect that the LastRevRecoveryAgent then finds those intermediate nodes to be updated whereas the root has already been updated (which is at first glance non-intuitive). I'll attach a test case to reproduce this. Perhaps this is a bug, perhaps it's ok. [~mreutegg] wdyt? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (OAK-3270) Improve DocumentMK resilience
[ https://issues.apache.org/jira/browse/OAK-3270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Marth updated OAK-3270: --- Fix Version/s: 1.3.6 Improve DocumentMK resilience - Key: OAK-3270 URL: https://issues.apache.org/jira/browse/OAK-3270 Project: Jackrabbit Oak Issue Type: Improvement Components: mongomk, rdbmk Reporter: Michael Marth Labels: resilience Fix For: 1.3.6 Collection of DocMK resilience improvements -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (OAK-3256) Release Oak 1.0.19
[ https://issues.apache.org/jira/browse/OAK-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davide Giannella closed OAK-3256. - Release Oak 1.0.19 -- Key: OAK-3256 URL: https://issues.apache.org/jira/browse/OAK-3256 Project: Jackrabbit Oak Issue Type: Task Reporter: Davide Giannella Assignee: Davide Giannella -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (OAK-3273) ColdStandby JMX Status
[ https://issues.apache.org/jira/browse/OAK-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Valentin Olteanu updated OAK-3273: -- Attachment: OAK-3273.patch ColdStandby JMX Status --- Key: OAK-3273 URL: https://issues.apache.org/jira/browse/OAK-3273 Project: Jackrabbit Oak Issue Type: Improvement Components: tarmk-standby Reporter: Valentin Olteanu Priority: Minor Attachments: OAK-3273.patch OAK-3113 introduced two fields in the ColdStandby MBean: SyncStartTimestamp and SyncEndTimestamp. This is much more useful than the old SecondsSinceLastSuccess, yet, there are situations in which it's hard to interpret them since they are updated independently: - it's impossible to correlate the start with the end - in case of fail, the start still reflects the failed cycle It would be even better if the two would be updated atomically, to reflect the start and end of the last successful cycle. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (OAK-3273) ColdStandby JMX Status
Valentin Olteanu created OAK-3273: - Summary: ColdStandby JMX Status Key: OAK-3273 URL: https://issues.apache.org/jira/browse/OAK-3273 Project: Jackrabbit Oak Issue Type: Improvement Components: tarmk-standby Reporter: Valentin Olteanu Priority: Minor OAK-3113 introduced two fields in the ColdStandby MBean: SyncStartTimestamp and SyncEndTimestamp. This is much more useful than the old SecondsSinceLastSuccess, yet, there are situations in which it's hard to interpret them since they are updated independently: - it's impossible to correlate the start with the end - in case of fail, the start still reflects the failed cycle It would be even better if the two would be updated atomically, to reflect the start and end of the last successful cycle. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (OAK-3256) Release Oak 1.0.19
[ https://issues.apache.org/jira/browse/OAK-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davide Giannella resolved OAK-3256. --- Resolution: Fixed Release Oak 1.0.19 -- Key: OAK-3256 URL: https://issues.apache.org/jira/browse/OAK-3256 Project: Jackrabbit Oak Issue Type: Task Reporter: Davide Giannella Assignee: Davide Giannella -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-2844) Introducing a simple document-based discovery-light service (to circumvent documentMk's eventual consistency delays)
[ https://issues.apache.org/jira/browse/OAK-2844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14709349#comment-14709349 ] Stefan Egli commented on OAK-2844: -- [~alex.parvulescu], aah, typo nr 2 .. :S. fixed. or so I hope ;) Introducing a simple document-based discovery-light service (to circumvent documentMk's eventual consistency delays) Key: OAK-2844 URL: https://issues.apache.org/jira/browse/OAK-2844 Project: Jackrabbit Oak Issue Type: New Feature Components: mongomk Reporter: Stefan Egli Assignee: Stefan Egli Labels: resilience Fix For: 1.3.5 Attachments: InstanceStateChangeListener.java, OAK-2844.WIP-02.patch, OAK-2844.patch, OAK-2844.v3.patch, OAK-2844.v4.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-2844) Introducing a simple document-based discovery-light service (to circumvent documentMk's eventual consistency delays)
[ https://issues.apache.org/jira/browse/OAK-2844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14709356#comment-14709356 ] Marcel Reutegger commented on OAK-2844: --- It also looks like DocumentDiscoveryLiteServiceTest messes up the build because it changes the system property 'user.dir'. Introducing a simple document-based discovery-light service (to circumvent documentMk's eventual consistency delays) Key: OAK-2844 URL: https://issues.apache.org/jira/browse/OAK-2844 Project: Jackrabbit Oak Issue Type: New Feature Components: mongomk Reporter: Stefan Egli Assignee: Stefan Egli Labels: resilience Fix For: 1.3.5 Attachments: InstanceStateChangeListener.java, OAK-2844.WIP-02.patch, OAK-2844.patch, OAK-2844.v3.patch, OAK-2844.v4.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-2844) Introducing a simple document-based discovery-light service (to circumvent documentMk's eventual consistency delays)
[ https://issues.apache.org/jira/browse/OAK-2844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14709466#comment-14709466 ] Stefan Egli commented on OAK-2844: -- FYI: reactivated the test (http://svn.apache.org/r1697438) - that user.dir was no longer needed and a left-over of initial prototyping. Introducing a simple document-based discovery-light service (to circumvent documentMk's eventual consistency delays) Key: OAK-2844 URL: https://issues.apache.org/jira/browse/OAK-2844 Project: Jackrabbit Oak Issue Type: New Feature Components: mongomk Reporter: Stefan Egli Assignee: Stefan Egli Labels: resilience Fix For: 1.3.5 Attachments: InstanceStateChangeListener.java, OAK-2844.WIP-02.patch, OAK-2844.patch, OAK-2844.v3.patch, OAK-2844.v4.patch When running discovery.impl on a mongoMk-backed jcr repository, there are risks of hitting problems such as described in SLING-3432 pseudo-network-partitioning: this happens when a jcr-level heartbeat does not reach peers within the configured heartbeat timeout - it then treats that affected instance as dead, removes it from the topology, and continues with the remainings, potentially electing a new leader, running the risk of duplicate leaders. This happens when delays in mongoMk grow larger than the (configured) heartbeat timeout. These problems ultimately are due to the 'eventual consistency' nature of, not only mongoDB, but more so of mongoMk. The only alternative so far is to increase the heartbeat timeout to match the expected or measured delays that mongoMk can produce (under say given load/performance scenarios). Assuming that mongoMk will always carry a risk of certain delays and a maximum, reasonable (for discovery.impl timeout that is) maximum cannot be guaranteed, a better solution is to provide discovery with more 'real-time' like information and/or privileged access to mongoDb. 
Here's a summary of alternatives that have so far been floating around as a solution to circumvent eventual consistency:
# expose existing (jmx) information about active 'clusterIds' - this has been proposed in SLING-4603. The pros: reuse of existing functionality. The cons: going via jmx, and binding the exposed functionality as 'to be maintained API'
# expose a plain mongo db/collection (via osgi injection) such that a higher (sling) level discovery could directly write heartbeats there. The pros: heartbeat latency would be minimal (assuming the collection is not sharded). The cons: exposes a mongo db/collection potentially also to anyone else, with the risk of opening up unwanted possibilities
# introduce a simple 'discovery-light' API to oak which solely provides information about which instances are active in a cluster. The implementation of this is not exposed. The pros: no need to expose a mongoDb/collection, and any other jmx-functionality remains unchanged. The cons: a new API that must be maintained
This ticket is about the 3rd option: a new mongo-based discovery-light service that is introduced to oak. The functionality in short:
* it defines a 'local instance id' that is non-persisted, i.e. can change at each bundle activation
* it defines a 'view id' that uniquely identifies a particular incarnation of a 'cluster view/state' (which is: a list of active instance ids)
* it defines a list of active instance ids
* the above attributes are passed to interested components via a listener that can be registered; that listener is called whenever discovery-light notices that the cluster view has changed
While the actual implementation could in fact be based on the existing {{getActiveClusterNodes()}} and {{getClusterId()}} of the {{DocumentNodeStoreMBean}}, the suggestion is to not fiddle with that part, as it has dependencies on other logic. Instead, the suggestion is to create a dedicated collection ('discovery') where heartbeats as well as the currentView are stored. Will attach a suggestion for an initial version of this for review. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
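The attachment name InstanceStateChangeListener.java suggests a listener-based contract. The following is a minimal, hypothetical sketch of such a discovery-light notifier; all names and signatures are assumptions derived from the ticket text, not the actual OAK-2844 patch:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical sketch of the proposed discovery-light contract (option 3
// above). Not the actual Oak API: names/signatures are assumptions.
public class DiscoveryLiteSketch {

    /** Listener notified whenever discovery-light sees a changed cluster view. */
    public interface InstanceStateChangeListener {
        void viewChanged(String viewId, int localInstanceId, List<Integer> activeInstanceIds);
    }

    private final List<InstanceStateChangeListener> listeners = new CopyOnWriteArrayList<>();
    private final int localInstanceId; // non-persisted, may change per bundle activation
    private String currentViewId;
    private List<Integer> currentActiveIds = List.of();

    public DiscoveryLiteSketch(int localInstanceId) {
        this.localInstanceId = localInstanceId;
    }

    public void register(InstanceStateChangeListener l) {
        listeners.add(l);
    }

    public String getCurrentViewId() {
        return currentViewId;
    }

    /** Called by the heartbeat/view-detection logic when a view is observed. */
    public void onNewView(String viewId, List<Integer> activeInstanceIds) {
        if (activeInstanceIds.equals(currentActiveIds)) {
            return; // listeners are only called on actual view changes
        }
        currentViewId = viewId;
        currentActiveIds = activeInstanceIds;
        for (InstanceStateChangeListener l : listeners) {
            l.viewChanged(viewId, localInstanceId, activeInstanceIds);
        }
    }
}
```

A registered listener would thus see one callback per distinct view, regardless of how often the underlying heartbeat collection is polled.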
[jira] [Commented] (OAK-3259) Optimize NodeDocument.getNewestRevision()
[ https://issues.apache.org/jira/browse/OAK-3259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14709386#comment-14709386 ] Marcel Reutegger commented on OAK-3259: --- Added a cluster test: http://svn.apache.org/r1697410 Optimize NodeDocument.getNewestRevision() - Key: OAK-3259 URL: https://issues.apache.org/jira/browse/OAK-3259 Project: Jackrabbit Oak Issue Type: Improvement Components: core, mongomk Reporter: Marcel Reutegger Assignee: Marcel Reutegger Labels: performance Fix For: 1.3.5 Most of the time NodeDocument.getNewestRevision() is able to quickly identify the newest revision, but sometimes the code falls back to a more expensive calculation, which attempts to read through the available {{_revisions}} and {{_commitRoot}} entries. If either of those maps is empty, the method will go through the entire revision history. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (OAK-2875) Namespaces keep references to old node states
[ https://issues.apache.org/jira/browse/OAK-2875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Parvulescu resolved OAK-2875. -- Resolution: Fixed fixed with http://svn.apache.org/r1697423 Namespaces keep references to old node states - Key: OAK-2875 URL: https://issues.apache.org/jira/browse/OAK-2875 Project: Jackrabbit Oak Issue Type: Sub-task Components: core, jcr Reporter: Alex Parvulescu Assignee: Alex Parvulescu Fix For: 1.3.5 Attachments: OAK-2875-v1.patch, OAK-2875-v2.patch As described in the parent issue OAK-2849, the session namespaces keep a reference to a Tree instance, which will make GC inefficient. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-3235) Deadlock when closing a concurrently used FileStore
[ https://issues.apache.org/jira/browse/OAK-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14709347#comment-14709347 ] Francesco Mari commented on OAK-3235: -- {{SegmentWriter.flush()}} is not very different in the 1.0 and 1.2 branches, so backporting shouldn't be a problem. Deadlock when closing a concurrently used FileStore --- Key: OAK-3235 URL: https://issues.apache.org/jira/browse/OAK-3235 Project: Jackrabbit Oak Issue Type: Bug Components: segmentmk Affects Versions: 1.3.3 Reporter: Francesco Mari Assignee: Michael Dürig Priority: Critical Fix For: 1.3.5 Attachments: OAK-3235-01.patch A deadlock was detected while stopping the {{SegmentCompactionIT}} using the exposed MBean.
{noformat}
Found one Java-level deadlock:
==============================
pool-1-thread-23:
  waiting to lock monitor 0x7fa8cf1f0488 (object 0x0007a0081e48, a org.apache.jackrabbit.oak.plugins.segment.file.FileStore),
  which is held by main
main:
  waiting to lock monitor 0x7fa8cc015ff8 (object 0x0007a011f750, a org.apache.jackrabbit.oak.plugins.segment.SegmentWriter),
  which is held by pool-1-thread-23

Java stack information for the threads listed above:
===================================================
pool-1-thread-23:
  at org.apache.jackrabbit.oak.plugins.segment.file.FileStore.writeSegment(FileStore.java:948)
  - waiting to lock 0x0007a0081e48 (a org.apache.jackrabbit.oak.plugins.segment.file.FileStore)
  at org.apache.jackrabbit.oak.plugins.segment.SegmentWriter.flush(SegmentWriter.java:228)
  - locked 0x0007a011f750 (a org.apache.jackrabbit.oak.plugins.segment.SegmentWriter)
  at org.apache.jackrabbit.oak.plugins.segment.SegmentWriter.prepare(SegmentWriter.java:329)
  at org.apache.jackrabbit.oak.plugins.segment.SegmentWriter.writeListBucket(SegmentWriter.java:447)
  - locked 0x0007a011f750 (a org.apache.jackrabbit.oak.plugins.segment.SegmentWriter)
  at org.apache.jackrabbit.oak.plugins.segment.SegmentWriter.writeList(SegmentWriter.java:698)
  at org.apache.jackrabbit.oak.plugins.segment.SegmentWriter.writeNode(SegmentWriter.java:1190)
  at org.apache.jackrabbit.oak.plugins.segment.SegmentWriter$2.childNodeChanged(SegmentWriter.java:1135)
  at org.apache.jackrabbit.oak.plugins.memory.ModifiedNodeState.compareAgainstBaseState(ModifiedNodeState.java:400)
  at org.apache.jackrabbit.oak.plugins.segment.SegmentWriter.writeNode(SegmentWriter.java:1126)
  at org.apache.jackrabbit.oak.plugins.segment.SegmentWriter$2.childNodeChanged(SegmentWriter.java:1135)
  at org.apache.jackrabbit.oak.plugins.memory.ModifiedNodeState.compareAgainstBaseState(ModifiedNodeState.java:400)
  at org.apache.jackrabbit.oak.plugins.segment.SegmentWriter.writeNode(SegmentWriter.java:1126)
  at org.apache.jackrabbit.oak.plugins.segment.SegmentWriter$2.childNodeChanged(SegmentWriter.java:1135)
  at org.apache.jackrabbit.oak.plugins.memory.ModifiedNodeState.compareAgainstBaseState(ModifiedNodeState.java:400)
  at org.apache.jackrabbit.oak.plugins.segment.SegmentWriter.writeNode(SegmentWriter.java:1126)
  at org.apache.jackrabbit.oak.plugins.segment.SegmentWriter$2.childNodeChanged(SegmentWriter.java:1135)
  at org.apache.jackrabbit.oak.plugins.memory.ModifiedNodeState.compareAgainstBaseState(ModifiedNodeState.java:400)
  at org.apache.jackrabbit.oak.plugins.segment.SegmentWriter.writeNode(SegmentWriter.java:1126)
  at org.apache.jackrabbit.oak.plugins.segment.SegmentWriter$2.childNodeChanged(SegmentWriter.java:1135)
  at org.apache.jackrabbit.oak.plugins.memory.ModifiedNodeState.compareAgainstBaseState(ModifiedNodeState.java:400)
  at org.apache.jackrabbit.oak.plugins.segment.SegmentWriter.writeNode(SegmentWriter.java:1126)
  at org.apache.jackrabbit.oak.plugins.segment.SegmentWriter$2.childNodeChanged(SegmentWriter.java:1135)
  at org.apache.jackrabbit.oak.plugins.memory.ModifiedNodeState.compareAgainstBaseState(ModifiedNodeState.java:400)
  at org.apache.jackrabbit.oak.plugins.segment.SegmentWriter.writeNode(SegmentWriter.java:1126)
  at org.apache.jackrabbit.oak.plugins.segment.SegmentWriter$2.childNodeChanged(SegmentWriter.java:1135)
  at org.apache.jackrabbit.oak.plugins.memory.ModifiedNodeState.compareAgainstBaseState(ModifiedNodeState.java:400)
  at org.apache.jackrabbit.oak.plugins.segment.SegmentWriter.writeNode(SegmentWriter.java:1126)
  at org.apache.jackrabbit.oak.plugins.segment.SegmentWriter$2.childNodeChanged(SegmentWriter.java:1135)
  at org.apache.jackrabbit.oak.plugins.memory.ModifiedNodeState.compareAgainstBaseState(ModifiedNodeState.java:400)
  at org.apache.jackrabbit.oak.plugins.segment.SegmentWriter.writeNode(SegmentWriter.java:1126)
  at org.apache.jackrabbit.oak.plugins.segment.SegmentWriter.writeNode(SegmentWriter.java:1154)
  at
[jira] [Commented] (OAK-2844) Introducing a simple document-based discovery-light service (to circumvent documentMk's eventual consistency delays)
[ https://issues.apache.org/jira/browse/OAK-2844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14709369#comment-14709369 ] Stefan Egli commented on OAK-2844: -- oups, thx for spotting [~mreutegg]!! disabled the test for now and will find an alternative (http://svn.apache.org/r1697407) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (OAK-3261) consider existing locks when creating new ones
[ https://issues.apache.org/jira/browse/OAK-3261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julian Reschke updated OAK-3261: Attachment: OAK-3261.diff proposed patch consider existing locks when creating new ones -- Key: OAK-3261 URL: https://issues.apache.org/jira/browse/OAK-3261 Project: Jackrabbit Oak Issue Type: Sub-task Components: jcr Affects Versions: 1.2.3, 1.3.3, 1.0.18 Reporter: Julian Reschke Assignee: Julian Reschke Fix For: 1.4 Attachments: OAK-3261.diff When creating new locks, existing locks need to be checked:
- on ancestor nodes, when creating deep locks
- on descendant nodes
(Note that the check on descendant nodes might be costly as long as we have to walk the whole subtree.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
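An illustrative sketch (not Oak code; all names invented) of the two checks described above: walking up the ancestors looking for deep locks, and scanning descendants, which may touch the whole subtree:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the lock-conflict checks from OAK-3261.
// Paths use the "/a/b/c" form; the lock table is a plain map.
public class LockCheckSketch {

    enum Lock { NONE, SHALLOW, DEEP }

    private final Map<String, Lock> locks = new HashMap<>();

    void lock(String path, Lock type) {
        locks.put(path, type);
    }

    /** True if creating a new lock at {@code path} would conflict. */
    boolean conflicts(String path) {
        // 1. ancestor check: a DEEP lock on any ancestor covers this path
        String p = path;
        while (!p.isEmpty()) {
            p = p.substring(0, p.lastIndexOf('/'));
            String ancestor = p.isEmpty() ? "/" : p;
            if (locks.getOrDefault(ancestor, Lock.NONE) == Lock.DEEP) {
                return true;
            }
        }
        // 2. descendant check: potentially walks the whole subtree (costly)
        String prefix = path.endsWith("/") ? path : path + "/";
        for (Map.Entry<String, Lock> e : locks.entrySet()) {
            if (e.getKey().startsWith(prefix) && e.getValue() != Lock.NONE) {
                return true;
            }
        }
        // 3. the node itself
        return locks.getOrDefault(path, Lock.NONE) != Lock.NONE;
    }
}
```

The descendant scan here is linear in the number of locks; an index over locked paths (or a per-subtree counter) would be the obvious way to avoid walking everything.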
[jira] [Resolved] (OAK-2844) Introducing a simple document-based discovery-light service (to circumvent documentMk's eventual consistency delays)
[ https://issues.apache.org/jira/browse/OAK-2844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli resolved OAK-2844. -- Resolution: Fixed introduced in http://svn.apache.org/r1697355 (in trunk) feedback incorporated, thx! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (OAK-3267) Add discovery-lite descriptor for segmentNodeStore
Stefan Egli created OAK-3267: Summary: Add discovery-lite descriptor for segmentNodeStore Key: OAK-3267 URL: https://issues.apache.org/jira/browse/OAK-3267 Project: Jackrabbit Oak Issue Type: Task Affects Versions: 1.3.4 Reporter: Stefan Egli Assignee: Stefan Egli Fix For: 1.3.5 With OAK-2844 the DocumentNodeStore now exposes a repository descriptor 'oak.discoverylite.clusterview' - this should also be done for SegmentNodeStore. Although that one will be a trivial static thingy, upper layers should not have to worry about whether they are on document or segment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (OAK-3268) Improve datastore resilience
Michael Marth created OAK-3268: -- Summary: Improve datastore resilience Key: OAK-3268 URL: https://issues.apache.org/jira/browse/OAK-3268 Project: Jackrabbit Oak Issue Type: Improvement Components: blob Reporter: Michael Marth Fix For: 1.3.6 As discussed bilaterally, grouping the improvements for datastore resilience in this issue for easier tracking. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (OAK-3090) Caching BlobStore implementation
[ https://issues.apache.org/jira/browse/OAK-3090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Marth updated OAK-3090: --- Fix Version/s: (was: 1.3.5) 1.3.6 Caching BlobStore implementation - Key: OAK-3090 URL: https://issues.apache.org/jira/browse/OAK-3090 Project: Jackrabbit Oak Issue Type: New Feature Components: blob Reporter: Chetan Mehrotra Labels: performance, resilience Fix For: 1.3.6 Storing binaries in Mongo puts a lot of read pressure on MongoDB. To reduce the read load it would be useful to have a filesystem-based cache of frequently used binaries. This would be similar to CachingFDS (OAK-3005) but would be implemented on top of the BlobStore API. Requirements * Specify the max binary size which can be cached on the file system * Limit the size of all binary content present in the cache -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (OAK-3031) [Blob GC] Mbean for reporting shared repository GC stats
[ https://issues.apache.org/jira/browse/OAK-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Marth updated OAK-3031: --- Fix Version/s: (was: 1.3.5) 1.3.6 [Blob GC] Mbean for reporting shared repository GC stats Key: OAK-3031 URL: https://issues.apache.org/jira/browse/OAK-3031 Project: Jackrabbit Oak Issue Type: Sub-task Components: blob Reporter: Amit Jain Assignee: Amit Jain Labels: resilience, tooling Fix For: 1.3.6 For GC on a shared repository (OAK-1849) it is beneficial to add a JMX Mbean which can provide visibility on the state of GC. It could possibly show: * Various repositories registered in the DataStore * State of the blob reference collection for the registered repositories * Time of the reference files for each registered repository * Time interval for the earliest and the latest reference file of the registered repositories. This could be used to possibly automate the sweep phase if the time interval is less than a configured value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (OAK-3183) [Blob GC] Improvements/tools for blob garbage collection
[ https://issues.apache.org/jira/browse/OAK-3183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Marth updated OAK-3183: --- Fix Version/s: (was: 1.4) 1.3.6 [Blob GC] Improvements/tools for blob garbage collection Key: OAK-3183 URL: https://issues.apache.org/jira/browse/OAK-3183 Project: Jackrabbit Oak Issue Type: Improvement Components: blob Reporter: Amit Jain Assignee: Amit Jain Labels: resilience, tooling Fix For: 1.3.6 Container issue for improvements and reporting tools for the blob garbage collection. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (OAK-3183) [Blob GC] Improvements/tools for blob garbage collection
[ https://issues.apache.org/jira/browse/OAK-3183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Marth updated OAK-3183: --- Labels: resilience tooling (was: tooling) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (OAK-3269) Improve Lucene indexer resilience
Michael Marth created OAK-3269: -- Summary: Improve Lucene indexer resilience Key: OAK-3269 URL: https://issues.apache.org/jira/browse/OAK-3269 Project: Jackrabbit Oak Issue Type: Improvement Components: lucene Reporter: Michael Marth As discussed bilaterally, grouping the improvements for Lucene indexer resilience in this issue for easier tracking. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (OAK-3269) Improve Lucene indexer resilience
[ https://issues.apache.org/jira/browse/OAK-3269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Marth updated OAK-3269: --- Fix Version/s: 1.3.6 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (OAK-2556) do intermediate commit during async indexing
[ https://issues.apache.org/jira/browse/OAK-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Marth updated OAK-2556: --- Fix Version/s: (was: 1.3.5) 1.3.6 do intermediate commit during async indexing Key: OAK-2556 URL: https://issues.apache.org/jira/browse/OAK-2556 Project: Jackrabbit Oak Issue Type: Bug Components: lucene Affects Versions: 1.0.11 Reporter: Stefan Egli Labels: resilience Fix For: 1.3.6 A recent issue found at a customer revealed a potential problem with the async indexer. Reading AsyncIndexUpdate.updateIndex, it looks like it is doing the entire update of the async indexer *in one go*, i.e. in one commit. When, however, the async indexer has to process a huge diff for some reason, the 'one big commit' can become gigantic; in fact, there is no limit to the size of the commit. So the suggestion is to do intermediate commits while the async indexer is running. This is acceptable because with async indexing the index is not 100% up-to-date anyway, so it would not make much of a difference if it committed after every 100 or 1000 changes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
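The proposed behaviour can be illustrated with a generic batching sketch (plain Java, not the actual AsyncIndexUpdate code; the batchSize parameter and commit callback are invented for illustration): instead of one potentially gigantic commit, flush an intermediate commit every batchSize changes.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of intermediate commits during async indexing.
public class BatchedIndexUpdateSketch {

    /**
     * Applies all changes in batches, committing each batch separately.
     * Returns the number of commits performed.
     */
    static int update(List<String> changes, int batchSize, Consumer<List<String>> commit) {
        int commits = 0;
        int from = 0;
        while (from < changes.size()) {
            int to = Math.min(from + batchSize, changes.size());
            commit.accept(changes.subList(from, to)); // intermediate commit
            commits++;
            from = to;
        }
        return commits;
    }
}
```

A 250-change diff with a batch size of 100 would thus produce three commits instead of one unbounded commit.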
[jira] [Updated] (OAK-2722) IndexCopier fails to delete older index directory upon reindex
[ https://issues.apache.org/jira/browse/OAK-2722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Marth updated OAK-2722: --- Fix Version/s: (was: 1.3.5) 1.3.6 IndexCopier fails to delete older index directory upon reindex -- Key: OAK-2722 URL: https://issues.apache.org/jira/browse/OAK-2722 Project: Jackrabbit Oak Issue Type: Bug Components: lucene Reporter: Chetan Mehrotra Assignee: Chetan Mehrotra Priority: Minor Labels: resilience Fix For: 1.3.6 {{IndexCopier}} tries to remove the older index directory in case of a reindex. This might fail on platforms like Windows if the files are still memory-mapped or locked. For deleting directories we would need to take a similar approach to the one used for deleting old index files, i.e. retry later. Due to this, the following test fails on Windows (per [~julian.resc...@gmx.de]):
{noformat}
Tests run: 9, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.07 sec FAILURE!
deleteOldPostReindex(org.apache.jackrabbit.oak.plugins.index.lucene.IndexCopierTest) Time elapsed: 0.02 sec FAILURE!
java.lang.AssertionError: Old index directory should have been removed
  at org.junit.Assert.fail(Assert.java:93)
  at org.junit.Assert.assertTrue(Assert.java:43)
  at org.junit.Assert.assertFalse(Assert.java:68)
  at org.apache.jackrabbit.oak.plugins.index.lucene.IndexCopierTest.deleteOldPostReindex(IndexCopierTest.java:160)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
{noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
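The retry-later approach the ticket suggests could look roughly like this (a hypothetical sketch, not the actual IndexCopier code): if a directory cannot be deleted immediately, remember it and retry on a later pass instead of failing.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of retry-later directory deletion for OAK-2722.
public class RetryingDeleteSketch {

    private final List<File> failedDeletes = new ArrayList<>();

    /** Try to delete now; queue the directory for a later retry on failure. */
    void deleteOrQueue(File dir) {
        if (!deleteRecursive(dir)) {
            failedDeletes.add(dir);
        }
    }

    /** Called periodically (e.g. before the next reindex) to retry deletions. */
    void retryPending() {
        failedDeletes.removeIf(this::deleteRecursive);
    }

    int pendingCount() {
        return failedDeletes.size();
    }

    private boolean deleteRecursive(File f) {
        File[] children = f.listFiles(); // null for plain files
        if (children != null) {
            for (File c : children) {
                deleteRecursive(c);
            }
        }
        return !f.exists() || f.delete();
    }
}
```

On Windows the first deleteOrQueue call may fail while files are still memory-mapped; a later retryPending call, after the mappings are released, would then succeed.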
[jira] [Updated] (OAK-2556) do intermediate commit during async indexing
[ https://issues.apache.org/jira/browse/OAK-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Marth updated OAK-2556: --- Issue Type: Improvement (was: Bug) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (OAK-3270) Improve DocumentMK resilience
Michael Marth created OAK-3270: -- Summary: Improve DocumentMK resilience Key: OAK-3270 URL: https://issues.apache.org/jira/browse/OAK-3270 Project: Jackrabbit Oak Issue Type: Improvement Components: mongomk, rdbmk Reporter: Michael Marth Collection of DocMK resilience improvements -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (OAK-3263) Support including and excluding paths for PropertyIndex
[ https://issues.apache.org/jira/browse/OAK-3263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14710467#comment-14710467 ] Manfred Baedke edited comment on OAK-3263 at 8/25/15 2:37 AM: -- Added incomplete patch OAK-3263-prelimary.patch (based on branch 1.0) for reference purposes. [~chetanm], would you take a look to tell me if this goes in the right direction? Also I'm unsure about the equivalent of the IndexPlanner patch from OAK-2599 (the opt-out in case of query path mismatch); any pointer is appreciated. was (Author: baedke): Added incomplete patch OAK-3263-prelimary.patch for reference purposes. [~chetanm], would you take a look to tell me if this goes in the right direction? Also I'm unsure about the equivalent of the IndexPlanner patch from OAK-2599 (the opt-out in case of query path mismatch); any pointer is appreciated. Support including and excluding paths for PropertyIndex --- Key: OAK-3263 URL: https://issues.apache.org/jira/browse/OAK-3263 Project: Jackrabbit Oak Issue Type: Improvement Components: query Reporter: Chetan Mehrotra Fix For: 1.3.6 Attachments: OAK-3263-prelimary.patch As part of OAK-2599, support for excluding and including paths was added to the Lucene index. It would be good to have such support enabled for PropertyIndex also. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (OAK-3263) Support including and excluding paths for PropertyIndex
[ https://issues.apache.org/jira/browse/OAK-3263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manfred Baedke updated OAK-3263: Attachment: OAK-3263-prelimary.patch Added incomplete patch OAK-3263-prelimary.patch for reference purposes. [~chetanm], would you take a look to tell me if this goes into the right direction? Also I'm unsure about the equivalent of the IndexPlanner patch from OAK-2599 (the opt-out in case of query path mismatch); any pointer is appreciated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
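A minimal sketch of the include/exclude path semantics under discussion (an assumption modelled on the includedPaths/excludedPaths behaviour OAK-2599 describes for the Lucene index; not actual Oak code, and the class and method names are invented): a node is covered by the index when it lies under some included path and under no excluded path.

```java
import java.util.List;

// Hypothetical sketch of path filtering for a PropertyIndex (OAK-3263).
public class PathFilterSketch {

    private final List<String> includedPaths;
    private final List<String> excludedPaths;

    PathFilterSketch(List<String> includedPaths, List<String> excludedPaths) {
        this.includedPaths = includedPaths;
        this.excludedPaths = excludedPaths;
    }

    /** True if a node at {@code path} should be indexed. */
    boolean includes(String path) {
        for (String ex : excludedPaths) {
            if (isAncestorOrSelf(ex, path)) {
                return false; // excludes win over includes
            }
        }
        if (includedPaths.isEmpty()) {
            return true; // no includes configured: everything not excluded is in
        }
        for (String in : includedPaths) {
            if (isAncestorOrSelf(in, path)) {
                return true;
            }
        }
        return false;
    }

    private static boolean isAncestorOrSelf(String ancestor, String path) {
        String prefix = ancestor.endsWith("/") ? ancestor : ancestor + "/";
        return path.equals(ancestor) || path.startsWith(prefix);
    }
}
```

The IndexPlanner counterpart mentioned in the comment would then be the query-time opt-out: when the query's path restriction falls entirely outside the included paths, the index declines to provide a plan.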