[jira] [Created] (OAK-5271) IndexDefinitionBuilder should ignore change in certain properties for determining reindex flag value
Chetan Mehrotra created OAK-5271:

Summary: IndexDefinitionBuilder should ignore change in certain properties for determining reindex flag value
Key: OAK-5271
URL: https://issues.apache.org/jira/browse/OAK-5271
Project: Jackrabbit Oak
Issue Type: Improvement
Components: lucene
Reporter: Chetan Mehrotra
Assignee: Chetan Mehrotra
Priority: Minor
Fix For: 1.6

IndexDefinitionBuilder runs an equals check on the node states before and after a change. If there is even a single difference, it sets the reindex flag to true. Changes to certain properties, such as switching {{async}} from {{async}} to {{(async, nrt)}}, should be considered reindex-safe and hence should not lead to the reindex flag being set to true upon change.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
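The intended behaviour can be sketched as follows. This is a hypothetical illustration, not the actual IndexDefinitionBuilder code (the class name {{ReindexCheck}} and the use of plain property maps are assumptions): compare the definitions with reindex-safe properties filtered out, and set the reindex flag only when the filtered views differ.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not the actual Oak implementation: compare the index
// definition before and after a change, ignoring reindex-safe properties
// such as "async", so that e.g. switching async -> (async, nrt) does not
// flip the reindex flag.
class ReindexCheck {
    private static final Set<String> REINDEX_SAFE =
            new HashSet<>(Arrays.asList("async"));

    static boolean requiresReindex(Map<String, Object> before, Map<String, Object> after) {
        // equality is evaluated on the filtered views only
        return !withoutSafeProps(before).equals(withoutSafeProps(after));
    }

    private static Map<String, Object> withoutSafeProps(Map<String, Object> props) {
        Map<String, Object> copy = new HashMap<>(props);
        copy.keySet().removeAll(REINDEX_SAFE);
        return copy;
    }
}
```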
[jira] [Commented] (OAK-5253) Optimize AbstractBlob#equal to not do content equals when possible
[ https://issues.apache.org/jira/browse/OAK-5253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15741207#comment-15741207 ]

Amit Jain commented on OAK-5253:

We could change the {{SegmentBlob#equals()}} implementation to check for equality of getBlobIds() if non-null, and otherwise delegate to AbstractBlob as is done currently. This is similar to what is already done for {{BlobStoreBlob#equals()}}. [~chetanm], [~jsedding], [~mduerig] wdyt?

> Optimize AbstractBlob#equal to not do content equals when possible
>
> Key: OAK-5253
> URL: https://issues.apache.org/jira/browse/OAK-5253
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: blob
> Reporter: Amit Jain
> Assignee: Amit Jain
> Fix For: 1.6, 1.5.16, 1.4.11
>
> Attachments: OAK-5253.1.patch
>
> AbstractBlob#equals tries to match content when the lengths are equal and the content identities are not null and different. Matching content triggers an expensive download of binaries for S3DataStore.
> Since, right now, the content identity is the content hash, the check can be short-circuited to return false when the content identities are not null and not equal.
> This can be revisited if we change the identity to something different.
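The short-circuit under discussion can be sketched roughly as below. This is a hedged illustration with assumed names ({{Blob}}, {{getContentIdentity()}}, {{readContent()}}), not the actual SegmentBlob/AbstractBlob code: when both content identities (content hashes) are present, equality is decided from them alone and the expensive content download is skipped.

```java
import java.util.Arrays;

// Hypothetical sketch of the identity-based short-circuit; names are
// assumed, not the actual Oak classes.
abstract class Blob {
    int contentReads = 0; // tracks expensive reads, for illustration only

    abstract String getContentIdentity(); // content hash, may be null
    abstract byte[] readContent();        // expensive (e.g. S3 download)

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof Blob)) {
            return false;
        }
        Blob that = (Blob) other;
        String a = this.getContentIdentity();
        String b = that.getContentIdentity();
        if (a != null && b != null) {
            // identities are content hashes: compare them, skip the download
            return a.equals(b);
        }
        // identity missing on either side: fall back to content comparison
        return Arrays.equals(this.readContent(), that.readContent());
    }

    @Override
    public int hashCode() {
        String id = getContentIdentity();
        return id == null ? 0 : id.hashCode();
    }
}

class SimpleBlob extends Blob {
    private final String id;
    private final byte[] data;

    SimpleBlob(String id, byte[] data) {
        this.id = id;
        this.data = data;
    }

    @Override
    String getContentIdentity() {
        return id;
    }

    @Override
    byte[] readContent() {
        contentReads++;
        return data;
    }
}
```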
[jira] [Commented] (OAK-3976) journal should support large(r) entries
[ https://issues.apache.org/jira/browse/OAK-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15740133#comment-15740133 ]

Vikas Saurabh commented on OAK-3976:

To get a sense of what kind of threshold won't impact day-to-day commits and operations: all but one of the journal entries created on a fresh setup of AEM had information about sub-10k paths (mostly around 2-3k). One journal entry was creating a full 2-level hierarchy (don't know why/how) with 2 hex digits ({{/../ab/cd}}), accounting for 65k nodes; the total number of nodes in that entry was 66733.

> journal should support large(r) entries
>
> Key: OAK-3976
> URL: https://issues.apache.org/jira/browse/OAK-3976
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: documentmk
> Affects Versions: 1.3.14
> Reporter: Stefan Egli
> Assignee: Vikas Saurabh
> Fix For: 1.6, 1.5.16
>
> Journal entries are created in the background write. Normally this happens every second. If for some reason there is a large delay between two background writes, the number of pending changes can accumulate, which can result in arbitrarily large single journal entries (i.e. with a large {{_c}} property).
> This can cause multiple problems down the road:
> * journal gc at this point loads 450 entries, and if some are large this can result in very large memory consumption during gc (which can cause severe stability problems for the VM, if not OOM etc). This should be fixed with OAK-3001 (where we only get the id, and thus do not care how big {{_c}} is)
> * before OAK-3001 is done (which is currently scheduled after 1.4), what we can do is reduce the delete batch size (OAK-3975)
> * background reads also read the journal entries, and even if OAK-3001/OAK-3975 are implemented, the background read can still cause large memory consumption. So we need to improve this one way or another.
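As a sanity check on the numbers above ({{HierarchySize}} is just an illustrative helper, not Oak code): two hex digits allow 256 names per level, so a full 2-level hierarchy {{/ab/cd}} has 256 * 256 = 65536 leaf paths, which roughly matches the ~65k nodes attributed to that hierarchy within the 66733-node entry.

```java
// Sanity check on the hierarchy size mentioned above: two hex digits give
// 256 possible names per level, so a full 2-level hierarchy /ab/cd has
// 256 * 256 = 65536 leaf paths (plus 256 intermediate /ab nodes).
class HierarchySize {
    static int leafPaths() {
        int namesPerLevel = 16 * 16; // two hex digits per path segment
        return namesPerLevel * namesPerLevel;
    }
}
```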
[jira] [Commented] (OAK-3976) journal should support large(r) entries
[ https://issues.apache.org/jira/browse/OAK-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15740002#comment-15740002 ]

Vikas Saurabh commented on OAK-3976:

Also, I was a bit afraid of some sort of deadlock arising here. So, I tried \[0]: 100 committer threads adding a random node (sleeping 10ms in between) and a thread running background ops every 1s (journal push threshold set to 10). Letting this party run for 10s didn't deadlock... so, there's a bit of relief :).

\[0]:
{code}
@Test
public void journalPushMustntDeadlock() throws Exception {
    int oldJournalPushThreshold = DocumentNodeStore.journalPushThreshold;
    DocumentNodeStore.journalPushThreshold = 10;
    try {
        final DocumentNodeStore ns = builderProvider.newBuilder().setAsyncDelay(0).getNodeStore();
        final AtomicBoolean stopTest = new AtomicBoolean();
        List<Thread> threads = new ArrayList<>();
        threads.add(new Thread(new Runnable() {
            @Override
            public void run() {
                while (!stopTest.get()) {
                    ns.runBackgroundOperations();
                    try {
                        Thread.sleep(1000); // slow background thread
                    } catch (InterruptedException e) {
                        // ignore and continue
                    }
                }
            }
        }));
        for (int i = 0; i < 100; i++) {
            threads.add(new Thread(new Runnable() {
                @Override
                public void run() {
                    while (!stopTest.get()) {
                        NodeBuilder builder = ns.getRoot().builder();
                        builder.child("foo" + UUID.randomUUID());
                        try {
                            merge(ns, builder);
                        } catch (CommitFailedException e) {
                            e.printStackTrace(); // ignore errors and continue
                        }
                        try {
                            Thread.sleep(10);
                        } catch (InterruptedException e) {
                            // ignore and continue
                        }
                    }
                }
            }));
        }
        for (Thread t : threads) {
            t.start();
        }
        Thread.sleep(10000); // let them party for 10 seconds
        stopTest.set(true);
        for (Thread t : threads) {
            t.join();
        }
    } finally {
        DocumentNodeStore.journalPushThreshold = oldJournalPushThreshold;
    }
}
{code}

> journal should support large(r) entries
[jira] [Comment Edited] (OAK-3976) journal should support large(r) entries
[ https://issues.apache.org/jira/browse/OAK-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15740002#comment-15740002 ]

Vikas Saurabh edited comment on OAK-3976 at 12/11/16 4:42 PM:

Also, I was a bit afraid of some sort of deadlock arising here. So, I tried \[0]: 100 committer threads adding a random node (sleeping 10ms in between) and a thread running background ops every 1s (journal push threshold set to 10). Letting this party run for 10s didn't deadlock... so, there's a bit of relief :).

\[0]:
{code}
@Test
public void journalPushMustntDeadlock() throws Exception {
    int oldJournalPushThreshold = DocumentNodeStore.journalPushThreshold;
    DocumentNodeStore.journalPushThreshold = 10;
    try {
        final DocumentNodeStore ns = builderProvider.newBuilder().setAsyncDelay(0).getNodeStore();
        final AtomicBoolean stopTest = new AtomicBoolean();
        List<Thread> threads = new ArrayList<>();
        threads.add(new Thread(new Runnable() {
            @Override
            public void run() {
                while (!stopTest.get()) {
                    ns.runBackgroundOperations();
                    try {
                        Thread.sleep(1000); // slow background thread
                    } catch (InterruptedException e) {
                        // ignore and continue
                    }
                }
            }
        }));
        for (int i = 0; i < 100; i++) {
            threads.add(new Thread(new Runnable() {
                @Override
                public void run() {
                    while (!stopTest.get()) {
                        NodeBuilder builder = ns.getRoot().builder();
                        builder.child("foo" + UUID.randomUUID());
                        try {
                            merge(ns, builder);
                        } catch (CommitFailedException e) {
                            e.printStackTrace(); // ignore errors and continue
                        }
                        try {
                            Thread.sleep(10);
                        } catch (InterruptedException e) {
                            // ignore and continue
                        }
                    }
                }
            }));
        }
        for (Thread t : threads) {
            t.start();
        }
        Thread.sleep(10000); // let them party for 10 seconds
        stopTest.set(true);
        for (Thread t : threads) {
            t.join();
        }
    } finally {
        DocumentNodeStore.journalPushThreshold = oldJournalPushThreshold;
    }
}
{code}
[jira] [Commented] (OAK-3976) journal should support large(r) entries
[ https://issues.apache.org/jira/browse/OAK-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15739956#comment-15739956 ]

Vikas Saurabh commented on OAK-3976:

While writing a few tests (specifically for when a force push happens), I realized that the patch above actually counts more than it truly should: {{add(/foo) -> 2 paths (/, /foo) -> add(/bar) -> 4 paths (/, /foo, /, /bar)}}. Notice '/' got counted twice. But, as this is a preventive feature, I don't know if it is important enough to warrant some refactoring in {{JournalEntry.TreeNode#getOrCreateNode}} (to somehow tell whether the node got created) to do the calculation correctly. /cc [~mreutegg]

> journal should support large(r) entries
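The double counting described above can be illustrated with a small, hypothetical sketch ({{PathCounter}} is not the actual JournalEntry code): a naive counter that increments for every ancestor on each add() counts '/' repeatedly, while tracking distinct paths gives the true size.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the over-counting: the naive count registers every
// ancestor (including the root) on each add(), while the set of distinct
// paths reflects how many paths the entry really contains.
// Assumes absolute paths like "/foo" or "/a/b".
class PathCounter {
    int naiveCount = 0;
    Set<String> distinct = new HashSet<>();

    void add(String path) {
        distinct.add("/");
        naiveCount++; // '/' is counted again on every add()
        if (path.equals("/")) {
            return;
        }
        String current = "";
        for (String segment : path.substring(1).split("/")) {
            current = current + "/" + segment;
            distinct.add(current);
            naiveCount++;
        }
    }
}
```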