[jira] [Created] (OAK-5271) IndexDefinitionBuilder should ignore change in certain properties for determining reindex flag value

2016-12-11 Thread Chetan Mehrotra (JIRA)
Chetan Mehrotra created OAK-5271:


 Summary: IndexDefinitionBuilder should ignore change in certain 
properties for determining reindex flag value
 Key: OAK-5271
 URL: https://issues.apache.org/jira/browse/OAK-5271
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: lucene
Reporter: Chetan Mehrotra
Assignee: Chetan Mehrotra
Priority: Minor
 Fix For: 1.6


IndexDefinitionBuilder runs an equals check on the node states from before and 
after a change. If there is even a single change, it sets the reindex flag to 
true.

Changes in certain properties, like switching {{async}} from {{async}} to 
{{(async, nrt)}}, should be considered reindex-safe changes and hence should 
not lead to the reindex flag being set to true upon change.
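
For illustration, a minimal sketch of what ignoring such properties in the 
comparison could look like (not the actual IndexDefinitionBuilder code; the 
helper names and the choice of ignored properties are assumptions):

{code}
import java.util.Set;

import org.apache.jackrabbit.oak.spi.state.NodeBuilder;
import org.apache.jackrabbit.oak.spi.state.NodeState;

class ReindexCheck {

    // true if the definitions differ in anything other than the
    // reindex-safe properties (e.g. "async")
    static boolean reindexRequired(NodeState before, NodeState after,
            Set<String> reindexSafe) {
        return !withoutProperties(before, reindexSafe)
                .equals(withoutProperties(after, reindexSafe));
    }

    private static NodeState withoutProperties(NodeState state,
            Set<String> names) {
        NodeBuilder builder = state.builder();
        for (String name : names) {
            builder.removeProperty(name);
        }
        return builder.getNodeState();
    }
}
{code}

With e.g. {{java.util.Collections.singleton("async")}} as the reindex-safe 
set, the async -> (async, nrt) change above would then leave the flag 
untouched.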



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-5253) Optimize AbstractBlob#equal to not do content equals when possible

2016-12-11 Thread Amit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-5253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15741207#comment-15741207
 ] 

Amit Jain commented on OAK-5253:


We could change the {{SegmentBlob#equals()}} implementation to check 
{{getBlobIds()}} for equality when non-null, and otherwise delegate to 
{{AbstractBlob}} as is done currently.
This is similar to what is already done for {{BlobStoreBlob#equals()}}.

[~chetanm], [~jsedding], [~mduerig] wdyt?
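
For discussion, a rough sketch of that shape (treating the exact accessor as 
an assumption; not a committed patch):

{code}
@Override
public boolean equals(Object object) {
    if (object == this) {
        return true;
    }
    if (object instanceof SegmentBlob) {
        String thisId = getBlobId();                          // assumed accessor
        String thatId = ((SegmentBlob) object).getBlobId();
        if (thisId != null && thatId != null) {
            // blob ids are content hashes, so equality can be decided
            // without downloading the binaries
            return thisId.equals(thatId);
        }
    }
    // otherwise fall back to the content-based check, as done currently
    return object instanceof Blob && AbstractBlob.equal(this, (Blob) object);
}
{code}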

> Optimize AbstractBlob#equal to not do content equals when possible
> --
>
> Key: OAK-5253
> URL: https://issues.apache.org/jira/browse/OAK-5253
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: blob
>Reporter: Amit Jain
>Assignee: Amit Jain
> Fix For: 1.6, 1.5.16, 1.4.11
>
> Attachments: OAK-5253.1.patch
>
>
> AbstractBlob#equals tries to match content when the lengths are equal and the 
> content identities are not null and different. Matching content triggers an 
> expensive download of binaries for S3DataStore.
> Since the content identity is currently the content hash, the check can be 
> short-circuited to return false when the content identities are not null and 
> not equal.
> This can be revisited if we change the identity to something different.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-3976) journal should support large(r) entries

2016-12-11 Thread Vikas Saurabh (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15740133#comment-15740133
 ] 

Vikas Saurabh commented on OAK-3976:


To get a sense of what kind of threshold won't impact day-to-day commits and 
operations: all but one of the journal entries created on a fresh AEM setup 
had information about sub-10k paths (mostly around 2-3k). The one exception 
was a journal entry creating a full 2-level hierarchy of 2-hex-digit names 
({{/../ab/cd}}; don't know why/how), accounting for 65k nodes; the total 
number of nodes in that entry was 66733.

> journal should support large(r) entries
> ---
>
> Key: OAK-3976
> URL: https://issues.apache.org/jira/browse/OAK-3976
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: documentmk
>Affects Versions: 1.3.14
>Reporter: Stefan Egli
>Assignee: Vikas Saurabh
> Fix For: 1.6, 1.5.16
>
>
> Journal entries are created in the background write. Normally this happens 
> every second. If for some reason there is a large delay between two 
> background writes, the number of pending changes can accumulate, which 
> can result in arbitrarily large single journal entries (ie with a large {{_c}} 
> property).
> This can cause multiple problems down the road:
> * journal gc at this point loads 450 entries - and if some are large, this can 
> result in very large memory consumption during gc (which can cause severe 
> stability problems for the VM, if not an OOM etc). This should be fixed with 
> OAK-3001 (where we only get the id and thus do not care how big {{_c}} is)
> * before OAK-3001 is done (which is currently scheduled after 1.4), what we 
> can do is reduce the delete batch size (OAK-3975)
> * background reads also read the journal entries, and even if 
> OAK-3001/OAK-3975 are implemented, the background read can still cause large 
> memory consumption. So we need to improve this one way or another.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-3976) journal should support large(r) entries

2016-12-11 Thread Vikas Saurabh (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15740002#comment-15740002
 ] 

Vikas Saurabh commented on OAK-3976:


Also, I was a bit afraid of some sort of deadlock arising here. So I tried 
\[0]: 100 committer threads each adding a random node (sleeping 10ms in 
between) and a thread running background ops every 1s (journal push threshold 
set to 10). Letting this party run for 10s didn't deadlock... so, there's a 
bit of relief :).

\[0]:

{code}
@Test
public void journalPushMustntDeadlock() throws Exception {
    int oldJournalPushThreshold = DocumentNodeStore.journalPushThreshold;
    DocumentNodeStore.journalPushThreshold = 10;
    try {
        final DocumentNodeStore ns =
                builderProvider.newBuilder().setAsyncDelay(0).getNodeStore();
        final AtomicBoolean stopTest = new AtomicBoolean();

        List<Thread> threads = new ArrayList<>();
        // background thread, running background ops every second
        threads.add(new Thread(new Runnable() {
            @Override
            public void run() {
                while (!stopTest.get()) {
                    ns.runBackgroundOperations();
                    try {
                        Thread.sleep(1000); // slow background thread
                    } catch (InterruptedException e) {
                        // ignore and continue
                    }
                }
            }
        }));
        // 100 committer threads, each adding a random node every 10ms
        for (int i = 0; i < 100; i++) {
            threads.add(new Thread(new Runnable() {
                @Override
                public void run() {
                    while (!stopTest.get()) {
                        NodeBuilder builder = ns.getRoot().builder();
                        builder.child("foo" + UUID.randomUUID());
                        try {
                            merge(ns, builder);
                        } catch (CommitFailedException e) {
                            e.printStackTrace();
                            // ignore errors and continue
                        }
                        try {
                            Thread.sleep(10);
                        } catch (InterruptedException e) {
                            // ignore and continue
                        }
                    }
                }
            }));
        }

        for (Thread t : threads) {
            t.start();
        }
        Thread.sleep(10000); // let them party for 10 seconds
        stopTest.set(true);
        for (Thread t : threads) {
            t.join();
        }
    } finally {
        DocumentNodeStore.journalPushThreshold = oldJournalPushThreshold;
    }
}
{code}

> journal should support large(r) entries
> ---
>
> Key: OAK-3976
> URL: https://issues.apache.org/jira/browse/OAK-3976
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: documentmk
>Affects Versions: 1.3.14
>Reporter: Stefan Egli
>Assignee: Vikas Saurabh
> Fix For: 1.6, 1.5.16
>
>
> Journal entries are created in the background write. Normally this happens 
> every second. If for some reason there is a large delay between two 
> background writes, the number of pending changes can accumulate, which 
> can result in arbitrarily large single journal entries (ie with a large {{_c}} 
> property).
> This can cause multiple problems down the road:
> * journal gc at this point loads 450 entries - and if some are large, this can 
> result in very large memory consumption during gc (which can cause severe 
> stability problems for the VM, if not an OOM etc). This should be fixed with 
> OAK-3001 (where we only get the id and thus do not care how big {{_c}} is)
> * before OAK-3001 is done (which is currently scheduled after 1.4), what we 
> can do is reduce the delete batch size (OAK-3975)
> * background reads also read the journal entries, and even if 
> OAK-3001/OAK-3975 are implemented, the background read can still cause large 
> memory consumption. So we need to improve this one way or another.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (OAK-3976) journal should support large(r) entries

2016-12-11 Thread Vikas Saurabh (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15740002#comment-15740002
 ] 

Vikas Saurabh edited comment on OAK-3976 at 12/11/16 4:42 PM:
--

Also, I was a bit afraid of some sort of deadlock arising here. So I tried 
\[0]: 100 committer threads each adding a random node (sleeping 10ms in 
between) and a thread running background ops every 1s (journal push threshold 
set to 10). Letting this party run for 10s didn't deadlock... so, there's a 
bit of relief :).

\[0]:
{code}
@Test
public void journalPushMustntDeadlock() throws Exception {
    int oldJournalPushThreshold = DocumentNodeStore.journalPushThreshold;
    DocumentNodeStore.journalPushThreshold = 10;
    try {
        final DocumentNodeStore ns =
                builderProvider.newBuilder().setAsyncDelay(0).getNodeStore();
        final AtomicBoolean stopTest = new AtomicBoolean();

        List<Thread> threads = new ArrayList<>();
        // background thread, running background ops every second
        threads.add(new Thread(new Runnable() {
            @Override
            public void run() {
                while (!stopTest.get()) {
                    ns.runBackgroundOperations();
                    try {
                        Thread.sleep(1000); // slow background thread
                    } catch (InterruptedException e) {
                        // ignore and continue
                    }
                }
            }
        }));
        // 100 committer threads, each adding a random node every 10ms
        for (int i = 0; i < 100; i++) {
            threads.add(new Thread(new Runnable() {
                @Override
                public void run() {
                    while (!stopTest.get()) {
                        NodeBuilder builder = ns.getRoot().builder();
                        builder.child("foo" + UUID.randomUUID());
                        try {
                            merge(ns, builder);
                        } catch (CommitFailedException e) {
                            e.printStackTrace();
                            // ignore errors and continue
                        }
                        try {
                            Thread.sleep(10);
                        } catch (InterruptedException e) {
                            // ignore and continue
                        }
                    }
                }
            }));
        }

        for (Thread t : threads) {
            t.start();
        }
        Thread.sleep(10000); // let them party for 10 seconds
        stopTest.set(true);
        for (Thread t : threads) {
            t.join();
        }
    } finally {
        DocumentNodeStore.journalPushThreshold = oldJournalPushThreshold;
    }
}
{code}



[jira] [Commented] (OAK-3976) journal should support large(r) entries

2016-12-11 Thread Vikas Saurabh (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15739956#comment-15739956
 ] 

Vikas Saurabh commented on OAK-3976:


While writing a few tests (specifically for when a force push happens), I 
realized that the patch above actually counts more than it truly should -- 
{{add(/foo) -> 2 paths (/, /foo); add(/bar) -> 4 paths (/, /foo, /, /bar)}}. 
Notice that '/' got counted twice. But, as this is a preventive feature, I 
don't know if it is sufficient to warrant a refactor in 
{{JournalEntry.TreeNode#getOrCreateNode}} (to somehow tell whether it got 
created) so as to do the calculation correctly.

/cc [~mreutegg]
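
A hypothetical shape of such a refactor (names and structure are 
illustrative, not the actual {{JournalEntry}} internals):

{code}
import java.util.HashMap;
import java.util.Map;

class TreeNode {
    private final Map<String, TreeNode> children = new HashMap<>();

    // number of nodes created under this node; in JournalEntry such a
    // counter could live on the root and feed the push-threshold check
    private int created;

    TreeNode getOrCreateNode(String name) {
        TreeNode child = children.get(name);
        if (child == null) {
            child = new TreeNode();
            children.put(name, child);
            created++; // bump only for genuinely new nodes, so "/" is
                       // not re-counted for every added path
        }
        return child;
    }
}
{code}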

> journal should support large(r) entries
> ---
>
> Key: OAK-3976
> URL: https://issues.apache.org/jira/browse/OAK-3976
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: documentmk
>Affects Versions: 1.3.14
>Reporter: Stefan Egli
>Assignee: Vikas Saurabh
> Fix For: 1.6, 1.5.16
>
>
> Journal entries are created in the background write. Normally this happens 
> every second. If for some reason there is a large delay between two 
> background writes, the number of pending changes can accumulate, which 
> can result in arbitrarily large single journal entries (ie with a large {{_c}} 
> property).
> This can cause multiple problems down the road:
> * journal gc at this point loads 450 entries - and if some are large, this can 
> result in very large memory consumption during gc (which can cause severe 
> stability problems for the VM, if not an OOM etc). This should be fixed with 
> OAK-3001 (where we only get the id and thus do not care how big {{_c}} is)
> * before OAK-3001 is done (which is currently scheduled after 1.4), what we 
> can do is reduce the delete batch size (OAK-3975)
> * background reads also read the journal entries, and even if 
> OAK-3001/OAK-3975 are implemented, the background read can still cause large 
> memory consumption. So we need to improve this one way or another.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)