Re: Supporting "resumable" operations on a large tree

2017-02-21 Thread Thomas Mueller
Hi,

For re-indexing, there are two problems actually:

* Indexing can take multiple days, so resume would be nice
* For synchronous indexes, indexing creates a large commit, which is
problematic (especially for MongoDB)

To solve both problems ("kill two birds with one stone"), we could instead
try to split indexing into multiple commits. For example, use a "fromPath"
.. "toPath" range, and only re-index part of the repository at a time. See
also
https://issues.apache.org/jira/browse/OAK-5324?focusedCommentId=15837941&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15837941
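A rough sketch of that range-based, multi-commit idea (purely illustrative: the class, the pretend repository, and the checkpoint handling are assumptions, not Oak's IndexUpdate API):

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch: split a re-index of a large tree into multiple
// smaller commits by processing one path range at a time, recording the
// last completed range as a checkpoint so the run can resume after a
// failure. Names here are illustrative, not Oak APIs.
public class RangedReindex {

    // Pretend repository: top-level paths mapped to subtree node counts.
    static final SortedMap<String, Integer> TOP_LEVEL = new TreeMap<>(Map.of(
            "/apps", 10, "/content", 500, "/etc", 20, "/home", 80, "/libs", 40));

    /** Re-index paths in [fromPath, toPath) as one commit; returns nodes indexed. */
    static int indexRange(String fromPath, String toPath) {
        int indexed = 0;
        for (int size : TOP_LEVEL.subMap(fromPath, toPath).values()) {
            indexed += size; // index the whole subtree under this path
        }
        return indexed; // one commit per range instead of one huge commit
    }

    public static void main(String[] args) {
        // Process ranges one at a time; after each commit, persist the
        // upper bound as the checkpoint to resume from on failure.
        String[] bounds = {"/", "/content", "/etc", "/zzz"};
        int total = 0;
        String checkpoint = null;
        for (int i = 0; i + 1 < bounds.length; i++) {
            total += indexRange(bounds[i], bounds[i + 1]);
            checkpoint = bounds[i + 1];
        }
        System.out.println(total + " " + checkpoint);
    }
}
```

If the run dies mid-way, only the current range's work is lost; the next run restarts from the persisted checkpoint instead of from "/".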

Regards,
Thomas



On 20/02/17 13:13, "Chetan Mehrotra"  wrote:

>Hi Team,
>
>In Oak we often perform operations that traverse the tree as
>part of some processing, e.g. commit hooks, sidegrade, indexing,
>etc. For a small tree this works fine, and in case of failure the
>processing can simply be restarted from the beginning.
>
>However, for large operations like reindexing the whole repository for
>some index this poses a problem. For example, consider a Mongo setup with
>100M+ nodes where we need to provision a new index. This would trigger
>an IndexUpdate which would go through all the nodes in the repository
>(in some depth-first manner) and then build up the index. This process
>can take a long time, say 1-2 days, for a Mongo-based setup.
>
>As with any remote setup, such a flow may get interrupted due to some
>network issue or an outage on the Mongo/RDB side. In such a case the
>whole traversal is restarted from the beginning.
>
>The same would be the case for any sidegrade operation where we convert
>a big repository from one form to another.
>
>To improve the resiliency of such operations (OAK-2063) we need a way
>to "resume" traversal in a tree from some last known point. For
>operations performed on a sorted list such a "resume" is easy but
>doing that over a tree traversal looks tricky.
>
>Thoughts on what approach can be taken for enabling this?
>
>Maybe if we can expect a stable traversal order at a given
>revision, then we can keep track of paths at a certain depth and, on
>retry, skip processing of subtrees until we get to that path.
>
>Chetan Mehrotra
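The resume-by-path idea above could be sketched like this (illustrative only; the tree, the DFS order, and the checkpoint handling are made up for the example, not Oak code):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of resumable traversal: assuming depth-first order is stable
// for a given revision, remember the last fully processed path and, on
// retry, skip nodes until that path is reached again.
public class ResumableTraversal {

    // A tiny tree in its (stable) DFS order.
    static final List<String> DFS_ORDER = List.of(
            "/", "/a", "/a/x", "/a/y", "/b", "/b/z", "/c");

    /** Process nodes, resuming after lastProcessed; stop after failAfter nodes. */
    static List<String> run(String lastProcessed, int failAfter) {
        List<String> processed = new ArrayList<>();
        boolean skipping = lastProcessed != null;
        for (String path : DFS_ORDER) {
            if (skipping) {
                if (path.equals(lastProcessed)) skipping = false; // resume point found
                continue; // skip the already-processed prefix of the traversal
            }
            if (processed.size() == failAfter) break; // simulate an interruption
            processed.add(path);
        }
        return processed;
    }

    public static void main(String[] args) {
        List<String> first = run(null, 3); // "crash" after 3 nodes
        String checkpoint = first.get(first.size() - 1);
        List<String> second = run(checkpoint, Integer.MAX_VALUE); // retry resumes
        System.out.println(first + " | " + second);
    }
}
```

The two runs together cover every node exactly once, which is the property a resumable reindex or sidegrade would need.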



Re: [Observation] Should listeners require constant inflow of commits to get all events?

2017-02-21 Thread Stefan Egli
>>
>>But agreed, this is a bug and we should fix it.
>>
>Actually, I'm not too sure, as long as we concretely document the
>behavior and potentially have a sample abstract
>commit-creator/listener which does the job well (maybe similar to the
>hack I used)

I've created OAK-5740 and attached a test case that reproduces this. We can
follow up there on if/when/how we want to fix this.

Cheers,
Stefan




Flaky tests due to timing issues

2017-02-21 Thread Michael Dürig


Hi,

I assume that at least some of the tests that sporadically fail on the 
Apache Jenkins fail because of timing issues. To address this we could 
either


a) skip these tests on Jenkins,
b) increase the time-out,
c) apply platform dependent time-outs.


I would prefer b). I presume that there is no impact on the build time
unless the build fails anyway because it is running into one of these
time-outs. If this is not acceptable we could go for c) and provision
platform-dependent time-outs through the CIHelper class. I somewhat
dislike the additional complexity though. As a last resort we can still do a).


WDYT?

Michael


Re: Flaky tests due to timing issues

2017-02-21 Thread Thomas Mueller
Hi,

I assume with (b) you mean: change tests to use loops, combined with very
high timeouts. Example:

Before:

save();
Thread.sleep(1000);
assertTrue(abc());

After:

save();
for (int i = 0; !abc() && i < 600; i++) {
    Thread.sleep(100);
}
assertTrue(abc());



The additional benefit of this logic is that on a fast machine, the test
is faster (only 100 ms sleep instead of 1 second). Disadvantage:
additional complexity, as you wrote (could be avoided with Java 8 lambda
expressions).
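For illustration, the loop can be folded into a small Java 8 helper so individual tests stay readable (`waitUntil` is a hypothetical name, not an existing Oak or JUnit utility):

```java
import java.util.function.BooleanSupplier;

// Sketch: the loop-and-sleep pattern extracted into a reusable helper
// taking a Java 8 lambda. Returns early on a fast machine, but tolerates
// a very long timeout on a slow one.
public class WaitUntil {

    /** Poll condition every intervalMillis until true or timeoutMillis elapses. */
    static boolean waitUntil(BooleanSupplier condition, long timeoutMillis,
                             long intervalMillis) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (!condition.getAsBoolean()) {
            if (System.currentTimeMillis() >= deadline) return false; // timed out
            Thread.sleep(intervalMillis);
        }
        return true; // condition became true before the deadline
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        // The condition becomes true after roughly 200 ms; the helper
        // returns well before the 60 s timeout.
        boolean ok = waitUntil(() -> System.currentTimeMillis() - start > 200,
                60_000, 100);
        System.out.println(ok);
    }
}
```

A test would then read `save(); assertTrue(waitUntil(() -> abc(), 60_000, 100));` instead of an open-coded loop.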

Regards,
Thomas



On 21/02/17 13:49, "Michael Dürig"  wrote:

>
>Hi,
>
>I assume that at least some of the tests that sporadically fail on the
>Apache Jenkins fail because of timing issues. To address this we could
>either
>
>a) skip these tests on Jenkins,
>b) increase the time-out,
>c) apply platform dependent time-outs.
>
>
>I would prefer b). I presume that there is no impact on the build time
>unless the build fails anyway because it is running into one of these
>time-outs. If this is not acceptable we could go for c) and provision
>platform-dependent time-outs through the CIHelper class. I somewhat
>dislike the additional complexity though. As a last resort we can still do
>a).
>
>WDYT?
>
>Michael



Re: Flaky tests due to timing issues

2017-02-21 Thread Alex Parvulescu
Hi,

If in this context b) actually means 'fix the tests to be more lenient',
I agree this should be the way to go.

However, I think the failing tests are not being given enough priority
currently, and if people aren't able to carve out the time for
investigation, then we're stuck with very unreliable builds, which
can hide more problems down the line.
I would disable the tests that have failed more than a few times, on
Jenkins only, with the goal of returning to a consistent green build.

alex





On Tue, Feb 21, 2017 at 2:09 PM, Thomas Mueller  wrote:

> Hi,
>
> I assume with (b) you mean: change tests to use loops, combined with very
> high timeouts. Example:
>
> Before:
>
> save();
> Thread.sleep(1000);
> assertTrue(abc());
>
> After:
>
> save();
> for (int i = 0; !abc() && i < 600; i++) {
>     Thread.sleep(100);
> }
> assertTrue(abc());
>
>
>
> The additional benefit of this logic is that on a fast machine, the test
> is faster (only 100 ms sleep instead of 1 second). Disadvantage:
> additional complexity, as you wrote (could be avoided with Java 8 lambda
> expressions).
>
> Regards,
> Thomas
>
>
>
> On 21/02/17 13:49, "Michael Dürig"  wrote:
>
> >
> >Hi,
> >
> >I assume that at least some of the tests that sporadically fail on the
> >Apache Jenkins fail because of timing issues. To address this we could
> >either
> >
> >a) skip these tests on Jenkins,
> >b) increase the time-out,
> >c) apply platform dependent time-outs.
> >
> >
> >I would prefer b). I presume that there is no impact on the build time
> >unless the build fails anyway because it is running into one of these
> >time-outs. If this is not acceptable we could go for c) and provision
> >platform-dependent time-outs through the CIHelper class. I somewhat
> >dislike the additional complexity though. As a last resort we can still do
> >a).
> >
> >WDYT?
> >
> >Michael
>
>


Re: Flaky tests due to timing issues

2017-02-21 Thread Michael Dürig



On 21.02.17 14:09, Thomas Mueller wrote:

Hi,

I assume with (b) you mean: change tests to use loops, combined with very
high timeouts. Example:


No, I actually meant getting individual time-out values (or a scaling
factor for time-outs) from CIHelper. That class already provides the
means to skip tests based on where they are running. So it should be
relatively straightforward to have it supply scaling factors for
time-outs in a similar manner.
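A minimal sketch of what such a scaling hook could look like (the method name and the `test.timeout.scale` system property are assumptions for illustration, not the actual CIHelper API):

```java
// Illustrative only: expose a platform-dependent time-out scaling factor
// in the spirit of Oak's CIHelper. Locally the factor defaults to 1, so
// tests are unchanged; a slow CI machine could be started with e.g.
// -Dtest.timeout.scale=5 to stretch every time-out.
public class CIScaling {

    /** Multiply a base time-out by the configured platform factor. */
    static long scaled(long baseMillis) {
        int scale = Integer.getInteger("test.timeout.scale", 1);
        return baseMillis * scale;
    }

    public static void main(String[] args) {
        System.out.println(scaled(1000)); // no property set: unchanged
        System.setProperty("test.timeout.scale", "5");
        System.out.println(scaled(1000)); // scaled for a slow CI machine
    }
}
```

A test would then write `Thread.sleep(scaled(1000))` or pass `scaled(...)` as its JUnit timeout, keeping the test body free of platform checks.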


Michael



Before:

save();
Thread.sleep(1000);
assertTrue(abc());

After:

save();
for (int i = 0; !abc() && i < 600; i++) {
    Thread.sleep(100);
}
assertTrue(abc());



The additional benefit of this logic is that on a fast machine, the test
is faster (only 100 ms sleep instead of 1 second). Disadvantage:
additional complexity, as you wrote (could be avoided with Java 8 lambda
expressions).

Regards,
Thomas



On 21/02/17 13:49, "Michael Dürig"  wrote:



Hi,

I assume that at least some of the tests that sporadically fail on the
Apache Jenkins fail because of timing issues. To address this we could
either

a) skip these tests on Jenkins,
b) increase the time-out,
c) apply platform dependent time-outs.


I would prefer b). I presume that there is no impact on the build time
unless the build fails anyway because it is running into one of these
time-outs. If this is not acceptable we could go for c) and provision
platform-dependent time-outs through the CIHelper class. I somewhat
dislike the additional complexity though. As a last resort we can still do
a).

WDYT?

Michael




Re: Flaky tests due to timing issues

2017-02-21 Thread Michael Dürig



On 21.02.17 15:32, Alex Parvulescu wrote:

Hi,

If in this context b) actually means 'fix the tests to be more lenient' I
agree this should be the way to go.


Yes, and this is not necessarily bad, as I presume that some of the
time-outs might well be overly tight (at least in some environments).




However, I think the failing tests are not being given enough priority
currently, and if people aren't able to carve out the time for
investigation, then we're stuck with very unreliable builds, which
can hide more problems down the line.
I would disable the tests that have failed more than a few times, on
Jenkins only, with the goal of returning to a consistent green build.


+1. Please have a look at the CIHelper class and its usage for disabling 
tests on Jenkins only.


Michael




alex





On Tue, Feb 21, 2017 at 2:09 PM, Thomas Mueller  wrote:


Hi,

I assume with (b) you mean: change tests to use loops, combined with very
high timeouts. Example:

Before:

save();
Thread.sleep(1000);
assertTrue(abc());

After:

save();
for (int i = 0; !abc() && i < 600; i++) {
    Thread.sleep(100);
}
assertTrue(abc());



The additional benefit of this logic is that on a fast machine, the test
is faster (only 100 ms sleep instead of 1 second). Disadvantage:
additional complexity, as you wrote (could be avoided with Java 8 lambda
expressions).

Regards,
Thomas



On 21/02/17 13:49, "Michael Dürig"  wrote:



Hi,

I assume that at least some of the tests that sporadically fail on the
Apache Jenkins fail because of timing issues. To address this we could
either

a) skip these tests on Jenkins,
b) increase the time-out,
c) apply platform dependent time-outs.


I would prefer b). I presume that there is no impact on the build time
unless the build fails anyway because it is running into one of these
time-outs. If this is not acceptable we could go for c) and provision
platform-dependent time-outs through the CIHelper class. I somewhat
dislike the additional complexity though. As a last resort we can still do
a).

WDYT?

Michael







Re: Flaky tests due to timing issues

2017-02-21 Thread Thomas Mueller
Hi,

>No I actually meant getting individual time-out values (or a scaling
>factor for time-outs) from CIHelper. That class already provides the
>means to skip tests based on where they are running. So it should be
>relatively straightforward to have it supply scaling factors for
>time-outs in a similar manner.

Do you have an example?

I think timeouts on the order of seconds are problematic, and I don't
think that "scaling" them to 5 seconds or so will fully solve the problem.
Timeouts on the order of minutes are better, but I wouldn't want to
_always_ delay tests that long. That's why I believe using loops is
better. But in that case, configuration seems unnecessary.

Regards,
Thomas



[Oak origin/1.6] Apache Jackrabbit Oak matrix - Build # 1441 - Still Failing

2017-02-21 Thread Apache Jenkins Server
The Apache Jenkins build system has built Apache Jackrabbit Oak matrix (build 
#1441)

Status: Still Failing

Check console output at 
https://builds.apache.org/job/Apache%20Jackrabbit%20Oak%20matrix/1441/ to view 
the results.

Changes:
[reschke] OAK-5738: Potential NPE in LargeLdapProviderTest (ported to 1.6)

 

Test results:
1 tests failed.
FAILED:  
org.apache.jackrabbit.oak.run.osgi.SecurityProviderRegistrationTest.testRequiredUserAuthenticationFactoryNotAvailable

Error Message:
assert securityProviderServiceReferences != null
       |                                  |
       null                               false

Stack Trace:
Assertion failed: 

assert securityProviderServiceReferences != null
       |                                  |
       null                               false

at 
org.codehaus.groovy.runtime.InvokerHelper.assertFailed(InvokerHelper.java:398)
at 
org.codehaus.groovy.runtime.ScriptBytecodeAdapter.assertFailed(ScriptBytecodeAdapter.java:646)
at 
org.apache.jackrabbit.oak.run.osgi.SecurityProviderRegistrationTest.testRequiredService(SecurityProviderRegistrationTest.groovy:296)
at 
org.apache.jackrabbit.oak.run.osgi.SecurityProviderRegistrationTest.this$3$testRequiredService(SecurityProviderRegistrationTest.groovy)
at 
org.apache.jackrabbit.oak.run.osgi.SecurityProviderRegistrationTest$this$3$testRequiredService.callCurrent(Unknown
 Source)
at 
org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallCurrent(CallSiteArray.java:49)
at 
org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:133)
at 
org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:145)
at 
org.apache.jackrabbit.oak.run.osgi.SecurityProviderRegistrationTest.testRequiredUserAuthenticationFactoryNotAvailable(SecurityProviderRegistrationTest.groovy:139)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
at 
org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
at 
org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)




[Oak origin/1.4] Apache Jackrabbit Oak matrix - Build # 1442 - Still Failing

2017-02-21 Thread Apache Jenkins Server
The Apache Jenkins build system has built Apache Jackrabbit Oak matrix (build 
#1442)

Status: Still Failing

Check console output at 
https://builds.apache.org/job/Apache%20Jackrabbit%20Oak%20matrix/1442/ to view 
the results.

Changes:
[reschke] OAK-5738: Potential NPE in LargeLdapProviderTest (ported to 1.4)

 

Test results:
1 tests failed.
FAILED:  
org.apache.jackrabbit.oak.plugins.segment.standby.ExternalSharedStoreIT.testProxyFlippedIntermediateByte

Error Message:
expected:<{ root = { ... } }> but was:<{ root : { } }>

Stack Trace:
java.lang.AssertionError: expected:<{ root = { ... } }> but was:<{ root : { } }>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:834)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:144)
at 
org.apache.jackrabbit.oak.plugins.segment.standby.DataStoreTestBase.useProxy(DataStoreTestBase.java:224)
at 
org.apache.jackrabbit.oak.plugins.segment.standby.DataStoreTestBase.testProxyFlippedIntermediateByte(DataStoreTestBase.java:174)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
at 
org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
at 
org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)




[Oak origin/1.2] Apache Jackrabbit Oak matrix - Build # 1443 - Still Failing

2017-02-21 Thread Apache Jenkins Server
The Apache Jenkins build system has built Apache Jackrabbit Oak matrix (build 
#1443)

Status: Still Failing

Check console output at 
https://builds.apache.org/job/Apache%20Jackrabbit%20Oak%20matrix/1443/ to view 
the results.

Changes:
[reschke] OAK-5738: Potential NPE in LargeLdapProviderTest (ported to 1.2)

 

Test results:
All tests passed

[Oak origin/trunk] Apache Jackrabbit Oak matrix - Build # 1444 - Still Failing

2017-02-21 Thread Apache Jenkins Server
The Apache Jenkins build system has built Apache Jackrabbit Oak matrix (build 
#1444)

Status: Still Failing

Check console output at 
https://builds.apache.org/job/Apache%20Jackrabbit%20Oak%20matrix/1444/ to view 
the results.

Changes:
[reschke] OAK-5738: Potential NPE in LargeLdapProviderTest

[reschke] OAK-5612: Test failure:

[thomasm] OAK-4888 Warn or fail queries above a configurable cost value

[stefanegli] OAK-5742 : logging more details in case stopAndWait returns false 
- to

[angela] OAK-5743 : UserQueryManager: omits nt-name when searching for 
properties

[angela] OAK-5689 : AbstractSecurityTest: enforce test-failure for traversal

 

Test results:
All tests passed