Yeah, I can’t see any “Too many open files” messages in your log.
From your log:
-----
[junit4] 2> NOTE: reproduce with: ant test -Dtestcase=ReplaceNodeTest -Dtests.method=test -Dtests.seed=545A8F7F914CAA60 -Dtests.slow=true -Dtests.locale=zh-HK -Dtests.timezone=Indian/Cocos -Dtests.asserts=true -Dtests.file.encoding=UTF-8
[junit4] FAILURE 65.7s | ReplaceNodeTest.test <<<
[junit4] > Throwable #1: java.lang.AssertionError
[junit4] > at __randomizedtesting.SeedInfo.seed([545A8F7F914CAA60:DC0EB0A53FB0C798]:0)
[junit4] > at org.apache.solr.cloud.ReplaceNodeTest.test(ReplaceNodeTest.java:79)
-----
I tried again, and the repro line above doesn't reproduce on my MacBook Pro.
This looks like a (roughly) 10-second timeout (200 iterations x 50ms sleep). Maybe the
operation is just taking longer than that? Could you try increasing the 200
below to a larger number? It may also be worth checking for statuses other than
COMPLETED and FAILED (there are also RUNNING, SUBMITTED, and NOT_FOUND):
-----
67: new CollectionAdminRequest.ReplaceNode(node2bdecommissioned, emptyNode).processAsync("000", cloudClient);
68: CollectionAdminRequest.RequestStatus requestStatus = CollectionAdminRequest.requestStatus("000");
69: boolean success = false;
70: for (int i = 0; i < 200; i++) {
71:   CollectionAdminRequest.RequestStatusResponse rsp = requestStatus.process(cloudClient);
72:   if (rsp.getRequestStatus() == RequestStatusState.COMPLETED) {
73:     success = true;
74:     break;
75:   }
76:   assertFalse(rsp.getRequestStatus() == RequestStatusState.FAILED);
77:   Thread.sleep(50);
78: }
79: assertTrue(success);
-----
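
As a rough sketch of both suggestions (a longer, time-based wait, plus reporting whichever status was seen last), something along these lines might help narrow it down. This is not the test's actual code: the `AsyncStatusPoller` class, its `State` enum (a stand-in for `RequestStatusState`), and `pollUntilTerminal` are all hypothetical names, and treating NOT_FOUND as "keep waiting" is an assumption.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

public class AsyncStatusPoller {

  /** Hypothetical stand-in for SolrJ's RequestStatusState, so the sketch compiles on its own. */
  public enum State { COMPLETED, FAILED, RUNNING, SUBMITTED, NOT_FOUND }

  /**
   * Calls statusCheck repeatedly until it reports COMPLETED or FAILED, or until
   * timeoutMs has elapsed. Always returns the last state seen, so a failing
   * assertion can say whether the request was still RUNNING, SUBMITTED, or NOT_FOUND.
   */
  public static State pollUntilTerminal(Supplier<State> statusCheck, long timeoutMs, long sleepMs) {
    final long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (true) {
      State state = statusCheck.get();
      // COMPLETED and FAILED are the only states treated as final here (assumption).
      if (state == State.COMPLETED || state == State.FAILED) {
        return state;
      }
      // Deadline passed: give up and hand back whatever non-final state we last saw.
      if (System.nanoTime() - deadline >= 0) {
        return state;
      }
      try {
        Thread.sleep(sleepMs);
      } catch (InterruptedException e) {
        // Preserve the interrupt flag and stop waiting.
        Thread.currentThread().interrupt();
        return state;
      }
    }
  }
}
```

In the test, the supplier would wrap the `requestStatus.process(cloudClient)` call from line 71 (it throws checked exceptions, so the lambda needs its own try/catch), and the final assertion would compare the returned state to COMPLETED, so that a timeout reports the last state seen instead of a bare AssertionError.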
--
Steve
www.lucidworks.com
> On May 30, 2017, at 5:58 PM, Mike Drob <[email protected]> wrote:
>
> Thanks, Steve.
>
>
> I've uploaded a failure log to
> http://home.apache.org/~mdrob/lucene-solr_6_6/failure
>
> My ulimit settings are:
> core file size          (blocks, -c) 0
> data seg size           (kbytes, -d) unlimited
> file size               (blocks, -f) unlimited
> max locked memory       (kbytes, -l) unlimited
> max memory size         (kbytes, -m) unlimited
> open files                      (-n) 4096
> pipe size            (512 bytes, -p) 1
> stack size              (kbytes, -s) 8192
> cpu time               (seconds, -t) unlimited
> max user processes              (-u) 709
> virtual memory          (kbytes, -v) unlimited
>
>
>
> Do you think that open files limit is too low? I didn't see any evidence in
> the log of that (could easily have missed it though).
>
>
> On Tue, May 30, 2017 at 4:32 PM, Steve Rowe <[email protected]> wrote:
> Hi Mike,
>
> > On May 30, 2017, at 5:07 PM, Mike Drob <[email protected]> wrote:
> >
> > Was able to reproduce on both the unpacked RC and on branch_6_6 in the repo
> > with
> >
> > ant test -Dtestcase=ReplaceNodeTest -Dtests.seed=545A8F7F914CAA60
> > -Dtests.asserts=true
> >
> > My environment:
> >
> > Apache Ant(TM) version 1.10.1 compiled on February 2 2017
> > java version "1.8.0_131"
> > Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
> > Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)
> > Mac OS X 10.12.4
>
> The repro line above does not reproduce for me:
> * on Linux on branch_6_6 (Debian 8.8, Oracle JDK 1.8.0_77, Ant 1.9.4);
> * on MacOS 10.12.5, Oracle JDK 1.8.0_112, Ant 1.9.6.
>
> Mike, can you provide a failure log?
>
> I went looking for Jenkins failures of this test, and the only public ones I
> see are from Policeman Jenkins on OSX, all of them caused by "Too many open
> files".
>
> On my local Jenkins, I see ObjectTracker failures for this test (an
> unreleased object) on branch_6x, but the most recent was from mid-February.
>
> --
> Steve
> www.lucidworks.com
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]