Yeah, I can’t see any “Too many open files” messages in your log. From your log:
-----
   [junit4]   2> NOTE: reproduce with: ant test -Dtestcase=ReplaceNodeTest -Dtests.method=test -Dtests.seed=545A8F7F914CAA60 -Dtests.slow=true -Dtests.locale=zh-HK -Dtests.timezone=Indian/Cocos -Dtests.asserts=true -Dtests.file.encoding=UTF-8
   [junit4] FAILURE 65.7s | ReplaceNodeTest.test <<<
   [junit4]    > Throwable #1: java.lang.AssertionError
   [junit4]    >        at __randomizedtesting.SeedInfo.seed([545A8F7F914CAA60:DC0EB0A53FB0C798]:0)
   [junit4]    >        at org.apache.solr.cloud.ReplaceNodeTest.test(ReplaceNodeTest.java:79)
-----

I tried again, and ^^ doesn't reproduce on my MacBook Pro.

Line 79 is the final assertTrue(success), so the loop ran out without ever seeing COMPLETED. This looks like a (roughly) 10-second timeout (200 x 50ms); maybe the operation is just taking longer than that? Could you try increasing the 200 below to a larger number? Maybe also check for statuses other than just COMPLETED and FAILED (there are also RUNNING, SUBMITTED, and NOT_FOUND):

-----
67:     new CollectionAdminRequest.ReplaceNode(node2bdecommissioned, emptyNode).processAsync("000", cloudClient);
68:     CollectionAdminRequest.RequestStatus requestStatus = CollectionAdminRequest.requestStatus("000");
69:     boolean success = false;
70:     for (int i = 0; i < 200; i++) {
71:       CollectionAdminRequest.RequestStatusResponse rsp = requestStatus.process(cloudClient);
72:       if (rsp.getRequestStatus() == RequestStatusState.COMPLETED) {
73:         success = true;
74:         break;
75:       }
76:       assertFalse(rsp.getRequestStatus() == RequestStatusState.FAILED);
77:       Thread.sleep(50);
78:     }
79:     assertTrue(success);
-----

--
Steve
www.lucidworks.com

> On May 30, 2017, at 5:58 PM, Mike Drob <md...@apache.org> wrote:
>
> Thanks, Steve.
>
> I've uploaded a failure log to
> http://home.apache.org/~mdrob/lucene-solr_6_6/failure
>
> My ulimit settings are:
> core file size          (blocks, -c) 0
> data seg size           (kbytes, -d) unlimited
> file size               (blocks, -f) unlimited
> max locked memory       (kbytes, -l) unlimited
> max memory size         (kbytes, -m) unlimited
> open files                      (-n) 4096
> pipe size            (512 bytes, -p) 1
> stack size              (kbytes, -s) 8192
> cpu time               (seconds, -t) unlimited
> max user processes              (-u) 709
> virtual memory          (kbytes, -v) unlimited
>
> Do you think that open files limit is too low? I didn't see any evidence in
> the log of that (could easily have missed it though).
>
> On Tue, May 30, 2017 at 4:32 PM, Steve Rowe <sar...@gmail.com> wrote:
> Hi Mike,
>
> > On May 30, 2017, at 5:07 PM, Mike Drob <md...@apache.org> wrote:
> >
> > Was able to reproduce on both the unpacked RC and on branch_6_6 in the repo
> > with
> >
> > ant test -Dtestcase=ReplaceNodeTest -Dtests.seed=545A8F7F914CAA60
> > -Dtests.asserts=true
> >
> > My environment:
> >
> > Apache Ant(TM) version 1.10.1 compiled on February 2 2017
> >
> > java version "1.8.0_131"
> > Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
> > Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)
> >
> > Mac OS X 10.12.4
>
> The repro line above does not reproduce for me:
> * on Linux on branch_6_6 (Debian 8.8, Oracle JDK 1.8.0_77, Ant 1.9.4);
> * on MacOS 10.12.5, Oracle JDK 1.8.0_112, Ant 1.9.6.
>
> Mike, can you provide a failure log?
>
> I went looking for Jenkins failures of this test, and the only public ones I
> see are from Policeman Jenkins on OSX, all of them caused by "Too many open
> files".
>
> On my local Jenkins, I see ObjectTracker failures for this test (an
> unreleased object) on branch_6x, but the most recent was from mid-February.
>
> --
> Steve
> www.lucidworks.com
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
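
A rough sketch of the two suggestions in the top message (wait longer, and look at statuses other than COMPLETED and FAILED). This is not the actual ReplaceNodeTest code: StatusPoller, State, and waitFor are hypothetical names, State is a local stand-in that only reuses the five status names listed in the mail, and the time-based deadline is a swapped-in alternative to the fixed 200-iteration loop. The poll is passed in as a Supplier so the sketch runs without a Solr cluster.

```java
import java.util.function.Supplier;

final class StatusPoller {

    /** Local stand-in for the status enum; names taken from the mail above. */
    enum State { COMPLETED, FAILED, RUNNING, SUBMITTED, NOT_FOUND }

    /**
     * Polls until a terminal state (COMPLETED or FAILED) is seen or until
     * timeoutMs has elapsed, sleeping sleepMs between polls.
     * Returns the last state seen, so a timeout can be reported with the
     * status the request was stuck in instead of a bare AssertionError.
     */
    static State waitFor(Supplier<State> poll, long timeoutMs, long sleepMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        State last;
        do {
            last = poll.get();
            if (last == State.COMPLETED || last == State.FAILED) {
                return last; // terminal: no point in polling further
            }
            try {
                Thread.sleep(sleepMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return last;
            }
        } while (System.nanoTime() - deadline < 0);
        return last; // timed out while still RUNNING, SUBMITTED, or NOT_FOUND
    }
}
```

In the test, the supplier would wrap the requestStatus.process(cloudClient).getRequestStatus() call from line 71 (that call may throw checked exceptions, so it would need a try/catch or a custom functional interface), and the final assertion could become something like assertEquals("last status seen", COMPLETED, last). A timeout that ends in RUNNING then reads differently in the log from one that ends in NOT_FOUND.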