Hoss Man created SOLR-10234:
-------------------------------

             Summary: "Too many open files" in distrib tests due to fixed HandleLimitFS (regardless of num nodes in test)
                 Key: SOLR-10234
                 URL: https://issues.apache.org/jira/browse/SOLR-10234
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
            Reporter: Hoss Man

I just got a failure from BasicDistributedZkTest on master (acb185b2dc7522e6a4fa55d54e82910736668f8d) that caught my attention -- the reported failure was "Remote error message: Exception writing document id 57 to the index; possible analysis error.", but digging into the logs the root cause was "Too many open files" coming from the mock {{HandleLimitFS}} class we have...

{noformat}
   [junit4]   2> 495598 ERROR (qtp155652658-4405) [    ] o.a.s.h.RequestHandlerBase java.nio.file.FileSystemException: /home/jenkins/lucene-solr/solr/build/solr-core/test/J1/temp/solr.cloud.BasicDistributedZkTest_8D04773C07230D3B-001/index-NIOFSDirectory-002/_o_Memory_0.mdvm: Too many open files
   [junit4]   2> 	at org.apache.lucene.mockfile.HandleLimitFS.onOpen(HandleLimitFS.java:48)
   [junit4]   2> 	at org.apache.lucene.mockfile.HandleTrackingFS.callOpenHook(HandleTrackingFS.java:81)
   [junit4]   2> 	at org.apache.lucene.mockfile.HandleTrackingFS.newOutputStream(HandleTrackingFS.java:160)
   [junit4]   2> 	at java.base/java.nio.file.Files.newOutputStream(Files.java:218)
   [junit4]   2> 	at org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:413)
   [junit4]   2> 	at org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:409)
   [junit4]   2> 	at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:253)
   [junit4]   2> 	at org.apache.lucene.store.MockDirectoryWrapper.createOutput(MockDirectoryWrapper.java:665)
...
   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=BasicDistributedZkTest -Dtests.method=test -Dtests.seed=8D04773C07230D3B -Dtests.slow=true -Dtests.locale=en-ER -Dtests.timezone=Europe/Volgograd -Dtests.asserts=true -Dtests.file.encoding=UTF-8
   [junit4] ERROR    259s J1 | BasicDistributedZkTest.test <<<
{noformat}

...what concerns me in particular about this is that it's coming from a distributed test, involving multiple "nodes" (all using the same randomized similarity) writing to the same "file://" filesystem in the same JVM -- but {{TestRuleTemporaryFilesCleanup}} seems to be initializing the filesystem with a fixed {{MAX_OPEN_FILES = 2048}}.

So perhaps all (distributed/cloud) Solr tests should use {{SuppressFileSystems}} to ensure we don't get false failures like this?

Or perhaps we should enhance the way we use {{HandleLimitFS}} in our test scaffolding so that we can give each Solr node its own mock filesystem (with its own MAX_OPEN_FILES limit)?
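
To illustrate the concern, here is a minimal, self-contained sketch of why a single fixed handle limit shared by every node in one JVM can trip even when no individual node is leaking. This is a toy model only: {{HandleBudget}}, {{refusedOpens}}, the node count (8) and the per-node handle count (300) are hypothetical names and numbers made up for the example, and none of this is how {{HandleLimitFS}} or {{TestRuleTemporaryFilesCleanup}} is actually implemented. Only the 2048 limit comes from the report above.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Toy model of a handle-limited filesystem; NOT Lucene's HandleLimitFS. */
public class HandleBudgetSketch {

    /** Counts open handles and refuses any open that would exceed a fixed limit. */
    static final class HandleBudget {
        private final int limit;
        private final AtomicInteger open = new AtomicInteger();

        HandleBudget(int limit) {
            this.limit = limit;
        }

        /** Returns true if the handle was granted, false if the limit was already reached. */
        boolean tryOpen() {
            if (open.incrementAndGet() > limit) {
                open.decrementAndGet(); // roll back the failed attempt
                return false;
            }
            return true;
        }
    }

    /**
     * Simulates `nodes` nodes that each hold `handlesPerNode` files open at once
     * and returns how many open attempts were refused.
     *
     * @param perNodeBudget if true, every node gets its own budget of `limit`
     *                      handles; if false, all nodes share one budget.
     */
    public static int refusedOpens(int nodes, int handlesPerNode, int limit, boolean perNodeBudget) {
        HandleBudget shared = new HandleBudget(limit);
        int refused = 0;
        for (int n = 0; n < nodes; n++) {
            HandleBudget budget = perNodeBudget ? new HandleBudget(limit) : shared;
            for (int h = 0; h < handlesPerNode; h++) {
                if (!budget.tryOpen()) {
                    refused++;
                }
            }
        }
        return refused;
    }

    public static void main(String[] args) {
        int limit = 2048;         // MAX_OPEN_FILES value quoted in the report
        int nodes = 8;            // hypothetical number of nodes in one test JVM
        int handlesPerNode = 300; // hypothetical steady-state open files per node

        System.out.println("shared budget, refused opens:   "
            + refusedOpens(nodes, handlesPerNode, limit, false));
        System.out.println("per-node budget, refused opens: "
            + refusedOpens(nodes, handlesPerNode, limit, true));
    }
}
```

With these made-up numbers the nodes together want 2400 handles, so the shared budget refuses some opens while the per-node budget refuses none, even though each node on its own stays far below 2048. Whether real distributed tests actually behave like this depends on how many files each node keeps open, which this sketch does not measure.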