Hoss Man created SOLR-10234:
-------------------------------

             Summary: "Too many open files" in distrib tests due to fixed 
HandleLimitFS (regardless of num nodes in test)
                 Key: SOLR-10234
                 URL: https://issues.apache.org/jira/browse/SOLR-10234
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
            Reporter: Hoss Man


I just got a failure from BasicDistributedZkTest on master 
(acb185b2dc7522e6a4fa55d54e82910736668f8d) that caught my attention -- the 
reported failure was "Remote error message: Exception writing document id 57 to 
the index; possible analysis error.", but digging into the logs, the root cause 
was "Too many open files" coming from the mock {{HandleLimitFS}} class we have...

{noformat}

   [junit4]   2> 495598 ERROR (qtp155652658-4405) [    ] o.a.s.h.RequestHandlerBase java.nio.file.FileSystemException: /home/jenkins/lucene-solr/solr/build/solr-core/test/J1/temp/solr.cloud.BasicDistributedZkTest_8D04773C07230D3B-001/index-NIOFSDirectory-002/_o_Memory_0.mdvm: Too many open files
   [junit4]   2>        at org.apache.lucene.mockfile.HandleLimitFS.onOpen(HandleLimitFS.java:48)
   [junit4]   2>        at org.apache.lucene.mockfile.HandleTrackingFS.callOpenHook(HandleTrackingFS.java:81)
   [junit4]   2>        at org.apache.lucene.mockfile.HandleTrackingFS.newOutputStream(HandleTrackingFS.java:160)
   [junit4]   2>        at java.base/java.nio.file.Files.newOutputStream(Files.java:218)
   [junit4]   2>        at org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:413)
   [junit4]   2>        at org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:409)
   [junit4]   2>        at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:253)
   [junit4]   2>        at org.apache.lucene.store.MockDirectoryWrapper.createOutput(MockDirectoryWrapper.java:665)
...
   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=BasicDistributedZkTest -Dtests.method=test -Dtests.seed=8D04773C07230D3B -Dtests.slow=true -Dtests.locale=en-ER -Dtests.timezone=Europe/Volgograd -Dtests.asserts=true -Dtests.file.encoding=UTF-8
   [junit4] ERROR    259s J1 | BasicDistributedZkTest.test <<<
{noformat}

...what concerns me in particular about this is that it's coming from a 
distributed test, involving multiple "nodes" (all using the same 
randomized similarity) writing to the same "file://" filesystem in the same 
JVM -- but {{TestRuleTemporaryFilesCleanup}} seems to be initializing the 
filesystem with a fixed {{MAX_OPEN_FILES = 2048}}, regardless of how many 
nodes the test starts.

So perhaps all (distributed/cloud) Solr tests should use 
{{SuppressFileSystems}} to ensure we don't get false failures like this?

Or perhaps we should enhance the way we use {{HandleLimitFS}} in our test 
scaffolding so that we can give each Solr node its own mock filesystem (with 
its own MAX_OPEN_FILES limit)?
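
To make the concern concrete, here is a minimal sketch in plain Java (it does not use the Lucene test framework) contrasting one shared handle budget with one budget per node. {{HandleBudget}}, {{HandleBudgetDemo}}, the node count of 5, and the 500 open handles per node are all hypothetical and chosen only for illustration; only the 2048 cap comes from the observation above.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a handle-limiting filesystem: counts opens against a fixed cap. */
final class HandleBudget {
    private final int maxOpen;
    private final AtomicInteger open = new AtomicInteger();

    HandleBudget(int maxOpen) {
        this.maxOpen = maxOpen;
    }

    /** Returns true if a handle was granted, false if the cap was already reached. */
    boolean tryOpen() {
        while (true) {
            int current = open.get();
            if (current >= maxOpen) {
                return false; // the real mock would throw "Too many open files" here
            }
            if (open.compareAndSet(current, current + 1)) {
                return true;
            }
        }
    }

    int openCount() {
        return open.get();
    }
}

final class HandleBudgetDemo {
    /**
     * Each simulated node tries to hold handlesPerNode handles open at once.
     * Nodes are mapped onto the given budgets round-robin, so a single-element
     * array models one shared filesystem. Returns the number of refused opens.
     */
    static int simulate(HandleBudget[] budgets, int numNodes, int handlesPerNode) {
        int refused = 0;
        for (int node = 0; node < numNodes; node++) {
            HandleBudget budget = budgets[node % budgets.length];
            for (int i = 0; i < handlesPerNode; i++) {
                if (!budget.tryOpen()) {
                    refused++;
                }
            }
        }
        return refused;
    }

    public static void main(String[] args) {
        int limit = 2048;      // cap from the report
        int numNodes = 5;      // assumption, for illustration only
        int perNode = 500;     // assumption, for illustration only

        HandleBudget[] shared = { new HandleBudget(limit) };
        HandleBudget[] perNodeBudgets = new HandleBudget[numNodes];
        for (int i = 0; i < numNodes; i++) {
            perNodeBudgets[i] = new HandleBudget(limit);
        }

        System.out.println("shared budget, refused opens:   "
            + simulate(shared, numNodes, perNode));
        System.out.println("per-node budget, refused opens: "
            + simulate(perNodeBudgets, numNodes, perNode));
    }
}
```

With these made-up numbers the shared budget refuses 452 of the 2500 opens, while per-node budgets refuse none, even though no single node comes close to 2048 handles.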



