[ https://issues.apache.org/jira/browse/SOLR-7191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358685#comment-14358685 ]
Shawn Heisey commented on SOLR-7191:
------------------------------------

[~dk]: The first thing I thought when I saw that you were trying 10K cores was that you would run out of threads unless you change the servlet container config. There is another limit looming after that: the number of processes that you can create. Unix systems traditionally used a 16-bit identifier for process IDs, and on Linux the upper limit (kernel.pid_max, which covers all OS-related processes as well) typically defaults to 32768, although it can be raised. On Linux (and likely other Unix/Unix-like systems), each thread takes up an ID from the same space as PIDs, although threads are not visible to programs like "top" or "ps" without specific options. I have no idea what the situation is on Windows.

On your patch: The first patch section removes a null check. This is never a good idea, because the fact that a null check exists tends to mean that the object has the potential to be null, and presumably the first result of the ternary operator will fail (NullPointerException) somehow if the checked object actually is null.

On the last patch section: Imposing a limit in the code without giving the user the option of configuring that limit will eventually cause problems for somebody. Also, someone who is really familiar with how the ZkContainer code works will need to let us know whether reducing the number of threads might have unintended consequences.

On LotsOfCores: SolrCloud brings a lot of complications to the situation, and when Erick did his work on that, he told all of us that trying to use transient cores in conjunction with SolrCloud would likely not work correctly. I think that the goal is to eventually make the two features coexist, but a lot of thought and work needs to happen first.

General observation: A patch like this is not likely to be backported to the 4.10 branch. That branch is in maintenance mode, so only trivial fixes or patches for major bugs will be committed, and new releases from the maintenance mode branch are not common.
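To illustrate the two patch comments above (keep the null check, and make any limit configurable), here is a minimal Java sketch. It is not taken from the patch or from ZkContainer: the class name, the property name "sketch.coreLoadThreads", and the default of 8 are all hypothetical and exist only for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch; none of these names come from Solr or the SOLR-7191 patch.
class ThreadLimitSketch {
    // Assumed property name and default, chosen only for this example.
    static final String PROP = "sketch.coreLoadThreads";
    static final int DEFAULT_THREADS = 8;

    // Resolve a thread limit from an optional user setting.
    static int resolveThreads(String configured, int fallback) {
        // The null check stays: without it, trim() would throw a
        // NullPointerException whenever the property is not set.
        String value = (configured == null) ? "" : configured.trim();
        if (value.isEmpty()) {
            return fallback;
        }
        try {
            int n = Integer.parseInt(value);
            // Reject zero and negative values rather than building a broken pool.
            return n > 0 ? n : fallback;
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        // The user can override the limit with -Dsketch.coreLoadThreads=N.
        int threads = resolveThreads(System.getProperty(PROP), DEFAULT_THREADS);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        System.out.println("pool size: " + threads);
        pool.shutdown();
    }
}
```

The point of the sketch is only that the limit has a default but can be overridden without a code change, and that an unset value is handled explicitly instead of being dereferenced.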
> Improve stability and startup performance of SolrCloud with thousands of collections
> ------------------------------------------------------------------------------------
>
>                 Key: SOLR-7191
>                 URL: https://issues.apache.org/jira/browse/SOLR-7191
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>   Affects Versions: 5.0
>            Reporter: Shawn Heisey
>              Labels: performance, scalability
>         Attachments: SOLR-7191.patch, lots-of-zkstatereader-updates-branch_5x.log
>
>
> A user on the mailing list with thousands of collections (5000 on 4.10.3,
> 4000 on 5.0) is having severe problems with getting Solr to restart.
> I tried as hard as I could to duplicate the user setup, but I ran into many
> problems myself even before I was able to get 4000 collections created on a
> 5.0 example cloud setup. Restarting Solr takes a very long time, and it is
> not very stable once it's up and running.
> This kind of setup is very much pushing the envelope on SolrCloud performance
> and scalability. It doesn't help that I'm running both Solr nodes on one
> machine (I started with 'bin/solr -e cloud') and that ZK is embedded.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)