[ 
https://issues.apache.org/jira/browse/SOLR-7191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358685#comment-14358685
 ] 

Shawn Heisey commented on SOLR-7191:
------------------------------------

[~dk]:

The first thing I thought when I saw that you were trying 10K cores was that 
you would run out of threads unless you change the servlet container config.  
There is another limit looming after that ... the number of processes that you 
can create.  On Linux, process IDs come from a finite pool capped by the 
kernel.pid_max setting, which defaults to 32768 and can be raised (up to about 
4 million on 64-bit kernels), so there is a hard upper limit on processes, 
including all OS-related processes.  On Linux (and likely other Unix/Unix-like 
systems), every thread takes up an ID from that same pool, although threads are 
not visible to programs like "top" or "ps" without specific options.  I have no 
idea what the situation is on Windows.
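
To see what the ceilings actually are on a given Linux machine, the kernel 
exposes them under /proc/sys/kernel.  The following is only a rough sketch for 
checking them from Java; the class and method names are made up for 
illustration, and on a non-Linux system the files will simply be missing.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.OptionalLong;

// Hypothetical helper, not part of Solr: reads kernel-wide ID/thread limits.
final class KernelLimits {

    // Parses the single integer that files like /proc/sys/kernel/pid_max contain.
    static long parseLimit(String contents) {
        return Long.parseLong(contents.trim());
    }

    // Returns the limit stored in the file, or empty if it cannot be read
    // or parsed (for example, when not running on Linux).
    static OptionalLong readLimit(Path file) {
        try {
            byte[] raw = Files.readAllBytes(file);
            return OptionalLong.of(parseLimit(new String(raw, StandardCharsets.US_ASCII)));
        } catch (IOException | NumberFormatException e) {
            return OptionalLong.empty();
        }
    }

    public static void main(String[] args) {
        // Highest PID/TID the kernel will hand out.
        System.out.println("pid_max     = " + readLimit(Paths.get("/proc/sys/kernel/pid_max")));
        // System-wide cap on the number of threads.
        System.out.println("threads-max = " + readLimit(Paths.get("/proc/sys/kernel/threads-max")));
    }
}
```

Per-user limits (what "ulimit -u" reports) can be lower than either of these, 
so they are worth checking as well.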

On your patch:

The first patch section removes a null check.  This is never a good idea, 
because the fact that a null check exists tends to mean that the object 
identifier has the potential to be null, and presumably the first branch of the 
ternary operator will fail (NullPointerException) somehow if the checked object 
actually is null.
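
To illustrate the concern in general terms (this is not the code from the 
patch; the class and method names below are invented for the example), compare 
a guarded expression with the same expression after the guard is removed:

```java
// Hypothetical example, unrelated to the actual Solr classes in the patch.
final class NullCheckDemo {

    // With the null check: a null argument falls through to the fallback value.
    static String describeWithCheck(Object core) {
        return (core != null) ? core.toString() : "unknown";
    }

    // With the null check removed: a null argument now throws
    // NullPointerException when toString() is invoked on it.
    static String describeWithoutCheck(Object core) {
        return core.toString();
    }

    public static void main(String[] args) {
        System.out.println(describeWithCheck(null));
        try {
            describeWithoutCheck(null);
        } catch (NullPointerException e) {
            System.out.println("NullPointerException without the check");
        }
    }
}
```

Unless it can be shown that the object is never null at that point, removing 
the guard trades a handled case for a runtime exception.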

On the last patch section: Imposing a limit in the code without giving the user 
the option of configuring that limit will eventually cause problems for 
somebody.  Also, someone who is really familiar with how the ZkContainer code 
works will need to let us know if reducing the number of threads might have 
unintended consequences.
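
One common way to avoid a hard-coded limit is to read it from a system 
property and fall back to a default.  The sketch below only shows the pattern; 
the property name and the default value are hypothetical and do not correspond 
to anything in Solr or in the patch.

```java
// Hypothetical sketch of a user-configurable thread limit.
final class ThreadLimitConfig {

    // Assumed property name, invented for this example.
    static final String PROPERTY = "example.zk.maxThreads";

    // Assumed default, invented for this example.
    static final int DEFAULT_MAX_THREADS = 64;

    // Returns the configured limit, or the default when the property is
    // unset, not a number, or less than 1.
    static int maxThreads() {
        Integer configured = Integer.getInteger(PROPERTY);
        if (configured == null || configured < 1) {
            return DEFAULT_MAX_THREADS;
        }
        return configured;
    }

    public static void main(String[] args) {
        System.out.println("max threads = " + maxThreads());
    }
}
```

A user who hits the limit could then start the JVM with something like 
-Dexample.zk.maxThreads=256 instead of needing a patched build.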

On LotsOfCores: SolrCloud brings a lot of complications to the situation, and 
when Erick did his work on that, he told all of us that trying to use transient 
cores in conjunction with SolrCloud would likely not work correctly.  I think 
that the goal is to eventually make the two features coexist, but a lot of 
thought and work needs to happen.

General observation:  A patch like this is not likely to be backported to the 
4.10 branch.  That branch is in maintenance mode, so only trivial fixes or 
patches for major bugs will be committed, and new releases from the maintenance 
mode branch are not common.


> Improve stability and startup performance of SolrCloud with thousands of 
> collections
> ------------------------------------------------------------------------------------
>
>                 Key: SOLR-7191
>                 URL: https://issues.apache.org/jira/browse/SOLR-7191
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>    Affects Versions: 5.0
>            Reporter: Shawn Heisey
>              Labels: performance, scalability
>         Attachments: SOLR-7191.patch, 
> lots-of-zkstatereader-updates-branch_5x.log
>
>
> A user on the mailing list with thousands of collections (5000 on 4.10.3, 
> 4000 on 5.0) is having severe problems with getting Solr to restart.
> I tried as hard as I could to duplicate the user setup, but I ran into many 
> problems myself even before I was able to get 4000 collections created on a 
> 5.0 example cloud setup.  Restarting Solr takes a very long time, and it is 
> not very stable once it's up and running.
> This kind of setup is very much pushing the envelope on SolrCloud performance 
> and scalability.  It doesn't help that I'm running both Solr nodes on one 
> machine (I started with 'bin/solr -e cloud') and that ZK is embedded.


