[ https://issues.apache.org/jira/browse/SOLR-7191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358685#comment-14358685 ]
Shawn Heisey commented on SOLR-7191:
------------------------------------
[~dk]:
The first thing I thought when I saw that you were trying 10K cores was that
you would run out of threads unless you change the servlet container config.
There is another limit looming after that: the number of processes that you
can create. Linux caps process IDs at kernel.pid_max, which defaults to 32768
and can be raised (to roughly 4 million on 64-bit kernels), so the upper limit
on processes (including all OS-related processes) is whatever that setting
allows. On Linux (and likely other Unix/Unix-like systems), threads take up a
PID, although they are not visible to programs like "top" or "ps" without
specific options. I have no idea what the situation is on Windows.
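A rough way to see how close a JVM is to these limits is sketched below. This
is only an illustration: the class name ThreadHeadroom is made up, and the
/proc/sys/kernel files exist only on Linux, so the helper returns -1 when a
file cannot be read.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ThreadHeadroom {
    /** Reads one numeric value from a file such as /proc/sys/kernel/pid_max; -1 if unavailable. */
    static long readLimit(String file) {
        Path p = Paths.get(file);
        try {
            if (!Files.isReadable(p)) {
                return -1;
            }
            return Long.parseLong(new String(Files.readAllBytes(p)).trim());
        } catch (IOException | NumberFormatException e) {
            return -1;
        }
    }

    /** Number of live threads (daemon and non-daemon) in this JVM. */
    static int liveThreads() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }

    public static void main(String[] args) {
        System.out.println("live JVM threads:   " + liveThreads());
        System.out.println("kernel.pid_max:     " + readLimit("/proc/sys/kernel/pid_max"));
        System.out.println("kernel.threads-max: " + readLimit("/proc/sys/kernel/threads-max"));
    }
}
```

Comparing the live thread count against the two kernel values gives an idea of
how much headroom is left before thread creation starts failing.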
On your patch:
The first patch section removes a null check. This is never a good idea,
because the existence of a null check tends to mean that the object can be
null, and presumably the first branch of the ternary operator will fail
(NullPointerException) if the checked object actually is null.
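The concern can be illustrated with a small example. This is not the code from
the patch; the class and method names are hypothetical and only show what
happens when the guard in front of a ternary expression is dropped.

```java
public class NullCheckDemo {
    /** With the guard: a null name yields a fallback instead of an exception. */
    static String guarded(String name) {
        return name != null ? name.trim() : "unknown";
    }

    /** Without the guard: a null name throws NullPointerException. */
    static String unguarded(String name) {
        return name.trim();
    }

    public static void main(String[] args) {
        System.out.println(guarded(null));
        try {
            unguarded(null);
        } catch (NullPointerException e) {
            System.out.println("NullPointerException once the check is removed");
        }
    }
}
```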
On the last patch section: Imposing a limit in the code without giving the user
the option of configuring that limit will eventually cause problems for
somebody. Also, someone who is really familiar with how the ZkContainer code
works will need to let us know if reducing the number of threads might have
unintended consequences.
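One way to keep such a limit configurable is sketched below. The property name
example.coreLoadThreads and the default of 8 are assumptions for illustration
only; they are not existing Solr settings and not what the patch does.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ConfigurableLimit {
    // Hypothetical system property name, not an existing Solr option.
    static final String PROP = "example.coreLoadThreads";
    // Illustrative default used when the property is missing or invalid.
    static final int DEFAULT_THREADS = 8;

    /** Returns the configured thread limit, falling back to the default. */
    static int resolveThreads() {
        // Integer.getInteger returns the default if the property is unset or unparsable.
        int n = Integer.getInteger(PROP, DEFAULT_THREADS);
        return n > 0 ? n : DEFAULT_THREADS;
    }

    public static void main(String[] args) {
        int threads = resolveThreads();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        System.out.println("using " + threads + " threads");
        pool.shutdown();
    }
}
```

Running with -Dexample.coreLoadThreads=32 would raise the limit without a code
change, while users who set nothing get the default.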
On LotsOfCores: SolrCloud brings a lot of complications to the situation, and
when Erick did his work on that, he told all of us that trying to use transient
cores in conjunction with SolrCloud would likely not work correctly. I think
that the goal is to eventually make the two features coexist, but a lot of
thought and work needs to happen.
General observation: A patch like this is not likely to be backported to the
4.10 branch. That branch is in maintenance mode, so only trivial fixes or
patches for major bugs will be committed, and new releases from the maintenance
mode branch are not common.
> Improve stability and startup performance of SolrCloud with thousands of
> collections
> ------------------------------------------------------------------------------------
>
> Key: SOLR-7191
> URL: https://issues.apache.org/jira/browse/SOLR-7191
> Project: Solr
> Issue Type: Bug
> Components: SolrCloud
> Affects Versions: 5.0
> Reporter: Shawn Heisey
> Labels: performance, scalability
> Attachments: SOLR-7191.patch,
> lots-of-zkstatereader-updates-branch_5x.log
>
>
> A user on the mailing list with thousands of collections (5000 on 4.10.3,
> 4000 on 5.0) is having severe problems with getting Solr to restart.
> I tried as hard as I could to duplicate the user setup, but I ran into many
> problems myself even before I was able to get 4000 collections created on a
> 5.0 example cloud setup. Restarting Solr takes a very long time, and it is
> not very stable once it's up and running.
> This kind of setup is very much pushing the envelope on SolrCloud performance
> and scalability. It doesn't help that I'm running both Solr nodes on one
> machine (I started with 'bin/solr -e cloud') and that ZK is embedded.