[ https://issues.apache.org/jira/browse/SOLR-10265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16194319#comment-16194319 ]
Cao Manh Dat commented on SOLR-10265: ------------------------------------- Maybe the problem here is Overseer is processing all messages in a single thread ( with a lot of IO blocking for every time we peek, poll messages and write new clusterstate )? So it will be wasted in case of using a powerful machine just for Overseer. The idea here is each collection has its own {{states.json}}, so messages for different collections can be processed and updated in parallel. It can be tricky to implement but If we want a cluster with 400k cores, we can not just use a single thread to process 1m6 messages. > Overseer can become the bottleneck in very large clusters > --------------------------------------------------------- > > Key: SOLR-10265 > URL: https://issues.apache.org/jira/browse/SOLR-10265 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Reporter: Varun Thacker > > Let's say we have a large cluster. Some numbers: > - To ingest the data at the volume we want to I need roughly a 600 shard > collection. > - Index into the collection for 1 hour and then create a new collection > - For a 30 days retention window with these numbers we would end up wth > ~400k cores in the cluster > - Just a rolling restart of this cluster can take hours because the overseer > queue gets backed up. If a few nodes looses connectivity to ZooKeeper then > also we can end up with lots of messages in the Overseer queue > With some tests here are the two high level problems we have identified: > 1> How fast can the overseer process operations: > The rate at which the overseer processes events is too slow at this scale. > I ran {{OverseerTest#testPerformance}} which creates 10 collections ( 1 shard > 1 replica ) and generates 20k state change events. The test took 119 seconds > to run on my machine which means ~170 events a second. Let's say a server can > process 5x of my machine so 1k events a second. > Total events generated by a 400k replica cluster = 400k * 4 ( state changes > till replica become active ) = 1.6M / 1k events a second will be 1600 minutes. > Second observation was that the rate at which the overseer can process events > slows down when the number of items in the queue gets larger > I ran the same {{OverseerTest#testPerformance}} but changed the number of > events generated to 2000 instead. The test took only 5 seconds to run. So it > was a lot faster than the test run which generated 20k events > 2> State changes overwhelming ZK: > For every state change Solr is writing out a big state.json to zookeeper. > This can lead to the zookeeper transaction logs going out of control even > with auto purging etc set . > I haven't debugged why the transaction logs ran into terabytes without taking > into snapshots but this was my assumption based on the other problems we > observed -- This message was sent by Atlassian JIRA (v6.4.14#64029) --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org