Hitting Solr throttling for ingestion
Hi Experts,

We are using Solr 8.4 (non-cloud). When ingesting data with multiple processes into one core on a Solr node, we hit what looks like throttling: the maximum ingestion rate achieved is about 47K docs per second with 17 posting processes. Each doc is about 250 bytes, CPU utilization is only 20%, and I/O is about 6%. When we add more posting processes, posting starts failing.

With Solr 6.6 this issue does not happen: adding posting processes pushes CPU/IO utilization close to 100% before posting starts failing.

Below are some relevant configuration values specified in solrconfig.xml:

16 1024 10 10 4096 0.1 4096 ${solr.lock.type:native} true ${solr.ulog.dir:} ${solr.ulog.numVersionBuckets:65536} ${solr.autoCommit.maxTime:12} false ${solr.autoSoftCommit.maxTime:5000}

It seems maxIndexingThreads is no longer supported in Solr 8? Any ideas on how to get past this throttling?

Thanks.
Shushuai
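
For scale, the figures quoted in the post can be turned into a raw payload rate. This is only a back-of-the-envelope sketch: it assumes the 250 bytes is the document payload alone and ignores HTTP, request framing, and index overhead. The function names are made up for illustration.

```python
# Figures quoted in the post; everything else here is illustrative.
DOCS_PER_SECOND = 47_000
BYTES_PER_DOC = 250
POSTING_PROCESSES = 17


def payload_rate_mb_per_s(docs_per_second: int, bytes_per_doc: int) -> float:
    """Raw document payload per second in megabytes (1 MB = 10**6 bytes)."""
    return docs_per_second * bytes_per_doc / 1_000_000


def docs_per_process(docs_per_second: int, processes: int) -> float:
    """Average ingestion rate contributed by each posting process."""
    return docs_per_second / processes


if __name__ == "__main__":
    rate = payload_rate_mb_per_s(DOCS_PER_SECOND, BYTES_PER_DOC)
    per_proc = docs_per_process(DOCS_PER_SECOND, POSTING_PROCESSES)
    print(f"payload: {rate:.2f} MB/s, per process: {per_proc:.0f} docs/s")
```

Under these assumptions the payload is under 12 MB/s in total, which is consistent with the low CPU and I/O utilization reported.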
Use stream result like a query (alternative to innerJoin)
Hi all,

I'm looking for a way to query two collections and find the documents that exist in both. I know this can be done with the innerJoin streaming expression, but I want to avoid it because one of the collection streams can have billions of results.

Let's say the two collections are:

deletedItems = [ { deletedItemId: 1 }, { deletedItemId: 2 } ... ]
items = [ { id: 1, name: "a" }, { id: 2, name: "b" }, { id: 3, name: "c" } ]

The "deletedItems" collection contains few documents compared to the "items" collection (1 million vs 2-3 billion). If I query both with a typical query in our system, deletedItems gives a few thousand results but items gives tens or hundreds of millions. To use innerJoin, I have to stream the whole items result to a worker node over the network.

Is there a way to avoid this, something like using the "deletedItems" result as a query against the "items" stream?

Thanks in advance for the help
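
One way to picture "using the deletedItems result as a query" is a client-side sketch: read the few thousand deletedItemId values first, then turn them into filter queries against items in batches. The sketch below only builds the query strings. It assumes that `id` in items holds the same values as `deletedItemId`, and it uses Solr's terms query parser syntax (`{!terms f=...}`). The helper names and the batch size are hypothetical, and nothing here is a measured or recommended configuration.

```python
from typing import Iterable, Iterator, List


def batched(ids: Iterable[int], size: int) -> Iterator[List[int]]:
    """Yield consecutive lists of at most `size` ids."""
    batch: List[int] = []
    for item in ids:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def terms_filter(field: str, ids: List[int]) -> str:
    """Build a terms-query-parser filter matching any of the given ids."""
    return "{!terms f=%s}%s" % (field, ",".join(str(i) for i in ids))


if __name__ == "__main__":
    # Stand-in for the ids returned by the deletedItems query.
    deleted_ids = [1, 2, 3, 4, 5]
    # Batch size of 2 is for demonstration only.
    for chunk in batched(deleted_ids, 2):
        # Each string could be sent as an fq parameter against "items".
        print(terms_filter("id", chunk))
```

With this shape only the small side (deletedItems) crosses the network in full; whether it beats innerJoin for a given data set would need to be tested.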