date:20201122

Hitting solr throttling for ingestion

2020-11-22 Thread shushuai zhu

Hi Experts,
We are using Solr 8.4 (non-cloud). When ingesting data with multiple processes 
into one core on a single Solr node, we are hitting some kind of throttling: the 
maximum ingestion rate we achieve is about 47K docs per second with 17 posting 
processes. Each doc is about 250 bytes; CPU utilization is only 20% and I/O 
about 6%. When we increase the number of posting processes, posts start 
failing. With Solr 6.6 this does not happen: adding posting processes raises 
CPU/IO utilization to nearly 100% before posts start failing. 
Below are some relevant settings from solrconfig.xml:
  16  
1024      10    10    
4096    0.1    4096    
${solr.lock.type:native}  
true
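  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
    <int name="numVersionBuckets">${solr.ulog.numVersionBuckets:65536}</int>
  </updateLog>

  <autoCommit>
    <maxTime>${solr.autoCommit.maxTime:12}</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>${solr.autoSoftCommit.maxTime:5000}</maxTime>
  </autoSoftCommit>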

It seems maxIndexingThreads is no longer supported in Solr 8? Any ideas on how 
to get past this throttling? Thanks. 
Shushuai
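
For scale, the figures quoted above can be turned into a raw payload rate and a per-process rate. This is only a back-of-the-envelope sketch: the function name is hypothetical, and it counts document bytes only, ignoring HTTP and serialization overhead.

```python
def ingestion_stats(docs_per_sec: int, doc_bytes: int, processes: int) -> dict:
    """Derive aggregate payload rate and per-process doc rate from the quoted figures."""
    bytes_per_sec = docs_per_sec * doc_bytes
    return {
        # Raw document payload only; request overhead is not included.
        "payload_mb_per_sec": bytes_per_sec / 1_000_000,
        "docs_per_sec_per_process": docs_per_sec / processes,
    }


# Figures from the message: 47K docs/s, 250 bytes per doc, 17 posting processes.
stats = ingestion_stats(47_000, 250, 17)
print(stats)
```

That is under 12 MB/s of document payload in total, and fewer than 3,000 docs per second per posting process.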

Use stream result like a query (alternative to innerJoin)

2020-11-22 Thread ufuk yılmaz

Hi all,

I’m looking for a way to query two collections and find the documents that exist 
in both. I know this can be done with the innerJoin streaming expression, but I 
want to avoid it, since one of the collection streams can have billions of 
results.

Let’s say the two collections are:

deletedItems = [{deletedItemId: 1}, {deletedItemId: 2}, ...]
items = [
  {
    id: 1,
    name: "a"
  },
  {
    id: 2,
    name: "b"
  },
  {
    id: 3,
    name: "c"
  }
]

“deletedItems” contains few documents compared to the “items” collection (1 
million vs. 2-3 billion). If I query both with a typical query in our system, 
deletedItems returns a few thousand results, but items returns tens or hundreds 
of millions. To use innerJoin, I have to stream the whole items result to a 
worker node over the network.

Is there a way to avoid this, something like using the “deletedItems” result as 
a query against the “items” stream?

Thanks in advance for the help
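
One client-side way to approximate "use the small result as a query" is to read the few thousand deletedItemId values first and then send them to the items collection in batches, using Solr's terms query parser ({!terms f=...}). The sketch below only builds the filter strings. The helper names and the batch size are hypothetical, and it assumes that deletedItemId values correspond to the id field of items and contain no commas. Nothing in this thread confirms it as the recommended approach.

```python
from typing import Iterable, Iterator, List


def batched(ids: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most batch_size ids."""
    batch: List[str] = []
    for doc_id in ids:
        batch.append(doc_id)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def terms_filter(field: str, ids: List[str]) -> str:
    """Build a terms-query-parser filter matching any of the given ids."""
    return "{!terms f=%s}%s" % (field, ",".join(ids))


def build_filters(deleted_ids: Iterable[str], field: str = "id",
                  batch_size: int = 1000) -> List[str]:
    """One filter string per batch; each would be sent as an fq against items."""
    return [terms_filter(field, b) for b in batched(deleted_ids, batch_size)]


# Example with ids taken from a (hypothetical) deletedItems query result.
filters = build_filters(["1", "2", "3"], batch_size=2)
print(filters)
```

Each string would be passed as an fq parameter on a query to the items collection, so only matching documents travel over the network rather than the full items result.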

