Re: slow indexing when keys are various
Hi all,

We have changed all the Solr configs and commit parameters that Shawn mentioned, but still: when inserting the same 300 documents from 20 threads we see no latency, and when inserting different 300 docs from 20 threads it is very slow, while none of CPU/RAM/disk/network show high metrics.

I am wondering if the problem might be related to the fact that when inserting a different 300 docs from each thread, the key is the only field that varies, while the other fields are identical. So maybe many identical values in the other fields for different keys cause the latency?

As for latency related to doc routing, I don't see where it could affect us. Is it ZooKeeper that might become a bottleneck?

Thanks!
Gilad

--
View this message in context: http://lucene.472066.n3.nabble.com/slow-indexing-when-keys-are-verious-tp4327681p4329451.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: slow indexing when keys are various
With high entropy we see the same latency even when working with 1 shard.

Assuming that even with 1 shard Solr is still working hard to route the documents, which component is responsible for the document routing? Is it ZooKeeper? And how would you verify that it is the bottleneck?

I can monitor ZooKeeper under high and low entropy to see whether its network stats differ.

Thanks
Re: slow indexing when keys are various
Did you check the number of documents that end up on each shard in these two scenarios?

My guess would be that, perhaps, a low-entropy key puts most of the documents into one shard, while a high-entropy key causes a lot more routing traffic, with the delay coming from the network communication and/or confirmation. Maybe even combined with the very low commit values.

I am not a SolrCloud specialist, but that's one place where I can see the entropy of the key becoming a factor.

Regards,
   Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced

On 30 March 2017 at 11:57, moscovig wrote:
> Hi
>
> Yes it is SolrCloud, we saw the same behavior with 1, 2 and 4 shards. Each
> shard has 3 replicas.
>
> Each bulk contains 300 docs. We get approximately 800 docs inserted in a
> second.
>
> ~6000 docs are being sent in an iteration by all loading threads.
> We have 20 threads, each sending bulks of 300 docs.
>
> The loaders are waiting for the response,
> which gets back after ~10 seconds for a loader.
>
> Thanks!
Re: slow indexing when keys are various
Hi,

Yes, it is SolrCloud; we saw the same behavior with 1, 2 and 4 shards. Each shard has 3 replicas.

Each bulk contains 300 docs. We get approximately 800 docs inserted per second.

~6000 docs are sent per iteration by all loading threads: we have 20 threads, each sending bulks of 300 docs.

The loaders wait for the response, which comes back after ~10 seconds for a loader.

Thanks!
Re: slow indexing when keys are various
Are you by any chance in SolrCloud? And to confirm, the total number of documents is the same within any particular time period?

Regards,
   Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced

On 30 March 2017 at 10:50, moscovig wrote:
> Thanks Shawn.
>
> We do specify
>
> 3
> 30
> false
>
> but I guess that still, the commitWithin 300 ms is a bad idea.
>
> We will definitely try playing with the configs you suggested.
>
> I still don't get the reason for a fast inserting when sending sets with
> low keys cardinality.
> But let's see what will happen after the changes.
Re: slow indexing when keys are various
Thanks Shawn.

We do specify

3
30
false

but I guess that the commitWithin of 300 ms is still a bad idea.

We will definitely try playing with the configs you suggested.

I still don't get why inserting is fast when sending sets with low key cardinality. But let's see what happens after the changes.
Re: slow indexing when keys are various
On 3/30/2017 7:36 AM, moscovig wrote:
> We are using solr 6.2.1 for server and solrj 6.2.0, with no explicit commits,
> and -
>
> 3
> 30
> for autoCommit.
>
> Each request to solr contains 300 small documents with different keys, with
> a commitWithin of 300 ms.

I think the commitWithin is likely the problem here. As long as you are indexing, it will *try* to do a commit more than three times *every second*. Chances are that each commit is going to take longer than 300ms to complete, so the actual commit rate is probably lower, but effectively this means that as long as you're indexing, Solr is *constantly* doing commits that open a new Searcher. This kind of commit causes a large amount of disk I/O and CPU activity.

You do not want to have an interval that low. I would suggest a value for commitWithin that's one or two minutes.

Your autoCommit doesn't appear to set openSearcher to false. I recommend doing that, setting its maxTime to 60 seconds, and removing maxDocs. I would also add autoSoftCommit with a three minute (180000) maxTime. It sounds like every request includes commitWithin ... the autoSoftCommit would just be there to catch anything that somehow didn't include the commitWithin. Very likely it would never be triggered as long as commitWithin is being used. You could choose to lower that time to two minutes and not use commitWithin at all.

Thanks,
Shawn
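For reference, the settings Shawn describes could look roughly like the fragment below in solrconfig.xml. This is only a sketch, not the poster's actual config: it assumes the fragment sits inside the <updateHandler> element, and 60000 and 180000 are simply the 60-second and three-minute intervals written in milliseconds, the unit maxTime uses.

```xml
<!-- Sketch only: assumed to sit inside <updateHandler> in solrconfig.xml -->
<autoCommit>
  <!-- hard commit every 60 seconds, with no maxDocs trigger -->
  <maxTime>60000</maxTime>
  <!-- flush to disk without opening a new searcher -->
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <!-- safety net for updates sent without commitWithin: three minutes -->
  <maxTime>180000</maxTime>
</autoSoftCommit>
```

On the client side, the matching change would be to raise commitWithin from 300 ms to something in the one-to-two-minute range (60000 to 120000 ms), or to drop it and rely on autoSoftCommit alone.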
slow indexing when keys are various
Hi

We are using Solr 6.2.1 for the server and SolrJ 6.2.0, with no explicit commits, and -

3
30

for autoCommit.

Each request to Solr contains 300 small documents with different keys, with a commitWithin of 300 ms. We have lots of requests coming in.

The behavior is as follows:

Fast - When all threads use the same key generator, meaning Solr gets lots of similar documents per second, we get high throughput and very high CPU.

Slow - When each thread uses different keys, at each iteration we get ~20 bulks with 300 docs each, meaning 6000 different keys. The throughput is terrible. We don't even see any unusual CPU or RAM usage.

What is the bottleneck in the slow scenario? What is the reason for it? Does Solr have some kind of cache, so that when we send lots of similar keys it immediately updates the matching doc with no further operations? Why is the fast scenario so light and fast?

Thanks!