Re: SPAM Re: SolrCloud scaling/optimization for high request rate

2018-11-12 Thread Ere Maijala

From what I've gathered, and from my own experience, docValues should
be enabled. But if you can't think of anything else, I'd try turning
them off to see if it makes any difference. As far as I can recall,
turning them off increases usage of Solr's own caches, and that
caused a noticeable slowdown for us, but your mileage may vary.


--Ere

Sofiya Strochyk wrote on 12.11.2018 at 14:23:
Thanks for the suggestion, Ere. It looks like they are actually enabled:
in the schema file the field has no docValues attribute (<field name="_id"
type="string" multiValued="false" indexed="true" required="true"
stored="true"/>), but the admin UI shows DocValues as enabled, so I guess
this is the default. Is the solution to add docValues="false" in the schema?



On 12.11.18 10:43, Ere Maijala wrote:

Sofiya,

Do you have docValues enabled for the id field? Apparently that can 
make a significant difference. I'm failing to find the relevant 
references right now, but just something worth checking out.


Regards,
Ere

Sofiya Strochyk wrote on 6.11.2018 at 16:38:

Hi Toke,

Sorry for the late reply. The query I wrote here is edited to hide
production details, but I can post additional info if this helps.


I have tested all of the suggested changes, and none of them seems to make
a noticeable difference (usually response time and other metrics 
fluctuate over time, and the changes caused by different parameters 
are smaller than the fluctuations). What this probably means is that 
the heaviest task is retrieving IDs by query and not fields by ID. 
I've also checked QTime logged for these types of operations, and it 
is much higher for "get IDs by query" than for "get fields by IDs 
list". What could be done about this?


On 05.11.18 14:43, Toke Eskildsen wrote:

So far no answer from Sofiya. That's fair enough: My suggestions might
have seemed random. Let me try to qualify them a bit.


What we have to work with is the redacted query
q===0===24=2.2=json
and an earlier mention that sorting was complex.

My suggestions were to try

1) Only request simple sorting by score

If this improves performance substantially, we could try and see if
sorting could be made more efficient: Reducing complexity, pre-
calculating numbers etc.

2) Reduce rows to 0
3) Increase rows to 100

This measures one aspect of retrieval. If there is a big performance
difference between these two, we can further probe if the problem is
the number or size of fields - perhaps there is a ton of stored text,
perhaps there is a bunch of DocValued fields?

4) Set fl=id only

This is a variant of 2+3 to do a quick check if it is the resolving of
specific field values that is the problem. If using fl=id speeds up
substantially, the next step would be to add fields gradually until
(hopefully) there is a sharp performance decrease.
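
The four experiments above can be generated mechanically from one base query. The following is a minimal sketch only: the base query, host, collection name and field names are hypothetical stand-ins for the redacted production values, and `id_field` defaults to "id" as in the suggestion (the schema discussed in this thread uses `_id`).

```python
from urllib.parse import urlencode


def probe_variants(base_params, id_field="id"):
    """Return one labelled parameter set per experiment (plus a baseline)."""
    overrides = {
        "baseline": {},
        "1_sort_by_score": {"sort": "score desc"},  # simple sorting only
        "2_rows_0": {"rows": 0},                    # no documents retrieved
        "3_rows_100": {"rows": 100},                # many documents retrieved
        "4_fl_id_only": {"fl": id_field},           # skip resolving other fields
    }
    return {label: {**base_params, **ov} for label, ov in overrides.items()}


def select_url(solr_base, collection, params):
    """Build a /select URL for the given parameters."""
    return f"{solr_base}/{collection}/select?{urlencode(params)}"


# Hypothetical stand-in for the redacted production query.
base = {
    "q": "title:example",
    "sort": "popularity desc, score desc",
    "rows": 24,
    "fl": "id,title,price",
    "wt": "json",
}

urls = {
    label: select_url("http://localhost:8983/solr", "mycollection", params)
    for label, params in probe_variants(base).items()
}

for label, url in urls.items():
    print(label, url)
```

Each URL would then be run repeatedly under the same load so the variants can be compared against the baseline.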

- Toke Eskildsen, Royal Danish Library




--
Sofiia Strochyk

s...@interlogic.com.ua
InterLogic
www.interlogic.com.ua










--
Ere Maijala
Kansalliskirjasto / The National Library of Finland

Re: SPAM Re: SolrCloud scaling/optimization for high request rate

2018-11-12 Thread Sofiya Strochyk

Thanks for the suggestion, Ere. It looks like they are actually enabled:
in the schema file the field has no docValues attribute (<field name="_id"
type="string" multiValued="false" indexed="true" required="true"
stored="true"/>), but the admin UI shows DocValues as enabled, so I guess
this is the default. Is the solution to add docValues="false" in the schema?
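
One likely explanation for the admin UI result is that a field inherits docValues from its fieldType when the field itself does not set the attribute. The sketch below illustrates that lookup order on a hypothetical schema fragment; the `fieldType` line is an assumption modelled on common default configsets, not the actual schema from this thread. It also assumes docValues is off when neither element sets it, which may not hold for every schema version.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema fragment; only the <field> line comes from the thread.
SCHEMA_SNIPPET = """
<schema name="example" version="1.6">
  <fieldType name="string" class="solr.StrField" sortMissingLast="true" docValues="true"/>
  <field name="_id" type="string" multiValued="false" indexed="true" required="true" stored="true"/>
</schema>
"""


def effective_docvalues(schema_xml, field_name):
    """Resolve docValues for a field: field attribute first, then its fieldType."""
    root = ET.fromstring(schema_xml)
    field = next(f for f in root.iter("field") if f.get("name") == field_name)
    if field.get("docValues") is not None:
        return field.get("docValues") == "true"
    field_type = next(
        t for t in root.iter("fieldType") if t.get("name") == field.get("type")
    )
    # Assumption: treat a missing attribute on both elements as "false".
    return field_type.get("docValues", "false") == "true"


print(effective_docvalues(SCHEMA_SNIPPET, "_id"))
```

Under this reading, adding docValues="false" to the field would override the inherited value. Changing docValues on an existing field generally requires a full reindex before the change is safe to rely on.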



On 12.11.18 10:43, Ere Maijala wrote:

Sofiya,

Do you have docValues enabled for the id field? Apparently that can 
make a significant difference. I'm failing to find the relevant 
references right now, but just something worth checking out.


Regards,
Ere

Sofiya Strochyk wrote on 6.11.2018 at 16:38:

Hi Toke,

Sorry for the late reply. The query I wrote here is edited to hide
production details, but I can post additional info if this helps.


I have tested all of the suggested changes, and none of them seems to make
a noticeable difference (usually response time and other metrics 
fluctuate over time, and the changes caused by different parameters 
are smaller than the fluctuations). What this probably means is that 
the heaviest task is retrieving IDs by query and not fields by ID. 
I've also checked QTime logged for these types of operations, and it 
is much higher for "get IDs by query" than for "get fields by IDs 
list". What could be done about this?
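
When changes are smaller than the normal fluctuation, it can help to compare summary statistics over many samples rather than individual response times. The sketch below is a crude heuristic, not a proper statistical test: it treats a change as real only if the median shifts by more than the baseline's own standard deviation. All sample values are made up for illustration.

```python
import statistics


def summarize(qtimes_ms):
    """Median, approximate 95th percentile and standard deviation of QTime samples."""
    ordered = sorted(qtimes_ms)
    p95_index = max(0, int(round(0.95 * len(ordered))) - 1)
    return {
        "median": statistics.median(ordered),
        "p95": ordered[p95_index],
        "stdev": statistics.stdev(ordered),
    }


def change_exceeds_noise(baseline_ms, variant_ms):
    """True when the median shift is larger than the baseline's spread."""
    base = summarize(baseline_ms)
    variant = summarize(variant_ms)
    return abs(base["median"] - variant["median"]) > base["stdev"]


# Hypothetical QTime samples in milliseconds.
baseline = [120, 135, 110, 150, 128, 140, 115, 132]
with_fl_id = [118, 130, 112, 148, 125, 138, 117, 129]

print(summarize(baseline))
print(change_exceeds_noise(baseline, with_fl_id))
```

With samples like these the variant is indistinguishable from noise, which matches the observation that field retrieval is not the dominant cost.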


On 05.11.18 14:43, Toke Eskildsen wrote:

So far no answer from Sofiya. That's fair enough: My suggestions might
have seemed random. Let me try to qualify them a bit.


What we have to work with is the redacted query
q===0===24=2.2=json
and an earlier mention that sorting was complex.

My suggestions were to try

1) Only request simple sorting by score

If this improves performance substantially, we could try and see if
sorting could be made more efficient: Reducing complexity, pre-
calculating numbers etc.

2) Reduce rows to 0
3) Increase rows to 100

This measures one aspect of retrieval. If there is a big performance
difference between these two, we can further probe if the problem is
the number or size of fields - perhaps there is a ton of stored text,
perhaps there is a bunch of DocValued fields?

4) Set fl=id only

This is a variant of 2+3 to do a quick check if it is the resolving of
specific field values that is the problem. If using fl=id speeds up
substantially, the next step would be to add fields gradually until
(hopefully) there is a sharp performance decrease.

- Toke Eskildsen, Royal Danish Library





Re: SPAM Re: SolrCloud scaling/optimization for high request rate

2018-10-29 Thread Sofiya Strochyk

Hi Walter,

Yes, after some point it gets really slow (before reaching 100% CPU
usage), so unless G1 or further tuning helps, I guess we will have to add
more replicas or shards.

On 26.10.18 20:57, Walter Underwood wrote:

The G1 collector should improve 95th percentile performance, because it limits
the length of pauses.

With the CMS/ParNew collector, I ran very large Eden spaces, 2 GB out of an 8
GB heap. Nearly all of the allocations in Solr have the lifetime of one
request, so you don’t want any of those allocations to be promoted to tenured
space. Tenured space should be mostly cache evictions and should grow slowly.

For our clusters, when we hit 70% CPU, we add more CPUs. If we drive Solr much
harder than that, it goes into congestion collapse. That is totally expected.
When you use all of a resource, things get slow. Request more than all of a
resource and things get very, very slow.
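
The shape of this slowdown can be illustrated with a textbook single-server queue (M/M/1), where mean response time is service time divided by (1 - utilization). This model is an assumption brought in for illustration, not something measured on these clusters, and the 20 ms service time is hypothetical.

```python
def mm1_response_time(service_time_ms, utilization):
    """Mean response time of an M/M/1 queue at the given utilization (0 <= u < 1)."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1); at 1 the queue never drains")
    return service_time_ms / (1.0 - utilization)


# Hypothetical 20 ms service time per request.
for u in (0.5, 0.7, 0.9, 0.95, 0.99):
    print(f"utilization {u:.2f}: {mm1_response_time(20, u):.1f} ms")
```

Even in this idealized model, latency rises steeply well before utilization reaches 100%, which is consistent with adding capacity at around 70% rather than waiting for saturation.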

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

On Oct 26, 2018, at 10:21 AM, Sofiya Strochyk wrote:

Thanks Erick,

1. We already use Solr 7.5; we upgraded some of our nodes only recently to see if
this eliminates the difference in performance (it doesn't, but I'll test and
see if the situation with replicas syncing/recovery has improved since then)
2. Yes, we only open a searcher once every 30 minutes, so it is not an NRT case. But
the only recommended combinations of replica types are all NRT, all TLOG, or
TLOG+PULL (currently we have all NRT replicas). Would you suggest we change
leaders to TLOG and slaves to PULL? And this would also eliminate the redundancy
provided by replication, because PULL replicas can't become leaders, right?
3. Yes but then it would be reflected in iowait metric, which is almost always
near zero on our servers. Is there anything else Solr could be waiting for, and
is there a way to check it? If we are going to need even more servers for
faster response and faceting then there must be a way to know which resource we
should get more of.
5. Yes, docValues are enabled for the fields we sort on (except score, which is
an internal field); _version_ is left at its default, I think (type="long"
indexed="false" stored="false", and it's also marked as having DocValues in the
admin UI)
6. QPS and response time seem to be about the same with and without indexing;
server load also looks about the same, so I assume indexing doesn't take up a
lot of resources (a little strange, but possible if it is limited by network or
some other things from point 3).
7. Will try using G1 if nothing else helps... Haven't tested it yet because it
is considered unsafe and I'd like to have all other options exhausted first.
(And even then it is probably going to be a minor improvement? How much more
efficient could it possibly be?)

On 26.10.18 19:18, Erick Erickson wrote:

Some ideas:

1> What version of Solr? Solr 7.3 completely re-wrote Leader Initiated
Recovery and 7.5 has other improvements for recovery, we're hoping
that the recovery situation is much improved.

2> In the 7x code line, there are TLOG and PULL replicas. As of 7.5,
you can set up so the queries are served by replica type, see:
https://issues.apache.org/jira/browse/SOLR-11982. This might help you
out. This moves all the indexing to the leader and reserves the rest
of the nodes for queries only, using old-style replication. I'm
assuming from your commit rate that latency between when updates
happen and the updates are searchable isn't a big concern.

3> Just because the CPU isn't at 100% doesn't mean Solr isn't running
flat out. There are I/O waits while sub-requests are serviced and the like.

4> As for how to add faceting without slowing down querying, there's
no way. Extra work is extra work. Depending on _what_ you're faceting
on, you may be able to do some tricks, but without details it's hard
to say. You need to get the query rate target first though ;)

5> OOMs: Hmm, you say you're doing complex sorts; do all fields
involved in sorts have docValues=true? They have to in order to be used
in function queries of course, but what about any fields that aren't?
What about your _version_ field?

6> bq. "...indexed 2 times/day, as fast as the SOLR allows..." One
experiment I'd run is to test your QPS rate when there was _no_
indexing going on. That would give you a hint as to whether the
TLOG/PULL configuration would be helpful. There's been talk of
separate thread pools for indexing and querying to give queries a
better shot at the CPU, but that's not in place yet.

7> G1GC may also help rather than CMS, but as you're well aware GC
tuning "is more art than science" ;).
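
For point 2, the replica-type routing from SOLR-11982 is, as far as I know, exposed through the `shards.preference` query parameter. The sketch below only builds such a request URL; the host, collection name and query are hypothetical, and the parameter value should be checked against the reference guide for the Solr version in use.

```python
from urllib.parse import urlencode


def query_url_preferring(solr_base, collection, query, replica_types):
    """Build a /select URL that asks Solr to prefer the given replica types, in order."""
    preference = ",".join(f"replica.type:{t}" for t in replica_types)
    params = {"q": query, "shards.preference": preference, "wt": "json"}
    return f"{solr_base}/{collection}/select?{urlencode(params)}"


# Hypothetical host, collection and query; prefer PULL replicas, then TLOG.
url = query_url_preferring(
    "http://localhost:8983/solr", "mycollection", "title:example", ["PULL", "TLOG"]
)
print(url)
```

This only affects which replicas serve queries; the collection would still need to be created with TLOG and PULL replicas for the preference to have any effect.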

Good luck!
Erick

On Fri, Oct 26, 2018 at 8:55 AM Sofiya Strochyk
wrote:

Hi everyone,

We have a SolrCloud setup with the following configuration:
