[jira] [Commented] (SOLR-11078) Solr query performance degradation since Solr 6.4.2

Adrien Grand (JIRA) Tue, 20 Feb 2018 02:06:25 -0800

    [ 
https://issues.apache.org/jira/browse/SOLR-11078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16369875#comment-16369875
 ]


Adrien Grand commented on SOLR-11078:
-------------------------------------

{quote}Do we know why the point fields are less performant when it comes to 
simple field:value queries?
{quote}
Yes. You are right that both the terms dictionary and the BKD tree have a 
tree-like structure. The important difference is that terms store postings 
lists on the leaves, while BKD trees store (blocks of) (docId, value) pairs. If 
you run a query on a single field, BKD trees need to sort the list of matching 
doc ids in order to return an iterator. If the query matches few documents, 
this doesn't matter, but if it matches many documents it does. Also postings 
include skip data, so if your query is intersected with another query, then 
Lucene will be able to skip sections of the postings that don't matter. BKD trees 
don't have any way to make query execution more efficient with intersections, 
even though {{IndexOrDocValuesQuery}} mitigates the issue.

To be clear, single-value queries are slower with BKD trees, but only if this 
query matches many documents. If it matches few documents, it's fast in both 
cases.
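
To make the difference concrete, here is a toy Python model. It is not Lucene code: the {{postings}} and {{bkd_leaves}} structures and all function names are made up for illustration, and a real BKD tree would only visit the leaves that can contain the value rather than scanning all of them. It only shows where the extra sort comes from and how skip data helps intersections.

```python
from bisect import bisect_left

# Terms-style index: each value maps to a postings list already sorted by doc id.
postings = {
    "red": [1, 4, 7, 9],
    "blue": [2, 3, 8],
}

# BKD-style leaves: blocks of (doc_id, value) pairs ordered by value, not by doc id.
bkd_leaves = [
    [(7, 10), (1, 10), (9, 10)],
    [(4, 10), (3, 20), (8, 20)],
]


def term_matches(term):
    """Postings come back in doc-id order with no extra work."""
    return postings.get(term, [])


def point_matches(value):
    """Matching doc ids are collected from the leaves, then sorted so they
    can be iterated in doc-id order. The sort cost grows with the hit count."""
    hits = [doc for leaf in bkd_leaves for doc, v in leaf if v == value]
    return sorted(hits)


def intersect_with_skipping(lead, other):
    """Intersect two sorted doc-id lists, jumping ahead in `other` by binary
    search. The binary search stands in for postings skip data."""
    out, pos = [], 0
    for doc in lead:
        pos = bisect_left(other, doc, pos)
        if pos == len(other):
            break
        if other[pos] == doc:
            out.append(doc)
    return out
```

With few hits the {{sorted}} call in {{point_matches}} is negligible; with many hits it is extra work that the postings path never has to do.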
{quote}Points were introduced to Lucene in the 6.x timeframe, I think it was 
relatively early.
{quote}
Points were introduced in Lucene 6.0. They were the main release highlight, 
along with the switch to BM25 by default. They made high-cardinality numeric 
fields much better: often 30% faster range queries, 70% faster indexing, 60% 
less disk usage and 80% less memory usage.
{quote}Solr has merely been reacting to realities forced on it by changes in 
Lucene.
{quote}
Solr could have stayed on legacy numerics: this is something that can be 
supported on top of Lucene, so it doesn't need to be in Lucene. But in this 
particular case, I do think that the right decision was made to deprecate trie 
fields and recommend that users switch to points instead. I hope that Solr will 
soon stop using other legacy Lucene APIs like SlowCompositeReaderWrapper. This 
might make faceting a bit slower in some cases, but it would also make Solr 
perform _much_ better in the NRT case.

In my opinion we should just update the documentation to say that id/enum 
fields that never need ranges should use StrField instead. I know it's not 
popular (see above comments), but in that case let's just fill the gap with a 
new field, something like NumericIdField, that works exactly like StrField but 
uses an encoding that preserves the numeric order.
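
As a sketch of what "an encoding that preserves the numeric order" could look like: NumericIdField does not exist, and the function names below are hypothetical. The idea, assuming signed 64-bit ids, is to shift the value into the unsigned range (equivalent to flipping the sign bit) and write it big-endian, so that comparing the encoded terms byte by byte gives the same order as comparing the numbers.

```python
SIGN_BIT = 1 << 63  # offset that maps signed 64-bit values onto 0 .. 2**64 - 1


def encode_sortable(n: int) -> bytes:
    """Encode a signed 64-bit integer as 8 bytes whose lexicographic
    (byte-wise) order matches the numeric order of the inputs."""
    if not -SIGN_BIT <= n < SIGN_BIT:
        raise ValueError("value does not fit in a signed 64-bit integer")
    return (n + SIGN_BIT).to_bytes(8, "big")


def decode_sortable(b: bytes) -> int:
    """Inverse of encode_sortable."""
    return int.from_bytes(b, "big") - SIGN_BIT


ids = [42, -7, 0, 1000, -1000, 9]
by_bytes = sorted(ids, key=encode_sortable)   # order of the encoded terms
by_text = sorted(str(i) for i in [9, 10])     # plain decimal strings for comparison
```

Here {{by_bytes}} comes out in numeric order, while plain decimal strings do not ("10" sorts before "9"), which is the gap such a field would fill.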
{quote}I do not know if the issues with Points can be fixed without reducing 
the performance of the things Points are good at.
{quote}
One easy way to do this would be to index both with points and terms. But in my 
opinion this would be wasteful. It's easier to decide, when designing the 
schema, whether a field stores quantities/measures or ids/enums.

As a final note, some of these slow queries could be mitigated with 
{{IndexOrDocValuesQuery}}. Maybe we should add support for it in Solr's point 
fields? It makes it possible to use doc values rather than points when the 
query on the BKD tree is not the one that would lead iteration.
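
The idea can be sketched with another toy Python model. Again this is not Lucene code: {{doc_values}}, {{range_via_points}}, {{filter_lead}} and the cost comparison are all assumptions made for illustration, and the rule used here is a simplification rather than Lucene's actual heuristic. Both branches return the same hits; only the amount of work differs.

```python
# Hypothetical column store: doc id -> numeric value (stands in for doc values).
doc_values = {1: 10, 2: 20, 3: 20, 4: 10, 7: 10, 8: 20, 9: 10}


def range_via_points(lo, hi):
    """Stand-in for the BKD side: produce every match, sorted by doc id."""
    return sorted(doc for doc, v in doc_values.items() if lo <= v <= hi)


def filter_lead(lead_docs, lo, hi, points_cost_estimate):
    """Return (strategy, hits) for a range clause intersected with lead_docs.

    lead_docs is the sorted doc-id list produced by the other clause, and
    points_cost_estimate is a made-up estimate of how many docs the range
    clause matches on its own.
    """
    if points_cost_estimate <= len(lead_docs):
        # The range clause is the cheaper side: build its matches and intersect.
        matches = set(range_via_points(lo, hi))
        return "points", [d for d in lead_docs if d in matches]
    # Another clause leads: verify each candidate's value directly instead of
    # collecting and sorting every match of the range clause.
    hits = [d for d in lead_docs if d in doc_values and lo <= doc_values[d] <= hi]
    return "doc_values", hits
```

When the other clause yields only a couple of candidates, checking their values one by one avoids building the full sorted match list on the points side.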

> Solr query performance degradation since Solr 6.4.2
> ---------------------------------------------------
>
>                 Key: SOLR-11078
>                 URL: https://issues.apache.org/jira/browse/SOLR-11078
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: search, Server
>    Affects Versions: 6.6, 7.1
>         Environment: * CentOS 7.3 (Linux zasolrm03 3.10.0-514.26.2.el7.x86_64 
> #1 SMP Tue Jul 4 15:04:05 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux)
> * Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)
> * 4 CPU, 10GB RAM
> Running Solr 6.6.0 with the following JVM settings:
> java -server -Xms4G -Xmx4G -XX:NewRatio=3 -XX:SurvivorRatio=4 
> -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=8 -XX:+UseConcMarkSweepGC 
> -XX:+UseParNewGC -XX:ConcGCThreads=4 -XX:ParallelGCThreads=4 
> -XX:+CMSScavengeBeforeRemark -XX:PretenureSizeThreshold=64m 
> -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=50 
> -XX:CMSMaxAbortablePrecleanTime=6000 -XX:+CMSParallelRemarkEnabled 
> -XX:+ParallelRefProcEnabled -verbose:gc -XX:+PrintHeapAtGC 
> -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps 
> -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime 
> -Xloggc:/home/prodza/solrserver/../logs/solr_gc.log -XX:+UseGCLogFileRotation 
> -XX:NumberOfGCLogFiles=9 -XX:GCLogFileSize=20M 
> -Dsolr.log.dir=/home/prodza/solrserver/../logs -Djetty.port=8983 
> -DSTOP.PORT=7983 -DSTOP.KEY=solrrocks -Duser.timezone=SAST 
> -Djetty.home=/home/prodza/solrserver/server 
> -Dsolr.solr.home=/home/prodza/solrserver/../solr 
> -Dsolr.install.dir=/home/prodza/solrserver 
> -Dlog4j.configuration=file:/home/prodza/solrserver/../config/log4j.properties 
> -Xss256k -Xss256k -Dsolr.log.muteconsole 
> -XX:OnOutOfMemoryError=/home/prodza/solrserver/bin/oom_solr.sh 8983 
> /home/prodza/solrserver/../logs -jar start.jar --module=http
>            Reporter: bidorbuy
>            Priority: Major
>         Attachments: compare-6.4.2-6.6.0.png, core-admin-tradesearch.png, 
> jvm-stats.png, schema.xml, screenshot-1.png, screenshot-2.png, 
> screenshot-3.png, solr-6-4-2-schema.xml, solr-6-4-2-solrconfig.xml, 
> solr-7-1-0-managed-schema, solr-7-1-0-solrconfig.xml, solr-71-vs-64.png, 
> solr-sample-warning-log.txt, solr.in.sh, solrconfig.xml
>
>
> We are currently running 2 separate Solr servers - refer to screenshots:
> * zasolrm02 is running on Solr 6.4.2
> * zasolrm03 is running on Solr 6.6.0
> Both servers have the same OS / JVM configuration and are using their own 
> indexes. We round-robin load-balance through our Tomcats and notice that 
> since Solr 6.4.2 performance has dropped. We have two indices per server: 
> "searchsuggestions" and "tradesearch". There is a noticeable drop in 
> performance since Solr 6.4.2.
> I am not sure if this is perhaps related to metric collation or other 
> underlying changes. I am not sure if other high transaction users have 
> noticed similar issues.
> *1) zasolrm03 (6.6.0) is almost twice as slow on the tradesearch index:*
> !compare-6.4.2-6.6.0.png!
> *2) This is also visible in the searchsuggestion index:*
> !screenshot-1.png!
> *3) The Tradesearch index shows the biggest difference:*
> !screenshot-2.png!


