[
https://issues.apache.org/jira/browse/SOLR-11078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16369875#comment-16369875
]
Adrien Grand commented on SOLR-11078:
-------------------------------------
{quote}Do we know why the point fields are less performant when it comes to
simple field:value queries?
{quote}
Yes. You are right that both the terms dictionary and the BKD tree have a
tree-like structure. The important difference is that terms store postings
lists on the leaves, while BKD trees store (blocks of) (docId, value) pairs. If
you run a query on a single field, BKD trees need to sort the list of matching
doc ids in order to return an iterator. If the query matches few documents,
this doesn't matter, but if it matches many documents it does. Also, postings
include skip data, so if your query is intersected with another query, Lucene
will be able to skip sections of the postings that don't matter. BKD trees
don't have any way to make query execution more efficient with intersections,
even though {{IndexOrDocValuesQuery}} mitigates the issue.
To be clear, single-value queries are slower with BKD trees, but only if the
query matches many documents. If it matches few documents, it's fast in both
cases.
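The difference described above can be illustrated with a toy model. This is
only a sketch, not Lucene code: the class {{PostingsVsPoints}} and its methods
are hypothetical, (docId, value) pairs are modelled as two-element int arrays,
and a binary search stands in for real skip data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model only; every name here is hypothetical and none of it is Lucene API. */
final class PostingsVsPoints {

    /**
     * BKD-style data: {docId, value} pairs ordered by value, not by doc id.
     * Matching doc ids come out in arbitrary order, so they must be sorted
     * before they can be exposed as a doc-id-ordered iterator.
     */
    static int[] matchingDocsFromPoints(int[][] pairsSortedByValue, int value) {
        List<Integer> hits = new ArrayList<>();
        for (int[] pair : pairsSortedByValue) {
            if (pair[1] == value) {
                hits.add(pair[0]);
            }
        }
        int[] docs = hits.stream().mapToInt(Integer::intValue).toArray();
        Arrays.sort(docs); // extra work that grows with the number of matches
        return docs;
    }

    /**
     * Postings-style advance over doc ids that are already sorted: returns the
     * first doc id >= target, or Integer.MAX_VALUE when the list is exhausted.
     * Binary search is a stand-in for skip data.
     */
    static int advance(int[] sortedDocIds, int target) {
        int idx = Arrays.binarySearch(sortedDocIds, target);
        if (idx < 0) {
            idx = -idx - 1; // convert "not found" result into the insertion point
        }
        return idx < sortedDocIds.length ? sortedDocIds[idx] : Integer.MAX_VALUE;
    }
}
```

With few matches the sort is negligible; with many matches it dominates, which
is the case where term queries come out ahead.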
{quote}Points were introduced to Lucene in the 6.x timeframe, I think it was
relatively early.
{quote}
Points were introduced in Lucene 6.0; they were the main release highlight,
along with the switch to BM25 by default. They made high-cardinality numeric
fields much better: often 30% faster range queries, 70% faster indexing, 60%
less disk usage and 80% less memory usage.
{quote}Solr has merely been reacting to realities forced on it by changes in
Lucene.
{quote}
Solr could have stayed on legacy numerics: they can be supported on top of
Lucene and don't need to be in Lucene itself. But in this particular case, I do
think that the right decision was made to deprecate trie fields and recommend
that users switch to points instead. I hope that Solr will
soon stop using other legacy Lucene APIs like SlowCompositeReaderWrapper. This
might make faceting a bit slower in some cases, but this would also make Solr
perform _much_ better in the NRT case.
In my opinion we should just update the documentation to say that id/enum
fields that never need ranges should use StrField instead. I know it's not
popular (see above comments), but in that case let's just fill the gap with a
new field, something like NumericIdField, that works exactly like StrField but
uses an encoding that preserves the numeric order.
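One possible order-preserving encoding is sketched below, assuming 64-bit
integer ids. A plain decimal string does not work, because "10" sorts before
"9". The class {{SortableLongEncoding}} is hypothetical and is not an existing
Solr field type; Lucene's {{NumericUtils}} has helpers in the same spirit.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper, shown only to illustrate an order-preserving term encoding. */
final class SortableLongEncoding {

    /** Encodes a long as 8 big-endian bytes with the sign bit flipped. */
    static byte[] encode(long value) {
        // Flipping the sign bit makes negative values sort before positive
        // ones when the bytes are compared as unsigned.
        return ByteBuffer.allocate(Long.BYTES).putLong(value ^ Long.MIN_VALUE).array();
    }

    /** Reverses encode(). */
    static long decode(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong() ^ Long.MIN_VALUE;
    }

    /** Unsigned byte-by-byte comparison, the order in which binary terms sort. */
    static int compareTerms(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    }
}
```

Terms encoded this way keep the cheap exact-match lookups of a string field,
while sorting the same way as the underlying numbers.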
{quote}I do not know if the issues with Points can be fixed without reducing
the performance of the things Points are good at.
{quote}
One easy way to do this would be to index both with points and terms. But in my
opinion this would be wasteful; it's easier to decide, when designing the
schema, whether a field stores quantities/measures or ids/enums.
As a final note, some of these slow queries could be mitigated with
{{IndexOrDocValuesQuery}}; maybe we should add support for it in Solr's point
fields? It allows using doc values rather than points when the query on the
BKD tree is not the one that would lead iteration.
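The selection described above can be sketched as a simple cost comparison.
This is a minimal illustration and not the Lucene implementation: the class
{{IndexOrDocValuesSketch}}, its parameters and the factor of 8 are all
assumptions made for the example.

```java
/** Hypothetical sketch of a cost-based choice between points and doc values. */
final class IndexOrDocValuesSketch {

    enum Strategy { POINTS, DOC_VALUES }

    /**
     * @param pointsCost estimated number of documents the BKD-tree query matches
     * @param leadCost   estimated number of documents the leading clause visits
     */
    static Strategy choose(long pointsCost, long leadCost) {
        // Assumption: the divisor 8 is illustrative. The idea is that when the
        // points query matches far more documents than the lead clause will
        // visit, checking each visited document against doc values is cheaper
        // than collecting and sorting every match from the BKD tree.
        long threshold = pointsCost / 8;
        return threshold <= leadCost ? Strategy.POINTS : Strategy.DOC_VALUES;
    }
}
```

For example, a range matching a million documents that is intersected with a
clause visiting a hundred documents would be verified through doc values in
this sketch, while a range matching only a few hundred documents would use the
BKD tree.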
> Solr query performance degradation since Solr 6.4.2
> ---------------------------------------------------
>
> Key: SOLR-11078
> URL: https://issues.apache.org/jira/browse/SOLR-11078
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: search, Server
> Affects Versions: 6.6, 7.1
> Environment: * CentOS 7.3 (Linux zasolrm03 3.10.0-514.26.2.el7.x86_64
> #1 SMP Tue Jul 4 15:04:05 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux)
> * Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)
> * 4 CPU, 10GB RAM
> Running Solr 6.6.0 with the following JVM settings:
> java -server -Xms4G -Xmx4G -XX:NewRatio=3 -XX:SurvivorRatio=4
> -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=8 -XX:+UseConcMarkSweepGC
> -XX:+UseParNewGC -XX:ConcGCThreads=4 -XX:ParallelGCThreads=4
> -XX:+CMSScavengeBeforeRemark -XX:PretenureSizeThreshold=64m
> -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=50
> -XX:CMSMaxAbortablePrecleanTime=6000 -XX:+CMSParallelRemarkEnabled
> -XX:+ParallelRefProcEnabled -verbose:gc -XX:+PrintHeapAtGC
> -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps
> -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime
> -Xloggc:/home/prodza/solrserver/../logs/solr_gc.log -XX:+UseGCLogFileRotation
> -XX:NumberOfGCLogFiles=9 -XX:GCLogFileSize=20M
> -Dsolr.log.dir=/home/prodza/solrserver/../logs -Djetty.port=8983
> -DSTOP.PORT=7983 -DSTOP.KEY=solrrocks -Duser.timezone=SAST
> -Djetty.home=/home/prodza/solrserver/server
> -Dsolr.solr.home=/home/prodza/solrserver/../solr
> -Dsolr.install.dir=/home/prodza/solrserver
> -Dlog4j.configuration=file:/home/prodza/solrserver/../config/log4j.properties
> -Xss256k -Xss256k -Dsolr.log.muteconsole
> -XX:OnOutOfMemoryError=/home/prodza/solrserver/bin/oom_solr.sh 8983
> /home/prodza/solrserver/../logs -jar start.jar --module=http
> Reporter: bidorbuy
> Priority: Major
> Attachments: compare-6.4.2-6.6.0.png, core-admin-tradesearch.png,
> jvm-stats.png, schema.xml, screenshot-1.png, screenshot-2.png,
> screenshot-3.png, solr-6-4-2-schema.xml, solr-6-4-2-solrconfig.xml,
> solr-7-1-0-managed-schema, solr-7-1-0-solrconfig.xml, solr-71-vs-64.png,
> solr-sample-warning-log.txt, solr.in.sh, solrconfig.xml
>
>
> We are currently running 2 separate Solr servers - refer to screenshots:
> * zasolrm02 is running on Solr 6.4.2
> * zasolrm03 is running on Solr 6.6.0
> Both servers have the same OS / JVM configuration and are using their own
> indexes. We round-robin load-balance through our Tomcats and notice that
> since Solr 6.4.2 performance has dropped. We have two indices per server:
> "searchsuggestions" and "tradesearch". There is a noticeable drop in
> performance since Solr 6.4.2.
> I am not sure if this is perhaps related to metric collation or other
> underlying changes. I am not sure if other high transaction users have
> noticed similar issues.
> *1) zasolrm03 (6.6.0) is almost twice as slow on the tradesearch index:*
> !compare-6.4.2-6.6.0.png!
> *2) This is also visible in the searchsuggestions index:*
> !screenshot-1.png!
> *3) The Tradesearch index shows the biggest difference:*
> !screenshot-2.png!