[jira] [Comment Edited] (SOLR-8096) Major faceting performance regressions
[ https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16149190#comment-16149190 ]

Yonik Seeley edited comment on SOLR-8096 at 8/31/17 4:48 PM:
-

bq. For them, enabling docValues, which is supposed to be the magic bullet for faceting performance, causes performance to get even worse.

Yep. DocValues is a better default because it uses little heap memory compared to the FieldCache. But in general, docValues can be slower than the old 4.x fieldCache, and definitely slower than UnInvertedField for multi-valued faceting.
For dense fields, the newest iterator-based docValues is also somewhat slower than the old docValues. This isn't just Solr... for example, sorting on dense docValues fields is also slower since the cut-over to iterator docValues.

Anyway, specific use-cases can pretty much always be sped up, but there's no magic bullet and we need to tackle them one at a time. For example, facet.method=uif was added to re-enable access to the UnInvertedField faceting method.

Another difference is top-level fieldCache vs per-segment. For strings, top-level is faster, but it needs to be rebuilt from scratch each time the index changes. Per-segment needs to merge string ords from different segments (which introduces some overhead and makes it slower), but only new segments need to be rebuilt when the index changes (better for NRT). But the ability to do top-level fieldCache was removed in Lucene (some people are of the opinion that no caches should be top-level), hence some use-cases will be slower.

was (Author: ysee...@gmail.com):
bq. For them, enabling docValues, which is supposed to be the magic bullet for faceting performance, causes performance to get even worse.

Yep. DocValues is a better default because it uses little heap memory compared to the FieldCache.
But in general, docValues can be slower than the old 4.x fieldCache, and definitely slower than UnInvertedField for multi-valued faceting.
For dense fields, the newest iterator-based docValues is also somewhat slower than the old docValues. This isn't just Solr... for example, sorting on dense docValues fields is also slower since the cut-over to iterator docValues.

Anyway, specific use-cases can pretty much always be sped up, but there's no magic bullet and we need to tackle them one at a time. For example, facet.method=uif was added to re-enable access to the UnInvertedField faceting method.

> Major faceting performance regressions
> --
>
> Key: SOLR-8096
> URL: https://issues.apache.org/jira/browse/SOLR-8096
> Project: Solr
> Issue Type: Bug
> Affects Versions: 5.0, 5.1, 5.2, 5.3, 6.0
> Reporter: Yonik Seeley
> Priority: Critical
> Attachments: facetcache.diff, simple_facets.diff
>
>
> Use of the highly optimized faceting that Solr had for multi-valued fields
> over relatively static indexes was removed as part of LUCENE-5666, causing
> severe performance regressions.
> Here are some quick benchmarks to gauge the damage, on a 5M document index,
> with each field having between 0 and 5 values per document. *Higher numbers
> represent worse 5x performance*.
> Solr 5.4_dev faceting time as a percent of Solr 4.10.3 faceting time
> ||...|| Percent of index being faceted
> ||num_unique_values|| 10% || 50% || 90% ||
> |10 | 351.17% | 1587.08% | 3057.28% |
> |100 | 158.10% | 203.61% | 1421.93% |
> |1000 | 143.78% | 168.01% | 1325.87% |
> |1| 137.98% | 175.31% | 1233.97% |
> |10 | 142.98% | 159.42% | 1252.45% |
> |100 | 255.15% | 165.17% | 1236.75% |
> For example, a field with 1000 unique values in the whole index, faceting
> with 5x took 143% of the 4x time, when ~10% of the docs in the index were
> faceted.
> One user who brought the performance problem to our attention:
> http://markmail.org/message/ekmqh4ocbkwxv3we
> "faceting is unusable slow since upgrade to 5.3.0" (from 4.10.3)
> The disabling of the UnInvertedField algorithm was previously discovered in
> SOLR-7190, but we didn't know just how bad the problem was at that time.
> edit: removed "secret" adverb by request

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
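
The top-level vs per-segment trade-off described in the comment above can be sketched roughly as follows. This is a toy model in Python, not Lucene or Solr code: the `SEGMENTS` layout and the functions `build_top_level`, `count_top_level` and `count_per_segment` are hypothetical names invented for illustration. It only shows why per-segment counting needs an extra merge step across segment-local ordinals, while a top-level structure counts directly but has to be rebuilt whenever any segment changes.

```python
from collections import Counter

# Hypothetical model: each segment has its own sorted term dictionary and,
# per document, a list of ordinals into that segment-local dictionary.
SEGMENTS = [
    {"terms": ["apple", "cherry"], "docs": [[0], [0, 1], [1]]},
    {"terms": ["banana", "cherry"], "docs": [[1], [0, 1]]},
]


def build_top_level(segments):
    """Build one global term dictionary; must be redone when any segment changes."""
    global_terms = sorted({t for seg in segments for t in seg["terms"]})
    global_ord = {t: i for i, t in enumerate(global_terms)}
    docs = [
        [global_ord[seg["terms"][o]] for o in doc]
        for seg in segments
        for doc in seg["docs"]
    ]
    return global_terms, docs


def count_top_level(global_terms, docs):
    """Count directly by global ordinal; no merge step at query time."""
    counts = [0] * len(global_terms)
    for doc in docs:
        for o in doc:
            counts[o] += 1
    return {t: c for t, c in zip(global_terms, counts) if c}


def count_per_segment(segments):
    """Count by segment-local ordinal, then merge by term (the extra overhead)."""
    merged = Counter()
    for seg in segments:
        local = [0] * len(seg["terms"])
        for doc in seg["docs"]:
            for o in doc:
                local[o] += 1
        for o, c in enumerate(local):
            if c:
                merged[seg["terms"][o]] += c
    return dict(merged)
```

Both paths produce the same counts; the difference is where the cost lands. The top-level path pays at index-change time (the rebuild), the per-segment path pays at query time (the merge).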
[jira] [Comment Edited] (SOLR-8096) Major faceting performance regressions
[ https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15960982#comment-15960982 ]

Michael Gibney edited comment on SOLR-8096 at 4/7/17 3:45 PM:
--

Of course I can't speak to the status of this issue from other folks' perspectives, but I did observe a couple of things that I wanted to mention in case anyone else might find them useful. Performance has actually been acceptable for me, but implementing a simple cache for facets definitely improved performance (in my deployment) for common queries (see [^facetcache.diff]).

A couple of observations:
1. Based on the fact that fields were being faceted using DocValues faceting, I assumed (incorrectly) that docValues must have been enabled. In fact, docValues are _not_ enabled by default; IndexReaders are wrapped in UninvertingReaders that (on demand) uninvert non-docValues-enabled fields in order to present a docValues-like interface for faceting.
2. DocValues cannot yet be enabled on analyzed fields, so if you require this, you'll be dealing with the UninvertingReader; you may be interested in SOLR-8362.
3. {{DocValuesFacets}} iterates over all the documents in a result set for _every_ query. Regardless of the underlying implementation, this is bound to be relatively expensive for result sets containing large numbers of documents. Furthermore, "Result sets containing large numbers of documents" constitute a fairly large proportion of common user interactions (landing page with faceting over the whole index presents users with a handful of clickable top-level filters, each of which covers a large portion of the index). Thus, faceting seems to be a good candidate for caching, regardless of the underlying implementation of the DocValues interface.

Accordingly, I've attached a stab at a patch ([^facetcache.diff]) to {{DocValuesFacets}} to support a cache intended to speed dv faceting over high-cardinality docsets. Combined with a handful of warming queries, I've seen much improved performance for common requests. In addition to the patch, you must configure your solrconfig.xml with, e.g.,
{code:xml}
{code}

I tried to make the docset cardinality threshold for caching configurable at the field level, but haven't yet figured out how to pass in the configuration (you will see my unsuccessful attempts reflected in the changes to {{SimpleFacets}} -- with the patch in its current state, if you want to adjust this parameter, it can only be done by changing the hardcoded default of 5000 (a reasonable value would probably be _much_ higher) for {{SimpleFacets.DEFAULT_PERSEG_FACET_CACHE_THRESHOLD}}).

Just to clarify, this comment is not a suggestion to skip closing this issue, and I'm sorry if it's a bit off-topic; I hope it strikes people as related enough to justify posting here.

was (Author: mgibney):
Of course I can't speak to the status of this issue from other folks' perspectives, but I did observe a couple of things that I wanted to mention in case anyone else might find them useful. Performance has actually been acceptable for me, but implementing a simple cache for facets definitely improved performance (in my deployment) for common queries (see [^facetcache.diff]).

A couple of observations:
1. Based on the fact that fields were being faceted using DocValues faceting, I assumed (incorrectly) that docValues must have been enabled. In fact, docValues are _not_ enabled by default; IndexReaders are wrapped in UninvertingReaders that (on demand) uninvert non-docValues-enabled fields in order to present a docValues-like interface for faceting.
2. DocValues cannot yet be enabled on analyzed fields, so if you require this, you'll be dealing with the UninvertingReader; you may be interested in SOLR-8362.
3. {{DocValuesFacets}} iterates over all the documents in a result set for _every_ query. Regardless of the underlying implementation, this is bound to be relatively expensive for result sets containing large numbers of documents. Furthermore, "Result sets containing large numbers of documents" constitute a fairly large proportion of common user interactions (landing page with faceting over the whole index presents users with a handful of clickable top-level filters, each of which covers a large portion of the index). Thus, faceting seems to be a good candidate for caching, regardless of the underlying implementation of the DocValues interface.

Accordingly, I've attached a stab at a patch ([^facetcache.diff]) to {{DocValuesFacets}} to support a cache intended to speed dv faceting over high-cardinality docsets. Combined with a handful of warming queries, I've seen much improved performance for common requests. In addition to the patch, you must configure your solrconfig.xml with, e.g.,
{code:xml}
{code}

I tried to make the docset cardinality threshold for caching configu
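
The caching idea in the comment above (cache facet counts only for docsets whose cardinality reaches a threshold) might look something like the following Python sketch. It is not the attached patch and does not use any Solr API: the `FacetCache` class, its cache key (field name plus a frozen set of document ids) and the hit/miss counters are all assumptions made for illustration. Only the default of 5000 comes from the comment.

```python
from collections import Counter

# Default mirrors the hardcoded threshold of 5000 mentioned in the comment.
CACHE_THRESHOLD = 5000


class FacetCache:
    """Toy cache: store facet counts only for large (expensive) docsets."""

    def __init__(self, threshold=CACHE_THRESHOLD):
        self.threshold = threshold
        self._cache = {}
        self.hits = 0
        self.misses = 0

    def facet_counts(self, field, field_values, docset):
        """field_values maps doc id -> list of values; docset is a frozenset of doc ids."""
        if len(docset) < self.threshold:
            # Small result sets are cheap to count, so bypass the cache.
            return self._count(field_values, docset)
        key = (field, docset)
        cached = self._cache.get(key)
        if cached is not None:
            self.hits += 1
            return cached
        self.misses += 1
        counts = self._count(field_values, docset)
        self._cache[key] = counts
        return counts

    @staticmethod
    def _count(field_values, docset):
        # Iterate every document in the result set, as uncached faceting must.
        counts = Counter()
        for doc in docset:
            counts.update(field_values.get(doc, ()))
        return dict(counts)
```

A real implementation would also need eviction and invalidation when the index changes, which this sketch leaves out.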
[jira] [Comment Edited] (SOLR-8096) Major faceting performance regressions
[ https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15298469#comment-15298469 ]

Alessandro Benedetti edited comment on SOLR-8096 at 5/25/16 10:45 AM:
--

Just adding some additional information, as I just ran into the issue with Solr 6.0:
Static index, around 50 * 10^6 docs, 20 fields to facet, 1 of them with high cardinality, on top of grouping.
Grouping was not affecting it at all.

All the symptoms are there: Solr 4.10.2 around 70 ms (enum) - 150 ms (fcs) and Solr 6.0 around 550 ms.
The 'fieldValueCache' seems to be unused (no inserts nor lookups) in Solr 6.0.
In Solr 4.10 the 'fieldValueCache' is in heavy use with a cumulative_hitratio of 0.96.
Switching from enum to fc to fcs to uif did not change that much.

Moving to DocValues didn't improve the situation that much (but I was on an optimized index, so I need to try the multi-segmented one according to [~mkhludnev]'s contribution in Solr 5.4.0).

Moving to field collapsing brought the query down to 110-120 ms (but this is normal, we were faceting on 260 /1 million original docs).
Adding facet.threads=NCores brought the queryTime down to 100 ms; in combination with field collapsing we reached 80-90 ms when warmed.

What is the plan for the future related to this?
Do we want to deprecate the legacy facets implementation and move everything to JSON facets (like it happened with the UIF)?
So backward compatible but a different implementation?

Cheers

was (Author: alessandro.benedetti):
Just adding some additional information, as I just ran into the issue with Solr 6.0:
Static index, around 50 * 10^6 docs, 20 fields to facet, 1 of them with high cardinality, on top of grouping.
Grouping was not affecting it at all.

All the symptoms are there: Solr 4.10.2 around 150 ms and Solr 6.0 around 550 ms.
The 'fieldValueCache' seems to be unused (no inserts nor lookups) in Solr 6.0.
In Solr 4.10 the 'fieldValueCache' is in heavy use with a cumulative_hitratio of 0.96.
Switching from enum to fc to fcs to uif did not change that much.

Moving to DocValues didn't improve the situation that much (but I was on an optimized index, so I need to try the multi-segmented one according to [~mkhludnev]'s contribution in Solr 5.4.0).

Moving to field collapsing brought the query down to 110-120 ms (but this is normal, we were faceting on 260 /1 million original docs).
Adding facet.threads=NCores brought the queryTime down to 100 ms; in combination with field collapsing we reached 80-90 ms when warmed.

What is the plan for the future related to this?
Do we want to deprecate the legacy facets implementation and move everything to JSON facets (like it happened with the UIF)?
So backward compatible but a different implementation?

Cheers

> Major faceting performance regressions
> --
>
> Key: SOLR-8096
> URL: https://issues.apache.org/jira/browse/SOLR-8096
> Project: Solr
> Issue Type: Bug
> Affects Versions: 5.0, 5.1, 5.2, 5.3, 6.0
> Reporter: Yonik Seeley
> Priority: Critical
> Attachments: simple_facets.diff
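
The parameters compared in the comment above (`facet.method` with enum, fc, fcs or uif, plus `facet.threads`) are ordinary request parameters, so switching between them for a benchmark can be scripted. The sketch below only builds the request URL; the base URL, the core name `mycore` and the helper name `facet_query` are hypothetical, and which methods a server accepts depends on the Solr version in use.

```python
from urllib.parse import urlencode

# Methods mentioned in the thread; availability depends on the Solr version.
LEGACY_FACET_METHODS = ("enum", "fc", "fcs", "uif")


def facet_query(base_url, field, method="uif", threads=None):
    """Build a select URL that facets on one field with an explicit method."""
    if method not in LEGACY_FACET_METHODS:
        raise ValueError("unknown facet.method: %r" % (method,))
    params = [
        ("q", "*:*"),
        ("rows", "0"),  # only the facet counts are of interest
        ("facet", "true"),
        ("facet.field", field),
        ("facet.method", method),
    ]
    if threads is not None:
        params.append(("facet.threads", str(threads)))
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical host and core name, for illustration only.
    base = "http://localhost:8983/solr/mycore/select"
    for m in LEGACY_FACET_METHODS:
        print(facet_query(base, "category", method=m, threads=4))
```

Timing each generated URL against a warmed server gives a like-for-like comparison of the methods on a given index.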
[jira] [Comment Edited] (SOLR-8096) Major faceting performance regressions
[ https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065185#comment-15065185 ]

Jamie Johnson edited comment on SOLR-8096 at 12/19/15 5:12 AM:
---

While some (all?) of the performance issues are addressed, would it not still be useful to add an option to support either faceting approach? I understand the benefits of DocValues but we have a case where the facets need to be calculated based on an access level the user has. Simply storing in a separate field is not an option because the access controls are complex. Given that the JSON Facet API allows developers to choose the faceting method it would seem reasonable to provide similar functionality here, no?

Perhaps support the original implementation as the approach when method is fc and add a dv method to support docvalues. This would be in line with the new JSON API I believe, though from the looks of things it is not a trivial patch, since SimpleFacets seems pretty out of sync with the new faceting approach with regard to using the UnInvertedField.

was (Author: jej2003):
While some (all?) of the performance issues are addressed, would it not still be useful to add an option to support either faceting approach? I understand the benefits of DocValues but we have a case where the facets need to be calculated based on an access level the user has. Simply storing in a separate field is not an option because the access controls are complex. Given that the JSON Facet API allows developers to choose the faceting method it would seem reasonable to provide similar functionality here, no?

It would seem a fairly trivial patch to support the original implementation as the approach when method is fc and add a dv method to support docvalues. This would be in line with the new JSON API I believe.

> Major faceting performance regressions
> --
>
> Key: SOLR-8096
> URL: https://issues.apache.org/jira/browse/SOLR-8096
> Project: Solr
> Issue Type: Bug
> Affects Versions: 5.0, 5.1, 5.2, 5.3, Trunk
> Reporter: Yonik Seeley
> Priority: Critical
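
For comparison with the comment above, this is roughly how a method is selected per facet in the JSON Facet API: a terms facet takes a `method` entry alongside `type` and `field`. The Python sketch below only assembles the request body; the helper name `json_facet_request` and the `<field>_counts` facet label are hypothetical, and the set of accepted method values (for example `dv` or `uif`) is an assumption that should be checked against the Solr version in use.

```python
import json


def json_facet_request(field, method, limit=10):
    """Build a JSON request body with one terms facet using an explicit method."""
    body = {
        "query": "*:*",
        "limit": 0,  # no documents, only facet counts
        "facet": {
            field + "_counts": {
                "type": "terms",
                "field": field,
                "limit": limit,
                "method": method,
            }
        },
    }
    return json.dumps(body, sort_keys=True)


if __name__ == "__main__":
    print(json_facet_request("category", "uif"))
```

The body would be POSTed to a collection's query endpoint; the method is chosen per facet rather than per request, which is the flexibility the comment asks for in the legacy facet code.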
[jira] [Comment Edited] (SOLR-8096) Major faceting performance regressions
[ https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065185#comment-15065185 ]

Jamie Johnson edited comment on SOLR-8096 at 12/19/15 3:31 AM:
---

While some (all?) of the performance issues are addressed, would it not still be useful to add an option to support either faceting approach? I understand the benefits of DocValues but we have a case where the facets need to be calculated based on an access level the user has. Simply storing in a separate field is not an option because the access controls are complex. Given that the JSON Facet API allows developers to choose the faceting method it would seem reasonable to provide similar functionality here, no?

It would seem a fairly trivial patch to support the original implementation as the approach when method is fc and add a dv method to support docvalues. This would be in line with the new JSON API I believe.

was (Author: jej2003):
While some (all?) of the performance issues are addressed, would it not still be useful to add an option to support either faceting approach? I understand the benefits of DocValues but we have a case where the facets need to be calculated based on an access level the user has. Simply storing in a separate field is not an option because the access controls are complex. Given that the JSON Facet API allows developers to choose the faceting method it would seem reasonable to provide similar functionality here, no?

> Major faceting performance regressions
> --
>
> Key: SOLR-8096
> URL: https://issues.apache.org/jira/browse/SOLR-8096
> Project: Solr
> Issue Type: Bug
> Affects Versions: 5.0, 5.1, 5.2, 5.3, Trunk
> Reporter: Yonik Seeley
> Priority: Critical
[jira] [Comment Edited] (SOLR-8096) Major faceting performance regressions
[ https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14907568#comment-14907568 ]

Yonik Seeley edited comment on SOLR-8096 at 9/26/15 12:28 PM:
--

bq. Are you sure it was secret and not just a mistake?

Yes.
- This algorithm had been relied upon by many since 2008 (SOLR-475), and completely removing its use and replacing it would obviously warrant discussion, benchmarks, etc.
- This was a massive patch, and relevant changes should be called out, esp. if changes seem unrelated to the issue's description.
- If you search the JIRA issue, "UnInvertedField" *never* appears. (the linked issues mention it now, but those were added by us after the fact)
- The issue's title is "Add UninvertingReader" and the description had to do with Lucene's FieldCache, which UnInvertedField is not part of.
- There is *no* mention of the issue or changes anywhere in Solr's CHANGES.txt
- When asked to comment on impacts of this massive patch, the answer given was "Is the CHANGES.txt entry not good here? The docvalues apis did not change..."
- The CHANGES entry for lucene made no mention of the change to Solr or UnInvertedField.
- Although the UnInvertedField code was left behind (as dead code), the removal of the use of UnInvertedField was *not* by mistake - you can see by the test code that was explicitly removed. (TestFaceting.java)

edit: removed inflammatory conclusion

was (Author: ysee...@gmail.com):
bq. Are you sure it was secret and not just a mistake?

Yes.
- This algorithm had been relied upon by many since 2008 (SOLR-475), and completely removing its use and replacing it would obviously warrant discussion, benchmarks, etc.
- This was a massive patch, and relevant changes should be called out, esp. if changes seem unrelated to the issue's description.
- If you search the JIRA issue, "UnInvertedField" *never* appears. (the linked issues mention it now, but those were added by us after the fact)
- The issue's title is "Add UninvertingReader" and the description had to do with Lucene's FieldCache, which UnInvertedField is not part of.
- There is *no* mention of the issue or changes anywhere in Solr's CHANGES.txt
- When asked to comment on impacts of this massive patch, the answer given was "Is the CHANGES.txt entry not good here? The docvalues apis did not change..."
- The CHANGES entry for lucene made no mention of the change to Solr or UnInvertedField.
- Although the UnInvertedField code was left behind (as dead code), the removal of the use of UnInvertedField was *not* by mistake - you can see by the test code that was explicitly removed. (TestFaceting.java)

Exactly what other conclusion is there to draw? Massive incompetence?

> Major faceting performance regressions
> --
>
> Key: SOLR-8096
> URL: https://issues.apache.org/jira/browse/SOLR-8096
> Project: Solr
> Issue Type: Bug
> Affects Versions: 5.0, 5.1, 5.2, 5.3, Trunk
> Reporter: Yonik Seeley
> Priority: Critical
[jira] [Comment Edited] (SOLR-8096) Major faceting performance regressions
[ https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14907870#comment-14907870 ]

Uwe Schindler edited comment on SOLR-8096 at 9/25/15 10:06 AM:
---

bq. Use of the highly optimized faceting that Solr had for multi-valued fields over relatively static indexes was secretly removed as part of LUCENE-5666, causing severe performance regressions.

Hi,
the removal was not "secret". Removal of FieldCache from Lucene (and replacement by UninvertingReader) was discussed on the issue tracker, although interest by Solr people was small. I think this is the main issue here. Sometimes it would be good to have Solr committers taking part in discussions on Lucene issues. If you want to make Solr better, you should also help in making Lucene better!

The old field cache was also put into a separate module (with the new DocValues-emulating API), because we (Lucene committers) knew that Solr still uses it. Sure, we could have used UninvertingReader on top of SlowCompositeReaderWrapper, but this would bring other slowness! So the committers decided to step forward and remove the top-level faceting (which was long overdue).

It was announced in several talks about Lucene 5 that FieldCache was removed and all faceting in Solr was implicitly changed to only use per-segment field caches (e.g., see my talk @ fosdem 2015, JAX 2015, or berlinbuzzwords - around one of the last slides).

Maybe a changes entry should also have been added to Solr's CHANGES.txt about this, but this was forgotten. The CHANGES.txt entry about this is quoted below; its first line mentions that faceting in Solr is involved. Any Solr committer could have looked into the code and brought up complaints about those changes in the issue tracker, also after this commit had been done:

{quote}
* LUCENE-5666: Change uninverted access (sorting, faceting, grouping, etc) to use the DocValues API instead of FieldCache. For FieldCache functionality, use UninvertingReader in lucene/misc (or implement your own FilterReader). UninvertingReader is more efficient: supports multi-valued numeric fields, detects when a multi-valued field is single-valued, reuses caches of compatible types (e.g. SORTED also supports BINARY and SORTED_SET access without insanity). "Insanity" is no longer possible unless you explicitly want it. Rename FieldCache* and DocTermOrds* classes in the search package to DocValues*. Move SortedSetSortField to core and add SortedSetFieldSource to queries/, which takes the same selectors. Add helper methods to DocValues.java that are better suited for search code (never return null, etc). (Mike McCandless, Robert Muir)
{quote}

So everybody was informed.

bq. The people who did this are elasticsearch employees. That is one way to deal with Solr's faster faceting!

This is speculation and really bad behaviour on an Open Source issue tracker. We should discuss technical stuff here, not make any assumptions about what people intend to do. This statement was posted by a person ([~mmurphy3141]) who I never met in person, and who really seldom took part in Lucene/Solr discussions at all. So I don't think we should count on that. It is also bad behaviour to accuse committers on twitter of sabotage: https://twitter.com/mmurphy3141/status/647254551356162048; please don't do this. I would ask to remove this tweet, thanks.

I was informed about the changes mentioned here and I strongly agree with the committers behind LUCENE-5666. I was always in favour of removing those top-level faceting algorithms. So they still have my strong +1. Among my Solr customers I have seen nobody who complained about slow top-level faceting recently (because I told them a long time ago to no longer use those outdated top-level algorithms if they have dynamic indexes). Of course I don't know about people using static indexes.

The right thing to do for Solr people would be to remove that top-level stuff completely. It no longer fits the new reader structure (composite and atomic/leaf readers) of Lucene 3 (with API cleanups to better reflect the new structure in Lucene 4). Lucene 3 is now several years retired already! So there was a long time to fix Solr's faceting to go away from top-level. People with static indexes can still force merge their index and will have the same performance with the new algorithms. Please keep in mind that it took about half a year until the first one recognized a problem like this, which makes me think that only a few people are using those mostly-static indexes.

*We should work on this issue to fix the issue, not accuse people, thanks!*

was (Author: thetaphi):
bq. Use of the highly optimized faceting that Solr had for multi-valued fields over relatively static indexes was secretly removed as part of LUCENE-5666, causing s
[jira] [Comment Edited] (SOLR-8096) Major faceting performance regressions
[ https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14907870#comment-14907870 ]

Uwe Schindler edited comment on SOLR-8096 at 9/25/15 10:04 AM:
---

bq. Use of the highly optimized faceting that Solr had for multi-valued fields over relatively static indexes was secretly removed as part of LUCENE-5666, causing severe performance regressions.

Hi, the removal was not "secret". The removal of FieldCache from Lucene (and its replacement by UninvertingReader) was discussed on the issue tracker, although Solr people showed little interest. I think this is the main issue here. Sometimes it would be good to have Solr committers take part in discussions on Lucene issues. If you want to make Solr better, you should also help make Lucene better!

The old field cache was also put into a separate module (with the new DocValues-emulating API), because we (Lucene committers) knew that Solr still uses it. Sure, we could have used UninvertingReader on top of SlowCompositeReaderWrapper, but this would bring other slowness! So the committers decided to step forward and remove the top-level faceting (which was long overdue).

It was announced in several talks about Lucene 5 that FieldCache was removed and that all faceting in Solr was implicitly changed to use only per-segment field caches (e.g., see my talks at FOSDEM 2015, JAX 2015, or Berlin Buzzwords, around one of the last slides). Maybe a changes entry should also have been added to Solr's CHANGES.txt, but the first line of the existing CHANGES.txt entry already mentions that faceting in Solr is involved. Any Solr committer could have looked into the code and raised complaints about those changes in the issue tracker, even after the commit was made:

{quote}
* LUCENE-5666: Change uninverted access (sorting, faceting, grouping, etc) to use the DocValues API instead of FieldCache. For FieldCache functionality, use UninvertingReader in lucene/misc (or implement your own FilterReader). UninvertingReader is more efficient: supports multi-valued numeric fields, detects when a multi-valued field is single-valued, reuses caches of compatible types (e.g. SORTED also supports BINARY and SORTED_SET access without insanity). "Insanity" is no longer possible unless you explicitly want it. Rename FieldCache* and DocTermOrds* classes in the search package to DocValues*. Move SortedSetSortField to core and add SortedSetFieldSource to queries/, which takes the same selectors. Add helper methods to DocValues.java that are better suited for search code (never return null, etc). (Mike McCandless, Robert Muir)
{quote}

So everybody was informed.

bq. The people who did this are elasticsearch employees. That is one way to deal with Solr's faster faceting!

This is speculation and really bad behaviour on an open source issue tracker. We should discuss technical matters here, not make assumptions about what people intend to do. This statement was posted by a person ([~mmurphy3141]) whom I have never met in person and who has really seldom taken part in Lucene/Solr discussions at all. So I don't think we should count on that. It is also bad behaviour to accuse committers of sabotage on Twitter: https://twitter.com/mmurphy3141/status/647254551356162048; please don't do this. I would ask that this tweet be removed, thanks.

I was informed about the changes mentioned here and I strongly agree with the committers behind LUCENE-5666. I was always in favour of removing those top-level faceting algorithms, so they still have my strong +1. Among my Solr customers I have seen nobody complain about slow top-level faceting recently (because I told them a long time ago to stop using those outdated top-level algorithms if they have dynamic indexes). Of course I don't know about people using static indexes.

The right thing for Solr people to do would be to remove the top-level stuff completely. It no longer fits the reader structure (composite and atomic/leaf readers) introduced in Lucene 3 (with API cleanups to better reflect the new structure in Lucene 4). Lucene 3 has been retired for several years already! So there was plenty of time to move Solr's faceting away from top-level. People with static indexes can still force merge their index and will have the same performance with the new algorithms. Please keep in mind that it took about half a year until the first person noticed a problem like this, which makes me think that only a few people are using those mostly-static indexes.

*We should work on fixing this issue, not accuse people, thanks!*

was (Author: thetaphi):
bq. Use of the highly optimized faceting that Solr had for multi-valued fields over relatively static indexes was secretly removed as part of LUCENE-5666, causing severe performance r
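
For readers following the top-level versus per-segment argument in the comment above, here is a minimal sketch of the difference. It is a toy model in Python, not Lucene or Solr code: the segment layout and the names `build_segment`, `facet_per_segment`, and `facet_top_level` are hypothetical and invented purely for illustration. It assumes each segment keeps its own sorted term dictionary and stores per-document ordinals into it.

```python
from collections import Counter

def build_segment(docs):
    """Toy segment: a sorted term dictionary plus per-document ordinals.

    docs is a list of lists of string values (a multi-valued field).
    """
    terms = sorted({v for doc in docs for v in doc})
    ord_of = {t: i for i, t in enumerate(terms)}
    return {"terms": terms,
            "doc_ords": [[ord_of[v] for v in doc] for doc in docs]}

def facet_per_segment(segments):
    """Count by segment-local ordinal, then merge the counts by term."""
    merged = Counter()
    for seg in segments:
        counts = [0] * len(seg["terms"])
        for ords in seg["doc_ords"]:
            for o in ords:
                counts[o] += 1
        # Local ordinals mean different terms in different segments,
        # so the counts have to be re-keyed by the term itself.
        for o, c in enumerate(counts):
            if c:
                merged[seg["terms"][o]] += c
    return dict(merged)

def facet_top_level(segments):
    """Build one global term dictionary, then count into a single array."""
    all_docs = [[seg["terms"][o] for o in ords]
                for seg in segments for ords in seg["doc_ords"]]
    # In this model the global structure is rebuilt from scratch
    # whenever any segment changes.
    top = build_segment(all_docs)
    counts = [0] * len(top["terms"])
    for ords in top["doc_ords"]:
        for o in ords:
            counts[o] += 1
    return {t: c for t, c in zip(top["terms"], counts) if c}

segments = [
    build_segment([["red", "blue"], ["red"]]),
    build_segment([["green"], ["blue", "green"], []]),
]
print(facet_per_segment(segments))
print(facet_top_level(segments))
```

Both functions return the same counts; they differ only in where the work happens. In this model, an index with a single segment (which is what a force merge to one segment produces) leaves the per-segment merge step with only one dictionary to walk, which is one way to read the remark that static indexes can be force merged. The sketch says nothing about the actual timings reported in the issue.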
[jira] [Comment Edited] (SOLR-8096) Major faceting performance regressions
[ https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14907515#comment-14907515 ]

Mike Murphy edited comment on SOLR-8096 at 9/25/15 3:38 AM:

The people who did this are elasticsearch employees. That is one way to deal with Solr's faster faceting! This smells like the VW pollution scandal for lucene/solr/elasticsearch, except perhaps no consequences for those who pulled it off? Why are elasticsearch employees allowed to do this?

was (Author: mmurphy3141):
The people who did this are elasticsearch employees. That is one way to deal with Solr's faster faceting! This smells like the VW pollution scandal for lucene/solr/elasticsearch, except perhaps no consequences for those who pulled it off?

> Major faceting performance regressions
> --
>
> Key: SOLR-8096
> URL: https://issues.apache.org/jira/browse/SOLR-8096
> Project: Solr
> Issue Type: Bug
> Affects Versions: 5.0, 5.1, 5.2, 5.3, Trunk
> Reporter: Yonik Seeley
> Priority: Critical
>
> Use of the highly optimized faceting that Solr had for multi-valued fields
> over relatively static indexes was *secretly removed* as part of LUCENE-5666,
> causing severe performance regressions.
> Here are some quick benchmarks to gauge the damage, on a 5M document index,
> with each field having between 0 and 5 values per document. *Higher numbers
> represent worse 5x performance*.
> Solr 5.4_dev faceting time as a percent of Solr 4.10.3 faceting time
> ||...|| Percent of index being faceted
> ||num_unique_values|| 10% || 50% || 90% ||
> |10 | 351.17% | 1587.08% | 3057.28% |
> |100 | 158.10% | 203.61% | 1421.93% |
> |1000 | 143.78% | 168.01% | 1325.87% |
> |10000 | 137.98% | 175.31% | 1233.97% |
> |100000 | 142.98% | 159.42% | 1252.45% |
> |1000000 | 255.15% | 165.17% | 1236.75% |
> For example, a field with 1000 unique values in the whole index, faceting
> with 5x took 143% of the 4x time, when ~10% of the docs in the index were
> faceted.
> One user who brought the performance problem to our attention:
> http://markmail.org/message/ekmqh4ocbkwxv3we
> "faceting is unusable slow since upgrade to 5.3.0" (from 4.10.3)
> The disabling of the UnInvertedField algorithm was previously discovered in
> SOLR-7190, but we didn't know just how bad the problem was at that time.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org