[jira] [Comment Edited] (SOLR-8096) Major faceting performance regressions

2017-08-31 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16149190#comment-16149190
 ] 

Yonik Seeley edited comment on SOLR-8096 at 8/31/17 4:48 PM:
-

bq. For them, enabling docValues, which is supposed to be the magic bullet for 
faceting performance, causes performance to get even worse.

Yep.  DocValues is a better default because it uses little heap memory compared 
to the FieldCache.  But in general, docValues can be slower than the old 4.x 
fieldCache, and definitely slower than UnInvertedField for multi-valued 
faceting.  For dense fields, the newest iterator-based docValues is also 
somewhat slower than the old docValues.  This isn't just Solr... for example, 
sorting on dense docValues fields is also slower since the cut-over to iterator 
docValues.  Anyway, specific use-cases can pretty much always be sped up, but 
there's no magic bullet and we need to tackle them one at a time.  For example, 
facet.method=uif was added to re-enable access to the UnInvertedField faceting 
method.
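
As a rough illustration of the facet.method=uif switch mentioned above, the sketch below only builds a request URL. The host, core name ("mycore") and field name ("category") are placeholders assumed for the example, not values from this issue.

```python
from urllib.parse import urlencode


def facet_query_url(base_url, field, method="uif"):
    """Build a Solr select URL that facets on `field` using the given facet.method."""
    params = [
        ("q", "*:*"),
        ("rows", "0"),            # only facet counts are wanted, no documents
        ("facet", "true"),
        ("facet.field", field),
        ("facet.method", method),  # "uif" asks for the UnInvertedField method
    ]
    return base_url + "/select?" + urlencode(params)


# Hypothetical host, core and field names, used only for illustration.
url = facet_query_url("http://localhost:8983/solr/mycore", "category")
print(url)
```

Whether uif actually helps depends on the field and the index, so it is worth timing it against the other facet.method values on real queries.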

Another difference is top-level fieldCache vs per-segment.  For strings, 
top-level is faster, but it needs to be rebuilt from scratch each time the 
index changes.  Per-segment needs to merge string ords from different segments 
(which adds overhead and makes it slower), but only new segments need to be 
rebuilt when the index changes (better for NRT).  However, the ability to use a 
top-level fieldCache was removed in Lucene (some people are of the opinion that 
no caches should be top-level), hence some use-cases will be slower.
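
The trade-off between top-level and per-segment ordinals can be sketched with a toy model. This is not Lucene or Solr code: the segment layout (a sorted term list plus one ordinal per document) and all function names are assumptions made for the illustration.

```python
def build_top_level(segments):
    """Rebuild one global ordinal space; this must be redone after every index change."""
    global_terms = sorted({t for seg in segments for t in seg["terms"]})
    ord_of = {term: i for i, term in enumerate(global_terms)}
    global_doc_ords = [ord_of[seg["terms"][o]]
                       for seg in segments for o in seg["doc_ords"]]
    return global_terms, global_doc_ords


def facet_top_level(global_terms, global_doc_ords):
    """Count with a single array indexed by global ordinal; no merge step is needed."""
    counts = [0] * len(global_terms)
    for o in global_doc_ords:
        counts[o] += 1
    return dict(zip(global_terms, counts))


def facet_per_segment(segments):
    """Count inside each segment, then merge by resolving segment ords to terms."""
    merged = {}
    for seg in segments:
        counts = [0] * len(seg["terms"])
        for o in seg["doc_ords"]:
            counts[o] += 1
        # Segment ordinals only mean something inside their own segment,
        # so the merge has to go through the term strings (the extra overhead).
        for o, c in enumerate(counts):
            if c:
                term = seg["terms"][o]
                merged[term] = merged.get(term, 0) + c
    return merged


# Two toy segments with single-valued string fields.
segments = [
    {"terms": ["apple", "pear"], "doc_ords": [0, 1, 1]},
    {"terms": ["apple", "plum"], "doc_ords": [1, 0]},
]
top = facet_top_level(*build_top_level(segments))
per_seg = facet_per_segment(segments)
print(top == per_seg)
```

Both paths give the same counts. The difference is where the cost lands: the top-level structure is expensive to rebuild but cheap to count with, while the per-segment path pays a merge on every request but only needs new structures for new segments.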


> Major faceting performance regressions
> --
>
> Key: SOLR-8096
> URL: https://issues.apache.org/jira/browse/SOLR-8096
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 5.0, 5.1, 5.2, 5.3, 6.0
>Reporter: Yonik Seeley
>Priority: Critical
> Attachments: facetcache.diff, simple_facets.diff
>
>
> Use of the highly optimized faceting that Solr had for multi-valued fields 
> over relatively static indexes was removed as part of LUCENE-5666, causing 
> severe performance regressions.
> Here are some quick benchmarks to gauge the damage, on a 5M document index, 
> with each field having between 0 and 5 values per document.  *Higher numbers 
> represent worse 5x performance*.
> Solr 5.4_dev faceting time as a percent of Solr 4.10.3 faceting time  
> ||...|| Percent of index being faceted ||
> ||num_unique_values|| 10% || 50% || 90% ||
> |10      | 351.17%   | 1587.08%  | 3057.28% |
> |100     | 158.10%   | 203.61%   | 1421.93% |
> |1000    | 143.78%   | 168.01%   | 1325.87% |
> |10000   | 137.98%   | 175.31%   | 1233.97% |
> |100000  | 142.98%   | 159.42%   | 1252.45% |
> |1000000 | 255.15%   | 165.17%   | 1236.75% |
> For example, a field with 1000 unique values in the whole index, faceting 
> with 5x took 143% of the 4x time, when ~10% of the docs in the index were 
> faceted.
> One user who brought the performance problem to our attention: 
> http://markmail.org/message/ekmqh4ocbkwxv3we
> "faceting is unusable slow since upgrade to 5.3.0" (from 4.10.3)
> The disabling of the UnInvertedField algorithm was previously discovered in 
> SOLR-7190, but we didn't know just how bad the problem was at that time.
> edit: removed "secret" adverb by request
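
To make the percentages in the table easier to read, they can be converted to slowdown factors. The sketch below does this for two rows copied from the table; the function name is made up for the example.

```python
def slowdown(percent_of_4x_time):
    """Convert '5.x time as a percent of 4.x time' into a slowdown factor."""
    return percent_of_4x_time / 100.0


# Rows copied from the table above for num_unique_values = 10 and 1000
# (columns: 10%, 50% and 90% of the index being faceted).
table = {
    10: (351.17, 1587.08, 3057.28),
    1000: (143.78, 168.01, 1325.87),
}
for unique_values, percents in table.items():
    factors = ", ".join("%.1fx" % slowdown(p) for p in percents)
    print(unique_values, factors)
```

Read this way, 143.78% means 5.x took roughly 1.4 times as long as 4.x, while 3057.28% means it took roughly 30 times as long.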



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-8096) Major faceting performance regressions

2017-04-07 Thread Michael Gibney (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15960982#comment-15960982
 ] 

Michael Gibney edited comment on SOLR-8096 at 4/7/17 3:45 PM:
--

Of course I can't speak to the status of this issue from other folks' 
perspectives, but I did observe a couple of things that I wanted to mention in 
case anyone else might find them useful. Performance has actually been 
acceptable for me, but implementing a simple cache for facets definitely 
improved performance (in my deployment) for common queries (see 
[^facetcache.diff]). A couple of observations:

1. Based on the fact that fields were being faceted using DocValues faceting, I 
assumed (incorrectly) that docValues must have been enabled. In fact, docValues 
are _not_ enabled by default; IndexReaders are wrapped in UninvertingReaders 
that (on demand) uninvert non-docValues-enabled fields in order to present a 
docValues-like interface for faceting. 
2. DocValues cannot yet be enabled on analyzed fields, so if you require this, 
you'll be dealing with the UninvertingReader; you may be interested in 
SOLR-8362.
3. {{DocValuesFacets}} iterates over all the documents in a result set for 
_every_ query. Regardless of the underlying implementation, this is bound to be 
relatively expensive for result sets containing large numbers of documents. 
Furthermore, "Result sets containing large numbers of documents" constitute a 
fairly large proportion of common user interactions (landing page with faceting 
over the whole index presents users with a handful of clickable top-level 
filters, each of which covers a large portion of the index). Thus, faceting 
seems to be a good candidate for caching, regardless of the underlying 
implementation of the DocValues interface. 
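
Observations 1 and 3 can be pictured with a toy model: uninverting turns term-to-documents postings into a document-to-ordinals view, and faceting then walks every document in the result set. This is only a sketch of the idea in plain Python, not how UninvertingReader or DocValuesFacets are implemented; all names are hypothetical.

```python
def uninvert(inverted_index, num_docs):
    """Turn term -> [doc ids] postings into a doc id -> [term ordinals] view."""
    terms = sorted(inverted_index)
    doc_to_ords = [[] for _ in range(num_docs)]
    for ord_, term in enumerate(terms):
        for doc in inverted_index[term]:
            doc_to_ords[doc].append(ord_)
    return terms, doc_to_ords


def facet_counts(terms, doc_to_ords, result_docs):
    """Count values by visiting every document in the result set."""
    counts = [0] * len(terms)
    for doc in result_docs:          # cost grows with the size of the result set
        for ord_ in doc_to_ords[doc]:
            counts[ord_] += 1
    return {terms[i]: c for i, c in enumerate(counts) if c}


# Toy multi-valued field: document 2 carries both values.
inverted = {"red": [0, 2], "blue": [1, 2]}
terms, doc_to_ords = uninvert(inverted, num_docs=3)
print(facet_counts(terms, doc_to_ords, result_docs=[0, 2]))
```

Because the counting loop touches each matching document, a query that matches most of the index pays close to the full cost every time, which is the motivation for caching the counts.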

Accordingly, I've attached a stab at a patch ([^facetcache.diff]) to 
{{DocValuesFacets}} to support a cache intended to speed dv faceting over 
high-cardinality docsets. Combined with a handful of warming queries, I've seen 
much improved performance for common requests. In addition to the patch, you 
must configure your solrconfig.xml with, e.g., 
{code:xml}

{code}
I tried to make the docset cardinality threshold for caching configurable at 
the field level, but haven't yet figured out how to pass in the configuration; 
you will see my unsuccessful attempts reflected in the changes to 
{{SimpleFacets}}. With the patch in its current state, this parameter can only 
be adjusted by changing the hardcoded default of 5000 for 
{{SimpleFacets.DEFAULT_PERSEG_FACET_CACHE_THRESHOLD}} (a reasonable value would 
probably be _much_ higher).
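
The threshold idea can be sketched as follows. This is not the code in facetcache.diff: the class, its methods and the "at or above the threshold" comparison are assumptions for illustration, and only the default of 5000 comes from the comment above.

```python
class FacetCountCache:
    """Caches facet counts, but only for docsets at or above a size threshold."""

    def __init__(self, threshold=5000):  # 5000 mirrors the default quoted above
        self.threshold = threshold
        self._counts = {}

    def get(self, field, docset, compute):
        # Small docsets are cheap to facet, so they bypass the cache entirely.
        if len(docset) < self.threshold:
            return compute(field, docset)
        key = (field, frozenset(docset))
        if key not in self._counts:
            self._counts[key] = compute(field, docset)
        return self._counts[key]


calls = []  # records every real computation so cache hits are visible


def count_values(field, docset):
    """Stand-in for the expensive per-document counting loop."""
    calls.append(field)
    values = {0: "a", 1: "b", 2: "a", 3: "a"}
    counts = {}
    for doc in docset:
        counts[values[doc]] = counts.get(values[doc], 0) + 1
    return counts


cache = FacetCountCache(threshold=3)   # tiny threshold so the toy data qualifies
big = {0, 1, 2, 3}
first = cache.get("color", big, count_values)
second = cache.get("color", big, count_values)    # served from the cache
small = cache.get("color", {0, 1}, count_values)  # below threshold, recomputed
print(len(calls))
```

A real cache would also need invalidation when the index changes and a size bound, both of which this sketch leaves out.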

Just to clarify, this comment is not a suggestion to skip closing this issue, 
and I'm sorry if it's a bit off-topic; I hope it strikes people as related 
enough to justify posting here. 



[jira] [Comment Edited] (SOLR-8096) Major faceting performance regressions

2016-05-25 Thread Alessandro Benedetti (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15298469#comment-15298469
 ] 

Alessandro Benedetti edited comment on SOLR-8096 at 5/25/16 10:45 AM:
--

Just adding some additional information, as I just ran into this issue with 
Solr 6.0:
Static index, around 50 * 10^6 docs, 20 fields to facet, 1 of them with high 
cardinality, on top of grouping.
Grouping had no effect at all.

All the symptoms are there: Solr 4.10.2 takes around 70 ms (enum) to 150 ms 
(fcs), and Solr 6.0 around 550 ms.
The 'fieldValueCache' seems to be unused (no inserts or lookups) in Solr 6.0.
In Solr 4.10 the 'fieldValueCache' is in heavy use, with a cumulative_hitratio 
of 0.96.
Switching from enum to fc to fcs to uif did not change much.

Moving to DocValues didn't improve the situation much (but I was on an 
optimized index, so I need to try a multi-segment one, per [~mkhludnev]'s 
contribution in Solr 5.4.0).

Moving to field collapsing brought the query down to 110-120 ms (but this is 
expected; we were faceting on 260 /1 million original docs).
Adding facet.threads=NCores brought the query time down to 100 ms; in 
combination with field collapsing we reached 80-90 ms when warmed.
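
A request combining the two settings above might be assembled roughly like this. The sketch assumes that "field collapsing" means the collapse query parser applied as a filter query; the field names are placeholders, and it only builds the query string.

```python
import os
from urllib.parse import urlencode


def collapsed_facet_params(facet_fields, collapse_field, threads=None):
    """Build request parameters for multi-field faceting over a collapsed result set."""
    if threads is None:
        threads = os.cpu_count() or 1   # stands in for "NCores"
    params = [
        ("q", "*:*"),
        ("rows", "0"),
        ("facet", "true"),
        ("fq", "{!collapse field=%s}" % collapse_field),  # assumed collapse filter
        ("facet.threads", str(threads)),
    ]
    params += [("facet.field", f) for f in facet_fields]
    return params


# Hypothetical field names, used only for illustration.
query_string = urlencode(
    collapsed_facet_params(["brand", "color"], "product_id", threads=8))
print(query_string)
```

Since facet.threads works across facet fields, it is most likely to matter for requests like the one described above, with many fields faceted at once.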

What is the plan for the future regarding this?
Do we want to deprecate the legacy facets implementation and move everything to 
JSON facets (as happened with the UIF)?
So backward compatible, but with a different implementation?

Cheers

 



[jira] [Comment Edited] (SOLR-8096) Major faceting performance regressions

2015-12-18 Thread Jamie Johnson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065185#comment-15065185
 ] 

Jamie Johnson edited comment on SOLR-8096 at 12/19/15 5:12 AM:
---

While some (all?) of the performance issues are addressed, would it not still 
be useful to add an option to support either faceting approach?  I understand 
the benefits of DocValues but we have a case where the facets need to be 
calculated based on an access level the user has.  Simply storing in a separate 
field is not an option because the access controls are complex.  Given that the 
JSON Facet API allows developers to choose the faceting method it would seem 
reasonable to provide similar functionality here, no?  Perhaps support the 
original implementation as the approach when method is fc, and add a dv method 
to support docValues.  This would be in line with the new JSON API, I believe, 
though from the looks of things it is not a trivial patch, since SimpleFacets 
seems pretty out of sync with the new faceting approach required with regard 
to using the UnInvertedField.
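
For comparison, this is roughly how a method choice is expressed in a JSON Facet API request body. The facet label and field name are placeholders, and the sketch only builds the JSON payload.

```python
import json


def terms_facet_request(field, method):
    """Build a JSON request body with one terms facet that names its method."""
    return {
        "query": "*:*",
        "limit": 0,   # no documents, only facet counts
        "facet": {
            "by_" + field: {
                "type": "terms",
                "field": field,
                "method": method,   # e.g. "uif" or "dv"
            }
        },
    }


# Hypothetical field name; the same request shape works for either method.
uif_body = json.dumps(terms_facet_request("category", "uif"), indent=2)
dv_body = json.dumps(terms_facet_request("category", "dv"), indent=2)
print(uif_body)
```

The suggestion in the comment amounts to giving the legacy facet.method parameter the same kind of explicit choice between the uninverted and docValues paths.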



[jira] [Comment Edited] (SOLR-8096) Major faceting performance regressions

2015-12-18 Thread Jamie Johnson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065185#comment-15065185
 ] 

Jamie Johnson edited comment on SOLR-8096 at 12/19/15 3:31 AM:
---

While some (all?) of the performance issues are addressed, would it not still 
be useful to add an option to support either faceting approach?  I understand 
the benefits of DocValues but we have a case where the facets need to be 
calculated based on an access level the user has.  Simply storing in a separate 
field is not an option because the access controls are complex.  Given that the 
JSON Facet API allows developers to choose the faceting method it would seem 
reasonable to provide similar functionality here, no?  It would seem a fairly 
trivial patch to support the original implementation as the approach when 
method is fc and add a dv method to support docvalues.  This would be in line 
with the new JSON API, I believe.



[jira] [Comment Edited] (SOLR-8096) Major faceting performance regressions

2015-09-26 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14907568#comment-14907568
 ] 

Yonik Seeley edited comment on SOLR-8096 at 9/26/15 12:28 PM:
--

bq. Are you sure it was secret and not just a mistake?

Yes.

- This algorithm had been relied upon by many since 2008 (SOLR-475), and 
completely removing its use and replacing it would obviously warrant 
discussion, benchmarks, etc.
- This was a massive patch, and relevant changes should be called out, esp. if 
changes seem unrelated to the issue's description.
- If you search the JIRA issue, "UnInvertedField" *never* appears.
  (the linked issues mention it now, but those were added by us after the fact)
- The issue's title is "Add UninvertingReader" and the description had to do 
with Lucene's FieldCache, which UnInvertedField is not part of.
- There is *no* mention of the issue or changes anywhere in Solr's CHANGES.txt.
- When asked to comment on impacts of this massive patch, the answer given was 
"Is the CHANGES.txt entry not good here? The docvalues apis did not change..."
- The CHANGES entry for Lucene made no mention of the change to Solr or 
UnInvertedField.
- Although the UnInvertedField code was left behind (as dead code), the removal 
of the use of UnInvertedField was *not* by mistake: you can see this from the 
test code that was explicitly removed (TestFaceting.java).

edit: removed inflammatory conclusion 


was (Author: ysee...@gmail.com):
bq. Are you sure it was secret and not just a mistake?

Yes.

- This algorithm had been relied upon by many since 2008 (SOLR-475), and 
completely removing its use and replacing it would obviously warrant 
discussion, benchmarks, etc.
- This was a massive patch, and relevant changes should be called out, esp. if 
changes seem unrelated to the issue's description.
- If you search the JIRA issue, "UnInvertedField" *never* appears.
  (the linked issues mention it now, but those were added by us after the fact)
- The issue's title is "Add UninvertingReader" and the description had to do 
with Lucene's FieldCache, which UnInvertedField is not part of.
- There is *no* mention of the issue or changes anywhere in Solr's CHANGES.txt.
- When asked to comment on impacts of this massive patch, the answer given was 
"Is the CHANGES.txt entry not good here? The docvalues apis did not change..."
- The CHANGES entry for Lucene made no mention of the change to Solr or 
UnInvertedField.
- Although the UnInvertedField code was left behind (as dead code), the removal 
of the use of UnInvertedField was *not* by mistake: you can see this from the 
test code that was explicitly removed (TestFaceting.java).

Exactly what other conclusion is there to draw?  Massive incompetence?


[jira] [Comment Edited] (SOLR-8096) Major faceting performance regressions

2015-09-25 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14907870#comment-14907870
 ] 

Uwe Schindler edited comment on SOLR-8096 at 9/25/15 10:06 AM:
---

bq. Use of the highly optimized faceting that Solr had for multi-valued fields 
over relatively static indexes was secretly removed as part of LUCENE-5666, 
causing severe performance regressions.

Hi, the removal was not "secret". Removal of FieldCache from Lucene (and its 
replacement by UninvertingReader) was discussed on the issue tracker, although 
interest from Solr people was small. I think this is the main issue here. 
Sometimes it would be good to have Solr committers take part in discussions 
on Lucene issues. If you want to make Solr better, you should also help in 
making Lucene better!

The old field cache was also put into a separate module (with the new 
DocValues-emulating API), because we (Lucene committers) knew that Solr still 
uses it. Sure, we could have used UninvertingReader on top of 
SlowCompositeReaderWrapper, but this would have brought other slowness! So the 
committers decided to step forward and remove the top-level faceting (which 
was long overdue).

It was announced in several talks about Lucene 5 that FieldCache was removed 
and that all faceting in Solr was implicitly changed to use only per-segment 
field caches (e.g., see my talks at FOSDEM 2015, JAX 2015, or Berlin Buzzwords, 
around one of the last slides). Maybe a changes entry about this should also 
have been added to Solr's CHANGES.txt, but this was forgotten.

The CHANGES.txt entry for this change is quoted below; the first line mentions 
that faceting in Solr is involved. Any Solr committer could have looked into 
the code and brought up complaints about those changes in the issue tracker, 
even after the commit had been made:

{quote}
* LUCENE-5666: Change uninverted access (sorting, faceting, grouping, etc)
  to use the DocValues API instead of FieldCache. For FieldCache functionality,
  use UninvertingReader in lucene/misc (or implement your own FilterReader).
  UninvertingReader is more efficient: supports multi-valued numeric fields,
  detects when a multi-valued field is single-valued, reuses caches
  of compatible types (e.g. SORTED also supports BINARY and SORTED_SET access
  without insanity).  "Insanity" is no longer possible unless you explicitly
  want it. Rename FieldCache* and DocTermOrds* classes in the search package
  to DocValues*. Move SortedSetSortField to core and add SortedSetFieldSource
  to queries/, which takes the same selectors. Add helper methods to
  DocValues.java that are better suited for search code (never return null,
  etc).  (Mike McCandless, Robert Muir)
{quote}

So everybody was informed.

bq. The people who did this are elasticsearch employees. That is one way to 
deal with Solr's faster faceting!

This is speculation and really bad behaviour on an Open Source issue tracker. 
We should discuss technical matters here, not make assumptions about what 
people intend to do. This statement was posted by a person ([~mmurphy3141]) 
whom I have never met in person, and who has really seldom taken part in 
Lucene/Solr discussions at all. So I don't think we should count on that. It is 
also bad behaviour to accuse committers of sabotage on Twitter: 
https://twitter.com/mmurphy3141/status/647254551356162048; please don't do 
this. I would ask that this tweet be removed, thanks.

I was informed about the changes mentioned here and I strongly agree with the 
committers behind LUCENE-5666. I was always in favour of removing those 
top-level faceting algorithms. So they still have my strong +1. Among my Solr 
customers I have seen nobody who complained about slow top-level faceting 
recently (because I told them a long time ago to no longer use those outdated 
top-level algorithms if they have dynamic indexes). Of course I don't know 
about people using static indexes.

The right thing to do for Solr people would be to remove that top-level stuff 
completely. It no longer fits the new reader structure (composite and 
atomic/leaf readers) of Lucene 3 (with API cleanups to better reflect the new 
structure in Lucene 4). Lucene 3 has now been retired for several years! So 
there was a long time to fix Solr's faceting to move away from top-level. 
People with static indexes can still force merge their index and will have the 
same performance with the new algorithms.

Please keep in mind that it took about half a year until the first person 
recognized a problem like this, which makes me think that only a few people are 
using those mostly-static indexes. 

*We should work on this issue to fix the issue, not accuse people, thanks!*


[jira] [Comment Edited] (SOLR-8096) Major faceting performance regressions

2015-09-25 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14907870#comment-14907870
 ] 

Uwe Schindler edited comment on SOLR-8096 at 9/25/15 10:04 AM:
---

bq. Use of the highly optimized faceting that Solr had for multi-valued fields 
over relatively static indexes was secretly removed as part of LUCENE-5666, 
causing severe performance regressions.

Hi, the removal was not "secret". The removal of FieldCache from Lucene (and 
its replacement by UninvertingReader) was discussed on the issue tracker, 
although interest from Solr people was small. I think this is the main issue 
here. Sometimes it would be good to have Solr committers take part in 
discussions on Lucene issues. If you want to make Solr better, you should also 
help make Lucene better!

The old field cache was also put into a separate module (with the new 
DocValues-emulating API), because we (Lucene committers) knew that Solr still 
uses it. Sure, we could have used UninvertingReader on top of 
SlowCompositeReaderWrapper, but this would have brought other slowness! So the 
committers decided to step forward and remove the top-level faceting (which 
was long overdue).

It was announced in several talks about Lucene 5 that FieldCache was removed 
and that all faceting in Solr was implicitly changed to use only per-segment 
field caches (e.g., see my talks at FOSDEM 2015, JAX 2015, or Berlin Buzzwords, 
around one of the last slides). Maybe a changes entry about this should also 
have been added to the Solr CHANGES.txt, but 

Here is the CHANGES.txt entry for this change; its first line mentions that 
faceting in Solr is involved. Any Solr committer could have looked into the 
code and raised complaints about those changes in the issue tracker, even 
after the commit was made:

{quote}
* LUCENE-5666: Change uninverted access (sorting, faceting, grouping, etc)
  to use the DocValues API instead of FieldCache. For FieldCache functionality,
  use UninvertingReader in lucene/misc (or implement your own FilterReader).
  UninvertingReader is more efficient: supports multi-valued numeric fields,
  detects when a multi-valued field is single-valued, reuses caches
  of compatible types (e.g. SORTED also supports BINARY and SORTED_SET access
  without insanity).  "Insanity" is no longer possible unless you explicitly
  want it. Rename FieldCache* and DocTermOrds* classes in the search package
  to DocValues*. Move SortedSetSortField to core and add SortedSetFieldSource
  to queries/, which takes the same selectors. Add helper methods to
  DocValues.java that are better suited for search code (never return null,
  etc).  (Mike McCandless, Robert Muir)
{quote}

So everybody was informed.

bq. The people who did this are elasticsearch employees. That is one way to 
deal with Solr's faster faceting!

This is speculation and really bad behaviour on an open source issue tracker. 
We should discuss technical matters here, not make assumptions about what 
people intend to do. This statement was posted by a person 
([~mmurphy3141]) whom I have never met in person, and who has seldom taken part 
in Lucene/Solr discussions at all. So I don't think we should count on that. It 
is also bad behaviour to accuse committers of sabotage on Twitter: 
https://twitter.com/mmurphy3141/status/647254551356162048; please don't do 
this. I would ask you to remove this tweet, thanks.

I was informed about the changes mentioned here and I strongly agree with the 
committers behind LUCENE-5666. I have always been in favour of removing those 
top-level faceting algorithms, so they still have my strong +1. Among my Solr 
customers, I have seen nobody complain about slow top-level faceting 
recently (because I told them a long time ago to stop using those outdated 
top-level algorithms if they have dynamic indexes). Of course, I don't know 
about people using static indexes.

The right thing for Solr people to do would be to remove that top-level stuff 
completely. It no longer fits the new reader structure (composite and 
atomic/leaf readers) of Lucene 3 (with API cleanups to better reflect the new 
structure in Lucene 4). Lucene 3 has been retired for several years already! So 
there was plenty of time to move Solr's faceting away from top-level. People 
with static indexes can still force merge their index and will get the same 
performance with the new algorithms.
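
To illustrate the force-merge point, here is a toy Python model of top-level 
versus per-segment facet counting. It is only a sketch under stated 
assumptions: it is not Lucene or Solr code, the function names 
(facet_top_level, facet_per_segment) are hypothetical, and an index is modelled 
as a plain list of segments, each a list of documents holding string values. 
It shows that per-segment counting has to map segment-local ordinals back to 
terms and merge them, and that an index with a single segment has nothing to 
merge across segments.

```python
from collections import Counter

def facet_top_level(docs):
    """Count term occurrences over one global view of all documents."""
    counts = Counter()
    for values in docs:
        counts.update(values)
    return counts

def facet_per_segment(segments):
    """Count inside each segment using segment-local ordinals, then merge."""
    merged = Counter()
    for segment in segments:
        # Each segment has its own sorted term dictionary, so the same term
        # can have a different ordinal in different segments.
        terms = sorted({value for doc in segment for value in doc})
        ord_of = {term: i for i, term in enumerate(terms)}
        local_counts = [0] * len(terms)
        for doc in segment:
            for value in doc:
                local_counts[ord_of[value]] += 1
        # Merge step: map local ordinals back to terms and add them up.
        # This extra work is repeated for every segment in the index.
        for local_ord, count in enumerate(local_counts):
            merged[terms[local_ord]] += count
    return merged

# Hypothetical toy index with two segments (assumption for illustration).
segments = [
    [["red", "blue"], ["blue"]],   # segment 0
    [["blue", "green"], []],       # segment 1; its second doc has no values
]
all_docs = [doc for segment in segments for doc in segment]

# Both approaches return the same counts; only the merge work differs.
assert facet_per_segment(segments) == facet_top_level(all_docs)
# A force-merged index is modelled as one segment holding every document.
assert facet_per_segment([all_docs]) == facet_top_level(all_docs)
```

The model says nothing about real timings; it only shows where the extra 
merge work comes from when an index has many segments.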

Please keep in mind that it took about half a year before anyone noticed a 
problem like this, which makes me think that only a few people are using those 
mostly-static indexes. 

*We should work on fixing this issue, not on accusing people, thanks!*


was (Author: thetaphi):
bq. Use of the highly optimized faceting that Solr had for multi-valued fields 
over relatively static indexes was secretly removed as part of LUCENE-5666, 
causing severe performance r

[jira] [Comment Edited] (SOLR-8096) Major faceting performance regressions

2015-09-25 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14907870#comment-14907870
 ] 

Uwe Schindler edited comment on SOLR-8096 at 9/25/15 9:49 AM:
--

bq. Use of the highly optimized faceting that Solr had for multi-valued fields 
over relatively static indexes was secretly removed as part of LUCENE-5666, 
causing severe performance regressions.

Hi, the removal was not "secret". The removal of FieldCache from Lucene (and 
its replacement by UninvertingReader) was discussed on the issue tracker, 
although interest from Solr people was small. I think this is the main issue 
here. Sometimes it would be good to have Solr committers take part in 
discussions on Lucene issues. If you want to make Solr better, you should also 
help make Lucene better!

The old field cache was also put into a separate module (with the new 
DocValues-emulating API), because we (Lucene committers) knew that Solr still 
uses it. Sure, we could have used UninvertingReader on top of 
SlowCompositeReaderWrapper, but this would have brought other slowness! So the 
committers decided to step forward and remove the top-level faceting (which 
was long overdue).

It was announced in several talks about Lucene 5 that FieldCache was removed 
and that all faceting in Solr was implicitly changed to use only per-segment 
field caches (e.g., see my talks at FOSDEM 2015, JAX 2015, or Berlin Buzzwords, 
around one of the last slides). Maybe a changes entry about this should also 
have been added to the Solr CHANGES.txt, but 

Here is the CHANGES.txt entry for this change; its first line mentions that 
faceting in Solr is involved. Any Solr committer could have looked into the 
code and raised complaints about those changes in the issue tracker, even 
after the commit was made:

{quote}
* LUCENE-5666: Change uninverted access (sorting, faceting, grouping, etc)
  to use the DocValues API instead of FieldCache. For FieldCache functionality,
  use UninvertingReader in lucene/misc (or implement your own FilterReader).
  UninvertingReader is more efficient: supports multi-valued numeric fields,
  detects when a multi-valued field is single-valued, reuses caches
  of compatible types (e.g. SORTED also supports BINARY and SORTED_SET access
  without insanity).  "Insanity" is no longer possible unless you explicitly
  want it. Rename FieldCache* and DocTermOrds* classes in the search package
  to DocValues*. Move SortedSetSortField to core and add SortedSetFieldSource
  to queries/, which takes the same selectors. Add helper methods to
  DocValues.java that are better suited for search code (never return null,
  etc).  (Mike McCandless, Robert Muir)
{quote}

So everybody was informed.

bq. The people who did this are elasticsearch employees. That is one way to 
deal with Solr's faster faceting!

This is speculation and really bad behaviour on an open source issue tracker. 
We should discuss technical matters here, not make assumptions about what 
people intend to do. This statement was posted by a person 
([~mmurphy3141]) whom I have never met in person, and who has seldom taken part 
in Lucene/Solr discussions at all. So I don't think we should count on that. It 
is also bad behaviour to accuse committers of sabotage on Twitter: 
https://twitter.com/mmurphy3141/status/647254551356162048; please don't do 
this. I would ask you to remove this tweet, thanks.

I was informed about the changes mentioned here and I strongly agree with the 
committers behind LUCENE-5666. I have always been in favour of removing those 
top-level faceting algorithms, so they still have my strong +1. Among my Solr 
customers, I have seen nobody complain about slow top-level faceting 
(because I told them a long time ago to stop using those outdated top-level 
algorithms if they have dynamic indexes).

The right thing for Solr people to do would be to remove that top-level stuff 
completely. It no longer fits the new reader structure (composite and 
atomic/leaf readers) of Lucene 3 (with API cleanups to better reflect the new 
structure in Lucene 4). Lucene 3 has been retired for several years already! So 
there was plenty of time to move Solr's faceting away from top-level. People 
with static indexes can still force merge their index and will get the same 
performance with the new algorithms.

Please keep in mind that it took about half a year before anyone noticed a 
problem like this, which makes me think that only a few people are using those 
mostly-static indexes. 

*We should work on fixing this issue, not on accusing people, thanks!*


was (Author: thetaphi):
bq. Use of the highly optimized faceting that Solr had for multi-valued fields 
over relatively static indexes was secretly removed as part of LUCENE-5666, 
causing severe performance regressions.

Hi, the removal was not "secret". Removal of FieldCache f

[jira] [Comment Edited] (SOLR-8096) Major faceting performance regressions

2015-09-24 Thread Mike Murphy (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14907515#comment-14907515
 ] 

Mike Murphy edited comment on SOLR-8096 at 9/25/15 3:38 AM:


The people who did this are elasticsearch employees.  That is one way to deal 
with Solr's faster faceting!
This smells like the VW pollution scandal for lucene/solr/elasticsearch, except 
perhaps no consequences for those who pulled it off?

Why are elasticsearch employees allowed to do this?


was (Author: mmurphy3141):
The people who did this are elasticsearch employees.  That is one way to deal 
with Solr's faster faceting!
This smells like the VW pollution scandal for lucene/solr/elasticsearch, except 
perhaps no consequences for those who pulled it off?

> Major faceting performance regressions
> --
>
> Key: SOLR-8096
> URL: https://issues.apache.org/jira/browse/SOLR-8096
> Project: Solr
> Issue Type: Bug
> Affects Versions: 5.0, 5.1, 5.2, 5.3, Trunk
> Reporter: Yonik Seeley
> Priority: Critical
>
> Use of the highly optimized faceting that Solr had for multi-valued fields 
> over relatively static indexes was *secretly removed* as part of LUCENE-5666, 
> causing severe performance regressions.
> Here are some quick benchmarks to gauge the damage, on a 5M document index, 
> with each field having between 0 and 5 values per document.  *Higher numbers 
> represent worse 5x performance*.
> Solr 5.4_dev faceting time as a percent of Solr 4.10.3 faceting time  
> ||...|| Percent of index being faceted
> ||num_unique_values|| 10% || 50% || 90% ||
> | 10      | 351.17% | 1587.08% | 3057.28% |
> | 100     | 158.10% | 203.61%  | 1421.93% |
> | 1000    | 143.78% | 168.01%  | 1325.87% |
> | 10000   | 137.98% | 175.31%  | 1233.97% |
> | 100000  | 142.98% | 159.42%  | 1252.45% |
> | 1000000 | 255.15% | 165.17%  | 1236.75% |
> For example, a field with 1000 unique values in the whole index, faceting 
> with 5x took 143% of the 4x time, when ~10% of the docs in the index were 
> faceted.
> One user who brought the performance problem to our attention: 
> http://markmail.org/message/ekmqh4ocbkwxv3we
> "faceting is unusable slow since upgrade to 5.3.0" (from 4.10.3)
> The disabling of the UnInvertedField algorithm was previously discovered in 
> SOLR-7190, but we didn't know just how bad the problem was at that time.
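
As a reading aid for the table in the issue description, each cell gives the 
5.x faceting time as a percent of the 4.x time, so dividing by 100 gives a 
slowdown multiplier. The short Python sketch below does just that arithmetic; 
the helper names are hypothetical and the two inputs are cells copied from the 
table.

```python
def slowdown_factor(percent_of_baseline: float) -> float:
    """Convert 'new time as a percent of old time' into a multiplier."""
    return percent_of_baseline / 100.0

def extra_time_percent(percent_of_baseline: float) -> float:
    """How much longer the new version takes, in percent of the old time."""
    return percent_of_baseline - 100.0

# 1000 unique values, ~10% of docs faceted: 143.78% of the 4.x time.
print(round(slowdown_factor(143.78), 2))     # about 1.44 times the 4.x time
print(round(extra_time_percent(143.78), 2))  # about 43.78% longer

# 10 unique values, ~90% of docs faceted: 3057.28% of the 4.x time.
print(round(slowdown_factor(3057.28), 2))    # about 30.57 times the 4.x time
```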


