[ 
https://issues.apache.org/jira/browse/SOLR-4496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13588172#comment-13588172
 ] 

Toke Eskildsen commented on SOLR-4496:
--------------------------------------

It looks to me that you're implementing true hierarchical faceting in Solr. Be 
aware that there are at least 3 existing attempts/solutions to this problem, 
namely SOLR-64 (looks abandoned), Lucene Faceting (not in Solr yet) and 
SOLR-2412 (my attempt).

There are different fundamental ways to go about this, where one is index-time 
(multi-value is a common challenge here) and one is search-time.

If you do it search-time, it can be done either by creating special purpose 
structures upon index-open (which increases warmup time and memory usage) or by 
doing a full run through the counter structures and perform the logic there 
(which increases response times and requires all potential field values to be 
resolved for each search). 

As far as I remember, Solr's internal facet counters maps to terms ordered by 
Unicode. This makes it possible to create an extension to the structure that 
keeps track of the hierarchy, which is fast to build and does not take up a lot 
of heap. I wrote a blot-post about it at 
https://sbdevel.wordpress.com/2010/10/05/fast-hierarchical-faceting/
                
> Support for faceting on the start of values
> -------------------------------------------
>
>                 Key: SOLR-4496
>                 URL: https://issues.apache.org/jira/browse/SOLR-4496
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>            Reporter: Teun Duynstee
>            Priority: Minor
>         Attachments: limitLength-limitDelim-1st.patch
>
>
> The SimpleFacets component supports the prefix parameter to return only 
> facets starting with that prefix. This feature should (IMO) be complemented 
> by two more parameters to make it much more usefull (names could be improved 
> on of course):
> - limitLength: will return facets for only the first x characters of the real 
> facets. If the real values are AAA, CC and CCC, the limitLength=1 parameter 
> would cause the facets A and C to be returned, with the sum of the counts. 
> This could typpically be used for a UI that allows you to select a first 
> letter for fields with many facets.
> - limitDelim: this would not truncate on a fixed length, but on the occurence 
> of a certain character after the prefix. This would allow the user to search 
> for hierarchical fields without having to resort to including each level of 
> the hierarchy at index analysis. This way, the value of the filed cat would 
> be 'Comics>Marvel>Batman' and this would be found using 
> prefix=Comics>&limitDelim=>. This would return the facet Marvel with the 
> combined count for all undelying cat values.
> I am working on a patch that would achieve this by postprocessing the 
> resulting counts in getTermCounts(). However, this will not return the 
> correct counts for multivalued fields. Also, the combination with field.limit 
> is not easy. Any tips for how to implement this? I'm available to work on a 
> patch. Or is it a bad idea anyway?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to