Re: hierarchical faceting, SOLR-792 - confused on config
On Wed, 2011-03-16 at 18:36 +0100, Erik Hatcher wrote:

Sorry, I missed the original mail on this thread. I put together that hierarchical faceting wiki page a couple of years ago when helping a customer evaluate SOLR-64 vs. SOLR-792 vs. other approaches. Since then, SOLR-792 morphed and is committed as pivot faceting. SOLR-64 spawned a PathTokenizer which is part of Solr now too. Recently Toke updated that page with some additional info. It's definitely not a how-to page, and perhaps should get renamed/moved/revamped? Toke?

Unfortunately or luckily, depending on one's point of view, I have been hit by a child #3 and house-buying combo. A lot of intentions, but no promises for the next month or two.

I think we need both an overview and a detailed how-to of the different angles on extended faceting in Solr, seen from a user perspective. I am not sure I fully understand the different methods myself, so maybe we could start by discussing them here? Below is a quick outline of how I see them; please expand and correct. I plan to back up the claims about scale later with a wiki page with performance tests.

http://www.lucidimagination.com/solutions/webcasts/faceting @27-33 min:
- Requires the user to transform the paths to multiple special terms
- Step-by-step drill down: If a visual tree is needed, it requires one call for each branch.
- Supports multiple paths/document
- Constraints on output work just as in standard faceting
- Scales very well when a single branch is requested
Example use case: Click-to-expand tree structure of categories for books.

PathHierarchyTokenizer (trunk):
Changes /A/B/C to /A, /A/B and /A/B/C. I don't know how this can be used directly for hierarchical faceting. The Lucid Imagination webcast uses the tokenization 0/A, 1/A/B and 2/A/B/C, so they seem incompatible to me. The discussion on SOLR-1057 indicates that it can be used with SOLR-64, but SOLR-64 does its own tokenization!? Little help here?

SOLR-64 (not up to date with trunk?):
- Uses a custom tokenizer to handle delimited paths (A/B/C).
- Single-path hierarchical faceting
- Constraints can be given on the depth of the hierarchy but not on the number of entries at a given level (huge result set when a wide hierarchy is analyzed)
- Fine (speed and memory) for small taxonomies
- Does not scale well (speed) to large taxonomies
Example use case: Tree structure of addresses for stores.

SOLR-792 aka pivot faceting (Solr 4.0):
- Uses multiple independent fields as input: Not suitable for taxonomies
- Multi-value but not multi-path
- Supports taxonomies by restricting to single-path/document(?)
- Constraints can be given on entry count, but sorting cannot be done on recursive counting of entries (and it would be very CPU expensive to do so(?))
- Fine (speed and memory) for small taxonomies
- Scales well (speed and memory) to large taxonomies
- Scales poorly (speed) to large taxonomies and large result size
Example use case: Tree structure with price, rating and stock.

SOLR-2412 (trunk, highly experimental):
- Multi-path hierarchical faceting
- Uses a field with delimited paths as input (A/B/C)
- Constraints can be given on depth as well as entry count, but sorting cannot be done on recursive counting of entries (the number is there though, so it would be fairly easy to add such a sorter)
- Fine (speed and memory) for small taxonomies
- Scales well (speed and memory) to large taxonomies and result size
Example use case: Tree structure of categories for books.
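To make the apparent mismatch between the two tokenizations concrete, here is a minimal Python sketch. It only imitates the token shapes described in this message (/A, /A/B, /A/B/C versus 0/A, 1/A/B, 2/A/B/C); it is not the Solr tokenizer, and both function names are made up for illustration.

```python
def path_hierarchy_tokens(path, delimiter="/"):
    """Imitate the output shape described for PathHierarchyTokenizer:
    /A/B/C -> /A, /A/B, /A/B/C (illustration only, not Solr code)."""
    # Keep a leading delimiter only if the input had one.
    prefix = delimiter if path.startswith(delimiter) else ""
    parts = [p for p in path.split(delimiter) if p]
    return [prefix + delimiter.join(parts[:i + 1]) for i in range(len(parts))]


def depth_prefixed_tokens(path, delimiter="/"):
    """Imitate the webcast-style terms: A/B/C -> 0/A, 1/A/B, 2/A/B/C."""
    parts = [p for p in path.split(delimiter) if p]
    return [
        f"{depth}{delimiter}{delimiter.join(parts[:depth + 1])}"
        for depth in range(len(parts))
    ]


if __name__ == "__main__":
    print(path_hierarchy_tokens("/A/B/C"))
    print(depth_prefixed_tokens("A/B/C"))
```

Both functions emit one term per ancestor; the only difference is the leading depth number, which is what lets a prefix match select exactly one level of the tree.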
Re: hierarchical faceting, SOLR-792 - confused on config
Yes, pivot faceting is committed to trunk, but it is not part of the upcoming 3.1 release.

Erik

On Mar 16, 2011, at 15:00 , McGibbney, Lewis John wrote:

Hi Erik, I have been reading about the progression of SOLR-792 into pivot faceting; however, can you comment on where it is committed? Are you referring to trunk? The reason I am asking is that I have been using 1.4.1 for some time now and have been thinking of upgrading to trunk... or branch.

Thank you
Lewis
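For readers unsure what pivot faceting computes: it nests the value counts of one field under each value of another, using independent fields rather than a delimited path. The following plain-Python illustration shows that idea only; it is not Solr code, and the documents, field names, and the `pivot_counts` helper are invented for the example.

```python
from collections import Counter, defaultdict


def pivot_counts(docs, outer, inner):
    """Count values of `inner` nested under each value of `outer`."""
    nested = defaultdict(Counter)
    for doc in docs:
        nested[doc[outer]][doc[inner]] += 1
    # Convert to plain dicts for easy printing and comparison.
    return {value: dict(counts) for value, counts in nested.items()}


# Hypothetical documents with two independent fields.
docs = [
    {"rating": 5, "in_stock": True},
    {"rating": 5, "in_stock": False},
    {"rating": 3, "in_stock": True},
    {"rating": 5, "in_stock": True},
]

result = pivot_counts(docs, "rating", "in_stock")
```

In Solr the request side of this is, to my knowledge, the `facet.pivot` parameter with a comma-separated list of field names; check the documentation of the version you run before relying on it.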
Re: hierarchical faceting, SOLR-792 - confused on config
On Mar 16, 2011, at 14:53 , Jonathan Rochkind wrote:

Interesting, any documentation on the PathTokenizer anywhere? Or just have to find and look at the source? That's something I hadn't known about, which may be useful to some stuff I've been working on depending on how it works.

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PathHierarchyTokenizerFactory

Sorry, I said PathTokenizer, which is what SOLR-1057 called it for a bit before it got renamed.

Erik
RE: hierarchical faceting, SOLR-792 - confused on config
Hi,

This is also where I am having problems. I have not been able to understand very much on the wiki. I do not understand how to configure the faceting we are referring to. Although I know very little about this, I can't help but think that the wiki is quite clearly inaccurate in some way! Any comments, please.

Lewis

From: kmf [kfole...@gmail.com]
Sent: 23 February 2011 17:10
To: solr-user@lucene.apache.org
Subject: Re: hierarchical faceting, SOLR-792 - confused on config

I'm really confused now. Is this page completely out of date - http://wiki.apache.org/solr/HierarchicalFaceting - as it seems to imply that solr-792 is a form of hierarchical faceting. There are currently two similar, non-competing, approaches to generating tree/hierarchical facets from Solr: SOLR-64 and SOLR-792

To achieve hierarchical faceting, is the rule then that you form the hierarchical facets using a transformer in the DIH and do nothing in schema.xml or solrconfig.xml? I seem to recall reading somewhere that creating a copyField is needed. Sorry for the entry level question but, I'm still trying to understand how to configure solr to do hierarchical faceting.

Thanks,
kmf
Re: hierarchical faceting, SOLR-792 - confused on config
Sorry, I missed the original mail on this thread. I put together that hierarchical faceting wiki page a couple of years ago when helping a customer evaluate SOLR-64 vs. SOLR-792 vs. other approaches. Since then, SOLR-792 morphed and is committed as pivot faceting. SOLR-64 spawned a PathTokenizer which is part of Solr now too. Recently Toke updated that page with some additional info. It's definitely not a how-to page, and perhaps should get renamed/moved/revamped? Toke?

Erik

On Mar 16, 2011, at 12:39 , McGibbney, Lewis John wrote:

Hi, This is also where I am having problems. I have not been able to understand very much on the wiki. I do not understand how to configure the faceting we are referring to. Although I know very little about this, I can't help but think that the wiki is quite clearly inaccurate in some way! Any comments, please.

Lewis
Re: hierarchical faceting, SOLR-792 - confused on config
Interesting, any documentation on the PathTokenizer anywhere? Or just have to find and look at the source? That's something I hadn't known about, which may be useful to some stuff I've been working on depending on how it works.

If nothing else, in the meantime, I'm going to take that exact message from Erik and just add it to the top of the wiki page, to avoid other people getting confused (I've been confused by that page too) until someone spends the time to rewrite it to be more up to date and accurate, or clear about its topicality.

On 3/16/2011 1:36 PM, Erik Hatcher wrote:

Sorry, I missed the original mail on this thread. I put together that hierarchical faceting wiki page a couple of years ago when helping a customer evaluate SOLR-64 vs. SOLR-792 vs. other approaches. Since then, SOLR-792 morphed and is committed as pivot faceting. SOLR-64 spawned a PathTokenizer which is part of Solr now too. Recently Toke updated that page with some additional info. It's definitely not a how-to page, and perhaps should get renamed/moved/revamped? Toke?

Erik
RE: hierarchical faceting, SOLR-792 - confused on config
Hi Erik,

I have been reading about the progression of SOLR-792 into pivot faceting; however, can you comment on where it is committed? Are you referring to trunk? The reason I am asking is that I have been using 1.4.1 for some time now and have been thinking of upgrading to trunk... or branch.

Thank you
Lewis

From: Erik Hatcher [erik.hatc...@gmail.com]
Sent: 16 March 2011 17:36
To: solr-user@lucene.apache.org
Subject: Re: hierarchical faceting, SOLR-792 - confused on config

Sorry, I missed the original mail on this thread. I put together that hierarchical faceting wiki page a couple of years ago when helping a customer evaluate SOLR-64 vs. SOLR-792 vs. other approaches. Since then, SOLR-792 morphed and is committed as pivot faceting. SOLR-64 spawned a PathTokenizer which is part of Solr now too. Recently Toke updated that page with some additional info. It's definitely not a how-to page, and perhaps should get renamed/moved/revamped? Toke?

Erik
Re: hierarchical faceting, SOLR-792 - confused on config
(11/03/17 3:53), Jonathan Rochkind wrote:

Interesting, any documentation on the PathTokenizer anywhere?

It is PathHierarchyTokenizer:

https://hudson.apache.org/hudson/job/Solr-trunk/javadoc/org/apache/solr/analysis/PathHierarchyTokenizerFactory.html

Koji
--
http://www.rondhuit.com/en/
Re: hierarchical faceting, SOLR-792 - confused on config
I'm really confused now. Is this page completely out of date - http://wiki.apache.org/solr/HierarchicalFaceting - as it seems to imply that SOLR-792 is a form of hierarchical faceting:

"There are currently two similar, non-competing, approaches to generating tree/hierarchical facets from Solr: SOLR-64 and SOLR-792"

To achieve hierarchical faceting, is the rule then that you form the hierarchical facets using a transformer in the DIH and do nothing in schema.xml or solrconfig.xml? I seem to recall reading somewhere that creating a copyField is needed. Sorry for the entry-level question, but I'm still trying to understand how to configure Solr to do hierarchical faceting.

Thanks,
kmf

--
View this message in context: http://lucene.472066.n3.nabble.com/hierarchical-faceting-SOLR-792-confused-on-config-tp2556394p2561445.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: hierarchical faceting, SOLR-792 - confused on config
(11/02/23 8:26), kmf wrote:

I'm using solr 4.0 and trying to implement a hierarchical faceting example. The example I'm trying to implement is taken from the webcast Mastering the Power of Faceted Search. (http://www.lucidimagination.com/solutions/webcasts/faceting) Around minute 30, Chris Hostetter gives a very nice tips & tricks example he described as Taxonomy facets.

Where I'm confused is how to get the data indexed/organized into the taxonomy facets (0/NonFic, 1/NonFic/Law, 0/NonFic, 1/NonFic/Sci, 0/NonFic, 1/NonFic/Hist, 1/NonFic/Sci, 2/NonFic/Sci/Phys). Since I'm using DIH to import my data from a DB, do I create a TemplateTransformer to produce the indexed data? Do I have to do something special within schema.xml and/or solrconfig.xml? Once I figure out the correct config setup, I assume it's simply a matter of creating the correct solr query like he describes in the video?

Thanks,
kmf

kmf,

Disclaimer: I haven't seen the webcast yet.

First, SOLR-792 is not for hierarchical faceting. Please see SOLR-64.

Second, please take a look at PathHierarchyTokenizer in trunk and 3x. It cannot output the depth factor (0/, 1/, ...), though.

Hmm, does anyone think it would be better if it output the depth factor to the token type, a payload, or somewhere else?

Koji
--
http://www.rondhuit.com/en/
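On the query side of kmf's question, the depth-prefixed terms quoted above (0/NonFic, 1/NonFic/Sci, ...) lend themselves to one-level-at-a-time drill-down with the standard `facet.field`, `facet.prefix` and `fq` request parameters. The Python sketch below only builds such a parameter list. The field name `category` is hypothetical, the helper name is made up, and it assumes the terms are indexed verbatim in a multi-valued, untokenized field; treat it as an outline rather than a tested recipe.

```python
from urllib.parse import urlencode


def drill_down_params(field, selected_path=None, delimiter="/"):
    """Build facet parameters for one level of a depth-prefixed taxonomy field.

    selected_path is the branch already chosen, e.g. "NonFic" or "NonFic/Sci";
    None means the top level is requested.
    """
    params = [("q", "*:*"), ("rows", "0"), ("facet", "true"), ("facet.field", field)]
    if selected_path is None:
        # Top level: only terms starting with "0/".
        params.append(("facet.prefix", f"0{delimiter}"))
    else:
        depth = selected_path.count(delimiter)  # "NonFic/Sci" -> depth 1
        # Restrict the result set to documents under the chosen branch ...
        params.append(("fq", f'{field}:"{depth}{delimiter}{selected_path}"'))
        # ... and list only its direct children.
        params.append(
            ("facet.prefix", f"{depth + 1}{delimiter}{selected_path}{delimiter}")
        )
    return params


# Example: the user clicked "NonFic" in a hypothetical "category" field.
query_string = urlencode(drill_down_params("category", "NonFic"))
```

Each expansion of a branch is then one request, which matches the "one call for each branch" cost noted elsewhere in this thread.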