Re: hierarchical faceting, SOLR-792 - confused on config

2011-03-17 Thread Toke Eskildsen
On Wed, 2011-03-16 at 18:36 +0100, Erik Hatcher wrote:
 Sorry, I missed the original mail on this thread
 
 I put together that hierarchical faceting wiki page a couple
 of years ago when helping a customer evaluate SOLR-64 vs.
 SOLR-792 vs. other approaches.  Since then, SOLR-792 morphed
 and is committed as pivot faceting.  SOLR-64 spawned a
 PathTokenizer which is part of Solr now too.
 
 Recently Toke updated that page with some additional info.
 It's definitely not a how-to page, and perhaps should get
 renamed/moved/revamped?  Toke?

Unfortunately or luckily, depending on one's point of view, I have been hit
by a child #3 and house-buying combo. A lot of intentions, but no promises
for the next month or two.


I think we need both an overview and a detailed how-to of the different
angles on extended faceting in Solr, seen from a user perspective.

I am not sure I fully understand the different methods myself, so maybe
we could start by discussing them here? Below is a quick outline of how
I see them; please expand & correct. I plan to back up the claims about
scale later with a wiki page with performance tests.


http://www.lucidimagination.com/solutions/webcasts/faceting @27-33 min:

- Requires the user to transform the paths to multiple special terms
- Step-by-step drill down: If a visual tree is needed, it requires one
  call for each branch.
- Supports multiple paths/document
- Constraints on output work just as in standard faceting
- Scales very well when a single branch is requested

Example use case:
Click-to-expand tree structure of categories for books.
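
A minimal sketch of the path transformation listed above, assuming
'/'-delimited category paths. The function names are hypothetical and not
part of Solr; the resulting terms would be sent to a multi-valued,
untokenized field at index time.

```python
def depth_prefixed_terms(path, delimiter="/"):
    """Expand 'A/B/C' into ['0/A', '1/A/B', '2/A/B/C']."""
    # Drop empty parts so a leading or trailing delimiter is harmless.
    parts = [p for p in path.split(delimiter) if p]
    return [
        f"{depth}{delimiter}{delimiter.join(parts[:depth + 1])}"
        for depth in range(len(parts))
    ]


def terms_for_document(paths, delimiter="/"):
    """Collect the terms for all paths of one document, without duplicates."""
    terms = []
    for path in paths:
        for term in depth_prefixed_terms(path, delimiter):
            if term not in terms:
                terms.append(term)
    return terms


if __name__ == "__main__":
    # A book filed under two categories (multiple paths per document).
    print(terms_for_document(["NonFic/Sci/Phys", "NonFic/Law"]))
```

The depth prefix is what makes the step-by-step drill down cheap: a request
for one level only has to look at terms starting with that depth number.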


PathHierarchyTokenizer (trunk):
Changes /A/B/C to /A, /A/B and /A/B/C.

I don't know how this can be used directly for hierarchical faceting.
The Lucid Imagination webcast uses the tokenization 0/A, 1/A/B and
2/A/B/C so they seem incompatible to me. The discussion on SOLR-1057
indicates that it can be used with SOLR-64, but SOLR-64 does its own
tokenization!?  Little help here?
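
To make the mismatch concrete, here is a small plain-Python emulation of
the tokenizer output described above. It is not the Lucene implementation,
and with_depth_prefix is a hypothetical helper: it only shows that the
depth number would have to be added outside the tokenizer, for example in
client code at index time.

```python
def path_hierarchy_tokens(path, delimiter="/"):
    """Emulate the described output: '/A/B/C' -> ['/A', '/A/B', '/A/B/C']."""
    tokens = []
    # Skip a leading delimiter so '/A/B' does not yield an empty first token.
    search_from = len(delimiter) if path.startswith(delimiter) else 0
    cut = path.find(delimiter, search_from)
    while cut != -1:
        tokens.append(path[:cut])
        cut = path.find(delimiter, cut + len(delimiter))
    # The full path is the last token, unless the input held no path parts.
    if path.strip(delimiter):
        tokens.append(path)
    return tokens


def with_depth_prefix(tokens, delimiter="/"):
    """Hypothetical post-processing: '/A/B' at index 1 becomes '1/A/B'."""
    return [
        f"{depth}{delimiter}{token.lstrip(delimiter)}"
        for depth, token in enumerate(tokens)
    ]


if __name__ == "__main__":
    tokens = path_hierarchy_tokens("/A/B/C")
    print(tokens)
    print(with_depth_prefix(tokens))
```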


SOLR-64 (not up to date with trunk?):

- Uses a custom tokenizer to handle delimited paths (A/B/C).
- Single-path hierarchical faceting
- Constraints can be given on the depth of the hierarchy but not on the 
  number of entries at a given level (huge result set when a wide 
  hierarchy is analyzed)
- Fine (speed & memory) for small taxonomies
- Does not scale well (speed) to large taxonomies

Example use case:
Tree structure of addresses for stores.


SOLR-792 aka pivot faceting (Solr 4.0):

- Uses multiple independent fields as input: Not suitable for taxonomies
- Multi-value but not multi-path
- Supports taxonomies by restricting to single-path/document(?)
- Constraints can be given on entry count, but sorting cannot be done
  on recursive counting of entries (and it would be very CPU expensive
  to do so(?))
- Fine (speed & memory) for small taxonomies
- Scales well (speed & memory) to large taxonomies
- Scales poorly (speed) to large taxonomies combined with large result size

Example use case:
Tree structure with price, rating and stock
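
A rough sketch of what a pivot request for this use case could look like,
built with the Python standard library. facet.pivot and
facet.pivot.mincount are the request parameters of pivot faceting; the
field names price, rating and stock and the base URL are assumptions made
up for the example.

```python
from urllib.parse import urlencode


def pivot_facet_params(pivot_fields, query="*:*", min_count=1):
    """Build request parameters for one pivot over the given fields."""
    return [
        ("q", query),
        ("rows", 0),  # only the facet counts are of interest here
        ("facet", "true"),
        # One comma-separated list: each field is nested under the previous.
        ("facet.pivot", ",".join(pivot_fields)),
        ("facet.pivot.mincount", min_count),
    ]


def to_url(base_url, params):
    """Append URL-encoded parameters to a select handler URL."""
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical field names and host.
    params = pivot_facet_params(["price", "rating", "stock"])
    print(to_url("http://localhost:8983/solr/select", params))
```

Note that the nesting comes from the order of independent fields, not from
paths stored in a single field, which is why it fits this use case better
than a category taxonomy.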


SOLR-2412 (trunk, highly experimental):

- Multi-path hierarchical faceting
- Uses a field with delimited paths as input (A/B/C)
- Constraints can be given on depth as well as entry count, but sorting
  cannot be done on recursive counting of entries (the number is there 
  though, so it would be fairly easy to add such a sorter)
- Fine (speed & memory) for small taxonomies
- Scales well (speed & memory) to large taxonomies & result size

Example use case:
Tree structure of categories for books.
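
To illustrate what multi-path counting with depth and entry-count
constraints means, here is a conceptual sketch in plain Python. It is not
how SOLR-2412 is implemented; the function names and the sample data are
made up for illustration only.

```python
from collections import defaultdict


def count_paths(docs, delimiter="/", max_depth=None):
    """Count documents per path prefix.

    docs is an iterable of lists of delimited paths (several paths per
    document are allowed). Returns {tuple_of_path_parts: document_count}.
    """
    counts = defaultdict(int)
    for paths in docs:
        prefixes = set()  # a document counts once per prefix
        for path in paths:
            parts = tuple(p for p in path.split(delimiter) if p)
            limit = len(parts) if max_depth is None else min(len(parts), max_depth)
            for depth in range(1, limit + 1):
                prefixes.add(parts[:depth])
        for prefix in prefixes:
            counts[prefix] += 1
    return dict(counts)


def top_children(counts, parent=(), limit=10):
    """Return at most `limit` direct children of `parent`, highest count first."""
    children = [
        (prefix, count)
        for prefix, count in counts.items()
        if len(prefix) == len(parent) + 1 and prefix[:len(parent)] == parent
    ]
    children.sort(key=lambda item: (-item[1], item[0]))
    return children[:limit]


if __name__ == "__main__":
    docs = [["A/B/C", "A/D"], ["A/B"], ["E"]]
    counts = count_paths(docs)
    print(top_children(counts))
    print(top_children(counts, ("A",), limit=1))
```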



Re: hierarchical faceting, SOLR-792 - confused on config

2011-03-17 Thread Erik Hatcher
Yes, pivot faceting is committed to trunk, but it is not part of the upcoming
3.1 release.

Erik

On Mar 16, 2011, at 15:00 , McGibbney, Lewis John wrote:

 Hi Erik,
 
 I have been reading about the progression of SOLR-792 into pivot faceting;
 however, can you expand on where it is committed? Are you referring to trunk?
 The reason I am asking is that I have been using 1.4.1 for some time now and
 have been thinking of upgrading to trunk... or branch
 
 Thank you Lewis
 
 Glasgow Caledonian University is a registered Scottish charity, number 
 SC021474
 
 Winner: Times Higher Education’s Widening Participation Initiative of the 
 Year 2009 and Herald Society’s Education Initiative of the Year 2009.
 http://www.gcu.ac.uk/newsevents/news/bycategory/theuniversity/1/name,6219,en.html
 
 Winner: Times Higher Education’s Outstanding Support for Early Career 
 Researchers of the Year 2010, GCU as a lead with Universities Scotland 
 partners.
 http://www.gcu.ac.uk/newsevents/news/bycategory/theuniversity/1/name,15691,en.html



Re: hierarchical faceting, SOLR-792 - confused on config

2011-03-17 Thread Erik Hatcher

On Mar 16, 2011, at 14:53 , Jonathan Rochkind wrote:

 Interesting, any documentation on the PathTokenizer anywhere? Or just have to 
 find and look at the source? That's something I hadn't known about, which may 
 be useful to some stuff I've been working on depending on how it works.

  
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PathHierarchyTokenizerFactory

Sorry, I said PathTokenizer, which is what SOLR-1057 called it for a bit
before it got renamed.

Erik



RE: hierarchical faceting, SOLR-792 - confused on config

2011-03-16 Thread McGibbney, Lewis John
Hi,

This is also where I am having problems. I have not been able to understand 
very much on the wiki.
I do not understand how to configure the faceting we are referring to.
Although I know very little about this, I can't help but think that the wiki is 
quite clearly inaccurate in some way!

Any comments please
Lewis

From: kmf [kfole...@gmail.com]
Sent: 23 February 2011 17:10
To: solr-user@lucene.apache.org
Subject: Re: hierarchical faceting, SOLR-792 - confused on config

I'm really confused now.  Is this page completely out of date -
http://wiki.apache.org/solr/HierarchicalFaceting - as it seems to imply that
SOLR-792 is a form of hierarchical faceting? "There are currently two
similar, non-competing, approaches to generating tree/hierarchical facets
from Solr: SOLR-64 and SOLR-792"

To achieve hierarchical faceting, is the rule then that you form the
hierarchical facets using a transformer in the DIH and do nothing in
schema.xml or solrconfig.xml?   I seem to recall reading somewhere that
creating a copyField is needed.  Sorry for the entry-level question, but I'm
still trying to understand how to configure Solr to do hierarchical
faceting.

Thanks,
kmf
--
View this message in context: 
http://lucene.472066.n3.nabble.com/hierarchical-faceting-SOLR-792-confused-on-config-tp2556394p2561445.html
Sent from the Solr - User mailing list archive at Nabble.com.

Email has been scanned for viruses by Altman Technologies' email management 
service - www.altman.co.uk/emailsystems



Re: hierarchical faceting, SOLR-792 - confused on config

2011-03-16 Thread Erik Hatcher
Sorry, I missed the original mail on this thread

I put together that hierarchical faceting wiki page a couple of years ago when 
helping a customer evaluate SOLR-64 vs. SOLR-792 vs. other approaches.  Since 
then, SOLR-792 morphed and is committed as pivot faceting.  SOLR-64 spawned a 
PathTokenizer which is part of Solr now too.

Recently Toke updated that page with some additional info.  It's definitely not 
a how-to page, and perhaps should get renamed/moved/revamped?  Toke?

Erik

On Mar 16, 2011, at 12:39 , McGibbney, Lewis John wrote:

 Hi,
 
 This is also where I am having problems. I have not been able to understand 
 very much on the wiki.
 I do not understand how to configure the faceting we are referring to.
 Although I know very little about this, I can't help but think that the wiki 
 is quite clearly inaccurate in some way!
 
 Any comments please
 Lewis
 



Re: hierarchical faceting, SOLR-792 - confused on config

2011-03-16 Thread Jonathan Rochkind
Interesting, any documentation on the PathTokenizer anywhere? Or just 
have to find and look at the source? That's something I hadn't known 
about, which may be useful to some stuff I've been working on depending 
on how it works.


If nothing else, in the meantime, I'm going to take that exact message 
from Erik and just add it to the top of the wiki page, to avoid other 
people getting confused (I've been confused by that page too) until 
someone spends the time to rewrite it to be more up to date and 
accurate, or clear about its topicality.


On 3/16/2011 1:36 PM, Erik Hatcher wrote:

Sorry, I missed the original mail on this thread

I put together that hierarchical faceting wiki page a couple of years ago when 
helping a customer evaluate SOLR-64 vs. SOLR-792 vs. other approaches.  Since 
then, SOLR-792 morphed and is committed as pivot faceting.  SOLR-64 spawned a 
PathTokenizer which is part of Solr now too.

Recently Toke updated that page with some additional info.  It's definitely not a 
how-to page, and perhaps should get renamed/moved/revamped?  Toke?

Erik

On Mar 16, 2011, at 12:39 , McGibbney, Lewis John wrote:


Hi,

This is also where I am having problems. I have not been able to understand 
very much on the wiki.
I do not understand how to configure the faceting we are referring to.
Although I know very little about this, I can't help but think that the wiki is 
quite clearly inaccurate in some way!

Any comments please
Lewis





RE: hierarchical faceting, SOLR-792 - confused on config

2011-03-16 Thread McGibbney, Lewis John
Hi Erik,

I have been reading about the progression of SOLR-792 into pivot faceting;
however, can you expand on where it is committed? Are you referring to trunk?
The reason I am asking is that I have been using 1.4.1 for some time now and
have been thinking of upgrading to trunk... or branch

Thank you Lewis




Re: hierarchical faceting, SOLR-792 - confused on config

2011-03-16 Thread Koji Sekiguchi

(11/03/17 3:53), Jonathan Rochkind wrote:

Interesting, any documentation on the PathTokenizer anywhere?


It is PathHierarchyTokenizer:

https://hudson.apache.org/hudson/job/Solr-trunk/javadoc/org/apache/solr/analysis/PathHierarchyTokenizerFactory.html

Koji
--
http://www.rondhuit.com/en/


Re: hierarchical faceting, SOLR-792 - confused on config

2011-02-23 Thread kmf

I'm really confused now.  Is this page completely out of date -
http://wiki.apache.org/solr/HierarchicalFaceting - as it seems to imply that
SOLR-792 is a form of hierarchical faceting? "There are currently two
similar, non-competing, approaches to generating tree/hierarchical facets
from Solr: SOLR-64 and SOLR-792"

To achieve hierarchical faceting, is the rule then that you form the
hierarchical facets using a transformer in the DIH and do nothing in
schema.xml or solrconfig.xml?   I seem to recall reading somewhere that
creating a copyField is needed.  Sorry for the entry-level question, but I'm
still trying to understand how to configure Solr to do hierarchical
faceting.

Thanks,
kmf
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/hierarchical-faceting-SOLR-792-confused-on-config-tp2556394p2561445.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: hierarchical faceting, SOLR-792 - confused on config

2011-02-22 Thread Koji Sekiguchi

(11/02/23 8:26), kmf wrote:


I'm using Solr 4.0 and trying to implement a hierarchical faceting example.
The example I'm trying to implement is taken from the webcast "Mastering the
Power of Faceted Search"
(http://www.lucidimagination.com/solutions/webcasts/faceting).  Around minute
30, Chris Hostetter gives a very nice tips & tricks example he described
as "Taxonomy facets".  Where I'm confused is how to get the data
indexed/organized into the taxonomy facets (0/NonFic, 1/NonFic/Law,
0/NonFic, 1/NonFic/Sci, 0/NonFic, 1/NonFic/Hist, 1/NonFic/Sci,
2/NonFic/Sci/Phys).  Since I'm using DIH to import my data from a DB, do I
create a TemplateTransformer to produce the indexed data?  Do I have to do
something special within schema.xml and/or solrconfig.xml?

Once I figure out the correct config setup, I assume it's simply a matter of
creating the correct Solr query like he describes in the video?

Thanks,
kmf


kmf,

Disclaimer: I haven't seen the webcast yet.

First, SOLR-792 is not for hierarchical faceting. Please see SOLR-64.
Second, please take a look at PathHierarchyTokenizer in trunk and 3x.
It cannot output the depth factor (0/, 1/, ...), though.

Hmm, does everyone think it would be better if it output
the depth factors to the token type, a payload, or somewhere else?
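
As for the query side of the depth-prefixed terms in the question (0/NonFic,
1/NonFic/Sci, ...), a drill-down request could be sketched roughly like
this. fq, facet.field and facet.prefix are standard Solr request
parameters; the field name category and the helper function are
assumptions for illustration, and the field is assumed to be an
untokenized, multi-valued string field.

```python
def drill_down_params(field, selected_path=None, delimiter="/"):
    """Build request parameters listing the children of selected_path.

    selected_path is e.g. 'NonFic/Sci', or None for the top level.
    """
    params = [
        ("q", "*:*"),
        ("rows", 0),
        ("facet", "true"),
        ("facet.field", field),
    ]
    if not selected_path:
        # Top level: only terms with depth 0.
        params.append(("facet.prefix", f"0{delimiter}"))
        return params
    depth = len(selected_path.split(delimiter)) - 1
    # Restrict the result set to documents under the selected node ...
    params.append(("fq", f'{field}:"{depth}{delimiter}{selected_path}"'))
    # ... and list only terms one level deeper below that node.
    params.append(
        ("facet.prefix", f"{depth + 1}{delimiter}{selected_path}{delimiter}")
    )
    return params


if __name__ == "__main__":
    for name, value in drill_down_params("category", "NonFic"):
        print(f"{name}={value}")
```

One such request is needed per expanded branch, which matches the
step-by-step drill down from the webcast.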

Koji
--
http://www.rondhuit.com/en/