Re: Hierarchical faceting

2014-11-17 Thread Evan Pease
I'm looking to see if Solr has any in-built tokenizer that splits the
tokens and prepends them with the depth information. I'd like to avoid
building depth information into the field values if Solr already has
something that can be used.

So the goal is to find out the level of the tree for each category? You
could determine this in the UI by splitting the category facet value string
by the separator.

As you're aware, when you query a field indexed using
solr.PathHierarchyTokenizerFactory, you still get the full category path
back as a facet value.

For example, if a user navigates to Phy:
fq={!term f=category}NonFic/Sci/Phy

The facet values that are returned will look like this (made up counts):

<lst name="category">
  <int name="NonFic/Sci/Phy">10</int>
  <int name="NonFic/Sci/Phy/Quantum">10</int>
</lst>

You could find out the level by calling .split("/").length on each value.
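A minimal sketch of that split in Python, in case it helps. The function names depth and children_of are made up for illustration, and the sketch assumes the facet values have no leading or trailing separator:

```python
def depth(facet_value, sep="/"):
    """Return the 1-based level of a path-style facet value."""
    return len(facet_value.split(sep))


def children_of(selected, facet_values, sep="/"):
    """Keep only the facet values exactly one level below the selected path."""
    prefix = selected + sep
    wanted = depth(selected, sep) + 1
    return [
        value for value in facet_values
        if value.startswith(prefix) and depth(value, sep) == wanted
    ]


values = ["NonFic/Sci/Phy", "NonFic/Sci/Phy/Quantum"]
levels = {value: depth(value) for value in values}
next_level = children_of("NonFic/Sci/Phy", values)
```

The UI could use children_of to show only the next level of the tree after a click, while still sending the full path back in the fq.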

ECP

On Mon, Nov 17, 2014 at 9:25 PM, Jason Hellman jhellman.innov...@gmail.com
wrote:

 I realize you want to avoid putting depth details into the field values,
 but something has to imply the depth.  With that in mind, here is another
 approach, assuming you are chasing down a single branch of the tree and
 all of its sub-branches:

 Use dynamic fields
 Step from one level to the next with a simple increment
 Build the facet for the next level on the call
 The UI needs only know the current level

 The field name could look like this:

 step_fieldname_n

 With a dynamic field configuration of:

 step_*

 The content of the step_fieldname_n field would be either the string of
 the field value or the delimited path of the current level (as suited to
 taste).  Either way, the fieldType would most likely be string (or some
 variation thereof).

 The UI would then call:

 facet.field=step_fieldname_n+1

 The UI would also need to carry the n+1 into the fq of the link:

 fq=step_fieldname_n+1:facetvalue

 The trick of all of this is that you must build your index with the depth
 of your hierarchy in mind to place the values into the suitable fields.
 You could, of course, write an UpdateProcessor to accomplish this if that
 seems fitting.
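As a rough illustration of the indexing side, here is a sketch of how a client could split one category path into per-level fields before sending the document. The function name step_fields, the base name "category" and the choice to start numbering at 0 are all assumptions made for this example; a book in several categories would also need the dynamic field to be multiValued, which this sketch does not handle.

```python
def step_fields(path, base="category", delimiter="/", full_path=True):
    """Map each level of a category path to a step_<base>_<n> field.

    With full_path=True the value is the delimited path down to that level,
    otherwise it is just the name of the level itself.
    """
    parts = path.split(delimiter)
    fields = {}
    for level in range(len(parts)):
        if full_path:
            value = delimiter.join(parts[: level + 1])
        else:
            value = parts[level]
        fields["step_%s_%d" % (base, level)] = value
    return fields


doc_fields = step_fields("NonFic/Sci/Phy/Quantum")
```

With fields like these, a UI sitting at level n would request facet.field=step_category_(n+1) and filter on the value the user clicked.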

 Jason

  On Nov 17, 2014, at 12:22 PM, Alexandre Rafalovitch arafa...@gmail.com
 wrote:
 
  You might be able to stick in a couple of PatternReplaceFilterFactory
  in a row with regular expressions to catch different levels.
 
  Something like:
 
  <filter class="solr.PatternReplaceFilterFactory"
          pattern="^[^0-9][^/]+/[^/]+/[^/]+$" replacement="2$0" />
  <filter class="solr.PatternReplaceFilterFactory"
          pattern="^[^0-9][^/]+/[^/]+$" replacement="1$0" />
  ...
 
  I did not test this; you may need to escape some things or put explicit
  groups in there.
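One way to sanity-check a cascade like this outside Solr is to replay it with Python's re module. This is only a sketch: the patterns here use [^/]+ for every path segment so that multi-character names such as Sci match (an adjustment made for this example), Python spells the whole match as \g<0> where Java regex replacement uses $0, and prefix_depth is a made-up helper name. Python's regex engine is not identical to Java's, so it is no substitute for testing in the Solr analysis screen.

```python
import re

# Ordered (pattern, replacement) pairs: three-segment paths first, then
# two-segment paths. The leading [^0-9] stops a later rule from matching a
# token that an earlier rule has already prefixed with a digit.
RULES = [
    (re.compile(r"^[^0-9][^/]+/[^/]+/[^/]+$"), r"2\g<0>"),
    (re.compile(r"^[^0-9][^/]+/[^/]+$"), r"1\g<0>"),
]


def prefix_depth(token):
    """Apply every rule in order, as a chain of filters would."""
    for pattern, replacement in RULES:
        token = pattern.sub(replacement, token)
    return token


tokens = ["NonFic", "NonFic/Sci", "NonFic/Sci/Phy"]
prefixed = [prefix_depth(token) for token in tokens]
```

Single-segment tokens pass through unchanged with only these two rules, so a further rule would be needed to give the top level its own prefix.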
 
  Regards,
  Alex.
  P.S.
  http://www.solr-start.com/javadoc/solr-lucene/org/apache/lucene/analysis/pattern/PatternReplaceFilterFactory.html
 
 
 
  On 17 November 2014 15:01, rashmy1 rashmy.appanerava...@siemens.com
 wrote:
  Hi Alexandre,
  Yes, I've read this post and that's the 'Option1' listed in my initial
  post.
 
  I'm looking to see if Solr has any in-built tokenizer that splits the
  tokens and prepends them with the depth information. I'd like to avoid
  building depth information into the field values if Solr already has
  something that can be used.
 
  Thanks!
 
 
 
  --
  View this message in context:
 http://lucene.472066.n3.nabble.com/Hierarchical-faceting-tp4169263p4169536.html
  Sent from the Solr - User mailing list archive at Nabble.com.




Re: Hierarchical faceting

2014-11-14 Thread Evan Pease
Hi Rashmi,

Here are some more details on how to use the PathHierarchyTokenizer that
Oleg linked to.

If this is your document:

 *Sample document*
 <doc>
 name=Pbook1
 category=NonFic/Sci/Phy/Quantum
 author=ABC
 price=20.00
 </doc>

Then, in your schema.xml:

<field name="category" type="tree" indexed="true" stored="true"
       multiValued="true"/>
<fieldType name="tree" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="/" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory" />
  </analyzer>
</fieldType>

Then, in your Solr query, you can simply add:

facet=true
facet.field=category

You should see a facet that contains each level of the taxonomy with counts.
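To make it clearer why every level shows up, here is a small Python sketch of the terms the index-time tokenizer produces for one path: each prefix of the path becomes its own indexed term. The function path_prefixes is only an illustration of that behaviour, not Solr code, and it assumes the path has no leading delimiter.

```python
def path_prefixes(path, delimiter="/"):
    """List every ancestor path of a value, ending with the value itself."""
    parts = path.split(delimiter)
    return [delimiter.join(parts[:i]) for i in range(1, len(parts) + 1)]


terms = path_prefixes("NonFic/Sci/Phy/Quantum")
```

Because the query analyzer uses KeywordTokenizerFactory, a filter value such as NonFic/Sci is kept whole and matches the corresponding indexed prefix exactly.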

To navigate the taxonomy, you add filter queries using the part of the path
you want to narrow the results down to (values from the category facet).

So, for example a user clicks on NonFic

facet=true
facet.field=category
fq={!term f=category}NonFic

Then NonFic/Sci

fq={!term f=category}NonFic/Sci

Then NonFic/Sci/Phy

fq={!term f=category}NonFic/Sci/Phy

etc..
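The requests above could be assembled by a client along these lines. This is a sketch under assumptions: the helper name drilldown_params is made up, and q=*:* is added only so the request matches all documents before filtering.

```python
from urllib.parse import urlencode


def drilldown_params(selected_path=None, field="category"):
    """Build the Solr request parameters for one step of the drill-down."""
    params = [("q", "*:*"), ("facet", "true"), ("facet.field", field)]
    if selected_path:
        # The term query parser matches the raw value against the indexed term.
        params.append(("fq", "{!term f=%s}%s" % (field, selected_path)))
    return params


# urlencode percent-escapes the braces, spaces and slashes for use in a URL.
query_string = urlencode(drilldown_params("NonFic/Sci"))
```

Passing the parameters as a list of pairs keeps the door open for several fq entries if other filters are added later.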

If you only want to display the leaf level category and indent child
categories you can easily do this in your UI by splitting the facet value
on your separator, / in this case.
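For instance, a UI helper could turn each facet value into an indented leaf label roughly like this. It is a sketch only; indented_label and the two-space indent are arbitrary choices for the example.

```python
def indented_label(facet_value, sep="/", indent="  "):
    """Show only the last path segment, indented once per ancestor level."""
    parts = facet_value.split(sep)
    return indent * (len(parts) - 1) + parts[-1]


labels = [
    indented_label(value)
    for value in ["NonFic", "NonFic/Sci", "NonFic/Sci/Phy"]
]
```

The full facet value should still be kept alongside the label, since that is what goes into the fq when the user clicks it.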


Thanks,
Evan



On Nov 14, 2014 8:06 PM, Oleg Savrasov osavra...@griddynamics.com wrote:

 Hi Rashmi,

 I believe you are looking for PathHierarchyTokenizer,
 see

 https://lucene.apache.org/core/4_0_0/analyzers-common/org/apache/lucene/analysis/path/PathHierarchyTokenizer.html

 Oleg

 2014-11-14 17:53 GMT-05:00 rashmy1 rashmy.appanerava...@siemens.com:

  Hello,
  I'm trying to set up Solr for fetching hierarchical facets.
  Please advise which of the approaches below should be followed for my
  scenario.
  *Scenario:*
  NonFic
    Hist
      HistBook1
      HistBook2
    Sci
      Phy
        Quantum
          Pbook1
          Pbook2
        Thermodynamics
          Pbook3
          Pbook4
      Chem
        Cbook1
      Math
        Mbook1
  Fic
    Mystery
      Mybook1
    Childrens
      Chbook1
      Chbook2
 
  *Sample document*
  <doc>
  name=Pbook1
  category=NonFic/Sci/Phy/Quantum
  author=ABC
  price=20.00
  </doc>
 
  *Requirements:*
  -Show drill down facets
  -If user searched for *, the initial set of facets to be shown are
  'NonFic' and 'Fic'
  -If user selects facet 'NonFic', we then show the facets 'Hist' and 'Sci'
  only.
 
  *Option1:*
  /Solr schema:/
  <field indexed="true" multiValued="true" name="category" required="true"
         stored="true" type="string"/>
  /Document supplied for indexing:/
  <doc>
  name=Pbook1
  category=0/NonFic
  category=1/NonFic/Sci
  category=2/NonFic/Sci/Phy
  category=3/NonFic/Sci/Phy/Quantum
  category=0/Other (a book can belong to multiple categories)
  author=ABC
  price=20.00
  </doc>
  With Option1, we can do a drill-down facet query.
  For example, if we give facet.prefix=2/NonFic/Sci/, the facet results are:
  2/NonFic/Sci/Phy
  2/NonFic/Sci/Chem
  2/NonFic/Sci/Math
  The only issue is that I have to take care of generating all possible path
  information for 'category'.
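If the depth-prefixed values do have to be produced on the client side, the generation itself is small. Here is a sketch in Python, with made-up helper names (depth_prefixed, next_level_prefix), that builds the Option1 values for one path and the facet.prefix that would list the children of a selected node. It assumes the depth counter starts at 0 for the top level, as in the sample document.

```python
def depth_prefixed(path, delimiter="/"):
    """Return one '<depth>/<ancestor path>' value per level of the path."""
    parts = path.split(delimiter)
    return [
        "%d%s%s" % (level, delimiter, delimiter.join(parts[: level + 1]))
        for level in range(len(parts))
    ]


def next_level_prefix(selected, delimiter="/"):
    """Build the facet.prefix that matches the direct children of a node."""
    child_depth = len(selected.split(delimiter))
    return "%d%s%s%s" % (child_depth, delimiter, selected, delimiter)


category_values = depth_prefixed("NonFic/Sci/Phy/Quantum")
prefix = next_level_prefix("NonFic/Sci")
```

Since the depth is part of the prefix, only values one level below the selected node can match, which is what keeps deeper descendants out of the facet list.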
 
  *Option2:*
  /Solr schema:/
  <fieldType class="solr.TextField" name="path">
    <analyzer type="index">
      <tokenizer class="solr.PathHierarchyTokenizerFactory"
                 delimiter="/"/>
    </analyzer>
  </fieldType>
  <field indexed="true" multiValued="true" name="category" required="true"
         stored="true" type="path"/>
  /Document supplied for indexing:/
  <doc>
  name=Pbook1
  category=NonFic/Sci/Phy/Quantum
  author=ABC
  price=20.00
  </doc>
  With Option2, we can do a facet query, but it returns all possible
  combinations of paths.
  For example, if we give facet.prefix=Fic, the facet results are:
  Fic (3)
  Fic/Mystery (1)
  Fic/Childrens (2)
 
 
  I'm looking to supply a doc with just a single entry (like
  'category=NonFic/Sci/Phy/Quantum') and be able to do a drill-down query.
  Is there some existing Solr tokenizer which takes care of generating all
  possible combinations while indexing, instead of having to generate them
  as part of doc creation?
 
  Thanks
 
 
 
 
 
  --
  View this message in context:
  http://lucene.472066.n3.nabble.com/Hierarchical-faceting-tp4169263.html