All facet.fields for a given facet.query?

2007-06-18 Thread James Mead

Thanks for a great project.

Is it possible to request all facet.fields for a given facet.query instead
of having to request specific facet.fields? e.g. is there a wildcard for
facet.fields?
--
James.
http://blog.floehopper.org


Re: All facet.fields for a given facet.query?

2007-06-18 Thread Yonik Seeley

On 6/18/07, James Mead <[EMAIL PROTECTED]> wrote:

> Is it possible to request all facet.fields for a given facet.query instead
> of having to request specific facet.fields? e.g. is there a wildcard for
> facet.fields?


Not currently.
Can you elaborate on the problem you are trying to solve?  Are you
using dynamic fields and hence don't know the exact names of the
fields to facet on?

-Yonik


Re: All facet.fields for a given facet.query?

2007-06-19 Thread Thomas Traeger

Hi,

I'm also just at that point where I think I need a wildcard facet.field 
parameter (or someone points out another solution for my problem...). 
Here is my situation:


I have many products of different types with totally different
attributes. There are currently more than 300 attributes.
I use dynamic fields to import the attributes into solr without having
to define a specific field for each attribute. Now when I make a query I
would like to get back all facet.fields that are relevant for that query.


I think it would be really nice if I didn't have to know which facet
fields are there at query time, and could instead just import attributes into
dynamic fields, get the relevant facets back and decide in the frontend
which to display and how...


What do the experts think about this?

Tom


Re: All facet.fields for a given facet.query?

2007-06-19 Thread Chris Hostetter
: I have many products of different types with totally different
: attributes. There are currently more than 300 attributes
: I use dynamic fields to import the attributes into solr without having
: to define a specific field for each attribute. Now when I make a query I
: would like to get back all facet.fields that are relevant for that query.
:
: I think it would be really nice, if I don't have to know which facets
: fields are there at query time, instead just import attributes into

The problem is there may be lots of fields you index but don't want to
facet on (full text search fields) and Solr has no easy way of knowing the
difference between those and the fields you think it makes sense to facet
on ... even if a field does make sense to facet on some of the time, that
doesn't mean it makes sense all of the time (as you say "when I make a
query I would like to get back all facet.fields that are relevant for that
query") ... Solr has no way of knowing which fields make sense for that
query unless it tries them all (can be very expensive) or you tell it.

I solve this problem by having metadata stored in my index which tells
my custom request handler what fields to facet on for each category ...
but i've also got several thousand categories.  If you've got fewer than
100 categories, you could easily enumerate them all with default
facet.field params in your solrconfig using separate requesthandler
instances.

: What do the experts think about this?

you may want to read up on the past discussion of this in SOLR-247 ... in
particular note the link to the mail archive where there was additional
discussion about it as well.  Where we left things is that it
might make sense to support true globbing in both fl and facet.field, so
you can use naming conventions and say things like facet.field=facet_*
but that in general trying to do something like facet.field=* would be a
very bad idea even if it was supported.

http://issues.apache.org/jira/browse/SOLR-247
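
As long as facet.field takes no wildcards, a client that already knows its
field names can expand the naming convention itself and send one facet.field
parameter per match. A minimal sketch in Python, assuming the application
keeps its own list of field names (known_fields is hypothetical, e.g.
recorded at index time) and that the facet_ prefix is the convention in use:

```python
from fnmatch import fnmatchcase
from urllib.parse import urlencode


def build_facet_params(query, known_fields, pattern="facet_*", mincount=1):
    """Expand a glob pattern client-side into explicit facet.field params.

    known_fields is a hypothetical list maintained by the application;
    Solr itself is not asked to resolve the wildcard.
    """
    facet_fields = [f for f in known_fields if fnmatchcase(f, pattern)]
    params = [
        ("q", query),
        ("facet", "true"),
        ("facet.mincount", str(mincount)),
    ]
    # One facet.field parameter per matching field name.
    params += [("facet.field", f) for f in facet_fields]
    return urlencode(params)


if __name__ == "__main__":
    fields = ["title", "facet_color", "facet_megapixels"]
    print(build_facet_params("canon", fields))
```

This only moves the expansion to the client. Solr still has to compute counts
for every field listed, so the cost concern raised above applies unchanged.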


-Hoss



Re: All facet.fields for a given facet.query?

2007-06-19 Thread Martin Grotzke
On Tue, 2007-06-19 at 11:09 -0700, Chris Hostetter wrote:
> I solve this problem by having metadata stored in my index which tells
> my custom request handler what fields to facet on for each category ...
How do you define this metadata?

Cheers,
Martin


> but i've also got several thousand categories.  If you've got fewer than
> 100 categories, you could easily enumerate them all with default
> facet.field params in your solrconfig using separate requesthandler
> instances.
> 
> : What do the experts think about this?
> 
> you may want to read up on the past discussion of this in SOLR-247 ... in
> particular note the link to the mail archive where there was additional
> discussion about it as well.  Where we left things is that it
> might make sense to support true globbing in both fl and facet.field, so
> you can use naming conventions and say things like facet.field=facet_*
> but that in general trying to do something like facet.field=* would be a
> very bad idea even if it was supported.
> 
> http://issues.apache.org/jira/browse/SOLR-247
> 
> 
> -Hoss
> 
-- 
Martin Grotzke
http://www.javakaffee.de/blog/




Re: All facet.fields for a given facet.query?

2007-06-19 Thread Martin Grotzke
On Tue, 2007-06-19 at 19:16 +0200, Thomas Traeger wrote:
> Hi,
> 
> I'm also just at that point where I think I need a wildcard facet.field 
> parameter (or someone points out another solution for my problem...). 
> Here is my situation:
> 
> I have many products of different types with totally different 
> attributes. There are currently more than 300 attributes
> I use dynamic fields to import the attributes into solr without having 
> to define a specific field for each attribute. Now when I make a query I 
> would like to get back all facet.fields that are relevant for that query.
> 
> I think it would be really nice, if I don't have to know which facets 
> fields are there at query time, instead just import attributes into 
> dynamic fields, get the relevant facets back and decide in the frontend 
> which to display and how...
Do you really need all facets in the frontend?

Would it be a solution to have a facet ranking in the field definitions,
and then decide at query time, on which fields to facet on? This would
need an additional query parameter like facet.query.count.

E.g. if you have a query with q=foo+AND+prop1:bar+AND+prop2:baz
and you have fields
prop1 with facet-ranking 100
prop2 with facet-ranking 90
prop3 with facet-ranking 80
prop4 with facet-ranking 70
prop5 with facet-ranking 60

then you might decide not to facet on prop1 and prop2 as you have
already a constraint on it, but to facet on prop3 and prop4 if
facet.query.count is 2.
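
The selection described above can be sketched in a few lines of Python. This
is only an illustration of the idea: neither a per-field facet ranking nor a
facet.query.count parameter exists in Solr, and the regular expression is a
naive stand-in for real query parsing.

```python
import re


def pick_facet_fields(query, rankings, count):
    """Return the `count` highest-ranked fields not already constrained.

    rankings maps field name -> facet ranking (a hypothetical attribute).
    A field counts as constrained if it appears as `name:` in the query.
    """
    constrained = set(re.findall(r"(\w+):", query))
    candidates = [f for f in rankings if f not in constrained]
    candidates.sort(key=lambda f: rankings[f], reverse=True)
    return candidates[:count]


if __name__ == "__main__":
    rankings = {"prop1": 100, "prop2": 90, "prop3": 80, "prop4": 70, "prop5": 60}
    print(pick_facet_fields("foo+AND+prop1:bar+AND+prop2:baz", rankings, 2))
```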

Just thinking about that... :)

Cheers,
Martin


> 
> What do the experts think about this?
> 
> Tom
> 
-- 
Martin Grotzke
http://www.javakaffee.de/blog/




Re: All facet.fields for a given facet.query?

2007-06-20 Thread Thomas Traeger

first: sorry for the bad quoting, I found your message in the archive only...


> : I have many products of different types with totally different
> : attributes. There are currently more than 300 attributes
> : I use dynamic fields to import the attributes into solr without having
> : to define a specific field for each attribute. Now when I make a query I
> : would like to get back all facet.fields that are relevant for that query.
> : I think it would be really nice, if I don't have to know which facets
> : fields are there at query time, instead just import attributes into
>
> The problem is there may be lots of fields you index but don't want to
> facet on (full text search fields) and Solr has no easy way of knowing the
> difference between those and the fields you think it makes sense to facet
> on ... even if a field does make sense to facet on some of the time, that
> doesn't mean it makes sense all of the time (as you say "when I make a
> query I would like to get back all facet.fields that are relevant for that
> query") ... Solr has no way of knowing which fields make sense for that
> query unless it tries them all (can be very expensive) or you tell it.
>
> I solve this problem by having metadata stored in my index which tells
> my custom request handler what fields to facet on for each category ...
> but i've also got several thousand categories.  If you've got fewer than
> 100 categories, you could easily enumerate them all with default
> facet.field params in your solrconfig using separate requesthandler
> instances.
>
> : What do the experts think about this?
>
> you may want to read up on the past discussion of this in SOLR-247 ... in
> particular note the link to the mail archive where there was additional
> discussion about it as well.  Where we left things is that it
> might make sense to support true globbing in both fl and facet.field, so
> you can use naming conventions and say things like facet.field=facet_*
> but that in general trying to do something like facet.field=* would be a
> very bad idea even if it was supported.
>
> http://issues.apache.org/jira/browse/SOLR-247



to make it clear, i agree that it doesn't make sense faceting on all available 
fields, I only want faceting on those 300 attributes that are stored together 
with the fields for full text searches. A product/document has typically only 
5-10 attributes.

I like to decide at index time which attributes of a product might be of 
interest for faceting and store those in dynamic fields with the attribute-name 
and some kind of prefix or suffix to identify them at query time as 
facet.fields. Exactly the naming convention you mentioned.

I will have a closer look at SOLR-247 and the supplied patch, seems like a good 
starting point to dig deeper into solr... :o)

Tom




Re: All facet.fields for a given facet.query?

2007-06-20 Thread Thomas Traeger

Martin Grotzke wrote:
> On Tue, 2007-06-19 at 19:16 +0200, Thomas Traeger wrote:
>> Hi,
>>
>> I'm also just at that point where I think I need a wildcard facet.field
>> parameter (or someone points out another solution for my problem...).
>> Here is my situation:
>>
>> I have many products of different types with totally different
>> attributes. There are currently more than 300 attributes
>> I use dynamic fields to import the attributes into solr without having
>> to define a specific field for each attribute. Now when I make a query I
>> would like to get back all facet.fields that are relevant for that query.
>>
>> I think it would be really nice, if I don't have to know which facets
>> fields are there at query time, instead just import attributes into
>> dynamic fields, get the relevant facets back and decide in the frontend
>> which to display and how...
>
> Do you really need all facets in the frontend?

no, only the subset with matches for the current query.

> Would it be a solution to have a facet ranking in the field definitions,
> and then decide at query time, on which fields to facet on? This would
> need an additional query parameter like facet.query.count.
>
> E.g. if you have a query with q=foo+AND+prop1:bar+AND+prop2:baz
> and you have fields
> prop1 with facet-ranking 100
> prop2 with facet-ranking 90
> prop3 with facet-ranking 80
> prop4 with facet-ranking 70
> prop5 with facet-ranking 60
>
> then you might decide not to facet on prop1 and prop2 as you have
> already a constraint on it, but to facet on prop3 and prop4 if
> facet.query.count is 2.
>
> Just thinking about that... :)
>
> Cheers,
> Martin
  
One step after the other ;o), the ranking of the facets will be another 
problem I have to solve, counts of facets and matching documents will be 
a starting point. Another idea is to use the score of the documents 
returned by the query to compute a score for the facet.field...


Tom


Re: All facet.fields for a given facet.query?

2007-06-20 Thread Martin Grotzke
On Wed, 2007-06-20 at 12:59 +0200, Thomas Traeger wrote:
> Martin Grotzke wrote:
> > On Tue, 2007-06-19 at 19:16 +0200, Thomas Traeger wrote:
[...]
> >> I think it would be really nice, if I don't have to know which facets 
> >> fields are there at query time, instead just import attributes into 
> >> dynamic fields, get the relevant facets back and decide in the frontend 
> >> which to display and how...
> >> 
> > Do you really need all facets in the frontend?
> >   
> no, only the subset with matches for the current query.
ok, that's somehow similar to our requirement, but we want to get only
e.g. the first 5 relevant facets back from solr and not handle this
in the frontend.

> > Would it be a solution to have a facet ranking in the field definitions,
> > and then decide at query time, on which fields to facet on? This would
> > need an additional query parameter like facet.query.count.
[...]
> >   
> One step after the other ;o), the ranking of the facets will be another 
> problem I have to solve, counts of facets and matching documents will be 
> a starting point. Another idea is to use the score of the documents 
> returned by the query to compute a score for the facet.field...
Yep, this is also different for different applications.

I'm also interested in this problem and would like to help solve
it (though I'm really new to lucene and solr)...

Cheers,
Martin


> 
> Tom
> 
-- 
Martin Grotzke
http://www.javakaffee.de/blog/




Re: All facet.fields for a given facet.query?

2007-06-20 Thread Chris Hostetter

: > I solve this problem by having metadata stored in my index which tells
: > my custom request handler what fields to facet on for each category ...
: How do you define this metadata?

this might be a good place to start, note that this message is almost two
years old, and predates the open-sourcing of Solr ... the "Servlet" referred
to in this thread is Solr.

http://www.nabble.com/Announcement%3A-Lucene-powering-CNET.com-Product-Category-Listings-p748420.html

...i think i also talked a bit about the metadata documents in my
apachecon slides from last year ... but i don't really remember, and i
haven't looked at them in a while...

http://people.apache.org/~hossman/apachecon2006us/


-Hoss



Re: All facet.fields for a given facet.query?

2007-06-20 Thread Chris Hostetter
: to make it clear, i agree that it doesn't make sense faceting on all
: available fields, I only want faceting on those 300 attributes that are
: stored together with the fields for full text searches. A
: product/document has typically only 5-10 attributes.
:
: I like to decide at index time which attributes of a product might be of
: interest for faceting and store those in dynamic fields with the
: attribute-name and some kind of prefix or suffix to identify them at
: query time as facet.fields. Exactly the naming convention you mentioned.

but if the facet fields are different for every document, and they use a
simple dynamicField prefix (like "facet_*" for example) how do you know at
query time which fields to facet on? ... even if wildcards work in
facet.field, using facet.field=facet_* would require solr to compute the
counts for *every* field matching that pattern to find out which ones have
positive counts for the current result set -- there may only be 5 that
actually matter, but it's got to try all 300 of them to find out which 5
that is.

this is where custom request handlers that understand the faceting
"metadata" for your documents become key ... so you can say "when
querying across the entire collection, only try to facet on category and
manufacturer.  if the search is constrained by category, then lookup other
facet options to offer based on that category name from our metadata
store," etc...
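
The lookup logic described above can be sketched in Python as follows. This
is not the custom request handler from this thread, only an illustration: the
category name and the per-category field lists are invented, and a real
handler would read them from metadata stored in the index.

```python
# Facets offered when the search is not constrained by category.
DEFAULT_FACETS = ["category", "manufacturer"]

# Hypothetical metadata store: category name -> extra fields worth faceting on.
CATEGORY_FACETS = {
    "digital_cameras": ["megapixels", "optical_zoom"],
}


def facets_for(category=None):
    """Pick the facet fields to request for an optional category constraint."""
    if category is None:
        return list(DEFAULT_FACETS)
    # Fall back to the defaults for categories without metadata.
    return list(CATEGORY_FACETS.get(category, DEFAULT_FACETS))


if __name__ == "__main__":
    print(facets_for())
    print(facets_for("digital_cameras"))
```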



-Hoss



Re: All facet.fields for a given facet.query?

2007-06-20 Thread Thomas Traeger

Chris Hostetter wrote:
> : to make it clear, i agree that it doesn't make sense faceting on all
> : available fields, I only want faceting on those 300 attributes that are
> : stored together with the fields for full text searches. A
> : product/document has typically only 5-10 attributes.
> :
> : I like to decide at index time which attributes of a product might be of
> : interest for faceting and store those in dynamic fields with the
> : attribute-name and some kind of prefix or suffix to identify them at
> : query time as facet.fields. Exactly the naming convention you mentioned.
>
> but if the facet fields are different for every document, and they use a
> simple dynamicField prefix (like "facet_*" for example) how do you know at
> query time which fields to facet on? ... even if wildcards work in
> facet.field, using facet.field=facet_* would require solr to compute the
> counts for *every* field matching that pattern to find out which ones have
> positive counts for the current result set -- there may only be 5 that
> actually matter, but it's got to try all 300 of them to find out which 5
> that is.

I just made a quick test by building a facet query with those 300 attributes.
I realized that the facets are built out of the whole index, not the subset
returned by the initial query. Therefore I have a large number of empty
facets which I simply ignore. In my case the QueryTime is somewhat higher (of
course) but it is still at some milliseconds. (wow!!!) :o)

So at this state of my investigation and in my use case I don't have to worry
about performance even if I use the system in a way that uses more resources
than necessary.

> this is where custom request handlers that understand the faceting
> "metadata" for your documents become key ... so you can say "when
> querying across the entire collection, only try to facet on category and
> manufacturer.  if the search is constrained by category, then lookup other
> facet options to offer based on that category name from our metadata
> store," etc...

Faceting on manufacturers and categories first and then present the
corresponding facets might be used under some circumstances, but in my case
the category structure is quite deep, detailed and complex. So when
the user enters a query I like to say to him "Look, here are the
manufacturers and categories with matches to your query, choose one if you
want, but maybe there is another one with products that better fit your
needs or products that you didn't even know about. So maybe you like to
filter based on the following attributes." Something like this ;o)

The point is, that i currently don't want to know too much about the data,
I just want to feed it into solr, follow some conventions and get the most
out of it as quickly as possible. Optimizations can and will take place at
a later time.

I hope to find some time to dig into solr SimpleFacets this weekend.

Regards,

Tom


Re: All facet.fields for a given facet.query?

2007-06-20 Thread Martin Grotzke
On Wed, 2007-06-20 at 12:49 -0700, Chris Hostetter wrote:
> : > I solve this problem by having metadata stored in my index which tells
> : > my custom request handler what fields to facet on for each category ...
> : How do you define this metadata?
> 
> this might be a good place to start, note that this message is almost two
> years old, and predates the open-sourcing of Solr ... the "Servlet" referred
> to in this thread is Solr.
> 
> http://www.nabble.com/Announcement%3A-Lucene-powering-CNET.com-Product-Category-Listings-p748420.html
> 
> ...i think i also talked a bit about the metadata documents in my
> apachecon slides from last year ... but i don't really remember, and i
> haven't looked at them in a while...
> 
> http://people.apache.org/~hossman/apachecon2006us/

thx, I'll have a look at these resources.

cheers,
martin


> 
> 
> -Hoss
> 





Re: All facet.fields for a given facet.query?

2007-06-20 Thread Chris Hostetter

: I realized that the facets are built out of the whole index, not the subset
: returned by the initial query. Therefore I have a large number of empty
: facets which I simply ignore. In my case the QueryTime is somewhat

facet.mincount is a way to tell solr not to bother giving you those 0
counts ... you will still get the name of the field though so that you
know it tried it.
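
On the client side, dropping the fields that came back empty takes only a few
lines. A minimal Python sketch, assuming the facet_fields section of the
response uses a flat [term, count, term, count, ...] list per field (to my
knowledge the default layout of Solr's JSON output, but check your response
writer settings); the sample data is invented:

```python
def non_empty_facets(facet_fields, mincount=1):
    """Keep only fields with at least one term whose count >= mincount.

    facet_fields maps field name -> flat [term, count, term, count, ...] list.
    """
    result = {}
    for field, flat in facet_fields.items():
        # Pair up the alternating terms and counts.
        pairs = zip(flat[0::2], flat[1::2])
        kept = {term: n for term, n in pairs if n >= mincount}
        if kept:
            result[field] = kept
    return result


if __name__ == "__main__":
    sample = {
        "facet_color": ["red", 3, "blue", 0],
        "facet_size": [],  # field was tried, nothing matched
    }
    print(non_empty_facets(sample))
```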

: Faceting on manufacturers and categories first and then present the
: corresponding facets might be used under some circumstances, but in my case
: the category structure is quite deep, detailed and complex. So when
: the user enters a query I like to say to him "Look, here are the
: manufacturers and categories with matches to your query, choose one if you
: want, but maybe there is another one with products that better fit your
: needs or products that you didn't even know about. So maybe you like to
: filter based on the following attributes." Something like this ;o)

categories was just an example i used because it tends to be a common use
case ... my point is the decision about which facet qualifies for the
"maybe there is another one with products that better fit your needs" part
of the response requires either computing counts for *every* facet
constraint and then looking at them to see which ones provide good
distribution, or knowing something more about your metadata (ie: having
stats that show the majority of people who search on the word "canon" want
to facet on "megapixels") ... this is where custom biz logic comes in,
because in a lot of situations computing counts for every possible facet
may not be practical (even if the syntax to request it was easier).


-Hoss



Re: All facet.fields for a given facet.query?

2007-06-20 Thread Yonik Seeley

On 6/20/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:

> facet.mincount is a way to tell solr not to bother giving you those 0
> counts ...


An aside: shouldn't that be the default?  All of the people using
facets that I have seen always have to set facet.mincount=1 (or
facet.zeros=false)

-Yonik


Re: All facet.fields for a given facet.query?

2007-06-21 Thread Thomas Traeger



> : Faceting on manufacturers and categories first and then present the
> : corresponding facets might be used under some circumstances, but in my case
> : the category structure is quite deep, detailed and complex. So when
> : the user enters a query I like to say to him "Look, here are the
> : manufacturers and categories with matches to your query, choose one if you
> : want, but maybe there is another one with products that better fit your
> : needs or products that you didn't even know about. So maybe you like to
> : filter based on the following attributes." Something like this ;o)
>
> categories was just an example i used because it tends to be a common use
> case ... my point is the decision about which facet qualifies for the
> "maybe there is another one with products that better fit your needs" part
> of the response requires either computing counts for *every* facet
> constraint and then looking at them to see which ones provide good
> distribution, or knowing something more about your metadata (ie: having
> stats that show the majority of people who search on the word "canon" want
> to facet on "megapixels") ... this is where custom biz logic comes in,
> because in a lot of situations computing counts for every possible facet
> may not be practical (even if the syntax to request it was easier).

I get your point, but how to know where additional metadata is of value if not
just trying? Currently I start with a generic approach to see what really is
in the product data, to get an overview of the quality of the data and
what happens if I use the data in the new search solution. Then I can decide
what to do to optimize the system, i.e. try to reduce the count of
attributes, get the marketing to split somewhat generic attributes into more
detailed ones, find a way to display the most relevant facets for the current
query first and so on...

Tom


Re: All facet.fields for a given facet.query?

2007-06-21 Thread Chris Hostetter

: I get your point, but how to know where additional metadata is of value
: if not just trying? Currently I start with a generic approach to see what

Man power.

for simple schemas the brute force "facet on everything" approach can scale
well ... but as soon as you start talking about having hundreds of dynamic
fields where every product might be different you have to either
accept that you're going to be fighting an uphill performance battle
-- or start explicitly classifying those fields in some way that lets you
know which ones to use in which use cases (or at the very least: which
order to try them in in which use cases so you can do the most important
ones first and stop when you have some options to give the user).
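
The "try them in order and stop early" idea can be sketched in Python like
this. It is only an illustration: count_values is a hypothetical callable
that returns how many facet values with a positive count a field has for the
current result set (one faceting request per field in a real system), and the
field order is assumed to come from whatever classification you maintain.

```python
def first_useful_facets(ordered_fields, count_values, wanted=5):
    """Walk fields in priority order, stopping once `wanted` useful ones exist.

    count_values(field) is a hypothetical callable returning the number of
    facet values with a positive count for the current result set.
    """
    useful = []
    for field in ordered_fields:
        if count_values(field) > 0:
            useful.append(field)
            if len(useful) >= wanted:
                break  # enough options to show; skip the remaining fields
    return useful


if __name__ == "__main__":
    fake_counts = {"megapixels": 4, "color": 0, "zoom": 2, "weight": 7}
    print(first_useful_facets(list(fake_counts), fake_counts.get, wanted=2))
```

The saving comes from the fields never tried, so it depends on how good the
ordering is.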

you can even use the brute force "facet on everything" approach in Solr to
help you find those patterns for classifying your fields ... you might
even be able to completely automate it ... but i'm guessing you're going
to want to do it in batch on the backend and not in real time every time a
user does a search.




-Hoss



Re: All facet.fields for a given facet.query?

2007-06-21 Thread Chris Hostetter

: > facet.mincount is a way to tell solr not to bother giving you those 0
: > counts ...
:
: An aside: shouldn't that be the default?  All of the people using
: facets that I have seen always have to set facet.mincount=1 (or
: facet.zeros=false)

Hmmm... maybe, but it's a really easy option to turn on, and i think if we
don't have facet.mincount default to 0 new users might get confused
when some constraints don't show up ... returning them with a 0 count
makes it clear Solr knows about them and tried them and found no
intersection with the current results.


-Hoss