Hi Jack,

> Are Lucene facets "static" in some/any sense?

Lucene facets are not static in any way. The taxonomy is built on the fly,
as documents are added to the index. You could say that it's 'discovered'
as you add documents.
The facets module comes with a rich user guide:
http://lucene.apache.org/core/4_0_0/facet/org/apache/lucene/facet/doc-files/userguide.html
I also wrote a few posts about it: http://shaierera.blogspot.com
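To make the 'discovered' part concrete, here is a minimal Java sketch. TinyTaxonomy is a hypothetical in-memory class made up for illustration; it is not the facet module's taxonomy API. The idea it shows: the first time a category path (or any of its ancestors) appears in a document, it gets the next free int ordinal. Nothing is declared up front.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory taxonomy, for illustration only (not Lucene's API).
class TinyTaxonomy {
    private final Map<String, Integer> pathToOrdinal = new HashMap<>();
    private final List<String> ordinalToPath = new ArrayList<>();
    private final List<Integer> parents = new ArrayList<>();

    TinyTaxonomy() {
        // In this sketch ordinal 0 is reserved for the root, which has no parent (-1).
        pathToOrdinal.put("", 0);
        ordinalToPath.add("");
        parents.add(-1);
    }

    // Returns the ordinal of a path like "Author/Bob", first adding the path
    // and any missing ancestors ("Author") if they were never seen before.
    int addCategory(String path) {
        Integer existing = pathToOrdinal.get(path);
        if (existing != null) {
            return existing;
        }
        int slash = path.lastIndexOf('/');
        // Resolve (and, if needed, create) the parent before assigning our own ordinal.
        int parent = slash < 0 ? 0 : addCategory(path.substring(0, slash));
        int ordinal = ordinalToPath.size();
        pathToOrdinal.put(path, ordinal);
        ordinalToPath.add(path);
        parents.add(parent);
        return ordinal;
    }

    String pathOf(int ordinal) { return ordinalToPath.get(ordinal); }

    int parentOf(int ordinal) { return parents.get(ordinal); }

    int size() { return ordinalToPath.size(); }

    public static void main(String[] args) {
        TinyTaxonomy taxo = new TinyTaxonomy();
        // Categories are discovered as "documents" arrive; nothing is declared up front.
        int bob = taxo.addCategory("Author/Bob");   // creates Author (1) and Author/Bob (2)
        int lisa = taxo.addCategory("Author/Lisa"); // creates Author/Lisa (3)
        int again = taxo.addCategory("Author/Bob"); // already known, returns 2
        System.out.println(bob + " " + lisa + " " + again + " size=" + taxo.size());
    }
}
```

Running main prints "2 3 2 size=4": the first call creates both Author and Author/Bob, the second adds only Author/Lisa, and the third just looks up the existing ordinal.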

> What decisions does an app developer need to make upfront

Well, as an app developer, you currently need to decide up front which
facets your documents will have. A document does not have to contain all
of the facets, but you cannot index Author as a regular field and later
say "hey, now I want to facet on it". The reason is that in order to facet
on it, the values that you put under Author need to be added to the
taxonomy and resolved to ordinals at indexing time. Those ordinals are
then written to the search index in a way that enables very fast and
efficient aggregation.
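Here is a rough Java sketch of why resolving categories to ordinals makes aggregation cheap. It assumes, hypothetically, that each document simply stores an int[] of its category ordinals; the facet module's actual index encoding and collector classes are different, and OrdinalCounting is a made-up name. Counting then amounts to incrementing slots of an int array indexed by ordinal. Doc 3 shows the flip side: a document indexed without Author ordinals contributes nothing to the Author counts.

```java
import java.util.Arrays;

// Hypothetical sketch of ordinal-based counting (not the facet module's API).
class OrdinalCounting {

    // Counts how often each ordinal occurs among the matching documents.
    static int[] count(int[][] ordinalsPerDoc, int[] matchingDocs, int taxonomySize) {
        int[] counts = new int[taxonomySize]; // one slot per ordinal: no hashing, no string compares
        for (int doc : matchingDocs) {
            for (int ord : ordinalsPerDoc[doc]) {
                counts[ord]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Toy ordinals: 0=root, 1=Author, 2=Author/Bob, 3=Author/Lisa.
        // In this sketch each document also lists its parent's ordinal (Author).
        int[][] ordinalsPerDoc = {
            {1, 2}, // doc 0: Author/Bob
            {1, 3}, // doc 1: Author/Lisa
            {1, 2}, // doc 2: Author/Bob
            {}      // doc 3: indexed without any Author ordinals
        };
        int[] hits = {0, 1, 2, 3};
        System.out.println(Arrays.toString(count(ordinalsPerDoc, hits, 4)));
    }
}
```

main prints [0, 3, 2, 1], i.e. Author=3, Author/Bob=2, Author/Lisa=1.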

Also, if you're going to do more than just counting (see my first post,
an intro to facets), you'll need to index the facets in a special way
(I intend to write a blog post about that too, w/ example code).
But I guess that's expected, right? You cannot add a 'price' field to
the index as String values and then suddenly expect efficient numeric
range queries on it. As an app developer, you recognize that when writing
your app and add the field as a numeric field.
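The price analogy in a few lines of Java: when values are compared as Strings, their order is lexicographic, so a range over String values can both miss matches and include wrong ones. StringVsNumericRange is a hypothetical helper that only compares plain strings and numbers; it does not use any Lucene query classes.

```java
// Hypothetical helpers contrasting String order with numeric order (no Lucene classes used).
class StringVsNumericRange {

    // Range check on raw String values: lexicographic, like comparing String terms.
    static boolean inStringRange(String value, String lower, String upper) {
        return value.compareTo(lower) >= 0 && value.compareTo(upper) <= 0;
    }

    // Range check on the parsed number: what a price range query actually means.
    static boolean inNumericRange(String value, long lower, long upper) {
        long v = Long.parseLong(value);
        return v >= lower && v <= upper;
    }

    public static void main(String[] args) {
        // Is a price of 15 in [5, 20]? As Strings, "15" sorts before "5", so it is missed.
        System.out.println(inStringRange("15", "5", "20") + " vs " + inNumericRange("15", 5, 20));
        // Is a price of 100 in [1, 2]? As Strings, "1" < "100" < "2", so it is wrongly included.
        System.out.println(inStringRange("100", "1", "2") + " vs " + inNumericRange("100", 1, 2));
    }
}
```

main prints "false vs true" and then "true vs false": the String comparison gets both cases wrong, which is why prices go into a numeric field.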

> and can only be changed with a full reindex of the data?

As with regular Lucene fields, if you suddenly decide to change your
taxonomy, e.g. category A/C now needs to be under A/B/C, then yes, you
will need to re-index the documents that were previously associated w/
A/C. But now that we're making progress w/ field-level updates (see
LUCENE-4258), perhaps in the future you won't need to.
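A toy Java model of why such a move forces a re-index. It assumes, for illustration only, that each document stores just the ordinal of its category, and it ignores ancestors for brevity; MoveCategory is a hypothetical class. Moving C under A/B produces a new path, A/B/C, with a new ordinal, while the documents already in the index still carry the ordinal of A/C.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy taxonomy: documents store ordinals, not paths (ancestors ignored).
class MoveCategory {
    private final Map<String, Integer> ordinals = new HashMap<>();

    // Returns the path's ordinal, assigning the next free one on first sight.
    int ordinal(String path) {
        Integer ord = ordinals.get(path);
        if (ord == null) {
            ord = ordinals.size();
            ordinals.put(path, ord);
        }
        return ord;
    }

    // Counts the documents whose stored ordinal is the ordinal of the given path.
    int countDocs(int[] storedOrdinals, String path) {
        Integer target = ordinals.get(path);
        if (target == null) {
            return 0; // unknown path: no document can point at it
        }
        int count = 0;
        for (int ord : storedOrdinals) {
            if (ord == target) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        MoveCategory taxo = new MoveCategory();
        // Two documents were indexed under A/C; the index keeps only that ordinal.
        int[] storedOrdinals = {taxo.ordinal("A/C"), taxo.ordinal("A/C")};

        // "Moving" C under A/B just introduces a new path with a new ordinal.
        taxo.ordinal("A/B/C");
        System.out.println(taxo.countDocs(storedOrdinals, "A/B/C")); // old docs are not found here
        System.out.println(taxo.countDocs(storedOrdinals, "A/C"));   // they still point at A/C

        // Re-indexing rewrites each document with the new path's ordinal.
        for (int i = 0; i < storedOrdinals.length; i++) {
            storedOrdinals[i] = taxo.ordinal("A/B/C");
        }
        System.out.println(taxo.countDocs(storedOrdinals, "A/B/C"));
    }
}
```

main prints 0, 2 and then 2: A/B/C finds nothing until the documents are rewritten with its ordinal.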

> I'm trying to get a handle on whether Lucene Facets is a guru-level
> feature...

Absolutely not! Lucene facets let you do very complicated things, but you
can also get started w/ a faceted index in, I'd say, less than 5 minutes.
Look at this post:
http://shaierera.blogspot.com/2012/11/lucene-facets-part-2.html
You can copy-paste the code (against current trunk) and get an impression
of what it's like to index facets w/ Lucene.
Also, Mike McCandless and I are now working on lots of simplifications,
including some specialized code paths for common use cases. You can follow
that work in LUCENE-4619.

> is it the kind of feature that is mainly of interest to the developers of
higher-level search platforms such as Solr and ElasticSearch as opposed to
the users of those platforms

Again, absolutely not! Well, it's true that to get the real value out of
faceted search you need at least a user interface that shows you the
returned facets, their weights, etc.
But there's nothing in the module that restricts you from working with it
as-is.

Hope I answered all your questions.

Shai


On Thu, Dec 13, 2012 at 4:28 PM, Jack Krupansky <j...@basetechnology.com> wrote:

> "the lucene module requires users to decide at indexing time what and how
> to facet whereas Solr does everything at searching time"
>
> It would be nice to have some confirmation/clarification of that - Are
> Lucene facets "static" in some/any sense? What decisions does an app
> developer need to make upfront and can only be changed with a full reindex
> of the data?
>
> I'm trying to get a handle on whether Lucene Facets is a guru-level
> feature or something that an average Lucene user can trivially master with
> say 5 minutes of reading. Or is it the kind of feature that is mainly of
> interest to the developers of higher-level search platforms such as Solr
> and ElasticSearch as opposed to the users of those platforms?
>
> -- Jack Krupansky
>
> -----Original Message----- From: Adrien Grand
> Sent: Thursday, December 13, 2012 7:03 AM
> To: dev@lucene.apache.org
> Subject: Re: Solr faceting vs. Lucene faceting
>
>
> Hi Shai,
>
> On Thu, Dec 13, 2012 at 12:21 PM, Shai Erera <ser...@gmail.com> wrote:
>
>> As I said, if someone volunteers to do some work on the Solr side, I will
>> gladly participate in that effort.
>> I just don't even know where to start w/ Solr :).
>>
>
> The entry point for Solr facets is
> org.apache.solr.request.SimpleFacets.getFacetCounts (called from
> FacetComponent).
>
>> One thing that would be really great is if we can build an adapter (I
>> think someone mentioned that word here) which supports basic facets
>> capabilities, so that we can at least benchmark Solr's current
>> implementation vs the implementation w/ the module.
>>
>
> Comparing both impls would be great but an adapter might be hard to
> write given how Lucene faceting differs from Solr faceting: the lucene
> module requires users to decide at indexing time what and how to facet
> whereas Solr does everything at searching time (there is even an issue
> open in order to be able to compute facet counts based on arbitrary
> functions [1]) using FieldCache and UninvertedField (meaning that you
> can compute facets on any field that is indexed). So Lucene faceting
> would probably require an additional field property in the schema to
> let Solr know that it should add category paths to documents? (Please
> correct me if anything I wrote here is wrong).
>
> I have a few questions regarding the faceting module:
> - do you have any rough idea of how speed and memory usage vary
> depending on the number of docs to collect, distinct field values,
> etc.?
> - TaxonomyReader seems to use ints as ordinals for category paths,
> does it mean that the faceting module can't handle paths that have
> more than 2B distinct values? Is it fixable? (Or maybe it doesn't make
> sense to handle such large numbers of distinct values?)
>
> [1] https://issues.apache.org/jira/browse/SOLR-1581
>
> --
> Adrien
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
