Re: faceting

Garrett Barton Thu, 27 Sep 2012 11:39:34 -0700

I too want similar functionality.  The first thing I would like to see is a
simple ordered list of all terms in a field, returned with counts.  I think
this would probably be enabled through the analyzer definition at index
creation time, so that someone has to consciously decide to take the
calculation hit up front instead of putting the load on the shard servers.
Also, isn't it faster right now to just execute additional queries and use
the hit counts than to load up a single query with the facets?
The second thing is not faceting directly; I just happen to be using it
with facets all the time.  I like to find the distinct values (and their
counts) of a field for a given query, for filtering.  Right now I plow
through some multiple of the results I return to try to get a mostly
complete list of terms, but this is obviously not the complete list.  Is
there a way to get that list, or an API call that would let me send that
query to the shards?
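
A minimal sketch of the client-side tally described above, for illustration
only.  It assumes the field values have already been pulled out of the
fetched results into a plain list; the class and method names
(FieldValueTally, tally) are hypothetical and not part of Blur.  Because it
only sees the rows that were fetched, the counts are a sample, not the
complete list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Blur: tallies the distinct values seen
// in one field across a batch of already-fetched results.
class FieldValueTally {

    // Returns (value, count) pairs ordered by count descending,
    // with ties broken alphabetically by value.
    static List<Map.Entry<String, Long>> tally(List<String> values) {
        Map<String, Long> counts = new HashMap<>();
        for (String value : values) {
            counts.merge(value, 1L, Long::sum);
        }
        List<Map.Entry<String, Long>> ordered = new ArrayList<>(counts.entrySet());
        ordered.sort((a, b) -> {
            int byCount = Long.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return ordered;
    }

    public static void main(String[] args) {
        // Assumed input: the values of one field from the fetched rows.
        List<String> sample = Arrays.asList("hadoop", "lucene", "hadoop", "blur");
        for (Map.Entry<String, Long> e : tally(sample)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```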


Thanks for listening!
Garrett

On Thursday, September 27, 2012, Aaron McCurry <[email protected]> wrote:
> Yep.  We can build it, but I think there need to be some limits placed
> on how many terms can be enumerated on.  I would hate to have someone
> pick a primary key field to enumerate on and blow up the server.  I think
> the easiest way to do it would be to expand the terms in the field on the
> shard server and run the current faceting query on those expanded terms.
> I think that is the easy part.  The hard part is going to be how we
> modify the facet api in thrift to accept the new facet type and how to
> return the facet results.  How would you want the result api to look?
>
> Aaron
>
> On Thu, Sep 27, 2012 at 1:27 PM, Tim Williams <[email protected]> wrote:
>
>> On Tue, Sep 18, 2012 at 10:42 AM, Aaron McCurry <[email protected]>
>> wrote:
>> > In the BlurQuery object, add Facet objects to the facet list, where
>> > each Facet object contains the query that you want to facet on.  For
>> > example:
>> >
>> > bq = new BlurQuery();
>> > // The long is the minimum number of results in the facet to return.
>> > // So if the value was set to 10, the facet object would stop counting
>> > // the facet at 10.  Note: It's very likely that you will get more than
>> > // your minimum back.
>> > bq.addFacet(new Facet("tweets.text:hadoop", Long.MAX_VALUE));
>> >
>> > results = client.query("table", bq);
>> > List<Long> counts = results.getFacetCounts();
>> > // The index of each result matches the index of the corresponding
>> > // facet object in the query.
>> > long hadoopCount = counts.get(0);
>> >
>> > Hope this helps, let me know if you have any more questions.
>>
>> Thanks it does.  I'm in need of the other kind of faceting, where a
>> facet is essentially the distinct values for a field relative to a
>> given query. Something like Solr's Enum-Based Field Faceting[1].  Any
>> pointers for how I could implement that inside Blur?  The only thing I
>> can come up with is outside Blur and seems inefficient: essentially,
>> record distinct values for the fields of interest at ingest time, then
>> use those values in Blur's existing facet query to get the counts.  I'm
>> guessing there's a better approach?
>>
>> Thanks,
>> --tim
>>
>> [1] - http://wiki.apache.org/solr/SolrFacetingOverview
>>
>
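
The term expansion discussed in the thread above (turn each distinct term
into one facet query, with a cap on how many terms may be enumerated) could
be sketched roughly as follows.  This is only an illustration under stated
assumptions: the class and method names (TermFacetExpansion, expand,
pairWithCounts) and the maxTerms parameter are hypothetical, terms are
assumed not to need query-syntax escaping, and the counts are assumed to
come back in the same order as the facets, as described for
getFacetCounts() in the quoted example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Blur: expands terms into facet query
// strings and maps the positional counts back onto the terms.
class TermFacetExpansion {

    // Builds one "field:term" query string per term. Refuses to expand
    // more than maxTerms terms, so a high-cardinality field (such as a
    // primary key) cannot generate an unbounded number of facets.
    static List<String> expand(String field, List<String> terms, int maxTerms) {
        if (terms.size() > maxTerms) {
            throw new IllegalArgumentException(
                "too many terms to enumerate: " + terms.size() + " > " + maxTerms);
        }
        List<String> queries = new ArrayList<>();
        for (String term : terms) {
            // Assumption: the term needs no escaping for the query syntax.
            queries.add(field + ":" + term);
        }
        return queries;
    }

    // Pairs each term with the count found at the same index.
    static Map<String, Long> pairWithCounts(List<String> terms, List<Long> counts) {
        if (terms.size() != counts.size()) {
            throw new IllegalArgumentException("terms and counts differ in length");
        }
        Map<String, Long> result = new LinkedHashMap<>();
        for (int i = 0; i < terms.size(); i++) {
            result.put(terms.get(i), counts.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("hadoop", "blur");
        System.out.println(expand("tweets.text", terms, 100));
        // Made-up counts, standing in for what a facet query would return.
        System.out.println(pairWithCounts(terms, Arrays.asList(5L, 2L)));
    }
}
```

Each string returned by expand would be passed to a Facet as in the quoted
example; the cap reflects the concern about enumerating a primary key field.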
