[jira] [Commented] (LUCENE-3079) Faceting module

Shai Erera (JIRA) Tue, 28 Jun 2011 11:26:42 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056696#comment-13056696
 ]


Shai Erera commented on LUCENE-3079:
------------------------------------

Oh Toke ... I just reviewed your test and there is a problem in it:

{code}
    CountFacetRequest facetRequest = new CountFacetRequest(
        new CategoryPath(HIERARCHICAL), num);
    facetRequest.setDepth(5);
{code}

You create a CountFacetRequest that counts HIERARCHICAL (which is the root) 
and fetches the top <num>, which is 5. BUT, you also set the depth of the 
request to 5, which means it will compute the top-5 categories at each level!

This is a feature of the package that lets you get not only the top-N child 
nodes of the immediate "root", but also the top-N of their child nodes, 
applied recursively until 'depth'. It's a nice feature, but not very 
performance friendly :).
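
To make the cost concrete, here is a rough sketch of what "top-N at each 
level" implies. It is a toy model only, not the facet module's code: the 
TopNPerLevel class, its Node type, the fan-out of 10 and the counts are all 
made up for illustration. It just counts how many top-N selections are needed 
for a given depth.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Toy model of "top-N at each level"; not the facet module's code. */
public class TopNPerLevel {

    /** Hypothetical category node: a label, a hit count, and child categories. */
    static final class Node {
        final String label;
        final int count;
        final List<Node> children = new ArrayList<>();

        Node(String label, int count) {
            this.label = label;
            this.count = count;
        }
    }

    /** Builds a full tree in which every non-leaf node has `fanout` children. */
    static Node buildTree(String label, int count, int fanout, int levels) {
        Node node = new Node(label, count);
        if (levels > 0) {
            for (int i = 0; i < fanout; i++) {
                // Child i gets count i + 1, so the highest-numbered children rank first.
                node.children.add(buildTree(label + "/" + i, i + 1, fanout, levels - 1));
            }
        }
        return node;
    }

    /**
     * Picks the top-n children of `node` by count, then recurses into each
     * picked child until `depth` levels are covered. Appends the picked labels
     * to `out` and returns how many top-n selections were performed.
     */
    static int selectTopN(Node node, int n, int depth, List<String> out) {
        if (depth == 0 || node.children.isEmpty()) {
            return 0;
        }
        List<Node> sorted = new ArrayList<>(node.children);
        sorted.sort(Comparator.comparingInt((Node c) -> c.count).reversed());
        int selections = 1;
        for (Node child : sorted.subList(0, Math.min(n, sorted.size()))) {
            out.add(child.label);
            selections += selectTopN(child, n, depth - 1, out);
        }
        return selections;
    }

    public static void main(String[] args) {
        Node root = buildTree("root", 0, 10, 5);
        for (int depth : new int[] {1, 5}) {
            List<String> picked = new ArrayList<>();
            int selections = selectTopN(root, 5, depth, picked);
            System.out.println("depth=" + depth + " selections=" + selections
                + " categories=" + picked.size());
        }
    }
}
```

In this toy tree, depth 1 needs a single top-5 selection and returns 5 
categories, while depth 5 needs 1 + 5 + 25 + 125 + 625 = 781 selections and 
returns 3905 categories. The real module may organize the work differently, 
but the growth with depth is the point.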

Can you please rerun the test with the setDepth(5) line commented out? I can 
run it on my laptop, but I don't have the env. set up w/ the patch from 
LUCENE-2369, so I cannot compare.

In general, this looks like a very useful test. I think we can commit it too, 
but renamed so that it runs only selectively rather than regularly w/ our 
tests.

> Faceting module
> ---------------
>
>                 Key: LUCENE-3079
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3079
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Michael McCandless
>            Assignee: Shai Erera
>             Fix For: 3.4, 4.0
>
>         Attachments: LUCENE-3079-dev-tools.patch, LUCENE-3079.patch, 
> LUCENE-3079.patch, LUCENE-3079.patch, TestPerformanceHack.java
>
>
> Faceting is a hugely important feature, available in Solr today but
> not [easily] usable by Lucene-only apps.
> We should fix this, by creating a shared faceting module.
> Ideally, we factor out Solr's faceting impl, and maybe poach/merge
> from other impls (eg Bobo browse).
> Hoss describes some important challenges we'll face in doing this
> (http://markmail.org/message/5w35c2fr4zkiwsz6), copied here:
> {noformat}
> To look at "faceting" as a concrete example, there are big reasons 
> faceting works so well in Solr: Solr has total control over the 
> index, knows exactly when the index has changed to rebuild caches, has a 
> strict schema so it can make sense of field types and 
> pick faceting algos accordingly, has multi-phase distributed search 
> approach to get exact counts efficiently across multiple shards, etc...
> (and there are still a lot of additional enhancements and improvements 
> that can be made to take even more advantage of knowledge solr has because 
> it "owns" the index that no one has had time to tackle)
> {noformat}
> This is a great list of the things we face in refactoring.  It's also
> important because, if Solr needed to be so deeply intertwined with
> caching, schema, etc., other apps that want to facet will have the
> same "needs" and so we really have to address them in creating the
> shared module.
> I think we should get a basic faceting module started, but should not
> cut Solr over at first.  We should iterate on the module, fold in
> improvements, etc., and then, once we can fully verify that cutting
> over doesn't hurt Solr (i.e., lose functionality or performance), we can
> cut over later.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
