[ https://issues.apache.org/jira/browse/LUCENE-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056696#comment-13056696 ]
Shai Erera commented on LUCENE-3079:
------------------------------------

Oh Toke ... I just reviewed your test and there is a problem in it:

{code}
CountFacetRequest facetRequest = new CountFacetRequest(
    new CategoryPath(HIERARCHICAL), num);
facetRequest.setDepth(5);
{code}

You create a CountFacetRequest, requesting to count HIERARCHICAL (which is the root), and fetch the top <num>, which is 5. BUT, you set the depth of the request to 5, which means it will compute the top-5 categories at each level!

This is a nice feature of the package, which lets you get not only the top-N child nodes of the immediate "root", but also the top-N of their child nodes, applied recursively until 'depth'. It is useful, but not very performance friendly :).

Can you please rerun the test then, commenting out that line? I can run it on my laptop, but I don't have the env. setup w/ the patch from LUCENE-2369, so I cannot compare.

In general, this looks like a very useful test. I think we can commit it too, but rename it so that it doesn't run regularly w/ our tests, but rather selectively.

> Faceting module
> ---------------
>
>                 Key: LUCENE-3079
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3079
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Michael McCandless
>            Assignee: Shai Erera
>             Fix For: 3.4, 4.0
>
>         Attachments: LUCENE-3079-dev-tools.patch, LUCENE-3079.patch,
> LUCENE-3079.patch, LUCENE-3079.patch, TestPerformanceHack.java
>
>
> Faceting is a hugely important feature, available in Solr today but
> not [easily] usable by Lucene-only apps.
> We should fix this, by creating a shared faceting module.
> Ideally, we factor out Solr's faceting impl, and maybe poach/merge
> from other impls (eg Bobo browse).
> Hoss describes some important challenges we'll face in doing this
> (http://markmail.org/message/5w35c2fr4zkiwsz6), copied here:
> {noformat}
> To look at "faceting" as a concrete example, there are big the reasons
> faceting works so well in Solr: Solr has total control over the
> index, knows exactly when the index has changed to rebuild caches, has a
> strict schema so it can make sense of field types and
> pick faceting algos accordingly, has multi-phase distributed search
> approach to get exact counts efficiently across multiple shards, etc...
> (and there are still a lot of additional enhancements and improvements
> that can be made to take even more advantage of knowledge solr has because
> it "owns" the index that we no one has had time to tackle)
> {noformat}
> This is a great list of the things we face in refactoring. It's also
> important because, if Solr needed to be so deeply intertwined with
> caching, schema, etc., other apps that want to facet will have the
> same "needs" and so we really have to address them in creating the
> shared module.
> I think we should get a basic faceting module started, but should not
> cut Solr over at first. We should iterate on the module, fold in
> improvements, etc., and then, once we can fully verify that cutting
> over doesn't hurt Solr (ie lose functionality or performance) we can
> later cutover.
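
For anyone following the setDepth(5) remark in the comment: the sketch below is a toy model, not the facet module's code or API. It only illustrates how many result nodes a "top-N at each level, down to depth" request can select. The Node class, build and countSelected are hypothetical names made up for this example, and the tree is synthetic (every node has 8 children, 5 levels deep), so real taxonomies will behave differently.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class DepthCostSketch {

    /** Hypothetical stand-in for a category node; NOT a Lucene class. */
    static final class Node {
        final String label;
        final int count;
        final List<Node> children = new ArrayList<>();

        Node(String label, int count) {
            this.label = label;
            this.count = count;
        }
    }

    /** Builds a synthetic tree: every node gets `fanout` children, `levels` deep. */
    static Node build(String label, int count, int fanout, int levels) {
        Node node = new Node(label, count);
        if (levels > 0) {
            for (int i = 0; i < fanout; i++) {
                // Child i gets count i + 1, so ordering by count is deterministic.
                node.children.add(build(label + "/" + i, i + 1, fanout, levels - 1));
            }
        }
        return node;
    }

    /**
     * Counts the nodes selected when the top `n` children (by count) are kept
     * under `parent`, then under each kept child, recursively for `depth` levels.
     */
    static int countSelected(Node parent, int n, int depth) {
        if (depth == 0 || parent.children.isEmpty()) {
            return 0;
        }
        List<Node> sorted = new ArrayList<>(parent.children);
        sorted.sort(Comparator.comparingInt((Node c) -> c.count).reversed());
        int selected = 0;
        for (Node child : sorted.subList(0, Math.min(n, sorted.size()))) {
            selected += 1 + countSelected(child, n, depth - 1);
        }
        return selected;
    }

    public static void main(String[] args) {
        Node root = build("root", 0, 8, 5);
        // Depth 1: only the top 5 immediate children of the root.
        System.out.println("depth 1: " + countSelected(root, 5, 1)); // 5
        // Depth 5: 5 + 25 + 125 + 625 + 3125 selected nodes.
        System.out.println("depth 5: " + countSelected(root, 5, 5)); // 3905
    }
}
```

In this toy tree, going from depth 1 to depth 5 with N = 5 grows the selected set from 5 nodes to 3905, and each selected node also needs its children ranked. That is only a rough picture of why the deeper request is expected to be slower; the actual cost in the facet module depends on its implementation and on the shape of the taxonomy.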