[Lucene.Net] [jira] [Commented] (LUCENENET-415) Contrib/Faceted Search
[ https://issues.apache.org/jira/browse/LUCENENET-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042963#comment-13042963 ]

Digy commented on LUCENENET-415:

According to my latest tests, SFS searches cost only an additional 60-80 ms compared to standard searches (~3 GB index, 1M docs, 342 facets), assuming that the same number of documents is read from the index.

Some other features, like
- Faceting by query (can SFS be described as faceting by field?)
- Range faceting (e.g., monthly facets on fields like 20110602) (again, is this the correct terminology?)
- Disk cache for a large number of BitSets
etc., can be added in the future.

I think this is enough for *Simple*FacetedSearch. I will commit it to trunk.

DIGY

Contrib/Faceted Search
----------------------
Key: LUCENENET-415
URL: https://issues.apache.org/jira/browse/LUCENENET-415
Project: Lucene.Net
Issue Type: New Feature
Affects Versions: Lucene.Net 2.9.4
Reporter: Digy
Priority: Minor
Fix For: Lucene.Net 2.9.4, Lucene.Net 2.9.4g
Attachments: PerformanceTest.cs, PerformanceTest.cs, PerformanceTest.cs, SimpleFacetedSearch.cs, SimpleFacetedSearch.cs, SimpleFacetedSearch.cs, SimpleFacetedSearch.cs, SimpleFacetedSearch2.cs, SimpleFacetedSearch2.cs, SimpleFacetedSearch2.cs, TestSimpleFacetedSearch.cs, TestSimpleFacetedSearch.cs, TestSimpleFacetedSearch2.cs, TestSimpleFacetedSearch2.cs, TestSimpleFacetedSearch2.cs, TestSimpleFacetedSearch2.cs, facet performance.xls, facet performance.xls, facet performance.xls

Since I see a lot of questions about faceted search these days, I plan to add a Faceted-Search project to contrib.
DIGY

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
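The comment above mentions monthly facets on fields like 20110602 as a possible future feature but does not say how SFS would implement them. As a rough illustration of the idea only, the following sketch groups yyyyMMdd values by their yyyyMM prefix. It uses plain .NET collections, not the Lucene.Net or SimpleFacetedSearch API, and the names `MonthlyFacetSketch`, `MonthBucket` and `CountByMonth` are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration of "monthly facets"; not part of SimpleFacetedSearch.
static class MonthlyFacetSketch
{
    // Maps a yyyyMMdd value such as "20110602" to its month bucket "201106".
    public static string MonthBucket(string yyyymmdd)
    {
        if (yyyymmdd == null || yyyymmdd.Length != 8)
        {
            throw new ArgumentException("expected an 8-character yyyyMMdd value", "yyyymmdd");
        }
        return yyyymmdd.Substring(0, 6);
    }

    // Counts how many values fall into each month, ordered by month.
    public static SortedDictionary<string, int> CountByMonth(IEnumerable<string> dateValues)
    {
        var counts = new SortedDictionary<string, int>(StringComparer.Ordinal);
        foreach (string value in dateValues)
        {
            string month = MonthBucket(value);
            int current;
            counts.TryGetValue(month, out current); // current stays 0 when the month is new
            counts[month] = current + 1;
        }
        return counts;
    }

    static void Main()
    {
        string[] dates = { "20110602", "20110615", "20110520", "20110701" };
        foreach (KeyValuePair<string, int> entry in CountByMonth(dates))
        {
            Console.WriteLine(entry.Key + ": " + entry.Value);
        }
    }
}
```

With the sample values in `Main`, the buckets are 201105 (1 value), 201106 (2 values) and 201107 (1 value). In a real index the bucketing would presumably happen when the per-facet bit sets are built, but that is an assumption, not something stated in this thread.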
[Lucene.Net] [jira] [Commented] (LUCENENET-415) Contrib/Faceted Search
[ https://issues.apache.org/jira/browse/LUCENENET-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13043011#comment-13043011 ]

Digy commented on LUCENENET-415:

Thanks to M. Herndon for this wiki page:
https://cwiki.apache.org/confluence/display/LUCENENET/Simple+Faceted+Search

DIGY
[Lucene.Net] [jira] [Commented] (LUCENENET-415) Contrib/Faceted Search
[ https://issues.apache.org/jira/browse/LUCENENET-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039259#comment-13039259 ]

Digy commented on LUCENENET-415:

With the increasing number of attached files, it is getting hard to track the changes. I created a contrib project (SimpleFacetedSearch) under the 2.9.4g branch:
https://svn.apache.org/repos/asf/incubator/lucene.net/branches/Lucene.Net_2_9_4g

DIGY
[Lucene.Net] [jira] [Commented] (LUCENENET-415) Contrib/Faceted Search
[ https://issues.apache.org/jira/browse/LUCENENET-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039272#comment-13039272 ]

Digy commented on LUCENENET-415:

Hi Ben,
Do you think we still need IndexSearcher and UseCache?

DIGY
[Lucene.Net] [jira] [Commented] (LUCENENET-415) Contrib/Faceted Search
[ https://issues.apache.org/jira/browse/LUCENENET-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039278#comment-13039278 ]

Ben West commented on LUCENENET-415:

No, I don't think we need them.

I still don't understand why the CachingWrapperFilters are so much faster than QueryWrapperFilter even on fresh queries. But I guess since the cache has weak references, there isn't a lot of harm in using them.
[Lucene.Net] [jira] [Commented] (LUCENENET-415) Contrib/Faceted Search
[ https://issues.apache.org/jira/browse/LUCENENET-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039289#comment-13039289 ]

Digy commented on LUCENENET-415:

I'll wait a few days before closing this issue and committing to 2.9.4.

Here are the sources:
Source: https://svn.apache.org/repos/asf/incubator/lucene.net/branches/Lucene.Net_2_9_4g/src/contrib/SimpleFacetedSearch/
Readme: https://svn.apache.org/repos/asf/incubator/lucene.net/branches/Lucene.Net_2_9_4g/src/contrib/SimpleFacetedSearch/README.txt
Test and usage: https://svn.apache.org/repos/asf/incubator/lucene.net/branches/Lucene.Net_2_9_4g/test/contrib/SimpleFacetedSearch

Any comments on class/variable names, APIs, etc.? I've never been good at them.

DIGY
[Lucene.Net] [jira] [Commented] (LUCENENET-415) Contrib/Faceted Search
[ https://issues.apache.org/jira/browse/LUCENENET-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038063#comment-13038063 ]

Digy commented on LUCENENET-415:

Hi Ben,
Thanks for your comments and test code.

{code}
sfs = new SimpleFacetedSearch(reader, "category");
sfs.Search(query); // + fetch
{code}

is roughly equal to

{code}
// Pseudocode: GetGroups stands for enumerating the values of the "category" field.
foreach (var cat in GetGroups("category"))
{
    BooleanQuery bq = new BooleanQuery();
    bq.Add(query, Lucene.Net.Search.BooleanClause.Occur.MUST);
    bq.Add(queryParser.Parse("category:" + cat), Lucene.Net.Search.BooleanClause.Occur.MUST);
    indexSearcher.Search(bq); // + fetch
}
{code}

It would be good to compare these two pieces of code too.

DIGY
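To make the comparison above concrete, here is a minimal, self-contained sketch of the bit-set idea discussed in this thread: one bit set per facet value is built once, and each search intersects the query's matches with every cached bit set instead of running one AND query per value. It uses `System.Collections.BitArray` on a toy in-memory "index" and is not the SimpleFacetedSearch or Lucene.Net code; all names (`BitSetFacetSketch`, `BuildFacetBits`, `CountPerFacet`) are hypothetical.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for an index; NOT the Lucene.Net API.
static class BitSetFacetSketch
{
    // Counts the set bits in (a AND b) without modifying either input.
    static int IntersectionCount(BitArray a, BitArray b)
    {
        BitArray both = new BitArray(a).And(b); // And() mutates the copy, not `a`
        int count = 0;
        foreach (bool bit in both)
        {
            if (bit) count++;
        }
        return count;
    }

    // Built once per reader: bit i of a facet's set is true when document i has that value.
    public static Dictionary<string, BitArray> BuildFacetBits(string[] categoryOfDoc)
    {
        var bits = new Dictionary<string, BitArray>();
        for (int doc = 0; doc < categoryOfDoc.Length; doc++)
        {
            BitArray set;
            if (!bits.TryGetValue(categoryOfDoc[doc], out set))
            {
                set = new BitArray(categoryOfDoc.Length);
                bits[categoryOfDoc[doc]] = set;
            }
            set[doc] = true;
        }
        return bits;
    }

    // Per search: one intersection per facet value, equivalent to "query AND category:value".
    public static Dictionary<string, int> CountPerFacet(
        BitArray queryMatches, Dictionary<string, BitArray> facetBits)
    {
        var counts = new Dictionary<string, int>();
        foreach (KeyValuePair<string, BitArray> entry in facetBits)
        {
            counts[entry.Key] = IntersectionCount(queryMatches, entry.Value);
        }
        return counts;
    }

    static void Main()
    {
        string[] categoryOfDoc = { "book", "dvd", "book", "cd", "dvd", "book" };
        Dictionary<string, BitArray> facetBits = BuildFacetBits(categoryOfDoc);

        // Pretend the query matched documents 0, 1, 2 and 5.
        var queryMatches = new BitArray(new[] { true, true, true, false, false, true });

        foreach (KeyValuePair<string, int> entry in CountPerFacet(queryMatches, facetBits))
        {
            Console.WriteLine(entry.Key + ": " + entry.Value);
        }
    }
}
```

For the sample data, "book" is counted 3 times, "dvd" once and "cd" zero times, which is what one AND query per category value would report as well. The sketch only counts hits; fetching the documents per group, as SFS does, is left out.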
[Lucene.Net] [jira] [Commented] (LUCENENET-415) Contrib/Faceted Search
[ https://issues.apache.org/jira/browse/LUCENENET-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038228#comment-13038228 ]

Digy commented on LUCENENET-415:

But BitSet+Caching is still faster than BooleanQuery, if I don't misinterpret your numbers.

DIGY
[Lucene.Net] [jira] [Commented] (LUCENENET-415) Contrib/Faceted Search
[ https://issues.apache.org/jira/browse/LUCENENET-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037478#comment-13037478 ]

Digy commented on LUCENENET-415:

Hi Ben,

About the performance test:

- One of the costly operations in this faceted search is the creation of SimpleFacetedSearch. It creates the bit sets for all of the group members. Since it only has to be created once, when a new IndexReader is opened (i.e. after documents have been added or deleted), its creation time should be excluded from the test.

- Another costly operation is fetching data from the index. After each search, some data has to be read, and this duration should be included in the test. E.g.:

{code}
// TopDocs-based search: fetch a stored field for every hit.
TopDocs hits = sfs.Search(q, 100);
for (int j = 0; j < hits.ScoreDocs.Length; j++)
{
    Document doc = reader.Document(hits.ScoreDocs[j].doc);
    Fieldable f = doc.GetField("title");
}

// SimpleFacetedSearch (separate snippet): fetch a stored field for every document in every group.
SimpleFacetedSearch.Hits hits = sfs.Search(q, maxDocPerGroup);
foreach (var h in hits.HitsPerGroup)
{
    foreach (Document doc in h.Documents)
    {
        Fieldable f = doc.GetField("title");
    }
}
{code}

- Hits is a deprecated class, and it repeats the search after every N (AFAIK 100) document accesses. It is not a normal search and should be excluded from the test.

Thanks,
DIGY
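The measurement rules above (exclude the one-time construction, include the fetch) can be sketched as a small timing helper. This is only an illustration under stated assumptions: `FacetTimingSketch` and `Measure` are hypothetical names, the two delegates stand in for "build SimpleFacetedSearch" and "search plus read the stored fields", and the untimed warm-up run is an addition of this sketch, not something requested in the thread.

```csharp
using System;
using System.Diagnostics;

// Hypothetical timing helper; the delegates stand in for real index operations.
static class FacetTimingSketch
{
    // Runs `setup` once outside the timed region, then times `iterations` runs of `searchAndFetch`.
    public static TimeSpan Measure(Action setup, Action searchAndFetch, int iterations)
    {
        setup();          // e.g. build the faceted-search object once per reader (not timed)
        searchAndFetch(); // warm-up run (an assumption of this sketch), also not timed

        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            searchAndFetch(); // search AND document fetch are both inside the timed region
        }
        watch.Stop();
        return watch.Elapsed;
    }

    static void Main()
    {
        int setups = 0;
        int searches = 0;
        TimeSpan elapsed = Measure(() => setups++, () => searches++, 100);

        Console.WriteLine("setup runs: " + setups);
        Console.WriteLine("search runs (warm-up + timed): " + searches);
        Console.WriteLine("elapsed ms: " + elapsed.TotalMilliseconds);
    }
}
```

In a real test, the `searchAndFetch` delegate would contain one of the loops from the comment above, so that both search styles pay for reading the same number of documents.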
[Lucene.Net] [jira] [Commented] (LUCENENET-415) Contrib/Faceted Search
[ https://issues.apache.org/jira/browse/LUCENENET-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037061#comment-13037061 ]

Digy commented on LUCENENET-415:

Here is the documentation of the code :)

{code}
// Group the results by the values of the "cat" field.
SimpleFacetedSearch sfs = new SimpleFacetedSearch(_Reader, "cat");

Query query = new QueryParser("text", new StandardAnalyzer()).Parse("block*");

SimpleFacetedSearch.Hits hits = sfs.Search(query);

long totalHits = hits.TotalHitCount;
foreach (SimpleFacetedSearch.HitsPerGroup hpg in hits.HitsPerGroup)
{
    long hitCountPerGroup = hpg.HitCount;
    foreach (Document doc in hpg)
    {
        string text = doc.GetField("text").StringValue();
    }
}
{code}

DIGY
[Lucene.Net] [jira] [Commented] (LUCENENET-415) Contrib/Faceted Search
[ https://issues.apache.org/jira/browse/LUCENENET-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037066#comment-13037066 ]

Ben West commented on LUCENENET-415:

I believe line 94 should be _GroupByField, not cat.