[Lucene.Net] [jira] [Commented] (LUCENENET-415) Contrib/Faceted Search

2011-06-02 Thread Digy (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENENET-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042963#comment-13042963
 ] 

Digy commented on LUCENENET-415:


According to my last tests, SFS searches cost only an additional 60-80 ms 
compared to standard searches (~3 GB index, 1M docs, 342 facets), 
assuming that the same number of documents is read from the index.

Some other features like 
 - Faceting by query (can SFS be described as faceting by field?)
 - Range faceting (e.g., monthly facets on fields like 20110602) (again, 
is this the correct terminology?)
 - Disk cache for a large number of BitSets
etc. can be added in the future.
I think this is enough for *Simple*FacetedSearch.
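
As a rough illustration of the range-faceting idea above, the sketch below buckets yyyyMMdd field values into monthly facets by taking the first six characters as the facet key. This is not part of SimpleFacetedSearch; the names RangeFacetSketch, MonthKey and CountByMonth are hypothetical, and the sketch works on plain strings rather than on an index.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of SimpleFacetedSearch.
public static class RangeFacetSketch
{
    // Maps a yyyyMMdd value such as "20110602" to its month key "201106".
    public static string MonthKey(string yyyyMMdd)
    {
        if (yyyyMMdd == null || yyyyMMdd.Length != 8)
        {
            throw new ArgumentException("Expected a yyyyMMdd value.", "yyyyMMdd");
        }
        return yyyyMMdd.Substring(0, 6);
    }

    // Counts how many values fall into each monthly bucket.
    public static Dictionary<string, int> CountByMonth(IEnumerable<string> values)
    {
        var counts = new Dictionary<string, int>();
        foreach (string value in values)
        {
            string key = MonthKey(value);
            int current;
            counts.TryGetValue(key, out current); // current stays 0 if the key is new
            counts[key] = current + 1;
        }
        return counts;
    }
}
```

In a real implementation each month key would presumably get its own bit set, just as each distinct field value does in field faceting.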

I will commit it to trunk.

DIGY



 Contrib/Faceted Search
 --

 Key: LUCENENET-415
 URL: https://issues.apache.org/jira/browse/LUCENENET-415
 Project: Lucene.Net
  Issue Type: New Feature
Affects Versions: Lucene.Net 2.9.4
Reporter: Digy
Priority: Minor
 Fix For: Lucene.Net 2.9.4, Lucene.Net 2.9.4g

 Attachments: PerformanceTest.cs, PerformanceTest.cs, 
 PerformanceTest.cs, SimpleFacetedSearch.cs, SimpleFacetedSearch.cs, 
 SimpleFacetedSearch.cs, SimpleFacetedSearch.cs, SimpleFacetedSearch2.cs, 
 SimpleFacetedSearch2.cs, SimpleFacetedSearch2.cs, TestSimpleFacetedSearch.cs, 
 TestSimpleFacetedSearch.cs, TestSimpleFacetedSearch2.cs, 
 TestSimpleFacetedSearch2.cs, TestSimpleFacetedSearch2.cs, 
 TestSimpleFacetedSearch2.cs, facet performance.xls, facet performance.xls, 
 facet performance.xls


 Since I see a lot of questions about faceted search these days, I plan to 
 add a Faceted-Search project to contrib.
 DIGY

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[Lucene.Net] [jira] [Commented] (LUCENENET-415) Contrib/Faceted Search

2011-06-02 Thread Digy (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENENET-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13043011#comment-13043011
 ] 

Digy commented on LUCENENET-415:


Thanks, M. Herndon, for this wiki page:

https://cwiki.apache.org/confluence/display/LUCENENET/Simple+Faceted+Search

DIGY

 Contrib/Faceted Search
 --

 Key: LUCENENET-415
 URL: https://issues.apache.org/jira/browse/LUCENENET-415
 Project: Lucene.Net
  Issue Type: New Feature
Affects Versions: Lucene.Net 2.9.4
Reporter: Digy
Assignee: Digy
Priority: Minor
 Fix For: Lucene.Net 2.9.4, Lucene.Net 2.9.4g

 Attachments: PerformanceTest.cs, PerformanceTest.cs, 
 PerformanceTest.cs, SimpleFacetedSearch.cs, SimpleFacetedSearch.cs, 
 SimpleFacetedSearch.cs, SimpleFacetedSearch.cs, SimpleFacetedSearch2.cs, 
 SimpleFacetedSearch2.cs, SimpleFacetedSearch2.cs, TestSimpleFacetedSearch.cs, 
 TestSimpleFacetedSearch.cs, TestSimpleFacetedSearch2.cs, 
 TestSimpleFacetedSearch2.cs, TestSimpleFacetedSearch2.cs, 
 TestSimpleFacetedSearch2.cs, facet performance.xls, facet performance.xls, 
 facet performance.xls


 Since I see a lot of questions about faceted search in these days, I plan to 
 add a Faceted-Search project to contrib.
 DIGY



[Lucene.Net] [jira] [Commented] (LUCENENET-415) Contrib/Faceted Search

2011-05-25 Thread Digy (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENENET-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039259#comment-13039259
 ] 

Digy commented on LUCENENET-415:


With the increasing number of attached files, it is getting hard to trace the 
changes.
I created a contrib project (SimpleFacetedSearch) under the 2.9.4g branch:

https://svn.apache.org/repos/asf/incubator/lucene.net/branches/Lucene.Net_2_9_4g

DIGY



 Contrib/Faceted Search
 --

 Key: LUCENENET-415
 URL: https://issues.apache.org/jira/browse/LUCENENET-415
 Project: Lucene.Net
  Issue Type: New Feature
Affects Versions: Lucene.Net 2.9.4
Reporter: Digy
Priority: Minor
 Attachments: PerformanceTest.cs, PerformanceTest.cs, 
 PerformanceTest.cs, SimpleFacetedSearch.cs, SimpleFacetedSearch.cs, 
 SimpleFacetedSearch.cs, SimpleFacetedSearch.cs, SimpleFacetedSearch2.cs, 
 SimpleFacetedSearch2.cs, SimpleFacetedSearch2.cs, TestSimpleFacetedSearch.cs, 
 TestSimpleFacetedSearch.cs, TestSimpleFacetedSearch2.cs, 
 TestSimpleFacetedSearch2.cs, TestSimpleFacetedSearch2.cs, 
 TestSimpleFacetedSearch2.cs, facet performance.xls, facet performance.xls, 
 facet performance.xls


 Since I see a lot of questions about faceted search in these days, I plan to 
 add a Faceted-Search project to contrib.
 DIGY



[Lucene.Net] [jira] [Commented] (LUCENENET-415) Contrib/Faceted Search

2011-05-25 Thread Digy (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENENET-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039272#comment-13039272
 ] 

Digy commented on LUCENENET-415:


Hi Ben,
Do you think we still need IndexSearcher & UseCache?

DIGY



[Lucene.Net] [jira] [Commented] (LUCENENET-415) Contrib/Faceted Search

2011-05-25 Thread Ben West (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENENET-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039278#comment-13039278
 ] 

Ben West commented on LUCENENET-415:


No, I don't think we need them.

I still don't understand why the CachingWrapperFilters are so much faster than 
QueryWrapperFilter even on fresh queries. But I guess since the cache has weak 
references, there isn't a lot of harm in using them.
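
To illustrate why a weakly referenced cache does little harm, here is a minimal sketch of a per-reader cache whose entries disappear once the reader itself is no longer referenced. It uses the .NET ConditionalWeakTable rather than whatever Lucene.Net uses internally, and the class name PerReaderCache is hypothetical; it only shows the general idea, not the actual CachingWrapperFilter code.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical illustration of a weakly keyed cache; not Lucene.Net code.
public sealed class PerReaderCache<TReader, TValue>
    where TReader : class
    where TValue : class
{
    // Keys are held weakly: an entry does not keep its reader alive and
    // becomes collectable together with the reader.
    private readonly ConditionalWeakTable<TReader, TValue> _cache =
        new ConditionalWeakTable<TReader, TValue>();

    // Returns the cached value for this reader, computing it on first use.
    public TValue GetOrCompute(TReader reader, Func<TReader, TValue> compute)
    {
        return _cache.GetValue(reader, r => compute(r));
    }
}
```

With such a cache the expensive computation runs once per reader instance, and nothing has to be evicted by hand when a reader is closed and dropped.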



[Lucene.Net] [jira] [Commented] (LUCENENET-415) Contrib/Faceted Search

2011-05-25 Thread Digy (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENENET-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039289#comment-13039289
 ] 

Digy commented on LUCENENET-415:


I'll wait a few days before closing this issue & committing to 2.9.4.
Here are the sources:
Source: 
https://svn.apache.org/repos/asf/incubator/lucene.net/branches/Lucene.Net_2_9_4g/src/contrib/SimpleFacetedSearch/
Readme: 
https://svn.apache.org/repos/asf/incubator/lucene.net/branches/Lucene.Net_2_9_4g/src/contrib/SimpleFacetedSearch/README.txt
Test & Usage: 
https://svn.apache.org/repos/asf/incubator/lucene.net/branches/Lucene.Net_2_9_4g/test/contrib/SimpleFacetedSearch

Any comments on class/variable names, APIs, etc., since I've never been good at 
them?

DIGY



[Lucene.Net] [jira] [Commented] (LUCENENET-415) Contrib/Faceted Search

2011-05-23 Thread Digy (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENENET-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038063#comment-13038063
 ] 

Digy commented on LUCENENET-415:


Hi Ben,
Thanks for your comments & test code.

{code}
sfs = new SimpleFacetedSearch(reader, "category");
sfs.Search(query); // + fetch
{code}
is roughly equal to
{code}
// One BooleanQuery per facet value: the user query AND category:<value>
foreach (string cat in GetGroups("category"))
{
    BooleanQuery bq = new BooleanQuery();

    bq.Add(query, Lucene.Net.Search.BooleanClause.Occur.MUST);
    bq.Add(queryParser.Parse("category:" + cat),
           Lucene.Net.Search.BooleanClause.Occur.MUST);

    indexSearcher.Search(bq); // + fetch
}
{code}

It would be good to compare these two approaches too.
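
For readers unfamiliar with the bit-set approach, the following is a minimal sketch of how per-group hit counts can be derived by intersecting the query's matching documents with a precomputed bit set per group. It uses System.Collections.BitArray as a stand-in for Lucene.Net's bit sets, and the names FacetSketch, Cardinality and CountPerGroup are hypothetical; it is not the actual SimpleFacetedSearch implementation.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical illustration of bit-set faceting; not the SimpleFacetedSearch source.
public static class FacetSketch
{
    // Counts the set bits; BitArray has no built-in cardinality method.
    public static int Cardinality(BitArray bits)
    {
        int count = 0;
        foreach (bool bit in bits)
        {
            if (bit) count++;
        }
        return count;
    }

    // For each group, ANDs a copy of the query's bit set with the group's
    // precomputed bit set and counts the documents that remain.
    // All bit sets must have the same length (one bit per document).
    public static Dictionary<string, int> CountPerGroup(
        BitArray queryBits, IDictionary<string, BitArray> groupBits)
    {
        var counts = new Dictionary<string, int>();
        foreach (KeyValuePair<string, BitArray> group in groupBits)
        {
            // BitArray.And modifies its receiver, so work on a copy.
            BitArray intersection = new BitArray(queryBits).And(group.Value);
            counts[group.Key] = Cardinality(intersection);
        }
        return counts;
    }
}
```

The group bit sets are built once per reader, so each search only pays for one query evaluation plus one AND per group, whereas the BooleanQuery loop above runs a full search per group.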

DIGY

 Contrib/Faceted Search
 --

 Key: LUCENENET-415
 URL: https://issues.apache.org/jira/browse/LUCENENET-415
 Project: Lucene.Net
  Issue Type: New Feature
Affects Versions: Lucene.Net 2.9.4
Reporter: Digy
Priority: Minor
 Attachments: PerformanceTest.cs, PerformanceTest.cs, 
 SimpleFacetedSearch.cs, SimpleFacetedSearch.cs, SimpleFacetedSearch.cs, 
 TestSimpleFacetedSearch.cs, TestSimpleFacetedSearch.cs, facet performance.xls


 Since I see a lot of questions about faceted search in these days, I plan to 
 add a Faceted-Search project to contrib.
 DIGY



[Lucene.Net] [jira] [Commented] (LUCENENET-415) Contrib/Faceted Search

2011-05-23 Thread Digy (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENENET-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038228#comment-13038228
 ] 

Digy commented on LUCENENET-415:


But BitSet+Caching is still faster than BooleanQuery, if I don't misinterpret 
your numbers.
DIGY

 Contrib/Faceted Search
 --

 Key: LUCENENET-415
 URL: https://issues.apache.org/jira/browse/LUCENENET-415
 Project: Lucene.Net
  Issue Type: New Feature
Affects Versions: Lucene.Net 2.9.4
Reporter: Digy
Priority: Minor
 Attachments: PerformanceTest.cs, PerformanceTest.cs, 
 PerformanceTest.cs, SimpleFacetedSearch.cs, SimpleFacetedSearch.cs, 
 SimpleFacetedSearch.cs, TestSimpleFacetedSearch.cs, 
 TestSimpleFacetedSearch.cs, facet performance.xls, facet performance.xls


 Since I see a lot of questions about faceted search in these days, I plan to 
 add a Faceted-Search project to contrib.
 DIGY



[Lucene.Net] [jira] [Commented] (LUCENENET-415) Contrib/Faceted Search

2011-05-21 Thread Digy (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENENET-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037478#comment-13037478
 ] 

Digy commented on LUCENENET-415:


Hi Ben,
About performance test:

- One of the costly ops in this faceted search is the creation of 
SimpleFacetedSearch. It creates the bit sets for all of the group members. 
Since it should be created only once when a new IndexReader is opened (if some 
documents are added or deleted), its creation time should be excluded from the 
test.
- Another costly op is fetching data from the index. After each search, some 
data should be read, and this duration should be included in the test.
E.g. 
{code}
// Fetch after a search that returns TopDocs
TopDocs hits = sfs.Search(q, 100);
for (int j = 0; j < hits.ScoreDocs.Length; j++)
{
    Document doc = reader.Document(hits.ScoreDocs[j].doc);
    Fieldable f = doc.GetField("title");
}

// Fetch after a faceted search
SimpleFacetedSearch.Hits hits = sfs.Search(q, maxDocPerGroup);
foreach (var h in hits.HitsPerGroup)
{
    foreach (Document doc in h.Documents)
    {
        Fieldable f = doc.GetField("title");
    }
}
{code}
- Hits is a deprecated class, and it repeats the search every N (AFAIK 100) 
document accesses. It is not a normal search and should be excluded from the 
test.
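
The measurement rules above can be sketched as a small timing harness: the one-time setup runs outside the timed region, and each timed iteration covers both the search and the fetch. The names TimingSketch and Measure are hypothetical, and the untimed warm-up call is an extra assumption that is not mentioned in the points above.

```csharp
using System;
using System.Diagnostics;

// Hypothetical timing harness; not taken from PerformanceTest.cs.
public static class TimingSketch
{
    // Runs setup once (untimed), then times `iterations` calls of searchAndFetch.
    public static TimeSpan Measure(Action setup, Action searchAndFetch, int iterations)
    {
        setup();          // e.g. constructing SimpleFacetedSearch; excluded from timing
        searchAndFetch(); // warm-up run (an assumption of this sketch); also excluded

        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            searchAndFetch(); // search plus reading stored fields; included
        }
        watch.Stop();
        return watch.Elapsed;
    }
}
```

The same searchAndFetch shape can then be used for the standard search and the faceted search, so that both timings include the document reads.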

Thanks,
DIGY

 Contrib/Faceted Search
 --

 Key: LUCENENET-415
 URL: https://issues.apache.org/jira/browse/LUCENENET-415
 Project: Lucene.Net
  Issue Type: New Feature
Affects Versions: Lucene.Net 2.9.4
Reporter: Digy
Priority: Minor
 Attachments: PerformanceTest.cs, SimpleFacetedSearch.cs, 
 TestSimpleFacetedSearch.cs


 Since I see a lot of questions about faceted search in these days, I plan to 
 add a Faceted-Search project to contrib.
 DIGY



[Lucene.Net] [jira] [Commented] (LUCENENET-415) Contrib/Faceted Search

2011-05-20 Thread Digy (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENENET-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037061#comment-13037061
 ] 

Digy commented on LUCENENET-415:


Here is the documentation of the code:)

{code}
SimpleFacetedSearch sfs = new SimpleFacetedSearch(_Reader, "cat");
Query query = new QueryParser("text", new StandardAnalyzer()).Parse("block*");
SimpleFacetedSearch.Hits hits = sfs.Search(query);

long totalHits = hits.TotalHitCount;
foreach (SimpleFacetedSearch.HitsPerGroup hpg in hits.HitsPerGroup)
{
    long hitCountPerGroup = hpg.HitCount;
    foreach (Document doc in hpg)
    {
        string text = doc.GetField("text").StringValue();
    }
}
{code}

DIGY

 Contrib/Faceted Search
 --

 Key: LUCENENET-415
 URL: https://issues.apache.org/jira/browse/LUCENENET-415
 Project: Lucene.Net
  Issue Type: New Feature
Affects Versions: Lucene.Net 2.9.4
Reporter: Digy
Priority: Minor
 Attachments: SimpleFacetedSearch.cs, TestSimpleFacetedSearch.cs


 Since I see a lot of questions about faceted search in these days, I plan to 
 add a Faceted-Search project to contrib.
 DIGY



[Lucene.Net] [jira] [Commented] (LUCENENET-415) Contrib/Faceted Search

2011-05-20 Thread Ben West (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENENET-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037066#comment-13037066
 ] 

Ben West commented on LUCENENET-415:


I believe line 94 should be _GroupByField, not cat. 
