[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2009-08-21 Thread Michael Busch (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12746017#action_12746017
 ] 

Michael Busch commented on LUCENE-584:
--

Mark, are you working on this? Do you want to assign it to yourself?

 Decouple Filter from BitSet
 ---

 Key: LUCENE-584
 URL: https://issues.apache.org/jira/browse/LUCENE-584
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.1
Reporter: Peter Schäfer
Assignee: Michael Busch
Priority: Minor
 Fix For: 2.9

 Attachments: bench-diff.txt, bench-diff.txt, CHANGES.txt.patch, 
 ContribQueries20080111.patch, lucene-584-take2.patch, 
 lucene-584-take3-part1.patch, lucene-584-take3-part2.patch, 
 lucene-584-take4-part1.patch, lucene-584-take4-part2.patch, 
 lucene-584-take5-part1.patch, lucene-584-take5-part2.patch, lucene-584.patch, 
 Matcher-20070905-2default.patch, Matcher-20070905-3core.patch, 
 Matcher-20071122-1ground.patch, Some Matchers.zip, Test20080111.patch


 {code}
 package org.apache.lucene.search;
 public abstract class Filter implements java.io.Serializable 
 {
   public abstract AbstractBitSet bits(IndexReader reader) throws IOException;
 }
 public interface AbstractBitSet 
 {
   public boolean get(int index);
 }
 {code}
 It would be useful if the method =Filter.bits()= returned an abstract 
 interface, instead of =java.util.BitSet=.
 Use case: there is a very large index, and, depending on the user's 
 privileges, only a small portion of the index is actually visible.
 Sparsely populated =java.util.BitSet=s are not efficient and waste lots of 
 memory. It would be desirable to have an alternative BitSet implementation 
 with a smaller memory footprint.
 Though it _is_ possible to derive classes from =java.util.BitSet=, it was 
 obviously not designed for that purpose.
 That's why I propose to use an interface instead. The default implementation 
 could still delegate to =java.util.BitSet=.
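
 The proposal above can be illustrated with a minimal sketch. Only the AbstractBitSet interface comes from the description; the two implementing classes (JavaBitSetAdapter, SortedIntArraySet) are hypothetical names invented here, and the sorted-array layout is just one possible sparse representation, not the one any patch uses.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Read-only view of a set of document numbers, as proposed in the description. */
interface AbstractBitSet {
    boolean get(int index);
}

/** Default implementation (hypothetical name): delegates to java.util.BitSet. */
final class JavaBitSetAdapter implements AbstractBitSet {
    private final BitSet bits;

    JavaBitSetAdapter(BitSet bits) {
        this.bits = bits;
    }

    @Override
    public boolean get(int index) {
        return bits.get(index);
    }
}

/** Sparse implementation (hypothetical name): a sorted int array plus binary search. */
final class SortedIntArraySet implements AbstractBitSet {
    private final int[] sortedDocs;

    SortedIntArraySet(int... docs) {
        sortedDocs = docs.clone();   // defensive copy, then sort once
        Arrays.sort(sortedDocs);
    }

    @Override
    public boolean get(int index) {
        // binarySearch returns a non-negative position only when the key is present
        return Arrays.binarySearch(sortedDocs, index) >= 0;
    }
}
```

 A java.util.BitSet needs roughly one bit per document up to the highest set bit, while the sorted array needs four bytes per visible document, so the array only pays off when very few documents are visible.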

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2008-10-30 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12643767#action_12643767
 ] 

Paul Elschot commented on LUCENE-584:
-

Wouter, about this:
{{java.lang.ClassCastException: java.util.BitSet cannot be cast to org.apache.lucene.search.DocIdSet}}

LUCENE-1187 should have fixed this, so could you file a bug report?
In case you need a workaround, also have a look at LUCENE-1296.




 Decouple Filter from BitSet
 ---

 Key: LUCENE-584
 URL: https://issues.apache.org/jira/browse/LUCENE-584
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.1
Reporter: Peter Schäfer
Assignee: Michael Busch
Priority: Minor
 Fix For: 2.4

 Attachments: bench-diff.txt, bench-diff.txt, CHANGES.txt.patch, 
 ContribQueries20080111.patch, lucene-584-take2.patch, 
 lucene-584-take3-part1.patch, lucene-584-take3-part2.patch, 
 lucene-584-take4-part1.patch, lucene-584-take4-part2.patch, 
 lucene-584-take5-part1.patch, lucene-584-take5-part2.patch, lucene-584.patch, 
 Matcher-20070905-2default.patch, Matcher-20070905-3core.patch, 
 Matcher-20071122-1ground.patch, Some Matchers.zip, Test20080111.patch





[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2008-10-29 Thread Wouter Heijke (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12643495#action_12643495
 ] 

Wouter Heijke commented on LUCENE-584:
--

We got the same error here on a 15Gb index with Lucene 2.4.0:

java.lang.ClassCastException: java.util.BitSet cannot be cast to org.apache.lucene.search.DocIdSet
 org.apache.lucene.search.CachingWrapperFilter.getDocIdSet(CachingWrapperFilter.java:76)
 org.apache.lucene.misc.ChainedFilter.getDocIdSet(ChainedFilter.java:200)
 org.apache.lucene.misc.ChainedFilter.getDocIdSet(ChainedFilter.java:145)
 org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:140)
 org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:112)
 org.apache.lucene.search.Searcher.search(Searcher.java:136)

 Decouple Filter from BitSet
 ---

 Key: LUCENE-584
 URL: https://issues.apache.org/jira/browse/LUCENE-584
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.1
Reporter: Peter Schäfer
Assignee: Michael Busch
Priority: Minor
 Fix For: 2.4

 Attachments: bench-diff.txt, bench-diff.txt, CHANGES.txt.patch, 
 ContribQueries20080111.patch, lucene-584-take2.patch, 
 lucene-584-take3-part1.patch, lucene-584-take3-part2.patch, 
 lucene-584-take4-part1.patch, lucene-584-take4-part2.patch, 
 lucene-584-take5-part1.patch, lucene-584-take5-part2.patch, lucene-584.patch, 
 Matcher-20070905-2default.patch, Matcher-20070905-3core.patch, 
 Matcher-20071122-1ground.patch, Some Matchers.zip, Test20080111.patch





[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2008-03-11 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12577326#action_12577326
 ] 

Paul Elschot commented on LUCENE-584:
-

From the traceback I suppose this happened at the end, using the ChainedFilter?
Iirc ChainedFilter is from contrib/..., and it is mentioned at LUCENE-1187 as 
one of the things to be done.
Could you contribute this code as a contrib/... test case there?
Sorry, I don't remember exactly from which contrib module ChainedFilter is.

 Decouple Filter from BitSet
 ---

 Key: LUCENE-584
 URL: https://issues.apache.org/jira/browse/LUCENE-584
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.1
Reporter: Peter Schäfer
Assignee: Michael Busch
Priority: Minor
 Fix For: 2.4

 Attachments: bench-diff.txt, bench-diff.txt, CHANGES.txt.patch, 
 ContribQueries20080111.patch, lucene-584-take2.patch, 
 lucene-584-take3-part1.patch, lucene-584-take3-part2.patch, 
 lucene-584-take4-part1.patch, lucene-584-take4-part2.patch, 
 lucene-584-take5-part1.patch, lucene-584-take5-part2.patch, lucene-584.patch, 
 Matcher-20070905-2default.patch, Matcher-20070905-3core.patch, 
 Matcher-20071122-1ground.patch, Some Matchers.zip, Test20080111.patch





[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2008-03-10 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12577207#action_12577207
 ] 

Mark Miller commented on LUCENE-584:


I think there is still an issue here. The code below just broke for me.

java.lang.ClassCastException: org.apache.lucene.util.OpenBitSet cannot be cast to java.util.BitSet
at org.apache.lucene.search.CachingWrapperFilter.bits(CachingWrapperFilter.java:55)
at org.apache.lucene.misc.ChainedFilter.bits(ChainedFilter.java:177)
at org.apache.lucene.misc.ChainedFilter.bits(ChainedFilter.java:152)
at org.apache.lucene.search.Filter.getDocIdSet(Filter.java:49)

{code}
  public void testChainedCachedQueryFilter() throws IOException, ParseException {
    String path = "c:/TestIndex";
    Analyzer analyzer = new WhitespaceAnalyzer();
    IndexWriter writer = new IndexWriter(path, analyzer, true);

    Document doc = new Document();
    doc.add(new Field("category", "red", Store.YES, Index.TOKENIZED));
    doc.add(new Field("content", "the big bad fox", Store.NO, Index.TOKENIZED));
    writer.addDocument(doc);
    doc = new Document();
    doc.add(new Field("category", "red", Store.YES, Index.TOKENIZED));
    doc.add(new Field("content", "the big bad pig", Store.NO, Index.TOKENIZED));
    writer.addDocument(doc);
    doc = new Document();
    doc.add(new Field("category", "red", Store.YES, Index.TOKENIZED));
    doc.add(new Field("content", "the horrific girl", Store.NO, Index.TOKENIZED));
    writer.addDocument(doc);
    doc = new Document();
    doc.add(new Field("category", "blue", Store.YES, Index.TOKENIZED));
    doc.add(new Field("content", "the dirty boy", Store.NO, Index.TOKENIZED));
    writer.addDocument(doc);
    doc = new Document();
    doc.add(new Field("category", "blue", Store.YES, Index.TOKENIZED));
    doc.add(new Field("content", "the careful bad fox", Store.NO, Index.TOKENIZED));
    writer.addDocument(doc);

    writer.addDocument(doc);

    Searcher searcher = null;

    searcher = new IndexSearcher(path);

    QueryParser qp = new QueryParser("field", new KeywordAnalyzer());
    Query query = qp.parse("content:fox");
    QueryWrapperFilter queryFilter = new QueryWrapperFilter(query);
    CachingWrapperFilter cwf = new CachingWrapperFilter(queryFilter);

    TopDocs hits = searcher.search(query, cwf, 1);
    System.out.println("hits:" + hits.totalHits);

    queryFilter = new QueryWrapperFilter(qp.parse("category:red"));
    CachingWrapperFilter fcwf = new CachingWrapperFilter(queryFilter);
    Filter[] chain = new Filter[2];
    chain[0] = cwf;
    chain[1] = fcwf;
    ChainedFilter cf = new ChainedFilter(chain, ChainedFilter.AND);

    hits = searcher.search(new MatchAllDocsQuery(), cf, 1);

    System.out.println("red:" + hits.totalHits);

    queryFilter = new QueryWrapperFilter(qp.parse("category:blue"));
    CachingWrapperFilter fbcwf = new CachingWrapperFilter(queryFilter);
    chain = new Filter[2];
    chain[0] = cwf;
    chain[1] = fbcwf;
    cf = new ChainedFilter(chain, ChainedFilter.AND);

    hits = searcher.search(new MatchAllDocsQuery(), cf, 1);

    System.out.println("blue:" + hits.totalHits);

  }

{code}



 Decouple Filter from BitSet
 ---

 Key: LUCENE-584
 URL: https://issues.apache.org/jira/browse/LUCENE-584
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.1
Reporter: Peter Schäfer
Assignee: Michael Busch
Priority: Minor
 Fix For: 2.4

 Attachments: bench-diff.txt, bench-diff.txt, CHANGES.txt.patch, 
 ContribQueries20080111.patch, lucene-584-take2.patch, 
 lucene-584-take3-part1.patch, lucene-584-take3-part2.patch, 
 lucene-584-take4-part1.patch, lucene-584-take4-part2.patch, 
 lucene-584-take5-part1.patch, lucene-584-take5-part2.patch, lucene-584.patch, 
 Matcher-20070905-2default.patch, Matcher-20070905-3core.patch, 
 Matcher-20071122-1ground.patch, Some Matchers.zip, Test20080111.patch



[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2008-02-01 Thread Michael Busch (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12564878#action_12564878
 ] 

Michael Busch commented on LUCENE-584:
--

Thanks, Paul for testing and reviewing.

I'll correct the javadocs.

OK, I will commit this tomorrow if nobody objects!

 Decouple Filter from BitSet
 ---

 Key: LUCENE-584
 URL: https://issues.apache.org/jira/browse/LUCENE-584
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.0.1
Reporter: Peter Schäfer
Assignee: Michael Busch
Priority: Minor
 Fix For: 2.4

 Attachments: bench-diff.txt, bench-diff.txt, 
 ContribQueries20080111.patch, lucene-584-take2.patch, 
 lucene-584-take3-part1.patch, lucene-584-take3-part2.patch, 
 lucene-584-take4-part1.patch, lucene-584-take4-part2.patch, 
 lucene-584-take5-part1.patch, lucene-584-take5-part2.patch, lucene-584.patch, 
 Matcher-20070905-2default.patch, Matcher-20070905-3core.patch, 
 Matcher-20071122-1ground.patch, Some Matchers.zip, Test20080111.patch





[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2008-02-01 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12564875#action_12564875
 ] 

Paul Elschot commented on LUCENE-584:
-

The take5 patch tests ok here.

One very minor remark: the javadoc at RangeFilter.getDocIdSet still mentions 
BitSet.


 Decouple Filter from BitSet
 ---

 Key: LUCENE-584
 URL: https://issues.apache.org/jira/browse/LUCENE-584
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.0.1
Reporter: Peter Schäfer
Assignee: Michael Busch
Priority: Minor
 Fix For: 2.4

 Attachments: bench-diff.txt, bench-diff.txt, 
 ContribQueries20080111.patch, lucene-584-take2.patch, 
 lucene-584-take3-part1.patch, lucene-584-take3-part2.patch, 
 lucene-584-take4-part1.patch, lucene-584-take4-part2.patch, 
 lucene-584-take5-part1.patch, lucene-584-take5-part2.patch, lucene-584.patch, 
 Matcher-20070905-2default.patch, Matcher-20070905-3core.patch, 
 Matcher-20071122-1ground.patch, Some Matchers.zip, Test20080111.patch





[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2008-01-31 Thread Mark Harwood (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12564415#action_12564415
 ] 

Mark Harwood commented on LUCENE-584:
-

Hi Paul,
Just eyeballed the code but not had a chance to patch and run it yet.

I was wondering about the return type for skipTo() after looking at these types 
of calls:
   if (docIdSetIterator.skipTo(i) && (docIdSetIterator.doc() == i))

You could save a method invocation in cases like this if skipTo() returned the 
next doc id rather than a boolean. Returning a -1 would be the equivalent of 
what used to be false.
Not tried benchmarking it but does this seem like something worth considering?

Cheers
Mark
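
The two calling styles discussed in this comment can be compared in a small sketch. SortedDocsIterator and skipToDoc() are hypothetical names invented for illustration, and skipTo() here is simplified: it does not advance when the current document already satisfies the target, which may differ from the semantics in the patch.

```java
/** Hypothetical iterator over an ascending array of doc ids, offering both styles. */
final class SortedDocsIterator {
    private final int[] docs; // sorted ascending, no duplicates
    private int pos = -1;     // index of the current doc; -1 before the first call

    SortedDocsIterator(int... sortedDocs) {
        this.docs = sortedDocs;
    }

    /** Boolean style: true if a doc >= target exists; the caller then reads doc(). */
    boolean skipTo(int target) {
        if (pos < 0) {
            pos = 0;
        }
        while (pos < docs.length && docs[pos] < target) {
            pos++;
        }
        return pos < docs.length;
    }

    /** Current doc id; only valid after skipTo() returned true. */
    int doc() {
        return docs[pos];
    }

    /** Suggested style (hypothetical name): the first doc >= target, or -1 if none. */
    int skipToDoc(int target) {
        return skipTo(target) ? docs[pos] : -1;
    }
}
```

With the boolean style the exact-match test is `it.skipTo(i) && it.doc() == i`; with the int style it collapses to `it.skipToDoc(i) == i`. Whether that is actually faster would have to be benchmarked.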

 Decouple Filter from BitSet
 ---

 Key: LUCENE-584
 URL: https://issues.apache.org/jira/browse/LUCENE-584
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.0.1
Reporter: Peter Schäfer
Assignee: Michael Busch
Priority: Minor
 Fix For: 2.4

 Attachments: bench-diff.txt, bench-diff.txt, 
 ContribQueries20080111.patch, lucene-584-take2.patch, 
 lucene-584-take3-part1.patch, lucene-584-take3-part2.patch, 
 lucene-584-take4-part1.patch, lucene-584-take4-part2.patch, lucene-584.patch, 
 Matcher-20070905-2default.patch, Matcher-20070905-3core.patch, 
 Matcher-20071122-1ground.patch, Some Matchers.zip, Test20080111.patch





Re: [jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2008-01-31 Thread Marvin Humphrey


On Jan 31, 2008, at 9:29 AM, Mark Harwood (JIRA) wrote:

You could save a method invocation in cases like this if skipTo()  
returned the next doc id rather than a boolean. Returning a -1  
would be the equivalent of what used to be false.
Not tried benchmarking it but does this seem like something worth  
considering?


A contributor to KinoSearch persuaded me to have document numbers  
begin at 1 for this reason.


Marvin Humphrey
Rectangular Research
http://www.rectangular.com/






Re: [jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2008-01-31 Thread Paul Elschot
On Thursday 31 January 2008 18:29:12, Mark Harwood (JIRA) wrote:
 
 [ 
 https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12564415#action_12564415
  ] 
 
 Mark Harwood commented on LUCENE-584:
 -
 
 Hi Paul,
 Just eyeballed the code but not had a chance to patch and run it yet.
 
 I was wondering about the return type for skipTo() after looking at these 
 types of calls:
if (docIdSetIterator.skipTo(i) && (docIdSetIterator.doc() == i))
 
 You could save a method invocation in cases like this if skipTo() returned 
 the next doc id rather than a boolean.
 Returning a -1 would be the equivalent of what used to be false. 
 Not tried benchmarking it but does this seem like something worth considering?
 
 Cheers
 Mark

Performance is always worth consideration, but this is another
issue.

Returning -1 is not without cost either, it's a constant that needs
to be loaded on the called side and tested against on the calling side.
A returned boolean may have to be loaded and can be tested directly,
so with good inlining I'd expect it to be faster in the normal case in
which the document number is not needed immediately.
The code shown is likely from an explain() method, and not from
a next() or skipTo() implementation, and then it's not the normal case.

Less (using a boolean) is more (performance) in this case, I think, but
benchmarking may show something else.

This skipTo() is also Scorer.skipTo(), so a change there could have an even
bigger impact than a change in Filter. Have a look at the size of the take4
patch at LUCENE-584 before trying to change skipTo() at home :)

Regards,
Paul Elschot




[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2008-01-31 Thread Michael Busch (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12564620#action_12564620
 ] 

Michael Busch commented on LUCENE-584:
--

{quote}
You could save a method invocation in cases like this if skipTo() returned the 
next doc id rather than a boolean. Returning a -1 would be the equivalent of 
what used to be false.
{quote}

To change the signature of skipTo()  would be an API change, because with this 
patch Scorer extends DocIdSetIterator.

-Michael

 Decouple Filter from BitSet
 ---

 Key: LUCENE-584
 URL: https://issues.apache.org/jira/browse/LUCENE-584
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.0.1
Reporter: Peter Schäfer
Assignee: Michael Busch
Priority: Minor
 Fix For: 2.4

 Attachments: bench-diff.txt, bench-diff.txt, 
 ContribQueries20080111.patch, lucene-584-take2.patch, 
 lucene-584-take3-part1.patch, lucene-584-take3-part2.patch, 
 lucene-584-take4-part1.patch, lucene-584-take4-part2.patch, lucene-584.patch, 
 Matcher-20070905-2default.patch, Matcher-20070905-3core.patch, 
 Matcher-20071122-1ground.patch, Some Matchers.zip, Test20080111.patch





[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2008-01-15 Thread Eks Dev (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558997#action_12558997
 ] 

Eks Dev commented on LUCENE-584:


Michael, would this work? 
1. Provide a default implementation for the basic methods that uses the skipping 
iterator (which is always there), so it works by default for *all* implementations, 
something along the lines of:

/**
 * A DocIdSet contains a set of doc ids. Implementing classes must provide
 * a {@link DocIdSetIterator} to access the set. 
 */
public abstract class DocIdSet {
    public abstract DocIdSetIterator iterator();

    public DocIdSet and(DocIdSet dis) {
        // default implementation using *iterator*;
    }

    public DocIdSet or(DocIdSet dis) {
        // default implementation using iterator;
    }
}

2. And then we *optimize* particular cases, e.g.

public class DocIdBitSet extends DocIdSet {
    BitSet bits; // Must be there in order for iterator to work

    public DocIdSetIterator iterator() {
        // this is easy...
    }

    public DocIdSet and(DocIdSet dis) {
        if (dis instanceof DocIdBitSet) {
            // not exactly like this, but the idea is there
            this.bits.and(((DocIdBitSet) dis).bits);
            return this;
        }
        return super.and(dis);
    }
}

So it works always, and it works fast if need be, one instanceof check does not 
hurt there. Did I miss something obvious?
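
A self-contained version of this idea might look like the sketch below. The classes are simplified stand-ins that reuse the names from the thread, not the classes from the patch: the iterator has only next() and doc(), SortedIntDocIdSet is a hypothetical second implementation added to exercise the fallback, and and() returns a new set instead of modifying the receiver (an assumption made here to keep cached sets unchanged).

```java
import java.util.BitSet;

/** Simplified stand-in: next() advances, doc() reads the current doc id. */
interface DocIdSetIterator {
    boolean next();
    int doc();
}

abstract class DocIdSet {
    abstract DocIdSetIterator iterator();

    /** Default: merge-intersect both iterators, so it works for every implementation. */
    DocIdSet and(DocIdSet other) {
        BitSet result = new BitSet();
        DocIdSetIterator a = iterator();
        DocIdSetIterator b = other.iterator();
        boolean hasA = a.next();
        boolean hasB = b.next();
        while (hasA && hasB) {
            if (a.doc() < b.doc()) {
                hasA = a.next();
            } else if (a.doc() > b.doc()) {
                hasB = b.next();
            } else {                      // doc present in both sets
                result.set(a.doc());
                hasA = a.next();
                hasB = b.next();
            }
        }
        return new DocIdBitSet(result);
    }
}

final class DocIdBitSet extends DocIdSet {
    final BitSet bits;

    DocIdBitSet(BitSet bits) {
        this.bits = bits;
    }

    @Override
    DocIdSetIterator iterator() {
        return new DocIdSetIterator() {
            private int doc = -1;
            private boolean exhausted = false;

            @Override
            public boolean next() {
                if (exhausted) {
                    return false;
                }
                doc = bits.nextSetBit(doc + 1);
                exhausted = doc < 0;
                return !exhausted;
            }

            @Override
            public int doc() {
                return doc;
            }
        };
    }

    /** Optimized case: two bit sets are intersected word by word. */
    @Override
    DocIdSet and(DocIdSet other) {
        if (other instanceof DocIdBitSet) {
            BitSet result = (BitSet) bits.clone();
            result.and(((DocIdBitSet) other).bits);
            return new DocIdBitSet(result);
        }
        return super.and(other);
    }
}

/** Hypothetical second implementation, backed by an ascending int array. */
final class SortedIntDocIdSet extends DocIdSet {
    private final int[] docs;

    SortedIntDocIdSet(int... sortedDocs) {
        this.docs = sortedDocs;
    }

    @Override
    DocIdSetIterator iterator() {
        return new DocIdSetIterator() {
            private int pos = -1;

            @Override
            public boolean next() {
                if (pos < docs.length) {
                    pos++;
                }
                return pos < docs.length;
            }

            @Override
            public int doc() {
                return docs[pos];
            }
        };
    }
}
```

Intersecting two DocIdBitSet instances takes the fast path; intersecting a DocIdBitSet with a SortedIntDocIdSet falls back to the iterator-based default.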





 Decouple Filter from BitSet
 ---

 Key: LUCENE-584
 URL: https://issues.apache.org/jira/browse/LUCENE-584
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.0.1
Reporter: Peter Schäfer
Assignee: Michael Busch
Priority: Minor
 Fix For: 2.4

 Attachments: bench-diff.txt, bench-diff.txt, 
 ContribQueries20080111.patch, lucene-584-take2.patch, 
 lucene-584-take3-part1.patch, lucene-584-take3-part2.patch, 
 lucene-584-take4-part1.patch, lucene-584-take4-part2.patch, lucene-584.patch, 
 Matcher-20070905-2default.patch, Matcher-20070905-3core.patch, 
 Matcher-20071122-1ground.patch, Some Matchers.zip, Test20080111.patch





[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2008-01-15 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12559168#action_12559168
 ] 

Paul Elschot commented on LUCENE-584:
-

I indeed recall having a problem with remote filter caching. At the time I 
thought it was related to serialization but I could not resolve it that way. 
Never mind, it does not matter anymore.

BooleanFilter and ChainedFilter have the same issue here. As they provide just 
about the same functionality, could they perhaps be merged?

The solution using DocIdSet.and() and DocIdSet.or() looks good to me, but it 
will require some form of collector for the results, much like 
HitCollector.collect(doc, score) now and MatchCollector.collect(doc) in the 
Matcher...patch.
The boolean operations could then be accumulated into a BitSet or into an 
OpenBitSet, using a special case for DocId(Open)BitSet.
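
As a rough illustration of the collector idea, the sketch below pushes the results of a boolean operation into a collector that accumulates them in a java.util.BitSet. DocCollector, BitSetCollector and SortedDocOps are hypothetical names, and plain ascending int arrays stand in for the doc id iterators; this is not the MatchCollector from the Matcher patch.

```java
import java.util.BitSet;

/** Hypothetical collector: receives each matching doc id once. */
interface DocCollector {
    void collect(int doc);
}

/** Accumulates collected docs into a java.util.BitSet. */
final class BitSetCollector implements DocCollector {
    final BitSet bits = new BitSet();

    @Override
    public void collect(int doc) {
        bits.set(doc);
    }
}

/** Boolean operations over ascending doc id arrays, reporting results to a collector. */
final class SortedDocOps {
    private SortedDocOps() {
    }

    /** Union: every doc in a or b is collected exactly once, in ascending order. */
    static void or(int[] a, int[] b, DocCollector out) {
        int i = 0;
        int j = 0;
        while (i < a.length || j < b.length) {
            int doc;
            if (j >= b.length || (i < a.length && a[i] < b[j])) {
                doc = a[i++];            // only a has this doc (or b is exhausted)
            } else if (i >= a.length || b[j] < a[i]) {
                doc = b[j++];            // only b has this doc (or a is exhausted)
            } else {
                doc = a[i];              // both have it; advance both
                i++;
                j++;
            }
            out.collect(doc);
        }
    }

    /** Intersection: only docs present in both arrays are collected. */
    static void and(int[] a, int[] b, DocCollector out) {
        int i = 0;
        int j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                i++;
            } else if (a[i] > b[j]) {
                j++;
            } else {
                out.collect(a[i]);
                i++;
                j++;
            }
        }
    }
}
```

Because the operations only see the collector interface, the same code could feed a different accumulator without changing the merge logic.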

I'd like these boolean operations on DocIdSets to be general enough for use in 
Scorers, for example for the conjunctions in ConjunctionScorer, PhraseScorer 
and in the two NearSpans. But that is another issue.



 Decouple Filter from BitSet
 ---

 Key: LUCENE-584
 URL: https://issues.apache.org/jira/browse/LUCENE-584
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.0.1
Reporter: Peter Schäfer
Assignee: Michael Busch
Priority: Minor
 Fix For: 2.4

 Attachments: bench-diff.txt, bench-diff.txt, 
 ContribQueries20080111.patch, lucene-584-take2.patch, 
 lucene-584-take3-part1.patch, lucene-584-take3-part2.patch, 
 lucene-584-take4-part1.patch, lucene-584-take4-part2.patch, lucene-584.patch, 
 Matcher-20070905-2default.patch, Matcher-20070905-3core.patch, 
 Matcher-20071122-1ground.patch, Some Matchers.zip, Test20080111.patch





[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2008-01-14 Thread Michael Busch (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558856#action_12558856
 ] 

Michael Busch commented on LUCENE-584:
--

I think I understand now which problems you had when you wanted to 
change BooleanFilter and xml-query-parser to use the new Filter APIs.

BooleanFilter is optimized to utilize BitSets for performing boolean
operations fast. Now if we change BooleanFilter to use the new 
DocIdSetIterator, then it can't use the fast BitSet operations (e.g.
union for or, intersect for and) anymore. 

Now we can introduce BitSetFilter as you suggested and what I did in
the take4 patch. But here's the problem: Introducing subclasses of 
Filter doesn't play nicely with the caching mechanism in Lucene.
For example: if we change BooleanFilter to only work with 
BitSetFilters, then it won't work with a CachingWrapperFilter anymore,
because CachingWrapperFilter extends Filter. Then we would have to
introduce new CachingWrapper***Filter, for the different kinds of
Filter subclasses, which is a bad design as Mark pointed out in his
comment: 
https://issues.apache.org/jira/browse/LUCENE-584?focusedCommentId=12547901#action_12547901

One solution would be to add a getBitSet() method to DocIdBitSet.
DocIdBitSet is a new class that is basically just a wrapper around a
Java BitSet and provides a DocIdSetIterator to access the BitSet.

Then BooleanFilter could do something like this:
{code:java}
DocIdSet docIdSet = filter.getDocIdSet();
if (docIdSet instanceof DocIdBitSet) {
  BitSet bits = ((DocIdBitSet) docIdSet).getBitSet();
  ... // existing code
} else {
  throw new UnsupportedOperationException(
      "BooleanFilter only supports Filters that use DocIdBitSet.");
}
{code}
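
For readers without the patch at hand, here is a self-contained sketch of what such a wrapper might look like. The DocIdSet and DocIdSetIterator types below are minimal stand-ins written for this example, not the classes from the patch, and the iterator semantics (skipTo moves to the first doc at or after the target) are an assumption.

```java
import java.util.BitSet;

/** Minimal stand-in for the iterator type discussed in this issue. */
abstract class DocIdSetIterator {
  abstract int doc();
  abstract boolean next();
  abstract boolean skipTo(int target);
}

/** Minimal stand-in for the set type discussed in this issue. */
abstract class DocIdSet {
  abstract DocIdSetIterator iterator();
}

/** Wrapper around java.util.BitSet that also exposes the wrapped bits. */
final class DocIdBitSet extends DocIdSet {
  private final BitSet bitSet;

  DocIdBitSet(BitSet bitSet) {
    this.bitSet = bitSet;
  }

  /** The accessor suggested above, for callers that want bulk BitSet operations. */
  BitSet getBitSet() {
    return bitSet;
  }

  @Override
  DocIdSetIterator iterator() {
    return new DocIdSetIterator() {
      private int doc = -1;
      private boolean exhausted = false;

      @Override
      int doc() {
        return doc;
      }

      @Override
      boolean next() {
        return skipTo(doc + 1);
      }

      @Override
      boolean skipTo(int target) {
        if (exhausted) {
          return false;
        }
        int found = bitSet.nextSetBit(target);
        if (found < 0) {
          exhausted = true; // stay exhausted on later calls
          return false;
        }
        doc = found;
        return true;
      }
    };
  }
}
```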

But then, changing the core filters to use OpenBitSets instead of
Java BitSets is technically an API change, because BooleanFilter
would not work anymore with the core filters.

So if we took this approach we would have to wait until 3.0 to move
the core from BitSet to OpenBitSet and also change BooleanFilter 
then to use OpenBitSets. BooleanFilter could then also work with
either of the two BitSet implementations, but probably not with those
two mixed.

Any feedback about this is very welcome. I'll try to further think
about how to marry the new Filter API, caching mechanism and Filter
implementations like BooleanFilter nicely.

 Decouple Filter from BitSet
 ---

 Key: LUCENE-584
 URL: https://issues.apache.org/jira/browse/LUCENE-584
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.0.1
Reporter: Peter Schäfer
Assignee: Michael Busch
Priority: Minor
 Fix For: 2.4

 Attachments: bench-diff.txt, bench-diff.txt, 
 ContribQueries20080111.patch, lucene-584-take2.patch, 
 lucene-584-take3-part1.patch, lucene-584-take3-part2.patch, 
 lucene-584-take4-part1.patch, lucene-584-take4-part2.patch, lucene-584.patch, 
 Matcher-20070905-2default.patch, Matcher-20070905-3core.patch, 
 Matcher-20071122-1ground.patch, Some Matchers.zip, Test20080111.patch





[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2008-01-11 Thread Michael Busch (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12557935#action_12557935
 ] 

Michael Busch commented on LUCENE-584:
--

{quote}
As for PrefixGenerator:
in my (up to date) trunk directory, this command: find . -name '*PrefixGenerator*'
only gave this result: 
./build/classes/java/org/apache/lucene/search/PrefixGenerator.class
and that disappeared after ant clean.
It seems that the source class was removed from the trunk.
{quote}

As I said, PrefixGenerator is defined in PrefixFilter.java.

 Decouple Filter from BitSet
 ---

 Key: LUCENE-584
 URL: https://issues.apache.org/jira/browse/LUCENE-584
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.0.1
Reporter: Peter Schäfer
Assignee: Michael Busch
Priority: Minor
 Fix For: 2.4

 Attachments: bench-diff.txt, bench-diff.txt, lucene-584-take2.patch, 
 lucene-584-take3-part1.patch, lucene-584-take3-part2.patch, lucene-584.patch, 
 Matcher-20070905-2default.patch, Matcher-20070905-3core.patch, 
 Matcher-20071122-1ground.patch, Some Matchers.zip





[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2008-01-11 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12557955#action_12557955
 ] 

Paul Elschot commented on LUCENE-584:
-

I'm sorry about my PrefixGenerator remarks, I did not read your answer 
accurately.

On the take4 patch of 11 Jan 2008:

I have started in a fresh trunk checkout that passed all tests.
Both parts of take4 apply cleanly, using patch -p0 < ... .
ant jar, ant test-core and ant test-contrib all pass nicely.

I remember having problems with moving contrib/xml-queryparser from Filter
to BitSetFilter, see my comment of 30 July 2007.
So I'd like to verify that this can be done, and I hope Mark Harwood can give
some hints as to how to do this.

For me, this was the main reason to make this move:
from Filter with subclass BitSetFilter (as in the take4 patch, and in my first 
attempts)
to MatchFilter with subclass Filter (as in Matcher... patches of Sep and Nov 
2007).
In these Matcher... patches no changes were necessary to 
contrib/xml-queryparser.


Less important for now:

The test classes extend TestCase, but iirc there is also a LuceneTestCase for 
this.

On the take4 patch ant javadocs-core gives this:
BitSetFilter.java:40: warning - Tag @link: reference not found: 
DocIdBitSetIterator


 Decouple Filter from BitSet
 ---

 Key: LUCENE-584
 URL: https://issues.apache.org/jira/browse/LUCENE-584
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.0.1
Reporter: Peter Schäfer
Assignee: Michael Busch
Priority: Minor
 Fix For: 2.4

 Attachments: bench-diff.txt, bench-diff.txt, lucene-584-take2.patch, 
 lucene-584-take3-part1.patch, lucene-584-take3-part2.patch, 
 lucene-584-take4-part1.patch, lucene-584-take4-part2.patch, lucene-584.patch, 
 Matcher-20070905-2default.patch, Matcher-20070905-3core.patch, 
 Matcher-20071122-1ground.patch, Some Matchers.zip





[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2008-01-11 Thread Eks Dev (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12557999#action_12557999
 ] 

Eks Dev commented on LUCENE-584:


it looks like ChainedFilter could become obsolete if Filter/DocIdSetIterator 
gets added as a Clause to the BooleanQuery? I am thinking along these lines: 
ChainedFilter evaluates a boolean expression of docIds, which is exactly what 
BooleanQuery does, plus a bit more (scoring)... 

 Decouple Filter from BitSet
 ---

 Key: LUCENE-584
 URL: https://issues.apache.org/jira/browse/LUCENE-584
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.0.1
Reporter: Peter Schäfer
Assignee: Michael Busch
Priority: Minor
 Fix For: 2.4

 Attachments: bench-diff.txt, bench-diff.txt, 
 ContribQueries20080111.patch, lucene-584-take2.patch, 
 lucene-584-take3-part1.patch, lucene-584-take3-part2.patch, 
 lucene-584-take4-part1.patch, lucene-584-take4-part2.patch, lucene-584.patch, 
 Matcher-20070905-2default.patch, Matcher-20070905-3core.patch, 
 Matcher-20071122-1ground.patch, Some Matchers.zip





[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2008-01-11 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558076#action_12558076
 ] 

Paul Elschot commented on LUCENE-584:
-

{quote}
it looks like ChainedFilter could become obsolete if Filter/DocIdSetIterator 
gets added as a Clause to the BooleanQuery?
{quote}

The function is indeed the same, but ChainedFilter works directly on BitSets 
and BooleanQuery works on input Scorers/DocIdSetIterators and outputs collected 
docids (and score values). Working directly on (Open)BitSets is normally 
faster, so ChainedFilter can have a good use.
And boolean operations on DocIdSets are not (yet) directly available in Lucene. 
The various boolean scorers have the logic, but currently only for Scorers.

That leaves the question on what to do with ChainedFilter here. Any ideas?
The easiest way is to open another issue for it. This will have to be resolved 
before Filter.bits() is removed.


 Decouple Filter from BitSet
 ---

 Key: LUCENE-584
 URL: https://issues.apache.org/jira/browse/LUCENE-584
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.0.1
Reporter: Peter Schäfer
Assignee: Michael Busch
Priority: Minor
 Fix For: 2.4

 Attachments: bench-diff.txt, bench-diff.txt, 
 ContribQueries20080111.patch, lucene-584-take2.patch, 
 lucene-584-take3-part1.patch, lucene-584-take3-part2.patch, 
 lucene-584-take4-part1.patch, lucene-584-take4-part2.patch, lucene-584.patch, 
 Matcher-20070905-2default.patch, Matcher-20070905-3core.patch, 
 Matcher-20071122-1ground.patch, Some Matchers.zip





[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2008-01-11 Thread Eks Dev (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558115#action_12558115
 ] 

Eks Dev commented on LUCENE-584:


hmm, in order to have fast and/or operations we need to know the type of the 
underlying object in Filter, and sometimes we must use iterators (e.g. the case 
where one Filter/DocIdSet is an int list and another a hash bit set). I guess 
knowing the type of DocIdSet is the trick to pull. 
 
The default implementation of ChainedFilter (there is also BooleanFilter somewhere 
in contrib, I like it more) should use iterators (like scorers), and at a 
few key points check: 
if (first instanceof SomeInstanceOfDocIdSet && second instanceof SomeInstanceOfDocIdSet) 
first.doFastOR/AND(second);

Something in that direction looks reasonable to me for ChainedFilter, 
if it proves to be really better to have it around. I am still of the opinion 
that it would be better to integrate DocIdSet into BooleanQuery as a clause, 
somehow; that would be some kind of ConstantBoolean(MUST/SHOULD/NOT)Clause, 
much cleaner from a design/usability point of view, even at some minor penalty in 
performance (anyhow, you can always combine filters before you enter scorers). 
But you are right, that is another issue... let us stop polluting this issue :) 
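
The type check plus iterator fallback suggested in this comment could be sketched as follows for the AND case. Everything here is hypothetical: DocIterator, DocSet, BitDocSet, IntListDocSet and FastAnd are names made up for the example, and advance(target) returning -1 on exhaustion is an assumed convention, not an existing Lucene API.

```java
import java.util.BitSet;

/** Hypothetical iterator: advance(target) returns the first doc >= target, or -1. */
interface DocIterator {
  int advance(int target);
}

/** Hypothetical doc id set that can always be iterated. */
interface DocSet {
  DocIterator iterator();
}

/** Set backed by a java.util.BitSet. */
final class BitDocSet implements DocSet {
  final BitSet bits;

  BitDocSet(BitSet bits) {
    this.bits = bits;
  }

  @Override
  public DocIterator iterator() {
    return target -> bits.nextSetBit(target);
  }
}

/** Set backed by an ascending int list without duplicates. */
final class IntListDocSet implements DocSet {
  private final int[] sorted;

  IntListDocSet(int... sorted) {
    this.sorted = sorted;
  }

  @Override
  public DocIterator iterator() {
    return new DocIterator() {
      private int pos = 0;

      @Override
      public int advance(int target) {
        while (pos < sorted.length && sorted[pos] < target) {
          pos++;
        }
        return pos < sorted.length ? sorted[pos] : -1;
      }
    };
  }
}

final class FastAnd {
  static BitSet and(DocSet first, DocSet second) {
    // Fast path: both sides are bit sets, so use the bulk operation.
    if (first instanceof BitDocSet && second instanceof BitDocSet) {
      BitSet result = (BitSet) ((BitDocSet) first).bits.clone();
      result.and(((BitDocSet) second).bits);
      return result;
    }
    // Generic path: leapfrog the two iterators, as the conjunction scorers do.
    BitSet result = new BitSet();
    DocIterator a = first.iterator();
    DocIterator b = second.iterator();
    int docA = a.advance(0);
    int docB = b.advance(0);
    while (docA >= 0 && docB >= 0) {
      if (docA == docB) {
        result.set(docA);
        docA = a.advance(docA + 1);
        docB = b.advance(docB + 1);
      } else if (docA < docB) {
        docA = a.advance(docB);
      } else {
        docB = b.advance(docA);
      }
    }
    return result;
  }
}
```

The generic path never needs to know the concrete types, so mixed combinations such as an int list with a bit set still work, just without the bulk operation.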


 Decouple Filter from BitSet
 ---

 Key: LUCENE-584
 URL: https://issues.apache.org/jira/browse/LUCENE-584
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.0.1
Reporter: Peter Schäfer
Assignee: Michael Busch
Priority: Minor
 Fix For: 2.4

 Attachments: bench-diff.txt, bench-diff.txt, 
 ContribQueries20080111.patch, lucene-584-take2.patch, 
 lucene-584-take3-part1.patch, lucene-584-take3-part2.patch, 
 lucene-584-take4-part1.patch, lucene-584-take4-part2.patch, lucene-584.patch, 
 Matcher-20070905-2default.patch, Matcher-20070905-3core.patch, 
 Matcher-20071122-1ground.patch, Some Matchers.zip





[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2008-01-10 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12557595#action_12557595
 ] 

Paul Elschot commented on LUCENE-584:
-

On the take3 patch of 10 Jan 2008:

SortedVIntList extends DocIdSet: nice, thanks.

PrefixGenerator is used but not defined in the patch, so it will not compile.

Nevertheless, with all tests passing, I think this is a good way to
make Filter independent of BitSet.


Minor concerns:

There is neither a BitSetFilter nor an OpenBitSetFilter in the patch.
These might be useful for existing code currently implementing Filter
to overcome the deprecation of Filter.bits().
With the current core moving to OpenBitSet, it might also use an
explicit OpenBitSetFilter.

Some javadoc changes did not make it into the take3 patch, I'll check these 
later.

FilteredQuery.explain(): When a document does not pass the Filter
I think it would be better not to use setValue(0.0f) on the resulting
Explanation. However, this may be necessary for backward compatibility.


For the future:

About adding a Filter as a clause to BooleanScorer, and adding
DocIdSetIterator as a Scorer to ConjunctionScorer:
This is the reason for the CHECKME in IndexSearcher for using
ConjunctionScorer when a filter is given.
A ConjunctionScorer that accepts a DocIdSetIterator could also be used in
FilteredQuery.

 Decouple Filter from BitSet
 ---

 Key: LUCENE-584
 URL: https://issues.apache.org/jira/browse/LUCENE-584
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.0.1
Reporter: Peter Schäfer
Assignee: Michael Busch
Priority: Minor
 Fix For: 2.4

 Attachments: bench-diff.txt, bench-diff.txt, lucene-584-take2.patch, 
 lucene-584-take3-part1.patch, lucene-584-take3-part2.patch, lucene-584.patch, 
 Matcher-20070905-2default.patch, Matcher-20070905-3core.patch, 
 Matcher-20071122-1ground.patch, Some Matchers.zip





[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2008-01-10 Thread Michael Busch (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12557755#action_12557755
 ] 

Michael Busch commented on LUCENE-584:
--

{quote}
On the take3 patch of 10 Jan 2008:
{quote}

Thanks for the review!

{quote}
PrefixGenerator is used but not defined in the patch, so it will not compile.
{quote}

Not sure I understand what you mean. PrefixGenerator is (and was) 
defined in PrefixFilter.java. It compiles for me.

{quote}
There is neither a BitSetFilter nor an OpenBitSetFilter in the patch.
These might be useful for existing code currently implementing Filter
to overcome the deprecation of Filter.bits().
With the current core moving to OpenBitSet, it might also use an
explicit OpenBitSetFilter.
{quote}

I think that it should be straightforward for users having filters that use
BitSets to wrap the new DocIdBitSet around the BitSet, just as Filter currently 
does for backwards compatibility?

{quote}
Some javadoc changes did not make it into the take3 patch, I'll check these 
later.
{quote}

Oh, which ones?

{quote}
FilteredQuery.explain(): When a document does not pass the Filter
I think it would be better not to use setValue(0.0f) on the resulting
Explanation. However, this may be necessary for backward compatibility.
{quote}

Yeah, it used to work this way, that's why I didn't change it for backwards-
compatibility reasons.

{quote}
About adding a Filter as a clause to BooleanScorer, and adding
DocIdSetIterator as a Scorer to ConjunctionScorer:
This is the reason for the CHECKME in IndexSearcher for using
ConjunctionScorer when a filter is given.
A ConjunctionScorer that accepts a DocIdSetIterator could also be used in
FilteredQuery.
{quote}

Well, let's address this with a different issue after this one is committed.
I might have some concerns here, but I've to further think about it.

 Decouple Filter from BitSet
 ---

 Key: LUCENE-584
 URL: https://issues.apache.org/jira/browse/LUCENE-584
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.0.1
Reporter: Peter Schäfer
Assignee: Michael Busch
Priority: Minor
 Fix For: 2.4

 Attachments: bench-diff.txt, bench-diff.txt, lucene-584-take2.patch, 
 lucene-584-take3-part1.patch, lucene-584-take3-part2.patch, lucene-584.patch, 
 Matcher-20070905-2default.patch, Matcher-20070905-3core.patch, 
 Matcher-20071122-1ground.patch, Some Matchers.zip





[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2008-01-10 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12557814#action_12557814
 ] 

Paul Elschot commented on LUCENE-584:
-

As for PrefixGenerator:
in my (up to date) trunk directory, this command: find . -name 
'*PrefixGenerator*'
only gave this result: 
./build/classes/java/org/apache/lucene/search/PrefixGenerator.class
and that disappeared after ant clean.
It seems that the source class was removed from the trunk.

{quote}
I think that it should be straightforward for users having filters that use
BitSets to wrap the new DocIdBitSet around the BitSet, just as Filter currently
does for backwards compatibility?
{quote}

BitSetFilter would inherit from Filter, and have an abstract bits() method, not 
deprecated.
This would be useful for people that don't want to move to OpenBitSet yet.
A rename (and maybe a package change) from Filter to BitSetFilter should be 
sufficient
in their code to get rid of the deprecation warning for Filter.bits().
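
A rough sketch of the BitSetFilter idea, to make the migration path concrete. IndexReader, DocIdSet and Filter below are simplified stand-ins written only for this example (the stand-in DocIdSet offers a contains() lookup instead of an iterator to keep it short), and EvenDocsFilter is a made-up legacy filter.

```java
import java.util.BitSet;

/** Stand-in for org.apache.lucene.index.IndexReader; only maxDoc() is needed here. */
interface IndexReader {
  int maxDoc();
}

/** Simplified stand-in for the new result type. */
interface DocIdSet {
  boolean contains(int doc);
}

/** Stand-in for Filter after the change: getDocIdSet() is the primary method. */
abstract class Filter {
  abstract DocIdSet getDocIdSet(IndexReader reader);
}

/** Sketch of BitSetFilter: existing filters keep implementing bits(). */
abstract class BitSetFilter extends Filter {
  /** Abstract and not deprecated in this subclass. */
  abstract BitSet bits(IndexReader reader);

  @Override
  final DocIdSet getDocIdSet(IndexReader reader) {
    BitSet bits = bits(reader);
    return doc -> bits.get(doc); // wrap the BitSet in the new result type
  }
}

/** Made-up legacy filter: only the superclass name had to change. */
final class EvenDocsFilter extends BitSetFilter {
  @Override
  BitSet bits(IndexReader reader) {
    BitSet bits = new BitSet(reader.maxDoc());
    for (int doc = 0; doc < reader.maxDoc(); doc += 2) {
      bits.set(doc);
    }
    return bits;
  }
}
```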

OpenBitSetFilter similar, and that could be used in a few places in the patch 
iirc.

The javadoc changes I meant came with Matcher and use 'match' consistently for 
documents
that are collected during a query search.



 Decouple Filter from BitSet
 ---

 Key: LUCENE-584
 URL: https://issues.apache.org/jira/browse/LUCENE-584
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.0.1
Reporter: Peter Schäfer
Assignee: Michael Busch
Priority: Minor
 Fix For: 2.4

 Attachments: bench-diff.txt, bench-diff.txt, lucene-584-take2.patch, 
 lucene-584-take3-part1.patch, lucene-584-take3-part2.patch, lucene-584.patch, 
 Matcher-20070905-2default.patch, Matcher-20070905-3core.patch, 
 Matcher-20071122-1ground.patch, Some Matchers.zip





[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-12-04 Thread Mark Harwood (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548208
 ] 

Mark Harwood commented on LUCENE-584:
-

For the data structures (bitset/openbitset/sorted VintList/) I would suggest 
one of these: IntSet, IntegerSet or IntegerSequence as names for the common 
interface.
I did a quick Google for IntegerSet and you are in the number one spot, Paul :) 
[http://www.google.com/search?hl=en&q=integerset+bitset]

// A cachable, immutable, sorted, threadsafe collection of ints.
interface IntegerSet
{
   IntegerSetIterator getIterator();
   int size(); //negative numbers could be used to represent estimates?
}

// A single-use thread-unsafe iterator.
interface IntegerSetIterator
{
boolean next();
boolean skipTo(int next);
int currentValue();
}
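
To illustrate how small an implementation of these two interfaces can be, here is a sketch backed by a sorted int array. SortedArrayIntegerSet is a hypothetical name, and the skipTo semantics (always advance at least once, then stop at the first value at or after the target) are an assumption borrowed from how scorers behave, not something the interfaces above specify.

```java
import java.util.Arrays;

/** The two interfaces from the comment, repeated so the sketch compiles on its own. */
interface IntegerSet {
  IntegerSetIterator getIterator();
  int size();
}

interface IntegerSetIterator {
  boolean next();
  boolean skipTo(int next);
  int currentValue();
}

/** Hypothetical immutable implementation; the given values are assumed to be distinct. */
final class SortedArrayIntegerSet implements IntegerSet {
  private final int[] values; // ascending, never modified after construction

  SortedArrayIntegerSet(int... values) {
    int[] copy = values.clone();
    Arrays.sort(copy);
    this.values = copy;
  }

  @Override
  public IntegerSetIterator getIterator() {
    // Each caller gets its own single-use iterator over the shared array.
    return new IntegerSetIterator() {
      private int pos = -1;

      @Override
      public boolean next() {
        if (pos < values.length) {
          pos++;
        }
        return pos < values.length;
      }

      @Override
      public boolean skipTo(int target) {
        do {
          if (!next()) {
            return false;
          }
        } while (values[pos] < target);
        return true;
      }

      @Override
      public int currentValue() {
        // Only valid after next() or skipTo() has returned true.
        return values[pos];
      }
    };
  }

  @Override
  public int size() {
    return values.length;
  }
}
```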

If _detailed_ explanations of hits are required these should really sit with 
the source not the result- i.e. with the Filters. They contain all the match 
criteria used to populate IntegerSets  and can be thought of more generically 
as IntegerSetBuilder. 

// Contains criteria to create a set of matching documents. MUST implement
// hashCode and equals based on these criteria to enable use as cache keys for
// IntegerSets.
interface IntegerSetBuilder extends Serializable
{
  IntegerSet build(IndexReader reader);
  Explanation explain(int docNr);
}



A single CachingIntegerSetBuilder class would be able to take ANY 
IntegerSetBuilder as a source, cache ANY type of IntegerSet they produced and 
defer back to the original IntegerSetBuilder for a full and thorough 
explanation of a match even when the match occurred on a cached IntegerSet, if 
required.

class CachingIntegerSetBuilder implements IntegerSetBuilder
{
 private WeakHashMap perIndexReaderCache;
 public CachingIntegerSetBuilder(IntegerSetBuilder delegate) {}
 .
}
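
A minimal, self-contained sketch of the caching idea above, under stated 
assumptions: FakeIndexReader is a stand-in for Lucene's IndexReader, the 
interfaces are cut down to one method each, and explain() is left out. None of 
this is taken from an attached patch.

```java
import java.util.Map;
import java.util.WeakHashMap;

// Stand-in for Lucene's IndexReader (hypothetical, for illustration only).
final class FakeIndexReader {}

// Cut-down version of the IntegerSet interface proposed above.
interface IntegerSet {
  int size();
}

// Cut-down IntegerSetBuilder: explain() is omitted in this sketch.
interface IntegerSetBuilder {
  IntegerSet build(FakeIndexReader reader);
}

// Caches whatever IntegerSet the delegate produces, one entry per reader.
final class CachingIntegerSetBuilder implements IntegerSetBuilder {
  private final IntegerSetBuilder delegate;
  // Weak keys let an entry go away once its reader is no longer referenced.
  private final Map<FakeIndexReader, IntegerSet> perReaderCache =
      new WeakHashMap<FakeIndexReader, IntegerSet>();

  CachingIntegerSetBuilder(IntegerSetBuilder delegate) {
    this.delegate = delegate;
  }

  @Override
  public synchronized IntegerSet build(FakeIndexReader reader) {
    IntegerSet cached = perReaderCache.get(reader);
    if (cached == null) {
      cached = delegate.build(reader); // the expensive step, once per reader
      perReaderCache.put(reader, cached);
    }
    return cached;
  }
}
```

The wrapper never looks at the concrete type of the IntegerSet it stores, which 
is the point being argued: one cache class can serve every data structure.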

The reason for introducing IntegerSetBuilder as a more generic name than 
Filter is that IntegerSets have uses outside of filtering, e.g. for category 
counts or clustering. In these use cases they have nothing to do with 
filtering. It may actually be better named DocIdSetBuilder, given that it is 
tied to Lucene's IndexReader and therefore limited to producing sets of 
document ids.





[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-12-04 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548341
 ] 

Paul Elschot commented on LUCENE-584:
-

For the moment a DocId is an int, but that might change to long sooner than we 
think. So DocIdSet... would be a better name than IntegerSet..., and it's 
better to use an abstract superclass than an interface:

{code}
abstract class DocIdSetIterator {
  abstract boolean next();
  abstract boolean skipTo(int next);
  abstract int doc();
}

// and the rest is in the patch, except the superclass for Matcher:

abstract class Matcher extends DocIdSetIterator {
  abstract Explanation explain(int doc);
}

abstract class Scorer extends Matcher {
  abstract float score();
  ...
}
{code}

Would this DocIdSetIterator be close enough to the IntegerSetIterator?
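
To make the iterator contract concrete, here is a small sketch of one possible 
implementation over a sorted int array. SortedArrayIterator is a hypothetical 
name, and the skipTo() behaviour (always advance at least one position, then 
stop at the first doc >= target) is an assumption modelled on how Scorer.skipTo() 
is documented, not something stated in this comment.

```java
// Iterator contract as sketched above; not the class from the patch.
abstract class DocIdSetIterator {
  abstract boolean next();              // move to the next doc, false when exhausted
  abstract boolean skipTo(int target);  // move forward to the first doc >= target
  abstract int doc();                   // current doc, valid after a call returned true
}

// Hypothetical implementation backed by an ascending array of doc ids.
final class SortedArrayIterator extends DocIdSetIterator {
  private final int[] docs;
  private int pos = -1; // before the first element

  SortedArrayIterator(int[] sortedDocs) {
    this.docs = sortedDocs;
  }

  @Override
  boolean next() {
    if (pos < docs.length) {
      pos++;
    }
    return pos < docs.length;
  }

  @Override
  boolean skipTo(int target) {
    // Linear scan keeps the sketch short; a real implementation could binary search.
    while (next()) {
      if (docs[pos] >= target) {
        return true;
      }
    }
    return false;
  }

  @Override
  int doc() {
    return docs[pos];
  }
}
```

The iterator holds only a position, so it is cheap to create and single-use, 
while the array it walks is the part worth caching.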





[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-12-03 Thread Mark Harwood (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12547901
 ] 

Mark Harwood commented on LUCENE-584:
-

To go back to post #1 on this topic:

   _Sparsely populated =java.util.BitSet=s are not efficient and waste lots of 
memory. It would be desirable to have an alternative BitSet implementation with 
smaller memory footprint._

Given the motivation to move to more memory-efficient structures, why is the 
only attempt at caching dedicated exclusively to caching the very structures we 
were trying to move away from?

   _I deprecated also CachingWrapperFilter and RemoteCachingWrapperFilter 
and added corresponding CachingBitSetFilter and RemoteCachingBitSetFilter_

Does this suggest we are to have type-specific CachingXFilters and 
RemoteCachingXFilters created for every new filter type? Why not provide a 
single caching mechanism that works for all those other, new, more 
memory-efficient structures? I believe the reason this hasn't been done is the 
issue I highlighted earlier: the cachable artefacts (what I chose to call 
DocIdSet here: [#action_12518642] ) are not modelled in a way which promotes 
re-use. That's why we would end up needing a specialised caching 
implementation for each type. 

If we are to move forward from the existing Lucene implementation it's 
important to note the change:

* Filters currently produce, at great cost, BitSets. BitSets provide both a 
cachable data structure and a thread-safe, reusable means of iterating across 
the contents.

* By replacing BitSets with Matchers this proposal has removed an important 
aspect of the existing design: the visibility (and therefore cachability) of 
these expensive-to-recreate data structures. Matchers are single-use, 
non-threadsafe objects and hide the data structure over which they iterate. 
With this change, if I want to implement a caching mechanism in my application 
I need to know the Filter type and what sort of data structure it returns, and 
get the data structure from the Filter directly:
  if (myFilter instanceof BitSetFilter)
    wrap specific data structure using CachingBitSetFilter
  else if (myFilter instanceof OpenBitSetFilter)
    wrap specific data structure using CachingXFilter
  else ...

...looks like an anti-pattern to me. Worse, this ties the choice of 
data structure to the type of Filter that produces it. Why can't my RangeFilter 
be free to create a SortedVIntList or a BitSet depending on the sparseness of 
matches for a particular set of criteria?

I'm not saying let's just stick with BitSets, just that caching should be 
considered more in the design. Post [#action_12518642] lays out how this could 
be modelled with the introduction of DocIdSet and DocIdSetIterator as separate 
responsibilities (whereas Matcher currently combines them both).
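
The RangeFilter question above can be illustrated with a short sketch in which 
the producer picks the representation per result. Everything here is 
hypothetical: the DocIdSet methods, the class names, and the 1-in-32 cut-off 
(one bit per document versus one 32-bit int per match) are assumptions for 
illustration, and a plain sorted int[] stands in for SortedVIntList.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical common interface; the thread has not settled on names or methods.
interface DocIdSet {
  int size();
  boolean contains(int doc);
}

// Dense representation: one bit per document in the index.
final class BitSetDocIdSet implements DocIdSet {
  private final BitSet bits;
  BitSetDocIdSet(BitSet bits) { this.bits = bits; }
  public int size() { return bits.cardinality(); }
  public boolean contains(int doc) { return bits.get(doc); }
}

// Sparse representation: ascending array of matching doc ids.
final class SortedIntDocIdSet implements DocIdSet {
  private final int[] docs;
  SortedIntDocIdSet(int[] docs) { this.docs = docs; }
  public int size() { return docs.length; }
  public boolean contains(int doc) { return Arrays.binarySearch(docs, doc) >= 0; }
}

final class DocIdSets {
  // Picks the smaller representation. The cut-off is an assumption, not a
  // measured value: 32 bits per match versus maxDoc bits in total.
  static DocIdSet choose(int[] sortedMatches, int maxDoc) {
    if ((long) sortedMatches.length * 32 < maxDoc) {
      return new SortedIntDocIdSet(sortedMatches);
    }
    BitSet bits = new BitSet(maxDoc);
    for (int doc : sortedMatches) {
      bits.set(doc);
    }
    return new BitSetDocIdSet(bits);
  }
}
```

A caller, including a cache, only ever sees DocIdSet, so no instanceof chain is 
needed to store or reuse the result.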

Cheers
Mark

















[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-12-03 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12547895
 ] 

Paul Elschot commented on LUCENE-584:
-

A few remarks on the lucene-584-take2 patch:

In the @deprecated javadoc at Filter.bits() a reference to BitSetFilter could 
be added.

While Filter.bits() is still deprecated, one could also use the BitSet in 
IndexSearcher in case this turns out to be performance sensitive; see also my 
remark of 28 November.

A few complete (test) classes are deprecated; it might be good to add the 
target release for removal there.

For the rest this patch looks good to me. Did you also run ant test-contrib ?




[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-12-03 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12547958
 ] 

Paul Elschot commented on LUCENE-584:
-

Mark, in the latest Matcher-2default.patch there is the 
org.apache.lucene.MatcherProvider interface with this javadoc:

/** To be used in a cache to implement caching for a MatchFilter. */

This interface has only one method:

public Matcher getMatcher();


There is also a cache for filters in the Matcher3core.patch, in the class 
CachingWrapperFilter.

Would those be a good starting point?





[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-12-03 Thread Mark Harwood (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12547988
 ] 

Mark Harwood commented on LUCENE-584:
-

I'm getting lost as to which patches we're considering here. I was looking at 
lucene-584-take2 patch.

MatcherProvider in the earlier patch does look like something that will help 
with caching.

Would those be a good starting point?

Overall I feel uncomfortable with a lot of the class names. I think the use of 
Matcher says more about what you want to do with the class in this particular 
case than about what _it_ does generally. I have other uses in mind for these 
classes that are outside of filtering search results. For me, these classes can 
be thought of much more simply as utility classes in the same mould as the Java 
Collections API. Fundamentally, they are efficient implementations of 
sets/lists of integers with support for iterators. The whole thing would be a 
lot cleaner if classes were named around this scheme.
MatcherProvider, for example, is essentially a DocIdSet which creates forms of 
DocIdSetIterators (Matchers) and could also usefully have a size() method. 






[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-12-03 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548030
 ] 

Paul Elschot commented on LUCENE-584:
-

In case there is a better name than Matcher for a Scorer without a score() 
method (and maybe without an explain() method), I'm all ears. Names are 
important, and at this point they can still be changed very easily.

For Matcher I'd rather have a method to estimate the number of matching docs 
than a size() method. This estimate would be useful in implementing 
conjunctions, as the Matchers with the lowest estimates could be used first. 
However, this is another issue.
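
A rough sketch of why such an estimate helps with conjunctions: the matcher 
with the lowest estimate leads, so the others are only asked about its few 
candidates. The Matcher shape below is hypothetical (in particular estimate() 
and a skipTo() that may stay on the current doc are assumptions of this sketch, 
not the patch's API), and ArrayMatcher exists only to make the example runnable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical Matcher with an estimate; names and semantics are assumptions.
abstract class Matcher {
  abstract boolean skipTo(int target); // position on first doc >= target, may stay put
  abstract int doc();                  // current doc after skipTo() returned true
  abstract int estimate();             // rough number of matching docs
}

// Toy matcher over an ascending array, only here to drive the example.
final class ArrayMatcher extends Matcher {
  private final int[] docs;
  private int pos = 0;
  ArrayMatcher(int... docs) { this.docs = docs; }
  boolean skipTo(int target) {
    while (pos < docs.length && docs[pos] < target) pos++;
    return pos < docs.length;
  }
  int doc() { return docs[pos]; }
  int estimate() { return docs.length; }
}

final class Conjunction {
  static List<Integer> intersect(Matcher... matchers) {
    List<Integer> hits = new ArrayList<Integer>();
    if (matchers.length == 0) return hits;
    // Lowest estimate first: the sparsest matcher proposes the candidates.
    Matcher[] ordered = matchers.clone();
    Arrays.sort(ordered, new Comparator<Matcher>() {
      public int compare(Matcher a, Matcher b) {
        return Integer.compare(a.estimate(), b.estimate());
      }
    });
    Matcher lead = ordered[0];
    int candidate = 0;
    outer:
    while (lead.skipTo(candidate)) {
      candidate = lead.doc();
      for (int i = 1; i < ordered.length; i++) {
        if (!ordered[i].skipTo(candidate)) return hits; // one matcher is exhausted
        if (ordered[i].doc() > candidate) {
          candidate = ordered[i].doc(); // overshoot: let the lead catch up
          continue outer;
        }
      }
      hits.add(candidate); // every matcher sits on the candidate
      candidate++;
    }
    return hits;
  }
}
```

An exact size() would work for the ordering too, but an estimate is enough, 
which is presumably why it is the cheaper thing to ask of a Matcher.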





[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-12-02 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12547612
 ] 

Paul Elschot commented on LUCENE-584:
-

I tried implementing a Searchable, and indeed ran into compilation errors.
So, backward compatibility is indeed not complete.

Also, Searchable is an interface, so it should not be changed.
In case there are other interfaces affected by the patch, these should not be 
changed either.

There are two ways out of this:

Rename MatchFilter/Filter to Filter/BitSetFilter.
Changing the current Filter to BitSetFilter gives other problems with contrib 
packages. I tried this some time ago, see above, but I could not make it work.

I'd prefer to add an interface (or abstract class?) like Searchable that uses 
MatchFilter, for those implementers that want to take advantage of MatchFilter.
I don't expect problems from leaving the Searchable interface available 
unchanged. Other interfaces that use Filter can be treated the same way, in 
case there are any.







[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-12-02 Thread Michael Busch (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12547681
 ] 

Michael Busch commented on LUCENE-584:
--

Why do we actually need the new MatchFilter class at all?
Filter is an abstract class, not an interface. So I think we could
simply add the new method getMatcher() like you already did
in your patch:

{code:java}
  /**
   * @return A Matcher constructed from the provided BitSet.
   * @see DefaultMatcher#defaultMatcher(BitSet)
   */
  public Matcher getMatcher(IndexReader reader) throws IOException {
    return new BitSetMatcher(bits(reader));
  }
{code}

This shouldn't break existing Filter implementations? 
Maybe I'm missing an apparent reason why we need the MatchFilter
class?




[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-12-02 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12547686
 ] 

Paul Elschot commented on LUCENE-584:
-

For example, OpenBitSetFilter would look like this:
{code}
class OpenBitSetFilter  /* ... */ {
  OpenBitSet bits(IndexReader reader) { ... }
  Matcher getMatcher(IndexReader reader) { ... }
}
{code}
Since the only thing needed by an IndexSearcher would be the Matcher,
MatchFilter provides what Filter and OpenBitSetFilter have in common: the 
getMatcher() method.






[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-12-02 Thread Michael Busch (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12547689
 ] 

Michael Busch commented on LUCENE-584:
--

What about adding the getMatcher() method to Filter and
deprecating bits(IndexReader)? Then when we release
3.0 we can remove bits() and the only method in Filter
will be getMatcher().

Then this patch should be backwards compatible
and we'd do the API change with the next major release.
Any objections?
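
A compact sketch of that deprecation path, using simplified stand-ins for the 
Lucene types so that it runs on its own. The IndexReader, Matcher and 
BitSetMatcher shown here are hypothetical reductions, not the classes from the 
patch; EvenDocsFilter plays the role of an existing user-written Filter.

```java
import java.util.BitSet;

// Stand-ins for the Lucene types named in the thread (heavily simplified).
final class IndexReader {
  final int maxDoc;
  IndexReader(int maxDoc) { this.maxDoc = maxDoc; }
}

abstract class Matcher {
  abstract boolean next();
  abstract int doc();
}

// Walks the set bits of a BitSet in increasing order.
final class BitSetMatcher extends Matcher {
  private final BitSet bits;
  private int doc = -1;
  private boolean exhausted = false;
  BitSetMatcher(BitSet bits) { this.bits = bits; }
  boolean next() {
    if (exhausted) return false;
    int nextDoc = bits.nextSetBit(doc + 1);
    if (nextDoc < 0) {
      exhausted = true;
      return false;
    }
    doc = nextDoc;
    return true;
  }
  int doc() { return doc; }
}

abstract class Filter {
  /** @deprecated use {@link #getMatcher(IndexReader)} instead. */
  @Deprecated
  public abstract BitSet bits(IndexReader reader);

  // Concrete default: subclasses that only implement bits() keep working.
  public Matcher getMatcher(IndexReader reader) {
    return new BitSetMatcher(bits(reader));
  }
}

// A filter written against the old API; it needs no change to compile.
final class EvenDocsFilter extends Filter {
  @Override
  public BitSet bits(IndexReader reader) {
    BitSet bits = new BitSet(reader.maxDoc);
    for (int i = 0; i < reader.maxDoc; i += 2) {
      bits.set(i);
    }
    return bits;
  }
}
```

Once bits() is removed in a later major release, getMatcher() would become the 
abstract method and the default body above would disappear.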




[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-12-02 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12547691
 ] 

Paul Elschot commented on LUCENE-584:
-

I had not thought about deprecating yet, but it should work nicely.
I suppose you want to add a class BitSetFilter (subclass of Filter) as the 
preferred alternative to the deprecated method?
Initially Filter and BitSetFilter would be very similar, except that 
Filter.bits() would be deprecated.
Later, after removal of Filter.bits(), Filter.getMatcher() would be declared 
abstract.

I tried to do something pretty close to this for contrib/xml-query-parser, but 
I could not make that work, which is why I changed to adding a new superclass 
MatchFilter.
Nevertheless, I think the deprecation above should work, but at the moment I 
can't foresee the consequences.





[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-12-01 Thread Michael Busch (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12547524
 ] 

Michael Busch commented on LUCENE-584:
--

{quote}
The patch is backwards compatible,
{quote}

I think that custom Searcher or Searchable implementations won't compile 
anymore, because the signatures of some abstract methods changed, e.g. in 
Searchable:

{code:java}
@@ -86,13 +86,14 @@
    * <p>Called by {@link Hits}.
    *
    * <p>Applications should usually call {@link Searcher#search(Query)} or
-   * {@link Searcher#search(Query,Filter)} instead.
+   * {@link Searcher#search(Query,MatchFilter)} instead.
    * @throws BooleanQuery.TooManyClauses
    */
-  TopDocs search(Weight weight, Filter filter, int n) throws IOException;
+  TopDocs search(Weight weight, MatchFilter filter, int n) throws IOException;
{code}

 Decouple Filter from BitSet
 ---

 Key: LUCENE-584
 URL: https://issues.apache.org/jira/browse/LUCENE-584
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.0.1
Reporter: Peter Schäfer
Assignee: Michael Busch
Priority: Minor
 Attachments: bench-diff.txt, bench-diff.txt, lucene-584.patch, 
 Matcher-20070905-2default.patch, Matcher-20070905-3core.patch, 
 Matcher-20071122-1ground.patch, Some Matchers.zip





[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-11-28 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12546166
 ] 

Paul Elschot commented on LUCENE-584:
-

The patch is backwards compatible, except for current subclasses of Filter that 
already have a getMatcher method. The fact that no changes are needed to 
contrib confirms the compatibility.

I have made no performance tests on BitSetMatcher for two reasons.
The first reason is that OpenBitSet is actually faster than BitSet (have a look 
at the graph in the SomeMatchers.zip file attachment by Eks Dev), so it seems 
to be better to go in that direction.
The second is that it is easy to do the skipping in IndexSearcher on a BitSet 
directly by using nextSetBit on the BitSet instead of skipTo on the 
BitSetMatcher. For this it would only be necessary to check whether the given 
MatchFilter is a Filter.
Anyway, I prefer to see where the real performance bottlenecks are before 
optimizing for performance.

DefaultMatcher should be in the ...2default... patch.
The change in Hits to use MatchFilter should be in the ...3core.. patch.

So far, I have never tried to use these patches on their own; I have only split 
them for a better overview. Splitting the combined patches to iterate would need 
a different split, as you found out. It might even be necessary to split within 
a single class, but I'll gladly do that.


 Decouple Filter from BitSet
 ---

 Key: LUCENE-584
 URL: https://issues.apache.org/jira/browse/LUCENE-584
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.0.1
Reporter: Peter Schäfer
Priority: Minor
 Attachments: bench-diff.txt, bench-diff.txt, 
 Matcher-20070905-2default.patch, Matcher-20070905-3core.patch, 
 Matcher-20071122-1ground.patch, Some Matchers.zip





[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-11-28 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12546393
 ] 

Paul Elschot commented on LUCENE-584:
-

With the full patch applied, the following test cases use a BitSetMatcher:

TestQueryParser
TestComplexExplanations
TestComplexExplanationsOfNonMatches
TestConstantScoreRangeQuery
TestDateFilter
TestFilteredQuery
TestMultiSearcherRanking
TestPrefixFilter
TestRangeFilter
TestRemoteCachingWrapperFilter
TestRemoteSearchable
TestScorerPerf
TestSimpleExplanations
TestSimpleExplanationsOfNonMatches
TestSort

so I don't think it is necessary to provide separate test cases.

 Decouple Filter from BitSet
 ---

 Key: LUCENE-584
 URL: https://issues.apache.org/jira/browse/LUCENE-584
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.0.1
Reporter: Peter Schäfer
Assignee: Michael Busch
Priority: Minor
 Attachments: bench-diff.txt, bench-diff.txt, lucene-584.patch, 
 Matcher-20070905-2default.patch, Matcher-20070905-3core.patch, 
 Matcher-20071122-1ground.patch, Some Matchers.zip





[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-11-28 Thread Michael Busch (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12546420
 ] 

Michael Busch commented on LUCENE-584:
--

Yes you're right, I ran the tests w/ code coverage analysis enabled, and the
BitSetMatcher is fully covered. Good!

 Decouple Filter from BitSet
 ---

 Key: LUCENE-584
 URL: https://issues.apache.org/jira/browse/LUCENE-584
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.0.1
Reporter: Peter Schäfer
Assignee: Michael Busch
Priority: Minor
 Attachments: bench-diff.txt, bench-diff.txt, lucene-584.patch, 
 Matcher-20070905-2default.patch, Matcher-20070905-3core.patch, 
 Matcher-20071122-1ground.patch, Some Matchers.zip





[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-11-27 Thread Michael Busch (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12546148
 ] 

Michael Busch commented on LUCENE-584:
--

{quote}
1. introduce Matcher as superclass of Scorer and adapt javadocs to use matching 
consistently.
2. introduce MatchFilter as superclass of Filter and add a minimal 
DefaultMatcher to be used in IndexSearcher, i.e. add BitSetMatcher
{quote}

Paul, I like the iterative plan you suggested. I started reviewing the
Matcher-20071122-1ground.patch. I have some questions:
- Is the API fully backwards compatible?
- Did you make performance tests to check whether BitSetMatcher is 
slower than using a bitset directly?
- With just the mentioned patch applied I get compile errors, 
because the DefaultMatcher is missing. Could you provide a patch that
also includes the BitSetMatcher and Filter#getMatcher() returns it?
Also I believe the patch should modify Hits.java to use MatchFilter 
instead of Filter? And a unit test that tests the BitSetMatcher 
would be nice!

 Decouple Filter from BitSet
 ---

 Key: LUCENE-584
 URL: https://issues.apache.org/jira/browse/LUCENE-584
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.0.1
Reporter: Peter Schäfer
Priority: Minor
 Attachments: bench-diff.txt, bench-diff.txt, 
 Matcher-20070905-2default.patch, Matcher-20070905-3core.patch, 
 Matcher-20071122-1ground.patch, Some Matchers.zip





[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-09-22 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12529632
 ] 

Paul Elschot commented on LUCENE-584:
-

As the current patch set is large, I've been pondering how to do this in a 
series of smaller patches that can each be applied by itself. This is possible 
in the following way:

1. introduce Matcher as superclass of Scorer and adapt javadocs to use matching 
consistently.
2. introduce MatchFilter as superclass of Filter and add a minimal 
DefaultMatcher to be used in IndexSearcher, i.e. add BitSetMatcher
3. change the current Searcher/Searchable API to use MatchFilter instead of 
Filter.

Step 1 can be reasonably done before a new release.
After step 2 this issue might be closed, and all the rest could be treated as 
new issues.

After that three (almost) independent paths can be followed:
4. add more data structures to be used for filter caches.
5. adapt CachingWrapperFilter to provide a Matcher from a cached datastructure, 
for example SortedVIntList or BitSet or OpenBitSet.
6. further use of Matcher, mostly in BooleanScorer2.

My question is: shall I go ahead and provide a patch for step 1?

At the moment I'm refining BooleanScorer2 to use Matcher. This is for the case 
of multiple prohibited clauses, and also to allow the use of required and 
prohibited Matchers to allow adding filtering clauses to BooleanQuery.


 Decouple Filter from BitSet
 ---

 Key: LUCENE-584
 URL: https://issues.apache.org/jira/browse/LUCENE-584
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.0.1
Reporter: Peter Schäfer
Priority: Minor
 Attachments: bench-diff.txt, bench-diff.txt, 
 Matcher-20070905-1ground.patch, Matcher-20070905-2default.patch, 
 Matcher-20070905-3core.patch, Some Matchers.zip





[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-09-18 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12528539
 ] 

Paul Elschot commented on LUCENE-584:
-

The posted patch proposes to use this class to determine which documents should 
be filtered:

public abstract class Matcher {
  public abstract boolean next() throws IOException;
  public abstract boolean skipTo(int target) throws IOException;
  public abstract int doc();
  // plus a few more methods
}

This class is then used as a superclass of org.apache.lucene.search.Scorer.
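
A minimal sketch of how a caller might use the next()/skipTo()/doc() contract 
described above: intersecting two matchers by letting each one skip to the 
other's current document. The Matcher below is a simplified stand-in for the 
class in the patch (no IOException, no extra methods), and SortedArrayMatcher 
and MatcherIntersection are hypothetical names introduced only for this 
illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified stand-in for the proposed Matcher (assumption: no IOException). */
abstract class Matcher {
    abstract boolean next();              // advance to the next matching doc
    abstract boolean skipTo(int target);  // advance to the first doc >= target
    abstract int doc();                   // current doc, valid after a true result
}

/** Hypothetical Matcher over a strictly increasing array of doc ids. */
class SortedArrayMatcher extends Matcher {
    private final int[] docs;
    private int pos = -1;

    SortedArrayMatcher(int... docs) { this.docs = docs; }

    @Override boolean next() {
        if (pos < docs.length) pos++;
        return pos < docs.length;
    }

    @Override boolean skipTo(int target) {
        // Always moves forward at least once, then on until doc >= target.
        do {
            if (!next()) return false;
        } while (docs[pos] < target);
        return true;
    }

    @Override int doc() { return docs[pos]; }
}

class MatcherIntersection {
    /** Collects the doc ids present in both matchers. */
    static List<Integer> intersect(Matcher a, Matcher b) {
        List<Integer> result = new ArrayList<>();
        if (!a.next() || !b.next()) return result;
        while (true) {
            int da = a.doc();
            int db = b.doc();
            if (da == db) {
                result.add(da);
                if (!a.next() || !b.next()) return result;
            } else if (da < db) {
                if (!a.skipTo(db)) return result;  // a is behind: jump forward
            } else {
                if (!b.skipTo(da)) return result;  // b is behind: jump forward
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(intersect(
            new SortedArrayMatcher(1, 3, 5, 7, 9),
            new SortedArrayMatcher(3, 4, 5, 10)));
    }
}
```

Because the loop only relies on the three abstract methods, the same code would 
work for a filter's Matcher and for a Scorer once Scorer extends Matcher.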


 Decouple Filter from BitSet
 ---

 Key: LUCENE-584
 URL: https://issues.apache.org/jira/browse/LUCENE-584
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.0.1
Reporter: Peter Schäfer
Priority: Minor
 Attachments: bench-diff.txt, bench-diff.txt, 
 Matcher-20070905-1ground.patch, Matcher-20070905-2default.patch, 
 Matcher-20070905-3core.patch, Some Matchers.zip





[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-08-22 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12522030
 ] 

Hoss Man commented on LUCENE-584:
-

I, unfortunately, haven't had the time to read through everything in the latest 
patches, but catching up on my jira mail one of Paul's comments jumped out at 
me, so I wanted to make sure it's completely clear: this latest set of patches 
completely breaks backwards compatibility for any clients who have Filter 
subclasses, or methods that take a Filter as a param, since the Filter class 
now has an abstract getMatcher method and no longer supports an abstract BitSet 
method -- presumably the expectation being that all client code should have a 
search/replace done from Filter to BitSetFilter.

Which raises the question: why not eliminate BitSetFilter and move its 
getMatcher impl to the Filter class?  (if the concern is just that there be a 
higher level class in which both methods are abstract, why not insert a 
parent with some new name above the Filter class?)
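
A rough sketch of that suggestion, under simplifying assumptions: the new 
parent is called MatchFilter here (the name used elsewhere in this thread), an 
int maxDoc stands in for IndexReader, and Matcher is reduced to next()/doc() 
without IOException. EvenDocsFilter and FilterCompatDemo are hypothetical names 
for illustration only. The point is that a client subclass that only implements 
bits() keeps compiling, while search code can depend on MatchFilter alone.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Simplified stand-in for the proposed Matcher (assumption: no skipTo here). */
abstract class Matcher {
    abstract boolean next();
    abstract int doc();
}

/** New parent inserted above Filter; both of its concerns are abstract. */
abstract class MatchFilter {
    // Assumption: maxDoc replaces Lucene's IndexReader to stay self-contained.
    abstract Matcher getMatcher(int maxDoc);
}

/** Filter keeps its abstract bits() and supplies getMatcher() itself. */
abstract class Filter extends MatchFilter {
    abstract BitSet bits(int maxDoc);

    @Override
    Matcher getMatcher(int maxDoc) {
        final BitSet bits = bits(maxDoc);
        return new Matcher() {
            private int doc = -1;
            private boolean done = false;

            @Override boolean next() {
                if (done) return false;
                doc = bits.nextSetBit(doc + 1);  // -1 when no further bit is set
                done = doc < 0;
                return !done;
            }

            @Override int doc() { return doc; }
        };
    }
}

/** An existing-style client filter: it only knows about bits(). */
class EvenDocsFilter extends Filter {
    @Override BitSet bits(int maxDoc) {
        BitSet result = new BitSet(maxDoc);
        for (int i = 0; i < maxDoc; i += 2) result.set(i);
        return result;
    }
}

class FilterCompatDemo {
    /** Search-side code depends only on MatchFilter. */
    static List<Integer> matchingDocs(MatchFilter filter, int maxDoc) {
        List<Integer> docs = new ArrayList<>();
        Matcher m = filter.getMatcher(maxDoc);
        while (m.next()) docs.add(m.doc());
        return docs;
    }

    public static void main(String[] args) {
        System.out.println(matchingDocs(new EvenDocsFilter(), 6));
    }
}
```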




For the record: it really bothers me that the old attachments got deleted ... 
the inability to refresh my memory by looking at the older patches and compare 
them with the current patches is extremely frustrating

 Decouple Filter from BitSet
 ---

 Key: LUCENE-584
 URL: https://issues.apache.org/jira/browse/LUCENE-584
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.0.1
Reporter: Peter Schäfer
Priority: Minor
 Attachments: bench-diff.txt, bench-diff.txt, 
 Matcher1-ground-20070730.patch, Matcher2-default-20070730.patch, 
 Matcher3-core-20070730.patch, Matcher4-contrib-misc-20070730.patch, 
 Matcher5-contrib-queries-20070730.patch, Matcher6-contrib-xml-20070730.patch, 
 Some Matchers.zip





[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-08-09 Thread Mark Harwood (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12518642
 ] 

Mark Harwood commented on LUCENE-584:
-

Some further thought on the roles/responsibilities of the various components:

Given a blank sheet of paper (a luxury we may not have) the minimum 
requirements I would have could be met with the following:
(note that use of the words Matcher and Filter etc have been removed 
because sets of doc IDs have applications outside of filtering/querying e.g. 
category counts)

interface DocIdSetFactory
{
DocIdSet getDocIdSet(IndexReader reader);
}
This is more or less equivalent to the purpose of the existing Filter - 
different implementations define their own selection criteria and produce a set 
of matching doc Ids e.g. equivalent of RangeFilter. Each implementation must 
implement hashcode and equals methods based on its criteria so the factory 
can be cached and reused (in the same way Query objects are expected to). The 
existing CachedFilterBuilder in the XMLQueryParser provides one example of a 
strategy for caching Filters using this facility. 


interface DocIdSet
{
DocIdSetIterator getIterator();
}
This interface defines an immutable, threadsafe (and therefore cachable) 
collection of doc IDs. Different implementations provide space-efficient 
alternatives for sparse or heavily populated sets e.g. BitSet, OpenBitSet, 
SortedVIntList. As an example caching strategy - the existing 
CachingWrapperFilter would cache these objects in a WeakHashMap keyed on 
IndexReader.

interface DocIdSetIterator
{
boolean next();
int getDoc();
   etc
}
A thread unsafe, single use object, (probably with only one implementation) 
that is used to iterate across any DocIdSet. Not cachable and used by Scorers.

In the existing proposal it feels like DocIdSet and DocIdSetIterator are rolled 
into one in the form of the Matcher which complicates/prevents caching 
strategies.
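
The three roles above might be sketched as follows. This is only an 
illustration under assumptions: an int maxDoc stands in for IndexReader, the 
cache is assumed to be held per reader, and SortedIntDocIdSet, EveryNthFactory 
and DocIdSetCache are hypothetical names, not classes from any patch. It shows 
a factory with equals/hashCode being used as a cache key, and one cached 
immutable set handing out independent single-use iterators.

```java
import java.util.HashMap;
import java.util.Map;

interface DocIdSetIterator {
    boolean next();
    int getDoc();
}

interface DocIdSet {
    DocIdSetIterator getIterator();
}

interface DocIdSetFactory {
    // Assumption: maxDoc replaces IndexReader to keep the sketch self-contained.
    DocIdSet getDocIdSet(int maxDoc);
}

/** Immutable set of doc ids; every getIterator() call returns a fresh cursor. */
final class SortedIntDocIdSet implements DocIdSet {
    private final int[] docs;

    SortedIntDocIdSet(int[] docs) { this.docs = docs.clone(); }

    @Override public DocIdSetIterator getIterator() {
        return new DocIdSetIterator() {
            private int pos = -1;  // iterator state lives here, not in the set
            @Override public boolean next() {
                if (pos < docs.length) pos++;
                return pos < docs.length;
            }
            @Override public int getDoc() { return docs[pos]; }
        };
    }
}

/** Hypothetical criteria: every n-th document (n > 0). */
final class EveryNthFactory implements DocIdSetFactory {
    static int builds = 0;  // counts (potentially costly) set constructions
    private final int n;

    EveryNthFactory(int n) { this.n = n; }

    @Override public DocIdSet getDocIdSet(int maxDoc) {
        builds++;
        int count = (maxDoc + n - 1) / n;
        int[] docs = new int[count];
        for (int i = 0; i < count; i++) docs[i] = i * n;
        return new SortedIntDocIdSet(docs);
    }

    // Equality is based on the selection criteria, so equal factories share a cache entry.
    @Override public boolean equals(Object o) {
        return o instanceof EveryNthFactory && ((EveryNthFactory) o).n == n;
    }

    @Override public int hashCode() { return Integer.hashCode(n); }
}

/** Cache for one reader, keyed on the factory's criteria. */
final class DocIdSetCache {
    private final Map<DocIdSetFactory, DocIdSet> cache = new HashMap<>();

    DocIdSet get(DocIdSetFactory factory, int maxDoc) {
        return cache.computeIfAbsent(factory, f -> f.getDocIdSet(maxDoc));
    }
}
```

Two equal factories resolve to the same cached DocIdSet, and two iterators 
obtained from it advance independently of each other.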

Cheers
Mark




 Decouple Filter from BitSet
 ---

 Key: LUCENE-584
 URL: https://issues.apache.org/jira/browse/LUCENE-584
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.0.1
Reporter: Peter Schäfer
Priority: Minor
 Attachments: bench-diff.txt, bench-diff.txt, 
 Matcher1-ground-20070730.patch, Matcher2-default-20070730.patch, 
 Matcher3-core-20070730.patch, Matcher4-contrib-misc-20070730.patch, 
 Matcher5-contrib-queries-20070730.patch, Matcher6-contrib-xml-20070730.patch, 
 Some Matchers.zip





[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-08-09 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12518825
 ] 

Paul Elschot commented on LUCENE-584:
-

Mark,

I said: "there is never a threadsafety problem. (See BitSetMatcher.getMatcher() 
which uses a local class for the resulting Matcher.)"
That was a mistake. BitSetMatcher is a Matcher constructed from a BitSet, and 
SortedVIntList has a getMatcher() method, and I confused the two.

A Matcher is intended to be used in a single thread, so I don't expect thread 
safety problems.

The problem for the XML parser is that with this patch, the implementing data 
structure of a Filter becomes inaccessible from the Filter class, so it cannot 
be cached from there. That means that some cached data structure will have to 
be chosen, and one way to do that is by using class BitSetFilter from the 
patch. This has a bits() method just like the current Filter class. 
CachingWrapperFilter could then become a cache for BitSetFilter.

There is indeed no caching of filters in this patch.
The reason for that is that some Filters do not need a cache. For example:
class TermFilter {
  private final Term term;
  TermFilter(Term t) {this.term = t;}
  Matcher getMatcher(IndexReader reader) throws IOException {
    return new TermMatcher(reader.termDocs(this.term));
  }
}
TermMatcher does not exist (yet), but it could be easily introduced by leaving 
all the scoring out of the current TermScorer.

As for DocIdSet, as long as this provides a Matcher as an iterator, it can be 
used to implement a (caching) filter.

I don't think this patch complicates the implementation of caching strategies.
For example one could define:
class CachableFilter extends Filter {
  ... some methods to access the underlying data structure to be cached. ...
}
or write a similar adapter for some subclass of Filter and then write a 
FilterCache that caches these.

I did consider defining Matcher as an interface, but I preferred not to do that 
because of the default explain() method in the Matcher class of the patch.


 Decouple Filter from BitSet
 ---

 Key: LUCENE-584
 URL: https://issues.apache.org/jira/browse/LUCENE-584
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.0.1
Reporter: Peter Schäfer
Priority: Minor
 Attachments: bench-diff.txt, bench-diff.txt, 
 Matcher1-ground-20070730.patch, Matcher2-default-20070730.patch, 
 Matcher3-core-20070730.patch, Matcher4-contrib-misc-20070730.patch, 
 Matcher5-contrib-queries-20070730.patch, Matcher6-contrib-xml-20070730.patch, 
 Some Matchers.zip





[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-08-09 Thread Mark Harwood (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12518845
 ] 

Mark Harwood commented on LUCENE-584:
-

Hi Paul,

Not sure we've reached a common understanding here yet.

You said "That was a mistake. BitSetMatcher is a Matcher constructed from a 
BitSet, and SortedVIntList has a getMatcher() method, and I confused the two." 
Ok, thanks for the clarification. I still feel uncomfortable because the method 
getMatcher() is not abstracted to a common interface. This was the thinking 
behind my getIterator method on DocIdSet.

I too made a mistake in my earlier comments. DocIdSetIterator would NOT have 
"probably only one implementation". There would be an implementation for each 
different type of DocIdSet (Bitset/OpenBitSet/VIntList).

You said "some Filters do not need a cache. For example: TermFilter".  I'm not 
sure why that has been singled out as not worthy of caching. I have certain 
terms (e.g. gender:male) where the TermDocs is very large (50% of all docs in 
the index!) so multiple calls to TermDocs for term "gender:male" (if that is 
what you are suggesting) is highly undesirable. These are typically handled in 
the XMLQueryParser using syntax like this:
  <CachedFilter>
    <TermsFilter fieldName="gender">male</TermsFilter>
  </CachedFilter>

You said: "CachingWrapperFilter could then become a cache for BitSetFilter." 
This means that the only caching strategy is one based on bitsets - does this 
not lose perhaps the main benefit of your whole proposal? - the ability to have 
alternative space efficient storage of sets of document ids e.g. SortedVIntList.
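
To put rough numbers on that benefit, here is a back-of-the-envelope estimate. 
It is a sketch under stated assumptions: the BitSet is taken to span the whole 
index (one bit per document, stored in 64-bit words), a plain sorted int[] is 
used as a simpler stand-in for SortedVIntList (which would typically be smaller 
still), and object headers are ignored. FilterMemoryEstimate is a hypothetical 
helper name.

```java
class FilterMemoryEstimate {
    /** Approximate payload bytes of a bit set spanning maxDoc documents. */
    static long bitSetBytes(int maxDoc) {
        return ((maxDoc + 63L) / 64L) * 8L;  // whole 64-bit words, 8 bytes each
    }

    /** Payload bytes of a sorted int[] holding only the matching doc ids. */
    static long sortedIntArrayBytes(int matchingDocs) {
        return 4L * matchingDocs;
    }

    public static void main(String[] args) {
        int maxDoc = 10_000_000;  // illustrative index size
        int visible = 1_000;      // illustrative number of matching documents
        System.out.println("bit set:      " + bitSetBytes(maxDoc) + " bytes");
        System.out.println("sorted int[]: " + sortedIntArrayBytes(visible) + " bytes");
    }
}
```

With these illustrative figures the bit set costs about 1.25 MB per cached 
filter regardless of how few documents match, while the sorted array costs 
about 4 KB. Under the same assumptions the two break even when roughly one 
document in 32 matches.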

If this is undesirable (my guess is yes) then the proposal in my previous 
comment is a solution which allows for caching of any/all types of the new sets 
(openBitSet,BitSet,SortedVIntList etc) Regardless of my choice of class names 
or decisions over interfaces vs abstract classes do you not at least agree on the 
need for 3 types of functionality:

1) A factory for instantiating sets of document ids matching a particular set 
of criteria (which can be costly to call). While the factory is not expected to 
implement a caching  strategy it is expected to implement hashcode/equals 
simply to aid any caching services which would need this help to identify 
previously instantiated sets which share the same criteria as any new requests 
(This service I identified as my DocIdSetFactory and TermsFilter/RangeFilter 
would be example implementations). 
2) An object representing an instantiated set of document ids which can be 
cached and can create iterators for use in separate threads (identified as my 
DocIdSet -  example implementations being called something like BitSetDocSet, 
SortedVIntList) 
3) An iterator for a set of document ids (my DocIdSetIterator - example impls 
being called something like BitSetDocSetIterator SortedVIntListIterator)

Each type of functionality can have different implementations so the 
functionality must be defined using an interface or abstract class. 
If we can agree this much as a set of responsibilities then we can begin to map 
these services onto something more concrete.


Cheers
Mark






 Decouple Filter from BitSet
 ---

 Key: LUCENE-584
 URL: https://issues.apache.org/jira/browse/LUCENE-584
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.0.1
Reporter: Peter Schäfer
Priority: Minor
 Attachments: bench-diff.txt, bench-diff.txt, 
 Matcher1-ground-20070730.patch, Matcher2-default-20070730.patch, 
 Matcher3-core-20070730.patch, Matcher4-contrib-misc-20070730.patch, 
 Matcher5-contrib-queries-20070730.patch, Matcher6-contrib-xml-20070730.patch, 
 Some Matchers.zip





[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-08-09 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12518858
 ] 

Paul Elschot commented on LUCENE-584:
-

Mark,

I think we are on the same line, it's just that I don't want to go that far 
now.
Have another look at the title of this issue. It may be in your title bar, but 
otherwise it's quite a bit of scrolling, so I'll repeat it here: Decouple 
Filter from BitSet. 
That is the main thing that this patch tries to do.

And that also makes it a starting point for caching of different data 
structures for Filters.
Caching of Filters is very much needed, but I'd rather see that as another 
issue.

The DefaultMatcher class tries to do some compression by using a SortedVIntList 
when that is smaller than a BitSet, and that is about as far as I'd like to go 
now.
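
The size comparison described above can be sketched roughly as follows. This is 
an illustration only, not the code of DefaultMatcher in the patch: the class 
and method names are hypothetical, and it assumes that SortedVIntList stores 
the gaps between sorted document ids as variable-length integers with 7 data 
bits per byte, as its name suggests.

```java
import java.util.BitSet;

final class CompactSetChooser {
    /** Bytes needed to store a non-negative value as a VInt (7 data bits per byte). */
    static int vIntBytes(int value) {
        int count = 1;
        while ((value & ~0x7F) != 0) {
            value >>>= 7;
            count++;
        }
        return count;
    }

    /** Bytes needed to store the set bits as VInt-encoded gaps between sorted ids. */
    static long sortedVIntBytes(BitSet bits) {
        long total = 0;
        int prev = 0;
        for (int doc = bits.nextSetBit(0); doc >= 0; doc = bits.nextSetBit(doc + 1)) {
            total += vIntBytes(doc - prev);
            prev = doc;
        }
        return total;
    }

    /** Bytes a plain bit set needs for maxDoc documents: one bit per document. */
    static long bitSetBytes(int maxDoc) {
        return (maxDoc + 7L) / 8;
    }

    /** True when the VInt gap list would be smaller than the bit set. */
    static boolean preferSortedVIntList(BitSet bits, int maxDoc) {
        return sortedVIntBytes(bits) < bitSetBytes(maxDoc);
    }

    public static void main(String[] args) {
        BitSet sparse = new BitSet();
        sparse.set(10);
        sparse.set(900_000);
        // Two documents out of a million: 4 bytes as gaps vs 125000 bytes as bits.
        System.out.println(preferSortedVIntList(sparse, 1_000_000)); // prints true
    }
}
```

For a dense set the comparison goes the other way, since every member costs at 
least one byte in the gap list but only one bit in the bit set.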

Proost,
Paul Elschot


 Decouple Filter from BitSet
 ---

 Key: LUCENE-584
 URL: https://issues.apache.org/jira/browse/LUCENE-584
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.0.1
Reporter: Peter Schäfer
Priority: Minor
 Attachments: bench-diff.txt, bench-diff.txt, 
 Matcher1-ground-20070730.patch, Matcher2-default-20070730.patch, 
 Matcher3-core-20070730.patch, Matcher4-contrib-misc-20070730.patch, 
 Matcher5-contrib-queries-20070730.patch, Matcher6-contrib-xml-20070730.patch, 
 Some Matchers.zip





[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-08-09 Thread Mark Harwood (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12518868
 ] 

Mark Harwood commented on LUCENE-584:
-

OK, I appreciate caching may not be a top priority in this proposal, but I have 
live systems in production which use XMLQueryParser and the existing core 
facilities for caching. As it stands, this proposal breaks this functionality 
(see the FIXME in contrib's CachedFilterBuilder and my concerns over the use of 
a non-threadsafe Matcher in the core class CachingWrapperFilter).

I am obviously concerned by this and keen to help shape a solution which 
preserves the existing capabilities while adding your new functionality. I'm 
not sure I share your view that support for caching can be treated as a 
separate issue to be dealt with at a later date. There are a large number of 
changes proposed in this patch and if the design does not at least consider 
future caching issues now, I suspect much will have to be reworked later. The 
change I can envisage most clearly is expressed in my concern that the DocIdSet 
and DocIdSetIterator services I outlined are being combined in Matcher as it 
stands now, and these functions will have to be separated.

Cheers
Mark

 Decouple Filter from BitSet
 ---

 Key: LUCENE-584
 URL: https://issues.apache.org/jira/browse/LUCENE-584
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.0.1
Reporter: Peter Schäfer
Priority: Minor
 Attachments: bench-diff.txt, bench-diff.txt, 
 Matcher1-ground-20070730.patch, Matcher2-default-20070730.patch, 
 Matcher3-core-20070730.patch, Matcher4-contrib-misc-20070730.patch, 
 Matcher5-contrib-queries-20070730.patch, Matcher6-contrib-xml-20070730.patch, 
 Some Matchers.zip





[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-08-08 Thread Mark Harwood (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12518569
 ] 

Mark Harwood commented on LUCENE-584:
-

Hi Paul,
Many thanks for your responses.
Sorry for the delay in communications - just got back from 2 weeks holiday and 
slowly picking my way through this patch. 

You said: there is never a threadsafety problem. (See 
BitSetMatcher.getMatcher() which uses a local class for the resulting Matcher.)

Did you mean BitSetFilter.getMatcher()? BitSetMatcher has no getMatcher method.

If so, doesn't my original thread safety issue still stand? - 
CachingWrapperFilter is caching Matchers (not Filters which are factories for 
matchers). 

The existing approach of adding a CachedFilter tag around my XML-based query 
templates offers a major speed up in my applications and I don't see this 
supported in this patch currently which gives me some concern. This existing 
caching technique is based on the use of CachingWrapperFilter.

The proposed framework seems to be missing a means of caching reusable, 
threadsafe  Matchers in a type-independent fashion. One solution (which I think 
you may be suggesting with the getMatcher comment) is to cache Filter objects 
and use Filter.getMatcher(reader) as a factory method for thread-specific, 
single-use Matchers but this would suggest that any caching then becomes an 
implied responsibility/overhead of each Filter implementation. Not too great. 
CachingWrapperFilter is an example of a better design where the caching policy 
has been implemented in a single class and it can be used to decorate any 
Filter implementation (RangeFilter etc) with the required caching behaviour. 
Unfortunately with this proposed patch there is no way that any such single 
caching policy can work with any Filter because Matcher is not 
reusable/cachable. Time to remove any  thread-specific state from Matcher?
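
The decorator idea described above, a single caching class that can wrap any 
filter, can be sketched as below. All types are simplified stand-ins invented 
for this illustration (a FakeReader class instead of IndexReader, a 
SimpleFilter interface returning a java.util.BitSet); this is not the code of 
Lucene's CachingWrapperFilter. The sketch assumes that the cached value is a 
reusable set rather than a stateful iterator, which is exactly the property 
this comment argues Matcher is missing.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.WeakHashMap;

/** Stand-in for an index reader; used only as a cache key. */
final class FakeReader {}

/** Stand-in for a filter: computes the matching documents for a reader. */
interface SimpleFilter {
    BitSet bits(FakeReader reader);
}

/** Decorator that adds per-reader caching to any SimpleFilter. */
final class CachingFilter implements SimpleFilter {
    private final SimpleFilter delegate;
    // Weak keys let an entry be dropped once its reader is no longer referenced.
    private final Map<FakeReader, BitSet> cache = new WeakHashMap<>();

    CachingFilter(SimpleFilter delegate) {
        this.delegate = delegate;
    }

    /** Returns the cached set; callers must treat the result as read-only. */
    @Override
    public synchronized BitSet bits(FakeReader reader) {
        return cache.computeIfAbsent(reader, delegate::bits);
    }
}

final class CachingFilterDemo {
    public static void main(String[] args) {
        SimpleFilter costly = reader -> {
            System.out.println("computing");
            BitSet result = new BitSet();
            result.set(3);
            return result;
        };
        SimpleFilter cached = new CachingFilter(costly);
        FakeReader reader = new FakeReader();
        cached.bits(reader); // computes
        cached.bits(reader); // served from the cache, nothing printed
    }
}
```

The caching policy lives in one class, and the wrapped filter needs no 
knowledge of it.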


Cheers
Mark













 Decouple Filter from BitSet
 ---

 Key: LUCENE-584
 URL: https://issues.apache.org/jira/browse/LUCENE-584
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.0.1
Reporter: Peter Schäfer
Priority: Minor
 Attachments: bench-diff.txt, bench-diff.txt, 
 Matcher1-ground-20070730.patch, Matcher2-default-20070730.patch, 
 Matcher3-core-20070730.patch, Matcher4-contrib-misc-20070730.patch, 
 Matcher5-contrib-queries-20070730.patch, Matcher6-contrib-xml-20070730.patch, 
 Some Matchers.zip





[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-07-30 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12516502
 ] 

Paul Elschot commented on LUCENE-584:
-

Some more remarks on the 20070730 patches.

To recap, this introduces Matcher as a superclass of Scorer to take the role 
that BitSet currently has in Filter.

The total number of java files changed/added by these patches is 47, so some 
extra care will be needed.
The following issues are still pending:

What approach should be taken for the API change to Filter (see above, 2 
comments up)?

I'd like to get all test cases to pass again. TestRemoteCachingWrapperFilter 
still does not pass, and
I don't know why.

For xml-query-parser in contrib I'd like to know in which direction to proceed 
(see 1 comment up).
Does it make sense to try and get the TestQueryTemplateManager test to pass 
again?

The ..default.. patch has taken OpenBitSet and friends from solr to have a 
default implementation.
However, I have not checked whether there is unused code in there, so some 
trimming may still
be appropriate.

Once these issues have been resolved far enough, I would recommend to introduce 
this shortly after a release so there is some time to let things settle.



 Decouple Filter from BitSet
 ---

 Key: LUCENE-584
 URL: https://issues.apache.org/jira/browse/LUCENE-584
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.0.1
Reporter: Peter Schäfer
Priority: Minor
 Attachments: bench-diff.txt, bench-diff.txt, 
 Matcher1-ground-20070730.patch, Matcher2-default-20070730.patch, 
 Matcher3-core-20070730.patch, Matcher4-contrib-misc-20070730.patch, 
 Matcher5-contrib-queries-20070730.patch, Matcher6-contrib-xml-20070730.patch, 
 Some Matchers.zip





[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-07-28 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12516154
 ] 

Paul Elschot commented on LUCENE-584:
-

Mark,

An easy way to keep things like BooleanFilter working could be to
introduce a subclass of Filter, say BitsFilter that adds a bits(IndexReader) 
method.
This class should also implement getMatcher(), the default implementation could
be used for that initially.
Then BooleanFilter could simply be a subclass of BitsFilter, possibly without 
further
modifications, although I would prefer to rename it to BooleanBitsFilter.

That would only involve some deprecation warnings in BitsFilter for the period
that Filter.bits() is deprecated.

I would not even mind cooking this up as patch to contrib.

Thoughts?


 Decouple Filter from BitSet
 ---

 Key: LUCENE-584
 URL: https://issues.apache.org/jira/browse/LUCENE-584
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.0.1
Reporter: Peter Schäfer
Priority: Minor
 Attachments: bench-diff.txt, bench-diff.txt, 
 Matcher-core20070725.patch, Matcher-default20070725.patch, 
 Matcher-ground20070725.patch, Some Matchers.zip





[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-07-26 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12515630
 ] 

Paul Elschot commented on LUCENE-584:
-

Mark,

The exhausted flag is only in the iterator/Matcher, not in the underlying set 
data structure. One can use as many iterators as necessary, for example one per 
thread, and then there is never a threadsafety problem. (See 
BitSetMatcher.getMatcher() which uses a local class for the resulting Matcher.)
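
A rough illustration of the point about iterator state: the set below is shared 
and only read, while each thread walks it with its own cursor. The class names 
are hypothetical and plain java.util.BitSet stands in for the data structures 
of the patch.

```java
import java.util.BitSet;

/** Shared, read-only document set; every cursor has private state. */
final class SharedDocSet {
    private final BitSet bits;

    SharedDocSet(BitSet bits) {
        this.bits = (BitSet) bits.clone(); // copy, so later changes cannot leak in
    }

    /** Per-caller cursor; the exhausted flag lives here, not in the set. */
    final class Cursor {
        private int doc = -1;
        private boolean exhausted = false;

        /** Advances to the next document; returns false once exhausted. */
        boolean next() {
            if (exhausted) {
                return false;
            }
            doc = bits.nextSetBit(doc + 1);
            exhausted = doc < 0;
            return !exhausted;
        }

        int doc() {
            return doc;
        }
    }

    Cursor cursor() {
        return new Cursor();
    }
}

final class PerThreadCursorDemo {
    /** Counts the documents using a cursor created for this call only. */
    static int count(SharedDocSet set) {
        int n = 0;
        SharedDocSet.Cursor cursor = set.cursor();
        while (cursor.next()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) throws InterruptedException {
        BitSet bits = new BitSet();
        bits.set(1);
        bits.set(5);
        bits.set(9);
        SharedDocSet set = new SharedDocSet(bits);
        int[] counts = new int[2];
        Thread t0 = new Thread(() -> counts[0] = count(set));
        Thread t1 = new Thread(() -> counts[1] = count(set));
        t0.start();
        t1.start();
        t0.join();
        t1.join();
        System.out.println(counts[0] + " " + counts[1]); // prints "3 3"
    }
}
```

Sharing a single Cursor between the two threads would not be safe; sharing the 
SharedDocSet is.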

You wrote: I use BooleanFilter a lot for security where many large sets are 
cached and combined on the fly - caching all the possible combinations as 
single bitsets would lead to too many possible combinations.

That can still be done, but one needs to get to the BitSets for example by 
caching them outside the Filters and constructing the resulting BitSetMatcher 
for the combined Filter on the fly.

An alternative would be to have a BooleanQuery.add(Matcher, Occur), where the 
occurrence can only be required or prohibited. Then there is no need to 
construct any resulting filter because the boolean logic will be executed 
during the search.  This might even be more efficient than combining the full 
BitSets ahead of the search.

And with many large BitSets in the cache, the memory savings from more compact 
implementations can also be helpful.



 Decouple Filter from BitSet
 ---

 Key: LUCENE-584
 URL: https://issues.apache.org/jira/browse/LUCENE-584
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.0.1
Reporter: Peter Schäfer
Priority: Minor
 Attachments: bench-diff.txt, bench-diff.txt, 
 Matcher-core20070725.patch, Matcher-default20070725.patch, 
 Matcher-ground20070725.patch, Some Matchers.zip





Re: [jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-07-26 Thread eks dev
Mark Harwood commented on LUCENE-584:


Hi Mark, we used to use Filters a lot... and concluded that Matcher is great! 
It just takes some time to get your head around it, so let me try to help you 
get there :)

I saw BitSetMatcher etc and appreciate the motivation behind the design for 
alternative implementations. What concerns me with the Matcher API in general 
is that Matchers have non-threadsafe state (i.e. the current position 
required to support next()) and as such aren't safely cachable in the same way 
as BitSets. I see the searcher code uses the safer skipTo() rather than next() 
but there's still the if(exhausted) thread safety problem to worry about, 
which is why I raised points 1 and 4.


1. Caching issue: You do not want to cache a Matcher. It is just an iterator 
with the possibility of skipping forward, so why would one cache iterators? (It 
could be done by introducing rewind(); maybe not a bad idea?) What you really 
need to put in the cache is an object that implements the Matcher interface, or 
some object for which it is easy to provide a Matcher interface.

2. Thread safety issue: I did not get it. What scenario do you see here? 

Additionally, combining BitSets using Boolean logic is one method call 
whereas combining heterogeneous Matchers using Boolean logic requires iteration 
across them and therefore potentially many method calls (point 3). 

3. Lucene core uses next() and skipTo() to combine Filter/Query today; there 
is no BitSet.and(BitSet) in Lucene core, and this is not going to change. If 
you need to combine bit sets, you can do it easily on classes that implement 
Matcher (imagine you have two OpenBitSets and they implement Matcher; nothing 
prevents you from OpenBitSet.and(OpenBitSet)-ing these implementing objects). 
Simply put, you are not less flexible due to Matcher. You can do everything 
as before, you are just not bound to the memory-hungry, slow BitSet...

I haven't benchmarked this but I imagine it to be significantly slower?
Sure, but you do not have to do your Filter arithmetic via Matcher; just do 
it directly on your implementing classes. 

I use BooleanFilter a lot for security where many large sets are cached and 
combined on the fly - caching all the possible combinations as single bitsets 
would lead to too many possible combinations. 

You can freely keep something like BooleanFilter, even make it faster with 
OpenBitSet or something else that is even faster or more memory-efficient, and 
then, once you have the Filter content you'd like to use, just pass it as a 
Matcher to the search() method and ta da, you have it.
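
A sketch of the suggestion above, assuming plain java.util.BitSet as the 
concrete class (OpenBitSet from the patch would play the same role): the 
Boolean arithmetic is done directly on the concrete sets, one method call per 
set, and only the combined result would then be wrapped for iteration during 
the search. The class and method names are hypothetical.

```java
import java.util.BitSet;

final class SecurityFilterCombiner {
    /** Returns the docs that are in every required set and in no prohibited set. */
    static BitSet combine(BitSet[] required, BitSet[] prohibited, int maxDoc) {
        BitSet result = new BitSet(maxDoc);
        result.set(0, maxDoc); // start with all documents
        for (BitSet r : required) {
            result.and(r); // one call per set, no per-document iteration here
        }
        for (BitSet p : prohibited) {
            result.andNot(p);
        }
        return result;
    }

    public static void main(String[] args) {
        BitSet groupA = new BitSet();
        groupA.set(0, 6); // docs 0..5
        BitSet groupB = new BitSet();
        groupB.set(2, 8); // docs 2..7
        BitSet hidden = new BitSet();
        hidden.set(3);
        BitSet visible = combine(
                new BitSet[] {groupA, groupB}, new BitSet[] {hidden}, 10);
        System.out.println(visible); // prints {2, 4, 5}
    }
}
```

The cached inputs stay untouched because combine() only modifies its own result 
set, so the same cached sets can be recombined for every user.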













[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-07-25 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12515313
 ] 

Paul Elschot commented on LUCENE-584:
-

There is some code in contrib where a Filter is assumed to have BitSet 
available:

contrib/queries/src/java/org/apache/lucene/search/BooleanFilter.java
contrib/miscellaneous/src/java/org/apache/lucene/misc/ChainedFilter.java

When Filter is going to move from BitSet to Matcher, these will have to be 
reimplemented.
They basically use Filters to provide BitSets, but it seems to me that they also
could use lists of BitSets, for example.


 Decouple Filter from BitSet
 ---

 Key: LUCENE-584
 URL: https://issues.apache.org/jira/browse/LUCENE-584
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.0.1
Reporter: Peter Schäfer
Priority: Minor
 Attachments: bench-diff.txt, bench-diff.txt, BitsMatcher.java, 
 DefaultMatcher20070725.patch, Filter-20060628.patch, 
 HitCollector-20060628.patch, IndexSearcher-20060628.patch, 
 MatchCollector.java, Matcher.java, Matcher20070226.patch, 
 Scorer-20060628.patch, Searchable-20060628.patch, Searcher-20060628.patch, 
 Some Matchers.zip, SortedVIntList.java, TestSortedVIntList.java





[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-07-25 Thread Mark Harwood (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12515395
 ] 

Mark Harwood commented on LUCENE-584:
-

Hi Paul,
Not sure if I'm missing something but I think this patch may not work for 
scenarios other than the simple option of a single filter being used on a 
search.

A Matcher does not have the same utility as a BitSet because using a BitSet you 
can:

1) Iterate across it using multiple threads.
2) Clone it.
3) Merge it quickly with other bitsets using Boolean logic.
4) Use it more than once.

I think these differences become important in the following scenarios:

In CachingWrapperFilter I don't think you can cache Matchers instead of bitsets, 
because Matchers don't have features 1 and 4.

BooleanFilter and ChainedFilter in contrib don't work with Matchers because 
there is no support for 3).

Is there something obvious I've missed?

Cheers
Mark

 Decouple Filter from BitSet
 ---

 Key: LUCENE-584
 URL: https://issues.apache.org/jira/browse/LUCENE-584
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.0.1
Reporter: Peter Schäfer
Priority: Minor
 Attachments: bench-diff.txt, bench-diff.txt, 
 Matcher-core20070725.patch, Matcher-default20070725.patch, 
 Matcher-ground20070725.patch, Some Matchers.zip





[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-07-25 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12515434
 ] 

Paul Elschot commented on LUCENE-584:
-

Have a look at BitSetMatcher in the -default patch. It is constructed from a 
BitSet, and it has a method getMatcher() that returns a Matcher that acts as a 
searching iterator over the BitSet.

So that is 1) to 4), at least potentially. A clone() method is currently not 
implemented iirc, but each call to getMatcher() will return a new iterator over 
the underlying BitSet. And when guaranteed non-modifiability is needed, a 
constructor can take a copy of the given document set, in whatever form.

The point of Matcher is that it allows other implementations than BitSet, like 
OpenBitSet and SortedVIntList. Both have the properties that you are looking 
for. SortedVIntList can save a lot of memory when compared to (Open)BitSet, and 
OpenBitSet is somewhat faster than BitSet. 

I'd like to have a skip list version of SortedVIntList, too. This would be 
slightly larger than SortedVIntList, but more efficient on skipTo().
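
To show why a skip list would help, here is a small sketch of a gap-encoded 
list whose skipTo() has to decode every entry on the way to the target. It is 
an illustration written for this thread, not the SortedVIntList from the 
attachments: the class name VIntGapList and its methods are hypothetical, and 
the encoding (gaps between sorted ids, 7 data bits per byte with a 
continuation flag) is an assumption about how such a list would be stored.

```java
import java.io.ByteArrayOutputStream;

/** Sorted document ids stored as VInt-encoded gaps (illustrative only). */
final class VIntGapList {
    private final byte[] bytes;

    /** sortedDocs must be non-negative and strictly increasing. */
    VIntGapList(int[] sortedDocs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int doc : sortedDocs) {
            int gap = doc - prev;
            while ((gap & ~0x7F) != 0) {        // more than 7 bits left
                out.write((gap & 0x7F) | 0x80); // low 7 bits plus continuation flag
                gap >>>= 7;
            }
            out.write(gap);
            prev = doc;
        }
        bytes = out.toByteArray();
    }

    int sizeInBytes() {
        return bytes.length;
    }

    Cursor cursor() {
        return new Cursor();
    }

    /** Forward-only cursor; all iteration state is private to the cursor. */
    final class Cursor {
        private int pos = 0; // read position in bytes
        private int doc = 0; // last decoded document id

        /** Decodes the next id; returns false at the end of the list. */
        boolean next() {
            if (pos >= bytes.length) {
                return false;
            }
            int b = bytes[pos++];
            int gap = b & 0x7F;
            for (int shift = 7; (b & 0x80) != 0; shift += 7) {
                b = bytes[pos++];
                gap |= (b & 0x7F) << shift;
            }
            doc += gap;
            return true;
        }

        /** Linear scan to the first id >= target; a skip list could jump ahead. */
        boolean skipTo(int target) {
            while (next()) {
                if (doc >= target) {
                    return true;
                }
            }
            return false;
        }

        int doc() {
            return doc;
        }
    }
}
```

With this layout the three ids 3, 200 and 20000 take 6 bytes in total, but 
skipTo(20000) still has to decode all three gaps; extra skip entries would 
trade a little space for fewer decoded entries.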

But the first thing that is necessary is to have Filter independent from BitSet.

The real pain with that is going to be the code that currently implements 
Filters
outside the lucene code base, and a default implementation of a Matcher
should be of help there, just as it is in the -core patch now.

The default implementation will probably need to be improved from its current
state, but that can be done later. For example, one could also use OpenBitSet
in all cases, and even collect the filtered documents directly in that.

Cheers,
Paul Elschot

 Decouple Filter from BitSet
 ---

 Key: LUCENE-584
 URL: https://issues.apache.org/jira/browse/LUCENE-584
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.0.1
Reporter: Peter Schäfer
Priority: Minor
 Attachments: bench-diff.txt, bench-diff.txt, 
 Matcher-core20070725.patch, Matcher-default20070725.patch, 
 Matcher-ground20070725.patch, Some Matchers.zip





[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-07-25 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12515437
 ] 

Paul Elschot commented on LUCENE-584:
-

I forgot to mention that boolean logic on Matchers is already present in 
BooleanScorer2.
This is because each Scorer is a Matcher.

 Decouple Filter from BitSet
 ---

 Key: LUCENE-584
 URL: https://issues.apache.org/jira/browse/LUCENE-584
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.0.1
Reporter: Peter Schäfer
Priority: Minor
 Attachments: bench-diff.txt, bench-diff.txt, 
 Matcher-core20070725.patch, Matcher-default20070725.patch, 
 Matcher-ground20070725.patch, Some Matchers.zip





[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-07-25 Thread Mark Harwood (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12515494
 ] 

Mark Harwood commented on LUCENE-584:
-

Thanks for the reply, Paul.

I saw BitSetMatcher etc. and appreciate the motivation behind the design for 
alternative implementations. What concerns me about the Matcher API in general 
is that Matchers have non-thread-safe state (i.e. the current position 
required to support next()) and as such aren't safely cacheable in the same way 
as BitSets. I see the searcher code uses the safer skipTo() rather than next(), 
but there's still the if(exhausted) thread safety problem to worry about, 
which is why I raised points 1 and 4.

Additionally, combining BitSets using Boolean logic is one method call, whereas 
combining heterogeneous Matchers using Boolean logic requires iteration across 
them and therefore potentially many method calls (point 3). I haven't 
benchmarked this, but I imagine it to be significantly slower?
I use BooleanFilter a lot for security, where many large sets are cached and 
combined on the fly; caching every possible combination as a single bitset 
would require far too many cached entries.
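
To make the contrast in point 3 concrete, here is a minimal sketch of the two styles of combination. It is only an illustration: CombineSketch and its methods are hypothetical names, not part of Lucene or of any patch attached to this issue, and plain sorted int arrays stand in for Matchers.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical helper for illustration; not Lucene code. */
final class CombineSketch {

    /** Bulk AND: a single and() call on a private copy, so the cached sets stay untouched. */
    static BitSet andBits(BitSet cachedA, BitSet cachedB) {
        BitSet result = (BitSet) cachedA.clone();
        result.and(cachedB);
        return result;
    }

    /** Iterator-style AND over two ascending doc id arrays: one step per element visited. */
    static List<Integer> andSorted(int[] a, int[] b) {
        List<Integer> result = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                i++;                 // a is behind, advance it
            } else if (a[i] > b[j]) {
                j++;                 // b is behind, advance it
            } else {
                result.add(a[i]);    // present in both inputs
                i++;
                j++;
            }
        }
        return result;
    }
}
```

Which of the two is faster in practice would depend on how dense the sets are; the sketch only shows where the extra per-document steps come from.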

Cheers
Mark




[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-07-09 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12511186
 ] 

Paul Elschot commented on LUCENE-584:
-

With 2.2 out, and LUCENE-730 out of the way, wouldn't this be a good moment for 
some progress with this issue?
The patch still applies cleanly, and I'd like to start working on a skipping 
extension of SortedVIntList, much like the latest index format for document 
lists.


 Decouple Filter from BitSet
 ---

 Key: LUCENE-584
 URL: https://issues.apache.org/jira/browse/LUCENE-584
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.0.1
Reporter: Peter Schäfer
Priority: Minor
 Attachments: bench-diff.txt, bench-diff.txt, BitsMatcher.java, 
 Filter-20060628.patch, HitCollector-20060628.patch, 
 IndexSearcher-20060628.patch, MatchCollector.java, Matcher.java, 
 Matcher20070226.patch, Scorer-20060628.patch, Searchable-20060628.patch, 
 Searcher-20060628.patch, Some Matchers.zip, SortedVIntList.java, 
 TestSortedVIntList.java





Re: [jira] Commented: (LUCENE-584) Decouple Filter from BitSet; relation with LUCENE-730

2007-04-15 Thread eks dev
A totally different view on Filters would be to think about them as index 
slicers at the lowest possible level, TermDocs. Basically, all document ids in 
such a Filter would appear, at the TermDocs level, as if they were not in the 
index: simply invisible.

The TermDocs would be aware of the filtered doc ids (skipping over Filter 
AND Term).

For example, one could extend FilterIndexReader and provide a setFilter(Matcher) 
method on it; termDocs() would then check whether the Matcher is null 
and return a TermDocs instance that either hides the filtered documents or not.

It looks too simple to be real. The nice thing about it, as far as I can tell, 
is that it does not require any changes in Lucene core!

Conceptually, it filters some documents out of the index and simply provides 
another view on the index (hence FilterIndexReader). It is the same as the 
current Filter, but with a slightly shifted perspective on providing index views.

It is quite possible that this idea sucks big time, so please let me know if you 
see anything super wrong with it. 
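
A rough sketch of the "invisible documents" idea follows. Everything in it is an assumption made for illustration: DocCursor is a hypothetical stand-in for a TermDocs-like cursor (it is not the Lucene interface), and the BitSet is assumed to mark the documents that stay visible, as with the current Filter. If the filter should instead mark the documents to hide, the test in next() is simply inverted.

```java
import java.util.BitSet;

/** Hypothetical stand-in for a TermDocs-like cursor; not the Lucene interface. */
interface DocCursor {
    boolean next();   // advance; returns false when exhausted
    int doc();        // current doc id, valid after next() returned true
}

/** Cursor over an ascending array of doc ids, playing the role of one term's postings. */
final class ArrayDocCursor implements DocCursor {
    private final int[] docs;
    private int pos = -1;

    ArrayDocCursor(int[] docs) {
        this.docs = docs;
    }

    public boolean next() {
        return ++pos < docs.length;
    }

    public int doc() {
        return docs[pos];
    }
}

/** View on another cursor that hides every doc id whose bit is not set in 'visible'. */
final class FilteredDocCursor implements DocCursor {
    private final DocCursor in;
    private final BitSet visible;   // null means "no filter set"

    FilteredDocCursor(DocCursor in, BitSet visible) {
        this.in = in;
        this.visible = visible;
    }

    public boolean next() {
        while (in.next()) {
            if (visible == null || visible.get(in.doc())) {
                return true;        // first doc that passes the filter
            }
        }
        return false;
    }

    public int doc() {
        return in.doc();
    }
}
```

Callers that only see the wrapped cursor cannot tell the hidden documents from documents that were never indexed, which is the shifted perspective described above.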











Re: [jira] Commented: (LUCENE-584) Decouple Filter from BitSet; relation with LUCENE-730

2007-04-14 Thread eks dev
Hoss, what about a radical approach :) Instead of "Decouple Filter from BitSet", 
change the target to "add support for Matcher", meaning:

- do not change Filter or the existing search() method in IndexSearcher at all; 
leave them as they are, with no new assumptions about anything

- add an IndexSearcher.search() method that uses a Matcher and makes the 
documented assumption that the Scorer used to score the supplied Query 
supports skipTo(). 

As I see it, as long as we have this degree of freedom (optional support for 
skipTo() in Scorer), any code that interacts with a Scorer will need implicit 
knowledge of this fact, one way or another.

Making skipTo() required for scorers would be a nice, big, simplifying change, 
but it is *way out of my league* to argue for something like that (I simply have 
no idea what implications, effort... this could have). 

This would work, as this search method with Matcher gets expert status until 
we find a way to relax this assumption and actually do what we wanted in the 
first place: decouple Filter from BitSet.

cheers, e.




Re: [jira] Commented: (LUCENE-584) Decouple Filter from BitSet; relation with LUCENE-730

2007-04-14 Thread Paul Elschot
Hoss,

A bit long, sorry for that, sometimes things are just as complex as they are.

On Saturday 14 April 2007 01:13, Chris Hostetter wrote:
 
...
 
 I don't get it, how would a Scorer not implement skipTo? ...oh...
 
   final class BooleanScorer extends Scorer {
 ...
 public boolean skipTo(int target) {
   throw new UnsupportedOperationException();
 }

Some history for the underlying reason for this:

Once upon a time no Scorer would implement skipTo().
Most people would use BooleanScorer for queries with multiple terms, and 
things worked well with the Scorer.next() method, especially for 
disjunctions. Occasionally documents would be scored out of document order, 
but that did not lead to problems because Hits would reorder the documents by 
score value anyway.

Then skipTo() was added to improve the speed of conjunctions. To do this each 
Scorer needs to score all documents in document number order and implement 
skipTo(), because skipTo() is used by ConjunctionScorer. BooleanScorer will 
only use ConjunctionScorer in very specific (but also frequently occurring) 
circumstances. At this point the index format was also changed to include the 
skip forward information.
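
As an illustration of why skipTo() helps conjunctions, here is a minimal sketch of the leapfrog pattern. It is not ConjunctionScorer: SortedDocs and ConjunctionSketch are hypothetical names, the doc ids live in a plain sorted array, and a binary search stands in for the skip information stored in the index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical iterator with the next()/skipTo()/doc() shape discussed in this thread. */
final class SortedDocs {
    private final int[] docs;   // ascending doc ids
    private int pos = -1;

    SortedDocs(int... docs) {
        this.docs = docs;
    }

    boolean next() {
        return ++pos < docs.length;
    }

    /** Moves past the current position to the first doc id >= target. */
    boolean skipTo(int target) {
        int from = Math.min(pos + 1, docs.length);
        int idx = Arrays.binarySearch(docs, from, docs.length, target);
        pos = idx >= 0 ? idx : -idx - 1;   // insertion point when target is absent
        return pos < docs.length;
    }

    int doc() {
        return docs[pos];
    }
}

final class ConjunctionSketch {
    /** Collects the doc ids present in both inputs; the lagging side jumps to the leader. */
    static List<Integer> and(SortedDocs a, SortedDocs b) {
        List<Integer> hits = new ArrayList<>();
        if (!a.next() || !b.next()) {
            return hits;
        }
        while (true) {
            if (a.doc() == b.doc()) {
                hits.add(a.doc());
                if (!a.next() || !b.next()) {
                    return hits;
                }
            } else if (a.doc() < b.doc()) {
                if (!a.skipTo(b.doc())) {
                    return hits;
                }
            } else {
                if (!b.skipTo(a.doc())) {
                    return hits;
                }
            }
        }
    }
}
```

The loop is only correct if both inputs deliver doc ids in increasing order, which is the requirement on each Scorer stated above.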

As I said, the implementation of disjunctions in BooleanScorer does not score 
documents strictly in document order. It can be made to do that, but that 
would lead to some loss of performance. BooleanScorer uses a kind of 
distributive sort that is faster than the priority queue used by 
DisjunctionSumScorer.

Then BooleanScorer2 came along. BooleanScorer2 uses ConjunctionScorer in more 
circumstances than BooleanScorer, and it uses DisjunctionSumScorer for 
disjunctions. LUCENE-730 is an attempt to get the top level disjunction 
performance of BooleanScorer back.

Disjunctions below top level, for example in a query like this:
+(a1 a2) +(b1 b2)
need skipTo() (called from ConjunctionScorer) on the two nested disjunctions, 
and for that DisjunctionSumScorer is used. Currently for the top level 
disjunction case:
a1 a2 b1 b2
DisjunctionSumScorer is normally used. But when the setUseScorer14() method is 
used, BooleanScorer will (always?) be used. The patch at LUCENE-584 tries to 
handle this setUseScorer14() case by keeping also the old filtering method 
that checks the Bits individually in IndexSearcher.
LUCENE-730 will always use BooleanScorer for the top level disjunctions, so 
with a bit of luck the setUseScorer14 method can also be deprecated/removed.

LUCENE-584 has another possible performance advantage in that it allows an 
implementation of filtering by using a ConjunctionScorer directly instead of 
doing the filtering in IndexSearcher, but that still needs to be added.

Regards,
Paul Elschot




[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-04-13 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12488733
 ] 

Hoss Man commented on LUCENE-584:
-

I'm still behind on following this issue, but Otis: if you are interested in 
moving forward with this, you might consider trying the changes i proposed in 
my 15/Mar/07 11:06 AM Comment...

https://issues.apache.org/jira/browse/LUCENE-584#action_12481263

...I think it would keep IndexSearcher a little cleaner, and make it easier for 
people to migrate existing Filters gradually (without requiring extra work for 
people writing new Matcher style Filters from scratch)




Re: [jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-04-13 Thread eks dev
Hoss, would this work (is this what you said)? 
 
public BitSet bits(IndexReader reader) throws IOException {
  return null;
}

public Matcher getMatcher(IndexReader reader) throws IOException {
  BitSet bits = bits(reader);  // call once and pass the reader through
  if (bits == null) throw new SomeException("Filter must implement at least one of...");
  return new BitsMatcher(bits);
}

and IndexSearcher does not have any logic, just uses getMatcher()
current implementations would work, new as well
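
A self-contained sketch of that fallback is given below. The names SketchMatcher, BitSetSketchMatcher and SketchFilter are hypothetical stand-ins (the real Matcher and BitsMatcher are in the patches attached to the issue and may look different), and the IndexReader argument is left out to keep the example runnable without Lucene.

```java
import java.util.BitSet;

/** Hypothetical stand-in for the Matcher discussed here. */
interface SketchMatcher {
    boolean next();
    boolean skipTo(int target);
    int doc();
}

/** Adapts a BitSet to the iterator shape, in the spirit of the BitsMatcher named above. */
final class BitSetSketchMatcher implements SketchMatcher {
    private final BitSet bits;
    private int doc = -1;
    private boolean exhausted = false;

    BitSetSketchMatcher(BitSet bits) {
        this.bits = bits;
    }

    public boolean next() {
        return skipTo(doc + 1);
    }

    public boolean skipTo(int target) {
        if (exhausted) {
            return false;
        }
        int from = Math.max(target, doc + 1);   // always move forward
        doc = bits.nextSetBit(from);            // -1 when no bit is set at or after 'from'
        exhausted = doc < 0;
        return !exhausted;
    }

    public int doc() {
        return doc;
    }
}

/** Old-style filters override bits(); new-style filters override getMatcher(). */
abstract class SketchFilter {
    BitSet bits() {
        return null;
    }

    SketchMatcher getMatcher() {
        BitSet b = bits();
        if (b == null) {
            throw new UnsupportedOperationException(
                "Filter must implement bits() or getMatcher()");
        }
        return new BitSetSketchMatcher(b);
    }
}
```

Each call to getMatcher() returns a fresh cursor over the same BitSet, so the iteration state is not shared between searches.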




Re: [jira] Commented: (LUCENE-584) Decouple Filter from BitSet; relation with LUCENE-730

2007-04-13 Thread Paul Elschot
On Friday 13 April 2007 22:10, eks dev wrote:
 Hoss, would this work (is this what you said)? 
  
 public BitSet bits(IndexReader reader) throws IOException{
  return null;
 }
 
 public Matcher getMatcher(IndexReader reader) throws IOException {
   if(bits() == null) throw new SomeException(Filter must implement at least 
one of...); 
   return new BitsMatcher(bits());
 }

This will not work correctly when the Scorer for the query that is searched
with a filter does not implement skipTo(), for example BooleanScorer.
See also the javadoc of class IndexSearcher in the patch.

LUCENE-730 explicitly uses BooleanScorer, but only for the non filtered case
with a top level disjunction.

I think that with LUCENE-730 also added, the filtered case with BooleanScorer 
would go away, allowing to simplify this logic in IndexSearcher.
This simplification of IndexSearcher is not in the LUCENE-730 patch, because 
LUCENE-584 is not committed. At the moment I don't know precisely what
IndexSearcher would look like after LUCENE-730.

With LUCENE-730 BooleanScorer.setUseScorer14() could also be 
removed/deprecated, but that is also not yet in the LUCENE-730 patch.

Regards,
Paul Elschot




Re: [jira] Commented: (LUCENE-584) Decouple Filter from BitSet; relation with LUCENE-730

2007-04-13 Thread eks dev
OK, I see, thanks for the hand-holding here.

The simplest solution would be (without making another bigger/riskier patch):

- commit LUCENE-584 as is; no harm to anyone, only some temporary complexity in 
IndexSearcher

- commit LUCENE-730; it does no harm

- open a new JIRA issue "Simplify Filter usage in IndexSearcher" and refactor 
Filter to behave as Hoss described





Re: [jira] Commented: (LUCENE-584) Decouple Filter from BitSet; relation with LUCENE-730

2007-04-13 Thread Chris Hostetter

:  Hoss, would this work (is this what you said)?

:  public Matcher getMatcher(IndexReader reader) throws IOException {
:if(bits() == null) throw new SomeException(Filter must implement at least
: one of...);
:return new BitsMatcher(bits());
:  }

Assuming BitsMatcher does what i think it does then yes, that's what i had
in mind ... i was specifically saying to make a default Matcher
implementation out of the code in the patched version of IndexSearcher
that has the comment...

+} else { // bits for filtering, skipTo() not used on scorer:

: This will not work correctly when the Scorer for the query that is searched
: with a filter does not implement skipTo(), for example BooleanScorer.
: See also the javadoc of class IndexSearcher in the patch.

I don't get it, how would a Scorer not implement skipTo? ...oh...

final class BooleanScorer extends Scorer {
  ...
  public boolean skipTo(int target) {
    throw new UnsupportedOperationException();
  }

...so lemme see if i understand this:

What's happening in the current trunk is that the only situations
in which code will attempt to call skipTo on a Scorer are:
 a) From the score(HitCollector hc) method of the same Scorer class
    (you should know if you support it, you're in the class)
 b) From the skipTo method of an enclosing Scorer
    (if you add Scorer X to a wrapper Scorer Y, and Y implements
    skipTo, it can assume that X implements skipTo).

Am I correct so far?

In the latest version of the Matcher patch...
https://issues.apache.org/jira/secure/attachment/12352057/Matcher20070226.patch
...this changes, such that IndexSearcher will assume a Scorer supports
skipTo iff a Filter is used which implements getMatcher (I guess the
assumption being that if the code being used is new enough to support Matchers,
it's new enough to support Scorer.skipTo).  *BUT* if it's an old Filter using
a BitSet the code in IndexSearcher will continue with the same old
assumptions about the Scorer.

And the change eks describes (which is a much better way to describe what
i was suggesting) would break this safety net by always assuming skipTo
was safe to call.

So really the issue is that the patch assumes one thing (Scorer supports
skipTo) based on the presence of something that should be thought of as
newer (Filter supports getMatcher) and relies on documentation to
enforce this.

Am I caught up now?

Off the top of my head, the best solution i can think of to this issue
would be to add the naive implementation of skipTo to Scorer, remove
the UnsupportedOperationException of skipTo from all Scorers in the core,
and rev Lucene to version 3.0 since this would probably be considered a
serious API change (method sigs don't change, but now we're requiring
people to implement a method that we have said in the past (by example)
can be Unsupported).
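
For reference, a naive skipTo can be written purely in terms of next() and doc(). The sketch below is an assumption about what such a default could look like, using a hypothetical SketchScorer base class rather than the real Scorer.

```java
/** Hypothetical base class with the next()/doc()/skipTo() shape of a Scorer; not Lucene code. */
abstract class SketchScorer {
    abstract boolean next();
    abstract int doc();

    /**
     * Naive default: step with next() until doc() >= target.
     * Always advances at least once, even if the current doc already satisfies the target.
     */
    boolean skipTo(int target) {
        do {
            if (!next()) {
                return false;
            }
        } while (doc() < target);
        return true;
    }
}

/** Minimal subclass over an ascending array of doc ids, used only to exercise the default. */
final class ArraySketchScorer extends SketchScorer {
    private final int[] docs;
    private int pos = -1;

    ArraySketchScorer(int... docs) {
        this.docs = docs;
    }

    boolean next() {
        return ++pos < docs.length;
    }

    int doc() {
        return docs[pos];
    }
}
```

Such a default is only meaningful when next() delivers doc ids in increasing order, so a scorer that returns documents out of order would need more than this method to support skipTo correctly.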

In general i'm not fond of assuming Scorer.skipTo when Filter.getMatcher
... the concepts are really orthogonal and even if it's a decent
assumption to make today, it doesn't help us tomorrow when we want to add a
getMatcher method to all of the core Filter classes to improve
performance.



-Hoss





[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-04-10 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12487706
 ] 

Paul Elschot commented on LUCENE-584:
-

That could be improved in a DisjunctionMatcher.
With a bit of bookkeeping DisjunctionSumScorer could also delay calling score() 
on the subscorers, but the bookkeeping would affect performance for the normal 
case.

For the usual queries the score() call will never have much of a performance 
impact.
The reason for this is that TermScorer.score() is really very efficient; iirc 
it caches weighted tf() values for low term frequencies.
All the rest is mostly additions, and occasionally a multiplication for a 
coordination factor.

To determine which documents match the query, the index needs to be accessed,
and that takes more time than score value computations because the complete 
index almost never fits in the fastest cache.






[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-04-10 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12487789
 ] 

Otis Gospodnetic commented on LUCENE-584:
-

Ah, too bad. :(
Last time I benchmarked Lucene searching on Sun's Niagara vs. non-massive Intel 
boxes, Intel boxes with Linux on them actually won, and my impression was that 
this was due to Niagara's weak FPU (a known weakness in Niagara, I believe).  
Thus, I thought, if we could just skip scoring and various floating point 
calculations, we'd see better performance, esp. on Niagara boxes.

Paul, when you say fastest cache, what exactly are you referring to?  The 
Niagara I tested things on had 32GB of RAM, and I gave the JVM 20+GB, so at 
least the JVM had plenty of RAM to work with.


 Decouple Filter from BitSet
 ---

 Key: LUCENE-584
 URL: https://issues.apache.org/jira/browse/LUCENE-584
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.0.1
Reporter: Peter Schäfer
Priority: Minor
 Attachments: bench-diff.txt, bench-diff.txt, BitsMatcher.java, 
 Filter-20060628.patch, HitCollector-20060628.patch, 
 IndexSearcher-20060628.patch, MatchCollector.java, Matcher.java, 
 Matcher20070226.patch, Scorer-20060628.patch, Searchable-20060628.patch, 
 Searcher-20060628.patch, Some Matchers.zip, SortedVIntList.java, 
 TestSortedVIntList.java


 {code}
 package org.apache.lucene.search;
 public abstract class Filter implements java.io.Serializable 
 {
   public abstract AbstractBitSet bits(IndexReader reader) throws IOException;
 }
 public interface AbstractBitSet 
 {
   public boolean get(int index);
 }
 {code}
 It would be useful if the method =Filter.bits()= returned an abstract 
 interface, instead of =java.util.BitSet=.
 Use case: there is a very large index, and, depending on the user's 
 privileges, only a small portion of the index is actually visible.
 Sparsely populated =java.util.BitSet=s are not efficient and waste lots of 
 memory. It would be desirable to have an alternative BitSet implementation 
 with smaller memory footprint.
 Though it _is_ possible to derive classes from =java.util.BitSet=, it was 
 obviously not designed for that purpose.
 That's why I propose to use an interface instead. The default implementation 
 could still delegate to =java.util.BitSet=.




Re: [jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-04-10 Thread eks dev

If I remember well, the last time we profiled search with high-density OR 
queries, scoring was taking up to 30% of the time. This was a collection of 
8 million short documents fitting comfortably in RAM. So I am sure disabling 
scoring in some cases could bring us something. 

I am not familiar enough with the inner workings of scoring to stand 100% 
behind this statement, so please take it with a healthy grain of salt.

But anyhow, with Matcher in place, we at least have a chance to prove that it 
helps in this scenario. For the filtering case it definitely helps a lot. 

On another note, 
Paul, would it be possible/easy to have something like the following? It looks 
easy to add, but I may be missing something: 
BooleanQuery.add(Matcher mtr,
BooleanClause.Occur occur)






[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-04-10 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12487882
 ] 

Paul Elschot commented on LUCENE-584:
-

By fastest cache I meant the L1 cache of the processor. Its size is normally in 
the tens of kilobytes.
An array lookup hitting that cache takes about as much time as a floating point 
addition.

During a query search, the use of (among other things) the term frequencies, 
the proximity data, and the document weights normally causes an L1 cache miss.

I would expect that by not doing the score value computations, only the cache 
misses for document weights can be saved.





Re: [jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-04-10 Thread Paul Elschot

On Tuesday 10 April 2007 17:41, eks dev wrote:
> If I remember well, the last time we profiled search with high-density OR 
> queries, scoring was taking up to 30% of the time. This was a collection of 
> 8 million short documents fitting comfortably in RAM. So I am sure disabling 
> scoring in some cases could bring us something. 
> 
> I am not familiar enough with the inner workings of scoring to stand 100% 
> behind this statement, so please take it with a healthy grain of salt.

For high-density OR I'd guess most of the work was spent maintaining
the priority queue by document number. See also LUCENE-730.

> But anyhow, with Matcher in place, we at least have a chance to prove that it 
> helps in this scenario. For the filtering case it definitely helps a lot. 
> 
> On another note, 
> Paul, would it be possible/easy to have something like the following? It looks 
> easy to add, but I may be missing something: 
> BooleanQuery.add(Matcher mtr,
> BooleanClause.Occur occur)

That's one of the things I'd like to see added. It would allow a single
ConjunctionScorer to do a filtered search for a query with some
required terms.

Regards,
Paul Elschot
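
The idea of a single conjunction handling both a required clause and a filter can be sketched as a "leapfrog" intersection of two sorted doc-id iterators. This is only an illustration: =DocIterator=, =ArrayDocIterator= and =Conjunction= are hypothetical names, and =skipTo()= here is simplified so that it stays put when the current doc already satisfies the target, which is an assumption of this sketch and not a statement about Lucene's own contract.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal iterator, modeled loosely on next()/skipTo()/doc().
interface DocIterator {
  boolean next();              // move to the next doc; false when exhausted
  boolean skipTo(int target);  // move to the first doc >= target (may stay put)
  int doc();                   // current doc, valid after a successful call
}

// Iterator over a sorted array of doc ids; stands in for a scorer or a filter.
final class ArrayDocIterator implements DocIterator {
  private final int[] docs;
  private int pos = -1;

  ArrayDocIterator(int... sortedDocs) {
    this.docs = sortedDocs;
  }

  public boolean next() {
    return ++pos < docs.length;
  }

  public boolean skipTo(int target) {
    if (pos < 0) pos = 0;
    while (pos < docs.length && docs[pos] < target) pos++;
    return pos < docs.length;
  }

  public int doc() {
    return docs[pos];
  }
}

final class Conjunction {
  // Collects docs present in both iterators; each side skips to the other's doc.
  static List<Integer> intersect(DocIterator required, DocIterator filter) {
    List<Integer> hits = new ArrayList<>();
    if (!required.next()) return hits;
    while (true) {
      if (!filter.skipTo(required.doc())) return hits;
      if (filter.doc() == required.doc()) {
        hits.add(required.doc());
        if (!required.next()) return hits;
      } else if (!required.skipTo(filter.doc())) {
        return hits;
      }
    }
  }
}
```

Because the filter side is only asked to skip forward, it never has to be materialized as a full bit set in this sketch.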





[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-04-10 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12487940
 ] 

Hoss Man commented on LUCENE-584:
-

I'm a little behind on following this issue, but if I can attempt to sum up the 
recent discussion about performance...

   Migrating towards a Matcher API *may* allow some types of Queries to be 
faster in situations where clients can use a MatchCollector instead of a 
HitCollector, but this won't be a silver-bullet performance win for all Query 
classes -- just those where some of the score calculation is (or can be) 
isolated to the score method (as opposed to skipTo or next).

I think it's important to remember that the motivation of this issue wasn't to 
improve the speed of non-scoring searches; it was to decouple the concept of 
filtering results from the need to populate a (potentially large) BitSet when 
the logic necessary for filtering can easily be expressed in terms of a doc 
iterator (aka a Matcher) -- opening up the possibility of memory performance 
improvements.  

A second benefit that has arisen as the issue evolved has been the API 
generalization of the Matcher concept into a superclass of Scorer, for 
simpler APIs moving forward.
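
The generalization described above can be sketched roughly as follows. The names Matcher, Scorer, MatchCollector and HitCollector come from this thread, but the signatures below are assumptions made for illustration and are not copied from the attached patches; =ArrayScorer= and =Searches= are hypothetical helpers.

```java
// A Matcher only iterates over matching docs; it knows nothing about scores.
abstract class Matcher {
  abstract boolean next();
  abstract int doc();
}

// A Scorer is a Matcher that can additionally score the current doc.
abstract class Scorer extends Matcher {
  abstract float score();
}

interface MatchCollector {
  void collect(int doc);
}

interface HitCollector {
  void collect(int doc, float score);
}

// Hypothetical scorer over a sorted doc-id array; counts score() calls.
final class ArrayScorer extends Scorer {
  private final int[] docs;
  private int pos = -1;
  int scoreCalls = 0;

  ArrayScorer(int... sortedDocs) {
    this.docs = sortedDocs;
  }

  boolean next() {
    return ++pos < docs.length;
  }

  int doc() {
    return docs[pos];
  }

  float score() {
    scoreCalls++;
    return 1.0f;
  }
}

final class Searches {
  // Match-only loop: never touches score(), so any Matcher will do.
  static void match(Matcher m, MatchCollector mc) {
    while (m.next()) mc.collect(m.doc());
  }

  // Scoring loop: needs a Scorer and calls score() once per hit.
  static void search(Scorer s, HitCollector hc) {
    while (s.next()) hc.collect(s.doc(), s.score());
  }
}
```

In this sketch a Scorer can be passed wherever a Matcher is expected, and the match-only loop saves exactly the top-level score() call; whatever work happens inside next() is unaffected.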







[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-04-10 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12487966
 ] 

Otis Gospodnetic commented on LUCENE-584:
-

Right.  I was under the wrong impression that the Matcher also happens to avoid 
scoring.  However, now that we've all looked at this patch (still applies 
cleanly and unit tests all pass), and nobody had any criticisms, I think we 
should commit it, say this Friday.

As I'm in the performance squeezing mode, I'll go look at LUCENE-730, another 
one of Paul's great patches, and see if I can measure performance improvement 
there.




[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-04-09 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12487594
 ] 

Yonik Seeley commented on LUCENE-584:
-

> When you rerun, you may want to use my alg - to compare the two approaches in 
> one run.

This is more dangerous though.  GC of one method's garbage can penalize the 
second method's performance.
Also, hotspot effects are hard to account for (if method1 and method2 use 
common methods, method2 will often execute faster than method1 because more 
optimization has been done on those common methods).

The hotspot effect can be minimized by running the test multiple times in the 
same JVM instance and discarding the first runs, but it's not so easy for GC.




[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-04-09 Thread Mike Klaas (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12487613
 ] 

Mike Klaas commented on LUCENE-584:
---

Instead of discarding the first run, the approach I usually take is to run 3-4 
times and pick the minimum.  You can then run several of these sets and 
average over the minimum of each.  GC is still an issue, though.  It is hard 
to get around when it is a mark-sweep collector (reference counting is much 
friendlier in this regard).
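
The "minimum of a few runs, averaged over several sets" approach can be sketched as below. =MinOfRuns= and its method names are hypothetical, and the task being timed is whatever search loop is under test; this is not part of the Lucene benchmark package.

```java
final class MinOfRuns {
  // Runs the task several times and returns the fastest elapsed time in ns.
  static long minNanos(Runnable task, int runs) {
    long best = Long.MAX_VALUE;
    for (int i = 0; i < runs; i++) {
      long start = System.nanoTime();
      task.run();
      best = Math.min(best, System.nanoTime() - start);
    }
    return best;
  }

  // Repeats the min-of-runs measurement and averages the per-set minimums.
  static double averageOfMinimums(Runnable task, int sets, int runsPerSet) {
    long sum = 0;
    for (int s = 0; s < sets; s++) {
      sum += minNanos(task, runsPerSet);
    }
    return (double) sum / sets;
  }
}
```

Taking the minimum discards runs that were slowed down by warm-up or by a collection pause, which is also why this measure says little about the GC cost of an approach.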




[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-04-09 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12487616
 ] 

Doron Cohen commented on LUCENE-584:


>> When you rerun, you may want to use my alg - to compare the two approaches 
>> in one run. 
> This is more dangerous though. 

Agreed. I was trying to get rid of this by splitting each round into three 
steps -- gc(), warm(), work() -- where work() and warm() are the same, except 
that warm()'s stats are disregarded. Still, switching the order of the by-match 
and by-bits runs yields different results. 

Sometimes we would like not to disregard GC - in particular if one approach 
creates more (or more complex) garbage than the other. 

Perhaps we should look at two measures: best and avg/sum (the second ignoring 
the first run, for hotspot). 
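
A rough sketch of the gc() / warm() / work() round structure, reporting both the best and the average time of the measured step. =GcWarmWork= is a hypothetical helper written for this illustration; note that =System.gc()= is only a hint to the JVM, so a collection can still land inside the measured step.

```java
final class GcWarmWork {
  // Returns { bestNanos, avgNanos } over the given number of rounds (rounds > 0).
  static long[] measure(Runnable task, int rounds) {
    long best = Long.MAX_VALUE;
    long sum = 0;
    for (int r = 0; r < rounds; r++) {
      System.gc();   // gc(): request a collection before the round
      task.run();    // warm(): same work, stats disregarded
      long start = System.nanoTime();
      task.run();    // work(): the measured step
      long elapsed = System.nanoTime() - start;
      best = Math.min(best, elapsed);
      sum += elapsed;
    }
    return new long[] { best, sum / rounds };
  }
}
```

Running this once with approach A first and once with approach B first, each in a fresh JVM, is one way to check whether the ordering effect mentioned above persists.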





[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-04-09 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12487631
 ] 

Otis Gospodnetic commented on LUCENE-584:
-

Doron: just to address your question from Apr/7 - I expect/hope to see an 
improvement in performance because of this difference:

  hc.collect(doc(), score()); 
  mc.collect(doc()); 

the delta being the cost of the score() call that does the scoring.  If I 
understand things correctly, that means that what Grant described at the bottom 
of http://lucene.apache.org/java/docs/scoring.html will all be skipped.  No 
Scorer, no BooleanScorer(2), no ConjunctionScorer...





[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-04-09 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12487667
 ] 

Doron Cohen commented on LUCENE-584:


> No Scorer, no BooleanScorer(2), no ConjunctionScorer... 

Thanks, I was reading score instead of score()...

But there is a scorer in the process, it is used for next()-ing to matched 
docs. So most of the work - preparing to be able to compute the scores - was 
done already. The scorer doc queue is created and populated. Not calling 
score() is saving the (final) looping on the scorers for aggregating their 
scores, multiplying by coord factor, etc. I assume this is why only a small 
speed up is seen. 





[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-04-09 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12487674
 ] 

Otis Gospodnetic commented on LUCENE-584:
-

A.  I'll look at the patch again tomorrow and follow what you said.  All 
this time I was under the impression that one of the points or at least 
side-effects of the Matcher was that scoring was skipped, which would be 
perfect where matches are ordered by anything other than relevance.






[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-04-09 Thread Marvin Humphrey (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12487675
 ] 

Marvin Humphrey commented on LUCENE-584:


DisjunctionSumScorer (the ORScorer) actually calls Scorer.score() on all of the 
matching scorers in the ScorerDocQueue during next(), in order to accumulate an 
aggregate score.  The MatchCollector can't save you from that.
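
The point can be illustrated with a toy OR scorer. This is not Lucene's DisjunctionSumScorer (which maintains a ScorerDocQueue); =LeafScorer= and =OrScorer= are hypothetical classes that only mimic the behaviour described above: the sub-scorers' score() is called while advancing, so a caller that never asks for the aggregate score still pays for it.

```java
// Hypothetical leaf scorer over a sorted doc-id array; counts score() calls.
final class LeafScorer {
  private final int[] docs;
  private final float weight;
  private int pos = 0;
  int scoreCalls = 0;

  LeafScorer(float weight, int... sortedDocs) {
    this.weight = weight;
    this.docs = sortedDocs;
  }

  // Current doc, or Integer.MAX_VALUE once exhausted.
  int doc() {
    return pos < docs.length ? docs[pos] : Integer.MAX_VALUE;
  }

  void advance() {
    pos++;
  }

  float score() {
    scoreCalls++;
    return weight;
  }
}

// Toy disjunction: the aggregate score is accumulated inside next().
final class OrScorer {
  private final LeafScorer[] subs;
  private int doc = -1;
  private float score;

  OrScorer(LeafScorer... subs) {
    this.subs = subs;
  }

  boolean next() {
    int min = Integer.MAX_VALUE;
    for (LeafScorer s : subs) min = Math.min(min, s.doc());
    if (min == Integer.MAX_VALUE) return false;
    doc = min;
    score = 0f;
    for (LeafScorer s : subs) {
      if (s.doc() == min) {
        score += s.score();  // scoring work happens here, not in score()
        s.advance();
      }
    }
    return true;
  }

  int doc() {
    return doc;
  }

  // Cheap: just returns what next() already computed.
  float score() {
    return score;
  }
}
```

Avoiding that cost would need a disjunction that defers the sub-scorer calls until score() is actually requested.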




[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-04-08 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12487456
 ] 

Doron Cohen commented on LUCENE-584:


...right, your diff-txt had the Match tasks - I missed that - checked it, it is 
exactly what I did, so we're ok here. 

When you rerun, you may want to use my alg - to compare the two approaches in 
one run. You can run this by something like:
 ant run-task -Dtask.mem=256M -Dtask.alg=conf\matcher-vs-bitset.alg

Also, to get cleaner results, add the line:
 ResetSystemSoft
at the very beginning of the search round - this resets the (query) inputs 
and also calls GC.

I tried like this twice, and got inconsistent results:

When the bitset searches preceded the match searches:
 [java] Operation            round  runCnt  recsPerRun  rec/s  elapsedSec  avgUsedMem  avgTotalMem
 [java] SrchBitsSamRdr_5000      -      10        5000  706.4       70.78   7,511,219   16,573,645
 [java] SrchMtchSamRdr_5000      -      10        5000  689.6       72.50   8,223,005   11,926,323
 [java] SrchBitsNewRdr_500       -      10         500  152.5       32.80  14,360,618   16,962,356
 [java] SrchMtchNewRdr_500       -      10         500  171.3       29.19  15,150,797   17,395,712

When the match searches preceded the bitset searches:
 [java] Operation            round  runCnt  recsPerRun  rec/s  elapsedSec  avgUsedMem  avgTotalMem
 [java] SrchMtchSamRdr_5000      -      10        5000  763.5       65.49   9,563,243   17,128,244
 [java] SrchBitsSamRdr_5000      -      10        5000  729.3       68.56  10,003,775   13,001,114
 [java] SrchMtchNewRdr_500       -      10         500  175.7       28.46  12,068,559   17,524,326
 [java] SrchBitsNewRdr_500       -      10         500  183.7       27.22  15,098,480   17,974,476

My conclusion from this is that the speed-up, if it exists, is minor, at least for 
the setup of this test. 

There are only 15 unique queries in this test (also printed in the log). Are 
these the queries where you would expect a saving? 

I didn't follow this issue very closely, so I don't know where the saving is 
expected here. Both SearchTask and MatchTask now do nothing in collect, so 
there is no difference at the actual collect() call.

Also, Scorer.score(HitCollector) and Matcher.match(MatchCollector) are very 
similar:
  public void score(HitCollector hc) throws IOException {
    while (next()) {
      hc.collect(doc(), score());
    }
  }
  public void match(MatchCollector mc) throws IOException {
    while (next()) {
      mc.collect(doc());
    }
  }
Especially for the case that the collect() method is doing nothing, as in this 
test.

I think there is a potential gain for large boolean OR queries, because score() 
would have to call next() on all TermScorers and collect/sum their scores, 
while match() could use skipTo(last+1) because any match encountered is a match 
and there is no need to sum the individual scores for the same doc by other 
scorers. However as far as I can tell, current match() implementation does not 
take advantage of this, but I may be overlooking something?
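
A minimal sketch of where a match-only OR could save work, under stated 
assumptions: SubIterator and DisjunctionSketch are hypothetical stand-ins, not 
the classes in the patch or in Lucene. With scoring enabled, every sub-iterator 
positioned on the current doc has score() called; in match-only mode the doc is 
reported once and no score() call is made.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a term scorer over a sorted array of doc ids. */
class SubIterator {
    private final int[] docs;
    private int pos = 0;
    int scoreCalls = 0; // counts how often score() was invoked

    SubIterator(int... docs) { this.docs = docs; }

    boolean exhausted() { return pos >= docs.length; }
    int doc() { return docs[pos]; }
    void next() { pos++; }

    float score() {
        scoreCalls++;
        return 1.0f; // placeholder score
    }
}

/** Hypothetical OR over several sub-iterators. */
class DisjunctionSketch {
    /**
     * Returns the matching docs in increasing order. When scoring is true,
     * score() is called on every sub-iterator positioned on the current doc;
     * when false, the doc is only reported.
     */
    static List<Integer> run(SubIterator[] subs, boolean scoring) {
        List<Integer> hits = new ArrayList<>();
        while (true) {
            // Find the smallest current doc among the live sub-iterators.
            int min = Integer.MAX_VALUE;
            for (SubIterator s : subs) {
                if (!s.exhausted() && s.doc() < min) {
                    min = s.doc();
                }
            }
            if (min == Integer.MAX_VALUE) {
                break; // all sub-iterators are exhausted
            }
            // Move every sub-iterator on that doc forward.
            for (SubIterator s : subs) {
                if (!s.exhausted() && s.doc() == min) {
                    if (scoring) {
                        s.score(); // a real scorer would sum these values
                    }
                    s.next();
                }
            }
            hits.add(min);
        }
        return hits;
    }
}
```

Both modes return the same docs; only the number of score() calls differs. 
Whether that difference is measurable depends on how expensive score() is for 
the queries involved.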

 Decouple Filter from BitSet
 ---

 Key: LUCENE-584
 URL: https://issues.apache.org/jira/browse/LUCENE-584
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.0.1
Reporter: Peter Schäfer
Priority: Minor
 Attachments: bench-diff.txt, bench-diff.txt, BitsMatcher.java, 
 Filter-20060628.patch, HitCollector-20060628.patch, 
 IndexSearcher-20060628.patch, MatchCollector.java, Matcher.java, 
 Matcher20070226.patch, Scorer-20060628.patch, Searchable-20060628.patch, 
 Searcher-20060628.patch, Some Matchers.zip, SortedVIntList.java, 
 TestSortedVIntList.java



[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-04-07 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12487431
 ] 

Doron Cohen commented on LUCENE-584:


One line was cut out - here are the four lines again

Operation            round  runCnt  recsPerRun  rec/s  elapsedSec  avgUsedMem  avgTotalMem
SrchMtchSamRdr_5000      -      10        5000  642.2       77.85  12,331,866   16,408,576
SrchBitsSamRdr_5000      -      10        5000  586.9       85.20   9,515,875   12,009,472
SrchMtchNewRdr_500       -      10         500  134.7       37.11  13,376,113   17,171,660
SrchBitsNewRdr_500       -      10         500  154.0       32.47  15,351,395   17,522,688


 Decouple Filter from BitSet
 ---

 Key: LUCENE-584
 URL: https://issues.apache.org/jira/browse/LUCENE-584
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.0.1
Reporter: Peter Schäfer
Priority: Minor
 Attachments: bench-diff.txt, bench-diff.txt, BitsMatcher.java, 
 Filter-20060628.patch, HitCollector-20060628.patch, 
 IndexSearcher-20060628.patch, MatchCollector.java, Matcher.java, 
 Matcher20070226.patch, Scorer-20060628.patch, Searchable-20060628.patch, 
 Searcher-20060628.patch, Some Matchers.zip, SortedVIntList.java, 
 TestSortedVIntList.java





[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-04-07 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12487432
 ] 

Otis Gospodnetic commented on LUCENE-584:
-

Doron, thanks for jumping on this!

1. I thought I'd see better performance with the Matcher because it skips 
scoring.  While Paul's patch does make changes to the Filtering code, I'm more 
focused on HitCollector vs. MatchCollector performance here.  Am I missing 
something here?  If scoring is skipped, we should see at least some speed 
improvement, and your results show that.

2. You said you *did* see MatchCollector was faster than HitCollector.  Hmmm, 
weird, not in my 4 runs:

Matcher:
 [java] SearchSameRdr_5 - - - - - - - - 4 - - 5 - - 1,064.7 - - 187.84 - 11,060,036 - 14,806,016
HitCollector:
 [java] SearchSameRdr_5 - - - - - - - - 4 - - 5 - - 1,070.3 - - 186.86 - 10,500,146 - 13,821,952

I'll try it again on a different computer.  My previous runs were on a Mac with 
OSX.

3. My bench-diff.txt did include Match tasks:

$ grep Match bench-diff.txt | grep class
public class SearchMatchTask extends MatchTask {
public abstract class MatchTask extends ReadTask {

... but I didn't svn add them, so I produced the diff by simply cat-ing the 
new tasks to bench-diff.txt .  So if you used my bench-diff.txt as a patch, it 
wouldn't have worked.  Not a big deal, just clarifying.


 Decouple Filter from BitSet
 ---

 Key: LUCENE-584
 URL: https://issues.apache.org/jira/browse/LUCENE-584
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.0.1
Reporter: Peter Schäfer
Priority: Minor
 Attachments: bench-diff.txt, bench-diff.txt, BitsMatcher.java, 
 Filter-20060628.patch, HitCollector-20060628.patch, 
 IndexSearcher-20060628.patch, MatchCollector.java, Matcher.java, 
 Matcher20070226.patch, Scorer-20060628.patch, Searchable-20060628.patch, 
 Searcher-20060628.patch, Some Matchers.zip, SortedVIntList.java, 
 TestSortedVIntList.java





[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-03-24 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12483868
 ] 

Yonik Seeley commented on LUCENE-584:
-

 BitsMatcher could also work without the exhausted flag, but then an infinite
 loop might occur when trying to continue after the first time next() or
 skipTo() returned false.
 Continuing after false was returned in these cases is a bug, however an
 infinite loop can be difficult to debug. I'd rather be on the safe side of
 that with the exhausted flag and wait for an actual profile to show the
 performance problem.

We know that matchers will be inner-loop stuff.  It seems like any scorers that 
call next() after false was returned should be fixed.
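
To make the trade-off concrete, here is a minimal sketch of a BitSet-backed 
matcher with an exhausted flag. BitSetMatcherSketch is a hypothetical class 
written for illustration; it is not the BitsMatcher.java attached to this 
issue. Without the flag, doc would fall back to -1 after the last match, so a 
further next() would restart at bit 0 and a caller that keeps looping would 
never terminate.

```java
import java.util.BitSet;

/** Hypothetical matcher over the set bits of a java.util.BitSet. */
class BitSetMatcherSketch {
    private final BitSet bits;
    private int doc = -1;              // current doc, -1 before the first match
    private boolean exhausted = false; // set once no further match exists

    BitSetMatcherSketch(BitSet bits) {
        this.bits = bits;
    }

    /** Moves to the next set bit; returns false when there is none. */
    boolean next() {
        return skipTo(doc + 1);
    }

    /** Moves to the first set bit at or after target (target must be >= 0). */
    boolean skipTo(int target) {
        if (exhausted) {
            return false; // the extra branch that guards against restarting
        }
        doc = bits.nextSetBit(target); // returns -1 when nothing is left
        if (doc < 0) {
            exhausted = true;
            return false;
        }
        return true;
    }

    int doc() {
        return doc;
    }
}
```

The cost of the flag is one extra branch per call in the inner loop, which is 
the overhead being weighed above against easier debugging.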

 Decouple Filter from BitSet
 ---

 Key: LUCENE-584
 URL: https://issues.apache.org/jira/browse/LUCENE-584
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.0.1
Reporter: Peter Schäfer
Priority: Minor
 Attachments: BitsMatcher.java, Filter-20060628.patch, 
 HitCollector-20060628.patch, IndexSearcher-20060628.patch, 
 MatchCollector.java, Matcher.java, Matcher20070226.patch, 
 Scorer-20060628.patch, Searchable-20060628.patch, Searcher-20060628.patch, 
 Some Matchers.zip, SortedVIntList.java, TestSortedVIntList.java





[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-03-23 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12483724
 ] 

Otis Gospodnetic commented on LUCENE-584:
-

Paul:
Applied the patch; it applied cleanly. Ran ant test - BUILD SUCCESSFUL :)

I'm primarily interested in using this in order to get matches, but avoid 
scoring.  From what I can tell, I'd just need to switch to using the new 
match(Query, MatchCollector) method in IndexSearcher.  However, I need Sort and 
TopFieldDocs, and I don't see a match method with those.  Is there a reason why 
such a match method is not in the patch?


 Decouple Filter from BitSet
 ---

 Key: LUCENE-584
 URL: https://issues.apache.org/jira/browse/LUCENE-584
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.0.1
Reporter: Peter Schäfer
Priority: Minor
 Attachments: BitsMatcher.java, Filter-20060628.patch, 
 HitCollector-20060628.patch, IndexSearcher-20060628.patch, 
 MatchCollector.java, Matcher.java, Matcher20070226.patch, 
 Scorer-20060628.patch, Searchable-20060628.patch, Searcher-20060628.patch, 
 Some Matchers.zip, SortedVIntList.java, TestSortedVIntList.java





[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-03-23 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12483731
 ] 

Paul Elschot commented on LUCENE-584:
-

Otis:

 However, I need Sort and TopFieldDocs, and I don't see a match method with
 those. Is there a reason why such a match method is not in the patch?

A bit silly perhaps, but what sort criterion would you like to use when no 
score() value is available?

I don't know the sorting code, but it might be possible to use a field value 
for sorting. In that case the sorting code for a Matcher would need to check 
that the sort criterion does not imply the use of a score value.
I personally have no use for sorting by field values, so that is why I never 
thought of combining this with a Matcher.
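
As a rough illustration of sorting matches without scores, the sketch below 
orders matched doc ids by a per-document field value and rejects a sort that 
would need a score. Everything in it is an assumption made for the example: 
FieldSortedMatchesSketch, the fieldValues array (standing in for some 
per-document field lookup) and the sortUsesScore flag are hypothetical and do 
not correspond to Lucene's Sort or TopFieldDocs code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper: orders score-less matches by a field value. */
class FieldSortedMatchesSketch {
    /**
     * @param matchedDocs   doc ids gathered by a match-only collector
     * @param fieldValues   fieldValues[doc] is the sort key of that doc
     * @param sortUsesScore true if the requested sort depends on score()
     */
    static List<Integer> sortByField(List<Integer> matchedDocs,
                                     String[] fieldValues,
                                     boolean sortUsesScore) {
        if (sortUsesScore) {
            // A matcher has no score(), so such a sort cannot be honoured.
            throw new IllegalArgumentException("sort criterion requires a score");
        }
        List<Integer> sorted = new ArrayList<>(matchedDocs);
        // Order by field value, breaking ties by doc id.
        sorted.sort(Comparator
                .comparing((Integer d) -> fieldValues[d])
                .thenComparing(Comparator.<Integer>naturalOrder()));
        return sorted;
    }
}
```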





 Decouple Filter from BitSet
 ---

 Key: LUCENE-584
 URL: https://issues.apache.org/jira/browse/LUCENE-584
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.0.1
Reporter: Peter Schäfer
Priority: Minor
 Attachments: BitsMatcher.java, Filter-20060628.patch, 
 HitCollector-20060628.patch, IndexSearcher-20060628.patch, 
 MatchCollector.java, Matcher.java, Matcher20070226.patch, 
 Scorer-20060628.patch, Searchable-20060628.patch, Searcher-20060628.patch, 
 Some Matchers.zip, SortedVIntList.java, TestSortedVIntList.java





[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-03-16 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12481734
 ] 

Paul Elschot commented on LUCENE-584:
-

Hoss,

 Paul: I notice Filter.getMatcher returns null, and IndexSearcher tests for
 that and uses it to decide whether or not to iterate over the (non null)
 Matcher, or over the BitSet from Filter.bits. is there any reason that logic
 can't be put in getMatcher, so that if subclasses of Filter don't override
 the getMatcher method it will call bits and then return a Matcher that
 iterates over the set Bits?

Two reasons:
- uncertainty over performance of a Matcher instead of a BitSet,
- this way backward compatibility is very easily guaranteed.

There is also LUCENE-730, which may interfere with the removal of BitSet,
since it allows documents to be scored out of order. However, LUCENE-730
should only be used at the top level of a query search and without a Filter.
I cannot think of an actual case in which there might be interference, but
I may not have looked into that deeply enough.

 we could even change Filter.bits so it's no longer abstract ... it could have
 an implementation that would call getMatcher, and iterate over all of the
 matched docs setting bits on a BitSet that is then returned ... the class
 would still be abstract, and the class javadocs would make it clear that
 subclasses must override at least one of the methods...

I must say that creating a BitSet from a Matcher never occurred to me.
Anyway, when Filter.bits() is deprecated I have no preference about how
it is actually removed.


 Decouple Filter from BitSet
 ---

 Key: LUCENE-584
 URL: https://issues.apache.org/jira/browse/LUCENE-584
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.0.1
Reporter: Peter Schäfer
Priority: Minor
 Attachments: BitsMatcher.java, Filter-20060628.patch, 
 HitCollector-20060628.patch, IndexSearcher-20060628.patch, 
 MatchCollector.java, Matcher.java, Matcher20070226.patch, 
 Scorer-20060628.patch, Searchable-20060628.patch, Searcher-20060628.patch, 
 Some Matchers.zip, SortedVIntList.java, TestSortedVIntList.java





[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-03-15 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12481263
 ] 

Hoss Man commented on LUCENE-584:
-

It's been a while since i looked at this issue, but it's come up in discussion 
recently so i took another glance...

Paul: I notice Filter.getMatcher returns null, and IndexSearcher tests for that 
and uses it to decide whether or not to iterate over the (non null) Matcher, 
or over the BitSet from Filter.bits.  is there any reason that logic can't be 
put in getMatcher, so that if subclasses of Filter don't override the 
getMatcher method it will call bits and then return a Matcher that iterates 
over the set Bits?

(this is the roll-out approach i advocated a while back when discussing this on 
email, except that at the time Matcher was referred to as SearchFilter: 
http://www.nabble.com/RE%3A-Filter-p2605271.html )

Thinking about it now, we could even change Filter.bits so it's no longer 
abstract ... it could have an implementation that would call getMatcher, and 
iterate over all of the matched docs setting bits on a BitSet that is then 
returned ... the class would still be abstract, and the class javadocs  would 
make it clear that subclasses must override at least one of the methods ... 
legacy Filters will work fine because they'll already have a bits method, and 
people writing new Filters will see that bits is deprecated, so they'll just 
write a getMatcher method and be done.

This appears to be the same approach taken with Analyzer.tokenStream back in 
1.4.3...

http://lucene.apache.org/java/1_4_3/api/org/apache/lucene/analysis/Analyzer.html
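
The roll-out approach described in this comment can be sketched roughly as 
follows. This is an illustration under stated assumptions, not the patch: 
DocMatcher and FilterSketch are hypothetical names, and an int maxDoc stands in 
for the IndexReader argument. Each method has a default written in terms of the 
other, so a subclass must override at least one of them.

```java
import java.util.BitSet;

/** Hypothetical minimal matcher interface. */
interface DocMatcher {
    boolean next();
    int doc();
}

/** Hypothetical Filter with two mutually defined defaults. */
abstract class FilterSketch {
    /**
     * Legacy entry point (would be marked deprecated). The default drains
     * getMatcher() into a BitSet.
     */
    BitSet bits(int maxDoc) {
        BitSet result = new BitSet(maxDoc);
        DocMatcher m = getMatcher(maxDoc);
        while (m.next()) {
            result.set(m.doc());
        }
        return result;
    }

    /**
     * New entry point. The default iterates over the set bits of bits().
     * A subclass overriding neither method recurses until the stack
     * overflows, which is why the javadocs would have to demand one override.
     */
    DocMatcher getMatcher(int maxDoc) {
        final BitSet b = bits(maxDoc);
        return new DocMatcher() {
            private int doc = -1;
            private boolean done = false;

            public boolean next() {
                if (done) {
                    return false;
                }
                doc = b.nextSetBit(doc + 1);
                done = doc < 0;
                return !done;
            }

            public int doc() {
                return doc;
            }
        };
    }
}
```

A legacy filter that only overrides bits() then works through getMatcher() 
unchanged, and a new filter that only overrides getMatcher() still answers 
bits() for older callers.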

 Decouple Filter from BitSet
 ---

 Key: LUCENE-584
 URL: https://issues.apache.org/jira/browse/LUCENE-584
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.0.1
Reporter: Peter Schäfer
Priority: Minor
 Attachments: BitsMatcher.java, Filter-20060628.patch, 
 HitCollector-20060628.patch, IndexSearcher-20060628.patch, 
 MatchCollector.java, Matcher.java, Matcher20070226.patch, 
 Scorer-20060628.patch, Searchable-20060628.patch, Searcher-20060628.patch, 
 Some Matchers.zip, SortedVIntList.java, TestSortedVIntList.java





[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2006-11-17 Thread Paul Elschot (JIRA)
[ 
http://issues.apache.org/jira/browse/LUCENE-584?page=comments#action_12450715 ] 

Paul Elschot commented on LUCENE-584:
-

I have just resolved some minor local conflicts on the updated copyrights of 
four java files.
Please holler when a fresh patch is needed.

 Decouple Filter from BitSet
 ---

 Key: LUCENE-584
 URL: http://issues.apache.org/jira/browse/LUCENE-584
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.0.1
Reporter: Peter Schäfer
Priority: Minor
 Attachments: BitsMatcher.java, Filter-20060628.patch, 
 HitCollector-20060628.patch, IndexSearcher-20060628.patch, 
 MatchCollector.java, Matcher.java, Matcher20060830b.patch, 
 Scorer-20060628.patch, Searchable-20060628.patch, Searcher-20060628.patch, 
 Some Matchers.zip, SortedVIntList.java, TestSortedVIntList.java





[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2006-09-24 Thread Paul Elschot (JIRA)
[ 
http://issues.apache.org/jira/browse/LUCENE-584?page=comments#action_12437242 ] 

Paul Elschot commented on LUCENE-584:
-

I wrote:

 One could add an abstract Scorer.explain() to catch these, or
 provide a default implementation for Scorer.explain().

by mistake. The good news is that the patch leaves the 
existing abstract Scorer.explain() method unaffected.
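
A small sketch of why an abstract explain() on the scorer side matters, using 
hypothetical stand-in classes (ExplainMatcher, ExplainScorer and 
ConstantScorerSketch are illustrative names, not the patch's code). Java allows 
a subclass to re-declare an inherited concrete method as abstract, so a 
concrete scorer cannot silently fall back to the matcher's default explanation.

```java
/** Hypothetical stand-in for Matcher: iteration plus a default explanation. */
abstract class ExplainMatcher {
    abstract boolean next();
    abstract int doc();

    /** Default explanation: only states that the doc matched. */
    String explain(int doc) {
        return "doc " + doc + " matched (no score available)";
    }
}

/** Hypothetical stand-in for Scorer: explain() is abstract again here. */
abstract class ExplainScorer extends ExplainMatcher {
    abstract float score();

    @Override
    abstract String explain(int doc);
}

/** Does not compile unless it supplies its own explain(). */
class ConstantScorerSketch extends ExplainScorer {
    private final int[] docs;
    private int pos = -1;

    ConstantScorerSketch(int... docs) {
        this.docs = docs;
    }

    @Override
    boolean next() {
        return ++pos < docs.length;
    }

    @Override
    int doc() {
        return docs[pos];
    }

    @Override
    float score() {
        return 1.0f;
    }

    @Override
    String explain(int doc) {
        return "doc " + doc + " scored " + score();
    }
}
```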


 Decouple Filter from BitSet
 ---

 Key: LUCENE-584
 URL: http://issues.apache.org/jira/browse/LUCENE-584
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.0.1
Reporter: Peter Schäfer
Priority: Minor
 Attachments: BitsMatcher.java, Filter-20060628.patch, 
 HitCollector-20060628.patch, IndexSearcher-20060628.patch, 
 MatchCollector.java, Matcher.java, Matcher20060830b.patch, 
 Scorer-20060628.patch, Searchable-20060628.patch, Searcher-20060628.patch, 
 Some Matchers.zip, SortedVIntList.java, TestSortedVIntList.java





[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2006-09-15 Thread Paul Elschot (JIRA)
[ 
http://issues.apache.org/jira/browse/LUCENE-584?page=comments#action_12434901 ] 

Paul Elschot commented on LUCENE-584:
-

In the inheritance from Matcher to Scorer there is an asymmetry
in this patch.

Matcher provides a default implementation for Matcher.explain()
but Scorer does not, and this might lead to unexpected surprises
for future Scorers when the current Matcher.explain() is used.
One could add an abstract Scorer.explain() to catch these, or
provide a default implementation for Scorer.explain().

With matcher implementations quite a few other implementation
decisions need to be taken. 
Also any place in the current code where a Scorer is used, but none
of the Scorer.score() methods, is a candidate for a change from
Scorer to Matcher.
This will be mostly the current filtering implementations,
but ConstantScoringQuery is another nice example.

Regards,
Paul Elschot


 Decouple Filter from BitSet
 ---

 Key: LUCENE-584
 URL: http://issues.apache.org/jira/browse/LUCENE-584
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.0.1
Reporter: Peter Schäfer
Priority: Minor
 Attachments: BitsMatcher.java, Filter-20060628.patch, 
 HitCollector-20060628.patch, IndexSearcher-20060628.patch, 
 MatchCollector.java, Matcher.java, Matcher20060830b.patch, 
 Scorer-20060628.patch, Searchable-20060628.patch, Searcher-20060628.patch, 
 Some Matchers.zip, SortedVIntList.java, TestSortedVIntList.java


 {code}
 package org.apache.lucene.search;
 public abstract class Filter implements java.io.Serializable 
 {
   public abstract AbstractBitSet bits(IndexReader reader) throws IOException;
 }
 public interface AbstractBitSet 
 {
   public boolean get(int index);
 }
 {code}
 It would be useful if the method =Filter.bits()= returned an abstract 
 interface, instead of =java.util.BitSet=.
 Use case: there is a very large index, and, depending on the user's 
 privileges, only a small portion of the index is actually visible.
 Sparsely populated =java.util.BitSet=s are not efficient and waste lots of 
 memory. It would be desirable to have an alternative BitSet implementation 
 with smaller memory footprint.
 Though it _is_ possible to derive classes from =java.util.BitSet=, it was 
 obviously not designed for that purpose.
 That's why I propose to use an interface instead. The default implementation 
 could still delegate to =java.util.BitSet=.
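
The proposal above can be sketched as follows. Only the AbstractBitSet interface comes from the issue description; DelegatingBitSet and SortedIntArrayBitSet are hypothetical names introduced here for illustration and are not part of any attached patch.

```java
import java.util.Arrays;
import java.util.BitSet;

// The interface proposed in the issue description.
interface AbstractBitSet {
    boolean get(int index);
}

// Default implementation: delegates to java.util.BitSet (one bit per document).
class DelegatingBitSet implements AbstractBitSet {
    private final BitSet bits;

    DelegatingBitSet(BitSet bits) { this.bits = bits; }

    @Override public boolean get(int index) { return bits.get(index); }
}

// Hypothetical sparse alternative: keeps only the set indexes, sorted.
class SortedIntArrayBitSet implements AbstractBitSet {
    private final int[] sortedDocs;

    SortedIntArrayBitSet(int[] docs) {
        this.sortedDocs = docs.clone();
        Arrays.sort(this.sortedDocs);
    }

    // Binary search: O(log n) per lookup instead of O(1), but memory is
    // proportional to the number of set bits rather than to the index size.
    @Override public boolean get(int index) {
        return Arrays.binarySearch(sortedDocs, index) >= 0;
    }
}
```

As a rough illustration of the trade-off: a dense bit set covering 10 million documents needs about 1.25 MB however few bits are set, while a sorted int array holding 1,000 visible documents needs about 4 KB.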

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2006-09-14 Thread Eks Dev (JIRA)
[ 
http://issues.apache.org/jira/browse/LUCENE-584?page=comments#action_12434637 ] 

Eks Dev commented on LUCENE-584:


Paul,
What is next now? We did enough experiments on our app and are now sure that 
this patch causes no incompatibilities. 
We also tried to replace our filters with OpenBitSet and VInt matchers, and the 
results there are more than good: our app showed a crazy 30% speed-up! It is 
hard to identify exactly where it comes from, but I suspect that the VInt 
matcher, in the case of not too dense BitVectors, increased our Filter Cache 
utilization significantly.
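
To illustrate why a VInt-based representation can be compact for sparse filters, here is a minimal sketch that stores ascending document numbers as variable-length encoded gaps. VIntDocList is a hypothetical class written for this illustration; it is not the attached SortedVIntList.java, and it assumes non-negative, ascending input.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch: ascending document numbers stored as VInt-encoded gaps.
class VIntDocList {
    private final byte[] bytes;
    private final int size;

    VIntDocList(int[] ascendingDocs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int previous = 0;
        for (int doc : ascendingDocs) {
            int gap = doc - previous; // gaps are small when documents are close together
            previous = doc;
            // 7 payload bits per byte; the high bit signals that more bytes follow.
            while ((gap & ~0x7F) != 0) {
                out.write((gap & 0x7F) | 0x80);
                gap >>>= 7;
            }
            out.write(gap);
        }
        this.bytes = out.toByteArray();
        this.size = ascendingDocs.length;
    }

    // Number of bytes used by the encoded list.
    int byteSize() { return bytes.length; }

    // Decodes all document numbers back into an int[].
    int[] toArray() {
        int[] docs = new int[size];
        int pos = 0;
        int previous = 0;
        for (int i = 0; i < size; i++) {
            int gap = 0;
            int shift = 0;
            byte b;
            do {
                b = bytes[pos++];
                gap |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            previous += gap;
            docs[i] = previous;
        }
        return docs;
    }
}
```

For the four documents 5, 300, 301 and 100000 this encoding takes 7 bytes, against 16 bytes for a plain int[] and roughly 12.5 KB for a bit set that has to reach document 100000. Smaller cached filters mean more of them fit in a filter cache of a given size, which is consistent with the suspicion above, though this sketch measures nothing about speed.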

I would propose to commit this patch before we go further with something that 
would actually utilize Matcher, just to avoid creating a monster patch on top of 
a patch...

This is groundwork, and now using Matcher will be pure poetry. I see a lot of 
places that could see a better life by using Matchers: ConstantScoreQuery, 
PrefixFilter, ChainedFilter (becomes obsolete now)... actually, replace all 
uses of BitSet with OpenBitSet (or, a bit smarter, with SortedIntList, VInt...)...
Then the question here: do we create a dependency from Lucene on Solr, or do we 
migrate OpenBitSet to Lucene (as this dependency already exists), or do we 
copy-paste and have two OpenBitSets, Yonik? As far as I am concerned, it makes 
no real difference.

Do you, or someone else, see things to be done now before committing this? 






[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2006-09-14 Thread Paul Elschot (JIRA)
[ 
http://issues.apache.org/jira/browse/LUCENE-584?page=comments#action_12434763 ] 

Paul Elschot commented on LUCENE-584:
-

 Do you, or someone else, see things to be done now before committing this?

Yes. In the steps listed here:
http://wiki.apache.org/jakarta-lucene/HowToContribute
the next step is to be patient.
Whether being patient is something that can be done
is an open question...

Paul Elschot.






Re: [jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2006-09-14 Thread eks dev
Not being impatient, just asking if all holes are covered. Matcher rocks, and 
I'd like to clean up a lot of the mess we created in our local copy in order to 
simulate what Matcher will permit us to do in a really elegant way...

If being patient is all it takes, cool ;)

- Original Message 
From: Paul Elschot (JIRA) [EMAIL PROTECTED]
To: java-dev@lucene.apache.org
Sent: Thursday, 14 September, 2006 8:41:25 PM
Subject: [jira] Commented: (LUCENE-584) Decouple Filter from BitSet

[ 
http://issues.apache.org/jira/browse/LUCENE-584?page=comments#action_12434763 ] 

Paul Elschot commented on LUCENE-584:
-

 Do you, or someone else, see things to be done now before committing this?

Yes. In the steps listed here:
http://wiki.apache.org/jakarta-lucene/HowToContribute
the next step is to be patient.
Whether being patient is something that can be done
is an open question...

Paul Elschot.






[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2006-09-04 Thread paul.elschot (JIRA)
[ 
http://issues.apache.org/jira/browse/LUCENE-584?page=comments#action_12432435 ] 

paul.elschot commented on LUCENE-584:
-

 No performance changes as well.

It's good to hear that. As mentioned earlier, this is groundwork only.
Once an actual Matcher is used I expect some performance differences to 
show up.

Which comment of Yonik related to HitCollector do you mean?

 Early this week we will try to implement our first Matchers and see how they 
 behave 

BitsMatcher and SortedVIntList could start that.
Also I'd like to see one on Solr's OpenBitSet...



 Decouple Filter from BitSet
 ---

 Key: LUCENE-584
 URL: http://issues.apache.org/jira/browse/LUCENE-584
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.0.1
Reporter: Peter Schäfer
Priority: Minor
 Attachments: BitsMatcher.java, Filter-20060628.patch, 
 HitCollector-20060628.patch, IndexSearcher-20060628.patch, 
 MatchCollector.java, Matcher.java, Matcher20060830b.patch, 
 Scorer-20060628.patch, Searchable-20060628.patch, Searcher-20060628.patch, 
 SortedVIntList.java, TestSortedVIntList.java





Re: [jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2006-09-04 Thread eks dev
Yonik, 
is there any reason to have the BitSetIterator method 
int next(int fromIndex) {...
package protected? 

It would be interesting to see how BitSetIterator works in Matcher; skipping is 
needed there.



- Original Message 
From: paul.elschot (JIRA) [EMAIL PROTECTED]
To: java-dev@lucene.apache.org
Sent: Monday, 4 September, 2006 8:47:24 AM
Subject: [jira] Commented: (LUCENE-584) Decouple Filter from BitSet

[ 
http://issues.apache.org/jira/browse/LUCENE-584?page=comments#action_12432435 ] 

paul.elschot commented on LUCENE-584:
-

 No performance changes as well.

It's good to hear that. As mentioned earlier, this is groundwork only.
Once an actual Matcher is used I expect some performance differences to 
show up.

Which comment of Yonik related to HitCollector do you mean?

 Early this week we will try to implement our first Matchers and see how they 
 behave 

BitsMatcher and SortedVIntList could start that.
Also I'd like to see one on Solr's OpenBitSet...






[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2006-09-04 Thread Eks Dev (JIRA)
[ 
http://issues.apache.org/jira/browse/LUCENE-584?page=comments#action_12432497 ] 

Eks Dev commented on LUCENE-584:


Paul,
What are the exact semantics of skipTo(int) in Matcher?

- Is it OK to skip back and forth before I reach the end?
e.g.: skipTo(0); skipTo(333); skipTo(0); 

- Once I reach the end, skipTo(int) does nothing (BitsMatcher, exhausted). It is 
impossible to reposition the Matcher after that.

Is this the intended behavior: skip forward until you reach the end, and then 
you are at the end :) ? 









Re: [jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2006-09-04 Thread Paul Elschot
On Monday 04 September 2006 13:43, Eks Dev (JIRA) wrote:
 
[ http://issues.apache.org/jira/browse/LUCENE-584?page=comments#action_12432497 
] 
 
 Eks Dev commented on LUCENE-584:
 
 
 Paul,
 What are the exact semantics of skipTo(int) in Matcher?
 
 - Is it OK to skip back and forth before I reach the end?
 e.g.: skipTo(0); skipTo(333); skipTo(0); 
 
 - Once I reach the end, skipTo(int) does nothing (BitsMatcher, exhausted). It is 
 impossible to reposition the Matcher after that.
 
 Is this the intended behavior: skip forward until you reach the end, and then 
 you are at the end :) ? 

This last one. From the javadocs (in the patch):

Skips to the first match whose document number is greater than or equal to a 
given target. If, after next() or skipTo(int) has been called the first time, 
the target is before or at the current document, the current document may 
change to the next matching document.
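
A minimal sketch of these forward-only semantics, assuming a java.util.BitSet as backing store. ForwardBitSetMatcher is a hypothetical class for illustration, not the attached BitsMatcher.java. Where the javadoc says the current document "may" change, this sketch always advances.

```java
import java.util.BitSet;

// Hypothetical sketch of a forward-only matcher over a java.util.BitSet.
class ForwardBitSetMatcher {
    private final BitSet bits;
    private int doc = -1;              // -1 means "not positioned" (before start or at end)
    private boolean exhausted = false;

    ForwardBitSetMatcher(BitSet bits) { this.bits = bits; }

    int doc() { return doc; }

    // Advances to the next set bit; returns false once no more bits are set.
    boolean next() {
        if (exhausted) return false;
        doc = bits.nextSetBit(doc + 1);
        if (doc < 0) exhausted = true;
        return !exhausted;
    }

    // Moves to the first match >= target. A target at or before the current
    // document never moves backwards; it advances to the next match instead.
    boolean skipTo(int target) {
        if (exhausted) return false;
        doc = bits.nextSetBit(Math.max(target, doc + 1));
        if (doc < 0) exhausted = true;
        return !exhausted;
    }
}
```

With bits 0, 333 and 500 set, the sequence skipTo(0); skipTo(333); skipTo(0) from the question lands on 0, then 333, then 500: the third call cannot go back. Once a call returns false, every later next() or skipTo(int) also returns false.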

Regards,
Paul Elschot




[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2006-09-03 Thread Eks Dev (JIRA)
[ 
http://issues.apache.org/jira/browse/LUCENE-584?page=comments#action_12432378 ] 

Eks Dev commented on LUCENE-584:


Hi Paul,
for me, this patch did not cause any incompatibility issues. All our tests 
passed without any noticeable difference from the previous trunk version. No 
performance changes as well (we use HitCollector only, so Yonik's comment does 
not apply here).
The tests are application level and make the index hot (6 hours of searches with 
a test batch of requests with known responses), with 50 million real (not 
artificial) docs and real requests...

Early this week we will try to implement our first Matchers and see how they 
behave.
 




[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2006-08-30 Thread Yonik Seeley (JIRA)
[ 
http://issues.apache.org/jira/browse/LUCENE-584?page=comments#action_12431684 ] 

Yonik Seeley commented on LUCENE-584:
-

Thanks Paul,
I like the Matcher/Scorer relation.

It looks like no Filters currently return a matcher, so the current patch just 
lays the groundwork, right?

When some filters do start to return a matcher, it looks like support for the 
1.4 BooleanScorer needs to be removed, or a check done in 
IndexSearcher.search() to disable skipping on the scorer if it's in use.

I wonder what the performance impact is... for a dense search with a dense 
bitset filter, it looks like quite a bit of overhead is added (two calls in 
order to get the next doc, use of nextSetBit() instead of get(), checking 
exhausted each time and checking for -1 to set exhausted).  I suppose one can 
always drop back to using a HitCollector for special cases though.
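
The two access patterns compared above can be sketched as follows. FilterStrategies and both method names are hypothetical, written only to show the extra bookkeeping of the iterator-style path; neither method is code from the patch, and the sketch makes no claim about actual timings.

```java
import java.util.BitSet;

// Hypothetical illustration of random-access filtering versus iterator-style filtering.
class FilterStrategies {

    // Random-access check: one get() per candidate document, as a
    // HitCollector-based filter would do.
    static int countWithGet(int[] candidateDocs, BitSet filter) {
        int count = 0;
        for (int doc : candidateDocs) {
            if (filter.get(doc)) count++;
        }
        return count;
    }

    // Iterator-style check: walk the filter with nextSetBit() and leapfrog
    // against strictly ascending candidates; -1 signals an exhausted filter.
    static int countWithNextSetBit(int[] ascendingCandidates, BitSet filter) {
        int count = 0;
        int i = 0;
        int filterDoc = filter.nextSetBit(0);
        while (i < ascendingCandidates.length && filterDoc >= 0) {
            int candidate = ascendingCandidates[i];
            if (candidate < filterDoc) {
                i++;                                      // candidate not in filter
            } else if (candidate > filterDoc) {
                filterDoc = filter.nextSetBit(candidate); // skip the filter ahead
            } else {
                count++;                                  // both agree on this doc
                i++;
                filterDoc = filter.nextSetBit(filterDoc + 1);
            }
        }
        return count;
    }
}
```

When both sides are dense, the second method does more work per document than a single get(). When the filter is sparse, nextSetBit(candidate) can skip long runs of documents in one call.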

 Decouple Filter from BitSet
 ---

 Key: LUCENE-584
 URL: http://issues.apache.org/jira/browse/LUCENE-584
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.0.1
Reporter: Peter Schäfer
Priority: Minor
 Attachments: BitsMatcher.java, Filter-20060628.patch, 
 HitCollector-20060628.patch, IndexSearcher-20060628.patch, 
 MatchCollector.java, Matcher.java, Matcher20060830.patch, 
 Scorer-20060628.patch, Searchable-20060628.patch, Searcher-20060628.patch, 
 SortedVIntList.java, TestSortedVIntList.java




