[jira] Updated: (LUCENE-584) Decouple Filter from BitSet

2006-09-04 Thread Eks Dev (JIRA)
 [ http://issues.apache.org/jira/browse/LUCENE-584?page=all ]

Eks Dev updated LUCENE-584:
---

Attachment: Some Matchers.zip

Here are some Matcher implementations:

- OpenBitsMatcher - the same as the code Paul wrote for BitsMatcher, with 
BitSet replaced by OpenBitSet 

- DenseOpenBitsMatcher - using Solr's BitSetIterator (for skipTo() to work, one 
method in BitSetIterator should become public)

Also attached is one simple test (just basic functionality) that also contains one 
dummy relative performance test. 

The perf. test simply iterates over the different Matcher implementations and measures 
elapsed time (not including Matcher creation, pure forward scan to the end) 
for different set bit densities.

imho, this code is neither sufficiently tested nor commented; it needs an hour or two. 
 

As expected, Yonik made this BitSetIterator really fast. What surprised 
me was that OpenBitSet nextSetBit() compared badly to BitSet (or did I make some 
dumb mistake somewhere?)
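
A forward-scan timing of the kind described above could look roughly like the 
sketch below. It is not the attached test: it only uses java.util.BitSet, and 
the class name, the index size and the densities are assumptions made for 
illustration. Set creation is kept outside the timed region, as in the 
description.

```java
import java.util.BitSet;
import java.util.Random;

/** Illustrative only: times a pure forward scan over java.util.BitSet at several densities. */
class ForwardScanTiming {

    /** Builds a bit set of the given size where each bit is set with probability density. */
    static BitSet randomBits(int size, double density, long seed) {
        Random random = new Random(seed);
        BitSet bits = new BitSet(size);
        for (int i = 0; i < size; i++) {
            if (random.nextDouble() < density) {
                bits.set(i);
            }
        }
        return bits;
    }

    /** Scans forward to the end with nextSetBit() and returns the number of set bits visited. */
    static int forwardScan(BitSet bits) {
        int visited = 0;
        for (int doc = bits.nextSetBit(0); doc >= 0; doc = bits.nextSetBit(doc + 1)) {
            visited++;
        }
        return visited;
    }

    public static void main(String[] args) {
        int size = 1000000;                            // assumed index size
        double[] densities = {0.001, 0.01, 0.1, 0.5};  // assumed set bit densities
        for (double density : densities) {
            BitSet bits = randomBits(size, density, 42L); // creation is not timed
            long start = System.nanoTime();
            int visited = forwardScan(bits);
            long elapsedMicros = (System.nanoTime() - start) / 1000;
            System.out.println("density=" + density + " visited=" + visited
                    + " elapsedMicros=" + elapsedMicros);
        }
    }
}
```

A single pass like this is dominated by JIT warm-up, so several repetitions 
per density would be needed before comparing implementations.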

> Decouple Filter from BitSet
> ---
>
> Key: LUCENE-584
> URL: http://issues.apache.org/jira/browse/LUCENE-584
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Search
> Affects Versions: 2.0.1
> Reporter: Peter Schäfer
> Priority: Minor
> Attachments: BitsMatcher.java, Filter-20060628.patch, 
> HitCollector-20060628.patch, IndexSearcher-20060628.patch, 
> MatchCollector.java, Matcher.java, Matcher20060830b.patch, 
> Scorer-20060628.patch, Searchable-20060628.patch, Searcher-20060628.patch, 
> Some Matchers.zip, SortedVIntList.java, TestSortedVIntList.java
>
>
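> {code}
> package org.apache.lucene.search;
>
> import java.io.IOException;
> import org.apache.lucene.index.IndexReader;
>
> public abstract class Filter implements java.io.Serializable 
> {
>   // Returns the set of documents that pass this filter for the given reader.
>   public abstract AbstractBitSet bits(IndexReader reader) throws IOException;
> }
>
> // In its own file, AbstractBitSet.java:
> public interface AbstractBitSet 
> {
>   // True if the document with the given number passes the filter.
>   public boolean get(int index);
> }
> {code}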
> It would be useful if the method =Filter.bits()= returned an abstract 
> interface, instead of =java.util.BitSet=.
> Use case: there is a very large index, and, depending on the user's 
> privileges, only a small portion of the index is actually visible.
> Sparsely populated =java.util.BitSet=s are not efficient and waste lots of 
> memory. It would be desirable to have an alternative BitSet implementation 
> with smaller memory footprint.
> Though it _is_ possible to derive classes from =java.util.BitSet=, it was 
> obviously not designed for that purpose.
> That's why I propose to use an interface instead. The default implementation 
> could still delegate to =java.util.BitSet=.
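
To make the proposal above concrete, here is a minimal sketch of the interface 
with two implementations. Only AbstractBitSet and its get(int) method come from 
the issue text; DelegatingBitSet and SortedIntArrayBitSet are hypothetical names 
invented for this example, and the sorted-array layout is just one possible 
sparse representation.

```java
import java.util.Arrays;
import java.util.BitSet;

/** The interface from the proposal above. */
interface AbstractBitSet {
    boolean get(int index);
}

/** Default implementation (hypothetical name) that simply delegates to java.util.BitSet. */
class DelegatingBitSet implements AbstractBitSet {
    private final BitSet bits;

    DelegatingBitSet(BitSet bits) {
        this.bits = bits;
    }

    public boolean get(int index) {
        return bits.get(index);
    }
}

/** Sparse alternative (hypothetical): stores only the set document numbers, sorted. */
class SortedIntArrayBitSet implements AbstractBitSet {
    private final int[] sortedDocs;

    SortedIntArrayBitSet(int[] docs) {
        this.sortedDocs = docs.clone();
        Arrays.sort(this.sortedDocs);
    }

    public boolean get(int index) {
        // binarySearch returns a non-negative position only when the key is present
        return Arrays.binarySearch(sortedDocs, index) >= 0;
    }
}
```

As a rough illustration with assumed numbers: for an index of 10 million 
documents, a java.util.BitSet needs about 1.25 MB of bits no matter how few are 
set, while a sorted int array holding 1000 visible documents needs about 4 KB, 
at the cost of a binary search per lookup.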

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



jvm crashes on FieldCache.DEFAULT.getStrings(reader, field);

2006-09-04 Thread Johannes Zillmann

Dear lucene folks,

we have 3 indices of about 10 million documents each.
All indices are accessed via one MultiReader.
For the first hits of a query we call:
FieldCache.DEFAULT.getStrings(reader, field);

After we start querying, the first 10 queries seem to hang in the 
getStrings() method, then the JVM crashes silently...

Any clue what the problem could be?

best regards
Johannes





Re: [jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2006-09-04 Thread Paul Elschot
On Monday 04 September 2006 13:43, Eks Dev (JIRA) wrote:
> [ http://issues.apache.org/jira/browse/LUCENE-584?page=comments#action_12432497 ]
>
> Eks Dev commented on LUCENE-584:
>
> Paul,
> What are the exact semantics of skipTo(int) in Matcher?
>
> - Is it OK to skip back and forth before I reach the end?
>   e.g.: skipTo(0); skipTo(333); skipTo(0);
>
> - Once I reach the end, skipTo(int) does nothing (BitsMatcher, exhausted).
>   It is impossible to reposition the Matcher after that.
>
> Is this the intended behavior, "skip forward until you reach the end, and
> then, you are at the end :)"?

This last one. From the javadocs (in the patch):

"Skips to the first match whose document number is greater than or equal to a 
given target. If, after next() or skipTo(int) has been called the first time, 
the target is before or at the current document, the current document may 
change to the next matching document."

Regards,
Paul Elschot
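
The forward-only behavior described in the javadoc quoted above can be sketched 
as follows. This is a hypothetical stand-in backed by java.util.BitSet, not the 
BitsMatcher from the patch; the class name and the choice to advance to the next 
match when the target is at or before the current document are assumptions that 
fit the quoted wording ("may change to the next matching document").

```java
import java.util.BitSet;

/** Hypothetical stand-in for a BitSet-backed matcher; not the BitsMatcher from the patch. */
class ForwardOnlyBitsMatcher {
    private final BitSet bits;
    private int doc = -1;              // current document, -1 before the first call
    private boolean exhausted = false;

    ForwardOnlyBitsMatcher(BitSet bits) {
        this.bits = bits;
    }

    /** Moves to the next matching document; returns false once the end is reached. */
    boolean next() {
        return advanceFrom(doc + 1);
    }

    /**
     * Moves to the first match at or after target. A target at or before the
     * current document never moves backwards; here it advances to the next match.
     */
    boolean skipTo(int target) {
        return advanceFrom(Math.max(target, doc + 1));
    }

    int doc() {
        return doc;
    }

    private boolean advanceFrom(int from) {
        if (exhausted) {
            return false; // once at the end, the matcher stays at the end
        }
        int found = bits.nextSetBit(from);
        if (found < 0) {
            exhausted = true;
            return false;
        }
        doc = found;
        return true;
    }
}
```

With bits 5, 333 and 400 set, the sequence skipTo(0); skipTo(333); skipTo(0); 
from the question lands on 5, then 333, then 400, and any further call returns 
false.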




[jira] Commented: (LUCENE-632) The creation of a spell index from a LuceneDictionary via SpellChecker.indexDictionary (Dictionary dict) fails starting with 1.9.1 (up to current svn version)

2006-09-04 Thread Karsten Dello (JIRA)
[ 
http://issues.apache.org/jira/browse/LUCENE-632?page=comments#action_12432518 ] 

Karsten Dello commented on LUCENE-632:
--

Sorry for not responding for such a long time; I have been out of the office.

Otis:
The current SVN version (as of today) works fine for me, though the spellIndex 
has to be created manually before using the SpellChecker constructor. As Karl 
pointed out, a simple 
new IndexWriter(d2, null, true).close();
does the job.

Miles:
I think you are right; I had the same problem. I worked around it by calling 
exist("foo") before indexDictionary, but that is not a bugfix (the fix, as you 
said, is that the method should check whether reader is null).
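
The null check mentioned above might be sketched like this. The classes below 
are hypothetical stand-ins written for this example, not Lucene's SpellChecker 
or IndexReader; they only illustrate closing a lazily opened reader when it was 
actually opened.

```java
/** Hypothetical stand-in for an index reader; only close() matters here. */
interface FakeReader {
    void close();
}

/** Hypothetical stand-in for the spell index class; not Lucene's SpellChecker. */
class LazyReaderHolder {
    private FakeReader reader; // stays null until something actually opens it

    /** Stores a reader that was opened on demand. */
    void open(FakeReader newReader) {
        this.reader = newReader;
    }

    /** Closes the reader only if it was opened, so an unopened holder does not throw. */
    void closeReader() {
        if (reader != null) {
            reader.close();
            reader = null;
        }
    }

    boolean isOpen() {
        return reader != null;
    }
}
```

Calling closeReader() on a fresh holder is then a no-op instead of a 
NullPointerException.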


> The creation of a spell index from a LuceneDictionary via 
> SpellChecker.indexDictionary (Dictionary dict) fails starting with 1.9.1 (up 
> to current svn version)
> --
>
> Key: LUCENE-632
> URL: http://issues.apache.org/jira/browse/LUCENE-632
> Project: Lucene - Java
> Issue Type: Bug
> Components: Other
> Affects Versions: 2.0.0, 1.9
> Reporter: Karsten Dello
> Priority: Minor
> Attachments: lazy_searcher.diff
>
>
> There are two different errors, one in 1.9.1/2.0.0 and one in the current svn version.
> 1.9.1/2.0.0:
> At the end of indexDictionary (Dictionary dict), 
> the IndexReader instance reader is closed.
> This causes a NullPointerException because reader has not been initialized 
> before (neither in that method nor in the constructor).
> Commenting out this line (reader.close()) seems to resolve that issue.
> Current svn:
> The constructor tries to create an IndexSearcher instance for the specified 
> path;
> as there is no index in that path (it has not been created yet), an exception is 
> thrown.




[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2006-09-04 Thread Eks Dev (JIRA)
[ 
http://issues.apache.org/jira/browse/LUCENE-584?page=comments#action_12432497 ] 

Eks Dev commented on LUCENE-584:


Paul,
What are the exact semantics of skipTo(int) in Matcher?

- Is it OK to skip back and forth before I reach the end?
  e.g.: skipTo(0); skipTo(333); skipTo(0); 

- Once I reach the end, skipTo(int) does nothing (BitsMatcher, exhausted). It is 
  impossible to reposition the Matcher after that.

Is this the intended behavior, "skip forward until you reach the end, and then, you are 
at the end :)"? 









Re: [jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2006-09-04 Thread eks dev
Yonik, 
is there any reason to have the BitSetIterator method 
int next(int fromIndex) {...
package protected?

It would be interesting to see how BitSetIterator works in a Matcher; skipping is 
needed there.
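
To show why a public next(int fromIndex) matters for skipping, here is a small 
sketch. WordBitIterator is a hypothetical iterator over a long[] written for 
this example, not Solr's BitSetIterator, and IteratorMatcher is likewise an 
assumed wrapper. The point is only that skipTo() can be implemented with one 
call to next(fromIndex) when that method is reachable from another package.

```java
/** Hypothetical iterator over a long[] bit array; not Solr's BitSetIterator. */
class WordBitIterator {
    private final long[] words;
    private final int numBits;

    WordBitIterator(long[] words) {
        this.words = words;
        this.numBits = words.length * 64;
    }

    /** Returns the first set bit at or after fromIndex, or -1 if there is none. */
    public int next(int fromIndex) {
        if (fromIndex < 0) {
            fromIndex = 0;
        }
        if (fromIndex >= numBits) {
            return -1;
        }
        int wordIndex = fromIndex >> 6;
        // Mask off the bits below fromIndex in the first word.
        long word = words[wordIndex] & (-1L << (fromIndex & 63));
        while (true) {
            if (word != 0) {
                return (wordIndex << 6) + Long.numberOfTrailingZeros(word);
            }
            if (++wordIndex == words.length) {
                return -1;
            }
            word = words[wordIndex];
        }
    }
}

/** Hypothetical matcher that relies on next(fromIndex) being callable. */
class IteratorMatcher {
    private final WordBitIterator iterator;
    private int doc = -1; // current document, -1 before the first call

    IteratorMatcher(WordBitIterator iterator) {
        this.iterator = iterator;
    }

    /** Moves to the first match at or after target, never backwards. */
    boolean skipTo(int target) {
        int found = iterator.next(Math.max(target, doc + 1));
        if (found < 0) {
            return false;
        }
        doc = found;
        return true;
    }

    int doc() {
        return doc;
    }
}
```

Without access to next(fromIndex), a wrapper like this would have to call a 
plain next() repeatedly until it passes the target.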



- Original Message 
From: paul.elschot (JIRA) <[EMAIL PROTECTED]>
To: java-dev@lucene.apache.org
Sent: Monday, 4 September, 2006 8:47:24 AM
Subject: [jira] Commented: (LUCENE-584) Decouple Filter from BitSet

[ 
http://issues.apache.org/jira/browse/LUCENE-584?page=comments#action_12432435 ] 

paul.elschot commented on LUCENE-584:
-

> No performance changes as well.

It's good to hear that. As mentioned earlier, this is groundwork only.
Once an actual Matcher is used I expect some performance differences to 
show up.

Which comment of Yonik related to HitCollector do you mean?

> Early this week we will try to implement our first Matchers and see how they 
> behave 

BitsMatcher and SortedVIntList could start that.
Also I'd like to see one on Solr's OpenBitSet...


