[jira] Updated: (LUCENE-584) Decouple Filter from BitSet
[ http://issues.apache.org/jira/browse/LUCENE-584?page=all ]

Eks Dev updated LUCENE-584:
---
    Attachment: Some Matchers.zip

Here are some Matcher implementations:
- OpenBitsMatcher: the same as the code Paul wrote for BitsMatcher, with BitSet replaced by OpenBitSet
- DenseOpenBitsMatcher: uses Solr's BitSetIterator (for skipTo() to work, one method in BitSetIterator should become public)

Also attached is one simple test (just basic functionality) that also contains one dummy relative performance test. The perf. test simply iterates over the different Matcher implementations and measures elapsed time (not including Matcher creation, a pure forward scan to the end) for different set bit densities.

imho, this code is not sufficiently tested nor commented; it needs an hour or two.

As expected, Yonik made this BitSetIterator really fast. What surprised me was OpenBitSet nextSetBit() comparing badly to the BitSet (or did I make some dummy mistake somewhere?)

> Decouple Filter from BitSet
> ---
>
>                 Key: LUCENE-584
>                 URL: http://issues.apache.org/jira/browse/LUCENE-584
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 2.0.1
>            Reporter: Peter Schäfer
>            Priority: Minor
>         Attachments: BitsMatcher.java, Filter-20060628.patch,
> HitCollector-20060628.patch, IndexSearcher-20060628.patch,
> MatchCollector.java, Matcher.java, Matcher20060830b.patch,
> Scorer-20060628.patch, Searchable-20060628.patch, Searcher-20060628.patch,
> Some Matchers.zip, SortedVIntList.java, TestSortedVIntList.java
>
>
> {code}
> package org.apache.lucene.search;
> public abstract class Filter implements java.io.Serializable
> {
>   public abstract AbstractBitSet bits(IndexReader reader) throws IOException;
> }
> public interface AbstractBitSet
> {
>   public boolean get(int index);
> }
> {code}
> It would be useful if the method =Filter.bits()= returned an abstract
> interface, instead of =java.util.BitSet=.
> Use case: there is a very large index, and, depending on the user's
> privileges, only a small portion of the index is actually visible.
> Sparsely populated =java.util.BitSet=s are not efficient and waste lots of
> memory. It would be desirable to have an alternative BitSet implementation
> with smaller memory footprint.
> Though it _is_ possible to derive classes from =java.util.BitSet=, it was
> obviously not designed for that purpose.
> That's why I propose to use an interface instead. The default implementation
> could still delegate to =java.util.BitSet=.

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
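To make the proposal in the issue description concrete, here is a minimal, self-contained sketch in plain Java. Only the AbstractBitSet interface comes from the issue text; the class names DelegatingBitSet and SortedIntArrayBitSet are hypothetical and are not taken from any of the attached patches. The sparse variant stores just the visible document numbers in a sorted array, which is one possible way to get the smaller memory footprint the reporter asks for.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Read-only view of a set of document numbers, as proposed in the issue. */
interface AbstractBitSet {
    boolean get(int index);
}

/** Default implementation suggested in the issue: delegate to java.util.BitSet. */
final class DelegatingBitSet implements AbstractBitSet {
    private final BitSet bits;

    DelegatingBitSet(BitSet bits) {
        this.bits = bits;
    }

    @Override
    public boolean get(int index) {
        return bits.get(index);
    }
}

/** Hypothetical sparse alternative: keeps only the set document numbers, sorted. */
final class SortedIntArrayBitSet implements AbstractBitSet {
    private final int[] sortedDocs;

    SortedIntArrayBitSet(int[] docs) {
        sortedDocs = docs.clone();   // defensive copy, then sort for binary search
        Arrays.sort(sortedDocs);
    }

    @Override
    public boolean get(int index) {
        return Arrays.binarySearch(sortedDocs, index) >= 0;
    }
}

class AbstractBitSetDemo {
    public static void main(String[] args) {
        // Two visible documents in an index of ten million.
        BitSet dense = new BitSet(10000000);
        dense.set(3);
        dense.set(9999999);
        AbstractBitSet a = new DelegatingBitSet(dense);
        AbstractBitSet b = new SortedIntArrayBitSet(new int[] {9999999, 3});

        System.out.println(a.get(3) + " " + b.get(3)); // prints "true true"
        System.out.println(a.get(4) + " " + b.get(4)); // prints "false false"
    }
}
```

In this example the java.util.BitSet needs roughly one bit per document up to the highest set bit (about 1.25 MB here), while the sorted array holds two ints. The trade-off is that get() becomes a binary search instead of a constant-time bit test, so the sparse form only pays off at low densities.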
jvm crashes on FieldCache.DEFAULT.getStrings(reader, field);
Dear Lucene folks,

we have 3 indices with about 10 million documents each. All indices are accessed via one MultiReader. For the first hits of a query we call:

  FieldCache.DEFAULT.getStrings(reader, field);

After we start querying, the first 10 queries seem to hang in the getStrings() method, then the JVM crashes silently...

Any clue what the problem could be?

best regards
Johannes
Re: [jira] Commented: (LUCENE-584) Decouple Filter from BitSet
On Monday 04 September 2006 13:43, Eks Dev (JIRA) wrote:
>     [ http://issues.apache.org/jira/browse/LUCENE-584?page=comments#action_12432497 ]
>
> Eks Dev commented on LUCENE-584:
>
> Paul,
> What is exact semantics of skipTo(int) in Matcher?
>
> - is it OK to skip back and forth before I reach end?
> e.g.: skipTo(0); skipTo(333); skipTo(0);
>
> - once I reach end, skipTo(int) does nothing (BitsMatcher, exhausted). It is impossible to reposition Matcher after that
>
> Is this intended behavior, "skip forward until you reach end, and then, you are at the end :)" ?

This last one. From the javadocs (in the patch):

"Skips to the first match whose document number is greater than or equal to a given target. If, after next() or skipTo(int) has been called the first time, the target is before or at the current document, the current document may change to the next matching document."

Regards,
Paul Elschot
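The forward-only contract quoted from the javadocs can be illustrated with a small stand-alone sketch. This is not the Matcher class from the patch: SortedArrayMatcher is a hypothetical name, and the next()/doc()/skipTo(int) method shapes are assumed to follow the convention of Lucene's Scorer, where skipTo() always advances at least one match.

```java
/** Hypothetical forward-only matcher over an ascending array of document numbers. */
final class SortedArrayMatcher {
    private final int[] docs; // ascending, no duplicates
    private int pos = -1;     // -1 means "before the first match"

    SortedArrayMatcher(int[] sortedDocs) {
        this.docs = sortedDocs.clone();
    }

    /** Moves to the next match; keeps returning false once exhausted. */
    boolean next() {
        if (pos < docs.length) {
            pos++;
        }
        return pos < docs.length;
    }

    /** Current document number; only valid after next() or skipTo() returned true. */
    int doc() {
        return docs[pos];
    }

    /**
     * Advances to the first match at or after target. It never moves backwards:
     * a target at or before the current document simply yields the next match.
     */
    boolean skipTo(int target) {
        do {
            if (!next()) {
                return false;
            }
        } while (docs[pos] < target);
        return true;
    }
}

class SkipToDemo {
    public static void main(String[] args) {
        SortedArrayMatcher m = new SortedArrayMatcher(new int[] {5, 100, 400, 700});
        System.out.println(m.skipTo(0) + " " + m.doc());   // prints "true 5"
        System.out.println(m.skipTo(333) + " " + m.doc()); // prints "true 400"
        System.out.println(m.skipTo(0) + " " + m.doc());   // prints "true 700"
        System.out.println(m.skipTo(0));                   // prints "false"
    }
}
```

With the call sequence from the question, skipTo(0); skipTo(333); skipTo(0), the third call does not return to document 5. It moves on to the next match, and once the end is reached every further call returns false.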
[jira] Commented: (LUCENE-632) The creation of a spell index from a LuceneDictionary via SpellChecker.indexDictionary (Dictionary dict) fails starting with 1.9.1 (up to current svn version)
[ http://issues.apache.org/jira/browse/LUCENE-632?page=comments#action_12432518 ]

Karsten Dello commented on LUCENE-632:
--

Sorry for not responding for such a long time, I have been out of the office.

Otis: The current SVN version (as of today) works fine for me, though the spellIndex has to be created manually before using the SpellChecker constructor. As Karl pointed out, a simple

  new IndexWriter(d2, null, true).close();

does the job.

Miles: I think you are right, I had the same problem. I worked around it by calling exist("foo") before indexDictionary, but that is not a bugfix (the fix, as you said, is that the method should check whether reader is null).

> The creation of a spell index from a LuceneDictionary via
> SpellChecker.indexDictionary (Dictionary dict) fails starting with 1.9.1 (up
> to current svn version)
> --
>
>                 Key: LUCENE-632
>                 URL: http://issues.apache.org/jira/browse/LUCENE-632
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Other
>    Affects Versions: 2.0.0, 1.9
>            Reporter: Karsten Dello
>            Priority: Minor
>         Attachments: lazy_searcher.diff
>
>
> Two different errors in 1.9.1/2.0.0 and current svn version.
> 1.9.1/2.0.0:
> at the end of indexDictionary (Dictionary dict)
> the IndexReader-instance reader is closed.
> This causes a NullPointerException because reader has not been initialized
> before (neither in that method nor in the constructor).
> Commenting out this line (reader.close()) seems to resolve that issue.
> current svn:
> the constructor tries to create an IndexSearcher-instance for the specified
> path;
> as there is no index in that path - it is not created yet - an exception is
> thrown.
[jira] Commented: (LUCENE-584) Decouple Filter from BitSet
[ http://issues.apache.org/jira/browse/LUCENE-584?page=comments#action_12432497 ]

Eks Dev commented on LUCENE-584:

Paul,
What are the exact semantics of skipTo(int) in Matcher?

- Is it OK to skip back and forth before I reach the end?
  e.g.: skipTo(0); skipTo(333); skipTo(0);

- Once I reach the end, skipTo(int) does nothing (BitsMatcher, exhausted). It is impossible to reposition the Matcher after that.

Is this the intended behavior: "skip forward until you reach the end, and then, you are at the end :)"?
Re: [jira] Commented: (LUCENE-584) Decouple Filter from BitSet
Yonik,
is there any reason for the BitSetIterator method

  int next(int fromIndex) {...

to be package protected? It would be interesting to see how BitSetIterator works in a Matcher; skipping is needed there.

----- Original Message -----
From: paul.elschot (JIRA) <[EMAIL PROTECTED]>
To: java-dev@lucene.apache.org
Sent: Monday, 4 September, 2006 8:47:24 AM
Subject: [jira] Commented: (LUCENE-584) Decouple Filter from BitSet

[ http://issues.apache.org/jira/browse/LUCENE-584?page=comments#action_12432435 ]

paul.elschot commented on LUCENE-584:
-

> No performance changes as well.

It's good to hear that. As mentioned earlier, this is groundwork only. Once an actual Matcher is used I expect some performance differences to show up.

Which comment of Yonik related to HitCollector do you mean?

> Early this week we will try to implement our first Matchers and see how they behave

BitsMatcher and SortedVIntList could start that. Also I'd like to see one on Solr's OpenBitSet...
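To show why a "next set bit from a given index" operation matters for skipping, here is a rough sketch built on java.util.BitSet.nextSetBit(int), which is public. It is not the BitsMatcher from the attachments and does not use Solr's BitSetIterator; BitSetMatcherSketch and its method names are hypothetical, chosen to mirror the next()/skipTo(int) style discussed in this thread.

```java
import java.util.BitSet;

/** Hypothetical forward-only matcher over a java.util.BitSet. */
final class BitSetMatcherSketch {
    private final BitSet bits;
    private int doc = -1;              // -1 means "before the first match"
    private boolean exhausted = false;

    BitSetMatcherSketch(BitSet bits) {
        this.bits = bits;
    }

    /** Moves to the next set bit after the current document. */
    boolean next() {
        return advanceFrom(doc + 1);
    }

    /** Moves to the first set bit at or after target, but never backwards. */
    boolean skipTo(int target) {
        return advanceFrom(Math.max(target, doc + 1));
    }

    /** Current document number; only valid after next() or skipTo() returned true. */
    int doc() {
        return doc;
    }

    private boolean advanceFrom(int fromIndex) {
        if (exhausted) {
            return false;
        }
        int found = bits.nextSetBit(fromIndex); // -1 when no set bit remains
        if (found < 0) {
            exhausted = true;
            return false;
        }
        doc = found;
        return true;
    }
}

class BitSetMatcherDemo {
    public static void main(String[] args) {
        BitSet bits = new BitSet();
        bits.set(5);
        bits.set(100);
        bits.set(400);
        BitSetMatcherSketch m = new BitSetMatcherSketch(bits);
        System.out.println(m.next() + " " + m.doc());      // prints "true 5"
        System.out.println(m.skipTo(101) + " " + m.doc()); // prints "true 400"
        System.out.println(m.skipTo(0));                   // prints "false"
    }
}
```

Both next() and skipTo() reduce to one scan from a starting index, which is why an iterator that only exposes a parameterless next() is awkward to use in a Matcher. The sketch ignores the overflow corner case of a set bit at Integer.MAX_VALUE.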