[jira] Resolved: (LUCENE-970) FilterIndexReader should overwrite isOptimized()
[ https://issues.apache.org/jira/browse/LUCENE-970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Busch resolved LUCENE-970.
----------------------------------
       Resolution: Fixed
    Lucene Fields: [New, Patch Available]  (was: [Patch Available, New])

> FilterIndexReader should overwrite isOptimized()
> ------------------------------------------------
>
>                 Key: LUCENE-970
>                 URL: https://issues.apache.org/jira/browse/LUCENE-970
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>            Reporter: Michael Busch
>            Assignee: Michael Busch
>            Priority: Trivial
>             Fix For: 2.3
>
>         Attachments: lucene-970.patch
>
>
> A call of FilterIndexReader.isOptimized() results in an NPE because
> FilterIndexReader does not override isOptimized().

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Commented: (LUCENE-965) Implement a state-of-the-art retrieval function in Lucene
[ https://issues.apache.org/jira/browse/LUCENE-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12516547 ]

Doron Cohen commented on LUCENE-965:
------------------------------------

> Is there a way to plug in a patch into my local source repository, so I can
> diff with my favorite diff tool?

    patch -p 0 < foo.patch

Try with --dry-run first. Another convenient way, in case you are using
Eclipse, is the Subclipse plugin, which lets you visually diff patches just
before applying them.

> But may I suggest the alternative?

I think you have a valid point here. I too don't understand the proposed
"Axiomatic Retrieval Function" (ARF) in this regard: in Lucene, the norm
value stored for a document (assuming all boosts are 1) is

    norm(D) = 1 / sqrt(numTerms(D))

This value is ready to use at scoring time, multiplying it with
tf(t in d) * idf(t)^2, as described in
http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/search/Similarity.html

Now, the ARF paper at http://sifaka.cs.uiuc.edu/hfang/lucene/Lucene_exp.pdf
describes Lucene scoring using |D| in place of norm(D) above, and describes
ARF scoring using |D| again, the same as it seems to be implemented in this
patch, e.g. in TermScorer. However, the paper defines |D| as "the length of
D". I find this confusing. Usually "|D|" really means the number of words in
a document, and "avgdl" would mean the average of all |D|'s in the collection
(see for instance "Okapi BM25" in Wikipedia).

Now, your proposed change is something I can understand - it first
translates norm(D) back into Length(D) (ignoring boosts), and only then
averages it.

In any case - that is, once either this is fixed, or I am wrong and an
explanation shows why no fix is needed - I have to admit I still don't
understand the logic behind ARF, intuitively: why would it be better?
I guess provable search quality results can help in persuading...
(LUCENE-836 is resolved, btw.)
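The arithmetic Doron describes can be checked numerically. A minimal sketch with hypothetical method names - the real computation lives in Lucene's Similarity and TermScorer, not in these helpers:

```java
// Sketch of the scoring quantities discussed above. Method names are
// made up for illustration; boosts are assumed to be 1 throughout.
class NormSketch {
    // norm(D) = 1 / sqrt(numTerms(D))
    static float norm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Per-term contribution at scoring time: tf(t in d) * idf(t)^2 * norm(D)
    static float termScore(float tf, float idf, int numTerms) {
        return tf * idf * idf * norm(numTerms);
    }
}
```

For a 4-term document, norm(D) is 0.5; note that the stored norm already encodes 1/sqrt(|D|), which is why averaging it directly differs from averaging |D| itself.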
> Implement a state-of-the-art retrieval function in Lucene
> ---------------------------------------------------------
>
>                 Key: LUCENE-965
>                 URL: https://issues.apache.org/jira/browse/LUCENE-965
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 2.2
>            Reporter: Hui Fang
>         Attachments: axiomaticFunction.patch
>
>
> We implemented the axiomatic retrieval function, which is a
> state-of-the-art retrieval function, to replace the default similarity
> function in Lucene. We compared the performance of these two functions
> and reported the results at
> http://sifaka.cs.uiuc.edu/hfang/lucene/Lucene_exp.pdf.
> The report shows that the performance of the axiomatic retrieval
> function is much better than that of the default function. The axiomatic
> retrieval function is able to find more relevant documents, and users
> can see more relevant documents among the top-ranked documents.
> Incorporating such a state-of-the-art retrieval function could improve
> the search performance of all the applications built upon Lucene.
>
> Most changes related to the implementation are made in AXSimilarity,
> TermScorer and TermQuery.java. However, many test cases are hand coded
> to test whether the implementation of the default function is correct.
> Thus, I also modified many test files to make the new retrieval function
> pass those cases. In fact, we found that some old test cases are not
> reasonable. For example, in testQueries02 of TestBoolean2.java, the
> query is "+w3 xx", and we have two documents "w1 xx w2 yy w3" and
> "w1 w3 xx w2 yy w3". The second document should be more relevant than
> the first one, because it has more occurrences of the query term "w3".
> But the original test case would require us to rank the first document
> higher than the second one, which is not reasonable.
Re: [jira] Updated: (LUCENE-743) IndexReader.reopen()
https://issues.apache.org/jira/browse/LUCENE-743

Michael Busch (JIRA) wrote:
> [ https://issues.apache.org/jira/browse/LUCENE-743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> Michael Busch updated LUCENE-743:
> ---------------------------------
>     Fix Version/s: 2.3
>
> IndexReader.reopen()
>
>                 Key: LUCENE-743
>                 URL: https://issues.apache.org/jira/browse/LUCENE-743
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Otis Gospodnetic
>            Assignee: Michael Busch
>            Priority: Minor
>             Fix For: 2.3
>         Attachments: IndexReaderUtils.java, lucene-743.patch,
>                      lucene-743.patch, MyMultiReader.java,
>                      MySegmentReader.java
>
> This is Robert Engels' implementation of IndexReader.reopen()
> functionality, as a set of 3 new classes (this was easier for him to
> implement, but should probably be folded into the core, if this looks
> good).
[jira] Updated: (LUCENE-969) Optimize the core tokenizers/analyzers & deprecate Token.termText
[ https://issues.apache.org/jira/browse/LUCENE-969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steven Rowe updated LUCENE-969:
-------------------------------
    Lucene Fields: [New, Patch Available]  (was: [Patch Available, New])

CharSequence was introduced in 1.4:
http://java.sun.com/j2se/1.4.2/docs/api/java/lang/CharSequence.html

> Optimize the core tokenizers/analyzers & deprecate Token.termText
> -----------------------------------------------------------------
>
>                 Key: LUCENE-969
>                 URL: https://issues.apache.org/jira/browse/LUCENE-969
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>    Affects Versions: 2.3
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.3
>
>         Attachments: LUCENE-969.patch
>
>
> There is some "low hanging fruit" for optimizing the core tokenizers
> and analyzers:
>
> - Re-use a single Token instance during indexing instead of creating
>   a new one for every term. To do this, I added a new method
>   "Token next(Token result)" (Doron's suggestion), which means a
>   TokenStream may use the "Token result" as the returned Token, but is
>   not required to (i.e., it can still return an entirely different
>   Token if that is more convenient). I added default implementations
>   for both next() methods in TokenStream.java so that a TokenStream
>   can choose to implement only one of the next() methods.
>
> - Use a "char[] termBuffer" in Token instead of the String "termText".
>   Token now maintains a char[] termBuffer for holding the term's text.
>   Tokenizers & filters should retrieve this buffer and directly alter
>   it to put the term text in or change the term text.
>   I only deprecated the termText() method. I still allow the ctors
>   that pass in String termText, as well as setTermText(String), but
>   added a NOTE about the performance cost of using these methods. I
>   think it's OK to keep these as convenience methods?
>   After the next release, when we can remove the deprecated API, we
>   should clean up Token.java to no longer maintain "either String or
>   char[]" (and the initTermBuffer() private method) and always use
>   the char[] termBuffer instead.
>
> - Re-use TokenStream instances across Fields & Documents instead of
>   creating a new one for each doc. To do this I added an optional
>   "reusableTokenStream(...)" to Analyzer which just defaults to
>   calling tokenStream(...), and then I implemented this for the core
>   analyzers.
>
> I'm using the patch from LUCENE-967 for benchmarking just tokenization.
>
> The changes above give a 21% speedup (742 seconds -> 585 seconds) for
> the LowerCaseTokenizer -> StopFilter -> PorterStemFilter chain,
> tokenizing all of Wikipedia, on JDK 1.6 -server -Xmx1024M, Debian
> Linux, RAID 5 IO system (best of 2 runs).
>
> If I pre-break Wikipedia docs into 100-token docs then it's 37% faster
> (1236 sec -> 774 sec), I think because of re-using TokenStreams across
> docs.
>
> I'm just running with this alg and recording the elapsed time:
>
>   analyzer=org.apache.lucene.analysis.LowercaseStopPorterAnalyzer
>   doc.tokenize.log.step=5
>   docs.file=/lucene/wikifull.txt
>   doc.maker=org.apache.lucene.benchmark.byTask.feeds.LineDocMaker
>   doc.tokenized=true
>   doc.maker.forever=false
>   {ReadTokens > : *
>
> See this thread for the discussion leading up to this:
> http://www.gossamer-threads.com/lists/lucene/java-dev/51283
>
> I also fixed Token.toString() to work correctly when termBuffer is
> used (and added a unit test).

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
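The "Token next(Token result)" reuse pattern described above can be sketched with minimal stand-in classes - these are not the real Lucene Token/TokenStream API, and WhitespaceStream is a made-up example tokenizer:

```java
// Stand-in for Lucene's Token: a resizable char[] term buffer.
class Token {
    char[] termBuffer = new char[16];
    int termLength;

    void setTermBuffer(String s) {
        if (termBuffer.length < s.length()) termBuffer = new char[s.length()];
        s.getChars(0, s.length(), termBuffer, 0);
        termLength = s.length();
    }

    String term() { return new String(termBuffer, 0, termLength); }
}

abstract class TokenStream {
    // New-style API: the caller offers a Token the stream MAY reuse.
    public Token next(Token result) { return next(); }

    // Old-style API: allocates a fresh Token per call.
    // A subclass must override at least one of the two methods.
    public Token next() { return next(new Token()); }
}

// Example stream that reuses the caller's Token instead of allocating.
class WhitespaceStream extends TokenStream {
    private final String[] words;
    private int i = 0;

    WhitespaceStream(String text) { words = text.split("\\s+"); }

    @Override
    public Token next(Token result) {
        if (i == words.length) return null;  // end of stream
        result.setTermBuffer(words[i++]);    // reuse the caller's buffer
        return result;
    }
}
```

Indexing code can then loop with a single Token instance, which is where the per-term allocation savings in the benchmark numbers come from.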
[jira] Commented: (LUCENE-969) Optimize the core tokenizers/analyzers & deprecate Token.termText
[ https://issues.apache.org/jira/browse/LUCENE-969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12516517 ]

Yonik Seeley commented on LUCENE-969:
-------------------------------------

> [...] implement CharSequence

I think CharSequence is Java5
[jira] Commented: (LUCENE-969) Optimize the core tokenizers/analyzers & deprecate Token.termText
[ https://issues.apache.org/jira/browse/LUCENE-969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12516510 ]

Michael Busch commented on LUCENE-969:
--------------------------------------

Hi Mike,

this is just an idea to keep Token.java simpler, but I haven't really
thought about all the consequences, so feel free to tell me that it's a
bad idea ;)

Could you add a new class TermBuffer, containing the char[] array and your
resize() logic, that would implement CharSequence? Then you could get rid
of the duplicate constructors and setters for String and char[], because
String also implements CharSequence. And CharSequence has the method
charAt(int index), so it should be almost as fast as directly accessing
the char array in case the TermBuffer is used.

You would need to change the existing constructors and setters to take a
CharSequence object instead of a String, but this is not an API change,
as users can still pass in a String object. And then you would just need
to add a new constructor with offset and length, and a similar setter.

Thoughts?
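Michael's suggestion can be sketched roughly as follows; the class and method names here are hypothetical, not the patch's actual API:

```java
// Hypothetical TermBuffer: a resizable char[] wrapper implementing
// CharSequence, so a single set(CharSequence) covers both String
// arguments and other TermBuffers.
class TermBuffer implements CharSequence {
    private char[] buf = new char[16];
    private int len;

    // Works for String and TermBuffer alike, via CharSequence.
    void set(CharSequence s) {
        resize(s.length());
        for (int i = 0; i < s.length(); i++) buf[i] = s.charAt(i);
        len = s.length();
    }

    // The extra offset/length setter Michael mentions.
    void set(char[] src, int offset, int length) {
        resize(length);
        System.arraycopy(src, offset, buf, 0, length);
        len = length;
    }

    private void resize(int needed) {
        if (buf.length < needed) buf = new char[Math.max(needed, buf.length * 2)];
    }

    public int length() { return len; }
    public char charAt(int i) { return buf[i]; }
    public CharSequence subSequence(int s, int e) { return toString().substring(s, e); }
    public String toString() { return new String(buf, 0, len); }
}
```

The trade-off Yonik raises still applies: CharSequence only exists since JDK 1.4, so this hinges on the minimum Java version Lucene targets.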
[jira] Commented: (LUCENE-965) Implement a state-of-the-art retrieval function in Lucene
[ https://issues.apache.org/jira/browse/LUCENE-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12516507 ]

Doug Cutting commented on LUCENE-965:
-------------------------------------

> Is there a way to plug in a patch into my local source repository, so I can
> diff with my favorite diff tool?

    patch -p 0 < foo.patch
[jira] Updated: (LUCENE-970) FilterIndexReader should overwrite isOptimized()
[ https://issues.apache.org/jira/browse/LUCENE-970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Busch updated LUCENE-970:
---------------------------------
    Attachment: lucene-970.patch

Trivial patch. I'm planning to commit this shortly.
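The shape of a fix like this can be sketched with stand-in classes (not the real Lucene API): a filtering decorator must delegate every method whose inherited implementation relies on state the wrapper never initializes, which is presumably why the missing override surfaced as an NPE.

```java
// Stand-in for the wrapped reader.
class Reader {
    public boolean isOptimized() { return true; }
}

// Stand-in for the filtering decorator.
class FilterReader extends Reader {
    protected final Reader in;

    FilterReader(Reader in) { this.in = in; }

    // The kind of override the patch adds: forward to the wrapped reader
    // instead of inheriting an implementation that the wrapper cannot run.
    @Override
    public boolean isOptimized() { return in.isOptimized(); }
}
```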
[jira] Commented: (LUCENE-584) Decouple Filter from BitSet
[ https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12516502 ]

Paul Elschot commented on LUCENE-584:
-------------------------------------

Some more remarks on the 20070730 patches. To recap, this introduces
Matcher as a superclass of Scorer, to take the role that BitSet currently
has in Filter. The total number of java files changed/added by these
patches is 47, so some extra care will be needed.

The following issues are still pending:

- What approach should be taken for the API change to Filter (see above,
  2 comments up)?
- I'd like to get all test cases to pass again.
  TestRemoteCachingWrapperFilter still does not pass, and I don't know why.
- For xml-query-parser in contrib I'd like to know in which direction to
  proceed (see 1 comment up). Does it make sense to try and get the
  TestQueryTemplateManager test to pass again?
- The ..default.. patch has taken OpenBitSet and friends from Solr to
  provide a default implementation. However, I have not checked whether
  there is unused code in there, so some trimming may still be appropriate.

Once these issues have been resolved far enough, I would recommend
introducing this shortly after a release, so there is some time to let
things settle.
> Decouple Filter from BitSet
> ---------------------------
>
>                 Key: LUCENE-584
>                 URL: https://issues.apache.org/jira/browse/LUCENE-584
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 2.0.1
>            Reporter: Peter Schäfer
>            Priority: Minor
>         Attachments: bench-diff.txt, bench-diff.txt,
>                      Matcher1-ground-20070730.patch,
>                      Matcher2-default-20070730.patch,
>                      Matcher3-core-20070730.patch,
>                      Matcher4-contrib-misc-20070730.patch,
>                      Matcher5-contrib-queries-20070730.patch,
>                      Matcher6-contrib-xml-20070730.patch,
>                      Some Matchers.zip
>
>
> {code}
> package org.apache.lucene.search;
>
> public abstract class Filter implements java.io.Serializable
> {
>   public abstract AbstractBitSet bits(IndexReader reader) throws IOException;
> }
>
> public interface AbstractBitSet
> {
>   public boolean get(int index);
> }
> {code}
>
> It would be useful if the method =Filter.bits()= returned an abstract
> interface, instead of =java.util.BitSet=.
>
> Use case: there is a very large index and, depending on the user's
> privileges, only a small portion of the index is actually visible.
> Sparsely populated =java.util.BitSet=s are not efficient and waste lots
> of memory. It would be desirable to have an alternative BitSet
> implementation with a smaller memory footprint.
>
> Though it _is_ possible to derive classes from =java.util.BitSet=, it
> was obviously not designed for that purpose. That's why I propose to
> use an interface instead. The default implementation could still
> delegate to =java.util.BitSet=.
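The decoupling the issue asks for can be sketched as follows; the implementation class names here are illustrative, not part of the proposed patches:

```java
import java.util.BitSet;
import java.util.HashSet;

// The abstract interface the issue proposes Filter.bits() should return.
interface AbstractBitSet {
    boolean get(int index);
}

// Default implementation: delegate to java.util.BitSet, as the issue
// suggests, so existing dense filters keep working unchanged.
class DefaultBitSet implements AbstractBitSet {
    private final BitSet bits;
    DefaultBitSet(BitSet bits) { this.bits = bits; }
    public boolean get(int index) { return bits.get(index); }
}

// Sparse alternative for the "few visible docs in a huge index" use
// case: memory scales with the number of set bits, not the index size.
class SparseBitSet implements AbstractBitSet {
    private final HashSet<Integer> setBits = new HashSet<Integer>();
    void set(int index) { setBits.add(index); }
    public boolean get(int index) { return setBits.contains(index); }
}
```

Because callers only see AbstractBitSet, either implementation can back a Filter without any API change - which is essentially the role the Matcher patches give to OpenBitSet.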
[jira] Updated: (LUCENE-584) Decouple Filter from BitSet
[ https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Elschot updated LUCENE-584:
--------------------------------
    Attachment:     (was: Matcher1-ground-20070730.patch)
[jira] Updated: (LUCENE-584) Decouple Filter from BitSet
[ https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Elschot updated LUCENE-584:
--------------------------------
    Attachment:     (was: Matcher3-core-20070730.patch)
[jira] Updated: (LUCENE-584) Decouple Filter from BitSet
[ https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Elschot updated LUCENE-584:
--------------------------------
    Attachment: Matcher6-contrib-xml-20070730.patch
                Matcher5-contrib-queries-20070730.patch
                Matcher4-contrib-misc-20070730.patch
[jira] Updated: (LUCENE-584) Decouple Filter from BitSet
[ https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Elschot updated LUCENE-584:
--------------------------------
    Attachment:     (was: Matcher2-default-20070730.patch)
[jira] Created: (LUCENE-970) FilterIndexReader should overwrite isOptimized()
FilterIndexReader should overwrite isOptimized() Key: LUCENE-970 URL: https://issues.apache.org/jira/browse/LUCENE-970 Project: Lucene - Java Issue Type: Bug Components: Search Reporter: Michael Busch Assignee: Michael Busch Priority: Trivial Fix For: 2.3 A call of FilterIndexReader.isOptimized() results in an NPE because FilterIndexReader does not overwrite isOptimized().
[jira] Updated: (LUCENE-584) Decouple Filter from BitSet
[ https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Elschot updated LUCENE-584: Attachment: Matcher3-core-20070730.patch Matcher2-default-20070730.patch Matcher1-ground-20070730.patch Uploading the patches again, this time with the ASF license.
[jira] Updated: (LUCENE-584) Decouple Filter from BitSet
[ https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Elschot updated LUCENE-584: Attachment: (was: Matcher6-contrib-xml-20070730.patch)
[jira] Updated: (LUCENE-584) Decouple Filter from BitSet
[ https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Elschot updated LUCENE-584: Attachment: (was: Matcher4-contrib-misc-20070730.patch)
[jira] Updated: (LUCENE-584) Decouple Filter from BitSet
[ https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Elschot updated LUCENE-584: Attachment: (was: Matcher5-contrib-queries-20070730.patch)
[jira] Updated: (LUCENE-584) Decouple Filter from BitSet
[ https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Elschot updated LUCENE-584: Attachment: Matcher6-contrib-xml-20070730.patch Matcher5-contrib-queries-20070730.patch Matcher4-contrib-misc-20070730.patch Some 20070730 patches to contrib using BitSetFilter. The contrib-misc and contrib-queries patches are reasonably good: their tests pass, and replacing Filter by BitSetFilter is right for them. However, I'm not happy with the contrib-xml patch to the xml-query-parser. I had to cripple some of the code and disable the TestQueryTemplateManager test. I don't know how to get around this, basically because I don't know whether it is a good idea at all to move the xml-query-parser to BitSetFilter. It might be better to move it to Filter.getMatcher() instead, but I have no idea how to do this.
[jira] Updated: (LUCENE-584) Decouple Filter from BitSet
[ https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Elschot updated LUCENE-584: Attachment: (was: Matcher-ground20070725.patch)
[jira] Updated: (LUCENE-584) Decouple Filter from BitSet
[ https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Elschot updated LUCENE-584: Attachment: (was: Matcher-default20070725.patch)
[jira] Updated: (LUCENE-584) Decouple Filter from BitSet
[ https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Elschot updated LUCENE-584: Attachment: (was: Matcher-core20070725.patch)
[jira] Updated: (LUCENE-584) Decouple Filter from BitSet
[ https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Elschot updated LUCENE-584: Attachment: Matcher3-core-20070730.patch Matcher2-default-20070730.patch Matcher1-ground-20070730.patch A different take in the patches of 20070730. In this version class Filter has only one method: public abstract Matcher getMatcher(IndexReader). Class BitSetFilter is added as a subclass of Filter; it has the familiar public abstract BitSet bits(IndexReader), as well as a default implementation of the getMatcher() method. In the ..core.. patch, and in the ..contrib.. patches (to follow), most uses of Filter were simply replaced by BitSetFilter. This turned out to be an easy way of dealing with this API change in Filter. This change to Filter and its replacement by BitSetFilter could well be taking things too far for now, and I'd like to know whether other approaches are preferred. The ..default.. patch contains a default implementation of a Matcher from a BitSet, and it has OpenBitSet and friends from Solr, as well as SortedVIntList as posted earlier. 
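A self-contained sketch of the Filter.getMatcher() / BitSetFilter split described in the comment above. The class and method names follow the patch description; everything else (the simplified Matcher interface, the IndexReader stand-in, omitting IOException for brevity) is an assumption for illustration.

```java
import java.util.BitSet;

// Simplified stand-in so the sketch compiles on its own.
class IndexReader { int maxDoc() { return 8; } }

// A Matcher iterates over matching doc ids (heavily simplified here).
interface Matcher {
    int next();  // next matching doc id, or -1 when exhausted
}

// As in the 20070730 patches: Filter keeps only one abstract method.
abstract class Filter {
    public abstract Matcher getMatcher(IndexReader reader);
}

// BitSetFilter restores the familiar bits() API and supplies a default
// getMatcher() that walks the BitSet, so existing BitSet-based filters
// only need their superclass changed from Filter to BitSetFilter.
abstract class BitSetFilter extends Filter {
    public abstract BitSet bits(IndexReader reader);

    public Matcher getMatcher(IndexReader reader) {
        final BitSet bits = bits(reader);
        return new Matcher() {
            private int doc = -1;
            private boolean done = false;
            public int next() {
                if (done) return -1;
                doc = bits.nextSetBit(doc + 1);
                if (doc < 0) done = true;   // stay exhausted once -1 is hit
                return done ? -1 : doc;
            }
        };
    }
}
```

The design choice: new filters can implement getMatcher() directly (e.g. over a sparse structure like SortedVIntList), while old code keeps the BitSet contract unchanged.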
Re: [jira] Commented: (LUCENE-968) SpanFilter should not extend Filter
Right, I thought briefly about that one, but in the end wasn't sure how to handle it. Having the SpanFilterResult change is no big deal, btw, at least not until it is officially released. I would be fine w/ putting a note saying this is experimental and subject to change. On Jul 30, 2007, at 12:24 PM, Paul Elschot (JIRA) wrote: [ https://issues.apache.org/jira/browse/LUCENE-968? page=com.atlassian.jira.plugin.system.issuetabpanels:comment- tabpanel#action_12516429 ] Paul Elschot commented on LUCENE-968: - Ok, I missed that possible use as a Filter. I'm busy with LUCENE-584, and I could not figure out how to deal with this one. Since it is a Filter, I'll include it in there as one of the currently present Filters. SpanFilter should not extend Filter --- Key: LUCENE-968 URL: https://issues.apache.org/jira/browse/LUCENE-968 Project: Lucene - Java Issue Type: Bug Components: Search Affects Versions: 2.3 Reporter: Paul Elschot Priority: Trivial Fix For: 2.3 Attachments: SpanFilter20070729.patch All tests pass with the patch applied. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: [VOTE] Migrate Lucene to JDK 1.5 for 3.0 release
On Jul 30, 2007, at 8:18 AM, DM Smith wrote: +1 from me, too. Not because I have a vote or that I am for going to 1.5, but because it is inevitable and this is a well thought out, fine plan. (excepting the aggressive timeline that has been hashed out already in this thread) I'd like to point out that there is a consequence of this plan and how Lucene has done things in the past. At 1.9 it was fully compatible with 1.4.3, with deprecations. 2.0 mostly had deprecations removed and a few bug fixes. Then the 2.x series has been backward compatible, but not with 1.x (except for being able to read prior indexes, and perhaps a few other things). If we continue that same pattern, then there will be no 1.5 features in 2.9. (Otherwise it won't compile under 1.4.) Thus, 3.0 will have a 1.4.2-compatible interface. And except for new classes, new methods and compile-equivalent features (such as Enums), 1.5 features won't appear in the 3.x series API. Yes, this is a slight variation from the 1.9 -> 2.0 migration. I think the plan is to switch to 1.5 for compilation for 3.0-dev, and then we will be immediately open for accepting 1.5 patches. In fact, if someone submitted a patch that converted all collections to generics, I would be in favor of accepting it, with all the usual caveats. I don't see any way around it, as I don't think the intent is to say that 3.x contains no 1.5 features other than that it compiles using JDK 1.5. I think it is very important to preserve the Lucene API where possible and reasonable, not changing it without gain. Given that this has been the practice, I don't think it is an issue. I agree. I think method names, etc. will stay the same, but we will start adding Generics and Enums where appropriate, and new code can be all 1.5. For instance, the Field declaration parameters are a prime place for Enums. So, the move would be to add in the new Enums and deprecate the old Field.Index and Field.Store static ints. 
Thus, they would not go away until 4.x (wow, that is weird to say). Does that seem reasonable? -Grant - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
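A hedged illustration of the Field.Store / Field.Index idea discussed above: the constant names and the Field shape here are assumptions for the sketch, not the eventual Lucene API.

```java
// Hypothetical sketch: replacing the Field.Store / Field.Index static
// constants with Java 5 enums. Constant names are illustrative only.
enum Store { YES, NO }
enum Index { NO, TOKENIZED, UN_TOKENIZED }

class Field {
    final String name;
    final String value;
    final Store store;
    final Index index;

    // New 1.5-style constructor; per the plan above, the old static-int
    // variants would be deprecated, not removed, until 4.x.
    Field(String name, String value, Store store, Index index) {
        this.name = name;
        this.value = value;
        this.store = store;
        this.index = index;
    }
}
```

The gain over static ints is compile-time safety: with enums, `new Field("title", "text", Index.TOKENIZED, Store.YES)` no longer compiles with the parameters accidentally swapped.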
[jira] Resolved: (LUCENE-968) SpanFilter should not extend Filter
[ https://issues.apache.org/jira/browse/LUCENE-968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Elschot resolved LUCENE-968. - Resolution: Invalid Lucene Fields: [New, Patch Available] (was: [Patch Available, New]) > SpanFilter should not extend Filter > --- > > Key: LUCENE-968 > URL: https://issues.apache.org/jira/browse/LUCENE-968 > Project: Lucene - Java > Issue Type: Bug > Components: Search >Affects Versions: 2.3 >Reporter: Paul Elschot >Priority: Trivial > Fix For: 2.3 > > Attachments: SpanFilter20070729.patch > > > All tests pass with the patch applied. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Commented: (LUCENE-968) SpanFilter should not extend Filter
[ https://issues.apache.org/jira/browse/LUCENE-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12516429 ] Paul Elschot commented on LUCENE-968: - Ok, I missed that possible use as a Filter. I'm busy with LUCENE-584, and I could not figure out how to deal with this one. Since it is a Filter, I'll include it in there as one of the currently present Filters. > SpanFilter should not extend Filter > --- > > Key: LUCENE-968 > URL: https://issues.apache.org/jira/browse/LUCENE-968 > Project: Lucene - Java > Issue Type: Bug > Components: Search >Affects Versions: 2.3 >Reporter: Paul Elschot >Priority: Trivial > Fix For: 2.3 > > Attachments: SpanFilter20070729.patch > > > All tests pass with the patch applied. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Commented: (LUCENE-871) ISOLatin1AccentFilter a bit slow
[ https://issues.apache.org/jira/browse/LUCENE-871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12516403 ] Michael McCandless commented on LUCENE-871: --- OK, for LUCENE-969 I made yet a 3rd option for optimizing ISOLatin1AccentFilter. In that patch I reuse the Token instance, using the char[] API for the Token's text instead of String, and I also re-use a single TokenStream instance (I did this for all core tokenizers). I just tested total time to tokenize all wikipedia content with current trunk (1116 sec) vs with LUCENE-969 (500 sec), with a WhitespaceTokenizer -> ISOLatin1AccentFilter chain. I separately timed just creating the documents at 112 sec, to subtract it off from the above times (so I can measure only the cost of tokenization). This gives a net speedup for this filter of about 2.6X (1004 sec -> 388 sec). > ISOLatin1AccentFilter a bit slow > > > Key: LUCENE-871 > URL: https://issues.apache.org/jira/browse/LUCENE-871 > Project: Lucene - Java > Issue Type: Bug > Components: Analysis >Affects Versions: 1.9, 2.0.0, 2.0.1, 2.1, 2.2 >Reporter: Ian Boston > Attachments: fasterisoremove1.patch, fasterisoremove2.patch, > ISOLatin1AccentFilter.java.patch > > > The ISOLatin1AccentFilter is a bit slow, giving 300+ ms responses when used in > a highlighter for output responses. > Patch to follow -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
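A minimal sketch of the char[]-based approach the comment describes: fold accents by writing back into the same buffer, so no per-token String is allocated. This is a simplification with only a few mappings; the real filter covers the full Latin-1 range, including one-to-two mappings such as 'æ' -> "ae" that can grow the buffer, which are omitted here.

```java
// Illustrative only: in-place accent folding over a term's char[] buffer.
// Single-char replacements only, so the buffer length never changes.
class AccentFolder {
    static void fold(char[] buf, int len) {
        for (int i = 0; i < len; i++) {
            switch (buf[i]) {
                case 'à': case 'á': case 'â': case 'ä': buf[i] = 'a'; break;
                case 'è': case 'é': case 'ê': case 'ë': buf[i] = 'e'; break;
                case 'ì': case 'í': case 'î': case 'ï': buf[i] = 'i'; break;
                case 'ç': buf[i] = 'c'; break;
                default: break; // already plain ASCII, leave untouched
            }
        }
    }
}
```

Called once per token on the Token's termBuffer, this does zero allocations, which is where the garbage-collection savings measured above come from.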
[jira] Updated: (LUCENE-969) Optimize the core tokenizers/analyzers & deprecate Token.termText
[ https://issues.apache.org/jira/browse/LUCENE-969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-969: -- Lucene Fields: [New, Patch Available] (was: [New]) > Optimize the core tokenizers/analyzers & deprecate Token.termText > - > > Key: LUCENE-969 > URL: https://issues.apache.org/jira/browse/LUCENE-969 > Project: Lucene - Java > Issue Type: Improvement > Components: Analysis >Affects Versions: 2.3 >Reporter: Michael McCandless >Assignee: Michael McCandless >Priority: Minor > Fix For: 2.3 > > Attachments: LUCENE-969.patch > > > There is some "low hanging fruit" for optimizing the core tokenizers > and analyzers: > - Re-use a single Token instance during indexing instead of creating > a new one for every term. To do this, I added a new method "Token > next(Token result)" (Doron's suggestion) which means TokenStream > may use the "Token result" as the returned Token, but is not > required to (ie, can still return an entirely different Token if > that is more convenient). I added default implementations for > both next() methods in TokenStream.java so that a TokenStream can > choose to implement only one of the next() methods. > - Use "char[] termBuffer" in Token instead of the "String > termText". > Token now maintains a char[] termBuffer for holding the term's > text. Tokenizers & filters should retrieve this buffer and > directly alter it to put the term text in or change the term > text. > I only deprecated the termText() method. I still allow the ctors > that pass in String termText, as well as setTermText(String), but > added a NOTE about performance cost of using these methods. I > think it's OK to keep these as convenience methods? > After the next release, when we can remove the deprecated API, we > should clean up Token.java to no longer maintain "either String or > char[]" (and the initTermBuffer() private method) and always use > the char[] termBuffer instead. 
> - Re-use TokenStream instances across Fields & Documents instead of > creating a new one for each doc. To do this I added an optional > "reusableTokenStream(...)" to Analyzer which just defaults to > calling tokenStream(...), and then I implemented this for the core > analyzers. > I'm using the patch from LUCENE-967 for benchmarking just > tokenization. > The changes above give 21% speedup (742 seconds -> 585 seconds) for > LowerCaseTokenizer -> StopFilter -> PorterStemFilter chain, tokenizing > all of Wikipedia, on JDK 1.6 -server -Xmx1024M, Debian Linux, RAID 5 > IO system (best of 2 runs). > If I pre-break Wikipedia docs into 100 token docs then it's 37% faster > (1236 sec -> 774 sec), I think because of re-using TokenStreams across > docs. > I'm just running with this alg and recording the elapsed time: > analyzer=org.apache.lucene.analysis.LowercaseStopPorterAnalyzer > doc.tokenize.log.step=5 > docs.file=/lucene/wikifull.txt > doc.maker=org.apache.lucene.benchmark.byTask.feeds.LineDocMaker > doc.tokenized=true > doc.maker.forever=false > {ReadTokens > : * > See this thread for discussion leading up to this: > http://www.gossamer-threads.com/lists/lucene/java-dev/51283 > I also fixed Token.toString() to work correctly when termBuffer is > used (and added unit test). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
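The next(Token) reuse pattern from the issue description can be sketched as follows. Token and TokenStream are simplified stand-ins (the real classes also carry offsets, position increments, and type); the method names follow the patch.

```java
// Simplified sketch of the "Token next(Token result)" reuse API.
class Token {
    char[] termBuffer = new char[16];
    int termLength;

    void setTermBuffer(char[] src, int off, int len) {
        if (termBuffer.length < len) termBuffer = new char[len];
        System.arraycopy(src, off, termBuffer, 0, len);
        termLength = len;
    }
    String term() { return new String(termBuffer, 0, termLength); }
}

abstract class TokenStream {
    // The stream MAY fill in and return 'result', avoiding an allocation
    // per term; it is also free to return a different Token instance.
    public abstract Token next(Token result);
}

// A whitespace tokenizer that reuses the caller's Token for every term.
class SimpleWhitespaceTokenizer extends TokenStream {
    private final char[] text;
    private int pos = 0;
    SimpleWhitespaceTokenizer(String s) { text = s.toCharArray(); }

    public Token next(Token result) {
        while (pos < text.length && text[pos] == ' ') pos++;
        if (pos >= text.length) return null;          // end of stream
        int start = pos;
        while (pos < text.length && text[pos] != ' ') pos++;
        result.setTermBuffer(text, start, pos - start);
        return result;
    }
}
```

The indexer then allocates one Token up front and passes the same instance into next(Token) in a loop, which is where the per-term garbage-collection savings come from.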
Re: Token termBuffer issues
"Michael McCandless" <[EMAIL PROTECTED]> wrote: > "Yonik Seeley" <[EMAIL PROTECTED]> wrote: > > On 7/25/07, Michael McCandless <[EMAIL PROTECTED]> wrote: > > > "Yonik Seeley" <[EMAIL PROTECTED]> wrote: > > > > On 7/24/07, Michael McCandless <[EMAIL PROTECTED]> wrote: > > > > > "Yonik Seeley" <[EMAIL PROTECTED]> wrote: > > > > > > On 7/24/07, Michael McCandless <[EMAIL PROTECTED]> wrote: > > > > > > > OK, I ran some benchmarks here. > > > > > > > > > > > > > > The performance gains are sizable: 12.8% speedup using Sun's JDK > > > > > > > 5 and > > > > > > > 17.2% speedup using Sun's JDK 6, on Linux. This is indexing all > > > > > > > Wikipedia content using LowerCaseTokenizer + StopFilter + > > > > > > > PorterStemFilter. I think it's worth pursuing! > > > > > > > > > > > > Did you try it w/o token reuse (reuse tokens only when mutating, not > > > > > > when creating new tokens from the tokenizer)? > > > > > > > > > > I haven't tried this variant yet. I guess for long filter chains the > > > > > GC cost of the tokenizer making the initial token should go down as > > > > > overall part of the time. Though I think we should still re-use the > > > > > initial token since it should (?) only help. > > > > > > > > If it weren't any slower, that would be great... but I worry about > > > > filters that need buffering (either on the input side or the output > > > > side) and how that interacts with filters that try and reuse. > > > > > > OK I will tease out this effect & measure performance impact. > > > > > > This would mean that the tokenizer must not only produce new Token > > > instance for each term but also cannot re-use the underlying char[] > > > buffer in that token, right? > > > > If the tokenizer can actually change the contents of the char[], then > > yes, it seems like when next() is called rather than next(Token), a > > new char[] would need to be allocated. > > Right. 
So I'm now testing "reuse all" vs "tokenizer makes a full copy > but filters get to re-use it". OK, I tested this case where CharTokenizer makes a new Token (and new char[] array) for every token instead of re-using each. This way is 6% slower than fully re-using the Token (585 sec -> 618 sec) -- using the same test as described in https://issues.apache.org/jira/browse/LUCENE-969. > > > EG with mods for CharTokenizer I re-use > > > its "char[] buffer" with every Token, but I'll change that to be a new > > > buffer for each Token for this test. > > > > It's not just for a test, right? If next() is called, it can't reuse > > the char[]. And there is no getting around the fact that some > > tokenizers will need to call next() because of buffering. > > Correct -- the way I'm doing this now is in TokenStream.java I have a > default "Token next()" which calls "next(Token result)" but makes a > complete copy before returning it. This keeps full backwards > compatibility even in the case where a consumer wants a private copy > (calls next()) but the provider only provides the "re-use" API > (next(Token result)). > > Mike > > - > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
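[Editor's note] The backward-compatibility scheme described above -- a default next() that delegates to next(Token result) but hands the caller a private copy -- can be sketched roughly as below. This is an illustrative mock, not the actual patch; all class and method names are invented for the example.

```java
// Sketch of the dual next()/next(Token) contract from the thread above:
// a stream implements whichever variant is convenient, and the defaults
// bridge between them (a subclass must override at least one, or the
// defaults would recurse).
class Token {
    char[] termBuffer = new char[16];
    int termLength;

    void setTermBuffer(char[] src, int len) {
        if (termBuffer.length < len) termBuffer = new char[len];
        System.arraycopy(src, 0, termBuffer, 0, len);
        termLength = len;
    }

    String term() { return new String(termBuffer, 0, termLength); }
}

abstract class TokenStreamSketch {
    // Reuse API: may fill in and return 'result', or any other Token.
    Token next(Token result) { return next(); }

    // Copying API: always returns a Token private to the caller, even
    // when the producer only implements the reuse variant.
    Token next() {
        Token t = next(new Token());
        if (t == null) return null;
        Token copy = new Token();
        copy.setTermBuffer(t.termBuffer, t.termLength);
        return copy;
    }
}

// Toy tokenizer implementing only the reuse variant.
class WhitespaceSketch extends TokenStreamSketch {
    private final String[] words;
    private int i;

    WhitespaceSketch(String text) { words = text.split("\\s+"); }

    @Override Token next(Token result) {
        if (i == words.length) return null;
        char[] cs = words[i++].toCharArray();
        result.setTermBuffer(cs, cs.length);
        return result; // reuses the caller-supplied Token
    }
}
```

A consumer that must buffer tokens calls next() and gets a private copy; an indexing chain that never holds onto a token calls next(reusable) in a loop and avoids the per-term allocation the benchmarks in this thread measure.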
[jira] Updated: (LUCENE-969) Optimize the core tokenizers/analyzers & deprecate Token.termText
[ https://issues.apache.org/jira/browse/LUCENE-969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-969: -- Attachment: LUCENE-969.patch First-cut patch. All tests pass. I still need to fix some javadocs but otherwise I think this is close... > Optimize the core tokenizers/analyzers & deprecate Token.termText > - > > Key: LUCENE-969 > URL: https://issues.apache.org/jira/browse/LUCENE-969 > Project: Lucene - Java > Issue Type: Improvement > Components: Analysis >Affects Versions: 2.3 >Reporter: Michael McCandless >Assignee: Michael McCandless >Priority: Minor > Fix For: 2.3 > > Attachments: LUCENE-969.patch > > > There is some "low hanging fruit" for optimizing the core tokenizers > and analyzers: > - Re-use a single Token instance during indexing instead of creating > a new one for every term. To do this, I added a new method "Token > next(Token result)" (Doron's suggestion) which means TokenStream > may use the "Token result" as the returned Token, but is not > required to (ie, can still return an entirely different Token if > that is more convenient). I added default implementations for > both next() methods in TokenStream.java so that a TokenStream can > choose to implement only one of the next() methods. > - Use "char[] termBuffer" in Token instead of the "String > termText". > Token now maintains a char[] termBuffer for holding the term's > text. Tokenizers & filters should retrieve this buffer and > directly alter it to put the term text in or change the term > text. > I only deprecated the termText() method. I still allow the ctors > that pass in String termText, as well as setTermText(String), but > added a NOTE about performance cost of using these methods. I > think it's OK to keep these as convenience methods? 
> After the next release, when we can remove the deprecated API, we > should clean up Token.java to no longer maintain "either String or > char[]" (and the initTermBuffer() private method) and always use > the char[] termBuffer instead. > - Re-use TokenStream instances across Fields & Documents instead of > creating a new one for each doc. To do this I added an optional > "reusableTokenStream(...)" to Analyzer which just defaults to > calling tokenStream(...), and then I implemented this for the core > analyzers. > I'm using the patch from LUCENE-967 for benchmarking just > tokenization. > The changes above give 21% speedup (742 seconds -> 585 seconds) for > LowerCaseTokenizer -> StopFilter -> PorterStemFilter chain, tokenizing > all of Wikipedia, on JDK 1.6 -server -Xmx1024M, Debian Linux, RAID 5 > IO system (best of 2 runs). > If I pre-break Wikipedia docs into 100 token docs then it's 37% faster > (1236 sec -> 774 sec), I think because of re-using TokenStreams across > docs. > I'm just running with this alg and recording the elapsed time: > analyzer=org.apache.lucene.analysis.LowercaseStopPorterAnalyzer > doc.tokenize.log.step=5 > docs.file=/lucene/wikifull.txt > doc.maker=org.apache.lucene.benchmark.byTask.feeds.LineDocMaker > doc.tokenized=true > doc.maker.forever=false > {ReadTokens > : * > See this thread for discussion leading up to this: > http://www.gossamer-threads.com/lists/lucene/java-dev/51283 > I also fixed Token.toString() to work correctly when termBuffer is > used (and added unit test). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
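[Editor's note] The reusableTokenStream(...) idea above -- default to tokenStream(...), let analyzers that can reset a cached stream reuse it across fields and documents -- could look roughly like the sketch below. All names are illustrative, not the committed API; the "stream" here is a trivial Scanner wrapper standing in for a real analysis chain.

```java
import java.io.Reader;
import java.util.Scanner;

// Sketch of per-thread TokenStream reuse: reusableTokenStream() resets
// and returns a cached stream instead of building a new analysis chain
// for every field/document. Illustrative names only.
class ReusableStreamSketch {
    private Scanner scanner;
    void reset(Reader r) { scanner = new Scanner(r); } // point at new input
    String next() { return scanner.hasNext() ? scanner.next() : null; }
}

class AnalyzerSketch {
    private final ThreadLocal<ReusableStreamSketch> cached =
        new ThreadLocal<ReusableStreamSketch>();

    // Old behavior: a fresh stream for every call.
    ReusableStreamSketch tokenStream(Reader r) {
        ReusableStreamSketch s = new ReusableStreamSketch();
        s.reset(r);
        return s;
    }

    // Reuse variant: the default could simply delegate to tokenStream();
    // an analyzer that supports reuse resets its per-thread cached stream.
    ReusableStreamSketch reusableTokenStream(Reader r) {
        ReusableStreamSketch s = cached.get();
        if (s == null) {
            s = new ReusableStreamSketch();
            cached.set(s);
        }
        s.reset(r);
        return s;
    }
}
```

The per-thread cache matters because an IndexWriter may be fed from several threads; each thread resets its own stream, so no synchronization is needed on the hot path.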
[jira] Created: (LUCENE-969) Optimize the core tokenizers/analyzers & deprecate Token.termText
Optimize the core tokenizers/analyzers & deprecate Token.termText - Key: LUCENE-969 URL: https://issues.apache.org/jira/browse/LUCENE-969 Project: Lucene - Java Issue Type: Improvement Components: Analysis Affects Versions: 2.3 Reporter: Michael McCandless Assignee: Michael McCandless Priority: Minor Fix For: 2.3 There is some "low hanging fruit" for optimizing the core tokenizers and analyzers: - Re-use a single Token instance during indexing instead of creating a new one for every term. To do this, I added a new method "Token next(Token result)" (Doron's suggestion) which means TokenStream may use the "Token result" as the returned Token, but is not required to (ie, can still return an entirely different Token if that is more convenient). I added default implementations for both next() methods in TokenStream.java so that a TokenStream can choose to implement only one of the next() methods. - Use "char[] termBuffer" in Token instead of the "String termText". Token now maintains a char[] termBuffer for holding the term's text. Tokenizers & filters should retrieve this buffer and directly alter it to put the term text in or change the term text. I only deprecated the termText() method. I still allow the ctors that pass in String termText, as well as setTermText(String), but added a NOTE about performance cost of using these methods. I think it's OK to keep these as convenience methods? After the next release, when we can remove the deprecated API, we should clean up Token.java to no longer maintain "either String or char[]" (and the initTermBuffer() private method) and always use the char[] termBuffer instead. - Re-use TokenStream instances across Fields & Documents instead of creating a new one for each doc. To do this I added an optional "reusableTokenStream(...)" to Analyzer which just defaults to calling tokenStream(...), and then I implemented this for the core analyzers. I'm using the patch from LUCENE-967 for benchmarking just tokenization. 
The changes above give 21% speedup (742 seconds -> 585 seconds) for LowerCaseTokenizer -> StopFilter -> PorterStemFilter chain, tokenizing all of Wikipedia, on JDK 1.6 -server -Xmx1024M, Debian Linux, RAID 5 IO system (best of 2 runs). If I pre-break Wikipedia docs into 100 token docs then it's 37% faster (1236 sec -> 774 sec), I think because of re-using TokenStreams across docs. I'm just running with this alg and recording the elapsed time: analyzer=org.apache.lucene.analysis.LowercaseStopPorterAnalyzer doc.tokenize.log.step=5 docs.file=/lucene/wikifull.txt doc.maker=org.apache.lucene.benchmark.byTask.feeds.LineDocMaker doc.tokenized=true doc.maker.forever=false {ReadTokens > : * See this thread for discussion leading up to this: http://www.gossamer-threads.com/lists/lucene/java-dev/51283 I also fixed Token.toString() to work correctly when termBuffer is used (and added unit test). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
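[Editor's note] Because filters get direct access to the char[] termBuffer described above, a filter such as a lowercasing or stemming filter can rewrite the term text in place instead of allocating a new String per token. A rough illustration (invented names, not the real Token class):

```java
// Sketch of in-place term mutation via a char[] buffer -- the per-token
// String allocation the termBuffer change is meant to avoid.
// Illustrative names only.
class BufferToken {
    char[] termBuffer;
    int termLength;

    void setTermBuffer(String s) {
        termBuffer = s.toCharArray();
        termLength = termBuffer.length;
    }

    String term() { return new String(termBuffer, 0, termLength); }
}

class LowerCaseSketch {
    // Lowercase directly in the buffer: no new String, no new char[].
    static void lowerCaseInPlace(BufferToken t) {
        for (int i = 0; i < t.termLength; i++) {
            t.termBuffer[i] = Character.toLowerCase(t.termBuffer[i]);
        }
    }
}
```

Combined with Token reuse, an entire filter chain can process millions of terms while touching only one buffer, which is where the speedups reported in this issue come from.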
Re: [VOTE] Migrate Lucene to JDK 1.5 for 3.0 release
+1 from me, too. Not because I have a vote or that I am for going to 1.5, but because it is inevitable and this is a well thought out, fine plan. (excepting the aggressive timeline that has been hashed out already in this thread) I'd like to point out that there is a consequence of this plan and how Lucene has done things in the past. At 1.9 it was fully compatible with 1.4.3, with deprecations. 2.0 mostly had deprecations removed and a few bug fixes. Then the 2.x series has been backwardly compatible but not with 1.x (except being able to read prior indexes, perhaps a few other things.). If we continue that same pattern, then there will be no 1.5 features in 2.9. (Otherwise it won't compile under 1.4). Thus, 3.0 will have a 1.4.2 compatible interface. And except for new classes, new methods and compile equivalent features (such as Enums), 1.5 features won't appear in the 3.x series API. I think it is very important to preserve the Lucene API where possible and reasonable, not changing it without gain. Given that this has been the practice, I don't think it is an issue. -- DM Smith On Jul 26, 2007, at 8:36 PM, Grant Ingersoll wrote: I propose we take the following path for migrating Lucene Java to JDK 1.5: 1. Put in any new deprecations we want, cleanups, etc. 2. Release 2.4 so all of Mike M's goodness is available to 1.4 users within the next 2-4 weeks using our new release mechanism (i.e code freeze, branch, documentation. I tentatively volunteer to be the RM, but hope someone will be my wingman on it). 3. Announce that 2.9 will be the last version under JDK 1.4 4. Put in any other deprecations that we want and do as we did when moving from 1.4.3 to 1.9 by laying out a migration plan, etc. 5. Release 2.9 as the last official release on JDK 1.4 6. Switch 3.0-dev to be on JDK 1.5, removing any deprecated code and updating ANT to use 1.5 for source and target. 7. 
Start accepting JDK 1.5 patches on 3.0-dev If possible, efforts should be made to identify people who are willing to backport 3.x changes to JDK 1.4 on 2.9 and give them branch commit rights, but this is not a strict requirement of this plan. Thus: +1 for JDK 1.5 as outlined in steps 1-7 0 if you don't care -1 if you are against it Since the weekend is coming up, how about we leave this vote open until Monday? You can see discussions of this here: http://www.gossamer- threads.com/lists/lucene/java-dev/51421 Here is my +1. Cheers, Grant - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[EMAIL PROTECTED]: Project lucene-java (in module lucene-java) failed
To whom it may engage... This is an automated request, but not an unsolicited one. For more information please visit http://gump.apache.org/nagged.html, and/or contact the folk at [EMAIL PROTECTED] Project lucene-java has an issue affecting its community integration. This issue affects 3 projects, and has been outstanding for 24 runs. The current state of this project is 'Failed', with reason 'Build Failed'. For reference only, the following projects are affected by this: - eyebrowse : Web-based mail archive browsing - jakarta-lucene : Java Based Search Engine - lucene-java : Java Based Search Engine Full details are available at: http://vmgump.apache.org/gump/public/lucene-java/lucene-java/index.html That said, some information snippets are provided here. The following annotations (debug/informational/warning/error messages) were provided: -DEBUG- Sole output [lucene-core-30072007.jar] identifier set to project name -DEBUG- Dependency on javacc exists, no need to add for property javacc.home. 
-INFO- Failed with reason build failed -INFO- Failed to extract fallback artifacts from Gump Repository The following work was performed: http://vmgump.apache.org/gump/public/lucene-java/lucene-java/gump_work/build_lucene-java_lucene-java.html Work Name: build_lucene-java_lucene-java (Type: Build) Work ended in a state of : Failed Elapsed: 34 secs Command Line: /usr/lib/jvm/java-1.5.0-sun/bin/java -Djava.awt.headless=true -Xbootclasspath/p:/srv/gump/public/workspace/xml-commons/java/external/build/xml-apis.jar:/srv/gump/public/workspace/xml-xerces2/build/xercesImpl.jar org.apache.tools.ant.Main -Dgump.merge=/srv/gump/public/gump/work/merge.xml -Dbuild.sysclasspath=only -Dversion=30072007 -Djavacc.home=/srv/gump/packages/javacc-3.1 package [Working Directory: /srv/gump/public/workspace/lucene-java] CLASSPATH: /usr/lib/jvm/java-1.5.0-sun/lib/tools.jar:/srv/gump/public/workspace/lucene-java/build/classes/java:/srv/gump/public/workspace/lucene-java/build/classes/demo:/srv/gump/public/workspace/lucene-java/build/classes/test:/srv/gump/public/workspace/lucene-java/contrib/db/bdb/lib/db-4.3.29.jar:/srv/gump/public/workspace/lucene-java/contrib/gdata-server/lib/gdata-client-1.0.jar:/srv/gump/public/workspace/lucene-java/build/contrib/analyzers/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/ant/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/benchmark/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/db/bdb/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/db/bdb-je/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/gdata-server/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/highlighter/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/javascript/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/lucli/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/memory/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/quer
ies/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/regex/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/similarity/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/snowball/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/spellchecker/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/surround/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/swing/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/wordnet/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/xml-query-parser/classes/java:/srv/gump/public/workspace/ant/dist/lib/ant-jmf.jar:/srv/gump/public/workspace/ant/dist/lib/ant-swing.jar:/srv/gump/public/workspace/ant/dist/lib/ant-apache-resolver.jar:/srv/gump/public/workspace/ant/dist/lib/ant-trax.jar:/srv/gump/public/workspace/ant/dist/lib/ant-junit.jar:/srv/gump/public/workspace/ant/dist/lib/ant-launcher.jar:/srv/gump/public/workspace/ant/dist/lib/ant-nodeps.jar:/srv/gump/public/workspace/ant/dist/lib/ant.jar:/srv/gump/packages/junit3.8.1/junit.jar:/srv/gump/public/workspace/xml-commons/java/build/resolver.jar:/srv/gump/packages/je-1.7.1/lib/je.jar:/srv/gump/public/workspace/apache-commons/digester/dist/commons-digester.jar:/srv/gump/public/workspace/jakarta-regexp/build/jakarta-regexp-30072007.jar:/srv/gump/packages/javacc-3.1/bin/lib/javacc.jar:/srv/gump/public/workspace/jline/target/jline-0.9.92-SNAPSHOT.jar:/srv/gump/packages/jtidy-04aug2000r7-dev/build/Tidy.jar:/srv/gump/public/workspace/junit/dist/junit-30072007.jar:/srv/gump/public/workspace/xml-commons/java/external/build/xml-apis-ext.jar:/srv/gump/public/workspace/apache-commons/logging/target/commons-logging-30072007.jar:/srv/gump/public/workspace/apache-commons/logging/target/commons-logging-api-30072007.jar:/srv/gump/public/workspace/jakarta-servletapi-5/jsr154/dist/lib/servlet-api.jar:/srv/gump/packages/nekoh