[jira] Commented: (LUCENE-1941) MinPayloadFunction returns 0 when only one payload is present
[ https://issues.apache.org/jira/browse/LUCENE-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12833552#action_12833552 ] Erik Hatcher commented on LUCENE-1941: -- Uwe - patch looks good. Go for it! MinPayloadFunction returns 0 when only one payload is present - Key: LUCENE-1941 URL: https://issues.apache.org/jira/browse/LUCENE-1941 Project: Lucene - Java Issue Type: Bug Components: Query/Scoring Affects Versions: 2.9, 3.0 Reporter: Erik Hatcher Assignee: Uwe Schindler Fix For: 2.9.2, 3.0.1, 3.1 Attachments: LUCENE-1941.patch, LUCENE-1941.patch In some experiments with payload scoring through PayloadTermQuery, I'm seeing 0 returned when using MinPayloadFunction. I believe there is a bug there. No time at the moment to flesh out a unit test, but wanted to report it for tracking. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1941) MinPayloadFunction returns 0 when only one payload is present
[ https://issues.apache.org/jira/browse/LUCENE-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832932#action_12832932 ] Erik Hatcher commented on LUCENE-1941: -- Feel free to adjust this issue to whichever Lucene version makes sense. I don't have bandwidth at the moment to address this myself. MinPayloadFunction returns 0 when only one payload is present - Key: LUCENE-1941 URL: https://issues.apache.org/jira/browse/LUCENE-1941 Project: Lucene - Java Issue Type: Bug Components: Query/Scoring Affects Versions: 2.9, 3.0 Reporter: Erik Hatcher Fix For: 2.9.2, 3.0.1, 3.1 In some experiments with payload scoring through PayloadTermQuery, I'm seeing 0 returned when using MinPayloadFunction. I believe there is a bug there. No time at the moment to flesh out a unit test, but wanted to report it for tracking. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2238) deprecate ChineseAnalyzer
[ https://issues.apache.org/jira/browse/LUCENE-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12806239#action_12806239 ] Erik Hatcher commented on LUCENE-2238: -- +1 deprecate ChineseAnalyzer - Key: LUCENE-2238 URL: https://issues.apache.org/jira/browse/LUCENE-2238 Project: Lucene - Java Issue Type: Task Components: contrib/analyzers Reporter: Robert Muir Priority: Minor Fix For: 3.1 Attachments: LUCENE-2238.patch The ChineseAnalyzer, ChineseTokenizer, and ChineseFilter (not the smart one, or CJK) indexes chinese text as individual characters and removes english stopwords, etc. In my opinion we should simply deprecate all of this in favor of StandardAnalyzer, StandardTokenizer, and StopFilter, which does the same thing. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Resolved: (LUCENE-2231) my lucene project is able to search single time how can make it as long as i can
[ https://issues.apache.org/jira/browse/LUCENE-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Hatcher resolved LUCENE-2231. -- Resolution: Not A Problem Please ask support questions on the java-user list. Also (bias noted here), the book Lucene in Action will help you out immensely with these getting started questions. my lucene project is able to search single time how can make it as long as i can Key: LUCENE-2231 URL: https://issues.apache.org/jira/browse/LUCENE-2231 Project: Lucene - Java Issue Type: Wish Affects Versions: 2.9.1 Reporter: sameeuddin Mohammed Priority: Critical Original Estimate: 5h Remaining Estimate: 5h i am using lucene with netbeans 6.5 when i execute my project it will show only single time next time there are no results in search and i want to know how to match lower case and higher case as same, and when i have i word for ex simpletext i want to search for only simple plz send my reply as soon as possible -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2198) support protected words in Stemming TokenFilters
[ https://issues.apache.org/jira/browse/LUCENE-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12799839#action_12799839 ] Erik Hatcher commented on LUCENE-2198: -- +1 on the StemAttribute approach. I've just encountered this exact need in some custom code I've been reviewing, where the decision to stem or not is dynamic per term (with the approach I'm looking at using a custom term type string and a custom stem filter). support protected words in Stemming TokenFilters Key: LUCENE-2198 URL: https://issues.apache.org/jira/browse/LUCENE-2198 Project: Lucene - Java Issue Type: Improvement Components: Analysis Affects Versions: 3.0 Reporter: Robert Muir Priority: Minor This is from LUCENE-1515 I propose that all stemming TokenFilters have an 'exclusion set' that bypasses any stemming for words in this set. Some stemming tokenfilters have this, some do not. This would be one way for Karl to implement his new swedish stemmer (as a text file of ignore words). Additionally, it would remove duplication between lucene and solr, as they reimplement snowballfilter since it does not have this functionality. Finally, I think this is a pretty common use case, where people want to ignore things like proper nouns in the stemming. As an alternative design I considered a case where we generalized this to CharArrayMap (and ignoring words would mean mapping them to themselves), which would also provide a mechanism to override the stemming algorithm. But I think this is too expert, could be its own filter, and the only example of this i can find is in the Dutch stemmer. So I think we should just provide ignore with CharArraySet, but if you feel otherwise please comment. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
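The exclusion-set idea proposed in this issue can be illustrated with a small, self-contained sketch. It does not use Lucene's API: the class name ProtectedStemSketch, the method names, and the toy "strip a trailing s" stemmer are all hypothetical stand-ins chosen for illustration, and a plain java.util.Set plays the role the issue assigns to CharArraySet.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; none of these names come from Lucene.
final class ProtectedStemSketch {

    // Toy stemmer: strips one trailing "s". Stands in for a real stemming algorithm.
    static String toyStem(String term) {
        if (term.length() > 1 && term.endsWith("s")) {
            return term.substring(0, term.length() - 1);
        }
        return term;
    }

    // Terms found in the exclusion set bypass stemming and pass through unchanged.
    static List<String> stemAll(List<String> terms, Set<String> exclusions) {
        List<String> out = new ArrayList<>();
        for (String term : terms) {
            out.add(exclusions.contains(term) ? term : toyStem(term));
        }
        return out;
    }
}
```

With the terms "cars" and "james" and an exclusion set containing "james", stemAll returns "car" and "james": the proper noun is left alone while the ordinary plural is stemmed.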
[jira] Created: (LUCENE-1941) MinPayloadFunction returns 0 when only one payload is present
MinPayloadFunction returns 0 when only one payload is present - Key: LUCENE-1941 URL: https://issues.apache.org/jira/browse/LUCENE-1941 Project: Lucene - Java Issue Type: Bug Components: Query/Scoring Affects Versions: 2.9 Reporter: Erik Hatcher In some experiments with payload scoring through PayloadTermQuery, I'm seeing 0 returned when using MinPayloadFunction. I believe there is a bug there. No time at the moment to flesh out a unit test, but wanted to report it for tracking. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
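The report above does not identify a cause. One plausible way a running minimum can yield 0 for a single payload, offered here purely as an assumption, is an accumulator that starts at 0 and is folded with Math.min. The sketch below is self-contained and does not use Lucene's classes; RunningMinSketch, naiveMin, and guardedMin are hypothetical names.

```java
// Hypothetical illustration only; this is not Lucene's MinPayloadFunction.
final class RunningMinSketch {

    // Naive fold: the accumulator starts at 0, so min(payload, 0) stays 0
    // for any positive payload, including the single-payload case.
    static float naiveMin(float[] payloads) {
        float score = 0f;
        for (float payload : payloads) {
            score = Math.min(payload, score);
        }
        return score;
    }

    // Guarded fold: the first payload seen seeds the accumulator,
    // and only later payloads are compared against it.
    static float guardedMin(float[] payloads) {
        float score = 0f;
        int seen = 0;
        for (float payload : payloads) {
            score = (seen == 0) ? payload : Math.min(payload, score);
            seen++;
        }
        return score;
    }
}
```

For the single payload 3.5, naiveMin returns 0 while guardedMin returns 3.5. Whether the actual defect has this shape would need to be confirmed against the attached patch.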
[jira] Commented: (LUCENE-1938) Precedence query parser using the contrib/queryparser framework
[ https://issues.apache.org/jira/browse/LUCENE-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12761405#action_12761405 ] Erik Hatcher commented on LUCENE-1938: -- Yes, let's just remove the old PrecedenceQueryParser (which was just an experiment by me - is anyone actually using it?) Precedence query parser using the contrib/queryparser framework --- Key: LUCENE-1938 URL: https://issues.apache.org/jira/browse/LUCENE-1938 Project: Lucene - Java Issue Type: New Feature Components: contrib/* Affects Versions: 2.9 Reporter: Adriano Crestani Assignee: Adriano Crestani Priority: Minor Fix For: 3.1 Attachments: LUCENE-1938.patch Extend the current StandardQueryParser on contrib so it supports boolean precedence -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Created: (LUCENE-1850) Update overview example code
Update overview example code Key: LUCENE-1850 URL: https://issues.apache.org/jira/browse/LUCENE-1850 Project: Lucene - Java Issue Type: Task Components: Examples, Javadocs Reporter: Erik Hatcher Fix For: 2.9 See http://lucene.apache.org/java/2_4_1/api/core/overview-summary.html - need to update for non-deprecated best-practices/recommended API usage. Also, double-check that the demo app works as documented. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Resolved: (LUCENE-1806) Add args to test-macro
[ https://issues.apache.org/jira/browse/LUCENE-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Hatcher resolved LUCENE-1806. -- Resolution: Fixed Done, thanks Jason. Add args to test-macro -- Key: LUCENE-1806 URL: https://issues.apache.org/jira/browse/LUCENE-1806 Project: Lucene - Java Issue Type: Improvement Components: Build Affects Versions: 2.4.1 Reporter: Jason Rutherglen Priority: Trivial Fix For: 2.9 Attachments: LUCENE-1806.patch Original Estimate: 0.03h Remaining Estimate: 0.03h Add passing args to JUnit. (Like Solr and mainly for debugging). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1800) QueryParser should use reusable token streams
[ https://issues.apache.org/jira/browse/LUCENE-1800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12742781#action_12742781 ] Erik Hatcher commented on LUCENE-1800: -- Does anyone use PrecedenceQueryParser? It was an experiment tossed out there, but I've not heard of anyone using it for real. QueryParser should use reusable token streams - Key: LUCENE-1800 URL: https://issues.apache.org/jira/browse/LUCENE-1800 Project: Lucene - Java Issue Type: Improvement Affects Versions: 2.9 Reporter: Yonik Seeley Assignee: Yonik Seeley Fix For: 2.9 Attachments: LUCENE-1800.patch, LUCENE-1800_analyzingQP.patch Just like indexing, the query parser should use reusable token streams -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Resolved: (LUCENE-1405) Support for new Resources model in ant 1.7 in Lucene ant task.
[ https://issues.apache.org/jira/browse/LUCENE-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Hatcher resolved LUCENE-1405. -- Resolution: Fixed Przemyslaw - apologies for the delay in addressing this valuable patch. It's now been tested and committed. I also added a comment to example.xml showing how to run the index task from a source checkout. Support for new Resources model in ant 1.7 in Lucene ant task. -- Key: LUCENE-1405 URL: https://issues.apache.org/jira/browse/LUCENE-1405 Project: Lucene - Java Issue Type: Improvement Components: contrib/* Affects Versions: 2.3.2 Reporter: Przemyslaw Sztoch Assignee: Erik Hatcher Fix For: 2.9 Attachments: lucene-ant1.7-newresources.patch Ant Task for Lucene should use modern Resource model (not only FileSet child element). There is a patch with required changes. Supported by old (ant 1.6) and new (ant 1.7) resources model: <index> <!-- Lucene Ant Task --> <fileset ... /> </index> Supported only by new (ant 1.7) resources model: <index> <!-- Lucene Ant Task --> <filelist ... /> </index> <index> <!-- Lucene Ant Task --> <userdefinied-filesource ... /> </index>
[jira] Assigned: (LUCENE-1405) Support for new Resources model in ant 1.7 in Lucene ant task.
[ https://issues.apache.org/jira/browse/LUCENE-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Hatcher reassigned LUCENE-1405: Assignee: Erik Hatcher Support for new Resources model in ant 1.7 in Lucene ant task. -- Key: LUCENE-1405 URL: https://issues.apache.org/jira/browse/LUCENE-1405 Project: Lucene - Java Issue Type: Improvement Components: contrib/* Affects Versions: 2.3.2 Reporter: Przemyslaw Sztoch Assignee: Erik Hatcher Fix For: 2.9 Attachments: lucene-ant1.7-newresources.patch Ant Task for Lucene should use modern Resource model (not only FileSet child element). There is a patch with required changes. Supported by old (ant 1.6) and new (ant 1.7) resources model: <index> <!-- Lucene Ant Task --> <fileset ... /> </index> Supported only by new (ant 1.7) resources model: <index> <!-- Lucene Ant Task --> <filelist ... /> </index> <index> <!-- Lucene Ant Task --> <userdefinied-filesource ... /> </index>
[jira] Issue Comment Edited: (LUCENE-1629) contrib intelligent Analyzer for Chinese
[ https://issues.apache.org/jira/browse/LUCENE-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12708912#action_12708912 ] Erik Hatcher edited comment on LUCENE-1629 at 5/13/09 5:58 AM: --- My initial thought is to move the copy excluding {noformat} **/*.java and **/*.html{noformat} to the compile macro. In the ancient past, Ant actually used to do this automatically with javac. was (Author: ehatcher): My initial thought is to move the copy excluding **/*.java and **/*.html to the compile macro. In the ancient past, Ant actually used to do this automatically with javac. contrib intelligent Analyzer for Chinese Key: LUCENE-1629 URL: https://issues.apache.org/jira/browse/LUCENE-1629 Project: Lucene - Java Issue Type: Improvement Components: contrib/analyzers Affects Versions: 2.4.1 Environment: for java 1.5 or higher, lucene 2.4.1 Reporter: Xiaoping Gao Assignee: Michael McCandless Fix For: 2.9 Attachments: analysis-data.zip, bigramdict.mem, build-resources.patch, coredict.mem, LUCENE-1629-java1.4.patch I wrote an Analyzer for Apache Lucene for analyzing sentences in the Chinese language. It's called imdict-chinese-analyzer; the project on Google Code is here: http://code.google.com/p/imdict-chinese-analyzer/ In Chinese, 我是中国人 (I am Chinese) should be tokenized as 我 (I) 是 (am) 中国人 (Chinese), not 我 是中 国人. So the analyzer must handle each sentence properly, or there will be misunderstandings everywhere in the index constructed by Lucene, and the accuracy of the search engine will be seriously affected! Although there are two analyzer packages in the Apache repository that can handle Chinese, ChineseAnalyzer and CJKAnalyzer, they take each character or every two adjoining characters as a single word. This is obviously not true in reality; this strategy will also increase the index size and hurt performance badly.
The algorithm of imdict-chinese-analyzer is based on the Hidden Markov Model (HMM), so it can tokenize Chinese sentences in a really intelligent way. The tokenization accuracy of this model is above 90% according to the paper HHMM-based Chinese Lexical analyzer ICTCLAS, while that of other analyzers is about 60%. As imdict-chinese-analyzer is really fast and intelligent, I want to contribute it to the Apache Lucene repository.
[jira] Commented: (LUCENE-1629) contrib intelligent Analyzer for Chinese
[ https://issues.apache.org/jira/browse/LUCENE-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12708912#action_12708912 ] Erik Hatcher commented on LUCENE-1629: -- My initial thought is to move the copy excluding **/*.java and **/*.html to the compile macro. In the ancient past, Ant actually used to do this automatically with javac. contrib intelligent Analyzer for Chinese Key: LUCENE-1629 URL: https://issues.apache.org/jira/browse/LUCENE-1629 Project: Lucene - Java Issue Type: Improvement Components: contrib/analyzers Affects Versions: 2.4.1 Environment: for java 1.5 or higher, lucene 2.4.1 Reporter: Xiaoping Gao Assignee: Michael McCandless Fix For: 2.9 Attachments: analysis-data.zip, bigramdict.mem, build-resources.patch, coredict.mem, LUCENE-1629-java1.4.patch I wrote an Analyzer for Apache Lucene for analyzing sentences in the Chinese language. It's called imdict-chinese-analyzer; the project on Google Code is here: http://code.google.com/p/imdict-chinese-analyzer/ In Chinese, 我是中国人 (I am Chinese) should be tokenized as 我 (I) 是 (am) 中国人 (Chinese), not 我 是中 国人. So the analyzer must handle each sentence properly, or there will be misunderstandings everywhere in the index constructed by Lucene, and the accuracy of the search engine will be seriously affected! Although there are two analyzer packages in the Apache repository that can handle Chinese, ChineseAnalyzer and CJKAnalyzer, they take each character or every two adjoining characters as a single word. This is obviously not true in reality; this strategy will also increase the index size and hurt performance badly. The algorithm of imdict-chinese-analyzer is based on the Hidden Markov Model (HMM), so it can tokenize Chinese sentences in a really intelligent way. The tokenization accuracy of this model is above 90% according to the paper HHMM-based Chinese Lexical analyzer ICTCLAS, while that of other analyzers is about 60%. As imdict-chinese-analyzer is really fast and intelligent,
I want to contribute it to the Apache Lucene repository.
[jira] Commented: (LUCENE-1314) IndexReader.clone
[ https://issues.apache.org/jira/browse/LUCENE-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12662475#action_12662475 ] Erik Hatcher commented on LUCENE-1314: -- {quote} Is there a way with ant to only test one test case? Tried: ant -Dtestcase=org.apache.lucene.index.TestIndexReaderReopen test-core which according to the Wiki http://wiki.apache.org/lucene-java/HowToContribute should work. {quote} The value of the testcase parameter fits in this way **/${testcase}.java in common-build.xml, so in your case it'd be -Dtestcase=TestIndexReaderReopen IndexReader.clone - Key: LUCENE-1314 URL: https://issues.apache.org/jira/browse/LUCENE-1314 Project: Lucene - Java Issue Type: New Feature Components: Index Affects Versions: 2.3.1 Reporter: Jason Rutherglen Assignee: Michael McCandless Priority: Minor Fix For: 2.9 Attachments: LUCENE-1314.patch, LUCENE-1314.patch, LUCENE-1314.patch, LUCENE-1314.patch, LUCENE-1314.patch, LUCENE-1314.patch, LUCENE-1314.patch, LUCENE-1314.patch, LUCENE-1314.patch, LUCENE-1314.patch, lucene-1314.patch, lucene-1314.patch, lucene-1314.patch, lucene-1314.patch, lucene-1314.patch, lucene-1314.patch, lucene-1314.patch, lucene-1314.patch, lucene-1314.patch, lucene-1314.patch, lucene-1314.patch, lucene-1314.patch Based on discussion http://www.nabble.com/IndexReader.reopen-issue-td18070256.html. The problem is reopen returns the same reader if there are no changes, so if docs are deleted from the new reader, they are also reflected in the previous reader which is not always desired behavior. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1387) Add LocalLucene
[ https://issues.apache.org/jira/browse/LUCENE-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12658062#action_12658062 ] Erik Hatcher commented on LUCENE-1387: -- I've taken some quick peeks into the code and run the unit tests. Nicely packaged and presented! A couple of thoughts: * Maybe the Filters should be using the DocIdSet API rather than the deprecated BitSet stuff? We can refactor that after it is committed, I suppose, but it's not something we want to leave like that. * DistanceQuery is awkwardly named. It's not a Query (it doesn't extend Query); it's a POJO with helpers. Maybe DistanceQueryFactory? (but it creates a Filter also) * CartesianPolyFilter is not a Filter (but CartesianShapeFilter is) I think this looks good enough to commit as well, just noting the above for cosmetic refactoring consideration after the code is in. Add LocalLucene --- Key: LUCENE-1387 URL: https://issues.apache.org/jira/browse/LUCENE-1387 Project: Lucene - Java Issue Type: New Feature Components: contrib/* Reporter: Grant Ingersoll Priority: Minor Attachments: spatial-lucene.zip, spatial.tar.gz, spatial.zip Local Lucene (Geo-search) has been donated to the Lucene project, per https://issues.apache.org/jira/browse/INCUBATOR-77. This issue is to handle the Lucene portion of integration. See http://lucene.markmail.org/message/orzro22sqdj3wows?q=LocalLucene
[jira] Commented: (LUCENE-1061) Adding a factory to QueryParser to instantiate query instances
[ https://issues.apache.org/jira/browse/LUCENE-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12626499#action_12626499 ] Erik Hatcher commented on LUCENE-1061: -- Michael - you are a machine! +1 to the subclassing approach and your general patch. What might be even more interesting is to make the newXXX methods return Query instead of a specific type. I'm not sure if that would work in all cases (surely not for BooleanQuery), but might for most of 'em. For example, what if newTermQuery(Term term) returned a Query instead of a TermQuery? That'd add a fair bit more flexibility, as long as none of the calling code needed a specific type of Query. The hoops we jump through because we're in Java sheesh. :) Adding a factory to QueryParser to instantiate query instances -- Key: LUCENE-1061 URL: https://issues.apache.org/jira/browse/LUCENE-1061 Project: Lucene - Java Issue Type: Improvement Components: QueryParser Affects Versions: 2.3 Reporter: John Wang Assignee: Michael McCandless Fix For: 2.4 Attachments: LUCENE-1061.patch, lucene_patch.txt With the new efforts with Payload and scoring functions, it would be nice to plugin custom query implementations while using the same QueryParser. Included is a patch with some refactoring the QueryParser to take a factory that produces query instances. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
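The suggestion in this comment, overridable newXXX methods that return the general Query type, can be sketched in miniature. Everything below is a hypothetical stand-in written for illustration: the Query interface, TermQuery, Parser, and PayloadParser here are toy types, not Lucene's classes, and the real QueryParser signatures may differ.

```java
// Hypothetical toy model of the factory-method idea; not Lucene's API.
final class FactoryMethodSketch {

    interface Query {
        String describe();
    }

    static class TermQuery implements Query {
        final String field;
        final String text;

        TermQuery(String field, String text) {
            this.field = field;
            this.text = text;
        }

        public String describe() {
            return field + ":" + text;
        }
    }

    static class Parser {
        // Declared to return Query, so subclasses may return any implementation.
        protected Query newTermQuery(String field, String text) {
            return new TermQuery(field, text);
        }

        // The parsing code only relies on the general Query type.
        Query parse(String field, String text) {
            return newTermQuery(field, text);
        }
    }

    static class PayloadParser extends Parser {
        @Override
        protected Query newTermQuery(String field, String text) {
            // A custom query that is not a TermQuery subclass.
            return () -> "payload(" + field + ":" + text + ")";
        }
    }
}
```

Because parse never depends on a concrete type, PayloadParser can substitute an unrelated Query implementation. This only works, as the comment notes, as long as none of the calling code needs a specific type of Query.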
[jira] Commented: (LUCENE-1061) Adding a factory to QueryParser to instantiate query instances
[ https://issues.apache.org/jira/browse/LUCENE-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12626055#action_12626055 ] Erik Hatcher commented on LUCENE-1061: -- What's wrong with just subclassing QueryParser and overriding the desired methods? Either way someone wanting to provide custom Query implementations will be writing effectively the same code, just with more indirection with this method. Adding a factory to QueryParser to instantiate query instances -- Key: LUCENE-1061 URL: https://issues.apache.org/jira/browse/LUCENE-1061 Project: Lucene - Java Issue Type: Improvement Components: QueryParser Affects Versions: 2.3 Reporter: John Wang Fix For: 2.4 Attachments: lucene_patch.txt With the new efforts with Payload and scoring functions, it would be nice to plugin custom query implementations while using the same QueryParser. Included is a patch with some refactoring the QueryParser to take a factory that produces query instances. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Commented: (LUCENE-1343) A replacement for ISOLatin1AccentFilter that does a more thorough job of removing diacritical marks or non-spacing modifiers.
[ https://issues.apache.org/jira/browse/LUCENE-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12622476#action_12622476 ] Erik Hatcher commented on LUCENE-1343: -- {quote} Unit tests are the best way to document the many ways this thing can work. {quote} gets a judges score of 11 from me. Gold for Lance for Quote of the Day. A replacement for ISOLatin1AccentFilter that does a more thorough job of removing diacritical marks or non-spacing modifiers. - Key: LUCENE-1343 URL: https://issues.apache.org/jira/browse/LUCENE-1343 Project: Lucene - Java Issue Type: Improvement Components: Analysis Reporter: Robert Haschart Priority: Minor Attachments: normalizer.jar, UnicodeCharUtil.java, UnicodeNormalizationFilter.java, UnicodeNormalizationFilterFactory.java The ISOLatin1AccentFilter takes Unicode characters that have diacritical marks and replaces them with a version of that character with the diacritical mark removed. For example é becomes e. However another equally valid way of representing an accented character in Unicode is to have the unaccented character followed by a non-spacing modifier character (like this: é ) The ISOLatin1AccentFilter doesn't handle the accents in decomposed unicode characters at all.Additionally there are some instances where a word will contain what looks like an accented character, that is actually considered to be a separate unaccented character such as Ł but which to make searching easier you want to fold onto the latin1 lookalike version L . The UnicodeNormalizationFilter can filter out accents and diacritical marks whether they occur as composed characters or decomposed characters, it can also handle cases where as described above characters that look like they have diacritics (but don't) are to be folded onto the letter that they look like ( Ł - L ) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. 
- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
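The folding behaviour described in LUCENE-1343 can be approximated with the JDK alone. The sketch below is not the attached UnicodeNormalizationFilter; it is a minimal illustration, assuming NFD decomposition followed by removal of combining marks, plus one explicit mapping for Ł, which has no canonical decomposition. The class and method names are hypothetical.

```java
import java.text.Normalizer;

// Hypothetical illustration only; not the filter attached to the issue.
final class AccentFoldSketch {

    static String fold(String input) {
        // Decompose composed characters, e.g. U+00E9 becomes 'e' followed by U+0301.
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // Remove combining marks, whether they came from the decomposition
        // or were already present in the input as decomposed characters.
        String stripped = decomposed.replaceAll("\\p{M}+", "");
        // L with stroke does not decompose, so it is folded explicitly.
        return stripped.replace('\u0141', 'L').replace('\u0142', 'l');
    }
}
```

Both the composed and the decomposed spelling of é fold to "e", and "Łódź" folds to "Lodz". A production filter would need a much larger table of lookalike characters than the single Ł mapping shown here.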
[jira] Commented: (LUCENE-1095) StopFilter should have option to incr positionIncrement after stop word
[ https://issues.apache.org/jira/browse/LUCENE-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12552905 ] Erik Hatcher commented on LUCENE-1095: -- I believe QueryParser has been fixed since that first change I made, mentioned by Steven, to account for positions returned from an Analyzer. So maybe all is well with fixing StopFilter now. Unit tests needed :) StopFilter should have option to incr positionIncrement after stop word --- Key: LUCENE-1095 URL: https://issues.apache.org/jira/browse/LUCENE-1095 Project: Lucene - Java Issue Type: Improvement Reporter: Hoss Man I've seen this come up on the mailing list a few times in the last month, so I'm filing a known bug/improvement around it... StopFilter should have an option that, if set, records how many stop words are skipped in a row, and then sets that value as the positionIncrement on the next token that StopFilter does return.
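The option described in this issue can be sketched without Lucene's TokenStream machinery. The code below is a hypothetical, self-contained model: StopGapSketch and Tok are invented names, and it assumes each kept token has a default increment of 1 to which the number of skipped stop words is added.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not Lucene's StopFilter.
final class StopGapSketch {

    // A kept token and its position increment relative to the previous kept token.
    static final class Tok {
        final String text;
        final int posInc;

        Tok(String text, int posInc) {
            this.text = text;
            this.posInc = posInc;
        }

        @Override
        public String toString() {
            return text + "(+" + posInc + ")";
        }
    }

    static List<Tok> filter(List<String> words, Set<String> stopWords) {
        List<Tok> out = new ArrayList<>();
        int skipped = 0; // stop words dropped since the last kept token
        for (String word : words) {
            if (stopWords.contains(word)) {
                skipped++;
                continue;
            }
            out.add(new Tok(word, 1 + skipped));
            skipped = 0;
        }
        return out;
    }
}
```

For the input "the quick and the fox" with stop words "the" and "and", the result is quick(+2) and fox(+3), so the gaps left by the removed words remain visible to position-sensitive queries such as phrase queries.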
[jira] Commented: (LUCENE-167) [PATCH] QueryParser not handling queries containing AND and OR
[ https://issues.apache.org/jira/browse/LUCENE-167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12549046 ] Erik Hatcher commented on LUCENE-167: - The PrecedenceQueryParser is in the contrib/miscellaneous codebase (in Lucene's repo) and in the released miscellaneous JAR. But it has some issues that are documented in the test case, so it is definitely not ready for prime time. [PATCH] QueryParser not handling queries containing AND and OR -- Key: LUCENE-167 URL: https://issues.apache.org/jira/browse/LUCENE-167 Project: Lucene - Java Issue Type: Bug Components: QueryParser Affects Versions: unspecified Environment: Operating System: Linux Platform: PC Reporter: Morus Walter Assignee: Erik Hatcher Attachments: LuceneTest.java, QueryParser.jj.patch, QueryParser.patch The QueryParser does not seem to handle boolean queries containing AND and OR operators correctly: e.g. a AND b OR c AND d gets parsed as +a +b +c +d. The attached patch fixes this by changing the vector of boolean clauses into a vector of vectors of boolean clauses in the addClause method of the query parser. A new sub-vector is created whenever an explicit OR operator is used. Queries using explicit AND/OR are grouped by precedence of AND over OR. That is, a OR b AND c gets a OR (b AND c). Queries using implicit AND/OR (depending on the default operator) are handled as before (so one can still use a +b -c to create one boolean query, where b is required, c forbidden and a optional). It's less clear how a query using both explicit AND/OR and implicit operators should be handled. Since the patch groups on explicit OR operators, a query a OR b c is read as a (b c), whereas a AND b c is read as +a +b c (given that the default operator OR is used). There's one issue left: The old query parser reads a query `a OR NOT b' as `a -b' which is the same as `a AND NOT b'. The modified query parser reads this as `a (-b)'. 
While this looks better (at least to me), it does not produce the result of a OR NOT b. Instead the (-b) part seems to be silently dropped. While I understand that this query is illegal (just searching for one negative term), I don't think that silently dropping this part is an appropriate way to deal with that. But I don't think that's a query parser issue. The only question is whether the query parser should take care of that. I attached the patch (made against 1.3rc3 but working for 1.3final as well) and a test program. The test program parses a number of queries with the default-or and default-and operators and reparses the result of the toString method of the created query. It outputs the initial query, the parsed query with default or, the reparsed query, the parsed query with default and, and its reparsed query. If called with a -q option, it also runs the queries against an index consisting of all documents containing one or none of a, b, c or d. Using an unpatched and a patched version of Lucene in the classpath, one can look at the effect of the patch in detail.
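The "vector of vectors" grouping described in this issue can be illustrated with a small self-contained sketch. It is not the attached patch: AndOrGroupingSketch and its methods are hypothetical names, the input is split on whitespace, and every operator is assumed to be an explicit AND or OR (implicit operators, NOT, and the +/- prefixes are deliberately out of scope).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the QueryParser patch from the issue.
final class AndOrGroupingSketch {

    // Starts a new sub-list at every explicit OR; terms joined by AND share a sub-list.
    static List<List<String>> group(String query) {
        List<List<String>> groups = new ArrayList<>();
        List<String> current = new ArrayList<>();
        groups.add(current);
        for (String token : query.trim().split("\\s+")) {
            if (token.equals("OR")) {
                current = new ArrayList<>();
                groups.add(current);
            } else if (!token.equals("AND")) {
                current.add(token);
            }
        }
        return groups;
    }

    // Renders the groups with explicit parentheses to make the precedence visible.
    static String render(List<List<String>> groups) {
        StringBuilder sb = new StringBuilder();
        for (List<String> group : groups) {
            if (sb.length() > 0) {
                sb.append(" OR ");
            }
            sb.append("(").append(String.join(" AND ", group)).append(")");
        }
        return sb.toString();
    }
}
```

Rendering the groups for "a AND b OR c AND d" gives "(a AND b) OR (c AND d)", and "a OR b AND c" gives "(a) OR (b AND c)", which matches the AND-over-OR precedence the issue describes.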
[jira] Commented: (LUCENE-1049) Simple toString() for BooleanFilter
[ https://issues.apache.org/jira/browse/LUCENE-1049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12541497 ] Erik Hatcher commented on LUCENE-1049: -- Jason - the patch looks like it was generated backwards (minus signs, not plusses). Simple toString() for BooleanFilter --- Key: LUCENE-1049 URL: https://issues.apache.org/jira/browse/LUCENE-1049 Project: Lucene - Java Issue Type: Improvement Components: contrib/* Reporter: Jason Calabrese Priority: Trivial Attachments: patch.txt While working with BooleanFilter I wanted a basic toString() for debugging. This is what I came up with. It works ok for me.
[jira] Assigned: (LUCENE-961) RegexCapabilities is not Serializable
[ https://issues.apache.org/jira/browse/LUCENE-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Hatcher reassigned LUCENE-961: --- Assignee: Erik Hatcher RegexCapabilities is not Serializable - Key: LUCENE-961 URL: https://issues.apache.org/jira/browse/LUCENE-961 Project: Lucene - Java Issue Type: Bug Components: QueryParser Affects Versions: 2.2 Reporter: Konrad Rokicki Assignee: Erik Hatcher Priority: Minor The class RegexQuery is marked Serializable by its super class, but it contains a RegexCapabilities which is not Serializable. Thus attempting to serialize the query results in an exception. Making RegexCapabilities serializable should be no problem since its subclasses contain only serializable classes (java.util.regex.Pattern and org.apache.regexp.RE). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
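The failure mode reported here is standard Java serialization behavior and can be reproduced without Lucene. The sketch below uses hypothetical stand-in classes (QueryLike, PlainCapabilities, SerializableCapabilities), not the real RegexQuery or RegexCapabilities: a Serializable object holding a non-Serializable field fails at write time, and marking the field's class Serializable is enough to fix it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Hypothetical stand-ins showing why a Serializable query with a non-Serializable field fails. */
final class SerializationSketch {

    /** Stand-in for a capabilities object that does not implement Serializable. */
    static class PlainCapabilities { }

    /** The same kind of object, but marked Serializable. */
    static class SerializableCapabilities implements Serializable {
        private static final long serialVersionUID = 1L;
    }

    /** Stand-in for a query that is Serializable and holds a capabilities field. */
    static class QueryLike implements Serializable {
        private static final long serialVersionUID = 1L;
        private final Object capabilities;   // non-transient, so it is written with the query

        QueryLike(Object capabilities) {
            this.capabilities = capabilities;
        }
    }

    /** Returns true if the whole object graph serializes, false on NotSerializableException. */
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;                    // some object in the graph is not Serializable
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Declaring the field `transient` would also avoid the exception, but the deserialized query would then lose its capabilities object, so making the class Serializable (as the report suggests) is the less surprising choice.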
[jira] Commented: (LUCENE-898) contrib/javascript is not packaged into releases
[ https://issues.apache.org/jira/browse/LUCENE-898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500715 ] Erik Hatcher commented on LUCENE-898: - It may still work ok, but my hunch is that changes to the QueryParser have made this javascript code more deprecated than anything. Even if we removed it from svn, it historically would still be there in case anyone really needed it. Again, I am +1 for removing it entirely after running it by the java-user list to see if anyone desires it. contrib/javascript is not packaged into releases Key: LUCENE-898 URL: https://issues.apache.org/jira/browse/LUCENE-898 Project: Lucene - Java Issue Type: Bug Components: Build Reporter: Hoss Man Priority: Trivial the contrib/javascript directory is (apparently) a collection of javascript utilities for lucene .. but it has no build files or any mechanism to package it, so it is excluded from releases.
[jira] Commented: (LUCENE-898) contrib/javascript is not packaged into releases
[ https://issues.apache.org/jira/browse/LUCENE-898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500453 ] Erik Hatcher commented on LUCENE-898: - My vote is to remove the javascript contrib area entirely. It doesn't really do all that much that is useful. I'd be surprised if anyone really uses it. contrib/javascript is not packaged into releases Key: LUCENE-898 URL: https://issues.apache.org/jira/browse/LUCENE-898 Project: Lucene - Java Issue Type: Bug Components: Build Reporter: Hoss Man Priority: Trivial the contrib/javascript directory is (apparently) a collection of javascript utilities for lucene .. but it has no build files or any mechanism to package it, so it is excluded from releases.
[jira] Commented: (LUCENE-885) clean up build files so contrib tests are run more easily
[ https://issues.apache.org/jira/browse/LUCENE-885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12499740 ] Erik Hatcher commented on LUCENE-885: - PQP was a hack I made long ago to mainly show how QP could be possibly improved. I'm fine with that class being removed altogether, or the failing tests commented out. I don't use that class personally. clean up build files so contrib tests are run more easily - Key: LUCENE-885 URL: https://issues.apache.org/jira/browse/LUCENE-885 Project: Lucene - Java Issue Type: Improvement Components: Build Reporter: Hoss Man Assignee: Hoss Man Attachments: LUCENE-885.patch, LUCENE-885.patch Per mailing list discussion... http://www.nabble.com/Tests%2C-Contribs%2C-and-Releases-tf3768924.html#a10655448 Tests for contribs should be run when ant test is used, existing test target renamed to test-core -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Commented: (LUCENE-889) Standard tokenizer with punctuation output
[ https://issues.apache.org/jira/browse/LUCENE-889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12499085 ] Erik Hatcher commented on LUCENE-889: - This patch concerns me. This changes default behavior in a very basic and commonly used piece of Lucene. At the very least this should be made entirely optional and off by default. Thoughts? Standard tokenizer with punctuation output -- Key: LUCENE-889 URL: https://issues.apache.org/jira/browse/LUCENE-889 Project: Lucene - Java Issue Type: Improvement Affects Versions: 2.1 Reporter: Karl Wettin Priority: Trivial Attachments: standard.patch, test.patch This patch adds punctuation (comma, period, question mark and exclamation point) tokens as output from the StandardTokenizer, and filters them out in the StandardFilter. (I needed them for text classification reasons.) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Commented: (LUCENE-874) Automatic reopen of IndexSearcher/IndexReader
[ https://issues.apache.org/jira/browse/LUCENE-874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12493570 ] Erik Hatcher commented on LUCENE-874: - Do note that Solr can be embedded: http://wiki.apache.org/solr/EmbeddedSolr And there are improvements to this in the works too. Automatic reopen of IndexSearcher/IndexReader - Key: LUCENE-874 URL: https://issues.apache.org/jira/browse/LUCENE-874 Project: Lucene - Java Issue Type: Improvement Components: Search Reporter: João Fonseca Priority: Minor To improve performance, a single instance of IndexSearcher should be used. However, if the index is updated, it's hard to close/reopen it, because multiple threads may be accessing it at the same time. Lucene should include an out-of-the-box solution to this problem. Either a new class should be implemented to manage this behaviour (singleton IndexSearcher, plus detection of a modified index, plus safely closing and reopening the IndexSearcher) or this could be behind the scenes by the IndexSearcher class. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
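One way to approach the "safely closing and reopening" part of this request is reference counting, sketched below. This is an assumption about a possible design, not something Lucene or Solr is confirmed to do in this thread: SearcherHolder and Handle are hypothetical names, Handle stands in for an IndexSearcher, and "closing" is modeled as a flag rather than a real close() call.

```java
/** Hypothetical holder that swaps in a new searcher while in-flight searches finish on the old one. */
final class SearcherHolder {

    /** Stand-in for an IndexSearcher; it is closed only when the last reference is released. */
    static final class Handle {
        final String indexVersion;
        private int refCount = 1;        // starts with the holder's own reference
        private boolean closed;

        Handle(String indexVersion) {
            this.indexVersion = indexVersion;
        }

        synchronized boolean isClosed() {
            return closed;
        }

        private synchronized void incRef() {
            refCount++;
        }

        private synchronized void decRef() {
            if (--refCount == 0) {
                closed = true;           // a real implementation would close the searcher here
            }
        }
    }

    private Handle current;

    SearcherHolder(String initialVersion) {
        current = new Handle(initialVersion);
    }

    /** Callers must pair every acquire() with a release(), typically in a finally block. */
    synchronized Handle acquire() {
        current.incRef();
        return current;
    }

    void release(Handle handle) {
        handle.decRef();
    }

    /** Installs a handle for the updated index and drops the holder's reference to the old one. */
    synchronized void reopen(String newVersion) {
        Handle old = current;
        current = new Handle(newVersion);
        old.decRef();                    // the old handle closes once its last user releases it
    }
}
```

Because acquire() and reopen() synchronize on the holder, a thread can never take a reference to a handle whose count has already dropped to zero. Detecting that the index changed is left out of the sketch.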
[jira] Closed: (LUCENE-707) Lucene Java Site docs
[ https://issues.apache.org/jira/browse/LUCENE-707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Hatcher closed LUCENE-707. --- Applied, thanks George! Lucene Java Site docs - Key: LUCENE-707 URL: https://issues.apache.org/jira/browse/LUCENE-707 Project: Lucene - Java Issue Type: Improvement Components: Website Environment: N/A Reporter: Grant Ingersoll Assigned To: Grant Ingersoll Priority: Minor Attachments: lucene.apache.org.patch It would be really nice if the Java site docs were consistent with the rest of the Lucene family (namely, with navigation tabs, etc.) so that one can easily go between Nutch, Hadoop, etc.
[jira] Assigned: (LUCENE-707) Lucene Java Site docs
[ https://issues.apache.org/jira/browse/LUCENE-707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Hatcher reassigned LUCENE-707: --- Assignee: Erik Hatcher (was: Grant Ingersoll) Lucene Java Site docs - Key: LUCENE-707 URL: https://issues.apache.org/jira/browse/LUCENE-707 Project: Lucene - Java Issue Type: Improvement Components: Website Environment: N/A Reporter: Grant Ingersoll Assigned To: Erik Hatcher Priority: Minor Attachments: lucene.apache.org.patch It would be really nice if the Java site docs were consistent with the rest of the Lucene family (namely, with navigation tabs, etc.) so that one can easily go between Nutch, Hadoop, etc.
[jira] Commented: (LUCENE-446) FunctionQuery - score based on field value
[ https://issues.apache.org/jira/browse/LUCENE-446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12481944 ] Erik Hatcher commented on LUCENE-446: - +1 to FunctionQuery being brought into Lucene proper. FunctionQuery - score based on field value -- Key: LUCENE-446 URL: https://issues.apache.org/jira/browse/LUCENE-446 Project: Lucene - Java Issue Type: New Feature Components: Search Affects Versions: 1.9 Reporter: Yonik Seeley Attachments: function.zip, function.zip FunctionQuery can return a score based on a field's value or on its ordinal value. FunctionFactory subclasses define the details of the function. There is currently a LinearFloatFunction (a line specified by slope and intercept). Field values are typically obtained from FieldValueSourceFactory. Implementations include FloatFieldSource, IntFieldSource, and OrdFieldSource.
[jira] Commented: (LUCENE-805) New Lucene Demo
[ https://issues.apache.org/jira/browse/LUCENE-805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12473419 ] Erik Hatcher commented on LUCENE-805: - The examples from Lucene in Action are freely available and Otis and I are fine with assigning the ASL to them (it's currently unspecified but implicitly ASLd). If these would be useful, at least the Indexer.java and Searcher.java, which are better demos than the current demo application, we're free to use those as a starter. All the code could be contributed if folks are ok with that. In fact, maybe Otis and I should do the 2nd edition codebase within the Lucene svn somewhere so that it serves as a built-in example. New Lucene Demo --- Key: LUCENE-805 URL: https://issues.apache.org/jira/browse/LUCENE-805 Project: Lucene - Java Issue Type: Improvement Components: Examples Reporter: Grant Ingersoll Assigned To: Grant Ingersoll Priority: Minor The much maligned demo, while useful, could use a breath of fresh air. This issue is to start collecting requirements about what people would like to see in a demo and what they don't like in the current one. Ideas (not necessarily in order of importance): 1. More in-depth tutorial explaining indexing/searching 2. Multilingual support/demonstration 3. Better demonstration of querying capabilities: Spans, Phrases, Wildcards, Filters, sorting, etc. 4. Dealing with different content types and pointers to resources 5. Wiki use cases links -- I think it would be cool to solicit people to contribute use cases to the docs. 6. Demonstration of contrib packages, esp. Highlighter 7. Performance issues/factors/tradeoffs. Lucene lessons learned and best practices Advanced tutorials: 1. Hadoop + Lucene 2. Writing custom analyzers/filters/tokenizers 3. Changing Scoring 4. Payloads (when they are committed) Please contribute what else you would like to see. I may be able to address some of these issues for my ApacheCon talk, but not all of them. 
[jira] Resolved: (LUCENE-797) Query for searching document whose title starts with ...
[ https://issues.apache.org/jira/browse/LUCENE-797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Hatcher resolved LUCENE-797. - Resolution: Invalid The java-user e-mail list is the appropriate forum to ask questions. The issue tracker is used for tracking bugs and feature enhancements. If you did not tokenize the title, you could use a prefix query (title*) with QueryParser (though you will likely want to lowercase, and index a tokenized title into another field for full-text search capability). QueryParser does not currently support the SpanQuery's, but with a SpanQuery you could find terms at the beginning of a field. Query for searching document whose title starts with ... Key: LUCENE-797 URL: https://issues.apache.org/jira/browse/LUCENE-797 Project: Lucene - Java Issue Type: Task Components: QueryParser Reporter: diasp Do you know the correct syntax for QueryParser to search all documents whose field 'title' starts with a selected text? Thank you for your help. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Commented: (LUCENE-645) Highligter fails to include non-token at end of string to be highlighted
[ http://issues.apache.org/jira/browse/LUCENE-645?page=comments#action_12425643 ] Erik Hatcher commented on LUCENE-645: - There is some commented out code in Highlighter.java: if (lastEndOffset < text.length()) newText.append(encoder.encodeText(text.substring(lastEndOffset))); Uncommenting that code fixes this issue. I've added this test to HighlighterTest.java: public void testOffByOne() throws IOException { TermQuery query = new TermQuery(new Term("data", "help")); Highlighter hg = new Highlighter(new SimpleHTMLFormatter(), new QueryScorer(query)); hg.setTextFragmenter(new NullFragmenter()); String match = null; match = hg.getBestFragment(new StandardAnalyzer(), "data", "help me [54-65]"); assertEquals("<B>help</B> me [54-65]", match); } All tests pass even with that code uncommented. I'll commit if there are no objections. Highligter fails to include non-token at end of string to be highlighted Key: LUCENE-645 URL: http://issues.apache.org/jira/browse/LUCENE-645 Project: Lucene - Java Issue Type: Bug Components: Other Affects Versions: 1.9 Environment: Red Hat Linux, Java 1.5 Windows Java 1.5 Reporter: Andrew Palmer Priority: Minor The following code extract shows the problem: TermQuery query = new TermQuery(new Term("data", "help")); Highlighter hg = new Highlighter(new SimpleHTMLFormatter(), new QueryScorer(query)); hg.setTextFragmenter(new NullFragmenter()); String match = null; try { match = hg.getBestFragment(new StandardAnalyzer(), "data", "help me [54-65]"); } catch (IOException e) { e.printStackTrace(); } System.out.println(match); The system outputs <B>help</B> me [54-65 but I would expect <B>help</B> me [54-65] -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
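The effect of the tail-append step discussed in this comment can be shown with a small stand-alone sketch. It is not the Highlighter implementation: TailAppendSketch is a hypothetical class, and the token offsets passed in are assumed rather than produced by an analyzer. It only illustrates why text after the last token is lost unless it is appended explicitly.

```java
/** Hypothetical illustration of appending the text that follows the last token. */
final class TailAppendSketch {

    /**
     * Rebuilds the text from token offsets ({start, end} pairs, end exclusive), wrapping
     * tokens equal to term in B tags. The remainder after the last token is only copied
     * when appendTail is true.
     */
    static String highlight(String text, int[][] offsets, String term, boolean appendTail) {
        StringBuilder out = new StringBuilder();
        int lastEndOffset = 0;
        for (int[] offset : offsets) {
            int start = offset[0];
            int end = offset[1];
            out.append(text, lastEndOffset, start);    // copy the gap before this token
            String token = text.substring(start, end);
            out.append(token.equals(term) ? "<B>" + token + "</B>" : token);
            lastEndOffset = end;
        }
        if (appendTail && lastEndOffset < text.length()) {
            out.append(text.substring(lastEndOffset)); // the step that was commented out
        }
        return out.toString();
    }
}
```

For the text `help me [54-65]` with assumed token offsets {0,4}, {5,7} and {9,14}, the closing bracket sits after the last token, so it only survives when the tail is appended.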
[jira] Closed: (LUCENE-637) class HitDoc should be inner class or in its own .java file
[ http://issues.apache.org/jira/browse/LUCENE-637?page=all ] Erik Hatcher closed LUCENE-637. --- Resolution: Invalid class HitDoc should be inner class or in its own .java file --- Key: LUCENE-637 URL: http://issues.apache.org/jira/browse/LUCENE-637 Project: Lucene - Java Issue Type: Wish Affects Versions: 2.0.0 Reporter: alan ezust Why is class HitDoc tacked onto the end of Hits.java like that? I'd like to use it. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Assigned: (LUCENE-415) Merge error during add to index (IndexOutOfBoundsException)
[ http://issues.apache.org/jira/browse/LUCENE-415?page=all ] Erik Hatcher reassigned LUCENE-415: --- Assign To: Yonik Seeley (was: Lucene Developers) Merge error during add to index (IndexOutOfBoundsException) --- Key: LUCENE-415 URL: http://issues.apache.org/jira/browse/LUCENE-415 Project: Lucene - Java Type: Bug Components: Index Versions: 1.4 Environment: Operating System: Linux Platform: Other Reporter: Daniel Quaroni Assignee: Yonik Seeley I've been batch-building indexes, and I've built a couple hundred indexes with a total of around 150 million records. This only happened once, so it's probably impossible to reproduce, but anyway... I was building an index with around 9.6 million records, and towards the end I got this: java.lang.IndexOutOfBoundsException: Index: 54, Size: 24 at java.util.ArrayList.RangeCheck(ArrayList.java:547) at java.util.ArrayList.get(ArrayList.java:322) at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:155) at org.apache.lucene.index.FieldInfos.fieldName(FieldInfos.java:151) at org.apache.lucene.index.SegmentTermEnum.readTerm(SegmentTermEnum.java:149) at org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:115) at org.apache.lucene.index.SegmentMergeInfo.next(SegmentMergeInfo.java:52) at org.apache.lucene.index.SegmentMerger.mergeTermInfos(SegmentMerger.java:294) at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:254) at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:93) at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:487) at org.apache.lucene.index.IndexWriter.maybeMergeSegments(IndexWriter.java:458) at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:310) at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:294)
[jira] Commented: (LUCENE-436) [PATCH] TermInfosReader, SegmentTermEnum Out Of Memory Exception
[ http://issues.apache.org/jira/browse/LUCENE-436?page=comments#action_12377933 ] Erik Hatcher commented on LUCENE-436: - Please, everyone, let's keep this discussion technical and factual and avoid making degrading statements to one another. It doesn't help the situation to have such negative tones being used. The discussion aspect of this should be moved to java-dev anyway, and leave JIRA comments for details on patches attached and other technical details related directly towards resolving this issue. [PATCH] TermInfosReader, SegmentTermEnum Out Of Memory Exception Key: LUCENE-436 URL: http://issues.apache.org/jira/browse/LUCENE-436 Project: Lucene - Java Type: Improvement Components: Index Versions: 1.4 Environment: Solaris JVM 1.4.1 Linux JVM 1.4.2/1.5.0 Windows not tested Reporter: kieran Attachments: FixedThreadLocal.java, Lucene-436-TestCase.tar.gz, TermInfosReader.java, ThreadLocalTest.java We've been experiencing terrible memory problems on our production search server, running lucene (1.4.3). Our live app regularly opens new indexes and, in doing so, releases old IndexReaders for garbage collection. But...there appears to be a memory leak in org.apache.lucene.index.TermInfosReader.java. Under certain conditions (possibly related to JVM version, although I've personally observed it under both linux JVM 1.4.2_06, and 1.5.0_03, and SUNOS JVM 1.4.1) the ThreadLocal member variable, enumerators doesn't get garbage-collected when the TermInfosReader object is gc-ed. Looking at the code in TermInfosReader.java, there's no reason why it _shouldn't_ be gc-ed, so I can only presume (and I've seen this suggested elsewhere) that there could be a bug in the garbage collector of some JVMs. I've seen this problem briefly discussed; in particular at the following URL: http://java2.5341.com/msg/85821.html The patch that Doug recommended, which is included in lucene-1.4.3 doesn't work in our particular circumstances. 
Doug's patch only clears the ThreadLocal variable for the thread running the finalizer (my knowledge of java breaks down here - I'm not sure which thread actually runs the finalizer). In our situation, the TermInfosReader is (potentially) used by more than one thread, meaning that Doug's patch _doesn't_ allow the affected JVMs to correctly collect garbage. So...I've devised a simple patch which, from my observations on linux JVMs 1.4.2_06, and 1.5.0_03, fixes this problem. Kieran PS Thanks to daniel naber for pointing me to jira/lucene @@ -19,6 +19,7 @@ import java.io.IOException; import org.apache.lucene.store.Directory; +import java.util.Hashtable; /** This stores a monotonically increasing set of Term, TermInfo pairs in a * Directory. Pairs are accessed either by Term or by ordinal position the @@ -29,7 +30,7 @@ private String segment; private FieldInfos fieldInfos; - private ThreadLocal enumerators = new ThreadLocal(); + private final Hashtable enumeratorsByThread = new Hashtable(); private SegmentTermEnum origEnum; private long size; @@ -60,10 +61,10 @@ } private SegmentTermEnum getEnum() { -SegmentTermEnum termEnum = (SegmentTermEnum)enumerators.get(); +SegmentTermEnum termEnum = (SegmentTermEnum)enumeratorsByThread.get(Thread.currentThread()); if (termEnum == null) { termEnum = terms(); - enumerators.set(termEnum); + enumeratorsByThread.put(Thread.currentThread(), termEnum); } return termEnum; } @@ -195,5 +196,15 @@ public SegmentTermEnum terms(Term term) throws IOException { get(term); return (SegmentTermEnum)getEnum().clone(); + } + + /* some jvms might have trouble gc-ing enumeratorsByThread */ + protected void finalize() throws Throwable { +try { +// make sure gc can clear up. 
+enumeratorsByThread.clear(); +} finally { +super.finalize(); +} } } TermInfosReader.java, full source: == package org.apache.lucene.index; /** * Copyright 2004 The Apache Software Foundation * * Licensed under the Apache License, Version 2.0 (the License); * you may not use this file except in compliance with the License. * You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an AS IS BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ import java.io.IOException; import org.apache.lucene.store.Directory; import java.util.Hashtable; /** This stores a monotonically increasing
[jira] Commented: (LUCENE-555) Index Corruption
[ http://issues.apache.org/jira/browse/LUCENE-555?page=comments#action_12375996 ] Erik Hatcher commented on LUCENE-555: - Could you share a test case that demonstrates this issue? Index Corruption Key: LUCENE-555 URL: http://issues.apache.org/jira/browse/LUCENE-555 Project: Lucene - Java Type: Bug Components: Index Versions: 1.9 Environment: Linux FC4, Java 1.4.9 Reporter: dan Priority: Critical Index Corruption output java.io.FileNotFoundException: ../_aki.fnm (No such file or directory) at java.io.RandomAccessFile.open(Native Method) at java.io.RandomAccessFile.<init>(RandomAccessFile.java:204) at org.apache.lucene.store.FSIndexInput$Descriptor.<init>(FSDirectory.java:425) at org.apache.lucene.store.FSIndexInput.<init>(FSDirectory.java:434) at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:324) at org.apache.lucene.index.FieldInfos.<init>(FieldInfos.java:56) at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:144) at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:129) at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:110) at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:674) at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:658) at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:517) input - I open an index, I read, I write, I optimize, and eventually the above happens. The index is unusable. - This has happened to me somewhere between 20 and 30 times now - on indexes of different shapes and sizes. - I don't know the reason. But, the following requirement applies regardless. requirement - Like all modern database programs, there has to be a way to repair an index. Period.
[jira] Closed: (LUCENE-527) Bug in the TermDocs.freq() method?
[ http://issues.apache.org/jira/browse/LUCENE-527?page=all ] Erik Hatcher closed LUCENE-527: --- Resolution: Invalid Bug in the TermDocs.freq() method? --- Key: LUCENE-527 URL: http://issues.apache.org/jira/browse/LUCENE-527 Project: Lucene - Java Type: Bug Versions: 1.9 Environment: Scientific linux Reporter: Håkon T. Bommen I believe I get incorrect data from the TermDocs.freq() method. The attached code demonstrates this. Document one has the correct term count. In documents zero and two, the terms "stored" and "indexed" are each reported to occur once. This is incorrect. // LuceneTest.java import org.apache.lucene.analysis.Analyzer; import org.apache.lucene.analysis.standard.StandardAnalyzer; import org.apache.lucene.queryParser.ParseException; import org.apache.lucene.document.*; import org.apache.lucene.index.*; import org.apache.lucene.search.*; import org.apache.lucene.queryParser.QueryParser; import org.apache.lucene.store.RAMDirectory; import org.apache.lucene.store.Directory; public class LuceneTest{ public LuceneTest(){} public static void main(String[] args){ IndexWriter writer; IndexReader reader; Searcher searcher; Document doc; Directory dir = new RAMDirectory(); try{ // create index writer = new IndexWriter( dir , new StandardAnalyzer(), true); doc = new Document(); doc.add( new Field( "title", "Doc 0", Field.Store.YES, Field.Index.TOKENIZED ) ); doc.add( new Field( "contents", "Text Text and more Text", Field.Store.NO, Field.Index.TOKENIZED ) ); writer.addDocument(doc); doc = new Document(); doc.add( new Field( "title", "Doc 1", Field.Store.YES, Field.Index.TOKENIZED ) ); doc.add( new Field( "contents", "This text is not stored, only indexed.", Field.Store.NO, Field.Index.TOKENIZED ) ); writer.addDocument(doc); doc = new Document(); doc.add( new Field( "title", "Doc 2", Field.Store.YES, Field.Index.TOKENIZED ) ); doc.add( new Field( "contents", "Text Text Text Text", Field.Store.NO, Field.Index.TOKENIZED ) ); writer.addDocument(doc); writer.close(); // search searcher = new IndexSearcher(dir); reader = IndexReader.open(dir); QueryParser qp = new QueryParser("contents", new StandardAnalyzer()); Query query = qp.parse("stored and indexed text"); String[] terms = {"stored", "indexed", "text"}; Hits queryHits = searcher.search(query); // print results System.out.println( "Found " + queryHits.length() + " hits."); for(int i=0; i<queryHits.length(); i++){ doc = queryHits.doc(i); System.out.println("*** " + doc.get("title") + " ***"); int docID = queryHits.id(i); for (int j=0; j<terms.length; j++){ TermDocs td = reader.termDocs( new Term("contents", terms[j]) ); td.skipTo(docID); System.out.println( "Term '" + terms[j] + "' occurs " + td.freq() + " time(s) in document nr. " + docID ); } } }catch(Exception e){System.out.println("Darn");} } }
[jira] Commented: (LUCENE-330) [PATCH] Use filter bits for next() and skipTo() in FilteredQuery
[ http://issues.apache.org/jira/browse/LUCENE-330?page=comments#action_12369342 ] Erik Hatcher commented on LUCENE-330: - The patch from FilteredQueryPatch1.txt has been applied and committed. Thanks for the fix, Paul! [PATCH] Use filter bits for next() and skipTo() in FilteredQuery Key: LUCENE-330 URL: http://issues.apache.org/jira/browse/LUCENE-330 Project: Lucene - Java Type: Improvement Components: Search Versions: CVS Nightly - Specify date in submission Environment: Operating System: other Platform: Other Reporter: paul.elschot Assignee: Lucene Developers Priority: Minor Attachments: FilteredQuery.java, FilteredQuery.java, FilteredQuery.java, FilteredQuery.java, FilteredQueryPatch1.txt, IndexSearcherPatch2.txt, SkipFilter.java, SkipFilter.java This improves performance of FilteredQuery by not calling score() on documents that do not pass the filter. This passes the current tests for FilteredQuery, but these tests have not been adapted/extended. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
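The idea in this issue, advancing past documents that fail the filter inside next() so that score() is never called for them, can be sketched with plain Java. ArrayScorer and FilteredScorer below are hypothetical stand-ins, not Lucene's Scorer API; the loop in FilteredScorer.next() follows the do/while shape of the patch quoted later in this thread.

```java
import java.util.BitSet;

/** Hypothetical illustration of filtering inside next() so score() is only called for passing docs. */
final class FilterBitsSketch {

    /** Stand-in for the wrapped query's scorer, iterating over a fixed list of doc ids. */
    static final class ArrayScorer {
        private final int[] docs;
        private int pos = -1;
        int scoreCalls;                 // counts how often score() was invoked

        ArrayScorer(int... docs) {
            this.docs = docs;
        }

        boolean next() {
            return ++pos < docs.length;
        }

        int doc() {
            return docs[pos];
        }

        float score() {
            scoreCalls++;
            return 1.0f;
        }
    }

    /** Skips documents whose bit is not set in the filter before they can be scored. */
    static final class FilteredScorer {
        private final ArrayScorer inner;
        private final BitSet bits;

        FilteredScorer(ArrayScorer inner, BitSet bits) {
            this.inner = inner;
            this.bits = bits;
        }

        boolean next() {
            do {
                if (!inner.next()) {
                    return false;       // wrapped scorer is exhausted
                }
            } while (!bits.get(inner.doc()));
            return true;
        }

        float score() {
            return inner.score();
        }
    }

    /** Consumes the filtered scorer and returns the number of hits that were scored. */
    static int countHits(ArrayScorer inner, BitSet bits) {
        FilteredScorer filtered = new FilteredScorer(inner, bits);
        int hits = 0;
        while (filtered.next()) {
            filtered.score();
            hits++;
        }
        return hits;
    }
}
```

A further refinement would combine a skipTo() on the wrapped scorer with BitSet.nextSetBit(), which this sketch leaves out for brevity.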
[jira] Commented: (LUCENE-330) [PATCH] Use filter bits for next() and skipTo() in FilteredQuery
[ http://issues.apache.org/jira/browse/LUCENE-330?page=comments#action_12368934 ] Erik Hatcher commented on LUCENE-330: - Paul - it is unfortunate that we've let this patch sit for as long as it has. I've just encountered issues with FilteredQuery myself and am looking to apply your patches in hopes they'll address the problem I've encountered with FilteredQuery's nested within a BooleanQuery. There is a comment in some of your code that this doesn't work with BooleanQuery though. Since the code has changed and your patches are no longer easily applied, could you advise on what the latest patches should be and how to go about going from trunk to these patches? Many thanks! [PATCH] Use filter bits for next() and skipTo() in FilteredQuery Key: LUCENE-330 URL: http://issues.apache.org/jira/browse/LUCENE-330 Project: Lucene - Java Type: Improvement Components: Search Versions: CVS Nightly - Specify date in submission Environment: Operating System: other Platform: Other Reporter: paul.elschot Assignee: Lucene Developers Priority: Minor Attachments: FilteredQuery.java, FilteredQuery.java, FilteredQuery.java, FilteredQuery.java, FilteredQueryPatch1.txt, IndexSearcherPatch2.txt, SkipFilter.java, SkipFilter.java This improves performance of FilteredQuery by not calling score() on documents that do not pass the filter. This passes the current tests for FilteredQuery, but these tests have not been adapted/extended. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Commented: (LUCENE-330) [PATCH] Use filter bits for next() and skipTo() in FilteredQuery
[ http://issues.apache.org/jira/browse/LUCENE-330?page=comments#action_12368948 ] Erik Hatcher commented on LUCENE-330: - I manually applied that patch (prior to my first comment actually) as automatically applying didn't work. I just committed another test to TestFilteredQuery, which fails with this patch with this error:

java.lang.IndexOutOfBoundsException: Not a valid hit number: 0
    at org.apache.lucene.search.Hits.hitDoc(Hits.java:134)
    at org.apache.lucene.search.Hits.id(Hits.java:116)
    at org.apache.lucene.search.TestFilteredQuery.testBoolean(TestFilteredQuery.java:139)

I'm fairly confident I applied the patch correctly, though I suppose it's possible I missed something. Here's an inlined version of the diff I have locally of FilteredQuery:

$ svn diff FilteredQuery.java
Index: FilteredQuery.java
===================================================================
--- FilteredQuery.java (revision 383339)
+++ FilteredQuery.java (working copy)
@@ -34,6 +34,7 @@
  * <p>Created: Apr 20, 2004 8:58:29 AM
  *
  * @author Tim Jones
+ * @author Paul Elschot
  * @since 1.4
  * @version $Id$
  * @see CachingWrapperFilter
@@ -75,22 +76,42 @@
       // return this query
       public Query getQuery() { return FilteredQuery.this; }
 
-      // return a scorer that overrides the enclosed query's score if
-      // the given hit has been filtered out.
-      public Scorer scorer (IndexReader indexReader) throws IOException {
+      // return a filtering scorer
+      public Scorer scorer (IndexReader indexReader) throws IOException {
         final Scorer scorer = weight.scorer (indexReader);
         final BitSet bitset = filter.bits (indexReader);
         return new Scorer (similarity) {
 
-          // pass these methods through to the enclosed scorer
-          public boolean next() throws IOException { return scorer.next(); }
+          public boolean next() throws IOException {
+            do {
+              if (! scorer.next()) {
+                return false;
+              }
+            } while (! bitset.get(scorer.doc()));
+            /* When skipTo() is allowed on scorer it should be used here
+             * in combination with bitset.nextSetBit(...)
+             * See the while loop in skipTo() below.
+             */
+            return true;
+          }
           public int doc() { return scorer.doc(); }
-          public boolean skipTo (int i) throws IOException { return scorer.skipTo(i); }
 
-          // if the document has been filtered out, set score to 0.0
-          public float score() throws IOException {
-            return (bitset.get(scorer.doc())) ? scorer.score() : 0.0f;
-          }
+          public boolean skipTo(int i) throws IOException {
+            if (! scorer.skipTo(i)) {
+              return false;
+            }
+            while (! bitset.get(scorer.doc())) {
+              int nextFiltered = bitset.nextSetBit(scorer.doc() + 1);
+              if (nextFiltered == -1) {
+                return false;
+              } else if (! scorer.skipTo(nextFiltered)) {
+                return false;
+              }
+            }
+            return true;
+          }
+
+          public float score() throws IOException { return scorer.score(); }
 
           // add an explanation about whether the document was filtered
           public Explanation explain (int i) throws IOException {

What am I missing?

[PATCH] Use filter bits for next() and skipTo() in FilteredQuery Key: LUCENE-330 URL: http://issues.apache.org/jira/browse/LUCENE-330 Project: Lucene - Java Type: Improvement Components: Search Versions: CVS Nightly - Specify date in submission Environment: Operating System: other Platform: Other Reporter: paul.elschot Assignee: Lucene Developers Priority: Minor Attachments: FilteredQuery.java, FilteredQuery.java, FilteredQuery.java, FilteredQuery.java, FilteredQueryPatch1.txt, IndexSearcherPatch2.txt, SkipFilter.java, SkipFilter.java This improves performance of FilteredQuery by not calling score() on documents that do not pass the filter. This passes the current tests for FilteredQuery, but these tests have not been adapted/extended.
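For readers who want to experiment with the iteration logic in the patch without a Lucene checkout, here is a minimal standalone sketch. It is not the committed code: SortedDocs and FilteredDocs are hypothetical stand-ins for the wrapped scorer and the filtering scorer, and the skipTo() contract (always advance at least one document) is an assumption modelled on the scorer API of that era. Only java.util.BitSet is a real library class here.

```java
import java.util.BitSet;

/** Hypothetical stand-in for the wrapped scorer: walks a sorted array of doc ids. */
class SortedDocs {
    private final int[] docs;
    private int pos = -1;

    SortedDocs(int... docs) { this.docs = docs; }

    /** Advances to the next doc; returns false when exhausted. */
    boolean next() { return ++pos < docs.length; }

    /** Advances at least one step, stopping at the first doc >= target. */
    boolean skipTo(int target) {
        do {
            if (!next()) return false;
        } while (docs[pos] < target);
        return true;
    }

    int doc() { return docs[pos]; }
}

/** Applies the filter bits inside next() and skipTo(), mirroring the loops in the diff. */
class FilteredDocs {
    private final SortedDocs scorer;
    private final BitSet bitset;

    FilteredDocs(SortedDocs scorer, BitSet bitset) {
        this.scorer = scorer;
        this.bitset = bitset;
    }

    /** Steps the wrapped scorer until it lands on a doc that passes the filter. */
    boolean next() {
        do {
            if (!scorer.next()) return false;
        } while (!bitset.get(scorer.doc()));
        return true;
    }

    /** Skips the wrapped scorer, then leapfrogs between scorer and filter bits. */
    boolean skipTo(int target) {
        if (!scorer.skipTo(target)) return false;
        while (!bitset.get(scorer.doc())) {
            int nextFiltered = bitset.nextSetBit(scorer.doc() + 1);
            if (nextFiltered == -1) return false;          // no more docs pass the filter
            if (!scorer.skipTo(nextFiltered)) return false; // scorer is exhausted
        }
        return true;
    }

    int doc() { return scorer.doc(); }
}
```

With docs 1, 3, 5, 8, 10 and filter bits 3, 8, 9, next() visits 3 and 8 only, and skipTo(4) lands on 8. The point of the design is that score() is never reached for a document the filter rejects.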
[jira] Commented: (LUCENE-330) [PATCH] Use filter bits for next() and skipTo() in FilteredQuery
[ http://issues.apache.org/jira/browse/LUCENE-330?page=comments#action_12368953 ] Erik Hatcher commented on LUCENE-330: - Could you be more specific? The new TestFilteredQuery shows the details of the failure, with the stack trace in my last comment provided the patch I supplied. Those are all the specifics I have. The patch contains my name as @author, could that be removed? Sure, no problem. I simply was true to the patch you provided earlier on in this issue, but I'd be happy to remove it if this patch gets committed. [PATCH] Use filter bits for next() and skipTo() in FilteredQuery Key: LUCENE-330 URL: http://issues.apache.org/jira/browse/LUCENE-330 Project: Lucene - Java Type: Improvement Components: Search Versions: CVS Nightly - Specify date in submission Environment: Operating System: other Platform: Other Reporter: paul.elschot Assignee: Lucene Developers Priority: Minor Attachments: FilteredQuery.java, FilteredQuery.java, FilteredQuery.java, FilteredQuery.java, FilteredQueryPatch1.txt, IndexSearcherPatch2.txt, SkipFilter.java, SkipFilter.java This improves performance of FilteredQuery by not calling score() on documents that do not pass the filter. This passes the current tests for FilteredQuery, but these tests have not been adapted/extended. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Updated: (LUCENE-413) [PATCH] BooleanScorer2 ArrayIndexOutOfBoundsException + alternative NearSpans
[ http://issues.apache.org/jira/browse/LUCENE-413?page=all ] Erik Hatcher updated LUCENE-413: Attachment: (was: BooleanScorer2.java) [PATCH] BooleanScorer2 ArrayIndexOutOfBoundsException + alternative NearSpans - Key: LUCENE-413 URL: http://issues.apache.org/jira/browse/LUCENE-413 Project: Lucene - Java Type: Bug Components: Search Versions: CVS Nightly - Specify date in submission Environment: Operating System: other Platform: Other Reporter: paul.elschot Assignee: Lucene Developers Attachments: DisjunctionSumScorerPatch3.txt, DisjunctionSumScorerPatch4.txt, DisjunctionSumScorerTestPatch1.txt, NearSpansOrdered.java, NearSpansOrderedBugHuntPatch1.txt, NearSpansUnordered.java, SpanNearQueryPatch1.txt, SpanScorerTestPatch1.txt, TestSpansPatch1.txt From Erik's post at java-dev: [java] Caused by: java.lang.ArrayIndexOutOfBoundsException: 4 [java] at org.apache.lucene.search.BooleanScorer2 $Coordinator.coordFactor(BooleanScorer2.java:54) [java] at org.apache.lucene.search.BooleanScorer2.score (BooleanScorer2.java:292) ... and my answer: Probably nrMatchers is increased too often in score() by calling score() more than once. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
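The failure mode suggested in the description (a matcher count that grows when score() is called more than once) can be illustrated with a small standalone sketch. This is not the BooleanScorer2 code; CoordDemo and all of its members are hypothetical names, and the sketch only shows how an accumulating counter used as an array index runs past the end of a coordination-factor table.

```java
/** Illustrative only: not BooleanScorer2, just the pattern described in the issue. */
class CoordDemo {
    private final float[] coordFactors; // one entry per possible matcher count 0..maxCoord
    private final int matchersOnDoc;    // how many clauses match the current document
    private int nrMatchers = 0;         // running count, the suspected culprit

    CoordDemo(int maxCoord, int matchersOnDoc) {
        this.coordFactors = new float[maxCoord + 1];
        for (int i = 0; i <= maxCoord; i++) {
            coordFactors[i] = (float) i / maxCoord;
        }
        this.matchersOnDoc = matchersOnDoc;
    }

    /** Accumulating variant: a second call for the same document overshoots the table. */
    float scoreAccumulating() {
        nrMatchers += matchersOnDoc;
        return coordFactors[nrMatchers];
    }

    /** Resetting variant: the count is recomputed on every call, so repeats are harmless. */
    float scoreResetting() {
        nrMatchers = matchersOnDoc;
        return coordFactors[nrMatchers];
    }
}
```

With maxCoord = 3 and two matching clauses, the first call to scoreAccumulating() reads index 2 and the second reads index 4, which is outside a table of length 4.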
[jira] Updated: (LUCENE-413) [PATCH] BooleanScorer2 ArrayIndexOutOfBoundsException + alternative NearSpans
[ http://issues.apache.org/jira/browse/LUCENE-413?page=all ] Erik Hatcher updated LUCENE-413: Attachment: (was: BooleanScorer2Patch20050721.txt) [PATCH] BooleanScorer2 ArrayIndexOutOfBoundsException + alternative NearSpans - Key: LUCENE-413 URL: http://issues.apache.org/jira/browse/LUCENE-413 Project: Lucene - Java Type: Bug Components: Search Versions: CVS Nightly - Specify date in submission Environment: Operating System: other Platform: Other Reporter: paul.elschot Assignee: Lucene Developers Attachments: DisjunctionSumScorerPatch3.txt, DisjunctionSumScorerPatch4.txt, DisjunctionSumScorerTestPatch1.txt, NearSpansOrdered.java, NearSpansOrderedBugHuntPatch1.txt, NearSpansUnordered.java, SpanNearQueryPatch1.txt, SpanScorerTestPatch1.txt, TestSpansPatch1.txt From Erik's post at java-dev: [java] Caused by: java.lang.ArrayIndexOutOfBoundsException: 4 [java] at org.apache.lucene.search.BooleanScorer2 $Coordinator.coordFactor(BooleanScorer2.java:54) [java] at org.apache.lucene.search.BooleanScorer2.score (BooleanScorer2.java:292) ... and my answer: Probably nrMatchers is increased too often in score() by calling score() more than once. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Commented: (LUCENE-413) [PATCH] BooleanScorer2 ArrayIndexOutOfBoundsException + alternative NearSpans
[ http://issues.apache.org/jira/browse/LUCENE-413?page=comments#action_12364232 ] Erik Hatcher commented on LUCENE-413: - I ran into one issue after applying all of these patches: [javac] /Users/erik/dev/lucene/src/java/org/apache/lucene/search/BooleanQuery.java:337: cannot find symbol [javac] symbol : constructor BooleanScorer2(org.apache.lucene.search.Similarity,int) [javac] location: class org.apache.lucene.search.BooleanScorer2 [javac] BooleanScorer2 result = new BooleanScorer2(similarity, The code in BooleanQuery was this: BooleanScorer2 result = new BooleanScorer2(similarity, minNrShouldMatch); I'm not sure where the mismatch came in. I removed the 2nd parameter to the non-existent BooleanScorer2 constructor to get the compile to work. What am I missing? [PATCH] BooleanScorer2 ArrayIndexOutOfBoundsException + alternative NearSpans - Key: LUCENE-413 URL: http://issues.apache.org/jira/browse/LUCENE-413 Project: Lucene - Java Type: Bug Components: Search Versions: CVS Nightly - Specify date in submission Environment: Operating System: other Platform: Other Reporter: paul.elschot Assignee: Lucene Developers Attachments: DisjunctionSumScorerPatch3.txt, DisjunctionSumScorerPatch4.txt, DisjunctionSumScorerTestPatch1.txt, NearSpansOrdered.java, NearSpansOrderedBugHuntPatch1.txt, NearSpansUnordered.java, SpanNearQueryPatch1.txt, SpanScorerTestPatch1.txt From Erik's post at java-dev: [java] Caused by: java.lang.ArrayIndexOutOfBoundsException: 4 [java] at org.apache.lucene.search.BooleanScorer2 $Coordinator.coordFactor(BooleanScorer2.java:54) [java] at org.apache.lucene.search.BooleanScorer2.score (BooleanScorer2.java:292) ... and my answer: Probably nrMatchers is increased too often in score() by calling score() more than once. -- This message is automatically generated by JIRA. 
[jira] Updated: (LUCENE-413) [PATCH] BooleanScorer2 ArrayIndexOutOfBoundsException + alternative NearSpans
[ http://issues.apache.org/jira/browse/LUCENE-413?page=all ] Erik Hatcher updated LUCENE-413: Comment: was deleted [PATCH] BooleanScorer2 ArrayIndexOutOfBoundsException + alternative NearSpans - Key: LUCENE-413 URL: http://issues.apache.org/jira/browse/LUCENE-413 Project: Lucene - Java Type: Bug Components: Search Versions: CVS Nightly - Specify date in submission Environment: Operating System: other Platform: Other Reporter: paul.elschot Assignee: Lucene Developers Attachments: DisjunctionSumScorerPatch3.txt, DisjunctionSumScorerPatch4.txt, DisjunctionSumScorerTestPatch1.txt, NearSpansOrdered.java, NearSpansOrderedBugHuntPatch1.txt, NearSpansUnordered.java, SpanNearQueryPatch1.txt, SpanScorerTestPatch1.txt From Erik's post at java-dev: [java] Caused by: java.lang.ArrayIndexOutOfBoundsException: 4 [java] at org.apache.lucene.search.BooleanScorer2 $Coordinator.coordFactor(BooleanScorer2.java:54) [java] at org.apache.lucene.search.BooleanScorer2.score (BooleanScorer2.java:292) ... and my answer: Probably nrMatchers is increased too often in score() by calling score() more than once. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Resolved: (LUCENE-490) JavaCC 4.0 fails to generate QueryParser.java
[ http://issues.apache.org/jira/browse/LUCENE-490?page=all ] Erik Hatcher resolved LUCENE-490: - Fix Version: 1.9 Resolution: Fixed Assign To: Erik Hatcher Patch applied, thanks! JavaCC 4.0 fails to generate QueryParser.java - Key: LUCENE-490 URL: http://issues.apache.org/jira/browse/LUCENE-490 Project: Lucene - Java Type: Bug Components: QueryParser Versions: CVS Nightly - Specify date in submission Reporter: Steven Rowe Assignee: Erik Hatcher Priority: Minor Fix For: 1.9 Attachments: QueryParser.jj.patch When generating the Java source for QueryParser via the ant task 'javacc-QueryParser' against Subversion trunk (updated Jan. 25, 2006), JavaCC 4.0 gives the following error: javacc-QueryParser: [javacc] Java Compiler Compiler Version 4.0 (Parser Generator) [javacc] (type javacc with no arguments for help) [javacc] Reading from file [...]/src/java/org/apache/lucene/queryParser/QueryParser.jj . . . [javacc] org.javacc.parser.ParseException: Encountered at line 754, column 3. [javacc] Was expecting one of: [javacc] STRING_LITERAL ... [javacc] ... [javacc] [javacc] Detected 1 errors and 0 warnings. BUILD FAILED -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Closed: (LUCENE-490) JavaCC 4.0 fails to generate QueryParser.java
[ http://issues.apache.org/jira/browse/LUCENE-490?page=all ] Erik Hatcher closed LUCENE-490: --- JavaCC 4.0 fails to generate QueryParser.java - Key: LUCENE-490 URL: http://issues.apache.org/jira/browse/LUCENE-490 Project: Lucene - Java Type: Bug Components: QueryParser Versions: CVS Nightly - Specify date in submission Reporter: Steven Rowe Assignee: Erik Hatcher Priority: Minor Fix For: 1.9 Attachments: QueryParser.jj.patch When generating the Java source for QueryParser via the ant task 'javacc-QueryParser' against Subversion trunk (updated Jan. 25, 2006), JavaCC 4.0 gives the following error: javacc-QueryParser: [javacc] Java Compiler Compiler Version 4.0 (Parser Generator) [javacc] (type javacc with no arguments for help) [javacc] Reading from file [...]/src/java/org/apache/lucene/queryParser/QueryParser.jj . . . [javacc] org.javacc.parser.ParseException: Encountered at line 754, column 3. [javacc] Was expecting one of: [javacc] STRING_LITERAL ... [javacc] ... [javacc] [javacc] Detected 1 errors and 0 warnings. BUILD FAILED -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Commented: (LUCENE-489) Wildcard Queries with leading *
[ http://issues.apache.org/jira/browse/LUCENE-489?page=comments#action_12363831 ] Erik Hatcher commented on LUCENE-489: - There are term rotation techniques that allow for efficient wildcard querying. For example, the word "cat" can be indexed as "cat$", "$cat", "t$ca", and "at$c". For a query of "*a*", the search can be rotated to search for "a*". Wildcard Queries with leading * - Key: LUCENE-489 URL: http://issues.apache.org/jira/browse/LUCENE-489 Project: Lucene - Java Type: Wish Components: QueryParser Reporter: Peter Schäfer It would be nice to have wildcard queries with a leading wildcard (? or *). I'm aware that this is a well-known issue, and I do understand the reasons behind it, but try explaining that to our end-users ... :-(
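A rough sketch of the term rotation idea mentioned in the comment above, for anyone who wants to try it outside Lucene. Everything here is an assumption for illustration: the class TermRotation, the '$' end marker, and the sorted-map index are not part of Lucene, and the sketch only handles patterns with exactly one '*'. The idea is to index every rotation of term + "$", then rotate the query so the wildcard is trailing and run an ordinary prefix scan.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical rotated-term index; names and layout are illustrative only. */
class TermRotation {
    static final char END = '$';

    /** All rotations of term + '$', e.g. cat -> cat$, at$c, t$ca, $cat. */
    static List<String> rotations(String term) {
        String marked = term + END;
        List<String> out = new ArrayList<>();
        for (int i = 0; i < marked.length(); i++) {
            out.add(marked.substring(i) + marked.substring(0, i));
        }
        return out;
    }

    /** Maps every rotation back to the term it came from. */
    static NavigableMap<String, String> buildIndex(String... terms) {
        NavigableMap<String, String> index = new TreeMap<>();
        for (String term : terms) {
            for (String rotation : rotations(term)) {
                index.put(rotation, term);
            }
        }
        return index;
    }

    /** Rewrites a pattern with exactly one '*' so that the wildcard is trailing. */
    static String rotateQuery(String pattern) {
        int star = pattern.indexOf('*');
        if (star < 0 || star != pattern.lastIndexOf('*')) {
            throw new IllegalArgumentException("sketch supports exactly one '*'");
        }
        String before = pattern.substring(0, star);
        String after = pattern.substring(star + 1);
        return after + END + before + "*";
    }

    /** Prefix scan over the sorted rotations; returns the matching original terms. */
    static SortedSet<String> search(NavigableMap<String, String> index, String pattern) {
        String rotated = rotateQuery(pattern);
        String prefix = rotated.substring(0, rotated.length() - 1);
        SortedSet<String> hits = new TreeSet<>();
        for (Map.Entry<String, String> e : index.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) break; // left the prefix range
            hits.add(e.getValue());
        }
        return hits;
    }
}
```

The trade-off is index size: a term of length n contributes n + 1 entries, in exchange for turning a leading-wildcard query such as *at into the prefix query at$*.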
[jira] Commented: (LUCENE-489) Wildcard Queries with leading *
[ http://issues.apache.org/jira/browse/LUCENE-489?page=comments#action_12363833 ] Erik Hatcher commented on LUCENE-489: - FYI - Actually it would not be possible to override getWildcardQuery to reverse a *foo query term. The parser prevents *foo from being parsed before even getting to getWildcardQuery without a change to the .jj grammar. Wildcard Queries with leading * - Key: LUCENE-489 URL: http://issues.apache.org/jira/browse/LUCENE-489 Project: Lucene - Java Type: Wish Components: QueryParser Reporter: Peter Schäfer It would be nice to have wildcard queries with a leading wildcard (? or *). I'm aware that this is a well-known issue, and I do understand the reasons behind it, but try explaining that to our end-users ... :-( -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Closed: (LUCENE-477) Build an index which allows me to broswe by category.
[ http://issues.apache.org/jira/browse/LUCENE-477?page=all ] Erik Hatcher closed LUCENE-477: --- Resolution: Invalid Yes, please bring this topic to the user list rather than JIRA. Build an index which allows me to broswe by category. - Key: LUCENE-477 URL: http://issues.apache.org/jira/browse/LUCENE-477 Project: Lucene - Java Type: Task Components: Index Versions: 1.4 Environment: JDK 1.4, Windows 2003, Tomcat 5.0.28 Reporter: Mark Dos Santos Hello there, I have a collection of documents that I am using Lucene to build an index for, and I have a JSP app to search my documents. This all works great. I believe Lucene is such an amazing product, but that's a whole other topic. Anyway, maybe it's my lack of experience in building indexes, but I am having trouble coming up with an index that mimics Verity's parametric index. My documents all have a category path (I have over 50,000 docs). A document can be at any level of the category path, and the same path can have many different documents. For example, Document x has a category path USA//New Jersey//Trenton//09890 and Document y has a category path USA//New Jersey//Trenton//09890. Basically, I would like to build an index using Lucene where, if a search brings back those two documents, I can retrieve the distinct category path for them. Of course I can loop through and build a vector with only the unique paths that come in the search results, but that obviously would take too long when I get, let's say, 1 results from my search. So the question is: how can I build an index that would provide this functionality? If anyone has any suggestions I would greatly appreciate it. Thanks, Mark
[jira] Closed: (LUCENE-324) org.apache.lucene.analysis.cn.ChineseTokenizer missing offset decrement
[ http://issues.apache.org/jira/browse/LUCENE-324?page=all ] Erik Hatcher closed LUCENE-324: --- Assign To: (was: Erik Hatcher) org.apache.lucene.analysis.cn.ChineseTokenizer missing offset decrement --- Key: LUCENE-324 URL: http://issues.apache.org/jira/browse/LUCENE-324 Project: Lucene - Java Type: Bug Components: Analysis Versions: unspecified Environment: Operating System: All Platform: All Reporter: Ray Tsang Priority: Trivial Fix For: 1.9 Attachments: ChineseTokenizerTest.java, chinese_tokenizer-missing_offset.patch Apparently, in ChineseTokenizer, offset should be decremented like bufferIndex when Character is OTHER_LETTER. This directly affects startOffset and endOffset values. This is critical to have Highlighter working correctly because Highlighter marks matching text based on these offset values. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Closed: (LUCENE-288) [patch] better support gcj compilation
[ http://issues.apache.org/jira/browse/LUCENE-288?page=all ] Erik Hatcher closed LUCENE-288: --- [patch] better support gcj compilation -- Key: LUCENE-288 URL: http://issues.apache.org/jira/browse/LUCENE-288 Project: Lucene - Java Type: Bug Components: Search Versions: 1.4 Environment: Operating System: All Platform: Other Reporter: Andi Vajda Attachments: 15411.txt In order to workaround http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15411 the attached patch is necessary. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Commented: (LUCENE-470) Refactoring and slight extension of regex testing code.
[ http://issues.apache.org/jira/browse/LUCENE-470?page=comments#action_12358455 ] Erik Hatcher commented on LUCENE-470: - Paul - I committed your changes, thanks! I did have to add String in front of the declaration of the FN variable though :) Refactoring and slight extension of regex testing code. --- Key: LUCENE-470 URL: http://issues.apache.org/jira/browse/LUCENE-470 Project: Lucene - Java Type: Test Components: Search Versions: CVS Nightly - Specify date in submission Reporter: paul.elschot Fix For: CVS Nightly - Specify date in submission Attachments: TestRegexQuery.java -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Resolved: (LUCENE-461) StandardTokenizer splitting all of Korean words into separate characters
[ http://issues.apache.org/jira/browse/LUCENE-461?page=all ] Erik Hatcher resolved LUCENE-461: - Fix Version: 1.9 Resolution: Fixed These patches have been applied, thanks! There is one thing to note: the token type emitted changes from CJK to CJ. It is possible that folks have written code that relies on the old type, but this token type is currently brittle because it is based on the JavaCC grammar definition. I view this as an acceptable break in full backwards compatibility because it is unlikely that anyone is using that token type. StandardTokenizer splitting all of Korean words into separate characters Key: LUCENE-461 URL: http://issues.apache.org/jira/browse/LUCENE-461 Project: Lucene - Java Type: Bug Components: Analysis Environment: Analyzing Korean text with Apache Lucene, esp. with StandardAnalyzer. Reporter: Cheolgoo Kang Priority: Minor Fix For: 1.9 Attachments: StandardTokenizer_KoreanWord.patch, TestStandardAnalyzer_KoreanWord.patch StandardTokenizer splits all those Korean words into separate character tokens. For example, "안녕하세요" is one Korean word that means "Hello", but StandardAnalyzer separates it into five tokens: "안", "녕", "하", "세", "요".
[jira] Closed: (LUCENE-461) StandardTokenizer splitting all of Korean words into separate characters
[ http://issues.apache.org/jira/browse/LUCENE-461?page=all ] Erik Hatcher closed LUCENE-461: --- StandardTokenizer splitting all of Korean words into separate characters Key: LUCENE-461 URL: http://issues.apache.org/jira/browse/LUCENE-461 Project: Lucene - Java Type: Bug Components: Analysis Environment: Analyzing Korean text with Apache Lucene, esp. with StandardAnalyzer. Reporter: Cheolgoo Kang Priority: Minor Fix For: 1.9 Attachments: StandardTokenizer_KoreanWord.patch, TestStandardAnalyzer_KoreanWord.patch StandardTokenizer splits all those Korean words into separate character tokens. For example, "안녕하세요" is one Korean word that means "Hello", but StandardAnalyzer separates it into five tokens: "안", "녕", "하", "세", "요".
[jira] Commented: (LUCENE-452) PrefixQuery is missing the equals() method
[ http://issues.apache.org/jira/browse/LUCENE-452?page=comments#action_12331878 ] Erik Hatcher commented on LUCENE-452: - Thank you for this patch! I'm in the process of applying it right now. Your use of the boost factor was nice to see, but points out that we have ignored it in other .equals methods (e.g. WildcardQuery). If you're interested, we'd accept patches to correct all of the other .equals methods to incorporate the boost factor :) PrefixQuery is missing the equals() method -- Key: LUCENE-452 URL: http://issues.apache.org/jira/browse/LUCENE-452 Project: Lucene - Java Type: Improvement Versions: 1.9 Reporter: Guillaume Blain Priority: Minor Attachments: PrefixQuery.java PrefixQuery inherits java.lang.Object's default equals method. This makes it hard to get tests of PrefixFilter working, or to do anything else that requires equals to work properly (insertion in a Set, etc.). The equals method should be very similar, not to say identical except for class casting, to the equals() of TermQuery.
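As a sketch of what an equals() that incorporates the boost factor might look like, here is a small standalone class. It is not the attached PrefixQuery.java: PrefixQuerySketch and its fields are hypothetical, and the choice to compare the boost through Float.floatToIntBits is an assumption made so that equals() and hashCode() stay consistent.

```java
/** Hypothetical query class; shows equals()/hashCode() that take the boost into account. */
class PrefixQuerySketch {
    private final String field;
    private final String prefix;
    private float boost = 1.0f;

    PrefixQuerySketch(String field, String prefix) {
        this.field = field;
        this.prefix = prefix;
    }

    void setBoost(float boost) { this.boost = boost; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof PrefixQuerySketch)) return false;
        PrefixQuerySketch other = (PrefixQuerySketch) o;
        // Compare the boost bit-for-bit so the result agrees with hashCode().
        return Float.floatToIntBits(boost) == Float.floatToIntBits(other.boost)
            && field.equals(other.field)
            && prefix.equals(other.prefix);
    }

    @Override
    public int hashCode() {
        int h = field.hashCode();
        h = 31 * h + prefix.hashCode();
        h = 31 * h + Float.floatToIntBits(boost);
        return h;
    }
}
```

Two instances with the same field, prefix, and boost then collapse to one entry in a HashSet, while changing only the boost makes them distinct.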
[jira] Closed: (LUCENE-444) StandardTokenizer loses Korean characters
[ http://issues.apache.org/jira/browse/LUCENE-444?page=all ] Erik Hatcher closed LUCENE-444: --- I'm closing this issue... but some unit tests would be nice to go along with this too, eventually :) StandardTokenizer loses Korean characters - Key: LUCENE-444 URL: http://issues.apache.org/jira/browse/LUCENE-444 Project: Lucene - Java Type: Bug Components: Analysis Reporter: Cheolgoo Kang Priority: Minor Fix For: 1.9 Attachments: StandardTokenizer_Korean.patch While using StandardAnalyzer, esp. StandardTokenizer, with a Korean text stream, StandardTokenizer ignores the Korean characters. This is because the definition of the CJK token in the StandardTokenizer.jj JavaCC file does not cover the range of Korean syllables described in the Unicode character map. This patch adds one line, 0xAC00~0xD7AF (the Korean syllables range), to the StandardTokenizer.jj code.
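To sanity-check the range quoted in the description, here is a small sketch that tests characters against 0xAC00~0xD7AF directly and against the JDK's own Unicode block table. HangulRange is a hypothetical helper and has nothing to do with the JavaCC grammar itself; it only shows that the quoted range lines up with Character.UnicodeBlock.HANGUL_SYLLABLES.

```java
/** Hypothetical helper for checking the Hangul syllables range quoted in the issue. */
class HangulRange {
    /** True if c falls in 0xAC00..0xD7AF, the range added by the patch. */
    static boolean isHangulSyllable(char c) {
        return c >= '\uAC00' && c <= '\uD7AF';
    }

    /** Same question answered by the JDK's Unicode block table. */
    static boolean isHangulSyllableByBlock(char c) {
        return Character.UnicodeBlock.of(c) == Character.UnicodeBlock.HANGUL_SYLLABLES;
    }

    /** Counts the chars of s that fall inside the range. */
    static int countHangul(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            if (isHangulSyllable(s.charAt(i))) n++;
        }
        return n;
    }
}
```

A tokenizer grammar whose CJK character class omits this range has no rule that matches these characters, which is consistent with the behaviour reported here.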
[jira] Closed: (LUCENE-429) Little improvement for SimpleHTMLEncoder
[ http://issues.apache.org/jira/browse/LUCENE-429?page=all ] Erik Hatcher closed LUCENE-429: --- Resolution: Fixed Little improvement for SimpleHTMLEncoder Key: LUCENE-429 URL: http://issues.apache.org/jira/browse/LUCENE-429 Project: Lucene - Java Type: Improvement Components: Examples Versions: CVS Nightly - Specify date in submission Environment: Operating System: other Platform: Other Reporter: Stefan Wachter Priority: Minor The SimpleHTMLEncoder could be improved slightly: all characters with code >= 128 should be encoded as character entities. The reason is that the encoder does not know the encoding that is used for the response. Therefore it is safer to encode all characters beyond ASCII as character entities. Here is the necessary modification of SimpleHTMLEncoder: default: if (c < 128) { result.append(c); } else { result.append("&#").append((int)c).append(";"); }
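For context, here is a minimal standalone sketch of an encoder with the proposed default branch, reading the modification in the description as "append the char if it is below 128, otherwise emit a numeric character entity". HtmlEncodeSketch is a hypothetical class, not the SimpleHTMLEncoder source, and the four named-entity cases are an assumption about what a simple HTML encoder typically escapes.

```java
/** Hypothetical encoder; only the default branch follows the change described in the issue. */
class HtmlEncodeSketch {
    static String encode(String text) {
        StringBuilder result = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '"': result.append("&quot;"); break;
                case '&': result.append("&amp;"); break;
                case '<': result.append("&lt;"); break;
                case '>': result.append("&gt;"); break;
                default:
                    if (c < 128) {
                        result.append(c); // plain ASCII is safe in any response encoding
                    } else {
                        // Numeric entity, independent of the response encoding.
                        // Caveat: works per char, so surrogate pairs are not combined.
                        result.append("&#").append((int) c).append(";");
                    }
            }
        }
        return result.toString();
    }
}
```

The benefit is that the output is pure ASCII, so it renders the same whether the page is eventually served as ISO-8859-1, UTF-8, or anything else ASCII-compatible.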
[jira] Resolved: (LUCENE-437) SnowballFilter loses token position offset
[ http://issues.apache.org/jira/browse/LUCENE-437?page=all ] Erik Hatcher resolved LUCENE-437: - Fix Version: unspecified Resolution: Fixed Yonik - thanks for the patch! It has been applied. SnowballFilter loses token position offset -- Key: LUCENE-437 URL: http://issues.apache.org/jira/browse/LUCENE-437 Project: Lucene - Java Type: Bug Components: Analysis Versions: CVS Nightly - Specify date in submission Reporter: Yonik Seeley Assignee: Erik Hatcher Fix For: unspecified Attachments: yonik_snowballfix.txt SnowballFilter doesn't set the token position increment (and thus it defaults to 1). This also affects SnowballAnalyzer since it uses SnowballFilter.
[jira] Updated: (LUCENE-429) Little improvement for SimpleHTMLEncoder
[ http://issues.apache.org/jira/browse/LUCENE-429?page=all ] Erik Hatcher updated LUCENE-429: Bugzilla Id: (was: 36333) Component: Examples (was: Other) Description: The SimpleHTMLEncoder could be improved slightly: all characters with code >= 128 should be encoded as character entities. The reason is that the encoder does not know the encoding that is used for the response. Therefore it is safer to encode all characters beyond ASCII as character entities. Here is the necessary modification of SimpleHTMLEncoder: default: if (c < 128) { result.append(c); } else { result.append("&#").append((int)c).append(";"); } was: The SimpleHTMLEncoder could be improved slightly: all characters with code >= 128 should be encoded as character entities. The reason is that the encoder does not know the encoding that is used for the response. Therefore it is safer to encode all characters beyond ASCII as character entities. Here is the necessary modification of SimpleHTMLEncoder: default: if (c < 128) { result.append(c); } else { result.append("&#").append((int)c).append(";"); } Environment: Operating System: other Platform: Other was: Operating System: other Platform: Other Assign To: (was: Lucene Developers) Little improvement for SimpleHTMLEncoder Key: LUCENE-429 URL: http://issues.apache.org/jira/browse/LUCENE-429 Project: Lucene - Java Type: Improvement Components: Examples Versions: CVS Nightly - Specify date in submission Environment: Operating System: other Platform: Other Reporter: Stefan Wachter Priority: Minor The SimpleHTMLEncoder could be improved slightly: all characters with code >= 128 should be encoded as character entities. The reason is that the encoder does not know the encoding that is used for the response. Therefore it is safer to encode all characters beyond ASCII as character entities. Here is the necessary modification of SimpleHTMLEncoder: default: if (c < 128) { result.append(c); } else { result.append("&#").append((int)c).append(";"); }
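The suggestion in LUCENE-429 can be sketched as a small standalone encoder. The class name `HtmlEncodeSketch` is hypothetical, and the four explicit cases for the quote, ampersand, and angle brackets are an assumption about what the surrounding switch handles; only the `default` branch comes from the report. Unlike the snippet in the issue, this sketch walks code points rather than `char` values so that characters outside the Basic Multilingual Plane yield a single entity.

```java
// Hypothetical standalone sketch; not the actual SimpleHTMLEncoder source.
final class HtmlEncodeSketch {
    static String encode(String plain) {
        StringBuilder result = new StringBuilder(plain.length());
        int i = 0;
        while (i < plain.length()) {
            int cp = plain.codePointAt(i);
            switch (cp) {
                // Assumed special cases for HTML metacharacters.
                case '"': result.append("&quot;"); break;
                case '&': result.append("&amp;"); break;
                case '<': result.append("&lt;"); break;
                case '>': result.append("&gt;"); break;
                default:
                    if (cp < 128) {
                        // Plain ASCII is safe in any response encoding.
                        result.append((char) cp);
                    } else {
                        // Everything beyond ASCII becomes a numeric character entity.
                        result.append("&#").append(cp).append(';');
                    }
            }
            // Advance by one or two chars depending on the code point.
            i += Character.charCount(cp);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("caf\u00e9 <b>"));
    }
}
```

Because the output is pure ASCII, it renders the same regardless of which charset the response declares, which is the motivation given in the report.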
[jira] Commented: (LUCENE-438) add Token.setTermText(), remove final
[ http://issues.apache.org/jira/browse/LUCENE-438?page=comments#action_12330239 ] Erik Hatcher commented on LUCENE-438: - Yes, please elaborate on why you need to subclass Token. add Token.setTermText(), remove final - Key: LUCENE-438 URL: http://issues.apache.org/jira/browse/LUCENE-438 Project: Lucene - Java Type: Improvement Versions: CVS Nightly - Specify date in submission Reporter: Yonik Seeley Priority: Minor Attachments: yonik_Token.txt The Token class should be more friendly to classes not in its package: 1) add setTermText() 2) remove final from class and toString() 3) add clone() Support for (1): TokenFilters in the same package as Token are able to do things like t.termText = t.termText.toLowerCase(); which is more efficient, but more importantly less error-prone. Without the ability to change *only* the term text, a new Token must be created, and one must remember to set all the properties correctly. This exact issue caused this bug: http://issues.apache.org/jira/browse/LUCENE-437 Support for (2): Removing final allows one to subclass Token. I didn't see any performance impact after removing final. I can go into more detail on why I want to subclass Token if anyone is interested. Support for (3): - support for a synonym TokenFilter, where one needs to make two tokens from one (same arguments that support (1), and especially important if the instance is a subclass of Token).
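To make points (1) and (3) of the LUCENE-438 proposal concrete, here is a rough sketch of how a term-text setter and `clone()` could be used together. `MutableTokenSketch` and `SynonymSketch` are hypothetical classes written for this illustration, not Lucene's Token or the attached patch. Giving the synonym a position increment of 0 is an assumption based on the usual convention for stacking tokens at the same position.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical non-final token with a term-text setter and clone(); NOT Lucene's Token.
class MutableTokenSketch implements Cloneable {
    private String termText;
    private final int startOffset;
    private final int endOffset;
    private int positionIncrement = 1;

    MutableTokenSketch(String termText, int startOffset, int endOffset) {
        this.termText = termText;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
    }

    String termText() { return termText; }
    int startOffset() { return startOffset; }
    int endOffset() { return endOffset; }
    int getPositionIncrement() { return positionIncrement; }

    // Point (1): change only the term text; every other property is untouched.
    void setTermText(String termText) { this.termText = termText; }

    void setPositionIncrement(int positionIncrement) {
        this.positionIncrement = positionIncrement;
    }

    // Point (3): copy all properties, including any added by a subclass.
    @Override
    public MutableTokenSketch clone() {
        try {
            return (MutableTokenSketch) super.clone();
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // cannot happen: the class is Cloneable
        }
    }
}

final class SynonymSketch {
    // Lowercases the token in place, then emits a synonym at the same position.
    static List<MutableTokenSketch> expand(MutableTokenSketch token, String synonym) {
        token.setTermText(token.termText().toLowerCase(Locale.ROOT));
        MutableTokenSketch extra = token.clone();
        extra.setTermText(synonym);
        extra.setPositionIncrement(0); // assumed convention: same position as the original
        return Arrays.asList(token, extra);
    }

    public static void main(String[] args) {
        for (MutableTokenSketch t : expand(new MutableTokenSketch("Fast", 0, 4), "quick")) {
            System.out.println(t.termText() + " +" + t.getPositionIncrement());
        }
    }
}
```

Neither step constructs a token by hand, so offsets and any other properties carry over without the filter having to know about them.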