[jira] Assigned: (LUCENE-1809) highlight-vs-vector-highlight.alg is unfair
[ https://issues.apache.org/jira/browse/LUCENE-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless reassigned LUCENE-1809:
--

Assignee: Michael McCandless

> highlight-vs-vector-highlight.alg is unfair
> ---
>
> Key: LUCENE-1809
> URL: https://issues.apache.org/jira/browse/LUCENE-1809
> Project: Lucene - Java
> Issue Type: Bug
> Components: contrib/benchmark
> Affects Versions: 2.9
> Reporter: Koji Sekiguchi
> Assignee: Michael McCandless
> Priority: Trivial
> Attachments: LUCENE-1809.patch, LUCENE-1809.patch
>
>
> highlight-vs-vector-highlight.alg uses EnwikiQueryMaker which makes
> SpanQueries, but FastVectorHighlighter simply ignores SpanQueries.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1809) highlight-vs-vector-highlight.alg is unfair
[ https://issues.apache.org/jira/browse/LUCENE-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743666#action_12743666 ]

Michael McCandless commented on LUCENE-1809:

Patch looks good, thanks Koji. I'll commit shortly!
[jira] Resolved: (LUCENE-1809) highlight-vs-vector-highlight.alg is unfair
[ https://issues.apache.org/jira/browse/LUCENE-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-1809.

Resolution: Fixed
Fix Version/s: 2.9

Thanks Koji!
[jira] Created: (LUCENE-1811) TestIndexReaderReopen nightly build failure
TestIndexReaderReopen nightly build failure
---

Key: LUCENE-1811
URL: https://issues.apache.org/jira/browse/LUCENE-1811
Project: Lucene - Java
Issue Type: Bug
Components: Index
Affects Versions: 2.9
Reporter: Michael McCandless
Priority: Minor
Fix For: 2.9

An interesting failure in last night's build (http://hudson.zones.apache.org/hudson/job/Lucene-trunk/920).

I think the root cause was the AIOOB exception... all the "lock obtain timed out" exceptions look like they cascaded.

{code}
[junit] Testsuite: org.apache.lucene.index.TestIndexReaderReopen
[junit] Lock obtain timed out: org.apache.lucene.store.singleinstancel...@6ac615: write.lock)
[junit] Tests run: 15, Failures: 1, Errors: 0, Time elapsed: 31.087 sec
[junit]
[junit] - Standard Output ---
[junit] java.lang.ArrayIndexOutOfBoundsException: Array index out of range: 148
[junit]     at org.apache.lucene.util.BitVector.getAndSet(BitVector.java:74)
[junit]     at org.apache.lucene.index.SegmentReader.doDelete(SegmentReader.java:908)
[junit]     at org.apache.lucene.index.IndexReader.deleteDocument(IndexReader.java:1122)
[junit]     at org.apache.lucene.index.DirectoryReader.doDelete(DirectoryReader.java:521)
[junit]     at org.apache.lucene.index.IndexReader.deleteDocument(IndexReader.java:1122)
[junit]     at org.apache.lucene.index.TestIndexReaderReopen$8.modifyIndex(TestIndexReaderReopen.java:638)
[junit]     at org.apache.lucene.index.TestIndexReaderReopen.refreshReader(TestIndexReaderReopen.java:840)
[junit]     at org.apache.lucene.index.TestIndexReaderReopen.access$400(TestIndexReaderReopen.java:47)
[junit]     at org.apache.lucene.index.TestIndexReaderReopen$9.run(TestIndexReaderReopen.java:681)
[junit]     at org.apache.lucene.index.TestIndexReaderReopen$ReaderThread.run(TestIndexReaderReopen.java:822)
[junit] org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: org.apache.lucene.store.singleinstancel...@88d319: write.lock
[junit]     at org.apache.lucene.store.Lock.obtain(Lock.java:85)
[junit]     at org.apache.lucene.index.DirectoryReader.acquireWriteLock(DirectoryReader.java:666)
[junit]     at org.apache.lucene.index.IndexReader.setNorm(IndexReader.java:994)
[junit]     at org.apache.lucene.index.IndexReader.setNorm(IndexReader.java:1020)
[junit]     at org.apache.lucene.index.TestIndexReaderReopen$8.modifyIndex(TestIndexReaderReopen.java:634)
[junit]     at org.apache.lucene.index.TestIndexReaderReopen.refreshReader(TestIndexReaderReopen.java:840)
[junit]     at org.apache.lucene.index.TestIndexReaderReopen.access$400(TestIndexReaderReopen.java:47)
[junit]     at org.apache.lucene.index.TestIndexReaderReopen$9.run(TestIndexReaderReopen.java:681)
[junit]     at org.apache.lucene.index.TestIndexReaderReopen$ReaderThread.run(TestIndexReaderReopen.java:822)
...
[junit] - ---
[junit] Testcase: testThreadSafety(org.apache.lucene.index.TestIndexReaderReopen): FAILED
[junit] Error occurred in thread Thread-36:
[junit] Lock obtain timed out: org.apache.lucene.store.singleinstancel...@6ac615: write.lock
[junit] junit.framework.AssertionFailedError: Error occurred in thread Thread-36:
[junit] Lock obtain timed out: org.apache.lucene.store.singleinstancel...@6ac615: write.lock
[junit]     at org.apache.lucene.index.TestIndexReaderReopen.testThreadSafety(TestIndexReaderReopen.java:764)
[junit]
[junit]
{code}
Re: Build failed in Hudson: Lucene-trunk #920
This failure looks real. We hit a spooky AIOOBE in TestIndexReaderReopen.testThreadSafety. I've opened https://issues.apache.org/jira/browse/LUCENE-1811 to track it. Mike On Fri, Aug 14, 2009 at 11:16 PM, Apache Hudson Server wrote: > See http://hudson.zones.apache.org/hudson/job/Lucene-trunk/920/changes > > Changes: > > [uschindler] LUCENE-1801: All Tokenizers/TokenStreams that are source of > tokens call AttributeSource.clearAttributes() first. Made Token.clear() > consistent to AttributeImpl (clear everything) > > [gsingers] LUCENE-1790: pass in position information for scoring > > [ehatcher] LUCENE-1806: add args to test macro (Jason Rutherglen via ehatcher) > > [mikemccand] LUCENE-1807: allow passing the Map of field name -> analyzer to > PerFieldAnalyzerWrapper > > -- > [...truncated 16851 lines...] > [junit] Testsuite: org.apache.lucene.index.store.TestRAMDirectory > [junit] Tests run: 6, Failures: 0, Errors: 0, Time elapsed: 3.36 sec > [junit] > [junit] Testsuite: org.apache.lucene.queryParser.TestMultiAnalyzer > [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.41 sec > [junit] > [junit] Testsuite: org.apache.lucene.queryParser.TestMultiFieldQueryParser > [junit] Tests run: 9, Failures: 0, Errors: 0, Time elapsed: 2.281 sec > [junit] > [junit] Testsuite: org.apache.lucene.queryParser.TestQueryParser > [junit] Tests run: 26, Failures: 0, Errors: 0, Time elapsed: 1.582 sec > [junit] > [junit] - Standard Output --- > [junit] Result: (fieldX:x fieldy:)^2.0 > [junit] - --- > [junit] Testsuite: org.apache.lucene.search.TestBoolean2 > [junit] Tests run: 11, Failures: 0, Errors: 0, Time elapsed: 12.37 sec > [junit] > [junit] Testsuite: org.apache.lucene.search.TestBooleanMinShouldMatch > [junit] Tests run: 15, Failures: 0, Errors: 0, Time elapsed: 12.08 sec > [junit] > [junit] Testsuite: org.apache.lucene.search.TestBooleanOr > [junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 2.15 sec > [junit] > [junit] Testsuite: 
org.apache.lucene.search.TestBooleanPrefixQuery > [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.572 sec > [junit] > [junit] Testsuite: org.apache.lucene.search.TestBooleanQuery > [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.339 sec > [junit] > [junit] Testsuite: org.apache.lucene.search.TestBooleanScorer > [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.773 sec > [junit] > [junit] Testsuite: org.apache.lucene.search.TestCachingWrapperFilter > [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.503 sec > [junit] > [junit] Testsuite: org.apache.lucene.search.TestComplexExplanations > [junit] Tests run: 22, Failures: 0, Errors: 0, Time elapsed: 3.39 sec > [junit] > [junit] Testsuite: > org.apache.lucene.search.TestComplexExplanationsOfNonMatches > [junit] Tests run: 22, Failures: 0, Errors: 0, Time elapsed: 0.895 sec > [junit] > [junit] Testsuite: org.apache.lucene.search.TestConstantScoreRangeQuery > [junit] Tests run: 11, Failures: 0, Errors: 0, Time elapsed: 10.066 sec > [junit] > [junit] Testsuite: org.apache.lucene.search.TestCustomSearcherSort > [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 2.433 sec > [junit] > [junit] Testsuite: org.apache.lucene.search.TestDateFilter > [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 1.038 sec > [junit] > [junit] Testsuite: org.apache.lucene.search.TestDateSort > [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.841 sec > [junit] > [junit] Testsuite: org.apache.lucene.search.TestDisjunctionMaxQuery > [junit] Tests run: 10, Failures: 0, Errors: 0, Time elapsed: 2.333 sec > [junit] > [junit] Testsuite: org.apache.lucene.search.TestDocBoost > [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.626 sec > [junit] > [junit] Testsuite: org.apache.lucene.search.TestExplanations > [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.666 sec > [junit] > [junit] Testsuite: org.apache.lucene.search.TestExtendedFieldCache > 
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 1.439 sec > [junit] > [junit] Testsuite: org.apache.lucene.search.TestFilteredQuery > [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 2.108 sec > [junit] > [junit] Testsuite: org.apache.lucene.search.TestFilteredSearch > [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 1.637 sec > [junit] > [junit] Testsuite: org.apache.lucene.search.TestFuzzyQuery > [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.924 sec > [junit] > [junit] Testsuite: org.apache.lucene.search.TestMatchAllDocsQuery > [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.946 sec > [junit] >
[jira] Commented: (LUCENE-1792) new QueryParser fails to set AUTO REWRITE for multi-term queries
[ https://issues.apache.org/jira/browse/LUCENE-1792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743668#action_12743668 ]

Michael McCandless commented on LUCENE-1792:

On quick glance the patch looks good, but I'm not going to have enough time to look more thoroughly!

I think you used "svn move" to rename PrefixWildcardQueryNodeProcessor -> WildcardQueryNodeProcessor? (because "patch" fails to apply the changes).

> new QueryParser fails to set AUTO REWRITE for multi-term queries
>
>
> Key: LUCENE-1792
> URL: https://issues.apache.org/jira/browse/LUCENE-1792
> Project: Lucene - Java
> Issue Type: Bug
> Components: contrib/*
> Affects Versions: 2.9
> Reporter: Michael McCandless
> Assignee: Michael Busch
> Priority: Minor
> Fix For: 2.9
>
> Attachments: LUCENE-1792.patch, LUCENE-1792.patch,
> removal_of_wildcard_and_prefix_detection_from_the_syntaxparser.patch
>
>
> The old QueryParser defaults to constant score rewrite for
> Prefix,Fuzzy,Wildcard,TermRangeQuery, but the new one seems not to.
[jira] Commented: (LUCENE-1811) TestIndexReaderReopen nightly build failure
[ https://issues.apache.org/jira/browse/LUCENE-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743715#action_12743715 ]

Michael McCandless commented on LUCENE-1811:

I believe this is just a thread-safety bug in the test. It's deleting by a fixed docID, but, depending on how threads are scheduled, that docID may be invalid. I'll commit a simple fix shortly...
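The comment on LUCENE-1811 above attributes the AIOOBE to deleting by a fixed docID that may not exist in the reader a given thread happens to hold. The following is only a minimal sketch of that failure mode and of one possible guard; it is not the fix that was committed. `FixedDocIdSketch`, `Deletes`, and `deleteIfValid` are hypothetical names, and the fixed-size `boolean[]` is an assumed stand-in for a reader's deletion bits, not Lucene's BitVector.

```java
/** Hypothetical illustration only; none of these names come from Lucene. */
class FixedDocIdSketch {

    /** Assumed stand-in for a reader's fixed-size deletion bits. */
    static final class Deletes {
        private final boolean[] deleted;

        Deletes(int maxDoc) {
            deleted = new boolean[maxDoc];
        }

        int maxDoc() {
            return deleted.length;
        }

        /** Marks docID deleted; throws ArrayIndexOutOfBoundsException when docID >= maxDoc. */
        boolean getAndSet(int docID) {
            boolean was = deleted[docID];
            deleted[docID] = true;
            return was;
        }
    }

    /** One possible guard: skip the delete when docID is outside this reader's range. */
    static boolean deleteIfValid(Deletes d, int docID) {
        if (docID < 0 || docID >= d.maxDoc()) {
            return false;
        }
        d.getAndSet(docID);
        return true;
    }

    public static void main(String[] args) {
        Deletes small = new Deletes(100); // a reader that sees only 100 docs
        Deletes large = new Deletes(200); // a reader that sees 200 docs
        System.out.println(deleteIfValid(small, 148)); // docID 148 is out of range here
        System.out.println(deleteIfValid(large, 148)); // docID 148 is in range here
    }
}
```

In a multi-threaded test the range check and the delete would have to run against the same reader instance; checking one reader and deleting through another would reintroduce the race.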
[jira] Resolved: (LUCENE-1811) TestIndexReaderReopen nightly build failure
[ https://issues.apache.org/jira/browse/LUCENE-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-1811.

Resolution: Fixed
[jira] Commented: (LUCENE-1794) implement reusableTokenStream for all contrib analyzers
[ https://issues.apache.org/jira/browse/LUCENE-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743721#action_12743721 ]

Yonik Seeley commented on LUCENE-1794:
--

Patch looks good - do you plan on committing soon Robert?

> implement reusableTokenStream for all contrib analyzers
> ---
>
> Key: LUCENE-1794
> URL: https://issues.apache.org/jira/browse/LUCENE-1794
> Project: Lucene - Java
> Issue Type: Improvement
> Components: contrib/analyzers
> Reporter: Robert Muir
> Assignee: Robert Muir
> Priority: Minor
> Fix For: 2.9
>
> Attachments: LUCENE-1794.patch, LUCENE-1794.patch, LUCENE-1794.patch,
> LUCENE-1794.patch, LUCENE-1794.patch, LUCENE-1794.patch
>
>
> most contrib analyzers do not have an impl for reusableTokenStream
> regardless of how expensive the back compat reflection is for indexing speed,
> I think we should do this to mitigate any performance costs. hey, overall it
> might even be an improvement!
> the back compat code for non-final analyzers is already in place so this is
> easy money in my opinion.
[jira] Created: (LUCENE-1812) Static index pruning by in-document term frequency (Carmel pruning)
Static index pruning by in-document term frequency (Carmel pruning)
---

Key: LUCENE-1812
URL: https://issues.apache.org/jira/browse/LUCENE-1812
Project: Lucene - Java
Issue Type: New Feature
Components: contrib/*
Affects Versions: 2.9
Reporter: Andrzej Bialecki

This module provides tools to produce a subset of input indexes by removing postings data for those terms where their in-document frequency is below a specified threshold. The net effect of this processing is a much smaller index that for common types of queries returns nearly identical top-N results as compared with the original index, but with increased performance.

Optionally, stored values and term vectors can also be removed. This functionality is largely independent, so it can be used without term pruning (when term freq. threshold is set to 1).

As the threshold value increases, the total size of the index decreases, search performance increases, and recall decreases (i.e. search quality deteriorates). NOTE: especially phrase recall deteriorates significantly at higher threshold values.

Primary purpose of this class is to produce small first-tier indexes that fit completely in RAM, and store these indexes using IndexWriter.addIndexes(IndexReader[]). Usually the performance of this class will not be sufficient to use the resulting index view for on-the-fly pruning and searching.

NOTE: If the input index is optimized (i.e. doesn't contain deletions) then the index produced via IndexWriter.addIndexes(IndexReader[]) will preserve internal document id-s so that they are in sync with the original index. This means that all other auxiliary information not necessary for first-tier processing, such as some stored fields, can also be removed, to be quickly retrieved on-demand from the original index using the same internal document id.

Threshold values can be specified globally (for terms in all fields) using defaultThreshold parameter, and can be overridden using per-field or per-term values supplied in a thresholds map. Keys in this map are either field names, or terms in field:text format. The precedence of these values is the following: first a per-term threshold is used if present, then per-field threshold if present, and finally the default threshold.

A command-line tool (PruningTool) is provided for convenience. At this moment it doesn't support all functionality available through the API.
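The threshold precedence described in LUCENE-1812 (per-term, then per-field, then default) can be sketched as a plain map lookup. This is a hedged illustration, not code from pruning.patch: the class `ThresholdLookup` and the method `thresholdFor` are hypothetical names, and the only details taken from the issue text are the defaultThreshold parameter, the thresholds map, and the field:text key format.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper illustrating the lookup order; not the class from the patch. */
class ThresholdLookup {
    private final int defaultThreshold;
    // Keys are either a field name ("body") or a term in field:text form ("body:lucene").
    private final Map<String, Integer> thresholds;

    ThresholdLookup(int defaultThreshold, Map<String, Integer> thresholds) {
        this.defaultThreshold = defaultThreshold;
        this.thresholds = thresholds;
    }

    /** Per-term value wins, then the per-field value, then the global default. */
    int thresholdFor(String field, String text) {
        Integer perTerm = thresholds.get(field + ":" + text);
        if (perTerm != null) {
            return perTerm;
        }
        Integer perField = thresholds.get(field);
        if (perField != null) {
            return perField;
        }
        return defaultThreshold;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("body", 3);        // per-field override
        map.put("body:lucene", 1); // per-term override
        ThresholdLookup lookup = new ThresholdLookup(2, map);
        System.out.println(lookup.thresholdFor("body", "lucene")); // per-term entry applies
        System.out.println(lookup.thresholdFor("body", "index"));  // per-field entry applies
        System.out.println(lookup.thresholdFor("title", "index")); // default applies
    }
}
```

A single map with two key shapes keeps configuration simple, at the cost of ambiguity if a field name itself contains a colon; the sketch does not try to resolve that case.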
[jira] Updated: (LUCENE-1812) Static index pruning by in-document term frequency (Carmel pruning)
[ https://issues.apache.org/jira/browse/LUCENE-1812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrzej Bialecki updated LUCENE-1812:
--

Attachment: pruning.patch

Patch relative to the current trunk.
[jira] Commented: (LUCENE-1794) implement reusableTokenStream for all contrib analyzers
[ https://issues.apache.org/jira/browse/LUCENE-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743727#action_12743727 ]

Robert Muir commented on LUCENE-1794:
-

Yonik, thanks for reviewing it.

I wanted to wait a bit and see if Shai wanted to give a crack at ReusingAnalyzer, but we could do that as a separate issue and then refactor code to use it?
[jira] Commented: (LUCENE-1794) implement reusableTokenStream for all contrib analyzers
[ https://issues.apache.org/jira/browse/LUCENE-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743728#action_12743728 ]

Yonik Seeley commented on LUCENE-1794:
--

Yes, I think we should just commit this now - the most important part is that people can create their own reusable tokenstreams from Lucene's tokenizers and token filters. Making an easier to use ReusingAnalyzer can be a separate issue.
[jira] Commented: (LUCENE-1794) implement reusableTokenStream for all contrib analyzers
[ https://issues.apache.org/jira/browse/LUCENE-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743730#action_12743730 ]

Robert Muir commented on LUCENE-1794:
-

Yonik, ok, I will look over the patch again, but I plan on committing this tonight or tomorrow if nothing comes up.
[jira] Created: (LUCENE-1813) Add option to ReverseStringFilter to mark reversed tokens
Add option to ReverseStringFilter to mark reversed tokens - Key: LUCENE-1813 URL: https://issues.apache.org/jira/browse/LUCENE-1813 Project: Lucene - Java Issue Type: Improvement Components: contrib/analyzers Affects Versions: 2.9 Reporter: Andrzej Bialecki Attachments: reverseMark.patch This patch implements additional functionality in the filter to "mark" reversed tokens with a special marker character (Unicode 0001). This is useful when indexing both straight and reversed tokens (e.g. to implement efficient leading wildcards search). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-1813) Add option to ReverseStringFilter to mark reversed tokens
[ https://issues.apache.org/jira/browse/LUCENE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki updated LUCENE-1813: -- Attachment: reverseMark.patch Patch and unit tests. > Add option to ReverseStringFilter to mark reversed tokens > - > > Key: LUCENE-1813 > URL: https://issues.apache.org/jira/browse/LUCENE-1813 > Project: Lucene - Java > Issue Type: Improvement > Components: contrib/analyzers >Affects Versions: 2.9 >Reporter: Andrzej Bialecki > Attachments: reverseMark.patch > > > This patch implements additional functionality in the filter to "mark" > reversed tokens with a special marker character (Unicode 0001). This is > useful when indexing both straight and reversed tokens (e.g. to implement > efficient leading wildcards search). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Assigned: (LUCENE-1813) Add option to ReverseStringFilter to mark reversed tokens
[ https://issues.apache.org/jira/browse/LUCENE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir reassigned LUCENE-1813: --- Assignee: Robert Muir > Add option to ReverseStringFilter to mark reversed tokens > - > > Key: LUCENE-1813 > URL: https://issues.apache.org/jira/browse/LUCENE-1813 > Project: Lucene - Java > Issue Type: Improvement > Components: contrib/analyzers >Affects Versions: 2.9 >Reporter: Andrzej Bialecki >Assignee: Robert Muir > Attachments: reverseMark.patch > > > This patch implements additional functionality in the filter to "mark" > reversed tokens with a special marker character (Unicode 0001). This is > useful when indexing both straight and reversed tokens (e.g. to implement > efficient leading wildcards search). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1813) Add option to ReverseStringFilter to mark reversed tokens
[ https://issues.apache.org/jira/browse/LUCENE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743737#action_12743737 ] Robert Muir commented on LUCENE-1813: - The corresponding Solr task (SOLR-1321) is marked for version 1.4. Does anyone oppose putting this one in 2.9? > Add option to ReverseStringFilter to mark reversed tokens > - > > Key: LUCENE-1813 > URL: https://issues.apache.org/jira/browse/LUCENE-1813 > Project: Lucene - Java > Issue Type: Improvement > Components: contrib/analyzers >Affects Versions: 2.9 >Reporter: Andrzej Bialecki >Assignee: Robert Muir > Attachments: reverseMark.patch > > > This patch implements additional functionality in the filter to "mark" > reversed tokens with a special marker character (Unicode 0001). This is > useful when indexing both straight and reversed tokens (e.g. to implement > efficient leading wildcards search). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1813) Add option to ReverseStringFilter to mark reversed tokens
[ https://issues.apache.org/jira/browse/LUCENE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743740#action_12743740 ] Robert Muir commented on LUCENE-1813: - andrzej, the reverse() methods are public, can you supply default impls (withMark=false) just in the case that someone is using them? alternatively, maybe the reverse() methods could stay the same, and the marking could happen in incrementToken() ? > Add option to ReverseStringFilter to mark reversed tokens > - > > Key: LUCENE-1813 > URL: https://issues.apache.org/jira/browse/LUCENE-1813 > Project: Lucene - Java > Issue Type: Improvement > Components: contrib/analyzers >Affects Versions: 2.9 >Reporter: Andrzej Bialecki >Assignee: Robert Muir > Attachments: reverseMark.patch > > > This patch implements additional functionality in the filter to "mark" > reversed tokens with a special marker character (Unicode 0001). This is > useful when indexing both straight and reversed tokens (e.g. to implement > efficient leading wildcards search). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1813) Add option to ReverseStringFilter to mark reversed tokens
[ https://issues.apache.org/jira/browse/LUCENE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743743#action_12743743 ] Andrzej Bialecki commented on LUCENE-1813: --- Either way is fine with me. To preserve the public API I think it's better to move this marking logic to incrementToken(). I'll prepare an updated patch. > Add option to ReverseStringFilter to mark reversed tokens > - > > Key: LUCENE-1813 > URL: https://issues.apache.org/jira/browse/LUCENE-1813 > Project: Lucene - Java > Issue Type: Improvement > Components: contrib/analyzers >Affects Versions: 2.9 >Reporter: Andrzej Bialecki >Assignee: Robert Muir > Attachments: reverseMark.patch > > > This patch implements additional functionality in the filter to "mark" > reversed tokens with a special marker character (Unicode 0001). This is > useful when indexing both straight and reversed tokens (e.g. to implement > efficient leading wildcards search). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
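To make the marking idea discussed above concrete, here is a minimal standalone sketch. It is not the attached patch and uses no Lucene classes: ReverseMarkSketch, process() and matchesLeadingWildcard() are hypothetical names, and placing the marker at the front of the reversed token is an assumption, since the thread does not state the position. It keeps reverse() unchanged and applies the marker in a separate per-token step, which mirrors the suggestion to do the marking in incrementToken().

```java
// Standalone sketch: plain strings only, no Lucene classes involved.
final class ReverseMarkSketch {
    // Marker character named in the issue description (Unicode 0001).
    static final char MARKER = '\u0001';

    // Plain reversal; stands in for the unchanged public reverse() methods.
    static String reverse(String term) {
        return new StringBuilder(term).reverse().toString();
    }

    // Hypothetical per-token step: reverse, then optionally prepend the marker.
    // Prepending (rather than appending) is an assumption of this sketch.
    static String process(String term, boolean withMark) {
        String reversed = reverse(term);
        return withMark ? MARKER + reversed : reversed;
    }

    // A leading-wildcard query such as *ing becomes a prefix test against
    // the marked, reversed form of the indexed term.
    static boolean matchesLeadingWildcard(String indexedTerm, String suffix) {
        String markedReversed = process(indexedTerm, true);
        String prefix = MARKER + reverse(suffix);
        return markedReversed.startsWith(prefix);
    }

    public static void main(String[] args) {
        System.out.println(matchesLeadingWildcard("indexing", "ing"));
        System.out.println(matchesLeadingWildcard("indexed", "ing"));
    }
}
```

Because marked tokens all begin with the marker character, they cannot collide with straight tokens stored in the same field, which is what makes indexing both forms together workable.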
[jira] Commented: (LUCENE-1790) Add Boosting Function Term Query and Some Payload Query refactorings
[ https://issues.apache.org/jira/browse/LUCENE-1790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743747#action_12743747 ] Mark Miller commented on LUCENE-1790: - BoostingFunctionTermQuery implements equals but not hashcode - important for a query class I think. > Add Boosting Function Term Query and Some Payload Query refactorings > > > Key: LUCENE-1790 > URL: https://issues.apache.org/jira/browse/LUCENE-1790 > Project: Lucene - Java > Issue Type: New Feature >Reporter: Grant Ingersoll >Assignee: Grant Ingersoll >Priority: Minor > Fix For: 2.9 > > Attachments: LUCENE-1790-position.patch, LUCENE-1790.patch, > LUCENE-1790.patch, LUCENE-1790.patch > > > Similar to the BoostingTermQuery, the BoostingFunctionTermQuery is a > SpanTermQuery, but the difference is the payload score for a doc is not the > average of all the payloads, but applies a function to them instead. > BoostingTermQuery becomes a BoostingFunctionTermQuery with an > AveragePayloadFunction applied to it. > Also add marker interface to indicate PayloadQuery types. Refactor > Similarity.scorePayload to also take in the doc id. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-1790) Add Boosting Function Term Query and Some Payload Query refactorings
[ https://issues.apache.org/jira/browse/LUCENE-1790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated LUCENE-1790: Attachment: LUCENE-1790.patch remove some unused imports added missing license header Added hashCode to BoostingFunctionTermQuery Added hashCode/equals to PayloadFunction classes added hashcode/equals to query - really it should be handling the equals/hashcode for boost, not subclasses (which will be likely to forget it - you should check super classes for these things anyway as well). BoostingFunctionTermQuery is a subclass of SpanTermQuery, but both of them use a weak equals method (using instanceof) so while BoostingFunctionTermQuery.equals(SpanTermQuery) should equal SpanTermQuery.equals(BoostFunctionTermQuery), it doesn't. Added new hashCode/equals for both classes that work properly. Added a couple tests for these fixes > Add Boosting Function Term Query and Some Payload Query refactorings > > > Key: LUCENE-1790 > URL: https://issues.apache.org/jira/browse/LUCENE-1790 > Project: Lucene - Java > Issue Type: New Feature >Reporter: Grant Ingersoll >Assignee: Grant Ingersoll >Priority: Minor > Fix For: 2.9 > > Attachments: LUCENE-1790-position.patch, LUCENE-1790.patch, > LUCENE-1790.patch, LUCENE-1790.patch, LUCENE-1790.patch > > > Similar to the BoostingTermQuery, the BoostingFunctionTermQuery is a > SpanTermQuery, but the difference is the payload score for a doc is not the > average of all the payloads, but applies a function to them instead. > BoostingTermQuery becomes a BoostingFunctionTermQuery with an > AveragePayloadFunction applied to it. > Also add marker interface to indicate PayloadQuery types. Refactor > Similarity.scorePayload to also take in the doc id. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
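The asymmetry described above can be reproduced with a small standalone sketch. The class names below are hypothetical stand-ins, not the Lucene classes, and the getClass() comparison shown in the "strict" variant is only one common way to restore symmetry; the thread does not say which approach the patch takes.

```java
// Standalone sketch with hypothetical class names; these are not the Lucene classes.
class WeakTermQuery {
    final String term;
    WeakTermQuery(String term) { this.term = term; }

    @Override public boolean equals(Object o) {
        // instanceof also accepts subclasses, which is the source of the asymmetry.
        return o instanceof WeakTermQuery && term.equals(((WeakTermQuery) o).term);
    }
    @Override public int hashCode() { return term.hashCode(); }
}

class WeakFunctionTermQuery extends WeakTermQuery {
    final String function;
    WeakFunctionTermQuery(String term, String function) { super(term); this.function = function; }

    @Override public boolean equals(Object o) {
        return o instanceof WeakFunctionTermQuery
            && super.equals(o)
            && function.equals(((WeakFunctionTermQuery) o).function);
    }
    @Override public int hashCode() { return 31 * super.hashCode() + function.hashCode(); }
}

class StrictTermQuery {
    final String term;
    StrictTermQuery(String term) { this.term = term; }

    @Override public boolean equals(Object o) {
        // Exact class comparison: a subclass instance is never equal to a base instance.
        if (o == null || getClass() != o.getClass()) return false;
        return term.equals(((StrictTermQuery) o).term);
    }
    @Override public int hashCode() { return term.hashCode(); }
}

class StrictFunctionTermQuery extends StrictTermQuery {
    final String function;
    StrictFunctionTermQuery(String term, String function) { super(term); this.function = function; }

    @Override public boolean equals(Object o) {
        // super.equals already guarantees o has the same runtime class, so the cast is safe.
        return super.equals(o) && function.equals(((StrictFunctionTermQuery) o).function);
    }
    @Override public int hashCode() { return 31 * super.hashCode() + function.hashCode(); }
}

final class EqualsSymmetrySketch {
    public static void main(String[] args) {
        WeakTermQuery base = new WeakTermQuery("foo");
        WeakTermQuery sub = new WeakFunctionTermQuery("foo", "avg");
        System.out.println(base.equals(sub) + " vs " + sub.equals(base)); // asymmetric

        StrictTermQuery sBase = new StrictTermQuery("foo");
        StrictTermQuery sSub = new StrictFunctionTermQuery("foo", "avg");
        System.out.println(sBase.equals(sSub) + " vs " + sSub.equals(sBase)); // symmetric
    }
}
```

An asymmetric equals matters for query classes in particular because queries are commonly used as cache keys, where lookups depend on equals and hashCode behaving consistently.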
[jira] Commented: (LUCENE-1808) make Query.createWeight public (or add back Query.createQueryWeight())
[ https://issues.apache.org/jira/browse/LUCENE-1808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743754#action_12743754 ] Mark Miller commented on LUCENE-1808: - I haven't yet figured out how to do this without breaking back compat - I think this was an issue before as well. I'd have to dig it up, but some user complained about a similar issue when QueryWeight was put in. If you add createQueryWeight as a public method, then all of the Lucene classes have to be changed to call it - otherwise, if you override it in a user Query, it won't be called on that Query. But anyone with an external Query class that overrode createWeight will not call createQueryWeight, and won't work correctly with classes that override it. I guess if we make it final it would close that loophole, but then that's a loss from createWeight where you could override, and is still a back compat break? > make Query.createWeight public (or add back Query.createQueryWeight()) > -- > > Key: LUCENE-1808 > URL: https://issues.apache.org/jira/browse/LUCENE-1808 > Project: Lucene - Java > Issue Type: Improvement > Components: Query/Scoring >Affects Versions: 2.9 >Reporter: Tim Smith >Assignee: Mark Miller > > Now that the QueryWeight class has been removed, the public QueryWeight > createQueryWeight() method on Query was also removed > i have cases where i want to create a weight for a sub query (outside of the > org.apache.lucene.search package) and i don't want the weight normalized > (think BooleanQuery outside of the o.a.l.search package) > in order to do this, i have to create a static Utils class inside > o.a.l.search, pass in the Query and searcher, and have the static method call > the protected createWeight method > this should not be necessary > This could be fixed in one of 2 ways: > 1. make createWeight() public on Query (breaks back compat) > 2. 
add the following method: > {code} > public Weight createQueryWeight(Searcher searcher) throws IOException { > return createWeight(searcher); > } > {code} > createWeight(Searcher) should then be deprecated in favor of the publicly > accessible method -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1808) make Query.createWeight public (or add back Query.createQueryWeight())
[ https://issues.apache.org/jira/browse/LUCENE-1808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743755#action_12743755 ] Mark Miller commented on LUCENE-1808: - 1. make createWeight() public on Query (breaks back compat) hmmm - I took that as fact, but is that true? Can't you open up visibility without breaking back compat? Time to look this stuff up again ... > make Query.createWeight public (or add back Query.createQueryWeight()) > -- > > Key: LUCENE-1808 > URL: https://issues.apache.org/jira/browse/LUCENE-1808 > Project: Lucene - Java > Issue Type: Improvement > Components: Query/Scoring >Affects Versions: 2.9 >Reporter: Tim Smith >Assignee: Mark Miller > > Now that the QueryWeight class has been removed, the public QueryWeight > createQueryWeight() method on Query was also removed > i have cases where i want to create a weight for a sub query (outside of the > org.apache.lucene.search package) and i don't want the weight normalized > (think BooleanQuery outside of the o.a.l.search package) > in order to do this, i have to create a static Utils class inside > o.a.l.search, pass in the Query and searcher, and have the static method call > the protected createWeight method > this should not be necessary > This could be fixed in one of 2 ways: > 1. make createWeight() public on Query (breaks back compat) > 2. add the following method: > {code} > public Weight createQueryWeight(Searcher searcher) throws IOException { > return createWeight(searcher); > } > {code} > createWeight(Searcher) should then be deprectated in favor of the publicly > accessible method -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-1794) implement reusableTokenStream for all contrib analyzers
[ https://issues.apache.org/jira/browse/LUCENE-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-1794: --- Attachment: LUCENE-1794-reusing-analyzer.patch Apologies for the late post, I had a busy weekend. Attached patch includes ReusingAnalyzer, Streams in Analyzer and javadocs. Robert, please have a look. I think extending it should be fairly straightforward and we can probably finish the integration in a couple of days. However if you discover it isn't the case, we can separate it into a different issue. Also, I did not include a note in CHANGES. Once you're done merging it into the larger patch, I can help w/ the javadocs and CHANGES if required. > implement reusableTokenStream for all contrib analyzers > --- > > Key: LUCENE-1794 > URL: https://issues.apache.org/jira/browse/LUCENE-1794 > Project: Lucene - Java > Issue Type: Improvement > Components: contrib/analyzers >Reporter: Robert Muir >Assignee: Robert Muir >Priority: Minor > Fix For: 2.9 > > Attachments: LUCENE-1794-reusing-analyzer.patch, LUCENE-1794.patch, > LUCENE-1794.patch, LUCENE-1794.patch, LUCENE-1794.patch, LUCENE-1794.patch, > LUCENE-1794.patch > > > most contrib analyzers do not have an impl for reusableTokenStream > regardless of how expensive the back compat reflection is for indexing speed, > I think we should do this to mitigate any performance costs. hey, overall it > might even be an improvement! > the back compat code for non-final analyzers is already in place so this is > easy money in my opinion. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1808) make Query.createWeight public (or add back Query.createQueryWeight())
[ https://issues.apache.org/jira/browse/LUCENE-1808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743760#action_12743760 ] Shai Erera commented on LUCENE-1808: bq. Can't you open up visibility without breaking back compat? I don't see why this would break back-compat. I can always extend a class and make a package-private or protected method public. I cannot reduce visibility, but can always increase it. About the issues w/ createQueryWeight, I think you're referring to the chain of comments that started here: https://issues.apache.org/jira/browse/LUCENE-1630?focusedCommentId=12723976&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12723976. Is that what you were talking about? > make Query.createWeight public (or add back Query.createQueryWeight()) > -- > > Key: LUCENE-1808 > URL: https://issues.apache.org/jira/browse/LUCENE-1808 > Project: Lucene - Java > Issue Type: Improvement > Components: Query/Scoring >Affects Versions: 2.9 >Reporter: Tim Smith >Assignee: Mark Miller > > Now that the QueryWeight class has been removed, the public QueryWeight > createQueryWeight() method on Query was also removed > i have cases where i want to create a weight for a sub query (outside of the > org.apache.lucene.search package) and i don't want the weight normalized > (think BooleanQuery outside of the o.a.l.search package) > in order to do this, i have to create a static Utils class inside > o.a.l.search, pass in the Query and searcher, and have the static method call > the protected createWeight method > this should not be necessary > This could be fixed in one of 2 ways: > 1. make createWeight() public on Query (breaks back compat) > 2. 
add the following method: > {code} > public Weight createQueryWeight(Searcher searcher) throws IOException { > return createWeight(searcher); > } > {code} > createWeight(Searcher) should then be deprecated in favor of the publicly > accessible method -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1808) make Query.createWeight public (or add back Query.createQueryWeight())
[ https://issues.apache.org/jira/browse/LUCENE-1808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743761#action_12743761 ] Shai Erera commented on LUCENE-1808: bq. I can always extend a class and make a package-private or protected method public. I cannot reduce visibility, but can always increase it. Ohh ... after hitting Submit I understood why it would break back-compat - if I extend Query and override createWeight, and leave it 'protected' I won't compile if we make it public, since I'll be reducing visibility. > make Query.createWeight public (or add back Query.createQueryWeight()) > -- > > Key: LUCENE-1808 > URL: https://issues.apache.org/jira/browse/LUCENE-1808 > Project: Lucene - Java > Issue Type: Improvement > Components: Query/Scoring >Affects Versions: 2.9 >Reporter: Tim Smith >Assignee: Mark Miller > > Now that the QueryWeight class has been removed, the public QueryWeight > createQueryWeight() method on Query was also removed > i have cases where i want to create a weight for a sub query (outside of the > org.apache.lucene.search package) and i don't want the weight normalized > (think BooleanQuery outside of the o.a.l.search package) > in order to do this, i have to create a static Utils class inside > o.a.l.search, pass in the Query and searcher, and have the static method call > the protected createWeight method > this should not be necessary > This could be fixed in one of 2 ways: > 1. make createWeight() public on Query (breaks back compat) > 2. add the following method: > {code} > public Weight createQueryWeight(Searcher searcher) throws IOException { > return createWeight(searcher); > } > {code} > createWeight(Searcher) should then be deprectated in favor of the publicly > accessible method -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. 
- To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
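The visibility point raised in the comment above can be sketched with stand-in classes. BaseQuery, WideningQuery and UserQuery are hypothetical names, and createWeight() here returns a String purely for illustration. The code compiles as written because the base method is protected; the comment on UserQuery marks the spot that would stop compiling if the base method were changed to public.

```java
// Standalone sketch with hypothetical classes; createWeight() returns a String
// here only to keep the example small.
abstract class BaseQuery {
    // Stand-in for the protected Query.createWeight discussed in the issue.
    protected String createWeight() { return "base"; }
}

// Widening visibility in a subclass is always allowed.
class WideningQuery extends BaseQuery {
    @Override public String createWeight() { return "widened"; }
}

// A user subclass that overrides the method and keeps it protected.
// This compiles today. If BaseQuery.createWeight were changed to public,
// this override would be rejected by javac, because an overriding method
// may not have weaker access than the method it overrides.
class UserQuery extends BaseQuery {
    @Override protected String createWeight() { return "user"; }
}

final class VisibilitySketch {
    // Lives in the same package, so the protected method is reachable here,
    // similar in spirit to the static helper workaround described in the issue.
    static String describe(BaseQuery q) { return q.createWeight(); }

    public static void main(String[] args) {
        System.out.println(describe(new WideningQuery()));
        System.out.println(describe(new UserQuery()));
    }
}
```

So the break is a source-compatibility one: existing subclasses need a one-word change and a recompile, even though no caller of the base class is affected.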
[jira] Commented: (LUCENE-1808) make Query.createWeight public (or add back Query.createQueryWeight())
[ https://issues.apache.org/jira/browse/LUCENE-1808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743762#action_12743762 ] Mark Miller commented on LUCENE-1808: - Ahh - nice catch. I'm not sure what to do here then... The previous possible break (I didn't actually look into it so I dunno) was referenced here: http://search.lucidimagination.com/search/document/41004a9436799675/spanquery_and_boostingtermquery_oddities > make Query.createWeight public (or add back Query.createQueryWeight()) > -- > > Key: LUCENE-1808 > URL: https://issues.apache.org/jira/browse/LUCENE-1808 > Project: Lucene - Java > Issue Type: Improvement > Components: Query/Scoring >Affects Versions: 2.9 >Reporter: Tim Smith >Assignee: Mark Miller > > Now that the QueryWeight class has been removed, the public QueryWeight > createQueryWeight() method on Query was also removed > i have cases where i want to create a weight for a sub query (outside of the > org.apache.lucene.search package) and i don't want the weight normalized > (think BooleanQuery outside of the o.a.l.search package) > in order to do this, i have to create a static Utils class inside > o.a.l.search, pass in the Query and searcher, and have the static method call > the protected createWeight method > this should not be necessary > This could be fixed in one of 2 ways: > 1. make createWeight() public on Query (breaks back compat) > 2. add the following method: > {code} > public Weight createQueryWeight(Searcher searcher) throws IOException { > return createWeight(searcher); > } > {code} > createWeight(Searcher) should then be deprectated in favor of the publicly > accessible method -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1794) implement reusableTokenStream for all contrib analyzers
[ https://issues.apache.org/jira/browse/LUCENE-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743763#action_12743763 ] Yonik Seeley commented on LUCENE-1794: -- Perhaps the Streams class should be part of ReusingAnalyzer and not Analyzer? It's a specific implementation of a reusable token stream, not part of the Analyzer interface. > implement reusableTokenStream for all contrib analyzers > --- > > Key: LUCENE-1794 > URL: https://issues.apache.org/jira/browse/LUCENE-1794 > Project: Lucene - Java > Issue Type: Improvement > Components: contrib/analyzers >Reporter: Robert Muir >Assignee: Robert Muir >Priority: Minor > Fix For: 2.9 > > Attachments: LUCENE-1794-reusing-analyzer.patch, LUCENE-1794.patch, > LUCENE-1794.patch, LUCENE-1794.patch, LUCENE-1794.patch, LUCENE-1794.patch, > LUCENE-1794.patch > > > most contrib analyzers do not have an impl for reusableTokenStream > regardless of how expensive the back compat reflection is for indexing speed, > I think we should do this to mitigate any performance costs. hey, overall it > might even be an improvement! > the back compat code for non-final analyzers is already in place so this is > easy money in my opinion. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-1813) Add option to ReverseStringFilter to mark reversed tokens
[ https://issues.apache.org/jira/browse/LUCENE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki updated LUCENE-1813: -- Attachment: reverseMark-2.patch Updated patch that moves the marking logic to incrementToken(). > Add option to ReverseStringFilter to mark reversed tokens > - > > Key: LUCENE-1813 > URL: https://issues.apache.org/jira/browse/LUCENE-1813 > Project: Lucene - Java > Issue Type: Improvement > Components: contrib/analyzers >Affects Versions: 2.9 >Reporter: Andrzej Bialecki >Assignee: Robert Muir > Attachments: reverseMark-2.patch, reverseMark.patch > > > This patch implements additional functionality in the filter to "mark" > reversed tokens with a special marker character (Unicode 0001). This is > useful when indexing both straight and reversed tokens (e.g. to implement > efficient leading wildcards search). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1808) make Query.createWeight public (or add back Query.createQueryWeight())
[ https://issues.apache.org/jira/browse/LUCENE-1808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743767#action_12743767 ] Shai Erera commented on LUCENE-1808: When I changed createQueryWeight from protected to public, it was because we introduced it in 2.9 only, so it was possible. Perhaps we should deprecate createWeight, and add back createQueryWeight as public? > make Query.createWeight public (or add back Query.createQueryWeight()) > -- > > Key: LUCENE-1808 > URL: https://issues.apache.org/jira/browse/LUCENE-1808 > Project: Lucene - Java > Issue Type: Improvement > Components: Query/Scoring >Affects Versions: 2.9 >Reporter: Tim Smith >Assignee: Mark Miller > > Now that the QueryWeight class has been removed, the public QueryWeight > createQueryWeight() method on Query was also removed > i have cases where i want to create a weight for a sub query (outside of the > org.apache.lucene.search package) and i don't want the weight normalized > (think BooleanQuery outside of the o.a.l.search package) > in order to do this, i have to create a static Utils class inside > o.a.l.search, pass in the Query and searcher, and have the static method call > the protected createWeight method > this should not be necessary > This could be fixed in one of 2 ways: > 1. make createWeight() public on Query (breaks back compat) > 2. add the following method: > {code} > public Weight createQueryWeight(Searcher searcher) throws IOException { > return createWeight(searcher); > } > {code} > createWeight(Searcher) should then be deprectated in favor of the publicly > accessible method -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1794) implement reusableTokenStream for all contrib analyzers
[ https://issues.apache.org/jira/browse/LUCENE-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743770#action_12743770 ] Shai Erera commented on LUCENE-1794: Well ... it's true and false at the same time. On one hand, I think Analyzer should impl reusableTokenStream just like ReusingAnalyzer, but we can't do that because of back-compat. On the other hand, Streams does belong to ReusingAnalyzer because it makes use of it. What I thought was that maybe someone would want to make use of Streams w/o extending Analyzer. And ... we may want to constraint setPreviousTokenStream to Streams, or TokenStream or a generic type of thing, to avoid casting and be more type-safe. I wonder if we'll stay w/ Analyzer.reusableTS as it is forever, or will we break it one day to be like ReusingAnalyzer (and by that deprecate ReusingAnalyzer?). I guess that if we think for the long term that ReusingAnalyzer will stay, and hence most Analyzers will actually be ReusingAnalyzer extension, then I'm ok w/ moving Streams into ReusingAnalyzer. But keeping it in Analyzer will allow us in the future to constrain prevTokenStream to be of that type and not a generic Object. > implement reusableTokenStream for all contrib analyzers > --- > > Key: LUCENE-1794 > URL: https://issues.apache.org/jira/browse/LUCENE-1794 > Project: Lucene - Java > Issue Type: Improvement > Components: contrib/analyzers >Reporter: Robert Muir >Assignee: Robert Muir >Priority: Minor > Fix For: 2.9 > > Attachments: LUCENE-1794-reusing-analyzer.patch, LUCENE-1794.patch, > LUCENE-1794.patch, LUCENE-1794.patch, LUCENE-1794.patch, LUCENE-1794.patch, > LUCENE-1794.patch > > > most contrib analyzers do not have an impl for reusableTokenStream > regardless of how expensive the back compat reflection is for indexing speed, > I think we should do this to mitigate any performance costs. hey, overall it > might even be an improvement! 
> the back compat code for non-final analyzers is already in place so this is > easy money in my opinion. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
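The reuse pattern under discussion can be sketched without the attached patch. Everything below is a hypothetical stand-in: SketchTokenizer, SketchReusingAnalyzer and createStream() are illustrative names, and keeping one saved stream per thread in a ThreadLocal is an assumption of this sketch rather than a description of the patch's Streams class. It shows only the core idea: build the stream chain once, then reset it with a new Reader on later calls.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical minimal tokenizer: splits its input on whitespace.
class SketchTokenizer {
    private Reader input;
    int resets = 0; // counts how often this instance was reused

    SketchTokenizer(Reader input) { this.input = input; }

    // Point the existing instance at new input instead of allocating a new one.
    void reset(Reader newInput) { this.input = newInput; resets++; }

    List<String> tokens() {
        StringBuilder sb = new StringBuilder();
        try {
            int c;
            while ((c = input.read()) != -1) sb.append((char) c);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String text = sb.toString().trim();
        return text.isEmpty() ? new ArrayList<String>() : Arrays.asList(text.split("\\s+"));
    }
}

// Hypothetical base class: subclasses say how to build the stream once,
// and the base class handles saving and resetting it.
abstract class SketchReusingAnalyzer {
    // One saved stream per thread (an assumption of this sketch).
    private final ThreadLocal<SketchTokenizer> previous = new ThreadLocal<>();

    protected abstract SketchTokenizer createStream(Reader reader);

    final SketchTokenizer reusableTokenStream(Reader reader) {
        SketchTokenizer saved = previous.get();
        if (saved == null) {
            saved = createStream(reader); // first call on this thread
            previous.set(saved);
        } else {
            saved.reset(reader);          // later calls reuse the instance
        }
        return saved;
    }
}

class WhitespaceSketchAnalyzer extends SketchReusingAnalyzer {
    @Override protected SketchTokenizer createStream(Reader reader) {
        return new SketchTokenizer(reader);
    }
}

final class ReuseSketch {
    public static void main(String[] args) {
        SketchReusingAnalyzer analyzer = new WhitespaceSketchAnalyzer();
        SketchTokenizer first = analyzer.reusableTokenStream(new StringReader("a b"));
        System.out.println(first.tokens());
        SketchTokenizer second = analyzer.reusableTokenStream(new StringReader("c d"));
        System.out.println(second.tokens() + " same instance: " + (first == second));
    }
}
```

Typing the saved value as a concrete class, as the ThreadLocal does here, is the kind of type safety the comment above weighs against storing a generic Object.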
[jira] Commented: (LUCENE-1808) make Query.createWeight public (or add back Query.createQueryWeight())
[ https://issues.apache.org/jira/browse/LUCENE-1808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743771#action_12743771 ] Mark Miller commented on LUCENE-1808: - Don't you have the above problem though: {quote} If you add createQueryWeight as a public method, then all of the Lucene classes have to be changed to call it - otherwise, if you override it in a user Query, it won't be called on that Query. But anyone with an external Query class that calls {<-FIXED} createWeight will not call createQueryWeight, and won't work correctly with classes that override it. I guess if we make it final it would close that loophole, but then that's a loss from createWeight where you could override, and is still a back compat break? {quote} > make Query.createWeight public (or add back Query.createQueryWeight()) > -- > > Key: LUCENE-1808 > URL: https://issues.apache.org/jira/browse/LUCENE-1808 > Project: Lucene - Java > Issue Type: Improvement > Components: Query/Scoring >Affects Versions: 2.9 >Reporter: Tim Smith >Assignee: Mark Miller > > Now that the QueryWeight class has been removed, the public QueryWeight > createQueryWeight() method on Query was also removed > i have cases where i want to create a weight for a sub query (outside of the > org.apache.lucene.search package) and i don't want the weight normalized > (think BooleanQuery outside of the o.a.l.search package) > in order to do this, i have to create a static Utils class inside > o.a.l.search, pass in the Query and searcher, and have the static method call > the protected createWeight method > this should not be necessary > This could be fixed in one of 2 ways: > 1. make createWeight() public on Query (breaks back compat) > 2. 
add the following method: > {code} > public Weight createQueryWeight(Searcher searcher) throws IOException { > return createWeight(searcher); > } > {code} > createWeight(Searcher) should then be deprecated in favor of the publicly > accessible method
[jira] Issue Comment Edited: (LUCENE-1808) make Query.createWeight public (or add back Query.createQueryWeight())
[ https://issues.apache.org/jira/browse/LUCENE-1808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743771#action_12743771 ] Mark Miller edited comment on LUCENE-1808 at 8/15/09 1:43 PM: -- Don't you have the above problem though: {quote} If you add createQueryWeight as a public method, then all of the Lucene classes have to be changed to call it - otherwise, if you override it in a user Query, it won't be called on that Query. But anyone with an external Query class that calls [<-FIXED] createWeight will not call createQueryWeight, and won't work correctly with classes that override it. I guess if we make it final it would close that loophole, but then that's a loss from createWeight where you could override, and is still a back compat break? {quote} was (Author: markrmil...@gmail.com): Don't you have the above problem though: {quote} If you add createQueryWeight as a public method, then all of the Lucene classes have to be changed to call it - otherwise, if you override it in a user Query, it won't be called on that Query. But anyone with an external Query class that calls {<-FIXED} createWeight will not call createQueryWeight, and won't work correctly with classes that override it. I guess if we make it final it would close that loophole, but then that's a loss from createWeight where you could override, and is still a back compat break?
{quote} > make Query.createWeight public (or add back Query.createQueryWeight()) > -- > > Key: LUCENE-1808 > URL: https://issues.apache.org/jira/browse/LUCENE-1808 > Project: Lucene - Java > Issue Type: Improvement > Components: Query/Scoring >Affects Versions: 2.9 >Reporter: Tim Smith >Assignee: Mark Miller > > Now that the QueryWeight class has been removed, the public QueryWeight > createQueryWeight() method on Query was also removed > i have cases where i want to create a weight for a sub query (outside of the > org.apache.lucene.search package) and i don't want the weight normalized > (think BooleanQuery outside of the o.a.l.search package) > in order to do this, i have to create a static Utils class inside > o.a.l.search, pass in the Query and searcher, and have the static method call > the protected createWeight method > this should not be necessary > This could be fixed in one of 2 ways: > 1. make createWeight() public on Query (breaks back compat) > 2. add the following method: > {code} > public Weight createQueryWeight(Searcher searcher) throws IOException { > return createWeight(searcher); > } > {code} > createWeight(Searcher) should then be deprecated in favor of the publicly > accessible method
[jira] Commented: (LUCENE-1794) implement reusableTokenStream for all contrib analyzers
[ https://issues.apache.org/jira/browse/LUCENE-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743774#action_12743774 ] Robert Muir commented on LUCENE-1794: - Shai, I will take a look at your patch as soon as I am at a real computer. thanks for your work in advance, we maybe should put it on another issue though just to keep the scope of this one reasonably contained. {quote} And ... we may want to constrain setPreviousTokenStream to Streams, or TokenStream or a generic type of thing, to avoid casting and be more type-safe. {quote} see QueryAutoStopWordAnalyzer in my patch for a counter-example to this. in this case, it is a Set, because it is dependent upon field. > implement reusableTokenStream for all contrib analyzers > --- > > Key: LUCENE-1794 > URL: https://issues.apache.org/jira/browse/LUCENE-1794 > Project: Lucene - Java > Issue Type: Improvement > Components: contrib/analyzers >Reporter: Robert Muir >Assignee: Robert Muir >Priority: Minor > Fix For: 2.9 > > Attachments: LUCENE-1794-reusing-analyzer.patch, LUCENE-1794.patch, > LUCENE-1794.patch, LUCENE-1794.patch, LUCENE-1794.patch, LUCENE-1794.patch, > LUCENE-1794.patch > > > most contrib analyzers do not have an impl for reusableTokenStream > regardless of how expensive the back compat reflection is for indexing speed, > I think we should do this to mitigate any performance costs. hey, overall it > might even be an improvement! > the back compat code for non-final analyzers is already in place so this is > easy money in my opinion.
[jira] Created: (LUCENE-1814) Some Lucene tests try and use a Junit Assert in new threads
Some Lucene tests try and use a Junit Assert in new threads --- Key: LUCENE-1814 URL: https://issues.apache.org/jira/browse/LUCENE-1814 Project: Lucene - Java Issue Type: Bug Reporter: Mark Miller Priority: Minor There are a few cases in Lucene tests where JUnit Asserts are used inside a new thread's run method. This won't work: JUnit throws an exception when a call to Assert fails, which kills the thread, but the exception does not propagate to JUnit. So unless the thread's termination causes a failure later, the Asserts have no effect. Affected tests: TestThreadSafe, TestStressIndexing2, TestStringIntern.
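One common way around the problem described in LUCENE-1814 is to catch any failure inside the worker thread and rethrow it on the thread that JUnit is watching. The following is only a minimal sketch of that idea, not the fix that was applied to the Lucene tests: the class ThreadedAssertDemo and the helper runAndPropagate are hypothetical names, the sketch uses plain AssertionError instead of JUnit so that it stays self-contained, and it uses Java 8 lambdas, which Lucene 2.9 itself could not.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical helper, not part of Lucene or JUnit.
class ThreadedAssertDemo {

    // Runs the task on a new thread, waits for it, and rethrows any failure
    // on the calling thread so that a test runner can see it.
    static void runAndPropagate(Runnable task) {
        AtomicReference<Throwable> failure = new AtomicReference<>();
        Thread worker = new Thread(() -> {
            try {
                task.run();
            } catch (Throwable t) {
                // Without this catch the error would only kill the worker.
                failure.set(t);
            }
        });
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new AssertionError("interrupted while waiting for worker", e);
        }
        Throwable t = failure.get();
        if (t != null) {
            throw new AssertionError("worker thread failed: " + t.getMessage(), t);
        }
    }

    public static void main(String[] args) {
        // A passing task completes silently.
        runAndPropagate(() -> { });

        // A failing task now surfaces on the calling thread.
        try {
            runAndPropagate(() -> {
                throw new AssertionError("expected 1 but was 2");
            });
            System.out.println("failure was lost");
        } catch (AssertionError e) {
            System.out.println("caught on calling thread: " + e.getMessage());
        }
    }
}
```

In a real test the body passed to runAndPropagate would contain the JUnit assertions, and the rethrown error would fail the test method as usual.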
[jira] Commented: (LUCENE-1794) implement reusableTokenStream for all contrib analyzers
[ https://issues.apache.org/jira/browse/LUCENE-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743776#action_12743776 ] Yonik Seeley commented on LUCENE-1794: -- In general, we should strive to treat our base abstract classes like interfaces, with the ability to provide default implementations to avoid back compatibility breaks (while avoiding adding members or non-overrideable methods). One could make the case that the CloseableThreadLocal should not be in Analyzer either, but it's been there long enough now, it would break back compat to move it. bq. What I thought was that maybe someone would want to make use of Streams w/o extending Analyzer. They still can - ReusableAnalyzer.Streams. bq. But keeping it in Analyzer will allow us in the future to constrain prevTokenStream to be of that type and not a generic Object. Doesn't seem like we should force all tokenstreams to be reusable, or constrain the exact form of how a reusable token stream is obtained. > implement reusableTokenStream for all contrib analyzers > --- > > Key: LUCENE-1794 > URL: https://issues.apache.org/jira/browse/LUCENE-1794 > Project: Lucene - Java > Issue Type: Improvement > Components: contrib/analyzers >Reporter: Robert Muir >Assignee: Robert Muir >Priority: Minor > Fix For: 2.9 > > Attachments: LUCENE-1794-reusing-analyzer.patch, LUCENE-1794.patch, > LUCENE-1794.patch, LUCENE-1794.patch, LUCENE-1794.patch, LUCENE-1794.patch, > LUCENE-1794.patch > > > most contrib analyzers do not have an impl for reusableTokenStream > regardless of how expensive the back compat reflection is for indexing speed, > I think we should do this to mitigate any performance costs. hey, overall it > might even be an improvement! > the back compat code for non-final analyzers is already in place so this is > easy money in my opinion.
[jira] Commented: (LUCENE-1794) implement reusableTokenStream for all contrib analyzers
[ https://issues.apache.org/jira/browse/LUCENE-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743778#action_12743778 ] Shai Erera commented on LUCENE-1794: I guess you're both right. I thought that one day we'll cancel ReusingAnalyzer and pull it up to Analyzer, but it looks like ReusingAnalyzer makes sense to stay, and so we can move Streams to it. Robert, if possible, I'd like to get this one in as part of this issue. The reason is that you already modified all Analyzers to impl reusableTokenStream. I'm afraid that if we'll do it in another issue, some Analyzers will be skipped over. If you want, I can apply this to your patch and post back an updated one tomorrow. > implement reusableTokenStream for all contrib analyzers > --- > > Key: LUCENE-1794 > URL: https://issues.apache.org/jira/browse/LUCENE-1794 > Project: Lucene - Java > Issue Type: Improvement > Components: contrib/analyzers >Reporter: Robert Muir >Assignee: Robert Muir >Priority: Minor > Fix For: 2.9 > > Attachments: LUCENE-1794-reusing-analyzer.patch, LUCENE-1794.patch, > LUCENE-1794.patch, LUCENE-1794.patch, LUCENE-1794.patch, LUCENE-1794.patch, > LUCENE-1794.patch > > > most contrib analyzers do not have an impl for reusableTokenStream > regardless of how expensive the back compat reflection is for indexing speed, > I think we should do this to mitigate any performance costs. hey, overall it > might even be an improvement! > the back compat code for non-final analyzers is already in place so this is > easy money in my opinion.
[jira] Commented: (LUCENE-1808) make Query.createWeight public (or add back Query.createQueryWeight())
[ https://issues.apache.org/jira/browse/LUCENE-1808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743780#action_12743780 ] Shai Erera commented on LUCENE-1808: I thought that's partly what we took care of here: https://issues.apache.org/jira/browse/LUCENE-1630?focusedCommentId=12723996&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12723996 True, if someone overrides createWeight (he ought to) and calls it specifically, createQueryWeight won't be called. But then, all of our code will call createQueryWeight. And if we deprecate createWeight, those who call it directly will need to move to createQueryWeight, so I think we should be fine? Anyway, I may not be thinking too clearly at this hour (1 AM), so if I misunderstood something, I'll read it again in the morning. > make Query.createWeight public (or add back Query.createQueryWeight()) > -- > > Key: LUCENE-1808 > URL: https://issues.apache.org/jira/browse/LUCENE-1808 > Project: Lucene - Java > Issue Type: Improvement > Components: Query/Scoring >Affects Versions: 2.9 >Reporter: Tim Smith >Assignee: Mark Miller > > Now that the QueryWeight class has been removed, the public QueryWeight > createQueryWeight() method on Query was also removed > i have cases where i want to create a weight for a sub query (outside of the > org.apache.lucene.search package) and i don't want the weight normalized > (think BooleanQuery outside of the o.a.l.search package) > in order to do this, i have to create a static Utils class inside > o.a.l.search, pass in the Query and searcher, and have the static method call > the protected createWeight method > this should not be necessary > This could be fixed in one of 2 ways: > 1. make createWeight() public on Query (breaks back compat) > 2.
add the following method: > {code} > public Weight createQueryWeight(Searcher searcher) throws IOException { > return createWeight(searcher); > } > {code} > createWeight(Searcher) should then be deprecated in favor of the publicly > accessible method
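The override problem raised in the LUCENE-1808 comments can be reproduced without Lucene at all. The sketch below uses simplified stand-in classes (DelegationDemo, Query, Weight, OldStyleQuery, NewStyleQuery and both caller methods are hypothetical and only mirror the method names from the discussion; the real methods take a Searcher and throw IOException). It assumes option 2 from the issue: a public createQueryWeight() that delegates to the protected createWeight().

```java
// Simplified stand-ins for illustration only, NOT the real Lucene classes.
class DelegationDemo {

    static class Weight {
        final String source;
        Weight(String source) { this.source = source; }
    }

    static class Query {
        // Old extension point (protected in the real API).
        protected Weight createWeight() {
            return new Weight("base.createWeight");
        }

        // Proposed public entry point that simply delegates to the old one.
        public Weight createQueryWeight() {
            return createWeight();
        }
    }

    // User query written before the change: overrides only the old method.
    static class OldStyleQuery extends Query {
        @Override
        protected Weight createWeight() {
            return new Weight("OldStyleQuery.createWeight");
        }
    }

    // User query written after the change: overrides only the new method.
    static class NewStyleQuery extends Query {
        @Override
        public Weight createQueryWeight() {
            return new Weight("NewStyleQuery.createQueryWeight");
        }
    }

    // External code that has not migrated and still calls the old method.
    static String legacyCaller(Query q) {
        return q.createWeight().source;
    }

    // Code that has migrated to the new public method.
    static String migratedCaller(Query q) {
        return q.createQueryWeight().source;
    }

    public static void main(String[] args) {
        // Delegation keeps old overrides working for migrated callers.
        System.out.println(migratedCaller(new OldStyleQuery()));
        // A legacy caller bypasses an override of the new method.
        System.out.println(legacyCaller(new NewStyleQuery()));
    }
}
```

The second call is the case Mark Miller describes: code that still calls createWeight never reaches an override of createQueryWeight. Making createQueryWeight final would remove that case, at the cost of no longer allowing the new method to be overridden.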
[jira] Commented: (LUCENE-1794) implement reusableTokenStream for all contrib analyzers
[ https://issues.apache.org/jira/browse/LUCENE-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743786#action_12743786 ] Mark Miller commented on LUCENE-1794: - To not break back compat, everything has got to work even if they don't yet move from the deprecated method. > implement reusableTokenStream for all contrib analyzers > --- > > Key: LUCENE-1794 > URL: https://issues.apache.org/jira/browse/LUCENE-1794 > Project: Lucene - Java > Issue Type: Improvement > Components: contrib/analyzers >Reporter: Robert Muir >Assignee: Robert Muir >Priority: Minor > Fix For: 2.9 > > Attachments: LUCENE-1794-reusing-analyzer.patch, LUCENE-1794.patch, > LUCENE-1794.patch, LUCENE-1794.patch, LUCENE-1794.patch, LUCENE-1794.patch, > LUCENE-1794.patch > > > most contrib analyzers do not have an impl for reusableTokenStream > regardless of how expensive the back compat reflection is for indexing speed, > I think we should do this to mitigate any performance costs. hey, overall it > might even be an improvement! > the back compat code for non-final analyzers is already in place so this is > easy money in my opinion.
[jira] Commented: (LUCENE-1794) implement reusableTokenStream for all contrib analyzers
[ https://issues.apache.org/jira/browse/LUCENE-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743788#action_12743788 ] Robert Muir commented on LUCENE-1794: - {quote} Robert, if possible, I'd like to get this one in as part of this issue. The reason is that you already modified all Analyzers to impl reusableTokenStream. I'm afraid that if we'll do it in another issue, some Analyzers will be skipped over. If you want, I can apply this to your patch and post back an updated one tomorrow. {quote} Shai, this is a valid concern. But also let's not forget analyzers that already implement reusableTS that are not a part of this patch (yet should be changed to extend ReusingAnalyzer)... examples include collation/* analyzers/fa, etc. But even before this I think we should make sure everyone is happy with ReusingAnalyzer itself... this is the only reason I think it might merit another issue... this patch is already a little unwieldy because I crept the scope to include reset(Reader) and reset() methods for tokenstreams that keep state... > implement reusableTokenStream for all contrib analyzers > --- > > Key: LUCENE-1794 > URL: https://issues.apache.org/jira/browse/LUCENE-1794 > Project: Lucene - Java > Issue Type: Improvement > Components: contrib/analyzers >Reporter: Robert Muir >Assignee: Robert Muir >Priority: Minor > Fix For: 2.9 > > Attachments: LUCENE-1794-reusing-analyzer.patch, LUCENE-1794.patch, > LUCENE-1794.patch, LUCENE-1794.patch, LUCENE-1794.patch, LUCENE-1794.patch, > LUCENE-1794.patch > > > most contrib analyzers do not have an impl for reusableTokenStream > regardless of how expensive the back compat reflection is for indexing speed, > I think we should do this to mitigate any performance costs. hey, overall it > might even be an improvement! > the back compat code for non-final analyzers is already in place so this is > easy money in my opinion.
[jira] Commented: (LUCENE-1794) implement reusableTokenStream for all contrib analyzers
[ https://issues.apache.org/jira/browse/LUCENE-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743791#action_12743791 ] Yonik Seeley commented on LUCENE-1794: -- bq. But even before this I think we should make sure everyone is happy with ReusingAnalyzer itself... this is the only reason I think it might merit another issue +1 The ReusingAnalyzer brings up other issues of protocol - right now consumers like lucene indexing call reset() on the stream, but I see the prototype ReusingAnalyzer also calling reset() on the stream. > implement reusableTokenStream for all contrib analyzers > --- > > Key: LUCENE-1794 > URL: https://issues.apache.org/jira/browse/LUCENE-1794 > Project: Lucene - Java > Issue Type: Improvement > Components: contrib/analyzers >Reporter: Robert Muir >Assignee: Robert Muir >Priority: Minor > Fix For: 2.9 > > Attachments: LUCENE-1794-reusing-analyzer.patch, LUCENE-1794.patch, > LUCENE-1794.patch, LUCENE-1794.patch, LUCENE-1794.patch, LUCENE-1794.patch, > LUCENE-1794.patch > > > most contrib analyzers do not have an impl for reusableTokenStream > regardless of how expensive the back compat reflection is for indexing speed, > I think we should do this to mitigate any performance costs. hey, overall it > might even be an improvement! > the back compat code for non-final analyzers is already in place so this is > easy money in my opinion.
[jira] Reopened: (LUCENE-1522) another highlighter
[ https://issues.apache.org/jira/browse/LUCENE-1522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi reopened LUCENE-1522: There is a bug in BaseFragmentsBuilder. When the highlighting field is not stored, StringIndexOutOfBoundsException will be thrown. I'd like to reopen this issue so the fix can be included in 2.9. I'll post the fix soon. > another highlighter > --- > > Key: LUCENE-1522 > URL: https://issues.apache.org/jira/browse/LUCENE-1522 > Project: Lucene - Java > Issue Type: Improvement > Components: contrib/highlighter >Reporter: Koji Sekiguchi >Assignee: Michael McCandless >Priority: Minor > Fix For: 2.9 > > Attachments: colored-tag-sample.png, > LUCENE-1522-multiValued-test.patch, LUCENE-1522.patch, LUCENE-1522.patch, > LUCENE-1522.patch, LUCENE-1522.patch, LUCENE-1522.patch, LUCENE-1522.patch, > LUCENE-1522.patch > > > I've written this highlighter for my project to support bi-gram token stream > (general token stream (e.g. WhitespaceTokenizer) also supported. see test > code in patch). The idea was inherited from my previous project with my > colleague and LUCENE-644. This approach needs highlight fields to be > TermVector.WITH_POSITIONS_OFFSETS, but is fast and can support N-grams. This > depends on LUCENE-1448 to get refined term offsets. > usage: > {code:java} > TopDocs docs = searcher.search( query, 10 ); > Highlighter h = new Highlighter(); > FieldQuery fq = h.getFieldQuery( query ); > for( ScoreDoc scoreDoc : docs.scoreDocs ){ > // fieldName="content", fragCharSize=100, numFragments=3 > String[] fragments = h.getBestFragments( fq, reader, scoreDoc.doc, > "content", 100, 3 ); > if( fragments != null ){ > for( String fragment : fragments ) > System.out.println( fragment ); > } > } > {code} > features: > - fast for large docs > - supports not only whitespace-based token stream, but also "fixed size" > N-gram (e.g.
(2,2), not (1,3)) (can solve LUCENE-1489) > - supports PhraseQuery, phrase-unit highlighting with slops > {noformat} > q="w1 w2" > w1 w2 > --- > q="w1 w2"~1 > w1 w3 w2 w3 w1 w2 > {noformat} > - highlight fields need to be TermVector.WITH_POSITIONS_OFFSETS > - easy to apply patch due to independent package (contrib/highlighter2) > - uses Java 1.5 > - looks query boost to score fragments (currently doesn't see idf, but it > should be possible) > - pluggable FragListBuilder > - pluggable FragmentsBuilder > to do: > - term positions can be unnecessary when phraseHighlight==false > - collects performance numbers
[jira] Updated: (LUCENE-1522) another highlighter
[ https://issues.apache.org/jira/browse/LUCENE-1522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated LUCENE-1522: --- Attachment: LUCENE-1522-fix-SIOOBE.patch The patch includes the fix and a test case. > another highlighter > --- > > Key: LUCENE-1522 > URL: https://issues.apache.org/jira/browse/LUCENE-1522 > Project: Lucene - Java > Issue Type: Improvement > Components: contrib/highlighter >Reporter: Koji Sekiguchi >Assignee: Michael McCandless >Priority: Minor > Fix For: 2.9 > > Attachments: colored-tag-sample.png, LUCENE-1522-fix-SIOOBE.patch, > LUCENE-1522-multiValued-test.patch, LUCENE-1522.patch, LUCENE-1522.patch, > LUCENE-1522.patch, LUCENE-1522.patch, LUCENE-1522.patch, LUCENE-1522.patch, > LUCENE-1522.patch > > > I've written this highlighter for my project to support bi-gram token stream > (general token stream (e.g. WhitespaceTokenizer) also supported. see test > code in patch). The idea was inherited from my previous project with my > colleague and LUCENE-644. This approach needs highlight fields to be > TermVector.WITH_POSITIONS_OFFSETS, but is fast and can support N-grams. This > depends on LUCENE-1448 to get refined term offsets. > usage: > {code:java} > TopDocs docs = searcher.search( query, 10 ); > Highlighter h = new Highlighter(); > FieldQuery fq = h.getFieldQuery( query ); > for( ScoreDoc scoreDoc : docs.scoreDocs ){ > // fieldName="content", fragCharSize=100, numFragments=3 > String[] fragments = h.getBestFragments( fq, reader, scoreDoc.doc, > "content", 100, 3 ); > if( fragments != null ){ > for( String fragment : fragments ) > System.out.println( fragment ); > } > } > {code} > features: > - fast for large docs > - supports not only whitespace-based token stream, but also "fixed size" > N-gram (e.g. 
(2,2), not (1,3)) (can solve LUCENE-1489) > - supports PhraseQuery, phrase-unit highlighting with slops > {noformat} > q="w1 w2" > w1 w2 > --- > q="w1 w2"~1 > w1 w3 w2 w3 w1 w2 > {noformat} > - highlight fields need to be TermVector.WITH_POSITIONS_OFFSETS > - easy to apply patch due to independent package (contrib/highlighter2) > - uses Java 1.5 > - looks query boost to score fragments (currently doesn't see idf, but it > should be possible) > - pluggable FragListBuilder > - pluggable FragmentsBuilder > to do: > - term positions can be unnecessary when phraseHighlight==false > - collects performance numbers
Hudson build is back to normal: Lucene-trunk #921
See http://hudson.zones.apache.org/hudson/job/Lucene-trunk/921/changes
[jira] Updated: (LUCENE-1791) Enhance QueryUtils and CheckHIts to wrap everything they check in MultiReader/MultiSearcher
[ https://issues.apache.org/jira/browse/LUCENE-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated LUCENE-1791: - Attachment: LUCENE-1791.patch i put the doc "ids" into a KEY field and refactored ItemizedFilter to be a trivial subclass of FieldCacheTermsFilter. I also added more wrap permutations to address some of the possible edge cases Simon pointed out (good catch Simon) but didn't introduce any randomization for the reasons mentioned before (even with the change to not rely on consistent docIds in ItemizedFilter, we can't allow deletions before the wrapped searcher/reader because CheckHits does its magic based on docIds). (hmm... i suppose the wrap functions could return some metadata about what offset the old ids have in the new searcher/reader and CheckHits could use that hmmm ... seems kludgy so i'm not going to worry about it) I think we're good to go here unless anyone has any objections > Enhance QueryUtils and CheckHIts to wrap everything they check in > MultiReader/MultiSearcher > --- > > Key: LUCENE-1791 > URL: https://issues.apache.org/jira/browse/LUCENE-1791 > Project: Lucene - Java > Issue Type: Test >Reporter: Hoss Man > Fix For: 2.9 > > Attachments: LUCENE-1791.patch, LUCENE-1791.patch, LUCENE-1791.patch, > LUCENE-1791.patch, LUCENE-1791.patch, LUCENE-1791.patch, LUCENE-1791.patch, > LUCENE-1791.patch > > > methods in CheckHits & QueryUtils are in a good position to take any Searcher > they are given and not only test it, but also test MultiReader & > MultiSearcher constructs built around them
[jira] Assigned: (LUCENE-1791) Enhance QueryUtils and CheckHIts to wrap everything they check in MultiReader/MultiSearcher
[ https://issues.apache.org/jira/browse/LUCENE-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man reassigned LUCENE-1791: Assignee: Hoss Man > Enhance QueryUtils and CheckHIts to wrap everything they check in > MultiReader/MultiSearcher > --- > > Key: LUCENE-1791 > URL: https://issues.apache.org/jira/browse/LUCENE-1791 > Project: Lucene - Java > Issue Type: Test >Reporter: Hoss Man >Assignee: Hoss Man > Fix For: 2.9 > > Attachments: LUCENE-1791.patch, LUCENE-1791.patch, LUCENE-1791.patch, > LUCENE-1791.patch, LUCENE-1791.patch, LUCENE-1791.patch, LUCENE-1791.patch, > LUCENE-1791.patch > > > methods in CheckHits & QueryUtils are in a good position to take any Searcher > they are given and not only test it, but also test MultiReader & > MultiSearcher constructs built around them
[jira] Commented: (LUCENE-1794) implement reusableTokenStream for all contrib analyzers
[ https://issues.apache.org/jira/browse/LUCENE-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743816#action_12743816 ] Shai Erera commented on LUCENE-1794: bq. right now consumers like lucene indexing call reset() on the stream, but I see the prototype ReusingAnalyzer also calling reset() on the stream. I don't think that's a new problem - I simply coded what I think most Analyzers that do impl reusableTS do. And if there are reusableTS impls that don't call reset() on purpose, then we shouldn't call it. Therefore, I think that we should change our code to not call reset(). I don't think there's a reusableTS impl which does not call reset(), because it relies on the consumer to do it (nobody guarantees that anyway). We should simply note that on reusableTS javadoc (e.g., something like "return an already reset token stream"). I don't mind doing that in a separate issue if that's what you prefer. > implement reusableTokenStream for all contrib analyzers > --- > > Key: LUCENE-1794 > URL: https://issues.apache.org/jira/browse/LUCENE-1794 > Project: Lucene - Java > Issue Type: Improvement > Components: contrib/analyzers >Reporter: Robert Muir >Assignee: Robert Muir >Priority: Minor > Fix For: 2.9 > > Attachments: LUCENE-1794-reusing-analyzer.patch, LUCENE-1794.patch, > LUCENE-1794.patch, LUCENE-1794.patch, LUCENE-1794.patch, LUCENE-1794.patch, > LUCENE-1794.patch > > > most contrib analyzers do not have an impl for reusableTokenStream > regardless of how expensive the back compat reflection is for indexing speed, > I think we should do this to mitigate any performance costs. hey, overall it > might even be an improvement! > the back compat code for non-final analyzers is already in place so this is > easy money in my opinion.
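The reset() protocol question from the last two LUCENE-1794 comments can be illustrated with a small model. This is a rough sketch only, assuming the contract Shai suggests ("return an already reset token stream"): ReuseDemo, CountingStream and ReusingAnalyzerSketch are hypothetical stand-ins and do not reproduce the ReusingAnalyzer prototype from the attached patch or the real TokenStream API.

```java
// Hypothetical stand-ins for illustration only, NOT the real Lucene classes.
class ReuseDemo {

    // A stream that keeps state and counts how often it has been reset.
    static class CountingStream {
        int resets = 0;
        String input;

        void setInput(String input) { this.input = input; }

        // Clears any per-document state (modelled here as a counter only).
        void reset() { resets++; }
    }

    // An analyzer that caches one stream per thread and hands it out again.
    static class ReusingAnalyzerSketch {
        private final ThreadLocal<CountingStream> saved = new ThreadLocal<>();

        CountingStream reusableStream(String input) {
            CountingStream stream = saved.get();
            if (stream == null) {
                stream = new CountingStream();
                saved.set(stream);
            }
            stream.setInput(input);
            // Assumed contract: the returned stream is already reset.
            stream.reset();
            return stream;
        }
    }

    public static void main(String[] args) {
        ReusingAnalyzerSketch analyzer = new ReusingAnalyzerSketch();
        CountingStream first = analyzer.reusableStream("doc one");
        CountingStream second = analyzer.reusableStream("doc two");

        System.out.println("same instance reused: " + (first == second));
        System.out.println("resets done by analyzer: " + second.resets);

        // A consumer that also calls reset() repeats work the analyzer
        // has already done under the assumed contract.
        second.reset();
        System.out.println("resets after consumer call: " + second.resets);
    }
}
```

Under this contract a consumer's extra reset() is redundant but harmless for idempotent streams, which is why the discussion centres on documenting who is responsible for the call rather than on correctness alone.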