Re: lucene indexing and merge process
It seems like there are (at least) two angles here for getting better performance from FieldCache:

1) Be incremental: with reopen() we should only have to update a subset of the array in the FieldCache, according to the changed segments. This is what Hoss is working on and Mark was referring to, and I think it's very important!

2) Parsing is slow (?): I'm guessing one of the reasons that John added the _X.udt file was that it's much faster to load an array of already-parsed ints than to ask FieldCache to populate itself.

Even if we do #1, I think #2 could be a big win (in addition)? John, do you have any numbers on how much faster it is to load the array of ints from the _X.udt file vs. having FieldCache populate itself?

Also, on the original question of whether we can open up SegmentReader, FieldsWriter, etc., I think that's a good idea? At least we can make things protected instead of private/final?

Mike

Ning Li [EMAIL PROTECTED] wrote:
I see what you mean by 2) now. What Mark said should work for you when it's done.
Cheers, Ning

On 10/18/07, John Wang [EMAIL PROTECTED] wrote:
Hi Ning: That is essentially what the field cache does. Doing this for each docid in the result set will be slow if the result set is large. But loading it in memory when opening the index can also be slow if the index is large and updates often.
Thanks -John

On 10/18/07, Ning Li [EMAIL PROTECTED] wrote:
Make all documents have a term, say ID:UID, and for each document, store its UID in the term's payload. You can read off this posting list to create your array. Will this work for you, John?
Cheers, Ning

On 10/18/07, Erik Hatcher [EMAIL PROTECTED] wrote:
Forwarding this to java-dev per request. Seems like the best place to discuss this topic.
Erik

Begin forwarded message:
From: John Wang [EMAIL PROTECTED]
Date: October 17, 2007 5:43:29 PM EDT
To: [EMAIL PROTECTED]
Subject: lucene indexing and merge process

Hi Erik:

We are revamping our search system here at LinkedIn, and we are using Lucene. One issue we ran across is that we store a UID in Lucene which we map to the DB storage. So given a docid, to look up its UID, we have the following solutions:

1) Index it as a stored field and get it from reader.document() (very slow if recall is large)
2) Load/warm up the FieldCache (for a large corpus, loading up the IndexReader can be slow)
3) Construct it using the FieldCache and persist it on disk every time the index changes (not suitable for real-time indexing, e.g. this process will degrade as the # of documents gets large)

None of the above solutions turned out to be adequate for our requirements. What we ended up doing was modifying the Lucene code, changing the SegmentReader, DocumentWriter, and FieldWriter classes to take advantage of the Lucene segment/merge process. E.g.: for each segment, we store a .udt file, which is an int[] array (by changing the FieldWriter class), and SegmentReader loads the .udt file into an array. Merging happens seamlessly.

Because of the tight encapsulation around these classes, e.g. private and final methods, it is very difficult to extend Lucene while avoiding branching into our own version. Is there a way we can open up and make these classes extensible? We'd be happy to contribute what we have done.

I guess to tackle the problem from a different angle: is there a way to incorporate FieldCache into the segments (it is strictly in memory now), and build disk versions while indexing?

Hope I am making sense. I did not send this out to the mailing list because I wasn't sure if this is a dev question or a user question; feel free to either forward it to the right mailing list or let me know and I can forward it.
Thanks -John

- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
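John's per-segment .udt approach and Mike's point 1) fit together naturally: if the UID array lives per segment, a reopen only has to load arrays for segments it has not seen yet, and a merge only has to concatenate the inputs' arrays minus deleted docs. The following is a minimal plain-Java model of that idea, not the modified Lucene code from this thread. `Segment`, `UidCache`, `UidCache.View`, and the in-memory `storedUids` array (standing in for a .udt file) are hypothetical names, and segments are assumed to have unique, never-reused names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a segment: its name plus the UIDs of its documents
// in docid order (what a per-segment .udt file would hold on disk).
final class Segment {
    final String name;
    final int[] storedUids;

    Segment(String name, int[] storedUids) {
        this.name = name;
        this.storedUids = storedUids;
    }
}

// Hypothetical per-segment UID cache keyed by segment name.
final class UidCache {
    private final Map<String, int[]> bySegment = new HashMap<>();
    int loads = 0; // number of simulated .udt reads

    private int[] load(Segment s) {
        loads++;
        return s.storedUids.clone(); // stands in for reading the .udt file
    }

    // (Re)opens a view over the current segments. Arrays of segments seen
    // before are reused, so only new segments (from flushes/merges) are loaded.
    View open(List<Segment> segments) {
        Map<String, int[]> current = new HashMap<>();
        List<int[]> arrays = new ArrayList<>();
        for (Segment s : segments) {
            int[] uids = bySegment.get(s.name);
            if (uids == null) uids = load(s);
            current.put(s.name, uids);
            arrays.add(uids);
        }
        bySegment.clear(); // forget segments that were merged away
        bySegment.putAll(current);
        return new View(arrays);
    }

    // Merge: the new segment's UIDs are the inputs' UIDs in order, minus deleted docs.
    static Segment merge(String name, List<Segment> inputs, List<boolean[]> deleted) {
        int total = 0;
        for (Segment s : inputs) total += s.storedUids.length;
        int[] merged = new int[total];
        int next = 0;
        for (int i = 0; i < inputs.size(); i++) {
            int[] uids = inputs.get(i).storedUids;
            boolean[] del = deleted.get(i);
            for (int doc = 0; doc < uids.length; doc++) {
                if (!del[doc]) merged[next++] = uids[doc];
            }
        }
        return new Segment(name, Arrays.copyOf(merged, next));
    }

    // Point-in-time view mapping a global docid to a UID via per-segment doc bases.
    static final class View {
        private final List<int[]> arrays = new ArrayList<>();
        private final List<Integer> bases = new ArrayList<>();
        private int maxDoc = 0;

        View(List<int[]> segmentArrays) {
            for (int[] a : segmentArrays) {
                if (a.length == 0) continue; // empty segments own no docids
                bases.add(maxDoc);
                arrays.add(a);
                maxDoc += a.length;
            }
        }

        int uid(int docId) {
            if (docId < 0 || docId >= maxDoc) throw new IndexOutOfBoundsException("docId " + docId);
            int lo = 0, hi = bases.size() - 1; // find the last segment with base <= docId
            while (lo < hi) {
                int mid = (lo + hi + 1) >>> 1;
                if (bases.get(mid) <= docId) lo = mid; else hi = mid - 1;
            }
            return arrays.get(lo)[docId - bases.get(lo)];
        }
    }
}

final class UidCacheDemo {
    public static void main(String[] args) {
        UidCache cache = new UidCache();
        Segment a = new Segment("_a", new int[] {101, 102});
        Segment b = new Segment("_b", new int[] {201});
        System.out.println(cache.open(Arrays.asList(a, b)).uid(2)); // 201
        // A flush adds segment _c; reopening loads only _c's array.
        Segment c = new Segment("_c", new int[] {301});
        UidCache.View v2 = cache.open(Arrays.asList(a, b, c));
        System.out.println(v2.uid(3) + " loads=" + cache.loads); // 301 loads=3
    }
}
```

A lookup costs a binary search over the per-segment doc bases plus one array read. The choice that matters for real-time indexing is keying the cache by segment: the number of arrays loaded on reopen depends only on the new segments, not on the size of the whole index.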
[jira] Updated: (LUCENE-1020) Basic tool for checking & repairing an index
[ https://issues.apache.org/jira/browse/LUCENE-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-1020:
---
Attachment: LUCENE-1020.take2.patch

Attached patch: another rev of this tool, with a few minor additions. I plan to commit in a day or two.

Basic tool for checking & repairing an index
Key: LUCENE-1020
URL: https://issues.apache.org/jira/browse/LUCENE-1020
Project: Lucene - Java
Issue Type: New Feature
Components: Index
Affects Versions: 2.3
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
Attachments: LUCENE-1020.patch, LUCENE-1020.take2.patch

This has been requested a number of times on the mailing lists. Most recently here: http://www.gossamer-threads.com/lists/lucene/java-user/53474
I think we should provide a basic tool out of the box.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (LUCENE-743) IndexReader.reopen()
[ https://issues.apache.org/jira/browse/LUCENE-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12536353 ]

Michael Busch commented on LUCENE-743:
--

Hi Mike,

I'm not sure I fully understand your comment. Consider the following (admittedly contrived) example:

{code:java}
IndexReader reader1 = IndexReader.open(index1);  // optimized index, reader1 is a SegmentReader
IndexReader multiReader1 = new MultiReader(new IndexReader[] {reader1, IndexReader.open(index2)});
...  // modify index2
IndexReader multiReader2 = multiReader1.reopen();
// only index2 changed, so multiReader2 uses reader1 and has to increment the refcount of reader1
...  // modify index1
IndexReader reader2 = reader1.reopen();
// reader2 is a new instance that shares resources with reader1
...  // modify index1
IndexReader reader3 = reader2.reopen();
// reader3 is a new instance that shares resources with reader1 and reader2
{code}

Now the user closes the readers in this order:
# multiReader1.close();
# multiReader2.close();
# reader2.close();
# reader3.close();

reader1 should be marked as closed after 2., right? Because multiReader1.close() and multiReader2.close() have to decrement reader1's refcount. But the underlying files have to remain open until after 4., because reader2 and reader3 use reader1's resources.

So don't we need 2 refcount values in reader1? One that tells us when the reader itself can be marked as closed, and one that tells us when the resources can be closed? Then multiReader1 and multiReader2 would decrement the first refcount, whereas reader2 and reader3 both have to know reader1, so that they can decrement the second refcount.
I hope I'm just completely confused now and someone tells me that the whole thing is much simpler :-)

IndexReader.reopen()
Key: LUCENE-743
URL: https://issues.apache.org/jira/browse/LUCENE-743
Project: Lucene - Java
Issue Type: Improvement
Components: Index
Reporter: Otis Gospodnetic
Assignee: Michael Busch
Priority: Minor
Fix For: 2.3
Attachments: IndexReaderUtils.java, lucene-743-take2.patch, lucene-743.patch, lucene-743.patch, lucene-743.patch, MyMultiReader.java, MySegmentReader.java, varient-no-isCloneSupported.BROKEN.patch

This is Robert Engels' implementation of IndexReader.reopen() functionality, as a set of 3 new classes (this was easier for him to implement, but should probably be folded into the core, if this looks good).
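Michael Busch's two-counter question can be made concrete with a small plain-Java model that replays his close order. This is a hypothetical sketch, not the LUCENE-743 patch: `SharedFiles`, `SketchReader`, and `SketchMultiReader` are invented names, and the second count lives on a separate shared object rather than inside reader1 (either placement expresses the same idea). Each reader's own count decides when it is marked closed; the shared count decides when the underlying files can be closed.

```java
// Hypothetical shared resources of a segment (its open files). They are
// closed only when every reader that uses them has been closed.
final class SharedFiles {
    private int refCount = 0;
    boolean closed = false;

    void incRef() { refCount++; }

    void decRef() {
        if (--refCount == 0) closed = true; // stands in for closing the files
    }
}

// Hypothetical reader holding the first count: when this reader is closed.
final class SketchReader {
    final SharedFiles files;
    private int refCount = 1; // the reference held by whoever opened it
    boolean closed = false;

    SketchReader(SharedFiles files) {
        this.files = files;
        files.incRef(); // every reader sharing the files holds the second count
    }

    // A reopened reader is a new instance sharing the same files.
    SketchReader reopen() { return new SketchReader(files); }

    void incRef() { refCount++; }

    void decRef() {
        if (--refCount == 0) {
            closed = true;  // the reader itself is now closed...
            files.decRef(); // ...but the files stay open while others use them
        }
    }

    void close() { decRef(); }
}

// Hypothetical composite reader: it owns one reference to each sub-reader.
final class SketchMultiReader {
    private final SketchReader[] subs;

    SketchMultiReader(SketchReader... subs) { this.subs = subs; }

    void close() {
        for (SketchReader r : subs) r.decRef();
    }
}

final class RefCountDemo {
    public static void main(String[] args) {
        SketchReader reader1 = new SketchReader(new SharedFiles());
        SketchMultiReader multi1 =
            new SketchMultiReader(reader1, new SketchReader(new SharedFiles()));
        reader1.incRef(); // multiReader1.reopen() reuses reader1, so it adds a reference
        SketchMultiReader multi2 =
            new SketchMultiReader(reader1, new SketchReader(new SharedFiles()));
        SketchReader reader2 = reader1.reopen();
        SketchReader reader3 = reader2.reopen();

        multi1.close();
        multi2.close();
        System.out.println(reader1.closed + " " + reader1.files.closed); // true false
        reader2.close();
        reader3.close();
        System.out.println(reader1.files.closed); // true
    }
}
```

With this split, reader1 reports itself closed after step 2 while its files stay open until step 4, which is the behavior the comment asks for.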
Re: Per-document Payloads (was: Re: lucene indexing and merge process)
Hi Michael: Thanks for the info. I haven't played with payloads. Can you give me an example or point me to how it is used to solve this problem? Thanks -John
Per-document Payloads (was: Re: lucene indexing and merge process)
John Wang wrote:
I can try to get some numbers for loading an int[] array vs FieldCache.getInts().

I've had a similar performance problem when I used the FieldCache. The loading performance is so slow because each value is stored as a term in the dictionary. For loading the cache it is necessary to iterate over all terms for the field in the dictionary, and for each term its posting list is opened to check which documents have that value.

If you store unique docIds, then no two documents share the same value. That means each value gets its own entry in the dictionary, and loading each value requires two random I/O seeks (one for the term lookup + one to open the posting list). For a big index in my app it took several minutes to fill the cache like that.

To speed things up I did essentially what Ning suggested. Now I store the values as payloads in the posting list of an artificial term. To fill my cache it's only necessary to perform a couple of I/O seeks for opening the posting list of that specific term; then it is just a sequential scan to load all values. With this approach the time for filling the cache went down from minutes to seconds!

This approach is already much better than the current FieldCache implementation, but it can still be improved. In fact, we already have a mechanism for doing that: the norms. Norms are stored with a fixed size, which means both random access and sequential scan are optimal. Norms are also cached in memory, and filling that cache is much faster compared to the current FieldCache approach.

I was therefore thinking about adding per-document payloads to Lucene (we can also call it document metadata). The API could look like this:

    Document d = new Document();
    byte[] uidValue = ...
    d.addMetadata(uid, uidValue);

And on the retrieval side all values could either be loaded into the field cache, or, if the index is too big, a new API can be used:

    IndexReader reader = IndexReader.open(...);
    DocumentMetadataIterator it = reader.metadataIterator(uid);

where DocumentMetadataIterator is an interface similar to TermDocs:

    interface DocumentMetadataIterator {
      void seek(String name);
      boolean next();
      boolean skipTo(int doc);
      int doc();
      byte[] getMetadata();
    }

The next question would be how to store the per-doc payloads (PDP). If all values have the same length (as with the unique docIds), then we should store them as efficiently as possible, like the norms. However, we still want to offer the flexibility of having variable-length values. For this case we could use a new data structure similar to our posting list:

    PDPList               --> FixedLengthPDPList | VariableLengthPDPList, SkipList
    FixedLengthPDPList    --> Payload^SegSize
    VariableLengthPDPList --> DocDelta, PayloadLength?, Payload
    Payload               --> Byte^PayloadLength
    PayloadLength         --> VInt
    SkipList              --> see .frq file

Because we don't have global field semantics, Lucene should automatically pick the right data structure. This could work like this: when the DocumentsWriter writes a segment, it checks whether all values of a PDP have the same length. If yes, it stores them as a FixedLengthPDPList; if not, as a VariableLengthPDPList. When the SegmentMerger merges two or more segments, it checks whether all segments have a FixedLengthPDPList with the same length for a PDP. If not, it writes a VariableLengthPDPList to the new segment.

I think this would be a nice new feature for Lucene. We could then have user-defined and Lucene-specific PDPs. For example, norms would be in the latter category (this way we would get rid of the special code for norms, as they could be handled as PDPs). It would also be easy to add new features in the future, like splitting the norms into two values: a norm and a boost value.
OK, lots of thoughts; I'm sure I'll get lots of comments too ... ;)
- Michael
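The write-time and merge-time format choice in the per-document payload proposal can be sketched in plain Java. This is a hypothetical model, not a Lucene API: `PdpList`, `write`, `writeVariable`, `merge`, and `Iter` are invented names; the lists live in memory rather than in segment files; the variable-length form always writes `PayloadLength` (the proposal makes it optional); `skipTo()` is a linear scan instead of using a SkipList; `seek(String)` is omitted because each list holds a single PDP; and deleted documents are ignored during merge. The VInt layout is the 7-bits-per-byte scheme Lucene uses.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.List;

// Hypothetical in-memory per-document payload (PDP) list for one segment.
final class PdpList {
    final int fixedLength; // >= 0: fixed-length list; -1: variable-length list
    final int maxDoc;      // number of documents in the segment
    final byte[] data;
    private PdpList(int f, int m, byte[] d) { fixedLength = f; maxDoc = m; data = d; }

    // Write time: fixed-length only if every doc has a value (null = none) of one length.
    static PdpList write(byte[][] values) {
        int len = (values.length == 0 || values[0] == null) ? -1 : values[0].length;
        for (byte[] v : values) {
            if (v == null || v.length != len) return writeVariable(values);
        }
        if (len < 0) return writeVariable(values); // empty segment
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] v : values) out.write(v, 0, v.length);
        return new PdpList(len, values.length, out.toByteArray());
    }

    // Variable-length form: DocDelta, PayloadLength, Payload per doc that has a value.
    static PdpList writeVariable(byte[][] values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int lastDoc = 0;
        for (int doc = 0; doc < values.length; doc++) {
            if (values[doc] == null) continue;
            writeVInt(out, doc - lastDoc);
            writeVInt(out, values[doc].length);
            out.write(values[doc], 0, values[doc].length);
            lastDoc = doc;
        }
        return new PdpList(-1, values.length, out.toByteArray());
    }

    // Merge time: equal fixed lengths are concatenated; otherwise decode and rewrite.
    static PdpList merge(List<PdpList> inputs) {
        int len = inputs.isEmpty() ? -1 : inputs.get(0).fixedLength;
        boolean sameFixed = len >= 0;
        int maxDoc = 0;
        for (PdpList p : inputs) {
            sameFixed &= p.fixedLength == len;
            maxDoc += p.maxDoc;
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (sameFixed) {
            for (PdpList p : inputs) out.write(p.data, 0, p.data.length);
            return new PdpList(len, maxDoc, out.toByteArray());
        }
        byte[][] values = new byte[maxDoc][];
        int base = 0; // docids of later segments are shifted by earlier maxDocs
        for (PdpList p : inputs) {
            for (Iter it = p.iterator(); it.next(); ) values[base + it.doc()] = it.getMetadata();
            base += p.maxDoc;
        }
        return writeVariable(values);
    }

    // Same 7-bits-per-byte layout as Lucene's VInt (low-order bits first).
    static void writeVInt(ByteArrayOutputStream out, int i) {
        while ((i & ~0x7F) != 0) { out.write((i & 0x7F) | 0x80); i >>>= 7; }
        out.write(i);
    }

    Iter iterator() { return new Iter(); }

    // Cursor modeled on the proposed DocumentMetadataIterator (seek(String) omitted).
    final class Iter {
        private int pos = 0, doc = -1;
        private byte[] current;

        boolean next() {
            if (fixedLength >= 0) { // doc N's value starts at offset N * fixedLength
                if (doc + 1 >= maxDoc) return false;
                doc++;
                current = Arrays.copyOfRange(data, doc * fixedLength, (doc + 1) * fixedLength);
                return true;
            }
            if (pos >= data.length) return false;
            doc = Math.max(doc, 0) + readVInt(); // DocDelta is relative to the previous doc
            int len = readVInt();
            current = Arrays.copyOfRange(data, pos, pos + len);
            pos += len;
            return true;
        }

        // Linear stand-in for a SkipList: advances at least once, to the first doc >= target.
        boolean skipTo(int target) {
            do { if (!next()) return false; } while (doc < target);
            return true;
        }

        int doc() { return doc; }
        byte[] getMetadata() { return current; }

        private int readVInt() {
            for (int value = 0, shift = 0; ; shift += 7) {
                byte b = data[pos++];
                value |= (b & 0x7F) << shift;
                if ((b & 0x80) == 0) return value;
            }
        }
    }
}
```

For example, a segment whose two docs each carry a 4-byte UID is written as a fixed-length list, while a segment in which one doc has no value falls back to the variable-length form. Merging two fixed 4-byte lists just concatenates their bytes; merging a fixed list with a variable one produces a variable-length list.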
[jira] Updated: (LUCENE-997) Add search timeout support to Lucene
[ https://issues.apache.org/jira/browse/LUCENE-997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Timm updated LUCENE-997:
-
Attachment: timeout.patch

Two issues are addressed in this latest patch:
1) Timeout support had not been added to: public TopFieldDocs search(Weight weight, Filter filter, final int nDocs, Sort sort)
2) getCounter() in TimerThread was replaced by getMilliseconds()

Add search timeout support to Lucene
Key: LUCENE-997
URL: https://issues.apache.org/jira/browse/LUCENE-997
Project: Lucene - Java
Issue Type: New Feature
Reporter: Sean Timm
Priority: Minor
Attachments: LuceneTimeoutTest.java, timeout.patch, timeout.patch

This patch is based on Nutch-308.

This patch adds support for a maximum search time limit. After this time is exceeded, the search thread is stopped, partial results (if any) are returned, and the total number of results is estimated.

This patch tries to minimize the overhead related to time-keeping by using a version of a safe, unsynchronized timer.

This was also discussed in an e-mail thread: http://www.nabble.com/search-timeout-tf3410206.html#a9501029
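The two ideas in this issue, a cheap clock read on every iteration and partial results once the budget is spent, can be sketched in plain Java. This is a hypothetical illustration, not the code in timeout.patch. `CoarseClock` stands in for the role TimerThread.getMilliseconds() plays in the patch: one daemon thread publishes the elapsed time through a volatile field, so readers need no synchronization. `TimedSearch` is an invented name, and estimating the total by linear extrapolation from the fraction of docs scanned is an assumption of this sketch, not necessarily what Nutch-308 does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;
import java.util.function.LongSupplier;

// Hypothetical coarse clock: one daemon thread refreshes a volatile field every
// resolutionMs, so the search loop reads the time without locks or system calls.
final class CoarseClock implements LongSupplier {
    private final long startNanos = System.nanoTime();
    private volatile long elapsedMs = 0; // written by one thread, read by many

    private CoarseClock() {}

    static CoarseClock start(long resolutionMs) {
        CoarseClock clock = new CoarseClock();
        Thread t = new Thread(() -> {
            while (true) {
                clock.elapsedMs = (System.nanoTime() - clock.startNanos) / 1_000_000;
                try {
                    Thread.sleep(resolutionMs);
                } catch (InterruptedException e) {
                    return;
                }
            }
        }, "coarse-clock");
        t.setDaemon(true); // never keeps the JVM alive
        t.start();
        return clock;
    }

    @Override
    public long getAsLong() { return elapsedMs; }
}

// Hypothetical time-limited scan over docids 0..maxDoc-1.
final class TimedSearch {
    final List<Integer> hits = new ArrayList<>();
    boolean timedOut = false;
    long estimatedTotal = 0;

    static TimedSearch run(int maxDoc, IntPredicate matches, LongSupplier clock, long budgetMs) {
        TimedSearch r = new TimedSearch();
        long deadline = clock.getAsLong() + budgetMs;
        int scanned = 0;
        for (int doc = 0; doc < maxDoc; doc++) {
            if (clock.getAsLong() > deadline) { // cheap volatile read per doc
                r.timedOut = true;
                break;
            }
            scanned++;
            if (matches.test(doc)) r.hits.add(doc);
        }
        // Partial results: extrapolate the hit rate over the docs not scanned
        // (an assumption of this sketch; exact when the scan completes).
        r.estimatedTotal = scanned == 0 ? 0 : (long) r.hits.size() * maxDoc / scanned;
        return r;
    }
}

final class TimeoutDemo {
    public static void main(String[] args) {
        CoarseClock clock = CoarseClock.start(5);
        // Deliberately slow matcher so a 50 ms budget runs out before the scan ends.
        TimedSearch r = TimedSearch.run(1_000_000, doc -> {
            long x = 0;
            for (int i = 0; i < 5_000; i++) x += i ^ doc;
            return x % 3 == 0;
        }, clock, 50);
        System.out.println("timedOut=" + r.timedOut + " partialHits=" + r.hits.size()
            + " estimatedTotal=" + r.estimatedTotal);
    }
}
```

Passing the clock as a LongSupplier keeps the loop testable: a fake clock that ticks once per call makes the timeout point deterministic. The demo's output depends on machine speed, so no expected output is given.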
Re: Remove TermEnum.skipTo(Term target)
Karl Wettin wrote:
So what's up with this method? Did anyone ever figure out what it is used for?

I found the origin of it. It was added in 2004: http://svn.apache.org/viewvc?view=rev&revision=150206

This was to fix this issue: http://issues.apache.org/bugzilla/show_bug.cgi?id=18927

But the code was originally written in 2001 (!) by Dmitry Serebrennikov. The original patch is at: http://mail-archives.apache.org/mod_mbox/lucene-java-dev/200110.mbox/[EMAIL PROTECTED]

Doug
Re: lucene indexing and merge process
Hi Mike:

This is an excellent analysis. To do 2), we tried computing the field cache at indexing time to avoid parsing at search time. But what we found was that this degrades indexing (because it computes the entire FieldCache, not per segment), which was not acceptable for our project either.

I can try to get some numbers for loading an int[] array vs FieldCache.getInts().

Thanks -John
[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries
[ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-794:
---
Attachment: (was: CachedTokenStream.java)

Extend contrib Highlighter to properly support phrase queries and span queries
--
Key: LUCENE-794
URL: https://issues.apache.org/jira/browse/LUCENE-794
Project: Lucene - Java
Issue Type: Improvement
Components: Other
Reporter: Mark Miller
Priority: Minor
Attachments: Encoder.java, Formatter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, SimpleFormatter.java, spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_patch_4.zip, SpanHighlighterTest.java, SpanHighlighterTest.java, SpanScorer.java, SpanScorer.java, WeightedSpanTerm.java

This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuery types and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans.

See http://issues.apache.org/jira/browse/LUCENE-403 for some background. There is a dependency on MemoryIndex.
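The key behavior described in this issue (a term scores only where it took part in a match, and a stray occurrence of the same term elsewhere scores 0) can be illustrated for the simplest span case, an exact phrase. This hypothetical plain-Java sketch is not the patch's SpanQueryScorer: `PhraseHighlightSketch`, `scoreTokens`, `scoreTermsOnly`, and the 1.0/0.0 scores are invented for illustration, tokens are plain whitespace-split strings, and real span queries (slop, nesting, unordered matches) need considerably more machinery.

```java
import java.util.Arrays;

// Hypothetical illustration of span-aware vs. term-only highlight scoring.
final class PhraseHighlightSketch {

    // Scores 1.0 only for tokens inside an exact occurrence of the phrase.
    static float[] scoreTokens(String[] tokens, String[] phrase) {
        float[] scores = new float[tokens.length];
        for (int start = 0; start + phrase.length <= tokens.length; start++) {
            boolean match = true;
            for (int k = 0; k < phrase.length && match; k++) {
                match = tokens[start + k].equals(phrase[k]);
            }
            if (match) Arrays.fill(scores, start, start + phrase.length, 1.0f);
        }
        return scores;
    }

    // Term-only scoring for comparison: every occurrence of any phrase term scores.
    static float[] scoreTermsOnly(String[] tokens, String[] phrase) {
        float[] scores = new float[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            for (String term : phrase) {
                if (tokens[i].equals(term)) scores[i] = 1.0f;
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        String[] text = "the quick fox saw a quick brown fox".split(" ");
        String[] phrase = {"brown", "fox"};
        // [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0]  (the first "fox" is highlighted too)
        System.out.println(Arrays.toString(scoreTermsOnly(text, phrase)));
        // [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0]  (only the matching "brown fox")
        System.out.println(Arrays.toString(scoreTokens(text, phrase)));
    }
}
```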
[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries
[ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-794:
---
Attachment: (was: SpanHighlighterTest.java)

Attachments: spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_patch_4.zip
[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries
[ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-794:
---
Attachment: (was: SpanScorer.java)

Attachments: spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_patch_4.zip, SpanHighlighterTest.java
[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries
[ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-794:
---
Attachment: (was: WeightedSpanTerm.java)

Attachments: spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_patch_4.zip, SpanHighlighterTest.java
[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries
[ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated LUCENE-794: --- Attachment: (was: SpanHighlighterTest.java)
[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries
[ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated LUCENE-794: --- Attachment: (was: QuerySpansExtractor.java)
[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries
[ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated LUCENE-794: --- Attachment: (was: MemoryIndex.java)
[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries
[ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated LUCENE-794: --- Attachment: (was: SimpleFormatter.java)
[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries
[ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated LUCENE-794: --- Attachment: (was: HighlighterTest.java)
[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries
[ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated LUCENE-794: --- Attachment: (was: Highlighter.java)
[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries
[ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated LUCENE-794: --- Attachment: (was: Formatter.java)
[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries
[ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated LUCENE-794: --- Attachment: (was: Encoder.java) Extend contrib Highlighter to properly support phrase queries and span queries -- Key: LUCENE-794 URL: https://issues.apache.org/jira/browse/LUCENE-794 Project: Lucene - Java Issue Type: Improvement Components: Other Reporter: Mark Miller Priority: Minor Attachments: Formatter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, SimpleFormatter.java, spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_patch_4.zip, SpanHighlighterTest.java, SpanHighlighterTest.java, SpanScorer.java, SpanScorer.java, WeightedSpanTerm.java
[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries
[ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated LUCENE-794: --- Attachment: (was: DefaultEncoder.java) Extend contrib Highlighter to properly support phrase queries and span queries -- Key: LUCENE-794 URL: https://issues.apache.org/jira/browse/LUCENE-794 Project: Lucene - Java Issue Type: Improvement Components: Other Reporter: Mark Miller Priority: Minor Attachments: Encoder.java, Formatter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, SimpleFormatter.java, spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_patch_4.zip, SpanHighlighterTest.java, SpanHighlighterTest.java, SpanScorer.java, SpanScorer.java, WeightedSpanTerm.java
[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries
[ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated LUCENE-794: --- Attachment: spanhighlighter12.patch Nice little addition courtesy of Michael Goddard: ...In our Lucene work, we took the approach of indexing all fields into a single field, FULLTEXT, which is the default field for queries. Our query syntax is such that a user can combine clauses against named fields with clauses with no field specified. When we go to highlight such queries, if a given clause is against this FULLTEXT field but we're highlighting text in the TITLE field, we'd still like for matching terms to be highlighted... Thanks for the patch, Michael. There is a new constructor that allows you to specify a default field. Terms from this field will be highlighted regardless of the specific field you are highlighting. The only file to worry about in that huge mess of attached files is spanhighlighter12.patch. Extend contrib Highlighter to properly support phrase queries and span queries -- Key: LUCENE-794 URL: https://issues.apache.org/jira/browse/LUCENE-794 Project: Lucene - Java Issue Type: Improvement Components: Other Reporter: Mark Miller Priority: Minor Attachments: Encoder.java, Formatter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, SimpleFormatter.java, spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_patch_4.zip, SpanHighlighterTest.java, SpanHighlighterTest.java, SpanScorer.java, SpanScorer.java, WeightedSpanTerm.java
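The default-field behaviour Mark describes boils down to a simple matching rule. The sketch below is hypothetical: the class DefaultFieldRule and its shouldHighlight method are illustrative names, not the patch's actual constructor or API. It only shows the decision being made: a query term is highlighted in the field being displayed if it targets that field, or if it targets the configured catch-all field (FULLTEXT in Michael Goddard's setup).

```java
/**
 * Hypothetical sketch of the default-field rule; the class and method names
 * are illustrative and are not the LUCENE-794 patch's actual API.
 */
final class DefaultFieldRule {
    private final String defaultField; // null means "no default field configured"

    DefaultFieldRule(String defaultField) {
        this.defaultField = defaultField;
    }

    /**
     * A query term is highlighted in the field being displayed if it targets
     * that field, or if it targets the catch-all default field.
     */
    boolean shouldHighlight(String queryTermField, String fieldBeingHighlighted) {
        if (queryTermField.equals(fieldBeingHighlighted)) {
            return true;
        }
        return defaultField != null && defaultField.equals(queryTermField);
    }

    public static void main(String[] args) {
        DefaultFieldRule rule = new DefaultFieldRule("FULLTEXT");
        System.out.println(rule.shouldHighlight("FULLTEXT", "TITLE")); // true
        System.out.println(rule.shouldHighlight("BODY", "TITLE"));     // false
    }
}
```

Leaving the default field unset (null here) falls back to strict per-field matching, which is the highlighter's behaviour without the new constructor.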
[jira] Updated: (LUCENE-550) InstantiatedIndex - faster but memory consuming index
[ https://issues.apache.org/jira/browse/LUCENE-550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wettin updated LUCENE-550: --- Attachment: LUCENE-550_20071019_no_core_changes.txt In this patch: * IndexReader.terms(Term) optimization, initial seek now jit-call away given the term exists, rather than using binary search. * A handful of minor optimizations * IndexReader.version() mimics Segment-dito InstantiatedIndex - faster but memory consuming index - Key: LUCENE-550 URL: https://issues.apache.org/jira/browse/LUCENE-550 Project: Lucene - Java Issue Type: New Feature Components: Store Affects Versions: 2.0.0 Reporter: Karl Wettin Assignee: Grant Ingersoll Attachments: HitCollectionBench.jpg, lucene-550.jpg, LUCENE-550_20070804_no_core_changes.txt, LUCENE-550_20070808_no_core_changes.txt, LUCENE-550_20070817_no_core_changes.txt, LUCENE-550_20070928_no_core_changes.txt, LUCENE-550_20071008_no_core_changes.txt, LUCENE-550_20071017_no_core_changes.txt, LUCENE-550_20071019_no_core_changes.txt, test-reports.zip, trunk.diff.bz2, trunk.diff.bz2, trunk.diff.bz2, trunk.diff.bz2, trunk.diff.bz2, trunk.diff.bz2, trunk.diff.bz2, trunk.diff.bz2, trunk.diff.bz2, trunk.diff.bz2, trunk.diff.bz2 An non file centrinc all in memory index. Consumes some 2x the memory of a RAMDirectory (in a term satured index) but is between 3x-60x faster depending on application and how one counts. Average query is about 8x faster. IndexWriter and IndexModifier have been realized in InterfaceIndexWriter and InterfaceIndexModifier. InstantiatedIndex is wrapped in a new top layer index facade (class Index) that comes with factory methods for writers, readers and searchers for unison index handeling. There are decorators with notification handling that can be used for automatically syncronizing searchers on updates, et.c. Index also comes with FS/RAMDirectory implementation. -- This message is automatically generated by JIRA. 
- You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
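A rough sketch of the idea behind the IndexReader.terms(Term) change, in Java. This is a hypothetical illustration, not the InstantiatedIndex code: the class TermSeekIndex and its methods are invented here, and terms are simplified to "field:text" strings sorted lexicographically. A HashMap from each existing term to its position makes the initial seek a single lookup when the term exists; a binary search is only needed to find the next greater term when it does not.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: sorted term dictionary with O(1) seek for existing terms. */
final class TermSeekIndex {
    private final String[] sortedTerms;            // all terms, sorted ascending, assumed unique
    private final Map<String, Integer> positionOf; // term -> index in sortedTerms

    TermSeekIndex(String[] terms) {
        this.sortedTerms = terms.clone();
        Arrays.sort(this.sortedTerms);
        this.positionOf = new HashMap<>();
        for (int i = 0; i < sortedTerms.length; i++) {
            positionOf.put(sortedTerms[i], i);
        }
    }

    /** Index of the first term >= target, or sortedTerms.length if none. */
    int seek(String target) {
        Integer hit = positionOf.get(target);
        if (hit != null) {
            return hit; // term exists: a single hash lookup
        }
        // Term absent: binarySearch returns (-(insertionPoint) - 1)
        int idx = Arrays.binarySearch(sortedTerms, target);
        return idx >= 0 ? idx : -idx - 1;
    }

    /** Term at the given index, or null when past the end. */
    String termAt(int index) {
        return index < sortedTerms.length ? sortedTerms[index] : null;
    }

    public static void main(String[] args) {
        TermSeekIndex index = new TermSeekIndex(
            new String[] {"body:lucene", "body:apache", "title:index"});
        System.out.println(index.termAt(index.seek("body:apache"))); // body:apache
        System.out.println(index.termAt(index.seek("body:b")));      // body:lucene
        System.out.println(index.termAt(index.seek("zzz")));         // null
    }
}
```

The trade-off fits the memory-for-speed theme of the issue: the extra map costs memory proportional to the number of terms in exchange for constant-time seeks on terms that exist.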
[jira] Resolved: (LUCENE-1031) Fixes a handful of misspellings/mistakes in changes.txt
[ https://issues.apache.org/jira/browse/LUCENE-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1031. Resolution: Fixed Fix Version/s: 2.3 Lucene Fields: [New, Patch Available] (was: [Patch Available, New]) Fixed -- thanks Mark! Fixes a handful of misspellings/mistakes in changes.txt --- Key: LUCENE-1031 URL: https://issues.apache.org/jira/browse/LUCENE-1031 Project: Lucene - Java Issue Type: Improvement Components: Other Affects Versions: 2.3 Reporter: Mark Miller Assignee: Michael McCandless Priority: Trivial Fix For: 2.3 Attachments: changes.txt.patch There are a handful of misspellings/mistakes in changes.txt. This patch fixes them. Avoided the one or two British to English conversions.
[jira] Commented: (LUCENE-1031) Fixes a handful of misspellings/mistakes in changes.txt
[ https://issues.apache.org/jira/browse/LUCENE-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12536205 ] Michael McCandless commented on LUCENE-1031: Sheesh, we committers really can't spell!! Thanks Mark! I'll commit. Fixes a handful of misspellings/mistakes in changes.txt --- Key: LUCENE-1031 URL: https://issues.apache.org/jira/browse/LUCENE-1031 Project: Lucene - Java Issue Type: Improvement Components: Other Affects Versions: 2.3 Reporter: Mark Miller Priority: Trivial Attachments: changes.txt.patch There are a handful of misspellings/mistakes in changes.txt. This patch fixes them. Avoided the one or two British to English conversions.
Re: Remove TermEnum.skipTo(Term target)
Wolfgang Hoschek wrote at Wed, 04 May 2005 20:59:24 GMT: I was considering an efficient impl of TermEnum.skipTo(Term target) for the MemoryIndex. But then I realized that nothing anywhere in Lucene calls that method. It's effectively dead code; a remnant of a previous ice age - nothing would break if it were removed. I'd suggest doing so unless I'm missing something. So what's up with this method? Did anyone ever figure out what it is used for? -- karl
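For reference, a minimal sketch of what an efficient skipTo could look like over a sorted in-memory term array, next to a generic version built only on next(). It assumes terms are held as a sorted String[] without duplicates; SortedTermEnum, skipToLinear and skipTo(String) are hypothetical names, not MemoryIndex or TermEnum API. Both methods move to the first term after the current position that is >= target; the linear one makes O(n) calls to next(), the binary-search one costs O(log n).

```java
import java.util.Arrays;

/** Hypothetical in-memory term enumeration over a sorted array of terms. */
final class SortedTermEnum {
    private final String[] terms; // sorted ascending, no duplicates (assumed)
    private int pos = -1;         // -1 = positioned before the first term

    SortedTermEnum(String[] sortedTerms) {
        this.terms = sortedTerms;
    }

    /** Advances to the next term; returns false once exhausted. */
    boolean next() {
        if (pos + 1 >= terms.length) {
            pos = terms.length;
            return false;
        }
        pos++;
        return true;
    }

    /** Current term, or null if not positioned on a term. */
    String term() {
        return (pos >= 0 && pos < terms.length) ? terms[pos] : null;
    }

    /** Generic skipTo built only on next(): a linear scan. */
    boolean skipToLinear(String target) {
        do {
            if (!next()) return false;
        } while (target.compareTo(term()) > 0);
        return true;
    }

    /** Same contract as skipToLinear, using binary search over the remaining terms. */
    boolean skipTo(String target) {
        int from = pos + 1;
        if (from >= terms.length) {
            pos = terms.length;
            return false;
        }
        int idx = Arrays.binarySearch(terms, from, terms.length, target);
        pos = idx >= 0 ? idx : -idx - 1; // insertion point when target is absent
        return pos < terms.length;
    }
}
```

Note that both versions always advance at least one term, so calling skipTo("banana") while positioned on "banana" moves on to the following term.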
Re: [jira] Commented: (LUCENE-1029) Illegal character replacements in ISOLatin1AccentFilter
If you are to compare with stemmers, consider that these create unique tokens that do not interfere with semantic meanings. Not starting anything here again, but it took me so darn long to find a word that Porter stems in a way that kills the semantic meaning that I had to share. That damn algorithm is amazing... I was coming to the conclusion that it was absolutely perfect for the English language... until, after a couple of days of searching, I found that "international" goes to "intern". Eureka! Though a hollow victory at best. That algorithm is pretty amazing...
[jira] Updated: (LUCENE-1031) Fixes a handful of misspellings/mistakes in changes.txt
[ https://issues.apache.org/jira/browse/LUCENE-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated LUCENE-1031: Attachment: changes.txt.patch Fixes a handful of misspellings/mistakes in changes.txt --- Key: LUCENE-1031 URL: https://issues.apache.org/jira/browse/LUCENE-1031 Project: Lucene - Java Issue Type: Improvement Components: Other Affects Versions: 2.3 Reporter: Mark Miller Priority: Trivial Attachments: changes.txt.patch There are a handful of misspellings/mistakes in changes.txt. This patch fixes them. Avoided the one or two British to English conversions.
[jira] Created: (LUCENE-1031) Fixes a handful of misspellings/mistakes in changes.txt
Fixes a handful of misspellings/mistakes in changes.txt --- Key: LUCENE-1031 URL: https://issues.apache.org/jira/browse/LUCENE-1031 Project: Lucene - Java Issue Type: Improvement Components: Other Affects Versions: 2.3 Reporter: Mark Miller Priority: Trivial Attachments: changes.txt.patch There are a handful of misspellings/mistakes in changes.txt. This patch fixes them. Avoided the one or two British to English conversions.
[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries
[ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated LUCENE-794: --- Attachment: (was: HighlighterTest.java) Extend contrib Highlighter to properly support phrase queries and span queries -- Key: LUCENE-794 URL: https://issues.apache.org/jira/browse/LUCENE-794 Project: Lucene - Java Issue Type: Improvement Components: Other Reporter: Mark Miller Priority: Minor Attachments: spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_patch_4.zip, SpanHighlighterTest.java This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQueries and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans. See http://issues.apache.org/jira/browse/LUCENE-403 for some background. There is a dependency on MemoryIndex.
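The scoring rule described above (query terms score as usual, but 0 when they are not part of an actual hit) can be sketched in Java as follows. This is a hypothetical illustration, not the SpanQueryScorer code or the Highlighter Scorer interface: SpanAwareTokenScorer and Span are invented names, spans are assumed to be half-open token-position ranges [start, end) already collected from the query's Spans, and term weights are supplied as a plain map.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of span-aware token scoring (not the Highlighter API). */
final class SpanAwareTokenScorer {
    /** A matched span covering token positions [start, end). */
    static final class Span {
        final int start;
        final int end;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }

        boolean contains(int position) {
            return position >= start && position < end;
        }
    }

    private final Map<String, Float> termWeights; // query term -> weight
    private final List<Span> matchedSpans;        // spans that actually matched

    SpanAwareTokenScorer(Map<String, Float> termWeights, List<Span> matchedSpans) {
        this.termWeights = termWeights;
        this.matchedSpans = matchedSpans;
    }

    /**
     * Scores a token like a plain term scorer would, except that a query
     * term occurring outside every matched span scores 0.
     */
    float score(String tokenText, int position) {
        Float weight = termWeights.get(tokenText);
        if (weight == null) {
            return 0f; // not a query term at all
        }
        for (Span span : matchedSpans) {
            if (span.contains(position)) {
                return weight; // query term that is part of a real hit
            }
        }
        return 0f; // query term, but it did not cause the hit
    }
}
```

For the text "lucene is fast and lucene rocks" and the phrase "lucene rocks", only the second "lucene" (position 4) and "rocks" (position 5) fall inside the matched span [4, 6), so the first "lucene" at position 0 scores 0 and would not be highlighted.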
[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries
[ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated LUCENE-794: --- Attachment: (was: CachedTokenStream.java) Extend contrib Highlighter to properly support phrase queries and span queries -- Key: LUCENE-794 URL: https://issues.apache.org/jira/browse/LUCENE-794 Project: Lucene - Java Issue Type: Improvement Components: Other Reporter: Mark Miller Priority: Minor Attachments: Encoder.java, Formatter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, SimpleFormatter.java, spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_patch_4.zip, SpanHighlighterTest.java, SpanHighlighterTest.java, SpanScorer.java, SpanScorer.java, WeightedSpanTerm.java This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQueries and PhraseQuery. There is also a new Fragmenter that attempts to fragment without breaking up Spans. See http://issues.apache.org/jira/browse/LUCENE-403 for some background. There is a dependency on MemoryIndex.
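The Fragmenter idea (fragment without breaking up Spans) can be sketched as deferring a fragment break for as long as it would fall inside a matched span. This is a hypothetical illustration rather than the patch's Fragmenter: SpanPreservingFragmenter and fragmentStarts are invented names, fragment size is measured in tokens instead of characters, and spans are half-open token-position ranges given as {start, end} int pairs.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a fragmenter that never splits a matched span. */
final class SpanPreservingFragmenter {
    /**
     * Splits token positions [0, tokenCount) into fragments of roughly
     * fragmentSize tokens and returns each fragment's start position.
     * A break that would fall inside a span [start, end) is deferred
     * until the span has ended.
     */
    static List<Integer> fragmentStarts(int tokenCount, int fragmentSize, int[][] spans) {
        List<Integer> starts = new ArrayList<>();
        if (tokenCount <= 0) {
            return starts;
        }
        starts.add(0);
        int currentStart = 0;
        for (int pos = 1; pos < tokenCount; pos++) {
            boolean fragmentFull = pos - currentStart >= fragmentSize;
            if (fragmentFull && !breakSplitsSpan(pos, spans)) {
                starts.add(pos);
                currentStart = pos;
            }
        }
        return starts;
    }

    /** True if starting a new fragment at pos would cut some span in two. */
    private static boolean breakSplitsSpan(int pos, int[][] spans) {
        for (int[] span : spans) {
            if (pos > span[0] && pos < span[1]) {
                return true;
            }
        }
        return false;
    }
}
```

With 10 tokens, a fragment size of 3 and a span covering positions 2 to 4 ({2, 5}), the fragments start at 0, 5 and 8: the breaks that would have fallen at 3 and 4 are postponed so the span stays in one fragment.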