[jira] Updated: (LUCENE-2348) DuplicateFilter incorrectly handles multiple calls to getDocIdSet for segment readers
[ https://issues.apache.org/jira/browse/LUCENE-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Trejkaz updated LUCENE-2348: Component/s: (was: Search) contrib/* Changing to contrib, only just realised it was in that location... > DuplicateFilter incorrectly handles multiple calls to getDocIdSet for segment > readers > - > > Key: LUCENE-2348 > URL: https://issues.apache.org/jira/browse/LUCENE-2348 > Project: Lucene - Java > Issue Type: Bug > Components: contrib/* >Affects Versions: 2.9.2 >Reporter: Trejkaz > > DuplicateFilter currently works by building a single doc ID set, without > taking into account that getDocIdSet() will be called once per segment and > only with each segment's local reader. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Created: (LUCENE-2348) DuplicateFilter incorrectly handles multiple calls to getDocIdSet for segment readers
DuplicateFilter incorrectly handles multiple calls to getDocIdSet for segment readers
-------------------------------------------------------------------------------------

Key: LUCENE-2348
URL: https://issues.apache.org/jira/browse/LUCENE-2348
Project: Lucene - Java
Issue Type: Bug
Components: Search
Affects Versions: 2.9.2
Reporter: Trejkaz

DuplicateFilter currently works by building a single doc ID set, without taking into account that getDocIdSet() will be called once per segment and only with each segment's local reader.
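The mismatch described here can be illustrated without any Lucene classes. The sketch below is not Lucene API; it is a minimal plain-Java model of why a set keyed by index-wide doc IDs breaks when each segment numbers its documents from 0 (the segment sizes and duplicate IDs are invented for illustration):

```java
import java.util.BitSet;

public class SegmentLocalIds {
    public static void main(String[] args) {
        // Two hypothetical segments of 5 and 7 documents. A filter built once
        // over the whole index marks global doc IDs 3 and 8 as duplicates.
        int[] segmentSizes = {5, 7};
        BitSet globalDuplicates = new BitSet();
        globalDuplicates.set(3);
        globalDuplicates.set(8);

        // getDocIdSet() is called once per segment with that segment's reader,
        // which numbers its documents from 0. Handing back the global set
        // unchanged would flag the wrong documents in every segment but the
        // first; the set has to be re-based at each segment's starting offset.
        int docBase = 0;
        for (int size : segmentSizes) {
            // BitSet.get(from, to) slices and shifts the bits down to 0.
            BitSet localView = globalDuplicates.get(docBase, docBase + size);
            System.out.println("docBase=" + docBase
                    + " local duplicates=" + localView);
            docBase += size;
        }
        // Global ID 8 becomes local ID 3 in the second segment (8 - 5).
    }
}
```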
[jira] Commented: (LUCENE-1683) RegexQuery matches terms the input regex doesn't actually match
[ https://issues.apache.org/jira/browse/LUCENE-1683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718270#action_12718270 ] Trejkaz commented on LUCENE-1683:

I screwed up the formatting. Fixed version:

{code}
@Test
public void testNecessity() throws Exception {
    File dir = new File(new File(System.getProperty("java.io.tmpdir")), "index");
    IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), true);
    Document doc = new Document();
    doc.add(new Field("field", "cat cats cathy", Field.Store.YES, Field.Index.TOKENIZED));
    writer.addDocument(doc);
    writer.close();
    IndexReader reader = IndexReader.open(dir);
    TermEnum terms = new RegexQuery(new Term("field", "cat.")).getEnum(reader);
    assertEquals("Wrong term", "cats", terms.term().text());
    assertFalse("Should have only been one term", terms.next());
}
{code}

> RegexQuery matches terms the input regex doesn't actually match
> ---------------------------------------------------------------
>
> Key: LUCENE-1683
> URL: https://issues.apache.org/jira/browse/LUCENE-1683
> Project: Lucene - Java
> Issue Type: Improvement
> Components: contrib/*
> Affects Versions: 2.3.2
> Reporter: Trejkaz
>
> I was writing some unit tests for our own wrapper around the Lucene regex
> classes, and got tripped up by something interesting.
> The regex "cat." will match "cats" but also anything with "cat" and 1+
> following letters (e.g. "cathy", "catcher", ...) It is as if there is an
> implicit .* always added to the end of the regex.
> Here's a unit test for the behaviour I would expect myself: > @Test > public void testNecessity() throws Exception { > File dir = new File(new File(System.getProperty("java.io.tmpdir")), > "index"); > IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), > true); > try { > Document doc = new Document(); > doc.add(new Field("field", "cat cats cathy", Field.Store.YES, > Field.Index.TOKENIZED)); > writer.addDocument(doc); > } finally { > writer.close(); > } > IndexReader reader = IndexReader.open(dir); > try { > TermEnum terms = new RegexQuery(new Term("field", > "cat.")).getEnum(reader); > assertEquals("Wrong term", "cats", terms.term()); > assertFalse("Should have only been one term", terms.next()); > } finally { > reader.close(); > } > } > This test fails on the term check with terms.term() equal to "cathy". > Our workaround is to mangle the query like this: > String fixed = String.format("(?:%s)$", original); -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Created: (LUCENE-1683) RegexQuery matches terms the input regex doesn't actually match
RegexQuery matches terms the input regex doesn't actually match --- Key: LUCENE-1683 URL: https://issues.apache.org/jira/browse/LUCENE-1683 Project: Lucene - Java Issue Type: Improvement Components: contrib/* Affects Versions: 2.3.2 Reporter: Trejkaz I was writing some unit tests for our own wrapper around the Lucene regex classes, and got tripped up by something interesting. The regex "cat." will match "cats" but also anything with "cat" and 1+ following letters (e.g. "cathy", "catcher", ...) It is as if there is an implicit .* always added to the end of the regex. Here's a unit test for the behaviour I would expect myself: @Test public void testNecessity() throws Exception { File dir = new File(new File(System.getProperty("java.io.tmpdir")), "index"); IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), true); try { Document doc = new Document(); doc.add(new Field("field", "cat cats cathy", Field.Store.YES, Field.Index.TOKENIZED)); writer.addDocument(doc); } finally { writer.close(); } IndexReader reader = IndexReader.open(dir); try { TermEnum terms = new RegexQuery(new Term("field", "cat.")).getEnum(reader); assertEquals("Wrong term", "cats", terms.term()); assertFalse("Should have only been one term", terms.next()); } finally { reader.close(); } } This test fails on the term check with terms.term() equal to "cathy". Our workaround is to mangle the query like this: String fixed = String.format("(?:%s)$", original); -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
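The behaviour reported here (and the `(?:%s)$` workaround) can be reproduced with plain java.util.regex, leaving the Lucene classes out of it: a full match rejects "cathy", while matching only from the start of the term behaves exactly like the implicit trailing .* the report describes.

```java
import java.util.regex.Pattern;

public class RegexAnchoring {
    public static void main(String[] args) {
        Pattern p = Pattern.compile("cat.");

        // A full match requires the whole term to satisfy the regex:
        System.out.println(p.matcher("cats").matches());    // true
        System.out.println(p.matcher("cathy").matches());   // false

        // Matching only from the start behaves like an implicit trailing .*,
        // which is the behaviour described in the report:
        System.out.println(p.matcher("cathy").lookingAt()); // true

        // The workaround from the report: anchor the end of the expression.
        Pattern fixed = Pattern.compile(String.format("(?:%s)$", "cat."));
        System.out.println(fixed.matcher("cathy").find());  // false
        System.out.println(fixed.matcher("cats").find());   // true
    }
}
```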
[jira] Commented: (LUCENE-1646) QueryParser throws new exceptions even if custom parsing logic threw a better one
[ https://issues.apache.org/jira/browse/LUCENE-1646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12711850#action_12711850 ] Trejkaz commented on LUCENE-1646: - Our improvements are (so far) specific to our subclass of QueryParser, in that we use it when getFieldQuery() gets a value which doesn't make sense for the given field. So in a sense, in our case the query was parsed successfully by the parser, but the input was invalid within one of the fields. As such our custom ParseException subclass has the field name and field value, but it isn't useful to the Lucene project as-is, as the only things throwing it are called from our subclass. :-( > QueryParser throws new exceptions even if custom parsing logic threw a better > one > - > > Key: LUCENE-1646 > URL: https://issues.apache.org/jira/browse/LUCENE-1646 > Project: Lucene - Java > Issue Type: Improvement >Affects Versions: 2.4.1 >Reporter: Trejkaz > > We have subclassed QueryParser and have various custom fields. When these > fields contain invalid values, we throw a subclass of ParseException which > has a more useful message (and also a localised message.) > Problem is, Lucene's QueryParser is doing this: > {code} > catch (ParseException tme) { > // rethrow to include the original query: > throw new ParseException("Cannot parse '" +query+ "': " + > tme.getMessage()); > } > {code} > Thus, our nice and useful ParseException is thrown away, replaced by one with > no information about what's actually wrong with the query (it does append > getMessage() but that isn't localised. And it also throws away the > underlying cause for the exception.) > I am about to patch our copy to simply remove these four lines; the caller > knows what the query string was (they have to have a copy of it because they > are passing it in!) so having it in the error message itself is not useful. 
> Furthermore, when the query string is very big, what the user wants to know
> is not that the whole query was bad, but which part of it was bad.
[jira] Commented: (LUCENE-1646) QueryParser throws new exceptions even if custom parsing logic threw a better one
[ https://issues.apache.org/jira/browse/LUCENE-1646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12711412#action_12711412 ] Trejkaz commented on LUCENE-1646: - I guess that's true if you look at exceptions as a logging mechanism, but in our case it's a parsing exception for text coming from the user. Because of this, our use case is for the user to get a useful error message, and it's not useful at all if we just tell them their entire query was bad. Thus we have inserted improvements (in our subclass) to make it complain only about the fragment of the query which is actually a problem, so they know which part to fix. Related, but is there any way it could at least be reduced to the portion of the query which caused the problem? In a way it would be nice if ParseException had methods to get out the problematic fragment (my subclass has it...) I'm guessing this is much easier for exceptions relating to values inside fields which otherwise parsed correctly, but a lot harder to do for exceptions from the parser proper. > QueryParser throws new exceptions even if custom parsing logic threw a better > one > - > > Key: LUCENE-1646 > URL: https://issues.apache.org/jira/browse/LUCENE-1646 > Project: Lucene - Java > Issue Type: Improvement >Affects Versions: 2.4.1 >Reporter: Trejkaz > > We have subclassed QueryParser and have various custom fields. When these > fields contain invalid values, we throw a subclass of ParseException which > has a more useful message (and also a localised message.) > Problem is, Lucene's QueryParser is doing this: > {code} > catch (ParseException tme) { > // rethrow to include the original query: > throw new ParseException("Cannot parse '" +query+ "': " + > tme.getMessage()); > } > {code} > Thus, our nice and useful ParseException is thrown away, replaced by one with > no information about what's actually wrong with the query (it does append > getMessage() but that isn't localised. 
And it also throws away the
> underlying cause for the exception.)
> I am about to patch our copy to simply remove these four lines; the caller
> knows what the query string was (they have to have a copy of it because they
> are passing it in!) so having it in the error message itself is not useful.
> Furthermore, when the query string is very big, what the user wants to know
> is not that the whole query was bad, but which part of it was bad.
[jira] Created: (LUCENE-1646) QueryParser throws new exceptions even if custom parsing logic threw a better one
QueryParser throws new exceptions even if custom parsing logic threw a better one
---------------------------------------------------------------------------------

Key: LUCENE-1646
URL: https://issues.apache.org/jira/browse/LUCENE-1646
Project: Lucene - Java
Issue Type: Improvement
Affects Versions: 2.4.1
Reporter: Trejkaz

We have subclassed QueryParser and have various custom fields. When these fields contain invalid values, we throw a subclass of ParseException which has a more useful message (and also a localised message.)

Problem is, Lucene's QueryParser is doing this:

{code}
catch (ParseException tme) {
    // rethrow to include the original query:
    throw new ParseException("Cannot parse '" + query + "': " + tme.getMessage());
}
{code}

Thus, our nice and useful ParseException is thrown away, replaced by one with no information about what's actually wrong with the query (it does append getMessage() but that isn't localised. And it also throws away the underlying cause for the exception.)

I am about to patch our copy to simply remove these four lines; the caller knows what the query string was (they have to have a copy of it because they are passing it in!) so having it in the error message itself is not useful. Furthermore, when the query string is very big, what the user wants to know is not that the whole query was bad, but which part of it was bad.
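One way to avoid discarding the original exception when rethrowing is to attach it as the cause. The sketch below uses a stand-in ParseException rather than Lucene's actual (JavaCC-generated) class; Throwable.initCause is used because such generated exception classes typically have no cause-taking constructor:

```java
public class RethrowWithCause {
    // Stand-in for a JavaCC-generated exception with no cause constructor.
    static class ParseException extends Exception {
        ParseException(String message) { super(message); }
    }

    static void parse(String query) throws ParseException {
        try {
            // Simulates a subclass's field handling rejecting a value
            // with a useful, specific message.
            throw new ParseException("invalid date in field 'sent': 'xyz'");
        } catch (ParseException pe) {
            // Rethrow with context, but keep the useful original exception
            // reachable as the cause instead of throwing it away.
            ParseException wrapped =
                new ParseException("Cannot parse '" + query + "': " + pe.getMessage());
            wrapped.initCause(pe);
            throw wrapped;
        }
    }

    public static void main(String[] args) {
        try {
            parse("sent:xyz");
        } catch (ParseException pe) {
            System.out.println(pe.getMessage());
            System.out.println("cause: " + pe.getCause().getMessage());
        }
    }
}
```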
[jira] Commented: (LUCENE-893) Increase buffer sizes used during searching
[ https://issues.apache.org/jira/browse/LUCENE-893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676521#action_12676521 ] Trejkaz commented on LUCENE-893:

Someone else on my team did some benchmarks for a query which was slow for us. We have a tree of items (one item per document), and given N items, we walk the path from the root to the item until finding an item with certain properties (expected average queries per item is around 3.) All the queries are against the same field (our unique ID field.) Timings of this process for 60,000 items:

Buffer size   Time
1024          1015s
2048          540s
4096          335s
8196          281s
16384         278s
32768         284s
65536         322s

> Increase buffer sizes used during searching
> -------------------------------------------
>
> Key: LUCENE-893
> URL: https://issues.apache.org/jira/browse/LUCENE-893
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Search
> Affects Versions: 2.1
> Reporter: Michael McCandless
>
> Spinoff of LUCENE-888.
> In LUCENE-888 we increased buffer sizes that impact indexing and found
> substantial (10-18%) overall performance gains.
> It's very likely that we can also gain some performance for searching
> by increasing the read buffers in BufferedIndexInput used by
> searching.
> We need to test performance impact to verify and then pick a good
> overall default buffer size, also being careful not to add too much
> overall HEAP RAM usage because a potentially very large number of
> BufferedIndexInput instances are created during searching
> (# segments X # index files per segment).
[jira] Commented: (LUCENE-1290) Deprecate Hits
[ https://issues.apache.org/jira/browse/LUCENE-1290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12604701#action_12604701 ] Trejkaz commented on LUCENE-1290: - To answer Doug's initial question, yes, we are using this in a desktop application inside a Swing TableModel. > Deprecate Hits > -- > > Key: LUCENE-1290 > URL: https://issues.apache.org/jira/browse/LUCENE-1290 > Project: Lucene - Java > Issue Type: Task > Components: Search >Reporter: Michael Busch >Assignee: Michael Busch >Priority: Minor > Fix For: 2.4 > > Attachments: lucene-1290.patch, lucene-1290.patch, lucene-1290.patch > > > The Hits class has several drawbacks as pointed out in LUCENE-954. > The other search APIs that use TopDocCollector and TopDocs should be used > instead. > This patch: > - deprecates org/apache/lucene/search/Hits, Hit, and HitIterator, as well as > the Searcher.search( * ) methods which return a Hits Object. > - removes all references to Hits from the core and uses TopDocs and ScoreDoc > instead > - Changes the demo SearchFiles: adds the two modes 'paging search' and > 'streaming search', > each of which demonstrating a different way of using the search APIs. The > former > uses TopDocs and a TopDocCollector, the latter a custom HitCollector > implementation. > - Updates the online tutorial that descibes the demo. > All tests pass. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Updated: (LUCENE-1262) IndexOutOfBoundsException from FieldsReader after problem reading the index
[ https://issues.apache.org/jira/browse/LUCENE-1262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Trejkaz updated LUCENE-1262: Attachment: Test.java Attaching a test program to reproduce the problem under 2.3.1. It occurs approximately 1 in every 4 executions for any reasonably large text index (really small ones don't seem to do it so I couldn't attach a text index with it.) The number of fields may be related, looking at the IndexOutOfBoundsException numbers it seems that the indexes we have happen to have a large number of fields. > IndexOutOfBoundsException from FieldsReader after problem reading the index > --- > > Key: LUCENE-1262 > URL: https://issues.apache.org/jira/browse/LUCENE-1262 > Project: Lucene - Java > Issue Type: Bug > Components: Index >Affects Versions: 2.3.1 >Reporter: Trejkaz > Attachments: Test.java > > > There is a situation where there is an IOException reading from Hits, and > then the next time you get a NullPointerException instead of an IOException. 
> Example stack traces:
> java.io.IOException: The specified network name is no longer available
>     at java.io.RandomAccessFile.readBytes(Native Method)
>     at java.io.RandomAccessFile.read(RandomAccessFile.java:322)
>     at org.apache.lucene.store.FSIndexInput.readInternal(FSDirectory.java:536)
>     at org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:74)
>     at org.apache.lucene.index.CompoundFileReader$CSIndexInput.readInternal(CompoundFileReader.java:220)
>     at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:93)
>     at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:34)
>     at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:57)
>     at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:88)
>     at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:344)
>     at org.apache.lucene.index.IndexReader.document(IndexReader.java:368)
>     at org.apache.lucene.search.IndexSearcher.doc(IndexSearcher.java:84)
>     at org.apache.lucene.search.Hits.doc(Hits.java:104)
> That error is fine. The problem is the next call to doc generates:
> java.lang.NullPointerException
>     at org.apache.lucene.index.FieldsReader.getIndexType(FieldsReader.java:280)
>     at org.apache.lucene.index.FieldsReader.addField(FieldsReader.java:216)
>     at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:101)
>     at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:344)
>     at org.apache.lucene.index.IndexReader.document(IndexReader.java:368)
>     at org.apache.lucene.search.IndexSearcher.doc(IndexSearcher.java:84)
>     at org.apache.lucene.search.Hits.doc(Hits.java:104)
> Presumably FieldsReader is caching partially-initialised data somewhere. I
> would normally expect the exact same IOException to be thrown for subsequent
> calls to the method.
[jira] Updated: (LUCENE-1262) IndexOutOfBoundsException from FieldsReader after problem reading the index
[ https://issues.apache.org/jira/browse/LUCENE-1262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Trejkaz updated LUCENE-1262: Affects Version/s: (was: 2.1) 2.3.1 Summary: IndexOutOfBoundsException from FieldsReader after problem reading the index (was: NullPointerException from FieldsReader after problem reading the index) I managed to reproduce the problem as-is under version 2.2. For 2.3 the problem has changed -- instead of a NullPointerException it is now an IndexOutOfBoundsException: Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 52, Size: 34 at java.util.ArrayList.RangeCheck(ArrayList.java:547) at java.util.ArrayList.get(ArrayList.java:322) at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:260) at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:154) at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:659) at org.apache.lucene.index.IndexReader.document(IndexReader.java:525) at org.apache.lucene.search.IndexSearcher.doc(IndexSearcher.java:92) at org.apache.lucene.search.Hits.doc(Hits.java:167) at Test.main(Test.java:24) Will attach my test program in a moment. > IndexOutOfBoundsException from FieldsReader after problem reading the index > --- > > Key: LUCENE-1262 > URL: https://issues.apache.org/jira/browse/LUCENE-1262 > Project: Lucene - Java > Issue Type: Bug > Components: Index >Affects Versions: 2.3.1 >Reporter: Trejkaz > > There is a situation where there is an IOException reading from Hits, and > then the next time you get a NullPointerException instead of an IOException. 
[jira] Updated: (LUCENE-1262) NullPointerException from FieldsReader after problem reading the index
[ https://issues.apache.org/jira/browse/LUCENE-1262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Trejkaz updated LUCENE-1262: Affects Version/s: (was: 2.2) 2.1 Okay I'll eat my words now, it is indeed 2.1 as the version doesn't have openInput(String,int) in it. Anyway an update: I've managed to reproduce it on any text index by simulating random network outage. I'm keeping a flag which I set to true. The trick is that the wrapping IndexInput implementation *randomly* throws IOException if the flag is true -- if it always throws IOException the problem doesn't occur. If it randomly throws it then it occurs occasionally, and it always seems to be for larger queries (I'm using MatchAllDocsQuery now.) I'll see if I can tweak the code to make it more likely to happen and then start working up to each version of Lucene to see if it stops happening somewhere. > NullPointerException from FieldsReader after problem reading the index > -- > > Key: LUCENE-1262 > URL: https://issues.apache.org/jira/browse/LUCENE-1262 > Project: Lucene - Java > Issue Type: Bug > Components: Index >Affects Versions: 2.1 >Reporter: Trejkaz > > There is a situation where there is an IOException reading from Hits, and > then the next time you get a NullPointerException instead of an IOException. 
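The fault-injection trick described in the comment above (a wrapper that throws IOException only some of the time) can be sketched with a plain java.io stream. The actual reproduction wrapped Lucene's IndexInput; this stand-in just shows the randomly-failing delegate idea:

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Random;

// Hypothetical sketch: a delegating stream that simulates an intermittent
// network outage. The key observation from the comment is that failing on
// *every* read hides the bug, while random failures reproduce it.
public class FlakyInputStream extends FilterInputStream {
    private final Random random = new Random();
    private boolean faulty = false;   // the "flag" mentioned in the comment

    public FlakyInputStream(InputStream in) { super(in); }

    public void setFaulty(boolean faulty) { this.faulty = faulty; }

    @Override
    public int read() throws IOException {
        // Roughly 1 in 4 reads fails while the flag is set.
        if (faulty && random.nextInt(4) == 0) {
            throw new IOException("simulated network outage");
        }
        return super.read();
    }

    public static void main(String[] args) throws IOException {
        FlakyInputStream in = new FlakyInputStream(
                new ByteArrayInputStream(new byte[1000]));
        in.setFaulty(true);
        int failures = 0;
        for (int i = 0; i < 1000; i++) {
            try {
                in.read();
            } catch (IOException e) {
                failures++;
            }
        }
        System.out.println("intermittent failures: " + failures);
    }
}
```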
[jira] Updated: (LUCENE-1262) NullPointerException from FieldsReader after problem reading the index
[ https://issues.apache.org/jira/browse/LUCENE-1262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Trejkaz updated LUCENE-1262: Affects Version/s: (was: 2.3.1) 2.2

Whoops. I don't think it's 2.1 but it must be 2.2. I'll try and reproduce this standalone, but first I need a way to have readInternal throw an exception. I presume you were using some kind of custom store implementation to do that. I'll see if I can make it happen under 2.2, and then try the same thing under 2.3.1 to confirm whether it still breaks.

> NullPointerException from FieldsReader after problem reading the index
> ----------------------------------------------------------------------
>
> Key: LUCENE-1262
> URL: https://issues.apache.org/jira/browse/LUCENE-1262
> Project: Lucene - Java
> Issue Type: Bug
> Components: Index
> Affects Versions: 2.2
> Reporter: Trejkaz
>
> There is a situation where there is an IOException reading from Hits, and
> then the next time you get a NullPointerException instead of an IOException.
[jira] Created: (LUCENE-1262) NullPointerException from FieldsReader after problem reading the index
NullPointerException from FieldsReader after problem reading the index
----------------------------------------------------------------------

Key: LUCENE-1262
URL: https://issues.apache.org/jira/browse/LUCENE-1262
Project: Lucene - Java
Issue Type: Bug
Components: Index
Affects Versions: 2.3.1
Reporter: Trejkaz

There is a situation where there is an IOException reading from Hits, and then the next time you get a NullPointerException instead of an IOException.

Example stack traces:

java.io.IOException: The specified network name is no longer available
    at java.io.RandomAccessFile.readBytes(Native Method)
    at java.io.RandomAccessFile.read(RandomAccessFile.java:322)
    at org.apache.lucene.store.FSIndexInput.readInternal(FSDirectory.java:536)
    at org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:74)
    at org.apache.lucene.index.CompoundFileReader$CSIndexInput.readInternal(CompoundFileReader.java:220)
    at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:93)
    at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:34)
    at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:57)
    at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:88)
    at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:344)
    at org.apache.lucene.index.IndexReader.document(IndexReader.java:368)
    at org.apache.lucene.search.IndexSearcher.doc(IndexSearcher.java:84)
    at org.apache.lucene.search.Hits.doc(Hits.java:104)

That error is fine. The problem is the next call to doc generates:

java.lang.NullPointerException
    at org.apache.lucene.index.FieldsReader.getIndexType(FieldsReader.java:280)
    at org.apache.lucene.index.FieldsReader.addField(FieldsReader.java:216)
    at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:101)
    at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:344)
    at org.apache.lucene.index.IndexReader.document(IndexReader.java:368)
    at org.apache.lucene.search.IndexSearcher.doc(IndexSearcher.java:84)
    at org.apache.lucene.search.Hits.doc(Hits.java:104)

Presumably FieldsReader is caching partially-initialised data somewhere. I would normally expect the exact same IOException to be thrown for subsequent calls to the method.
[jira] Issue Comment Edited: (LUCENE-1245) MultiFieldQueryParser is not friendly for overriding getFieldQuery(String,String,int)
[ https://issues.apache.org/jira/browse/LUCENE-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12582490#action_12582490 ] trejkaz edited comment on LUCENE-1245 at 3/26/08 5:32 PM:

Here's an example illustrating the way we were using it, although instead of changing the query text we're actually returning a different query class -- that class isn't in Lucene Core and also it's easier to build up an expected query if it's just a TermQuery.

{noformat}
public void testOverrideGetFieldQuery() throws Exception {
    String[] fields = { "a", "b" };
    QueryParser parser = new MultiFieldQueryParser(fields, new StandardAnalyzer()) {
        protected Query getFieldQuery(String field, String queryText, int slop) throws ParseException {
            if (field != null && slop == 1) {
                queryText = "z" + queryText;
            }
            return super.getFieldQuery(field, queryText, slop);
        }
    };

    BooleanQuery expected = new BooleanQuery();
    expected.add(new TermQuery(new Term("a", "zabc")), BooleanClause.Occur.SHOULD);
    expected.add(new TermQuery(new Term("b", "zabc")), BooleanClause.Occur.SHOULD);

    assertEquals("Expected a mangled query", expected, parser.parse("\"abc\"~1"));
}
{noformat}

was (Author: trejkaz):
Here's an example illustrating the way we were using it, although instead of changing the query text we're actually returning a different query class -- that class isn't in Lucene Core and also it's easier to build up an expected query if it's just a TermQuery.
{noformat}
public void testOverrideGetFieldQuery() throws Exception {
    String[] fields = { "a", "b" };
    QueryParser parser = new MultiFieldQueryParser(fields, new StandardAnalyzer()) {
        protected Query getFieldQuery(String field, String queryText, int slop) throws ParseException {
            if (field != null && slop == 1) {
                field = "z" + field;
            }
            return super.getFieldQuery(field, queryText, slop);
        }
    };

    BooleanQuery expected = new BooleanQuery();
    expected.add(new TermQuery(new Term("a", "zabc")), BooleanClause.Occur.SHOULD);
    expected.add(new TermQuery(new Term("b", "zabc")), BooleanClause.Occur.SHOULD);

    assertEquals("Expected a mangled query", expected, parser.parse("\"abc\"~1"));
}
{noformat}

> MultiFieldQueryParser is not friendly for overriding getFieldQuery(String,String,int)
> -
>
> Key: LUCENE-1245
> URL: https://issues.apache.org/jira/browse/LUCENE-1245
> Project: Lucene - Java
> Issue Type: Improvement
> Components: QueryParser
> Affects Versions: 2.3.2
> Reporter: Trejkaz
> Attachments: multifield.patch
>
> LUCENE-1213 fixed an issue in MultiFieldQueryParser where the slop parameter wasn't being properly applied. Problem is, the fix which eventually got committed is calling super.getFieldQuery(String,String), bypassing any possibility of customising the query behaviour.
> This should be relatively simply fixable by modifying getFieldQuery(String,String,int) to, if field is null, recursively call getFieldQuery(String,String,int) instead of setting the slop itself. This gives subclasses which override either getFieldQuery method a chance to do something different.
[jira] Issue Comment Edited: (LUCENE-1245) MultiFieldQueryParser is not friendly for overriding getFieldQuery(String,String,int)
[ https://issues.apache.org/jira/browse/LUCENE-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12582490#action_12582490 ] trejkaz edited comment on LUCENE-1245 at 3/26/08 5:13 PM:

Here's an example illustrating the way we were using it, although instead of changing the query text we're actually returning a different query class -- that class isn't in Lucene Core and also it's easier to build up an expected query if it's just a TermQuery.

{noformat}
public void testOverrideGetFieldQuery() throws Exception {
    String[] fields = { "a", "b" };
    QueryParser parser = new MultiFieldQueryParser(fields, new StandardAnalyzer()) {
        protected Query getFieldQuery(String field, String queryText, int slop) throws ParseException {
            if (field != null && slop == 1) {
                field = "z" + field;
            }
            return super.getFieldQuery(field, queryText, slop);
        }
    };

    BooleanQuery expected = new BooleanQuery();
    expected.add(new TermQuery(new Term("a", "zabc")), BooleanClause.Occur.SHOULD);
    expected.add(new TermQuery(new Term("b", "zabc")), BooleanClause.Occur.SHOULD);

    assertEquals("Expected a mangled query", expected, parser.parse("\"abc\"~1"));
}
{noformat}

was (Author: trejkaz):
Here's an example illustrating the way we were using it, although instead of changing the query text we're actually returning a different query class -- that class isn't in Lucene Core and also it's easier to build up an expected query if it's just a TermQuery.
public void testOverrideGetFieldQuery() throws Exception {
    String[] fields = { "a", "b" };
    QueryParser parser = new MultiFieldQueryParser(fields, new StandardAnalyzer()) {
        protected Query getFieldQuery(String field, String queryText, int slop) throws ParseException {
            if (field != null && slop == 1) {
                field = "z" + field;
            }
            return super.getFieldQuery(field, queryText, slop);
        }
    };

    BooleanQuery expected = new BooleanQuery();
    expected.add(new TermQuery(new Term("a", "zabc")), BooleanClause.Occur.SHOULD);
    expected.add(new TermQuery(new Term("b", "zabc")), BooleanClause.Occur.SHOULD);

    assertEquals("Expected a mangled query", expected, parser.parse("\"abc\"~1"));
}

> MultiFieldQueryParser is not friendly for overriding getFieldQuery(String,String,int)
> -
>
> Key: LUCENE-1245
> URL: https://issues.apache.org/jira/browse/LUCENE-1245
> Project: Lucene - Java
> Issue Type: Improvement
> Components: QueryParser
> Affects Versions: 2.3.2
> Reporter: Trejkaz
> Attachments: multifield.patch
>
> LUCENE-1213 fixed an issue in MultiFieldQueryParser where the slop parameter wasn't being properly applied. Problem is, the fix which eventually got committed is calling super.getFieldQuery(String,String), bypassing any possibility of customising the query behaviour.
> This should be relatively simply fixable by modifying getFieldQuery(String,String,int) to, if field is null, recursively call getFieldQuery(String,String,int) instead of setting the slop itself. This gives subclasses which override either getFieldQuery method a chance to do something different.
[jira] Commented: (LUCENE-1245) MultiFieldQueryParser is not friendly for overriding getFieldQuery(String,String,int)
[ https://issues.apache.org/jira/browse/LUCENE-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12582490#action_12582490 ] Trejkaz commented on LUCENE-1245:

Here's an example illustrating the way we were using it, although instead of changing the query text we're actually returning a different query class -- that class isn't in Lucene Core and also it's easier to build up an expected query if it's just a TermQuery.

public void testOverrideGetFieldQuery() throws Exception {
    String[] fields = { "a", "b" };
    QueryParser parser = new MultiFieldQueryParser(fields, new StandardAnalyzer()) {
        protected Query getFieldQuery(String field, String queryText, int slop) throws ParseException {
            if (field != null && slop == 1) {
                field = "z" + field;
            }
            return super.getFieldQuery(field, queryText, slop);
        }
    };

    BooleanQuery expected = new BooleanQuery();
    expected.add(new TermQuery(new Term("a", "zabc")), BooleanClause.Occur.SHOULD);
    expected.add(new TermQuery(new Term("b", "zabc")), BooleanClause.Occur.SHOULD);

    assertEquals("Expected a mangled query", expected, parser.parse("\"abc\"~1"));
}

> MultiFieldQueryParser is not friendly for overriding getFieldQuery(String,String,int)
> -
>
> Key: LUCENE-1245
> URL: https://issues.apache.org/jira/browse/LUCENE-1245
> Project: Lucene - Java
> Issue Type: Improvement
> Components: QueryParser
> Affects Versions: 2.3.2
> Reporter: Trejkaz
> Attachments: multifield.patch
>
> LUCENE-1213 fixed an issue in MultiFieldQueryParser where the slop parameter wasn't being properly applied. Problem is, the fix which eventually got committed is calling super.getFieldQuery(String,String), bypassing any possibility of customising the query behaviour.
> This should be relatively simply fixable by modifying getFieldQuery(String,String,int) to, if field is null, recursively call getFieldQuery(String,String,int) instead of setting the slop itself. This gives subclasses which override either getFieldQuery method a chance to do something different.
[jira] Updated: (LUCENE-1245) MultiFieldQueryParser is not friendly for overriding getFieldQuery(String,String,int)
[ https://issues.apache.org/jira/browse/LUCENE-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Trejkaz updated LUCENE-1245:

Attachment: multifield.patch

Fix makes getFieldQuery(String,String) and getFieldQuery(String,String,int) work more or less the same. Neither calls methods on super, so overriding the methods works (though I have no unit test for this yet). Common boosting logic is extracted to an applyBoost method. I have also removed the check for the clauses being empty, as getBooleanQuery appears to do that already.

> MultiFieldQueryParser is not friendly for overriding getFieldQuery(String,String,int)
> -
>
> Key: LUCENE-1245
> URL: https://issues.apache.org/jira/browse/LUCENE-1245
> Project: Lucene - Java
> Issue Type: Improvement
> Components: QueryParser
> Affects Versions: 2.3.2
> Reporter: Trejkaz
> Attachments: multifield.patch
>
> LUCENE-1213 fixed an issue in MultiFieldQueryParser where the slop parameter wasn't being properly applied. Problem is, the fix which eventually got committed is calling super.getFieldQuery(String,String), bypassing any possibility of customising the query behaviour.
> This should be relatively simply fixable by modifying getFieldQuery(String,String,int) to, if field is null, recursively call getFieldQuery(String,String,int) instead of setting the slop itself. This gives subclasses which override either getFieldQuery method a chance to do something different.
[jira] Updated: (LUCENE-1245) MultiFieldQueryParser is not friendly for overriding getFieldQuery(String,String,int)
[ https://issues.apache.org/jira/browse/LUCENE-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Trejkaz updated LUCENE-1245: Lucene Fields: [New, Patch Available] (was: [New]) Summary: MultiFieldQueryParser is not friendly for overriding getFieldQuery(String,String,int) (was: MultiFieldQueryParser is not friendly for overriding) (Updating title to be more specific about what wasn't friendly.) > MultiFieldQueryParser is not friendly for overriding > getFieldQuery(String,String,int) > - > > Key: LUCENE-1245 > URL: https://issues.apache.org/jira/browse/LUCENE-1245 > Project: Lucene - Java > Issue Type: Improvement > Components: QueryParser >Affects Versions: 2.3.2 >Reporter: Trejkaz > > LUCENE-1213 fixed an issue in MultiFieldQueryParser where the slop parameter > wasn't being properly applied. Problem is, the fix which eventually got > committed is calling super.getFieldQuery(String,String), bypassing any > possibility of customising the query behaviour. > This should be relatively simply fixable by modifying > getFieldQuery(String,String,int) to, if field is null, recursively call > getFieldQuery(String,String,int) instead of setting the slop itself. This > gives subclasses which override either getFieldQuery method a chance to do > something different. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Created: (LUCENE-1245) MultiFieldQueryParser is not friendly for overriding
MultiFieldQueryParser is not friendly for overriding Key: LUCENE-1245 URL: https://issues.apache.org/jira/browse/LUCENE-1245 Project: Lucene - Java Issue Type: Improvement Components: QueryParser Affects Versions: 2.3.2 Reporter: Trejkaz LUCENE-1213 fixed an issue in MultiFieldQueryParser where the slop parameter wasn't being properly applied. Problem is, the fix which eventually got committed is calling super.getFieldQuery(String,String), bypassing any possibility of customising the query behaviour. This should be relatively simply fixable by modifying getFieldQuery(String,String,int) to, if field is null, recursively call getFieldQuery(String,String,int) instead of setting the slop itself. This gives subclasses which override either getFieldQuery method a chance to do something different. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
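The recursive fix proposed above can be sketched with simplified stand-in classes (these are not Lucene's real QueryParser and MultiFieldQueryParser): when field is null, the three-argument getFieldQuery fans out by recursing into itself once per field, so the call stays virtual and a subclass override of either overload is still reached.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for the parser hierarchy; queries are modelled as
// "field:text~slop" strings so the sketch is self-contained.
class SketchParser {
    final String[] fields;
    SketchParser(String[] fields) { this.fields = fields; }

    // Two-argument form delegates to the slop-aware form.
    List<String> getFieldQuery(String field, String queryText) {
        return getFieldQuery(field, queryText, 0);
    }

    // Three-argument form: with a null field, recurse on this method per field
    // instead of calling a super method that would bypass overrides.
    List<String> getFieldQuery(String field, String queryText, int slop) {
        List<String> clauses = new ArrayList<>();
        if (field == null) {
            for (String f : fields) {
                clauses.addAll(getFieldQuery(f, queryText, slop)); // virtual call: subclass sees it
            }
        } else {
            clauses.add(field + ":" + queryText + "~" + slop);
        }
        return clauses;
    }
}

// A subclass override of the three-argument method is now reached even when
// the query came in with field == null.
class MangledParser extends SketchParser {
    MangledParser(String[] fields) { super(fields); }
    @Override
    List<String> getFieldQuery(String field, String queryText, int slop) {
        if (field != null && slop == 1) queryText = "z" + queryText;
        return super.getFieldQuery(field, queryText, slop);
    }
}
```

The recursion terminates because the per-field calls always pass a non-null field, mirroring the structure the issue suggests.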
[jira] Updated: (LUCENE-1240) TermsFilter: reuse TermDocs
[ https://issues.apache.org/jira/browse/LUCENE-1240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Trejkaz updated LUCENE-1240: Lucene Fields: [New, Patch Available] (was: [New]) > TermsFilter: reuse TermDocs > --- > > Key: LUCENE-1240 > URL: https://issues.apache.org/jira/browse/LUCENE-1240 > Project: Lucene - Java > Issue Type: Improvement > Components: Search >Affects Versions: 2.3.1 >Reporter: Trejkaz > Attachments: terms-filter.patch > > > TermsFilter currently calls termDocs(Term) once per term in the TermsFilter. > If we sort the terms it's filtering on, this can be optimised to call > termDocs() once and then skip(Term) once per term, which should significantly > speed up this filter. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Updated: (LUCENE-1240) TermsFilter: reuse TermDocs
[ https://issues.apache.org/jira/browse/LUCENE-1240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Trejkaz updated LUCENE-1240: Attachment: terms-filter.patch Attaching my attempt at improving this. The original code didn't close all the TermDocs it created either; this is now fixed also. > TermsFilter: reuse TermDocs > --- > > Key: LUCENE-1240 > URL: https://issues.apache.org/jira/browse/LUCENE-1240 > Project: Lucene - Java > Issue Type: Improvement > Components: Search >Affects Versions: 2.3.1 >Reporter: Trejkaz > Attachments: terms-filter.patch > > > TermsFilter currently calls termDocs(Term) once per term in the TermsFilter. > If we sort the terms it's filtering on, this can be optimised to call > termDocs() once and then skip(Term) once per term, which should significantly > speed up this filter. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Created: (LUCENE-1240) TermsFilter: reuse TermDocs
TermsFilter: reuse TermDocs --- Key: LUCENE-1240 URL: https://issues.apache.org/jira/browse/LUCENE-1240 Project: Lucene - Java Issue Type: Improvement Components: Search Affects Versions: 2.3.1 Reporter: Trejkaz TermsFilter currently calls termDocs(Term) once per term in the TermsFilter. If we sort the terms it's filtering on, this can be optimised to call termDocs() once and then skip(Term) once per term, which should significantly speed up this filter. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
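A rough model of the proposed optimisation, using a TreeMap as a stand-in for the sorted term dictionary rather than the real TermDocs API: once the filter's terms are sorted, a single forward-only cursor over the dictionary can be skipped ahead past each term in turn, instead of opening a fresh enumeration per term.

```java
import java.util.*;

// Stand-in model: the postings map plays the role of the term dictionary,
// and the single entry iterator plays the role of one reused TermDocs.
class SortedTermsFilterSketch {
    static BitSet filter(SortedMap<String, int[]> postings, Collection<String> terms, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        List<String> sorted = new ArrayList<>(terms);
        Collections.sort(sorted); // sorting once is what makes a forward-only cursor sufficient

        Iterator<Map.Entry<String, int[]>> cursor = postings.entrySet().iterator(); // one enumeration, reused
        Map.Entry<String, int[]> entry = cursor.hasNext() ? cursor.next() : null;
        for (String term : sorted) {
            // "skip(term)": advance the shared cursor forward to the next term of interest
            while (entry != null && entry.getKey().compareTo(term) < 0) {
                entry = cursor.hasNext() ? cursor.next() : null;
            }
            if (entry == null) break;                 // dictionary exhausted
            if (entry.getKey().equals(term)) {
                for (int doc : entry.getValue()) bits.set(doc);
            }
        }
        return bits;
    }
}
```

The merge-join shape is the point: each side is traversed at most once, which is why sorting the filter's terms should significantly speed things up.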
[jira] Updated: (LUCENE-1213) MultiFieldQueryParser ignores slop parameter
[ https://issues.apache.org/jira/browse/LUCENE-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Trejkaz updated LUCENE-1213:

Attachment: multifield-fix.patch

Attaching one possible fix. It's more verbose than I wish it could be, but I couldn't think of a reliable way to make it delegate, as that would require casting the result to BooleanQuery to get the clauses out, and a subclass may return something else entirely.

> MultiFieldQueryParser ignores slop parameter
> -
>
> Key: LUCENE-1213
> URL: https://issues.apache.org/jira/browse/LUCENE-1213
> Project: Lucene - Java
> Issue Type: Bug
> Components: QueryParser
> Reporter: Trejkaz
> Attachments: multifield-fix.patch
>
> MultiFieldQueryParser.getFieldQuery(String, String, int) calls super.getFieldQuery(String, String), thus obliterating any slop parameter present in the query.
> It should probably be changed to call super.getFieldQuery(String, String, int), except doing only that will result in a recursive loop which is a side-effect of what may be a deeper problem in MultiFieldQueryParser -- getFieldQuery(String, String, int) is documented as delegating to getFieldQuery(String, String), yet what it actually does is the exact opposite. This also causes problems for subclasses which need to override getFieldQuery(String, String) to provide different behaviour.
[jira] Created: (LUCENE-1213) MultiFieldQueryParser ignores slop parameter
MultiFieldQueryParser ignores slop parameter Key: LUCENE-1213 URL: https://issues.apache.org/jira/browse/LUCENE-1213 Project: Lucene - Java Issue Type: Bug Reporter: Trejkaz MultiFieldQueryParser.getFieldQuery(String, String, int) calls super.getFieldQuery(String, String), thus obliterating any slop parameter present in the query. It should probably be changed to call super.getFieldQuery(String, String, int), except doing only that will result in a recursive loop which is a side-effect of what may be a deeper problem in MultiFieldQueryParser -- getFieldQuery(String, String, int) is documented as delegating to getFieldQuery(String, String), yet what it actually does is the exact opposite. This also causes problems for subclasses which need to override getFieldQuery(String, String) to provide different behaviour. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Updated: (LUCENE-1213) MultiFieldQueryParser ignores slop parameter
[ https://issues.apache.org/jira/browse/LUCENE-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Trejkaz updated LUCENE-1213: Component/s: QueryParser > MultiFieldQueryParser ignores slop parameter > > > Key: LUCENE-1213 > URL: https://issues.apache.org/jira/browse/LUCENE-1213 > Project: Lucene - Java > Issue Type: Bug > Components: QueryParser >Reporter: Trejkaz > > MultiFieldQueryParser.getFieldQuery(String, String, int) calls > super.getFieldQuery(String, String), thus obliterating any slop parameter > present in the query. > It should probably be changed to call super.getFieldQuery(String, String, > int), except doing only that will result in a recursive loop which is a > side-effect of what may be a deeper problem in MultiFieldQueryParser -- > getFieldQuery(String, String, int) is documented as delegating to > getFieldQuery(String, String), yet what it actually does is the exact > opposite. This also causes problems for subclasses which need to override > getFieldQuery(String, String) to provide different behaviour. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Created: (LUCENE-1206) Ability to store Reader / InputStream fields
Ability to store Reader / InputStream fields Key: LUCENE-1206 URL: https://issues.apache.org/jira/browse/LUCENE-1206 Project: Lucene - Java Issue Type: New Feature Components: Index Reporter: Trejkaz In some situations we would like to store the whole text, but the whole text won't always fit in memory so we can't create a String. Likewise for storing binary, it would sometimes be better if we didn't have to read into a byte[] up-front (even when it doesn't use much memory, it increases the number of copies made and adds burden to GC.) FieldsWriter currently writes the length at the start of the chunks though, so I don't know whether it would be possible to seek back and write the length after writing the data. It would also be useful to use this in conjunction with compression, both for Reader and InputStream types. And when retrieving the field, it should be possible to create a Reader without reading the entire String into memory up-front. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
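The "seek back and write the length after the data" idea raised in the request can be sketched with RandomAccessFile on a hypothetical field layout (this is not how Lucene's FieldsWriter actually encodes its data): reserve space for the length, stream the Reader's contents without materialising them as a String, then seek back and patch the real length in.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.Reader;

// Hypothetical streamed field writer: length slot first, then the data,
// then a seek back to fill in the length once it is known.
class StreamedFieldWriter {
    static long writeField(RandomAccessFile out, Reader value) throws IOException {
        long lengthPos = out.getFilePointer();
        out.writeLong(0L);                        // placeholder for the length
        long written = 0;
        char[] buf = new char[8192];
        int n;
        while ((n = value.read(buf)) != -1) {     // stream in chunks: no full String in memory
            for (int i = 0; i < n; i++) out.writeChar(buf[i]);
            written += n;
        }
        long end = out.getFilePointer();
        out.seek(lengthPos);
        out.writeLong(written);                   // patch the real length in
        out.seek(end);                            // restore position for the next field
        return written;
    }
}
```

Seek-back patching works on a plain file but not on an append-only or compressed stream, which hints at why the feature is not trivial to combine with compression.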
[jira] Created: (LUCENE-1181) Token reuse is not ideal for avoiding array copies
Token reuse is not ideal for avoiding array copies -- Key: LUCENE-1181 URL: https://issues.apache.org/jira/browse/LUCENE-1181 Project: Lucene - Java Issue Type: Improvement Components: Analysis Affects Versions: 2.3 Reporter: Trejkaz

The way the Token API is currently written results in two unnecessary array copies which could be avoided by changing the way it works.

1. setTermBuffer(char[],int,int) calls resizeTermBuffer(int), which copies the original term text even though it's about to be overwritten. This should be trivially fixable by introducing a private resizeTermBuffer(int,boolean) where the new boolean parameter specifies whether the existing term data gets copied over or not.

2. setTermBuffer(char[],int,int) copies what you pass in, instead of actually setting the term buffer.

Setting aside the fact that the setTermBuffer method is misleadingly named, consider a token filter which performs Unicode normalisation on each token. How it has to be implemented at present:

once:
- create a reusable char[] for storing the normalisation result

every token:
- use getTermBuffer() and getTermLength() to get the buffer and relevant length
- normalise the original string into our temporary buffer (if it isn't big enough, grow the temp buffer size)
- setTermBuffer(char[],int,int) - this does an extra copy

The following sequence would be much better:

once:
- create a reusable char[] for storing the normalisation result

every token:
- use getTermBuffer() and getTermLength() to get the buffer and relevant length
- normalise the original string into our temporary buffer (if it isn't big enough, grow the temp buffer size)
- setTermBuffer(char[],int,int) sets in our buffer by reference
- set the term buffer which used to be in the Token such that it becomes our new temp buffer

The latter sequence results in no copying with the exception of the normalisation itself, which is unavoidable.

-- This message is automatically generated by JIRA.
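The improved sequence above amounts to a buffer swap. A minimal stand-in Token (not Lucene's class) makes the idea concrete, using upper-casing as a simple stand-in for Unicode normalisation: the filter writes the transformed text into its own scratch array, hands that array to the token by reference, and adopts the token's previous array as its next scratch buffer, so the transform itself is the only copy.

```java
// Minimal stand-in Token with the proposed by-reference setter.
class SketchToken {
    private char[] termBuffer = new char[16];
    private int termLength;

    char[] termBuffer() { return termBuffer; }
    int termLength() { return termLength; }

    // Proposed by-reference setter: no copy, just adopt the caller's array
    // and return the old one so the caller can reuse it.
    char[] swapTermBuffer(char[] newBuffer, int newLength) {
        char[] old = termBuffer;
        termBuffer = newBuffer;
        termLength = newLength;
        return old;
    }
}

// A filter following the "much better" sequence from the report.
class UpperCasingFilter {
    private char[] scratch = new char[16];

    void process(SketchToken token) {
        char[] src = token.termBuffer();
        int len = token.termLength();
        if (scratch.length < len) scratch = new char[len];  // grow scratch if needed
        for (int i = 0; i < len; i++) {
            scratch[i] = Character.toUpperCase(src[i]);     // the only copy: the transform itself
        }
        scratch = token.swapTermBuffer(scratch, len);       // swap buffers, zero extra copies
    }
}
```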
[jira] Created: (LUCENE-587) Explanation.toHtml outputs invalid HTML
Explanation.toHtml outputs invalid HTML --- Key: LUCENE-587 URL: http://issues.apache.org/jira/browse/LUCENE-587 Project: Lucene - Java Type: Bug Components: Search Versions: 2.0.0 Reporter: Trejkaz If you want an HTML representation of an Explanation, you might call the toHtml() method. However, the output of this method looks like the following: some value = some description some nested value = some description As it is illegal in HTML to nest a UL directly inside a UL, this method will always output unparseable HTML if there are nested explanations. What Lucene probably means to output is the following, which is valid HTML: some value = some description some nested value = some description -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
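The HTML samples in the report above lost their markup in transit, but the fix it describes is simply that a nested <ul> must sit inside the parent's <li>, never directly inside the parent <ul>. A hypothetical recursive renderer (not Lucene's Explanation class) producing the valid structure:

```java
import java.util.List;

// Hypothetical explanation node; toHtml() nests each child list inside the
// parent's <li>, which is the structure valid HTML requires.
class ExplanationHtml {
    final String description;
    final List<ExplanationHtml> details;

    ExplanationHtml(String description, List<ExplanationHtml> details) {
        this.description = description;
        this.details = details;
    }

    String toHtml() {
        StringBuilder sb = new StringBuilder("<ul>\n");
        appendItem(sb);
        sb.append("</ul>\n");
        return sb.toString();
    }

    private void appendItem(StringBuilder sb) {
        sb.append("<li>").append(description);
        if (!details.isEmpty()) {
            sb.append("\n<ul>\n");                 // nested list lives inside the <li>
            for (ExplanationHtml child : details) child.appendItem(sb);
            sb.append("</ul>\n");
        }
        sb.append("</li>\n");
    }
}
```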
[jira] Commented: (LUCENE-458) Merging may create duplicates if the JVM crashes half way through
[ http://issues.apache.org/jira/browse/LUCENE-458?page=comments#action_12356029 ] Trejkaz commented on LUCENE-458:

I was thinking more along the lines of...

1. open a reader, writer
2. read the document
3. write a marker marking that this document is the result of a move of another one
4. write the document
5. delete the original document
6. delete the marker
7. close the reader, writer

Then later on, when the reader opens an index and finds a marker, it goes and checks the location the marker points at, and if the location is still there, it continues from step 5 again.

> Merging may create duplicates if the JVM crashes half way through
> -
>
> Key: LUCENE-458
> URL: http://issues.apache.org/jira/browse/LUCENE-458
> Project: Lucene - Java
> Type: Bug
> Versions: 1.4
> Environment: Windows XP SP2, JDK 1.5.0_04 (crash occurred in this version. We've updated to 1.5.0_05 since, but discovered this issue with an older text index since.)
> Reporter: Trejkaz
>
> In the past, our indexing process crashed due to a Hotspot compiler bug on SMP systems (although it could happen with any bad native code.) Everything picked up and appeared to work, but now that it's a month later I've discovered an oddity in the text index.
> We have two documents which are identical in the text index. I know we only stored it once for two reasons. First, we store the MD5 of every document into the hash and the MD5s were the same. Second, we store a GUID into each document which is generated uniquely for each document. The GUID and the MD5 hash on these two documents, as well as all other fields, is exactly the same.
> My conclusion is that a merge was occurring at the point the JVM crashed, which is consistent with the time the process crashed. Is it possible that Lucene did the copy of this document to the new location, and didn't get to delete the original?
> If so, I guess this issue should be prevented somehow.

-- This message is automatically generated by JIRA.
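The seven-step protocol and its recovery rule can be simulated with in-memory maps standing in for the index (purely illustrative, not Lucene code): a marker that survives a crash tells the reader the copy may have completed, so recovery resumes from the delete step rather than leaving the duplicate behind.

```java
import java.util.*;

// Toy simulation: source/target maps stand in for the old and new document
// locations, keyed by GUID; markers records documents that are mid-move.
class MoveRecoverySketch {
    final Map<String, String> source = new HashMap<>();
    final Map<String, String> target = new HashMap<>();
    final Map<String, String> markers = new HashMap<>(); // guid -> "moving"

    // Steps 2-4 of the protocol, then a simulated crash before the delete.
    void moveButCrashBeforeDelete(String guid) {
        String doc = source.get(guid);       // 2. read the document
        markers.put(guid, "moving");         // 3. write the move marker
        target.put(guid, doc);               // 4. write the document
        // crash here: steps 5-6 (delete original, delete marker) never ran
    }

    // On reopen: any surviving marker means resume from step 5.
    void recover() {
        for (String guid : new ArrayList<>(markers.keySet())) {
            if (target.containsKey(guid)) {  // copy completed: finish the move
                source.remove(guid);         // 5. delete the original document
            }
            markers.remove(guid);            // 6. delete the marker
        }
    }
}
```

If the crash instead landed before step 4, the original is still intact and recovery only has to discard the marker, so the operation is safe to resume from either side of the failure.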
[jira] Created: (LUCENE-458) Merging may create duplicates if the JVM crashes half way through
Merging may create duplicates if the JVM crashes half way through - Key: LUCENE-458 URL: http://issues.apache.org/jira/browse/LUCENE-458 Project: Lucene - Java Type: Bug Versions: 1.4 Environment: Windows XP SP2, JDK 1.5.0_04 (crash occurred in this version. We've updated to 1.5.0_05 since, but discovered this issue with an older text index since.) Reporter: Trejkaz In the past, our indexing process crashed due to a Hotspot compiler bug on SMP systems (although it could happen with any bad native code.) Everything picked up and appeared to work, but now that it's a month later I've discovered an oddity in the text index. We have two documents which are identical in the text index. I know we only stored it once for two reasons. First, we store the MD5 of every document into the hash and the MD5s were the same. Second, we store a GUID into each document which is generated uniquely for each document. The GUID and the MD5 hash on these two documents, as well as all other fields, is exactly the same. My conclusion is that a merge was occurring at the point the JVM crashed, which is consistent with the time the process crashed. Is it possible that Lucene did the copy of this document to the new location, and didn't get to delete the original? If so, I guess this issue should be prevented somehow. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]