[jira] Updated: (LUCENE-2348) DuplicateFilter incorrectly handles multiple calls to getDocIdSet for segment readers

2010-03-25 Thread Trejkaz (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trejkaz updated LUCENE-2348:


Component/s: (was: Search)
 contrib/*

Changing to contrib, only just realised it was in that location...


> DuplicateFilter incorrectly handles multiple calls to getDocIdSet for segment 
> readers
> -
>
> Key: LUCENE-2348
> URL: https://issues.apache.org/jira/browse/LUCENE-2348
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: contrib/*
>Affects Versions: 2.9.2
>Reporter: Trejkaz
>
> DuplicateFilter currently works by building a single doc ID set, without 
> taking into account that getDocIdSet() will be called once per segment and 
> only with each segment's local reader.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Created: (LUCENE-2348) DuplicateFilter incorrectly handles multiple calls to getDocIdSet for segment readers

2010-03-25 Thread Trejkaz (JIRA)
DuplicateFilter incorrectly handles multiple calls to getDocIdSet for segment 
readers
-

 Key: LUCENE-2348
 URL: https://issues.apache.org/jira/browse/LUCENE-2348
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 2.9.2
Reporter: Trejkaz


DuplicateFilter currently works by building a single doc ID set, without taking 
into account that getDocIdSet() will be called once per segment and only with 
each segment's local reader.
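The failure mode can be illustrated without Lucene: doc IDs handed to a per-segment reader start at 0 for that segment, so a set built against global doc IDs must be re-based per segment. This is a minimal pure-Java sketch of that translation, using `java.util.BitSet`; the class and method names are hypothetical, not Lucene API.

```java
import java.util.BitSet;

public class PerSegmentBits {
    // Hypothetical illustration: a bitset built over global doc IDs must be
    // re-based for each segment, because getDocIdSet() receives a reader
    // whose doc IDs start at 0 for that segment (offset by docBase globally).
    static BitSet sliceForSegment(BitSet global, int docBase, int maxDoc) {
        BitSet local = new BitSet(maxDoc);
        for (int i = 0; i < maxDoc; i++) {
            if (global.get(docBase + i)) {
                local.set(i);
            }
        }
        return local;
    }

    public static void main(String[] args) {
        BitSet global = new BitSet();
        global.set(3);  // falls in segment 0 (docBase 0, maxDoc 5)
        global.set(7);  // falls in segment 1 (docBase 5, maxDoc 4)
        System.out.println(sliceForSegment(global, 0, 5)); // {3}
        System.out.println(sliceForSegment(global, 5, 4)); // {2}
    }
}
```

A filter that returns the same global set from every per-segment call instead of slicing it this way will mark the wrong documents in every segment except the first.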





[jira] Commented: (LUCENE-1683) RegexQuery matches terms the input regex doesn't actually match

2009-06-10 Thread Trejkaz (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718270#action_12718270
 ] 

Trejkaz commented on LUCENE-1683:
-

I screwed up the formatting.  Fixed version:

{code}
@Test
public void testNecessity() throws Exception
{
    File dir = new File(new File(System.getProperty("java.io.tmpdir")), "index");
    IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), true);
    Document doc = new Document();
    doc.add(new Field("field", "cat cats cathy", Field.Store.YES, Field.Index.TOKENIZED));
    writer.addDocument(doc);
    writer.close();

    IndexReader reader = IndexReader.open(dir);

    TermEnum terms = new RegexQuery(new Term("field", "cat.")).getEnum(reader);
    assertEquals("Wrong term", "cats", terms.term().text());
    assertFalse("Should have only been one term", terms.next());
}
{code}


> RegexQuery matches terms the input regex doesn't actually match
> ---
>
> Key: LUCENE-1683
> URL: https://issues.apache.org/jira/browse/LUCENE-1683
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/*
>Affects Versions: 2.3.2
>Reporter: Trejkaz
>
> I was writing some unit tests for our own wrapper around the Lucene regex 
> classes, and got tripped up by something interesting.
> The regex "cat." will match "cats" but also anything with "cat" and 1+ 
> following letters (e.g. "cathy", "catcher", ...)  It is as if there is an 
> implicit .* always added to the end of the regex.
> Here's a unit test for the behaviour I would expect myself:
> @Test
> public void testNecessity() throws Exception {
> File dir = new File(new File(System.getProperty("java.io.tmpdir")), 
> "index");
> IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), 
> true);
> try {
> Document doc = new Document();
> doc.add(new Field("field", "cat cats cathy", Field.Store.YES, 
> Field.Index.TOKENIZED));
> writer.addDocument(doc);
> } finally {
> writer.close();
> }
> IndexReader reader = IndexReader.open(dir);
> try {
> TermEnum terms = new RegexQuery(new Term("field", 
> "cat.")).getEnum(reader);
> assertEquals("Wrong term", "cats", terms.term());
> assertFalse("Should have only been one term", terms.next());
> } finally {
> reader.close();
> }
> }
> This test fails on the term check with terms.term() equal to "cathy".
> Our workaround is to mangle the query like this:
> String fixed = String.format("(?:%s)$", original);




[jira] Created: (LUCENE-1683) RegexQuery matches terms the input regex doesn't actually match

2009-06-10 Thread Trejkaz (JIRA)
RegexQuery matches terms the input regex doesn't actually match
---

 Key: LUCENE-1683
 URL: https://issues.apache.org/jira/browse/LUCENE-1683
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/*
Affects Versions: 2.3.2
Reporter: Trejkaz


I was writing some unit tests for our own wrapper around the Lucene regex 
classes, and got tripped up by something interesting.

The regex "cat." will match "cats" but also anything with "cat" and 1+ 
following letters (e.g. "cathy", "catcher", ...)  It is as if there is an 
implicit .* always added to the end of the regex.

Here's a unit test for the behaviour I would expect myself:

@Test
public void testNecessity() throws Exception {
    File dir = new File(new File(System.getProperty("java.io.tmpdir")), "index");
    IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), true);
    try {
        Document doc = new Document();
        doc.add(new Field("field", "cat cats cathy", Field.Store.YES, Field.Index.TOKENIZED));
        writer.addDocument(doc);
    } finally {
        writer.close();
    }

    IndexReader reader = IndexReader.open(dir);
    try {
        TermEnum terms = new RegexQuery(new Term("field", "cat.")).getEnum(reader);
        assertEquals("Wrong term", "cats", terms.term());
        assertFalse("Should have only been one term", terms.next());
    } finally {
        reader.close();
    }
}

This test fails on the term check with terms.term() equal to "cathy".

Our workaround is to mangle the query like this:

String fixed = String.format("(?:%s)$", original);
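The effect of that workaround can be demonstrated with plain `java.util.regex`, which is only an analogy for the contrib RegexQuery engine, not its actual implementation. `Matcher.lookingAt()` models an engine that accepts any term whose *prefix* matches the pattern (the implicit-`.*` behaviour described above), and anchoring with `$` restores the expected semantics. The helper names here are hypothetical.

```java
import java.util.regex.Pattern;

public class RegexAnchoring {
    // An engine that asks "does the regex match a prefix of the term?"
    // behaves as if .* were appended; lookingAt() models that behaviour.
    static boolean prefixMatch(String regex, String term) {
        return Pattern.compile(regex).matcher(term).lookingAt();
    }

    // The workaround from the report: anchor the end of the pattern.
    static boolean anchoredMatch(String regex, String term) {
        return Pattern.compile(String.format("(?:%s)$", regex)).matcher(term).find();
    }

    public static void main(String[] args) {
        System.out.println(prefixMatch("cat.", "cats"));    // true
        System.out.println(prefixMatch("cat.", "cathy"));   // true: the surprise
        System.out.println(anchoredMatch("cat.", "cats"));  // true
        System.out.println(anchoredMatch("cat.", "cathy")); // false
    }
}
```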





[jira] Commented: (LUCENE-1646) QueryParser throws new exceptions even if custom parsing logic threw a better one

2009-05-21 Thread Trejkaz (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12711850#action_12711850
 ] 

Trejkaz commented on LUCENE-1646:
-

Our improvements are (so far) specific to our subclass of QueryParser, in that 
we use it when getFieldQuery() gets a value which doesn't make sense for the 
given field.

So in a sense, in our case the query was parsed successfully by the parser, but 
the input was invalid within one of the fields.  As such our custom 
ParseException subclass has the field name and field value, but it isn't useful 
to the Lucene project as-is, as the only things throwing it are called from our 
subclass. :-(


> QueryParser throws new exceptions even if custom parsing logic threw a better 
> one
> -
>
> Key: LUCENE-1646
> URL: https://issues.apache.org/jira/browse/LUCENE-1646
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 2.4.1
>Reporter: Trejkaz
>
> We have subclassed QueryParser and have various custom fields.  When these 
> fields contain invalid values, we throw a subclass of ParseException which 
> has a more useful message (and also a localised message.)
> Problem is, Lucene's QueryParser is doing this:
> {code}
> catch (ParseException tme) {
> // rethrow to include the original query:
> throw new ParseException("Cannot parse '" +query+ "': " + 
> tme.getMessage());
> }
> {code}
> Thus, our nice and useful ParseException is thrown away, replaced by one with 
> no information about what's actually wrong with the query (it does append 
> getMessage() but that isn't localised.  And it also throws away the 
> underlying cause for the exception.)
> I am about to patch our copy to simply remove these four lines; the caller 
> knows what the query string was (they have to have a copy of it because they 
> are passing it in!) so having it in the error message itself is not useful.  
> Furthermore, when the query string is very big, what the user wants to know 
> is not that the whole query was bad, but which part of it was bad.




[jira] Commented: (LUCENE-1646) QueryParser throws new exceptions even if custom parsing logic threw a better one

2009-05-20 Thread Trejkaz (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12711412#action_12711412
 ] 

Trejkaz commented on LUCENE-1646:
-

I guess that's true if you look at exceptions as a logging mechanism, but in 
our case it's a parsing exception for text coming from the user.  Because of 
this, our use case is for the user to get a useful error message, and it's not 
useful at all if we just tell them their entire query was bad.  Thus we have 
inserted improvements (in our subclass) to make it complain only about the 
fragment of the query which is actually a problem, so they know which part to 
fix.

Related, but is there any way it could at least be reduced to the portion of 
the query which caused the problem?   In a way it would be nice if 
ParseException had methods to get out the problematic fragment (my subclass has 
it...)  I'm guessing this is much easier for exceptions relating to values 
inside fields which otherwise parsed correctly, but a lot harder to do for 
exceptions from the parser proper.



> QueryParser throws new exceptions even if custom parsing logic threw a better 
> one
> -
>
> Key: LUCENE-1646
> URL: https://issues.apache.org/jira/browse/LUCENE-1646
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 2.4.1
>Reporter: Trejkaz
>
> We have subclassed QueryParser and have various custom fields.  When these 
> fields contain invalid values, we throw a subclass of ParseException which 
> has a more useful message (and also a localised message.)
> Problem is, Lucene's QueryParser is doing this:
> {code}
> catch (ParseException tme) {
> // rethrow to include the original query:
> throw new ParseException("Cannot parse '" +query+ "': " + 
> tme.getMessage());
> }
> {code}
> Thus, our nice and useful ParseException is thrown away, replaced by one with 
> no information about what's actually wrong with the query (it does append 
> getMessage() but that isn't localised.  And it also throws away the 
> underlying cause for the exception.)
> I am about to patch our copy to simply remove these four lines; the caller 
> knows what the query string was (they have to have a copy of it because they 
> are passing it in!) so having it in the error message itself is not useful.  
> Furthermore, when the query string is very big, what the user wants to know 
> is not that the whole query was bad, but which part of it was bad.




[jira] Created: (LUCENE-1646) QueryParser throws new exceptions even if custom parsing logic threw a better one

2009-05-19 Thread Trejkaz (JIRA)
QueryParser throws new exceptions even if custom parsing logic threw a better 
one
-

 Key: LUCENE-1646
 URL: https://issues.apache.org/jira/browse/LUCENE-1646
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 2.4.1
Reporter: Trejkaz


We have subclassed QueryParser and have various custom fields.  When these 
fields contain invalid values, we throw a subclass of ParseException which has 
a more useful message (and also a localised message.)

Problem is, Lucene's QueryParser is doing this:

{code}
catch (ParseException tme) {
    // rethrow to include the original query:
    throw new ParseException("Cannot parse '" + query + "': " + tme.getMessage());
}
{code}

Thus, our nice and useful ParseException is thrown away, replaced by one with 
no information about what's actually wrong with the query (it does append 
getMessage() but that isn't localised.  And it also throws away the underlying 
cause for the exception.)

I am about to patch our copy to simply remove these four lines; the caller 
knows what the query string was (they have to have a copy of it because they 
are passing it in!) so having it in the error message itself is not useful.  
Furthermore, when the query string is very big, what the user wants to know is 
not that the whole query was bad, but which part of it was bad.
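The loss described above is avoidable with standard exception chaining: pass the original exception as the cause so its specific (and localised) message stays reachable. This is a minimal self-contained sketch using a stand-in `ParseException`, not Lucene's actual class.

```java
public class ChainedParseError {
    // Hypothetical stand-in for Lucene's ParseException; in the affected
    // versions the rethrow discarded the original exception entirely.
    static class ParseException extends Exception {
        ParseException(String msg) { super(msg); }
        ParseException(String msg, Throwable cause) { super(msg, cause); }
    }

    static void parse(String query) throws ParseException {
        try {
            // Imagine a custom field handler rejecting an invalid value:
            throw new ParseException("bad value in field 'date'");
        } catch (ParseException inner) {
            // Preserving the cause keeps the specific message reachable:
            throw new ParseException("Cannot parse '" + query + "'", inner);
        }
    }

    public static void main(String[] args) {
        try {
            parse("date:nonsense");
        } catch (ParseException e) {
            System.out.println(e.getMessage());            // the wrapper message
            System.out.println(e.getCause().getMessage()); // the useful one
        }
    }
}
```

A caller can then walk `getCause()` to report the fragment-level error instead of only "the whole query was bad".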






[jira] Commented: (LUCENE-893) Increase buffer sizes used during searching

2009-02-24 Thread Trejkaz (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676521#action_12676521
 ] 

Trejkaz commented on LUCENE-893:


Someone else on my team did some benchmarks for a query which was slow for us.

We have a tree of items (one item per document), and given N items, we walk the 
path from the root to the item until finding an item with certain properties 
(expected average queries per item is around 3.)  All the queries are against 
the same field (our unique ID field.)

Timings of this process for 60,000 items:

Buffer size   Time
1024          1015s
2048          540s
4096          335s
8196          281s
16384         278s
32768         284s
65536         322s
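The same measurement method can be sketched in pure Java with `BufferedInputStream` standing in for `BufferedIndexInput`; this is only an illustration of the harness shape, the class name is made up, and absolute numbers will vary by machine.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class BufferSizeBench {
    // Time a full sequential read through a buffer of the given size.
    static long timeRead(byte[] data, int bufferSize) throws IOException {
        long start = System.nanoTime();
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(data), bufferSize);
        byte[] chunk = new byte[256];
        while (in.read(chunk) != -1) {
            // consume the stream
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[1 << 20]; // 1 MiB of test data
        for (int size : new int[] {1024, 4096, 16384, 65536}) {
            System.out.println(size + "\t" + timeRead(data, size) + " ns");
        }
    }
}
```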



> Increase buffer sizes used during searching
> ---
>
> Key: LUCENE-893
> URL: https://issues.apache.org/jira/browse/LUCENE-893
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 2.1
>Reporter: Michael McCandless
>
> Spinoff of LUCENE-888.
> In LUCENE-888 we increased buffer sizes that impact indexing and found
> substantial (10-18%) overall performance gains.
> It's very likely that we can also gain some performance for searching
> by increasing the read buffers in BufferedIndexInput used by
> searching.
> We need to test performance impact to verify and then pick a good
> overall default buffer size, also being careful not to add too much
> overall HEAP RAM usage because a potentially very large number of
> BufferedIndexInput instances are created during searching
> (# segments X # index files per segment).




[jira] Commented: (LUCENE-1290) Deprecate Hits

2008-06-12 Thread Trejkaz (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12604701#action_12604701
 ] 

Trejkaz commented on LUCENE-1290:
-

To answer Doug's initial question, yes, we are using this in a desktop 
application inside a Swing TableModel.


> Deprecate Hits
> --
>
> Key: LUCENE-1290
> URL: https://issues.apache.org/jira/browse/LUCENE-1290
> Project: Lucene - Java
>  Issue Type: Task
>  Components: Search
>Reporter: Michael Busch
>Assignee: Michael Busch
>Priority: Minor
> Fix For: 2.4
>
> Attachments: lucene-1290.patch, lucene-1290.patch, lucene-1290.patch
>
>
> The Hits class has several drawbacks as pointed out in LUCENE-954.
> The other search APIs that use TopDocCollector and TopDocs should be used 
> instead.
> This patch:
> - deprecates org/apache/lucene/search/Hits, Hit, and HitIterator, as well as
>   the Searcher.search( * ) methods which return a Hits Object.
> - removes all references to Hits from the core and uses TopDocs and ScoreDoc
>   instead
> - Changes the demo SearchFiles: adds the two modes 'paging search' and 
> 'streaming search',
>   each of which demonstrating a different way of using the search APIs. The 
> former
>   uses TopDocs and a TopDocCollector, the latter a custom HitCollector 
> implementation.
> - Updates the online tutorial that descibes the demo.
> All tests pass.




[jira] Updated: (LUCENE-1262) IndexOutOfBoundsException from FieldsReader after problem reading the index

2008-04-09 Thread Trejkaz (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trejkaz updated LUCENE-1262:


Attachment: Test.java

Attaching a test program to reproduce the problem under 2.3.1.

It occurs approximately 1 in every 4 executions for any reasonably large text 
index (really small ones don't seem to do it so I couldn't attach a text index 
with it.)  The number of fields may be related, looking at the 
IndexOutOfBoundsException numbers it seems that the indexes we have happen to 
have a large number of fields.


> IndexOutOfBoundsException from FieldsReader after problem reading the index
> ---
>
> Key: LUCENE-1262
> URL: https://issues.apache.org/jira/browse/LUCENE-1262
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Index
>Affects Versions: 2.3.1
>Reporter: Trejkaz
> Attachments: Test.java
>
>
> There is a situation where there is an IOException reading from Hits, and 
> then the next time you get a NullPointerException instead of an IOException.
> Example stack traces:
> java.io.IOException: The specified network name is no longer available
>   at java.io.RandomAccessFile.readBytes(Native Method)
>   at java.io.RandomAccessFile.read(RandomAccessFile.java:322)
>   at 
> org.apache.lucene.store.FSIndexInput.readInternal(FSDirectory.java:536)
>   at 
> org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:74)
>   at 
> org.apache.lucene.index.CompoundFileReader$CSIndexInput.readInternal(CompoundFileReader.java:220)
>   at 
> org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:93)
>   at 
> org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:34)
>   at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:57)
>   at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:88)
>   at 
> org.apache.lucene.index.SegmentReader.document(SegmentReader.java:344)
>   at org.apache.lucene.index.IndexReader.document(IndexReader.java:368)
>   at org.apache.lucene.search.IndexSearcher.doc(IndexSearcher.java:84)
>   at org.apache.lucene.search.Hits.doc(Hits.java:104)
> That error is fine.  The problem is the next call to doc generates:
> java.lang.NullPointerException
>   at 
> org.apache.lucene.index.FieldsReader.getIndexType(FieldsReader.java:280)
>   at org.apache.lucene.index.FieldsReader.addField(FieldsReader.java:216)
>   at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:101)
>   at 
> org.apache.lucene.index.SegmentReader.document(SegmentReader.java:344)
>   at org.apache.lucene.index.IndexReader.document(IndexReader.java:368)
>   at org.apache.lucene.search.IndexSearcher.doc(IndexSearcher.java:84)
>   at org.apache.lucene.search.Hits.doc(Hits.java:104)
> Presumably FieldsReader is caching partially-initialised data somewhere.  I 
> would normally expect the exact same IOException to be thrown for subsequent 
> calls to the method.




[jira] Updated: (LUCENE-1262) IndexOutOfBoundsException from FieldsReader after problem reading the index

2008-04-09 Thread Trejkaz (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trejkaz updated LUCENE-1262:


Affects Version/s: (was: 2.1)
   2.3.1
  Summary: IndexOutOfBoundsException from FieldsReader after 
problem reading the index  (was: NullPointerException from FieldsReader after 
problem reading the index)

I managed to reproduce the problem as-is under version 2.2.

For 2.3 the problem has changed -- instead of a NullPointerException it is now 
an IndexOutOfBoundsException:

Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 52, Size: 34
        at java.util.ArrayList.RangeCheck(ArrayList.java:547)
        at java.util.ArrayList.get(ArrayList.java:322)
        at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:260)
        at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:154)
        at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:659)
        at org.apache.lucene.index.IndexReader.document(IndexReader.java:525)
        at org.apache.lucene.search.IndexSearcher.doc(IndexSearcher.java:92)
        at org.apache.lucene.search.Hits.doc(Hits.java:167)
        at Test.main(Test.java:24)

Will attach my test program in a moment.


> IndexOutOfBoundsException from FieldsReader after problem reading the index
> ---
>
> Key: LUCENE-1262
> URL: https://issues.apache.org/jira/browse/LUCENE-1262
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Index
>Affects Versions: 2.3.1
>Reporter: Trejkaz
>
> There is a situation where there is an IOException reading from Hits, and 
> then the next time you get a NullPointerException instead of an IOException.
> Example stack traces:
> java.io.IOException: The specified network name is no longer available
>   at java.io.RandomAccessFile.readBytes(Native Method)
>   at java.io.RandomAccessFile.read(RandomAccessFile.java:322)
>   at 
> org.apache.lucene.store.FSIndexInput.readInternal(FSDirectory.java:536)
>   at 
> org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:74)
>   at 
> org.apache.lucene.index.CompoundFileReader$CSIndexInput.readInternal(CompoundFileReader.java:220)
>   at 
> org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:93)
>   at 
> org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:34)
>   at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:57)
>   at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:88)
>   at 
> org.apache.lucene.index.SegmentReader.document(SegmentReader.java:344)
>   at org.apache.lucene.index.IndexReader.document(IndexReader.java:368)
>   at org.apache.lucene.search.IndexSearcher.doc(IndexSearcher.java:84)
>   at org.apache.lucene.search.Hits.doc(Hits.java:104)
> That error is fine.  The problem is the next call to doc generates:
> java.lang.NullPointerException
>   at 
> org.apache.lucene.index.FieldsReader.getIndexType(FieldsReader.java:280)
>   at org.apache.lucene.index.FieldsReader.addField(FieldsReader.java:216)
>   at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:101)
>   at 
> org.apache.lucene.index.SegmentReader.document(SegmentReader.java:344)
>   at org.apache.lucene.index.IndexReader.document(IndexReader.java:368)
>   at org.apache.lucene.search.IndexSearcher.doc(IndexSearcher.java:84)
>   at org.apache.lucene.search.Hits.doc(Hits.java:104)
> Presumably FieldsReader is caching partially-initialised data somewhere.  I 
> would normally expect the exact same IOException to be thrown for subsequent 
> calls to the method.




[jira] Updated: (LUCENE-1262) NullPointerException from FieldsReader after problem reading the index

2008-04-09 Thread Trejkaz (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trejkaz updated LUCENE-1262:


Affects Version/s: (was: 2.2)
   2.1

Okay I'll eat my words now, it is indeed 2.1 as the version doesn't have 
openInput(String,int) in it.

Anyway an update: I've managed to reproduce it on any text index by simulating 
random network outage.  I'm keeping a flag which I set to true.  The trick is 
that the wrapping IndexInput implementation *randomly* throws IOException if 
the flag is true -- if it always throws IOException the problem doesn't occur.  
If it randomly throws it then it occurs occasionally, and it always seems to be 
for larger queries (I'm using MatchAllDocsQuery now.)

I'll see if I can tweak the code to make it more likely to happen and then 
start working up to each version of Lucene to see if it stops happening 
somewhere.
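The fault-injection trick described above can be sketched in pure Java: a wrapper that throws `IOException` on a random fraction of reads, simulating an unreliable network share. A real reproduction would wrap Lucene's `IndexInput` the same way; this class name and its parameters are hypothetical.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Random;

// Sketch of a randomly-failing input: with failureRate 0.0 it behaves
// normally, with 1.0 every read fails, and values in between simulate
// the intermittent outage that triggers the FieldsReader bug.
public class FlakyInputStream extends FilterInputStream {
    private final Random random;
    private final double failureRate;

    public FlakyInputStream(InputStream in, double failureRate, long seed) {
        super(in);
        this.random = new Random(seed); // fixed seed keeps runs repeatable
        this.failureRate = failureRate;
    }

    @Override
    public int read() throws IOException {
        if (random.nextDouble() < failureRate) {
            throw new IOException("The specified network name is no longer available");
        }
        return super.read();
    }
}
```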


> NullPointerException from FieldsReader after problem reading the index
> --
>
> Key: LUCENE-1262
> URL: https://issues.apache.org/jira/browse/LUCENE-1262
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Index
>Affects Versions: 2.1
>Reporter: Trejkaz
>
> There is a situation where there is an IOException reading from Hits, and 
> then the next time you get a NullPointerException instead of an IOException.
> Example stack traces:
> java.io.IOException: The specified network name is no longer available
>   at java.io.RandomAccessFile.readBytes(Native Method)
>   at java.io.RandomAccessFile.read(RandomAccessFile.java:322)
>   at 
> org.apache.lucene.store.FSIndexInput.readInternal(FSDirectory.java:536)
>   at 
> org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:74)
>   at 
> org.apache.lucene.index.CompoundFileReader$CSIndexInput.readInternal(CompoundFileReader.java:220)
>   at 
> org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:93)
>   at 
> org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:34)
>   at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:57)
>   at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:88)
>   at 
> org.apache.lucene.index.SegmentReader.document(SegmentReader.java:344)
>   at org.apache.lucene.index.IndexReader.document(IndexReader.java:368)
>   at org.apache.lucene.search.IndexSearcher.doc(IndexSearcher.java:84)
>   at org.apache.lucene.search.Hits.doc(Hits.java:104)
> That error is fine.  The problem is the next call to doc generates:
> java.lang.NullPointerException
>   at 
> org.apache.lucene.index.FieldsReader.getIndexType(FieldsReader.java:280)
>   at org.apache.lucene.index.FieldsReader.addField(FieldsReader.java:216)
>   at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:101)
>   at 
> org.apache.lucene.index.SegmentReader.document(SegmentReader.java:344)
>   at org.apache.lucene.index.IndexReader.document(IndexReader.java:368)
>   at org.apache.lucene.search.IndexSearcher.doc(IndexSearcher.java:84)
>   at org.apache.lucene.search.Hits.doc(Hits.java:104)
> Presumably FieldsReader is caching partially-initialised data somewhere.  I 
> would normally expect the exact same IOException to be thrown for subsequent 
> calls to the method.




[jira] Updated: (LUCENE-1262) NullPointerException from FieldsReader after problem reading the index

2008-04-09 Thread Trejkaz (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trejkaz updated LUCENE-1262:


Affects Version/s: (was: 2.3.1)
   2.2

Whoops.  I don't think it's 2.1 but it must be 2.2.

I'll try and reproduce this standalone but first I need a way to have 
readInternal throw an exception.  I presume you were using some kind of custom 
store implementation to do that, I'll see if I can make it happen under 2.2 and 
then try the same thing under 2.3.1 to confirm whether it still breaks.


> NullPointerException from FieldsReader after problem reading the index
> --
>
> Key: LUCENE-1262
> URL: https://issues.apache.org/jira/browse/LUCENE-1262
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Index
>Affects Versions: 2.2
>Reporter: Trejkaz
>
> There is a situation where there is an IOException reading from Hits, and 
> then the next time you get a NullPointerException instead of an IOException.
> Example stack traces:
> java.io.IOException: The specified network name is no longer available
>   at java.io.RandomAccessFile.readBytes(Native Method)
>   at java.io.RandomAccessFile.read(RandomAccessFile.java:322)
>   at 
> org.apache.lucene.store.FSIndexInput.readInternal(FSDirectory.java:536)
>   at 
> org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:74)
>   at 
> org.apache.lucene.index.CompoundFileReader$CSIndexInput.readInternal(CompoundFileReader.java:220)
>   at 
> org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:93)
>   at 
> org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:34)
>   at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:57)
>   at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:88)
>   at 
> org.apache.lucene.index.SegmentReader.document(SegmentReader.java:344)
>   at org.apache.lucene.index.IndexReader.document(IndexReader.java:368)
>   at org.apache.lucene.search.IndexSearcher.doc(IndexSearcher.java:84)
>   at org.apache.lucene.search.Hits.doc(Hits.java:104)
> That error is fine.  The problem is the next call to doc generates:
> java.lang.NullPointerException
>   at 
> org.apache.lucene.index.FieldsReader.getIndexType(FieldsReader.java:280)
>   at org.apache.lucene.index.FieldsReader.addField(FieldsReader.java:216)
>   at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:101)
>   at 
> org.apache.lucene.index.SegmentReader.document(SegmentReader.java:344)
>   at org.apache.lucene.index.IndexReader.document(IndexReader.java:368)
>   at org.apache.lucene.search.IndexSearcher.doc(IndexSearcher.java:84)
>   at org.apache.lucene.search.Hits.doc(Hits.java:104)
> Presumably FieldsReader is caching partially-initialised data somewhere.  I 
> would normally expect the exact same IOException to be thrown for subsequent 
> calls to the method.




[jira] Created: (LUCENE-1262) NullPointerException from FieldsReader after problem reading the index

2008-04-08 Thread Trejkaz (JIRA)
NullPointerException from FieldsReader after problem reading the index
--

 Key: LUCENE-1262
 URL: https://issues.apache.org/jira/browse/LUCENE-1262
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Affects Versions: 2.3.1
Reporter: Trejkaz


There is a situation where there is an IOException reading from Hits, and then 
the next time you get a NullPointerException instead of an IOException.

Example stack traces:

java.io.IOException: The specified network name is no longer available
        at java.io.RandomAccessFile.readBytes(Native Method)
        at java.io.RandomAccessFile.read(RandomAccessFile.java:322)
        at org.apache.lucene.store.FSIndexInput.readInternal(FSDirectory.java:536)
        at org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:74)
        at org.apache.lucene.index.CompoundFileReader$CSIndexInput.readInternal(CompoundFileReader.java:220)
        at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:93)
        at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:34)
        at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:57)
        at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:88)
        at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:344)
        at org.apache.lucene.index.IndexReader.document(IndexReader.java:368)
        at org.apache.lucene.search.IndexSearcher.doc(IndexSearcher.java:84)
        at org.apache.lucene.search.Hits.doc(Hits.java:104)

That error is fine.  The problem is the next call to doc generates:

java.lang.NullPointerException
        at org.apache.lucene.index.FieldsReader.getIndexType(FieldsReader.java:280)
        at org.apache.lucene.index.FieldsReader.addField(FieldsReader.java:216)
        at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:101)
        at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:344)
        at org.apache.lucene.index.IndexReader.document(IndexReader.java:368)
        at org.apache.lucene.search.IndexSearcher.doc(IndexSearcher.java:84)
        at org.apache.lucene.search.Hits.doc(Hits.java:104)

Presumably FieldsReader is caching partially-initialised data somewhere.  I 
would normally expect the exact same IOException to be thrown for subsequent 
calls to the method.





[jira] Issue Comment Edited: (LUCENE-1245) MultiFieldQueryParser is not friendly for overriding getFieldQuery(String,String,int)

2008-03-26 Thread Trejkaz (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12582490#action_12582490
 ] 

trejkaz edited comment on LUCENE-1245 at 3/26/08 5:32 PM:
--

Here's an example illustrating the way we were using it, although instead of 
changing the query text we're actually returning a different query class -- 
that class isn't in Lucene Core and also it's easier to build up an expected 
query if it's just a TermQuery.

{noformat}
public void testOverrideGetFieldQuery() throws Exception {
String[] fields = { "a", "b" };
QueryParser parser = new MultiFieldQueryParser(fields, new 
StandardAnalyzer()) {
protected Query getFieldQuery(String field, String queryText, int 
slop) throws ParseException {
if (field != null && slop == 1) {
queryText = "z" + queryText;
}
return super.getFieldQuery(field, queryText, slop);
}
};

BooleanQuery expected = new BooleanQuery();
expected.add(new TermQuery(new Term("a", "zabc")), 
BooleanClause.Occur.SHOULD);
expected.add(new TermQuery(new Term("b", "zabc")), 
BooleanClause.Occur.SHOULD);
assertEquals("Expected a mangled query", expected, 
parser.parse("\"abc\"~1"));
}
{noformat}


  was (Author: trejkaz):
Here's an example illustrating the way we were using it, although instead 
of changing the query text we're actually returning a different query class -- 
that class isn't in Lucene Core and also it's easier to build up an expected 
query if it's just a TermQuery.

{noformat}
public void testOverrideGetFieldQuery() throws Exception {
String[] fields = { "a", "b" };
QueryParser parser = new MultiFieldQueryParser(fields, new 
StandardAnalyzer()) {
protected Query getFieldQuery(String field, String queryText, int 
slop) throws ParseException {
if (field != null && slop == 1) {
field = "z" + field;
}
return super.getFieldQuery(field, queryText, slop);
}
};

BooleanQuery expected = new BooleanQuery();
expected.add(new TermQuery(new Term("a", "zabc")), 
BooleanClause.Occur.SHOULD);
expected.add(new TermQuery(new Term("b", "zabc")), 
BooleanClause.Occur.SHOULD);
assertEquals("Expected a mangled query", expected, 
parser.parse("\"abc\"~1"));
}
{noformat}

  
> MultiFieldQueryParser is not friendly for overriding 
> getFieldQuery(String,String,int)
> -
>
> Key: LUCENE-1245
> URL: https://issues.apache.org/jira/browse/LUCENE-1245
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: QueryParser
>Affects Versions: 2.3.2
>Reporter: Trejkaz
> Attachments: multifield.patch
>
>
> LUCENE-1213 fixed an issue in MultiFieldQueryParser where the slop parameter 
> wasn't being properly applied.  Problem is, the fix which eventually got 
> committed is calling super.getFieldQuery(String,String), bypassing any 
> possibility of customising the query behaviour.
> This should be relatively simply fixable by modifying 
> getFieldQuery(String,String,int) to, if field is null, recursively call 
> getFieldQuery(String,String,int) instead of setting the slop itself.  This 
> gives subclasses which override either getFieldQuery method a chance to do 
> something different.




[jira] Issue Comment Edited: (LUCENE-1245) MultiFieldQueryParser is not friendly for overriding getFieldQuery(String,String,int)

2008-03-26 Thread Trejkaz (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12582490#action_12582490
 ] 

trejkaz edited comment on LUCENE-1245 at 3/26/08 5:13 PM:
--

Here's an example illustrating the way we were using it, although instead of 
changing the query text we're actually returning a different query class -- 
that class isn't in Lucene Core and also it's easier to build up an expected 
query if it's just a TermQuery.

{noformat}
public void testOverrideGetFieldQuery() throws Exception {
String[] fields = { "a", "b" };
QueryParser parser = new MultiFieldQueryParser(fields, new 
StandardAnalyzer()) {
protected Query getFieldQuery(String field, String queryText, int 
slop) throws ParseException {
if (field != null && slop == 1) {
field = "z" + field;
}
return super.getFieldQuery(field, queryText, slop);
}
};

BooleanQuery expected = new BooleanQuery();
expected.add(new TermQuery(new Term("a", "zabc")), 
BooleanClause.Occur.SHOULD);
expected.add(new TermQuery(new Term("b", "zabc")), 
BooleanClause.Occur.SHOULD);
assertEquals("Expected a mangled query", expected, 
parser.parse("\"abc\"~1"));
}
{noformat}


  was (Author: trejkaz):
Here's an example illustrating the way we were using it, although instead 
of changing the query text we're actually returning a different query class -- 
that class isn't in Lucene Core and also it's easier to build up an expected 
query if it's just a TermQuery.

public void testOverrideGetFieldQuery() throws Exception {
String[] fields = { "a", "b" };
QueryParser parser = new MultiFieldQueryParser(fields, new 
StandardAnalyzer()) {
protected Query getFieldQuery(String field, String queryText, int 
slop) throws ParseException {
if (field != null && slop == 1) {
field = "z" + field;
}
return super.getFieldQuery(field, queryText, slop);
}
};

BooleanQuery expected = new BooleanQuery();
expected.add(new TermQuery(new Term("a", "zabc")), 
BooleanClause.Occur.SHOULD);
expected.add(new TermQuery(new Term("b", "zabc")), 
BooleanClause.Occur.SHOULD);
assertEquals("Expected a mangled query", expected, 
parser.parse("\"abc\"~1"));
}

  
> MultiFieldQueryParser is not friendly for overriding 
> getFieldQuery(String,String,int)
> -
>
> Key: LUCENE-1245
> URL: https://issues.apache.org/jira/browse/LUCENE-1245
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: QueryParser
>Affects Versions: 2.3.2
>Reporter: Trejkaz
> Attachments: multifield.patch
>
>
> LUCENE-1213 fixed an issue in MultiFieldQueryParser where the slop parameter 
> wasn't being properly applied.  Problem is, the fix which eventually got 
> committed is calling super.getFieldQuery(String,String), bypassing any 
> possibility of customising the query behaviour.
> This should be relatively simply fixable by modifying 
> getFieldQuery(String,String,int) to, if field is null, recursively call 
> getFieldQuery(String,String,int) instead of setting the slop itself.  This 
> gives subclasses which override either getFieldQuery method a chance to do 
> something different.




[jira] Commented: (LUCENE-1245) MultiFieldQueryParser is not friendly for overriding getFieldQuery(String,String,int)

2008-03-26 Thread Trejkaz (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12582490#action_12582490
 ] 

Trejkaz commented on LUCENE-1245:
-

Here's an example illustrating the way we were using it, although instead of 
changing the query text we're actually returning a different query class -- 
that class isn't in Lucene Core and also it's easier to build up an expected 
query if it's just a TermQuery.

public void testOverrideGetFieldQuery() throws Exception {
String[] fields = { "a", "b" };
QueryParser parser = new MultiFieldQueryParser(fields, new 
StandardAnalyzer()) {
protected Query getFieldQuery(String field, String queryText, int 
slop) throws ParseException {
if (field != null && slop == 1) {
field = "z" + field;
}
return super.getFieldQuery(field, queryText, slop);
}
};

BooleanQuery expected = new BooleanQuery();
expected.add(new TermQuery(new Term("a", "zabc")), 
BooleanClause.Occur.SHOULD);
expected.add(new TermQuery(new Term("b", "zabc")), 
BooleanClause.Occur.SHOULD);
assertEquals("Expected a mangled query", expected, 
parser.parse("\"abc\"~1"));
}


> MultiFieldQueryParser is not friendly for overriding 
> getFieldQuery(String,String,int)
> -
>
> Key: LUCENE-1245
> URL: https://issues.apache.org/jira/browse/LUCENE-1245
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: QueryParser
>Affects Versions: 2.3.2
>Reporter: Trejkaz
> Attachments: multifield.patch
>
>
> LUCENE-1213 fixed an issue in MultiFieldQueryParser where the slop parameter 
> wasn't being properly applied.  Problem is, the fix which eventually got 
> committed is calling super.getFieldQuery(String,String), bypassing any 
> possibility of customising the query behaviour.
> This should be relatively simply fixable by modifying 
> getFieldQuery(String,String,int) to, if field is null, recursively call 
> getFieldQuery(String,String,int) instead of setting the slop itself.  This 
> gives subclasses which override either getFieldQuery method a chance to do 
> something different.




[jira] Updated: (LUCENE-1245) MultiFieldQueryParser is not friendly for overriding getFieldQuery(String,String,int)

2008-03-26 Thread Trejkaz (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trejkaz updated LUCENE-1245:


Attachment: multifield.patch

The fix makes getFieldQuery(String,String) and getFieldQuery(String,String,int) 
work essentially the same way.  Neither calls methods on super, so overriding 
either method now works (and does in practice, although I have no unit test for 
this yet.)

Common boosting logic is extracted into an applyBoost method.  I have also 
removed the check for the clause list being empty, as getBooleanQuery appears 
to do that already.


> MultiFieldQueryParser is not friendly for overriding 
> getFieldQuery(String,String,int)
> -
>
> Key: LUCENE-1245
> URL: https://issues.apache.org/jira/browse/LUCENE-1245
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: QueryParser
>Affects Versions: 2.3.2
>Reporter: Trejkaz
> Attachments: multifield.patch
>
>
> LUCENE-1213 fixed an issue in MultiFieldQueryParser where the slop parameter 
> wasn't being properly applied.  Problem is, the fix which eventually got 
> committed is calling super.getFieldQuery(String,String), bypassing any 
> possibility of customising the query behaviour.
> This should be relatively simply fixable by modifying 
> getFieldQuery(String,String,int) to, if field is null, recursively call 
> getFieldQuery(String,String,int) instead of setting the slop itself.  This 
> gives subclasses which override either getFieldQuery method a chance to do 
> something different.




[jira] Updated: (LUCENE-1245) MultiFieldQueryParser is not friendly for overriding getFieldQuery(String,String,int)

2008-03-26 Thread Trejkaz (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trejkaz updated LUCENE-1245:


Lucene Fields: [New, Patch Available]  (was: [New])
  Summary: MultiFieldQueryParser is not friendly for overriding 
getFieldQuery(String,String,int)  (was: MultiFieldQueryParser is not friendly 
for overriding)

(Updating title to be more specific about what wasn't friendly.)

> MultiFieldQueryParser is not friendly for overriding 
> getFieldQuery(String,String,int)
> -
>
> Key: LUCENE-1245
> URL: https://issues.apache.org/jira/browse/LUCENE-1245
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: QueryParser
>Affects Versions: 2.3.2
>Reporter: Trejkaz
>
> LUCENE-1213 fixed an issue in MultiFieldQueryParser where the slop parameter 
> wasn't being properly applied.  Problem is, the fix which eventually got 
> committed is calling super.getFieldQuery(String,String), bypassing any 
> possibility of customising the query behaviour.
> This should be relatively simply fixable by modifying 
> getFieldQuery(String,String,int) to, if field is null, recursively call 
> getFieldQuery(String,String,int) instead of setting the slop itself.  This 
> gives subclasses which override either getFieldQuery method a chance to do 
> something different.




[jira] Created: (LUCENE-1245) MultiFieldQueryParser is not friendly for overriding

2008-03-25 Thread Trejkaz (JIRA)
MultiFieldQueryParser is not friendly for overriding


 Key: LUCENE-1245
 URL: https://issues.apache.org/jira/browse/LUCENE-1245
 Project: Lucene - Java
  Issue Type: Improvement
  Components: QueryParser
Affects Versions: 2.3.2
Reporter: Trejkaz


LUCENE-1213 fixed an issue in MultiFieldQueryParser where the slop parameter 
wasn't being properly applied.  Problem is, the fix which eventually got 
committed is calling super.getFieldQuery(String,String), bypassing any 
possibility of customising the query behaviour.

This should be relatively simply fixable by modifying 
getFieldQuery(String,String,int) to, if field is null, recursively call 
getFieldQuery(String,String,int) instead of setting the slop itself.  This 
gives subclasses which override either getFieldQuery method a chance to do 
something different.
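The proposed delegation pattern can be sketched with simplified stand-in classes (these are illustrative placeholders, not Lucene's real QueryParser API): when field is null, the three-argument getFieldQuery recurses into itself once per field instead of applying the slop directly, so a subclass override of that method is still honoured.

```java
// Hypothetical sketch of the proposed recursion, not Lucene code.
import java.util.ArrayList;
import java.util.List;

public class ParserSketch {
    private final String[] fields;

    public ParserSketch(String[] fields) {
        this.fields = fields;
    }

    // Single-field query; subclasses may override this to customise behaviour.
    protected String getFieldQuery(String field, String queryText, int slop) {
        if (field == null) {
            // Recurse per field through the overridable three-arg method,
            // rather than setting the slop here directly.
            List<String> clauses = new ArrayList<>();
            for (String f : fields) {
                clauses.add(getFieldQuery(f, queryText, slop));
            }
            return String.join(" OR ", clauses);
        }
        return field + ":\"" + queryText + "\"~" + slop;
    }
}
```

A subclass that overrides getFieldQuery(String,String,int) then sees every per-field call, which is the behaviour the original fix bypassed.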





[jira] Updated: (LUCENE-1240) TermsFilter: reuse TermDocs

2008-03-18 Thread Trejkaz (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trejkaz updated LUCENE-1240:


Lucene Fields: [New, Patch Available]  (was: [New])

> TermsFilter: reuse TermDocs
> ---
>
> Key: LUCENE-1240
> URL: https://issues.apache.org/jira/browse/LUCENE-1240
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 2.3.1
>Reporter: Trejkaz
> Attachments: terms-filter.patch
>
>
> TermsFilter currently calls termDocs(Term) once per term in the TermsFilter.  
> If we sort the terms it's filtering on, this can be optimised to call 
> termDocs() once and then skip(Term) once per term, which should significantly 
> speed up this filter.




[jira] Updated: (LUCENE-1240) TermsFilter: reuse TermDocs

2008-03-18 Thread Trejkaz (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trejkaz updated LUCENE-1240:


Attachment: terms-filter.patch

Attaching my attempt at improving this.

The original code didn't close all the TermDocs it created either; this is now 
fixed also.


> TermsFilter: reuse TermDocs
> ---
>
> Key: LUCENE-1240
> URL: https://issues.apache.org/jira/browse/LUCENE-1240
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 2.3.1
>Reporter: Trejkaz
> Attachments: terms-filter.patch
>
>
> TermsFilter currently calls termDocs(Term) once per term in the TermsFilter.  
> If we sort the terms it's filtering on, this can be optimised to call 
> termDocs() once and then skip(Term) once per term, which should significantly 
> speed up this filter.




[jira] Created: (LUCENE-1240) TermsFilter: reuse TermDocs

2008-03-18 Thread Trejkaz (JIRA)
TermsFilter: reuse TermDocs
---

 Key: LUCENE-1240
 URL: https://issues.apache.org/jira/browse/LUCENE-1240
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.3.1
Reporter: Trejkaz


TermsFilter currently calls termDocs(Term) once per term in the TermsFilter.  
If we sort the terms it's filtering on, this can be optimised to call 
termDocs() once and then skip(Term) once per term, which should significantly 
speed up this filter.
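The idea can be simulated in a self-contained way (this is not Lucene code; ceilingKey on a sorted map plays the role of a hypothetical skip-to on the term dictionary): sort the filter terms so that every lookup is a forward seek over a single cursor instead of a fresh termDocs(Term) call.

```java
// Self-contained simulation of the sorted-terms optimisation described above.
import java.util.BitSet;
import java.util.Collection;
import java.util.TreeMap;
import java.util.TreeSet;

public class SortedTermsFilterSketch {
    // index: sorted "term dictionary" mapping each term to its doc ids.
    public static BitSet filter(TreeMap<String, int[]> index,
                                Collection<String> filterTerms) {
        BitSet bits = new BitSet();
        // Sorting the filter terms makes every lookup a monotonic forward skip.
        for (String term : new TreeSet<>(filterTerms)) {
            String found = index.ceilingKey(term);  // seek forward to >= term
            if (term.equals(found)) {
                for (int doc : index.get(term)) {
                    bits.set(doc);
                }
            }
        }
        return bits;
    }
}
```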





[jira] Updated: (LUCENE-1213) MultiFieldQueryParser ignores slop parameter

2008-03-09 Thread Trejkaz (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trejkaz updated LUCENE-1213:


Attachment: multifield-fix.patch

Attaching one possible fix.  It's more verbose than I would like, but I 
couldn't think of a reliable way to make it delegate: that would require 
casting the result to BooleanQuery to get the clauses out, and a subclass may 
return something else entirely.


> MultiFieldQueryParser ignores slop parameter
> 
>
> Key: LUCENE-1213
> URL: https://issues.apache.org/jira/browse/LUCENE-1213
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: QueryParser
>Reporter: Trejkaz
> Attachments: multifield-fix.patch
>
>
> MultiFieldQueryParser.getFieldQuery(String, String, int) calls 
> super.getFieldQuery(String, String), thus obliterating any slop parameter 
> present in the query.
> It should probably be changed to call super.getFieldQuery(String, String, 
> int), except doing only that will result in a recursive loop which is a 
> side-effect of what may be a deeper problem in MultiFieldQueryParser -- 
> getFieldQuery(String, String, int) is documented as delegating to 
> getFieldQuery(String, String), yet what it actually does is the exact 
> opposite.  This also causes problems for subclasses which need to override 
> getFieldQuery(String, String) to provide different behaviour.




[jira] Created: (LUCENE-1213) MultiFieldQueryParser ignores slop parameter

2008-03-09 Thread Trejkaz (JIRA)
MultiFieldQueryParser ignores slop parameter


 Key: LUCENE-1213
 URL: https://issues.apache.org/jira/browse/LUCENE-1213
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Trejkaz


MultiFieldQueryParser.getFieldQuery(String, String, int) calls 
super.getFieldQuery(String, String), thus obliterating any slop parameter 
present in the query.

It should probably be changed to call super.getFieldQuery(String, String, int), 
except doing only that will result in a recursive loop which is a side-effect 
of what may be a deeper problem in MultiFieldQueryParser -- 
getFieldQuery(String, String, int) is documented as delegating to 
getFieldQuery(String, String), yet what it actually does is the exact opposite. 
 This also causes problems for subclasses which need to override 
getFieldQuery(String, String) to provide different behaviour.





[jira] Updated: (LUCENE-1213) MultiFieldQueryParser ignores slop parameter

2008-03-09 Thread Trejkaz (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trejkaz updated LUCENE-1213:


Component/s: QueryParser

> MultiFieldQueryParser ignores slop parameter
> 
>
> Key: LUCENE-1213
> URL: https://issues.apache.org/jira/browse/LUCENE-1213
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: QueryParser
>Reporter: Trejkaz
>
> MultiFieldQueryParser.getFieldQuery(String, String, int) calls 
> super.getFieldQuery(String, String), thus obliterating any slop parameter 
> present in the query.
> It should probably be changed to call super.getFieldQuery(String, String, 
> int), except doing only that will result in a recursive loop which is a 
> side-effect of what may be a deeper problem in MultiFieldQueryParser -- 
> getFieldQuery(String, String, int) is documented as delegating to 
> getFieldQuery(String, String), yet what it actually does is the exact 
> opposite.  This also causes problems for subclasses which need to override 
> getFieldQuery(String, String) to provide different behaviour.




[jira] Created: (LUCENE-1206) Ability to store Reader / InputStream fields

2008-03-06 Thread Trejkaz (JIRA)
Ability to store Reader / InputStream fields


 Key: LUCENE-1206
 URL: https://issues.apache.org/jira/browse/LUCENE-1206
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Index
Reporter: Trejkaz


In some situations we would like to store the whole text, but the whole text 
won't always fit in memory so we can't create a String.  Likewise for storing 
binary, it would sometimes be better if we didn't have to read into a byte[] 
up-front (even when it doesn't use much memory, it increases the number of 
copies made and adds burden to GC.)

FieldsWriter currently writes the length at the start of the chunks though, so 
I don't know whether it would be possible to seek back and write the length 
after writing the data.

It would also be useful to use this in conjunction with compression, both for 
Reader and InputStream types.  And when retrieving the field, it should be 
possible to create a Reader without reading the entire String into memory 
up-front.





[jira] Created: (LUCENE-1181) Token reuse is not ideal for avoiding array copies

2008-02-18 Thread Trejkaz (JIRA)
Token reuse is not ideal for avoiding array copies
--

 Key: LUCENE-1181
 URL: https://issues.apache.org/jira/browse/LUCENE-1181
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Analysis
Affects Versions: 2.3
Reporter: Trejkaz


The way the Token API is currently written results in two unnecessary array 
copies which could be avoided by changing the way it works.

1. setTermBuffer(char[],int,int) calls resizeTermBuffer(int) which copies the 
original term text even though it's about to be overwritten.

#1 should be trivially fixable by introducing a private 
resizeTermBuffer(int,boolean) where the new boolean parameter specifies whether 
the existing term data gets copied over or not.

2. setTermBuffer(char[],int,int) copies what you pass in, instead of actually 
setting the term buffer.

Setting aside the fact that the setTermBuffer method is misleadingly named, 
consider a token filter which performs Unicode normalisation on each token.

How it has to be implemented at present:
  once:
- create a reusable char[] for storing the normalisation result
  every token:
- use getTermBuffer() and getTermLength() to get the buffer and relevant 
length
- normalise the original string into our temporary buffer   (if it isn't 
big enough, grow the temp buffer size.)
- setTermBuffer(char[],int,int) - this does an extra copy.

The following sequence would be much better:
  once:
- create a reusable char[] for storing the normalisation result
  every token:
- use getTermBuffer() and getTermLength() to get the buffer and relevant 
length
- normalise the original string into our temporary buffer   (if it isn't 
big enough, grow the temp buffer size.)
- setTermBuffer(char[],int,int) sets our buffer by reference
- the term buffer previously held by the Token becomes our new temp buffer.

The latter sequence results in no copying with the exception of the 
normalisation itself, which is unavoidable.
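The buffer-swap sequence can be sketched with a hypothetical token class (the real Token API is Lucene's; this only illustrates the desired no-copy pattern): normalise into a reusable scratch buffer, then exchange the two char[] references so the token's old buffer becomes the next scratch buffer.

```java
// Illustrative sketch of the no-copy buffer swap; Tok is a stand-in, not
// Lucene's Token class.
public class SwapSketch {
    public static class Tok {
        public char[] buf;
        public int len;
    }

    private char[] scratch = new char[16];

    // Stands in for "normalisation"; upper-cases into the scratch buffer.
    public void normalise(Tok t) {
        if (scratch.length < t.len) {
            scratch = new char[t.len];  // grow the temp buffer if needed
        }
        for (int i = 0; i < t.len; i++) {
            scratch[i] = Character.toUpperCase(t.buf[i]);
        }
        // Swap by reference -- no copy of the result back into the token.
        char[] old = t.buf;
        t.buf = scratch;
        scratch = old;
    }
}
```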





[jira] Created: (LUCENE-587) Explanation.toHtml outputs invalid HTML

2006-06-01 Thread Trejkaz (JIRA)
Explanation.toHtml outputs invalid HTML
---

 Key: LUCENE-587
 URL: http://issues.apache.org/jira/browse/LUCENE-587
 Project: Lucene - Java
Type: Bug

  Components: Search  
Versions: 2.0.0
Reporter: Trejkaz


If you want an HTML representation of an Explanation, you might call the 
toHtml() method.  However, the output of this method looks like the following:


<ul>
  <li>some value = some description</li>
  <ul>
    <li>some nested value = some description</li>
  </ul>
</ul>

As it is illegal in HTML to nest a UL directly inside a UL, this method will 
always output unparseable HTML if there are nested explanations.

What Lucene probably means to output is the following, which is valid HTML:


<ul>
  <li>some value = some description
    <ul>
      <li>some nested value = some description</li>
    </ul>
  </li>
</ul>


-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Commented: (LUCENE-458) Merging may create duplicates if the JVM crashes half way through

2005-10-26 Thread Trejkaz (JIRA)
[ 
http://issues.apache.org/jira/browse/LUCENE-458?page=comments#action_12356029 ] 

Trejkaz commented on LUCENE-458:


I was thinking more along the lines of...

1. open a reader, writer
2. read the document
3. write a marker marking that this document is the result of a move of another 
one
4. write the document
5. delete the original document
6. delete the marker
7. close the reader, writer

Then later on, when the reader opens an index and finds a marker, it goes and 
checks the location the marker points at, and if the location is still there, 
it continues from step 5 again.
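The steps above can be sketched as an in-memory simulation (hypothetical, not Lucene's merge code): a marker that survives a crash tells recovery the move was interrupted after the copy, so it resumes from step 5 and deletes the original, leaving no duplicate.

```java
// Simulation of the marker protocol sketched above.
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

public class MoveSketch {
    public final Map<String, String> docs = new HashMap<>();
    public final Set<String> markers = new HashSet<>();

    // Steps 3-6 of the protocol; a crash may interrupt after any step.
    public void move(String from, String to) {
        markers.add(from + ">" + to);      // 3. record the move intent
        docs.put(to, docs.get(from));      // 4. write the copied document
        docs.remove(from);                 // 5. delete the original
        markers.remove(from + ">" + to);   // 6. clear the marker
    }

    // On open: any leftover marker means a move was interrupted.
    public void recover() {
        for (Iterator<String> it = markers.iterator(); it.hasNext(); ) {
            String[] m = it.next().split(">");
            if (docs.containsKey(m[0]) && docs.containsKey(m[1])) {
                docs.remove(m[0]);         // resume from step 5
            }
            it.remove();
        }
    }
}
```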

> Merging may create duplicates if the JVM crashes half way through
> -
>
>  Key: LUCENE-458
>  URL: http://issues.apache.org/jira/browse/LUCENE-458
>  Project: Lucene - Java
> Type: Bug
> Versions: 1.4
>  Environment: Windows XP SP2, JDK 1.5.0_04 (crash occurred in this version.  
> We've updated to 1.5.0_05 since, but discovered this issue with an older text 
> index since.)
> Reporter: Trejkaz

>
> In the past, our indexing process crashed due to a Hotspot compiler bug on 
> SMP systems (although it could happen with any bad native code.)  Everything 
> picked up and appeared to work, but now that it's a month later I've 
> discovered an oddity in the text index.
> We have two documents which are identical in the text index.  I know we only 
> stored it once for two reasons.  First, we store the MD5 of every document 
> into the hash and the MD5s were the same.  Second, we store a GUID into each 
> document which is generated uniquely for each document.  The GUID and the MD5 
> hash on these two documents, as well as all other fields, is exactly the same.
> My conclusion is that a merge was occurring at the point the JVM crashed, 
> which is consistent with the time the process crashed.  Is it possible that 
> Lucene did the copy of this document to the new location, and didn't get to 
> delete the original?
> If so, I guess this issue should be prevented somehow.




[jira] Created: (LUCENE-458) Merging may create duplicates if the JVM crashes half way through

2005-10-25 Thread Trejkaz (JIRA)
Merging may create duplicates if the JVM crashes half way through
-

 Key: LUCENE-458
 URL: http://issues.apache.org/jira/browse/LUCENE-458
 Project: Lucene - Java
Type: Bug
Versions: 1.4
 Environment: Windows XP SP2, JDK 1.5.0_04 (crash occurred in this version.  
We've updated to 1.5.0_05 since, but discovered this issue with an older text 
index since.)

Reporter: Trejkaz


In the past, our indexing process crashed due to a Hotspot compiler bug on SMP 
systems (although it could happen with any bad native code.)  Everything picked 
up and appeared to work, but now that it's a month later I've discovered an 
oddity in the text index.

We have two documents which are identical in the text index.  I know we only 
stored it once for two reasons.  First, we store the MD5 of every document into 
the hash and the MD5s were the same.  Second, we store a GUID into each 
document which is generated uniquely for each document.  The GUID and the MD5 
hash on these two documents, as well as all other fields, is exactly the same.

My conclusion is that a merge was occurring at the point the JVM crashed, which 
is consistent with the time the process crashed.  Is it possible that Lucene 
did the copy of this document to the new location, and didn't get to delete the 
original?

If so, I guess this issue should be prevented somehow.

