Re: index reopen question

2008-04-11 Thread Michael Busch

Chris Hostetter wrote:

: 1)  looking at the code:
: 
: if (this.hasChanges || this.isCurrent()) {
:   // the index hasn't changed - nothing to do here
:   return this;
: }
: 
: Shouldn't it be !this.hasChanges?

...that's from DirectoryIndexReader, and it sure looks like a bug to me.



I don't think this is a bug. If hasChanges==true, then this reader has 
the write lock on the index directory and must be up to date.
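
To make the reasoning concrete, here is a minimal sketch of the check. SketchReader, its fields, and the version argument are hypothetical stand-ins for illustration only; this is not the actual DirectoryIndexReader code.

```java
// Hypothetical stand-in for a directory-backed reader; NOT Lucene's DirectoryIndexReader.
class SketchReader {
    private final long version;        // index version this reader was opened on
    private final boolean hasChanges;  // assumed true only while this reader holds the write lock

    SketchReader(long version, boolean hasChanges) {
        this.version = version;
        this.hasChanges = hasChanges;
    }

    boolean isCurrent(long latestVersionOnDisk) {
        return version == latestVersionOnDisk;
    }

    // Returns this reader when nothing newer can exist, otherwise a fresh reader.
    SketchReader reopen(long latestVersionOnDisk) {
        // Pending changes imply the write lock is held, so no other writer
        // could have committed a newer version: this reader is up to date.
        if (hasChanges || isCurrent(latestVersionOnDisk)) {
            return this;
        }
        return new SketchReader(latestVersionOnDisk, false);
    }
}
```

Under that assumption, a reader with pending changes returns itself from reopen(), and only a reader without changes that has fallen behind gets replaced.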


-Michael

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: [jira] Created: (LUCENE-1257) Port to Java5

2008-04-11 Thread robert engels

Correct.

On Apr 11, 2008, at 7:43 PM, Chris Hostetter wrote:



Since i had to read robert's email about 3 times before i got what he was
saying, i'll elaborate in case anyone else is scratching their head as
much as i was...

because you could write code that looks like this...
   for (int i = 0; i < arr.length; i++) {
     i = getSomeNumberNotBetweenZeroAndArrLength();
     String s = arr[i];
   }
...the arr[i] lookup must do bounds checking and raise an exception if
needed.  This is not necessary in the "foreach" style construct where
there is no explicit loop counter.

: When iterating over an array using an indexed loop, you typically need to
: access the element, as follows:
:
: for(int i=0;i<100;i++) {
:   String s = array[i];
:   ...
: }
:
: Java performs bounds checking on the array[i] access to make sure i is within
: the limits of the array. Granted, there are optimizations the JVM can do in
: many cases using escape processing to know that i will always be in the range,
: but it is not always feasible.
:
: when you use
:
: for(String s : array) {
: }
:
: the JVM uses its own internal indexer that it knows cannot be outside the
: bounds, and thus the bounds checking can be avoided.


-Hoss





Re: [jira] Created: (LUCENE-1257) Port to Java5

2008-04-11 Thread Chris Hostetter

Since i had to read robert's email about 3 times before i got what he was 
saying, i'll elaborate in case anyone else is scratching their head as 
much as i was...

because you could write code that looks like this...
   for (int i = 0; i < arr.length; i++) {
     i = getSomeNumberNotBetweenZeroAndArrLength();
     String s = arr[i];
   }
...the arr[i] lookup must do bounds checking and raise an exception if
needed.  This is not necessary in the "foreach" style construct where
there is no explicit loop counter.

: When iterating over an array using an indexed loop, you typically need to
: access the element, as follows:
: 
: for(int i=0;i<100;i++) {
:   String s = array[i];
:   ...
: }
: 
: Java performs bounds checking on the array[i] access to make sure i is within
: the limits of the array. Granted, there are optimizations the JVM can do in
: many cases using escape processing to know that i will always be in the range,
: but it is not always feasible.
: 
: when you use
: 
: for(String s : array) {
: }
: 
: the JVM uses its own internal indexer that it knows cannot be outside the
: bounds, and thus the bounds checking can be avoided.
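
The two loop styles discussed above can be compared with a small runnable sketch. The class and method names are made up for illustration. Note that the Java Language Specification defines the for-each loop over an array in terms of a hidden index, so whether a given JVM actually skips the check in either form is up to its JIT; the sketch only shows what the loop body can and cannot do to the index.

```java
class BoundsCheckSketch {
    // Indexed loop whose counter is overwritten inside the body, as in the
    // example above. outOfRangeIndex must be outside [0, arr.length).
    // Returns true if the bounds check fired.
    static boolean indexedLoopThrows(String[] arr, int outOfRangeIndex) {
        try {
            for (int i = 0; i < arr.length; i++) {
                i = outOfRangeIndex;   // counter leaves the valid range
                String s = arr[i];     // the bounds check raises the exception here
            }
            return false;              // only reached for an empty array
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    // For-each loop: the hidden index is not visible to the body, so the
    // body cannot push it out of range.
    static int countWithForEach(String[] arr) {
        int n = 0;
        for (String s : arr) {
            n++;
        }
        return n;
    }
}
```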


-Hoss





Re: index reopen question

2008-04-11 Thread Chris Hostetter

: 1)  looking at the code:
: 
: if (this.hasChanges || this.isCurrent()) {
:   // the index hasn't changed - nothing to do here
:   return this;
: }

: Shouldn't it be !this.hasChanges?

...that's from DirectoryIndexReader, and it sure looks like a bug to me.

: 2) FilterIndexReader calls the ensureOpen() method from the super class
: instead of overriding the method and call the inner reader's ensureOpen, is
: that expected?

FilterIndexReader doesn't call super.ensureOpen(), it just calls 
ensureOpen() so that subclasses can do whatever housekeeping they 
want in that method.  The "inner" reader is responsible for calling its 
own ensureOpen method as appropriate.
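
A rough sketch of that delegation pattern, using hypothetical BaseReader and WrappingReader classes rather than the real IndexReader and FilterIndexReader:

```java
// Hypothetical classes for illustration; not Lucene's IndexReader API.
class BaseReader {
    protected boolean closed = false;

    // Hook that subclasses may override to add their own housekeeping.
    protected void ensureOpen() {
        if (closed) {
            throw new IllegalStateException("this reader is closed");
        }
    }

    int numDocs() {
        ensureOpen();
        return 42; // placeholder value
    }

    void close() {
        closed = true;
    }
}

class WrappingReader extends BaseReader {
    protected final BaseReader in;

    WrappingReader(BaseReader in) {
        this.in = in;
    }

    @Override
    int numDocs() {
        ensureOpen();          // the wrapper's own (overridable) check
        return in.numDocs();   // the inner reader runs its own ensureOpen()
    }
}
```

With this layout the wrapper never needs to reach into the inner reader's check: closing the inner reader still makes the delegated call fail, because the inner reader guards itself.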

: 3) When you reopen an index, the inner reference count is not updated, is
: that ok?

which class are you referring to? what do you mean by "inner" reference 
count?  ... I'm guessing FilterIndexReader based on your question #2, but 
it doesn't support reopen.


-Hoss





Re: [jira] Commented: (LUCENE-1260) Norm codec strategy in Similarity

2008-04-11 Thread Chris Hostetter

: I still am but mainly because it is the simplest and only way to get 
: better document boost resolution at the moment.

I would argue that using a FieldScoreQuery is the easiest way to get 
better document boost resolution ... but it doesn't change the fact that 
support for more flexible norm encoding is worthwhile. (but as i said: i 
suspect column stride fields may be a suitable replacement for the built-in 
norm support down the road)


-Hoss





[jira] Commented: (LUCENE-1260) Norm codec strategy in Similarity

2008-04-11 Thread Karl Wettin (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12587954#action_12587954
 ] 

Karl Wettin commented on LUCENE-1260:
-

This is a retroactive ASL blessing of the patch posted 11/Apr/08 06:01 AM

> Norm codec strategy in Similarity
> -
>
>                 Key: LUCENE-1260
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1260
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 2.3.1
>            Reporter: Karl Wettin
>         Attachments: LUCENE-1260.txt, LUCENE-1260.txt
>
>
> The static span and resolution of the 8 bit norms codec might not fit
> all applications.
> My use case requires that 100f-250f is discretized into 60 bags instead
> of the default.. 10?
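
To illustrate the kind of codec the use case describes, here is a minimal sketch of a linear quantizer that maps a fixed float range onto a chosen number of buckets stored in one byte. RangeNormCodec is a hypothetical class written for this example; it is neither the attached patch nor Lucene's built-in norm encoding.

```java
// Hypothetical linear codec for illustration; not Lucene's built-in norm encoding.
class RangeNormCodec {
    private final float min;
    private final float max;
    private final int buckets;

    RangeNormCodec(float min, float max, int buckets) {
        if (buckets < 1 || buckets > 256 || !(max > min)) {
            throw new IllegalArgumentException("need min < max and 1..256 buckets");
        }
        this.min = min;
        this.max = max;
        this.buckets = buckets;
    }

    // Maps a value to its bucket number; values outside the range are clamped.
    byte encode(float value) {
        float clamped = Math.max(min, Math.min(max, value));
        int bucket = (int) ((clamped - min) / (max - min) * buckets);
        if (bucket == buckets) {
            bucket = buckets - 1; // value == max belongs to the last bucket
        }
        return (byte) bucket;
    }

    // Returns the midpoint of the bucket stored in b.
    float decode(byte b) {
        int bucket = b & 0xFF;
        float width = (max - min) / buckets;
        return min + (bucket + 0.5f) * width;
    }
}
```

With new RangeNormCodec(100f, 250f, 60) each bucket is 2.5 wide, so a decoded value is at most 1.25 away from the original within the range.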

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.





[jira] Updated: (LUCENE-1260) Norm codec strategy in Similarity

2008-04-11 Thread Karl Wettin (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wettin updated LUCENE-1260:


Attachment: LUCENE-1260.txt

Fixed some typos and added some tests. Perhaps it needs new javadocs too?





Re: [jira] Created: (LUCENE-1257) Port to Java5

2008-04-11 Thread robert engels

When iterating over an array using an indexed loop, you typically need to
access the element, as follows:

for(int i=0;i<100;i++) {
  String s = array[i];
  ...
}

Java performs bounds checking on the array[i] access to make sure i is
within the limits of the array. Granted, there are optimizations the JVM
can do in many cases using escape processing to know that i will always
be in the range, but it is not always feasible.

when you use

for(String s : array) {
}

the JVM uses its own internal indexer that it knows cannot be outside the
bounds, and thus the bounds checking can be avoided.

I would need to read the spec to know what happens if the array changes
during the loop execution - my bet is that the loop maintains a reference
to the original array, and thus it continues to work.
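
A quick way to check that bet is a small experiment like the one below (class and method names are made up for this sketch). The language specification describes the for-each loop over an array as evaluating the array expression once into a hidden variable, so rebinding the visible variable should not affect the iteration, while writes to elements of the same array should still be seen.

```java
class ForEachArraySketch {
    // Reassigns the array variable inside the loop body; the loop keeps
    // iterating over the array it started with.
    static int sumWhileReassigning() {
        int[] data = {1, 2, 3};
        int sum = 0;
        for (int x : data) {
            data = new int[] {100, 200, 300, 400}; // rebinding the variable only
            sum += x;                               // still reads the original array
        }
        return sum; // 1 + 2 + 3
    }

    // Writes to an element of the same array; the loop observes the new
    // value when it reaches that element.
    static int sumWhileMutating() {
        int[] data = {1, 2, 3};
        int sum = 0;
        for (int x : data) {
            data[2] = 10; // same array object, so this write is visible
            sum += x;
        }
        return sum; // 1 + 2 + 10
    }
}
```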


On Apr 11, 2008, at 2:28 AM, Endre Stølsvik wrote:


robert engels wrote:

The 'foreach' should be faster in the general case for arrays as the
bounds checking can be avoided.

Why is that? Where do you mean that the bounds-checking can be avoided?

But, I doubt the speed difference is going to matter much either way,
and eventually the JVM impl will converge to near equal performance.

This I actually agree on: if the foreach has some disadvantage of
explicit indexing through an array, it will at some point be fixed so
that it doesn't have this disadvantage anymore..


Endre.




Deprecation of flush in IndexWriter

2008-04-11 Thread Shay Banon

Hi,

   I was just looking a bit at the trunk. First, let me say that the
progress you guys make is amazing! I would still like to ask a quick
question regarding the deprecation of flush in IndexWriter. I think that there
are cases where flush is needed, for example when trying to create a two
phase commit (or something as close as possible to one). The flush can be
used for the first phase and the close/commit for the second one. Does that
make sense?
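
To show the shape of what I mean, here is a minimal sketch of a two phase commit coordinator. Everything in it (TwoPhaseParticipant, LoggingParticipant, TwoPhaseCoordinator) is hypothetical and not part of Lucene; for an index, prepare() would stand for the flush and commit() for the close/commit.

```java
// Hypothetical participant interface; it is not part of Lucene's API.
interface TwoPhaseParticipant {
    void prepare() throws Exception; // phase 1: e.g. flush buffered changes
    void commit();                   // phase 2: make the prepared changes visible
    void rollback();                 // discard a prepared but uncommitted change
}

// Test double that records the calls it receives.
class LoggingParticipant implements TwoPhaseParticipant {
    private final String name;
    private final boolean failOnPrepare;
    private final StringBuilder log;

    LoggingParticipant(String name, boolean failOnPrepare, StringBuilder log) {
        this.name = name;
        this.failOnPrepare = failOnPrepare;
        this.log = log;
    }

    public void prepare() throws Exception {
        if (failOnPrepare) {
            throw new Exception(name + " cannot prepare");
        }
        log.append(name).append(":prepare;");
    }

    public void commit() {
        log.append(name).append(":commit;");
    }

    public void rollback() {
        log.append(name).append(":rollback;");
    }
}

class TwoPhaseCoordinator {
    // Returns true if every participant prepared and was then committed.
    static boolean run(java.util.List<? extends TwoPhaseParticipant> participants) {
        int prepared = 0;
        try {
            for (TwoPhaseParticipant p : participants) {
                p.prepare();
                prepared++;
            }
        } catch (Exception e) {
            // Phase 1 failed: undo the participants that had already prepared.
            for (int i = 0; i < prepared; i++) {
                participants.get(i).rollback();
            }
            return false;
        }
        // Phase 2: everyone prepared, so commit everywhere.
        for (TwoPhaseParticipant p : participants) {
            p.commit();
        }
        return true;
    }
}
```

The sketch assumes the second phase cannot fail once the first has succeeded, which is exactly why a cheap close/commit after a successful flush would be attractive.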

Cheers,
Shay





[jira] Updated: (LUCENE-1262) IndexOutOfBoundsException from FieldsReader after problem reading the index

2008-04-11 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-1262:
---

Attachment: LUCENE-1262.patch

Attached patch.  All tests pass.  I plan to commit in a day or so, to
both trunk (2.4) and 2.3.X branch (2.3.2).

I got the failure to happen with a standalone test case, added to
TestFieldsReader.

I found & fixed the issue.  It's in BufferedIndexInput's refill()
method.  The problem is that the method changes bufferLength even if an
exception is hit.  This leaves incorrect bytes in the buffer such that
a subsequent readByte will return the incorrect bytes.

The fix is simple: use a local "int newLength" and only assign that
value to bufferLength if the readInternal() call succeeds.  The test
fails without the fix and passes with it.
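
The idea behind the fix can be sketched as follows. SketchBufferedInput and FlakyArrayInput are simplified, hypothetical classes written for this illustration; they are not the code in the attached patch, and the field names only loosely follow the description above.

```java
// Hypothetical buffered input; a simplified stand-in, not Lucene's BufferedIndexInput.
abstract class SketchBufferedInput {
    protected final byte[] buffer = new byte[16];
    protected int bufferLength = 0;   // number of valid bytes in buffer
    protected int bufferPosition = 0; // next byte to read from buffer
    protected long bufferStart = 0;   // file position of buffer[0]

    // Reads len bytes starting at the given file position into b; may throw.
    protected abstract void readInternal(byte[] b, long position, int len)
            throws java.io.IOException;

    protected abstract long length();

    protected void refill() throws java.io.IOException {
        long start = bufferStart + bufferPosition;
        long end = Math.min(start + buffer.length, length());
        int newLength = (int) (end - start);     // kept local until the read succeeds
        if (newLength <= 0) {
            throw new java.io.IOException("read past EOF");
        }
        readInternal(buffer, start, newLength);  // if this throws, no field has changed
        bufferLength = newLength;                // publish state only after success
        bufferStart = start;
        bufferPosition = 0;
    }

    byte readByte() throws java.io.IOException {
        if (bufferPosition >= bufferLength) {
            refill();
        }
        return buffer[bufferPosition++];
    }
}

// In-memory input whose reads can be made to fail, to simulate a lost network share.
class FlakyArrayInput extends SketchBufferedInput {
    private final byte[] data;
    boolean failing = false;

    FlakyArrayInput(byte[] data) {
        this.data = data;
    }

    @Override
    protected long length() {
        return data.length;
    }

    @Override
    protected void readInternal(byte[] b, long position, int len)
            throws java.io.IOException {
        if (failing) {
            throw new java.io.IOException("simulated network failure");
        }
        System.arraycopy(data, (int) position, b, 0, len);
    }
}
```

Because the fields are only updated after readInternal() returns, a failed refill leaves the input looking empty, so the next readByte retries the read and surfaces the same IOException instead of handing back stale bytes.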

> IndexOutOfBoundsException from FieldsReader after problem reading the index
> ---
>
>                 Key: LUCENE-1262
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1262
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: 2.3.1
>            Reporter: Trejkaz
>         Attachments: LUCENE-1262.patch, Test.java
>
>
> There is a situation where there is an IOException reading from Hits, and 
> then the next time you get a NullPointerException instead of an IOException.
> Example stack traces:
> java.io.IOException: The specified network name is no longer available
>   at java.io.RandomAccessFile.readBytes(Native Method)
>   at java.io.RandomAccessFile.read(RandomAccessFile.java:322)
>   at org.apache.lucene.store.FSIndexInput.readInternal(FSDirectory.java:536)
>   at org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:74)
>   at org.apache.lucene.index.CompoundFileReader$CSIndexInput.readInternal(CompoundFileReader.java:220)
>   at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:93)
>   at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:34)
>   at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:57)
>   at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:88)
>   at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:344)
>   at org.apache.lucene.index.IndexReader.document(IndexReader.java:368)
>   at org.apache.lucene.search.IndexSearcher.doc(IndexSearcher.java:84)
>   at org.apache.lucene.search.Hits.doc(Hits.java:104)
> That error is fine.  The problem is the next call to doc generates:
> java.lang.NullPointerException
>   at org.apache.lucene.index.FieldsReader.getIndexType(FieldsReader.java:280)
>   at org.apache.lucene.index.FieldsReader.addField(FieldsReader.java:216)
>   at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:101)
>   at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:344)
>   at org.apache.lucene.index.IndexReader.document(IndexReader.java:368)
>   at org.apache.lucene.search.IndexSearcher.doc(IndexSearcher.java:84)
>   at org.apache.lucene.search.Hits.doc(Hits.java:104)
> Presumably FieldsReader is caching partially-initialised data somewhere.  I 
> would normally expect the exact same IOException to be thrown for subsequent 
> calls to the method.




[jira] Commented: (LUCENE-1262) IndexOutOfBoundsException from FieldsReader after problem reading the index

2008-04-11 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12587896#action_12587896
 ] 

Michael McCandless commented on LUCENE-1262:


OK indeed I can get the failure to happen, using your Test running against a 
partial Wikipedia index I have.  I'll pursue!  Thanks Trejkaz.





Re: [jira] Created: (LUCENE-1257) Port to Java5

2008-04-11 Thread Endre Stølsvik

robert engels wrote:



The 'foreach' should be faster in the general case for arrays as the 
bounds checking can be avoided.


Why is that? Where do you mean that the bounds-checking can be avoided?



But, I doubt the speed difference is going to matter much either way, 
and eventually the JVM impl will converge to near equal performance.


This I actually agree on: if the foreach has some disadvantage of 
explicit indexing through an array, it will at some point be fixed so 
that it doesn't have this disadvantage anymore..


Endre.
