[jira] Commented: (LUCENE-1461) Cached filter for a single term field

2008-11-20 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649298#action_12649298
 ] 

Paul Elschot commented on LUCENE-1461:
--

For fields that have no more distinct values than fit into a short (at best 
2^16, i.e. 65536), using a short[] would make sense, I think. Since the number 
of distinct field values can simply be counted in this context, it would be 
easy to replace the int[] with a short[] in that case. But that would only 
reduce space, and only by a factor of two.

For a set-based query, the problem boils down to doing integer set membership 
in the iterator. For small sets, binary search should be fine. For larger ones 
an OpenBitSet would be preferable, but in this context that would only be 
feasible when the number of different terms is a lot smaller than the number of 
documents in the index.
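
To make the iterator idea concrete, a minimal sketch (with hypothetical names)
could look like the following: 'ords' maps each document to its term ordinal,
and 'acceptedOrds' is the sorted set of ordinals the filter accepts.
{noformat}
import java.util.Arrays;

// Sketch only: binary search membership test inside a filter's iterator.
final class SortedOrdinalSet {
  private final int[] ords;          // per-document term ordinal (hypothetical)
  private final int[] acceptedOrds;  // sorted ordinals accepted by the filter

  SortedOrdinalSet(int[] ords, int[] acceptedOrds) {
    this.ords = ords;
    this.acceptedOrds = acceptedOrds;
  }

  // For a small set this is cheap; for a large set an OpenBitSet indexed
  // by ordinal avoids the log factor at the cost of more memory.
  boolean accepts(int docId) {
    return Arrays.binarySearch(acceptedOrds, ords[docId]) >= 0;
  }
}
{noformat}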

For location grid-blocks one needs to deal with more than one dimension. In 
such cases my first thought is to use indexed hierarchical prefixes in each 
dimension, because that allows skipTo() to be used on the documents for the 
intersection between the dimensions. (But there may be better ways; it's been 
a long time since I looked at the literature on this.)
Do you need to index separate lower bounds and upper bounds on the data? That 
would complicate things.
Without indexed bounds (i.e. point data only) in each dimension, it could make 
sense to use this multi-range filter.



> Cached filter for a single term field
> -
>
> Key: LUCENE-1461
> URL: https://issues.apache.org/jira/browse/LUCENE-1461
> Project: Lucene - Java
>  Issue Type: New Feature
>Reporter: Tim Sturge
> Attachments: DisjointMultiFilter.java, RangeMultiFilter.java
>
>
> These classes implement inexpensive range filtering over a field containing a 
> single term. They do this by building an integer array of term numbers 
> (storing the term->number mapping in a TreeMap) and then implementing a fast 
> integer-comparison-based DocIdSetIterator.
> This code is currently being used to do age range filtering, but could also 
> be used to do other date filtering or in any application where there need to 
> be multiple filters based on the same single term field. I have an untested 
> implementation of single term filtering and have considered but not yet 
> implemented term set filtering (useful for location based searches) as well. 
> The code here is fairly rough; it works but lacks javadocs and toString() and 
> hashCode() methods etc. I'm posting it here to discover if there is other 
> interest in this feature; I don't mind fixing it up but would hate to go to 
> the effort if it's not going to make it into Lucene.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-11-20 Thread Alex Vigdor (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649385#action_12649385
 ] 

Alex Vigdor commented on LUCENE-831:


To be honest, the cache never successfully refilled before the patch - or at 
least I gave up after waiting 10 minutes.  I was about to give up on sorting.  
It could have to do with the fact that we're running with a relatively modest 
amount of RAM (768M) given our index size. But with the patch at least sorting 
is a realistic option!

I will look at adding the warming to my own code as you suggest; it is another 
peculiarity of this project that I can't know in the code what fields will be 
used for sorting, but I'll just track the searches coming through and aggregate 
any sorts they perform into a warming query.
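
For the record, the warming I have in mind is roughly the sketch below (the
field tracking and names are my own, and I'm assuming string sorts here):
replay one throwaway search per observed sort field against the new reader so
the FieldCache entries get built before real traffic hits it.
{noformat}
import java.io.IOException;
import java.util.Set;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;

// Rough sketch, not production code: warm a freshly opened reader by
// replaying the sort fields we've seen in live searches so far.
public class SortWarmer {
  public void warm(IndexReader reader, Set<String> observedSortFields)
      throws IOException {
    IndexSearcher searcher = new IndexSearcher(reader);
    for (String field : observedSortFields) {
      // One cheap search per field forces the cache entry to be built.
      searcher.search(new MatchAllDocsQuery(), null, 1,
          new Sort(new SortField(field, SortField.STRING)));
    }
  }
}
{noformat}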

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
> Fix For: 3.0
>
> Attachments: fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch
>
>
> Motivation:
> 1) Completely overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completely independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundant caching as client code
> migrates to the new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-11-20 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649391#action_12649391
 ] 

Mark Miller commented on LUCENE-831:


bq. i haven't had any time to do further work on this issue ... partly because 
i haven't had a lot of time, but mainly because i'm hoping to get some feedback 
on the overall approach before any more serious effort investment. 

Where's that investment, Hoss? You've orphaned your baby. There is a fairly 
decent amount of feedback here.

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
> Fix For: 3.0
>
> Attachments: fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch
>
>
> Motivation:
> 1) Completely overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completely independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundant caching as client code
> migrates to the new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-11-20 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649393#action_12649393
 ] 

Mark Miller commented on LUCENE-831:


I think this would actually be better if all CacheKey types had to implement 
both ObjectArray access as well as primitive array access. It makes the code 
cleaner and cuts down on the CacheKey explosion. I should have done it this 
way to start, but couldn't see the forest for the trees back then, I suppose.
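
Loosely, the shape I have in mind is something like the sketch below (names
are hypothetical, not the API in the attached patches): a single key/value
type exposes both views, so callers pick whichever access pattern they need.
{noformat}
// Hypothetical sketch only -- not the LUCENE-831 patch API.
// One cached-value type offers both an ObjectArray-style view and a
// primitive view, instead of a separate key class for every combination.
public interface CachedArray {
  /** Number of documents covered by this cached array. */
  int size();

  /** Boxed, ObjectArray-style access, convenient for generic code. */
  Object getObject(int docId);

  /** Primitive access for tight loops (e.g. sorting, filtering). */
  long getLong(int docId);
}
{noformat}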

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
> Fix For: 3.0
>
> Attachments: fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch
>
>
> Motivation:
> 1) Completely overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completely independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundant caching as client code
> migrates to the new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Commented: (LUCENE-1461) Cached filter for a single term field

2008-11-20 Thread Tim Sturge (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649494#action_12649494
 ] 

Tim Sturge commented on LUCENE-1461:


For small subsets of a large set (in my case around 1000 out of 1 million) I 
suspect a simple open hash may perform better than a binary search.
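
The open hash I'm thinking of is nothing fancy; roughly the sketch below
(hypothetical, not part of the attached files): linear probing over an int[]
sized to a power of two, with 0 reserved as the empty-slot marker, so
non-negative ordinals are stored shifted by one.
{noformat}
// Sketch only: open-addressed int set for non-negative term ordinals.
final class IntOpenHashSet {
  private final int[] table;  // 0 means empty; values are stored as ord + 1
  private final int mask;

  IntOpenHashSet(int expectedSize) {
    // ~4x head room keeps the probe chains short.
    int cap = Integer.highestOneBit(Math.max(4, expectedSize * 4) - 1) << 1;
    table = new int[cap];
    mask = cap - 1;
  }

  void add(int ord) {
    int v = ord + 1;
    int slot = (v * 0x9E3779B1) & mask;   // cheap multiplicative hash
    while (table[slot] != 0 && table[slot] != v) {
      slot = (slot + 1) & mask;           // linear probe
    }
    table[slot] = v;
  }

  boolean contains(int ord) {
    int v = ord + 1;
    int slot = (v * 0x9E3779B1) & mask;
    while (table[slot] != 0) {
      if (table[slot] == v) {
        return true;
      }
      slot = (slot + 1) & mask;
    }
    return false;
  }
}
{noformat}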

For location blocks (point data) my plan is just to number the grid with N^2 
numbers and create a set based on a circle around the desired place. Ideally 
this solution doesn't degrade with circle size, so it's not necessary to do 
hierarchical prefixes, but I don't have benchmarks to support or refute that 
assumption. Agreed, bounded locations make this much trickier.




> Cached filter for a single term field
> -
>
> Key: LUCENE-1461
> URL: https://issues.apache.org/jira/browse/LUCENE-1461
> Project: Lucene - Java
>  Issue Type: New Feature
>Reporter: Tim Sturge
> Attachments: DisjointMultiFilter.java, RangeMultiFilter.java
>
>
> These classes implement inexpensive range filtering over a field containing a 
> single term. They do this by building an integer array of term numbers 
> (storing the term->number mapping in a TreeMap) and then implementing a fast 
> integer-comparison-based DocIdSetIterator.
> This code is currently being used to do age range filtering, but could also 
> be used to do other date filtering or in any application where there need to 
> be multiple filters based on the same single term field. I have an untested 
> implementation of single term filtering and have considered but not yet 
> implemented term set filtering (useful for location based searches) as well. 
> The code here is fairly rough; it works but lacks javadocs and toString() and 
> hashCode() methods etc. I'm posting it here to discover if there is other 
> interest in this feature; I don't mind fixing it up but would hate to go to 
> the effort if it's not going to make it into Lucene.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: [jira] Commented: (LUCENE-1458) Further steps towards flexible indexing

2008-11-20 Thread Mike Klaas


On 19-Nov-08, at 5:12 AM, Michael McCandless (JIRA) wrote:


How can the VM system possibly make good decisions about what to swap
out?  It can't know if a page is being used for terms dict index,
terms dict, norms, stored fields, postings.  LRU is not a good policy,
because some pages (terms index) are far far more costly to miss than
others.


A note on this discussion: we recently re-architected a large database-y,
lucene-y system to use mmap-based storage and are extremely pleased with
the performance. Sharing the buffers among processes is rather cool, as
Marvin mentions, as is the near-instantaneous startup.


-Mike

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Commented: (LUCENE-1458) Further steps towards flexible indexing

2008-11-20 Thread Marvin Humphrey (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649569#action_12649569
 ] 

Marvin Humphrey commented on LUCENE-1458:
-

> Take a large Jira instance, where the app itself is also
> consuming a lot of RAM, doing a lot of its own IO, etc., where perhaps
> searching is done infrequently enough relative to other operations
> that the OS may no longer think the pages you hit for the terms index
> are hot enough to keep around.

Search responsiveness is already compromised in such a situation, because we
can all but guarantee that the posting list files have already been evicted
from cache.  If the box has enough RAM for the large JIRA instance including
the Lucene index, search responsiveness won't be a problem.  As soon as you
start running a little short on RAM, though, there's no way to stop infrequent
searches from being sluggish.  

Nevertheless, the terms index isn't that big in comparison to, say, the size
of a posting list for a common term, so the cost of re-heating it isn't
astronomical in the grand scheme of things.

> Similarly, when a BG merge is burning through data, or say backup kicks off
> and moves many GB, or the simple act of iterating through a big postings
> list, the OS will gleefully evict my terms index or norms in order to
> populate its IO cache with data it will need again for a very long time.

When that background merge finishes, the new files will be hot.  So, if we
open a new IndexReader right away and that IndexReader uses mmap() to get at
the file data, the new segments will be responsive right away.

Even better, any IO caches for old segments used by the previous IndexReader
may still be warm.  All of this without having to decompress a bunch of stream
data into per-process data structures at IndexReader startup.

The terms index could indeed get evicted some of the time on busy systems, but
the point is that the system IO cache usually works in our favor, even under
load.

As far as backup daemons blowing up everybody's cache go, that's stupid,
pathological behavior.  Such apps ought to be calling
madvise(ptr, len, MADV_SEQUENTIAL) so that the kernel knows it can recycle
the cache pages as soon as they're cleared.

>> But hey, we can simplify even further! How about dispensing with the index
>> file? We can just divide the main dictionary file into blocks and binary
>> search on that.
> 
> I'm not convinced this'll be a win in practice. You are now paying an
> even higher overhead cost for each "check" of your binary search,
> especially with something like pulsing which inlines more stuff into
> the terms dict. I agree it's simpler, but I think that's trumped by
> the performance hit.

I'm persuaded that we shouldn't do away with the terms index.  Even if we're
operating on a dedicated search box with gobs of RAM, loading entire cache
pages when we only care about the first few bytes of each is poor use of
memory bandwidth.  And, just in case the cache does get blown, we'd like to
keep the cost of rewarming down.

Nathan Kurz and I brainstormed this subject in a phone call this morning, and
we came up with a three-file lexicon index design:

  * A file which is a solid stack of 64-bit file pointers into the lexicon
index term data.  Term data UTF-8 byte length can be determined by
subtracting the current pointer from the next one (or the file length at
the end).
  * A file which contains solid UTF-8 term content.  (No string lengths, no
file pointers, just character data.)
  * A file which is a solid stack of 64-bit file pointers into the primary
lexicon.

Since the integers are already expanded and the raw UTF-8 data can be compared
as-is, those files can be memory-mapped and used as-is for binary search.

> In Lucene java, the concurrency model we are aiming for is a single JVM
> sharing a single instance of IndexReader. 

When I mentioned this to Nate, he remarked that we're using the OS kernel like
you're using the JVM.  

We don't keep a single IndexReader around, but we do keep the bulk of its data
cached so that we can just slap a cheap wrapper around it.

> I do agree, if fork() is the basis of your concurrency model then sharing
> pages becomes critical.  However, modern OSs implement copy-on-write sharing
> of VM pages after a fork, so that's another good path to sharing?

Lucy/KS can't enforce that, and we wouldn't want to.  It's very convenient to
be able to launch a cheap search process.

> Have you tried any actual tests swapping these approaches in as your
> terms index impl? 

No -- changing something like this requires a lot of coding, so it's better to
do thought experiments first to winnow down the options.

> Tests of fully hot and fully cold ends of the
> spectrum would be interesting, but also tests where a big segment
> merge or a backup is running in the background.

[jira] Commented: (LUCENE-1461) Cached filter for a single term field

2008-11-20 Thread Tim Sturge (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649575#action_12649575
 ] 

Tim Sturge commented on LUCENE-1461:


I tried a short[] array and it is about 20% faster than the int[] array (I'm 
assuming this is a memory bandwidth issue.)

I also tried replacing the catch of the ArrayIndexOutOfBoundsException with an 
explicit check in the loop, and discovered that the exception handling is 
about 3% faster.
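
For reference, the two loop shapes I compared look roughly like this (field
names are made up, not the attached code):
{noformat}
// Sketch of the two next() loop variants measured above.
final class RangeScanSketch {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  // Variant A: explicit bounds check on every iteration.
  static int nextChecked(int[] ords, int doc, int lowerOrd, int upperOrd) {
    while (doc < ords.length) {
      int ord = ords[doc];
      if (ord >= lowerOrd && ord <= upperOrd) {
        return doc;
      }
      doc++;
    }
    return NO_MORE_DOCS;
  }

  // Variant B: run off the end and catch the exception; the explicit test
  // leaves the hot loop, which presumably accounts for the ~3% difference.
  static int nextUnchecked(int[] ords, int doc, int lowerOrd, int upperOrd) {
    try {
      while (true) {
        int ord = ords[doc];
        if (ord >= lowerOrd && ord <= upperOrd) {
          return doc;
        }
        doc++;
      }
    } catch (ArrayIndexOutOfBoundsException e) {
      return NO_MORE_DOCS;
    }
  }
}
{noformat}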

Finally, I implemented TermMultiFilter as well which has about the same 
performance characteristics.

> Cached filter for a single term field
> -
>
> Key: LUCENE-1461
> URL: https://issues.apache.org/jira/browse/LUCENE-1461
> Project: Lucene - Java
>  Issue Type: New Feature
>Reporter: Tim Sturge
> Attachments: DisjointMultiFilter.java, RangeMultiFilter.java, 
> TermMultiFilter.java
>
>
> These classes implement inexpensive range filtering over a field containing a 
> single term. They do this by building an integer array of term numbers 
> (storing the term->number mapping in a TreeMap) and then implementing a fast 
> integer-comparison-based DocIdSetIterator.
> This code is currently being used to do age range filtering, but could also 
> be used to do other date filtering or in any application where there need to 
> be multiple filters based on the same single term field. I have an untested 
> implementation of single term filtering and have considered but not yet 
> implemented term set filtering (useful for location based searches) as well. 
> The code here is fairly rough; it works but lacks javadocs and toString() and 
> hashCode() methods etc. I'm posting it here to discover if there is other 
> interest in this feature; I don't mind fixing it up but would hate to go to 
> the effort if it's not going to make it into Lucene.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Updated: (LUCENE-1461) Cached filter for a single term field

2008-11-20 Thread Tim Sturge (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Sturge updated LUCENE-1461:
---

Attachment: TermMultiFilter.java

Added TermMultiFilter.java

> Cached filter for a single term field
> -
>
> Key: LUCENE-1461
> URL: https://issues.apache.org/jira/browse/LUCENE-1461
> Project: Lucene - Java
>  Issue Type: New Feature
>Reporter: Tim Sturge
> Attachments: DisjointMultiFilter.java, RangeMultiFilter.java, 
> TermMultiFilter.java
>
>
> These classes implement inexpensive range filtering over a field containing a 
> single term. They do this by building an integer array of term numbers 
> (storing the term->number mapping in a TreeMap) and then implementing a fast 
> integer-comparison-based DocIdSetIterator.
> This code is currently being used to do age range filtering, but could also 
> be used to do other date filtering or in any application where there need to 
> be multiple filters based on the same single term field. I have an untested 
> implementation of single term filtering and have considered but not yet 
> implemented term set filtering (useful for location based searches) as well. 
> The code here is fairly rough; it works but lacks javadocs and toString() and 
> hashCode() methods etc. I'm posting it here to discover if there is other 
> interest in this feature; I don't mind fixing it up but would hate to go to 
> the effort if it's not going to make it into Lucene.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Updated: (LUCENE-1001) Add Payload retrieval to Spans

2008-11-20 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated LUCENE-1001:


Attachment: LUCENE-1001-fix.patch

Okay, I still understand like 2% of spans, but I think I have fixed the bug.

After finding a match, but before finding a min match, we were pulling the 
payload - that works fine when the match is the min match, but otherwise we 
actually have to wait to get the payload until we have crunched in on the min 
match. I had an idea of this before, and the code before I touched it tried to 
grab the payloads at that point - the problem is, in finding the min match, 
you've often advanced past the term position of interest only to find out 
there was no such min match. So you have to save the possible payload ahead of 
time, and either find a new one or use the saved one. It sucks to have to add 
the extra loading, but at the moment I don't see how to do it differently (I 
admittedly can't see much in spans). That's all partly a guess, partly 
probably true.
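
In other words, the control flow is roughly this sketch (invented names, not
the actual patch):
{noformat}
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Sketch only: remember the payload seen at the first match, because
// shrinking toward the minimal match may advance past that position.
abstract class PayloadCaptureSketch {
  protected List<byte[]> matchPayload;

  // Hypothetical hooks standing in for the real spans machinery.
  abstract boolean payloadAvailable() throws IOException;
  abstract Collection<byte[]> currentPayload() throws IOException;
  abstract boolean shrinkToMinMatch() throws IOException;

  void collectPayloadForMatch() throws IOException {
    List<byte[]> saved = null;
    if (payloadAvailable()) {
      saved = new ArrayList<byte[]>(currentPayload()); // save before shrinking
    }
    boolean shrunk = shrinkToMinMatch();   // may move past the saved position
    if (shrunk && payloadAvailable()) {
      matchPayload = new ArrayList<byte[]>(currentPayload()); // fresh payload
    } else {
      matchPayload = saved;                // fall back to the saved payload
    }
  }
}
{noformat}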

Nonetheless, this patch handles the previous test cases, plus the bug case 
reported above. I have also added a modified version of the test given for 
the bug to the span battery of tests.

Thanks Jonathan!

> Add Payload retrieval to Spans
> --
>
> Key: LUCENE-1001
> URL: https://issues.apache.org/jira/browse/LUCENE-1001
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Search
>Reporter: Grant Ingersoll
>Assignee: Grant Ingersoll
>Priority: Minor
> Fix For: 2.4
>
> Attachments: LUCENE-1001-fix.patch, LUCENE-1001.patch, 
> LUCENE-1001.patch, LUCENE-1001.patch, LUCENE-1001.patch, LUCENE-1001.patch, 
> LUCENE-1001.patch, LUCENE-1001.patch, LUCENE-1001.patch
>
>
> It will be nice to have access to payloads when doing SpanQuerys.
> See http://www.gossamer-threads.com/lists/lucene/java-dev/52270 and 
> http://www.gossamer-threads.com/lists/lucene/java-dev/51134
> Current API, added to Spans.java is below.  I will try to post a patch as 
> soon as I can figure out how to make it work for unordered spans (I believe I 
> have all the other cases working).
> {noformat}
>  /**
>* Returns the payload data for the current span.
>* This is invalid until {@link #next()} is called for
>* the first time.
>* This method must not be called more than once after each call
>* of {@link #next()}. However, payloads are loaded lazily,
>* so if the payload data for the current position is not needed,
>* this method may not be called at all for performance reasons.
>* 
>* 
>* WARNING: The status of the Payloads feature is experimental.
>* The APIs introduced here might change in the future and will not be
>* supported anymore in such a case.
>*
>* @return a List of byte arrays containing the data of this payload
>* @throws IOException
>*/
>   // TODO: Remove warning after API has been finalized
>   List/*<byte[]>*/ getPayload() throws IOException;
>   /**
>* Checks if a payload can be loaded at this position.
>* 
>* Payloads can only be loaded once per call to
>* {@link #next()}.
>* 
>* 
>* WARNING: The status of the Payloads feature is experimental.
>* The APIs introduced here might change in the future and will not be
>* supported anymore in such a case.
>*
>* @return true if there is a payload available at this position that can 
> be loaded
>*/
>   // TODO: Remove warning after API has been finalized
>   public boolean isPayloadAvailable();
> {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Hudson build is back to normal: Lucene-trunk #652

2008-11-20 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Lucene-trunk/652/changes



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



FileNotFoundException during IndexWriter#commit in lucene-2.4.0

2008-11-20 Thread Andrew Zhang
Hi,

I ran into a FileNotFoundException when using Lucene 2.4.0. Please see the
stack trace [1] below. I checked the code of Lucene 2.4 and found that the
following line throws the FileNotFoundException:

file = new RandomAccessFile(path, "rw");

I checked the system log and found a warning before the exception: unable to
unlink '/sdcard/.servo/.index/_3.fdt' (errno=2)

Any hint as to what the problem is? Perhaps a race condition during commit?

My indexing process was sometimes killed by the system, but I assume that's
OK for Lucene 2.4.

Thanks a lot in advance!

[1]
java.io.FileNotFoundException: /sdcard/.servo/.index/_3.fdt
at
org.apache.harmony.luni.platform.OSFileSystem.open(OSFileSystem.java:227)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:109)
at
org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:639)
at
org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:442)
at org.apache.lucene.index.FieldsWriter.<init>(FieldsWriter.java:62)
at
org.apache.lucene.index.StoredFieldsWriter.initFieldsWriter(StoredFieldsWriter.java:67)
at
org.apache.lucene.index.StoredFieldsWriter.finishDocument(StoredFieldsWriter.java:141)
at
org.apache.lucene.index.StoredFieldsWriter$PerDoc.finish(StoredFieldsWriter.java:187)
at
org.apache.lucene.index.DocumentsWriter$WaitQueue.writeDocument(DocumentsWriter.java:1408)
at
org.apache.lucene.index.DocumentsWriter$WaitQueue.add(DocumentsWriter.java:1427)
at
org.apache.lucene.index.DocumentsWriter.finishDocument(DocumentsWriter.java:1062)
at
org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:768)
at
org.apache.lucene.index.DocumentsWriter.addDocument(DocumentsWriter.java:743)
at
org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1902)
at
org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1880)


-- 
Best regards,
Andrew Zhang

db4o - database for Android: www.db4o.com
http://zhanghuangzhu.blogspot.com/