On Tue, Sep 9, 2008 at 5:28 AM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
> Yonik Seeley wrote:
>> What about something like term freq?  Would it need to count the
>> number of docs after the local maxDoc or is there a better way?
>
> Good question...
>
> I think we'd have to take a full copy of the term -> termFreq on reopen?  I
> don't see how else to do it (I don't understand your suggestion above).  So,
> this will clearly add to the cost of reopen.

One could adjust the freq by iterating over the term's documents:
skipTo(localMaxDoc), count how many are at or after that point, then
subtract that count from the freq.  I didn't say it was a *good* idea :-)
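
To make the skip-and-subtract idea concrete, here is a rough sketch in
plain Java.  It does not use any Lucene classes: the sorted int[] is a
hypothetical stand-in for one term's postings, and the names
countDocsAtOrAfter and adjustedDocFreq are made up for illustration.
It assumes localMaxDoc is exclusive, i.e. the reader sees only doc IDs
below it.

```java
import java.util.Arrays;

public class AdjustedDocFreq {

    /**
     * Counts postings with doc ID >= target.
     * docs is a hypothetical postings list: sorted, unique doc IDs for one term.
     */
    public static int countDocsAtOrAfter(int[] docs, int target) {
        // Emulates skipTo(target): position on the first doc ID >= target.
        int pos = Arrays.binarySearch(docs, target);
        if (pos < 0) {
            pos = -pos - 1; // not found: convert to the insertion point
        }
        // Emulates calling next() until the postings are exhausted.
        int count = 0;
        for (int i = pos; i < docs.length; i++) {
            count++;
        }
        return count;
    }

    /** Doc freq as seen by a reader that only knows docs below localMaxDoc. */
    public static int adjustedDocFreq(int[] docs, int writerDocFreq, int localMaxDoc) {
        return writerDocFreq - countDocsAtOrAfter(docs, localMaxDoc);
    }

    public static void main(String[] args) {
        int[] docs = {1, 4, 7, 9, 12};   // term appears in 5 docs overall
        // A reader opened when maxDoc was 8 should not see docs 9 and 12.
        System.out.println(adjustedDocFreq(docs, docs.length, 8));
    }
}
```

The cost is what makes this unattractive: each freq lookup has to walk
the tail of the postings instead of reading a single stored number.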

>>> For reading stored fields and term vectors, which are now flushed
>>> immediately to disk, we need to somehow get an IndexInput from the
>>> IndexOutputs that IndexWriter holds open on these files.  Or, maybe, just
>>> open new IndexInputs?
>>
>> Hmmm, seems like a case of our nice and simple Directory model not
>> having quite enough features in this case.
>
> I think we can simply open IndexInputs on these files.  I believe Java does
> the right thing on Windows, such that if we are already writing to the file,
> it does not prevent another file handle from opening the file for reading.

Yeah, I think the underlying RandomAccessFile might do the right
thing, but IndexInput isn't required to see any changes on the fly
(and current implementations don't) so at a minimum it would be a
change of IndexInput semantics.  Maybe there would need to be a
refresh() function added, or we would need to require a specific
Directory impl?

OR, if all writes are append-only, perhaps we don't ever need to
invalidate the read buffer and would just need to remove the current
logic that caches the file length and then let the underlying
RandomAccessFile do the EOF checking.
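
A rough sketch of that append-only idea, in plain Java rather than as a
real IndexInput: the class AppendOnlyReader and its demo() helper are
hypothetical names, not Lucene API.  The reader never caches the file
length and never throws away buffered bytes; at each refill it simply
asks the RandomAccessFile for more and treats -1 as "nothing more yet".

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;

public class AppendOnlyReader implements AutoCloseable {
    private final RandomAccessFile in;
    private final byte[] buffer = new byte[16];
    private long bufferStart = 0; // file offset of buffer[0]
    private int bufferLength = 0; // number of valid bytes in buffer
    private int bufferPos = 0;    // next byte to hand out

    public AppendOnlyReader(File file) throws IOException {
        this.in = new RandomAccessFile(file, "r");
    }

    /** Returns the next byte (0..255), or -1 if the file has no more bytes right now. */
    public int readByte() throws IOException {
        if (bufferPos == bufferLength) {
            // No cached length: let the underlying file do the EOF check.
            long next = bufferStart + bufferLength;
            in.seek(next);
            int n = in.read(buffer, 0, buffer.length);
            if (n <= 0) {
                return -1; // state is unchanged, so a later call can retry
            }
            bufferStart = next;
            bufferLength = n;
            bufferPos = 0;
        }
        return buffer[bufferPos++] & 0xFF;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }

    /** Writes "abc", reads to EOF, appends "de", then reads again with the same reader. */
    public static String demo() {
        try {
            File file = File.createTempFile("append-only", ".bin");
            StringBuilder seen = new StringBuilder();
            try (RandomAccessFile out = new RandomAccessFile(file, "rw");
                 AppendOnlyReader reader = new AppendOnlyReader(file)) {
                out.write(new byte[] {'a', 'b', 'c'});
                for (int b = reader.readByte(); b != -1; b = reader.readByte()) {
                    seen.append((char) b);
                }
                seen.append('|');
                out.write(new byte[] {'d', 'e'}); // appended after the reader hit EOF
                for (int b = reader.readByte(); b != -1; b = reader.readByte()) {
                    seen.append((char) b);
                }
            } finally {
                file.delete();
            }
            return seen.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

This only holds up because the writer never rewrites bytes it has
already written; anything that seeks back and patches a header would
still need some kind of buffer invalidation.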

-Yonik
