[jira] [Commented] (LUCENE-4539) DocValues impls should read all headers up-front instead of per-directsource

Robert Muir (JIRA) Tue, 06 Nov 2012 05:52:18 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491454#comment-13491454
 ]


Robert Muir commented on LUCENE-4539:
-------------------------------------

I agree with you that it's bogus how it writes its header.

But I see a downside (I hope we can come up with an idea to deal with it rather 
than keeping the header!)

One advantage of PackedInts writing its own versioning (like FSTs do) is that
lots of things nest them in their own files.

The problem with these two things is that they are themselves changing and
versioned: they aren't like readVInt(), which is pretty much fixed in what it
does.

So having them write their own versions etc. today to some extent makes
back-compat management of file formats easier: today it's just DocValues and
term dictionaries using these things; tomorrow (4.1) it's also the postings
lists (documents, frequencies, and positions), and maybe in the future even
stored fields (LUCENE-4527).

Who is keeping up with all the places that must be managed when a packed ints
version change needs to happen? Today the header encapsulates this in one
place: if I break backwards compatibility for FSTs and that breaks a few
suggester impls, I know anyone using those suggesters will get
IndexFormatTooOldException without me doing anything. So that's very
convenient.
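
To illustrate the point about encapsulation, here is a minimal sketch in plain
Java. It is not Lucene code: SelfVersionedBlock, FormatTooOldException, the
magic number, and the enclosing "file" are all hypothetical stand-ins, and a
ByteBuffer plays the role of the index file. The nested structure checks its
own version, so the enclosing format carries no version logic for it.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for a nested structure (packed ints, an FST) that
// writes and checks its own version header. None of this is Lucene's API.
final class SelfVersionedBlock {
    static final int MAGIC = 0x5041434B;   // arbitrary marker chosen for this sketch
    static final int VERSION_START = 1;    // oldest readable version (raised by a back-compat break)
    static final int VERSION_CURRENT = 1;  // version the writer emits today
    static final int HEADER_BYTES = 3 * Integer.BYTES;

    // Hypothetical analogue of IndexFormatTooOldException.
    static final class FormatTooOldException extends RuntimeException {
        FormatTooOldException(int found) {
            super("version " + found + " is older than minimum supported " + VERSION_START);
        }
    }

    static void write(ByteBuffer out, int version, long[] values) {
        out.putInt(MAGIC).putInt(version).putInt(values.length);
        for (long v : values) {
            out.putLong(v);
        }
    }

    static long[] read(ByteBuffer in) {
        if (in.getInt() != MAGIC) {
            throw new IllegalStateException("not a SelfVersionedBlock");
        }
        int version = in.getInt();
        if (version < VERSION_START) {
            throw new FormatTooOldException(version);
        }
        if (version > VERSION_CURRENT) {
            throw new IllegalStateException("version " + version + " is too new");
        }
        long[] values = new long[in.getInt()];
        for (int i = 0; i < values.length; i++) {
            values[i] = in.getLong();
        }
        return values;
    }

    // Hypothetical enclosing format (think: a suggester file) that nests the block.
    static ByteBuffer writeEnclosingFile(int blockVersion, long[] values) {
        ByteBuffer file = ByteBuffer.allocate(
            Integer.BYTES + HEADER_BYTES + Long.BYTES * values.length);
        file.putInt(7);                     // the enclosing format's own data
        write(file, blockVersion, values);  // nested block brings its own header
        file.flip();
        return file;
    }

    static long[] readEnclosingFile(ByteBuffer file) {
        ByteBuffer in = file.duplicate();
        in.getInt();      // skip the enclosing format's own data
        return read(in);  // no version logic here: the nested block checks itself
    }
}
```

Reading a file whose nested block was written with version 0 fails inside
read(), even though readEnclosingFile() knows nothing about block versions.
Moving the header out of the nested structure would push that check into
every enclosing format.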


                
> DocValues impls should read all headers up-front instead of per-directsource
> ----------------------------------------------------------------------------
>
>                 Key: LUCENE-4539
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4539
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/index
>            Reporter: Robert Muir
>         Attachments: LUCENE-4539.patch
>
>
> Currently, when DocValues opens, it just opens files. It doesn't read codec 
> headers etc.
> Instead we read these every single time a directsource opens. 
> I think it should work like PostingsReaders: e.g. the PackedInts impl would 
> read its versioning info and codec headers up-front, and creating a new Direct 
> impl should be an IndexInput.clone() + getDirectReaderNoHeader().
> Today it's much more costly.
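
The pattern proposed in the description quoted above can be sketched roughly
as follows. This is plain Java rather than Lucene code: UpFrontHeaderReader,
DirectSource, and the file layout are hypothetical, and ByteBuffer.duplicate()
stands in for IndexInput.clone(). The header is parsed once when the reader
opens; each direct source is then just a cheap duplicate positioned past it.

```java
import java.nio.ByteBuffer;

// Hypothetical reader that parses the header once at open time.
final class UpFrontHeaderReader {
    static final int MAGIC = 0x44564C53;  // arbitrary marker chosen for this sketch

    private final ByteBuffer data;  // shared view of the file, used only as a source of duplicates
    private final int valueCount;
    private final int dataStart;    // offset of the first value, just past the header
    final int version;
    int headerReads = 0;            // counts header parses, for illustration only

    UpFrontHeaderReader(ByteBuffer file) {
        ByteBuffer in = file.duplicate();
        if (in.getInt() != MAGIC) {
            throw new IllegalStateException("bad header");
        }
        version = in.getInt();
        valueCount = in.getInt();
        dataStart = in.position();
        data = in;
        headerReads++;
    }

    // Analogue of IndexInput.clone() plus a header-less reader: no parsing here.
    DirectSource newDirectSource() {
        return new DirectSource(data.duplicate(), dataStart, valueCount);
    }

    static final class DirectSource {
        private final ByteBuffer in;
        private final int start;
        private final int count;

        DirectSource(ByteBuffer in, int start, int count) {
            this.in = in;
            this.start = start;
            this.count = count;
        }

        long get(int index) {
            if (index < 0 || index >= count) {
                throw new IndexOutOfBoundsException("index " + index);
            }
            return in.getLong(start + Long.BYTES * index);  // absolute read, no shared position
        }
    }

    // Hypothetical writer for the layout above: magic, version, count, values.
    static ByteBuffer writeFile(long[] values) {
        ByteBuffer file = ByteBuffer.allocate(3 * Integer.BYTES + Long.BYTES * values.length);
        file.putInt(MAGIC).putInt(1).putInt(values.length);
        for (long v : values) {
            file.putLong(v);
        }
        file.flip();
        return file;
    }
}
```

However many direct sources are created, headerReads stays at 1; in the
current behavior described in the issue, the equivalent counter would grow
with every direct source.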

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
