[jira] [Commented] (LUCENE-3606) Make IndexReader really read-only in Lucene 4.0

Robert Muir (Commented) (JIRA) Wed, 30 Nov 2011 17:42:03 -0800

    [ https://issues.apache.org/jira/browse/LUCENE-3606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160559#comment-13160559 ]


Robert Muir commented on LUCENE-3606:
-------------------------------------

{quote}
Finally, the "holy grail", where similarities can declare the normalization 
factor(s) they need, using byte/float/int/whatever, and it's all unified with 
the docvalues API. IndexReader.norms() maybe goes away here, and maybe 
NormsFormat too.
{quote}

Thinking about this: a clean way to do it would be for Similarity to get a new 
method:
{code}
ValueType getValueType();
{code}

and we would change:
{code}
byte computeNorm(FieldInvertState state);
{code}
to:
{code}
void computeNorm(FieldInvertState state, PerDocFieldValues norm);
{code}
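
To make the shape of that change concrete, here is a minimal, self-contained
sketch. ValueType, FieldInvertState, PerDocFieldValues and Similarity below
are simplified stand-ins declared only for this example, not Lucene's actual
classes. FloatNormSimilarity, RecordingValues and the boost/sqrt(length)
formula are illustrative assumptions.

```java
// Hypothetical stand-ins, reduced to what this sketch needs.
enum ValueType { FIXED_INTS_8, FLOAT_32, BYTES_FIXED_STRAIGHT }

final class FieldInvertState {
    final int length;   // number of terms seen in the field
    final float boost;  // index-time boost
    FieldInvertState(int length, float boost) {
        this.length = length;
        this.boost = boost;
    }
}

// Sink the similarity writes its norm into (assumed setter names).
interface PerDocFieldValues {
    void setInt(long value);
    void setFloat(float value);
    void setBytes(byte[] value);
}

abstract class Similarity {
    // Declares how this similarity wants its norm stored.
    abstract ValueType getValueType();
    // Writes the norm instead of returning a single byte.
    abstract void computeNorm(FieldInvertState state, PerDocFieldValues norm);
}

// Example similarity that stores an unencoded float norm.
final class FloatNormSimilarity extends Similarity {
    @Override ValueType getValueType() { return ValueType.FLOAT_32; }
    @Override void computeNorm(FieldInvertState state, PerDocFieldValues norm) {
        norm.setFloat(state.boost / (float) Math.sqrt(state.length));
    }
}

// Test double that remembers the last value written.
final class RecordingValues implements PerDocFieldValues {
    Object last;
    @Override public void setInt(long value) { last = value; }
    @Override public void setFloat(float value) { last = value; }
    @Override public void setBytes(byte[] value) { last = value; }
}
```

The point of the void signature is that the similarity, not the indexer,
picks the representation: the indexer only needs getValueType() to know
what kind of value to expect.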

Sims that want to encode multiple index-time scoring factors separately 
could just use BYTES_FIXED_STRAIGHT. This should only be needed by a few rare
sims anyway, because a Sim can already pull named, application-specific
scoring factors from IR.perDocValues() today.
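
As a rough illustration of the fixed-width bytes idea, a sim could pack
several factors into one byte[] of constant size per document. The
PackedFactors class and the choice of factors (field length plus boost) are
assumptions made up for this sketch; only standard java.nio.ByteBuffer calls
are used.

```java
// Hypothetical helper: packs two scoring factors into a fixed 8-byte value.
final class PackedFactors {
    // 4 bytes for the field length + 4 bytes for the boost.
    static final int WIDTH = Integer.BYTES + Float.BYTES;

    static byte[] pack(int fieldLength, float boost) {
        return java.nio.ByteBuffer.allocate(WIDTH)
                .putInt(fieldLength)
                .putFloat(boost)
                .array();
    }

    // Reads the field length back from offset 0.
    static int fieldLength(byte[] packed) {
        return java.nio.ByteBuffer.wrap(packed).getInt(0);
    }

    // Reads the boost back from the offset just after the int.
    static float boost(byte[] packed) {
        return java.nio.ByteBuffer.wrap(packed).getFloat(Integer.BYTES);
    }
}
```

Because every document's value has the same width, each factor can be read
back at a fixed offset without any per-document length prefix.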

It's not too crazy either, since sims are already doing their own encoding,
so e.g. the default sim would just use FIXED_INTS_8.
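
For example, a sim that owns its encoding might squeeze its norm into one
byte before handing it over as an 8-bit integer. The sketch below uses a
naive linear quantization as a stand-in; it is not the SmallFloat encoding
the default similarity actually uses, and ByteNormSketch and its method
names are hypothetical.

```java
// Hypothetical single-byte norm codec (naive linear quantization).
final class ByteNormSketch {
    // Assumed norm formula: boost / sqrt(fieldLength), guarding against length 0.
    static float rawNorm(int fieldLength, float boost) {
        return boost / (float) Math.sqrt(Math.max(1, fieldLength));
    }

    // Clamp to [0, 1] and map onto the 256 values a byte can hold.
    static byte encode(float norm) {
        float clamped = Math.max(0f, Math.min(1f, norm));
        return (byte) Math.round(clamped * 255f);
    }

    // Treat the stored byte as unsigned and map back to [0, 1].
    static float decode(byte stored) {
        return (stored & 0xFF) / 255f;
    }
}
```

The round trip is lossy by design: decode(encode(x)) is only guaranteed to
be within half a quantization step (1/510) of x for x in [0, 1].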

People who don't want to mess with bytes or SmallFloat could use things
like FLOAT_32 if they want and need this.

We would just change FieldInfo.omitNorms to instead be FieldInfo.normValueType,
which is the value type of the norm (null if it's omitted, just like 
docValueType).

The preflex FieldInfosReader would just set FIXED_INTS_8 or null, based on
whether the fieldinfos had omitNorms or not. It doesn't support
any other types.
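
The backwards-compatibility mapping described above is small enough to show
in full. NormType and PreflexNormTypes are hypothetical names for this
sketch, not the real FieldInfosReader code.

```java
// Hypothetical stand-in for the norm value type; old indexes only know one.
enum NormType { FIXED_INTS_8 }

final class PreflexNormTypes {
    // Old-format fields either omit norms (null) or store a single byte.
    static NormType normValueType(boolean omitNorms) {
        return omitNorms ? null : NormType.FIXED_INTS_8;
    }
}
```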

Finally, sims would then own their scoring factors, and we could
even remove omitNorms from Field/FieldType etc. (just use the correct 
scoring algorithm for the field: if you don't want norms, use a sim
that doesn't need them for scoring).

This would remove the awkward/messy situation where every similarity 
implementation we have has to 'downgrade' itself to handle cases where
the user decided to omit parts of its formula!
                
> Make IndexReader really read-only in Lucene 4.0
> -----------------------------------------------
>
>                 Key: LUCENE-3606
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3606
>             Project: Lucene - Java
>          Issue Type: Task
>          Components: core/index
>    Affects Versions: 4.0
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>
> As we change the API completely in Lucene 4.0, we are also free to remove 
> read-write access and commits from IndexReader. This code is so hairy and 
> buggy (as investigated by Robert and Mike today) when you work on the 
> SegmentReader level but forget to flush in the DirectoryReader, so it's better 
> to really make IndexReaders read-only.
> Currently with IndexReader you can do things like:
> - delete/undelete Documents -> Can be done with IndexWriter, too (using 
> deleteByQuery)
> - change norms -> this is a bad idea in general, but once we remove norms 
> entirely and replace them with DocValues this is obsolete anyway. Changing 
> DocValues should also be done using IndexWriter in trunk (once it is ready)


        
