[jira] [Commented] (LUCENE-1761) low level Field metadata is never removed from index

Robert Muir (JIRA) Thu, 03 Apr 2014 17:01:16 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-1761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959450#comment-13959450
 ]


Robert Muir commented on LUCENE-1761:
-------------------------------------

{quote}
and honestly shouldn’t be that hard to fix, right? If dead fields are culled as 
the segments are merged, this would just fix itself naturally wouldn’t it?
{quote}

It's pretty tricky to fix, actually. There is a lot going on here, including 
concurrency concerns with field numbers. Recycling field numbers would be even 
more difficult. The price of a mistake here is index corruption, because of 
how things like bulk merging of stored fields work.

The risk is high, and the use-case for this is... not clear (in your case as 
you describe, it was an app bug). In such a situation I think filtering them 
out with something like addIndexes+FieldFilterAtomicReader is an acceptable 
workaround.
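
To make the idea behind that workaround concrete, here is a toy model in plain Java. It does not call Lucene at all: ToyIndex, deleteWhere and copyWithout are hypothetical names invented for this sketch, and the real addIndexes and FieldFilterAtomicReader calls are not shown. It only illustrates the behaviour described in this issue, under the assumption that field metadata survives document deletion and is carried into a copy unless a filter hides it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model only: not Lucene code and not Lucene's API. */
public class DeadFieldFilterSketch {

    /** Hypothetical stand-in for an index: documents plus a field-name table that only grows. */
    static final class ToyIndex {
        final List<Map<String, String>> docs = new ArrayList<>();
        final Set<String> fieldNames = new TreeSet<>();

        void addDocument(Map<String, String> doc) {
            docs.add(new LinkedHashMap<>(doc));
            fieldNames.addAll(doc.keySet()); // metadata is recorded and never removed
        }

        void deleteWhere(String field) {
            docs.removeIf(d -> d.containsKey(field)); // fieldNames is left untouched
        }
    }

    /**
     * Copies src into a fresh index while hiding the given fields, which is the
     * role a filtering reader plays in the workaround. Unfiltered field metadata
     * is carried over even when no remaining document uses it.
     */
    static ToyIndex copyWithout(ToyIndex src, Set<String> dropped) {
        ToyIndex dst = new ToyIndex();
        for (String name : src.fieldNames) {
            if (!dropped.contains(name)) {
                dst.fieldNames.add(name);
            }
        }
        for (Map<String, String> doc : src.docs) {
            Map<String, String> kept = new LinkedHashMap<>(doc);
            kept.keySet().removeAll(dropped);
            dst.addDocument(kept);
        }
        return dst;
    }

    public static void main(String[] args) {
        ToyIndex old = new ToyIndex();
        old.addDocument(Map.of("title", "a"));
        old.addDocument(Map.of("title", "b", "buggy_field", "x"));
        old.deleteWhere("buggy_field");
        System.out.println(old.fieldNames);   // prints [buggy_field, title]

        ToyIndex fresh = copyWithout(old, Set.of("buggy_field"));
        System.out.println(fresh.fieldNames); // prints [title]
    }
}
```

The point of the sketch is that the cleanup happens by writing a new index through a filter, so nothing in the existing index's field numbering has to change in place.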

As for why opening is slow, that's specific to Lucene 3.x's updatable 
norms (separate norms), which I'd bet $20 you aren't even using. Unfortunately 
the same situation presents itself in SegmentReader due to updatable docvalues: 
I committed a comment that will hopefully be addressed:
{code}
    // TODO: can we avoid iterating over fieldinfos several times and creating
    // maps of all this stuff if dv updates do not exist?
{code}

> low level Field metadata is never removed from index
> ----------------------------------------------------
>
>                 Key: LUCENE-1761
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1761
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/index
>    Affects Versions: 2.2, 2.3, 2.3.1, 2.3.2, 2.4, 2.4.1
>            Reporter: Hoss Man
>            Priority: Minor
>              Labels: gsoc2014
>         Attachments: LUCENE-1761.patch
>
>
> With heterogeneous docs, or an index whose fields evolve over time, field 
> names that are no longer used (i.e., all docs that ever referenced them have 
> been deleted) still show up when you use IndexReader.getFieldNames.
> It seems logical that segment merging should only preserve metadata about 
> fields that actually exist in the new segment, but even after deleting all 
> documents from an index and optimizing, the old field names are still present.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
