A question came across the #solr IRC channel from a user whose /admin/luke endpoint was still reporting a bunch of fields that they used to use, but that are no longer present in any current document. That URL endpoint provides information about the fields in the index, getting most of that info directly from Lucene.
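For anyone who wants to look for the same thing in their own index, here is a minimal sketch of how the Luke output could be checked. It assumes the JSON response has a top-level "fields" object keyed by field name, and that a field with live indexed terms carries a positive "docs" count. The helper names, the base URL, and the sample data are all hypothetical. A field without a "docs" count is not necessarily stale (it may simply not be indexed), so treat the result as a list of candidates only.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def luke_url(base_url, core):
    """Build the Luke handler URL; numTerms=0 asks it to skip top-term collection."""
    query = urlencode({"wt": "json", "numTerms": 0})
    return f"{base_url.rstrip('/')}/{core}/admin/luke?{query}"


def fetch_luke(base_url, core):
    """Fetch and parse the Luke response (requires a reachable Solr instance)."""
    with urlopen(luke_url(base_url, core)) as resp:
        return json.load(resp)


def fields_without_docs(luke_response):
    """Return the sorted names of fields that report no positive "docs" count.

    Assumption: the response has a top-level "fields" object keyed by field name.
    """
    fields = luke_response.get("fields", {})
    return sorted(name for name, info in fields.items() if not info.get("docs"))


# Hypothetical, hand-written fragment standing in for a real response.
sample = {
    "fields": {
        "id": {"type": "string", "docs": 120},
        "title": {"type": "text_general", "docs": 118},
        "legacy_color": {"type": "string"},
        "legacy_size": {"type": "int", "docs": 0},
    }
}

print(fields_without_docs(sample))
```

Against a live server, something like fields_without_docs(fetch_luke("http://localhost:8983/solr", "collection1")) would produce the candidate list, with the host and core name adjusted to the installation.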

I asked them to run an optimize (forceMerge in Lucene) and see what that did. It did not remove those fields.

Discussing it with other Solr committers on the lucene-solr Slack channel, I learned that this is apparently known behavior -- a forceMerge does not eliminate any field metadata, even if the field is not referenced by any non-deleted document.

What I'm wondering is whether it would be possible to adjust merging so that it can determine which pieces of metadata (like field information) are unused in the index and remove them. It would be fine if this were only an option on forceMerge, but nice if it were something that could happen on any merge. The discussion on Slack indicated that doing this might be prohibitively expensive. Can one of our experts on Lucene merging respond?

This particular user has no option that I am aware of other than to rebuild their index. They're running version 4.2.1.

Thanks,
Shawn
