I have a bunch of Lucene indices lying around, and I want to start adding a
new field to documents in new indices that I'm generating. So, for a given
index, either every document in the index will have that field or no
document will have that field.

The new field has a default value, and I would like to write a query that
matches all documents when applied to old indices, but matches only
documents with that specific default value when applied to new indices.
(Probably the query will include other restrictions, but the other
restrictions have nothing to do with the new field, so they'll apply to
both indices.)


I can, of course, write two different queries, one for the old indices and
one for the new indices; for layering reasons, I'd prefer not to do that,
but it's a possibility. (I can't, however, go back to the old indices and
add the new field in.)

Any suggestions for how to write a single query that will work in both
places? Basically, what I want is a query that says something like

  (field IS MISSING) OR (field = DEFAULT_VALUE)
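
To pin down the semantics I'm after, here is a minimal Java sketch of that
predicate over a single document, with a plain Map standing in for a Lucene
document. The class and method names are made up for illustration; none of
this is Lucene API.

```java
import java.util.Map;

public final class FieldPredicate {
    // Hypothetical stand-in for a document: field name -> stored value.
    // Returns true when the field is absent or holds the default value.
    public static boolean matches(Map<String, String> doc,
                                  String field,
                                  String defaultValue) {
        String value = doc.get(field);
        return value == null || value.equals(defaultValue);
    }
}
```

A document from an old index (no such field) always passes, while a document
from a new index passes only if it carries the default value.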

If it matters, the new field will only take one of a small number of
values, ten or so.


The one hint I've turned up when googling is this:
http://stackoverflow.com/questions/4365369/solr-search-for-documents-where-a-field-doesnt-exist

It talks in terms of Solr, but hopefully I can translate that into stock
Lucene. Thinking out loud about what it suggests, I guess I can generate a
WildcardQuery for my field with the pattern * (which I hope won't be too
expensive, given how few values my field has), and then do something like

  (field = DEFAULT_VALUE) OR NOT (field matches *)

And then I have to translate that into Lucene BooleanQuery syntax; I think
I can probably handle that step of things (I've done that sort of thing
before), but if anybody has tips, I'm all ears.
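
For concreteness, the translation might look roughly like the sketch below.
It assumes Lucene 5.3 or later (for BooleanQuery.Builder; older releases add
clauses to a BooleanQuery directly) and that the field is indexed as a
single untokenized term. The class and method names are hypothetical. Since
a BooleanQuery containing only MUST_NOT clauses matches nothing, the
negation is paired with a MatchAllDocsQuery.

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.WildcardQuery;

public final class MissingOrDefault {
    // Hypothetical helper: (field = defaultValue) OR NOT (field matches *).
    public static Query build(String field, String defaultValue) {
        // "Field is missing": every document except those with any term
        // in the field. MatchAllDocsQuery gives MUST_NOT something to
        // subtract from.
        Query fieldMissing = new BooleanQuery.Builder()
            .add(new MatchAllDocsQuery(), Occur.MUST)
            .add(new WildcardQuery(new Term(field, "*")), Occur.MUST_NOT)
            .build();

        // Two SHOULD clauses: a document matches if either one matches.
        return new BooleanQuery.Builder()
            .add(new TermQuery(new Term(field, defaultValue)), Occur.SHOULD)
            .add(fieldMissing, Occur.SHOULD)
            .build();
    }
}
```

The result could then be added as a MUST clause alongside the other
restrictions, which are the same for old and new indices.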


Basically, any suggestions would be welcome, whether about the basic
approach or about the details. And I would in particular very much
appreciate advice as to whether or not WildcardQuery(field, *) will have
good performance if field only takes a small number of values.

-- 
David Carlton
carl...@sumologic.com
