Re: Norm Value of not existing Field

Erick Erickson Fri, 04 Dec 2009 05:54:38 -0800

The word "Filter" as part of a class is overloaded in Lucene <G>....

See: http://lucene.apache.org/java/2_9_1/api/all/index.html

A Filter in that sense boils down to a DocIdSet, one bit per document.
So in your example (4 million docs, 20 fields), you're only talking
about 10MB or so, even if you create one filter for every field and
keep it around.
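Here is a quick check of that estimate, using the figures from Benjamin's message (about 4 million docs and 20 fields). The sketch assumes exactly one bit per document per field. It ignores object headers and any rounding up to whole 64-bit words that a real bitset class may do, so treat the result as a lower bound. The class and method names are hypothetical.

```java
// Rough size of caching one bitset filter per field: one bit per document.
final class FilterMemoryEstimate {
    // Bytes for numFields bitsets over numDocs docs, rounding each bitset up
    // to whole bytes and ignoring object/array overhead.
    static long bytesForFilters(long numDocs, int numFields) {
        long bytesPerFilter = (numDocs + 7) / 8;
        return bytesPerFilter * numFields;
    }

    public static void main(String[] args) {
        long bytes = bytesForFilters(4_000_000L, 20);
        System.out.println(bytes + " bytes"); // 10000000 bytes, about 10 MB
    }
}
```

That is 500,000 bytes per field, or roughly 10MB for all 20 fields.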

You *might* get some joy from, say, QueryWrapperFilter, although
I don't know if it handles pure wildcard terms (e.g. field:*)...

If that doesn't work out of the box, I *think* you can use a TermEnum
starting at field:"" (which positions it at the first term of that
field) and, for each term until the field name changes, a TermDocs:
just keep marching until next() returns false, merrily setting your
Filter bits for each doc returned by the enumerator.
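The term-walking idea can be sketched in plain Java. This is a toy model, not the Lucene API. The index is assumed to be a sorted map from (field, term text) keys to the ids of the docs containing that term, and FieldPresence, key, docsWithField and docsWithoutField are hypothetical names. In Lucene 2.9 the rough counterparts would be IndexReader.terms(new Term(field, "")), which returns a TermEnum positioned at the first term at or after field:"", and TermDocs.seek(termEnum) for each term. A TermDocs for the exact term field:"" would only return docs that contain the empty string as a term, which is why the sketch walks every term of the field. Flipping the result gives the docs where the field has no indexed term, which is what Benjamin's message is after.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy stand-in for a Lucene term dictionary (NOT the Lucene API): keys are
// sorted by field, then term text; values are the ids of the docs containing
// that term (its postings).
final class FieldPresence {
    private static final char SEP = '\0'; // separator assumed never to appear in field names

    // Hypothetical helper: builds the sort key for (field, text).
    static String key(String field, String text) {
        return field + SEP + text;
    }

    // Docs that have at least one indexed term in `field`.
    static BitSet docsWithField(NavigableMap<String, int[]> index, String field) {
        BitSet bits = new BitSet();
        String start = key(field, ""); // plays the role of field:""
        // Like a TermEnum positioned at the first term >= field:"": walk forward
        // until the keys no longer belong to this field.
        for (Map.Entry<String, int[]> e : index.tailMap(start, true).entrySet()) {
            if (!e.getKey().startsWith(start)) {
                break; // moved past the last term of the field
            }
            // Like iterating TermDocs for this term: set one bit per doc.
            for (int doc : e.getValue()) {
                bits.set(doc);
            }
        }
        return bits;
    }

    // Complement over [0, maxDoc): docs with no indexed term in `field`.
    static BitSet docsWithoutField(NavigableMap<String, int[]> index, String field, int maxDoc) {
        BitSet missing = docsWithField(index, field);
        missing.flip(0, maxDoc);
        return missing;
    }

    public static void main(String[] args) {
        NavigableMap<String, int[]> index = new TreeMap<>();
        index.put(key("author", "erick"), new int[] {0, 2});
        index.put(key("author", "ben"), new int[] {2, 3});
        index.put(key("title", "lucene"), new int[] {1});
        System.out.println(docsWithField(index, "author"));       // {0, 2, 3}
        System.out.println(docsWithoutField(index, "author", 5)); // {1, 4}
    }
}
```

In this model the complement cannot tell a missing field from one that produced no terms. Both simply contribute nothing to the term dictionary.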

HTH
Erick

On Fri, Dec 4, 2009 at 3:40 AM, Benjamin Heilbrunn <ben...@gmail.com> wrote:

> Erick, I'm not sure I understand you correctly.
> What do you mean by "spinning through all the terms on a field"?
>
> One option would be to load all unique terms of a field using
> TermEnum,
> then use TermDocs to get the docs for those terms.
> The remaining docs don't contain any term, so you know that the
> field doesn't exist or is empty in those docs.
> Btw: Does Lucene distinguish between empty and nonexistent
> fields?
>
> The above method would work very well, I think, but it would require
> building and holding an extra data structure.
> My index has about 20 fields and 4 million docs. The overhead would be
> too large.
>
> I think using the norms array (which already exists for most of
> the fields) would be a nice approach.
>
>
> Benjamin
>
