Nicolas Lalevée wrote:
I have just looked at it. It looks great :)
Thanks! :-)
But I still don't understand why a new entry in the fieldinfo is needed.
The entry is not really *needed*, but I use it for
backwards-compatibility and as an optimization for fields that don't
have any tokens with payloads. For fields with payloads the
PositionDelta is shifted one bit, so for certain values this means that
the VInt needs an extra byte. I have an index with about 500k web
documents and measured that about 8% of all PositionDelta values would
need one extra byte if PositionDelta is shifted. For my index that
means roughly 4% growth of the total index size. By using a fieldbit,
payloads can be disabled for a field and therefore the shifting of
PositionDelta can be avoided. Furthermore, if the payload-fieldbit is
not enabled, then the index format does not change at all.
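
To illustrate the extra-byte effect described above, here is a minimal sketch. It assumes the usual VInt layout of 7 data bits per byte and a plain left shift by one bit; the class and method names are made up for this example and are not part of Lucene.

```java
public class VIntSize {
    // Number of bytes a non-negative int occupies as a VInt
    // (assumption: 7 data bits per byte, high bit marks continuation).
    static int vIntLength(int value) {
        int bytes = 1;
        while ((value & ~0x7F) != 0) {
            value >>>= 7;
            bytes++;
        }
        return bytes;
    }

    // True when shifting the delta left by one bit makes its VInt longer.
    // Intended for small non-negative deltas; very large values would overflow.
    static boolean needsExtraByte(int positionDelta) {
        return vIntLength(positionDelta << 1) > vIntLength(positionDelta);
    }

    public static void main(String[] args) {
        int[] deltas = {1, 63, 64, 127, 128, 8191, 8192};
        for (int d : deltas) {
            System.out.println(d + ": " + vIntLength(d) + " byte(s) unshifted, "
                    + vIntLength(d << 1) + " byte(s) shifted");
        }
    }
}
```

Under these assumptions, only deltas that already use the top data bit of their last VInt byte (64 to 127, 8192 to 16383, and so on) grow by one byte when shifted.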
The same applies to TermVector. And code like this fails for no obvious
reason:
Document doc = new Document();
doc.add(new Field("f1", "v1", Store.YES, Index.TOKENIZED,
        TermVector.WITH_POSITIONS_OFFSETS));
doc.add(new Field("f1", "v2", Store.YES, Index.TOKENIZED, TermVector.NO));
RAMDirectory ram = new RAMDirectory();
IndexWriter writer = new IndexWriter(ram, new StandardAnalyzer(), true);
writer.addDocument(doc);
writer.close();
Knowing a little bit about how Lucene works, I have an idea why this fails, but
can we avoid this?
Nicolas
In the payload case there is no problem like this one. There is no new
Field option that can be used to set the fieldbit explicitly. The bit is
set automatically for a field as soon as the first Token of that field
that carries a payload is encountered.
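
A rough sketch of that behaviour, for illustration only: the class below is a hypothetical stand-in for the per-field flags, not the actual patch or Lucene's FieldInfos class. It only shows that the bit starts out unset, is switched on by the first token with a payload, and then stays on.

```java
import java.util.HashMap;
import java.util.Map;

public class PayloadBitSketch {
    // Hypothetical per-field flag table: field name -> "stores payloads" bit.
    private final Map<String, Boolean> storesPayloads = new HashMap<>();

    // Called once per token; payload is null when the token carries none.
    void onToken(String field, byte[] payload) {
        storesPayloads.putIfAbsent(field, false);
        if (payload != null) {
            // Once set, the bit is never cleared again for this field.
            storesPayloads.put(field, true);
        }
    }

    boolean hasPayloads(String field) {
        return storesPayloads.getOrDefault(field, false);
    }
}
```

Because there is no user-facing option involved, two Field instances with the same name cannot disagree about the setting, unlike the TermVector example above.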
Michael