Nicolas Lalevée wrote:

I have just looked at it. It looks great :)
Thanks! :-)

But I still don't understand why a new entry in the fieldinfo is needed.

The entry is not strictly *needed*, but I use it for backwards compatibility and as an optimization for fields that don't have any tokens with payloads. For fields with payloads the PositionDelta is shifted left by one bit, so for certain values the VInt needs an extra byte. I measured this on an index of about 500k web documents: about 8% of all PositionDelta values would need one extra byte if PositionDelta were shifted. For my index that means roughly 4% growth of the total index size. By using a fieldbit, payloads can be disabled for a field, and the shifting of PositionDelta is avoided. Furthermore, if the payload fieldbit is not enabled, the index format does not change at all.
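To illustrate where the extra byte comes from, here is a minimal sketch, not code from the patch. It assumes the usual VInt layout of 7 data bits per byte and models the shift as a plain `delta << 1`. The class and method names (`VIntShiftDemo`, `vIntLength`) are made up for this example.

```java
// Hypothetical demo class; not part of Lucene.
class VIntShiftDemo {

    // Number of bytes a non-negative int occupies as a VInt,
    // assuming 7 data bits per byte plus one continuation bit.
    static int vIntLength(int value) {
        int bytes = 1;
        while ((value & ~0x7F) != 0) {
            value >>>= 7;
            bytes++;
        }
        return bytes;
    }

    public static void main(String[] args) {
        int[] deltas = {1, 63, 64, 127, 128, 8191, 8192};
        for (int delta : deltas) {
            // Shifting left by one frees the low bit for a flag.
            int shifted = delta << 1;
            System.out.println(delta + ": " + vIntLength(delta)
                    + " byte(s) -> " + vIntLength(shifted) + " byte(s)");
        }
    }
}
```

Under these assumptions only deltas just below a 7-bit boundary grow: 64 to 127 go from one byte to two, and 8192 to 16383 go from two bytes to three. All other values keep their length.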

The same exists for TermVector. And code like this fails for no obvious reason:

Document doc = new Document();
doc.add(new Field("f1", "v1", Store.YES, Index.TOKENIZED, TermVector.WITH_POSITIONS_OFFSETS));
doc.add(new Field("f1", "v2", Store.YES, Index.TOKENIZED, TermVector.NO));

RAMDirectory ram = new RAMDirectory();
IndexWriter writer = new IndexWriter(ram, new StandardAnalyzer(), true);
writer.addDocument(doc);
writer.close();

Knowing a little bit about how Lucene works, I have an idea why this fails, but can we avoid this?

Nicolas

In the payload case there is no problem like this one. There is no new Field option that can be used to set the fieldbit explicitly. The bit is set automatically for a field as soon as the first Token of that field that carries a payload is encountered.
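
As a rough sketch of that behaviour, and not the actual patch code: the class below is a hypothetical stand-in for the per-field metadata, and `onToken` and `storesPayloads` are invented names. It assumes that any non-null payload counts as "carries a payload" and that the bit, once set, stays set for the field.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for per-field metadata; not Lucene's actual class.
class PayloadBitSketch {
    private final Map<String, Boolean> storePayloads = new HashMap<>();

    // Called once per token while a document is being indexed.
    void onToken(String field, byte[] payload) {
        boolean hasPayload = payload != null; // assumption: non-null means "has payload"
        // Assumption: the bit is sticky, so OR it with the previous state.
        storePayloads.merge(field, hasPayload, Boolean::logicalOr);
    }

    boolean storesPayloads(String field) {
        return storePayloads.getOrDefault(field, false);
    }

    public static void main(String[] args) {
        PayloadBitSketch infos = new PayloadBitSketch();
        infos.onToken("f1", null);
        System.out.println(infos.storesPayloads("f1")); // false: no payload seen yet
        infos.onToken("f1", new byte[] {42});
        infos.onToken("f1", null);
        System.out.println(infos.storesPayloads("f1")); // true: the bit stays set
    }
}
```

Because the flag is derived from the tokens themselves, there are no two Field options that could contradict each other, which is what happens in the TermVector example.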

Michael

