I think there are many uses of Lucene that would benefit from 'enum' fields, aka categories.

When documents are classified, each one typically belongs to one or more categories.

Lucene could write these postings very efficiently using VInts and RLE (run-length encoding) if position information were not stored, since positions are not really useful in these typical cases.

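The postings would be a sequence of StartingDocNum|NumberOfDocuments ... StartingDocNum|NumberOfDocuments entries, using one bit of StartingDocNum to flag whether the entry is a series (a run of consecutive documents) or a single document.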

When many documents are in the same category and are added at the same time, their document numbers will be nearly sequential, which allows very efficient compression.
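To make the layout concrete, here is a minimal Java sketch of encoding and decoding such a postings list. It is not Lucene API; the class and method names are hypothetical. It assumes doc numbers are sorted, unique, and below 2^30, and it assumes (beyond what is described above) that each StartingDocNum is delta-coded against the end of the previous run, so that runs added together stay small. The VInt format is the usual 7-bits-per-byte scheme with a continuation bit. The low bit of the start value marks a series; single documents omit the count entirely.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: run-length encoded postings without positions.
class RunLengthPostings {

    // Writes v as a VInt: 7 bits per byte, high bit set on every byte except the last.
    static void writeVInt(ByteArrayOutputStream out, int v) {
        while ((v & ~0x7F) != 0) {
            out.write((v & 0x7F) | 0x80);
            v >>>= 7;
        }
        out.write(v);
    }

    // Reads a VInt starting at pos[0] and advances pos[0] past it.
    static int readVInt(byte[] buf, int[] pos) {
        int b = buf[pos[0]++] & 0xFF;
        int v = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = buf[pos[0]++] & 0xFF;
            v |= (b & 0x7F) << shift;
        }
        return v;
    }

    // Encodes sorted, unique doc numbers (assumed < 2^30 so delta << 1 cannot overflow).
    static byte[] encode(int[] docs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prevEnd = 0; // first doc number after the previous run (assumed delta base)
        int i = 0;
        while (i < docs.length) {
            int start = docs[i];
            int count = 1;
            while (i + count < docs.length && docs[i + count] == start + count) {
                count++;
            }
            int delta = start - prevEnd;
            if (count == 1) {
                writeVInt(out, delta << 1);       // low bit 0: single document
            } else {
                writeVInt(out, (delta << 1) | 1); // low bit 1: a series, count follows
                writeVInt(out, count);
            }
            prevEnd = start + count;
            i += count;
        }
        return out.toByteArray();
    }

    // Expands the encoded runs back into the full list of doc numbers.
    static int[] decode(byte[] buf) {
        List<Integer> docs = new ArrayList<>();
        int[] pos = {0};
        int prevEnd = 0;
        while (pos[0] < buf.length) {
            int header = readVInt(buf, pos);
            int start = prevEnd + (header >>> 1);
            int count = (header & 1) == 1 ? readVInt(buf, pos) : 1;
            for (int d = start; d < start + count; d++) {
                docs.add(d);
            }
            prevEnd = start + count;
        }
        return docs.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        int[] docs = {3, 4, 5, 6, 7, 20, 100, 101, 102};
        byte[] encoded = encode(docs);
        System.out.println(encoded.length + " bytes");
        System.out.println(Arrays.toString(decode(encoded)));
    }
}
```

In the example, nine doc numbers collapse into three entries (a run of 5, a single doc, a run of 3) totalling 6 bytes; a longer run of consecutive documents would still cost only a couple of bytes.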

Has anyone worked on this? Our previous custom IndexReaderWriter supported it, and I was wondering whether something similar has made it into the core. I checked the docs and the mailing list archives but could not find anything.

Thanks.

Robert