On 11/28/2012 01:11 AM, Michael McCandless wrote:
Flexible indexing is the ability to make your own codec, which
controls the reading and writing of all index parts (postings, stored
fields, term vectors, deleted docs, etc.).

So for example if you want to store some postings as a bit set instead
of the block format that's the default coming up in 4.1, that's easy
to do.
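
To make the bit set idea concrete, here is a minimal sketch of postings for a single term held as a bit set over doc IDs. It is an illustration only: `BitSetPostings` and its methods are hypothetical names, it does not use Lucene's codec API, and it records only which docs contain the term (no freqs, positions or offsets).

```java
import java.util.BitSet;

/** Hypothetical illustration only; this is NOT Lucene's codec API. */
final class BitSetPostings {
    // One bit per document: bit i is set if doc i contains the term.
    private final BitSet docs;

    BitSetPostings(int maxDoc) {
        this.docs = new BitSet(maxDoc);
    }

    /** Records that the given doc contains the term. */
    void add(int docId) {
        docs.set(docId);
    }

    /** Returns true if the given doc contains the term. */
    boolean contains(int docId) {
        return docs.get(docId);
    }

    /** Returns the first doc ID >= target that contains the term, or -1 if none. */
    int advance(int target) {
        return docs.nextSetBit(target);
    }

    /** Returns the number of docs that contain the term. */
    int docFreq() {
        return docs.cardinality();
    }
}
```

A layout like this tends to pay off only for terms that occur in a large fraction of the docs, since the bit set costs one bit per doc no matter how few docs match.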

But what is less easy (as I described below) is changing what is
actually stored in the postings, eg adding a new per-position
attribute.

The original goal was to allow arbitrary attributes beyond the known
docs/freqs/positions/offsets that Lucene supports today, so that you
could easily make new application-dependent per-term, per-doc,
per-position things, pull them from the analyzer, save them to the
index, and access them from an IndexReader / query, but while some
APIs do expose this, it's not very well explored yet (eg, you'd have
to make a custom indexing chain to get the attributes "through"
IndexWriter down to your codec).  It would be great to make progress
making this easier, so ideas are very welcome :)

Regarding my question/thread: is it also possible to change the backend system? I'd like to use Lucene for a versioned DBMS, so I would need to serialize/deserialize the bytes in my own backend, where keys/values are stored in pages (for instance in an upcoming B+-tree, or in simple "unordered" pages via a record-ID/record mapping). But since no one has suggested anything so far, and I also asked about this a year or so ago, after implementing the B+-tree I will probably have to implement my own data structure and parser/tokenizer/stemmer... :-(
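
As a rough sketch of the page-based storage described above: a named byte stream is cut into fixed-size pages, each page is stored under a record ID, and reading concatenates the pages again. All names here (`PageStore`, `PagedFiles`, `PAGE_SIZE`) are hypothetical, the "backend" is just an in-memory map, and versioning is left out entirely. Lucene does its file I/O through its `Directory` abstraction, so a real integration would presumably sit behind a custom `Directory`; this sketch does not implement that class.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a paged backend: record ID -> page bytes. */
final class PageStore {
    static final int PAGE_SIZE = 8; // tiny on purpose, assumption for the demo

    private final Map<Long, byte[]> pages = new HashMap<>();
    private long nextRecordId = 0;

    /** Stores a copy of the page and returns its new record ID. */
    long put(byte[] page) {
        long id = nextRecordId++;
        pages.put(id, page.clone());
        return id;
    }

    /** Returns a copy of the page stored under the given record ID. */
    byte[] get(long recordId) {
        return pages.get(recordId).clone();
    }
}

/** Hypothetical mapping from a file name to the ordered record IDs of its pages. */
final class PagedFiles {
    private final PageStore store = new PageStore();
    private final Map<String, List<Long>> fileToPages = new HashMap<>();

    /** Splits the data into pages of at most PAGE_SIZE bytes and stores them. */
    void write(String name, byte[] data) {
        List<Long> ids = new ArrayList<>();
        for (int off = 0; off < data.length; off += PageStore.PAGE_SIZE) {
            int end = Math.min(off + PageStore.PAGE_SIZE, data.length);
            ids.add(store.put(Arrays.copyOfRange(data, off, end)));
        }
        fileToPages.put(name, ids);
    }

    /** Reassembles the file by concatenating its pages in order. */
    byte[] read(String name) {
        List<Long> ids = fileToPages.get(name);
        if (ids == null) {
            throw new IllegalArgumentException("no such file: " + name);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (long id : ids) {
            byte[] page = store.get(id);
            out.write(page, 0, page.length);
        }
        return out.toByteArray();
    }

    /** Returns how many pages the file occupies. */
    int pageCount(String name) {
        return fileToPages.get(name).size();
    }
}
```

In a B+-tree variant the `HashMap` in `PageStore` would be replaced by the tree keyed on record ID; the split/reassemble logic would stay the same.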

kind regards,
Johannes

