Re: What is "flexible indexing" in Lucene 4.0 if it's not the ability to make new postings codecs?

Jack Krupansky Fri, 30 Nov 2012 07:49:15 -0800

"I will probably have to implement my own datastructure and parser/tokenizer/stemmer"

Why? I mean, I think the point of the Lucene architecture is that the codec level is completely independent of the analysis level.

The end result of analysis is a value to be stored from the application perspective, a "logical value" so to speak, but NOT the bit sequence, the "physical value" so to speak, that the codec will actually store.

So, go ahead and have your own codec that does whatever it wants with values, but the input for storage and query should be the output of a standard Lucene analyzer.
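
To make that separation concrete, here is a minimal sketch in plain Java with no Lucene dependency. Every name in it (analyze, PostingsStore, SortedMapStore) is hypothetical and stands in for the real thing: analyze plays the role of an analyzer, and PostingsStore plays the role of whatever backend (a codec, or pages in a B+-tree) holds the physical bytes. It is an illustration of the layering under those assumptions, not Lucene's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

public class AnalysisVsStorageSketch {

    // Hypothetical stand-in for an analyzer: lowercases the text and
    // splits it on anything that is not a letter.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    // Hypothetical storage layer: it only ever sees analyzed terms
    // (the "logical values") and decides on its own how to keep them.
    interface PostingsStore {
        void add(String term, int docId);
        SortedSet<Integer> docs(String term);
    }

    // One possible backend: a sorted in-memory map. A page-based
    // B+-tree could implement the same interface.
    static class SortedMapStore implements PostingsStore {
        private final Map<String, SortedSet<Integer>> postings = new TreeMap<>();

        public void add(String term, int docId) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }

        public SortedSet<Integer> docs(String term) {
            return postings.getOrDefault(term, new TreeSet<>());
        }
    }

    // Indexing runs the text through the analyzer, then hands terms to the store.
    static void index(PostingsStore store, int docId, String text) {
        for (String term : analyze(text)) {
            store.add(term, docId);
        }
    }

    // Querying uses the same analyzer, so the store is asked for the same terms.
    // Multiple query terms are combined with AND.
    static SortedSet<Integer> search(PostingsStore store, String query) {
        SortedSet<Integer> result = null;
        for (String term : analyze(query)) {
            SortedSet<Integer> docs = store.docs(term);
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        PostingsStore store = new SortedMapStore();
        index(store, 1, "Flexible indexing in Lucene");
        index(store, 2, "Indexing with a B+-tree");
        System.out.println(search(store, "INDEXING"));        // prints [1, 2]
        System.out.println(search(store, "lucene indexing")); // prints [1]
    }
}
```

Swapping SortedMapStore for another PostingsStore implementation leaves analyze, index and search untouched, which is the point being made above.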


-- Jack Krupansky

-----Original Message-----
From: Johannes.Lichtenberger

Sent: Friday, November 30, 2012 10:15 AM
To: java-user@lucene.apache.org
Cc: Michael McCandless

Subject: Re: What is "flexible indexing" in Lucene 4.0 if it's not the ability to make new postings codecs?


On 11/28/2012 01:11 AM, Michael McCandless wrote:

Flexible indexing is the ability to make your own codec, which
controls the reading and writing of all index parts (postings, stored
fields, term vectors, deleted docs, etc.).

So for example if you want to store some postings as a bit set instead
of the block format that's the default coming up in 4.1, that's easy
to do.
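
As a rough illustration of "same postings, different physical form", the sketch below encodes one list of doc IDs two ways in plain Java: as a java.util.BitSet, and as a delta-encoded list of variable-length integers. This is not Lucene's codec API, and the delta/varint list is only a simple stand-in, not the 4.1 block format; the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class PostingsEncodingSketch {

    // Physical form 1: one bit per possible doc ID.
    static BitSet toBitSet(int[] sortedDocIds) {
        BitSet bits = new BitSet();
        for (int id : sortedDocIds) {
            bits.set(id);
        }
        return bits;
    }

    static List<Integer> fromBitSet(BitSet bits) {
        List<Integer> ids = new ArrayList<>();
        for (int i = bits.nextSetBit(0); i >= 0; i = bits.nextSetBit(i + 1)) {
            ids.add(i);
        }
        return ids;
    }

    // Physical form 2: store the gap to the previous doc ID,
    // 7 bits per byte, with the high bit marking "more bytes follow".
    static byte[] toDeltaVarints(int[] sortedDocIds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int previous = 0;
        for (int id : sortedDocIds) {
            int delta = id - previous;
            while ((delta & ~0x7F) != 0) {
                out.write((delta & 0x7F) | 0x80);
                delta >>>= 7;
            }
            out.write(delta);
            previous = id;
        }
        return out.toByteArray();
    }

    static List<Integer> fromDeltaVarints(byte[] bytes) {
        List<Integer> ids = new ArrayList<>();
        int previous = 0;
        int pos = 0;
        while (pos < bytes.length) {
            int delta = 0;
            int shift = 0;
            byte b;
            do {
                b = bytes[pos++];
                delta |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            previous += delta;
            ids.add(previous);
        }
        return ids;
    }

    public static void main(String[] args) {
        int[] docIds = {3, 7, 8, 300};
        // Both encodings decode back to the same logical postings list.
        System.out.println(fromBitSet(toBitSet(docIds)));
        System.out.println(fromDeltaVarints(toDeltaVarints(docIds)));
    }
}
```

Which form is smaller depends on how dense the postings are: a bit set pays for every doc ID up to the largest one, while the delta list pays per posting.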

But what is less easy (as I described below) is changing what is
actually stored in the postings, eg adding a new per-position
attribute.

The original goal was to allow arbitrary attributes beyond the known
docs/freqs/positions/offsets that Lucene supports today, so that you
could easily make new application-dependent per-term, per-doc,
per-position things, pull them from the analyzer, save them to the
index, and access them from an IndexReader / query, but while some
APIs do expose this, it's not very well explored yet (eg, you'd have
to make a custom indexing chain to get the attributes "through"
IndexWriter down to your codec).  It would be great to make progress
making this easier, so ideas are very welcome :)


Regarding my question/thread, is it also possible to change the backend
system? I'd like to use Lucene for a versioned DBMS, thus I would need
the ability to serialize/deserialize the bytes in my backend whereas
keys/values are stored in pages (for instance in an upcoming B+-tree, or
in simple "unordered" pages via a record-ID/record mapping). But as no
one suggested anything as of now and I've also asked a year ago or so,
after implementing the B+-tree I will probably have to implement my own
datastructure and parser/tokenizer/stemmer... :-(

kind regards,
Johannes


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org

For additional commands, e-mail: java-user-h...@lucene.apache.org

