Hi Martin,

Take a look at what I've done with SOLR-380 (https://issues.apache.org/jira/browse/SOLR-380). It might solve your problem, or at least give you a good starting point.

Tricia

Michael McCandless wrote:

I think you could use payloads (= arbitrary/opaque byte[]) for this?

You can attach a payload to each term occurrence during tokenization (indexing), and then retrieve the payload during searching.
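
Since a payload is just an opaque byte[], the OCR co-ordinates would need to be packed into bytes at index time and unpacked at search time. Below is a minimal sketch of that packing step only, using plain Java. The class name PixelBoxPayload, the methods encode and decode, and the layout (x, y, width, height as four 32-bit ints) are illustrative assumptions, not part of Lucene. Attaching the resulting bytes to a token and reading them back through Lucene's payload API is not shown here, as it depends on the Lucene version in use.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper: packs one term's pixel bounding box into a byte[]
// that could be stored as a payload. The field layout is an assumption.
public class PixelBoxPayload {

    // Four 32-bit ints: x, y, width, height.
    public static final int SIZE = 4 * Integer.BYTES;

    // Serialize the bounding box into a fixed-size, big-endian byte array.
    public static byte[] encode(int x, int y, int width, int height) {
        return ByteBuffer.allocate(SIZE)
                .putInt(x)
                .putInt(y)
                .putInt(width)
                .putInt(height)
                .array();
    }

    // Deserialize a byte array produced by encode() into {x, y, width, height}.
    public static int[] decode(byte[] payload) {
        if (payload == null || payload.length != SIZE) {
            throw new IllegalArgumentException("expected " + SIZE + " payload bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(payload);
        return new int[] { buf.getInt(), buf.getInt(), buf.getInt(), buf.getInt() };
    }

    public static void main(String[] args) {
        // Index time: bytes that would be attached to one term occurrence.
        byte[] payload = encode(120, 340, 85, 22);

        // Search time: recover the box for hit highlighting.
        int[] box = decode(payload);
        System.out.println(Arrays.toString(box)); // prints [120, 340, 85, 22]
    }
}
```

A fixed-size layout keeps decoding trivial. If index size matters, 16-bit values or a variable-length encoding might be a better fit, at the cost of a little more code.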

Mike

Martin Owens wrote:

Hello Users,

I'm working on a project that stores data from an OCR process. The
data describes the pixel co-ordinates of each term in the document and
is used for hit highlighting.

What I would like to do is store this co-ordinate information alongside
the terms. I know there is existing metadata stored per term (word
offset and char offsets). The problem is that if I create a separate
index and try to use the word offset or char offsets, not only is it
slower, but the offsets don't match because of the way the terms are
processed both inside Lucene and in the OCR program.

So, is it possible to store the data alongside the terms in Lucene and
then recall it when doing certain searches? And how much custom code
needs to be written to do it?

Best Regards, Martin Owens

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


