If I were feeling adventurous, and I wanted to help out Mark with Lucene-1001, I'd try this:

Get the trunk and apply Lucene-1001.

Index all of your docs with the highlight coords as payloads.
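
As a rough illustration of what "highlight coords as payloads" could look like, here is a minimal sketch that packs one rectangle per term into a fixed 16-byte array and unpacks it again. The class name HighlightCoords, the x/y/width/height layout, and the big-endian int encoding are all assumptions made for this example; nothing in the patch dictates them. Attaching the bytes to a token is left to whatever your analyzer does.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper: packs a highlight rectangle into a payload-sized byte array. */
public final class HighlightCoords {
    /** Four 32-bit ints: x, y, width, height (layout assumed for this sketch). */
    public static final int BYTES = 4 * Integer.BYTES;

    private HighlightCoords() {}

    /** Encodes x, y, width, height as four big-endian 32-bit ints. */
    public static byte[] encode(int x, int y, int width, int height) {
        return ByteBuffer.allocate(BYTES)
                .putInt(x).putInt(y).putInt(width).putInt(height)
                .array();
    }

    /** Decodes bytes produced by encode back into {x, y, width, height}. */
    public static int[] decode(byte[] payload) {
        if (payload == null || payload.length != BYTES) {
            throw new IllegalArgumentException("expected " + BYTES + " bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(payload);
        return new int[] { buf.getInt(), buf.getInt(), buf.getInt(), buf.getInt() };
    }
}
```

A fixed-width layout keeps every payload the same size, so decoding at highlight time needs no per-term bookkeeping.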

At highlight time, do something like what the SpanHighlighter does. I've got a class called something like PayloadSpansUtil to help with this. Index the doc to be highlighted into a MemoryIndex, then pass an IndexReader from it to the util class. Like the SpanHighlighter, it converts a Query into a SpanQuery approximation, but instead of getting positions for matches, it collects all of the payloads for matches.

Now run through the payloads and make a nice yellow/orange translucent block over each hit in the original image using the coords from each payload.
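
The drawing step might look something like the sketch below, which uses only java.awt and assumes each payload has already been decoded into an {x, y, width, height} array. The class name HitOverlay, the 40% opacity, and the exact yellow/orange color are choices made for this example, not anything taken from the patch.

```java
import java.awt.AlphaComposite;
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.util.List;

/** Hypothetical helper: draws translucent highlight blocks onto a page image. */
public final class HitOverlay {
    private HitOverlay() {}

    /** Paints a translucent yellow/orange block over each {x, y, width, height} rectangle. */
    public static void paintHits(BufferedImage page, List<int[]> rects) {
        Graphics2D g = page.createGraphics();
        try {
            // SRC_OVER with alpha 0.4 blends the fill with the pixels underneath,
            // so the original image stays visible through the block.
            g.setComposite(AlphaComposite.getInstance(AlphaComposite.SRC_OVER, 0.4f));
            g.setColor(new Color(255, 200, 0));
            for (int[] r : rects) {
                g.fillRect(r[0], r[1], r[2], r[3]);
            }
        } finally {
            g.dispose();
        }
    }
}
```

The image is modified in place, so load a fresh copy of the page for each query if you serve the result directly.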


- Mark


Martin Owens wrote:
Following the history of Payloads from its beginnings (https://issues.apache.org/jira/browse/LUCENE-755, https://issues.apache.org/jira/browse/LUCENE-761, https://issues.apache.org/jira/browse/LUCENE-834, http://wiki.apache.org/lucene-java/Payload_Planning) it looks like TermPositionVector was never considered as part of the Payload functionality. I think this is based on the underlying index file structure? I don't see any way to get at a Payload other than through a TermPositions object. I don't think there is a way to translate code which uses TermPositions to using TermPositionVector with regard to payloads -- but I welcome someone to show me how they could.

Very interesting, and it fills in a few missing bits.

Maybe there is some other workaround. What are you trying to accomplish "historically" with TermPositionVector instead of TermPositions?

Historically we've not been able to access the TermPositions object
because it seemed to require that the original text was stored and not
just indexed (although I can't see why). Perhaps I am mistaken?

We're not storing the text content because a) there is rather a lot of
it, b) we have the text files stored on special storage boxes mounted to
the webservers and they're used directly, and c) it didn't seem worth
it.

Thoughts? So can I use the TermPositions object without the stored text?

Best Regards, Martin Owens

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


