Hi,
Is there a way to store additional metadata with fields?
My problem is as follows:
I'm extracting extended HTML with Tika. This extended HTML contains references
to pages, the x,y positions of the text, etc. I want to be able to retrieve
those values when a search matches the text.
So when creating the Document, I'm storing one Field for every part of the text
content of the document I'm currently indexing (let's call it "content").
Example:
I have the following content:
<html><body>
<span page="1" x="1" y="1">This is a very</span>
<span page="1" x="1" y="2">interesting text.</span>
<span page="2" x="1" y="1">This is boring text</span>
</body></html>
So I would store the following:
doc.add(new Field("content", "This is a very", Field.Store.YES, Field.Index.ANALYZED));
doc.add(new Field("content", "interesting text.", Field.Store.YES, Field.Index.ANALYZED));
doc.add(new Field("content", "This is boring text", Field.Store.YES, Field.Index.ANALYZED));
Is there any way to include the page, x and y values in there?
I'd like to display the page when retrieving the results.
I thought about storing the same field twice, with the page, x and y values
added at the beginning of the second copy, and then extracting those values
when retrieving the field. But maybe there's a better way?
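To make the prefix idea concrete, here is a rough sketch of the encoding and decoding I have in mind. It is plain Java and does not touch Lucene at all; the class name PositionedText and the "page;x;y;text" layout are just assumptions for illustration, not anything provided by Lucene or Tika.

```java
// Sketch only: PositionedText and its "page;x;y;text" layout are hypothetical,
// not part of Lucene or Tika.
final class PositionedText {
    final int page;
    final int x;
    final int y;
    final String text;

    PositionedText(int page, int x, int y, String text) {
        this.page = page;
        this.x = x;
        this.y = y;
        this.text = text;
    }

    // Build the prefixed string that would go into the second copy of the field.
    String encode() {
        return page + ";" + x + ";" + y + ";" + text;
    }

    // Split a prefixed string back into its position values and the text.
    static PositionedText decode(String stored) {
        // A limit of 4 keeps any ';' inside the text itself intact.
        String[] parts = stored.split(";", 4);
        if (parts.length != 4) {
            throw new IllegalArgumentException("Malformed value: " + stored);
        }
        return new PositionedText(
                Integer.parseInt(parts[0]),
                Integer.parseInt(parts[1]),
                Integer.parseInt(parts[2]),
                parts[3]);
    }

    public static void main(String[] args) {
        String stored = new PositionedText(1, 1, 2, "interesting text.").encode();
        PositionedText hit = PositionedText.decode(stored);
        System.out.println("page=" + hit.page + " x=" + hit.x + " y=" + hit.y
                + " text=" + hit.text);
    }
}
```

The encoded string would be the value of the second field, and decode would be called on it after a hit. The obvious drawback is that I have to keep the two copies of each field in step myself, which is why I'm asking whether there is something built in.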
Regards,
Christoph Hermann
--
Christoph Hermann
Institut für Informatik
Tel: +49 761-203-8171 Fax: +49 761-203-8162
e-mail: [email protected]