Nadav Har'El wrote:
On Thu, Jan 18, 2007, Michael Busch wrote about "Re: Payloads":
As you pointed out, it is still possible to have per-doc payloads: you need an analyzer which adds just one Token with a payload to a specific field for each doc. I understand that this code would be quite ugly on the app side. A more elegant solution might be LUCENE-580. With that patch you are able to add pre-analyzed fields (i.e. TokenStreams) to a Document without having to use an analyzer. You could use a TokenStream

Thanks, this sounds like a good idea.

In fact, I could live with something even simpler: I want to be able
to create a Field with a single token (with its payload). If I need more
than one of these tokens with payloads, I can just add several fields with
the same name (this should work, although the description of LUCENE-580
suggests that it might have a bug in this area).

I'll add a comment about this use-case to LUCENE-580.
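
Just to sketch what I mean (assuming the pre-analyzed Field(String, TokenStream) constructor proposed by LUCENE-580 and the Token.setPayload()/Payload classes from the payloads work; package names and exact signatures may differ between Lucene versions), such a single-token field could look roughly like this:

import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.Payload;

/** A TokenStream that returns exactly one Token carrying a per-document payload. */
class SingleTokenStream extends TokenStream {
  private Token token;

  SingleTokenStream(String term, byte[] payloadData) {
    token = new Token(term, 0, term.length());
    token.setPayload(new Payload(payloadData));
  }

  public Token next() {
    Token t = token;
    token = null;   // hand out the single token once, then signal end of stream
    return t;
  }
}

Adding several payload-carrying tokens under the same field name would then just be (the field name and payload bytes are made up for illustration):

Document doc = new Document();
doc.add(new Field("docdata", new SingleTokenStream("markerA", new byte[] { 42 })));
doc.add(new Field("docdata", new SingleTokenStream("markerB", new byte[] { 7 })));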

Yes, for your use case it would indeed make sense to just add a single Token to a field. But there are other use cases that would benefit from 580, e.g. using UIMA as a parser. UIMA does not work per field; it materializes the tokens of all fields in a CAS. So the indexer can't call the parser per field; the parsing has to be done before indexing. It would therefore make sense to do the parsing first and then add TokenStreams for the different fields to the Document, where each stream only iterates through the CAS. This is of course also possible by adding multiple Field instances containing single Tokens to a Document, but performance would suffer: each Token would be wrapped in a Field object and then held in a list in the Document.

So I think being able to add TokenStreams to a Document makes sense.
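
A rough sketch of what I have in mind (only an illustration: the ListTokenStream helper, the field names, and the idea of first copying the CAS annotations into per-field Token lists are assumptions, not part of 580 itself):

import java.util.Iterator;
import java.util.List;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class PreParsedIndexingSketch {

  /** Replays already-created Tokens, e.g. ones extracted from a UIMA CAS. */
  static class ListTokenStream extends TokenStream {
    private final Iterator tokens;

    ListTokenStream(List tokens) {
      this.tokens = tokens.iterator();
    }

    public Token next() {
      return tokens.hasNext() ? (Token) tokens.next() : null;
    }
  }

  /** Parsing happens once, before indexing; each field only gets a lightweight stream over the shared result. */
  static Document buildDocument(List titleTokens, List bodyTokens) {
    Document doc = new Document();
    doc.add(new Field("title", new ListTokenStream(titleTokens)));
    doc.add(new Field("body", new ListTokenStream(bodyTokens)));
    return doc;
  }
}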


