On Nov 19, 2007 6:52 PM, Michael Busch <[EMAIL PROTECTED]> wrote:
> Yonik Seeley wrote:
> >
> > So I think we all agree to do payloads by reference (do not make a
> > copy of byte[] like termBuffer does), and to allow payload reuse.
> >
> > So now we still have 3 viable options on the table, I think:
> > Token{ byte[] payload, int payloadLength, ...}
> > Token{ byte[] payload, int payloadOffset, int payloadLength,...}
> > Token{ Payload p, ... }
> >
>
> I'm for option 2. I agree that it is worthwhile to allow filters to
> modify the payloads. And I'd like to optimize for the case where lots
> of tokens have payloads, and option 2 therefore seems the way to go.

Just to play devil's advocate, it seems like adding the byte[] directly to Token gains less than we might have been thinking if we have reuse in any case. A TokenFilter could reuse the same Payload object for each term in a Field, so the CPU allocation savings is closer to a single Payload per field that uses payloads. If we used a Payload object, it would save 8 bytes per Token for fields not using payloads. Besides an initial allocation per field, the additional cost of using a Payload field would be an additional dereference (but that should be really minor).

So I'm a bit more on the fence... Thoughts?

-Yonik
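
For reference, here's roughly what the three shapes would look like as stripped-down classes. Field names are illustrative only, not committed API; the real Token also carries offsets, position increment, type, etc., and a stub Payload is included so the sketch stands on its own:

  // Illustrative only -- not actual Lucene code.
  class Payload {
    byte[] data;
    int offset;
    int length;
  }

  // Option 1: payload bytes held by reference, always starting at index 0.
  class TokenOption1 {
    char[] termBuffer;
    byte[] payload;
    int payloadLength;
  }

  // Option 2: like option 1, plus an offset, so a filter can point the token
  // at a slice of a larger buffer that it owns and reuses.
  class TokenOption2 {
    char[] termBuffer;
    byte[] payload;
    int payloadOffset;
    int payloadLength;
  }

  // Option 3: a separate Payload object held by reference; just one null field
  // per Token when a field carries no payloads, at the cost of an extra
  // dereference when it does.
  class TokenOption3 {
    char[] termBuffer;
    Payload payload;
  }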

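And a minimal sketch of the reuse point, assuming the Payload-object variant with Token.setPayload()/getPayload() along the lines of the current payloads API; the filter name and the boost logic are made up for illustration. The filter allocates one Payload and one backing byte[] per field and hands the same instance to every token, so the per-token allocation cost essentially goes away:

  import java.io.IOException;

  import org.apache.lucene.analysis.Token;
  import org.apache.lucene.analysis.TokenFilter;
  import org.apache.lucene.analysis.TokenStream;
  import org.apache.lucene.index.Payload;

  // Hypothetical filter: stores a one-byte "boost" payload on every token
  // while reusing a single Payload instance and buffer for the whole field.
  public class BoostPayloadFilter extends TokenFilter {

    private final byte[] data = new byte[1];           // one buffer per field
    private final Payload payload = new Payload(data); // one Payload per field

    public BoostPayloadFilter(TokenStream input) {
      super(input);
    }

    public Token next() throws IOException {
      Token t = input.next();
      if (t == null) {
        return null;
      }
      // Overwrite the shared buffer in place; no per-token allocation.
      data[0] = boostFor(t);
      t.setPayload(payload); // by reference, per the reuse decision above
      return t;
    }

    // Made-up per-token logic, just to have something to write into the byte.
    private byte boostFor(Token t) {
      return (byte) Math.min(127, t.termText().length());
    }
  }

This does rely on the indexer consuming (or copying) the payload bytes before the next call to next(), which is exactly the reuse contract we're agreeing on above.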