[
https://issues.apache.org/jira/browse/LUCENE-4620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Shai Erera updated LUCENE-4620:
-------------------------------
Attachment: LUCENE-4620.patch
Patch makes the following changes:
* {{IntEncoder.encode()}} takes an {{IntsRef}} and {{BytesRef}} and encodes the
integers from {{IntsRef}} to {{BytesRef}}. Similarly, {{IntDecoder.decode()}}
takes a {{BytesRef}} and {{IntsRef}} and decodes the integers from the byte
array to the integer array.
* {{CategoryListIterator}} and {{Aggregator}} were changed to do bulk handling
of category ordinals as well.
* In the process I merged some methods such as {{PayloadIterator.setdoc}} and
{{PayloadIterator.getPayload}}, as well as {{AssociationsPayloadIterator}}, to
further reduce the number of method calls that happen during search.
* Added a test for {{MultiCategoryListIterator}} (we didn't have one!) and
improved {{EncodingTest}} to test a large number of random values.
All tests pass, and 'ant javadocs' passes too.
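
For readers who want a concrete picture of the bulk shape described above, here is a minimal sketch, not the code from the patch. It assumes a plain low-order-first VInt scheme (7 data bits per byte, high bit set on every byte except the last), and {{IntSlice}}, {{ByteSlice}} and {{BulkVIntSketch}} are hypothetical stand-ins for {{IntsRef}}, {{BytesRef}} and the real encoder/decoder classes, so that it compiles without Lucene on the classpath.

```java
import java.util.Arrays;

/** Illustrative only: hypothetical stand-ins, not Lucene's IntEncoder/IntDecoder. */
final class BulkVIntSketch {

  /** Stand-in for IntsRef: valid values are ints[offset .. offset + length). */
  static final class IntSlice {
    int[] ints;
    int offset;
    int length;
    IntSlice(int capacity) { ints = new int[capacity]; }
  }

  /** Stand-in for BytesRef: valid bytes are bytes[offset .. offset + length). */
  static final class ByteSlice {
    byte[] bytes;
    int offset;
    int length;
    ByteSlice(int capacity) { bytes = new byte[capacity]; }
  }

  /** Encodes all values in one call; the output slice is reset and grown if needed. */
  static void encode(IntSlice values, ByteSlice out) {
    out.offset = 0;
    out.length = 0;
    int maxBytes = values.length * 5; // a 32-bit int needs at most 5 VInt bytes
    if (out.bytes.length < maxBytes) {
      out.bytes = new byte[maxBytes];
    }
    int upto = values.offset + values.length;
    for (int i = values.offset; i < upto; i++) {
      int v = values.ints[i];
      while ((v & ~0x7F) != 0) {
        out.bytes[out.length++] = (byte) ((v & 0x7F) | 0x80); // continuation bit set
        v >>>= 7;
      }
      out.bytes[out.length++] = (byte) v; // last byte of this value
    }
  }

  /** Decodes all values in one call; assumes the input is well-formed VInt data. */
  static void decode(ByteSlice in, IntSlice values) {
    values.offset = 0;
    values.length = 0;
    if (values.ints.length < in.length) {
      values.ints = new int[in.length]; // at most one value per input byte
    }
    int upto = in.offset + in.length;
    int pos = in.offset;
    while (pos < upto) {
      int v = 0;
      int shift = 0;
      byte b;
      do {
        b = in.bytes[pos++];
        v |= (b & 0x7F) << shift;
        shift += 7;
      } while ((b & 0x80) != 0);
      values.ints[values.length++] = v;
    }
  }

  public static void main(String[] args) {
    IntSlice ordinals = new IntSlice(4);
    ordinals.ints = new int[] {3, 200, 70000, 5};
    ordinals.length = 4;

    ByteSlice payload = new ByteSlice(16);
    encode(ordinals, payload);

    IntSlice decoded = new IntSlice(4);
    decode(payload, decoded);
    System.out.println(Arrays.toString(Arrays.copyOf(decoded.ints, decoded.length)));
  }
}
```

The point of the shape is that the caller makes one {{encode}} or {{decode}} call per document and reuses the same two buffers across documents, instead of making one call per ordinal.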
> Explore IntEncoder/Decoder bulk API
> -----------------------------------
>
> Key: LUCENE-4620
> URL: https://issues.apache.org/jira/browse/LUCENE-4620
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/facet
> Reporter: Shai Erera
> Attachments: LUCENE-4620.patch
>
>
> Today, IntEncoder/Decoder offer a streaming API, where you can encode(int)
> and decode(int). Originally, we believed that this layer can be useful for
> other scenarios, but in practice it's used only for writing/reading the
> category ordinals from payload/DV.
> Therefore, Mike and I would like to explore a bulk API, something like
> encode(IntsRef, BytesRef) and decode(BytesRef, IntsRef). Perhaps the Encoder
> can still be streaming (as we don't know in advance how many ints will be
> written), dunno. Will figure this out as we go.
> One thing to check is whether the bulk API can work w/ e.g. facet
> associations, which can write arbitrary byte[], and so maybe decoding to an
> IntsRef won't make sense. This too we'll figure out as we go. I don't rule
> out that associations will use a different bulk API.
> At the end of the day, the requirement is for someone to be able to configure
> how ordinals are written (i.e. different encoding schemes: VInt, PackedInts
> etc.) and later read, with as little overhead as possible.