[ https://issues.apache.org/jira/browse/HADOOP-5528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689842#action_12689842 ]
Klaas Bosteels commented on HADOOP-5528:
----------------------------------------
Chris,
* You are right that 3:-1 does not refer to the last two bytes, but:
** that's how it works in Python as well:
{code}
>>> l = [1,2,3,4,5]
>>> l[1:-2], l[1:3], l[3:-1]
([2, 3], [2, 3], [4])
{code}
** you can specify "the last n" bytes by setting only the left offset (because
{{LastIndexer}} is the default right indexer), which is also how you do it in
Python:
{code}
>>> l[-2:]
[4, 5]
{code}
** because of the {{min}} in the {{PosOffsetIndexer}}, you can also just set
the right offset to a large enough number to get "the last n" bytes.
* I don't think that -1 should be the default right offset, since that would
mean that the last byte is ignored by default.
* It might indeed be possible to use {{(index + key.getLength()) %
key.getLength()}} for both negative and positive offsets, but we need a
separate indexer to implement the default right index anyway, and I think it
makes sense to minimize the required computations by using more specialized
indexers.
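To make the offset semantics discussed above concrete, here is a minimal Python sketch. It is not the Java code from the patch: the names {{resolve}}, {{resolve_mod}} and {{key_part}} are hypothetical, and it assumes Python-style slicing in which the right offset is exclusive and a missing right offset means "up to the end of the key".

```python
def resolve(offset, length):
    """Map a possibly negative offset to a position in [0, length]."""
    if offset >= 0:
        # The min clamps an oversized positive offset to the key length.
        return min(offset, length)
    # Negative offsets count from the end; the max keeps them from underflowing.
    return max(offset + length, 0)


def resolve_mod(offset, length):
    """Modulo-based alternative; the result always lies in [0, length - 1]."""
    return (offset + length) % length


def key_part(key, left=0, right=None):
    """Return the bytes of key selected by the half-open range [left, right)."""
    start = resolve(left, len(key))
    # right=None stands in for the default "last" right index (assumption).
    end = len(key) if right is None else resolve(right, len(key))
    return key[start:end]


key = bytes([1, 2, 3, 4, 5])
print(list(key_part(key, 3, -1)))    # [4]
print(list(key_part(key, -2)))       # [4, 5]
print(list(key_part(key, 3, 1000)))  # [4, 5]
print(resolve_mod(-1, len(key)), resolve_mod(len(key), len(key)))  # 4 0
```

Under these assumptions the modulo variant can never return {{length}} itself, so on its own it cannot express "through the last byte" as an exclusive right bound. That is one way to see why a separate default right index is still needed.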
So, personally, I think that:
* we need the indexer classes (and cannot use -1 as default right index),
* the max/min games are useful (and not merely a way of preventing exceptions),
* the semantics are correct,
which leaves me with nothing to change in the latest patch *smile*. Do you
agree, or is there still something you would like me to change?
> Binary partitioner
> ------------------
>
> Key: HADOOP-5528
> URL: https://issues.apache.org/jira/browse/HADOOP-5528
> Project: Hadoop Core
> Issue Type: New Feature
> Components: mapred
> Reporter: Klaas Bosteels
> Assignee: Klaas Bosteels
> Attachments: HADOOP-5528.patch, HADOOP-5528.patch, HADOOP-5528.patch,
> HADOOP-5528.patch
>
>
> It would be useful to have a {{BinaryPartitioner}} that partitions
> {{BinaryComparable}} keys by hashing a configurable part of the bytes array
> corresponding to each key.