[
https://issues.apache.org/jira/browse/ACCUMULO-4586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864219#comment-15864219
]
Christopher Tubbs commented on ACCUMULO-4586:
---------------------------------------------
Currently, {{RowIterator}} provides grouping in the same way that the Linux
command {{uniq}} does. That is to say, it only groups adjacent items that
match, rather than grouping all matching items in the stream. I think that, just
as with {{uniq}}, partial grouping of unsorted data is still a valid use case.
The proposed "fix" removes a perfectly valid use of {{RowIterator}}, changing
the behavior to favor a different use case. This is exactly the kind of thing
we complain about Thrift doing, as in THRIFT-1805. I don't consider the current
behavior to be a bug, but I do recognize that it is prone to being used
incorrectly, or with invalid assumptions.
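To illustrate the {{uniq}}-style semantics described above, here is a minimal sketch in Python rather than against the Accumulo API. The (row, value) pairs and the {{group_adjacent}} helper are hypothetical stand-ins for scan results and for {{RowIterator}}; the sketch only shows adjacent-run grouping, not how {{RowIterator}} is implemented.

```python
from itertools import groupby

# Hypothetical (row, value) pairs standing in for scan results.
# Row "r1" shows up in two separate runs, as an unsorted source might return it.
entries = [("r1", "a"), ("r1", "b"), ("r2", "c"), ("r1", "d")]


def group_adjacent(pairs):
    """Group consecutive pairs that share a row, the way uniq treats adjacent lines.

    Runs of the same row that are separated by other rows stay separate groups.
    """
    return [
        (row, [value for _, value in run])
        for row, run in groupby(pairs, key=lambda pair: pair[0])
    ]


# Unsorted input: "r1" is reported twice, once per adjacent run.
unsorted_groups = group_adjacent(entries)

# Sorted input: every row forms exactly one group.
sorted_groups = group_adjacent(sorted(entries))
```

With the unsorted input, "r1" yields two partial groups; after sorting, it yields one complete group. Both are well-defined results, which is the sense in which partial grouping remains usable.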
Rather than change the behavior of the existing API to accommodate a subset of
use cases, I would prefer a new, alternate API to replace it, which imposes the
desired restrictions.
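As a rough sketch of what such an alternate API could look like, again in Python and not tied to any Accumulo class: the names {{strict_row_groups}} and {{UnsortedRowError}} are hypothetical, and comparing rows with Python's string ordering is an assumption made for illustration (Accumulo itself orders keys by bytes). The idea is simply a separate entry point that groups rows and fails as soon as a row appears out of order, leaving the lenient behavior untouched.

```python
from itertools import groupby


class UnsortedRowError(ValueError):
    """Hypothetical error raised when rows are not in ascending order."""


def strict_row_groups(pairs):
    """Yield (row, values) groups, failing if a row sorts before the previous one.

    This is a separate, stricter function; it does not change how lenient
    adjacent-run grouping behaves.
    """
    previous_row = None
    for row, run in groupby(pairs, key=lambda pair: pair[0]):
        # groupby never yields the same key twice in a row, so a row that is
        # not greater than the previous one means the source was unsorted.
        if previous_row is not None and row < previous_row:
            raise UnsortedRowError(
                f"row {row!r} observed after row {previous_row!r}"
            )
        previous_row = row
        # Consume the run before yielding; groupby invalidates it on advance.
        yield row, [value for _, value in run]
```

Callers that need the sorted-input guarantee would opt in to the strict function, while callers that want partial grouping on unsorted data keep using the existing behavior.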
> Make rowiterator fail when unsorted data is observed
> ----------------------------------------------------
>
> Key: ACCUMULO-4586
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4586
> Project: Accumulo
> Issue Type: Bug
> Affects Versions: 1.6.6, 1.7.1, 1.8.0
> Reporter: Keith Turner
> Fix For: 2.0.0
>
>
> A batchscanner was used as a row iterator data source. The rowiterator
> expects data in sorted order and the batch scanner does not supply data in
> sorted order. The row iterator should have a sanity check to ensure source
> data is in sorted order.
> https://lists.apache.org/thread.html/c24448d171d8414321bccfc778c7fc8b53e45892cae9daafa220503f@%3Cuser.accumulo.apache.org%3E