[ https://issues.apache.org/jira/browse/ACCUMULO-4586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15861888#comment-15861888 ]
Christopher Tubbs commented on ACCUMULO-4586:
---------------------------------------------
The proposed sanity check is pretty much going to guarantee failure whenever
the BatchScanner is used. At the same time, it seems like it might be overly
restrictive.
As far as I can tell, the RowIterator doesn't necessarily need the data to be
in sorted order; it just needs all the entries for a single row to be grouped
together. RowIterator works just fine over single-entry rows (though it's a
bit unnecessary at that point), or when wrapping a custom scanner or other
source which provides this guarantee. It also works just fine if the user
doesn't care whether a row is split into a few different objects, even if the
source makes no such guarantees.
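To illustrate the distinction between "sorted" and "grouped", here is a
minimal sketch in plain Java. It is not Accumulo code: ContiguousRowGrouper
and its {row, value} string pairs are hypothetical stand-ins for the grouping
behavior described above, assuming a new group starts whenever the row changes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the row-grouping behavior; not the RowIterator source.
class ContiguousRowGrouper {

  // Each entry is a {row, value} pair. Consecutive entries that share a row
  // are collected into one group; a change of row starts a new group.
  static List<List<String[]>> groupContiguous(List<String[]> entries) {
    List<List<String[]>> groups = new ArrayList<>();
    String currentRow = null;
    List<String[]> current = null;
    for (String[] entry : entries) {
      if (current == null || !entry[0].equals(currentRow)) {
        current = new ArrayList<>();
        groups.add(current);
        currentRow = entry[0];
      }
      current.add(entry);
    }
    return groups;
  }

  public static void main(String[] args) {
    // Unsorted (r2 before r1) but grouped: each row comes out as one group.
    List<String[]> grouped = Arrays.asList(
        new String[] {"r2", "a"}, new String[] {"r2", "b"}, new String[] {"r1", "c"});
    System.out.println(groupContiguous(grouped).size());

    // Interleaved: row r1 is split into two separate groups.
    List<String[]> interleaved = Arrays.asList(
        new String[] {"r1", "a"}, new String[] {"r2", "b"}, new String[] {"r1", "c"});
    System.out.println(groupContiguous(interleaved).size());
  }
}
```

With the first input the sketch yields two groups even though the rows are out
of order. With the second it yields three, because r1 appears in two separate
runs, which is the "row split into a few different objects" case.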
I think we should deprecate RowIterator and remove it in 2.0. The Java 8
streams API makes this class redundant, since collectors offer better options
for grouping. The streams API also makes the cost and results of trying to do
groupBy on unsorted data a bit more obvious, instead of hiding them inside
assumptions within RowIterator. It actually imposes a level of difficulty upon
the user trying to use the streams API for grouping, because grouping is just
inherently hard to do on unsorted data.
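As a rough sketch of the streams alternative, the example below groups
entries by row with Collectors.groupingBy. It uses plain {row, value} string
pairs rather than Accumulo's Key/Value types, and StreamRowGrouping is a
hypothetical name chosen for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical illustration of grouping by row with the Java 8 streams API.
class StreamRowGrouping {

  // Each entry is a {row, value} pair. The collector accumulates every entry
  // into an in-memory TreeMap before returning, so input order does not matter.
  static Map<String, List<String>> groupByRow(List<String[]> entries) {
    return entries.stream().collect(
        Collectors.groupingBy(
            e -> e[0],
            TreeMap::new,
            Collectors.mapping(e -> e[1], Collectors.toList())));
  }

  public static void main(String[] args) {
    // Interleaved input: r1 appears before and after r2.
    List<String[]> interleaved = Arrays.asList(
        new String[] {"r1", "a"}, new String[] {"r2", "b"}, new String[] {"r1", "c"});
    System.out.println(groupByRow(interleaved));
  }
}
```

Here the cost is explicit: groupingBy has to hold the whole result in memory
before any row can be handed back, which is the price of grouping a source
that makes no ordering guarantee.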
> Make rowiterator fail when unsorted data is observed
> ----------------------------------------------------
>
> Key: ACCUMULO-4586
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4586
> Project: Accumulo
> Issue Type: Bug
> Affects Versions: 1.6.6, 1.7.1, 1.8.0
> Reporter: Keith Turner
> Fix For: 1.7.3, 1.8.2, 2.0.0
>
>
> A batchscanner was used as a row iterator data source. The rowiterator
> expects data in sorted order and the batch scanner does not supply data in
> sorted order. The row iterator should have a sanity check to ensure source
> data is in sorted order.
> https://lists.apache.org/thread.html/c24448d171d8414321bccfc778c7fc8b53e45892cae9daafa220503f@%3Cuser.accumulo.apache.org%3E