[ 
https://issues.apache.org/jira/browse/ACCUMULO-4586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15861888#comment-15861888
 ] 

Christopher Tubbs commented on ACCUMULO-4586:
---------------------------------------------

A sorted-order check would pretty much guarantee failure whenever a 
BatchScanner is the source. At the same time, it seems like it might be overly 
restrictive.

As far as I can tell, the RowIterator doesn't necessarily need the data to be 
in sorted order... it just needs all the entries for a single row to be grouped 
together. RowIterator works just fine over single-entry rows (though it's a 
bit unnecessary at that point), or when wrapping a custom scanner or other 
source which provides this guarantee. It also works just fine if the user 
doesn't care that a row may be split into a few different objects, even if the 
source makes no such guarantees.
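
To illustrate the point about grouping versus sorting, here is a minimal 
sketch in plain Java. It does not use the Accumulo API or RowIterator's actual 
code: the String[] {row, value} entries and the names ConsecutiveRowGrouping, 
groupConsecutive and rowsOf are hypothetical stand-ins. It assumes only that a 
new group starts whenever the row changes from one entry to the next.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ConsecutiveRowGrouping {

    // Hypothetical model: each entry is a {row, value} pair.
    // Starts a new group every time the row differs from the previous entry.
    static List<List<String[]>> groupConsecutive(List<String[]> entries) {
        List<List<String[]>> groups = new ArrayList<>();
        String currentRow = null;
        List<String[]> current = null;
        for (String[] entry : entries) {
            if (current == null || !entry[0].equals(currentRow)) {
                current = new ArrayList<>();
                groups.add(current);
                currentRow = entry[0];
            }
            current.add(entry);
        }
        return groups;
    }

    // Returns the row of each group, in the order the groups were produced.
    static List<String> rowsOf(List<List<String[]>> groups) {
        List<String> rows = new ArrayList<>();
        for (List<String[]> group : groups) {
            rows.add(group.get(0)[0]);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Rows are out of order, but each row's entries are adjacent.
        List<String[]> groupedNotSorted = Arrays.asList(
            new String[] {"r2", "a"},
            new String[] {"r2", "b"},
            new String[] {"r1", "c"});
        System.out.println(rowsOf(groupConsecutive(groupedNotSorted)));

        // Entries for r1 are interleaved with r2, so r1 is split in two.
        List<String[]> interleaved = Arrays.asList(
            new String[] {"r1", "a"},
            new String[] {"r2", "b"},
            new String[] {"r1", "c"});
        System.out.println(rowsOf(groupConsecutive(interleaved)));
    }
}
```

With the grouped-but-unsorted input, each row comes back as exactly one group. 
With the interleaved input, r1 comes back as two separate groups, which is 
only a problem if the caller needs each row as a single object.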

I think we should deprecate RowIterator and remove it in 2.0. The Java 8 
streams API makes this class redundant, since there are better options for 
grouping, using collectors. The streams API also makes the cost and results 
of grouping unsorted data more obvious, rather than hiding them inside 
assumptions within RowIterator. In fact, it imposes a level of difficulty on 
the user trying to group with streams, because grouping is just inherently 
hard to do on unsorted data.
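
As a rough sketch of the streams alternative, the following uses 
Collectors.groupingBy from java.util.stream on the same kind of interleaved 
data. The String[] {row, value} entries and the names StreamRowGrouping and 
groupByRow are hypothetical, and nothing here touches Accumulo. Note that the 
collector has to consume the whole stream and hold every group in memory 
before any row is complete, which is the cost that becomes visible.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class StreamRowGrouping {

    // Hypothetical model: each entry is a {row, value} pair.
    // Collects all values per row, regardless of the order entries arrive in.
    static Map<String, List<String>> groupByRow(List<String[]> entries) {
        return entries.stream().collect(
            Collectors.groupingBy(
                (String[] entry) -> entry[0],   // classifier: the row
                LinkedHashMap::new,             // keep first-seen row order
                Collectors.mapping(
                    (String[] entry) -> entry[1],
                    Collectors.toList())));
    }

    public static void main(String[] args) {
        // Entries for r1 are interleaved with r2.
        List<String[]> interleaved = Arrays.asList(
            new String[] {"r1", "a"},
            new String[] {"r2", "b"},
            new String[] {"r1", "c"});
        // Nothing is available until the entire input has been read.
        System.out.println(groupByRow(interleaved));
    }
}
```

Here r1 ends up as a single group containing both of its values, at the price 
of buffering the full result. That trade-off is explicit in the code instead 
of depending on an ordering assumption about the source.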

> Make rowiterator fail when unsorted data is observed
> ----------------------------------------------------
>
>                 Key: ACCUMULO-4586
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4586
>             Project: Accumulo
>          Issue Type: Bug
>    Affects Versions: 1.6.6, 1.7.1, 1.8.0
>            Reporter: Keith Turner
>             Fix For: 1.7.3, 1.8.2, 2.0.0
>
>
> A batchscanner was used as a row iterator data source.  The rowiterator 
> expects data in sorted order and the batch scanner does not supply data in 
> sorted order.  The row iterator should have a sanity check to ensure source 
> data is in sorted order.
> https://lists.apache.org/thread.html/c24448d171d8414321bccfc778c7fc8b53e45892cae9daafa220503f@%3Cuser.accumulo.apache.org%3E



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
