[
https://issues.apache.org/jira/browse/ACCUMULO-4586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15863939#comment-15863939
]
Keith Turner commented on ACCUMULO-4586:
----------------------------------------
This problem has existed for a long time. There is certainly no reason to hold
up 1.7.3 and 1.8.1 for it. However, I think it would be nice to get this fixed
in 1.8.2 and 1.7.4.
{quote}
As far as I can tell, the RowIterator doesn't necessarily need the data to be
in sorted order... it just needs all the entries for a single row to be grouped
together.
{quote}
There is no efficient way to check that constraint without buffering every row
seen so far in memory. The nice thing about constraining to sorted data is that
it's efficient to check. I would rather have a condition we can fail fast on
than have users track down subtle bugs.
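To illustrate why the sorted constraint is cheap to check, here is a minimal
sketch. It is not the actual RowIterator code. The class name
SortedCheckIterator is hypothetical, and plain String row IDs stand in for
Accumulo keys. Only the previously seen row is kept, so the check needs constant
memory, whereas a "grouped but not sorted" check would have to remember every
row already seen.

```java
import java.util.Iterator;

/** Hypothetical wrapper that fails fast when row IDs arrive out of order. */
class SortedCheckIterator implements Iterator<String> {
    private final Iterator<String> source;
    private String previous; // last row ID returned; null before the first call

    SortedCheckIterator(Iterator<String> source) {
        this.source = source;
    }

    @Override
    public boolean hasNext() {
        return source.hasNext();
    }

    @Override
    public String next() {
        String current = source.next();
        // Constant memory: compare only against the previous row ID.
        if (previous != null && current.compareTo(previous) < 0) {
            throw new IllegalStateException(
                "Unsorted data: row '" + current + "' seen after '" + previous + "'");
        }
        previous = current;
        return current;
    }
}
```

With this sketch, the input a, a, b passes, while a, b, a throws as soon as the
second a is read.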
{quote}
I think we should deprecate RowIterator and remove it in 2.0. The Java 8
streams API makes this class redundant, since there are better options for
grouping by, using collectors. The streams API also makes it a bit more obvious
the cost and results of trying to do groupBy on unsorted data.
{quote}
I don't think we should deprecate it, for the following reasons:
* RowIterator works great with Scanner, which produces sorted data.
* Given my limited experience with Java 8 streams, I think doing what
RowIterator does with streams may be cumbersome AND may require buffering in
memory. However, I am not sure about these assertions and would like to see an
example of using streams to do what RowIterator does that is not cumbersome.
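For reference, one possible shape of a streams-based grouping is sketched
below. It is an illustration only, not a proposed replacement for RowIterator.
The class StreamRowGrouping and the use of (row, value) pairs of Strings in
place of Accumulo key/value entries are assumptions. Collectors.groupingBy
collects the whole input into a map before any group is available, so this form
does buffer everything in memory.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: groups (row, value) pairs by row using collectors. */
class StreamRowGrouping {
    static Map<String, List<String>> groupByRow(Stream<Map.Entry<String, String>> entries) {
        // The entire stream is consumed and held in the map before it is returned.
        return entries.collect(Collectors.groupingBy(
            Map.Entry::getKey,
            LinkedHashMap::new, // keeps rows in first-seen order
            Collectors.mapping(Map.Entry::getValue, Collectors.toList())));
    }
}
```

Because the map is keyed by row, entries for the same row are merged even when
they are not adjacent in the input. The cost is memory proportional to the
total input.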
> Make rowiterator fail when unsorted data is observed
> ----------------------------------------------------
>
> Key: ACCUMULO-4586
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4586
> Project: Accumulo
> Issue Type: Bug
> Affects Versions: 1.6.6, 1.7.1, 1.8.0
> Reporter: Keith Turner
> Fix For: 2.0.0
>
>
> A BatchScanner was used as a RowIterator data source. The RowIterator
> expects data in sorted order, and the BatchScanner does not supply data in
> sorted order. The RowIterator should have a sanity check to ensure source
> data is in sorted order.
> https://lists.apache.org/thread.html/c24448d171d8414321bccfc778c7fc8b53e45892cae9daafa220503f@%3Cuser.accumulo.apache.org%3E