[ https://issues.apache.org/jira/browse/PHOENIX-1456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216411#comment-14216411 ]
Maryann Xue commented on PHOENIX-1456:
--------------------------------------

I suggest we either remove this buffer reuse and create a new buffer for each row we read from the file, or provide a clear way to tell the caller whether the buffer content may change. Not sure which is better. Any thoughts?

> Incorrect query results caused by reusing buffers in SpoolingResultIterator
> ---------------------------------------------------------------------------
>
>                 Key: PHOENIX-1456
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1456
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 3.0.0, 4.0.0, 5.0.0
>            Reporter: Maryann Xue
>            Assignee: Maryann Xue
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>
> The SpoolingResultIterator#OnDiskResultIterator switches between two
> pre-allocated buffers as reading buffers for the tuple result, based on the
> assumption that the outer ResultIterator consumes the returned tuple in a
> streaming fashion and will never look back/forward outside a 2-tuple span.
> However, some usages violate this assumption:
> 1. OrderedResultIterator: It adds all tuples into its MappedByteBufferQueue
> on initialization, which is backed by a priority queue until the threshold
> is reached and it starts spooling to files.
> This does not show up in most test cases, mainly because
> OrderedResultIterator is not commonly used on the client side (only
> ClientProcessingPlan uses it).
> 2. Child/parent hash-join optimization, which uses a list of PK values to
> create an InListExpression.
> It might be easy to work around the second usage here, but the first one
> may need more consideration.
> I am thinking of removing SpoolingResultIterator altogether when the outer
> ResultIterator is an OrderedResultIterator.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
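
To make the failure mode described in the issue concrete, here is a minimal sketch in plain Java. It is not Phoenix code: the class BufferReuseDemo, its methods, and the fixed 4-byte row size are all hypothetical and exist only for illustration. It shows a reader that alternates between two pre-allocated buffers, a caller that retains every returned reference (roughly what a sorting queue would do), and the "new buffer for each row" alternative suggested in the comment.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for an on-disk row reader; not the Phoenix implementation.
class BufferReuseDemo {
    // Assumption for the sketch: every row is exactly ROW_SIZE bytes
    // and the input length is a multiple of ROW_SIZE.
    static final int ROW_SIZE = 4;

    // Alternates between two pre-allocated buffers. A returned array stays
    // valid only until two more rows have been read.
    static List<byte[]> readWithTwoSharedBuffers(byte[] file) {
        byte[][] buffers = { new byte[ROW_SIZE], new byte[ROW_SIZE] };
        List<byte[]> retained = new ArrayList<>();
        for (int row = 0; row * ROW_SIZE < file.length; row++) {
            byte[] buf = buffers[row % 2];
            System.arraycopy(file, row * ROW_SIZE, buf, 0, ROW_SIZE);
            // The caller keeps the reference instead of consuming it immediately.
            retained.add(buf);
        }
        return retained;
    }

    // Allocates a fresh array per row, so retained rows are never overwritten.
    static List<byte[]> readWithFreshBuffers(byte[] file) {
        List<byte[]> retained = new ArrayList<>();
        for (int row = 0; row * ROW_SIZE < file.length; row++) {
            retained.add(Arrays.copyOfRange(file, row * ROW_SIZE, (row + 1) * ROW_SIZE));
        }
        return retained;
    }

    // Decodes each retained row as ASCII text for easy comparison.
    static List<String> decode(List<byte[]> rows) {
        List<String> out = new ArrayList<>();
        for (byte[] row : rows) {
            out.add(new String(row, StandardCharsets.US_ASCII));
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] file = "aaaabbbbccccdddd".getBytes(StandardCharsets.US_ASCII);
        // Rows 1 and 2 were overwritten by rows 3 and 4.
        System.out.println(decode(readWithTwoSharedBuffers(file))); // [cccc, dddd, cccc, dddd]
        System.out.println(decode(readWithFreshBuffers(file)));     // [aaaa, bbbb, cccc, dddd]
    }
}
```

With the shared buffers, any consumer that holds more than the last two rows sees earlier rows silently replaced by later ones. The fresh-buffer variant avoids this at the cost of one allocation per row.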