Hi Todd, thanks for your accurate explanation; I'm sure I understand it now ^_^
My mistake was that I assumed the process of MinorDeltaCompactionOp was the same as MergeIterator while scanning :(

He Lifu (何李夫)
2017-04-10 16:06:24

-----Original Message-----
From: dev-return-5887-hzhelifu=corp.netease....@kudu.apache.org [mailto:dev-return-5887-hzhelifu=corp.netease....@kudu.apache.org] on behalf of Todd Lipcon
Sent: 2017-11-18 00:31
To: dev <dev@kudu.apache.org>
Subject: Re: could anybody help to explain this question

Hi He,

Answers inline below:

On Fri, Nov 17, 2017 at 1:24 AM, helifu <hzhel...@corp.netease.com> wrote:

> Hi everybody,
>
> I have a question about the MinorDeltaCompactionOp.
>
> In my opinion, the data in a redo delta file is ordered by row_idx. And
> at the same time there is a tree mapping row_idx to block pointer (ptr).
> Is that right?

That's correct. It's actually ordered by the tuple (row_idx, transaction
timestamp), since there may be multiple updates for a single row stored in
a file.

> Now, I am reading the code for MinorDeltaCompactionOp, especially the
> function 'WriteDeltaIteratorToFile', and I found something interesting:
> the new redo delta file will be disordered.
>
> 1. Prepare n rows in every input redo delta file:
>
>   RETURN_NOT_OK(iter->PrepareBatch(n, DeltaIterator::PREPARE_FOR_COLLECT));

The thing that I think you are missing here is that PrepareBatch(n) doesn't
prepare a batch of n deltas, but rather prepares a batch which contains all
deltas for the next 'n' rowids. That is to say, if those 'n' rows contained
no updates, this would prepare a batch of 0 deltas. If they each contained
more than one update, it would prepare more than 'n' deltas.
> 2. Filter and collect these rows, and sort them by DeltaKey:
>
>   RETURN_NOT_OK(iter->FilterColumnIdsAndCollectDeltas(vector<ColumnId>(),
>                                                       &cells,
>                                                       &arena));
>
> 3. Write them to the new redo delta file one by one:
>
>   for (const DeltaKeyAndUpdate& cell : cells) {
>     RowChangeList rcl(cell.cell);
>     RETURN_NOT_OK(out->AppendDelta<Type>(cell.key, rcl));
>     RETURN_NOT_OK(stats.UpdateStats(cell.key.timestamp(), rcl));
>   }
>
> 4. Next loop.
>
> Well, my question is that the second n rows in input redo delta file A
> are not always larger than the first n rows in input redo delta file B.
> Thus, it would result in a failure when MutateRow is called.

I think given the above explanation this wouldn't be a problem. You can
also see various test cases like fuzz-itest that would probably catch this
bug if it were to happen.

-Todd

--
Todd Lipcon
Software Engineer, Cloudera