Re: Kudu 1.6 release

2017-11-17 Thread Alexey Serbin

Hi Mike,

Thank you for taking care of the process for the 1.6.0 release.
The plan you described looks good to me.


Best regards,

Alexey

On 11/16/17 1:51 AM, Mike Percy wrote:

Hi Kudu dev community,

It's been 2 months since release 1.5.0 and we've got a bunch of valuable
improvements and bug fixes waiting in the wings. Based on our usual 2-month
cadence, now looks like a good time to start thinking about a Kudu 1.6.0
release.

I'll volunteer to RM this one, unless someone else has a burning desire to
do it.

I'll also propose to cut the branch for 1.6.x early in the week *after* the
Thanksgiving holiday in the US, and to start a vote on RC1 a couple of days
after that.

Devs: That means release notes for notable changes in 1.6 should be up for
review and ready to go by Monday, November 27 (the Monday after
Thanksgiving) to ensure their inclusion.

Please let me know your thoughts on the above plan.

Thanks!
Mike





Re: could anybody help to explain this question

2017-11-17 Thread Todd Lipcon
Hi He,

Answers inline below:

On Fri, Nov 17, 2017 at 1:24 AM, helifu wrote:

> Hi everybody,
>
>
>
> I have a question about the MinorDeltaCompactionOp.
>
> In my opinion, the data in a REDO delta file is ordered by row_idx, and at
> the same time there is an index tree mapping row_idx to a block pointer
> (ptr). Is that right?
>

That's correct. Actually, it's ordered by the tuple (row_idx, transaction
timestamp), since there may be multiple updates for a single row stored in
one file.
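
For illustration, here is a minimal, self-contained sketch of that ordering
(a simplified stand-in written for this thread, not the actual Kudu DeltaKey
class):

#include <cstdint>
#include <tuple>

// Illustrative only: a simplified REDO delta key. Each delta is identified
// by the row it applies to and the timestamp of the update that produced it.
struct RedoDeltaKey {
  uint32_t row_idx;    // ordinal index of the row within the rowset
  uint64_t timestamp;  // commit timestamp of the update

  // Deltas in a REDO file are laid out in ascending (row_idx, timestamp)
  // order, so multiple updates to the same row appear oldest-first.
  bool operator<(const RedoDeltaKey& other) const {
    return std::tie(row_idx, timestamp) <
           std::tie(other.row_idx, other.timestamp);
  }
};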


>
>
> Now I am reading the code for MinorDeltaCompactionOp, especially the
> function ‘WriteDeltaIteratorToFile’, and I found something interesting: it
> looks like the new REDO delta file will be out of order.
>
> 1. Prepare n rows from every input REDO delta file:
>
> RETURN_NOT_OK(iter->PrepareBatch(n, DeltaIterator::PREPARE_FOR_COLLECT));
>
>
The thing that I think you are missing here is that PrepareBatch(n) doesn't
prepare a batch of n deltas, but rather prepares a batch which contains all
deltas for the next 'n' rowids. That is to say, if those 'n' rows contained
no updates, this would prepare a batch of 0 deltas. If they each contained
more than one update, it would prepare more than 'n' deltas.
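
To make that concrete, here is a toy sketch of the behavior (the names below
are made up for illustration and are not the real DeltaIterator API):

#include <cstddef>
#include <cstdint>
#include <iostream>
#include <utility>
#include <vector>

// Toy model: deltas sorted by (row_idx, timestamp), as in a REDO delta file.
using ToyDelta = std::pair<uint32_t, uint64_t>;  // (row_idx, timestamp)

// Illustrative stand-in for PrepareBatch(n): starting at 'start_row', it
// collects *all* deltas whose row_idx falls in [start_row, start_row + n).
// The batch may hold zero deltas (no updates in that range) or more than
// 'n' deltas (several updates to the same rows).
std::vector<ToyDelta> PrepareNextRowidBatch(const std::vector<ToyDelta>& deltas,
                                            uint32_t start_row, size_t n,
                                            size_t* cursor) {
  std::vector<ToyDelta> batch;
  while (*cursor < deltas.size() && deltas[*cursor].first < start_row + n) {
    batch.push_back(deltas[*cursor]);
    ++(*cursor);
  }
  return batch;
}

int main() {
  // Three updates to row 2 and one to row 5; rows 0-1 and 3-4 are untouched.
  std::vector<ToyDelta> deltas = {{2, 10}, {2, 11}, {2, 12}, {5, 13}};
  size_t cursor = 0;
  // Batch covering row ids [0, 5): 3 deltas, not 5.
  std::cout << PrepareNextRowidBatch(deltas, 0, 5, &cursor).size() << "\n";
  // Batch covering row ids [5, 10): the remaining 1 delta.
  std::cout << PrepareNextRowidBatch(deltas, 5, 5, &cursor).size() << "\n";
  return 0;
}

Because successive batches cover disjoint, increasing ranges of row ids,
concatenating them preserves the overall (row_idx, timestamp) order.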


>
>
> 2. Filter and collect these rows, sorted by DeltaKey:
>
> RETURN_NOT_OK(iter->FilterColumnIdsAndCollectDeltas(vector<ColumnId>(),
>                                                     &cells,
>                                                     &arena));
>
>
>
> 3. Write them to the new REDO delta file one by one:
>
> for (const DeltaKeyAndUpdate& cell : cells) {
>   RowChangeList rcl(cell.cell);
>   RETURN_NOT_OK(out->AppendDelta(cell.key, rcl));
>   RETURN_NOT_OK(stats.UpdateStats(cell.key.timestamp(), rcl));
> }
>
>
>
> 4. Next loop iteration.
>
>
>
> Well, my question is that the row indexes in the second batch of n rows
> from input REDO delta file A are not always greater than those in the
> first batch of n rows from input REDO delta file B. Thus, it would result
> in a failure during MutateRow.
>

I think given the above explanation this wouldn't be a problem. You can
also see various test cases like fuzz-itest that would probably catch this
bug if it were to happen.

-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera


could anybody help to explain this question

2017-11-17 Thread helifu
Hi everybody,

 

I have a question about the MinorDeltaCompactionOp.

In my opinion, the data in a REDO delta file is ordered by row_idx, and at
the same time there is an index tree mapping row_idx to a block pointer
(ptr). Is that right?

Now I am reading the code for MinorDeltaCompactionOp, especially the
function ‘WriteDeltaIteratorToFile’, and I found something interesting: it
looks like the new REDO delta file will be out of order.

1. Prepare n rows from every input REDO delta file:

RETURN_NOT_OK(iter->PrepareBatch(n, DeltaIterator::PREPARE_FOR_COLLECT));

2. Filter and collect these rows, sorted by DeltaKey:

RETURN_NOT_OK(iter->FilterColumnIdsAndCollectDeltas(vector<ColumnId>(),
                                                    &cells,
                                                    &arena));

3. Write them to the new REDO delta file one by one:

for (const DeltaKeyAndUpdate& cell : cells) {
  RowChangeList rcl(cell.cell);
  RETURN_NOT_OK(out->AppendDelta(cell.key, rcl));
  RETURN_NOT_OK(stats.UpdateStats(cell.key.timestamp(), rcl));
}

4. Next loop iteration (see the combined sketch below).
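
Putting steps 1-4 together, the surrounding loop looks roughly like the
sketch below (reconstructed for readability; the loop condition and the
local declarations are my assumptions, not the verbatim Kudu source):

// Rough sketch of the WriteDeltaIteratorToFile loop described above. Only
// the calls quoted in steps 1-3 come from the original snippets; the rest
// is assumed scaffolding.
while (iter->HasNext()) {
  // 1. Prepare all deltas covering the next 'n' row ids.
  RETURN_NOT_OK(iter->PrepareBatch(n, DeltaIterator::PREPARE_FOR_COLLECT));

  // 2. Collect the prepared deltas, sorted by DeltaKey.
  vector<DeltaKeyAndUpdate> cells;
  RETURN_NOT_OK(iter->FilterColumnIdsAndCollectDeltas(vector<ColumnId>(),
                                                      &cells, &arena));

  // 3. Append them to the new REDO delta file in key order.
  for (const DeltaKeyAndUpdate& cell : cells) {
    RowChangeList rcl(cell.cell);
    RETURN_NOT_OK(out->AppendDelta(cell.key, rcl));
    RETURN_NOT_OK(stats.UpdateStats(cell.key.timestamp(), rcl));
  }

  // 4. Advance to the next batch of row ids.
}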

 

Well, my question is that the row indexes in the second batch of n rows from
input REDO delta file A are not always greater than those in the first batch
of n rows from input REDO delta file B. Thus, it would result in a failure
during MutateRow.

何李夫

2017-04-10 16:06:24