Re: Fault tolerant scanner returning out-of-order partitions

2016-10-03 Thread Dan Burkert
The first three rows (including the out of order row) all fall in the same
range partition, so the issue is likely that the intra-tablet scan returned
out of order results (as opposed to the client scanning tablets out of
order). I'm under the same impression about SetFaultTolerant(), which is
why the test explicitly sets it.  How often is this happening? Back when
this test was committed a few months ago I ran it a few thousand times and
never saw anything like this.
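
For anyone who wants to double-check the partition claim, here is a minimal sketch that maps the reported keys onto the range partitions from the log. The helper name `find_partition` is hypothetical and this is not Kudu code. It assumes every range is exclusive on the lower bound and inclusive on the upper bound, as most of the log output is printed; none of the keys sit near a boundary, so the choice does not affect the result.

```python
# Range partitions copied from the "resulting partitions" log line.
# Assumption: each range is treated as (lower, upper].
PARTITIONS = [
    (134217726, 201326589), (268435452, 335544315), (335544315, 402653178),
    (402653178, 469762041), (536870904, 603979767), (671088630, 738197493),
    (738197493, 805306356), (939524082, 1006632945), (1006632945, 1073741808),
    (1275068397, 1342177260), (1342177260, 1409286123),
    (1409286123, 1476394986), (1610612712, 1677721575),
    (1879048164, 1946157027), (2013265890, 2080374753),
    (2080374753, 2147483616),
]


def find_partition(key, partitions=PARTITIONS):
    """Return the (lower, upper) range containing key, or None if uncovered."""
    for lower, upper in partitions:
        if lower < key <= upper:
            return (lower, upper)
    return None


# The first three keys from the scan output, in the order they were returned.
first_three = [537424064, 552025439, 539314778]
for key in first_three:
    print(key, find_partition(key))
```

All three keys resolve to the same range, (536870904, 603979767], which is consistent with the ordering problem being inside a single tablet.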

On Mon, Oct 3, 2016 at 10:35 PM, Todd Lipcon wrote:

> Hey Dan (+CC dev in case anyone else knows about this too)
>
> I'm debugging some flakiness in alter_table-randomized-test, and it seems
> like it's failing because the verification scan is returning some out of
> order rows, despite using "SetFaultTolerant()". Granted, fault tolerance
> isn't publicly guaranteed to return rows in order, but I was under the
> impression that, with range partitioned tablets, it would always do so.
>
> The scan result I'm seeing has the following sequence within it:
>
> (int32 key=537424064, int32 c945=NULL, int32 c79=234639860, int32
> c990=NULL)
>  OUT OF ORDER ROW
> (int32 key=552025439, int32 c945=NULL, int32 c79=234639860, int32
> c990=NULL)
>  BACK TO NORMAL ORDER
> (int32 key=539314778, int32 c945=1708089980, int32 c79=-878787336, int32
> c990=829302644)
> (int32 key=541817227, int32 c945=2064952224, int32 c79=2064952224, int32
> c990=NULL)
> (int32 key=546056206, int32 c945=26527696, int32 c79=26527696, int32
> c990=26527696)
> (int32 key=601960253, int32 c945=NULL, int32 c79=1088757503, int32
> c990=NULL)
> (int32 key=677154987, int32 c945=823764490, int32 c79=823764490, int32
> c990=823764490)
>
> The prior alter was:
> I1004 05:17:48.192611 28113 alter_table-randomized-test.cc:481] Dropping
> range partition: [805306356, 872415219) resulting partitions: (134217726,
> 201326589], (268435452, 335544315], (335544315, 402653178], (402653178,
> 469762041], (536870904, 603979767], (671088630, 738197493], (738197493,
> 805306356], (939524082, 1006632945], (1006632945, 1073741808], (1275068397,
> 1342177260], (1342177260, 1409286123], (1409286123, 1476394986],
> (1610612712, 1677721575], (1879048164, 1946157027], (2013265890,
> 2080374753], (2080374753, 2147483616)
> I1004 05:17:48.193013 28113 alter_table-randomized-test.cc:406]
> Committing Alterations
>
> The whole log is available here:
> https://gist.githubusercontent.com/toddlipcon/466976caf973f496885da9efc2f7246c/raw/f9baf418dad4ad07f33961b131c86e84803815a8/alter_table-randomized-test.txt
>
> Any ideas what might be causing this out-of-order result? Is the test
> making some incorrect assumptions or might we have a bug?
>
> -Todd
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>


Fault tolerant scanner returning out-of-order partitions

2016-10-03 Thread Todd Lipcon
Hey Dan (+CC dev in case anyone else knows about this too)

I'm debugging some flakiness in alter_table-randomized-test, and it seems
like it's failing because the verification scan is returning some out of
order rows, despite using "SetFaultTolerant()". Granted, fault tolerance
isn't publicly guaranteed to return rows in order, but I was under the
impression that, with range partitioned tablets, it would always do so.

The scan result I'm seeing has the following sequence within it:

(int32 key=537424064, int32 c945=NULL, int32 c79=234639860, int32 c990=NULL)
 OUT OF ORDER ROW
(int32 key=552025439, int32 c945=NULL, int32 c79=234639860, int32 c990=NULL)
 BACK TO NORMAL ORDER
(int32 key=539314778, int32 c945=1708089980, int32 c79=-878787336, int32
c990=829302644)
(int32 key=541817227, int32 c945=2064952224, int32 c79=2064952224, int32
c990=NULL)
(int32 key=546056206, int32 c945=26527696, int32 c79=26527696, int32
c990=26527696)
(int32 key=601960253, int32 c945=NULL, int32 c79=1088757503, int32
c990=NULL)
(int32 key=677154987, int32 c945=823764490, int32 c79=823764490, int32
c990=823764490)
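
As a rough illustration of the ordering check (a sketch only, not the verification code from the test; `out_of_order_indices` is a hypothetical helper), the keys above can be scanned for any position where a key is smaller than the one before it:

```python
def out_of_order_indices(keys):
    """Return each index i where keys[i] is smaller than keys[i - 1]."""
    return [i for i in range(1, len(keys)) if keys[i] < keys[i - 1]]


# Primary keys in the order the scan returned them.
scanned_keys = [
    537424064,
    552025439,
    539314778,
    541817227,
    546056206,
    601960253,
    677154987,
]

print(out_of_order_indices(scanned_keys))  # prints [2]
```

The descent shows up at index 2 (key=539314778), the row immediately after key=552025439, which is the row that arrived too early.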

The prior alter was:
I1004 05:17:48.192611 28113 alter_table-randomized-test.cc:481] Dropping
range partition: [805306356, 872415219) resulting partitions: (134217726,
201326589], (268435452, 335544315], (335544315, 402653178], (402653178,
469762041], (536870904, 603979767], (671088630, 738197493], (738197493,
805306356], (939524082, 1006632945], (1006632945, 1073741808], (1275068397,
1342177260], (1342177260, 1409286123], (1409286123, 1476394986],
(1610612712, 1677721575], (1879048164, 1946157027], (2013265890,
2080374753], (2080374753, 2147483616)
I1004 05:17:48.193013 28113 alter_table-randomized-test.cc:406] Committing
Alterations

The whole log is available here:
https://gist.githubusercontent.com/toddlipcon/466976caf973f496885da9efc2f7246c/raw/f9baf418dad4ad07f33961b131c86e84803815a8/alter_table-randomized-test.txt

Any ideas what might be causing this out-of-order result? Is the test
making some incorrect assumptions or might we have a bug?

-Todd

-- 
Todd Lipcon
Software Engineer, Cloudera