Re: Flaky tests?

2017-11-29 Thread Alexey Serbin

An update: the flakiness in raft_consensus_nonvoter-itest has been fixed.

On 11/27/17 6:55 PM, Alexey Serbin wrote:
Yep, that CatalogManagerAddsNonVoter is the new one which was committed just yesterday.


On 11/27/17 6:53 PM, Alexey Serbin wrote:
The raft_consensus_nonvoter-itest is the set of tests added for 3-4-3 re-replication improvements.  I'm adding more scenarios there right now, and I'll take care of the current flaky ones from there as well.



Thanks,

Alexey

On 11/27/17 6:38 PM, Andrew Wong wrote:
N/w! I should have checked with you beforehand given you were already in
the area (per your response last week). Seems the double-effort was fairly
minimal anyway.

With the fixes for tablet_copy-itest and delete_table-itest checked in, the
next-highest offenders on the dashboard are:

- raft_consensus_nonvoter-itest (9.62%)
- linked_list-test (8.45%)

From a quick glance I'm not sure I have a grasp on what's going on in
either test. Would anyone like to volunteer?

On Mon, Nov 27, 2017 at 6:27 PM, Alexey Serbin wrote:

I just realized after re-reading this message that Andrew was about to
look at the flake in delete_table-itest as well.  I'm sorry for the
double-effort here, if any.  I read this message after posting the patch.

On 11/27/17 12:09 PM, Andrew Wong wrote:


I'm taking a look at tablet_copy-itest and the flakiness in
delete_table-itest beyond Alexey's outstanding patch.

On Tue, Nov 21, 2017 at 10:17 AM, Todd Lipcon wrote:

On Tue, Nov 21, 2017 at 10:13 AM, Alexey Serbin wrote:

I'll take a look at delete_table-itest (at least I have had a patch in
review for one flake there for a long time).

BTW, it would be much better if it were possible to see the type of failed
build in the dashboard (as it was prior to quasar).  Is the type of a build
something inherently impossible to expose from quasar?

I think it should be possible by just setting the BUILD_ID environment
variable appropriately before reporting the test result. That information
should be available in the environment as $BUILD_TYPE or somesuch. I think
Ed is out this week but maybe he can take a look at this when he gets back?
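(A rough sketch of what such a wrapper could look like; the reporter script
name "report_results.py" and the example BUILD_TYPE values are assumptions
for illustration, not the actual Kudu tooling:)

    #!/usr/bin/env python
    # Illustrative only: fold the build type into BUILD_ID so the flaky-test
    # dashboard can show which kind of build produced each failure.
    import os
    import subprocess
    import sys

    def main():
        env = os.environ.copy()
        build_type = env.get("BUILD_TYPE", "UNKNOWN")  # e.g. DEBUG, RELEASE, ASAN, TSAN
        build_id = env.get("BUILD_ID", "")
        # Prefix the existing build id (if any) with the build type.
        env["BUILD_ID"] = "%s-%s" % (build_type, build_id) if build_id else build_type
        # "report_results.py" stands in for whatever script uploads the test
        # results; pass any arguments through unchanged.
        return subprocess.call(["./report_results.py"] + sys.argv[1:], env=env)

    if __name__ == "__main__":
        sys.exit(main())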

-Todd




Best regards,

Alexey


On 11/20/17 11:50 AM, Todd Lipcon wrote:

Hey folks,

It seems some of our tests have gotten pretty flaky lately again. Some of
it is likely due to churn in test infrastructure (running on a different VM
type now I think) but it makes me a little nervous to go into the 1.6
release with some tests at 5%+ flaky.

Can we get some volunteers to triage the top couple most flaky? Note that
"triage" doesn't necessarily mean "fix" -- just want to investigate to the
point that we can decide it's likely to be a test issue or known existing
issue rather than a regression before the release.

I'll volunteer to look at consensus_peers-itests (the top most flaky one).
-Todd



--
Todd Lipcon
Software Engineer, Cloudera