Re: Kudu 1.6.x branching Tue eve, RC1 on Thu

2017-11-29 Thread Mike Percy
Hi folks,
I've branched for 1.6.0 (a day late) and I'm still planning on spinning
1.6.0-RC1 tomorrow (Thursday).

We've included Andrew's disk failure patches (enabled by default) but we
are not shipping improved re-replication (KUDU-1097) enabled for this
release.

Because 1.6.x has branched, please add me as a reviewer for anything that
needs to get in for RC1 at the last minute, such as release notes.

Thanks,
Mike


On Tue, Nov 28, 2017 at 2:51 PM, Mike Percy wrote:

> Hi all,
>
> I just wanted to give an update on the Kudu 1.6 release plan.
>
> We have a few outstanding patches that I'd like to see go in for the 1.6
> release:
> - Tests from Andrew related to disk failure handling
> - Wrap-up on improved re-replication from Alexey and me. We'll make a call
> tomorrow (Wed) on whether to ship this enabled or disabled for Kudu 1.6.0.
>
> My plan is to branch in the evening tonight and cut an RC1 on Thu. That
> will give us about a week of solid testing time from branch to final vote
> tally, since we'll have the weekend plus a couple weekdays to vote, and if
> we don't find any issues with RC1 then it will become Kudu 1.6.0 mid next
> week.
>
> If you have not sent me release notes for work you did during this release
> cycle, please write them up ASAP and send them to me.
>
> Please let me know if you have any concerns with this plan or any
> suggestions.
>
> Thanks!
> Mike
>
>


Re: Flaky tests?

2017-11-29 Thread Alexey Serbin

An update: the flakiness in raft_consensus_nonvoter-itest has been fixed.

On 11/27/17 6:55 PM, Alexey Serbin wrote:
Yep, that CatalogManagerAddsNonVoter is the new one which was committed
just yesterday.


On 11/27/17 6:53 PM, Alexey Serbin wrote:
The raft_consensus_nonvoter-itest is the set of tests added for 3-4-3
re-replication improvements. I'm adding more scenarios there right now,
and I'll take care of the current flaky ones from there as well.



Thanks,

Alexey

On 11/27/17 6:38 PM, Andrew Wong wrote:
N/w! I should have checked with you beforehand given you were already in
the area (per your response last week). Seems the double-effort was fairly
minimal anyway.

With the fixes for tablet_copy-itest and delete_table-itest checked in,
the next-highest offenders on the dashboard are:

- raft_consensus_nonvoter-itest (9.62%)
- linked_list-test (8.45%)

From a quick glance I'm not sure I have a grasp on what's going on in
either test. Would anyone like to volunteer? 😃

On Mon, Nov 27, 2017 at 6:27 PM, Alexey Serbin wrote:



I just realized after re-reading this message that Andrew was about to
look at the flake in delete_table-itest as well. I'm sorry for the
double-effort here, if any. I read this message after posting the patch.




On 11/27/17 12:09 PM, Andrew Wong wrote:


I'm taking a look at tablet_copy-itest and the flakiness in
delete_table-itest beyond Alexey's outstanding patch.

On Tue, Nov 21, 2017 at 10:17 AM, Todd Lipcon wrote:


On Tue, Nov 21, 2017 at 10:13 AM, Alexey Serbin wrote:

I'll take a look at delete_table-itest (at least I have had a patch in
review for one flake there for a long time).

BTW, it would be much better if it were possible to see the type of failed
build in the dashboard (as it was prior to quasar). Is the type of a build
something inherently impossible to expose from quasar?

I think it should be possible by just setting the BUILD_ID environment
variable appropriately before reporting the test result. That information
should be available in the environment as $BUILD_TYPE or somesuch. I think
Ed is out this week but maybe he can take a look at this when he gets back?

-Todd
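
A minimal sketch of the idea described above, assuming the CI job exports
BUILD_ID and BUILD_TYPE (e.g. DEBUG, ASAN, TSAN) and that whatever posts
results to the flaky-test dashboard reads BUILD_ID from the environment.
The helper below is hypothetical, not Kudu's actual reporting script:

    import os

    def tag_build_id_with_type() -> str:
        """Fold the build type into BUILD_ID before test results are reported.

        Hypothetical sketch: BUILD_ID and BUILD_TYPE are assumed to be set
        by the CI job, and the reporting step is assumed to read BUILD_ID
        from the environment when posting results to the dashboard.
        """
        build_id = os.environ.get("BUILD_ID", "unknown-build")
        build_type = os.environ.get("BUILD_TYPE", "unknown-type")
        # Prefix the ID with the build type so the dashboard can show
        # which flavor of build each failure came from.
        tagged = f"{build_type}-{build_id}"
        os.environ["BUILD_ID"] = tagged
        return tagged

    if __name__ == "__main__":
        print(tag_build_id_with_type())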




Best regards,

Alexey


On 11/20/17 11:50 AM, Todd Lipcon wrote:

Hey folks,

It seems some of our tests have gotten pretty flaky lately again. Some of
it is likely due to churn in test infrastructure (running on a different
VM type now I think) but it makes me a little nervous to go into the 1.6
release with some tests at 5%+ flaky.

Can we get some volunteers to triage the top couple most flaky? Note that
"triage" doesn't necessarily mean "fix" -- just want to investigate to the
point that we can decide it's likely to be a test issue or known existing
issue rather than a regression before the release.

I'll volunteer to look at consensus_peers-itests (the top most flaky one).

-Todd



--
Todd Lipcon
Software Engineer, Cloudera


gerrit.cloudera.org brief restart tonight @ 7:30pm PST

2017-11-29 Thread Mike Percy
Hi devs,

Kudu is branching tonight and we need to update our Gerrit replication
configuration to account for the new branch. The downtime shouldn't be more
than 10 minutes. This is just a small addition to a Gerrit configuration
file, and if there are any problems the change will be reverted to the
current version of the config file.
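
For context, a sketch of what such an addition typically looks like in the
Gerrit replication plugin's replication.config (git-config syntax). The
remote name, URL, and branch name below are placeholders, not the actual
gerrit.cloudera.org configuration:

    [remote "mirror"]
        # Placeholder destination for the replicated repository.
        url = git@example.org:apache/kudu.git
        # Existing refspecs stay as they are, e.g.:
        push = +refs/heads/master:refs/heads/master
        # New refspec for the release branch (branch name assumed):
        push = +refs/heads/branch-1.6.x:refs/heads/branch-1.6.x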

Please reach out if you have any concerns about this.

If I don't hear any protests then I won't bother sending out another email
about this restart.

Thanks,
Mike