On Thu, Apr 18, 2013 at 10:51 PM, Chip Childers
<chip.child...@sungard.com> wrote:
> On Apr 18, 2013, at 10:29 PM, David Nalley <da...@gnsa.us> wrote:
>
>> On Thu, Apr 18, 2013 at 10:26 PM, Chiradeep Vittal
>> <chiradeep.vit...@citrix.com> wrote:
>>>
>>>
>>> On 4/18/13 6:41 PM, "David Nalley" <da...@gnsa.us> wrote:
>>>
>>>> On Thu, Apr 18, 2013 at 6:26 PM, Will Chan <will.c...@citrix.com> wrote:
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Chip Childers [mailto:chip.child...@sungard.com]
>>>>>> Sent: Monday, April 15, 2013 7:22 AM
>>>>>> To: dev@cloudstack.apache.org
>>>>>> Cc: cloudstack-...@incubator.apache.org
>>>>>> Subject: Re: [ASFCS42] Proposed schedule for our next release
>>>>>>
>>>>>> On Thu, Apr 11, 2013 at 02:50:02PM -0700, Animesh Chaturvedi wrote:
>>>>>>>
>>>>>>> I want to call out my concern about the technical debt we have
>>>>>>> accumulated so far.
>>>>>>>
>>>>>>> Last night (PST) I analyzed the JIRA bugs with "Affects Version =
>>>>>>> 4.1" created since Dec 2012:
>>>>>>>
>>>>>>> Total records: 429
>>>>>>> Resolution type (Invalid, Duplicate, Cannot Reproduce, etc.): 87
>>>>>>> (30 Blockers, 27 Critical, 27 Major, 4 Minor)
>>>>>>> Valid defects: 429 - 87 = 342
>>>>>>> Fixed: 246 (60 Blockers, 70 Critical, 99 Major), of which 217 were
>>>>>>> fixed since Feb
>>>>>>> Unresolved: 96 (1 Blocker, 8 Critical, 64 Major)
>>>>>>>
>>>>>>> With this data, it looks like we have fixed about 2/3 of the valid
>>>>>>> defects in a little over 2 months, and are deferring roughly 1/3 of
>>>>>>> the issues to a future release.
>>>>>>>
>>>>>>> I also looked at the overall backlog of bugs (Blocker, Critical and
>>>>>>> Major only) as of 4/10/2013, 10:00 PM PST.
>>>>>>>
>>>>>>> 284 open (18 Blocker, 38 Critical, 228 Major); by fix version:
>>>>>>>    -  Release 4.0.x and prior: 13
>>>>>>>    -  4.1: 70
>>>>>>>    -  4.2 : 97
>>>>>>>    -  Future: 8
>>>>>>>    -  No version: 107
>>>>>>>
>>>>>>> Given that we fixed 217 bugs in roughly 2 months during the 4.1
>>>>>>> cycle (about 108 per month), working through the backlog of bugs
>>>>>>> will probably take us another 2 months. Should we extend the 4.2
>>>>>>> test cycle by 2 months [original schedule: 6/1 - 7/22; extended
>>>>>>> schedule: 6/1 - 9/22] to reduce the technical debt significantly?
>>>>>>> I would like to hear how the community wants to address technical
>>>>>>> debt. Based on the input and consensus, I will publish the agreed
>>>>>>> schedule next week.
>>>>>>
>>>>>> I don't think an extension of time really changes bug counts. IMO,
>>>>>> we need to pull together and apply a bug-fix-focused effort to the
>>>>>> codebase. It's also another reason I'm so big on making sure that
>>>>>> automated tests come in with new features. That doesn't address the
>>>>>> test scenarios that human testers can come up with, but if a
>>>>>> developer spends the time to think about testing the basic feature
>>>>>> and codifies that, we should at least avoid the "this actually
>>>>>> doesn't work at all" types of bugs.
>>>>>>
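For concreteness, the kind of basic codified check Chip describes might
look something like the Marvin-based smoke test below. Treat it as a
sketch only: the import paths assume the 4.2-era layout of the marvin
package, and the service-data keys ("virtual_machine", "zoneid", etc.)
are hypothetical placeholders for values a real suite would read from
the Marvin configuration.

    # Sketch only: these imports follow the 4.2-era Marvin layout and
    # may differ between CloudStack versions.
    from nose.plugins.attrib import attr
    from marvin.cloudstackTestCase import cloudstackTestCase
    from marvin.integration.lib.base import VirtualMachine


    class TestDeployVM(cloudstackTestCase):
        """Smoke test: a freshly deployed VM should reach Running."""

        def setUp(self):
            # Hypothetical test data; a real suite reads these values
            # from the Marvin configuration file.
            self.services = {
                "virtual_machine": {"displayname": "smoke-vm"},
                "zoneid": "ZONE_UUID",
                "serviceofferingid": "OFFERING_UUID",
                "templateid": "TEMPLATE_UUID",
            }
            self.cleanup = []

        @attr(tags=["smoke"])
        def test_deploy_vm(self):
            apiclient = self.testClient.getApiClient()
            vm = VirtualMachine.create(
                apiclient,
                self.services["virtual_machine"],
                zoneid=self.services["zoneid"],
                serviceofferingid=self.services["serviceofferingid"],
                templateid=self.services["templateid"],
            )
            self.cleanup.append(vm)

            # The feature's most basic promise: the VM actually runs.
            listed = VirtualMachine.list(apiclient, id=vm.id)
            self.assertEqual(listed[0].state, "Running",
                             "deployed VM never reached Running state")

        def tearDown(self):
            for resource in self.cleanup:
                resource.delete(self.testClient.getApiClient())

Requiring a test like this with every feature merge would turn the
"doesn't work at all" class of bugs into an automatic test failure
rather than a QA discovery.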
>>>>>> There's a school of thought that says: don't build another feature
>>>>>> until you have sorted out the known bugs in the current features. I
>>>>>> don't think we could really pull that off, but perhaps a different
>>>>>> thread to rally people around the bug backlog is in order?
>>>>>>
>>>>>> -chip
>>>>>
>>>>> Sorry to chime in so late on this thread; I've been offsite for the
>>>>> better part of this week. I was one of the original 4-month-release
>>>>> crowd, but after the two recent ACS releases, I'm starting to wonder
>>>>> if we shouldn't move to a 6-month cycle instead. Here are some
>>>>> high-level observations based on the previous two releases:
>>>>>
>>>>> 1. It doesn't seem like we are on a true 4-month time-based release
>>>>> schedule. Both 4.0 and 4.1 were delayed several weeks past their
>>>>> originally proposed GA dates. 4.0 was released 11/6, and let's assume
>>>>> that 4.1 will ship within a week or two. That's almost a 6-month
>>>>> release cycle.
>>>>
>>>> So both 4.0 and 4.1 strike me as extraordinary. 4.0 was our first
>>>> release - and we had lots of issues to resolve. 4.1 introduced a ton
>>>> of packaging and naming changes that I hope are likewise one-time.
>>>> Really, we've only been through our release cycle once, so I am not
>>>> ready to declare it perpetually behind schedule.
>>>>
>>>>
>>>>> Every release incurs a fixed cost of release notes, upgrade testing,
>>>>> etc., which I suspect eats at least a month, depending on people's
>>>>> schedules. With a 4-month cycle that's 3 months of overhead per year;
>>>>> with a 6-month cycle it's 2, and we can use the extra month for other
>>>>> purposes if need be. I suppose if we are going to keep releasing past
>>>>> the proposed hard GA date anyway, then it doesn't matter whether the
>>>>> cycle is 4 or 6 months: it's basically a release whenever the release
>>>>> mgmt. team feels it's right, based on the current bugs, etc.
>>>>
>>>> Having seen the point releases twice now, which still need upgrade
>>>> testing, release notes, etc., I don't get the feeling that the
>>>> "overhead" referred to above is the problem. Joe may disagree with
>>>> me.
>>>>
>>>>> 2. As more and more feature development goes in, the code gets more
>>>>> destabilized. 4.0 was delayed, and the majority of that work was
>>>>> licensing files. 4.1 got just a bit more complicated with new feature
>>>>> development, and the delay is now much longer. Not all features are
>>>>> created equal in terms of testing. Some, like adding support for a
>>>>> new hypervisor, may take more time to develop but not impact the
>>>>> entire system. However, work like refactoring VM sync or other
>>>>> internal code can affect the entire stack and require more QA time.
>>>>> We need extra time for new code to settle in.
>>>>
>>>> I wonder why we would merge a feature when we can't prove that it
>>>> works and doesn't break the entire stack. Some of this is the missing
>>>> automation you talk about below. Essentially, because we rely on
>>>> manual QA, we sometimes have no way to tell whether something works
>>>> until months after the merge.
>>>>
>>>>> 3. ACS is still largely dependent on manual QA. Let's face it: our
>>>>> automated and unit testing isn't quite mature enough yet, and we
>>>>> cannot always expect manual QA to be available on ACS's schedule.
>>>>> CloudStack releases carry quality expectations as well as support for
>>>>> upgrades, and upgrade and migration scripts aren't easily
>>>>> automatable. Chip and others have been very diligent about ensuring
>>>>> that check-ins come with appropriate tests, but we're not there yet.
>>>>>
>>>>> 4. ACS development is based on volunteer work; many of us have a
>>>>> $dayjob and may not be able to help fix bugs on the ACS schedule.
>>>>> Having only a couple of months to fix bugs, and expecting others to
>>>>> follow our ACS schedule, seems a bit rushed. Wearing my Citrix hat
>>>>> now, I can tell you that 2 months of QA and bug fixing is not enough
>>>>> to ship a quality GA release - and that is with me breathing down the
>>>>> necks of many of the engineers to get bugs fixed on time. ACS does
>>>>> not have this type of culture, nor should it. Given that, we should
>>>>> be a bit more flexible about the time we allow people to act on
>>>>> issues.
>>>>
>>>>
>>>> So, a couple of other comments.
>>>> We have folks clamoring for the awesome new features - to the point
>>>> that they are creating derivative works (which tells me we are doing
>>>> some things right, since folks are finding it easy enough to do).
>>>>
>>>> What I gathered from reading the above doesn't really have anything to
>>>> do with schedule:
>>>> * New development destabilizes our code base and is a threat to
>>>> quality and the release schedule.
>>>> * We cannot depend on the current level of manual QA to be present
>>>> going forward.
>>>>
>>>> This brings me to the conclusion that, as a community, we should
>>>> seriously temper our inclusion of new features and make automated
>>>> testing our focus, until pushing a release out is less months of
>>>> manual QA process and more of a decision. This makes me want to raise
>>>> the barrier for merges even higher. Perhaps mandating a run of the
>>>> entire Marvin suite against each proposed merge is what we need to
>>>> begin doing (a sketch of such a gate follows my sign-off below).
>>>>
>>>> --David "who wishes he had kept working on Automated QA tasks" Nalley :)
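What a merge gate built around that idea might look like, purely as a
sketch: the nose plugin flags (--with-marvin, --marvin-config, --load)
and the config/test paths are assumptions based on the Marvin tooling
of the 4.1/4.2 era, not a confirmed interface.

    # Sketch of a pre-merge gate: run the in-tree Marvin smoke suite
    # against the proposed branch and block the merge on any failure.
    import sys

    import nose

    MARVIN_CONFIG = "setup/dev/advanced.cfg"  # hypothetical deployment config
    SMOKE_TESTS = "test/integration/smoke"    # smoke suite shipped in-tree

    passed = nose.run(argv=[
        "nosetests",
        "--with-marvin",                       # enable the Marvin nose plugin
        "--marvin-config=%s" % MARVIN_CONFIG,  # describes the test deployment
        "--load",                              # reuse an already-deployed zone
        SMOKE_TESTS,
    ])

    # A non-zero exit from the CI job is what actually blocks the merge.
    sys.exit(0 if passed else 1)

Wired into a CI job, that would turn "does this merge break the stack?"
from a months-later manual discovery into an answer we get before the
merge lands.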
>>>
>>> How does "temper the inclusion of new features" jibe with "folks
>>> clamoring for awesome new features"?
>>
>> It doesn't.
>> But as Animesh indicates, all we are really doing is racking up
>> technical debt. We will have to pay the piper one way or another.
>>
>> --David
>>
>
> I disagree. You might take a hit in the short term as people get
> acclimated, but I've accelerated multiple projects' output by being
> exceptionally focused on automated quality checks (unit, integration,
> regression, etc.). Feature dev is faster if your base is stable.
> Release QA is easier when quality is higher before a merge. Refactoring
> is easier. Teams can be larger and more distributed. And so on.
>

I agree with you; sorry if it came across differently.
In the short term, however, we don't have that highly automated base
today. We can continue our rapid feature pace with few people focused
on automated testing (especially holistically), but that isn't
sustainable. IMO we do have to take the short-term hit, pay the piper,
and get rid of our technical debt (though I am not talking about our
bug backlog).


> Every time a project compromises, it's harder to make the next change.
> That's why we are talking about a material "release cost" right now.
> Manual-only testing is a poor strategy for a project of this size; it
> should only ever find exceptional conditions.
>

Agreed.

> I'll try to dig up some studies that back me up tomorrow.
