Great!

Unless I hear otherwise, I will steer people away from new features on the Jira
and delay them to 4.15.1 and 5.1.1, to be specific.
I suppose we have a testing gap for upgrades. These are hard to do in an 
automated way, though. If someone has some cycles to think about how to do that 
in an IT that would be valuable and appreciated.
Or perhaps we do not have the right strategy in general and code and metadata 
are coupled too tightly...? Also something to think about.

Let's close the final Jiras soon. And then we can release a 
SNAPSHOT/beta/whatever for people to bang against.

And let's keep the test suite passing. And if you happen to look at some test 
code, also think about the performance. It is much more valuable to have a 
test suite that passes in 1h than one that takes 2 or 3h to run.

One last question... Should we revert the ViewIndexId changes and push them to 
4.15.1/5.1.1? Is there a pressing need for those? (I suppose with the way we 
use Views at Salesforce there probably is, just want to confirm.)
-- Lars

    On Friday, June 28, 2019, 2:50:06 PM PDT, Geoffrey Jacoby 
<[email protected]> wrote:  
 
 Lars,

That sounds good to me -- whether the thing people test is called "beta" or
"-SNAPSHOT", the important thing is that our code base is well-tested and,
as you say, something we all have confidence in.

In addition to splittable syscat (which I used as an example not because of
quality concerns but because it's very large, very central, and
one-way), other changes in 4.15 / 5.1 that might need attention for upgrade
testing are the optional increase of ViewIndexId from a short to an int
(PHOENIX-3547), and my own changes to fix a bug in ViewIndexId generation
in PHOENIX-5132 / 5138. (The bug fix was simple; making it upgrade
pre-existing view index sequences in a safe way was hard.) There are likely
others I've forgotten or don't know about.

The index changes also require upgrade and perf testing (some of which has
been done, with good results, but more to go), but the nice thing there is
that they're feature-flagged (opt-out for new tables/indexes, opt-in for
existing ones via the upgrade tool in PHOENIX-5333), and operators can
switch back to the old design (even for new tables and indexes) if they
need to using the upgrade tool's rollback option. So once testing is
complete I think it's fine for them to go in 4.14.3.

Geoffrey




On Fri, Jun 28, 2019 at 12:20 PM [email protected] <[email protected]> wrote:

>  Any further comments?
> I offered to be the RM for 4.15.0, and I stand by that. I can't do it
> alone, though. Do we have consensus on the rough course of action below?
> Any other ideas? How far are we from a reasonable 4.15.0 release?
>
> IMHO we urgently need to apply some software engineering principles,
> namely an always releasable code base and small, frequent releases.
>
> -- Lars
>
>    On Thursday, June 27, 2019, 3:07:27 PM PDT, [email protected] <
> [email protected]> wrote:
>
>  Thanks Geoffrey.
> The damage is already done. We messed up and let it slide (multiple times,
> this is by no means the first time) and thus are in exactly the situation
> you outlined: No confidence in the code base.
> Now we can only look forward and get the code into a releasable state. The
> most important aspects are - as you point out, and I agree - getting
> confidence in splittable syscat and finishing the indexing work.
>
> In hindsight we should have done a release right before splittable syscat
> and perhaps one right after. Oh well. :)
>
> Could you mark the Jiras you remember with 4.15.0 and 5.1.0 fix versions
> (or are you saying you did already?)
> Since you say that we can release 4.14.3 with just the index changes, does
> that imply that you are mostly concerned about splittable syscat in 4.15.0
> and 5.1.0?
>
> I'm not a fan of a "beta" release, honestly. We can only do as well as we
> can and release a version that we believe, in good conscience, has no
> major issues. All releases will contain some bugs that are found later.
> It seems we are not even at that point yet... The good conscience part. :)
>
> How about we institute an immediate absolutely-no-new-feature policy for
> *all* of Phoenix until we have a releasable project? I'd be happy to
> enforce that. One cannot add new features to a code base that is not
> releasable/stable anyway. Until a few weeks ago we *never* had a passing
> test run. I really don't understand how we get here over and over again.
> But whatever, it's too late, and whining surely doesn't help.
>
> Lemme propose the following action plan then based on this and what you
> said:
> 1. We release 4.14.3 with just the index changes. Soon.
>
> 2. We immediately stop all new feature development in all branches
> (including 5.x, i.e. master)
>
> 3. We harden/test/etc splittable syscat as well as other accumulated tech
> debt that we identify.
>
> 4. After we release 4.15.0 and 5.1.0 we allow feature work again.
> 5. Following those releases we do strictly monthly releases on all
> branches (and if we cannot do that, declare a branch dead)
> Some of these (especially #5) might be radical, but if we want to avoid
> this situation again we need to apply some rigor. As is, Phoenix has been
> turning into an almost unmaintainable project over the past years; we need
> to actively counter that.
>
> Cheers!
>
> -- Lars
>
>    On Thursday, June 27, 2019, 1:36:37 PM PDT, Geoffrey Jacoby <
> [email protected]> wrote:
>
>  Lars,
>
> I agree 100% that we should have smaller, more frequent releases going
> forward. As for this release, I have two concerns.
>
> The first is indexes. I've added several JIRAs that had incorrectly been
> left without a Fix Version to 4.15 / 5.1. These are all part of the
> Self-Repairing Index project, which spans several JIRAs and whose first
> major one (PHOENIX-5156, allowing newly created mutable indexes to
> self-repair inconsistencies at read time) is already in 4.15 and 5.1.
> Outstanding JIRAs include PHOENIX-5211 to extend the logic to immutable
> indexes, and PHOENIX-5333, to give users a tool to convert their legacy
> indexes to the new model. These are all under review and should land very
> soon.
>
> Especially given the multiple reports on the user list of operators
> encountering index consistency problems (which I have also seen in my own
> environments), I think it's important that our next release include these
> fixes, and that they go out in a unified way.
>
> The second concern is testing, particularly upgrade, perf and chaos
> testing. In addition to the large index changes (for which I know some perf
> work and live-cluster testing has been done, with more planned), there are
> other major changes in 4.15 such as the splittable system catalog. If all
> the issues on the current list were fixed, I'd still be reluctant to put
> the bits into production without more due diligence. We've released
> binaries with significant regressions in them that were missed in our test
> suites before, and it's important to avoid that this time.
>
> Yet Lars's point that we've waited far too long to release is of course
> correct. Perhaps the solution is to do what the HBase community did when
> the 2.x branch dragged out too long, and after the listed issues are Fixed,
> we release an explicit beta, closed to new features, from which a final
> release can graduate. In parallel, we could release a 4.14.3 with just the
> index changes and the current diff from 4.14.2 so users get those faster.
>
> Or maybe our testing's advanced further than I know about, and we're closer
> to green than I think. Happy to hear everyone's thoughts.
>
> Geoffrey
>
> On Thu, Jun 27, 2019 at 10:26 AM [email protected] <[email protected]>
> wrote:
>
> > Hi all,
> > we're getting close. The test suite is passing fairly reliably now (minus
> > some strange failure to archive the artifact in -1.4 and PartialCommitIT
> > failing in -1.3 only).
> > I put a lot of effort into speeding up the tests and making them pass.
> > Let's please (pretty please :) ) keep it that way. A passing,
> > comprehensive test suite is key to frequent releases.
> >
> > I also committed and pushed some issues to 4.15.1 and 5.1.1 already. But I
> > can't do it alone.
> >
> > There are 14 items to go for 4.15.0. Some of those are potentially
> > serious.
> >
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20PHOENIX%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20%22Patch%20Available%22)%20AND%20fixVersion%20%3D%204.15.0
> >
> > And 26 items for 5.1.0
> >
> >
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20PHOENIX%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened%2C%20%22Patch%20Available%22)%20AND%20fixVersion%20%3D%205.1.0
> >
> > Let's make a final push and get these done (or moved to 4.15.1/5.1.1,
> > resp.). If you have any issues open, please either get them committed or
> > move them to the next release.
> >
> > And then let's try to never get into this situation again where we have a
> > huge unreleased (and unreleasable) code base with 100's or 1000's of
> > unreleased changes.
> > Thanks!
> > -- Lars
> >
>
  
