Re: 4.15.0 and 5.1.0 releases

2019-06-28 Thread la...@apache.org
 Great!

Unless I hear otherwise I will steer people away from new features on the Jira 
- delay them to 4.15.1 and 5.1.1 to be specific.
I suppose we have a testing gap for upgrades. These are hard to do in an 
automated way, though. If someone has some cycles to think about how to do that 
in an IT that would be valuable and appreciated.
Or perhaps we do not have the right strategy in general and code and metadata 
are coupled too tightly...? Also something to think about.

Let's close the final Jiras soon. And then we can release a
SNAPSHOT/beta/whatever for people to bang against.

And let's keep the test suite passing. And if you happen to look at some test 
code, also think about the performance. It is much more valuable to have the 
test suite that can pass in 1h than one that takes 2 or 3h to run.

One last question... Should we revert the ViewIndexId changes and push them to
4.15.1/5.1.1? Is there a pressing need for those? (I suppose with the way we 
use Views at Salesforce there probably is, just want to confirm.)
-- Lars

On Friday, June 28, 2019, 2:50:06 PM PDT, Geoffrey Jacoby 
 wrote:  
 
 Lars,

That sounds good to me -- whether the thing people test is called "beta" or
"-SNAPSHOT", the important thing is that our code base is well-tested and,
as you say, something we all have confidence in.

In addition to splittable syscat (which I used as an example not because of
quality concerns but because it's very large, very central and
one-way), other changes in 4.15 / 5.1 that might need attention for upgrade
testing are the optional increase of ViewIndexId from a short to an int
(PHOENIX-3547), and my own changes to fix a bug in ViewIndexId generation
in PHOENIX-5132 / 5138. (The bug fix was simple; making it upgrade
pre-existing view index sequences in a safe way was hard.) There are likely
others I've forgotten or don't know about.

The index changes also require upgrade and perf testing (some of which has
been done, with good results, but more to go), but the nice thing there is
that they're feature-flagged (opt-out for new tables/indexes, opt-in for
existing ones via the upgrade tool in PHOENIX-5333) and operators can
switch back to the old design (even for new tables and indexes) if they
need to using the upgrade tool's rollback option. So once testing is
complete I think it's fine for them to go in 4.14.3.

Geoffrey




On Fri, Jun 28, 2019 at 12:20 PM la...@apache.org  wrote:

>  Any further comments?
> I offered to be the RM for 4.15.0, and I stand by that. I can't do it
> alone, though. Do we have consensus on the rough course of action below?
> Any other ideas? How far are we from a reasonable 4.15.0 release?
>
> IMHO we urgently need to apply some software engineering principles,
> namely an always releasable code base and small, frequent releases.
>
> -- Lars
>
>    On Thursday, June 27, 2019, 3:07:27 PM PDT, la...@apache.org <
> la...@apache.org> wrote:
>
>  Thanks Geoffrey.
> The damage is already done. We messed up and let it slide (multiple times,
> this is by no means the first time) and thus are in exactly the situation
> you outlined: No confidence in the code base.
> Now we can only look forward and get the code into a releasable state. The
> most important aspects are - as you point out, and I agree - getting
> confidence in splittable syscat and finishing the indexing work.
>
> In hindsight we should have done a release right before splittable syscat
> and perhaps one right after. Oh well. :)
>
> Could you mark the Jiras you remember with 4.15.0 and 5.1.0 fix versions
> (or are you saying you did already?)
> Since you say that we can release 4.14.3 with just the index changes, does
> that imply that you are mostly concerned about splittable syscat in 4.15.0
> and 5.1.0?
>
> I'm not a fan of a "beta" release, honestly. We can only do as well as we
> can and release a version that we believe, in good conscience, has
> no major issues. All releases will contain some bugs that are found later.
> It seems we are not even at that point yet... The good conscience part. :)
>
> How about we institute an immediate absolutely-no-new-feature policy for
> *all* of Phoenix until we have a releasable project? I'd be happy to
> enforce that. One cannot add new features to a code base that is not
> releasable/stable anyway. Until a few weeks ago we *never* had a passing
> test run. I really don't understand how we get here over and over again.
> But whatever, it's too late, and whining surely doesn't help.
>
> Lemme propose the following action plan then based on this and what you
> said:
> 1. We release 4.14.3 with just the index changes. Soon.
>
> 2. We immediately stop all new feature development in all branches
> (including 5.x, i.e. master)
>
> 3. We harden/test/etc splittable syscat as well as other accumulated tech
> debt that we identify.
>
> 4. After we release 4.15.0 and 5.1.0 we allow feature work again.
> 5. Following those releases we do strictly monthly 

Re: 4.15.0 and 5.1.0 releases

2019-06-28 Thread Geoffrey Jacoby
Lars,

That sounds good to me -- whether the thing people test is called "beta" or
"-SNAPSHOT", the important thing is that our code base is well-tested and,
as you say, something we all have confidence in.

In addition to splittable syscat (which I used as an example not because of
quality concerns but because it's very large, very central and
one-way), other changes in 4.15 / 5.1 that might need attention for upgrade
testing are the optional increase of ViewIndexId from a short to an int
(PHOENIX-3547), and my own changes to fix a bug in ViewIndexId generation
in PHOENIX-5132 / 5138. (The bug fix was simple; making it upgrade
pre-existing view index sequences in a safe way was hard.) There are likely
others I've forgotten or don't know about.

The index changes also require upgrade and perf testing (some of which has
been done, with good results, but more to go), but the nice thing there is
that they're feature-flagged (opt-out for new tables/indexes, opt-in for
existing ones via the upgrade tool in PHOENIX-5333) and operators can
switch back to the old design (even for new tables and indexes) if they
need to using the upgrade tool's rollback option. So once testing is
complete I think it's fine for them to go in 4.14.3.

Geoffrey




On Fri, Jun 28, 2019 at 12:20 PM la...@apache.org  wrote:

>  Any further comments?
> I offered to be the RM for 4.15.0, and I stand by that. I can't do it
> alone, though. Do we have consensus on the rough course of action below?
> Any other ideas? How far are we from a reasonable 4.15.0 release?
>
> IMHO we urgently need to apply some software engineering principles,
> namely an always releasable code base and small, frequent releases.
>
> -- Lars
>
> On Thursday, June 27, 2019, 3:07:27 PM PDT, la...@apache.org <
> la...@apache.org> wrote:
>
>   Thanks Geoffrey.
> The damage is already done. We messed up and let it slide (multiple times,
> this is by no means the first time) and thus are in exactly the situation
> you outlined: No confidence in the code base.
> Now we can only look forward and get the code into a releasable state. The
> most important aspects are - as you point out, and I agree - getting
> confidence in splittable syscat and finishing the indexing work.
>
> In hindsight we should have done a release right before splittable syscat
> and perhaps one right after. Oh well. :)
>
> Could you mark the Jiras you remember with 4.15.0 and 5.1.0 fix versions
> (or are you saying you did already?)
> Since you say that we can release 4.14.3 with just the index changes, does
> that imply that you are mostly concerned about splittable syscat in 4.15.0
> and 5.1.0?
>
> I'm not a fan of a "beta" release, honestly. We can only do as well as we
> can and release a version that we believe, in good conscience, has
> no major issues. All releases will contain some bugs that are found later.
> It seems we are not even at that point yet... The good conscience part. :)
>
> How about we institute an immediate absolutely-no-new-feature policy for
> *all* of Phoenix until we have a releasable project? I'd be happy to
> enforce that. One cannot add new features to a code base that is not
> releasable/stable anyway. Until a few weeks ago we *never* had a passing
> test run. I really don't understand how we get here over and over again.
> But whatever, it's too late, and whining surely doesn't help.
>
> Lemme propose the following action plan then based on this and what you
> said:
> 1. We release 4.14.3 with just the index changes. Soon.
>
> 2. We immediately stop all new feature development in all branches
> (including 5.x, i.e. master)
>
> 3. We harden/test/etc splittable syscat as well as other accumulated tech
> debt that we identify.
>
> 4. After we release 4.15.0 and 5.1.0 we allow feature work again.
> 5. Following those releases we do strictly monthly releases on all
> branches (and if we cannot do that, declare a branch dead)
> Some of these (especially #5) might be radical, but if we want to avoid
> this situation again we need to apply some rigor. As is, Phoenix has been
> turning into an almost unmaintainable project over the past years; we need
> to actively counter that.
>
> Cheers!
>
> -- Lars
>
> On Thursday, June 27, 2019, 1:36:37 PM PDT, Geoffrey Jacoby <
> gjac...@gmail.com> wrote:
>
>  Lars,
>
> I agree 100% that we should have smaller, more frequent releases going
> forward. As for this release, I have two concerns.
>
> The first is indexes. I've added several JIRAs that had been incorrectly
> not marked with a Fix Version to 4.15 / 5.1. These are all part of the
> Self-Repairing Index project, which spans several JIRAs and whose first
> major one (PHOENIX-5156, allowing newly created mutable indexes to
> self-repair inconsistencies at read time) is already in 4.15 and 5.1.
> Outstanding JIRAs include PHOENIX-5211 to extend the logic to immutable
> indexes, and PHOENIX-5333, to give users a tool to convert their legacy
> indexes to the 

Re: 4.15.0 and 5.1.0 releases

2019-06-28 Thread la...@apache.org
 Any further comments?
I offered to be the RM for 4.15.0, and I stand by that. I can't do it alone, 
though. Do we have consensus on the rough course of action below? Any other 
ideas? How far are we from a reasonable 4.15.0 release?

IMHO we urgently need to apply some software engineering principles, namely an 
always releasable code base and small, frequent releases.

-- Lars

On Thursday, June 27, 2019, 3:07:27 PM PDT, la...@apache.org 
 wrote:  
 
  Thanks Geoffrey.
The damage is already done. We messed up and let it slide (multiple times, this 
is by no means the first time) and thus are in exactly the situation you 
outlined: No confidence in the code base.
Now we can only look forward and get the code into a releasable state. The most 
important aspects are - as you point out, and I agree - getting confidence in 
splittable syscat and finishing the indexing work.

In hindsight we should have done a release right before splittable syscat and 
perhaps one right after. Oh well. :)

Could you mark the Jiras you remember with 4.15.0 and 5.1.0 fix versions (or 
are you saying you did already?)
Since you say that we can release 4.14.3 with just the index changes, does that 
imply that you are mostly concerned about splittable syscat in 4.15.0 and 5.1.0?

I'm not a fan of a "beta" release, honestly. We can only do as well as we can
and release a version that we believe, in good conscience, has no
major issues. All releases will contain some bugs that are found later.
It seems we are not even at that point yet... The good conscience part. :)

How about we institute an immediate absolutely-no-new-feature policy for *all* 
of Phoenix until we have a releasable project? I'd be happy to enforce that. 
One cannot add new features to a code base that is not releasable/stable 
anyway. Until a few weeks ago we *never* had a passing test run. I really don't 
understand how we get here over and over again. But whatever, it's too late, 
and whining surely doesn't help.

Lemme propose the following action plan then based on this and what you said:
1. We release 4.14.3 with just the index changes. Soon.

2. We immediately stop all new feature development in all branches (including 
5.x, i.e. master)

3. We harden/test/etc splittable syscat as well as other accumulated tech debt 
that we identify.

4. After we release 4.15.0 and 5.1.0 we allow feature work again.
5. Following those releases we do strictly monthly releases on all branches 
(and if we cannot do that, declare a branch dead)
Some of these (especially #5) might be radical, but if we want to avoid this
situation again we need to apply some rigor. As is, Phoenix has been turning
into an almost unmaintainable project over the past years; we need to actively
counter that.

Cheers!

-- Lars

    On Thursday, June 27, 2019, 1:36:37 PM PDT, Geoffrey Jacoby 
 wrote:  
 
 Lars,

I agree 100% that we should have smaller, more frequent releases going
forward. As for this release, I have two concerns.

The first is indexes. I've added several JIRAs that had been incorrectly
not marked with a Fix Version to 4.15 / 5.1. These are all part of the
Self-Repairing Index project, which spans several JIRAs and whose first
major one (PHOENIX-5156, allowing newly created mutable indexes to
self-repair inconsistencies at read time) is already in 4.15 and 5.1.
Outstanding JIRAs include PHOENIX-5211 to extend the logic to immutable
indexes, and PHOENIX-5333, to give users a tool to convert their legacy
indexes to the new model. These are all under review and should land very
soon.

Especially given the multiple reports on the user list of operators
encountering index consistency problems (which I have also seen in my own
environments), I think it's important that our next release include these
fixes, and that they go out in a unified way.

The second concern is testing, particularly upgrade, perf and chaos
testing. In addition to the large index changes (for which I know some perf
work and live-cluster testing has been done, with more planned), there are
other major changes in 4.15 such as the splittable system catalog. If all
the issues on the current list were fixed, I'd still be reluctant to put
the bits into production without more due diligence. We've released
binaries with significant regressions in them that were missed in our test
suites before, and it's important to avoid that this time.

Yet Lars's point that we've waited far too long to release is of course
correct. Perhaps the solution is to do what the HBase community did when
the 2.x branch dragged out too long, and after the listed issues are Fixed,
we release an explicit beta, closed to new features, from which a final
release can graduate. In parallel, we could release a 4.14.3 with just the
index changes and the current diff from 4.14.2 so users get those faster.

Or maybe our testing's advanced further than I know about, and we're closer
to green than I think. Happy to hear everyone's thoughts.

Re: 4.15.0 and 5.1.0 releases

2019-06-27 Thread la...@apache.org
 Thanks Geoffrey.
The damage is already done. We messed up and let it slide (multiple times, this 
is by no means the first time) and thus are in exactly the situation you 
outlined: No confidence in the code base.
Now we can only look forward and get the code into a releasable state. The most 
important aspects are - as you point out, and I agree - getting confidence in 
splittable syscat and finishing the indexing work.

In hindsight we should have done a release right before splittable syscat and 
perhaps one right after. Oh well. :)

Could you mark the Jiras you remember with 4.15.0 and 5.1.0 fix versions (or 
are you saying you did already?)
Since you say that we can release 4.14.3 with just the index changes, does that 
imply that you are mostly concerned about splittable syscat in 4.15.0 and 5.1.0?

I'm not a fan of a "beta" release, honestly. We can only do as well as we can
and release a version that we believe, in good conscience, has no
major issues. All releases will contain some bugs that are found later.
It seems we are not even at that point yet... The good conscience part. :)

How about we institute an immediate absolutely-no-new-feature policy for *all* 
of Phoenix until we have a releasable project? I'd be happy to enforce that. 
One cannot add new features to a code base that is not releasable/stable 
anyway. Until a few weeks ago we *never* had a passing test run. I really don't 
understand how we get here over and over again. But whatever, it's too late, 
and whining surely doesn't help.

Lemme propose the following action plan then based on this and what you said:
1. We release 4.14.3 with just the index changes. Soon.

2. We immediately stop all new feature development in all branches (including 
5.x, i.e. master)

3. We harden/test/etc splittable syscat as well as other accumulated tech debt 
that we identify.

4. After we release 4.15.0 and 5.1.0 we allow feature work again.
5. Following those releases we do strictly monthly releases on all branches 
(and if we cannot do that, declare a branch dead)
Some of these (especially #5) might be radical, but if we want to avoid this
situation again we need to apply some rigor. As is, Phoenix has been turning
into an almost unmaintainable project over the past years; we need to actively
counter that.

Cheers!

-- Lars

On Thursday, June 27, 2019, 1:36:37 PM PDT, Geoffrey Jacoby 
 wrote:  
 
 Lars,

I agree 100% that we should have smaller, more frequent releases going
forward. As for this release, I have two concerns.

The first is indexes. I've added several JIRAs that had been incorrectly
not marked with a Fix Version to 4.15 / 5.1. These are all part of the
Self-Repairing Index project, which spans several JIRAs and whose first
major one (PHOENIX-5156, allowing newly created mutable indexes to
self-repair inconsistencies at read time) is already in 4.15 and 5.1.
Outstanding JIRAs include PHOENIX-5211 to extend the logic to immutable
indexes, and PHOENIX-5333, to give users a tool to convert their legacy
indexes to the new model. These are all under review and should land very
soon.

Especially given the multiple reports on the user list of operators
encountering index consistency problems (which I have also seen in my own
environments), I think it's important that our next release include these
fixes, and that they go out in a unified way.

The second concern is testing, particularly upgrade, perf and chaos
testing. In addition to the large index changes (for which I know some perf
work and live-cluster testing has been done, with more planned), there are
other major changes in 4.15 such as the splittable system catalog. If all
the issues on the current list were fixed, I'd still be reluctant to put
the bits into production without more due diligence. We've released
binaries with significant regressions in them that were missed in our test
suites before, and it's important to avoid that this time.

Yet Lars's point that we've waited far too long to release is of course
correct. Perhaps the solution is to do what the HBase community did when
the 2.x branch dragged out too long, and after the listed issues are Fixed,
we release an explicit beta, closed to new features, from which a final
release can graduate. In parallel, we could release a 4.14.3 with just the
index changes and the current diff from 4.14.2 so users get those faster.

Or maybe our testing's advanced further than I know about, and we're closer
to green than I think. Happy to hear everyone's thoughts.

Geoffrey

On Thu, Jun 27, 2019 at 10:26 AM la...@apache.org  wrote:

> Hi all,
> we're getting close. The test suite is passing fairly reliably now (minus
> some strange failure to archive the artifact in -1.4 and PartialCommitIT
> failing in -1.3 only).
> I put a lot of effort into speeding up the tests and making them pass.
> Let's please (pretty please :) ) keep it that way. A passing, comprehensive
> test suite is key to frequent releases.
>
> I also 

Re: 4.15.0 and 5.1.0 releases

2019-06-27 Thread Geoffrey Jacoby
Lars,

I agree 100% that we should have smaller, more frequent releases going
forward. As for this release, I have two concerns.

The first is indexes. I've added several JIRAs that had been incorrectly
not marked with a Fix Version to 4.15 / 5.1. These are all part of the
Self-Repairing Index project, which spans several JIRAs and whose first
major one (PHOENIX-5156, allowing newly created mutable indexes to
self-repair inconsistencies at read time) is already in 4.15 and 5.1.
Outstanding JIRAs include PHOENIX-5211 to extend the logic to immutable
indexes, and PHOENIX-5333, to give users a tool to convert their legacy
indexes to the new model. These are all under review and should land very
soon.

Especially given the multiple reports on the user list of operators
encountering index consistency problems (which I have also seen in my own
environments), I think it's important that our next release include these
fixes, and that they go out in a unified way.

The second concern is testing, particularly upgrade, perf and chaos
testing. In addition to the large index changes (for which I know some perf
work and live-cluster testing has been done, with more planned), there are
other major changes in 4.15 such as the splittable system catalog. If all
the issues on the current list were fixed, I'd still be reluctant to put
the bits into production without more due diligence. We've released
binaries with significant regressions in them that were missed in our test
suites before, and it's important to avoid that this time.

Yet Lars's point that we've waited far too long to release is of course
correct. Perhaps the solution is to do what the HBase community did when
the 2.x branch dragged out too long, and after the listed issues are Fixed,
we release an explicit beta, closed to new features, from which a final
release can graduate. In parallel, we could release a 4.14.3 with just the
index changes and the current diff from 4.14.2 so users get those faster.

Or maybe our testing's advanced further than I know about, and we're closer
to green than I think. Happy to hear everyone's thoughts.

Geoffrey

On Thu, Jun 27, 2019 at 10:26 AM la...@apache.org  wrote:

> Hi all,
> we're getting close. The test suite is passing fairly reliably now (minus
> some strange failure to archive the artifact in -1.4 and PartialCommitIT
> failing in -1.3 only).
> I put a lot of effort into speeding up the tests and making them pass.
> Let's please (pretty please :) ) keep it that way. A passing, comprehensive
> test suite is key to frequent releases.
>
> I also committed and pushed some issues to 4.15.1 and 5.1.1 already. But I
> can't do it alone.
>
> There are 14 items to go for 4.15.0. Some of those are potentially serious.
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20PHOENIX%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20%22Patch%20Available%22)%20AND%20fixVersion%20%3D%204.15.0
>
> And 26 items for 5.1.0
>
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20PHOENIX%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened%2C%20%22Patch%20Available%22)%20AND%20fixVersion%20%3D%205.1.0
>
> Let's make a final push and get these done (or moved to 4.15.1/5.1.1,
> resp.). If you have any issues open, please either get them committed or
> move them to the next release.
>
> And then let's try to never get into this situation again where we have a
> huge unreleased (and unreleasable) code base with 100's or 1000's of
> unreleased changes.
> Thanks!
> -- Lars
>


4.15.0 and 5.1.0 releases

2019-06-27 Thread la...@apache.org
Hi all,
we're getting close. The test suite is passing fairly reliably now (minus some
strange failure to archive the artifact in -1.4 and PartialCommitIT failing in
-1.3 only).
I put a lot of effort into speeding up the tests and making them pass. Let's
please (pretty please :) ) keep it that way. A passing, comprehensive test suite
is key to frequent releases.

I also committed and pushed some issues to 4.15.1 and 5.1.1 already. But I can't
do it alone.

There are 14 items to go for 4.15.0. Some of those are potentially serious.
https://issues.apache.org/jira/issues/?jql=project%20%3D%20PHOENIX%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20%22Patch%20Available%22)%20AND%20fixVersion%20%3D%204.15.0

And 26 items for 5.1.0
https://issues.apache.org/jira/issues/?jql=project%20%3D%20PHOENIX%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened%2C%20%22Patch%20Available%22)%20AND%20fixVersion%20%3D%205.1.0

Let's make a final push and get these done (or moved to 4.15.1/5.1.1, resp.). If
you have any issues open, please either get them committed or move them to the
next release.

And then let's try to never get into this situation again where we have a huge 
unreleased (and unreleasable) code base with 100's or 1000's of unreleased 
changes.
Thanks!
-- Lars
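
For anyone who finds the percent-encoded Jira links above hard to read, here is a minimal sketch of how the JQL filter could be recovered from such a link. It uses only the Python standard library; the helper name jql_from_url and the constant name are made up for illustration and are not part of any Phoenix or Jira tooling.

```python
from urllib.parse import parse_qs, urlsplit

# The 4.15.0 link from the message above, copied verbatim.
URL_4_15_0 = (
    "https://issues.apache.org/jira/issues/?jql=project%20%3D%20PHOENIX"
    "%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20"
    "%22Patch%20Available%22)%20AND%20fixVersion%20%3D%204.15.0"
)


def jql_from_url(url: str) -> str:
    """Return the decoded value of the 'jql' query parameter (hypothetical helper)."""
    query = urlsplit(url).query
    # parse_qs percent-decodes values and maps each key to a list of values.
    return parse_qs(query)["jql"][0]


if __name__ == "__main__":
    print(jql_from_url(URL_4_15_0))
```

The same helper should work for the 5.1.0 link, whose filter differs only in also including the Reopened status and in the fix version.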