Re: 4.15.0 and 5.1.0 releases
Great! Unless I hear otherwise I will steer people away from new features on the Jira - delay them to 4.15.1 and 5.1.1 to be specific.

I suppose we have a testing gap for upgrades. These are hard to do in an automated way, though. If someone has some cycles to think about how to do that in an IT that would be valuable and appreciated. Or perhaps we do not have the right strategy in general and code and metadata are coupled too tightly...? Also something to think about.

Let's close the final Jiras soon. And then we can release a SNAPSHOT/beta/whatever for people to bang against.

And let's keep the test suite passing. And if you happen to look at some test code, also think about the performance. It is much more valuable to have a test suite that can pass in 1h than one that takes 2 or 3h to run.

One last question... Should we revert the ViewIndexId changes and push them to 4.15.1/5.1.1? Is there a pressing need for those? (I suppose with the way we use Views at Salesforce there probably is, just want to confirm.)

-- Lars

On Friday, June 28, 2019, 2:50:06 PM PDT, Geoffrey Jacoby wrote:
Re: 4.15.0 and 5.1.0 releases
Lars,

That sounds good to me -- whether the thing people test is called "beta" or "-SNAPSHOT", the important thing is that our code base is well-tested and, as you say, something we all have confidence in.

In addition to splittable syscat (which I used as an example not because of quality concerns but because it's very large, very central and one-way), other changes in 4.15 / 5.1 that might need attention for upgrade testing are the optional increase of ViewIndexId from a short to an int (PHOENIX-3547), and my own changes to fix a bug in ViewIndexId generation in PHOENIX-5132 / 5138. (The bug fix was simple; making it upgrade pre-existing view index sequences in a safe way was hard.) There are likely others I've forgotten or don't know about.

The index changes also require upgrade and perf testing (some of which has been done, with good results, but more to go), but the nice thing there is that they're feature-flagged (opt-out for new tables/indexes, opt-in for existing ones via the upgrade tool in PHOENIX-5333) and operators can switch back to the old design (even for new tables and indexes) if they need to, using the upgrade tool's rollback option. So once testing is complete I think it's fine for them to go in 4.14.3.

Geoffrey

On Fri, Jun 28, 2019 at 12:20 PM la...@apache.org wrote:
Re: 4.15.0 and 5.1.0 releases
Any further comments?

I offered to be the RM for 4.15.0, and I stand by that. I can't do it alone, though. Do we have consensus on the rough course of action below? Any other ideas? How far are we from a reasonable 4.15.0 release?

IMHO we urgently need to apply some software engineering principles, namely an always releasable code base and small, frequent releases.

-- Lars

On Thursday, June 27, 2019, 3:07:27 PM PDT, la...@apache.org wrote:
Re: 4.15.0 and 5.1.0 releases
Thanks Geoffrey.

The damage is already done. We messed up and let it slide (multiple times, this is by no means the first time) and thus are in exactly the situation you outlined: No confidence in the code base.

Now we can only look forward and get the code into a releasable state. The most important aspects are - as you point out, and I agree - getting confidence in splittable syscat and finishing the indexing work.

In hindsight we should have done a release right before splittable syscat and perhaps one right after. Oh well. :)

Could you mark the Jiras you remember with 4.15.0 and 5.1.0 fix versions (or are you saying you did already?)

Since you say that we can release 4.14.3 with just the index changes, does that imply that you are mostly concerned about splittable syscat in 4.15.0 and 5.1.0?

I'm not a fan of a "beta" release, honestly. We can only do as well as we can and release a version that we believe in good conscience has no major issues. All releases will contain some bugs that are found later. It seems we are not even at that point yet... The good conscience part. :)

How about we institute an immediate absolutely-no-new-feature policy for *all* of Phoenix until we have a releasable project? I'd be happy to enforce that. One cannot add new features to a code base that is not releasable/stable anyway. Until a few weeks ago we *never* had a passing test run. I really don't understand how we get here over and over again. But whatever, it's too late, and whining surely doesn't help.

Lemme propose the following action plan then based on this and what you said:

1. We release 4.14.3 with just the index changes. Soon.
2. We immediately stop all new feature development in all branches (including 5.x, i.e. master)
3. We harden/test/etc splittable syscat as well as other accumulated tech debt that we identify.
4. After we release 4.15.0 and 5.1.0 we allow feature work again.
5. Following those releases we do strictly monthly releases on all branches (and if we cannot do that, declare a branch dead)

Some of these (especially #5) might be radical, but if we want to avoid this situation again we need to apply some rigor. As is, Phoenix has been turning into an almost unmaintainable project over the past years; we need to actively counter that.

Cheers!

-- Lars

On Thursday, June 27, 2019, 1:36:37 PM PDT, Geoffrey Jacoby wrote:
Re: 4.15.0 and 5.1.0 releases
Lars,

I agree 100% that we should have smaller, more frequent releases going forward. As for this release, I have two concerns.

The first is indexes. I've added to 4.15 / 5.1 several JIRAs that had incorrectly been left without a Fix Version. These are all part of the Self-Repairing Index project, which spans several JIRAs and whose first major one (PHOENIX-5156, allowing newly created mutable indexes to self-repair inconsistencies at read time) is already in 4.15 and 5.1. Outstanding JIRAs include PHOENIX-5211, to extend the logic to immutable indexes, and PHOENIX-5333, to give users a tool to convert their legacy indexes to the new model. These are all under review and should land very soon. Especially given the multiple reports on the user list of operators encountering index consistency problems (which I have also seen in my own environments), I think it's important that our next release include these fixes, and that they go out in a unified way.

The second concern is testing, particularly upgrade, perf and chaos testing. In addition to the large index changes (for which I know some perf work and live-cluster testing has been done, with more planned), there are other major changes in 4.15 such as the splittable system catalog. If all the issues on the current list were fixed, I'd still be reluctant to put the bits into production without more due diligence. We've released binaries with significant regressions in them that were missed in our test suites before, and it's important to avoid that this time.

Yet Lars's point that we've waited far too long to release is of course correct. Perhaps the solution is to do what the HBase community did when the 2.x branch dragged out too long: after the listed issues are Fixed, we release an explicit beta, closed to new features, from which a final release can graduate. In parallel, we could release a 4.14.3 with just the index changes and the current diff from 4.14.2 so users get those faster.

Or maybe our testing's advanced further than I know about, and we're closer to green than I think. Happy to hear everyone's thoughts.

Geoffrey

On Thu, Jun 27, 2019 at 10:26 AM la...@apache.org wrote:
4.15.0 and 5.1.0 releases
Hi all,

we're getting close. The test suite is passing fairly reliably now (minus some strange failure to archive the artifact in -1.4 and PartialCommitIT failing in -1.3 only).

I put a lot of effort into speeding up the tests and making them pass. Let's please (pretty please :) ) keep it that way. A passing, comprehensive test suite is key to frequent releases.

I also committed and pushed some issues to 4.15.1 and 5.1.1 already. But I can't do it alone.

There are 14 items to go for 4.15.0. Some of those are potentially serious.
https://issues.apache.org/jira/issues/?jql=project%20%3D%20PHOENIX%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20%22Patch%20Available%22)%20AND%20fixVersion%20%3D%204.15.0

And 26 items for 5.1.0:
https://issues.apache.org/jira/issues/?jql=project%20%3D%20PHOENIX%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened%2C%20%22Patch%20Available%22)%20AND%20fixVersion%20%3D%205.1.0

Let's make a final push and get these done (or moved to 4.15.1/5.1.1, resp.). If you have any issues open, please either get them committed or move them to the next release.

And then let's try to never get into this situation again where we have a huge unreleased (and unreleasable) code base with 100's or 1000's of unreleased changes.

Thanks!

-- Lars