[
https://issues.apache.org/jira/browse/SOLR-16531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17675670#comment-17675670
]
Ishan Chattopadhyaya edited comment on SOLR-16531 at 1/11/23 6:21 PM:
----------------------------------------------------------------------
bq. We have had quite a lot of large "development branches" with awesome
features get so out-of-sync over time that there was no way to merge them back
in later.
What you are arguing in favour of is a very bizzare practice: do
incremental/iterative development of major features (that are potentially
disruptive to the health of the project, if not done carefully) on stable
branch. We moved to git, and development on feature branches is one of the most
common practices followed world wide. I don't understand the rush and hurry to
keep half baked commits in stable branch, and cause disruption to others who
care about performance. I know that we have done this in the past, putting half
baked commits disrupting tests etc. But, that doesn't mean it is not a "bad
development practice". In fact, I was enlightened about the merits of working
on development branches inspired by [~hossman], when we worked together on
SOLR-5944. It took a long time to get right on a development branch, but it was
solid once it was brought in master/branch_9x. Due diligence of perf testing,
unit/integration/stress testing etc. was done on the branch first, and merged
finally.
bq. I do not agree that the way to go is keep it on a development branch,
making Jason keep it up to date for months until its reached the "threshold"
for acceptability.
I fail to understand the logic behind this. If many people are collaborating
now on main / branch_9x now, why would they not collaborate on a development
branch? No one is making Jason do anything, all of us can collectively
contribute on the development branch. Regarding "until its reached the
"threshold" for acceptability" , a massive performance degradation like this
warrants the agreement that threshold was breached. What is wrong with working
on a development branch until the threshold is no longer breached?
bq. The goal here was to make performance with JAXRS and without JAXRS similar.
Without reverting the current code and getting a baseline measurement, how do
you plan to measure that? Lets say performance benchmark (on some test) was X1
at the time of JAX RS integration, and after the fixes, the new performance
benchmark is X2, can you attribute the difference | X1 - X2 | to the overall
JAX RS integration alone, when you know there are hundreds of other commits
that have happened in the past 4 months?
bq. There is no longer any reason behind your veto
When the veto was invoked, the ASF guidelines around immediate reverting was
ignored. Even after the 1 month deadline, performance degradation is still
there (which was the "reason behind [my] veto"). Please keep in mind that I've
prioritized my work schedule around this deadline such that I'm able to start
vetting other major commits that have been introduced in last 4 months. There
should've been no reason to keep me waiting, because we could've easily
reverted this when there was a veto, and everything could've been brought back
once fixed.
bq. We can move forward with Jason's new PR and put this all behind us.
Sure, we can. If you insist that we must, I'll go your way. Then, why even
bother about keeping ourselves honest about anything to do with performance
benchmarking, if that's what everyone wants here. A very respected member of
our community once mentioned to me in private, "If we hold performance above
all else, many of us would feel a sense of 'don’t change it if it isn’t broken'
and then the project is on its death-bead IMO — fear of changing something and
getting criticized". IOW, keeping ourselves accountable performance benchmarks
wise might make Solr a dead project.
bq. This is not an acceptable way to talk about an agreed upon approach that
took a contributor months to accomplish. It is all of our jobs to foster a
healthy and supportive environment in the project and I ask to you to take that
job more seriously.
Please point to whatever you feel is unacceptable. Any piece of code that is
damaging the health of the project is a toxin that must be removed. Removing
such code is the way to "foster a healthy and supportive environment", which is
something I'm going to take seriously. As you know, presence of CDCR and
Autoscaling were detrimental to our project in various ways. Presence of a JAX
RS integration that slows down node startup by 30% is also detrimental. Please
keep in mind, any users that want to maintain a large Solr cluster and not
working in a trillion dollar company would be cost conscious regarding their
operations, and performance degradations of any kind would be bad for them.
was (Author: ichattopadhyaya):
bq. We have had quite a lot of large "development branches" with awesome
features get so out-of-sync over time that there was no way to merge them back
in later.
What you are arguing in favour of is a very bizzare practice: do
incremental/iterative development of major features (that are potentially
disruptive to the health of the project, if not done carefully) on stable
branch. We moved to git, and development on feature branches is one of the most
common practices followed world wide. I don't understand the rush and hurry to
keep half baked commits in stable branch, and cause disruption to others who
care about performance. I know that we have done this in the past, putting half
baked commits disrupting tests etc. But, that doesn't mean it is not a "bad
development practice". In fact, I was enlightened about the merits of such an
approach inspired by [~hossman], when we worked together on SOLR-5944. It took
a long time to get right on a development branch, but it was solid once it was
brought in master/branch_9x. Due diligence of perf testing,
unit/integration/stress testing etc. was done on the branch first, and merged
finally.
bq. I do not agree that the way to go is keep it on a development branch,
making Jason keep it up to date for months until its reached the "threshold"
for acceptability.
I fail to understand the logic behind this. If many people are collaborating
now on main / branch_9x now, why would they not collaborate on a development
branch? No one is making Jason do anything, all of us can collectively
contribute on the development branch. Regarding "until its reached the
"threshold" for acceptability" , a massive performance degradation like this
warrants the agreement that threshold was breached. What is wrong with working
on a development branch until the threshold is no longer breached?
bq. The goal here was to make performance with JAXRS and without JAXRS similar.
Without reverting the current code and getting a baseline measurement, how do
you plan to measure that? Lets say performance benchmark (on some test) was X1
at the time of JAX RS integration, and after the fixes, the new performance
benchmark is X2, can you attribute the difference | X1 - X2 | to the overall
JAX RS integration alone, when you know there are hundreds of other commits
that have happened in the past 4 months?
bq. There is no longer any reason behind your veto
When the veto was invoked, the ASF guidelines around immediate reverting was
ignored. Even after the 1 month deadline, performance degradation is still
there (which was the "reason behind [my] veto"). Please keep in mind that I've
prioritized my work schedule around this deadline such that I'm able to start
vetting other major commits that have been introduced in last 4 months. There
should've been no reason to keep me waiting, because we could've easily
reverted this when there was a veto, and everything could've been brought back
once fixed.
bq. We can move forward with Jason's new PR and put this all behind us.
Sure, we can. If you insist that we must, I'll go your way. Then, why even
bother about keeping ourselves honest about anything to do with performance
benchmarking, if that's what everyone wants here. A very respected member of
our community once mentioned to me in private, "If we hold performance above
all else, many of us would feel a sense of 'don’t change it if it isn’t broken'
and then the project is on its death-bead IMO — fear of changing something and
getting criticized". IOW, keeping ourselves accountable performance benchmarks
wise might make Solr a dead project.
bq. This is not an acceptable way to talk about an agreed upon approach that
took a contributor months to accomplish. It is all of our jobs to foster a
healthy and supportive environment in the project and I ask to you to take that
job more seriously.
Please point to whatever you feel is unacceptable. Any piece of code that is
damaging the health of the project is a toxin that must be removed. Removing
such code is the way to "foster a healthy and supportive environment", which is
something I'm going to take seriously. As you know, presence of CDCR and
Autoscaling were detrimental to our project in various ways. Presence of a JAX
RS integration that slows down node startup by 30% is also detrimental. Please
keep in mind, any users that want to maintain a large Solr cluster and not
working in a trillion dollar company would be cost conscious regarding their
operations, and performance degradations of any kind would be bad for them.
> Performance degradation due to introduction of JAX-RS
> -----------------------------------------------------
>
> Key: SOLR-16531
> URL: https://issues.apache.org/jira/browse/SOLR-16531
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Ishan Chattopadhyaya
> Priority: Blocker
> Fix For: 9.2
>
> Attachments: Screenshot from 2022-11-09 11-20-44.png,
> jaxrs-integration-with-app-per-configset-flamegraph.html,
> results-with-patch.tar.gz, vanilla-jaxrs-integration-flamegraph.html
>
>
> During performance benchmarking on branch_9x, I observed a slowdown in
> restart performance since commits in SOLR-16347. See attached screenshot.
> CC [~gerlowskija].
> http://mostly.cool/cluster-test-with-patch.html
> The benchmark is here:
> https://github.com/fullstorydev/solr-bench/blob/ishan/repeatable-jenkins/suites/cluster-test.json.
> This suite was run after retro-actively applying the parallelStream patch
> from SOLR-16414:
> https://github.com/apache/solr/commit/b33161d0cdd976fc0c3dc78c4afafceb4db671cf.diff
>
> Effort to automate these benchmarks is WIP and tracked here: SOLR-16525.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]