[ 
https://issues.apache.org/jira/browse/SOLR-16531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17675670#comment-17675670
 ] 

Ishan Chattopadhyaya edited comment on SOLR-16531 at 1/11/23 6:21 PM:
----------------------------------------------------------------------

bq. We have had quite a lot of large "development branches" with awesome 
features get so out-of-sync over time that there was no way to merge them back 
in later.

What you are arguing in favour of is a very bizzare practice: do 
incremental/iterative development of major features (that are potentially 
disruptive to the health of the project, if not done carefully) on stable 
branch. We moved to git, and development on feature branches is one of the most 
common practices followed world wide. I don't understand the rush and hurry to 
keep half baked commits in stable branch, and cause disruption to others who 
care about performance. I know that we have done this in the past, putting half 
baked commits disrupting tests etc. But, that doesn't mean it is not a "bad 
development practice". In fact, I was enlightened about the merits of working 
on development branches inspired by [~hossman], when we worked together on 
SOLR-5944. It took a long time to get right on a development branch, but it was 
solid once it was brought in master/branch_9x. Due diligence of perf testing, 
unit/integration/stress testing etc. was done on the branch first, and merged 
finally.

bq. I do not agree that the way to go is keep it on a development branch, 
making Jason keep it up to date for months until its reached the "threshold" 
for acceptability.

I fail to understand the logic behind this. If many people are collaborating 
now on main / branch_9x now, why would they not collaborate on a development 
branch? No one is making Jason do anything, all of us can collectively 
contribute on the development branch. Regarding "until its reached the 
"threshold" for acceptability" , a massive performance degradation like this 
warrants the agreement that threshold was breached. What is wrong with working 
on a development branch until the threshold is no longer breached?

bq. The goal here was to make performance with JAXRS and without JAXRS similar.

Without reverting the current code and getting a baseline measurement, how do 
you plan to measure that? Lets say performance benchmark (on some test) was X1 
at the time of JAX RS integration, and after the fixes, the new performance 
benchmark is X2, can you attribute the difference | X1 - X2 | to the overall 
JAX RS integration alone, when you know there are hundreds of other commits 
that have happened in the past 4 months?

bq. There is no longer any reason behind your veto
When the veto was invoked, the ASF guidelines around immediate reverting was 
ignored. Even after the 1 month deadline, performance degradation is still 
there (which was the "reason behind [my] veto"). Please keep in mind that I've 
prioritized my work schedule around this deadline such that I'm able to start 
vetting other major commits that have been introduced in last 4 months. There 
should've been no reason to keep me waiting, because we could've easily 
reverted this when there was a veto, and everything could've been brought back 
once fixed.

bq. We can move forward with Jason's new PR and put this all behind us.
Sure, we can. If you insist that we must, I'll go your way. Then, why even 
bother about keeping ourselves honest about anything to do with performance 
benchmarking, if that's what everyone wants here. A very respected member of 
our community once mentioned to me in private, "If we hold performance above 
all else, many of us would feel a sense of 'don’t change it if it isn’t broken' 
and then the project is on its death-bead IMO — fear of changing something and 
getting criticized". IOW, keeping ourselves accountable performance benchmarks 
wise might make Solr a dead project.

bq. This is not an acceptable way to talk about an agreed upon approach that 
took a contributor months to accomplish. It is all of our jobs to foster a 
healthy and supportive environment in the project and I ask to you to take that 
job more seriously.

Please point to whatever you feel is unacceptable. Any piece of code that is 
damaging the health of the project is a toxin that must be removed. Removing 
such code is the way to "foster a healthy and supportive environment", which is 
something I'm going to take seriously. As you know, presence of CDCR and 
Autoscaling were detrimental to our project in various ways. Presence of a JAX 
RS integration that slows down node startup by 30% is also detrimental. Please 
keep in mind, any users that want to maintain a large Solr cluster and not 
working in a trillion dollar company would be cost conscious regarding their 
operations, and performance degradations of any kind would be bad for them.



was (Author: ichattopadhyaya):
bq. We have had quite a lot of large "development branches" with awesome 
features get so out-of-sync over time that there was no way to merge them back 
in later.

What you are arguing in favour of is a very bizzare practice: do 
incremental/iterative development of major features (that are potentially 
disruptive to the health of the project, if not done carefully) on stable 
branch. We moved to git, and development on feature branches is one of the most 
common practices followed world wide. I don't understand the rush and hurry to 
keep half baked commits in stable branch, and cause disruption to others who 
care about performance. I know that we have done this in the past, putting half 
baked commits disrupting tests etc. But, that doesn't mean it is not a "bad 
development practice". In fact, I was enlightened about the merits of such an 
approach inspired by [~hossman], when we worked together on SOLR-5944. It took 
a long time to get right on a development branch, but it was solid once it was 
brought in master/branch_9x. Due diligence of perf testing, 
unit/integration/stress testing etc. was done on the branch first, and merged 
finally.

bq. I do not agree that the way to go is keep it on a development branch, 
making Jason keep it up to date for months until its reached the "threshold" 
for acceptability.

I fail to understand the logic behind this. If many people are collaborating 
now on main / branch_9x now, why would they not collaborate on a development 
branch? No one is making Jason do anything, all of us can collectively 
contribute on the development branch. Regarding "until its reached the 
"threshold" for acceptability" , a massive performance degradation like this 
warrants the agreement that threshold was breached. What is wrong with working 
on a development branch until the threshold is no longer breached?

bq. The goal here was to make performance with JAXRS and without JAXRS similar.

Without reverting the current code and getting a baseline measurement, how do 
you plan to measure that? Lets say performance benchmark (on some test) was X1 
at the time of JAX RS integration, and after the fixes, the new performance 
benchmark is X2, can you attribute the difference | X1 - X2 | to the overall 
JAX RS integration alone, when you know there are hundreds of other commits 
that have happened in the past 4 months?

bq. There is no longer any reason behind your veto
When the veto was invoked, the ASF guidelines around immediate reverting was 
ignored. Even after the 1 month deadline, performance degradation is still 
there (which was the "reason behind [my] veto"). Please keep in mind that I've 
prioritized my work schedule around this deadline such that I'm able to start 
vetting other major commits that have been introduced in last 4 months. There 
should've been no reason to keep me waiting, because we could've easily 
reverted this when there was a veto, and everything could've been brought back 
once fixed.

bq. We can move forward with Jason's new PR and put this all behind us.
Sure, we can. If you insist that we must, I'll go your way. Then, why even 
bother about keeping ourselves honest about anything to do with performance 
benchmarking, if that's what everyone wants here. A very respected member of 
our community once mentioned to me in private, "If we hold performance above 
all else, many of us would feel a sense of 'don’t change it if it isn’t broken' 
and then the project is on its death-bead IMO — fear of changing something and 
getting criticized". IOW, keeping ourselves accountable performance benchmarks 
wise might make Solr a dead project.

bq. This is not an acceptable way to talk about an agreed upon approach that 
took a contributor months to accomplish. It is all of our jobs to foster a 
healthy and supportive environment in the project and I ask to you to take that 
job more seriously.

Please point to whatever you feel is unacceptable. Any piece of code that is 
damaging the health of the project is a toxin that must be removed. Removing 
such code is the way to "foster a healthy and supportive environment", which is 
something I'm going to take seriously. As you know, presence of CDCR and 
Autoscaling were detrimental to our project in various ways. Presence of a JAX 
RS integration that slows down node startup by 30% is also detrimental. Please 
keep in mind, any users that want to maintain a large Solr cluster and not 
working in a trillion dollar company would be cost conscious regarding their 
operations, and performance degradations of any kind would be bad for them.


> Performance degradation due to introduction of JAX-RS
> -----------------------------------------------------
>
>                 Key: SOLR-16531
>                 URL: https://issues.apache.org/jira/browse/SOLR-16531
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Ishan Chattopadhyaya
>            Priority: Blocker
>             Fix For: 9.2
>
>         Attachments: Screenshot from 2022-11-09 11-20-44.png, 
> jaxrs-integration-with-app-per-configset-flamegraph.html, 
> results-with-patch.tar.gz, vanilla-jaxrs-integration-flamegraph.html
>
>
> During performance benchmarking on branch_9x, I observed a slowdown in 
> restart performance since commits in SOLR-16347. See attached screenshot.
> CC [~gerlowskija].
> http://mostly.cool/cluster-test-with-patch.html
> The benchmark is here: 
> https://github.com/fullstorydev/solr-bench/blob/ishan/repeatable-jenkins/suites/cluster-test.json.
>  This suite was run after retro-actively applying the parallelStream patch 
> from SOLR-16414: 
> https://github.com/apache/solr/commit/b33161d0cdd976fc0c3dc78c4afafceb4db671cf.diff
>  
> Effort to automate these benchmarks is WIP and tracked here: SOLR-16525.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to