I apologize for the harsh words, and personally to Andrzej for hurting your feelings. I had no such intentions.
> You conveniently don’t mention that I WITHDREW my objection, and instead
> proposed a lenient validation (but validation nonetheless!).

Yes, let me mention that you agreed in principle to reduce the impact of the change (even though not completely revert it). I welcome that and thank you for that. By the time you replied on JIRA, I had already sent this mail.

> I see no urgency at all in this matter. This can be handled as day-to-day
> bug fixing as usual.

I think this requires an immediate notification to all users, so that they are aware of this situation before upgrading. Also, an immediate breakfix should be helpful for them.

> My feelings are hurt, and I'm greatly disappointed in your words, quick
> attacking off the cuff regularly rude (IMO) because you happened to have a
> bad day.

I apologize. How I saw things is that we have a commitment to our users to give them good quality software that they can rely on. My intention was not to attack Andrzej personally, but to bring about collective awareness regarding this problem: that we, as a community, don't care enough for our users. We need to get better at testing, get better at reviews, better at benchmarks, etc. Individually, we all have the best of intentions, and obviously so does Andrzej. However, we need to get better, and I wanted this to be a starting point in that conversation. Clearly, I got carried away, and I apologize for that.

On Tue, May 18, 2021 at 5:52 PM Andrzej Białecki <[email protected]> wrote:

> Ishan, as I pointed out in Jira I don’t care for you implying that I have
> evil intentions, I resent also your implication that I’m behaving
> irrationally or don’t care for the users. Those of you who are interested
> may read the comments in Jira and judge for themselves.
>
> You conveniently don’t mention that I WITHDREW my objection, and instead
> proposed a lenient validation (but validation nonetheless!). It’s easy to
> scream “revert! revert!” but it actually takes some consideration to
> properly address the original purpose of this change - that is, detecting
> and avoiding the corruption of replica state. Let’s focus on this and not
> on pointing fingers.
>
> As for the production outage - I’m sorry this happened to you. As I hope
> you and Noble and others are sorry for other inadvertently introduced bugs,
> which I’m sure brought down many clusters at inconvenient hours...
>
>
> On 18 May 2021, at 13:26, Ishan Chattopadhyaya <[email protected]>
> wrote:
>
> https://issues.apache.org/jira/browse/SOLR-14245
>
> There was a *production outage* at *odd hours* at my (and Noble's)
> client, due to this above change in Solr 8.5 onwards by *Andrzej Bialecki*.
>
> In short, there is some bug in Solr where a replica gets "null" as the
> node_name (upon invocation of a collection API command). On the rare
> occasions where we encountered such situations in the past, the replica
> would be unavailable and the system would work fine overall. However, this
> change (which introduces strict validation of errors while *reading*
> Replica objects) now means that if such a situation arises (where some
> Solr's APIs itself results in node_name being null in a state.json), all
> SolrJ clients and all Solr nodes will go for a toss (possibly crash, and
> not start back up).
>
> This change was rushed in, *without any discussions or review*, without
> extensive testing for the failures it will cause on existing systems where
> cluster state is messed up but system is running, and *without any
> consideration for the impact on users*.
>
> Noble and I are of the opinion that this change should be *reverted
> immediately*, considering the impact to users. However, there is *strong
> disagreement on Andrzej's part*.
>
> *Mistakes* happen, but *doubling down on them irrationally* [1] will
> destroy the reputation of the project, let alone the peace of mind of those
> who are running Solr in production.
>
> Does someone have any thoughts or opinions?
>
> [1] -
> https://issues.apache.org/jira/browse/SOLR-14245?focusedCommentId=17346758&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17346758
