I apologize for the harsh words, and I apologize personally to Andrzej for
hurting his feelings. I had no such intention.

> You conveniently don’t mention that I WITHDREW my objection, and instead
> proposed a lenient validation (but validation nonetheless!).
Yes, let me mention that you agreed in principle to reduce the impact of
the change (though not to revert it completely). I welcome that and thank
you for it. By the time you replied on JIRA, I had already sent this mail.

> I see no urgency at all in this matter. This can be handled as day-to-day
> bug fixing as usual.
I think all users should be notified immediately, so that they are aware of
this situation before upgrading. An immediate breakfix would also help
them.

> My feelings are hurt, and I'm greatly disappointed in your words, quick
> attacking off the cuff regularly rude (IMO) because you happened to have a
> bad day.
I apologize.

The way I saw it, we have a commitment to our users to give them
good quality software that they can rely on. My intention was not to attack
Andrzej personally, but to bring about collective awareness of this
problem: that we, as a community, don't care enough for our users. We need
to get better at testing, at reviews, at benchmarks, etc.
Individually, we all have the best of intentions, and obviously so does
Andrzej. However, we need to get better, and I wanted this to be a starting
point for that conversation. Clearly, I got carried away, and I apologize
for that.

On Tue, May 18, 2021 at 5:52 PM Andrzej Białecki <[email protected]> wrote:

> Ishan, as I pointed out in Jira I don’t care for you implying that I have
> evil intentions, I resent also your implication that I’m behaving
> irrationally or don’t care for the users. Those of you who are interested
> may read the comments in Jira and judge for themselves.
>
> You conveniently don’t mention that I WITHDREW my objection, and instead
> proposed a lenient validation (but validation nonetheless!). It’s easy to
> scream “revert! revert!” but it actually takes some consideration to
> properly address the original purpose of this change - that is, detecting
> and avoiding the corruption of replica state. Let’s focus on this and not
> on pointing fingers.
>
> As for the production outage - I’m sorry this happened to you. As I hope
> you and Noble and others are sorry for other inadvertently introduced bugs,
> which I’m sure brought down many clusters at inconvenient hours...
>
>
> On 18 May 2021, at 13:26, Ishan Chattopadhyaya <[email protected]>
> wrote:
>
> https://issues.apache.org/jira/browse/SOLR-14245
>
> There was a *production outage* at *odd hours* at my (and Noble's)
> client, due to this above change in Solr 8.5 onwards by *Andrzej Bialecki*.
>
> In short, there is some bug in Solr where a replica gets "null" as the
> node_name (upon invocation of a collection API command). On the rare
> occasions where we encountered such situations in the past, the replica
> would be unavailable and the system would work fine overall. However, this
> change (which introduces strict validation of errors while *reading*
> Replica objects) now means that if such a situation arises (where one of
> Solr's own APIs results in node_name being null in a state.json), all
> SolrJ clients and all Solr nodes will go for a toss (possibly crash, and
> not start back up).
>
> This change was rushed in, *without any discussions or review*, without
> extensive testing for the failures it will cause on existing systems where
> cluster state is messed up but system is running, and *without any
> consideration for the impact on users*.
>
> Noble and I are of the opinion that this change should be *reverted
> immediately*, considering the impact to users. However, there is *strong
> disagreement on Andrzej's part*.
>
> *Mistakes* happen, but *doubling down on them irrationally* [1] will
> destroy the reputation of the project, let alone the peace of mind of those
> who are running Solr in production.
>
> Does someone have any thoughts or opinions?
>
> [1] -
> https://issues.apache.org/jira/browse/SOLR-14245?focusedCommentId=17346758&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17346758
>
>
>
