joshsouza commented on issue #471:
URL: https://github.com/apache/solr-operator/issues/471#issuecomment-1249574050

   Thanks for all the thoughtful discussion. It hadn't even occurred to me to
   do a per-pod pdb, but that makes a ton of sense given the context, and I
   would say that's probably the most viable near-term solution (since so much
   is up in the air for future k8s revisions, and we wouldn't want to
   require bleeding edge k8s to run Solr safely).
   
   That said, I think it's worth taking the time to do this right, get other
   voices, and test things out. In the interim, my team is proceeding with a
   cluster-wide PDB, flipping its availability budget between 0 and 1, in
   order to be overly cautious.
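   As a rough sketch, such a cluster-wide PDB might look like the manifest
   below. The names and labels here are hypothetical placeholders, not
   necessarily what the Solr Operator generates:

   ```yaml
   # Hypothetical cluster-wide PDB for a SolrCloud. Flipping
   # maxUnavailable between 0 and 1 blocks or allows voluntary
   # disruptions (evictions, drains) for the whole cluster.
   apiVersion: policy/v1
   kind: PodDisruptionBudget
   metadata:
     name: example-solrcloud-pdb
   spec:
     maxUnavailable: 0   # set to 1 when it is safe to disrupt a pod
     selector:
       matchLabels:
         solr-cloud: example   # placeholder label; match your pods' labels
   ```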
   
   I think that's a reasonable option as a stop-gap for us, but I'd love to
   help where I can in making this a first-party solution.
   
   How can I best help out?
   
   On Fri, Sep 16, 2022, 8:58 AM Houston Putman ***@***.***>
   wrote:
   
   > Just had a thought on this after perusing the docs further to see if
   > there's anything I could find to support our end goals within current
   > constraints:
   > 
https://kubernetes.io/docs/tasks/run-application/configure-pdb/#specifying-a-poddisruptionbudget
   > I can specify a disruption maxUnavailable of 0. This will prevent any
   > voluntary disruptions entirely.
   >
   > That is very interesting, and could certainly be something for us to look
   > into.
   >
   > If we go further down that idea, we *could* have a PDB for each pod
   > individually, and basically set the minAvailable to either 0 or 1
   > depending on whether it's ok to take down that pod at any given time (given
   > the same logic we use for restarts). That gives us a much more fine-tuned
   > ability to control this.
   >
   > It also occurred to me that if each SolrCloud had a PDB with a
   > maxUnavailable of 0 at all times, the Solr Operator could monitor the
   > cluster for node rotation behavior
   >
   > This is probably the best solution, if we can get it right. There are
   > things the Solr Operator generally wants to control before letting a pod
   > get deleted, such as moving replicas off of a Solr node with ephemeral
   > data. So if we are able to do that, then I think we should go for it.
   >
   > The new DisruptionCondition
   > 
<https://kubernetes.io/docs/concepts/workloads/pods/disruptions/#pod-disruption-conditions>
   > stuff might give us that info, but it's alpha in v1.25, so probably won't
   > be available by default for at least a few more versions. I'm also not sure
   > if it will put the condition on the pod if the PDB says not to delete it...
   > But it would certainly be the easiest way forward if we wanted to do this.
   >
   > Either way, we don't need to be perfect from the beginning. I say that for
   > now, we either go cluster-wide PDB or do per-pod PDBs. But I absolutely
   > love this discussion, and with a few new versions of Kubernetes, we can
   > probably get this to an amazing place.
   >
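The per-pod PDB idea quoted above could be sketched as one budget per Solr pod, selecting that single pod via the `statefulset.kubernetes.io/pod-name` label that the StatefulSet controller sets on its pods. The names below are hypothetical:

```yaml
# Hypothetical per-pod PDB: one of these per Solr pod, with
# minAvailable flipped between 1 (pod locked) and 0 (safe to evict),
# using the same readiness logic the operator uses for restarts.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-solrcloud-solr-0-pdb
spec:
  minAvailable: 1   # set to 0 when this pod may be taken down
  selector:
    matchLabels:
      statefulset.kubernetes.io/pod-name: example-solrcloud-solr-0
```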
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

