On Wed, Jul 25, 2018 at 10:48 PM, John Strunk <jstr...@redhat.com> wrote:
> I have not put together a list. Perhaps the following will help w/ the
> context though...
>
> The "reconcile loop" of the operator will take the cluster CRs and
> reconcile them against the actual cluster config. At the 20k foot level,
> this amounts to something like determining there should be 8 gluster pods
> running, and making the appropriate changes if that doesn't match reality.
> In practical terms, the construction of this reconciliation loop can be
> thought of as a set (array) of 3-tuples:
> [{should_act() -> bool, can_act() -> bool, action() -> ok, error}, {..., ..., ...}, ...]
>
> Each capability of the operator would be expressed as one of these tuples.
>   should_act() : true if the action() should be taken
>   can_act()    : true if the prerequisites for taking the action are met
>   action()     : make the change. Only run if should && can.
> (Note that I believe should_act() and can_act() should not be separate in
> the implementation, for reasons I'll not go into here.)
>
> An example action might be "upgrade the container image for pod X". The
> associated should_act() would be triggered if the "image=" of the pod
> doesn't match the desired "image=" in the operator CRs. The can_act()
> evaluation would verify that it's ok to do this... Thinking off the top of
> my head:
> - All volumes w/ a brick on this pod should be fully healed
> - Sufficient cluster nodes should be up such that quorum is not lost when
>   this node goes down (does this matter?)
> - The proposed image is compatible with the current version of the CSI
>   driver(s), the operator, and other gluster pods
> - Probably some other stuff
> The action() would update the "image=" in the Deployment to trigger the
> rollout.
>
> The idea is that queries would be made, both to the kube API and the
> gluster cluster, to verify the necessary preconditions for an action prior
> to that action being invoked.
> There would obviously be commonality among the preconditions for various
> actions, so the results should be fetched exactly once per reconcile
> cycle. Also note, 1 cycle == at most 1 action(), due to the action
> changing the state of the system.
>
> Given that we haven't designed (or even listed) all the potential
> action()s, I can't give you a list of everything to query. I guarantee
> we'll need to know the up/down status, heal counts, and free capacity for
> each brick and node.

Thanks for the detailed explanation. This helps. One question though: is 5
seconds a hard limit, or is there a possibility to configure it?

> -John
>
> On Wed, Jul 25, 2018 at 11:56 AM Pranith Kumar Karampuri <
> pkara...@redhat.com> wrote:
>
>> On Wed, Jul 25, 2018 at 8:17 PM, John Strunk <jstr...@redhat.com> wrote:
>>
>>> To add an additional data point... The operator will need to regularly
>>> reconcile the true state of the gluster cluster with the desired state
>>> stored in kubernetes. This task will be required frequently (i.e.,
>>> operator-framework defaults to every 5s even if there are no config
>>> changes).
>>> The actual amount of data we will need to query from the cluster is
>>> currently TBD and likely significantly affected by the Heketi/GD1 vs.
>>> GD2 choice.
>>
>> Do we have any partial list of data we will gather? Just want to
>> understand what this might entail already...
>>
>>> -John
>>>
>>> On Wed, Jul 25, 2018 at 5:45 AM Pranith Kumar Karampuri <
>>> pkara...@redhat.com> wrote:
>>>
>>>> On Tue, Jul 24, 2018 at 10:10 PM, Sankarshan Mukhopadhyay <
>>>> sankarshan.mukhopadh...@gmail.com> wrote:
>>>>
>>>>> On Tue, Jul 24, 2018 at 9:48 PM, Pranith Kumar Karampuri
>>>>> <pkara...@redhat.com> wrote:
>>>>> > hi,
>>>>> > Quite a few commands to monitor gluster at the moment take almost a
>>>>> > second to give output.
>>>>>
>>>>> Is this at the (most) minimum recommended cluster size?
>>>> Yes, with a single volume with 3 bricks, i.e. 3 nodes in the cluster.
>>>>
>>>>> > Some categories of these commands:
>>>>> > 1) Any command that needs to do some sort of mount/glfs_init.
>>>>> > Examples: 1) heal info family of commands 2) statfs to find
>>>>> > space-availability etc. (On my laptop, on a replica 3 volume with
>>>>> > all local bricks, glfs_init takes 0.3 seconds on average.)
>>>>> > 2) glusterd commands that need to wait for the previous command to
>>>>> > unlock. If the previous command is something related to lvm
>>>>> > snapshot, which takes quite a few seconds, it would be even more
>>>>> > time consuming.
>>>>> >
>>>>> > Nowadays container workloads have hundreds of volumes, if not
>>>>> > thousands. If we want to serve any monitoring solution at this
>>>>> > scale (I have seen customers use up to 600 volumes at a time; it
>>>>> > will only get bigger), and let's say collecting metrics takes 2
>>>>> > seconds per volume (let us take the worst example, which has all
>>>>> > major features enabled like snapshot/geo-rep/quota etc.), that will
>>>>> > mean it will take 20 minutes to collect metrics of a cluster with
>>>>> > 600 volumes. What are the ways in which we can make this number
>>>>> > more manageable? I was initially thinking maybe it is possible to
>>>>> > get gd2 to execute commands in parallel on different volumes, so
>>>>> > potentially we could get this done in ~2 seconds. But quite a few
>>>>> > of the metrics need a mount or equivalent of a mount (glfs_init) to
>>>>> > collect different information like statfs, number of pending heals,
>>>>> > quota usage etc. This may lead to high memory usage, as the size of
>>>>> > the mounts tends to be high.
>>>>>
>>>>> I am not sure if starting from the "worst example" (it certainly is
>>>>> not) is a good place to start from.
>>>>
>>>> I didn't understand your statement.
>>>> Are you saying 600 volumes is a worst example?
>>>>
>>>>> That said, for any environment with that number of disposable
>>>>> volumes, what kind of metrics do actually make any sense/impact?
>>>>
>>>> The same metrics you track for long-running volumes. It is just that
>>>> the way the metrics are interpreted will be different. On a
>>>> long-running volume, you would look at the metrics and try to find why
>>>> the volume did not give the expected performance in the last hour,
>>>> whereas in this case, you would look at the metrics and find the
>>>> reason why volumes that were created and deleted in the last hour
>>>> didn't give the expected performance.
>>>>
>>>>> > I wanted to seek suggestions from others on how to come to a
>>>>> > conclusion about which path to take and what problems to solve.
>>>>> >
>>>>> > I will be happy to raise github issues based on our conclusions on
>>>>> > this mail thread.
>>>>> >
>>>>> > --
>>>>> > Pranith
>>>>>
>>>>> --
>>>>> sankarshan mukhopadhyay
>>>>> <https://about.me/sankarshan.mukhopadhyay>
>>>>
>>>> --
>>>> Pranith
>>
>> --
>> Pranith

--
Pranith
_______________________________________________
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel