Yeah, that's true. I wonder if we could get iSCSI information from the
SolidFire API. That way the agent is only responsible for the iSCSI login to
the fake volume, and we just monitor the API for connect time.
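
Roughly what I am picturing on the management side is the sketch below, which
just checks whether the heartbeat volume still has an open session. It assumes
the Element API exposes an iSCSI session listing (ListISCSISessions here is an
assumption), and the MVIP, credentials, API version and field names are only
illustrative:

import requests

# Rough, untested sketch: ask the cluster which iSCSI sessions are open and
# check whether the per-host heartbeat volume still has one. Assumes a
# ListISCSISessions call on the usual JSON-RPC endpoint; MVIP, credentials
# and API version are illustrative.
def host_heartbeat_session_open(mvip, user, password, heartbeat_volume_id):
    resp = requests.post(
        f"https://{mvip}/json-rpc/11.0",
        json={"method": "ListISCSISessions", "params": {}, "id": 1},
        auth=(user, password),
        verify=False,  # sketch only; use proper certificates in production
    )
    sessions = resp.json()["result"]["sessions"]
    # Since the fake volume is dedicated to one host, an open session on it
    # means that host's initiator is still logged in, even if the agent died.
    return any(s.get("volumeID") == heartbeat_volume_id for s in sessions)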

On Mon, Jun 8, 2020 at 12:17 PM Paul Angus <paul.an...@shapeblue.com> wrote:

> Doesn’t the same problem exist with that?  If the agent dies, then the
> fake volume will stop being updated but the VMs would still be running.
>
> paul.an...@shapeblue.com
> www.shapeblue.com
> 3 London Bridge Street,  3rd floor, News Building, London  SE1 9SGUK
> @shapeblue
>
>
>
>
> -----Original Message-----
> From: Syed Ahmed <sah...@cloudops.com.INVALID>
> Sent: 08 June 2020 17:14
> To: dev@cloudstack.apache.org
> Subject: Re: Managed Storage and HA
>
> My suggestion would be to use a "fake" volume for each host and use it to
> check whether the host is active. The volume can be updated by the agent
> periodically, and then we can use the above API from the management server
> to query the volume activity.
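>
> Roughly what I mean on the agent side is sketched below (untested; the
> device path and interval are hypothetical). The management server can then
> see the resulting write activity through the ListVolumeStats call Mike
> showed below.
>
> import os
> import struct
> import time
>
> # Sketch of the agent-side heartbeat: periodically write the current
> # timestamp to the per-host "fake" volume so the array keeps seeing write
> # activity on it. The device path is illustrative; in practice it would be
> # the iSCSI LUN this host logged in to.
> HEARTBEAT_DEVICE = "/dev/disk/by-id/heartbeat-lun"  # hypothetical path
> INTERVAL_SECONDS = 10
>
> def write_heartbeat():
>     # O_SYNC so the write reaches the array instead of only the page cache.
>     fd = os.open(HEARTBEAT_DEVICE, os.O_WRONLY | os.O_SYNC)
>     try:
>         buf = bytearray(4096)
>         struct.pack_into("<d", buf, 0, time.time())
>         os.write(fd, bytes(buf))
>     finally:
>         os.close(fd)
>
> if __name__ == "__main__":
>     while True:
>         write_heartbeat()
>         time.sleep(INTERVAL_SECONDS)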
>
> On Tue, Jun 2, 2020 at 1:18 PM Tutkowski, Mike <mike.tutkow...@netapp.com>
> wrote:
>
> > Hi Sven,
> >
> > You can use the ListVolumeStats API call (I put in an example request
> > and response below).
> >
> > Since this goes over the management network, though, it's possible that
> > if your management network is down but your storage network is up, this
> > call could fail even though your VMs still have perfectly good access to
> > their volumes.
> >
> > Talk to you later!
> > Mike
> >
> > Request:
> >
> > {
> >    "method": "ListVolumeStats",
> >    "params": {
> >         "volumeIDs": [1, 2]
> >    },
> >    "id" : 1
> > }
> >
> > Response:
> >
> > {
> >     "id": 1,
> >     "result": {
> >         "volumeStats": [
> >             {
> >                 "accountID": 1,
> >                 "actualIOPS": 14,
> >                 "asyncDelay": null,
> >                 "averageIOPSize": 13763,
> >                 "burstIOPSCredit": 0,
> >                 "clientQueueDepth": 0,
> >                 "desiredMetadataHosts": null,
> >                 "latencyUSec": 552,
> >                 "metadataHosts": {
> >                     "deadSecondaries": [],
> >                     "liveSecondaries": [],
> >                     "primary": 5
> >                 },
> >                 "nonZeroBlocks": 10962174,
> >                 "normalizedIOPS": 34,
> >                 "readBytes": 747306804224,
> >                 "readBytesLastSample": 0,
> >                 "readLatencyUSec": 0,
> >                 "readLatencyUSecTotal": 11041939920,
> >                 "readOps": 19877559,
> >                 "readOpsLastSample": 0,
> >                 "samplePeriodMSec": 500,
> >                 "throttle": 0,
> >                 "timestamp": "2020-06-02T17:14:35.444789Z",
> >                 "unalignedReads": 2176454,
> >                 "unalignedWrites": 1438822,
> >                 "volumeAccessGroups": [
> >                     1
> >                 ],
> >                 "volumeID": 1,
> >                 "volumeSize": 2147483648000,
> >                 "volumeUtilization": 0.002266666666666667,
> >                 "writeBytes": 3231402834432,
> >                 "writeBytesLastSample": 106496,
> >                 "writeLatencyUSec": 552,
> >                 "writeLatencyUSecTotal": 44174792405,
> >                 "writeOps": 340339085,
> >                 "writeOpsLastSample": 7,
> >                 "zeroBlocks": 513325826
> >             },
> >             {
> >                 "accountID": 1,
> >                 "actualIOPS": 0,
> >                 "asyncDelay": null,
> >                 "averageIOPSize": 11261,
> >                 "burstIOPSCredit": 0,
> >                 "clientQueueDepth": 0,
> >                 "desiredMetadataHosts": null,
> >                 "latencyUSec": 0,
> >                 "metadataHosts": {
> >                     "deadSecondaries": [],
> >                     "liveSecondaries": [],
> >                     "primary": 5
> >                 },
> >                 "nonZeroBlocks": 28816654,
> >                 "normalizedIOPS": 0,
> >                 "readBytes": 778768996864,
> >                 "readBytesLastSample": 0,
> >                 "readLatencyUSec": 0,
> >                 "readLatencyUSecTotal": 7068679159,
> >                 "readOps": 14977610,
> >                 "readOpsLastSample": 0,
> >                 "samplePeriodMSec": 500,
> >                 "throttle": 0,
> >                 "timestamp": "2020-06-02T17:14:35.445978Z",
> >                 "unalignedReads": 890959,
> >                 "unalignedWrites": 358758,
> >                 "volumeAccessGroups": [
> >                     1
> >                 ],
> >                 "volumeID": 2,
> >                 "volumeSize": 2147483648000,
> >                 "volumeUtilization": 0,
> >                 "writeBytes": 8957684071424,
> >                 "writeBytesLastSample": 0,
> >                 "writeLatencyUSec": 0,
> >                 "writeLatencyUSecTotal": 16780712096,
> >                 "writeOps": 406101472,
> >                 "writeOpsLastSample": 0,
> >                 "zeroBlocks": 495471346
> >             }
> >         ]
> >     }
> > }
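> >
> > If it helps, below is a rough, untested sketch of how the management side
> > could poll that call and decide whether a volume still looks active. The
> > MVIP, credentials and API version in the URL are illustrative; the field
> > names match the response above.
> >
> > import time
> >
> > import requests
> >
> > # Call ListVolumeStats twice and compare the cumulative read/write op
> > # counters; if they move between samples, something is still doing I/O
> > # to that volume.
> > def list_volume_stats(mvip, user, password, volume_ids):
> >     resp = requests.post(
> >         f"https://{mvip}/json-rpc/11.0",
> >         json={"method": "ListVolumeStats",
> >               "params": {"volumeIDs": volume_ids},
> >               "id": 1},
> >         auth=(user, password),
> >         verify=False,  # sketch only
> >     )
> >     return {s["volumeID"]: s for s in resp.json()["result"]["volumeStats"]}
> >
> > def volumes_with_activity(mvip, user, password, volume_ids, wait_seconds=5):
> >     before = list_volume_stats(mvip, user, password, volume_ids)
> >     time.sleep(wait_seconds)
> >     after = list_volume_stats(mvip, user, password, volume_ids)
> >     active = []
> >     for vid in volume_ids:
> >         delta = ((after[vid]["readOps"] - before[vid]["readOps"]) +
> >                  (after[vid]["writeOps"] - before[vid]["writeOps"]))
> >         if delta > 0:
> >             active.append(vid)
> >     return active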
> >
> > On 6/2/20, 9:11 AM, "Sven Vogel" <s.vo...@ewerk.com> wrote:
> >
> >     Hi Paul,
> >
> >     Thanks for the answer and help.
> >
> >     OK. Secondary storage is not a good solution, from what I understand.
> >
> >     > 1. HAManager
> >     > 2. HighAvailabilityManager
> >     > 3. KVMHAConfig
> >
> >
> >     Which of the three should we extend, and which one should be active?
> >
> >     @Mike, do you know of something like that, i.e. a way to check
> > volume activity?
> >     Maybe we can poll the API, but I think this would create massive
> > polling (overload) if we poll for each volume.
> >     At the moment I don't have any idea how this could work.
> >
> >     Cheers
> >
> >     Sven
> >
> >
> >     __
> >
> >     Sven Vogel
> >     Lead Cloud Solution Architect
> >
> >     EWERK DIGITAL GmbH
> >     Brühl 24, D-04109 Leipzig
> >     P +49 341 42649 - 99
> >     F +49 341 42649 - 98
> >     s.vo...@ewerk.com
> >     www.ewerk.com
> >
> >     Geschäftsführer:
> >     Dr. Erik Wende, Hendrik Schubert, Tassilo Möschke
> >     Registergericht: Leipzig HRB 9065
> >
> >     Zertifiziert nach:
> >     ISO/IEC 27001:2013
> >     DIN EN ISO 9001:2015
> >     DIN ISO/IEC 20000-1:2011
> >
> >     EWERK-Blog | LinkedIn | Xing | Twitter | Facebook
> >
> >     > On 01.06.2020 at 19:30, Paul Angus <paul.an...@shapeblue.com> wrote:
> >     >
> >     > Hi Sven,
> >     >
> >     > I think that there is a piece of the jigsaw that you are missing.
> >     >
> >     > Given that the only thing we know is that we can no longer
> > communicate with the host agent, to avoid split brain/corruption of
> > VMs, CloudStack must determine whether the guest VMs are still running on
> > the host or not.  The only way we can do that is to look for disk activity
> > created by those VMs.
> >     >
> >     > Using a secondary storage heartbeat would give a false 'host is
> > down' if, say, a switch carrying secondary storage and management traffic
> > went down.
> >     >
> >     > Wrt SolidFire, you could poll SolidFire via API for activity on
> > the volumes which belong to the VMs on the unresponsive host.  I don't
> > know if there is an equivalent for Ceph.
> >     >
> >     > Kind regards
> >     >
> >     >
> >     > Paul Angus
> >     >
> >     >
> >     >
> >     > paul.an...@shapeblue.com
> >     > www.shapeblue.com
> >     > 3 London Bridge Street,  3rd floor, News Building, London  SE1
> 9SGUK
> >     > @shapeblue
> >     >
> >     >
> >     >
> >     >
> >     > -----Original Message-----
> >     > From: Sven Vogel <s.vo...@ewerk.com>
> >     > Sent: 01 June 2020 12:30
> >     > To: dev <dev@cloudstack.apache.org>
> >     > Subject: Managed Storage and HA
> >     >
> >     > Hi Community,
> >     >
> >     > I am trying to understand how HA works. Our goal is to make it
> > usable with managed storage (NetApp SolidFire, and maybe it works
> > with Ceph too), if that is possible.
> >     >
> >     > This is a good guide, and over time we have fixed and added the
> > missing keys.
> >     >
> >     > https://cwiki.apache.org/confluence/display/CLOUDSTACK/High+Availability+Developer%27s+Guide
> >     > https://cwiki.apache.org/confluence/display/CLOUDSTACK/Host+HA
> >     >
> >     > In the database I found out there are three different types of HA.
> >     >
> >     > If you run "SELECT * FROM `configuration` WHERE `component` LIKE
> > '%ha%' LIMIT 0,1000;" against the configuration table, you will get three
> > types of components.
> >     >
> >     > 1. HAManager
> >     > 2. HighAvailabilityManager
> >     > 3. KVMHAConfig
> >     >
> >     > "HAManager" and "HighAvailabilityManager" are the base, which
> > Rohit extended with "KVMHAConfig" - KVM with STONITH fencing.
> >     >
> >     > I understand all things work together but maybe I need to
> > understand the process a little bit better.
> >     >
> >     >
> >
> ------------------------------------------------------------------------------------
> >     > To clarify, I will write down what I think about each of them.
> > This is my understanding, but please correct me or help me understand it
> > a little better.
> >     >
> >     > —> I found out that if we use managed storage, a restart of
> > virtual machines only works on the same host. As I understand it, this is
> > because of the missing heartbeat file on shared storage, since we don't
> > have shared storage like NFS.
> >     >
> >     > —
> >     > "If the network ping investigation returns that it cannot detect
> > the status of the host, CloudStack HA then relies on the hypervisor
> > specific investigation. For VMware, there is no such investigation as
> > the hypervisor host handles its own HA. For XenServer and KVM,
> > CloudStack HA deploys a monitoring script that writes the current
> > timestamp on to a heartbeat file on shared storage."
> >     > —
> >     >
> >     > And
> >     >
> >     > —
> >     > For the initial release, only KVM with NFS storage will be
> > supported. However, the storage check component will be implemented in
> > a modular fashion allowing for checks using other storage platforms(e.g.
> > Ceph) in the future.
> >     > —
> >     >
> >
> ------------------------------------------------------------------------------------
> >     >
> >     > We would implement a plugin or extend this for managed storage,
> > but at the moment I need to understand where this should happen. Since
> > managed storage uses a different volume for each VM, it is not easy to
> > build a storage heartbeat like with NFS. The loss of a single volume does
> > not mean the whole storage has a problem, so I think it is hard to draw
> > conclusions about the complete storage from one volume.
> >     >
> >     > We don't use KVMHAConfig at the moment, and we see that if a
> > host goes down (offline) the virtual machines will not be restarted on
> > another host. They will only be restarted on that host once it comes back
> > (online). We don't want hard fencing of the hosts, but we do want a
> > correct determination of whether the host is still alive. Fencing would
> > maybe be a bit harsh in our case, because we don't have hard data
> > corruption on the entire storage.
> >     >
> >     > Some questions.
> >     > 1. Is it correct to assume that HA doesn't work without shared
> > storage and the network ping? Is this the reason why our virtual
> > machines are not restarted on another host, or do we have a config
> > problem?
> >     > 2. Where could the plugin be implemented? Is there a preferred
> > place?
> >     > 3. If point 1 is correct, my idea would be to add a global flag to
> > use the secondary storage (NFS) as heartbeat to find out whether any host
> > is inactive?
> >     >
> >     > Thanks and Cheers
> >     >
> >     > Sven
> >     >
> >     >
> >     > __
> >     >
> >     > Sven Vogel
> >     > Lead Cloud Solution Architect
> >     >
> >     > EWERK DIGITAL GmbH
> >     > Brühl 24, D-04109 Leipzig
> >     > P +49 341 42649 - 99
> >     > F +49 341 42649 - 98
> >     > s.vo...@ewerk.com
> >     > www.ewerk.com
> >     >
> >     > Geschäftsführer:
> >     > Dr. Erik Wende, Hendrik Schubert, Tassilo Möschke
> >     > Registergericht: Leipzig HRB 9065
> >     >
> >     > Support:
> >     > +49 341 42649 555
> >     >
> >     > Zertifiziert nach:
> >     > ISO/IEC 27001:2013
> >     > DIN EN ISO 9001:2015
> >     > DIN ISO/IEC 20000-1:2011
> >     >
> >     > ISAE 3402 Typ II Assessed
> >     >
> >     > EWERK-Blog<https://blog.ewerk.com/> | LinkedIn<
> > https://www.linkedin.com/company/ewerk-group> | Xing<
> > https://www.xing.com/company/ewerk> | Twitter<
> > https://twitter.com/EWERK_Group> | Facebook<
> > https://de-de.facebook.com/EWERK.IT/>
> >     >
> >     >
> >
> >
> >
>
