I think a proactive background check service could be useful in some cases, but of course it would have to be optional and configurable, so that operators can tune the trade-off between the effort required to check all images and the risk of hitting a rogue file.
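To make that a bit more concrete, here is a rough sketch of what one pass of such an audit could look like. Everything in it is illustrative rather than taken from Glance's code: the image records with their path/checksum/id attributes and the batch-size knob are all assumptions on my part.

import hashlib

BATCH_SIZE = 100     # operator-tunable: images verified per pass
CHUNK = 64 * 1024    # read image data in 64 KiB chunks


def md5_of(path):
    """Recompute the MD5 of an image file on the backend."""
    digest = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(CHUNK), b''):
            digest.update(chunk)
    return digest.hexdigest()


def audit_batch(images):
    """Compare each image's stored checksum against a freshly computed one.

    `images` is assumed to be an iterable of records with `path`,
    `checksum` and `id` attributes (a made-up shape for this sketch).
    Returns the ids of images whose data is missing or doesn't verify.
    """
    bad = []
    for image in images:
        try:
            actual = md5_of(image.path)
        except (IOError, OSError):
            bad.append(image.id)   # file is gone or unreadable
            continue
        if actual != image.checksum:
            bad.append(image.id)   # silent corruption or tampering
    return bad

Something like BATCH_SIZE, together with how often the periodic task runs, is exactly the kind of knob I'd expose so each operator can pick their own point on that trade-off.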
Letting the backend report health back to Glance, as suggested by Avishay, is also an option, but not every backend has this capability (e.g. the local filesystem), and it would also require the backend to keep track of the original checksum stored in Glance, which again might not always be possible.

Another option I see is to update the image status when Glance attempts to serve an image and notices that the file isn't available or doesn't match its checksum (a rough sketch of that is at the end of this mail). In Icehouse, Glance simply returns a 500, which doesn't get properly reported back to the user when a VM is being created. I'm not sure whether this is handled better in later versions of Glance and Nova.

On Tue, Sep 13, 2016 at 8:01 AM, Avishay Traeger <[email protected]> wrote:

> On Tue, Sep 13, 2016 at 7:16 AM, Nikhil Komawar <[email protected]>
> wrote:
>
>> Firstly, I'd like to mention that Glance is built-in (and if deployed
>> correctly) is self-resilient in ensuring that you do NOT need an audit
>> of such files. In fact, if any operator (particularly large scale
>> operator) needs such a system we have a serious issue where potentially
>> important /user/ data is likely to be lost resulting in legal issues
>> (so please beware).
>
> Can you please elaborate on how Glance is self-resilient?
>
>> Hey Sergio,
>>
>> Glad to know that you're not having any feature related issues (to me
>> this is a good sign). Based on your answers, it makes sense to require
>> a reliability solution for backend data (or some sort of health
>> monitoring for the user data).
>
> All backends will at some point lose some data. The ask is for
> reflecting the image's "health" to the user.
>
>> So, I wonder what your thoughts are for such an audit system. At a
>> first glance, this looks rather not scalable, at least if you plan to
>> do the audit on all of the active images. Consider a deployment trying
>> to run this for around 100-500K active image records. This will need
>> to be run in batches, thus completing the list of records and saying
>> that you've done a full audit of the active image -- is a NP-complete
>> problem (new images can be introduced, some images can be updated in
>> the meantime, etc.)
>
> NP-complete? Really? Every storage system scrubs all data periodically
> to protect from disk errors. Glance images should be relatively static
> anyway.
>
>> The failure rate is low, so a random (sparse check) on the image data
>> won't help either. Would a cron job setup to do the audit for smaller
>> deployments work? May be we can look into some known cron solutions to
>> do the trick?
>
> How about letting the backend report the health? S3, for example,
> reports an event on object loss
> <http://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html#supported-notification-event-types>.
> The S3 driver could monitor those events and update status. Swift
> performs scrubbing to determine object health - I haven't checked if it
> reports an event on object loss, but don't see any reason not to. For
> local filesystem, it would need its own scrubbing process (e.g.,
> recalculate hash for each object every N days). On the other hand if it
> is a mount of some filer, the filer should be able to report on health.
>
> Thanks,
> Avishay
>
> --
> Avishay Traeger, PhD
> System Architect
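To illustrate the serve-time idea I mentioned above, here is a minimal sketch of what that check might look like. backend.open(), mark_image_killed() and the attributes used on the image record are hypothetical stand-ins for whatever the real store and database layers provide; this is not the actual Glance code path.

import hashlib


def serve_image(image, backend, mark_image_killed):
    """Stream an image while verifying its checksum on the fly.

    `backend`, `mark_image_killed` and the attributes on `image` are
    stand-ins for the real store/DB layers (assumptions for this sketch).
    """
    digest = hashlib.md5()
    try:
        with backend.open(image.location) as src:
            for chunk in iter(lambda: src.read(64 * 1024), b''):
                digest.update(chunk)
                yield chunk
    except (IOError, OSError):
        # Data is gone or unreadable: flag the image instead of only
        # handing an opaque 500 back to the caller.
        mark_image_killed(image.id, reason='data unavailable')
        raise
    if digest.hexdigest() != image.checksum:
        mark_image_killed(image.id, reason='checksum mismatch')
        raise ValueError('image %s failed checksum verification' % image.id)

The obvious caveat is that a mismatch is only detected after the last chunk has already been streamed, so flagging the image mostly protects subsequent requests (and gives Nova something better to report than a bare 500).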
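And for the S3 event-notification approach Avishay describes, the driver-side handler could be fairly small. The sketch below assumes the standard S3 event JSON is delivered to us (e.g. via SQS or SNS), that the object key ends with the image id, and that a mark_image_killed() callback exists; all of those are my assumptions, not something the S3 driver provides today.

import json


def handle_s3_notification(message_body, mark_image_killed):
    """React to an S3 event notification reporting a lost object.

    `message_body` is assumed to be the raw S3 event JSON;
    `mark_image_killed` and the key-to-image-id mapping are hypothetical.
    """
    event = json.loads(message_body)
    for record in event.get('Records', []):
        # Object-loss events are configured as s3:ReducedRedundancyLostObject;
        # match loosely in case the delivered eventName drops the prefix.
        if record.get('eventName', '').endswith('ReducedRedundancyLostObject'):
            key = record['s3']['object']['key']
            image_id = key.rsplit('/', 1)[-1]  # assumes keys end with the image id
            mark_image_killed(image_id, reason='object reported lost by S3')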
