Hey Sergio,

Glad to know that you're not having any feature-related issues (to me
this is a good sign). Based on your answers, it makes sense to want a
reliability solution for backend data (or some sort of health monitoring
for the user data).

So, I wonder what your thoughts are on such an audit system. At first
glance, it does not look scalable, at least if you plan to audit all of
the active images. Consider a deployment trying to run this for around
100-500K active image records. It would have to run in batches, so
working through the list of records and claiming that you've done a full
audit of the active images is effectively a moving target rather than a
task that ever finishes (new images can be introduced, some images can
be updated in the meantime, etc.). The failure rate is low, so a random
(sparse) check on the image data won't help either.

Would a cron job set up to do the audit for smaller deployments work?
Maybe we can look into some known cron solutions to do the trick?


On 9/12/16 4:18 PM, Sergio A. de Carvalho Jr. wrote:
> Hi Nikhil,
>
> Thanks so much for your response.
>
> 1) No, this is a private cloud.
> 2) Glance v1 (this problem has manifested itself in one of our oldest
> deployments, which is running Icehouse).
> 3) No, location is not exposed.
> 4) Glance is set up with the filesystem backend driver, using a Gluster
> volume mounted on the host.
> 5.1) Images were in active state, even though the image file had zero
> bytes.
> 5.2) Very low, it may have happened only twice in the last year.
>
> Even if the location is not exposed, there are a number of things that
> can happen to the actual image files after they've been uploaded to
> Glance, without Glance noticing, depending on how reliable your storage
> backend is. That's why I thought, in some circumstances, it would be
> useful to have some sort of background service checking that image
> files haven't been corrupted or gone missing altogether.
>
> Sergio
>
>
> On Mon, Sep 12, 2016 at 7:27 PM, Nikhil Komawar <nik.koma...@gmail.com>
> wrote:
>
>     Hi Sergio,
>
>     Thanks for reaching out. And this is an excellent question.
>
>     Firstly, I'd like to mention that Glance is built-in (and if deployed
>     correctly) is self-resilient in ensuring that you do NOT need an audit
>     of such files. In fact, if any operator (particularly large scale
>     operator) needs such a system we have a serious issue where
>     potentially important /user/ data is likely to be lost resulting in
>     legal issues (so please beware).
>
>     Having said that, I'd like to start investigating more into your
>     particular issue and see where we may be missing out in ensuring data
>     integrity in Glance. Let me ask you a first set of questions that
>     will help us get an initial understanding:
>
>     1) Are you a public cloud vendor; in particular, have you deployed
>     glance to potentially non-trusted users? or is the case otherwise?
>     2) Are you deploying Glance v1 or Glance v2?
>     3) Have you exposed the "location" feature set (CRUD) to regular
>     users? (if using API v2, have you enabled ``show_multiple_locations``
>     configuration)
>     4) What backends have you configured Glance with and who has access to
>     them? What is the resiliency or rotation (of disks) (for say capacity
>     management) of your backend store system?
>     5) Sanity check on your issue:
>     5.1) What are the image statuses for which the image data files are
>     missing?
>     5.2) What is the rate of error approximately (if you don't have
>     specifics, info like rare, medium, often will help)
>
>
>     We may have to dig a bit further into the issue but this set of info
>     should help us narrow down the issue and determine if there are
>     any gaps in Glance.
>
>     P.S. Please use the tag "[glance]" in the subject line to help us
>     get to your email faster.
>
>     On 9/12/16 12:48 PM, Sergio A. de Carvalho Jr. wrote:
>     > Hi all,
>     >
>     > Is there (or was there ever) any plans to implement in Glance a
>     > service that would periodically check that the image files are still
>     > available on the file system (or in whatever storage system being
>     > used) and have the correct checksum?
>     >
>     > We had a few issues where an image file was removed from the
>     > filesystem and that can go undetected for a long time until someone
>     > tries to access that image, so we were wondering if it would be
>     > possible (and if it would make sense) to implement some sort of
>     > background service to periodically check if all images found in the
>     > database can be retrieved successfully.
>     >
>     > Thoughts?
>     >
>     > Sergio
>     >
>     > __________________________________________________________________________
>     > OpenStack Development Mailing List (not for usage questions)
>     > Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>     > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>     --
>
>     Thanks,
>     Nikhil
>
>

-- 

Thanks,
Nikhil
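
P.S. To make the cron idea above a bit more concrete, here is a minimal
sketch of what a batched audit pass could look like for a small
deployment. It assumes the filesystem backend and that the expected
checksum is the MD5 digest Glance records for the image data. The names
ImageRecord, audit_one and audit are hypothetical and are not Glance
APIs; how the records are fetched from the database is left out on
purpose. It also does nothing about images that are added or updated
while a pass is running.

```python
import hashlib
import os
from typing import Dict, Iterable, Iterator, List, NamedTuple


class ImageRecord(NamedTuple):
    # Hypothetical record shape; a real audit would fill these fields
    # from the Glance database or API, which this sketch does not do.
    image_id: str
    path: str       # location of the image file on the filesystem store
    checksum: str   # expected MD5 hex digest of the image data


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large images are not read into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_one(record: ImageRecord) -> str:
    """Return 'ok', 'missing', 'empty' or 'corrupt' for a single image."""
    if not os.path.isfile(record.path):
        return "missing"
    if os.path.getsize(record.path) == 0:
        return "empty"
    if md5_of_file(record.path) != record.checksum:
        return "corrupt"
    return "ok"


def batches(records: Iterable, size: int) -> Iterator[List]:
    """Yield the records in lists of at most `size` items."""
    batch = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def audit(records: Iterable[ImageRecord], batch_size: int = 100) -> Dict[str, str]:
    """Return {image_id: status} for every image whose status is not 'ok'."""
    problems = {}
    for batch in batches(records, batch_size):
        for record in batch:
            status = audit_one(record)
            if status != "ok":
                problems[record.image_id] = status
    return problems
```

A cron entry could run a script like this nightly and mail the returned
dictionary to the operators whenever it is non-empty. The zero-byte
check comes before the checksum so that the case you described (active
image, empty file) is reported separately from plain corruption.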