On Wed, May 24, 2017 at 11:02:02AM +0200, Niels de Vos wrote: > On Tue, May 23, 2017 at 01:27:50PM -0400, Jeff Cody wrote: > > On current released versions of glusterfs, glfs_lseek() will sometimes > > return invalid values for SEEK_DATA or SEEK_HOLE. For SEEK_DATA and > > SEEK_HOLE, the returned value should be >= the passed offset, or < 0 in > > the case of error: > > > > LSEEK(2): > > > > off_t lseek(int fd, off_t offset, int whence); > > > > [...] > > > > SEEK_HOLE > > Adjust the file offset to the next hole in the file greater > > than or equal to offset. If offset points into the middle of > > a hole, then the file offset is set to offset. If there is no > > hole past offset, then the file offset is adjusted to the end > > of the file (i.e., there is an implicit hole at the end of > > any file). > > > > [...] > > > > RETURN VALUE > > Upon successful completion, lseek() returns the resulting > > offset location as measured in bytes from the beginning of the > > file. On error, the value (off_t) -1 is returned and errno is > > set to indicate the error > > > > However, occasionally glfs_lseek() for SEEK_HOLE/DATA will return a > > value less than the passed offset, yet greater than zero. > > > > For instance, here are example values observed from this call: > > > > offs = glfs_lseek(s->fd, start, SEEK_HOLE); > > if (offs < 0) { > > return -errno; /* D1 and (H3 or H4) */ > > } > > > > start == 7608336384 > > offs == 7607877632 > > > > This causes QEMU to abort on the assert test. When this value is > > returned, errno is also 0. > > > > This is a reported and known bug to glusterfs: > > https://bugzilla.redhat.com/show_bug.cgi?id=1425293 > > > > Although this is being fixed in gluster, we still should work around it > > in QEMU, given that multiple released versions of gluster behave this > > way. > > Versions of GlusterFS 3.8.0 - 3.8.8 are affected, 3.8.9 has the fix > already and was released in February this year. We encourage users to > update to recent versions, and provide a stable (bugfix only) update > each month.
I am able to reproduce this bug on a server running glusterfs 3.11.0rc0. Is that expected? > > The Red Hat Gluster Storage product that is often used in combination > with QEMU (+oVirt) does unfortunately not have an update where this is > fixed. > > Using an unfixed Gluster storage environment can cause QEMU to segfault. > Although fixes exist for Gluster, not all users will have them > installed. Preventing the segfault in QEMU due to a broken storage > environment makes sense to me. > > > This patch treats the return case of (offs < start) the same as if an > > error value other than ENXIO is returned; we will assume we learned > > nothing, and there are no holes in the file. > > > > Signed-off-by: Jeff Cody <jc...@redhat.com> > > --- > > block/gluster.c | 18 ++++++++++++++++-- > > 1 file changed, 16 insertions(+), 2 deletions(-) > > > > diff --git a/block/gluster.c b/block/gluster.c > > index 7c76cd0..c147909e 100644 > > --- a/block/gluster.c > > +++ b/block/gluster.c > > @@ -1275,7 +1275,14 @@ static int find_allocation(BlockDriverState *bs, > > off_t start, > > if (offs < 0) { > > return -errno; /* D3 or D4 */ > > } > > - assert(offs >= start); > > + > > + if (offs < start) { > > + /* This is not a valid return by lseek(). We are safe to just > > return > > + * -EIO in this case, and we'll treat it like D4. Unfortunately > > some > > + * versions of libgfapi will return offs < start, so an assert > > here > > + * will unneccesarily abort QEMU. */ > > This is not really correct, the problem is not in libgfapi, but in the > protocol translator on the server-side. The version of libgfapi does not > matter here, it is the version on the server. But that might be too much > detail for the comment. > > > + return -EIO; > > + } > > > > if (offs > start) { > > /* D2: in hole, next data at offs */ > > @@ -1307,7 +1314,14 @@ static int find_allocation(BlockDriverState *bs, > > off_t start, > > if (offs < 0) { > > return -errno; /* D1 and (H3 or H4) */ > > } > > - assert(offs >= start); > > + > > + if (offs < start) { > > + /* This is not a valid return by lseek(). We are safe to just > > return > > + * -EIO in this case, and we'll treat it like H4. Unfortunately > > some > > + * versions of libgfapi will return offs < start, so an assert > > here > > + * will unneccesarily abort QEMU. */ > > + return -EIO; > > + } > > > > if (offs > start) { > > /* > > -- > > 2.9.3 > > > > You might want to explain the problem a little different in the commit > message. It is fine too if you think it would become too detailed, my > explanation is in the archives now in any case. > > Reviewed-by: Niels de Vos <nde...@redhat.com>