Hi,

The workaround is fine, but I still think there could be a bug here. I will
try to spend some time in the next few days writing something to test
with regular buffered IO.
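
Something along these lines is a rough sketch of such a test, assuming
Python on both sides. The names `write_buffered` and `looks_zero_filled`
are hypothetical helpers, not anything from Ceph or Apache. The idea is
to run the writer on the producer (with and without the sync) and the
checker on server1 or server2 straight afterwards.

```python
import os


def write_buffered(path, data, sync=False):
    """Write data with ordinary buffered IO; optionally fsync before closing."""
    with open(path, "wb") as f:
        f.write(data)
        if sync:
            f.flush()             # push Python's userspace buffer to the kernel
            os.fsync(f.fileno())  # ask the kernel to write the dirty pages out


def looks_zero_filled(path):
    """Return True if the file is non-empty and every byte in it is zero."""
    with open(path, "rb") as f:
        data = f.read()
    return len(data) > 0 and data.count(0) == len(data)
```

If `looks_zero_filled` ever returns True on a reader for a file written
with `sync=False`, that would suggest the problem is not limited to
memory-mapped IO.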

But even with memory-mapped IO, I would not expect a read request from
server1 or server2 to get zeros in place of the file contents in the
event the file producer had not synchronised the file contents.

Surely we should expect the read to fail or block until the producer has
finished flushing the file and it is available for read, or am I
misunderstanding some logic?
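
For reference, this is a minimal sketch of the kind of explicit sync
discussed in this thread for a producer that writes through a memory
mapping, assuming a Python writer. `write_via_mmap` is a hypothetical
name, and this is not a claim about what Apache or wget do internally.

```python
import mmap
import os


def write_via_mmap(path, data):
    """Write data through a shared mapping and flush it before returning."""
    if not data:
        raise ValueError("cannot map an empty file")
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        # Size the file first; until data is written, this range reads as zeros.
        os.ftruncate(fd, len(data))
        with mmap.mmap(fd, len(data)) as m:  # shared, writable mapping by default
            m[:] = data
            m.flush()  # msync(): write the dirty mapped pages back to the file
        os.fsync(fd)   # also flush the file itself before closing
    finally:
        os.close(fd)
```

Without the `flush()` and `fsync()` calls, the file size is already
visible while the contents may still be sitting in the producer's page
cache, which would fit the window described in the earlier mails.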

Thanks

On Fri, Sep 2, 2016 at 7:37 PM, Gregory Farnum <gfar...@redhat.com> wrote:

> On Fri, Sep 2, 2016 at 11:35 AM, Sean Redmond <sean.redmo...@gmail.com>
> wrote:
> > Hi,
> >
> > That makes sense, I have worked around this by forcing the sync within
> the
> > application running under apache and it is working very well now without
> the
> > need for the 'sync' mount option.
> >
> > What is interesting is that the pastebin provided below shows a way to
> > replicate this, I was just using a wget to download a file to the ceph
> file
> > system instead of using apache to do the upload, just to simplify it, but
> > maybe wget is also using memory-mapped IO.
>
> That appears to be the case, yeah:
> https://lists.gnu.org/archive/html/bug-wget/2013-09/msg00004.html
>
> I'm starting to feel a little better. Glad you found a workaround. :)
> -Greg
>
>
> >
> > http://pastebin.com/QK8AemAb
> >
> > Thanks
> >
> > On Fri, Sep 2, 2016 at 6:32 PM, Gregory Farnum <gfar...@redhat.com>
> wrote:
> >>
> >> On Thu, Sep 1, 2016 at 8:02 AM, Sean Redmond <sean.redmo...@gmail.com>
> >> wrote:
> >> > Hi,
> >> >
> >> > It seems to be using syscall mmap() from what I read this indicates it
> >> > is
> >> > using memory-mapped IO.
> >> >
> >> > Please see a strace here: http://pastebin.com/6wjhSNrP
> >>
> >> Zheng meant: is Apache using memory-mapped IO? From a quick google it
> >> does in some configurations, but I'm not sure how common it is.
> >>
> >> We ask because Ceph does not synchronize mmap IO for you and Apache
> >> probably isn't doing it either; that would fit the symptoms you're
> >> seeing. Regular buffered IO should not be exhibiting any of these
> >> issues, although obviously we can't guarantee there are no bugs.
> >> -Greg
> >>
> >> >
> >> > Thanks
> >> >
> >> > On Wed, Aug 31, 2016 at 5:51 PM, Sean Redmond <
> sean.redmo...@gmail.com>
> >> > wrote:
> >> >>
> >> >> I am not sure how to tell?
> >> >>
> >> >> Server1 and Server2 mount the ceph file system using kernel client
> >> >> 4.7.2
> >> >> and I can replicate the problem using '/usr/bin/sum' to read the file
> >> >> or a
> >> >> http GET request via a web server (apache).
> >> >>
> >> >> On Wed, Aug 31, 2016 at 2:38 PM, Yan, Zheng <uker...@gmail.com>
> wrote:
> >> >>>
> >> >>> On Wed, Aug 31, 2016 at 12:49 AM, Sean Redmond
> >> >>> <sean.redmo...@gmail.com>
> >> >>> wrote:
> >> >>> > Hi,
> >> >>> >
> >> >>> > I have been able to pick through the process a little further and
> >> >>> > replicate
> >> >>> > it via the command line. The flow seems to look like this:
> >> >>> >
> >> >>> > 1) The user uploads an image to webserver server 'uploader01' it
> >> >>> > gets
> >> >>> > written to a path such as
> >> >>> > '/cephfs/webdata/static/456/JHL/66448H-755h.jpg'
> >> >>> > on cephfs
> >> >>> >
> >> >>> > 2) The MDS makes the file meta data available for this new file
> >> >>> > immediately
> >> >>> > to all clients.
> >> >>> >
> >> >>> > 3) The 'uploader01' server asynchronously commits the file
> contents
> >> >>> > to
> >> >>> > disk
> >> >>> > as sync is not explicitly called during the upload.
> >> >>> >
> >> >>> > 4) Before step 3 is done the visitor requests the file via one of
> >> >>> > two
> >> >>> > web
> >> >>> > servers server1 or server2 - the MDS provides the meta data but
> the
> >> >>> > contents
> >> >>> > of the file is not committed to disk yet so the data read returns
> >> >>> > 0's -
> >> >>> > This
> >> >>> > is then cached by the file system page cache until it expires or
> is
> >> >>> > flushed
> >> >>> > manually.
> >> >>>
> >> >>> do server1 or server2 use memory-mapped IO to read the file?
> >> >>>
> >> >>> Regards
> >> >>> Yan, Zheng
> >> >>>
> >> >>> >
> >> >>> > 5) As step 4 typically only happens on one of the two web servers
> >> >>> > before
> >> >>> > step 3 is complete we get the mismatch between server1 and server2
> >> >>> > file
> >> >>> > system page cache.
> >> >>> >
> >> >>> > The below demonstrates how to reproduce this issue
> >> >>> >
> >> >>> > http://pastebin.com/QK8AemAb
> >> >>> >
> >> >>> > As we can see the checksum of the file returned by the web server
> is
> >> >>> > 0
> >> >>> > as
> >> >>> > the file contents has not been flushed to disk from server
> >> >>> > uploader01
> >> >>> >
> >> >>> > If however we call ‘sync’ as shown below the checksum is correct:
> >> >>> >
> >> >>> > http://pastebin.com/p4CfhEFt
> >> >>> >
> >> >>> > If we also wait for 10 seconds for the kernel to flush the dirty
> >> >>> > pages,
> >> >>> > we
> >> >>> > can also see the checksum is valid:
> >> >>> >
> >> >>> > http://pastebin.com/1w6UZzNQ
> >> >>> >
> >> >>> > It looks like it may be a race between the time it takes the uploader01
> >> >>> > server to
> >> >>> > commit the file to the file system and the fast incoming read
> >> >>> > request
> >> >>> > from
> >> >>> > the visiting user to server1 or server2.
> >> >>> >
> >> >>> > Thanks
> >> >>> >
> >> >>> >
> >> >>> > On Tue, Aug 30, 2016 at 10:21 AM, Sean Redmond
> >> >>> > <sean.redmo...@gmail.com>
> >> >>> > wrote:
> >> >>> >>
> >> >>> >> You are correct it only seems to impact recently modified files.
> >> >>> >>
> >> >>> >> On Tue, Aug 30, 2016 at 3:36 AM, Yan, Zheng <uker...@gmail.com>
> >> >>> >> wrote:
> >> >>> >>>
> >> >>> >>> On Tue, Aug 30, 2016 at 2:11 AM, Gregory Farnum
> >> >>> >>> <gfar...@redhat.com>
> >> >>> >>> wrote:
> >> >>> >>> > On Mon, Aug 29, 2016 at 7:14 AM, Sean Redmond
> >> >>> >>> > <sean.redmo...@gmail.com>
> >> >>> >>> > wrote:
> >> >>> >>> >> Hi,
> >> >>> >>> >>
> >> >>> >>> >> I am running cephfs (10.2.2) with kernel 4.7.0-1. I have
> >> >>> >>> >> noticed
> >> >>> >>> >> that
> >> >>> >>> >> frequently static files are showing empty when serviced via a
> >> >>> >>> >> web
> >> >>> >>> >> server
> >> >>> >>> >> (apache). I have tracked this down further and can see when
> >> >>> >>> >> running a
> >> >>> >>> >> checksum against the file on the cephfs file system on the
> node
> >> >>> >>> >> serving the
> >> >>> >>> >> empty http response the checksum is '00000'
> >> >>> >>> >>
> >> >>> >>> >> The below shows the checksum on a defective node.
> >> >>> >>> >>
> >> >>> >>> >> [root@server2]# ls -al
> >> >>> >>> >> /cephfs/webdata/static/456/JHL/66448H-755h.jpg
> >> >>> >>> >> -rw-r--r-- 1 apache apache 53317 Aug 28 23:46
> >> >>> >>> >> /cephfs/webdata/static/456/JHL/66448H-755h.jpg
> >> >>> >>>
> >> >>> >>> It seems this file was modified recently. Maybe the web server
> >> >>> >>> silently modifies the files. Please check if this issue happens
> on
> >> >>> >>> older files.
> >> >>> >>>
> >> >>> >>> Regards
> >> >>> >>> Yan, Zheng
> >> >>> >>>
> >> >>> >>> >>
> >> >>> >>> >> [root@server2]# sum
> >> >>> >>> >> /cephfs/webdata/static/456/JHL/66448H-755h.jpg
> >> >>> >>> >> 00000    53
> >> >>> >>> >
> >> >>> >>> > So can we presume there are no file contents, and it's just 53
> >> >>> >>> > blocks
> >> >>> >>> > of zeros?
> >> >>> >>> >
> >> >>> >>> > This doesn't sound familiar to me; Zheng, do you have any
> ideas?
> >> >>> >>> > Anyway, ceph-fuse shouldn't be susceptible to this bug even
> with
> >> >>> >>> > the
> >> >>> >>> > page cache enabled; if you're just serving stuff via the web
> >> >>> >>> > it's
> >> >>> >>> > probably a better idea anyway (harder to break, easier to
> >> >>> >>> > update,
> >> >>> >>> > etc).
> >> >>> >>> > -Greg
> >> >>> >>> >
> >> >>> >>> >>
> >> >>> >>> >> The below shows the checksum on a working node.
> >> >>> >>> >>
> >> >>> >>> >> [root@server1]# ls -al
> >> >>> >>> >> /cephfs/webdata/static/456/JHL/66448H-755h.jpg
> >> >>> >>> >> -rw-r--r-- 1 apache apache 53317 Aug 28 23:46
> >> >>> >>> >> /cephfs/webdata/static/456/JHL/66448H-755h.jpg
> >> >>> >>> >>
> >> >>> >>> >> [root@server1]# sum
> >> >>> >>> >> /cephfs/webdata/static/456/JHL/66448H-755h.jpg
> >> >>> >>> >> 03620    53
> >> >>> >>> >> [root@server1]#
> >> >>> >>> >>
> >> >>> >>> >> If I flush the cache as shown below the checksum returns as
> >> >>> >>> >> expected
> >> >>> >>> >> and the
> >> >>> >>> >> web server serves up valid content.
> >> >>> >>> >>
> >> >>> >>> >> [root@server2]# echo 3 > /proc/sys/vm/drop_caches
> >> >>> >>> >> [root@server2]# sum
> >> >>> >>> >> /cephfs/webdata/static/456/JHL/66448H-755h.jpg
> >> >>> >>> >> 03620    53
> >> >>> >>> >>
> >> >>> >>> >> After some time typically less than 1hr the issue repeats, It
> >> >>> >>> >> seems to
> >> >>> >>> >> not
> >> >>> >>> >> repeat if I take any one of the servers out of the LB and
> only
> >> >>> >>> >> serve
> >> >>> >>> >> requests from one of the servers.
> >> >>> >>> >>
> >> >>> >>> >> I may try the FUSE client, which has a mount option
> >> >>> >>> >> direct_io
> >> >>> >>> >> that
> >> >>> >>> >> looks to disable page cache.
> >> >>> >>> >>
> >> >>> >>> >> I have been hunting in the ML and tracker but could not see
> >> >>> >>> >> anything
> >> >>> >>> >> really
> >> >>> >>> >> close to this issue, Any input or feedback on similar
> >> >>> >>> >> experiences
> >> >>> >>> >> is
> >> >>> >>> >> welcome.
> >> >>> >>> >>
> >> >>> >>> >> Thanks
> >> >>> >>> >>
> >> >>> >>> >>
> >> >>> >>> >> _______________________________________________
> >> >>> >>> >> ceph-users mailing list
> >> >>> >>> >> ceph-users@lists.ceph.com
> >> >>> >>> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >> >>> >>> >>
> >> >>> >>
> >> >>> >>
> >> >>> >
> >> >>
> >> >>
> >> >
> >
> >
>
