On Tue, Mar 6, 2018 at 10:22 PM, Paul Anderson <p...@umich.edu> wrote:
> Raghavendra,
>
> I've committed my test case to https://github.com/powool/gluster.git -
> it's grungy, and a work in progress, but I am happy to take change
> suggestions, especially if it will save folks significant time.
>
> For the rest, I'll reply inline below...
>
> On Mon, Mar 5, 2018 at 10:39 PM, Raghavendra Gowdappa
> <rgowd...@redhat.com> wrote:
> > +Csaba.
> >
> > On Tue, Mar 6, 2018 at 2:52 AM, Paul Anderson <p...@umich.edu> wrote:
> >>
> >> Raghavendra,
> >>
> >> Thanks very much for your reply.
> >>
> >> I fixed our data corruption problem by disabling the volume
> >> performance.write-behind flag as you suggested, and simultaneously
> >> disabling caching in my client side mount command.
> >
> > Good to know it worked. Can you give us the output of
> > # gluster volume info
>
> [root@node-1 /]# gluster volume info
>
> Volume Name: dockerstore
> Type: Replicate
> Volume ID: fb08b9f4-0784-4534-9ed3-e01ff71a0144
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: 172.18.0.4:/data/glusterfs/store/dockerstore
> Brick2: 172.18.0.3:/data/glusterfs/store/dockerstore
> Brick3: 172.18.0.2:/data/glusterfs/store/dockerstore
> Options Reconfigured:
> performance.client-io-threads: off
> nfs.disable: on
> transport.address-family: inet
> locks.mandatory-locking: optimal
> performance.flush-behind: off
> performance.write-behind: off
>
> > We would like to debug the problem in write-behind. Some questions:
> >
> > 1. What version of Glusterfs are you using?
>
> On the server nodes:
>
> [root@node-1 /]# gluster --version
> glusterfs 3.13.2
> Repository revision: git://git.gluster.org/glusterfs.git
>
> On the docker container sqlite test node:
>
> root@b4055d8547d2:/# glusterfs --version
> glusterfs 3.8.8 built on Jan 11 2017 14:07:11

I guess this is where the client is mounted. If I am correct about where
the glusterfs client is mounted, the client is running quite an old
version. There have been a significant number of fixes between 3.8.8 and
current master. I would suggest trying out 3.13.2 patched with [1]. If
you get a chance to try this out, please report back how the tests went.

[1] https://review.gluster.org/19673

> I recognize that version skew could be an issue.
>
> > 2. Were you able to figure out whether it's stale data or metadata
> > that is causing the issue?
>
> I lean towards stale data, based on the only real observation I have:
>
> While debugging, I put log messages in as to when the flock() is
> acquired and when it is released. There is no instance where two
> different processes ever hold the same flock()'d file. From what I
> have read, the locks are considered metadata, and they appear to me to
> be working, so that's why I'm inclined to think stale data is the
> issue.
>
> > There have been patches merged in write-behind in the recent past,
> > and one in the works, which address metadata consistency. We would
> > like to understand whether you've run into any of the already
> > identified issues.
>
> Agreed!
>
> Thanks,
>
> Paul
>
> > regards,
> > Raghavendra
> >>
> >> In very modest testing, the flock() case appears to me to work well -
> >> before, it would corrupt the db within a few transactions.
> >>
> >> Testing using the built-in sqlite3 locks is better (fcntl range
> >> locks), but has some behavioral issues (probably just requires query
> >> retry when the file is locked). I'll research this more, although the
> >> test case is not critical to our use case.
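As an aside for anyone trying to reproduce this: the
flock()-around-open/update/close pattern Paul describes looks roughly
like the minimal sketch below, including the acquire/release logging he
mentions. The mount path, lock file name and table are made up for
illustration; the real test case is in the github repo above. (Build
with: cc sketch.c -lsqlite3)

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/file.h>
#include <sqlite3.h>

int main(void)
{
    /* Lock a sidecar file rather than the db itself, so SQLite's own
       fcntl() locking does not collide with our flock(). */
    int lockfd = open("/mnt/gluster/test.db.lock", O_CREAT | O_RDWR, 0644);
    if (lockfd < 0) { perror("open lockfile"); return 1; }

    if (flock(lockfd, LOCK_EX) != 0) { perror("flock"); return 1; }
    fprintf(stderr, "pid %d: flock acquired\n", getpid());

    /* The entire connection lifetime sits inside the critical
       section: open, update, close - only then release the lock. */
    sqlite3 *db;
    if (sqlite3_open("/mnt/gluster/test.db", &db) == SQLITE_OK) {
        sqlite3_exec(db,
            "CREATE TABLE IF NOT EXISTS t(v);"
            "INSERT INTO t(v) VALUES (1);",
            NULL, NULL, NULL);
        sqlite3_close(db);
    }

    flock(lockfd, LOCK_UN);
    fprintf(stderr, "pid %d: flock released\n", getpid());
    close(lockfd);
    return 0;
}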
> >>
> >> There are no signs of O_DIRECT use in the sqlite3 code that I can
> >> see.
> >>
> >> I intend to set up tests that run much longer than a few minutes, to
> >> see if there are any longer-term issues. Also, I want to experiment
> >> with data durability by killing various gluster server nodes during
> >> the tests.
> >>
> >> If anyone would like our test scripts, I can either tar them up and
> >> email them or put them in github - either is fine with me. (They rely
> >> on current builds of docker and docker-compose.)
> >>
> >> Thanks again!!
> >>
> >> Paul
> >>
> >> On Mon, Mar 5, 2018 at 11:26 AM, Raghavendra Gowdappa
> >> <rgowd...@redhat.com> wrote:
> >> >
> >> > On Mon, Mar 5, 2018 at 8:21 PM, Paul Anderson <p...@umich.edu> wrote:
> >> >>
> >> >> Hi,
> >> >>
> >> >> tl;dr summary of below: flock() works, but what does it take to
> >> >> make sync()/fsync() work in a 3-node GFS cluster?
> >> >>
> >> >> I am under the impression that POSIX flock, POSIX
> >> >> fcntl(F_SETLK/F_GETLK,...), and POSIX read/write/sync/fsync are
> >> >> all supported in cluster operations, such that in theory, SQLite3
> >> >> should be able to atomically lock the file (or a subset of pages),
> >> >> modify pages, flush the pages to gluster, then release the lock,
> >> >> and thus satisfy the ACID properties that SQLite3 appears to try
> >> >> to achieve on a local filesystem.
> >> >>
> >> >> In a test we wrote that fires off 10 simple concurrent SQL insert,
> >> >> read, update loops, we discovered that we at least need to use
> >> >> flock() around the SQLite3 db connection open/update/close to
> >> >> protect it.
> >> >>
> >> >> However, that is not enough - although from testing it looks like
> >> >> flock() works as advertised across gluster-mounted files,
> >> >> sync/fsync don't appear to, so we end up getting corruption in the
> >> >> SQLite3 file (pragma integrity_check generally will show a bunch
> >> >> of problems after a short test).
> >> >>
> >> >> Is what we're trying to do achievable? We're testing using the
> >> >> docker container gluster/gluster-centos as the three servers, with
> >> >> a php test inside of php-cli using filesystem mounts. If we mount
> >> >> the gluster FS via sapk/plugin-gluster into the php-cli containers
> >> >> using docker, we seem to have better success sometimes, but I
> >> >> haven't figured out why yet.
> >> >>
> >> >> I did see that I needed to set the server volume parameter
> >> >> 'performance.flush-behind off', otherwise it seems that flushes
> >> >> won't block as would be needed by SQLite3.
> >> >
> >> > If you are relying on fsync, this shouldn't matter, as fsync makes
> >> > sure data is synced to disk.
> >> >
> >> >> Does anyone have any suggestions? Any words of wisdom would be
> >> >> much appreciated.
> >> >
> >> > Can you experiment with turning on/off various performance xlators?
> >> > Based on earlier issues, it's likely that there is stale metadata
> >> > which might be causing the issue (not necessarily improper fsync
> >> > behavior). I would suggest turning off all performance xlators. You
> >> > can refer to [1] for a related discussion. In theory the only perf
> >> > xlator relevant for fsync is write-behind, and I am not aware of
> >> > any issues where fsync is not working. Does the glusterfs log file
> >> > have any messages complaining about writes or fsync failing? Does
> >> > your application use O_DIRECT?
> >> > If yes, please note that you need to turn the option
> >> > performance.strict-o-direct on for write-behind to honour
> >> > O_DIRECT.
> >> >
> >> > Also, is it possible to identify the nature of the corruption -
> >> > data or metadata? A more detailed explanation will help us RCA the
> >> > issue.
> >> >
> >> > Also, is your application running on a single mount or from
> >> > multiple mounts? Can you collect an strace of your application
> >> > (strace -ff -T -p <pid> -o <file>)? If possible, can you also
> >> > collect a fuse dump using the option --dump-fuse while mounting
> >> > glusterfs?
> >> >
> >> > [1]
> >> > http://lists.gluster.org/pipermail/gluster-users/2018-February/033503.html
> >> >
> >> >> Thanks,
> >> >>
> >> >> Paul
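To make the sequence under discussion concrete: the POSIX pattern that
SQLite-style ACID updates rely on is "lock, write, fsync, unlock", with
the fsync completing before the lock is dropped. A minimal sketch
follows - the path and the 4 KB page size are assumptions for
illustration, not taken from the test case:

#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/mnt/gluster/test.db", O_RDWR | O_CREAT, 0644);
    if (fd < 0) { perror("open"); return 1; }

    /* Exclusive fcntl byte-range lock over the first page; F_SETLKW
       blocks until the lock is granted. */
    struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET,
                        .l_start = 0, .l_len = 4096 };
    if (fcntl(fd, F_SETLKW, &fl) != 0) { perror("fcntl"); return 1; }

    /* Modify a page... */
    char page[4096];
    memset(page, 0, sizeof(page));
    if (pwrite(fd, page, sizeof(page), 0) != (ssize_t)sizeof(page))
        perror("pwrite");

    /* Flush to stable storage *before* dropping the lock. If this
       fsync does not really reach the bricks, the next reader can see
       stale pages even though the locking itself behaved correctly -
       which matches the corruption pattern reported above. */
    if (fsync(fd) != 0)
        perror("fsync");

    fl.l_type = F_UNLCK;
    fcntl(fd, F_SETLK, &fl);
    close(fd);
    return 0;
}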
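One more note on the O_DIRECT point: Paul reports that sqlite3 does not
appear to use O_DIRECT, but for applications that do, write-behind only
honours it once performance.strict-o-direct is turned on, as mentioned
above. For reference, a sketch of what an O_DIRECT write looks like at
the POSIX level - the path is made up, and the 4096-byte alignment is an
assumption (the required alignment depends on the underlying brick
filesystem):

#define _GNU_SOURCE   /* O_DIRECT is a Linux extension */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>

#define ALIGN 4096

int main(void)
{
    int fd = open("/mnt/gluster/direct.dat",
                  O_CREAT | O_WRONLY | O_DIRECT, 0644);
    if (fd < 0) { perror("open"); return 1; }

    /* O_DIRECT requires the buffer, offset and length to all be
       aligned; unaligned I/O fails with EINVAL. */
    void *buf;
    if (posix_memalign(&buf, ALIGN, ALIGN) != 0) { close(fd); return 1; }
    memset(buf, 'x', ALIGN);

    if (pwrite(fd, buf, ALIGN, 0) != ALIGN)
        perror("pwrite");

    free(buf);
    close(fd);
    return 0;
}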