+csaba

On Tue, Feb 27, 2018 at 2:49 AM, Jim Prewett <downl...@carc.unm.edu> wrote:

> Hello,
>
> I'm having problems when write-behind is enabled on Gluster 3.8.4.
>
> I have 2 Gluster servers each with a single brick that is mirrored between
> them. The code causing these issues reads two data files each approx. 128G
> in size. It opens a third file, mmap()'s that file, and subsequently reads
> and writes to it. The third file, on successful runs (without write-behind
> enabled) is ultimately approx. 224G in size.
>

What exactly is the problem you are facing with write-behind enabled? Is it that the file size is smaller?

> The servers have the IP addresses 172.17.2.254 and 172.17.2.255 and the
> client has the IP address 172.17.1.61. These are all IP over InfiniBand.
>
> I'm attaching logfiles for the brick and for the volume from each of the
> servers and for the client. I'm also attaching the output of "gluster
> volume info" and "gluster volume get <volume> all".
>
> I have only noticed problems with write-behind being enabled with this one
> particular workload. When I ran it under strace, I see it seeking all over
> the place and reading and writing little bits of data to/from the third
> file.
>

What is the pattern you see when write-behind is disabled? Can you attach strace of the application for both scenarios - write-behind enabled and disabled? Can you also explain the workload and its data access pattern?

> For now, I'm leaving write-behind disabled. What are the performance
> implications of this for jobs that don't have this strange access pattern?
>

Disabling write-behind can bring down performance for sequential workloads.

> My co-worker who usually maintains the Gluster filesystems here is busy
> having a baby right now and I've gotten it while he's out, so I'm /really/
> new to Gluster and am not confident that anything is correct in my
> configuration (nor do I have a specific reason to doubt its correctness! :)
>
> I have checked the InfiniBand fabric for errors and do not see any beyond
> the normal PortXmitWait counter. There is no firewall on any of these
> machines. Their system clocks seem to all be synchronized.
>
> Is there anything additional I can provide to help diagnose this problem?
>
> Thanks for any help you can provide! :)
>
> Jim
>
> James E. Prewett j...@prewett.org downl...@hpc.unm.edu
> Systems Team Leader LoGS: http://www.hpc.unm.edu/~download/LoGS/
> Designated Security Officer OpenPGP key: pub 1024D/31816D93
> HPC Systems Engineer III UNM HPC 505.277.8210
> _______________________________________________
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
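If it helps to narrow this down, the access pattern described above (a file that is mmap()'d and then read and written in small pieces at scattered offsets) could be approximated with a small standalone test. The following is only a rough sketch and not the actual application: the function name `exercise_mmap`, the file name, and the sizes are all made up for illustration. It writes through the mapping, keeps an in-memory copy of what it wrote, and then re-reads the file with ordinary read() to count differing bytes. Running it against a file on the Gluster mount with write-behind enabled and again with it disabled might show whether the size or the contents differ.

```python
import mmap
import os
import random
import tempfile


def exercise_mmap(path, size=1 << 20, ops=2000, chunk=64, seed=0):
    """Do small reads/writes at random offsets through mmap.

    Returns (file size as read back, number of bytes that differ
    from an in-memory shadow copy of everything written).
    """
    rng = random.Random(seed)
    shadow = bytearray(size)  # what the file is expected to contain
    with open(path, "w+b") as f:
        f.truncate(size)  # the file must have its full size before mapping
        with mmap.mmap(f.fileno(), size) as m:
            for _ in range(ops):
                off = rng.randrange(0, size - chunk)
                if rng.random() < 0.5:
                    data = bytes(rng.getrandbits(8) for _ in range(chunk))
                    m[off:off + chunk] = data       # write via the mapping
                    shadow[off:off + chunk] = data  # remember what was written
                else:
                    _ = m[off:off + chunk]          # small scattered read
            m.flush()  # push dirty pages back to the file
    # Re-read with plain read() after the mapping is closed.
    with open(path, "rb") as f:
        on_disk = f.read()
    mismatches = sum(a != b for a, b in zip(on_disk, shadow))
    return len(on_disk), mismatches


if __name__ == "__main__":
    # Hypothetical usage: replace the temporary directory with a
    # directory on the Gluster mount to test the volume itself.
    with tempfile.TemporaryDirectory() as d:
        print(exercise_mmap(os.path.join(d, "mmap_test.bin")))
```

The defaults here are tiny compared to the 224G file in the report, so a clean run does not rule out a problem at scale; the sizes would need to be raised for a meaningful comparison.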