On Sat, May 13, 2017 at 8:44 AM, Pranith Kumar Karampuri <pkara...@redhat.com> wrote:
> On Fri, May 12, 2017 at 8:04 PM, Pat Haley <pha...@mit.edu> wrote:
>
>> Hi Pranith,
>>
>> My question was about setting up a gluster volume on an ext4 partition.
>> I thought we had the bricks mounted as xfs for compatibility with gluster?
>
> Oh, that should not be a problem. It works fine. It's just that xfs
> effectively has no limits for these things, whereas ext4 does for things
> like hardlinks (at least the last time I checked :-) ). So it is better
> to have xfs.
>
>> Pat
>>
>> On 05/11/2017 12:06 PM, Pranith Kumar Karampuri wrote:
>>
>> On Thu, May 11, 2017 at 9:32 PM, Pat Haley <pha...@mit.edu> wrote:
>>
>>> Hi Pranith,
>>>
>>> The /home partition is mounted as ext4:
>>>
>>>   /home        ext4  defaults,usrquota,grpquota  1 2
>>>
>>> The brick partitions are mounted as xfs:
>>>
>>>   /mnt/brick1  xfs   defaults  0 0
>>>   /mnt/brick2  xfs   defaults  0 0
>>>
>>> Will this cause a problem with creating a volume under /home?
>>
>> I don't think the bottleneck is the disk. You could run the same tests
>> you did before on the new volume to confirm?
>>
>>> Pat
>>>
>>> On 05/11/2017 11:32 AM, Pranith Kumar Karampuri wrote:
>>>
>>> On Thu, May 11, 2017 at 8:57 PM, Pat Haley <pha...@mit.edu> wrote:
>>>
>>>> Hi Pranith,
>>>>
>>>> Unfortunately, we don't have similar hardware for a small-scale test.
>>>> All we have is our production hardware.
>>>
>>> You said something about a /home partition with fewer disks; we can
>>> create a plain distribute volume inside one of its directories, and
>>> once we are done we can remove the setup. What do you say?
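>>> Roughly what I have in mind is the following sketch (the volume,
>>> directory, and mount-point names are only examples; adjust them to
>>> your setup):
>>>
>>>   # one-brick distribute volume on the /home partition
>>>   mkdir -p /home/gluster-test/brick1
>>>   gluster volume create test-volume mseas-data2:/home/gluster-test/brick1 force
>>>   gluster volume start test-volume
>>>
>>>   # enable profiling; since nothing else uses this volume, the
>>>   # profile should contain only the dd workload
>>>   gluster volume profile test-volume start
>>>
>>>   # mount and repeat the earlier dd tests
>>>   mount -t glusterfs mseas-data2:/test-volume /mnt/gluster-test
>>>   dd if=/dev/zero count=4096 bs=1048576 of=/mnt/gluster-test/zeros.txt conv=sync
>>>   gluster volume profile test-volume info
>>>
>>>   # and the same dd directly against the underlying directory,
>>>   # bypassing gluster, for comparison
>>>   dd if=/dev/zero count=4096 bs=1048576 of=/home/gluster-test/zeros.txt conv=sync
>>>
>>>   # tear everything down afterwards
>>>   umount /mnt/gluster-test
>>>   gluster volume stop test-volume
>>>   gluster volume delete test-volume
>>>   rm -rf /home/gluster-test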
>>>> Pat
>>>>
>>>> On 05/11/2017 07:05 AM, Pranith Kumar Karampuri wrote:
>>>>
>>>> On Thu, May 11, 2017 at 2:48 AM, Pat Haley <pha...@mit.edu> wrote:
>>>>
>>>>> Hi Pranith,
>>>>>
>>>>> Since we are mounting the partitions as the bricks, I tried the dd
>>>>> test writing to <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
>>>>> The results without oflag=sync were 1.6 GB/s (faster than gluster,
>>>>> but not as fast as I was expecting given the 1.2 GB/s to the
>>>>> no-gluster area with fewer disks).
>>>>
>>>> Okay, then 1.6 GB/s is what we need to target, considering your volume
>>>> is just distribute. Is there any way you can run tests on similar
>>>> hardware but at a smaller scale, just so we can run the workload and
>>>> learn more about the bottlenecks in the system? We can probably try to
>>>> get the speed up to the 1.2 GB/s on the /home partition you were
>>>> telling me about yesterday. Let me know if that is something you are
>>>> okay to do.
>>>>
>>>>> Pat
>>>>>
>>>>> On 05/10/2017 01:27 PM, Pranith Kumar Karampuri wrote:
>>>>>
>>>>> On Wed, May 10, 2017 at 10:15 PM, Pat Haley <pha...@mit.edu> wrote:
>>>>>
>>>>>> Hi Pranith,
>>>>>>
>>>>>> Not entirely sure (this isn't my area of expertise). I'll run your
>>>>>> answer by some other people who are more familiar with this.
>>>>>>
>>>>>> I am also uncertain how to interpret the results once we add the dd
>>>>>> tests writing to the /home area (no gluster, still on the same
>>>>>> machine):
>>>>>>
>>>>>>   dd test without oflag=sync (rough average of multiple tests):
>>>>>>     gluster w/ fuse mount:  570 MB/s
>>>>>>     gluster w/ nfs mount:   390 MB/s
>>>>>>     nfs (no gluster):       1.2 GB/s
>>>>>>
>>>>>>   dd test with oflag=sync (rough average of multiple tests):
>>>>>>     gluster w/ fuse mount:    5 MB/s
>>>>>>     gluster w/ nfs mount:   200 MB/s
>>>>>>     nfs (no gluster):        20 MB/s
>>>>>>
>>>>>> Given that the non-gluster area is a RAID-6 of 4 disks while each
>>>>>> brick of the gluster area is a RAID-6 of 32 disks, I would naively
>>>>>> expect the writes to the gluster area to be roughly 8x faster than
>>>>>> to the non-gluster area.
>>>>>
>>>>> I think a better test is to write a file over nfs, without any
>>>>> gluster, to a location that is not inside the brick but somewhere
>>>>> else on the same disk(s). If you are mounting the partition as the
>>>>> brick, then we can write to a file inside the .glusterfs directory,
>>>>> something like <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
>>>>>
>>>>>> I still think we have a speed issue; I can't tell if fuse vs nfs is
>>>>>> part of the problem.
>>>>>
>>>>> I got interested in this post because I read that fuse speed is lower
>>>>> than nfs speed, which is counter-intuitive to my understanding, so I
>>>>> wanted clarification. Now that I have it (fuse outperformed nfs
>>>>> without sync), we can resume testing as described above and try to
>>>>> find what the problem is. Based on your email address I am guessing
>>>>> you are in Boston, and I am in Bangalore, so if you are okay with
>>>>> this debugging stretching over multiple days because of the
>>>>> timezones, I will be happy to help. Please be a bit patient with me;
>>>>> I am under a release crunch, but I am very curious about the problem
>>>>> you posted.
>>>>>
>>>>>> Was there anything useful in the profiles?
>>>>>
>>>>> Unfortunately the profiles didn't help me much. I think we are
>>>>> collecting them from an active volume, so they contain a lot of
>>>>> information that does not pertain to dd, which makes it difficult to
>>>>> isolate dd's contribution. So I went through your post again, noticed
>>>>> something I hadn't paid much attention to earlier, namely oflag=sync,
>>>>> did my own tests with FUSE on my setup, and sent that reply.
>>>>>
>>>>>> Pat
>>>>>>
>>>>>> On 05/10/2017 12:15 PM, Pranith Kumar Karampuri wrote:
>>>>>>
>>>>>> Okay, good. At least this validates my doubts. Handling O_SYNC in
>>>>>> gluster NFS and fuse is a bit different. When an application opens a
>>>>>> file with O_SYNC on a fuse mount, each write syscall has to be
>>>>>> written to disk as part of the syscall, whereas in the case of NFS
>>>>>> there is no concept of open: NFS performs the write through a handle
>>>>>> that says it needs to be a synchronous write, so a write() syscall
>>>>>> is performed first and then an fsync(). A write on an fd opened with
>>>>>> O_SYNC thus becomes write+fsync. My guess is that when multiple
>>>>>> threads do this write+fsync() operation on the same file, multiple
>>>>>> writes get batched together before being written to disk, which is
>>>>>> why the throughput on the disk increases.
>>>>>>
>>>>>> Does that answer your doubts?
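>>>>>> You can see the two behaviours from the client side with dd itself;
>>>>>> a rough illustration, not exactly what the NFS server does
>>>>>> internally:
>>>>>>
>>>>>>   # oflag=sync: every write() must reach disk before it returns,
>>>>>>   # which is the O_SYNC behaviour the fuse mount gets
>>>>>>   dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt oflag=sync
>>>>>>
>>>>>>   # conv=fsync: buffered write()s followed by a single fsync()
>>>>>>   # at the end, i.e. the write+fsync pattern described above
>>>>>>   dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=fsync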
>>>>>> On Wed, May 10, 2017 at 9:35 PM, Pat Haley <pha...@mit.edu> wrote:
>>>>>>
>>>>>>> Without oflag=sync, and with only a single test of each, FUSE is
>>>>>>> going faster than NFS:
>>>>>>>
>>>>>>> FUSE:
>>>>>>> mseas-data2(dri_nascar)% dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync
>>>>>>> 4096+0 records in
>>>>>>> 4096+0 records out
>>>>>>> 4294967296 bytes (4.3 GB) copied, 7.46961 s, 575 MB/s
>>>>>>>
>>>>>>> NFS:
>>>>>>> mseas-data2(HYCOM)% dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync
>>>>>>> 4096+0 records in
>>>>>>> 4096+0 records out
>>>>>>> 4294967296 bytes (4.3 GB) copied, 11.4264 s, 376 MB/s
>>>>>>>
>>>>>>> On 05/10/2017 11:53 AM, Pranith Kumar Karampuri wrote:
>>>>>>>
>>>>>>> Could you let me know the speed without oflag=sync on both mounts?
>>>>>>> No need to collect profiles.
>>>>>>>
>>>>>>> On Wed, May 10, 2017 at 9:17 PM, Pat Haley <pha...@mit.edu> wrote:
>>>>>>>
>>>>>>>> Here is what I see now:
>>>>>>>>
>>>>>>>> [root@mseas-data2 ~]# gluster volume info
>>>>>>>>
>>>>>>>> Volume Name: data-volume
>>>>>>>> Type: Distribute
>>>>>>>> Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>>>>>>>> Status: Started
>>>>>>>> Number of Bricks: 2
>>>>>>>> Transport-type: tcp
>>>>>>>> Bricks:
>>>>>>>> Brick1: mseas-data2:/mnt/brick1
>>>>>>>> Brick2: mseas-data2:/mnt/brick2
>>>>>>>> Options Reconfigured:
>>>>>>>> diagnostics.count-fop-hits: on
>>>>>>>> diagnostics.latency-measurement: on
>>>>>>>> nfs.exports-auth-enable: on
>>>>>>>> diagnostics.brick-sys-log-level: WARNING
>>>>>>>> performance.readdir-ahead: on
>>>>>>>> nfs.disable: on
>>>>>>>> nfs.export-volumes: off
>>>>>>>>
>>>>>>>> On 05/10/2017 11:44 AM, Pranith Kumar Karampuri wrote:
>>>>>>>>
>>>>>>>> Is this the volume info you have?
>>>>>>>>
>>>>>>>>> [root@mseas-data2 ~]# gluster volume info
>>>>>>>>>
>>>>>>>>> Volume Name: data-volume
>>>>>>>>> Type: Distribute
>>>>>>>>> Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>>>>>>>>> Status: Started
>>>>>>>>> Number of Bricks: 2
>>>>>>>>> Transport-type: tcp
>>>>>>>>> Bricks:
>>>>>>>>> Brick1: mseas-data2:/mnt/brick1
>>>>>>>>> Brick2: mseas-data2:/mnt/brick2
>>>>>>>>> Options Reconfigured:
>>>>>>>>> performance.readdir-ahead: on
>>>>>>>>> nfs.disable: on
>>>>>>>>> nfs.export-volumes: off
>>>>>>>>
>>>>>>>> I copied this from an old thread from 2016. This is a distribute
>>>>>>>> volume. Did you change any of the options in between?
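>>>>>>>> If it is easier, and assuming your gluster version is recent
>>>>>>>> enough to have "volume get", that will dump every option value
>>>>>>>> rather than only the reconfigured ones, which makes comparing
>>>>>>>> option changes over time straightforward:
>>>>>>>>
>>>>>>>>   gluster volume get data-volume all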
>>>>>>>> --
>>>>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>>>>> Pat Haley                          Email:  pha...@mit.edu
>>>>>>>> Center for Ocean Engineering       Phone:  (617) 253-6824
>>>>>>>> Dept. of Mechanical Engineering    Fax:    (617) 253-8125
>>>>>>>> MIT, Room 5-213                    http://web.mit.edu/phaley/www/
>>>>>>>> 77 Massachusetts Avenue
>>>>>>>> Cambridge, MA 02139-4301

--
Pranith
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users