Re: [Gluster-users] ganesha.nfsd process dies when copying files

2018-08-16 Thread Karli Sjöberg
On 15 Aug 2018 13:14, Karli Sjöberg wrote:
> On Wed, 2018-08-15 at 13:42 +0800, Pui Edylie wrote:
> > Hi Karli,
> > 
> > I think Alex is right in regards with the NFS version and state.
> > 
> > I am only using NFSv3 and the failover is working per expectation.
> 
> OK, so I've remade the test again and it goes like this:
> 
> 1) Start copy loop[*]
> 2) Power off hv02
> 3) Copy loop stalls indefinitely
> 
> I have attached a snippet of the ctdb log that looks interesting but
> doesn't say much to me except that something's wrong :)
> 
> [*]: while true; do mount -o vers=3 hv03v.localdomain:/data /mnt/; dd
> if=/var/tmp/test.bin of=/mnt/test.bin bs=1M status=progress; rm -fv
> /mnt/test.bin; umount /mnt; done
> 
> Thanks in advance!
> 
> /K

Could someone just confirm to me if this is the correct result for this
scenario? Aren't you supposed to be able to reboot a host in the cluster
without compromising it?

/K

Re: [Gluster-users] ganesha.nfsd process dies when copying files

2018-08-15 Thread Karli Sjöberg
On Wed, 2018-08-15 at 13:42 +0800, Pui Edylie wrote:
> Hi Karli,
> 
> I think Alex is right in regards with the NFS version and state.
> 
> I am only using NFSv3 and the failover is working per expectation.

OK, so I've remade the test again and it goes like this:

1) Start copy loop[*]
2) Power off hv02
3) Copy loop stalls indefinitely

I have attached a snippet of the ctdb log that looks interesting but
doesn't say much to me except that something's wrong :)

[*]: while true; do mount -o vers=3 hv03v.localdomain:/data /mnt/; dd
if=/var/tmp/test.bin of=/mnt/test.bin bs=1M status=progress; rm -fv
/mnt/test.bin; umount /mnt; done
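
A hedged variant of the same loop that timestamps each phase, which might
make it easier to line a stall up against the ctdb log; the date calls are
the only addition:

while true; do
    date '+%F %T mount'
    mount -o vers=3 hv03v.localdomain:/data /mnt/
    date '+%F %T dd start'
    dd if=/var/tmp/test.bin of=/mnt/test.bin bs=1M status=progress
    date '+%F %T dd done'
    rm -fv /mnt/test.bin
    umount /mnt
done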

Thanks in advance!

/K

> 
> In my use case, I have 3 nodes with ESXI 6.7 as OS and setup 1x 
> gluster VM on each of the ESXI host using its local datastore.
> 
> Once I have formed the replicate 3, I use the CTDB VIP to present the
> NFS3 back to the Vcenter and uses it as a shared storage.
> 
> Everything works great other than performance is not very good ... I
> am still looking for ways to improve it.
> 
> Cheers,
> Edy
> 
> On 8/15/2018 12:25 AM, Alex Chekholko wrote:
> > Hi Karli,
> > 
> > I'm not 100% sure this is related, but when I set up my ZFS NFS HA
> > per https://github.com/ewwhite/zfs-ha/wiki I was not able to get
> > the failover to work with NFS v4 but only with NFS v3.
> > 
> > From the client point of view, it really looked like with NFS v4
> > there is an open file handle and that just goes stale and hangs, or
> > something like that, whereas with NFSv3 the client retries and
> > recovers and continues.  I did not investigate further, I just use
> > v3.  I think it has something to do with NFSv4 being "stateful" and
> > NFSv3 being "stateless".
> > 
> > Can you re-run your test but using NFSv3 on the client mount?  Or
> > do you need to use v4.x?
> > 
> > Regards,
> > Alex
> > 
> > On Tue, Aug 14, 2018 at 6:11 AM Karli Sjöberg 
> > wrote:
> > > On Fri, 2018-08-10 at 09:39 -0400, Kaleb S. KEITHLEY wrote:
> > > > On 08/10/2018 09:23 AM, Karli Sjöberg wrote:
> > > > > On Fri, 2018-08-10 at 21:23 +0800, Pui Edylie wrote:
> > > > > > Hi Karli,
> > > > > > 
> > > > > > Storhaug works with glusterfs 4.1.2 and latest nfs-ganesha.
> > > > > > 
> > > > > > I just installed them last weekend ... they are working
> > > very well
> > > > > > :)
> > > > > 
> > > > > Okay, awesome!
> > > > > 
> > > > > Is there any documentation on how to do that?
> > > > > 
> > > > 
> > > > https://github.com/gluster/storhaug/wiki
> > > > 
> > > 
> > > Thanks Kaleb and Edy!
> > > 
> > > I have now redone the cluster using the latest and greatest
> > > following
> > > the above guide and repeated the same test I was doing before
> > > (the
> > > rsync while loop) with success. I let (forgot) it run for about a
> > > day
> > > and it was still chugging along nicely when I aborted it, so
> > > success
> > > there!
> > > 
> > > On to the next test; the catastrophic failure test- where one of
> > > the
> > > servers dies, I'm having a more difficult time with.
> > > 
> > > 1) I start with mounting the share over NFS 4.1 and then proceed
> > > with
> > > writing a 8 GiB large random data file with 'dd', while "hard-
> > > cutting"
> > > the power to the server I'm writing to, the transfer just stops
> > > indefinitely, until the server comes back again. Is that supposed
> > > to
> > > happen? Like this:
> > > 
> > > # dd if=/dev/urandom of=/var/tmp/test.bin bs=1M count=8192
> > > # mount -o vers=4.1 hv03v.localdomain:/data /mnt/
> > > # dd if=/var/tmp/test.bin of=/mnt/test.bin bs=1M status=progress
> > > 2434793472 bytes (2,4 GB, 2,3 GiB) copied, 42 s, 57,9 MB/s
> > > 
> > > (here I cut the power and let it be for almost two hours before
> > > turning
> > > it on again)
> > > 
> > > dd: error writing '/mnt/test.bin': Remote I/O error
> > > 2325+0 records in
> > > 2324+0 records out
> > > 2436890624 bytes (2,4 GB, 2,3 GiB) copied, 6944,84 s, 351 kB/s
> > > # umount /mnt
> > > 
> > > Here the unmount command hung and I had to hard reset the client.
> > > 
> > > 2) Another question I have is why some files "change" as you copy
> > > them
> > > out to the Gluster storage? Is that the way it should be? This
> > > time, I
> > > deleted eveything in the destination directory to start over:
> > > 
> > > # mount -o vers=4.1 hv03v.localdomain:/data /mnt/
> > > # rm -f /mnt/test.bin
> > > # dd if=/var/tmp/test.bin of=/mnt/test.bin bs=1M status=progress
> > > 8557428736 bytes (8,6 GB, 8,0 GiB) copied, 122 s, 70,1 MB/s
> > > 8192+0 records in
> > > 8192+0 records out
> > > 8589934592 bytes (8,6 GB, 8,0 GiB) copied, 123,039 s, 69,8 MB/s
> > > # md5sum /var/tmp/test.bin 
> > > 073867b68fa8eaa382ffe05adb90b583  /var/tmp/test.bin
> > > # md5sum /mnt/test.bin 
> > > 634187d367f856f3f5fb31846f796397  /mnt/test.bin
> > > # umount /mnt
> > > 
> > > Thanks in advance!
> > > 
> > > /K
> > > ___
> > > Gluster-users mailing list
> > > Gluster-users@gluster.org
> > > 

Re: [Gluster-users] ganesha.nfsd process dies when copying files

2018-08-14 Thread Pui Edylie

Hi Karli,

I think Alex is right in regards with the NFS version and state.

I am only using NFSv3 and the failover is working per expectation.

In my use case, I have 3 nodes with ESXi 6.7 as the OS and have set up one 
gluster VM on each ESXi host using its local datastore.


Once I have formed the replica 3 volume, I use the CTDB VIP to present 
NFSv3 back to vCenter and use it as shared storage.


Everything works great, other than that performance is not very good ... I am 
still looking for ways to improve it.


Cheers,
Edy


On 8/15/2018 12:25 AM, Alex Chekholko wrote:

Hi Karli,

I'm not 100% sure this is related, but when I set up my ZFS NFS HA per 
https://github.com/ewwhite/zfs-ha/wiki I was not able to get the 
failover to work with NFS v4 but only with NFS v3.


From the client point of view, it really looked like with NFS v4 there 
is an open file handle and that just goes stale and hangs, or 
something like that, whereas with NFSv3 the client retries and 
recovers and continues.  I did not investigate further, I just use 
v3.  I think it has something to do with NFSv4 being "stateful" and 
NFSv3 being "stateless".


Can you re-run your test but using NFSv3 on the client mount?  Or do 
you need to use v4.x?


Regards,
Alex

On Tue, Aug 14, 2018 at 6:11 AM Karli Sjöberg wrote:


On Fri, 2018-08-10 at 09:39 -0400, Kaleb S. KEITHLEY wrote:
> On 08/10/2018 09:23 AM, Karli Sjöberg wrote:
> > On Fri, 2018-08-10 at 21:23 +0800, Pui Edylie wrote:
> > > Hi Karli,
> > >
> > > Storhaug works with glusterfs 4.1.2 and latest nfs-ganesha.
> > >
> > > I just installed them last weekend ... they are working very
well
> > > :)
> >
> > Okay, awesome!
> >
> > Is there any documentation on how to do that?
> >
>
> https://github.com/gluster/storhaug/wiki
>

Thanks Kaleb and Edy!

I have now redone the cluster using the latest and greatest following
the above guide and repeated the same test I was doing before (the
rsync while loop) with success. I let (forgot) it run for about a day
and it was still chugging along nicely when I aborted it, so success
there!

On to the next test; the catastrophic failure test- where one of the
servers dies, I'm having a more difficult time with.

1) I start with mounting the share over NFS 4.1 and then proceed with
writing a 8 GiB large random data file with 'dd', while "hard-cutting"
the power to the server I'm writing to, the transfer just stops
indefinitely, until the server comes back again. Is that supposed to
happen? Like this:

# dd if=/dev/urandom of=/var/tmp/test.bin bs=1M count=8192
# mount -o vers=4.1 hv03v.localdomain:/data /mnt/
# dd if=/var/tmp/test.bin of=/mnt/test.bin bs=1M status=progress
2434793472 bytes (2,4 GB, 2,3 GiB) copied, 42 s, 57,9 MB/s

(here I cut the power and let it be for almost two hours before
turning
it on again)

dd: error writing '/mnt/test.bin': Remote I/O error
2325+0 records in
2324+0 records out
2436890624 bytes (2,4 GB, 2,3 GiB) copied, 6944,84 s, 351 kB/s
# umount /mnt

Here the unmount command hung and I had to hard reset the client.

2) Another question I have is why some files "change" as you copy them
out to the Gluster storage? Is that the way it should be? This time, I
deleted eveything in the destination directory to start over:

# mount -o vers=4.1 hv03v.localdomain:/data /mnt/
# rm -f /mnt/test.bin
# dd if=/var/tmp/test.bin of=/mnt/test.bin bs=1M status=progress
8557428736 bytes (8,6 GB, 8,0 GiB) copied, 122 s, 70,1 MB/s
8192+0 records in
8192+0 records out
8589934592 bytes (8,6 GB, 8,0 GiB) copied, 123,039 s, 69,8 MB/s
# md5sum /var/tmp/test.bin
073867b68fa8eaa382ffe05adb90b583  /var/tmp/test.bin
# md5sum /mnt/test.bin
634187d367f856f3f5fb31846f796397  /mnt/test.bin
# umount /mnt

Thanks in advance!

/K
___
Gluster-users mailing list
Gluster-users@gluster.org 
https://lists.gluster.org/mailman/listinfo/gluster-users



___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] ganesha.nfsd process dies when copying files

2018-08-14 Thread Karli Sjöberg
On 15 Aug 2018 07:43, Pui Edylie wrote:
  Hi Karli,
  
  I think Alex is right in regards with the NFS version and state.

Yeah, I'm setting up the tests now, I'll report back once it's done!

  I am only using NFSv3 and the failover is working per expectation.
  
  In my use case, I have 3 nodes with ESXI 6.7 as OS and setup 1x 
  gluster VM on each of the ESXI host using its local datastore.
  
  Once I have formed the replicate 3, I use the CTDB VIP to present
  the NFS3 back to the Vcenter and uses it as a shared storage.
  
  Everything works great other than performance is not very good ...
  I am still looking for ways to improve it.

The obvious way would be to use oVirt instead of VMware ;)

/K

  Cheers,
  Edy

Re: [Gluster-users] ganesha.nfsd process dies when copying files

2018-08-14 Thread Alex Chekholko
Hi Karli,

I'm not 100% sure this is related, but when I set up my ZFS NFS HA per
https://github.com/ewwhite/zfs-ha/wiki I was not able to get the failover
to work with NFS v4 but only with NFS v3.

From the client point of view, it really looked like with NFS v4 there is
an open file handle and that just goes stale and hangs, or something like
that, whereas with NFSv3 the client retries and recovers and continues.  I
did not investigate further, I just use v3.  I think it has something to do
with NFSv4 being "stateful" and NFSv3 being "stateless".

Can you re-run your test but using NFSv3 on the client mount?  Or do you
need to use v4.x?

Regards,
Alex

On Tue, Aug 14, 2018 at 6:11 AM Karli Sjöberg  wrote:

> On Fri, 2018-08-10 at 09:39 -0400, Kaleb S. KEITHLEY wrote:
> > On 08/10/2018 09:23 AM, Karli Sjöberg wrote:
> > > On Fri, 2018-08-10 at 21:23 +0800, Pui Edylie wrote:
> > > > Hi Karli,
> > > >
> > > > Storhaug works with glusterfs 4.1.2 and latest nfs-ganesha.
> > > >
> > > > I just installed them last weekend ... they are working very well
> > > > :)
> > >
> > > Okay, awesome!
> > >
> > > Is there any documentation on how to do that?
> > >
> >
> > https://github.com/gluster/storhaug/wiki
> >
>
> Thanks Kaleb and Edy!
>
> I have now redone the cluster using the latest and greatest following
> the above guide and repeated the same test I was doing before (the
> rsync while loop) with success. I let (forgot) it run for about a day
> and it was still chugging along nicely when I aborted it, so success
> there!
>
> On to the next test; the catastrophic failure test- where one of the
> servers dies, I'm having a more difficult time with.
>
> 1) I start with mounting the share over NFS 4.1 and then proceed with
> writing a 8 GiB large random data file with 'dd', while "hard-cutting"
> the power to the server I'm writing to, the transfer just stops
> indefinitely, until the server comes back again. Is that supposed to
> happen? Like this:
>
> # dd if=/dev/urandom of=/var/tmp/test.bin bs=1M count=8192
> # mount -o vers=4.1 hv03v.localdomain:/data /mnt/
> # dd if=/var/tmp/test.bin of=/mnt/test.bin bs=1M status=progress
> 2434793472 bytes (2,4 GB, 2,3 GiB) copied, 42 s, 57,9 MB/s
>
> (here I cut the power and let it be for almost two hours before turning
> it on again)
>
> dd: error writing '/mnt/test.bin': Remote I/O error
> 2325+0 records in
> 2324+0 records out
> 2436890624 bytes (2,4 GB, 2,3 GiB) copied, 6944,84 s, 351 kB/s
> # umount /mnt
>
> Here the unmount command hung and I had to hard reset the client.
>
> 2) Another question I have is why some files "change" as you copy them
> out to the Gluster storage? Is that the way it should be? This time, I
> deleted eveything in the destination directory to start over:
>
> # mount -o vers=4.1 hv03v.localdomain:/data /mnt/
> # rm -f /mnt/test.bin
> # dd if=/var/tmp/test.bin of=/mnt/test.bin bs=1M status=progress
> 8557428736 bytes (8,6 GB, 8,0 GiB) copied, 122 s, 70,1 MB/s
> 8192+0 records in
> 8192+0 records out
> 8589934592 bytes (8,6 GB, 8,0 GiB) copied, 123,039 s, 69,8 MB/s
> # md5sum /var/tmp/test.bin
> 073867b68fa8eaa382ffe05adb90b583  /var/tmp/test.bin
> # md5sum /mnt/test.bin
> 634187d367f856f3f5fb31846f796397  /mnt/test.bin
> # umount /mnt
>
> Thanks in advance!
>
> /K
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] ganesha.nfsd process dies when copying files

2018-08-14 Thread Karli Sjöberg
On Fri, 2018-08-10 at 09:39 -0400, Kaleb S. KEITHLEY wrote:
> On 08/10/2018 09:23 AM, Karli Sjöberg wrote:
> > On Fri, 2018-08-10 at 21:23 +0800, Pui Edylie wrote:
> > > Hi Karli,
> > > 
> > > Storhaug works with glusterfs 4.1.2 and latest nfs-ganesha.
> > > 
> > > I just installed them last weekend ... they are working very well
> > > :)
> > 
> > Okay, awesome!
> > 
> > Is there any documentation on how to do that?
> > 
> 
> https://github.com/gluster/storhaug/wiki
> 

Thanks Kaleb and Edy!

I have now redone the cluster using the latest and greatest following
the above guide and repeated the same test I was doing before (the
rsync while loop) with success. I let (forgot) it run for about a day
and it was still chugging along nicely when I aborted it, so success
there!

On to the next test, the catastrophic failure test where one of the
servers dies; that one I'm having a more difficult time with.

1) I start with mounting the share over NFS 4.1 and then proceed with
writing an 8 GiB random data file with 'dd'. When I "hard-cut" the power
to the server I'm writing to, the transfer just stops indefinitely until
the server comes back again. Is that supposed to happen? Like this:

# dd if=/dev/urandom of=/var/tmp/test.bin bs=1M count=8192
# mount -o vers=4.1 hv03v.localdomain:/data /mnt/
# dd if=/var/tmp/test.bin of=/mnt/test.bin bs=1M status=progress
2434793472 bytes (2,4 GB, 2,3 GiB) copied, 42 s, 57,9 MB/s

(here I cut the power and let it be for almost two hours before turning
it on again)

dd: error writing '/mnt/test.bin': Remote I/O error
2325+0 records in
2324+0 records out
2436890624 bytes (2,4 GB, 2,3 GiB) copied, 6944,84 s, 351 kB/s
# umount /mnt

Here the unmount command hung and I had to hard reset the client.

2) Another question I have is why some files "change" as you copy them
out to the Gluster storage. Is that the way it should be? This time, I
deleted everything in the destination directory to start over:

# mount -o vers=4.1 hv03v.localdomain:/data /mnt/
# rm -f /mnt/test.bin
# dd if=/var/tmp/test.bin of=/mnt/test.bin bs=1M status=progress
8557428736 bytes (8,6 GB, 8,0 GiB) copied, 122 s, 70,1 MB/s
8192+0 records in
8192+0 records out
8589934592 bytes (8,6 GB, 8,0 GiB) copied, 123,039 s, 69,8 MB/s
# md5sum /var/tmp/test.bin 
073867b68fa8eaa382ffe05adb90b583  /var/tmp/test.bin
# md5sum /mnt/test.bin 
634187d367f856f3f5fb31846f796397  /mnt/test.bin
# umount /mnt
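
A hedged way to see whether the mismatch is reproducible or a one-off is to
repeat the copy and compare the checksums on every pass while the share is
still mounted; a minimal sketch using the same paths as above:

while true; do
    dd if=/var/tmp/test.bin of=/mnt/test.bin bs=1M
    src=$(md5sum < /var/tmp/test.bin)
    dst=$(md5sum < /mnt/test.bin)
    echo "source: $src   copy: $dst"
    [ "$src" != "$dst" ] && echo "MISMATCH at $(date)"
    rm -f /mnt/test.bin
done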

Thanks in advance!

/K
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] ganesha.nfsd process dies when copying files

2018-08-10 Thread Karli Sjöberg
On Aug 10, 2018 15:39, "Kaleb S. KEITHLEY"  wrote:
On 08/10/2018 09:23 AM, Karli Sjöberg wrote:
> On Fri, 2018-08-10 at 21:23 +0800, Pui Edylie wrote:
>> Hi Karli,
>>
>> Storhaug works with glusterfs 4.1.2 and latest nfs-ganesha.
>>
>> I just installed them last weekend ... they are working very well :)
> 
> Okay, awesome!
> 
> Is there any documentation on how to do that?
> 
https://github.com/gluster/storhaug/wiki

-- 
Kaleb

Thank you very much Kaleb, that's exactly what I was after! I will redo my
cluster and try this approach instead!

/K
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] ganesha.nfsd process dies when copying files

2018-08-10 Thread Pui Edylie

Hi Karli,

The following are my notes, which I gathered from Google searches, the 
storhaug wiki and more Google searches ... I might have missed certain 
steps, and this is based on CentOS 7


install centos 7.x
yum update -y

I have disabled both firewalld and SELinux

In our setup we are using an LSI RAID card (RAID10) and present the virtual 
drive partition as /dev/sdb


Create the LVM layout so that we can utilise the snapshot feature of gluster

pvcreate --dataalignment 256k /dev/sdb
vgcreate --physicalextentsize 256K gfs_vg /dev/sdb

set the volume to use all the space with -l 100%FREE
lvcreate --thinpool gfs_vg/thin_pool -l 100%FREE  --chunksize 256K 
--poolmetadatasize 15G --zero n


we use the XFS file system for our glusterfs
mkfs.xfs -i size=512 /dev/gfs_vg/thin_pool

Add the following to /etc/fstab with mount point /brick1683 (you 
could change the name accordingly)

/dev/gfs_vg/thin_pool /brick1683 xfs    defaults 1 2
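
Note: with LVM thin provisioning, a thin logical volume is normally carved
out of the pool and it is that LV, not the pool device itself, that gets
formatted and mounted; the steps above may be eliding that. A rough sketch
of what it could look like (the LV name and virtual size are placeholders):

lvcreate --thin gfs_vg/thin_pool --virtualsize 2T --name gfs_lv
mkfs.xfs -i size=512 /dev/gfs_vg/gfs_lv

and the matching fstab entry:

/dev/gfs_vg/gfs_lv /brick1683 xfs defaults 1 2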

Enable the gluster 4.1 repo

vi /etc/yum.repos.d/Gluster.repo

[gluster41]
name=Gluster 4.1
baseurl=http://mirror.centos.org/centos/7/storage/$basearch/gluster-4.1/
gpgcheck=0
enabled=1

install gluster 4.1

yum install -y centos-release-gluster

(note: centos-release-gluster enables the CentOS Storage SIG repo for 
gluster 4.1; the glusterfs server packages themselves still come from that 
repo, either installed explicitly or pulled in later as storhaug dependencies)

Once we have done the above steps on all 3 nodes, log in to one of the 
nodes and issue the following


gluster volume create gv0 replica 3 192.168.0.1:/brick1683/gv0 
192.168.0.2:/brick1684/gv0 192.168.0.3:/brick1685/gv0
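
After creating it, the volume also has to be started before it can be
mounted or exported, for example:

gluster volume start gv0
gluster volume info gv0

gluster volume info shows whether the volume is started and lists its bricks.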



Setting up HA for NFS-Ganesha using CTDB

install the storhaug package on all participating nodes
Install the storhaug package on all nodes using the appropriate command 
for your system:


yum -y install storhaug-nfs

Note: this will install all the dependencies, e.g. ctdb, 
nfs-ganesha-gluster, glusterfs, and their related dependencies.


Create a passwordless ssh key and copy it to all participating nodes
On one of the participating nodes (Fedora, RHEL, CentOS):
node1% ssh-keygen -f /etc/sysconfig/storhaug.d/secret.pem
or (Debian, Ubuntu):
node1% ssh-keygen -f /etc/default/storhaug.d/secret.pem
When prompted for a password, press the Enter key.

Copy the public key to all the nodes (Fedora, RHEL, CentOS):
node1% ssh-copy-id -i /etc/sysconfig/storhaug.d/secret.pem.pub root@node1
node1% ssh-copy-id -i /etc/sysconfig/storhaug.d/secret.pem.pub root@node2
node1% ssh-copy-id -i /etc/sysconfig/storhaug.d/secret.pem.pub root@node3

...

You can confirm that it works with (Fedora, RHEL, CentOS):
node1% ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i 
/etc/sysconfig/storhaug.d/secret.pem root@node1



populate /etc/ctdb/nodes and /etc/ctdb/public_addresses
Select one node as your lead node, e.g. node1. On the lead node, 
create/edit /etc/ctdb/nodes and populate it with the (fixed) IP 
addresses of the participating nodes. It should look like this:

192.168.122.81
192.168.122.82
192.168.122.83
192.168.122.84

On the lead node, create/edit /etc/ctdb/public_addresses and populate it 
with the floating IP addresses (a.k.a. VIPs) for the participating 
nodes. These must be different than the IP addresses in /etc/ctdb/nodes. 
It should look like this:

192.168.122.85 eth0
192.168.122.86 eth0
192.168.122.87 eth0
192.168.122.88 eth0

edit /etc/ctdb/ctdbd.conf
Ensure that the line CTDB_MANAGES_NFS=yes exists. If not, add it or 
change it from no to yes. Add or change the following lines:

CTDB_RECOVERY_LOCK=/run/gluster/shared_storage/.ctdb/reclock
CTDB_NFS_CALLOUT=/etc/ctdb/nfs-ganesha-callout
CTDB_NFS_STATE_FS_TYPE=glusterfs
CTDB_NFS_STATE_MNT=/run/gluster/shared_storage
CTDB_NFS_SKIP_SHARE_CHECK=yes
NFS_HOSTNAME=localhost

create a bare minimum /etc/ganesha/ganesha.conf file
On the lead node:
node1% touch /etc/ganesha/ganesha.conf
or
node1% echo "### NFS-Ganesha.config" > /etc/ganesha/ganesha.conf

Note: you can edit this later to set global configuration options.
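
As an illustration only (not something from the original note), a global
block limiting which NFS protocol versions ganesha serves could look
roughly like this:

NFS_CORE_PARAM {
    # serve NFSv3 and NFSv4; drop the 4 here to test v3-only behaviour
    NFS_Protocols = 3,4;
}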

create a trusted storage pool and start the gluster shared-storage volume
On all the participating nodes:
node1% systemctl start glusterd
node2% systemctl start glusterd
node3% systemctl start glusterd
...

On the lead node, peer probe the other nodes:
node1% gluster peer probe node2
node1% gluster peer probe node3
...

Optional: on one of the other nodes, peer probe node1:
node2% gluster peer probe node1

Enable the gluster shared-storage volume:
node1% gluster volume set all cluster.enable-shared-storage enable
This takes a few moments. When done, check that the 
gluster_shared_storage volume is mounted at /run/gluster/shared_storage 
on all the nodes.
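
A quick way to verify that on each node could be something like:

node1% gluster volume list | grep shared_storage
node1% grep shared_storage /proc/mounts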


start the ctdbd and ganesha.nfsd daemons
On the lead node:
node1% storhaug setup
You can watch the ctdb log (/var/log/ctdb.log) and the ganesha log 
(/var/log/ganesha/ganesha.log) to monitor their progress. From this 
point on you may enter storhaug commands from any of the participating 
nodes.
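
For example, something along these lines; once the cluster has settled,
ctdb status should report the nodes as OK:

node1% tail -f /var/log/ctdb.log /var/log/ganesha/ganesha.log
node1% ctdb status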


export a gluster volume
Create a gluster volume
node1% gluster volume create myvol replica 2 node1:/bricks/vol/myvol 
node2:/bricks/vol/myvol node3:/bricks/vol/myvol node4:/bricks/vol/myvol ...


Start the gluster 
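
Presumably the remaining (truncated) steps are to start the volume and then
export it through storhaug; a rough sketch under that assumption, with the
exact storhaug export syntax unverified:

node1% gluster volume start myvol
node1% storhaug export myvol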

Re: [Gluster-users] ganesha.nfsd process dies when copying files

2018-08-10 Thread Kaleb S. KEITHLEY
On 08/10/2018 09:23 AM, Karli Sjöberg wrote:
> On Fri, 2018-08-10 at 21:23 +0800, Pui Edylie wrote:
>> Hi Karli,
>>
>> Storhaug works with glusterfs 4.1.2 and latest nfs-ganesha.
>>
>> I just installed them last weekend ... they are working very well :)
> 
> Okay, awesome!
> 
> Is there any documentation on how to do that?
> 

https://github.com/gluster/storhaug/wiki

-- 

Kaleb



signature.asc
Description: OpenPGP digital signature
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] ganesha.nfsd process dies when copying files

2018-08-10 Thread Kaleb S. KEITHLEY
On 08/10/2018 09:08 AM, Karli Sjöberg wrote:
> On Fri, 2018-08-10 at 08:39 -0400, Kaleb S. KEITHLEY wrote:
>> On 08/10/2018 08:08 AM, Karli Sjöberg wrote:
>>> Hey all!
>>> ...
>>>
>>> glusterfs-client-xlators-3.10.12-1.el7.x86_64
>>> glusterfs-api-3.10.12-1.el7.x86_64
>>> nfs-ganesha-2.4.5-1.el7.x86_64
>>> centos-release-gluster310-1.0-1.el7.centos.noarch
>>> glusterfs-3.10.12-1.el7.x86_64
>>> glusterfs-cli-3.10.12-1.el7.x86_64
>>> nfs-ganesha-gluster-2.4.5-1.el7.x86_64
>>> glusterfs-server-3.10.12-1.el7.x86_64
>>> glusterfs-libs-3.10.12-1.el7.x86_64
>>> glusterfs-fuse-3.10.12-1.el7.x86_64
>>> glusterfs-ganesha-3.10.12-1.el7.x86_64
>>>
>>
>> For nfs-ganesha problems you'd really be better served by posting to
>> support@ or de...@lists.nfs-ganesha.org.
>>
>> Both glusterfs-3.10 and nfs-ganesha-2.4 are really old. glusterfs-
>> 3.10
>> is even officially EOL. Ganesha isn't really organized  enough to
>> have
>> done anything as bold as officially declaring 2.4 as having reached
>> EOL.
>>
>> The nfs-ganesha devs are currently working on 2.7; maintaining and
>> supporting 2.6, and less so 2.5, is pretty much at the limit of what
>> they might be willing to help debug.
>>
>> I strongly encourage you to update to a more recent version of both
>> glusterfs and nfs-ganesha.  glusterfs-4.1 and nfs-ganesha-2.6 would
>> be
>> ideal. Then if you still have problems you're much more likely to get
>> help.
>>
> 
> Hi, thank you for your answer, but it raises even more questions about
> any potential production deployment.
> 
> Actually, I knew that the versions are old, but it seems to me that you
> are contradicting yourself:
> 
> https://lists.gluster.org/pipermail/gluster-users/2017-July/031753.html
> 
> "After 3.10 you'd need to use storhaug Which doesn't work
> (yet).
> 

I don't recall the context of that email.

But I did also send this email
https://lists.gluster.org/pipermail/gluster-devel/2018-June/054896.html
announcing the availability of (working) storhaug-1.0.

There are packages in Fedora, CentOS Storage SIG, gluster's Launchpad
PPA, gluster's OBS, and for Debian on download.gluster.org.

-- 

Kaleb



signature.asc
Description: OpenPGP digital signature
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] ganesha.nfsd process dies when copying files

2018-08-10 Thread Karli Sjöberg
On Fri, 2018-08-10 at 21:23 +0800, Pui Edylie wrote:
> Hi Karli,
> 
> Storhaug works with glusterfs 4.1.2 and latest nfs-ganesha.
> 
> I just installed them last weekend ... they are working very well :)

Okay, awesome!

Is there any documentation on how to do that?

/K

> 
> Cheers,
> Edy
> 
> On 8/10/2018 9:08 PM, Karli Sjöberg wrote:
> > On Fri, 2018-08-10 at 08:39 -0400, Kaleb S. KEITHLEY wrote:
> > > On 08/10/2018 08:08 AM, Karli Sjöberg wrote:
> > > > Hey all!
> > > > ...
> > > > 
> > > > glusterfs-client-xlators-3.10.12-1.el7.x86_64
> > > > glusterfs-api-3.10.12-1.el7.x86_64
> > > > nfs-ganesha-2.4.5-1.el7.x86_64
> > > > centos-release-gluster310-1.0-1.el7.centos.noarch
> > > > glusterfs-3.10.12-1.el7.x86_64
> > > > glusterfs-cli-3.10.12-1.el7.x86_64
> > > > nfs-ganesha-gluster-2.4.5-1.el7.x86_64
> > > > glusterfs-server-3.10.12-1.el7.x86_64
> > > > glusterfs-libs-3.10.12-1.el7.x86_64
> > > > glusterfs-fuse-3.10.12-1.el7.x86_64
> > > > glusterfs-ganesha-3.10.12-1.el7.x86_64
> > > > 
> > > 
> > > For nfs-ganesha problems you'd really be better served by posting
> > > to
> > > support@ or de...@lists.nfs-ganesha.org.
> > > 
> > > Both glusterfs-3.10 and nfs-ganesha-2.4 are really old.
> > > glusterfs-
> > > 3.10
> > > is even officially EOL. Ganesha isn't really organized  enough to
> > > have
> > > done anything as bold as officially declaring 2.4 as having
> > > reached
> > > EOL.
> > > 
> > > The nfs-ganesha devs are currently working on 2.7; maintaining
> > > and
> > > supporting 2.6, and less so 2.5, is pretty much at the limit of
> > > what
> > > they might be willing to help debug.
> > > 
> > > I strongly encourage you to update to a more recent version of
> > > both
> > > glusterfs and nfs-ganesha.  glusterfs-4.1 and nfs-ganesha-2.6
> > > would
> > > be
> > > ideal. Then if you still have problems you're much more likely to
> > > get
> > > help.
> > > 
> > 
> > Hi, thank you for your answer, but it raises even more questions
> > about
> > any potential production deployment.
> > 
> > Actually, I knew that the versions are old, but it seems to me that
> > you
> > are contradicting yourself:
> > 
> > https://lists.gluster.org/pipermail/gluster-users/2017-July/031753.
> > html
> > 
> > "After 3.10 you'd need to use storhaug Which doesn't work
> > (yet).
> > 
> > You need to use 3.10 for now."
> > 
> > So how is that supposed to work?
> > 
> > Is there documentation for how to get there?
> > 
> > Thanks in advance!
> > 
> > /K
> > 
> > 
> > ___
> > Gluster-users mailing list
> > Gluster-users@gluster.org
> > https://lists.gluster.org/mailman/listinfo/gluster-users
>  
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users

signature.asc
Description: This is a digitally signed message part
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] ganesha.nfsd process dies when copying files

2018-08-10 Thread Pui Edylie

Hi Karli,

Storhaug works with glusterfs 4.1.2 and latest nfs-ganesha.


I just installed them last weekend ... they are working very well :)

Cheers,
Edy


On 8/10/2018 9:08 PM, Karli Sjöberg wrote:

On Fri, 2018-08-10 at 08:39 -0400, Kaleb S. KEITHLEY wrote:

On 08/10/2018 08:08 AM, Karli Sjöberg wrote:

Hey all!
...

glusterfs-client-xlators-3.10.12-1.el7.x86_64
glusterfs-api-3.10.12-1.el7.x86_64
nfs-ganesha-2.4.5-1.el7.x86_64
centos-release-gluster310-1.0-1.el7.centos.noarch
glusterfs-3.10.12-1.el7.x86_64
glusterfs-cli-3.10.12-1.el7.x86_64
nfs-ganesha-gluster-2.4.5-1.el7.x86_64
glusterfs-server-3.10.12-1.el7.x86_64
glusterfs-libs-3.10.12-1.el7.x86_64
glusterfs-fuse-3.10.12-1.el7.x86_64
glusterfs-ganesha-3.10.12-1.el7.x86_64


For nfs-ganesha problems you'd really be better served by posting to
support@ or de...@lists.nfs-ganesha.org.

Both glusterfs-3.10 and nfs-ganesha-2.4 are really old. glusterfs-
3.10
is even officially EOL. Ganesha isn't really organized  enough to
have
done anything as bold as officially declaring 2.4 as having reached
EOL.

The nfs-ganesha devs are currently working on 2.7; maintaining and
supporting 2.6, and less so 2.5, is pretty much at the limit of what
they might be willing to help debug.

I strongly encourage you to update to a more recent version of both
glusterfs and nfs-ganesha.  glusterfs-4.1 and nfs-ganesha-2.6 would
be
ideal. Then if you still have problems you're much more likely to get
help.


Hi, thank you for your answer, but it raises even more questions about
any potential production deployment.

Actually, I knew that the versions are old, but it seems to me that you
are contradicting yourself:

https://lists.gluster.org/pipermail/gluster-users/2017-July/031753.html

"After 3.10 you'd need to use storhaug Which doesn't work
(yet).

You need to use 3.10 for now."

So how is that supposed to work?

Is there documentation for how to get there?

Thanks in advance!

/K


___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] ganesha.nfsd process dies when copying files

2018-08-10 Thread Karli Sjöberg
On Fri, 2018-08-10 at 08:39 -0400, Kaleb S. KEITHLEY wrote:
> On 08/10/2018 08:08 AM, Karli Sjöberg wrote:
> > Hey all!
> > ...
> > 
> > glusterfs-client-xlators-3.10.12-1.el7.x86_64
> > glusterfs-api-3.10.12-1.el7.x86_64
> > nfs-ganesha-2.4.5-1.el7.x86_64
> > centos-release-gluster310-1.0-1.el7.centos.noarch
> > glusterfs-3.10.12-1.el7.x86_64
> > glusterfs-cli-3.10.12-1.el7.x86_64
> > nfs-ganesha-gluster-2.4.5-1.el7.x86_64
> > glusterfs-server-3.10.12-1.el7.x86_64
> > glusterfs-libs-3.10.12-1.el7.x86_64
> > glusterfs-fuse-3.10.12-1.el7.x86_64
> > glusterfs-ganesha-3.10.12-1.el7.x86_64
> > 
> 
> For nfs-ganesha problems you'd really be better served by posting to
> support@ or de...@lists.nfs-ganesha.org.
> 
> Both glusterfs-3.10 and nfs-ganesha-2.4 are really old. glusterfs-
> 3.10
> is even officially EOL. Ganesha isn't really organized  enough to
> have
> done anything as bold as officially declaring 2.4 as having reached
> EOL.
> 
> The nfs-ganesha devs are currently working on 2.7; maintaining and
> supporting 2.6, and less so 2.5, is pretty much at the limit of what
> they might be willing to help debug.
> 
> I strongly encourage you to update to a more recent version of both
> glusterfs and nfs-ganesha.  glusterfs-4.1 and nfs-ganesha-2.6 would
> be
> ideal. Then if you still have problems you're much more likely to get
> help.
> 

Hi, thank you for your answer, but it raises even more questions about
any potential production deployment.

Actually, I knew that the versions are old, but it seems to me that you
are contradicting yourself:

https://lists.gluster.org/pipermail/gluster-users/2017-July/031753.html

"After 3.10 you'd need to use storhaug Which doesn't work
(yet).

You need to use 3.10 for now."

So how is that supposed to work?

Is there documentation for how to get there?

Thanks in advance!

/K

signature.asc
Description: This is a digitally signed message part
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] ganesha.nfsd process dies when copying files

2018-08-10 Thread Kaleb S. KEITHLEY
On 08/10/2018 08:08 AM, Karli Sjöberg wrote:
> Hey all!
> ...
> 
> glusterfs-client-xlators-3.10.12-1.el7.x86_64
> glusterfs-api-3.10.12-1.el7.x86_64
> nfs-ganesha-2.4.5-1.el7.x86_64
> centos-release-gluster310-1.0-1.el7.centos.noarch
> glusterfs-3.10.12-1.el7.x86_64
> glusterfs-cli-3.10.12-1.el7.x86_64
> nfs-ganesha-gluster-2.4.5-1.el7.x86_64
> glusterfs-server-3.10.12-1.el7.x86_64
> glusterfs-libs-3.10.12-1.el7.x86_64
> glusterfs-fuse-3.10.12-1.el7.x86_64
> glusterfs-ganesha-3.10.12-1.el7.x86_64
> 

For nfs-ganesha problems you'd really be better served by posting to
support@ or de...@lists.nfs-ganesha.org.

Both glusterfs-3.10 and nfs-ganesha-2.4 are really old. glusterfs-3.10
is even officially EOL. Ganesha isn't really organized  enough to have
done anything as bold as officially declaring 2.4 as having reached EOL.

The nfs-ganesha devs are currently working on 2.7; maintaining and
supporting 2.6, and less so 2.5, is pretty much at the limit of what
they might be willing to help debug.

I strongly encourage you to update to a more recent version of both
glusterfs and nfs-ganesha.  glusterfs-4.1 and nfs-ganesha-2.6 would be
ideal. Then if you still have problems you're much more likely to get help.

-- 

Kaleb
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users