Re: [Gluster-users] ESXi cannot access striped gluster volume

2014-03-02 Thread Bryan Whitehead
I don't have ESXi experience, but the first thing that jumps out at me is
that you probably need to mount NFS over TCP. NFS over UDP doesn't work on
glusterfs (unless this has changed and I've not been paying close enough
attention lately).




On Sat, Mar 1, 2014 at 8:35 AM, Carlos Capriotti  wrote:

> [quoted message snipped; the original post appears later in this digest]

[Gluster-users] gluster 3.3 FUSE client dying after "gfid different on subvolume"?

2014-03-02 Thread Mingfan Lu
Hi,
I saw one of our clients die after we saw "gfid different on
subvolume". Here is the log from the client:

[2014-03-03 08:30:54.286225] W
[afr-common.c:1196:afr_detect_self_heal_by_iatt] 0-bj-mams-replicate-6:
/operation/video/2014/03/03/e7/35/71/81f0a6656c077a16cad663e540543a78.pfvmeta:
gfid different on subvolume
[2014-03-03 08:30:54.287017] I
[afr-self-heal-common.c:1970:afr_sh_post_nb_entrylk_gfid_sh_cbk]
0-bj-mams-replicate-6: Non blocking entrylks failed.
[2014-03-03 08:30:54.287910] W [inode.c:914:inode_lookup]
(-->/usr/lib64/glusterfs/3.3.0.5rhs_iqiyi_7/xlator/debug/io-stats.so(io_stats_lookup_cbk+0xff)
[0x7fb6fd630adf]
(-->/usr/lib64/glusterfs/3.3.0.5rhs_iqiyi_7/xlator/mount/fuse.so(+0xf3f8)
[0x7fb7019da3f8]
(-->/usr/lib64/glusterfs/3.3.0.5rhs_iqiyi_7/xlator/mount/fuse.so(+0xf25b)
[0x7fb7019da25b]))) 0-fuse: inode not found

I saw a similar discussion on the mailing list, but I don't see a solution
for this issue:
http://www.gluster.org/pipermail/gluster-users/2013-June/036190.html

Using umount and remount (roughly the commands sketched below), the client
is alive now, but what I want to know is why this happened. Is there any
bugfix for this?
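
For reference, a minimal sketch of that recovery, assuming the volume from
the log (bj-mams) is FUSE-mounted at a hypothetical /mnt/bj-mams from a
hypothetical server mams01:

  umount -l /mnt/bj-mams                           # lazy unmount; a hung mount may refuse a plain umount
  mount -t glusterfs mams01:/bj-mams /mnt/bj-mams
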
thanks

[Gluster-users] Check Synchronization state before/after host upgrade

2014-03-02 Thread Georg Kunz
Hi everyone!


We are running a couple of Gluster storage servers serving mainly VM
images. We have been wondering for quite some time how to handle host OS
upgrades efficiently and safely without risking losing data on the gluster
volumes. So, it would be great if you could give me some recommendations,
how-tos, or just your opinion on how best to handle host OS upgrades.


The problem: Host OS upgrades often require restarting the servers. We run
Gluster 3.4.0 and all volumes use 2x replication. So we typically restart
one server at a time, wait for re-synchronization and continue. However, we
don't know of any means to reliably identify that Gluster has fully synced
the bricks. In fact, the "volume heal info" command on our servers not only
shows files which are out of sync, but also files which are currently being
modified - even under normal, i.e. replicated, operation. Since VM images,
log files, databases, etc. are constantly being modified, our volumes seem
to be out of sync all the time. Hence, after each restart, we manually
trigger re-synchronization and wait for a day before we restart the next
server, hoping that even the VM images under heavy modification have been
synced in the meantime. However, we'd like to script the upgrade to make it
faster, more automated and, above all, more reliable.
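
A rough sketch of the kind of wait-loop such a script could use between
restarts (VOLNAME is a placeholder; it simply polls "gluster volume heal
... info" until no brick reports pending entries, which, as described
above, may never settle while files are being written):

  #!/bin/bash
  # Poll the self-heal backlog and return once no brick reports pending entries.
  VOLNAME=myvol    # placeholder volume name
  while :; do
      pending=$(gluster volume heal "$VOLNAME" info \
                | awk '/^Number of entries:/ {sum += $NF} END {print sum + 0}')
      if [ "$pending" -eq 0 ]; then
          echo "no pending heal entries on $VOLNAME"
          break
      fi
      echo "$pending entries still pending on $VOLNAME; waiting..."
      sleep 60
  done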


My questions: What is your recommended way to reliably handle this? Is the
behavior of the "volume heal info" command supposed to be like this?


Thank you and best regards,

Georg

[Gluster-users] ESXi cannot access striped gluster volume

2014-03-02 Thread Carlos Capriotti
ESXi cannot access striped gluster volume

Hello all.

Unfortunately this is going to be a long post, so I will not spend too many
words on compliments; Gluster is a great solution and I should be writing
odes about it, so bravo to all of you.

A bit about me: I've been working with FreeBSD and Linux for over a decade
now. I use CentOS nowadays because of some features that are convenient for
my applications.

Now a bit more about my problem: After adding my striped gluster volume to
my ESXi host via NFS (an esxcli sketch of that step follows the list
below), I try to browse it with the vSphere datastore browser, and it...

a) cannot see any of the folders already there. The operation never times
out, and I get a line of dots telling me the system (ESXi) is trying to do
something. Other systems CAN access and see content on the same volume.

b) neither gluster nor ESXi returns/logs ANY ERROR when I (try to) create
folders there, but the browser does not show them either. The folder IS
created.

c) When trying to create a virtual machine it DOES return an error. The
folder and the first file related to the VM ARE created, but I get an error
about an "Invalid virtual machine configuration". I am under the impression
ESXi returns this when it tries to create the file for the virtual disk.

d) When trying to remove the volume, it DOES return an error, stating that
the resource is busy. I am forced to reboot the ESXi host in order to
successfully delete the NFS datastore.
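
(For context, the esxcli sketch mentioned above - roughly how an NFS export
ends up as an ESXi 5.x datastore; the datastore name is a placeholder, and
the host/share come from the volume info further down:)

  esxcli storage nfs add --host 10.0.1.21 --share /glvol0 --volume-name gluster-ds
  esxcli storage nfs list    # verify the datastore shows up as mounted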

Now, a bit about my environment:

I am one of those cursed with an Isilon, but with NO service contract and
NO license. So, basically, I have a big, fast and resilient NAS. Cool
stuff, but with great inconveniences as well. It goes without saying that
I, as a free software guy, would love to build something that can retire
the Isilon, or at least move it to a secondary role.

Anyway, trying to add an alternative to it, I searched for days and decided
Gluster was the way to go. And I am not going back. I will make it work.

All of my VM servers (about 15) are spread across 3 metal boxes, and -
please, don't blame me, I inherited this situation - there is no backup
solution whatsoever. Gluster will, in its final configuration, run on 4
boxes, providing HA and backup.

So, on ESXi, Isilon's NFS volume/share works like a charm; pure NFS sharing
on CentOS works like a charm; the gluster stripe - using two of my four
servers - does not like me.

The very same gluster NFS volume is mounted and works happily on a CentOS
client. Actually, on more than one.

I have been reading literally dozens of docs, guides and manuals from
VMware, Gluster and Red Hat for more than a week, and in the meantime I've
even created ESXi VIRTUAL SERVERS *INSIDE* my ESXi physical servers,
because I can no longer afford to reboot a production server whenever I
need to test yet another change on gluster.


My software versions:

CentOS 6.5
Gluster 3.4.2
ESXi 5.1, all patches applied
ESXi 5.5

My hardware for the nodes: 2 x Dell PE2950 with RAID5; bricks are a single
volume of about 1.5 TB on each node.

One stand-alone PE2900 with a single volume, on RAID5, of about 2.4 TB,
which will be added to the stripe eventually. One PE2950, brick on RAID5,
800 GB, which will also be added eventually.

All of them have one NIC for regular networking and a bonded NIC made out
of 2 physical NICs for gluster/NFS.

The ESXi hosts are running on R710s, with lots of RAM and at least one NIC
dedicated to NFS. I have one test server running with all four NICs on the
NFS network.

The NFS network runs at 9000 MTU, tuned for iSCSI (in the future).
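
(A minimal sketch of the jumbo-frame setting on a CentOS 6 bonded
interface; the interface name bond0 and the bonding mode are assumptions,
and the address is brick 1's:)

  # /etc/sysconfig/network-scripts/ifcfg-bond0
  DEVICE=bond0
  ONBOOT=yes
  BOOTPROTO=none
  IPADDR=10.0.1.21
  NETMASK=255.255.255.0
  MTU=9000
  BONDING_OPTS="mode=balance-alb miimon=100"   # bonding mode is a guess; not stated in the post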

Now, trying to make it all work, these are the steps I took:

Regarding tweaks: on the Gluster side I've lowered the ping timeout to 20
seconds, to stop the volume from being intermittently inaccessible; on ESXi
itself I've set the NFS max queue length to 64.
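
(A sketch of what those two tweaks look like as commands; the volume name
comes from the output below, and the ESXi advanced option name
NFS.MaxQueueDepth is an assumption based on the usual queue-depth setting
for NFS datastores:)

  gluster volume set glvol0 network.ping-timeout 20
  esxcli system settings advanced set -o /NFS/MaxQueueDepth -i 64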

I've chmoded gluster's share to 777, and you can find my gluster tweaks for
the volume below.

Both Gluster's NFS and regular NFS are forcing uid and gid to
nfsnobody:nfsnobody.

iptables has been disabled, along with SELinux.

Of course the regular (kernel) NFS server is disabled.
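
(For completeness, a sketch of disabling the kernel NFS server on CentOS 6
and checking that Gluster's built-in NFS has registered with the
portmapper:)

  service nfs stop && chkconfig nfs off      # kernel NFS server out of the way
  service rpcbind start                      # Gluster's NFS server still registers via rpcbind
  rpcinfo -p | egrep 'nfs|mountd'            # should list the ports claimed by glusterfs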

My gluster settings:


Volume Name: glvol0
Type: Stripe
Volume ID: f76af2ac-6a42-42ea-9887-941bf1600ced
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.0.1.21:/export/glusterroot
Brick2: 10.0.1.22:/export/glusterroot
Options Reconfigured:
nfs.ports-insecure: on
nfs.addr-namelookup: off
auth.reject: NONE
nfs.volume-access: read-write
nfs.nlm: off
network.ping-timeout: 20
server.root-squash: on
performance.nfs.write-behind: on
performance.nfs.read-ahead: on
performance.nfs.io-cache: on
performance.nfs.quick-read: on
performance.nfs.stat-prefetch: on
performance.nfs.io-threads: on
storage.owner-uid: 65534
storage.owner-gid: 65534
nfs.disable: off
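
(For anyone wanting to reproduce the layout above, a sketch of how such a
two-brick striped volume is created; brick paths and addresses are taken
from the brick list, and each "Options Reconfigured" entry corresponds to a
"gluster volume set" call:)

  gluster volume create glvol0 stripe 2 transport tcp \
      10.0.1.21:/export/glusterroot 10.0.1.22:/export/glusterroot
  gluster volume start glvol0
  gluster volume set glvol0 nfs.nlm off      # one example; repeat for each option listed above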


My regular NFS export settings, which work just the way I need:

/export/share *(rw,all_squash,anonuid=65534,anongid=65534,no_subtree_check)

Once I get this all to work, I intend to create a nice page with
instructions/info on gluster for ESXi. This field needs some more work out
there.

Now the question: is there anyth