Re: [Gluster-users] Shard file size (gluster 3.7.5)

2015-11-05 Thread Lindsay Mathieson
On 6 November 2015 at 17:22, Krutika Dhananjay  wrote:

> Sure. So far I've just been able to figure that GlusterFS counts blocks in
> multiples of 512B while XFS seems to count them in multiples of 4.0KB.
> Let me again try creating sparse files on xfs, sharded and non-sharded
> gluster volumes and compare the results. I'll let you know what I find.
>

Yes, that could complicate things. ZFS has a configurable record size of up to
1 MB. I'll try the same tests with XFS and EXT4 as well.
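
For reference, here's how I plan to put the two counts side by side when I rerun
the tests (a sketch; /mnt/datastore and /bricks/brick1 are just the paths on my
setup, and this assumes GNU stat/du):

  stat -c '%s bytes apparent, %b blocks of %B bytes' /mnt/datastore/test.bin
  stat -c '%s bytes apparent, %b blocks of %B bytes' /bricks/brick1/test.bin
  du -h --apparent-size /mnt/datastore/test.bin   # apparent size
  du -h /mnt/datastore/test.bin                   # allocated size

On Linux st_blocks is reported in 512-byte units, so any mismatch between the
mount and the brick should come from how the per-shard block counts are
aggregated rather than from the reporting unit itself.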


-- 
Lindsay
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Shard file size (gluster 3.7.5)

2015-11-05 Thread Krutika Dhananjay
Sure. So far I've just been able to figure that GlusterFS counts blocks in 
multiples of 512B while XFS seems to count them in multiples of 4.0KB. 
Let me again try creating sparse files on xfs, sharded and non-sharded gluster 
volumes and compare the results. I'll let you know what I find. 
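
In case you want to try the same thing in parallel, the sparse-file test I have
in mind is roughly this (a sketch; the mount point and sizes are arbitrary):

  truncate -s 2G /mnt/glustervol/sparse.bin          # 2 GB apparent size, nothing allocated
  dd if=/dev/zero of=/mnt/glustervol/sparse.bin bs=1M count=1 seek=512 conv=notrunc   # write 1 MB in the middle
  du -h --apparent-size /mnt/glustervol/sparse.bin   # should stay ~2.0G
  du -h /mnt/glustervol/sparse.bin                   # should be ~1M plus fs overhead

Running the same commands against a plain XFS directory and against sharded and
non-sharded volumes should show whether the discrepancy comes from sharding or
from the backend block accounting.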

-Krutika 
- Original Message -

> From: "Lindsay Mathieson" 
> To: "Krutika Dhananjay" 
> Cc: "gluster-users" 
> Sent: Thursday, November 5, 2015 7:16:06 PM
> Subject: Re: [Gluster-users] Shard file size (gluster 3.7.5)

> On 5 November 2015 at 21:19, Krutika Dhananjay < kdhan...@redhat.com > wrote:

> > Just to be sure, did you rerun the test on the already broken file
> > (test.bin)
> > which was written to when strict-write-ordering had been off?
> 
> > Or did you try the new test with strict-write-ordering on a brand new file?
> 

> Very strange. I tried it on new files and even went to the extent of deleting
> the datastore and bricks, then recreating.

> One oddity on my system - I have two prefs that I cannot reset
> - cluster.server-quorum-ratio
> - performance.readdir-ahead

> Though I wouldn't have thought they made a difference. I might try cleaning
> gluster and all its config files off the systems and *really* starting from
> scratch.

> --
> Lindsay
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] File Corruption with shards - 100% reproducable

2015-11-05 Thread Krutika Dhananjay
CC'd him only now. 

- Original Message -

> From: "Krutika Dhananjay" 
> To: "Lindsay Mathieson" 
> Cc: "gluster-users" 
> Sent: Friday, November 6, 2015 11:05:27 AM
> Subject: Re: [Gluster-users] File Corruption with shards - 100% reproducable

> CC'ing Raghavendra Talur, who is managing the 3.7.6 release.

> -Krutika

> - Original Message -

> > From: "Lindsay Mathieson" 
> 
> > To: "Krutika Dhananjay" 
> 
> > Cc: "gluster-users" 
> 
> > Sent: Thursday, November 5, 2015 7:17:35 PM
> 
> > Subject: Re: [Gluster-users] File Corruption with shards - 100%
> > reproducable
> 

> > On 5 November 2015 at 21:55, Krutika Dhananjay < kdhan...@redhat.com >
> > wrote:
> 

> > > Although I do not have experience with VM live migration, IIUC, it has to
> > > do with a different server (and as a result a new glusterfs client
> > > process)
> > > taking over the operations and mgmt of the VM.
> > 
> 

> > That sounds very plausible
> 

> > > If this is a correct assumption, then I think this could be the result of
> > > the
> > > same caching bug that I talked about sometime back in 3.7.5, which is
> > > fixed
> > > in 3.7.6.
> > 
> 
> > > The issue could cause the new client to not see the correct size and
> > > block
> > > count of the file, leading to errors in reads (perhaps triggered by the
> > > restart of the vm) and writes on the image.
> > 
> 

> > Cool, I look fwd to testing that in 3.7.6, which I believe is due out next
> > week?
> 

> > thanks,
> 

> > --
> 
> > Lindsay
> 
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] File Corruption with shards - 100% reproducable

2015-11-05 Thread Krutika Dhananjay
CC'ing Raghavendra Talur, who is managing the 3.7.6 release. 

-Krutika 

- Original Message -

> From: "Lindsay Mathieson" 
> To: "Krutika Dhananjay" 
> Cc: "gluster-users" 
> Sent: Thursday, November 5, 2015 7:17:35 PM
> Subject: Re: [Gluster-users] File Corruption with shards - 100% reproducable

> On 5 November 2015 at 21:55, Krutika Dhananjay < kdhan...@redhat.com > wrote:

> > Although I do not have experience with VM live migration, IIUC, it has to
> > do with a different server (and as a result a new glusterfs client process)
> > taking over the operations and mgmt of the VM.
> 

> That sounds very plausible

> > If this is a correct assumption, then I think this could be the result of
> > the
> > same caching bug that I talked about sometime back in 3.7.5, which is fixed
> > in 3.7.6.
> 
> > The issue could cause the new client to not see the correct size and block
> > count of the file, leading to errors in reads (perhaps triggered by the
> > restart of the vm) and writes on the image.
> 

> Cool, I look fwd to testing that in 3.7.6, which I believe is due out next
> week?

> thanks,

> --
> Lindsay
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Troubleshooting: Script to detect files without GFID and Invalid directory symlinks

2015-11-05 Thread Aravinda

Hi,

Yesterday, while debugging a production setup, I found multiple directories
having the same GFID (a rename/lookup race) and some files without the GFID
xattr (brick crash or hard reboot during create).


Wrote a script to detect the same 
https://gist.github.com/aravindavk/29f673f13c2f8963447e


Usage:
python find_gfid_issues.py <BRICK_PATH>

For example,
python find_gfid_issues.py /export/brick1/b1

Let me know if this script is useful in detecting gfid/link issues.
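
For a quick manual spot check without the script, the GFID xattr can also be
read directly on the brick (the path below is only an example):

  getfattr -n trusted.gfid -e hex /export/brick1/b1/path/to/file

A file created while the brick crashed will typically report that the
trusted.gfid attribute does not exist, and two directories that hit the
rename/lookup race will show the same hex value.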

--
regards
Aravinda

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] [ovirt-users] Centos 7.1 failed to start glusterd after upgrading to ovirt 3.6

2015-11-05 Thread Atin Mukherjee
>> [glusterd-store.c:4243:glusterd_resolve_all_bricks] 0-glusterd:
>> resolve brick failed in restore
The above log is the culprit here. Generally this function fails when
GlusterD cannot resolve the host associated with a brick. Has any of the
nodes undergone an IP change during the upgrade process?
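
A quick way to cross-check (a sketch; the paths below are the glusterd
defaults):

  gluster peer status
  grep -r '^hostname' /var/lib/glusterd/peers/
  grep '^hostname=' /var/lib/glusterd/vols/*/bricks/*

If a brick file still carries a hostname or IP that no longer matches any
peer, brick resolution will fail at restore time exactly like this.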

~Atin

On 11/06/2015 09:59 AM, Sahina Bose wrote:
> Did you upgrade all the nodes too?
> Are some of your nodes not reachable?
> 
> Adding gluster-users for glusterd error.
> 
> On 11/06/2015 12:00 AM, Stefano Danzi wrote:
>>
>> After upgrading oVirt from 3.5 to 3.6, glusterd fails to start when the
>> host boots.
>> Manual start of service after boot works fine.
>>
>> gluster log:
>>
>> [2015-11-04 13:37:55.360876] I [MSGID: 100030]
>> [glusterfsd.c:2318:main] 0-/usr/sbin/glusterd: Started running
>> /usr/sbin/glusterd version 3.7.5 (args: /usr/sbin/glusterd -p
>> /var/run/glusterd.pid)
>> [2015-11-04 13:37:55.447413] I [MSGID: 106478] [glusterd.c:1350:init]
>> 0-management: Maximum allowed open file descriptors set to 65536
>> [2015-11-04 13:37:55.447477] I [MSGID: 106479] [glusterd.c:1399:init]
>> 0-management: Using /var/lib/glusterd as working directory
>> [2015-11-04 13:37:55.464540] W [MSGID: 103071]
>> [rdma.c:4592:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event
>> channel creation failed [Nessun device corrisponde]
>> [2015-11-04 13:37:55.464559] W [MSGID: 103055] [rdma.c:4899:init]
>> 0-rdma.management: Failed to initialize IB Device
>> [2015-11-04 13:37:55.464566] W
>> [rpc-transport.c:359:rpc_transport_load] 0-rpc-transport: 'rdma'
>> initialization failed
>> [2015-11-04 13:37:55.464616] W [rpcsvc.c:1597:rpcsvc_transport_create]
>> 0-rpc-service: cannot create listener, initing the transport failed
>> [2015-11-04 13:37:55.464624] E [MSGID: 106243] [glusterd.c:1623:init]
>> 0-management: creation of 1 listeners failed, continuing with
>> succeeded transport
>> [2015-11-04 13:37:57.663862] I [MSGID: 106513]
>> [glusterd-store.c:2036:glusterd_restore_op_version] 0-glusterd:
>> retrieved op-version: 30600
>> [2015-11-04 13:37:58.284522] I [MSGID: 106194]
>> [glusterd-store.c:3465:glusterd_store_retrieve_missed_snaps_list]
>> 0-management: No missed snaps list.
>> [2015-11-04 13:37:58.287477] E [MSGID: 106187]
>> [glusterd-store.c:4243:glusterd_resolve_all_bricks] 0-glusterd:
>> resolve brick failed in restore
>> [2015-11-04 13:37:58.287505] E [MSGID: 101019]
>> [xlator.c:428:xlator_init] 0-management: Initialization of volume
>> 'management' failed, review your volfile again
>> [2015-11-04 13:37:58.287513] E [graph.c:322:glusterfs_graph_init]
>> 0-management: initializing translator failed
>> [2015-11-04 13:37:58.287518] E [graph.c:661:glusterfs_graph_activate]
>> 0-graph: init failed
>> [2015-11-04 13:37:58.287799] W [glusterfsd.c:1236:cleanup_and_exit]
>> (-->/usr/sbin/glusterd(glusterfs_volumes_init+0xfd) [0x7f29b876524d]
>> -->/usr/sbin/glusterd(glusterfs_process_volfp+0x126) [0x7f29b87650f6]
>> -->/usr/sbin/glusterd(cleanup_and_exit+0x69) [0x7f29b87646d9] ) 0-:
>> received signum (0), shutting down
>>
>>
>> ___
>> Users mailing list
>> us...@ovirt.org
>> http://lists.ovirt.org/mailman/listinfo/users
> 
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
> 
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] [ovirt-users] Centos 7.1 failed to start glusterd after upgrading to ovirt 3.6

2015-11-05 Thread Sahina Bose

Did you upgrade all the nodes too?
Are some of your nodes not reachable?

Adding gluster-users for glusterd error.

On 11/06/2015 12:00 AM, Stefano Danzi wrote:


After upgrading oVirt from 3.5 to 3.6, glusterd fails to start when the
host boots.

Manual start of service after boot works fine.

gluster log:

[2015-11-04 13:37:55.360876] I [MSGID: 100030] 
[glusterfsd.c:2318:main] 0-/usr/sbin/glusterd: Started running 
/usr/sbin/glusterd version 3.7.5 (args: /usr/sbin/glusterd -p 
/var/run/glusterd.pid)
[2015-11-04 13:37:55.447413] I [MSGID: 106478] [glusterd.c:1350:init] 
0-management: Maximum allowed open file descriptors set to 65536
[2015-11-04 13:37:55.447477] I [MSGID: 106479] [glusterd.c:1399:init] 
0-management: Using /var/lib/glusterd as working directory
[2015-11-04 13:37:55.464540] W [MSGID: 103071] 
[rdma.c:4592:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event 
channel creation failed [Nessun device corrisponde]
[2015-11-04 13:37:55.464559] W [MSGID: 103055] [rdma.c:4899:init] 
0-rdma.management: Failed to initialize IB Device
[2015-11-04 13:37:55.464566] W 
[rpc-transport.c:359:rpc_transport_load] 0-rpc-transport: 'rdma' 
initialization failed
[2015-11-04 13:37:55.464616] W [rpcsvc.c:1597:rpcsvc_transport_create] 
0-rpc-service: cannot create listener, initing the transport failed
[2015-11-04 13:37:55.464624] E [MSGID: 106243] [glusterd.c:1623:init] 
0-management: creation of 1 listeners failed, continuing with 
succeeded transport
[2015-11-04 13:37:57.663862] I [MSGID: 106513] 
[glusterd-store.c:2036:glusterd_restore_op_version] 0-glusterd: 
retrieved op-version: 30600
[2015-11-04 13:37:58.284522] I [MSGID: 106194] 
[glusterd-store.c:3465:glusterd_store_retrieve_missed_snaps_list] 
0-management: No missed snaps list.
[2015-11-04 13:37:58.287477] E [MSGID: 106187] 
[glusterd-store.c:4243:glusterd_resolve_all_bricks] 0-glusterd: 
resolve brick failed in restore
[2015-11-04 13:37:58.287505] E [MSGID: 101019] 
[xlator.c:428:xlator_init] 0-management: Initialization of volume 
'management' failed, review your volfile again
[2015-11-04 13:37:58.287513] E [graph.c:322:glusterfs_graph_init] 
0-management: initializing translator failed
[2015-11-04 13:37:58.287518] E [graph.c:661:glusterfs_graph_activate] 
0-graph: init failed
[2015-11-04 13:37:58.287799] W [glusterfsd.c:1236:cleanup_and_exit] 
(-->/usr/sbin/glusterd(glusterfs_volumes_init+0xfd) [0x7f29b876524d] 
-->/usr/sbin/glusterd(glusterfs_process_volfp+0x126) [0x7f29b87650f6] 
-->/usr/sbin/glusterd(cleanup_and_exit+0x69) [0x7f29b87646d9] ) 0-: 
received signum (0), shutting down



___
Users mailing list
us...@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Replacing a node in a 4x2 distributed/replicated setup

2015-11-05 Thread Bipin Kunal
Thomas,

You seem to be on the right track; that is the easiest way to replace a brick
without any hassle.

Here is the set of steps which I usually follow:

# mkdir -p /bricks/brick1            <-- mount point of the new brick
# mkdir -p /bricks/brick1/.glusterfs/00/00
# cd /bricks/brick1/.glusterfs/00/00
# ln -s ../../.. 00000000-0000-0000-0000-000000000001
`ls -l` should then show:
lrwxrwxrwx 1 root root 00000000-0000-0000-0000-000000000001 -> ../../..
Get the volume id from the surviving replica of the failed node:
# getfattr -d -m. -e hex /bricks/brick2      <-- replica brick
trusted.glusterfs.volume-id=<hex volume id>
Set the same volume id on the new brick:
# setfattr -n trusted.glusterfs.volume-id -v <hex volume id> /bricks/brick1
Verify it:
# getfattr -d -m. -e hex /bricks/brick1
If the volume is stopped, start it:
# gluster volume start VOLNAME
Check `gluster volume status` to confirm the new brick came online and has a PID
and port number.
If the brick is not online, restart the volume:
# gluster volume stop VOLNAME force
# gluster volume start VOLNAME
Run a full self-heal:
# gluster volume heal VOLNAME full
Finally, compare the data on the replica brick `/bricks/brick2` with the new
brick `/bricks/brick1`; it should be identical (see the verification sketch
below).
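
For that final comparison, a rough verification sketch (this assumes bash, GNU
find, and that the surviving replica is reachable as replica-host; the heal
counts dropping to zero is the main signal):

# gluster volume heal VOLNAME info        <-- entry counts should drop to 0 on every brick
# diff <(find /bricks/brick1 -path '*/.glusterfs' -prune -o -type f -printf '%P %s\n' | sort) \
       <(ssh replica-host "find /bricks/brick2 -path '*/.glusterfs' -prune -o -type f -printf '%P %s\n' | sort")

This only compares relative paths and sizes (skipping the .glusterfs metadata
directory), which is usually enough to spot anything the heal missed.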

Thanks,
Bipin Kunal

On Thu, Nov 5, 2015 at 9:55 PM, Thomas Bätzler 
wrote:

> Hi,
>
> A small update: since nothing else worked, I broke down and changed the
> replacement system's IP and hostname to that of the broken system;
> replaced its UUID with that of the downed machine and probed it back
> into the gluster cluster. Had to restart glusterd several times to make
> the other systems pick up the change.
>
> I then added the volume-id attr to the new bricks as suggested on
> https://joejulian.name/blog/replacing-a-brick-on-glusterfs-340/. After
> that I was able to trigger a manual heal. By tomorrow I may have some
> kind of estimate of how long the repair is going to take.
>
>
> Bye,
> Thomas
>
>
>
> ---
> This e-mail was checked for viruses by Avast antivirus software.
> https://www.avast.com/antivirus
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] IPv6 peer probe problem

2015-11-05 Thread Atin Mukherjee


On 11/06/2015 06:06 AM, Gmail wrote:
> Does anyone run Gluster on IPv6???
We are already working on this. A patch [1] to fix the IPv6 issues is
under review. The plan is to land complete support in the 3.8 release.
I will keep you posted once it gets merged into mainline.

[1] http://review.gluster.org/#/c/11988/

Thanks,
Atin
> 
> -Bishoy
>> On Oct 30, 2015, at 1:14 PM, Gmail > > wrote:
>>
>> Hello,
>>
>> I’m trying to use IPv6 with Gluster 3.7.5, but when I do peer probe, I
>> get the following error:
>>
>> peer probe: failed: Probe returned with Transport endpoint is not
>> connected
>>
>> and the logs show the following:
>>
>> E [MSGID: 101075] [common-utils.c:306:gf_resolve_ip6] 0-resolver:
>> getaddrinfo failed (Name or service not known)
>> E [name.c:247:af_inet_client_get_remote_sockaddr] 0-management: DNS
>> resolution failed on host x
>>
>> PS: I can ping the host though.
>>
>> -Bishoy
> 
> 
> 
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
> 
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] IPv6 peer probe problem

2015-11-05 Thread Gmail
Does anyone run Gluster on IPv6???

-Bishoy
> On Oct 30, 2015, at 1:14 PM, Gmail  wrote:
> 
> Hello,
> 
> I’m trying to use IPv6 with Gluster 3.7.5, but when I do peer probe, I get 
> the following error:
> 
> peer probe: failed: Probe returned with Transport endpoint is not connected
> 
> and the logs show the following:
> 
> E [MSGID: 101075] [common-utils.c:306:gf_resolve_ip6] 0-resolver: getaddrinfo 
> failed (Name or service not known)
> E [name.c:247:af_inet_client_get_remote_sockaddr] 0-management: DNS 
> resolution failed on host x
> 
> PS: I can ping the host though.
> 
> -Bishoy
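
In case it helps with debugging, here is the resolution check I plan to run on
the probing node, since the log shows getaddrinfo() itself failing and plain
ping usually goes over IPv4 (the hostname below is a placeholder):

  getent ahostsv6 storage-node-1    # should return an AAAA record or an IPv6 hosts entry
  getent ahosts storage-node-1      # compare with what generic resolution returns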

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] glusterfs 3.7.5 shutting down with signum 15 unexpectedly

2015-11-05 Thread Wade Fitzpatrick

Hi Jeremy

I have found that glusterfs 3.7.5 dies when I disconnect from the client if it
was started from the command line, but it is stable when started from a systemd
unit file. You might try using an upstart unit instead of the init.d scripts.
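
On Ubuntu that could look roughly like the untested sketch below, written as a
shell snippet that drops in an upstart job; the job name, flags and start
conditions are my assumptions, not something shipped by the package:

cat > /etc/init/glusterd.conf <<'EOF'
# keep glusterd in the foreground under upstart and respawn it if it dies
description "GlusterFS management daemon"
start on (local-filesystems and net-device-up IFACE!=lo)
stop on runlevel [!2345]
respawn
exec /usr/sbin/glusterd -N --log-level INFO
EOF
initctl reload-configuration
start glusterd    # stop/disable the old init.d service first so two copies don't fight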


Cheers,
Wade.

On 6/11/2015 9:43 AM, Jeremy Koerber wrote:

Hi all,
I recently upgraded to gluster 3.7.5 on a 6 node dist-repl cluster 
(one brick each), along with about 10 clients. I was on 3.7.4 
previously and it was stable. With 3.7.5 (running as a daemon, 
installed via *ppa:gluster/glusterfs-3.7*), the glusterd process shuts 
down unexpectedly after less than an hour typically. When I run it as 
`glusterd --debug` from the shell, it seems to run fine. I've been 
trying to set the log-level to DEBUG for the glusterfs-server service 
via the GLUSTERD_OPTS constant in the init.d script, but it doesn't 
seem to be taking. In any case, the only non INFO message I see in 
/var/log/glusterfs/etc-glusterfs-glusterd.vol.log right before it 
exits is:


[2015-11-05 22:49:02.387558] W [glusterfsd.c:1236:cleanup_and_exit] 
(-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x8182) [0x7f4e3dfc2182] 
-->/usr/sbin/glusterd(glusterfs_sigwaiter+0xd5) [0x7f4e3ecff7c5] 
-->/usr/sbin/glusterd(cleanup_and_exit+0x69) [0x7f4e3ecff659] ) 0-: 
received signum (15), shutting down


Happy to post the entire log if need be. Anyone else experiencing this 
or know how I might be able to dig a little deeper? Please let me know 
if I can provide any other diagnostic info.


Thank you,
Jeremy Koerber


___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] glusterfs 3.7.5 shutting down with signum 15 unexpectedly

2015-11-05 Thread Jeremy Koerber
Hi all,
I recently upgraded to gluster 3.7.5 on a 6 node dist-repl cluster (one
brick each), along with about 10 clients. I was on 3.7.4 previously and it
was stable. With 3.7.5 (running as a daemon, installed via
*ppa:gluster/glusterfs-3.7*), the glusterd process shuts down unexpectedly
after less than an hour typically. When I run it as `glusterd --debug` from
the shell, it seems to run fine. I've been trying to set the log-level to
DEBUG for the glusterfs-server service via the GLUSTERD_OPTS constant in
the init.d script, but it doesn't seem to be taking. In any case, the only
non INFO message I see in /var/log/glusterfs/etc-glusterfs-glusterd.vol.log
right before it exits is:

[2015-11-05 22:49:02.387558] W [glusterfsd.c:1236:cleanup_and_exit]
(-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x8182) [0x7f4e3dfc2182]
-->/usr/sbin/glusterd(glusterfs_sigwaiter+0xd5) [0x7f4e3ecff7c5]
-->/usr/sbin/glusterd(cleanup_and_exit+0x69) [0x7f4e3ecff659] ) 0-:
received signum (15), shutting down

Happy to post the entire log if need be. Anyone else experiencing this or
know how I might be able to dig a little deeper? Please let me know if I
can provide any other diagnostic info.

Thank you,
Jeremy Koerber
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Replacing a node in a 4x2 distributed/replicated setup

2015-11-05 Thread Thomas Bätzler
Hi,

A small update: since nothing else worked, I broke down and changed the
replacement system's IP and hostname to that of the broken system;
replaced its UUID with that of the downed machine and probed it back
into the gluster cluster. Had to restart glusterd several times to make
the other systems pick up the change.

I then added the volume-id attr to the new bricks as suggested on
https://joejulian.name/blog/replacing-a-brick-on-glusterfs-340/. After
that I was able to trigger a manual heal. By tomorrow I may have some
kind of estimate of how long the repair is going to take.


Bye,
Thomas



---
This e-mail was checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus


___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Missing files after add new bricks and remove old ones - how to restore files

2015-11-05 Thread Marco Lorenzo Crociani

Any news?
glusterfs version is 3.7.5

On 30/10/2015 17:51, Marco Lorenzo Crociani wrote:

Hi Susant,
here the stats:

[root@s20 brick1]# stat .* *
  File: `.'
  Size: 78Blocks: 0  IO Block: 4096 directory
Device: 811h/2065dInode: 2481712637  Links: 7
Access: (0755/drwxr-xr-x)  Uid: (0/root)   Gid: (0/ root)
Access: 2015-10-30 15:26:57.565475699 +0100
Modify: 2015-08-04 12:30:56.604846056 +0200
Change: 2015-10-27 14:21:12.981420157 +0100
  File: `..'
  Size: 50Blocks: 0  IO Block: 4096 directory
Device: 811h/2065dInode: 495824  Links: 6
Access: (0755/drwxr-xr-x)  Uid: (0/root)   Gid: (0/ root)
Access: 2015-10-27 11:27:36.956230769 +0100
Modify: 2015-08-04 11:25:27.893410342 +0200
Change: 2015-08-04 11:25:27.893410342 +0200
  File: `.glusterfs'
  Size: 8192  Blocks: 24 IO Block: 4096 directory
Device: 811h/2065dInode: 2481712643  Links: 261
Access: (0600/drw---)  Uid: (0/root)   Gid: (0/ root)
Access: 2015-10-30 15:31:14.988775452 +0100
Modify: 2015-08-04 12:31:40.803715075 +0200
Change: 2015-08-04 12:31:40.803715075 +0200
  File: `.trashcan'
  Size: 24Blocks: 0  IO Block: 4096 directory
Device: 811h/2065dInode: 495865  Links: 3
Access: (0755/drwxr-xr-x)  Uid: (0/root)   Gid: (0/ root)
Access: 2015-10-26 18:32:17.369070847 +0100
Modify: 2015-08-04 11:36:11.357529000 +0200
Change: 2015-10-26 18:32:17.368070850 +0100
  File: `lost+found'
  Size: 6 Blocks: 0  IO Block: 4096 directory
Device: 811h/2065dInode: 2481712624  Links: 2
Access: (0700/drwx--)  Uid: (0/root)   Gid: (0/ root)
Access: 2015-10-26 18:55:08.274323554 +0100
Modify: 2014-01-18 21:48:37.0 +0100
Change: 2015-10-26 18:55:08.259323594 +0100
  File: `rh'
  Size: 6 Blocks: 0  IO Block: 4096 directory
Device: 811h/2065dInode: 495961  Links: 2
Access: (0755/drwxr-xr-x)  Uid: (0/root)   Gid: (0/ root)
Access: 2015-10-26 14:02:02.294771698 +0100
Modify: 2015-03-26 13:22:19.0 +0100
Change: 2015-10-26 18:32:17.384070805 +0100
  File: `zimbra'
  Size: 4096  Blocks: 8  IO Block: 4096 directory
Device: 811h/2065dInode: 495969  Links: 50
Access: (0755/drwxr-xr-x)  Uid: (0/root)   Gid: (0/ root)
Access: 2015-10-30 15:27:44.899346957 +0100
Modify: 2015-10-26 18:32:17.733069841 +0100
Change: 2015-10-26 18:32:17.733069841 +0100



[root@s21 brick2]#  stat .* *
  File: `.'
  Size: 78Blocks: 0  IO Block: 4096 directory
Device: 811h/2065dInode: 501309  Links: 7
Access: (0755/drwxr-xr-x)  Uid: (0/root)   Gid: (0/ root)
Access: 2015-10-30 17:45:23.983929018 +0100
Modify: 2015-08-04 12:30:56.602392330 +0200
Change: 2015-10-26 18:32:17.327779305 +0100
  File: `..'
  Size: 50Blocks: 0  IO Block: 4096 directory
Device: 811h/2065dInode: 2484780736  Links: 6
Access: (0755/drwxr-xr-x)  Uid: (0/root)   Gid: (0/ root)
Access: 2015-10-30 17:45:20.800922878 +0100
Modify: 2015-08-04 11:25:27.942732803 +0200
Change: 2015-08-04 11:25:27.942732803 +0200
  File: `.glusterfs'
  Size: 8192  Blocks: 24 IO Block: 4096 directory
Device: 811h/2065dInode: 501323  Links: 261
Access: (0600/drw---)  Uid: (0/root)   Gid: (0/ root)
Access: 2015-08-04 11:36:13.776967886 +0200
Modify: 2015-08-04 12:31:40.801477366 +0200
Change: 2015-08-04 12:31:40.801477366 +0200
  File: `.trashcan'
  Size: 24Blocks: 0  IO Block: 4096 directory
Device: 811h/2065dInode: 2484780773  Links: 3
Access: (0755/drwxr-xr-x)  Uid: (0/root)   Gid: (0/ root)
Access: 2015-10-26 18:32:17.324779299 +0100
Modify: 2015-08-04 11:36:11.357529000 +0200
Change: 2015-10-26 18:32:17.368779386 +0100
  File: `lost+found'
  Size: 6 Blocks: 0  IO Block: 4096 directory
Device: 811h/2065dInode: 501268  Links: 2
Access: (0700/drwx--)  Uid: (0/root)   Gid: (0/ root)
Access: 2015-10-26 18:32:17.371779392 +0100
Modify: 2014-01-18 21:48:37.0 +0100
Change: 2015-10-26 18:55:08.260516194 +0100
  File: `rh'
  Size: 6 Blocks: 0  IO Block: 4096 directory
Device: 811h/2065dInode: 2484780842  Links: 2
Access: (0755/drwxr-xr-x)  Uid: (0/root)   Gid: (0/ root)
Access: 2015-10-26 18:32:17.386779422 +0100
Modify: 2015-03-26 13:22:19.0 +0100
Change: 2015-10-26 18:32:17.384779418 +0100
  File: `zimbra'
  Size: 4096  Blocks: 8  IO Block: 4096 directory
Device: 811h/2065dInode: 2484780856  Links: 50
Access: (0755/drwxr-xr-x)  Uid: (0/root)   Gid: (0/ root)
Access: 2015-10-26 18:34:10.34939 +0100
Modify: 2015-10-26 18:32:17.733780116 +0100
Change: 2015-10-26 18:32:17.733780116 +0100






[root@s20 brick1]# stat zimbra/jdk-1.7.0_45/db/bin/*
  File: `zimbra/jdk-1.7.0_45/db/bin/dblook'
  Size: 5740  B


Re: [Gluster-users] Question on HA Active-Active Ganesha setup

2015-11-05 Thread Kaleb KEITHLEY
On 11/05/2015 10:13 AM, Surya K Ghatty wrote:
> All... I need your help! I am trying to setup Highly available
> Active-Active Ganesha configuration on two glusterfs nodes based on
> instructions here:
> 
> https://gluster.readthedocs.org/en/latest/Administrator%20Guide/Configuring%20HA%20NFS%20Server/
> and
> http://www.slideshare.net/SoumyaKoduri/high-49117846 and
> https://www.youtube.com/watch?v=Z4mvTQC-efM.
> 
> 
> *My questions:*
> 
> 1. What is the expected behavior? Is the cluster.enable-shared-storage
> command expected to create shared storage? It seems odd to return a
> success message without creating the shared volume.
> 2. Any suggestions on how to get past this problem?
> 
> *Details:*
> I am using glusterfs 3.7.5 and Ganesha 2.2.0.6 installable packages. I'm
> installing
> 
> Also, I am using the following command
> 
> gluster volume set all cluster.enable-shared-storage enable
> 
> which automatically sets up the shared_storage directory under
> /run/gluster/ and automounts the shared volume for HA.
> 
> This command was working perfectly fine, and I was able to set up Ganesha
> HA successfully on CentOS 7.0 running on bare metal - until now.
> 
> 
> 
> [root@qint-tor01-c7 gluster]# gluster vol set all
> cluster.enable-shared-storage enable
> volume set: success
> 
> [root@qint-tor01-c7 gluster]# pwd
> /run/gluster
> 
> [root@qint-tor01-c7 gluster]# ls
> 5027ba011969a8b2eca99ca5c9fb77ae.socket shared_storage
> changelog-9fe3f3fdd745db918d7d5c39fbe94017.sock snaps
> changelog-a9bf0a82aba38610df80c75a9adc45ad.sock
> 
> 
> Yesterday, we tried to deploy Ganesha HA with the Gluster FSAL on a
> different cloud, and when I ran the same command there (same versions of
> glusterfs and Ganesha, same CentOS 7), the command returned
> successfully, but it did not auto-create the shared_storage directory.
> There were no logs either in
> /var/log/glusterfs/etc-glusterfs-glusterd.vol.log
> 
> or /var/log/ganesha.log related to the command.
> 
> However, I do see these logs written to the etc-glusterfs-glusterd.vol.log
> 
> [2015-11-05 14:43:00.692762] W [socket.c:588:__socket_rwv] 0-nfs: readv
> on /var/run/gluster/9d5e1ba5e44bd1aa3331d2ee752a806a.socket failed
> (Invalid argument)
> 
> on both ganesha nodes independent of the commands I execute.
> 
> regarding this error, I did a ss -x | grep
> /var/run/gluster/9d5e1ba5e44bd1aa3331d2ee752a806a.socket
> 
> and it appears that no process was using these sockets, on either machine.
> 
> My questions:
> 
> 1. What is the expected behavior? Is the cluster.enable-shared-storage
> command expected to create shared storage? It seems odd to return a
> success message without creating the shared volume.
> 2. Any suggestions on how to get past this problem?
> Regards,

The answer everyone hates to hear: It works for me.

I suspect it's not working in your case because it wants to create a
"replica 3" volume and you only have two nodes.

My blog at
http://blog.gluster.org/2015/10/linux-scale-out-nfsv4-using-nfs-ganesha-and-glusterfs-one-step-at-a-time/
documents what I did recently to set up a four node HA ganesha cluster
for testing at the NFS Bake-a-thon that Red Hat hosted recently.


> 
> Surya Ghatty
> 
> "This too shall pass"
> 
> Surya Ghatty | Software Engineer | IBM Cloud Infrastructure Services
> Development | tel: (507) 316-0559 | gha...@us.ibm.com
> 
> 
> 
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
> 

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] Question on HA Active-Active Ganesha setup

2015-11-05 Thread Surya K Ghatty


All... I need your help! I am trying to setup Highly available
Active-Active Ganesha configuration on two glusterfs nodes based on
instructions here:

https://gluster.readthedocs.org/en/latest/Administrator%20Guide/Configuring%20HA%20NFS%20Server/
 and
http://www.slideshare.net/SoumyaKoduri/high-49117846 and
https://www.youtube.com/watch?v=Z4mvTQC-efM.


My questions:

1. What is the expected behavior? Is the cluster.enable-shared-storage
command expected to create shared storage? It seems odd to return a success
message without creating the shared volume.
2. Any suggestions on how to get past this problem?

Details:
I am using glusterfs 3.7.5 and Ganesha 2.2.0.6 installable packages. I'm
installing

Also, I am using the following command

gluster volume set all cluster.enable-shared-storage enable

which automatically sets up the shared_storage directory
under /run/gluster/ and automounts the shared volume for HA.

This command was working perfectly fine, and I was able to set up Ganesha HA
successfully on CentOS 7.0 running on bare metal - until now.



[root@qint-tor01-c7 gluster]# gluster vol set all
cluster.enable-shared-storage enable
volume set: success

[root@qint-tor01-c7 gluster]# pwd
/run/gluster

[root@qint-tor01-c7 gluster]# ls
5027ba011969a8b2eca99ca5c9fb77ae.socket  shared_storage
changelog-9fe3f3fdd745db918d7d5c39fbe94017.sock  snaps
changelog-a9bf0a82aba38610df80c75a9adc45ad.sock


Yesterday, we tried to deploy Ganesha HA with the Gluster FSAL on a different
cloud, and when I ran the same command there (same versions of glusterfs and
Ganesha, same CentOS 7), the command returned successfully, but it did not
auto-create the shared_storage directory. There were no logs either
in /var/log/glusterfs/etc-glusterfs-glusterd.vol.log

or  /var/log/ganesha.log related to the command.

However, I do see these logs written to the etc-glusterfs-glusterd.vol.log

[2015-11-05 14:43:00.692762] W [socket.c:588:__socket_rwv] 0-nfs: readv
on /var/run/gluster/9d5e1ba5e44bd1aa3331d2ee752a806a.socket failed (Invalid
argument)

on both ganesha nodes independent of the commands I execute.

regarding this error, I did a ss -x |
grep /var/run/gluster/9d5e1ba5e44bd1aa3331d2ee752a806a.socket

and it appears that no process was using these sockets, on either machine.

My questions:

1. What is the expected behavior? Is the cluster.enable-shared-storage
command expected to create shared storage? It seems odd to return a success
message without creating the shared volume.
2. Any suggestions on how to get past this problem?
Regards,

Surya Ghatty

"This too shall pass"


Surya Ghatty | Software Engineer | IBM Cloud Infrastructure Services
Development | tel: (507) 316-0559 | gha...@us.ibm.com
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] File Corruption with shards - 100% reproducable

2015-11-05 Thread Lindsay Mathieson
On 5 November 2015 at 21:55, Krutika Dhananjay  wrote:

> Although I do not have experience with VM live migration,  IIUC, it is got
> to do with a different server (and as a result a new glusterfs client
> process) taking over the operations and mgmt of the VM.
>

That sounds very plausible


> If this is a correct assumption, then I think this could be the result of
> the same caching bug that I talked about sometime back in 3.7.5, which is
> fixed in 3.7.6.
> The issue could cause the new client to not see the correct size and block
> count of the file, leading to errors in reads (perhaps triggered by the
> restart of the vm) and writes on the image.
>

Cool, I look fwd to testing that in 3.7.6, which I believe is due out next
week?

thanks,





-- 
Lindsay
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Shard file size (gluster 3.7.5)

2015-11-05 Thread Lindsay Mathieson
On 5 November 2015 at 21:19, Krutika Dhananjay  wrote:

> Just to be sure, did you rerun the test on the already broken file
> (test.bin) which was written to when strict-write-ordering had been off?
> Or did you try the new test with strict-write-ordering on a brand new file?
>

Very strange. I tried it on new files and even went to the extent of
deleting the datastore and bricks, then recreating.

One oddity on my system - I have two prefs that I cannot reset
- cluster.server-quorum-ratio
- performance.readdir-ahead

Though I wouldn't have thought they made a difference. I might try cleaning
gluster and all its config files off the systems and *really* starting from
scratch.
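
Before nuking everything I'll probably try re-applying the two stuck options
explicitly (a sketch; "datastore1" stands in for my volume name, and the quorum
ratio value is just an example):

  gluster volume info datastore1                           # "Options Reconfigured" shows what is actually set
  gluster volume reset datastore1 performance.readdir-ahead
  gluster volume set all cluster.server-quorum-ratio 51    # cluster-wide option, set via "all" rather than per volume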




-- 
Lindsay
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Clients are unable to connect while rebalancing

2015-11-05 Thread Davy Croonen
Hi all,

We have a 4 node distributed/replicated setup (2 x 2) with gluster version 
3.6.4.

Yesterday one node went down due to a power failure; as expected, everything
kept working well. But after we brought the failed node back up, gluster
started, also as expected, its self-healing process. From the moment that
happens, all our client connections to the gluster cluster start freezing one
by one until the healing process finishes.

After some research we found this page
http://www.gluster.org/community/documentation/index.php/Documenting_the_undocumented
with some possible performance options to tweak,
performance.least-prio-threads and performance.least-rate-limit. Does anybody
have experience with tweaking these options? What are their default settings?
Are these the right options to tweak, or do we need another approach to our
problem?
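
If these turn out to be the right knobs, the commands we would run are
straightforward (the values below are guesses meant to throttle background heal
traffic, not recommendations):

  gluster volume set VOLNAME performance.least-prio-threads 1
  gluster volume set VOLNAME performance.least-rate-limit 1024
  gluster volume set help | grep -A2 least    # lists the option descriptions and defaults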

Thanks in advance.

Kind regards
Davy

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] File Corruption with shards - 100% reproducable

2015-11-05 Thread Krutika Dhananjay
Hi, 

Although I do not have experience with VM live migration, IIUC, it has to do
with a different server (and as a result a new glusterfs client process) taking
over the operations and mgmt of the VM.
If this is a correct assumption, then I think this could be the result of the 
same caching bug that I talked about sometime back in 3.7.5, which is fixed in 
3.7.6. 
The issue could cause the new client to not see the correct size and block 
count of the file, leading to errors in reads (perhaps triggered by the restart 
of the vm) and writes on the image. 
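
Until 3.7.6 is out, one way to check whether this is what you are hitting is to
compare what the two client mounts report for the image right after a migration
(a sketch; the mount point and image path are placeholders):

  # run on the source hypervisor and on the migration target, each via its own mount
  stat -c '%n  size=%s  blocks=%b' /mnt/datastore/images/vm-disk.qcow2

If the target reports a smaller size or block count than the source, that would
point at the stale metadata issue described above.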

-Krutika 
- Original Message -

> From: "Lindsay Mathieson" 
> To: "gluster-users" 
> Sent: Thursday, November 5, 2015 3:53:25 AM
> Subject: [Gluster-users] File Corruption with shards - 100% reproducable

> Gluster 3.7.5, gluster repos, on proxmox (debian 8)

> I have an issue with VM images (qcow2) being corrupted.

> - gluster replica 3, shards on, shard size = 256MB
> - Gluster nodes are all also VM host nodes
> - VM image mounted from qemu via gfapi

> To reproduce
> - Start VM
> - live migrate it to another node
> - VM will rapidly become unresponsive and have to be stopped
> - attempting to restart the vm results in a "qcow2: Image is corrupt; cannot
> be opened read/write" error.

> I have never seen this before. 100% reproducible with shards on, never
> happens with shards off.

> I don't think this happens when using NFS to access the shard volume, I
> suspect because with NFS it is still accessing the one node, whereas with
> gfapi it's handed off to the node the VM is running on.

> --
> Lindsay

> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Shard file size (gluster 3.7.5)

2015-11-05 Thread Krutika Dhananjay
OK. I am not sure what it is that we're doing differently. I tried the steps 
you shared and here's what I got: 

[root@dhcp35-215 bricks]# gluster volume info 

Volume Name: rep 
Type: Replicate 
Volume ID: 3fd45a4b-0d02-4a44-b74a-41592d48e102 
Status: Started 
Number of Bricks: 1 x 3 = 3 
Transport-type: tcp 
Bricks: 
Brick1: kdhananjay:/bricks/1 
Brick2: kdhananjay:/bricks/2 
Brick3: kdhananjay:/bricks/3 
Options Reconfigured: 
performance.strict-write-ordering: on 
features.shard: on 
features.shard-block-size: 512MB 
cluster.quorum-type: auto 
client.event-threads: 4 
server.event-threads: 4 
cluster.self-heal-window-size: 256 
performance.write-behind: on 
nfs.enable-ino32: on 
nfs.addr-namelookup: off 
nfs.disable: on 
performance.cache-refresh-timeout: 4 
performance.cache-size: 1GB 
performance.write-behind-window-size: 128MB 
performance.io-thread-count: 32 
performance.readdir-ahead: on 

[root@dhcp35-215 mnt]# gluster volume set rep strict-write-ordering on 
volume set: success 
[root@dhcp35-215 mnt]# dd if=/dev/sda of=test.bin bs=1MB count=8192 
8192+0 records in 
8192+0 records out 
819200 bytes (8.2 GB) copied, 133.754 s, 61.2 MB/s 
[root@dhcp35-215 mnt]# ls -l 
total 800 
-rw-r--r--. 1 root root 819200 Nov 5 16:40 test.bin 
[root@dhcp35-215 mnt]# ls -lh 
total 7.7G 
-rw-r--r--. 1 root root 7.7G Nov 5 16:40 test.bin 
[root@dhcp35-215 mnt]# du test.bin 
800 test.bin 

[root@dhcp35-215 bricks]# du /bricks/1/.shard/ 
7475780 /bricks/1/.shard/ 
[root@dhcp35-215 bricks]# du /bricks/1/ 
.glusterfs/ .shard/ test.bin .trashcan/ 
[root@dhcp35-215 bricks]# du /bricks/1/test.bin 
524292 /bricks/1/test.bin 

Just to be sure, did you rerun the test on the already broken file (test.bin) 
which was written to when strict-write-ordering had been off? 
Or did you try the new test with strict-write-ordering on a brand new file? 

-Krutika 

- Original Message -

> From: "Lindsay Mathieson" 
> To: "Krutika Dhananjay" 
> Cc: "gluster-users" 
> Sent: Thursday, November 5, 2015 3:04:51 AM
> Subject: Re: [Gluster-users] Shard file size (gluster 3.7.5)

> On 5 November 2015 at 01:09, Krutika Dhananjay < kdhan...@redhat.com > wrote:

> > Ah! It's the same issue. Just saw your volume info output. Enabling
> > strict-write-ordering should ensure both size and disk usage are accurate.
> 
> Tested it - nope :( Size is accurate (27746172928 bytes), but disk usage is
> wildly inaccurate (698787).

> I have compression disabled on the underlying storage now.

> --
> Lindsay
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users