[Gluster-users] Dispersed Volume Errors after failed expansion

2023-11-14 Thread edrock200
Hello,
I've run into an issue with Gluster 11.1 and need some assistance. I have a 4+1 
dispersed gluster setup consisting of 20 nodes and 200 bricks. This setup was 
15 nodes and 150 bricks until last week and was working flawlessly. We needed 
more space so we expanded the volume by adding 5 more nodes and 50 bricks.

We added the nodes and triggered a fix-layout. Unknown to us at the time, one of
the five new nodes had a hardware issue: its CPU cooling fan was bad. This
caused the node to throttle down to 500 MHz on all cores and eventually shut
itself down mid fix-layout. Due to how our ISP works, we could only replace the
entire node, so we did, and then executed a replace-brick command.
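
For reference, the expansion and the node replacement were done with the stock
CLI, roughly as sketched below. The volume name "media" is inferred from the
log prefixes further down; the node names and brick paths are placeholders, and
on a 4+1 volume bricks have to be added in multiples of 5:

# add one new 4+1 subvolume spread across the five new nodes (repeated per set)
gluster volume add-brick media \
  node16:/bricks/b01/media node17:/bricks/b01/media node18:/bricks/b01/media \
  node19:/bricks/b01/media node20:/bricks/b01/media

# recalculate the directory layout over the new subvolumes without moving data
gluster volume rebalance media fix-layout start
gluster volume rebalance media status

# after the faulty node was rebuilt, point its bricks at the new location
gluster volume replace-brick media \
  node20:/bricks/b01/media node20:/bricks/b01-new/media commit force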

Presently this is the state we are in, and I'm not sure how best to proceed to
fix the errors and behavior I'm seeing. I'm not sure whether running another
fix-layout should be the next step, given that hundreds of objects are stuck in
a persistent heal state, and that doing just about any command other than
status, info, or heal info results in all client mounts hanging for ~5 minutes
or bricks starting to drop. The client logs show numerous anomalies as well,
such as:

[2023-11-10 17:41:52.153423 +0000] W [MSGID: 122040]
[ec-common.c:1262:ec_prepare_update_cbk] 0-media-disperse-30: Failed to get
size and version : FOP : 'XATTROP' failed on '/path/to/folder' with gfid
0d295c94-5577-4445-9e57-6258f24d22c5. Parent FOP: OPENDIR [Input/output error]

[2023-11-10 17:48:46.965415 +0000] E [MSGID: 122038]
[ec-dir-read.c:398:ec_manager_readdir] 0-media-disperse-36: EC is not winding
readdir: FOP : 'READDIRP' failed on gfid f8ad28d0-05b4-4df3-91ea-73fabf27712c.
Parent FOP: No Parent [File descriptor in bad state]

[2023-11-10 17:39:46.076149 +0000] I [MSGID: 109018]
[dht-common.c:1840:dht_revalidate_cbk] 0-media-dht: Mismatching layouts for
/path/to/folder2, gfid = f04124e5-63e6-4ddf-9b6b-aa47770f90f2

[2023-11-10 17:39:18.463421 +0000] E [MSGID: 122034]
[ec-common.c:662:ec_log_insufficient_vol] 0-media-disperse-4: Insufficient
available children for this request: Have : 0, Need : 4 : Child UP : 1
Mask: 0, Healing : 0 : FOP : 'XATTROP' failed on
'/path/to/another/folder' with gfid f04124e5-63e6-4ddf-9b6b-aa47770f90f2. Parent
FOP: SETXATTR

[2023-11-10 17:36:21.565681 +0000] W [MSGID: 122006]
[ec-combine.c:188:ec_iatt_combine] 0-media-disperse-39: Failed to combine iatt
(inode: 13324146332441721129-13324146332441721129, links: 2-2, uid: 1000-1000,
gid: 1000-1001, rdev: 0-0, size: 10-10, mode: 40775-40775), FOP : 'LOOKUP'
failed on '/path/to/yet/another/folder'. Parent FOP: No Parent

[2023-11-10 17:39:46.147299 +0000] W [MSGID: 114031]
[client-rpc-fops_v2.c:2563:client4_0_lookup_cbk] 0-media-client-1: remote
operation failed. [{path=/path/to/folder3},
{gfid=----}, {errno=13}, {error=Permission
denied}]

[2023-11-10 17:39:46.093069 +0000] W [MSGID: 114061]
[client-common.c:1232:client_pre_readdirp_v2] 0-media-client-14: remote_fd is
-1. EBADFD [{gfid=f04124e5-63e6-4ddf-9b6b-aa47770f90f2}, {errno=77},
{error=File descriptor in bad state}]

[2023-11-10 17:55:11.407630 +0000] E [MSGID: 122038]
[ec-dir-read.c:398:ec_manager_readdir] 0-media-disperse-30: EC is not winding
readdir: FOP : 'READDIRP' failed on gfid 2bba7b7e-7a4b-416a-80f0-dd50caffd2c2.
Parent FOP: No Parent [File descriptor in bad state]

[2023-11-10 17:39:46.076179 +0000] W [MSGID: 109221]
[dht-selfheal.c:2023:dht_selfheal_directory] 0-media-dht: Directory selfheal
failed [{path=/path/to/folder7}, {misc=2}, {unrecoverable-errors},
{gfid=f04124e5-63e6-4ddf-9b6b-aa47770f90f2}]
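
For context, the layout ranges and the EC size/version counters these messages
refer to live in extended attributes on the brick directories, and can be read
directly on any brick server (the brick path below is a placeholder):

# run against the directory named in the error, on each brick that holds it
getfattr -d -m . -e hex /bricks/b01/media/path/to/folder

# keys of interest: trusted.glusterfs.dht (the layout range DHT compares in the
# "Mismatching layouts" message) and trusted.ec.version / trusted.ec.size /
# trusted.ec.dirty (the counters XATTROP failed to update above)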

Something about this failed expansion has caused these errors, and I'm not sure
how to proceed. Right now doing just about anything causes the client mounts to
hang for up to 5 minutes, including restarting a node, trying to use a volume
set command, etc. When I tried increasing a cache timeout value, ~153 bricks out
of 200 dropped offline. Restarting a node seems to cause the mounts to hang as
well.

I've tried the following (commands sketched below):
Running gluster volume heal volumename full - causes mounts to hang for 3-5
minutes but seems to proceed
Running ls -alhR against the volume to trigger heals
Removing the new bricks, which triggers a rebalance that fails almost
immediately, after which most of the self-heal daemons go offline as well
Turning off bit-rot to reduce load on the system
Replacing a brick with a new brick (same drive, new directory); attempted
force as well
Changing the heal mode from diff to full
Lowering the parallel heal count to 4
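
Roughly, the commands behind that list, with the volume name "media" and the
node/brick names as placeholders, and the two volume-set options being my best
guess at what "heal mode" and "parallel heal count" map to:

gluster volume heal media full
gluster volume remove-brick media \
  node16:/bricks/b01/media node17:/bricks/b01/media node18:/bricks/b01/media \
  node19:/bricks/b01/media node20:/bricks/b01/media start
gluster volume bitrot media disable
gluster volume replace-brick media \
  node20:/bricks/b01/media node20:/bricks/b02/media commit force
gluster volume set media cluster.data-self-heal-algorithm full
gluster volume set media disperse.shd-max-threads 4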

When I replaced the one brick, the heal count on that brick dropped from ~100
to ~6; however, those 6 entries are folders in the root of the volume rather
than subfolders many layers in. I suspect this is causing a lot of the issues
I'm seeing, and I don't know how to resolve it without damaging any of the
existing data.
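
For reference, the per-brick counts above come from the heal commands, e.g.
(volume name assumed, as before):

gluster volume heal media info                   # lists pending entries per brick
gluster volume heal media statistics heal-count  # just the count per brick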

I'm hoping it's just due to the fix-layout failing and that it simply needs to
run again, but I wanted to seek advice here first.

Re: [Gluster-users] dispersed volume + cifs export does not work (replicated + cifs works fine)

2020-03-10 Thread Felix Kölzow

To answer my own question, in case it is helpful for the community:

Due to a different (minor) issue with POSIX permissions, we had set
stat-prefetch to off.

This caused the issue mentioned below, i.e. it had nothing to do with
the smb.conf settings.


gluster volume set volname performance.stat-prefetch on


solved that issue.
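
For anyone hitting the same thing: the effective value can be checked before
and after with volume get ("volname" is a placeholder, as above):

gluster volume get volname performance.stat-prefetch
gluster volume set volname performance.stat-prefetch on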

Reference:

https://access.redhat.com/solutions/4558341


[Gluster-users] dispersed volume + cifs export does not work (replicated + cifs works fine)

2019-10-20 Thread Felix Kölzow

Dear Gluster-Users,

_short story:_

Two volumes are exported via smb/cifs with (almost) the same configuration
with respect to smb.conf. The replicated volume is easily accessible via cifs
and fuse. The dispersed volume is accessible via fuse, but not via cifs.

Error message from the Windows client:

The parameter is incorrect



Maybe the error is somehow related to this:

https://gluster-users.gluster.narkive.com/g35gmGj6/vfs-gluster-broken


_more information:_


We have created a gluster setup that consists of three servers, and each
server provides two bricks. Two volumes are created on these bricks and are
going to be exported via smb/cifs:


 * replicated distributed
 * dispersed


The volume settings are given here:


[root@node1 ~]# gluster volume info replicated_cifs

Volume Name: replicated_cifs
Type: Distributed-Replicate
Volume ID: 51bb4440-3b8e-48be-a84c-5ea9e1ddd38e
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: node1:/gluster/vg00/replicated_cifs/brick
Brick2: node2:/gluster/vg00/replicated_cifs/brick
Brick3: node3:/gluster/vg00/replicated_cifs/brick
Brick4: node1:/gluster/vg01/replicated_cifs/brick
Brick5: node2:/gluster/vg01/replicated_cifs/brick
Brick6: node3:/gluster/vg01/replicated_cifs/brick
Options Reconfigured:
features.show-snapshot-directory: on
features.uss: enable
features.barrier: disable
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
diagnostics.latency-measurement: on
diagnostics.count-fop-hits: on
user.cifs: enable
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.cache-samba-metadata: on
performance.stat-prefetch: disable
performance.cache-invalidation: on
performance.md-cache-timeout: 600
network.inode-lru-limit: 20
performance.nl-cache: on
performance.nl-cache-timeout: 600
performance.readdir-ahead: on
performance.parallel-readdir: on
client.event-threads: 4
server.event-threads: 4
server.root-squash: off
cluster.lookup-optimize: on
features.quota: on
features.inode-quota: on
features.quota-deem-statfs: on
performance.cache-size: 10GB
cluster.server-quorum-ratio: 51%
cluster.enable-shared-storage: enable


[root@node1 ~]# gluster volume info dispersed_cifs

Volume Name: dispersed_cifs
Type: Disperse
Volume ID: 0a291429-1875-41c8-96ff-bce0054ed309
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (4 + 2) = 6
Transport-type: tcp
Bricks:
Brick1: node1:/gluster/vg00/dispersed_cifs/brick
Brick2: node2:/gluster/vg00/dispersed_cifs/brick
Brick3: node3:/gluster/vg00/dispersed_cifs/brick
Brick4: node1:/gluster/vg01/dispersed_cifs/brick
Brick5: node2:/gluster/vg01/dispersed_cifs/brick
Brick6: node3:/gluster/vg01/dispersed_cifs/brick
Options Reconfigured:
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on
transport.address-family: inet
nfs.disable: on
diagnostics.latency-measurement: on
diagnostics.count-fop-hits: on
user.cifs: enable
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.cache-samba-metadata: on
performance.stat-prefetch: disable
performance.cache-invalidation: on
performance.md-cache-timeout: 600
network.inode-lru-limit: 20
performance.nl-cache: on
performance.nl-cache-timeout: 600
performance.readdir-ahead: on
performance.parallel-readdir: on
server.event-threads: 4
client.event-threads: 4
server.root-squash: off
cluster.lookup-optimize: on
features.quota: on
features.inode-quota: on
features.quota-deem-statfs: on
performance.cache-size: 10GB
cluster.server-quorum-ratio: 51%
cluster.enable-shared-storage: enable


The export via cifs looks like this:

_distributed replicated:_

[gluster-replicated_cifs]
vfs objects = fruit acl_xattr glusterfs
acl_xattr:ignore system acls = yes
acl_xattr:default acl style = windows
glusterfs:volume = replicated_cifs
glusterfs:logfile = /var/log/samba/glusterfs-replicated_cifs.%M.log
glusterfs:loglevel = 7
kernel share modes = no
path = /
read only = no
guest ok = no
browseable = no

[replicated_data]
vfs objects = fruit acl_xattr shadow_copy2 glusterfs
acl_xattr:ignore system acls = yes
acl_xattr:default acl style = windows
glusterfs:volume = replicated_cifs
glusterfs:logfile = /var/log/samba/glusterfs-replicated_data.%M.log
glusterfs:loglevel = 7
kernel share modes = no
path = /replicated_data
read only = no
guest ok = no
create mask = 0660
directory mask = 0770
map acl inherit = yes
inherit permissions = yes
inherit acls = true
store dos attributes = yes
shadow:snapdir = /.snaps
shadow:basedir = /
shadow:sort = desc
shadow:snapprefix = snap_replicated_cifs
shadow:format = _GMT-%Y.%m.%d-%H.%M.%S


_dispersed volume:_

[gluster-dispersed_cifs]
vfs objects = fruit acl_xattr glusterfs
acl_xattr:ignore system acls = yes
acl_xattr:default acl style = windows
glusterfs:volume = dispersed_cifs
glusterfs:logfile = 

Re: [Gluster-users] Dispersed volume and auto-heal

2016-12-07 Thread Serkan Çoban
No, you should replace the brick.
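
A minimal sketch of what that looks like, with placeholder volume, host and
brick names; self-heal then rebuilds the missing fragments onto the new brick:

gluster volume replace-brick myvol \
  server1:/bricks/dead/brick server1:/bricks/new/brick commit force
gluster volume heal myvol info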

On Wed, Dec 7, 2016 at 1:02 PM, Cedric Lemarchand  wrote:
> Hello,
>
> Is gluster able to auto-heal when some bricks are lost? By auto-heal I mean
> that lost parity is re-generated on the bricks that are still available in
> order to recover the level of redundancy without replacing the failed bricks.
>
> I am in the learning curve, apologies if the question is trivial.
>
> Cheers,
>
> Cédric
>
>

[Gluster-users] Dispersed volume and auto-heal

2016-12-07 Thread Cedric Lemarchand
Hello,

Is gluster able to auto-heal when some bricks are lost? By auto-heal I mean
that lost parity is re-generated on the bricks that are still available in
order to recover the level of redundancy without replacing the failed bricks.

I am in the learning curve, apologies if the question is trivial.

Cheers,

Cédric



Re: [Gluster-users] DISPERSED VOLUME

2016-11-25 Thread Serkan Çoban
I think you should try with a bigger file: 1, 10, 100, 1000 KB?
Small files might just be getting replicated to the bricks... (Just a guess.)
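
A quick way to check that on the setup below (paths taken from the original
mail): write files of increasing size on the fuse mount and compare what lands
on each brick. With disperse-data 3, each brick should hold roughly a third of
the file (rounded up to the fragment size), not a full copy.

# on the client
cd /home/cli1/gv7_dispersed_directory
for s in 1 10 100 1000; do dd if=/dev/urandom of=test_${s}k bs=1K count=$s; done

# on each server
ls -l /data/brick1/gv7/test_*k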

On Fri, Nov 25, 2016 at 12:41 PM, Alexandre Blanca wrote:
> Hi,
>
> I am a beginner in distributed file systems and I currently work on
> Glusterfs.
> I work with 4 VM : srv1, srv2, srv3 and cli1
> I tested several types of volume (distributed, replicated, striped ...)
> which are for me JBOD, RAID 1 and RAID 0.
> When I try to make a dispersed volume (raid5 / 6) I have a misunderstanding
> ...
>
>
> gluster volume create gv7 disperse-data 3 redundancy 1
> ipserver1:/data/brick1/gv7 ipserver2:/data/brick1/gv7
> ipserver3:/data/brick1/gv7 ipserver4:/data/brick1/gv7
>
>
> gluster volume info
>
>
> Volume Name: gv7
> Type: Disperse
> Status: Created
> Number of Bricks: 4
> Transport-type: tcp
> Bricks:
> Brick1: ipserver1:/data/brick1/gv7
> Brick2: ipserver2:/data/brick1/gv7
> Brick3: ipserver3:/data/brick1/gv7
> Brick4: ipserver4:/data/brick1/gv7
>
> gluster volume start gv7
>
>
> mkdir /home/cli1/gv7_dispersed_directory
>
>
> mount -t glusterfs ipserver1:/gv7 /home/cli1/gv7_dispersed_directory
>
>
>
> Now, when I create a file on my mount point (gv7_dispersed_directory):
>
>
> cd /home/cli1/gv7_dispersed_directory
>
>
> echo 'hello world !' >> test_file
>
>
> I can see in my srv1 :
>
>
> cd /data/brick1/gv7
>
>
> cat test
>
>
> hello world !
>
>
> in my srv2 :
>
>
>
> cd /data/brick1/gv7
>
>
>
> cat test
>
>
>
> hello world !
>
>
> in my srv4:
>
>
>
> cd /data/brick1/gv7
>
>
>
> cat test
>
>
>
> hello world !
>
>
> but in my srv3 :
>
>
>
> cd /data/brick1/gv7
>
>
>
> cat test
>
>
>
> hello world !
>
> hello world !
>
>
> Why?! output of server 3 displays 2 times hello world ! Parity? Redundancy?
> I don't know...
>
> Best regards
>
> Alex
>
>
>
>
>
>
>


[Gluster-users] DISPERSED VOLUME

2016-11-25 Thread Alexandre Blanca
Hi,
I am a beginner in distributed file systems and I currently work on Glusterfs.
I work with 4 VMs: srv1, srv2, srv3 and cli1
I tested several types of volume (distributed, replicated, striped ...) which
are for me JBOD, RAID 1 and RAID 0.
When I try to make a dispersed volume (raid5 / 6) I have a misunderstanding ...

gluster volume create gv7 disperse-data 3 redundancy 1
ipserver1:/data/brick1/gv7 ipserver2:/data/brick1/gv7
ipserver3:/data/brick1/gv7 ipserver4:/data/brick1/gv7

gluster volume info

Volume Name: gv7
Type: Disperse
Status: Created
Number of Bricks: 4
Transport-type: tcp
Bricks:
Brick1: ipserver1:/data/brick1/gv7
Brick2: ipserver2:/data/brick1/gv7
Brick3: ipserver3:/data/brick1/gv7
Brick4: ipserver4:/data/brick1/gv7

gluster volume start gv7

mkdir /home/cli1/gv7_dispersed_directory

mount -t glusterfs ipserver1:/gv7 /home/cli1/gv7_dispersed_directory

Now, when I create a file on my mount point (gv7_dispersed_directory):

cd /home/cli1/gv7_dispersed_directory

echo 'hello world !' >> test_file

I can see in my srv1:

cd /data/brick1/gv7
cat test
hello world !

in my srv2:

cd /data/brick1/gv7
cat test
hello world !

in my srv4:

cd /data/brick1/gv7
cat test
hello world !

but in my srv3:

cd /data/brick1/gv7
cat test
hello world !
hello world !

Why?! The output of server 3 displays "hello world !" two times. Parity?
Redundancy? I don't know...

Best regards

Alex
