[Gluster-users] is 10.4 released?

2023-04-20 Thread Eli V
I see packages for 10.4, but no release announcement or release notes.
Does anyone know what the status is?




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] New Gluster volume (10.3) not healing symlinks after brick offline

2023-02-24 Thread Eli V
I've seen issues with symlinks failing to heal as well. I never found
a good solution on the glusterfs side of things. The most reliable fix I
found is to just rm and recreate the symlink in the fuse volume itself.
Also, I'd strongly suggest heavy load testing before upgrading to 10.3
in production; after upgrading from 9.5 -> 10.3 I've seen frequent
brick process (glusterfsd) crashes, whereas 9.5 was quite stable.
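
For what it's worth, the workaround is nothing fancier than this (a minimal
sketch; the mount point and link path are hypothetical, substitute your own):

  # volume fuse-mounted at /mnt/gluster, unhealed symlink at some/dir/link
  target=$(readlink /mnt/gluster/some/dir/link)   # remember the original target
  rm /mnt/gluster/some/dir/link                   # drop the bad entry through the fuse mount
  ln -s "$target" /mnt/gluster/some/dir/link      # recreate it; it then replicates normally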

On Mon, Jan 23, 2023 at 3:58 PM Matt Rubright  wrote:
>
> Hi friends,
>
> I have recently built a new replica 3 arbiter 1 volume on 10.3 servers and 
> have been putting it through its paces before getting it ready for production 
> use. The volume will ultimately contain about 200G of web content files 
> shared among multiple frontends. Each will use the gluster fuse client to 
> connect.
>
> What I am experiencing sounds very much like this post from 9 years ago: 
> https://lists.gnu.org/archive/html/gluster-devel/2013-12/msg00103.html
>
> In short, if I perform these steps I can reliably end up with symlinks on the 
> volume which will not heal either by initiating a 'full heal' from the 
> cluster or using a fuse client to read each file:
>
> 1) Verify that all nodes are healthy, the volume is healthy, and there are no 
> items needing to be healed
> 2) Cleanly shut down one server hosting a brick
> 3) Copy data, including some symlinks, from a fuse client to the volume
> 4) Bring the brick back online and observe the number and type of items 
> needing to be healed
> 5) Initiate a full heal from one of the nodes
> 6) Confirm that while files and directories are healed, symlinks are not
>
> Please help me determine if I have improper expectations here. I have some 
> basic knowledge of managing gluster volumes, but I may be misunderstanding 
> intended behavior.
>
> Here is the volume info and heal data at each step of the way:
>
> *** Verify that all nodes are healthy, the volume is healthy, and there are 
> no items needing to be healed ***
>
> # gluster vol info cwsvol01
>
> Volume Name: cwsvol01
> Type: Replicate
> Volume ID: 7b28e6e6-4a73-41b7-83fe-863a45fd27fc
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x (2 + 1) = 3
> Transport-type: tcp
> Bricks:
> Brick1: glfs02-172-20-1:/data/brick01/cwsvol01
> Brick2: glfs01-172-20-1:/data/brick01/cwsvol01
> Brick3: glfsarb01-172-20-1:/data/arb01/cwsvol01 (arbiter)
> Options Reconfigured:
> performance.client-io-threads: off
> nfs.disable: on
> transport.address-family: inet
> storage.fips-mode-rchecksum: on
> cluster.granular-entry-heal: on
>
> # gluster vol status
> Status of volume: cwsvol01
> Gluster process TCP Port  RDMA Port  Online  Pid
> --
> Brick glfs02-172-20-1:/data/brick01/cwsvol0
> 1   50253 0  Y   1397
> Brick glfs01-172-20-1:/data/brick01/cwsvol0
> 1   56111 0  Y   1089
> Brick glfsarb01-172-20-1:/data/arb01/cwsvol
> 01  54517 0  Y   
> 118704
> Self-heal Daemon on localhost   N/A   N/AY   1413
> Self-heal Daemon on glfs01-172-20-1 N/A   N/AY   3490
> Self-heal Daemon on glfsarb01-172-20-1  N/A   N/AY   
> 118720
>
> Task Status of Volume cwsvol01
> --
> There are no active volume tasks
>
> # gluster vol heal cwsvol01 info summary
> Brick glfs02-172-20-1:/data/brick01/cwsvol01
> Status: Connected
> Total Number of entries: 0
> Number of entries in heal pending: 0
> Number of entries in split-brain: 0
> Number of entries possibly healing: 0
>
> Brick glfs01-172-20-1:/data/brick01/cwsvol01
> Status: Connected
> Total Number of entries: 0
> Number of entries in heal pending: 0
> Number of entries in split-brain: 0
> Number of entries possibly healing: 0
>
> Brick glfsarb01-172-20-1:/data/arb01/cwsvol01
> Status: Connected
> Total Number of entries: 0
> Number of entries in heal pending: 0
> Number of entries in split-brain: 0
> Number of entries possibly healing: 0
>
> *** Cleanly shut down one server hosting a brick ***
>
> *** Copy data, including some symlinks, from a fuse client to the volume ***
>
> # gluster vol heal cwsvol01 info summary
> Brick glfs02-172-20-1:/data/brick01/cwsvol01
> Status: Transport endpoint is not connected
> Total Number of entries: -
> Number of entries in heal pending: -
> Number of entries in split-brain: -
> Number of entries possibly healing: -
>
> Brick glfs01-172-20-1:/data/brick01/cwsvol01
> Status: Connected
> Total Number of entries: 810
> Number of entries in heal pending: 810
> Number of entries in split-brain: 0
> Number of entries possibly healing: 0
>
> Brick glfsarb01-172-20-1:/data/arb01/cwsvol01
> Status: Connected
> Total Number of entries: 810
> Number of 

[Gluster-users] Gluster 11.0 upgrade report

2023-02-24 Thread Eli V
Just upgraded my test 3-node distributed-replicate 9x2 glusterfs to 11.0
and it was a bit rough. After upgrading the 1st node, gluster volume
status showed only the bricks on node 1, and gluster peer status
showed node 1 rejecting nodes 2 & 3. After upgrading node 2 and then
node 3, node 3 remained rejected. I followed the docs for resolving a
rejected peer, i.e. clean out /var/lib/glusterd other than the .info file,
and was able to peer probe and get node 3 back into the cluster.
However, the fuse glusterfs client is now oddly reporting the volume
as only 1.1TB, versus the 2.5TB before (9x280GB disks). Also,
glusterfsd processes seem to crash under load testing just as much as on 10,
and the load created unhealable files, which I'd never seen on 10; I only
resolved that by rm -rf on the whole testing directory tree.
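
For anyone hitting the same rejected-peer state, the recovery was roughly the
following (a sketch from memory, not a verified recipe; node1/node3 are
illustrative names, and double-check what lives in /var/lib/glusterd first):

  # on the rejected node (node3):
  systemctl stop glusterd
  cd /var/lib/glusterd
  find . -mindepth 1 -maxdepth 1 ! -name 'glusterd.info' -exec rm -rf {} +   # keep only glusterd.info
  systemctl start glusterd
  gluster peer probe node1        # probe a healthy node so the config gets pulled back
  systemctl restart glusterd      # one more restart, then verify
  gluster peer status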






[Gluster-users] glusterfs v10.3 crashing

2022-12-06 Thread Eli V
I've started seeing lots of crashes recently after having had a stable
gluster for a couple of years. I upgraded from 9.6 to 10.3 in hopes of
more stability and got 3 crashes yesterday. All 3 seem to have the
rpc_transport_unref line near the top of the backtrace. I'm not sure what's
going on, but it's making the filesystem unusable:

Crash1
Program terminated with signal SIGBUS, Bus error.
#0  0x7f0d66b4e7aa in __gf_free (free_ptr=0x7f0d58153678) at mem-pool.c:363
363 mem-pool.c: No such file or directory.
[Current thread is 1 (Thread 0x7f0d53fff700 (LWP 1664975))]
(gdb) where
#0  0x7f0d66b4e7aa in __gf_free (free_ptr=0x7f0d58153678) at mem-pool.c:363
#1  __gf_free (free_ptr=0x7f0d58153678) at mem-pool.c:332
#2  0x7f0d66ad04fa in rpc_transport_unref () from
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0
#3  0x7f0d66b7a71d in event_dispatch_epoll_handler
(event=0x7f0d53ffe054, event_pool=0x556c97385518) at event-epoll.c:638
#4  event_dispatch_epoll_worker (data=0x7f0d54006928) at event-epoll.c:749
#5  0x7f0d66a90ea7 in start_thread (arg=) at
pthread_create.c:477
#6  0x7f0d669b0a2f in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:95


Crash2:
Core was generated by /usr/sbin/glusterfsd
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x in ?? ()
[Current thread is 1 (Thread 0x7f26fc1bc700 (LWP 1731799))]
(gdb) where
#0  0x in ?? ()
#1  0x7f270957f4a5 in rpc_transport_unref
(this=this@entry=0x7f26f40a0ad8) at rpc-transport.c:501
#2  0x7f26feec3f4b in server_process_event_upcall
(this=this@entry=0x7f26f002b0f8, data=data@entry=0x7f26fc1b9a70) at
server.c:1499
#3  0x7f26feec4964 in server_notify (this=0x7f26f002b0f8,
event=19, data=0x7f26fc1b9a70) at server.c:1620
#4  0x7f27095cc244 in xlator_notify (xl=0x7f26f002b0f8, event=19,
data=0x7f26fc1b9a70) at xlator.c:711
#5  0x7f270965df2b in default_notify (this=0x7f26f0028fd8,
event=, data=0x7f26fc1b9a70) at defaults.c:3414
#6  0x7f26fef95b09 in ?? () from
/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/debug/io-stats.so
#7  0x7f27095cc244 in xlator_notify (xl=0x7f26f0028fd8, event=19,
data=0x7f26fc1b9a70) at xlator.c:711
#8  0x7f270965df2b in default_notify
(this=this@entry=0x7f26f00272c8, event=event@entry=19,
data=data@entry=0x7f26fc1b9a70) at defaults.c:3414
#9  0x7f26fefc27ce in notify (this=0x7f26f00272c8, event=19,
data=0x7f26fc1b9a70) at quota.c:5017
#10 0x7f27095cc244 in xlator_notify (xl=0x7f26f00272c8, event=19,
data=0x7f26fc1b9a70) at xlator.c:711
#11 0x7f270965df2b in default_notify
(this=this@entry=0x7f26f0025548, event=event@entry=19,
data=data@entry=0x7f26fc1b9a70) at defaults.c:3414
#12 0x7f26fefe47b1 in notify (this=0x7f26f0025548, event=19,
data=0x7f26fc1b9a70) at index.c:2664
#13 0x7f27095cc244 in xlator_notify (xl=0x7f26f0025548, event=19,
data=0x7f26fc1b9a70) at xlator.c:711
#14 0x7f270965df2b in default_notify
(this=this@entry=0x7f26f00239e8, event=event@entry=19,
data=data@entry=0x7f26fc1b9a70) at defaults.c:3414
#15 0x7f26feff68c0 in notify (this=0x7f26f00239e8, event=19,
data=0x7f26fc1b9a70) at barrier.c:516
#16 0x7f27095cc244 in xlator_notify (xl=0x7f26f00239e8, event=19,
data=0x7f26fc1b9a70) at xlator.c:711
#17 0x7f270965df2b in default_notify (this=0x7f26f0021878,
event=, data=0x7f26fc1b9a70) at defaults.c:3414
#18 0x7f27095cc244 in xlator_notify (xl=0x7f26f0021878, event=19,
data=0x7f26fc1b9a70) at xlator.c:711
#19 0x7f270965df2b in default_notify (this=0x7f26f001fe28,
event=, data=0x7f26fc1b9a70) at defaults.c:3414
#20 0x7f27095cc244 in xlator_notify (xl=0x7f26f001fe28, event=19,
data=0x7f26fc1b9a70) at xlator.c:711
#21 0x7f270965df2b in default_notify
(this=this@entry=0x7f26f001e428, event=event@entry=19,
data=data@entry=0x7f26fc1b9a70) at defaults.c:3414
#22 0x7f2704033efd in notify (this=0x7f26f001e428, event=19,
data=0x7f26fc1b9a70) at io-threads.c:1339
#23 0x7f27095cc244 in xlator_notify (xl=0x7f26f001e428, event=19,
data=0x7f26fc1b9a70) at xlator.c:711
#24 0x7f270965df2b in default_notify
(this=this@entry=0x7f26f001c8c8, event=event@entry=19,
data=data@entry=0x7f26fc1b9a70) at defaults.c:3414
#25 0x7f270404ccc1 in notify (event=,
data=0x7f26fc1b9a70, this=0x7f26f001c8c8) at upcall.c:2368
#26 notify (this=0x7f26f001c8c8, event=,
data=0x7f26fc1b9a70) at upcall.c:2355
#27 0x7f270404daa4 in upcall_client_cache_invalidate
(this=this@entry=0x7f26f001c8c8, gfid=gfid@entry=0x7f26f82dad34
"y-\300b\261;H6\221\306\003\031\307\306", ,
up_client_entry=up_client_entry@entry=0x7f2680327d48,
flags=flags@entry=24,
stbuf=stbuf@entry=0x7f26fc1ba0d0, p_stbuf=p_stbuf@entry=0x0,
oldp_stbuf=0x0, xattr=0x0, now=1670297728) at upcall-internal.c:632
#28 0x7f270405233c in upcall_cache_invalidate
(frame=0x7f2684be2738, this=0x7f26f001c8c8, client=0x7f26f022e078,
inode=, flags=24, stbuf=0x7f26fc1ba0d0, p_stbuf=0x0,
oldp_stbuf=0x0, xattr=0x0) at upcall-internal.c:566
#29 0x7f2704041835 in 

Re: [Gluster-users] gluster volume not healing - remote operation failed

2022-11-02 Thread Eli V
On Wed, Sep 14, 2022 at 7:08 AM  wrote:
>
> Hi folks,
>
> my gluster volume isn't fully healing. We had an outage a couple of days ago
> and all other files got healed successfully. Now - days later - I can
> see there are still two gfids per node remaining in the heal list.
>
> root@storage-001~# for i in `gluster volume list`; do gluster volume
> heal $i info; done
> Brick storage-003.mydomain.com:/mnt/bricks/g-volume-myvolume
> 
> 
> Status: Connected
> Number of entries: 2
>
> Brick storage-002.mydomain.com:/mnt/bricks/g-volume-myvolume
> 
> 
> Status: Connected
> Number of entries: 2
>
> Brick storage-001.mydomain.com:/mnt/bricks/g-volume-myvolume
> 
> 
> Status: Connected
> Number of entries: 2
>
> In the log I can see that the glustershd process is invoked to heal the
> remaining files but fails with "remote operation failed".
> [2022-09-14 10:56:50.007978 +] I [MSGID: 108026]
> [afr-self-heal-entry.c:1053:afr_selfheal_entry_do]
> 0-g-volume-myvolume-replicate-0: performing entry selfheal on
> 48791313-e5e7-44df-bf99-3ebc8d4cf5d5
> [2022-09-14 10:56:50.008428 +] I [MSGID: 108026]
> [afr-self-heal-entry.c:1053:afr_selfheal_entry_do]
> 0-g-volume-myvolume-replicate-0: performing entry selfheal on
> a4babc5a-bd5a-4429-b65e-758651d5727c
> [2022-09-14 10:56:50.015005 +] E [MSGID: 114031]
> [client-rpc-fops_v2.c:214:client4_0_mkdir_cbk]
> 0-g-volume-myvolume-client-2: remote operation failed. [{path=(null)},
> {errno=22}, {error=Invalid argument}]
> [2022-09-14 10:56:50.015007 +] E [MSGID: 114031]
> [client-rpc-fops_v2.c:214:client4_0_mkdir_cbk]
> 0-g-volume-myvolume-client-3: remote operation failed. [{path=(null)},
> {errno=22}, {error=Invalid argument}]
> [2022-09-14 10:56:50.015138 +] E [MSGID: 114031]
> [client-rpc-fops_v2.c:214:client4_0_mkdir_cbk]
> 0-g-volume-myvolume-client-4: remote operation failed. [{path=(null)},
> {errno=22}, {error=Invalid argument}]
> [2022-09-14 10:56:50.614082 +] E [MSGID: 114031]
> [client-rpc-fops_v2.c:214:client4_0_mkdir_cbk]
> 0-g-volume-myvolume-client-2: remote operation failed. [{path=(null)},
> {errno=22}, {error=Invalid argument}]
> [2022-09-14 10:56:50.614108 +] E [MSGID: 114031]
> [client-rpc-fops_v2.c:214:client4_0_mkdir_cbk]
> 0-g-volume-myvolume-client-3: remote operation failed. [{path=(null)},
> {errno=22}, {error=Invalid argument}]
> [2022-09-14 10:56:50.614099 +] E [MSGID: 114031]
> [client-rpc-fops_v2.c:214:client4_0_mkdir_cbk]
> 0-g-volume-myvolume-client-4: remote operation failed. [{path=(null)},
> {errno=22}, {error=Invalid argument}]
> [2022-09-14 10:56:51.619623 +] E [MSGID: 114031]
> [client-rpc-fops_v2.c:214:client4_0_mkdir_cbk]
> 0-g-volume-myvolume-client-2: remote operation failed. [{path=(null)},
> {errno=22}, {error=Invalid argument}]
> [2022-09-14 10:56:51.619630 +] E [MSGID: 114031]
> [client-rpc-fops_v2.c:214:client4_0_mkdir_cbk]
> 0-g-volume-myvolume-client-3: remote operation failed. [{path=(null)},
> {errno=22}, {error=Invalid argument}]
> [2022-09-14 10:56:51.619632 +] E [MSGID: 114031]
> [client-rpc-fops_v2.c:214:client4_0_mkdir_cbk]
> 0-g-volume-myvolume-client-4: remote operation failed. [{path=(null)},
> {errno=22}, {error=Invalid argument}]
>
> The gluster is running with opversion 9 on CentOS. There are no
> entries in split brain.
>
> How can i get these files finally healed?
>
> Thanks in advance.
> 

I've seen this too. The only way I've found to fix it is to run a find under
each of my bricks, run getfattr -n trusted.gfid -e hex on all the
files, save the output to a text file, and then grep for the
problematic gfids to identify which files they are. Accessing the files
through the gluster fuse mount can sometimes heal them, but I've had
symlinks I just had to rm and recreate, and other files that were simply
failed removals, existing on only one brick, that had
to be removed by hand. This happens often enough that I wrote a script that
traverses all files under a brick and removes a given file from
the brick along with its gfid hard link under .glusterfs. I can dig it up if
you're still interested; I don't have it handy at the moment.
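
Since the question comes up a lot, the gfid-to-path part of that is roughly the
following (a sketch, not the actual script; BRICK is a placeholder and the gfid
is just the first one from your log):

  BRICK=/mnt/bricks/g-volume-myvolume           # brick root (placeholder)
  GFID=48791313-e5e7-44df-bf99-3ebc8d4cf5d5     # one of the stuck gfids from heal info

  # record the gfid of every file on the brick, skipping the .glusterfs tree
  find "$BRICK" -path "$BRICK/.glusterfs" -prune -o -type f -print0 \
    | xargs -0 getfattr --absolute-names -n trusted.gfid -e hex > /tmp/gfids.txt
  # getfattr prints the hex without dashes, so strip them before grepping;
  # the "# file: ..." line just above the match is the path on the brick
  grep -B1 "${GFID//-/}" /tmp/gfids.txt

  # if the file turns out to be stale, remove it on the brick together with its
  # hard link under .glusterfs (the first two byte pairs of the gfid are the subdirs):
  # rm "$BRICK/path/to/file" "$BRICK/.glusterfs/48/79/$GFID"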






[Gluster-users] heal failure after bricks go down

2022-08-03 Thread Eli V
Here is the sequence of events that ended up with 2 bricks down and a heal
failure. What should I do about the heal failure, and should I do it before or
after replacing the bad disk? First, the gluster 10.2 volume info:

Volume Name: glust-distr-rep
Type: Distributed-Replicate
Volume ID: fe0ea6f6-2d1b-4b5c-8af5-0c11ea546270
Status: Started
Snapshot Count: 0
Number of Bricks: 9 x 2 = 18
Transport-type: tcp
Bricks:
Brick1: md1cfsd01:/bricks/b0/br
Brick2: md1cfsd02:/bricks/b0/br
Brick3: md1cfsd03:/bricks/b0/br
Brick4: md1cfsd01:/bricks/b3/br
Brick5: md1cfsd02:/bricks/b3/br
Brick6: md1cfsd03:/bricks/b3/br
Brick7: md1cfsd01:/bricks/b1/br
Brick8: md1cfsd02:/bricks/b1/br
Brick9: md1cfsd03:/bricks/b1/br
Brick10: md1cfsd01:/bricks/b4/br
Brick11: md1cfsd02:/bricks/b4/br
Brick12: md1cfsd03:/bricks/b4/br
Brick13: md1cfsd01:/bricks/b2/br
Brick14: md1cfsd02:/bricks/b2/br
Brick15: md1cfsd03:/bricks/b2/br
Brick16: md1cfsd01:/bricks/b5/br
Brick17: md1cfsd02:/bricks/b5/br
Brick18: md1cfsd03:/bricks/b5/br
Options Reconfigured:
performance.md-cache-statfs: on
cluster.server-quorum-type: server
cluster.min-free-disk: 15
storage.batch-fsync-delay-usec: 0
user.smb: enable
features.cache-invalidation: on
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet

The fun started with a brick (d02:b5) crashing:

[2022-08-02 18:59:29.417147 +] W
[rpcsvc.c:1323:rpcsvc_callback_submit] 0-rpcsvc: transmission of
rpc-request failed
pending frames:
frame : type(1) op(WRITE)
frame : type(1) op(WRITE)
frame : type(1) op(WRITE)
patchset: git://git.gluster.org/glusterfs.git
signal received: 7
time of crash:
2022-08-02 18:59:29 +
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 10.2
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x28a54)[0x7fefb20f7a54]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(gf_print_trace+0x700)[0x7fefb20fffc0]
/lib/x86_64-linux-gnu/libc.so.6(+0x3bd60)[0x7fefb1ecdd60]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(__gf_free+0x5a)[0x7fefb211c7aa]
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_transport_unref+0x9a)[0x7fefb209e4fa]
/usr/lib/x86_64-linux-gnu/glusterfs/10.2/xlator/protocol/server.so(+0xaf4b)[0x7fefac1fff4b]
/usr/lib/x86_64-linux-gnu/glusterfs/10.2/xlator/protocol/server.so(+0xb964)[0x7fefac200964]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(xlator_notify+0x34)[0x7fefb20eb244]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(default_notify+0x1ab)[0x7fefb217cf2b]
...

Then a few hours later there was a read error on a different brick (b2) on the same host:

[2022-08-02 22:04:17.808970 +] E [MSGID: 113040]
[posix-inode-fd-ops.c:1758:posix_readv] 0-glust-distr-rep-posix: read
failed on gfid=16b51498-966e-4546-b561-24b0062f4324,
fd=0x7ff9f00d6b08, offset=663314432 size=16384, buf=0x7ff9fc0f7000
[Input/output error]
[2022-08-02 22:04:17.809057 +] E [MSGID: 115068]
[server-rpc-fops_v2.c:1369:server4_readv_cbk]
0-glust-distr-rep-server: READ info [{frame=1334746}, {READV_fd_no=4},
{uuid_utoa=16b51498-966e-4546-b561-24b0062f4324},
{client=CTX_ID:6d7535af-769c-4223-aad0-79acffa836ed-GRAPH_ID:0-PID:1414-HOST:r4-16-PC_NAME:glust-distr-rep-client-13-RECON_NO:-1},
{error-xlator=glust-distr-rep-posix}, {errno=5}, {error=Input/output
error}]

This looks like a real hardware error:
[Tue Aug  2 18:03:48 2022] megaraid_sas :03:00.0: 6293
(712778647s/0x0002/FATAL) - Unrecoverable medium error during recovery
on PD 04(e0x20/s4) at 1d267163
[Tue Aug  2 18:03:49 2022] sd 0:2:3:0: [sdd] tag#435 FAILED Result:
hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=3s
[Tue Aug  2 18:03:49 2022] sd 0:2:3:0: [sdd] tag#435 CDB: Read(10) 28
00 1d 26 70 78 00 01 00 00
[Tue Aug  2 18:03:49 2022] blk_update_request: I/O error, dev sdd,
sector 489058424 op 0x0:(READ) flags 0x80700 phys_seg 9 prio class 0


This morning, noticing both b2 & b5 were offline, I stopped and started
glusterd via systemctl to restart the bricks.
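Restarting glusterd is the blunt approach; if I remember correctly, the
offline bricks can also be restarted per volume, without touching glusterd or
the bricks that are still up:

  # starts only the brick processes of the volume that are not running
  gluster volume start glust-distr-rep force
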
All bricks are now up:
Status of volume: glust-distr-rep
Gluster process TCP Port  RDMA Port  Online  Pid
--
Brick md1cfsd01:/bricks/b0/br   55386 0  Y   2047
Brick md1cfsd02:/bricks/b0/br   59983 0  Y   3036416
Brick md1cfsd03:/bricks/b0/br   58028 0  Y   2014
Brick md1cfsd01:/bricks/b3/br   59454 0  Y   2041
Brick md1cfsd02:/bricks/b3/br   52352 0  Y   3036421
Brick md1cfsd03:/bricks/b3/br   56786 0  Y   2017
Brick md1cfsd01:/bricks/b1/br   59885 0  Y   2040
Brick md1cfsd02:/bricks/b1/br   55148 0  Y   3036434
Brick md1cfsd03:/bricks/b1/br   52422 0  Y   2068
Brick md1cfsd01:/bricks/b4/br   56378 0  Y   2099
Brick md1cfsd02:/bricks/b4/br   60152 0  Y   

Re: [Gluster-users] GlusterFS 9 and Debian 11

2021-09-27 Thread Eli V
It's built for Debian; I can't speak to the docs, but an apt repo is available:
https://download.nfs-ganesha.org/3/3.5/Debian/

On Mon, Sep 27, 2021 at 3:53 AM Eliyahu Rosenberg
 wrote:
>
> Since it seems there are after all some Debian (/debian based) users on this 
> list, can I hijack this thread just a bit and ask about ganesha and glusterfs?
> Is that not built for Debian or is it included in the main package?
>
> I ask because, as far as I can tell, the docs on doing gluster+ganesha refer to
> rpms that don't seem to have deb equivalents, and commands referenced in the
> docs also don't seem to exist for me.
>
> Thanks!
> Eli
>
> On Wed, Sep 22, 2021 at 3:40 PM Kaleb Keithley  wrote:
>>
>>
>> On Wed, Sep 22, 2021 at 7:51 AM Taste-Of-IT  wrote:
>>>
>>> Hi,
>>>
>>> I installed a fresh Debian 11 stable and am using the latest GlusterFS
>>> sources. When installing glusterfs-server I got an error about a missing
>>> libreadline7 package, which is not in Debian 11.
>>>
>>> Is GF 9 not Debian 11 ready?
>>
>>
>> Our Debian 11 box has readline-common 8.1-1 and libreadline8 8.1-1 and 
>> glusterfs 9 builds fine for us.
>>
>> What "latest sources" are you using?
>>
>> --
>>
>> Kaleb






Re: [Gluster-users] Duplicate files after 8.2 -> 8.3 upgrade

2021-01-08 Thread Eli V
Googling around I see this has been happening for years, with
apparently no one ever understanding why. The files are now on 2
bricks of each of the 3 replica nodes in my cluster, and the gfids all
appear identical, per below. I guess I'll plan on deleting a copy
directly under one of the bricks after the heal finishes. I'll be
happy to provide more info or logs if someone is interested in looking
further.

pdsh -w glust0[5-7] getfattr -m . -d -e hex /bricks/b*/br/ |
dshbak | grep gfid
trusted.gfid=0x9c751f5a7f16490dacda32cb7403ccc1
trusted.gfid2path.f2a6b5d7c793e21e=0x39336663336262342d613161332d343039662d613061312d3661646361316232326231662f323139323332335f4252434156322d312d4845524544494356365f53696d62615f3835353233615f4c616e65305f4944543331342e746172
trusted.gfid=0x9c751f5a7f16490dacda32cb7403ccc1
trusted.gfid2path.f2a6b5d7c793e21e=0x39336663336262342d613161332d343039662d613061312d3661646361316232326231662f323139323332335f4252434156322d312d4845524544494356365f53696d62615f3835353233615f4c616e65305f4944543331342e746172
trusted.gfid=0x9c751f5a7f16490dacda32cb7403ccc1
trusted.gfid2path.f2a6b5d7c793e21e=0x39336663336262342d613161332d343039662d613061312d3661646361316232326231662f323139323332335f4252434156322d312d4845524544494356365f53696d62615f3835353233615f4c616e65305f4944543331342e746172
trusted.gfid=0x9c751f5a7f16490dacda32cb7403ccc1
trusted.gfid2path.f2a6b5d7c793e21e=0x39336663336262342d613161332d343039662d613061312d3661646361316232326231662f323139323332335f4252434156322d312d4845524544494356365f53696d62615f3835353233615f4c616e65305f4944543331342e746172
trusted.gfid=0x9c751f5a7f16490dacda32cb7403ccc1
trusted.gfid2path.f2a6b5d7c793e21e=0x39336663336262342d613161332d343039662d613061312d3661646361316232326231662f323139323332335f4252434156322d312d4845524544494356365f53696d62615f3835353233615f4c616e65305f4944543331342e746172
trusted.gfid=0x9c751f5a7f16490dacda32cb7403ccc1
trusted.gfid2path.f2a6b5d7c793e21e=0x39336663336262342d613161332d343039662d613061312d3661646361316232326231662f323139323332335f4252434156322d312d4845524544494356365f53696d62615f3835353233615f4c616e65305f4944543331342e746172
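
For the record, the cleanup I have in mind once the heal settles is roughly
this, on each brick that holds the unwanted copy (a sketch; the brick path and
file path are placeholders, only the gfid above is real):

  BRICK=/bricks/b5/br                           # brick holding the stray copy (placeholder)
  FILE=path/to/the/duplicate.tar                # path relative to the brick (placeholder)
  GFID=9c751f5a-7f16-490d-acda-32cb7403ccc1     # from the getfattr output above

  # remove the extra copy directly on the brick, and with it the hard link
  # kept under .glusterfs (the first two byte pairs of the gfid form the subdirs),
  # otherwise the data stays pinned there:
  rm "$BRICK/$FILE" "$BRICK/.glusterfs/9c/75/$GFID"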

On Fri, Jan 8, 2021 at 1:29 PM Eli V  wrote:
>
> I just upgraded a replica 3 gluster cluster from 8.2 -> 8.3, rebooting
> the nodes one by one. The cluster was completely idle during this
> time. After each reboot things seemed fine, and a gluster volume heal
> vol & info reported no issues. However, about an hour later I started
> seeing duplicate files in ls listings of some of the directories, and
> doing a heal info now shows files undergoing healing. The list of
> healing files does not contain all the duplicates, however. Any ideas
> what's going on, or pointers to debug further?






[Gluster-users] Duplicate files after 8.2 -> 8.3 upgrade

2021-01-08 Thread Eli V
I just upgraded a replica 3 gluster cluster from 8.2 -> 8.3, rebooting
the nodes one by one. The cluster was completely idle during this
time. After each reboot things seemed fine, and a gluster volume heal
vol & info reported no issues. However, about an hour later I started
seeing duplicate files in ls listings of some of the directories, and
doing a heal info now shows files undergoing healing. The list of
healing files does not contain all the duplicates, however. Any ideas
what's going on, or pointers to debug further?






[Gluster-users] Docs on gluster parameters

2020-11-12 Thread Eli V
I think docs.gluster.org needs a section on the available volume parameters,
especially considering how important some of them can be. For example, a
Google search for performance.parallel-readdir or
features.cache-invalidation only seems to turn up hits in the
release notes on docs.gluster.org. I wouldn't expect a new user to have
to read the release notes for all previous releases to understand
the importance of these parameters, or to learn what parameters even exist.
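
In the meantime, the closest thing to a built-in reference that I'm aware of
is the CLI itself (not a substitute for proper docs, just a stopgap):

  # list every settable option with a short description:
  gluster volume set help
  # show the options (including defaults) in effect for a given volume:
  gluster volume get <volname> all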






Re: [Gluster-users] missing files on FUSE mount

2020-10-23 Thread Eli V
On Tue, Oct 20, 2020 at 8:41 AM Martín Lorenzo  wrote:
>
> Hi, I have the following problem: I have a distributed-replicated cluster set
> up with samba and CTDB over fuse mount points.
> I am seeing inconsistencies across the FUSE mounts; users report that files
> are disappearing after being copied/moved. When I take a look at the mount points
> on each node, they don't display the same data.
>
>  faulty mount point
> [root@gluster6 ARRIBA GENTE martes 20 de octubre]# ll
> ls: cannot access PANEO VUELTA A CLASES CON TAPABOCAS.mpg: No such file or 
> directory
> ls: cannot access PANEO NIÑOS ESCUELAS CON TAPABOCAS.mpg: No such file or 
> directory
> total 633723
> drwxr-xr-x. 5 arribagente PN  4096 Oct 19 10:52 COMERCIAL AG martes 20 de 
> octubre
> -rw-r--r--. 1 arribagente PN 648927236 Jun  3 07:16 PANEO FACHADA PALACIO 
> LEGISLATIVO DRONE DIA Y NOCHE.mpg
> -?? ? ?   ?  ?? PANEO NIÑOS ESCUELAS CON 
> TAPABOCAS.mpg
> -?? ? ?   ?  ?? PANEO VUELTA A CLASES CON 
> TAPABOCAS.mpg
>
>
> ###healthy mount point###
> [root@gluster7 ARRIBA GENTE martes 20 de octubre]# ll
> total 3435596
> drwxr-xr-x. 5 arribagente PN   4096 Oct 19 10:52 COMERCIAL AG martes 20 
> de octubre
> -rw-r--r--. 1 arribagente PN  648927236 Jun  3 07:16 PANEO FACHADA PALACIO 
> LEGISLATIVO DRONE DIA Y NOCHE.mpg
> -rw-r--r--. 1 arribagente PN 2084415492 Aug 18 09:14 PANEO NIÑOS ESCUELAS CON 
> TAPABOCAS.mpg
> -rw-r--r--. 1 arribagente PN  784701444 Sep  4 07:23 PANEO VUELTA A CLASES 
> CON TAPABOCAS.mpg
>
>  - So far the only way to solve this is to create a directory in the healthy 
> mount point, on the same path:
> [root@gluster7 ARRIBA GENTE martes 20 de octubre]# mkdir hola
>
> - When you refresh the other mount point, the issue is resolved:
> [root@gluster6 ARRIBA GENTE martes 20 de octubre]# ll
> total 3435600
> drwxr-xr-x. 5 arribagente PN 4096 Oct 19 10:52 COMERCIAL AG martes 20 
> de octubre
> drwxr-xr-x. 2 rootroot   4096 Oct 20 08:45 hola
> -rw-r--r--. 1 arribagente PN648927236 Jun  3 07:16 PANEO FACHADA PALACIO 
> LEGISLATIVO DRONE DIA Y NOCHE.mpg
> -rw-r--r--. 1 arribagente PN   2084415492 Aug 18 09:14 PANEO NIÑOS ESCUELAS 
> CON TAPABOCAS.mpg
> -rw-r--r--. 1 arribagente PN784701444 Sep  4 07:23 PANEO VUELTA A CLASES 
> CON TAPABOCAS.mpg
>
> Interestingly, the error occurs on the mount point where the files were
> copied. They don't show up as pending heal entries. I have around 15 people
> using this over samba, and I have this issue reported roughly every two days.
>
> I have an older cluster with similar issues, different gluster version, but a 
> very similar topology (4 bricks, initially two bricks then expanded)
> Please note , the bricks aren't the same size (but their replicas are), so my 
> other suspicion is that rebalancing has something to do with it.
>
> I'm trying to reproduce it over a small virtualized cluster, so far no 
> results.
>
> Here are the cluster details
> four nodes, replica 2, plus one arbiter hosting 2 bricks
> I have 2 bricks with ~20 TB capacity and the other pair is ~48TB
> Volume Name: tapeless
> Type: Distributed-Replicate
> Volume ID: 53bfa86d-b390-496b-bbd7-c4bba625c956
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 2 x (2 + 1) = 6
> Transport-type: tcp
> Bricks:
> Brick1: gluster6.glustersaeta.net:/data/glusterfs/tapeless/brick_6/brick
> Brick2: gluster7.glustersaeta.net:/data/glusterfs/tapeless/brick_7/brick
> Brick3: 
> kitchen-store.glustersaeta.net:/data/glusterfs/tapeless/brick_1a/brick 
> (arbiter)
> Brick4: gluster12.glustersaeta.net:/data/glusterfs/tapeless/brick_12/brick
> Brick5: gluster13.glustersaeta.net:/data/glusterfs/tapeless/brick_13/brick
> Brick6: 
> kitchen-store.glustersaeta.net:/data/glusterfs/tapeless/brick_2a/brick 
> (arbiter)
> Options Reconfigured:
> features.quota-deem-statfs: on
> performance.client-io-threads: on
> nfs.disable: on
> transport.address-family: inet
> features.quota: on
> features.inode-quota: on
> features.cache-invalidation: on
> features.cache-invalidation-timeout: 600
> performance.cache-samba-metadata: on
> performance.stat-prefetch: on
> performance.cache-invalidation: on
> performance.md-cache-timeout: 600
> network.inode-lru-limit: 20
> performance.nl-cache: on
> performance.nl-cache-timeout: 600
> performance.readdir-ahead: on
> performance.parallel-readdir: on
> performance.cache-size: 1GB
> client.event-threads: 4
> server.event-threads: 4
> performance.normal-prio-threads: 16
> performance.io-thread-count: 32
> performance.write-behind-window-size: 8MB
> storage.batch-fsync-delay-usec: 0
> cluster.data-self-heal: on
> cluster.metadata-self-heal: on
> cluster.entry-self-heal: on
> cluster.self-heal-daemon: on
> performance.write-behind: on
> performance.open-behind: on
>
> Log section from the faulty mount point. I think the [file exists] entries are 
> from people trying to copy the missing files over an 

[Gluster-users] fuse Stale file handle error

2020-04-01 Thread Eli V
I have a directory in a weird state on a Distributed-Replicate volume; the
server is Gluster 7.3, the client is the fuse client 6.6. A script did a mkdir
and then tried to mv a file into the new dir, which failed. An ls -l of it
from the fuse client gives the stale file handle error and the weird
listing:

d? ? ? ? ?? orig

Looking at the bricks directly, the directory exists and appears
normal. So what's the proper way to remove this bad directory? Just
rmdir on all the bricks directly?
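
In case it helps anyone searching later, the cleanup I'd expect to need on
each brick looks roughly like this (a sketch, not verified advice; the brick
root and path are placeholders, and note that directories are tracked by a
symlink under .glusterfs rather than a hard link):

  BRICK=/bricks/b0/br      # repeat on every brick that has the directory (placeholder)
  DIR=some/path/orig       # the stale directory, relative to the brick (placeholder)

  # note the directory's gfid first, so its .glusterfs entry can be removed too;
  # if it prints 0xaabb..., the matching symlink is $BRICK/.glusterfs/aa/bb/<gfid>
  getfattr -n trusted.gfid -e hex "$BRICK/$DIR"

  rmdir "$BRICK/$DIR"      # remove the directory itself on the brick
  # then remove the corresponding gfid symlink under $BRICK/.glusterfs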



