[Gluster-users] Another transaction is in progress

2020-03-15 Thread David Cunningham
Hello,

When checking the status of a volume we often get "Another transaction is
in progress" messages. These are probably correct; however, what we'd really
like is for the command to keep retrying for a while rather than giving up
immediately. Is there a way to configure how long it retries for? We're
running GlusterFS 5.12.
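
To illustrate the behaviour we're after, at the moment we'd have to wrap
the command in a retry loop ourselves - a rough sketch (the volume name and
timeout below are just placeholders, not an official option):

#!/bin/bash
# Retry "gluster volume status" while glusterd reports that another
# transaction holds the lock, up to a deadline.
VOLNAME=myvol
DEADLINE=$(( $(date +%s) + 120 ))   # give up after two minutes
while true; do
    if OUT=$(gluster volume status "$VOLNAME" 2>&1); then
        echo "$OUT"
        exit 0
    fi
    # A different failure - no point in retrying.
    echo "$OUT" | grep -q "Another transaction is in progress" || { echo "$OUT" >&2; exit 1; }
    # Deadline reached - give up.
    [ "$(date +%s)" -ge "$DEADLINE" ] && { echo "Timed out waiting for the lock" >&2; exit 1; }
    sleep 5
done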

Thanks in advance,

-- 
David Cunningham, Voisonics Limited
http://voisonics.com/
USA: +1 213 221 1092
New Zealand: +64 (0)28 2558 3782




Community Meeting Calendar:

Schedule -
Every Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Is rebalance in progress or not?

2020-03-15 Thread Alexander Iliev

On 3/15/20 5:17 PM, Strahil Nikolov wrote:

On March 15, 2020 12:16:51 PM GMT+02:00, Alexander Iliev wrote:

On 3/15/20 11:07 AM, Strahil Nikolov wrote:

On March 15, 2020 11:50:32 AM GMT+02:00, Alexander Iliev wrote:

Hi list,

I was having some issues with one of my Gluster nodes so I ended up
re-installing it. Now I want to re-add the bricks for my main volume and
I'm having the following issue - when I try to add the bricks I get:


# gluster volume add-brick store1 replica 3 
volume add-brick: failed: Pre Validation failed on 172.31.35.132. Volume name store1 rebalance is in progress. Please retry after completion

But then if I check the rebalance status I get:


# gluster volume rebalance store1 status
volume rebalance: store1: failed: Rebalance not started for volume store1.

And if I try to start the rebalancing I get:


# gluster volume rebalance store1 start
volume rebalance: store1: failed: Rebalance on store1 is already started

Looking at the logs of the first node, when I try to start the rebalance
operation I see this:


[2020-03-15 09:41:31.883651] E [MSGID: 106276] [glusterd-rpc-ops.c:1200:__glusterd_stage_op_cbk] 0-management: Received stage RJT from uuid: 9476b8bb-d7ee-489a-b083-875805343e67

On the second node the logs are showing stuff that indicates that a
rebalance operation is indeed in progress:


[2020-03-15 09:47:34.190042] I [MSGID: 109081] [dht-common.c:5868:dht_setxattr] 0-store1-dht: fixing the layout of /redacted
[2020-03-15 09:47:34.775691] I [dht-rebalance.c:3285:gf_defrag_process_dir] 0-store1-dht: migrate data called on /redacted
[2020-03-15 09:47:36.019403] I [dht-rebalance.c:3480:gf_defrag_process_dir] 0-store1-dht: Migration operation on dir /redacted took 1.24 secs


Some background on what led to this situation:

The volume was originally a replica 3 distributed replicated volume on
three nodes. In order to detach the faulty node I lowered the replica
count to 2 and removed the bricks from that node from the volume. I
cleaned up the storage (formatted the bricks and cleaned the
trusted.gfid and trusted.glusterfs.volume-id extended attributes) and
purged the gluster packages from the system, then I re-installed the
gluster packages and did a `gluster peer probe` from another node.

I'm running Gluster 6.6 on CentOS 7.7 on all nodes.

I feel stuck at this point, so any guidance will be greatly
appreciated.

Thanks!

Best regards,


Hey Alex,

Did you try to go to the second node (the one that thinks the balance is
running) and stop the balance?


gluster volume rebalance VOLNAME stop

Then add the new brick (and increase the replica count) and after the heal
is over - rebalance again.

Hey Strahil,

Thanks for the suggestion, I just tried it, but unfortunately the result
is pretty much the same - when I try to stop the rebalance on the second
node it reports that no rebalance is in progress:


# gluster volume rebalance store1 stop
volume rebalance: store1: failed: Rebalance not started for volume store1.



Best Regards,
Strahil Nikolov



Best regards,
--
alexander iliev


Hey Alex,

I'm not sure if the command has a 'force' flag, but if it does - it is
worth trying.

gluster volume rebalance store1 stop force


Hey Strahil,

Thanks again for your suggestions!

According to the `gluster volume rebalance help` output, only the `start`
subcommand supports a force flag. I tried that already; unfortunately it
doesn't help:


# gluster volume rebalance store1 start force
volume rebalance: store1: failed: Rebalance on store1 is already started
# gluster volume rebalance store1 stop
volume rebalance: store1: failed: Rebalance not started for volume store1.


Sadly, as the second node thinks the balance is running, I'm not sure if a
'start force' (to convince both nodes that the balance is running) and then
'stop' will have the expected effect.


The rebalance is indeed running on the second node judging from the 
contents of /var/log/glusterfs/store1-rebalance.log.
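
(In case it helps, the kind of check I mean - the rebalance runs as its own
glusterfs process and keeps writing to that log while it works; the grep
below is just a generic way to spot it:)

# the rebalance daemon shows up as a separate process on the node
ps aux | grep -i rebalance | grep -v grep
# and the log keeps growing while it makes progress
tail -n 20 /var/log/glusterfs/store1-rebalance.log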



Sadly, this situation is hard to reproduce.


In any case, a bug report should be opened.


The thing is I'm not sure if I can provide meaningful steps to reproduce 
at this point. I didn't keep proper track of all the things I attempted, 
so I'm not sure if the bug report I can file would be of much value. :(



Keep in mind that I do not have a distributed volume, so everything above
is pure speculation.


Based on my experience, a gluster upgrade can fix odd situations like that,
but it could also make things worse. So for now avoid any upgrades until a
dev confirms it is safe to do so.


Yeah, I'd rather wait for the rebalance to finish before I make any
further attempts at it. Sadly the storage is backed by rather slow
(spinning) drives, so it might take a while, but even so I'd rather be
safe than sorry. :)





Best Regards,
Strahil Nikolov



Best regards,
--
alexander iliev





Re: [Gluster-users] boot auto mount NFS-Ganesha exports failed

2020-03-15 Thread Soumya Koduri

Hi,

Since it's working on your test machine, this is most likely an NFS
client-side issue. Please check if there are any kernel fixes between
those versions which may have caused this.


I see a similar issue reported in the threads below [1] [2]. As suggested
there, could you try disabling the kerberos module and specifying "sec=sys"
during mount? If the issue persists, please use the "-vvv" option during
mount to get more verbose output.
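
For example, on one of the affected clients (taking the server address and
mount point from your fstab entry below, purely as an illustration):

# mount -t nfs4 -o vers=4.2,sec=sys -vvv 192.168.11.90:/dev /data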



Thanks,
Soumya

[1] 
https://askubuntu.com/questions/854591/nfs-not-mounting-since-upgrading-from-14-04-to-16-04
[2] 
https://superuser.com/questions/1201159/nfs-v4-nfs4-discover-server-trunking-unhandled-error-512-after-reboot/1207346


On 3/13/20 7:08 PM, Renaud Fortier wrote:

Hi community,

Maybe someone could help me with this one: half the time, mounting the
nfs-ganesha NFS4 exports fails at boot. I’ve searched a lot about this
problem but because it doesn’t happen at every boot it’s difficult to
pinpoint the exact problem. It mounts perfectly after boot is completed.


-

-Debian 9 (up to date) on all Gluster servers and clients (4 apache2 web
servers).


-Gluster version 6.7

-NFS-Ganesha version 2.8.3

-NFS 4.2

-fstab example: 192.168.11.90:/dev /data nfs4
noatime,nodiratime,vers=4.2,_netdev 0 0


-NFS-Ganesha export example:

EXPORT {
     Export_Id = 2;
     Path = "/dev";
     Pseudo = "/dev";
     Access_Type = RW;
     Squash = No_root_squash;
     Disable_ACL = true;
     Protocols = "4";
     Transports = "UDP","TCP";
     SecType = "sys";
     FSAL {
          Name = "GLUSTER";
          Hostname = localhost;
          Volume = "dev";
     }
}

-Dmesg log: NFS: nfs4_discover_server_trunking unhandled error -512. Exiting with error EIO


-Systemd log:

     systemd[1]: Failed to mount /data.

     systemd[1]: Dependency failed for Remote File Systems.

     systemd[1]: remote-fs.target: Job remote-fs.target/start failed with result 'dependency'.

     systemd[1]: data.mount: Unit entered failed state.

---

I tried to reproduce the same problem with a test machine but it mounts
perfectly at every reboot. So I’m pretty sure the problem is on my
clients. Also, there is no problem with fuse mounts.


Any help or direction to follow will be greatly appreciated.

Thank you

Renaud Fortier













Community Meeting Calendar:

Schedule -
Every Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Is rebalance in progress or not?

2020-03-15 Thread Alexander Iliev

On 3/15/20 11:07 AM, Strahil Nikolov wrote:

On March 15, 2020 11:50:32 AM GMT+02:00, Alexander Iliev wrote:

Hi list,

I was having some issues with one of my Gluster nodes so I ended up
re-installing it. Now I want to re-add the bricks for my main volume and
I'm having the following issue - when I try to add the bricks I get:


# gluster volume add-brick store1 replica 3 
volume add-brick: failed: Pre Validation failed on 172.31.35.132. Volume name store1 rebalance is in progress. Please retry after completion

But then if I check the rebalance status I get:


# gluster volume rebalance store1 status
volume rebalance: store1: failed: Rebalance not started for volume store1.

And if I try to start the rebalancing I get:


# gluster volume rebalance store1 start
volume rebalance: store1: failed: Rebalance on store1 is already started

Looking at the logs of the first node, when I try to start the rebalance
operation I see this:


[2020-03-15 09:41:31.883651] E [MSGID: 106276] [glusterd-rpc-ops.c:1200:__glusterd_stage_op_cbk] 0-management: Received stage RJT from uuid: 9476b8bb-d7ee-489a-b083-875805343e67

On the second node the logs are showing stuff that indicates that a
rebalance operation is indeed in progress:


[2020-03-15 09:47:34.190042] I [MSGID: 109081] [dht-common.c:5868:dht_setxattr] 0-store1-dht: fixing the layout of /redacted
[2020-03-15 09:47:34.775691] I [dht-rebalance.c:3285:gf_defrag_process_dir] 0-store1-dht: migrate data called on /redacted
[2020-03-15 09:47:36.019403] I [dht-rebalance.c:3480:gf_defrag_process_dir] 0-store1-dht: Migration operation on dir /redacted took 1.24 secs


Some background on what led to this situation:

The volume was originally a replica 3 distributed replicated volume on
three nodes. In order to detach the faulty node I lowered the replica
count to 2 and removed the bricks from that node from the volume. I
cleaned up the storage (formatted the bricks and cleaned the
trusted.gfid and trusted.glusterfs.volume-id extended attributes) and
purged the gluster packages from the system, then I re-installed the
gluster packages and did a `gluster peer probe` from another node.

I'm running Gluster 6.6 on CentOS 7.7 on all nodes.

I feel stuck at this point, so any guidance will be greatly
appreciated.

Thanks!

Best regards,


Hey  Alex,

Did you try to go to the second node (the one that thinks the balance is
running) and stop the balance?

gluster volume rebalance VOLNAME stop

Then add the new brick (and increase the replica count) and after the
heal is over - rebalance again.


Hey Strahil,

Thanks for the suggestion, I just tried it, but unfortunately the result 
is pretty much the same - when I try to stop the rebalance on the second 
node it reports that no rebalance is in progress:


> # gluster volume rebalance store1 stop
> volume rebalance: store1: failed: Rebalance not started for volume store1.




Best Regards,
Strahil Nikolov



Best regards,
--
alexander iliev




Community Meeting Calendar:

Schedule -
Every Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] Is rebalance in progress or not?

2020-03-15 Thread Alexander Iliev

Hi list,

I was having some issues with one of my Gluster nodes so I ended up 
re-installing it. Now I want to re-add the bricks for my main volume and 
I'm having the following issue - when I try to add the bricks I get:


> # gluster volume add-brick store1 replica 3 
> volume add-brick: failed: Pre Validation failed on 172.31.35.132. Volume name store1 rebalance is in progress. Please retry after completion


But then if I check the rebalance status I get:

> # gluster volume rebalance store1 status
> volume rebalance: store1: failed: Rebalance not started for volume store1.


And if I try to start the rebalancing I get:

> # gluster volume rebalance store1 start
> volume rebalance: store1: failed: Rebalance on store1 is already started

Looking at the logs of the first node, when I try to start the rebalance 
operation I see this:


> [2020-03-15 09:41:31.883651] E [MSGID: 106276] [glusterd-rpc-ops.c:1200:__glusterd_stage_op_cbk] 0-management: Received stage RJT from uuid: 9476b8bb-d7ee-489a-b083-875805343e67


On the second node the logs are showing stuff that indicates that a 
rebalance operation is indeed in progress:


> [2020-03-15 09:47:34.190042] I [MSGID: 109081] [dht-common.c:5868:dht_setxattr] 0-store1-dht: fixing the layout of /redacted
> [2020-03-15 09:47:34.775691] I [dht-rebalance.c:3285:gf_defrag_process_dir] 0-store1-dht: migrate data called on /redacted
> [2020-03-15 09:47:36.019403] I [dht-rebalance.c:3480:gf_defrag_process_dir] 0-store1-dht: Migration operation on dir /redacted took 1.24 secs



Some background on what led to this situation:

The volume was originally a replica 3 distributed replicated volume on 
three nodes. In order to detach the faulty node I lowered the replica 
count to 2 and removed the bricks from that node from the volume. I 
cleaned up the storage (formatted the bricks and cleaned the 
trusted.gfid and trusted.glusterfs.volume-id extended attributes) and 
purged the gluster packages from the system, then I re-installed the 
gluster packages and did a `gluster peer probe` from another node.
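
(For reference, the xattr cleanup was along these lines - just a sketch
with a placeholder brick path, in case someone spots a step I missed:)

# run on the re-installed node, once per brick directory
setfattr -x trusted.gfid /bricks/store1/brick1
setfattr -x trusted.glusterfs.volume-id /bricks/store1/brick1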


I'm running Gluster 6.6 on CentOS 7.7 on all nodes.

I feel stuck at this point, so any guidance will be greatly appreciated.

Thanks!

Best regards,
--
alexander iliev




Community Meeting Calendar:

Schedule -
Every Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users