Re: [Gluster-users] qemu raw image file - qemu and grub2 can't find boot content from VM

2021-02-01 Thread Mahdi Adnan
Thanks for the feedback. 7.9 is really stable; in fact, it is so stable
that we might not even upgrade to 8.x for some time.

Re: [Gluster-users] qemu raw image file - qemu and grub2 can't find boot content from VM

2021-02-01 Thread Erik Jacobson
We think this fixed it. While there is an element of random chance involved,
we can't repeat it on 7.9. So I'll close this thread out for now.

We'll ask for help again if needed. Thanks for all the kind responses,

Erik


Re: [Gluster-users] qemu raw image file - qemu and grub2 can't find boot content from VM

2021-01-29 Thread Erik Jacobson
I updated to 7.9, rebooted everything, and it started working.

I will have QE try to break it again and report back. I couldn't break
it, but they're better at breaking things (which is hard to imagine :)


Re: [Gluster-users] qemu raw image file - qemu and grub2 can't find boot content from VM

2021-01-29 Thread Erik Jacobson
Thank you.

We reproduced the problem after force-killing one of the 3 physical
nodes 6 times in a row.

At that point, grub2 loaded off the qemu virtual hard drive, but
could not find partitions. Since there is random luck involved, we don't
actually know if it was the force-killing that caused it to stop
working.

When I start the VM with the image in this state, there is nothing
interesting in the fuse log for the volume in /var/log/glusterfs on the
node hosting the image.

No pending heals (all servers report 0 entries to heal).

The same VM behavior happens on all the physical nodes when I try to
start with the same VM image.

Something from the gluster fuse mount log from earlier shows:

[2021-01-28 21:24:40.814227] I [MSGID: 114018] 
[client.c:2347:client_rpc_notify] 0-adminvm-client-0: disconnected from 
adminvm-client-0. Client process will keep trying to connect to glusterd until 
brick's port is available
[2021-01-28 21:24:43.815120] I [rpc-clnt.c:1963:rpc_clnt_reconfig] 
0-adminvm-client-0: changing port to 49152 (from 0)
[2021-01-28 21:24:43.815833] I [MSGID: 114057] 
[client-handshake.c:1376:select_server_supported_programs] 0-adminvm-client-0: 
Using Program GlusterFS 4.x v1, Num (1298437), Version (400)
[2021-01-28 21:24:43.817682] I [MSGID: 114046] 
[client-handshake.c:1106:client_setvolume_cbk] 0-adminvm-client-0: Connected to 
adminvm-client-0, attached to remote volume '/data/brick_adminvm'.
[2021-01-28 21:24:43.817709] I [MSGID: 114042] 
[client-handshake.c:930:client_post_handshake] 0-adminvm-client-0: 1 fds open - 
Delaying child_up until they are re-opened
[2021-01-28 21:24:43.895163] I [MSGID: 114041] 
[client-handshake.c:318:client_child_up_reopen_done] 0-adminvm-client-0: last 
fd open'd/lock-self-heal'd - notifying CHILD-UP
The message "W [MSGID: 114061] [client-common.c:2893:client_pre_lk_v2] 
0-adminvm-client-0:  (94695bdb-06b4-4105-9bc8-b8207270c941) remote_fd is -1. 
EBADFD [File descriptor in bad state]" repeated 6 times between [2021-01-28 
21:23:54.395811] and [2021-01-28 21:23:54.811640]


But that was a long time ago.

Brick logs have an entry from when I first started the VM today (the
problem was reproduced yesterday); all brick logs have something similar.
Nothing appeared on the several other startup attempts of the VM:

[2021-01-28 21:24:45.460147] I [MSGID: 115029] 
[server-handshake.c:549:server_setvolume] 0-adminvm-server: accepted client 
from 
CTX_ID:613f0d91-34e6-4495-859f-bca1c9f7af01-GRAPH_ID:0-PID:6287-HOST:nano-1-PC_NAME:adminvm-client-2-RECON_NO:-0
 (version: 7.2) with subvol /data/brick_adminvm
[2021-01-29 18:54:45.48] I [addr.c:54:compare_addr_and_update] 
0-/data/brick_adminvm: allowed = "*", received addr = "172.23.255.153"
[2021-01-29 18:54:45.455802] I [login.c:110:gf_auth] 0-auth/login: allowed user 
names: 3b66cfab-00d5-4b13-a103-93b4cf95e144
[2021-01-29 18:54:45.455815] I [MSGID: 115029] 
[server-handshake.c:549:server_setvolume] 0-adminvm-server: accepted client 
from 
CTX_ID:3774af6b-07b9-437b-a34e-9f71f3b57d03-GRAPH_ID:0-PID:45640-HOST:nano-3-PC_NAME:adminvm-client-2-RECON_NO:-0
 (version: 7.2) with subvol /data/brick_adminvm
[2021-01-29 18:54:45.494950] W [socket.c:774:__socket_rwv] 
0-tcp.adminvm-server: readv on 172.23.255.153:48551 failed (No data available)
[2021-01-29 18:54:45.494994] I [MSGID: 115036] [server.c:501:server_rpc_notify] 
0-adminvm-server: disconnecting connection from 
CTX_ID:3774af6b-07b9-437b-a34e-9f71f3b57d03-GRAPH_ID:0-PID:45640-HOST:nano-3-PC_NAME:adminvm-client-2-RECON_NO:-0
[2021-01-29 18:54:45.495091] I [MSGID: 101055] [client_t.c:436:gf_client_unref] 
0-adminvm-server: Shutting down connection 
CTX_ID:3774af6b-07b9-437b-a34e-9f71f3b57d03-GRAPH_ID:0-PID:45640-HOST:nano-3-PC_NAME:adminvm-client-2-RECON_NO:-0



Like before, if I halt the VM, kpartx the image, mount the giant root
within the image, then unmount, unkpartx, and start the VM - it works:

nano-2:/var/log/glusterfs # kpartx -a /adminvm/images/adminvm.img
nano-2:/var/log/glusterfs # mount /dev/mapper/loop0p31 /mnt
nano-2:/var/log/glusterfs # dmesg|tail -3
[85528.602570] loop: module loaded
[85535.975623] EXT4-fs (dm-3): recovery complete
[85535.979663] EXT4-fs (dm-3): mounted filesystem with ordered data mode. Opts: 
(null)
nano-2:/var/log/glusterfs # umount /mnt
nano-2:/var/log/glusterfs # kpartx -d /adminvm/images/adminvm.img
loop deleted : /dev/loop0

VM WORKS for ONE boot cycle on one physical!

nano-2:/var/log/glusterfs # virsh start adminvm

However, this will work for a boot but later it will stop working again
(INCLUDING on the physical node that booted once OK. The next boot fails
again, as does launching it on the other two).

Based on feedback, I will not change the shard size at this time and
will leave that for later. Some people suggest larger sizes, but it isn't
a universal suggestion. I'll also not attempt to make a logical volume
out of a group of smaller images, as I think it should work as it is.
Those are things I will try later if 

Re: [Gluster-users] qemu raw image file - qemu and grub2 can't find boot content from VM

2021-01-27 Thread Mahdi Adnan
I would leave it at 64M on volumes with spindle disks, but with SSD
volumes I would increase it to 128M or even 256M; it varies from one
workload to another.
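
For reference, the shard size is a per-volume option and only applies to
files created after it is set; for example (volume name taken from this
thread):

gluster volume set adminvm features.shard-block-size 128MB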


-- 
Respectfully
Mahdi






Re: [Gluster-users] qemu raw image file - qemu and grub2 can't find boot content from VM

2021-01-27 Thread Erik Jacobson
> Also, I would like to point out that I have VMs with large disks, 1TB and
> 2TB, and have no issues. I would definitely upgrade the Gluster version
> to, let's say, at least 7.9.

Great! Thank you! We can update, but it's very sensitive due to the
workload. I can't officially update our gluster until we have a cluster
with a couple thousand nodes to test with. However, for this problem,
updating is on my list for the test machine. I'm hoping I can reproduce
it. So far no luck making it happen again. Once I hit it, I will try to
collect more data and at the end update gluster.

What do you think about the suggestion to increase the shard size? Are
you using the default size on your 1TB and 2TB images?
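
For reference, the value currently in effect can be read back with the
following (volume name as used elsewhere in this thread); the default
reported is 64MB:

gluster volume get adminvm features.shard-block-size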

> Amar also asked a question regarding enabling Sharding in the volume after
> creating the VM disks, which would certainly mess up the volume if that is
> what happened.

Oh, I missed this question. I basically scripted it quickly since I was
doing it so often. I have a similar script that tears it all down to
start over (see the sketch after the script below).

set -x
# create the brick directories on all three servers
pdsh -g gluster mkdir /data/brick_adminvm/
# replica-3 volume across the three servers
gluster volume create adminvm replica 3 transport tcp \
    172.23.255.151:/data/brick_adminvm \
    172.23.255.152:/data/brick_adminvm \
    172.23.255.153:/data/brick_adminvm
gluster volume set adminvm group virt
gluster volume set adminvm granular-entry-heal enable
gluster volume set adminvm storage.owner-uid 439
gluster volume set adminvm storage.owner-gid 443
gluster volume start adminvm

# fuse-mount the volume on all servers (presumably via fstab entries)
pdsh -g gluster mount /adminvm

echo -n "press enter to continue for restore tarball"

pushd /adminvm
tar xvf /root/backup.tar
popd

echo -n "press enter to continue for qemu-img"

# preallocate the full 5T raw image on the volume
pushd /adminvm
qemu-img create -f raw -o preallocation=falloc /adminvm/images/adminvm.img 5T
popd
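
A teardown counterpart would look roughly like this; a sketch based on the
create script above, not the actual script:

set -x
pdsh -g gluster umount /adminvm
gluster --mode=script volume stop adminvm    # --mode=script skips the y/n prompts
gluster --mode=script volume delete adminvm
pdsh -g gluster rm -rf /data/brick_adminvm/  # removes the brick contents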


Thanks again for the kind responses,

Erik







Re: [Gluster-users] qemu raw image file - qemu and grub2 can't find boot content from VM

2021-01-27 Thread Mahdi Adnan
I think the following messages are not harmful:

[2021-01-26 19:28:40.652898] W [MSGID: 101159] [inode.c:1212:__inode_unlink] 0-inode: be318638-e8a0-4c6d-977d-7a937aa84806/48bb5288-e27e-46c9-9f7c-944a804df361.1: dentry not found in 48bb5288-e27e-46c9-9f7c-944a804df361
[2021-01-26 19:28:40.652975] W [MSGID: 101159] [inode.c:1212:__inode_unlink] 0-inode: be318638-e8a0-4c6d-977d-7a937aa84806/931508ed-9368-4982-a53e-7187a9f0c1f9.3: dentry not found in 931508ed-9368-4982-a53e-7187a9f0c1f9
[2021-01-26 19:28:40.653047] W [MSGID: 101159] [inode.c:1212:__inode_unlink] 0-inode: be318638-e8a0-4c6d-977d-7a937aa84806/e808ecab-2e70-4ef3-954e-ce1b78ed8b52.4: dentry not found in e808ecab-2e70-4ef3-954e-ce1b78ed8b52
[2021-01-26 19:28:40.653102] W [MSGID: 101159] [inode.c:1212:__inode_unlink] 0-inode: be318638-e8a0-4c6d-977d-7a937aa84806/2c62c383-d869-4655-9c03-f08a86a874ba.6: dentry not found in 2c62c383-d869-4655-9c03-f08a86a874ba
[2021-01-26 19:28:40.653169] W [MSGID: 101159] [inode.c:1212:__inode_unlink] 0-inode: be318638-e8a0-4c6d-977d-7a937aa84806/556ffbc9-bcbe-445a-93f5-13784c5a6df1.2: dentry not found in 556ffbc9-bcbe-445a-93f5-13784c5a6df1
[2021-01-26 19:28:40.653218] W [MSGID: 101159] [inode.c:1212:__inode_unlink] 0-inode: be318638-e8a0-4c6d-977d-7a937aa84806/5d414e7c-335d-40da-bb96-6c427181338b.5: dentry not found in 5d414e7c-335d-40da-bb96-6c427181338b
[2021-01-26 19:28:40.653314] W [MSGID: 101159] [inode.c:1212:__inode_unlink] 0-inode: be318638-e8a0-4c6d-977d-7a937aa84806/43364dc9-2d8e-4fca-89d2-e11dee6fcfd4.8: dentry not found in 43364dc9-2d8e-4fca-89d2-e11dee6fcfd4

Also, I would like to point out that I have VMs with large disks, 1TB and
2TB, and have no issues. I would definitely upgrade the Gluster version
to, let's say, at least 7.9.
Amar also asked a question regarding enabling Sharding in the volume after
creating the VM disks, which would certainly mess up the volume if that is
what happened.



-- 
Respectfully
Mahdi






Re: [Gluster-users] qemu raw image file - qemu and grub2 can't find boot content from VM

2021-01-27 Thread Erik Jacobson
> > Shortly after the sharded volume is made, there are some fuse mount
> > messages. I'm not 100% sure if this was just before or during the
> > big qemu-img command to make the 5T image
> > (qemu-img create -f raw -o preallocation=falloc
> > /adminvm/images/adminvm.img 5T)
> Any reason to have a single disk of this size?

> Usually, in any
> virtualization I have used, it is always recommended to keep it lower.
> Have you thought about multiple disks of a smaller size?

Yes, because the actual virtual machine is an admin node/head node cluster
manager for a supercomputer that hosts big OS images and drives
multi-thousand-node clusters (boot, monitoring, image creation,
distribution, sometimes NFS roots, etc.). So this VM is a biggie.

We could make multiple smaller images but it would be very painful since
it differs from the normal non-VM setup.

So unlike many solutions where you have lots of small VMs with their own
small images, this solution is one giant VM with one giant image.
We're essentially using gluster in this use case (as opposed to others I
have posted about in the past) for head node failover (combined with
pacemaker).

> Also worth
> noting is that RHHI is supported only when the shard size is 512MB, so
> it's worth trying a bigger shard size.

I have put a larger shard size and a newer gluster version on the list to
try. Thank you! Hoping to get it failing again to try these things!






Re: [Gluster-users] qemu raw image file - qemu and grub2 can't find boot content from VM

2021-01-27 Thread Erik Jacobson
> Are you sure that there are no heals pending at the time of the power up?

I was watching heals when the problem was persisting and it was all
clear. This was a great suggestion though.
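
For reference, the standard checks here are along these lines, with the
volume name used elsewhere in this thread:

gluster volume heal adminvm info
gluster volume heal adminvm info summary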

> I checked my oVirt-based gluster and the only difference is:
> cluster.granular-entry-heal: enable
> The options seem fine.

> > libglusterfs0-7.2-4723.1520.210122T1700.a.sles15sp2hpe.x86_64
> > glusterfs-7.2-4723.1520.210122T1700.a.sles15sp2hpe.x86_64
> > python3-gluster-7.2-4723.1520.210122T1700.a.sles15sp2hpe.noarch
> This one is quite old, although it never caused any troubles with my
> oVirt VMs. Either try with the latest v7 or even v8.3.


I can try a newer version. The issue is we have to do massive testing
with thousands of nodes to validate function, and that isn't always
available. So we tend to latch on to a good one and stage an upgrade
when we have a system big enough in the factory. In this case, though,
the use case is a single VM. If I could find a way to reproduce the
problem, I would know whether upgrading helped. These hard-to-reproduce
problems are painful!! We keep hitting this in places, but triggering it
has been elusive.

THANK YOU for replying. I will continue to try to reproduce the
problem. If I get it back to a consistent fail, I'll try updating gluster
then, take another closer look at the logs, and post them.

Erik






Re: [Gluster-users] qemu raw image file - qemu and grub2 can't find boot content from VM

2021-01-27 Thread Strahil Nikolov
At 17:36 -0600 on 26.01.2021 (Tue), Erik Jacobson wrote:
> Shortly after the sharded volume is made, there are some fuse mount
> messages. I'm not 100% sure if this was just before or during the
> big qemu-img command to make the 5T image
> (qemu-img create -f raw -o preallocation=falloc
> /adminvm/images/adminvm.img 5T)
Any reason to have a single disk of this size? Usually, in any
virtualization I have used, it is always recommended to keep it lower.
Have you thought about multiple disks of a smaller size?

Also worth noting is that RHHI is supported only when the shard size is
512MB, so it's worth trying a bigger shard size.

Best Regards,
Strahil Nikolov







Re: [Gluster-users] qemu raw image file - qemu and grub2 can't find boot content from VM

2021-01-26 Thread Amar Tumballi
Was a volume with existing data converted to a sharded volume?


Re: [Gluster-users] qemu raw image file - qemu and grub2 can't find boot content from VM

2021-01-26 Thread Erik Jacobson
Shortly after the sharded volume is made, there are some fuse mount
messages. I'm not 100% sure if this was just before or during the
big qemu-img command to make the 5T image
(qemu-img create -f raw -o preallocation=falloc
/adminvm/images/adminvm.img 5T)


(from /var/log/glusterfs/adminvm.log)
[2021-01-26 19:18:21.287697] I [fuse-bridge.c:5166:fuse_init] 0-glusterfs-fuse: 
FUSE inited with protocol versions: glusterfs 7.24 kernel 7.31
[2021-01-26 19:18:21.287719] I [fuse-bridge.c:5777:fuse_graph_sync] 0-fuse: 
switched to graph 0
[2021-01-26 19:18:23.945566] W [MSGID: 114031] 
[client-rpc-fops_v2.c:2633:client4_0_lookup_cbk] 0-adminvm-client-2: remote 
operation failed. Path: /.shard/0cb55720-2288-46c2-bd7e-5d9bd23b40bd.7 
(----) [No data available]
[2021-01-26 19:18:54.089721] W [MSGID: 114031] 
[client-rpc-fops_v2.c:2633:client4_0_lookup_cbk] 0-adminvm-client-0: remote 
operation failed. Path: /.shard/0cb55720-2288-46c2-bd7e-5d9bd23b40bd.85 
(----) [No data available]
[2021-01-26 19:18:54.089784] W [MSGID: 114031] 
[client-rpc-fops_v2.c:2633:client4_0_lookup_cbk] 0-adminvm-client-1: remote 
operation failed. Path: /.shard/0cb55720-2288-46c2-bd7e-5d9bd23b40bd.85 
(----) [No data available]
[2021-01-26 19:18:55.048613] W [MSGID: 114031] 
[client-rpc-fops_v2.c:2633:client4_0_lookup_cbk] 0-adminvm-client-1: remote 
operation failed. Path: /.shard/0cb55720-2288-46c2-bd7e-5d9bd23b40bd.88 
(----) [No data available]
[2021-01-26 19:18:55.355131] W [MSGID: 114031] 
[client-rpc-fops_v2.c:2633:client4_0_lookup_cbk] 0-adminvm-client-0: remote 
operation failed. Path: /.shard/0cb55720-2288-46c2-bd7e-5d9bd23b40bd.89 
(----) [No data available]
[2021-01-26 19:18:55.981094] W [MSGID: 114031] 
[client-rpc-fops_v2.c:2633:client4_0_lookup_cbk] 0-adminvm-client-0: remote 
operation failed. Path: /.shard/0cb55720-2288-46c2-bd7e-5d9bd23b40bd.91 
(----) [No data available]
..


Towards the end of the qemu-img create command (or just after; it's hard
to tell), these messages showed up in adminvm.log. I have supplied only
the first few; there were many:


[2021-01-26 19:28:40.652898] W [MSGID: 101159] [inode.c:1212:__inode_unlink] 
0-inode: 
be318638-e8a0-4c6d-977d-7a937aa84806/48bb5288-e27e-46c9-9f7c-944a804df361.1: 
dentry not found in 48bb5288-e27e-46c9-9f7c-944a804df361
[2021-01-26 19:28:40.652975] W [MSGID: 101159] [inode.c:1212:__inode_unlink] 
0-inode: 
be318638-e8a0-4c6d-977d-7a937aa84806/931508ed-9368-4982-a53e-7187a9f0c1f9.3: 
dentry not found in 931508ed-9368-4982-a53e-7187a9f0c1f9
[2021-01-26 19:28:40.653047] W [MSGID: 101159] [inode.c:1212:__inode_unlink] 
0-inode: 
be318638-e8a0-4c6d-977d-7a937aa84806/e808ecab-2e70-4ef3-954e-ce1b78ed8b52.4: 
dentry not found in e808ecab-2e70-4ef3-954e-ce1b78ed8b52
[2021-01-26 19:28:40.653102] W [MSGID: 101159] [inode.c:1212:__inode_unlink] 
0-inode: 
be318638-e8a0-4c6d-977d-7a937aa84806/2c62c383-d869-4655-9c03-f08a86a874ba.6: 
dentry not found in 2c62c383-d869-4655-9c03-f08a86a874ba
[2021-01-26 19:28:40.653169] W [MSGID: 101159] [inode.c:1212:__inode_unlink] 
0-inode: 
be318638-e8a0-4c6d-977d-7a937aa84806/556ffbc9-bcbe-445a-93f5-13784c5a6df1.2: 
dentry not found in 556ffbc9-bcbe-445a-93f5-13784c5a6df1
[2021-01-26 19:28:40.653218] W [MSGID: 101159] [inode.c:1212:__inode_unlink] 
0-inode: 
be318638-e8a0-4c6d-977d-7a937aa84806/5d414e7c-335d-40da-bb96-6c427181338b.5: 
dentry not found in 5d414e7c-335d-40da-bb96-6c427181338b
[2021-01-26 19:28:40.653314] W [MSGID: 101159] [inode.c:1212:__inode_unlink] 
0-inode: 
be318638-e8a0-4c6d-977d-7a937aa84806/43364dc9-2d8e-4fca-89d2-e11dee6fcfd4.8: 
dentry not found in 43364dc9-2d8e-4fca-89d2-e11dee6fcfd4
.


So now I installed Linux into a VM using the above as the VM image.
There were no additional fuse messages while the admin VM was being
installed with our installer (via qemu on the same physical node where
the above messages appeared, and the same node where I ran qemu-img
create).

Rebooted the virtual machine and it booted fine. No new messages in
fuse log. So now it's officially booted. This was 'reboot' so qemu
didn't restart.

halted the vm with 'halt', then in virt-manager did a forced shut down.

started vm from scratch.
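
For reference, the rough virsh equivalents of those steps, with the domain
name used elsewhere in this thread:

virsh shutdown adminvm   # graceful, like running 'halt' in the guest
virsh destroy adminvm    # forced shut down, as done in virt-manager
virsh start adminvm      # start from scratch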

Still no new messages and it booted fine.

Powered off a physical node and brought it back, still fine.
Reset all physical nodes and brought them back, still fine.

I am unable to trigger this problem. However, once it starts to go bad,
it stays bad, and stays bad across all the physical nodes. The trick of
kpartx-mapping the image, mounting the root within it, and unmounting
again is only a temporary fix that doesn't persist beyond one boot once
we're in the bad state.

So something gets into a bad state and stays that way, but we don't know
how to cause it to happen at will. I will continue to try to reproduce
this as it's causing some huge problems in the f

Re: [Gluster-users] qemu raw image file - qemu and grub2 can't find boot content from VM

2021-01-26 Thread Erik Jacobson
Thank you so much for responding! More below.


> Anything in the logs of the fuse mount? Can you stat the file from the mount?
> Also, the report of the image being only 64M makes me think about Sharding,
> as the default shard size is 64M.
> Do you have any clues on when this issue started to happen? Was there any
> operation done to the Gluster cluster?


- I had just created the gluster volumes within an hour of the problem
  to test the very problem I reported. So it was a "fresh start".

- It booted one or two times, then stopped booting. Once it couldn't
  boot, all 3 nodes were the same in that grub2 couldn't boot in the VM
  image.

As for the fuse log, I did see a couple of these before it happened the
first time. I'm not sure if it's a clue or not.

[2021-01-25 22:48:19.310467] I [fuse-bridge.c:5777:fuse_graph_sync] 0-fuse: 
switched to graph 0
[2021-01-25 22:50:09.693958] E [fuse-bridge.c:227:check_and_dump_fuse_W] (--> 
/usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x17a)[0x7f914e346faa] (--> 
/usr/lib64/glusterfs/7.2/xlator/mount/fuse.so(+0x874a)[0x7f914a3d374a] (--> 
/usr/lib64/glusterfs/7.2/xlator/mount/fuse.so(+0x91cb)[0x7f914a3d41cb] (--> 
/lib64/libpthread.so.0(+0x84f9)[0x7f914cf184f9] (--> 
/lib64/libc.so.6(clone+0x3f)[0x7f914c76afbf] ) 0-glusterfs-fuse: writing to 
fuse device failed: No such file or directory
[2021-01-25 22:50:09.694462] E [fuse-bridge.c:227:check_and_dump_fuse_W] (--> 
/usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x17a)[0x7f914e346faa] (--> 
/usr/lib64/glusterfs/7.2/xlator/mount/fuse.so(+0x874a)[0x7f914a3d374a] (--> 
/usr/lib64/glusterfs/7.2/xlator/mount/fuse.so(+0x91cb)[0x7f914a3d41cb] (--> 
/lib64/libpthread.so.0(+0x84f9)[0x7f914cf184f9] (--> 
/lib64/libc.so.6(clone+0x3f)[0x7f914c76afbf] ) 0-glusterfs-fuse: writing to 
fuse device failed: No such file or directory



I have reserved the test system again. My plans today are:
 - Start over with the gluster volume on the machine with sles15sp2
   updates

 - Learn if there are modifications to the image (besides
   mounting/umounting filesystems within the image, using kpartx to map
   them) that force it to work. What if I add/remove a byte from the end
   of the image file, for example? (See the sketch after this list.)

 - Revert the setup to sles15sp2 with no updates. My theory is the
   updates are not making a difference and it's just random chance.
   (re-making the gluster volume in the process)

 - The 64MB shard size made me think too!!

 - If the team feels it is worth it, I could try a newer gluster. We're
   using the versions we've validated at scale when we have large
   clusters in the factory, but if the team thinks I should try something
   else, I'm happy to re-build it!!!  We are at 7.2 plus the
   afr-event-gen-changes patch.
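
Something like the following, run against the fuse-mounted image, would do
for the add/remove-a-byte experiment (a sketch only, untested here):

truncate -s +1 /adminvm/images/adminvm.img   # grow the image by one byte
truncate -s -1 /adminvm/images/adminvm.img   # shrink it back to 5T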

I will keep a better eye on the fuse log to tie an error to the problem
starting.


THANKS AGAIN for responding and let me know if you have any more
clues!

Erik



Re: [Gluster-users] qemu raw image file - qemu and grub2 can't find boot content from VM

2021-01-25 Thread Mahdi Adnan
Hello Erik,

Anything in the logs of the fuse mount? Can you stat the file from the
mount? Also, the report of the image being only 64M makes me think about
Sharding, as the default shard size is 64M.
Do you have any clues on when this issue started to happen? Was there any
operation done to the Gluster cluster?
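
For example, something along these lines, with the image path from the
original post below:

stat /adminvm/images/adminvm.img             # apparent size seen through the mount
qemu-img info /adminvm/images/adminvm.img    # qemu's view of the image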



-- 
Respectfully
Mahdi





[Gluster-users] qemu raw image file - qemu and grub2 can't find boot content from VM

2021-01-25 Thread Erik Jacobson
Hello all. Thanks again for gluster. We're having a strange problem
getting virtual machines started that are hosted on a gluster volume.

One of the ways we use gluster now is to make an HA-ish cluster head
node. A virtual machine runs on the shared storage and is backed by 3
physical servers that contribute to the gluster storage share.

We're using sharding in this volume. The VM image file is around 5T and
we use qemu-img with falloc to get all the blocks allocated in advance.
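
Specifically:

qemu-img create -f raw -o preallocation=falloc /adminvm/images/adminvm.img 5T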

We are not using gfapi, largely because it would mean we have to build
our own libvirt and qemu, and we'd prefer not to do that. So we're using
a glusterfs fuse mount to host the image. The virtual machine is using
virtio disks, but we had similar trouble using scsi emulation.
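
The mount itself is an ordinary fuse mount, something like the following
(server address taken from the brick list below; exact options and fstab
details elided):

mount -t glusterfs 172.23.255.151:/adminvm /adminvm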

The issue: all seems well at first; the VM head node installs, boots, etc.

However, at some point, it stops being able to boot! grub2 acts like it
cannot find /boot. At the grub2 prompt, it can see the partitions, but
reports no filesystem found where there are indeed filesystems.

If we switch qemu to use "direct kernel load" (bypassing grub2), this often
works around the problem, but in one case Linux gave us a clue: Linux
reported /dev/vda as only being 64 megabytes, which would explain a lot.
This means the virtual machine's Linux thought the disk supplied by the
disk image was tiny! 64M instead of 5T.

We are using sles15sp2 and hit the problem more often with updates
applied than without. I'm in the process of trying to isolate if there
is a sles15sp2 update causing this, or if we're within "random chance".

On one of the physical nodes, if it is in the failure mode, if I use
'kpartx' to create the partitions from the image file, then mount the
giant root filesystem (i.e., mount /dev/mapper/loop0p31 /mnt) and then
umount /mnt, then that physical node starts the VM fine, grub2 loads,
the virtual machine is fully happy!  Until I try to shut it down and
start it up again, at which point it sticks at grub2 again! What about
mounting the image file makes it so qemu sees the whole disk?

The problem doesn't always happen, but once it starts, the same VM image
has trouble starting on any of the 3 physical nodes sharing the storage.
But using the trick of force-mounting the root within the image with
kpartx, the machine can come up. My only guess is this changes the
file just a tiny bit in the middle of the image.

Once the problem starts, it keeps happening, except for temporarily working
when I do the loop mount trick on the physical admin node.


Here is some info about what I have in place:


nano-1:/adminvm/images # gluster volume info

Volume Name: adminvm
Type: Replicate
Volume ID: 67de902c-8c00-4dc9-8b69-60b93b5f6104
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 172.23.255.151:/data/brick_adminvm
Brick2: 172.23.255.152:/data/brick_adminvm
Brick3: 172.23.255.153:/data/brick_adminvm
Options Reconfigured:
performance.client-io-threads: on
nfs.disable: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.low-prio-threads: 32
network.remote-dio: enable
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 1
features.shard: on
user.cifs: off
cluster.choose-local: off
client.event-threads: 4
server.event-threads: 4
cluster.granular-entry-heal: enable
storage.owner-uid: 439
storage.owner-gid: 443




libglusterfs0-7.2-4723.1520.210122T1700.a.sles15sp2hpe.x86_64
glusterfs-7.2-4723.1520.210122T1700.a.sles15sp2hpe.x86_64
python3-gluster-7.2-4723.1520.210122T1700.a.sles15sp2hpe.noarch



nano-1:/adminvm/images # uname -a
Linux nano-1 5.3.18-24.46-default #1 SMP Tue Jan 5 16:11:50 UTC 2021 (4ff469b) 
x86_64 x86_64 x86_64 GNU/Linux
nano-1:/adminvm/images # rpm -qa | grep qemu-4
qemu-4.2.0-9.4.x86_64



Would love any advice


Erik




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users