Re: [Gluster-users] Gluster usage scenarios in HPC cluster management

2021-03-23 Thread Zeeshan Ali Shah
Just to add on: we are using Gluster alongside our main Lustre storage for
our k8s cluster.
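
For anyone curious what that looks like on the Kubernetes side, here is a
minimal sketch using the (now-deprecated) in-tree glusterfs volume plugin;
the endpoint IPs, volume name gv0 and size are made-up placeholders, not a
description of our actual setup:

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Endpoints
metadata:
  name: glusterfs-cluster
subsets:
  - addresses:            # IPs of the Gluster servers (placeholders)
      - ip: 10.1.0.5
      - ip: 10.1.0.6
    ports:
      - port: 1           # dummy port, required by the API
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: gluster-pv
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteMany
  glusterfs:
    endpoints: glusterfs-cluster   # the Endpoints object above
    path: gv0                      # Gluster volume name
    readOnly: false
EOF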

On Wed, Mar 24, 2021 at 4:33 AM Ewen Chan  wrote:

> Erik:
>
> I just want to say that I really appreciate you sharing this information
> with us.
>
> I don't think my personal home lab micro cluster environment will get
> complicated enough to warrant a virtualized testing/Gluster development
> setup like yours, but on the other hand, as I mentioned before, I am
> running 100 Gbps InfiniBand, so what I am trying to use Gluster for is
> quite different from what and how most people deploy/install Gluster for
> production systems.
>
> If I wanted to splurge, I'd get a second set of IB cables so that the
> high-speed interconnect layer could be split, with jobs running on one
> layer of the InfiniBand fabric whilst storage/Gluster runs on another.
>
> But for that, I'll have to revamp my entire microcluster, so there are no
> plans to do that just yet.
>
> Thank you.
>
> Sincerely,
> Ewen
>
> --
> *From:* gluster-users-boun...@gluster.org <
> gluster-users-boun...@gluster.org> on behalf of Erik Jacobson <
> erik.jacob...@hpe.com>
> *Sent:* March 23, 2021 10:43 AM
> *To:* Diego Zuccato 
> *Cc:* gluster-users@gluster.org 
> *Subject:* Re: [Gluster-users] Gluster usage scenarios in HPC cluster
> management
>
> > I still have to grasp the "leader node" concept.
> > Weren't gluster nodes "peers"? Or by "leader" you mean that it's
> > mentioned in the fstab entry like
> > /l1,l2,l3:gv0 /mnt/gv0 glusterfs defaults 0 0
> > while the peer list includes l1,l2,l3 and a bunch of other nodes?
>
> Right, it's a list of 24 peers. The 24 peers are split into a
> replicated/distributed (8 x 3 = 24) setup for the volumes. They also have entries
> for themselves as clients in /etc/fstab. I'll dump some volume info
> at the end of this.
>
>
> > > So we would have 24 leader nodes, each leader would have a disk serving
> > > 4 bricks (one of which is simply a lock FS for CTDB, one is sharded,
> > > one is for logs, and one is heavily optimized for non-object expanded
> > > tree NFS). The term "disk" is loose.
> > That's a system way bigger than ours (3 nodes, replica3arbiter1, up to
> > 36 bricks per node).
>
> I have one dedicated "disk" (could be disk, raid lun, single ssd) and
> 4 directories for volumes ("bricks"). Of course, the "ctdb" volume is just
> for the lock and has a single file.
>
> >
> > > Specs of a leader node at a customer site:
> > >  * 256G RAM
> > Glip! 256G for 4 bricks... No wonder I have had troubles running 26
> > bricks in 64GB RAM... :)
>
> I'm not an expert in memory pools or how they would be impacted by more
> peers. I had to do a little research and I think what you're after is
> if I can run gluster volume status cm_shared mem on a real cluster
> that has a decent node count. I will see if I can do that.
>
>
> TEST ENV INFO for those who care
> 
> Here is some info on my own test environment which you can skip.
>
> I have the environment duplicated on my desktop using virtual machines and
> it
> runs fine (slow but fine). It's a 3x1. I take out my giant 8GB cache
> from the optimized volumes but other than that it is fine. In my
> development environment, the gluster disk is a 40G qcow2 image.
>
> Cache sizes changed from 8G to 100M to fit in the VM.
>
> XML snips for memory, cpus:
> 
>   <name>cm-leader1</name>
>   <uuid>99d5a8fc-a32c-b181-2f1a-2929b29c3953</uuid>
>   <memory>3268608</memory>
>   <currentMemory>3268608</currentMemory>
>   <vcpu>2</vcpu>
> ..
>
>
> I have 1 admin (head) node VM, 3 VM leader nodes like above, and one test
> compute node for my development environment.
>
> My desktop where I test this cluster stack is a beefy but not brand new
> desktop:
>
> Architecture:x86_64
> CPU op-mode(s):  32-bit, 64-bit
> Byte Order:  Little Endian
> Address sizes:   46 bits physical, 48 bits virtual
> CPU(s):  16
> On-line CPU(s) list: 0-15
> Thread(s) per core:  2
> Core(s) per socket:  8
> Socket(s):   1
> NUMA node(s):1
> Vendor ID:   GenuineIntel
> CPU family:  6
> Model:   79
> Model name:  Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
> Stepping:1
> CPU MHz: 2594.333
> CPU max MHz: 3000.0000
> CPU min MHz: 1200.0000
> BogoMIPS:4190.22
> Virtualization:  VT-x
> L1d cache:   32K
> L1i cache:   32K
> L2 cache:256K
> L3 cache:20480K
> NUMA node0 CPU(s):   0-15
> 
>
>
> (Not that it matters but this is an HP Z640 Workstation)
>
> 128G memory (good for a desktop, I know, but I think 64G would work since
> I also run a Windows 10 VM environment for unrelated reasons)
>
> I was able to find a MegaRAID in the lab a few years ago and so I have 4
> drives in a MegaRAID and carve off a separate volume for the VM disk
> images. It has a cache. So that's also more beefy than a normal desktop.
> (on the other hand, I have no SSDs. May experimen

Re: [Gluster-users] Gluster usage scenarios in HPC cluster management

2021-03-23 Thread Ewen Chan
Erik:

I just want to say that I really appreciate you sharing this information with 
us.

I don't think my personal home lab micro cluster environment will get
complicated enough to warrant a virtualized testing/Gluster development setup
like yours, but on the other hand, as I mentioned before, I am running 100
Gbps InfiniBand, so what I am trying to use Gluster for is quite different
from what and how most people deploy/install Gluster for production systems.

If I wanted to splurge, I'd get a second set of IB cables so that the
high-speed interconnect layer could be split, with jobs running on one layer
of the InfiniBand fabric whilst storage/Gluster runs on another.

But for that, I'll have to revamp my entire microcluster, so there are no plans 
to do that just yet.

Thank you.

Sincerely,
Ewen


From: gluster-users-boun...@gluster.org  on 
behalf of Erik Jacobson 
Sent: March 23, 2021 10:43 AM
To: Diego Zuccato 
Cc: gluster-users@gluster.org 
Subject: Re: [Gluster-users] Gluster usage scenarios in HPC cluster management

> I still have to grasp the "leader node" concept.
> Weren't gluster nodes "peers"? Or by "leader" you mean that it's
> mentioned in the fstab entry like
> /l1,l2,l3:gv0 /mnt/gv0 glusterfs defaults 0 0
> while the peer list includes l1,l2,l3 and a bunch of other nodes?

Right, it's a list of 24 peers. The 24 peers are split into a
replicated/distributed (8 x 3 = 24) setup for the volumes. They also have entries
for themselves as clients in /etc/fstab. I'll dump some volume info
at the end of this.


> > So we would have 24 leader nodes, each leader would have a disk serving
> > 4 bricks (one of which is simply a lock FS for CTDB, one is sharded,
> > one is for logs, and one is heavily optimized for non-object expanded
> > tree NFS). The term "disk" is loose.
> That's a system way bigger than ours (3 nodes, replica3arbiter1, up to
> 36 bricks per node).

I have one dedicated "disk" (could be disk, raid lun, single ssd) and
4 directories for volumes ("bricks"). Of course, the "ctdb" volume is just
for the lock and has a single file.
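
(For readers wondering how a one-file Gluster volume ends up being "the
lock": CTDB just needs its recovery lock file to live on storage that every
node sees. A sketch, assuming a recent CTDB that reads /etc/ctdb/ctdb.conf
and a hypothetical mount point /mnt/ctdb for that volume:)

# /etc/ctdb/ctdb.conf on every CTDB node -- the mount point is hypothetical
[cluster]
    recovery lock = /mnt/ctdb/.ctdb.lock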

>
> > Specs of a leader node at a customer site:
> >  * 256G RAM
> Glip! 256G for 4 bricks... No wonder I have had troubles running 26
> bricks in 64GB RAM... :)

I'm not an expert in memory pools or how they would be impacted by more
peers. I had to do a little research and I think what you're after is
if I can run gluster volume status cm_shared mem on a real cluster
that has a decent node count. I will see if I can do that.


TEST ENV INFO for those who care

Here is some info on my own test environment which you can skip.

I have the environment duplicated on my desktop using virtual machines and it
runs fine (slow but fine). It's a 3x1. I take out my giant 8GB cache
from the optimized volumes but other than that it is fine. In my
development environment, the gluster disk is a 40G qcow2 image.

Cache sizes changed from 8G to 100M to fit in the VM.

XML snips for memory, cpus:

  <name>cm-leader1</name>
  <uuid>99d5a8fc-a32c-b181-2f1a-2929b29c3953</uuid>
  <memory>3268608</memory>
  <currentMemory>3268608</currentMemory>
  <vcpu>2</vcpu>
..


I have 1 admin (head) node VM, 3 VM leader nodes like above, and one test
compute node for my development environment.

My desktop where I test this cluster stack is a beefy but not brand new
desktop:

Architecture:x86_64
CPU op-mode(s):  32-bit, 64-bit
Byte Order:  Little Endian
Address sizes:   46 bits physical, 48 bits virtual
CPU(s):  16
On-line CPU(s) list: 0-15
Thread(s) per core:  2
Core(s) per socket:  8
Socket(s):   1
NUMA node(s):1
Vendor ID:   GenuineIntel
CPU family:  6
Model:   79
Model name:  Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
Stepping:1
CPU MHz: 2594.333
CPU max MHz: 3000.0000
CPU min MHz: 1200.0000
BogoMIPS:4190.22
Virtualization:  VT-x
L1d cache:   32K
L1i cache:   32K
L2 cache:256K
L3 cache:20480K
NUMA node0 CPU(s):   0-15



(Not that it matters but this is an HP Z640 Workstation)

128G memory (good for a desktop, I know, but I think 64G would work since
I also run a Windows 10 VM environment for unrelated reasons)

I was able to find a MegaRAID in the lab a few years ago and so I have 4
drives in a MegaRAID and carve off a separate volume for the VM disk
images. It has a cache. So that's also more beefy than a normal desktop.
(on the other hand, I have no SSDs. May experiment with that some day
but things work so well now I'm tempted to leave it until something
croaks :)

I keep all VMs for the test cluster with "Unsafe cache mode" since there
is no true data to worry about and it makes the test cases faster.

So I am able to test a complete cluster management stack including
3-leader-gluster servers, an admin, and compute all on my desktop using
virtual machines and shared networks within libvirt/qemu.

[Gluster-users] remote operation failed. [{path=(null)}, {errno=22}, {error=Invalid argument}]

2021-03-23 Thread algol

Hi,

I have just configured a gluster volume.

I can mount it and copy and read files. But from time to time, even without
any user operations, I get lots of errors in the log files (pasted below).


Can anyone please help me figure out what's wrong?

Thanks a lot.

RM

root@srv-31:~# tail -f /var/log/glusterfs/glusterd.log /var/log/glusterfs/glustershd.log /var/log/glusterfs/bricks/data-glusterfs-ssd-brick[12]-brick.log

==> /var/log/glusterfs/bricks/data-glusterfs-ssd-brick1-brick.log <==
[2021-03-19 11:45:09.525079 +0000] E [MSGID: 113002] [posix-entry-ops.c:682:posix_mkdir] 0-ssd-volume-posix: gfid is null for (null) [Invalid argument]
[2021-03-19 11:45:09.525212 +0000] E [MSGID: 115056] [server-rpc-fops_v2.c:497:server4_mkdir_cbk] 0-ssd-volume-server: MKDIR info [{frame=11817}, {MKDIR_path=}, {uuid_utoa=00000000-0000-0000-0000-000000000001}, {bname=}, {client=CTX_ID:4b47408f-323c-4c6a-9a20-2ae2a3a2cdb8-GRAPH_ID:3-PID:2291-HOST:srv-32-PC_NAME:ssd-volume-client-0-RECON_NO:-0}, {error-xlator=ssd-volume-posix}, {errno=22}, {error=Invalid argument}]

==> /var/log/glusterfs/bricks/data-glusterfs-ssd-brick2-brick.log <==
[2021-03-19 11:45:13.525581 +0000] E [MSGID: 113002] [posix-entry-ops.c:682:posix_mkdir] 0-ssd-volume-posix: gfid is null for (null) [Invalid argument]
[2021-03-19 11:45:13.525711 +0000] E [MSGID: 115056] [server-rpc-fops_v2.c:497:server4_mkdir_cbk] 0-ssd-volume-server: MKDIR info [{frame=10055}, {MKDIR_path=}, {uuid_utoa=00000000-0000-0000-0000-000000000001}, {bname=}, {client=CTX_ID:4b47408f-323c-4c6a-9a20-2ae2a3a2cdb8-GRAPH_ID:3-PID:2291-HOST:srv-32-PC_NAME:ssd-volume-client-3-RECON_NO:-0}, {error-xlator=ssd-volume-posix}, {errno=22}, {error=Invalid argument}]

==> /var/log/glusterfs/bricks/data-glusterfs-ssd-brick1-brick.log <==
[2021-03-19 11:45:38.829803 +0000] E [MSGID: 115056] [server-rpc-fops_v2.c:497:server4_mkdir_cbk] 0-ssd-volume-server: MKDIR info [{frame=11820}, {MKDIR_path=}, {uuid_utoa=00000000-0000-0000-0000-000000000001}, {bname=}, {client=CTX_ID:974f4637-64ef-42e6-afad-1dc9c67c4a43-GRAPH_ID:3-PID:2062-HOST:srv-33-PC_NAME:ssd-volume-client-0-RECON_NO:-0}, {error-xlator=ssd-volume-posix}, {errno=22}, {error=Invalid argument}]

==> /var/log/glusterfs/bricks/data-glusterfs-ssd-brick2-brick.log <==
[2021-03-19 11:45:38.829970 +0000] E [MSGID: 115056] [server-rpc-fops_v2.c:497:server4_mkdir_cbk] 0-ssd-volume-server: MKDIR info [{frame=10058}, {MKDIR_path=}, {uuid_utoa=00000000-0000-0000-0000-000000000001}, {bname=}, {client=CTX_ID:974f4637-64ef-42e6-afad-1dc9c67c4a43-GRAPH_ID:3-PID:2062-HOST:srv-33-PC_NAME:ssd-volume-client-3-RECON_NO:-0}, {error-xlator=ssd-volume-posix}, {errno=22}, {error=Invalid argument}]

==> /var/log/glusterfs/bricks/data-glusterfs-ssd-brick1-brick.log <==
[2021-03-19 11:45:38.829721 +0000] E [MSGID: 113002] [posix-entry-ops.c:682:posix_mkdir] 0-ssd-volume-posix: gfid is null for (null) [Invalid argument]

==> /var/log/glusterfs/bricks/data-glusterfs-ssd-brick2-brick.log <==
[2021-03-19 11:45:38.829883 +0000] E [MSGID: 113002] [posix-entry-ops.c:682:posix_mkdir] 0-ssd-volume-posix: gfid is null for (null) [Invalid argument]
[2021-03-19 11:45:50.005995 +0000] E [MSGID: 113002] [posix-entry-ops.c:682:posix_mkdir] 0-ssd-volume-posix: gfid is null for (null) [Invalid argument]
[2021-03-19 11:45:50.006115 +0000] E [MSGID: 115056] [server-rpc-fops_v2.c:497:server4_mkdir_cbk] 0-ssd-volume-server: MKDIR info [{frame=10082}, {MKDIR_path=}, {uuid_utoa=00000000-0000-0000-0000-000000000001}, {bname=}, {client=CTX_ID:4945546f-f368-4fa7-8bfc-3dd7abda5d1b-GRAPH_ID:3-PID:2486-HOST:srv-31-PC_NAME:ssd-volume-client-3-RECON_NO:-0}, {error-xlator=ssd-volume-posix}, {errno=22}, {error=Invalid argument}]

==> /var/log/glusterfs/bricks/data-glusterfs-ssd-brick1-brick.log <==
[2021-03-19 11:45:50.006096 +0000] E [MSGID: 113002] [posix-entry-ops.c:682:posix_mkdir] 0-ssd-volume-posix: gfid is null for (null) [Invalid argument]
[2021-03-19 11:45:50.006212 +0000] E [MSGID: 115056] [server-rpc-fops_v2.c:497:server4_mkdir_cbk] 0-ssd-volume-server: MKDIR info [{frame=11844}, {MKDIR_path=}, {uuid_utoa=00000000-0000-0000-0000-000000000001}, {bname=}, {client=CTX_ID:4945546f-f368-4fa7-8bfc-3dd7abda5d1b-GRAPH_ID:3-PID:2486-HOST:srv-31-PC_NAME:ssd-volume-client-0-RECON_NO:-0}, {error-xlator=ssd-volume-posix}, {errno=22}, {error=Invalid argument}]

==> /var/log/glusterfs/glustershd.log <==
[2021-03-19 11:45:50.006255 +0000] E [MSGID: 114031] [client-rpc-fops_v2.c:214:client4_0_mkdir_cbk] 3-ssd-volume-client-3: remote operation failed. [{path=(null)}, {errno=22}, {error=Invalid argument}]
[2021-03-19 11:45:50.006352 +0000] E [MSGID: 114031] [client-rpc-fops_v2.c:214:client4_0_mkdir_cbk] 3-ssd-volume-client-0: remote operation failed. [{path=(null)}, {errno=22}, {error=Invalid argument}]
[2021-03-19 11:45:50.006408 +0000] E [MSGID: 114031] [client-rpc-fops_v2.c:214:cl

[Gluster-users] Updated invitation: Gluster Community Meeting @ Tue Mar 23, 2021 2:30pm - 3:30pm (IST) (gluster-users@gluster.org)

2021-03-23 Thread ssivakum
BEGIN:VCALENDAR
PRODID:-//Google Inc//Google Calendar 70.9054//EN
VERSION:2.0
CALSCALE:GREGORIAN
METHOD:REQUEST
BEGIN:VTIMEZONE
TZID:Asia/Kolkata
X-LIC-LOCATION:Asia/Kolkata
BEGIN:STANDARD
TZOFFSETFROM:+0530
TZOFFSETTO:+0530
TZNAME:IST
DTSTART:19700101T000000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTART;TZID=Asia/Kolkata:20210323T143000
DTEND;TZID=Asia/Kolkata:20210323T153000
DTSTAMP:20210322T070229Z
ORGANIZER;CN=sajmo...@redhat.com:mailto:sajmo...@redhat.com
UID:044bdru9e1v3uah2jln7j5rebc_r20210223t090...@google.com
ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;PARTSTAT=DECLINED;RSVP=TRUE
 ;CN=pierre-marie.jan...@agoda.com;X-NUM-GUESTS=0:mailto:pierre-marie.janvre
 @agoda.com
ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;PARTSTAT=ACCEPTED;RSVP=TRUE
 ;CN=sajmo...@redhat.com;X-NUM-GUESTS=0;X-RESPONSE-COMMENT="UPDATED THE BRID
 GE TO GOOGLE MEET LINK - meet.google.com/cpu-eiue-hvk\n":mailto:sajmoham@re
 dhat.com
ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;PARTSTAT=DECLINED;RSVP=TRUE
 ;CN=alpha754...@hotmail.com;X-NUM-GUESTS=0:mailto:alpha754...@hotmail.com
ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;PARTSTAT=ACCEPTED;RSVP=TRUE
 ;CN=Sheetal Pamecha;X-NUM-GUESTS=0:mailto:spame...@redhat.com
ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;PARTSTAT=NEEDS-ACTION;RSVP=
 TRUE;CN=Shwetha Acharya;X-NUM-GUESTS=0:mailto:sacha...@redhat.com
ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;PARTSTAT=ACCEPTED;RSVP=TRUE
 ;CN=Deepshikha Khandelwal;X-NUM-GUESTS=0:mailto:dkhan...@redhat.com
ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;PARTSTAT=ACCEPTED;RSVP=TRUE
 ;CN=Sunil Kumar Heggodu Gopala Acharya;X-NUM-GUESTS=0:mailto:sheggodu@redha
 t.com
ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;PARTSTAT=NEEDS-ACTION;RSVP=
 TRUE;CN=Vinayakswami Hariharmath;X-NUM-GUESTS=0:mailto:vhari...@redhat.com
ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;PARTSTAT=ACCEPTED;RSVP=TRUE
 ;CN=bsaso...@redhat.com;X-NUM-GUESTS=0:mailto:bsaso...@redhat.com
ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;PARTSTAT=DECLINED;RSVP=TRUE
 ;CN=Ana Neri;X-NUM-GUESTS=0:mailto:amne...@fb.com
ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;PARTSTAT=ACCEPTED;RSVP=TRUE
 ;CN=ssiva...@redhat.com;X-NUM-GUESTS=0:mailto:ssiva...@redhat.com
ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;PARTSTAT=DECLINED;RSVP=TRUE
 ;CN=Richard Wareing;X-NUM-GUESTS=0:mailto:rware...@fb.com
ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;PARTSTAT=DECLINED;RSVP=TRUE
 ;CN=David Hasson;X-NUM-GUESTS=0:mailto:d...@fb.com
ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;PARTSTAT=TENTATIVE;RSVP=TRU
 E;CN=ch...@redhat.com;X-NUM-GUESTS=0:mailto:ch...@redhat.com
ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;PARTSTAT=TENTATIVE;RSVP=TRU
 E;CN=Ravishankar N;X-NUM-GUESTS=0:mailto:ravishan...@redhat.com
ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;PARTSTAT=ACCEPTED;RSVP=TRUE
 ;CN=a...@kadalu.io;X-NUM-GUESTS=0:mailto:a...@kadalu.io
ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;PARTSTAT=ACCEPTED;RSVP=TRUE
 ;CN=nla...@redhat.com;X-NUM-GUESTS=0:mailto:nla...@redhat.com
ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;PARTSTAT=ACCEPTED;RSVP=TRUE
 ;CN=sankarshan.mukhopadh...@gmail.com;X-NUM-GUESTS=0:mailto:sankarshan.mukh
 opadh...@gmail.com
ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;PARTSTAT=ACCEPTED;RSVP=TRUE
 ;CN=rkoth...@redhat.com;X-NUM-GUESTS=0:mailto:rkoth...@redhat.com
ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;PARTSTAT=TENTATIVE;RSVP=TRU
 E;CN=sunku...@redhat.com;X-NUM-GUESTS=0:mailto:sunku...@redhat.com
ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;PARTSTAT=ACCEPTED;RSVP=TRUE
 ;CN=pranith.karamp...@phonepe.com;X-NUM-GUESTS=0:mailto:pranith.karampuri@p
 honepe.com
ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;PARTSTAT=DECLINED;RSVP=TRUE
 ;CN=Wojciech J. Turek;X-NUM-GUESTS=0:mailto:wj...@cam.ac.uk
ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;PARTSTAT=ACCEPTED;RSVP=TRUE
 ;CN=sasun...@redhat.com;X-NUM-GUESTS=0:mailto:sasun...@redhat.com
ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;PARTSTAT=ACCEPTED;RSVP=TRUE
 ;CN=tshac...@redhat.com;X-NUM-GUESTS=0:mailto:tshac...@redhat.com
ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;PARTSTAT=TENTATIVE;RSVP=TRU
 E;CN=pueb...@redhat.com;X-NUM-GUESTS=0:mailto:pueb...@redhat.com
ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;PARTSTAT=NEEDS-ACTION;RSVP=
 TRUE;CN=neesi...@redhat.com;X-NUM-GUESTS=0:mailto:neesi...@redhat.com
ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;PARTSTAT=NEEDS-ACTION;RSVP=
 TRUE;CN=aujj...@redhat.com;X-NUM-GUESTS=0:mailto:aujj...@redhat.com
ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;PARTSTAT=ACCEPTED;RSVP=TRUE
 ;CN=rafi.kavun...@iternity.com;X-NUM-GUESTS=0:mailto:rafi.kavungal@iternity
 .com
ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;PARTSTAT=NEEDS-ACTION;RSVP=
 TRUE;CN=gluster-users@gluster.org;X-NUM-GUESTS=0:mailto:gluster-users@glust
 er.org
ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;PARTSTAT=ACCEPTED;RSVP=TRUE
 ;CN=gluster-de...@gluster.org;X-NUM-GUESTS=0:mailto:gluster-devel@gluste

Re: [Gluster-users] Gluster usage scenarios in HPC cluster management

2021-03-23 Thread Erik Jacobson
> I still have to grasp the "leader node" concept.
> Weren't gluster nodes "peers"? Or by "leader" you mean that it's
> mentioned in the fstab entry like
> /l1,l2,l3:gv0 /mnt/gv0 glusterfs defaults 0 0
> while the peer list includes l1,l2,l3 and a bunch of other nodes?

Right, it's a list of 24 peers. The 24 peers are split into a
replicated/distributed (8 x 3 = 24) setup for the volumes. They also have entries
for themselves as clients in /etc/fstab. I'll dump some volume info
at the end of this.
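
(As an aside for anyone copying this pattern: with the FUSE client the usual
way to get volfile-server failover in fstab is the backup-volfile-servers
mount option; the hostnames and volume below are Diego's example ones, not
our real leaders:)

l1:/gv0  /mnt/gv0  glusterfs  defaults,_netdev,backup-volfile-servers=l2:l3  0 0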


> > So we would have 24 leader nodes, each leader would have a disk serving
> > 4 bricks (one of which is simply a lock FS for CTDB, one is sharded,
> > one is for logs, and one is heavily optimized for non-object expanded
> > tree NFS). The term "disk" is loose.
> That's a system way bigger than ours (3 nodes, replica3arbiter1, up to
> 36 bricks per node).

I have one dedicated "disk" (could be disk, raid lun, single ssd) and
4 directories for volumes ("bricks"). Of course, the "ctdb" volume is just
for the lock and has a single file.
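
(To make that concrete, here is a rough sketch of how one such volume gets
created; the brick paths follow the /data/brick_<volname> pattern shown
further down, but the leader hostnames are invented and only 6 of the 24
leaders are written out:)

# each leader contributes one directory per volume from its dedicated disk;
# with replica 3, every 3 consecutive bricks form one replica set, so across
# all 24 leaders this becomes an 8 x 3 distributed-replicate volume
gluster volume create cm_shared replica 3 \
    leader1:/data/brick_cm_shared leader2:/data/brick_cm_shared \
    leader3:/data/brick_cm_shared leader4:/data/brick_cm_shared \
    leader5:/data/brick_cm_shared leader6:/data/brick_cm_shared
gluster volume start cm_shared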

> 
> > Specs of a leader node at a customer site:
> >  * 256G RAM
> Glip! 256G for 4 bricks... No wonder I have had troubles running 26
> bricks in 64GB RAM... :)

I'm not an expert in memory pools or how they would be impacted by more
peers. I had to do a little research and I think what you're after is
if I can run gluster volume status cm_shared mem on a real cluster
that has a decent node count. I will see if I can do that.
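
(For reference, that is just the following; the "Mallinfo" block further
down is what its per-brick output looks like:)

# per-brick mallinfo and mem-pool statistics for one volume
gluster volume status cm_shared mem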


TEST ENV INFO for those who care

Here is some info on my own test environment which you can skip.

I have the environment duplicated on my desktop using virtual machines and it
runs fine (slow but fine). It's a 3x1. I take out my giant 8GB cache
from the optimized volumes but other than that it is fine. In my
development environment, the gluster disk is a 40G qcow2 image.

Cache sizes changed from 8G to 100M to fit in the VM.
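
(That is a one-line volume option change; a sketch, assuming the cache in
question is performance.cache-size and reusing the cm_shared volume name
from above:)

# production leaders run this at 8GB; the development VMs use 100MB
gluster volume set cm_shared performance.cache-size 100MB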

XML snips for memory, cpus:

  <name>cm-leader1</name>
  <uuid>99d5a8fc-a32c-b181-2f1a-2929b29c3953</uuid>
  <memory>3268608</memory>
  <currentMemory>3268608</currentMemory>
  <vcpu>2</vcpu>
..


I have 1 admin (head) node VM, 3 VM leader nodes like above, and one test
compute node for my development environment.

My desktop where I test this cluster stack is a beefy but not brand new
desktop:

Architecture:x86_64
CPU op-mode(s):  32-bit, 64-bit
Byte Order:  Little Endian
Address sizes:   46 bits physical, 48 bits virtual
CPU(s):  16
On-line CPU(s) list: 0-15
Thread(s) per core:  2
Core(s) per socket:  8
Socket(s):   1
NUMA node(s):1
Vendor ID:   GenuineIntel
CPU family:  6
Model:   79
Model name:  Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
Stepping:1
CPU MHz: 2594.333
CPU max MHz: 3000.0000
CPU min MHz: 1200.0000
BogoMIPS:4190.22
Virtualization:  VT-x
L1d cache:   32K
L1i cache:   32K
L2 cache:256K
L3 cache:20480K
NUMA node0 CPU(s):   0-15



(Not that it matters but this is an HP Z640 Workstation)

128G memory (good for a desktop, I know, but I think 64G would work since
I also run a Windows 10 VM environment for unrelated reasons)

I was able to find a MegaRAID in the lab a few years ago and so I have 4
drives in a MegaRAID and carve off a separate volume for the VM disk
images. It has a cache. So that's also more beefy than a normal desktop.
(on the other hand, I have no SSDs. May experiment with that some day
but things work so well now I'm tempted to leave it until something
croaks :)

I keep all VMs for the test cluster with "Unsafe cache mode" since there
is no true data to worry about and it makes the test cases faster.
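
("Unsafe cache mode" is the libvirt/qemu disk cache setting; in the domain
XML of each test VM the disk's driver element looks roughly like this:)

<driver name='qemu' type='qcow2' cache='unsafe'/>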

So I am able to test a complete cluster management stack including
3-leader-gluster servers, an admin, and compute all on my desktop using
virtual machines and shared networks within libvirt/qemu.

It is so much easier to do development when you don't have to reserve
scarce test clusters and compete with people. I can do 90% of my cluster
development work this way. Things fall over when I need to care about
BMCs/ILOs or need to do performance testing of course. Then I move to
real hardware and play the hunger-games-of-internal-test-resources :) :)

I mention all this just to show that the beefy servers are not needed
nor the memory usage high. I'm not continually swapping or anything like
that.




Configuration Info from Real Machine


Some info on an active 3x3 cluster. 2738 compute nodes.

The most active volume here is "cm_obj_sharded". It is where the image
objects live and this cluster uses image objects for compute node root
filesystems. I changed the IP addresses by hand (so please forgive any
errors I may have introduced doing that).
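
("Sharded" here means the shard translator is enabled on that volume so that
large image objects are split into fixed-size pieces; roughly the following,
with a made-up block size since ours isn't shown in this mail:)

gluster volume set cm_obj_sharded features.shard on
gluster volume set cm_obj_sharded features.shard-block-size 64MB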


Memory status for volume : cm_obj_sharded
--
Brick : 10.1.0.5:/data/brick_cm_obj_sharded
Mallinfo

Arena: 20676608
Ordblks  : 2077
Smblks   : 518
Hblks: 17
Hblkhd   : 173506

Re: [Gluster-users] Gluster usage scenarios in HPC cluster management

2021-03-23 Thread Yaniv Kaul
On Tue, Mar 23, 2021 at 10:02 AM Diego Zuccato 
wrote:

> On 22/03/21 16:54, Erik Jacobson wrote:
>
> > So if you had 24 leaders like HLRS, there would be 8 replica-3 at the
> > bottom layer, and then distributed across. (replicated/distributed
> > volumes)
> I still have to grasp the "leader node" concept.
> Weren't gluster nodes "peers"? Or by "leader" you mean that it's
> mentioned in the fstab entry like
> /l1,l2,l3:gv0 /mnt/gv0 glusterfs defaults 0 0
> while the peer list includes l1,l2,l3 and a bunch of other nodes?
>
> > So we would have 24 leader nodes, each leader would have a disk serving
> > 4 bricks (one of which is simply a lock FS for CTDB, one is sharded,
> > one is for logs, and one is heavily optimized for non-object expanded
> > tree NFS). The term "disk" is loose.
> That's a system way bigger than ours (3 nodes, replica3arbiter1, up to
> 36 bricks per node).
>
> > Specs of a leader node at a customer site:
> >  * 256G RAM
> Glip! 256G for 4 bricks... No wonder I have had troubles running 26
> bricks in 64GB RAM... :)
>

If you can recompile Gluster, you may want to experiment with disabling
memory pools - this should save you some memory.
Y.
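
(A sketch of one way to try that, assuming the GF_DISABLE_MEMPOOL
compile-time switch in libglusterfs is still honoured by the release you
build; check mem-pool.c in your tree before relying on it:)

git clone https://github.com/gluster/glusterfs.git && cd glusterfs
./autogen.sh
CFLAGS="-O2 -g -DGF_DISABLE_MEMPOOL" ./configure
make -j"$(nproc)" && sudo make install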

>
> --
> Diego Zuccato
> DIFA - Dip. di Fisica e Astronomia
> Servizi Informatici
> Alma Mater Studiorum - Università di Bologna
> V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
> tel.: +39 051 20 95786
> 




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Gluster usage scenarios in HPC cluster management

2021-03-23 Thread Diego Zuccato
On 22/03/21 16:54, Erik Jacobson wrote:

> So if you had 24 leaders like HLRS, there would be 8 replica-3 at the
> bottom layer, and then distributed across. (replicated/distributed
> volumes)
I still have to grasp the "leader node" concept.
Weren't gluster nodes "peers"? Or by "leader" you mean that it's
mentioned in the fstab entry like
/l1,l2,l3:gv0 /mnt/gv0 glusterfs defaults 0 0
while the peer list includes l1,l2,l3 and a bunch of other nodes?

> So we would have 24 leader nodes, each leader would have a disk serving
> 4 bricks (one of which is simply a lock FS for CTDB, one is sharded,
> one is for logs, and one is heavily optimized for non-object expanded
> tree NFS). The term "disk" is loose.
That's a system way bigger than ours (3 nodes, replica3arbiter1, up to
36 bricks per node).

> Specs of a leader node at a customer site:
>  * 256G RAM
Glip! 256G for 4 bricks... No wonder I have had troubles running 26
bricks in 64GB RAM... :)

-- 
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users