Re: [Gluster-users] Gluster -> Ceph

2023-12-17 Thread Diego Zuccato

On 17/12/2023 14:52, Joe Julian wrote:


From what I've been told (by experts), it's really hard to make that happen, even more 
so if proper redundancy of the MON and MDS daemons is implemented on quality HW.

LSI isn't exactly crap hardware. But when a flaw causes it to drop drives under heavy 
load, the rebalance triggered by the dropped drives generates exactly that kind of heavy 
load, causing a cascading failure. When the journal is never idle long enough to 
checkpoint, it fills its partition and ends up corrupted and unrecoverable.


Good to know. Better to add a monitoring service that stops everything 
when the log partition gets too full.
That also applies to Gluster, BTW, albeit with less severe 
consequences: sometimes "peer files" got lost because /var filled up, 
and glusterd wouldn't come up after a reboot.
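
Something as simple as the sketch below would do as a starting point. It is only 
illustrative: the watched paths, the 90% threshold and the commented-out reaction 
are assumptions, and a real deployment would hook into whatever monitoring you 
already run.

#!/usr/bin/env python3
"""Tiny watchdog: warn (or stop services) before /var or a journal
partition fills up. Paths and threshold are illustrative assumptions."""
import shutil
import sys

WATCHED = ["/var", "/var/lib/glusterd"]   # adjust to your layout
THRESHOLD = 0.90                          # act at 90% used

def usage_fraction(path: str) -> float:
    total, used, _free = shutil.disk_usage(path)
    return used / total

def main() -> int:
    rc = 0
    for path in WATCHED:
        frac = usage_fraction(path)
        if frac >= THRESHOLD:
            print(f"WARNING: {path} is {frac:.0%} full", file=sys.stderr)
            # Possible reaction, deliberately commented out; pick the right
            # action for your site, e.g.:
            # subprocess.run(["systemctl", "stop", "glusterd"], check=False)
            rc = 1
    return rc

if __name__ == "__main__":
    sys.exit(main())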



Neither Gluster nor Ceph is a "backup solution", so if the data is not easily 
replaceable it's better to keep a copy elsewhere, better still offline.

It's a nice idea, but when you're dealing in petabytes of data, streaming in as 
fast as your storage will allow, it's just not physically possible.
Well, it has to stop sometimes, or you'd need infinite storage, 
no? :) Usually data from experiments comes in bursts, with (often large) 
intervals in which you can process/archive it.


--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786






Re: [Gluster-users] Gluster -> Ceph

2023-12-17 Thread Joe Julian



On December 17, 2023 5:40:52 AM PST, Diego Zuccato  wrote:
>On 14/12/2023 16:08, Joe Julian wrote:
>
>> With ceph, if the placement database is corrupted, all your data is lost 
>> (happened to my employer, once, losing 5PB of customer data).
>
>From what I've been told (by experts), it's really hard to make that happen, even more 
>so if proper redundancy of the MON and MDS daemons is implemented on quality HW.
>
LSI isn't exactly crap hardware. But when a flaw causes it to drop drives under heavy 
load, the rebalance triggered by the dropped drives generates exactly that kind of heavy 
load, causing a cascading failure. When the journal is never idle long enough to 
checkpoint, it fills its partition and ends up corrupted and unrecoverable.


>Neither Gluster nor Ceph is a "backup solution", so if the data is not easily 
>replaceable it's better to keep a copy elsewhere, better still offline.
>

It's a nice idea, but when you're dealing in petabytes of data, streaming in as 
fast as your storage will allow, it's just not physically possible.






Re: [Gluster-users] Gluster -> Ceph

2023-12-17 Thread Diego Zuccato

On 14/12/2023 16:08, Joe Julian wrote:

With ceph, if the placement database is corrupted, all your data is lost 
(happened to my employer, once, losing 5PB of customer data).


From what I've been told (by experts), it's really hard to make that 
happen, even more so if proper redundancy of the MON and MDS daemons is 
implemented on quality HW.



With Gluster, it's just files on disks, easily recovered.


I've already had to do it twice in a year, and the coming third time is 
supposed to be the "definitive migration".
The first time there were too many little files; the second time it seemed 
192GB of RAM was not enough to handle 30 bricks per server. Now that I've 
reduced it to just 6 bricks per server (creating RAIDs) and created a brand 
new volume in August, I already find lots of FUSE-inaccessible files that 
don't heal. That should be impossible, since I'm using "replica 3 
arbiter 1" over IPoIB with the three servers talking directly through the 
switch. But it keeps happening. I really trusted Gluster's promises, but 
currently what I (and, worse, the users) see is 60-70% availability.
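
For anyone watching for the same symptom, here is a minimal sketch that counts 
pending self-heal entries by parsing "gluster volume heal <volume> info". The 
volume name "gv0" is a made-up placeholder and the parsing is deliberately naive; 
treat it as a starting point, not a supported tool.

#!/usr/bin/env python3
"""Rough check: how many entries are pending self-heal on a Gluster volume.
The volume name is a placeholder; adjust before use."""
import subprocess
import sys

VOLUME = "gv0"   # hypothetical volume name

def pending_heal_entries(volume: str) -> int:
    out = subprocess.run(
        ["gluster", "volume", "heal", volume, "info"],
        capture_output=True, text=True, check=True,
    ).stdout
    total = 0
    for line in out.splitlines():
        # Each brick section ends with a line like "Number of entries: N"
        if line.startswith("Number of entries:"):
            value = line.split(":", 1)[1].strip()
            if value.isdigit():
                total += int(value)
    return total

if __name__ == "__main__":
    pending = pending_heal_entries(VOLUME)
    print(f"{pending} entries pending heal on volume {VOLUME}")
    sys.exit(1 if pending else 0)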


Neither Gluster nor Ceph is a "backup solution", so if the data is not 
easily replaceable it's better to keep a copy elsewhere, better still offline.


--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786






Re: [Gluster-users] Gluster -> Ceph

2023-12-14 Thread Marcus Pedersén
Thanks for your feedback!
Please do not get me wrong, I really like Gluster
and it has served us well for many, many years.
But the previous posts about the Gluster project's health
worry me, and I want to have a good
alternative prepared in case of ...
Gluster is great and aligns well with our needs;
as mentioned, Ceph is for larger systems.
The problem is that there are not so many other
filesystems that can tick all the boxes we want:
open source with a community, replication, snapshots and so on.

Thanks a lot!!

Marcus


On Thu, Dec 14, 2023 at 07:08:46AM -0800, Joe Julian wrote:
>
>
> A big RAID isn't great as a brick. If the array does fail, the larger brick 
> means much longer heal times.
>
> My main question I ask when evaluating storage solutions is, "what happens 
> when it fails?"
>
> With ceph, if the placement database is corrupted, all your data is lost 
> (happened to my employer, once, losing 5PB of customer data). With Gluster, 
> it's just files on disks, easily recovered.
>
> If your data is easily replaced, Ceph offers copy-on-write, which is really 
> handy for things like VM images where you might want to clone 100 of them 
> simultaneously.
>
>
> On December 14, 2023 6:57:00 AM PST, Alvin Starr  wrote:
>
> On 2023-12-14 07:48, Marcus Pedersén wrote:
> Hi all,
> I am looking into Ceph and CephFS and in my
> head I am comparing them with Gluster.
>
> The way I have been running Gluster over the years
> is either replicated or replicated-distributed clusters.
> Here are my observations, but I am far from an expert in either Ceph or 
> Gluster.
>
> Gluster works very well with 2 servers containing 2 big RAID disk arrays.
>
> Ceph, on the other hand, has MON, MGR, MDS... daemons that can run on multiple 
> servers, and should be for redundancy, but the OSDs should be lots of small 
> servers with very few disks attached.
>
> It kind of seems that the perfect OSD would be a disk with a Raspberry Pi 
> attached and a 2.5Gb NIC.
> Something really cheap and replaceable.
>
> So putting Ceph on 2 big servers with RAID arrays is likely a very bad idea.
>
> I am hoping that someone picks up Gluster, because it fits the storage 
> requirements of organizations that measure their storage in TB as 
> opposed to EB.
>
> The small setup we have had has been a replicated cluster
> with one arbiter and two fileservers.
> These fileservers have been configured with RAID6, and
> that RAID has been used as the brick.
>
> If disaster strikes and one fileserver burns up,
> there is still the other fileserver, and as it is RAIDed
> I can lose two disks on this machine before I
> start to lose data.
>
>  thinking ceph and similar setup 
> The idea is to have one "admin" node and two fileservers.
> The admin node will run mon, mgr and mds.
> The storage nodes will run mon, mgr, mds and 8x osd (8 disks),
> with replication = 2.
>
> The problem is that I cannot get my head around how
> to think when disaster strikes.
> So, one fileserver burns up; there is still the other
> fileserver, and from my understanding the Ceph system
> will start to replicate the files on the same fileserver,
> and when this is done disks can be lost on this server
> without losing data.
> But to be able to have this security at the hardware level,
> it means that the Ceph cluster can never be more than 50% full
> or this will not work, right?
> ... and it becomes similar if we have three fileservers:
> then the cluster can never be more than 2/3 full?
>
> I am not sure whether I misunderstand how Ceph works or
> whether Ceph just works badly on smaller systems like this?
>
> I would appreciate it if somebody with better knowledge
> could help me out with this!
>
> Many thanks in advance!!
>
> Marcus
> 
> 
>

> 
>
>
>

Re: [Gluster-users] Gluster -> Ceph

2023-12-14 Thread Joe Julian
A big RAID isn't great as a brick. If the array does fail, the larger brick means 
much longer heal times.

My main question I ask when evaluating storage solutions is, "what happens when 
it fails?"

With ceph, if the placement database is corrupted, all your data is lost 
(happened to my employer, once, losing 5PB of customer data). With Gluster, 
it's just files on disks, easily recovered.

If your data is easily replaced, Ceph offers copy-on-write, which is really 
handy for things like VM images where you might want to clone 100 of them 
simultaneously.
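
For anyone curious what that workflow looks like, here is a minimal sketch using 
the python-rbd bindings. The pool name, image names, the count of 100 and the 
conffile path are made-up assumptions, and it presumes the golden image has the 
(default) layering feature and no existing "base" snapshot; check the rados/rbd 
Python documentation for your Ceph release before relying on it.

import rados
import rbd

# Connect to the cluster; conffile path and pool name are assumptions.
cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
ioctx = cluster.open_ioctx("rbd")

# Snapshot the golden image once and protect the snapshot so it can be cloned.
with rbd.Image(ioctx, "vm-golden") as img:
    img.create_snap("base")
    img.protect_snap("base")

# Copy-on-write clones: near-instant regardless of the image size.
rbd_api = rbd.RBD()
for i in range(100):
    rbd_api.clone(ioctx, "vm-golden", "base", ioctx, f"vm-clone-{i:03d}")

ioctx.close()
cluster.shutdown()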

On December 14, 2023 6:57:00 AM PST, Alvin Starr  wrote:
>On 2023-12-14 07:48, Marcus Pedersén wrote:
>> Hi all,
>> I am looking into Ceph and CephFS and in my
>> head I am comparing them with Gluster.
>> 
>> The way I have been running Gluster over the years
>> is either replicated or replicated-distributed clusters.
>Here are my observations, but I am far from an expert in either Ceph or Gluster.
>
>Gluster works very well with 2 servers containing 2 big RAID disk arrays.
>
>Ceph, on the other hand, has MON, MGR, MDS... daemons that can run on multiple servers, 
>and should be for redundancy, but the OSDs should be lots of small servers 
>with very few disks attached.
>
>It kind of seems that the perfect OSD would be a disk with a Raspberry Pi 
>attached and a 2.5Gb NIC.
>Something really cheap and replaceable.
>
>So putting Ceph on 2 big servers with RAID arrays is likely a very bad idea.
>
>I am hoping that someone picks up Gluster, because it fits the storage 
>requirements of organizations that measure their storage in TB as 
>opposed to EB.
>
>> The small setup we have had has been a replicated cluster
>> with one arbiter and two fileservers.
>> These fileservers have been configured with RAID6, and
>> that RAID has been used as the brick.
>> 
>> If disaster strikes and one fileserver burns up,
>> there is still the other fileserver, and as it is RAIDed
>> I can lose two disks on this machine before I
>> start to lose data.
>> 
>>  thinking ceph and similar setup 
>> The idea is to have one "admin" node and two fileservers.
>> The admin node will run mon, mgr and mds.
>> The storage nodes will run mon, mgr, mds and 8x osd (8 disks),
>> with replication = 2.
>> 
>> The problem is that I cannot get my head around how
>> to think when disaster strikes.
>> So, one fileserver burns up; there is still the other
>> fileserver, and from my understanding the Ceph system
>> will start to replicate the files on the same fileserver,
>> and when this is done disks can be lost on this server
>> without losing data.
>> But to be able to have this security at the hardware level,
>> it means that the Ceph cluster can never be more than 50% full
>> or this will not work, right?
>> ... and it becomes similar if we have three fileservers:
>> then the cluster can never be more than 2/3 full?
>> 
>> I am not sure whether I misunderstand how Ceph works or
>> whether Ceph just works badly on smaller systems like this?
>> 
>> I would appreciate it if somebody with better knowledge
>> could help me out with this!
>> 
>> Many thanks in advance!!
>> 
>> Marcus
>> 
>> 
>> 
>> 
>> 
>
>-- 
>Alvin Starr   ||   land:  (647)478-6285
>Netvel Inc.   ||   Cell:  (416)806-0133
>al...@netvel.net  ||
>
>
>
>
>






Re: [Gluster-users] Gluster -> Ceph

2023-12-14 Thread Alvin Starr

On 2023-12-14 07:48, Marcus Pedersén wrote:

Hi all,
I am looking into Ceph and CephFS and in my
head I am comparing them with Gluster.

The way I have been running Gluster over the years
is either replicated or replicated-distributed clusters.
Here are my observations, but I am far from an expert in either Ceph or 
Gluster.


Gluster works very well with 2 servers containing 2 big RAID disk arrays.

Ceph, on the other hand, has MON, MGR, MDS... daemons that can run on multiple 
servers, and should be for redundancy, but the OSDs should be lots of 
small servers with very few disks attached.


It kind of seems that the perfect OSD would be a disk with a Raspberry 
Pi attached and a 2.5Gb NIC.

Something really cheap and replaceable.

So putting Ceph on 2 big servers with RAID arrays is likely a very bad idea.

I am hoping that someone picks up Gluster, because it fits the storage 
requirements of organizations that measure their storage in TB 
as opposed to EB.



The small setup we have had has been a replicated cluster
with one arbiter and two fileservers.
These fileservers have been configured with RAID6, and
that RAID has been used as the brick.

If disaster strikes and one fileserver burns up,
there is still the other fileserver, and as it is RAIDed
I can lose two disks on this machine before I
start to lose data.

 thinking ceph and similar setup 
The idea is to have one "admin" node and two fileservers.
The admin node will run mon, mgr and mds.
The storage nodes will run mon, mgr, mds and 8x osd (8 disks),
with replication = 2.

The problem is that I cannot get my head around how
to think when disaster strikes.
So, one fileserver burns up; there is still the other
fileserver, and from my understanding the Ceph system
will start to replicate the files on the same fileserver,
and when this is done disks can be lost on this server
without losing data.
But to be able to have this security at the hardware level,
it means that the Ceph cluster can never be more than 50% full
or this will not work, right?
... and it becomes similar if we have three fileservers:
then the cluster can never be more than 2/3 full?

I am not sure whether I misunderstand how Ceph works or
whether Ceph just works badly on smaller systems like this?

I would appreciate it if somebody with better knowledge
could help me out with this!

Many thanks in advance!!

Marcus








--
Alvin Starr   ||   land:  (647)478-6285
Netvel Inc.   ||   Cell:  (416)806-0133
al...@netvel.net  ||







Re: [Gluster-users] Gluster -> Ceph

2023-12-14 Thread Dmitry Melekhov

On 14.12.2023 16:48, Marcus Pedersén wrote:

The problem is that I cannot get my head around how
to think when disaster strikes.
So, one fileserver burns up; there is still the other
fileserver, and from my understanding the Ceph system
will start to replicate the files on the same fileserver


No, it will not.
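
As far as I understand it, that is because the default CRUSH rule keeps each 
replica on a different host: with size=2 and only one host left, Ceph has nowhere 
to put the second copy, so the pool just runs degraded until another host is back. 
Only if you deliberately relaxed the failure domain to "osd" (generally a bad idea) 
would the "never more than ~50% full" constraint apply. A rough, illustrative 
sketch of that capacity arithmetic, with made-up disk counts and sizes:

# Back-of-envelope usable capacity for a small replicated Ceph pool.
# All numbers are illustrative assumptions, not taken from this thread.
osds_per_host = 8
tb_per_osd = 16
hosts = 2
replica_size = 2

raw_total = hosts * osds_per_host * tb_per_osd    # 256 TB raw
usable = raw_total // replica_size                # 128 TB logical, before headroom

# Only if CRUSH allowed both copies on one host could the survivor
# re-replicate everything locally; it would then have to hold the data
# twice, so you could never fill past about half the logical capacity:
self_heal_cap = (osds_per_host * tb_per_osd) // replica_size   # 64 TB

# With the default "host" failure domain the pool instead stays degraded
# (one copy) until a second host returns, so the practical limits are the
# usual nearfull/full ratios (roughly 85% / 95% by default).
print(f"raw={raw_total} TB  usable={usable} TB  self-heal-limited={self_heal_cap} TB")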






Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users