Thank you for the explanation, Frank.

I agree with you: Ceph is not designed for this kind of use case,
but I tried to stay with what I know.
My idea was exactly what you described; I was trying to automate the
cleanup or recreation after any failure.
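
For the recreate step, what I have in mind is roughly the following sketch
(the pool name "scratch", the PG count and the "rbd" application tag are just
placeholders, and a CephFS data pool would first have to be detached from the
filesystem):

# allow pool deletion (disabled by default)
ceph config set mon mon_allow_pool_delete true
# drop the old scratch pool together with everything in it
ceph osd pool delete scratch scratch --yes-i-really-really-mean-it
# recreate it as a size-1 (non-replicated) pool
ceph osd pool create scratch 128 128 replicated
ceph osd pool set scratch size 1 --yes-i-really-mean-it
ceph osd pool application enable scratch rbd

On recent releases, size=1 additionally requires mon_allow_pool_size_one=true.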

As you can see below, the replication 1 pool is much faster:

- Create: time for i in {00001..99999}; do head -c 1K </dev/urandom >randfile$i; done
  replication 2 : 31m59.917s
  replication 1 : 7m6.046s
- Delete: time rm -rf testdir/
  replication 2 : 11m56.994s
  replication 1 : 0m40.756s
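
One note on the numbers: both tests are single-threaded, so they mostly
measure per-operation latency rather than what the pool can sustain. A rough
parallel variant (assuming GNU xargs; 16 workers is an arbitrary choice) would
be:

# create 99999 1K files with 16 writers in parallel
time seq -w 1 99999 | xargs -P 16 -I{} sh -c 'head -c 1K </dev/urandom >randfile{}'
# delete the files with 16 workers (this leaves the empty directories behind)
time find testdir/ -type f -print0 | xargs -0 -P 16 rm -f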

I have started learning DRBD, and I will also look into BeeGFS. Thanks for the advice.

Regards.

Frank Schilder <fr...@dtu.dk> wrote on Mon, 1 May 2023 at 10:27:
>
> I think you misunderstood Janne's reply. The main statement is at the end:
> ceph is not designed for an "I don't care about data" use case. If you need
> speed for temporary data where you can sustain data loss, go for something
> simpler. For example, we use BeeGFS with great success as a burst buffer for
> an HPC cluster. It is very lightweight and will pull out all the performance
> your drives can offer. In case of disaster it is easy to clean up.
> BeeGFS does not care about lost data; such data will simply become
> inaccessible while everything else just moves on. It will not try to
> self-heal either. It doesn't even scrub data, so there is no competition
> between user IO and admin IO.
>
> It's pretty much your use case. We clean it up every 6-8 weeks, and if
> something breaks we just redeploy the whole thing from scratch. Performance
> is great, and it's a very simple and economical system to administer. There
> is no need for the whole ceph daemon engine with its large RAM requirements
> and extra admin daemons.
>
> Use ceph for data you want to survive a nuclear blast. Don't use it for
> things it's not made for and then complain.
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: mhnx <morphinwith...@gmail.com>
> Sent: Saturday, April 29, 2023 5:48 AM
> To: Janne Johansson
> Cc: Ceph Users
> Subject: [ceph-users] Re: How can I use not-replicated pool (replication 1 or 
> raid-0)
>
> Hello Janne, thank you for your response.
>
> I understand your advice, and rest assured that I have designed plenty of EC
> pools, so I know the mess. EC is not an option for me because I need SPEED.
>
> Let me describe my hardware first, so we have the same picture:
> Server: R620
> CPU: 2 x Xeon E5-2630 v2 @ 2.60GHz
> RAM: 128GB DDR3
> Disk1: 20x Samsung SSD 860 2TB
> Disk2: 10x Samsung SSD 870 2TB
>
> My SSDs do not have PLP. Because of that, every Ceph write also waits for
> the TRIM. I want to know how much latency we are talking about, because I'm
> thinking of adding a PLP NVMe for the WAL+DB to gain some speed.
> As you can see, I'm even trying to gain something from every TRIM command.
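>
> What I have in mind for that is roughly the following (just a sketch; the
> device names are placeholders):
>
> # one OSD with data on a SATA SSD and WAL+DB on a partition of a PLP NVMe
> ceph-volume lvm create --bluestore --data /dev/sdc --block.db /dev/nvme0n1p1
>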
> Currently I'm testing a replication 2 pool, and even that is not fast enough
> for my use case. Now I'm trying to boost the deletion speed, because I'm
> writing and deleting files all the time and this never ends.
> I'm writing this mail because replication 1 should cut the deletion time,
> but I'm also trying to tune some MDS+OSD parameters to increase the delete
> speed.
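>
> The knobs I'm planning to experiment with look roughly like this (a sketch
> only; the values are starting guesses and the option names should be checked
> against the release in use):
>
> # let the MDS purge deleted files more aggressively
> ceph config set mds mds_max_purge_files 256
> ceph config set mds mds_max_purge_ops 32768
> ceph config set mds mds_max_purge_ops_per_pg 2
> # remove the per-OSD delete throttle on SSDs
> ceph config set osd osd_delete_sleep_ssd 0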
>
> Any help and idea will be great for me. Thanks.
> Regards.
>
>
>
> Janne Johansson <icepic...@gmail.com> wrote on Wed, 12 Apr 2023 at 10:10:
> >
> > On Mon, 10 Apr 2023 at 22:31, mhnx <morphinwith...@gmail.com> wrote:
> > > Hello.
> > > I have a 10-node cluster. I want to create a non-replicated pool
> > > (replication 1), and I want to ask some questions about it.
> > >
> > > Let me tell you my use case:
> > > - I don't care about losing data.
> > > - All of my data is JUNK, and these junk files are usually between 1KB
> > > and 32MB.
> > > - These files will be deleted within 5 days.
> > > - Writable space and I/O speed are more important.
> > > - I have heavy Write/Read/Delete operations, a minimum of 200GB a day.
> >
> > That is "only" about 2.3 MB/s averaged over a day (roughly 18 Mbit/s),
> > which should easily be doable even with repl=2,3,4 or EC. This of course
> > depends on the speed of the drives, network, CPUs and all that, but in
> > itself it doesn't seem too hard to achieve in terms of average speeds. We
> > have an EC8+3 rgw setup backed by some 12-14 OSD hosts with hdd and nvme
> > (for wal+db) that can ingest over 1GB/s if you parallelize the rgw
> > streams, so this rate seems totally doable with 10 decent machines. Even
> > with replication.
> >
> > > I'm afraid that, after any failure, I won't be able to access the whole
> > > cluster. Losing data is okay, but I need to be able to ignore missing
> > > files,
> >
> > Even with repl=1, in case of a failure, the cluster will still aim at
> > fixing itself rather than ignoring currently lost data and moving on,
> > so any solution that involves "forgetting" about lost data would need
> > a ceph operator telling the cluster to ignore all the missing parts
> > and to recreate the broken PGs. This would not be automatic.
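> >
> > In command terms that means something like the following for each damaged
> > PG (a sketch only, with 2.1a as an example PG id; check the exact syntax
> > for your release):
> >
> > # give up on objects ceph can no longer find in that PG
> > ceph pg 2.1a mark_unfound_lost delete
> > # recreate a PG whose only copy is gone (the data in it is lost for good)
> > ceph osd force-create-pg 2.1a --yes-i-really-mean-it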
> >
> >
> > --
> > May the most significant bit of your life be positive.
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
