[ceph-users] Re: ceph error connecting to the cluster

2024-02-08 Thread Eugen Block

Hi,

your message showed up in my inbox today, but apparently it's almost a  
week old. Have you resolved your issue?
If not, you'll need to provide more details, for example your Ceph
version, the current ceph status, and which keyrings are present on the
host you're trying to execute the command from.
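
For example, something like this (a quick sketch; run the status commands
from a node where the CLI still works):

ceph --version                 # or "cephadm version" on a containerized host
sudo ceph -s                   # current cluster status
ls -l /etc/ceph/               # which ceph.conf / keyrings exist on the host in question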


Zitat von arimbidh...@gmail.com:

hello, I tried to create an OSD, but when I run the ceph command the
output is like this:


root@pod-deyyaa-ceph1:~# sudo ceph -s
2024-02-02T16:01:23.627+0700 7fc762f37640  0 monclient(hunting):  
authenticate timed out after 300

[errno 110] RADOS timed out (error connecting to the cluster)


Can anyone give me a hint or help me solve this?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Performance issues with writing files to Ceph via S3 API

2024-02-08 Thread Renann Prado
Hello Anthony,

Sorry for the late reply.
My thought process behind it was that maybe there's some kind of indexing
that Ceph does under the hood, and perhaps the bucket structure could
influence that.
But if you say it's not the case, then I was on the wrong path.

Sorry for the delay, but I also wanted to gather info.

> How many millions?

About 75 million.

> How big are they?

They vary from ~500 KB to a couple of megabytes, say 5 MB. I couldn't tell you
whether most files are closer to 5 MB or to 500 KB, though; if
that's important I can try to figure it out.

> Are you writing them to a single bucket?

Yes. All these files are in a single bucket.

> How is the index pool configured?  On what media?
> Same with the bucket pool.

I wouldn't be able to answer that unfortunately.

> Which Ceph release?

Pacific (https://docs.ceph.com/en/pacific/).

> Sharding config?
> Are you mixing in bucket list operations ?

We don't use list operations on this bucket, but the Ceph infrastructure is
shared across multiple companies and we are aware that there are others
using list operations *on other buckets*. I can also say that list
operations on this bucket, IIRC, are failing (to the point where we don't have
an exact metric of how many objects are in the bucket). The provider has a
Prometheus exporter which currently fails to export the metrics in
production.

> Do you have the ability to utilize more than one bucket? If you can limit
the number of objects in a bucket that might help.

Technically it should be possible, but I'd assume that Ceph can abstract
this complexity from the bucket user so that we don't have to care about it.
If we do it, I would see it as a workaround more than a real solution.

> If your application keeps track of object names you might try indexless
buckets.

I didn't know there was this possibility.

I don't know how Ceph works under the hood, but assuming that all files are
ultimately written to the same folder on disk, could that be a problem?
I have struggled in the past with Linux filesystems getting too slow
due to too many files being written to the same folder.

Thanks for the help already!

Best regards,
*Renann Prado*


On Sat, Feb 3, 2024 at 7:13 PM Anthony D'Atri 
wrote:

> The slashes don’t mean much if anything to Ceph.  Buckets are not
> hierarchical filesystems.
>
> You speak of millions of files.  How many millions?
>
> How big are they?  Very small objects stress any object system.  Very
> large objects may be multi part uploads that stage to slow media or
> otherwise add overhead.
>
> Are you writing them to a single bucket?
>
> How is the index pool configured?  On what media?
> Same with the bucket pool.
>
> Which Ceph release? Sharding config?
> Are you mixing in bucket list operations ?
>
> It could be that you have an older release or a cluster set up on an older
> release that doesn’t effectively auto-reshard the bucket index.  If the
> index pool is set up poorly - slow media, too few OSDs, too few PGs - that
> may contribute.
>
> In some circumstances pre-sharding might help.
>
> Do you have the ability to utilize more than one bucket? If you can limit
> the number of objects in a bucket that might help.
>
> If your application keeps track of object names you might try indexless
> buckets.
>
> > On Feb 3, 2024, at 12:57 PM, Renann Prado 
> wrote:
> >
> > Hello,
> >
> > I have an issue at my company where we have an underperforming Ceph
> > instance.
> > The issue that we have is that sometimes writing files to Ceph via S3 API
> > (our only option) takes up to 40s, which is too long for us.
> > We are a bit limited on what we can do to investigate why it's performing
> > so badly, because we have a service provider in between, so getting to the
> > bottom of this really is not that easy.
> >
> > That being said, the way we use the S3 API (again, Ceph under the hood)
> > is by writing all files (multiple millions) to the root, so we don't use
> > *any* folder-like structure, e.g. we write */* instead of */this/that/*.
> >
> > The question is:
> >
> > Does anybody know whether Ceph has performance gains when you create a
> > folder structure vs when you don't?
> > Looking at Ceph's documentation I could not find such information.
> >
> > Best regards,
> >
> > *Renann Prado*
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: pacific 16.2.15 QE validation status

2024-02-08 Thread Stefan Kooman

Hi,

Is this PR: https://github.com/ceph/ceph/pull/54918 included as well?

You definitely want to build the Ubuntu / debian packages with the 
proper CMAKE_CXX_FLAGS. The performance impact on RocksDB is _HUGE_.


Thanks,

Gr. Stefan

P.s. Kudos to Mark Nelson for figuring it out / testing.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Performance issues with writing files to Ceph via S3 API

2024-02-08 Thread Anthony D'Atri


> On Feb 8, 2024, at 07:05, Renann Prado  wrote:
> 
> Hello Anthony,
> 
> Sorry for the late reply.
> My thought process behind it was that maybe there's some kind of indexing
> that Ceph does under the hood, and perhaps the bucket structure could
> influence that.

Absolutely, that's why I asked the questions.


> But if you say it's not the case, then I was on the wrong path.
> 
> Sorry for the delay, but I also wanted to gather info.
> 
>> How many millions?
> 
> About 75 million.

In a single bucket???

> 
>> How big are they?
> 
> They vary from ~500kb to a couple of megabytes, say 5mb. I wouldn't be able
> to tell you if most files are closer to 5mb or to 500kb though, but if
> that's important I can try to figure it out.

No, that's fine.  Ceph, like many other object storage systems, has a harder 
time with small objects.  If they're a lot smaller you can end up with wasted 
space.  But at 500KB, metadata operations rival just storing the data, so they 
can be a bottleneck and a hotspot.

> 
>> Are you writing them to a single bucket?
> 
> Yes. All these files are in a single bucket.


Yikes.  Any chance you could refactor the application to use smaller buckets?

> 
>> How is the index pool configured?  On what media?
>> Same with the bucket pool.
> 
> I wouldn't be able to answer that unfortunately.
> 
>> Which Ceph release?
> 
> Pacific (https://docs.ceph.com/en/pacific/).
> 
>> Sharding config?
>> Are you mixing in bucket list operations ?
> 
> We don't use list operations on this bucket, but the Ceph infrastructure is
> shared across multiple companies and we are aware that there are others
> using list operations *on other buckets*. But also, I can say that list
> operations in this bucket IIRC are failing (to a point where we don't have
> the exact metric of how many objects are in the bucket).

Could be a timeout; I think the list API call only returns up to 1000 objects, 
and for a larger bucket one has to iterate.
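
E.g. with the AWS CLI you have to page through the listing, something like
this (endpoint and bucket are placeholders):

aws --endpoint-url https://rgw.example.com s3api list-objects-v2 \
    --bucket mybucket --max-items 1000 --starting-token <token-from-previous-page>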

> The provider has a
> prometheus exporter which fails to export the metrics in production
> currently.


> 
>> Do you have the ability to utilize more than one bucket? If you can limit
> the number of objects in a bucket that might help.
> 
> Technically it should be possible, but I'd assume that Ceph can abstract
> this complexity for the bucket user so that we don't have to care for that.
> If we do it, I would see it as a workaround more than a real solution.

I don't recall the succession of changes to bucket sharding.  With your Pacific 
release it could be that auto-resharding isn't enabled or isn't functioning.  I 
suspect that bucket sharding is the heart of the issue.

> 
>> If your application keeps track of object names you might try indexless
> buckets.
> 
> I didn't know there was this possibility.
> 
> I don't know how Ceph works under the hood, but assuming that all files are
> ultimately written to the same folder on disk, could that be a problem?

It doesn't work that way.  Ceph has an abstracted foundation layer called 
RADOS, and the data isn't stored on disk as traditional files.

> I have struggled in the past with Linux filesystems getting too slow
> due to too many files written to the same folder.

It could be a similar but not identical issue.

When a Ceph cluster runs RGW to provide object storage, it has a dedicated pool 
that stores bucket indexes.  For any scale at all this must be placed on fast 
storage (SSDs) across enough separate drives and with enough placement groups.

Each bucket's index is broken into "shards".  With older releases that sharding 
was manual -- for very large buckets one would have to manually reshard the 
index, or pre-shard it in advance for the eventual size of the bucket.

Recent releases have a feature that does this automatically, if it's enabled.

My command of these dynamics is limited, so others on the list may be able to 
chime in with refinements.
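
If you can get shell access (or ask your provider to run these), a few
commands expose the sharding state -- a sketch, the bucket name and shard
count are just examples:

radosgw-admin bucket stats --bucket=mybucket       # shows num_shards and object counts
radosgw-admin bucket limit check                   # objects per shard vs. the configured limit
ceph config get client.rgw rgw_dynamic_resharding  # whether dynamic resharding is enabled
# manual reshard, if needed:
radosgw-admin reshard add --bucket=mybucket --num-shards=101
radosgw-admin reshard process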


> 
> Thanks for the help already!
> 
> Best regards,
> *Renann Prado*
> 
> 
> On Sat, Feb 3, 2024 at 7:13 PM Anthony D'Atri 
> wrote:
> 
>> The slashes don’t mean much if anything to Ceph.  Buckets are not
>> hierarchical filesystems.
>> 
>> You speak of millions of files.  How many millions?
>> 
>> How big are they?  Very small objects stress any object system.  Very
>> large objects may be multi part uploads that stage to slow media or
>> otherwise add overhead.
>> 
>> Are you writing them to a single bucket?
>> 
>> How is the index pool configured?  On what media?
>> Same with the bucket pool.
>> 
>> Which Ceph release? Sharding config?
>> Are you mixing in bucket list operations ?
>> 
>> It could be that you have an older release or a cluster set up on an older
>> release that doesn’t effectively auto-reshard the bucket index.  If the
>> index pool is set up poorly - slow media, too few OSDs, too few PGs - that
>> may contribute.
>> 
>> In some circumstances pre-sharding might help.
>> 
>> Do you have the ability to utilize more than one bucket? If you can li

[ceph-users] Re: Adding a new monitor fails

2024-02-08 Thread Eugen Block

Hi,

you're always welcome to report a documentation issue on  
tracker.ceph.com, you don't need to clean them up by yourself. :-)
There is a major restructuring in progress, but they will probably  
never be perfect anyway.



There are definitely some warts in there, as the monitor count was 1
but there were 2 monitors listed running.


I don't know your mon history, but I assume that you've had more than  
one mon (before converting to cephadm?). Then you might have updated  
the mon specs via command line, containing "count:1". But the mgr  
refuses to remove the second mon because it would break quorum. That's  
why you had 2/1 running, this is reproducible in my test cluster.
Adding more mons also failed because of the count:1 spec. You could  
have just overwritten it in the cli as well without a yaml spec file  
(omit the count spec):


ceph orch apply mon --placement="host1,host2,host3"
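
For reference, the equivalent mon.yaml would look something like this (a
minimal sketch, adjust the hostnames):

service_type: mon
placement:
  hosts:
    - host1
    - host2
    - host3

followed by: ceph orch apply -i mon.yaml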

Regards,
Eugen

Zitat von Tim Holloway :


Ah, yes. Much better.

There are definitely some warts in there, as the monitor count was 1
but there were 2 monitors listed running.

I've mostly avoided docs that reference ceph config files and yaml
configs because the online docs are (as I've whined before) not always
trustworthy and often contain anachronisms. Were I sufficiently
knowledgeable, I'd offer to clean them up, but if that were the case, I
wouldn't have to come crying here.

All happy now, though.

   Tim


On Tue, 2024-02-06 at 19:22 +, Eugen Block wrote:

Yeah, you have the „count:1“ in there, that’s why your manually
added 
daemons are rejected. Try my suggestion with a mon.yaml.

Zitat von Tim Holloway :

> ceph orch ls
> NAME                               PORTS        RUNNING  REFRESHED  AGE  PLACEMENT
> alertmanager                       ?:9093,9094      1/1  3m ago     8M   count:1
> crash                                               5/5  3m ago     8M   *
> grafana                            ?:3000           1/1  3m ago     8M   count:1
> mds.ceefs                                           2/2  3m ago     4M   count:2
> mds.fs_name                                         3/3  3m ago     8M   count:3
> mgr                                                 3/3  3m ago     4M   www6.mousetech.com;www2.mousetech.com;www7.mousetech.com
> mon                                                 2/1  3m ago     4M   www6.mousetech.com;www2.mousetech.com;www7.mousetech.com;count:1
> nfs.foo                            ?:2049           1/1  3m ago     4M   www7.mousetech.com
> node-exporter                      ?:9100           5/5  3m ago     8M   *
> osd                                                   6  3m ago     -
> osd.dashboard-admin-1686941775231                     0  -          7M   *
> prometheus                         ?:9095           1/1  3m ago     8M   count:1
> rgw.mousetech                      ?:80             2/2  3m ago     3M   www7.mousetech.com;www2.mousetech.com
>
>
> Note that the dell02 monitor doesn't show here although the "ceph orch
> daemon add" returns success initially. And actually the www6 monitor is
> not running nor does it list on the dashboard or "ceph orch ps". The
> www6 machine is still somewhat messed up because it was the initial
> launch machine for Octopus.
>
> On Tue, 2024-02-06 at 17:22 +, Eugen Block wrote:
> > So the orchestrator is working and you have a working ceph
> > cluster? 
> > Can you share the output of:
> > ceph orch ls mon
> >
> > If the orchestrator expects only one mon and you deploy another 
> > manually via daemon add it can be removed. Try using a mon.yaml
> > file 
> > instead which contains the designated mon hosts and then run
> > ceph orch apply -i mon.yaml
> >
> >
> >
> > Zitat von Tim Holloway :
> >
> > > I just jacked in a completely new, clean server and I've been
> > > trying to
> > > get a Ceph (Pacific) monitor running on it.
> > >
> > > The "ceph orch daemon add" appears to install all/most of
> > > what's
> > > necessary, but when the monitor starts, it shuts down
> > > immediately,
> > > and
> > > in the manner of Ceph containers immediately erases itself and
> > > the
> > > container log, so it's not possible to see what its problem is.
> > >
> > > I looked at manual installation, but the docs appear to be
> > > oriented
> > > towards old-style non-container implementation and don't
> > > account
> > > for
> > > the newer /var/lib/ceph/*fsid*/ approach.
> > >
> > > Any tips?
> > >
> > > Last few lines in the system journal are like this:
> > >
> > > Feb 06 11:09:58 dell02.mousetech.com ceph-278fcd86-0861-11ee-
> > > a7df-
> > > 9c5c8e86cf8f-mon-dell02[1357545]: debug 2024-02-
> > > 06T16:09:58.938+
> > > 7f26810ae700  4 rocksdb: (Original Log Time 2024/02/06-
> > > 16:09:58.938432)
> > > [compaction/compaction_job.cc:760] [default] compacted to: base
> > > level 6
> > > level multiplier 10.00 max bytes base 268435456 files[0 0 0 0 0
> > > 0
> > > 2]
> > > max score 0.00, MB/sec: 35

[ceph-users] Re: pacific 16.2.15 QE validation status

2024-02-08 Thread Casey Bodley
thanks, i've created https://tracker.ceph.com/issues/64360 to track
these backports to pacific/quincy/reef

On Thu, Feb 8, 2024 at 7:50 AM Stefan Kooman  wrote:
>
> Hi,
>
> Is this PR: https://github.com/ceph/ceph/pull/54918 included as well?
>
> You definitely want to build the Ubuntu / debian packages with the
> proper CMAKE_CXX_FLAGS. The performance impact on RocksDB is _HUGE_.
>
> Thanks,
>
> Gr. Stefan
>
> P.s. Kudos to Mark Nelson for figuring it out / testing.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Adding a new monitor fails

2024-02-08 Thread Tim Holloway
Thanks,

I'll have to see if I can come up with a suitable documentation issue.
My biggest issue isn't a specific item (well, except for Octopus
telling me to use the not-included ceph-deploy command in lots of
places). It's more a case of needing attention paid to anachronisms in
general.

That and more attention could be paid to the distinction between
container-based and OS-native Ceph components.

So in short, not single issues, but more of a need for attention to the
overall details to assure that features described for a specific
release actually apply TO that release. Grunt work, but it can save a
lot on service calls.

I migrated to ceph from gluster because gluster is apparently going
unsupported at the end of this year. I moved to gluster from DRBD
because I wanted triple redundancy on the data. While ceph is really
kind of overkill for my small R&D farm, it has proven to be about the
most solid network distributed filesystem I've worked with: no split
brains, no outright corruption, no data outages. Despite all the
atrocities I committed in setting it up, it has never failed at its
primary duty of delivering data service.

I started off with Octopus, and that has been the root of a lot of my
problems. Octopus introduced cephadm as a primary management tool, I
believe, but the documentation still referenced ceph-deploy. And
cephadm suffered from a bug that meant that if even one service was
down, scheduled work would not be done, so to repair anything I needed
an already-repaired system.

Migrating to Pacific cleared that up so a lot of what I'm doing now is
getting the lint out. I'm now staying consistently healthy between a
proper monitor configuration and having removed direct ceph mounts on
the desktops.

I very much appreciate all the help and insights you've provided. It's
nice to have laid my problems to rest.

   Tim

On Thu, 2024-02-08 at 14:41 +, Eugen Block wrote:
> Hi,
> 
> you're always welcome to report a documentation issue on  
> tracker.ceph.com, you don't need to clean them up by yourself. :-)
> There is a major restructuring in progress, but they will probably  
> never be perfect anyway.
> 
> > There are definitely some warts in there, as the monitor count was 1
> > but there were 2 monitors listed running.
> 
> I don't know your mon history, but I assume that you've had more
> than  
> one mon (before converting to cephadm?). Then you might have updated 
> the mon specs via command line, containing "count:1". But the mgr  
> refuses to remove the second mon because it would break quorum.
> That's  
> why you had 2/1 running, this is reproducible in my test cluster.
> Adding more mons also failed because of the count:1 spec. You could  
> have just overwritten it in the cli as well without a yaml spec file 
> (omit the count spec):
> 
> ceph orch apply mon --placement="host1,host2,host3"
> 
> Regards,
> Eugen
> 
> Zitat von Tim Holloway :
> 
> > Ah, yes. Much better.
> > 
> > There are definitely some warts in there, as the monitor count was 1
> > but there were 2 monitors listed running.
> > 
> > I've mostly avoided docs that reference ceph config files and yaml
> > configs because the online docs are (as I've whined before) not
> > always
> > trustworthy and often contain anachronisms. Were I sufficiently
> > knowledgeable, I'd offer to clean them up, but if that were the
> > case, I
> > wouldn't have to come crying here.
> > 
> > All happy now, though.
> > 
> >    Tim
> > 
> > 
> > On Tue, 2024-02-06 at 19:22 +, Eugen Block wrote:
> > > Yeah, you have the „count:1“ in there, that’s why your manually
> > > added 
> > > daemons are rejected. Try my suggestion with a mon.yaml.
> > > 
> > > Zitat von Tim Holloway :
> > > 
> > > > ceph orch ls
> > > > NAME                               PORTS        RUNNING  REFRESHED  AGE  PLACEMENT
> > > > alertmanager                       ?:9093,9094      1/1  3m ago     8M   count:1
> > > > crash                                               5/5  3m ago     8M   *
> > > > grafana                            ?:3000           1/1  3m ago     8M   count:1
> > > > mds.ceefs                                           2/2  3m ago     4M   count:2
> > > > mds.fs_name                                         3/3  3m ago     8M   count:3
> > > > mgr                                                 3/3  3m ago     4M   www6.mousetech.com;www2.mousetech.com;www7.mousetech.com
> > > > mon                                                 2/1  3m ago     4M   www6.mousetech.com;www2.mousetech.com;www7.mousetech.com;count:1
> > > > nfs.foo                            ?:2049           1/1  3m ago     4M   www7.mousetech.com
> > > > node-exporter                      ?:9100           5/5  3m ago     8M   *
> > > > osd

[ceph-users] PSA: Long Standing Debian/Ubuntu build performance issue (fixed, backports in progress)

2024-02-08 Thread Mark Nelson

Hi Folks,

Recently we discovered a flaw in how the upstream Ubuntu and Debian 
builds of Ceph compile RocksDB.  It causes a variety of performance 
issues including slower than expected write performance, 3X longer 
compaction times, and significantly higher than expected CPU utilization 
when RocksDB is heavily utilized.  The issue has now been fixed in main. 
Igor Fedotov, however, observed during the performance meeting today 
that there were no backports for the fix in place.  He also rightly 
pointed out that it would be helpful to make an announcement about the 
issue given the severity for the affected users. I wanted to give a bit 
more background and make sure people are aware and understand what's 
going on.


1) Who's affected?

Anyone running an upstream Ubuntu/Debian build of Ceph from the last 
several years.  External builds from Canonical and Gentoo suffered from 
this issue as well, but were fixed independently.


2) How can you check?

There's no easy way to tell at the moment.  We are investigating if 
running "strings" on the OSD executable may provide a clue.  For now, 
assume that if you are using our Debian/Ubuntu builds in a non-container 
configuration you are affected.  Proxmox for instance was affected prior 
to adopting the fix.


3) Are Cephadm deployments affected?

Not as far as we know.  Ceph container builds are compiled slightly 
differently from stand-alone Debian builds.  They do not appear to 
suffer from the bug.


4) What versions of Ceph will get the fix?

Casey Bodley kindly offered to backport the fix to both Reef and Quincy. 
 He also verified that the fix builds properly with Pacific.  We now 
have 3 separate backport PRs for the releases here:


https://github.com/ceph/ceph/pull/55500
https://github.com/ceph/ceph/pull/55501
https://github.com/ceph/ceph/pull/55502


Please feel free to reply if you have any questions!

Thanks,
Mark

--
Best Regards,
Mark Nelson
Head of Research and Development

Clyso GmbH
p: +49 89 21552391 12 | a: Minnesota, USA
w: https://clyso.com | e: mark.nel...@clyso.com

We are hiring: https://www.clyso.com/jobs/
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] What is the proper way to setup Rados Gateway (RGW) under Ceph?

2024-02-08 Thread Michael Worsham
I have setup a 'reef' Ceph Cluster using Cephadm and Ansible in a VMware ESXi 7 
/ Ubuntu 22.04 lab environment per the how-to guide provided here:  
https://computingforgeeks.com/install-ceph-storage-cluster-on-ubuntu-linux-servers/.

The installation steps were fairly easy and I was able to get the environment 
up and running in about 15 minutes under VMware ESXi 7. I have buckets and 
pools already set up. However, the ceph.io site is confusing on how to set up the
Rados Gateway (radosgw) with Multi-site --
https://docs.ceph.com/en/latest/radosgw/multisite/. Is a copy of HAProxy also
needed for handling the front-end load balancing, or is it implied that Ceph
sets it up?

Command-line scripting I was planning on using for setting up the RGW:
```
radosgw-admin realm create --rgw-realm=sandbox --default
radosgw-admin zonegroup create --rgw-zonegroup=sandbox --master --default
radosgw-admin zone create --rgw-zonegroup=sandbox --rgw-zone=sandbox --master --default
radosgw-admin period update --rgw-realm=sandbox --commit
ceph orch apply rgw sandbox --realm=sandbox --zone=sandbox --placement="2 ceph-mon1 ceph-mon2" --port=8000
```

What other steps are needed to get the RGW up and running so that it can be 
presented to something like Veeam for doing performance and I/O testing 
concepts?
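
I assume I would also need to create an S3 user and point a client at the
endpoint, something like this (my guess; names and keys are placeholders):
```
radosgw-admin user create --uid=veeam-test --display-name="Veeam test user"
# note the access_key / secret_key from the output, then from a client machine:
AWS_ACCESS_KEY_ID=<access_key> AWS_SECRET_ACCESS_KEY=<secret_key> \
  aws --endpoint-url http://ceph-mon1:8000 s3 ls
```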

-- Michael

This message and its attachments are from Data Dimensions and are intended only 
for the use of the individual or entity to which it is addressed, and may 
contain information that is privileged, confidential, and exempt from 
disclosure under applicable law. If the reader of this message is not the 
intended recipient, or the employee or agent responsible for delivering the 
message to the intended recipient, you are hereby notified that any 
dissemination, distribution, or copying of this communication is strictly 
prohibited. If you have received this communication in error, please notify the 
sender immediately and permanently delete the original email and destroy any 
copies or printouts of this email as well as any attachments.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: PSA: Long Standing Debian/Ubuntu build performance issue (fixed, backports in progress)

2024-02-08 Thread Bailey Allison
Holy! I have no questions, just wanted to say thanks for emailing this. As
much as it does suck to know that's been an issue, I really appreciate you
sharing the information about this on here.

We've got a fair share of Ubuntu clusters, so if there's a way to validate I
would love to know, but it also seems like it's pretty much guaranteed to
have the issue, so maybe no need for that hahahaha.

If there's anything we can provide that would be of assistance let me know
and I can see what we can do too!

Thanks to everyone involved that's doing the hard work to get this resolved!

Regards,

Bailey

> -Original Message-
> From: Mark Nelson 
> Sent: February 8, 2024 2:05 PM
> To: ceph-users@ceph.io; d...@ceph.io
> Subject: [ceph-users] PSA: Long Standing Debian/Ubuntu build performance
> issue (fixed, backports in progress)
> 
> Hi Folks,
> 
> Recently we discovered a flaw in how the upstream Ubuntu and Debian
> builds of Ceph compile RocksDB.  It causes a variety of performance issues
> including slower than expected write performance, 3X longer compaction
> times, and significantly higher than expected CPU utilization when RocksDB
> is heavily utilized.  The issue has now been fixed in main.
> Igor Fedotov, however, observed during the performance meeting today
> that there were no backports for the fix in place.  He also rightly
> pointed out that it would be helpful to make an announcement about the
> issue given the severity for the affected users. I wanted to give a bit
> more background and make sure people are aware and understand what's
> going on.
> 
> 1) Who's affected?
> 
> Anyone running an upstream Ubuntu/Debian build of Ceph from the last
> several years.  External builds from Canonical and Gentoo suffered from
> this issue as well, but were fixed independently.
> 
> 2) How can you check?
> 
> There's no easy way to tell at the moment.  We are investigating if
> running "strings" on the OSD executable may provide a clue.  For now,
> assume that if you are using our Debian/Ubuntu builds in a non-container
> configuration you are affected.  Proxmox for instance was affected prior
> to adopting the fix.
> 
> 3) Are Cephadm deployments affected?
> 
> Not as far as we know.  Ceph container builds are compiled slightly
> differently from stand-alone Debian builds.  They do not appear to suffer
> from the bug.
> 
> 4) What versions of Ceph will get the fix?
> 
> Casey Bodley kindly offered to backport the fix to both Reef and Quincy.
> He also verified that the fix builds properly with Pacific.  We now have
> 3 separate backport PRs for the releases here:
> 
> https://github.com/ceph/ceph/pull/55500
> https://github.com/ceph/ceph/pull/55501
> https://github.com/ceph/ceph/pull/55502
> 
> 
> Please feel free to reply if you have any questions!
> 
> Thanks,
> Mark
> 
> --
> Best Regards,
> Mark Nelson
> Head of Research and Development
> 
> Clyso GmbH
> p: +49 89 21552391 12 | a: Minnesota, USA
> w: https://clyso.com | e: mark.nel...@clyso.com
> 
> We are hiring: https://www.clyso.com/jobs/
> ___
> ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email
> to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph Storage || Deploy/Install/Bootstrap a Ceph Cluster || Cephadm Orchestrator CLI method

2024-02-08 Thread ankit
Hi Guys,

I am a newbie trying to install a Ceph storage cluster, following this:
https://docs.ceph.com/en/latest/cephadm/install/#cephadm-deploying-new-cluster

=
OS - Ubuntu 22.04.3 LTS (Jammy Jellyfish)

4-node cluster - mon1, mgr1, and 2 OSD nodes

The mon1 node can SSH to all nodes via root, to the sudo-enabled ceph-user, and
from ceph-user to ceph-user on the other nodes.

Basic requirements are in place: podman, python3, systemd, ntp, lvm.
===

cephadm bootstrap --mon-ip 192.168.2.125 - after running this I am getting the
following error:

ceph-user@mon1:~$ sudo cephadm bootstrap --mon-ip 192.168.2.125
Creating directory /etc/ceph for ceph.conf
Verifying podman|docker is present...
Verifying lvm2 is present...
Verifying time synchronization is in place...
Unit chrony.service is enabled and running
Repeating the final host check...
podman (/usr/bin/podman) version 3.4.4 is present
systemctl is present
lvcreate is present
Unit chrony.service is enabled and running
Host looks OK
Cluster fsid: 90813682-c656-11ee-9ca3-0800274ff361
Verifying IP 192.168.2.125 port 3300 ...
Verifying IP 192.168.2.125 port 6789 ...
Mon IP `192.168.2.125` is in CIDR network `192.168.2.0/24`
Mon IP `192.168.2.125` is in CIDR network `192.168.2.0/24`
Internal network (--cluster-network) has not been provided, OSD replication 
will default to the public_network
Pulling container image quay.io/ceph/ceph:v17...
Ceph version: ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) 
quincy (stable)
Extracting ceph user uid/gid from container image...
Creating initial keys...
Creating initial monmap...
Creating mon...
Waiting for mon to start...
Waiting for mon...
mon is available
Assimilating anything we can from ceph.conf...
Generating new minimal ceph.conf...
Restarting the monitor...
Setting mon public_network to 192.168.2.0/24
Wrote config to /etc/ceph/ceph.conf
Wrote keyring to /etc/ceph/ceph.client.admin.keyring
Creating mgr...
Verifying port 9283 ...
Waiting for mgr to start...
Waiting for mgr...
mgr not available, waiting (1/15)...
mgr not available, waiting (2/15)...
mgr not available, waiting (3/15)...
mgr not available, waiting (4/15)...
mgr not available, waiting (5/15)...
mgr not available, waiting (6/15)...
mgr not available, waiting (7/15)...
mgr is available
Enabling cephadm module...
Waiting for the mgr to restart...
Waiting for mgr epoch 5...
mgr epoch 5 is available
Setting orchestrator backend to cephadm...
Generating ssh key...
Wrote public SSH key to /etc/ceph/ceph.pub
Adding key to root@localhost authorized_keys...
Adding host mon1...
Deploying mon service with default placement...
Deploying mgr service with default placement...
Deploying crash service with default placement...
Deploying ceph-exporter service with default placement...
Non-zero exit code 22 from /usr/bin/podman run --rm --ipc=host 
--stop-signal=SIGTERM --net=host --entrypoint /usr/bin/ceph --init -e 
CONTAINER_IMAGE=quay.io/ceph/ceph:v17 -e NODE_NAME=mon1 -e 
CEPH_USE_RANDOM_NONCE=1 -v 
/var/log/ceph/90813682-c656-11ee-9ca3-0800274ff361:/var/log/ceph:z -v 
/tmp/ceph-tmpnjonhex7:/etc/ceph/ceph.client.admin.keyring:z -v 
/tmp/ceph-tmp3gil6lbb:/etc/ceph/ceph.conf:z quay.io/ceph/ceph:v17 orch apply 
ceph-exporter
/usr/bin/ceph: stderr Error EINVAL: Usage:
/usr/bin/ceph: stderr   ceph orch apply -i  [--dry-run]
/usr/bin/ceph: stderr   ceph orch apply  
[--placement=] [--unmanaged]
/usr/bin/ceph: stderr 
Traceback (most recent call last):
  File "/usr/sbin/cephadm", line 9653, in 
main()
  File "/usr/sbin/cephadm", line 9641, in main
r = ctx.func(ctx)
  File "/usr/sbin/cephadm", line 2205, in _default_image
return func(ctx)
  File "/usr/sbin/cephadm", line 5774, in command_bootstrap
prepare_ssh(ctx, cli, wait_for_mgr_restart)
  File "/usr/sbin/cephadm", line 5275, in prepare_ssh
cli(['orch', 'apply', t])
  File "/usr/sbin/cephadm", line 5708, in cli
return CephContainer(
  File "/usr/sbin/cephadm", line 4144, in run
out, _, _ = call_throws(self.ctx, self.run_cmd(),
  File "/usr/sbin/cephadm", line 1853, in call_throws
raise RuntimeError('Failed command: %s' % ' '.join(command))
RuntimeError: Failed command: /usr/bin/podman run --rm --ipc=host 
--stop-signal=SIGTERM --net=host --entrypoint /usr/bin/ceph --init -e 
CONTAINER_IMAGE=quay.io/ceph/ceph:v17 -e NODE_NAME=mon1 -e 
CEPH_USE_RANDOM_NONCE=1 -v 
/var/log/ceph/90813682-c656-11ee-9ca3-0800274ff361:/var/log/ceph:z -v 
/tmp/ceph-tmpnjonhex7:/etc/ceph/ceph.client.admin.keyring:z -v 
/tmp/ceph-tmp3gil6lbb:/etc/ceph/ceph.conf:z quay.io/ceph/ceph:v17 orch apply 
ceph-exporter


What am I doing wrong or missing? Please help.

Many Thanks
AS
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Does it impact write performance when SSD applies into block.wal (not block.db)

2024-02-08 Thread Jaemin Joo
Hi everyone,

I saw that BlueStore can separate block.db and block.wal.
In my case, I'd like to use a hybrid device layout (SSD + HDD) to improve
the write performance for small data,
but I don't have enough SSD to cover both block.db and block.wal,
so I think it can still help performance even if the SSD is used only for
block.wal.
I just know that block.wal depends on RocksDB parameters such as the cache
size, so the SSD might not need to be very large.

1.
When I use the SSD only for block.wal,
does it improve the write performance for small data?

2.
Should I create one LV on the SSD per OSD for block.wal?

3.
How much SSD do I need for block.wal relative to the HDDs (if I have 100 TB)?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Does it impact write performance when SSD applies into block.wal (not block.db)

2024-02-08 Thread Anthony D'Atri


> Hi everyone,
> 
> I saw that BlueStore can separate block.db and block.wal.
> In my case, I'd like to use a hybrid device layout (SSD + HDD) to improve
> the write performance for small data,
> but I don't have enough SSD to cover both block.db and block.wal,
> so I think it can still help performance even if the SSD is used only for
> block.wal.
> I just know that block.wal depends on RocksDB parameters such as the cache
> size, so the SSD might not need to be very large.
> 
> 1.
> When I use the SSD only for block.wal,
> does it improve the write performance for small data?

I *think* by default only writes that are smaller than the min_alloc_size the 
OSD was created with will be staged in the WAL.  In recent releases that 
defaults to 4KB.
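
You can check the defaults your release would use for new OSDs (the value
itself is baked in at OSD creation time), e.g.:

ceph config get osd bluestore_min_alloc_size_hdd
ceph config get osd bluestore_min_alloc_size_ssd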


> 3.
> How much SSD do I need for block.wal relative to the HDDs (if I have 100 TB)?

cf.  
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/SBNRW5R22IE3OVOR57DRL2ULFTWXLAGQ/

The WAL size is I believe constant, 1GB.

Be careful that you don’t share your SSD devices with too many HDDs.  In the 
Filestore days conventional wisdom was to not share a SAS/SATA SSD across more 
than 4-5 HDD OSDs; an NVMe SSD perhaps as high as 10.  If you exceed this ratio 
you may end up slower than with pure HDD OSDs.
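
If you do go this route, the WAL device is just another argument at OSD
creation time, with one WAL LV or partition per OSD -- a sketch, device/LV
names are examples:

ceph-volume lvm create --data /dev/sdb --block.wal ssd-vg/wal-sdb
# with cephadm, the equivalent is an OSD drive group spec with data_devices / wal_devices filters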

Naturally the best solution is to not use HDDs at all ;)

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] RGW Index pool(separated SSD) tuning factor

2024-02-08 Thread Jaemin Joo
Hi everyone,

I confirmed that write performance increased significantly just by using SSD
for the RGW index pool.
I know that ~200 bytes per object is created in the index pool.
When I checked the index pool size, it works out to around 300-400 bytes
per object.
For a setup like mine, with the index pool on separate SSD devices, I guess
there is a tuning factor (like block size) to reduce the index pool size.
Is there a tuning factor or recommendation for an index pool on separate
SSDs?
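
For reference, I estimated the per-object overhead roughly like this (a
sketch; pool and bucket names will differ):

ceph df detail                                  # STORED size of the <zone>.rgw.buckets.index pool
radosgw-admin bucket stats --bucket=mybucket    # num_objects per bucket
# bytes per object ~= index pool STORED / total number of objects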
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io