[ceph-users] Questions about the CRUSH details

2024-01-24 Thread Henry lol
Hello, I'm new to Ceph, so apologies in advance for the naive questions.

1.
As far as I know, CRUSH uses the cluster map, which consists of the PG
map among others.
I don't understand why the CRUSH computation is required on the client
side, given that the PG-to-OSDs mapping could be obtained directly from
the PG map.

2.
How does the client get the valid (old) OSD set while a PG is being
remapped to the new OSD set that CRUSH returns?

Thanks.


[ceph-users] Re: Questions about the CRUSH details

2024-01-24 Thread Henry lol
Do you mean that an object's location (OSDs) is initially calculated
using only its name and the crushmap, and that the result is then
adjusted with the map of relocated PGs?

I'm still skeptical about computation on the client side: couldn't the
client obtain an object's location without any computation, since
ceph-mon already publishes that information in the PG map?
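
Roughly, I picture the two-step lookup like this (a toy sketch with a
stand-in hash, not Ceph's actual rjenkins hash or CRUSH implementation;
pg_num, crush_osds_for_pg, and the upmap table are all stand-ins):

    import zlib

    def object_to_pg(object_name: str, pg_num: int) -> int:
        # Step 1: the object name alone determines its PG.
        return zlib.crc32(object_name.encode()) % pg_num

    def pg_to_osds(pg: int, crush_osds_for_pg, upmap: dict) -> list:
        # Step 2: CRUSH computes the base OSD set for the PG...
        osds = crush_osds_for_pg(pg)
        # ...which the osdmap can then override (upmap, pg_temp, etc.).
        return upmap.get(pg, osds)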

On Thu, Jan 25, 2024 at 3:08 AM, David C. wrote:

> Hi,
>
> The client calculates the location (the PG) of an object from its name
> and the crushmap.
> This is what makes it possible to parallelize I/O flows directly from
> the client.
>
> The client also has the map of the PGs that have been relocated to other
> OSDs (upmap, pg_temp, etc.).
> 
>
> Regards,
>
> *David CASIER*
> ________
>
>
>
> On Wed, Jan 24, 2024 at 5:49 PM, Henry lol wrote:
>
>> Hello, I'm new to Ceph, so apologies in advance for the naive questions.
>>
>> 1.
>> As far as I know, CRUSH uses the cluster map, which consists of the PG
>> map among others.
>> I don't understand why the CRUSH computation is required on the client
>> side, given that the PG-to-OSDs mapping could be obtained directly from
>> the PG map.
>>
>> 2.
>> How does the client get the valid (old) OSD set while a PG is being
>> remapped to the new OSD set that CRUSH returns?
>>
>> Thanks.
>


[ceph-users] Re: Questions about the CRUSH details

2024-01-25 Thread Henry lol
That's reasonable enough.
Actually, I expected the client to hold just thousands of
"PG-to-OSDs" mappings.
Is even that too heavy, so the client calculates locations on
demand instead?

If a client with an outdated map sends a request to the wrong OSD,
does the OSD handle it somehow, through redirection or something?

Lastly, are factors other than the CRUSH map, such as storage usage,
also considered in the CRUSH computation?
It seems the target OSD set isn't deterministic given the CRUSH map alone.

On Thu, Jan 25, 2024 at 4:42 PM, Janne Johansson wrote:
>
> On Thu, Jan 25, 2024 at 3:05 AM, Henry lol wrote:
> >
> > Do you mean that an object's location (OSDs) is initially calculated
> > using only its name and the crushmap, and that the result is then
> > adjusted with the map of relocated PGs?
> >
> > I'm still skeptical about computation on the client side: couldn't the
> > client obtain an object's location without any computation, since
> > ceph-mon already publishes that information in the PG map?
>
> The client should not need to contact the mon for each object access,
> and every client can't hold a complete list of the millions of objects
> in the cluster, so it does the computation client-side.
>
> The mon connection will more or less only require new updates if/when
> OSDs change weight or go in/out. This way, clients can run on
> "autopilot" even if all mons are down, as long as OSD states don't
> change.
>
> --
> May the most significant bit of your life be positive.
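
To restate Janne's point with the earlier toy sketch: once the client
holds the (small) osdmap, every lookup is purely local, and the mons
only push a new map when cluster state changes (the names below are my
own placeholders, reusing object_to_pg and pg_to_osds from before):

    class FakeClient:
        def __init__(self, osdmap):
            # Refreshed only when the mons publish a new epoch.
            self.osdmap = osdmap

        def locate(self, object_name):
            # No mon round-trip per object: everything is computed locally.
            pg = object_to_pg(object_name, self.osdmap.pg_num)
            return pg_to_osds(pg, self.osdmap.crush_osds_for_pg,
                              self.osdmap.upmap)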


[ceph-users] Re: Questions about the CRUSH details

2024-01-25 Thread Henry lol
Oh! That's why data imbalance occurs in Ceph.
I totally misunderstood Ceph's placement algorithm until just now.

Thank you for the detailed explanation :)
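
To check my understanding of the failover ordering you describe below,
here is how I picture it (a toy model with a stand-in hash; real CRUSH
walks the weighted hierarchy, so this only captures the flavor):

    import hashlib

    def candidates(pg: int, all_osds: list) -> list:
        # Every party derives the same PG-specific ordering of all OSDs,
        # so everyone agrees on who the next candidate is.
        return sorted(all_osds,
                      key=lambda o: hashlib.sha256(f"{pg}:{o}".encode()).digest())

    def acting_set(pg: int, all_osds: list, up: set, size: int = 3) -> list:
        # Walk the candidate list, skipping down OSDs, until `size` are
        # found; if OSD 4 dies, the next candidate (e.g. 51) takes over.
        return [o for o in candidates(pg, all_osds) if o in up][:size]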

Sincerely,

On Thu, Jan 25, 2024 at 9:32 PM, Janne Johansson wrote:
>
> On Thu, Jan 25, 2024 at 11:57 AM, Henry lol wrote:
> >
> > That's reasonable enough.
> > Actually, I expected the client to hold just thousands of
> > "PG-to-OSDs" mappings.
>
> Yes, but mapping a filename to a PG is done with a pseudorandom algorithm.
>
> > Is even that too heavy, so the client calculates locations on
> > demand instead?
>
> Yes, and I guess the client has some kind of algorithm that makes it
> possible to know that PG 1.a4 should be on OSDs 4, 93 and 44, but also
> that if 4 is missing, the next candidate would be 51, and if 93 isn't up
> either, then 66 would be the next logical OSD to contact for that copy,
> and so on. Since all parts (client, mons, OSDs) run the same code, when
> OSD 4 dies, 51 knows it needs to get a copy from either 93 or 44, and as
> soon as that copy is made, the PG stops being active+degraded but may
> become active+remapped, since it knows it wants to go back to OSD 4 if
> that OSD comes back with the same size again.
>
> > If a client with an outdated map sends a request to the wrong OSD,
> > does the OSD handle it somehow, through redirection or something?
>
> I think it would get told it has the wrong osdmap.
>
> > Lastly, are factors other than the CRUSH map, such as storage usage,
> > also considered in the CRUSH computation?
> > It seems the target OSD set isn't deterministic given the CRUSH map alone.
>
> It doesn't take OSD usage into consideration except at creation time or
> on OSD in/out/reweighting (or manual displacements with upmap and so
> forth). This is why "ceph df" will tell you a pool has X free space,
> where X is "the smallest free space among the OSDs on which this pool
> lies, times the number of OSDs". Given the pseudorandom placement of
> objects to PGs, there is nothing to prevent you from having the worst
> luck ever, with every object you create ending up on the OSD with the
> least free space.
>
> --
> May the most significant bit of your life be positive.
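
A quick worked example of the "ceph df" arithmetic above (all numbers
made up):

    # Free space per OSD under the pool, in GB (made-up values).
    osd_free_gb = {4: 400, 44: 750, 93: 520}

    # "smallest free space ... times the number of OSDs": the pool
    # reports 1200 GB even though the OSDs hold 1670 GB free in total,
    # because pseudorandom placement could keep hitting the fullest OSD.
    pool_free_gb = min(osd_free_gb.values()) * len(osd_free_gb)
    print(pool_free_gb)  # 1200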


[ceph-users] Questions about rbd flatten command

2024-04-01 Thread Henry lol
Hello,

I executed multiple 'rbd flatten' commands simultaneously on a client.
The elapsed time of each flatten job increased as the number of jobs
increased, and the network I/O was nearly saturated (a sketch of what I
ran is below).

So I have two questions.
1. Isn't the flatten job running within the Ceph cluster? Why is the
client-side network I/O so high?
2. How can I apply QoS to each flatten job to reduce its network I/O?
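
For reference, here is roughly what I ran, expressed with the Python
bindings instead of the CLI (pool and image names are placeholders):

    import rados
    import rbd
    from concurrent.futures import ThreadPoolExecutor

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    ioctx = cluster.open_ioctx('rbd')   # placeholder pool name

    def flatten(name):
        # flatten() copies all parent-backed data into the clone and
        # blocks until the image no longer references its parent.
        with rbd.Image(ioctx, name) as image:
            image.flatten()

    with ThreadPoolExecutor(max_workers=3) as executor:
        list(executor.map(flatten, ['clone-1', 'clone-2', 'clone-3']))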

Sincerely,


[ceph-users] Re: Questions about rbd flatten command

2024-04-01 Thread Henry lol
I'm not sure, but it seems that read and write operations are performed
for every object in the RBD image.
If so, is there any method to apply QoS to the flatten operation?
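
One idea I have not verified: librbd exposes QoS options such as
rbd_qos_bps_limit, which can be set per image via "conf_*" image
metadata (the same mechanism as `rbd config image set`). Whether they
throttle flatten traffic, as opposed to only ordinary image I/O, is an
open question to me (reusing ioctx from the sketch in my first message):

    # Untested sketch: cap the image at ~100 MB/s, assuming flatten I/O
    # honours librbd QoS at all (image name is a placeholder).
    with rbd.Image(ioctx, 'clone-1') as image:
        image.metadata_set('conf_rbd_qos_bps_limit', str(100 * 1024 * 1024))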

On Mon, Apr 1, 2024 at 11:59 PM, Henry lol wrote:
>
> Hello,
>
> I executed multiple 'rbd flatten' commands simultaneously on a client.
> The elapsed time of each flatten job increased as the number of jobs
> increased, and the network I/O was nearly saturated.
>
> So I have two questions.
> 1. Isn't the flatten job running within the Ceph cluster? Why is the
> client-side network I/O so high?
> 2. How can I apply QoS to each flatten job to reduce its network I/O?
>
> Sincerely,


[ceph-users] Re: Questions about rbd flatten command

2024-04-02 Thread Henry lol
Yes, they do.
Actually, the read/write ops will be skipped, as you said.

Also, is it possible to limit the maximum network throughput per flatten
operation or per image?
I want to avoid the scenario where flatten operations fully consume the
network bandwidth.


[ceph-users] How to Identify Bottlenecks in RBD job

2024-04-05 Thread Henry lol
Hello,

I executed a long-running 'rbd flatten' job on the client and expected
one of the resources, such as CPU, memory, or network I/O, to reach its
maximum usage.
However, none of them did. How can I find the bottleneck so I can
improve performance?

Specifically, the CPU usage of each msgr-worker thread of the rbd
process is under 60%, network usage is under 30%, and memory is barely
used.
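
For reference, this is roughly how I sampled the per-thread CPU usage
(psutil is a third-party module, and matching on the process name 'rbd'
is an assumption about how the job shows up):

    import time
    import psutil

    # Find the running 'rbd' process (assumes there is exactly one).
    proc = next(p for p in psutil.process_iter(['name'])
                if p.info['name'] == 'rbd')

    # Sample cumulative per-thread CPU time twice, 5 seconds apart.
    before = {t.id: t.user_time + t.system_time for t in proc.threads()}
    time.sleep(5)
    for t in proc.threads():
        busy = (t.user_time + t.system_time - before.get(t.id, 0)) / 5
        print(f"thread {t.id}: {100 * busy:.0f}% CPU")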


[ceph-users] Re: How to Identify Bottlenecks in RBD job

2024-04-05 Thread Henry lol
Of course, the Ceph cluster has sufficient capacity to handle the job.


[ceph-users] Any way to put the rate limit on rbd flatten operation?

2024-08-07 Thread Henry lol
Hello,

AFAIK, massive rx/tx occurs on the client side during a flatten operation,
so I want to either rate-limit that network traffic or predict the network
bandwidth it will consume.
Is there any way to do that?
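
For the prediction half, my own back-of-envelope, assuming (as discussed
in the earlier flatten thread) that the client reads each parent-backed
object and writes it back, so roughly twice the copied data crosses the
client NIC, with replication possibly multiplying the write side further:

    # Made-up numbers for a rough traffic estimate.
    parent_backed_gb = 100                    # data still backed by the parent
    client_traffic_gb = parent_backed_gb * 2  # read once + write once via client
    print(client_traffic_gb)                  # ~200 GB over the client NIC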