[ceph-users] Re: OSD hearbeat_check failure while using 10Gb/s

2024-06-17 Thread Phong Tran Thanh
Check the MTU between nodes first; ping with an MTU-sized payload to verify it.
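
For example, with a 9000-byte MTU on the cluster network (use -s 1472 for a
standard 1500 MTU), a quick end-to-end check from each node is:

    # 8972 = 9000 - 20 (IP header) - 8 (ICMP header); -M do forbids fragmentation
    ping -M do -s 8972 -c 3 <peer-cluster-ip>

If this fails while a plain ping works, an MTU mismatch on the new switch or on
a NIC is the likely cause of the heartbeat failures.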


On Mon, 17 Jun 2024 at 22:59, Sarunas Burdulis <saru...@math.dartmouth.edu> wrote:

> Hi,
>
> 6 host 16 OSD cluster here, all SATA SSDs. All Ceph daemons version
> 18.2.2. Host OS is Ubuntu 24.04. Intel X540 10Gb/s interfaces for
> cluster network. All is fine while using a 1Gb/s switch. When moved to
> 10Gb/s switch (Netgear XS712T), OSDs, one-by-one start failing heartbeat
> checks and are marked as 'down' until only 3 or 4 OSDs remain up. By
> then cluster is unusable (slow ops, PGs inactive).
>
> Here is a sample sequence from the log of one of the OSDs:
>
> ceph-osd[23402]: osd.3 77434 heartbeat_check: no reply from
> 129.170.x.x:6802 osd.13 ever on either front or back
>
> ceph-osd[23402]: log_channel(cluster) log [WRN] : 101 slow requests (by
> type [ 'delayed' : 101 ] most affected pool [ 'default.rgw.log' : 96 ])
>
> ceph-osd[23402]: log_channel(cluster) log [WRN] : Monitor daemon marked
> osd.3 down, but it is still running
>
> ceph-osd[23402]: log_channel(cluster) log [DBG] : map e77442 wrongly
> marked me down at e77441
>
> ceph-osd[23402]: osd.3 77442 start_waiting_for_healthy
>
> ceph-osd[23402]: osd.3 77434 is_healthy false -- only 0/10 up peers
> (less than 33%)
>
> ceph-osd[23402]: osd.3 77434 not healthy; waiting to boot
>
> OSD service container keeps running, but it is not booted.
>
> Has anyone experienced this? Any ideas on what should be fixed? Please
> let me know what other info would be useful.
>
> Best regards,
> --
> Sarunas Burdulis
> Dartmouth Mathematics
> math.dartmouth.edu/~sarunas
>
> · https://useplaintext.email ·
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 
Best regards,


*Tran Thanh Phong*

Email: tranphong...@gmail.com
Skype: tranphong079
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: About disk disk iops and ultil peak

2024-06-11 Thread Phong Tran Thanh
Hi Anthony,

I have 15 nodes, with 18 HDDs and 6 SSDs per node.

On Tue, 11 Jun 2024 at 10:29, Anthony D'Atri <anthony.da...@gmail.com> wrote:

> What specifically are your OSD devices?
>
> On Jun 10, 2024, at 22:23, Phong Tran Thanh 
> wrote:
>
> Hi ceph user!
>
> I am encountering a problem with IOPS and disk utilization of OSD.
> Sometimes, my disk peaks in IOPS and utilization become too high, which
> affects my cluster and causes slow operations to appear in the logs.
>
> 6/6/24 9:51:46 AM[WRN]Health check update: 0 slow ops, oldest one blocked
> for 36 sec, osd.268 has slow ops (SLOW_OPS)
>
> 6/6/24 9:51:37 AM[WRN]Health check update: 0 slow ops, oldest one blocked
> for 31 sec, osd.268 has slow ops (SLOW_OPS)
>
> 
> This is the config I used to reduce it, but it does not resolve my problem:
> global  advanced  osd_mclock_profile                                custom
> global  advanced  osd_mclock_scheduler_background_best_effort_lim   0.10
> global  advanced  osd_mclock_scheduler_background_best_effort_res   0.10
> global  advanced  osd_mclock_scheduler_background_best_effort_wgt   1
> global  advanced  osd_mclock_scheduler_background_recovery_lim      0.10
> global  advanced  osd_mclock_scheduler_background_recovery_res      0.10
> global  advanced  osd_mclock_scheduler_background_recovery_wgt      1
> global  advanced  osd_mclock_scheduler_client_lim                   0.40
> global  advanced  osd_mclock_scheduler_client_res                   0.40
> global  advanced  osd_mclock_scheduler_client_wgt                   4
>
> Hope someone can help me
>
> Thanks so much!
> --
>
> Email: tranphong...@gmail.com
> Skype: tranphong079
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
>

-- 
Best regards,


*Tran Thanh Phong*

Email: tranphong...@gmail.com
Skype: tranphong079
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: About disk disk iops and ultil peak

2024-06-11 Thread Phong Tran Thanh
Hi Anthony!

My OSDs are 12 TB 7200 rpm HDDs, with 960 GB SSDs for WAL/DB.

Thanks Anthony!

On Tue, 11 Jun 2024 at 10:29, Anthony D'Atri <anthony.da...@gmail.com> wrote:

> What specifically are your OSD devices?
>
> On Jun 10, 2024, at 22:23, Phong Tran Thanh 
> wrote:
>
> Hi ceph user!
>
> I am encountering a problem with IOPS and disk utilization of OSD.
> Sometimes, my disk peaks in IOPS and utilization become too high, which
> affects my cluster and causes slow operations to appear in the logs.
>
> 6/6/24 9:51:46 AM[WRN]Health check update: 0 slow ops, oldest one blocked
> for 36 sec, osd.268 has slow ops (SLOW_OPS)
>
> 6/6/24 9:51:37 AM[WRN]Health check update: 0 slow ops, oldest one blocked
> for 31 sec, osd.268 has slow ops (SLOW_OPS)
>
> 
> This is the config I used to reduce it, but it does not resolve my problem:
> global  advanced  osd_mclock_profile                                custom
> global  advanced  osd_mclock_scheduler_background_best_effort_lim   0.10
> global  advanced  osd_mclock_scheduler_background_best_effort_res   0.10
> global  advanced  osd_mclock_scheduler_background_best_effort_wgt   1
> global  advanced  osd_mclock_scheduler_background_recovery_lim      0.10
> global  advanced  osd_mclock_scheduler_background_recovery_res      0.10
> global  advanced  osd_mclock_scheduler_background_recovery_wgt      1
> global  advanced  osd_mclock_scheduler_client_lim                   0.40
> global  advanced  osd_mclock_scheduler_client_res                   0.40
> global  advanced  osd_mclock_scheduler_client_wgt                   4
>
> Hope someone can help me
>
> Thanks so much!
> --
>
> Email: tranphong...@gmail.com
> Skype: tranphong079
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
>

-- 
Best regards,


*Tran Thanh Phong*

Email: tranphong...@gmail.com
Skype: tranphong079
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] About placement group scrubbing state

2024-05-29 Thread Phong Tran Thanh
Hi everyone,

I want to ask about the placement group scrubbing state: my cluster has too
many PGs in the scrubbing state, and the number is increasing over time; maybe
scrubbing is taking too long.
The cluster itself reports no problem. I'm using the Reef version:
root@n1s1:~# ceph health detail
HEALTH_OK

If PGs stay in the scrubbing state for too long, what could be wrong with my
cluster? A scrubbing error, or some other problem?
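
Is checking with commands like these the right way to see what is actually
scrubbing and how it is limited (assuming standard Reef option names)?

    ceph pg ls scrubbing                          # PGs currently scrubbing / deep scrubbing
    ceph config get osd osd_max_scrubs            # concurrent scrubs allowed per OSD
    ceph config get osd osd_scrub_load_threshold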

Thanks to the community



*Tran Thanh Phong*

Email: tranphong...@gmail.com
Skype: tranphong079
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Recomand number of k and m erasure code

2024-01-15 Thread Phong Tran Thanh
Thanks Anthony for your knowledge.

I am very happy

On Sat, 13 Jan 2024 at 23:36, Anthony D'Atri <anthony.da...@gmail.com> wrote:

> There are nuances, but in general the higher the sum of m+k, the lower the
> performance, because *every* operation has to hit that many drives, which
> is especially impactful with HDDs.  So there’s a tradeoff between storage
> efficiency and performance.  And as you’ve seen, larger parity groups
> especially mean slower recovery/backfill.
>
> There’s also a modest benefit to choosing values of m and k that have
> small prime factors, but I wouldn’t worry too much about that.
>
>
> You can find EC efficiency tables on the net:
>
>
>
> https://docs.netapp.com/us-en/storagegrid-116/ilm/what-erasure-coding-schemes-are.html
>
>
> I should really add a table to the docs, making a note to do that.
>
> There’s a nice calculator at the OSNEXUS site:
>
> https://www.osnexus.com/ceph-designer
>
>
> The overhead factor is (k+m) / k
>
> So for a 4,2 profile, that’s 6 / 4 == 1.5
>
> For 6,2, 8 / 6 = 1.33
>
> For 10,2, 12 / 10 = 1.2
>
> and so forth.  As k increases, the incremental efficiency gain sees
> diminishing returns, but performance continues to decrease.
>
> Think of m as the number of copies you can lose without losing data, and
> m-1 as the number you can lose / have down and still have data *available*.
>
> I also suggest that the number of failure domains — in your cases this
> means OSD nodes — be *at least* k+m+1, so in your case you want k+m to be
> at most 9.
>
> With RBD and many CephFS implementations, we mostly have relatively large
> RADOS objects that are striped over many OSDs.
>
> When using RGW especially, one should attend to average and median S3
> object size.  There’s an analysis of the potential for space amplification
> in the docs so I won’t repeat it here in detail. This sheet
> https://docs.google.com/spreadsheets/d/1rpGfScgG-GLoIGMJWDixEkqs-On9w8nAUToPQjN8bDI/edit#gid=358760253
>  visually
> demonstrates this.
>
> Basically, for an RGW bucket pool — or for a CephFS data pool storing
> unusually small objects — if you have a lot of S3 objects in the multiples
> of KB size, you waste a significant fraction of underlying storage.  This
> is exacerbated by EC, and the larger the sum of k+m, the more waste.
>
> When people ask me about replication vs EC and EC profile, the first
> question I ask is what they’re storing.  When EC isn’t a non-starter, I
> tend to recommend 4,2 as a profile until / unless someone has specific
> needs and can understand the tradeoffs. This lets you store ~~ 2x the data
> of 3x replication while not going overboard on the performance hit.
>
> If you care about your data, do not set m=1.
>
> If you need to survive the loss of many drives, say if your cluster is
> across multiple buildings or sites, choose a larger value of k.  There are
> people running profiles like 4,6 because they have unusual and specific
> needs.
>
>
>
>
> On Jan 13, 2024, at 10:32 AM, Phong Tran Thanh 
> wrote:
>
> Hi ceph user!
>
> I need to determine which erasure code values (k and m) to choose for a
> Ceph cluster with 10 nodes.
>
> I am using the reef version with rbd. Furthermore, when using a larger k,
> for example, ec6+2 and ec4+2, which erasure coding performance is better,
> and what are the criteria for choosing the appropriate erasure coding?
> Please help me
>
> Email: tranphong...@gmail.com
> Skype: tranphong079
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
>

-- 
Best regards,


*Tran Thanh Phong*

Email: tranphong...@gmail.com
Skype: tranphong079


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Recomand number of k and m erasure code

2024-01-15 Thread Phong Tran Thanh
Dear Frank,

"For production systems I would recommend to use EC profiles with at least
m=3" -> can i set min_size with min_size=4 for ec4+2 it's ok for
productions? My data is video from the camera system, it's hot data, write
and delete in some day, 10-15 day ex... Read and write availability is more
important than data loss

Thanks Frank

On Mon, 15 Jan 2024 at 16:46, Frank Schilder wrote:

> I would like to add here a detail that is often overlooked:
> maintainability under degraded conditions.
>
> For production systems I would recommend to use EC profiles with at least
> m=3. The reason being that if you have a longer problem with a node that is
> down and m=2 it is not possible to do any maintenance on the system without
> loosing write access. Don't trust what users claim they are willing to
> tolerate - at least get it in writing. Once a problem occurs they will be
> at your door step no matter what they said before.
>
> Similarly, when doing a longer maintenance task and m=2, any disk failure
> during maintenance will imply losing write access.
>
> Having m=3 or larger allows for 2 (or larger) numbers of hosts/OSDs being
> unavailable simultaneously while service is fully operational. That can be
> a life saver in many situations.
>
> An additional reason for larger m is systematic failures of drives if your
> vendor doesn't mix drives from different batches and factories. If a batch
> has a systematic production error, failures are no longer statistically
> independent. In such a situation, if one drive fails, the likelihood that
> more drives fail at the same time is very high. Having a larger number of
> parity shards increases the chances of recovering from such events.
>
> For similar reasons I would recommend to deploy 5 MONs instead of 3. My
> life got so much better after having the extra redundancy.
>
> As some background, in our situation we experience(d) somewhat heavy
> maintenance operations including modifying/updating ceph nodes (hardware,
> not software), exchanging Racks, switches, cooling and power etc. This
> required longer downtime and/or moving of servers and moving the ceph
> hardware was the easiest compared with other systems due to the extra
> redundancy bits in it. We had no service outages during such operations.
>
> Best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: Anthony D'Atri 
> Sent: Saturday, January 13, 2024 5:36 PM
> To: Phong Tran Thanh
> Cc: ceph-users@ceph.io
> Subject: [ceph-users] Re: Recomand number of k and m erasure code
>
> There are nuances, but in general the higher the sum of m+k, the lower the
> performance, because *every* operation has to hit that many drives, which
> is especially impactful with HDDs.  So there’s a tradeoff between storage
> efficiency and performance.  And as you’ve seen, larger parity groups
> especially mean slower recovery/backfill.
>
> There’s also a modest benefit to choosing values of m and k that have
> small prime factors, but I wouldn’t worry too much about that.
>
>
> You can find EC efficiency tables on the net:
>
>
>
> https://docs.netapp.com/us-en/storagegrid-116/ilm/what-erasure-coding-schemes-are.html
>
>
> I should really add a table to the docs, making a note to do that.
>
> There’s a nice calculator at the OSNEXUS site:
>
> https://www.osnexus.com/ceph-designer
>
>
> The overhead factor is (k+m) / k
>
> So for a 4,2 profile, that’s 6 / 4 == 1.5
>
> For 6,2, 8 / 6 = 1.33
>
> For 10,2, 12 / 10 = 1.2
>
> and so forth.  As k increases, the incremental efficiency gain sees
> diminishing returns, but performance continues to decrease.
>
> Think of m as the number of copies you can lose without losing data, and
> m-1 as the number you can lose / have down and still have data *available*.
>
> I also suggest that the number of failure domains — in your cases this
> means OSD nodes — be *at least* k+m+1, so in your case you want k+m to be
> at most 9.
>
> With RBD and many CephFS implementations, we mostly have relatively large
> RADOS objects that are striped over many OSDs.
>
> When using RGW especially, one should attend to average and median S3
> object size.  There’s an analysis of the potential for space amplification
> in the docs so I won’t repeat it here in detail. This sheet
> https://docs.google.com/spreadsheets/d/1rpGfScgG-GLoIGMJWDixEkqs-On9w8nAUToPQjN8bDI/edit#gid=358760253
> visually demonstrates this.
>
> Basically, for an RGW bucket pool — or for a CephFS data pool storing
> unusually small objects — if you have a lot of S3 objects in the multiples
> of KB size, you waste a significant

[ceph-users] Recomand number of k and m erasure code

2024-01-13 Thread Phong Tran Thanh
Hi ceph user!

I need to determine which erasure code values (k and m) to choose for a
Ceph cluster with 10 nodes.

I am using the Reef version with RBD. Furthermore, when using a larger k, for
example EC 6+2 versus EC 4+2, which gives better performance, and what are the
criteria for choosing an appropriate erasure code profile?
Please help me
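
For reference, I plan to create the profile and pool roughly like this (the
profile, pool and image names and the PG count are just placeholders):

    ceph osd erasure-code-profile set ec-profile k=4 m=2 crush-failure-domain=host
    ceph osd pool create rbd-ec-data 128 128 erasure ec-profile
    ceph osd pool set rbd-ec-data allow_ec_overwrites true    # needed for RBD data on EC
    rbd create --size 100G --data-pool rbd-ec-data rbd/test-image   # image metadata stays in the replicated 'rbd' pool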

Email: tranphong...@gmail.com
Skype: tranphong079
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: About ceph disk slowops effect to cluster

2024-01-12 Thread Phong Tran Thanh
These can only be changed with a custom profile, not with the built-in
profiles; I am configuring them from the Ceph dashboard.

osd_mclock_scheduler_client_wgt=6 -> this is my setting
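
For reference, applying the same thing from the CLI looks roughly like this
(these are the values from my cluster; the fractional 0.2 values need 17.2.6
or newer, as you note below):

    ceph config set osd osd_mclock_profile custom
    ceph config set osd osd_mclock_scheduler_background_recovery_lim 0.2
    ceph config set osd osd_mclock_scheduler_background_recovery_res 0.2
    ceph config set osd osd_mclock_scheduler_client_wgt 6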

On Sat, 13 Jan 2024 at 02:19, Anthony D'Atri wrote:

>
>
> > On Jan 12, 2024, at 03:31, Phong Tran Thanh 
> wrote:
> >
> > Hi Yang and Anthony,
> >
> > I found the solution for this problem on a HDD disk 7200rpm
> >
> > When the cluster recovers, one or multiple disk failures because slowop
> > appears and then affects the cluster, we can change these configurations
> > and may reduce IOPS when recovery.
> > osd_mclock_profile=custom
> > osd_mclock_scheduler_background_recovery_lim=0.2
> > osd_mclock_scheduler_background_recovery_res=0.2
> > osd_mclock_scheduler_client_wgt
>
> This got cut off.  What value are you using for wgt?
>
> And how are you setting these?
>
> With 17.2.5 I get
>
> [rook@rook-ceph-tools-5ff8d58445-gkl5w /]$ ceph config set osd
> osd_mclock_scheduler_background_recovery_res 0.2
> Error EINVAL: error parsing value: strict_si_cast: unit prefix not
> recognized
>
> but with 17.2.6 it works.
>
> The wording isn't clear but I suspect this is a function of
> https://tracker.ceph.com/issues/57533
>
> >
> >
> > On Wed, 10 Jan 2024 at 11:22, David Yang wrote:
> >
> >> The 2*10Gbps shared network seems to be full (1.9GB/s).
> >> Is it possible to reduce part of the workload and wait for the cluster
> >> to return to a healthy state?
> >> Tip: Erasure coding needs to collect all data blocks when recovering
> >> data, so it takes up a lot of network card bandwidth and processor
> >> resources.
> >>
> >
> >
> > --
> > Best regards,
> >
> 
> >
> > *Tran Thanh Phong*
> >
> > Email: tranphong...@gmail.com
> > Skype: tranphong079
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
>

-- 
Best regards,


*Tran Thanh Phong*

Email: tranphong...@gmail.com
Skype: tranphong079
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: About ceph disk slowops effect to cluster

2024-01-12 Thread Phong Tran Thanh
Yes, it is better; it reduced the recovery rate from 4 GB/s to 200 MB/s.

On Fri, 12 Jan 2024 at 15:52, Szabo, Istvan (Agoda) <istvan.sz...@agoda.com> wrote:

> Is it better?
>
> Istvan Szabo
> Staff Infrastructure Engineer
> ---
> Agoda Services Co., Ltd.
> e: istvan.sz...@agoda.com
> ---
>
>
> ------
> *From:* Phong Tran Thanh 
> *Sent:* Friday, January 12, 2024 3:32 PM
> *To:* David Yang 
> *Cc:* ceph-users@ceph.io 
> *Subject:* [ceph-users] Re: About ceph disk slowops effect to cluster
>
> Email received from the internet. If in doubt, don't click any link nor
> open any attachment !
> 
>
> I update the config
> osd_mclock_profile=custom
> osd_mclock_scheduler_background_recovery_lim=0.2
> osd_mclock_scheduler_background_recovery_res=0.2
> osd_mclock_scheduler_client_wgt=6
>
> On Fri, 12 Jan 2024 at 15:31, Phong Tran Thanh <tranphong...@gmail.com> wrote:
>
> > Hi Yang and Anthony,
> >
> > I found the solution for this problem on a HDD disk 7200rpm
> >
> > When the cluster recovers, one or multiple disk failures because slowop
> > appears and then affects the cluster, we can change these configurations
> > and may reduce IOPS when recovery.
> > osd_mclock_profile=custom
> > osd_mclock_scheduler_background_recovery_lim=0.2
> > osd_mclock_scheduler_background_recovery_res=0.2
> > osd_mclock_scheduler_client_wgt
> >
> >
> > On Wed, 10 Jan 2024 at 11:22, David Yang wrote:
> >
> >> The 2*10Gbps shared network seems to be full (1.9GB/s).
> >> Is it possible to reduce part of the workload and wait for the cluster
> >> to return to a healthy state?
> >> Tip: Erasure coding needs to collect all data blocks when recovering
> >> data, so it takes up a lot of network card bandwidth and processor
> >> resources.
> >>
> >
> >
> > --
> > Best regards,
> >
> >
> 
> >
> > *Tran Thanh Phong*
> >
> > Email: tranphong...@gmail.com
> > Skype: tranphong079
> >
>
>
> --
> Best regards,
>
> 
>
> *Tran Thanh Phong*
>
> Email: tranphong...@gmail.com
> Skype: tranphong079
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
> --
> This message is confidential and is for the sole use of the intended
> recipient(s). It may also be privileged or otherwise protected by copyright
> or other legal rules. If you have received it by mistake please let us know
> by reply email and delete it from your system. It is prohibited to copy
> this message or disclose its content to anyone. Any confidentiality or
> privilege is not waived or lost by any mistaken delivery or unauthorized
> disclosure of the message. All messages sent to and from Agoda may be
> monitored to ensure compliance with company policies, to protect the
> company's interests and to remove potential malware. Electronic messages
> may be intercepted, amended, lost or deleted, or contain viruses.
>


-- 
Best regards,


*Tran Thanh Phong*

Email: tranphong...@gmail.com
Skype: tranphong079
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: About ceph disk slowops effect to cluster

2024-01-12 Thread Phong Tran Thanh
I updated the config:
osd_mclock_profile=custom
osd_mclock_scheduler_background_recovery_lim=0.2
osd_mclock_scheduler_background_recovery_res=0.2
osd_mclock_scheduler_client_wgt=6

On Fri, 12 Jan 2024 at 15:31, Phong Tran Thanh <tranphong...@gmail.com> wrote:

> Hi Yang and Anthony,
>
> I found the solution for this problem on a HDD disk 7200rpm
>
> When the cluster recovers, one or multiple disk failures because slowop
> appears and then affects the cluster, we can change these configurations
> and may reduce IOPS when recovery.
> osd_mclock_profile=custom
> osd_mclock_scheduler_background_recovery_lim=0.2
> osd_mclock_scheduler_background_recovery_res=0.2
> osd_mclock_scheduler_client_wgt
>
>
> On Wed, 10 Jan 2024 at 11:22, David Yang wrote:
>
>> The 2*10Gbps shared network seems to be full (1.9GB/s).
>> Is it possible to reduce part of the workload and wait for the cluster
>> to return to a healthy state?
>> Tip: Erasure coding needs to collect all data blocks when recovering
>> data, so it takes up a lot of network card bandwidth and processor
>> resources.
>>
>
>
> --
> Best regards,
>
> 
>
> *Tran Thanh Phong*
>
> Email: tranphong...@gmail.com
> Skype: tranphong079
>


-- 
Best regards,


*Tran Thanh Phong*

Email: tranphong...@gmail.com
Skype: tranphong079
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: About ceph disk slowops effect to cluster

2024-01-12 Thread Phong Tran Thanh
Hi Yang and Anthony,

I found a solution for this problem on 7200 rpm HDDs.

When the cluster is recovering from one or more disk failures, slow ops appear
and then affect the whole cluster; we can change these settings to reduce the
IOPS consumed by recovery:
osd_mclock_profile=custom
osd_mclock_scheduler_background_recovery_lim=0.2
osd_mclock_scheduler_background_recovery_res=0.2
osd_mclock_scheduler_client_wgt


On Wed, 10 Jan 2024 at 11:22, David Yang wrote:

> The 2*10Gbps shared network seems to be full (1.9GB/s).
> Is it possible to reduce part of the workload and wait for the cluster
> to return to a healthy state?
> Tip: Erasure coding needs to collect all data blocks when recovering
> data, so it takes up a lot of network card bandwidth and processor
> resources.
>


-- 
Best regards,


*Tran Thanh Phong*

Email: tranphong...@gmail.com
Skype: tranphong079
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] About ceph disk slowops effect to cluster

2024-01-06 Thread Phong Tran Thanh
Hi community
I'm currently facing a significant issue with my Ceph cluster. I have a
cluster consisting of 10 nodes, and each node is equipped with 6 SSDs of
960GB used for block.db and 18 12TB drives used for data, network bonding
2x10Gbps for public and local networks.

I am using a 4+2 erasure code for RBD in my Ceph cluster. When one node
becomes unavailable, the cluster initiates the recovery process, and
subsequently, slow operations (slowops) logs appear on the disk, impacting
the entire cluster. Afterward, additional nodes are marked as failures. Is
this phenomenon possibly due to the performance of SSDs and HDDs?

When I check the I/O of the disk using the iostat command, the result shows
that disk utilization has reached 80-90%
Is using a combination of HDDs and SAS SSDs in Ceph a choice leading to
poor performance?
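
For reference, that utilization check is something along the lines of

    iostat -x 5

watching the %util, r/s and w/s columns for the HDDs.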

My Ceph cluster has a bandwidth of 1.9GB/s
Thanks and hope someone can help me



Email: tranphong...@gmail.com
Skype: tranphong079
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] CEPH create an pool with 256 PGs stuck peering

2024-01-04 Thread Phong Tran Thanh
Hi community.
I am running a Ceph cluster with 10 nodes and 180 OSDs. I created an
erasure-coded 4+2 pool with 256 PGs, but the PGs are created very slowly and
their status is stuck at peering.

HEALTH_WARN Reduced data availability: 5 pgs inactive, 5 pgs peering
[WRN] PG_AVAILABILITY: Reduced data availability: 5 pgs inactive, 5 pgs
peering
pg 59.6b is stuck peering for 4m, current state creating+peering, last
acting [17,87,92,117,71,149]
pg 59.78 is stuck peering for 4m, current state creating+peering, last
acting [94,16,137,98,41,79]
pg 59.86 is stuck peering for 4m, current state creating+peering, last
acting [37,107,24,138,144,25]

and this is a pg query

 "recovery_state": [
{
"name": "Started/Primary/Peering/GetInfo",
"enter_time": "2024-01-04T11:02:09.208218+",
"requested_info_from": [
{
"osd": "101(4)"
}
]
},
{
"name": "Started/Primary/Peering",
"enter_time": "2024-01-04T11:02:09.208209+",
"past_intervals": [
{
"first": "0",
"last": "0",
"all_participants": [],
"intervals": []
}
],
"probing_osds": [
"0(3)",
"36(5)",
"74(2)",
"100(0)",
"101(4)",
"150(1)"
],
"down_osds_we_would_probe": [],
"peering_blocked_by": []
},
{
"name": "Started",
"enter_time": "2024-01-04T11:02:09.208161+"
}
],
"agent_state": {}

Why is the PG peering state so slow? Is it affected by the network?

My network uses LACP bonding with two 10 Gbps NICs.
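
In case it helps, I can gather more detail with commands such as (the PG ID is
one of the stuck PGs above):

    ceph pg ls peering          # list all PGs currently stuck in peering
    ceph pg 59.6b query         # full peering state for one PG, as shown above
    ceph osd blocked-by         # OSDs that other OSDs are waiting on while peering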

--

*Tran Thanh Phong*
Email: tranphong...@gmail.com
Skype: tranphong079
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] About slow query of Block-Images

2023-12-29 Thread Phong Tran Thanh
Hi community,

When I list RBD images in the Ceph dashboard (Block -> Images), the image list
is too slow to load. How can I make it faster?

I am using ceph reef version 18.2.1
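
As a comparison point, would timing the equivalent CLI listing, for example

    time rbd ls -l <pool-name>

tell me whether the slowness is in RBD itself or in the dashboard?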

Thanks to the community.

*Tran Thanh Phong*

Email: tranphong...@gmail.com
Skype: tranphong079
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: About lost disk with erasure code

2023-12-28 Thread Phong Tran Thanh
Dear Kai Stian Olstad,
Thank you for your information. It's good knowledge for me.

On Thu, 28 Dec 2023 at 15:06, Kai Stian Olstad <ceph+l...@olstad.com> wrote:

> On 27.12.2023 04:54, Phong Tran Thanh wrote:
> > Thank you for your knowledge. I have a question. Which pool is affected
> > when the PG is down, and how can I show it?
> > When a PG is down, is only one pool affected or are multiple pools
> > affected?
>
> If only 1 PG is down only 1 pool is affected.
> The name of a PG is {pool-num}.{pg-id} and the pools number you find
> with "ceph osd lspools".
>
> ceph health detail
> will show which PG is down and all other issues.
>
> ceph pg ls
> will show you all PG, their status and the OSD they are running on.
>
> Some useful links
>
> https://docs.ceph.com/en/quincy/rados/operations/monitoring-osd-pg/#monitoring-pg-states
> https://docs.ceph.com/en/quincy/rados/troubleshooting/troubleshooting-pg/
> https://docs.ceph.com/en/latest/dev/placement-group/#user-visible-pg-states
>
>
> --
> Kai Stian Olstad
>


-- 
Best regards,


*Tran Thanh Phong*

Email: tranphong...@gmail.com
Skype: tranphong079
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: About lost disk with erasure code

2023-12-26 Thread Phong Tran Thanh
Thank you for your knowledge. I have a question. Which pool is affected
when the PG is down, and how can I show it?
When a PG is down, is only one pool affected or are multiple pools affected?


On Tue, 26 Dec 2023 at 16:15, Janne Johansson <icepic...@gmail.com> wrote:

> On Tue, 26 Dec 2023 at 08:45, Phong Tran Thanh <tranphong...@gmail.com> wrote:
> >
> > Hi community,
> >
> > I am running ceph with block rbd with 6 nodes, erasure code 4+2 with
> > min_size of pool is 4.
> >
> > When three osd is down, and an PG is state down, some pools is can't
> write
> > data, suppose three osd can't start and pg stuck in down state, how i can
> > delete or recreate pg to replace down pg or another way to allow pool to
> > write/read data?
>
>
> Depending on how the data is laid out in this pool, you might lose
> more or less all data from it.
>
> RBD images get split into pieces of 2 or 4M sizes, so that those
> pieces end up on different PGs,
> which in turn makes them end up on different OSDs and this allows for
> load balancing over the
> whole cluster, but also means that if you lose some PGs on a 40G RBD
> image (made up of 10k
> pieces), chances are very high that the lost PG did contain one or
> more of those 10k pieces.
>
> So lost PGs would probably mean that every RBD image of decent sizes
> will have holes in them,
> and how this affects all the instances that mount the images will be
> hard to tell.
> If at all possible, try to use the offline OSD tools to try to get
> this PG out of one of the bad OSDs.
>
> https://hawkvelt.id.au/post/2022-4-5-ceph-pg-export-import/ might
> help, to see how to run
> the export + import commands.
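>
> Roughly, with both the source and target OSDs stopped, the export/import is
> (the data path, PG ID and target OSD below are placeholders):
>
>   ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-NN \
>       --pgid <pgid> --op export --file /tmp/<pgid>.export
>   ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-MM \
>       --op import --file /tmp/<pgid>.export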
>
> If you can get it out, it can be injected (imported) into any other
> running OSD and then replicas
> will be recreated and moved to where they should be.
>
> If you have disks to spare, make sure to do full copies of the broken
> OSDs and work in the copies
> instead, to maximize the chances of restoring your data.
>
> If you are very sure that these three OSDs are never coming back, and
> have marked the OSDs
> as lost, then I guess
>
> ceph pg force_create_pg 
>
> would be the next step to have the cluster create empty PGs to replace
> the lost ones, but I would
> consider this only after trying all the possible options for repairing
> at least one of the OSDs that held
> the PGs that are missing.
>
> --
> May the most significant bit of your life be positive.
>


-- 
Best regards,


*Tran Thanh Phong*

Email: tranphong...@gmail.com
Skype: tranphong079
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] About lost disk with erasure code

2023-12-25 Thread Phong Tran Thanh
Hi community,

I am running Ceph with RBD block storage on 6 nodes, using erasure code 4+2
with a pool min_size of 4.

When three OSDs are down and a PG is in the down state, some pools cannot
write data. Suppose the three OSDs cannot be started and the PG is stuck in
the down state: how can I delete or recreate the PG to replace the down one,
or is there another way to allow the pool to read/write data again?

Thanks to the community

*Tran Thanh Phong*

Email: tranphong...@gmail.com
Skype: tranphong079
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph osd dump_historic_ops

2023-12-01 Thread Phong Tran Thanh
It works!!!

Thanks Kai Stian Olstad

On Fri, 1 Dec 2023 at 17:06, Kai Stian Olstad <ceph+l...@olstad.com> wrote:

> On Fri, Dec 01, 2023 at 04:33:20PM +0700, Phong Tran Thanh wrote:
> >I have a problem with my osd, i want to show dump_historic_ops of osd
> >I follow the guide:
> >
> https://www.ibm.com/docs/en/storage-fusion/2.6?topic=alerts-cephosdslowops
> >But when i run command
> >
> >ceph daemon osd.8 dump_historic_ops show the error, the command run on
> node
> >with osd.8
> >Can't get admin socket path: unable to get conf option admin_socket for
> >osd: b"error parsing 'osd': expected string of the form TYPE.ID, valid
> >types are: auth, mon, osd, mds, mgr, client\n"
> >
> >I am running ceph cluster reef version by cephadmin install
> >
> >What should I do?
>
> The easiest is use tell, then you can run it on any node that have access
> to ceph.
>
>  ceph tell osd.8 dump_historic_ops
>
>
>  ceph tell osd.8 help
> will give you all you can do with tell.
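>
> If you do want the "ceph daemon" form under cephadm, it has to run inside the
> daemon's container, where the admin socket lives, for example:
>
>   cephadm enter --name osd.8
>   ceph daemon osd.8 dump_historic_ops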
>
> --
> Kai Stian Olstad
>


-- 
Best regards,


*Tran Thanh Phong*

Email: tranphong...@gmail.com
Skype: tranphong079
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] ceph osd dump_historic_ops

2023-12-01 Thread Phong Tran Thanh
Hi community,

I have a problem with my OSD: I want to show dump_historic_ops for an OSD.
I followed this guide:
https://www.ibm.com/docs/en/storage-fusion/2.6?topic=alerts-cephosdslowops
But when I run the following command on the node hosting osd.8,

ceph daemon osd.8 dump_historic_ops

it shows this error:
Can't get admin socket path: unable to get conf option admin_socket for
osd: b"error parsing 'osd': expected string of the form TYPE.ID, valid
types are: auth, mon, osd, mds, mgr, client\n"

I am running a Reef Ceph cluster deployed with cephadm.

What should I do?

Thank you.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io