[ceph-users] Re: FS down - mds degraded

2024-02-22 Thread nguyenvandiep
Hi Mr. Patrick,

We are in the same situation as Sake; now my MDS has crashed and the NFS service is down,
with CephFS not responding. Here is my "ceph -s" result:

health: HEALTH_WARN
3 failed cephadm daemon(s)
1 filesystem is degraded
insufficient standby MDS daemons available
 data:
volumes: 0/1 healthy, 1 recovering
pools:   15 pools, 1457 pgs
pgs: 15664126/110662485 objects misplaced (14.155%)
 1110 active+clean
 305  active+remapped+backfill_wait
 17   active+remapped+backfilling
 13   active+remapped+backfill_toofull


Could you help me understand the volume status "recovering"? What is it, and how
can we track the progress of this?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-22 Thread nguyenvandiep
Could you please help me understand the volume status "recovering"? What is it,
and do we need to wait for the volume recovery to finish?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: High IO utilization for bstore_kv_sync

2024-02-22 Thread Anthony D'Atri



>  you can sometimes find really good older drives like Intel P4510s on ebay 
> for reasonable prices.  Just watch out for how much write wear they have on 
> them.

Also be sure to update to the latest firmware before use, then issue a Secure 
Erase.
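
A minimal sketch of that with nvme-cli, assuming an NVMe drive at /dev/nvme0n1 (the device name is a placeholder, and the firmware update itself usually needs vendor tooling):

nvme list                          # shows model, serial and current firmware revision
nvme format /dev/nvme0n1 --ses=1   # secure erase of user data -- destroys everything on the drive
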
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: High IO utilization for bstore_kv_sync

2024-02-22 Thread Mark Nelson
The biggest improvement would be to put all of the OSDs on SSDs with 
PLP.  Next would be to put the WAL/DB on drives with PLP.  If price is a 
concern,  you can sometimes find really good older drives like Intel 
P4510s on ebay for reasonable prices.  Just watch out for how much write 
wear they have on them.
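
As a rough sketch of what that can look like with ceph-volume (device paths are placeholders; cephadm-managed clusters would express the same thing in an OSD service spec instead):

# create an OSD with its data on a SATA/SAS SSD and its RocksDB/WAL on an NVMe partition
ceph-volume lvm create --data /dev/sdb --block.db /dev/nvme0n1p1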



I had an experimental PR that I was playing with to see if I could queue 
up more IO at once in the bstore_kv_sync thread here:


https://github.com/ceph/ceph/pull/50610


I didn't have the proper gear to test it though so it just kind of 
languished and was closed by the bot.  The idea was just a proof of 
concept to see if we could reduce the number of fdatasyncs by manually 
introducing latency and letting more IOs accumulate before doing a flush.



Mark


On 2/22/24 11:29, Work Ceph wrote:

Thanks for the prompt response!

I see, and indeed some of them are consumer SSD disks. Is there any 
parameter that we can change/tune to better handle the call "fdatsync"?


Maybe using NVMEs for the RocksDB?

On Thu, Feb 22, 2024 at 2:24 PM Mark Nelson  wrote:

Most likely you are seeing time spent waiting on fdatsync in
bstore_kv_sync if the drives you are using don't have power loss
protection and can't perform flushes quickly.  Some consumer grade
drives are actually slower at this than HDDs.


Mark


On 2/22/24 11:04, Work Ceph wrote:
> Hello guys,
> We are running Ceph Octopus on Ubuntu 18.04, and we are noticing
spikes of
> IO utilization for bstore_kv_sync thread during processes such
as adding a
> new pool and increasing/reducing the number of PGs in a pool.
>
> It is funny though that the IO utilization (reported with IOTOP)
is 99.99%,
> but the reading for R/W speeds are slow. The devices where we
are seeing
> these issues are all SSDs systems. We are not using high end
SSDs devices
> though.
>
> Have you guys seen such behavior?
>
> Also, do you guys have any clues on why the IO utilization would
be high,
> when there is such a small amount of data being read and written
to the
> OSD/disks?
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

-- 
Best Regards,

Mark Nelson
Head of Research and Development

Clyso GmbH
p: +49 89 21552391 12 | a: Minnesota, USA
w: https://clyso.com | e: mark.nel...@clyso.com

We are hiring: https://www.clyso.com/jobs/
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


--
Best Regards,
Mark Nelson
Head of Research and Development

Clyso GmbH
p: +49 89 21552391 12 | a: Minnesota, USA
w: https://clyso.com | e: mark.nel...@clyso.com

We are hiring: https://www.clyso.com/jobs/
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: High IO utilization for bstore_kv_sync

2024-02-22 Thread Work Ceph
Thanks for the prompt response!

I see, and indeed some of them are consumer SSDs. Is there any
parameter that we can change/tune to better handle the "fdatasync" call?

Maybe using NVMEs for the RocksDB?

On Thu, Feb 22, 2024 at 2:24 PM Mark Nelson  wrote:

> Most likely you are seeing time spent waiting on fdatsync in
> bstore_kv_sync if the drives you are using don't have power loss
> protection and can't perform flushes quickly.  Some consumer grade
> drives are actually slower at this than HDDs.
>
>
> Mark
>
>
> On 2/22/24 11:04, Work Ceph wrote:
> > Hello guys,
> > We are running Ceph Octopus on Ubuntu 18.04, and we are noticing spikes
> of
> > IO utilization for bstore_kv_sync thread during processes such as adding
> a
> > new pool and increasing/reducing the number of PGs in a pool.
> >
> > It is funny though that the IO utilization (reported with IOTOP) is
> 99.99%,
> > but the reading for R/W speeds are slow. The devices where we are seeing
> > these issues are all SSDs systems. We are not using high end SSDs devices
> > though.
> >
> > Have you guys seen such behavior?
> >
> > Also, do you guys have any clues on why the IO utilization would be high,
> > when there is such a small amount of data being read and written to the
> > OSD/disks?
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
> --
> Best Regards,
> Mark Nelson
> Head of Research and Development
>
> Clyso GmbH
> p: +49 89 21552391 12 | a: Minnesota, USA
> w: https://clyso.com | e: mark.nel...@clyso.com
>
> We are hiring: https://www.clyso.com/jobs/
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: High IO utilization for bstore_kv_sync

2024-02-22 Thread Mark Nelson
Most likely you are seeing time spent waiting on fdatasync in 
bstore_kv_sync if the drives you are using don't have power loss 
protection and can't perform flushes quickly.  Some consumer grade 
drives are actually slower at this than HDDs.
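
One common way to check this is a single-threaded 4k sync-write test with fio; drives with PLP usually sustain thousands of IOPS here, while many consumer drives drop to a few hundred or less. This is only a sketch and it writes to the raw device, so only run it against an empty scratch drive (/dev/sdX is a placeholder):

fio --name=sync-write-test --filename=/dev/sdX --direct=1 --sync=1 \
    --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting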



Mark


On 2/22/24 11:04, Work Ceph wrote:

Hello guys,
We are running Ceph Octopus on Ubuntu 18.04, and we are noticing spikes of
IO utilization for bstore_kv_sync thread during processes such as adding a
new pool and increasing/reducing the number of PGs in a pool.

It is funny though that the IO utilization (reported with IOTOP) is 99.99%,
but the reading for R/W speeds are slow. The devices where we are seeing
these issues are all SSDs systems. We are not using high end SSDs devices
though.

Have you guys seen such behavior?

Also, do you guys have any clues on why the IO utilization would be high,
when there is such a small amount of data being read and written to the
OSD/disks?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


--
Best Regards,
Mark Nelson
Head of Research and Development

Clyso GmbH
p: +49 89 21552391 12 | a: Minnesota, USA
w: https://clyso.com | e: mark.nel...@clyso.com

We are hiring: https://www.clyso.com/jobs/
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] High IO utilization for bstore_kv_sync

2024-02-22 Thread Work Ceph
Hello guys,
We are running Ceph Octopus on Ubuntu 18.04, and we are noticing spikes of
IO utilization for bstore_kv_sync thread during processes such as adding a
new pool and increasing/reducing the number of PGs in a pool.

It is funny though that the IO utilization (reported with iotop) is 99.99%,
but the reported R/W speeds are low. The devices where we are seeing
these issues are all SSD systems. We are not using high-end SSDs,
though.

Have you guys seen such behavior?

Also, do you guys have any clues on why the IO utilization would be high,
when there is such a small amount of data being read and written to the
OSD/disks?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cannot start ceph after maintenence

2024-02-22 Thread Schweiss, Chip
The problem turns out to be burning the candle at both ends. I had been
checking network communication for the past few hours and hadn't realized
I was using my 1Gb IPs, not the 100Gb IPs. The 100Gb links got connected to the
wrong ports during the cable move.

Thanks for the attempted assists. Focusing on the mons at least
eventually led to finding the error.

-Chip

On Thu, Feb 22, 2024 at 7:26 AM Schweiss, Chip  wrote:

> I had to temporarily disconnect the network on my entire Ceph cluster, so
> I prepared the cluster by following what appears to be some incomplete
> advice.
>
> I did the following before disconnecting the network:
> #ceph osd set noout
> #ceph osd set norecover
> #ceph osd set norebalance
> #ceph osd set nobackfill
> #ceph osd set nodown
> #ceph osd set pause
>
> Now, all the ceph services are still running, but I cannot undo any flags:
>
> root@proxmox01:~# ceph osd unset pause
> 2024-02-22T13:16:02.220+ 7f0aab5a26c0  0 monclient(hunting):
> authenticate timed out after 300
>
> [errno 110] RADOS timed out (error connecting to the cluster)
>
> Any advice on how to recover would be greatly appreciated.
>
> Thank you,
> -Chip
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Scrub stuck and 'pg has invalid (post-split) stat'

2024-02-22 Thread Eugen Block

Hi,


Have you already tried to set the primary PG out and wait for the
backfill to finish?


Of course I meant the primary OSD for that PG, I hope that was clear. ;-)
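
For reference, a minimal sketch of that suggestion (the PG id and OSD id are placeholders):

ceph pg map <pgid>        # the first OSD in the acting set is the primary
ceph osd out <osd-id>     # let backfill move the data off that OSD
# ...wait for backfill to finish...
ceph osd in <osd-id>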


We are thinking about the use of "ceph pg_mark_unfound_lost revert"


I'm not a developer, but the way I read the code [2], such an
action actually caused the invalid state in the first place. The
function void PrimaryLogPG::mark_all_unfound_lost handles two cases:


case pg_log_entry_t::LOST_REVERT:
...
case pg_log_entry_t::LOST_DELETE:

And after those cases updates the stats and marks them invalid:

  recovery_state.update_stats(
[](auto &history, auto &stats) {
  stats.stats_invalid = true;
  return false;
});

But according to the scrubbing code [3] it would update the invalid  
stats when finishing the scrub:


  if (info.stats.stats_invalid) {
m_pl_pg->recovery_state.update_stats([=](auto& history, auto& stats) {
  stats.stats = m_scrub_cstat;
  stats.stats_invalid = false;
  return false;
});

So from my understanding, the PG should be "scrubbable"; I don't
really understand why the scrub doesn't complete. Did you already send the overall
cluster status (ceph -s)? And maybe save the entire query output to
a file and attach it?


[2] https://github.com/ceph/ceph/blob/v16.2.13/src/osd/PrimaryLogPG.cc#L12407
[3] https://github.com/ceph/ceph/blob/v16.2.13/src/osd/PrimaryLogScrub.cc#L54
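
A quick way to check whether that flag actually clears after a scrub attempt, assuming jq is available (the PG id is a placeholder):

ceph pg <pgid> query | jq '.info.stats.stats_invalid'
# or, without jq:
ceph pg <pgid> query | grep stats_invalid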

Zitat von Cedric :


On Thu, Feb 22, 2024 at 12:37 PM Eugen Block  wrote:

You haven't told yet if you changed the hit_set_count to 0.


Not yet, we will give it a try ASAP


Have you already tried to set the primary PG out and wait for the
backfill to finish?


No, we will try also


And another question, are all services running pacific already and on
the same version (ceph versions)?


Yes, all daemon runs 16.2.13



Zitat von Cedric :

> Yes the osd_scrub_invalid_stats is set to true.
>
> We are thinking about the use of "ceph pg_mark_unfound_lost revert"
> action, but we wonder if there is a risk of data loss.
>
> On Thu, Feb 22, 2024 at 11:50 AM Eugen Block  wrote:
>>
>> I found a config to force scrub invalid PGs, what is your current
>> setting on that?
>>
>> ceph config get osd osd_scrub_invalid_stats
>> true
>>
>> The config reference states:
>>
>> > Forces extra scrub to fix stats marked as invalid.
>>
>> But the default seems to be true, so I'd expect it's true in your case
>> as well?
>>
>> Zitat von Cedric :
>>
>> > Thanks Eugen for the suggestion, yes we have tried, also repeering
>> > concerned PGs, still the same issue.
>> >
>> > Looking at the code it seems the split-mode message is triggered when
>> > the PG as ""stats_invalid": true,", here is the result of a query:
>> >
>> > "stats_invalid": true,
>> > "dirty_stats_invalid": false,
>> > "omap_stats_invalid": false,
>> > "hitset_stats_invalid": false,
>> > "hitset_bytes_stats_invalid": false,
>> > "pin_stats_invalid": false,
>> > "manifest_stats_invalid": false,
>> >
>> > I also provide again cluster informations that was lost in previous
>> > missed reply all. Don't hesitate to ask more if needed I would be
>> > glade to provide them.
>> >
>> > Cédric
>> >
>> >
>> > On Thu, Feb 22, 2024 at 11:04 AM Eugen Block  wrote:
>> >>
>> >> Hm, I wonder if setting (and unsetting after a while) noscrub and
>> >> nodeep-scrub has any effect. Have you tried that?
>> >>
>> >> Zitat von Cedric :
>> >>
>> >> > Update: we have run fsck and re-shard on all bluestore volume, seems
>> >> > sharding were not applied.
>> >> >
>> >> > Unfortunately scrubs and deep-scrubs are still stuck on PGs of the
>> >> > pool that is suffering the issue, but other PGs scrubs well.
>> >> >
>> >> > The next step will be to remove the cache tier as suggested, but its
>> >> > not available yet as PGs needs to be scrubbed in order for the cache
>> >> > tier can be activated.
>> >> >
>> >> > As we are struggling to make this cluster works again, any help
>> >> > would be greatly appreciated.
>> >> >
>> >> > Cédric
>> >> >
>> >> >> On 20 Feb 2024, at 20:22, Cedric  wrote:
>> >> >>
>> >> >> Thanks Eugen, sorry about the missed reply to all.
>> >> >>
>> >> >> The reason we still have the cache tier is because we were not able
>> >> >> to flush all dirty entry to remove it (as per the procedure), so
>> >> >> the cluster as been migrated from HDD/SSD to NVME a while ago but
>> >> >> tiering remains, unfortunately.
>> >> >>
>> >> >> So actually we are trying to understand the root cause
>> >> >>
>> >> >> On Tue, Feb 20, 2024 at 1:43 PM Eugen Block  wrote:
>> >> >>>
>> >> >>> Please don't drop the list from your response.
>> >> >>>
>> >> >>> The first question coming to mind is, why do you have a  
cache-tier if

>> >> >>> all your pools are on nvme decices anyway? I don't see any
>> benefit here.
>> >> >>> Did you try the suggested workaround and disable the cache-tier?
>> >> >>>
>> >> >>> Zit

[ceph-users] Re: Size return by df

2024-02-22 Thread Konstantin Shalygin
Hi,

Yes you can; this is controlled by the option

client quota df = false
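
For example, for ceph-fuse/libcephfs clients this can be set centrally (the option name in "ceph config" form is client_quota_df); clients may need a remount to pick it up:

ceph config set client client_quota_df false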


k
Sent from my iPhone

> On Feb 22, 2024, at 11:17, Albert Shih  wrote:
> 
> Is they are any way to keep the first answer ?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cannot start ceph after maintenence

2024-02-22 Thread Stephan Hohn
Hi Chip,

Looks like not all mons are up, or they cannot reach each other over the network to
form a quorum.
Make sure all nodes can reach each other and check the mon logs.


Furthermore, some info about

pvecm status
pveceph status

or just

ceph status

would be helpful
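
And once the mons can form quorum again, the maintenance flags from your original mail can be cleared in reverse order, e.g.:

ceph osd unset pause
ceph osd unset nodown
ceph osd unset nobackfill
ceph osd unset norebalance
ceph osd unset norecover
ceph osd unset noout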

Cheers

Stephan


Am Do., 22. Feb. 2024 um 14:28 Uhr schrieb Schweiss, Chip <
c...@innovates.com>:

> I had to temporarily disconnect the network on my entire Ceph cluster, so I
> prepared the cluster by following what appears to be some incomplete
> advice.
>
> I did the following before disconnecting the network:
> #ceph osd set noout
> #ceph osd set norecover
> #ceph osd set norebalance
> #ceph osd set nobackfill
> #ceph osd set nodown
> #ceph osd set pause
>
> Now, all the ceph services are still running, but I cannot undo any flags:
>
> root@proxmox01:~# ceph osd unset pause
> 2024-02-22T13:16:02.220+ 7f0aab5a26c0  0 monclient(hunting):
> authenticate timed out after 300
>
> [errno 110] RADOS timed out (error connecting to the cluster)
>
> Any advice on how to recover would be greatly appreciated.
>
> Thank you,
> -Chip
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Cannot start ceph after maintenence

2024-02-22 Thread Schweiss, Chip
I had to temporarily disconnect the network on my entire Ceph cluster, so I
prepared the cluster by following what appears to be some incomplete
advice.

I did the following before disconnecting the network:
#ceph osd set noout
#ceph osd set norecover
#ceph osd set norebalance
#ceph osd set nobackfill
#ceph osd set nodown
#ceph osd set pause

Now, all the ceph services are still running, but I cannot undo any flags:

root@proxmox01:~# ceph osd unset pause
2024-02-22T13:16:02.220+ 7f0aab5a26c0  0 monclient(hunting):
authenticate timed out after 300

[errno 110] RADOS timed out (error connecting to the cluster)

Any advice on how to recover would be greatly appreciated.

Thank you,
-Chip
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Scrub stuck and 'pg has invalid (post-split) stat'

2024-02-22 Thread Cedric
On Thu, Feb 22, 2024 at 12:37 PM Eugen Block  wrote:
> You haven't told yet if you changed the hit_set_count to 0.

Not yet, we will give it a try ASAP

> Have you already tried to set the primary PG out and wait for the
> backfill to finish?

No, we will try also

> And another question, are all services running pacific already and on
> the same version (ceph versions)?

Yes, all daemons run 16.2.13

>
> Zitat von Cedric :
>
> > Yes the osd_scrub_invalid_stats is set to true.
> >
> > We are thinking about the use of "ceph pg_mark_unfound_lost revert"
> > action, but we wonder if there is a risk of data loss.
> >
> > On Thu, Feb 22, 2024 at 11:50 AM Eugen Block  wrote:
> >>
> >> I found a config to force scrub invalid PGs, what is your current
> >> setting on that?
> >>
> >> ceph config get osd osd_scrub_invalid_stats
> >> true
> >>
> >> The config reference states:
> >>
> >> > Forces extra scrub to fix stats marked as invalid.
> >>
> >> But the default seems to be true, so I'd expect it's true in your case
> >> as well?
> >>
> >> Zitat von Cedric :
> >>
> >> > Thanks Eugen for the suggestion, yes we have tried, also repeering
> >> > concerned PGs, still the same issue.
> >> >
> >> > Looking at the code it seems the split-mode message is triggered when
> >> > the PG as ""stats_invalid": true,", here is the result of a query:
> >> >
> >> > "stats_invalid": true,
> >> > "dirty_stats_invalid": false,
> >> > "omap_stats_invalid": false,
> >> > "hitset_stats_invalid": false,
> >> > "hitset_bytes_stats_invalid": false,
> >> > "pin_stats_invalid": false,
> >> > "manifest_stats_invalid": false,
> >> >
> >> > I also provide again cluster informations that was lost in previous
> >> > missed reply all. Don't hesitate to ask more if needed I would be
> >> > glade to provide them.
> >> >
> >> > Cédric
> >> >
> >> >
> >> > On Thu, Feb 22, 2024 at 11:04 AM Eugen Block  wrote:
> >> >>
> >> >> Hm, I wonder if setting (and unsetting after a while) noscrub and
> >> >> nodeep-scrub has any effect. Have you tried that?
> >> >>
> >> >> Zitat von Cedric :
> >> >>
> >> >> > Update: we have run fsck and re-shard on all bluestore volume, seems
> >> >> > sharding were not applied.
> >> >> >
> >> >> > Unfortunately scrubs and deep-scrubs are still stuck on PGs of the
> >> >> > pool that is suffering the issue, but other PGs scrubs well.
> >> >> >
> >> >> > The next step will be to remove the cache tier as suggested, but its
> >> >> > not available yet as PGs needs to be scrubbed in order for the cache
> >> >> > tier can be activated.
> >> >> >
> >> >> > As we are struggling to make this cluster works again, any help
> >> >> > would be greatly appreciated.
> >> >> >
> >> >> > Cédric
> >> >> >
> >> >> >> On 20 Feb 2024, at 20:22, Cedric  wrote:
> >> >> >>
> >> >> >> Thanks Eugen, sorry about the missed reply to all.
> >> >> >>
> >> >> >> The reason we still have the cache tier is because we were not able
> >> >> >> to flush all dirty entry to remove it (as per the procedure), so
> >> >> >> the cluster as been migrated from HDD/SSD to NVME a while ago but
> >> >> >> tiering remains, unfortunately.
> >> >> >>
> >> >> >> So actually we are trying to understand the root cause
> >> >> >>
> >> >> >> On Tue, Feb 20, 2024 at 1:43 PM Eugen Block  wrote:
> >> >> >>>
> >> >> >>> Please don't drop the list from your response.
> >> >> >>>
> >> >> >>> The first question coming to mind is, why do you have a cache-tier 
> >> >> >>> if
> >> >> >>> all your pools are on nvme decices anyway? I don't see any
> >> benefit here.
> >> >> >>> Did you try the suggested workaround and disable the cache-tier?
> >> >> >>>
> >> >> >>> Zitat von Cedric :
> >> >> >>>
> >> >>  Thanks Eugen, see attached infos.
> >> >> 
> >> >>  Some more details:
> >> >> 
> >> >>  - commands that actually hangs: ceph balancer status ; rbd
> >> -p vms ls ;
> >> >>  rados -p vms_cache cache-flush-evict-all
> >> >>  - all scrub running on vms_caches pgs are stall / start in a loop
> >> >>  without actually doing anything
> >> >>  - all io are 0 both from ceph status or iostat on nodes
> >> >> 
> >> >>  On Tue, Feb 20, 2024 at 10:00 AM Eugen Block  wrote:
> >> >> >
> >> >> > Hi,
> >> >> >
> >> >> > some more details would be helpful, for example what's the
> >> pool size
> >> >> > of the cache pool? Did you issue a PG split before or during the
> >> >> > upgrade? This thread [1] deals with the same problem, the 
> >> >> > described
> >> >> > workaround was to set hit_set_count to 0 and disable the
> >> cache layer
> >> >> > until that is resolved. Afterwards you could enable the cache 
> >> >> > layer
> >> >> > again. But keep in mind that the code for cache tier is entirely
> >> >> > removed in Reef (IIRC).
> >> >> >
> >> >> > Regards,
> >> >> > Eugen
> >> >> >
> >> >> > [1]
> >> >>

[ceph-users] Re: Scrub stuck and 'pg has invalid (post-split) stat'

2024-02-22 Thread Eugen Block

We are thinking about the use of "ceph pg_mark_unfound_lost revert"
action, but we wonder if there is a risk of data loss.


You don't seem to have unfound objects so I don't think that command  
would make sense.

You haven't told yet if you changed the hit_set_count to 0.
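
For reference, that would be something along the lines of (vms_cache being the cache pool named earlier in this thread):

ceph osd pool set vms_cache hit_set_count 0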

Have you already tried to set the primary PG out and wait for the  
backfill to finish?
And another question, are all services running pacific already and on  
the same version (ceph versions)?


Zitat von Cedric :


Yes the osd_scrub_invalid_stats is set to true.

We are thinking about the use of "ceph pg_mark_unfound_lost revert"
action, but we wonder if there is a risk of data loss.

On Thu, Feb 22, 2024 at 11:50 AM Eugen Block  wrote:


I found a config to force scrub invalid PGs, what is your current
setting on that?

ceph config get osd osd_scrub_invalid_stats
true

The config reference states:

> Forces extra scrub to fix stats marked as invalid.

But the default seems to be true, so I'd expect it's true in your case
as well?

Zitat von Cedric :

> Thanks Eugen for the suggestion, yes we have tried, also repeering
> concerned PGs, still the same issue.
>
> Looking at the code it seems the split-mode message is triggered when
> the PG as ""stats_invalid": true,", here is the result of a query:
>
> "stats_invalid": true,
> "dirty_stats_invalid": false,
> "omap_stats_invalid": false,
> "hitset_stats_invalid": false,
> "hitset_bytes_stats_invalid": false,
> "pin_stats_invalid": false,
> "manifest_stats_invalid": false,
>
> I also provide again cluster informations that was lost in previous
> missed reply all. Don't hesitate to ask more if needed I would be
> glade to provide them.
>
> Cédric
>
>
> On Thu, Feb 22, 2024 at 11:04 AM Eugen Block  wrote:
>>
>> Hm, I wonder if setting (and unsetting after a while) noscrub and
>> nodeep-scrub has any effect. Have you tried that?
>>
>> Zitat von Cedric :
>>
>> > Update: we have run fsck and re-shard on all bluestore volume, seems
>> > sharding were not applied.
>> >
>> > Unfortunately scrubs and deep-scrubs are still stuck on PGs of the
>> > pool that is suffering the issue, but other PGs scrubs well.
>> >
>> > The next step will be to remove the cache tier as suggested, but its
>> > not available yet as PGs needs to be scrubbed in order for the cache
>> > tier can be activated.
>> >
>> > As we are struggling to make this cluster works again, any help
>> > would be greatly appreciated.
>> >
>> > Cédric
>> >
>> >> On 20 Feb 2024, at 20:22, Cedric  wrote:
>> >>
>> >> Thanks Eugen, sorry about the missed reply to all.
>> >>
>> >> The reason we still have the cache tier is because we were not able
>> >> to flush all dirty entry to remove it (as per the procedure), so
>> >> the cluster as been migrated from HDD/SSD to NVME a while ago but
>> >> tiering remains, unfortunately.
>> >>
>> >> So actually we are trying to understand the root cause
>> >>
>> >> On Tue, Feb 20, 2024 at 1:43 PM Eugen Block  wrote:
>> >>>
>> >>> Please don't drop the list from your response.
>> >>>
>> >>> The first question coming to mind is, why do you have a cache-tier if
>> >>> all your pools are on nvme decices anyway? I don't see any  
benefit here.

>> >>> Did you try the suggested workaround and disable the cache-tier?
>> >>>
>> >>> Zitat von Cedric :
>> >>>
>>  Thanks Eugen, see attached infos.
>> 
>>  Some more details:
>> 
>>  - commands that actually hangs: ceph balancer status ; rbd  
-p vms ls ;

>>  rados -p vms_cache cache-flush-evict-all
>>  - all scrub running on vms_caches pgs are stall / start in a loop
>>  without actually doing anything
>>  - all io are 0 both from ceph status or iostat on nodes
>> 
>>  On Tue, Feb 20, 2024 at 10:00 AM Eugen Block  wrote:
>> >
>> > Hi,
>> >
>> > some more details would be helpful, for example what's the  
pool size

>> > of the cache pool? Did you issue a PG split before or during the
>> > upgrade? This thread [1] deals with the same problem, the described
>> > workaround was to set hit_set_count to 0 and disable the  
cache layer

>> > until that is resolved. Afterwards you could enable the cache layer
>> > again. But keep in mind that the code for cache tier is entirely
>> > removed in Reef (IIRC).
>> >
>> > Regards,
>> > Eugen
>> >
>> > [1]
>> >
>>  
https://ceph-users.ceph.narkive.com/zChyOq5D/ceph-strange-issue-after-adding-a-cache-osd

>> >
>> > Zitat von Cedric :
>> >
>> >> Hello,
>> >>
>> >> Following an upgrade from Nautilus (14.2.22) to Pacific  
(16.2.13), we

>> >> encounter an issue with a cache pool becoming completely stuck,
>> >> relevant messages below:
>> >>
>> >> pg xx.x has invalid (post-split) stats; must scrub before  
tier agent

>> >> can activate
>> >>
>> >> In OSD logs, scrubs are starting i

[ceph-users] Sharing our "Containerized Ceph and Radosgw Playground"

2024-02-22 Thread Ansgar Jazdzewski
Hi Folks,

We are excited to announce plans for building a larger Ceph-S3 setup.
To ensure its success, extensive testing is needed in advance.

Some of these tests don't need a full-blown Ceph cluster on hardware
but still require meeting specific logical requirements, such as a
multi-site S3 setup. To address this, we're pleased to introduce our
ceph-s3-box test environment, which you can access on GitHub:

https://github.com/hetznercloud/ceph-s3-box

In the spirit of collaboration and knowledge sharing, we've made this
testing environment publicly available today. We hope that it proves
as beneficial to you as it has been for us.

If you have any questions or suggestions, please don't hesitate to reach out.

Cheers,
Ansgar
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Scrub stuck and 'pg has invalid (post-split) stat'

2024-02-22 Thread Cedric
Yes the osd_scrub_invalid_stats is set to true.

We are thinking about the use of "ceph pg_mark_unfound_lost revert"
action, but we wonder if there is a risk of data loss.

On Thu, Feb 22, 2024 at 11:50 AM Eugen Block  wrote:
>
> I found a config to force scrub invalid PGs, what is your current
> setting on that?
>
> ceph config get osd osd_scrub_invalid_stats
> true
>
> The config reference states:
>
> > Forces extra scrub to fix stats marked as invalid.
>
> But the default seems to be true, so I'd expect it's true in your case
> as well?
>
> Zitat von Cedric :
>
> > Thanks Eugen for the suggestion, yes we have tried, also repeering
> > concerned PGs, still the same issue.
> >
> > Looking at the code it seems the split-mode message is triggered when
> > the PG as ""stats_invalid": true,", here is the result of a query:
> >
> > "stats_invalid": true,
> > "dirty_stats_invalid": false,
> > "omap_stats_invalid": false,
> > "hitset_stats_invalid": false,
> > "hitset_bytes_stats_invalid": false,
> > "pin_stats_invalid": false,
> > "manifest_stats_invalid": false,
> >
> > I also provide again cluster informations that was lost in previous
> > missed reply all. Don't hesitate to ask more if needed I would be
> > glade to provide them.
> >
> > Cédric
> >
> >
> > On Thu, Feb 22, 2024 at 11:04 AM Eugen Block  wrote:
> >>
> >> Hm, I wonder if setting (and unsetting after a while) noscrub and
> >> nodeep-scrub has any effect. Have you tried that?
> >>
> >> Zitat von Cedric :
> >>
> >> > Update: we have run fsck and re-shard on all bluestore volume, seems
> >> > sharding were not applied.
> >> >
> >> > Unfortunately scrubs and deep-scrubs are still stuck on PGs of the
> >> > pool that is suffering the issue, but other PGs scrubs well.
> >> >
> >> > The next step will be to remove the cache tier as suggested, but its
> >> > not available yet as PGs needs to be scrubbed in order for the cache
> >> > tier can be activated.
> >> >
> >> > As we are struggling to make this cluster works again, any help
> >> > would be greatly appreciated.
> >> >
> >> > Cédric
> >> >
> >> >> On 20 Feb 2024, at 20:22, Cedric  wrote:
> >> >>
> >> >> Thanks Eugen, sorry about the missed reply to all.
> >> >>
> >> >> The reason we still have the cache tier is because we were not able
> >> >> to flush all dirty entry to remove it (as per the procedure), so
> >> >> the cluster as been migrated from HDD/SSD to NVME a while ago but
> >> >> tiering remains, unfortunately.
> >> >>
> >> >> So actually we are trying to understand the root cause
> >> >>
> >> >> On Tue, Feb 20, 2024 at 1:43 PM Eugen Block  wrote:
> >> >>>
> >> >>> Please don't drop the list from your response.
> >> >>>
> >> >>> The first question coming to mind is, why do you have a cache-tier if
> >> >>> all your pools are on nvme decices anyway? I don't see any benefit 
> >> >>> here.
> >> >>> Did you try the suggested workaround and disable the cache-tier?
> >> >>>
> >> >>> Zitat von Cedric :
> >> >>>
> >>  Thanks Eugen, see attached infos.
> >> 
> >>  Some more details:
> >> 
> >>  - commands that actually hangs: ceph balancer status ; rbd -p vms ls ;
> >>  rados -p vms_cache cache-flush-evict-all
> >>  - all scrub running on vms_caches pgs are stall / start in a loop
> >>  without actually doing anything
> >>  - all io are 0 both from ceph status or iostat on nodes
> >> 
> >>  On Tue, Feb 20, 2024 at 10:00 AM Eugen Block  wrote:
> >> >
> >> > Hi,
> >> >
> >> > some more details would be helpful, for example what's the pool size
> >> > of the cache pool? Did you issue a PG split before or during the
> >> > upgrade? This thread [1] deals with the same problem, the described
> >> > workaround was to set hit_set_count to 0 and disable the cache layer
> >> > until that is resolved. Afterwards you could enable the cache layer
> >> > again. But keep in mind that the code for cache tier is entirely
> >> > removed in Reef (IIRC).
> >> >
> >> > Regards,
> >> > Eugen
> >> >
> >> > [1]
> >> >
> >> https://ceph-users.ceph.narkive.com/zChyOq5D/ceph-strange-issue-after-adding-a-cache-osd
> >> >
> >> > Zitat von Cedric :
> >> >
> >> >> Hello,
> >> >>
> >> >> Following an upgrade from Nautilus (14.2.22) to Pacific (16.2.13), 
> >> >> we
> >> >> encounter an issue with a cache pool becoming completely stuck,
> >> >> relevant messages below:
> >> >>
> >> >> pg xx.x has invalid (post-split) stats; must scrub before tier agent
> >> >> can activate
> >> >>
> >> >> In OSD logs, scrubs are starting in a loop without succeeding for 
> >> >> all
> >> >> pg of this pool.
> >> >>
> >> >> What we already tried without luck so far:
> >> >>
> >> >> - shutdown / restart OSD
> >> >> - rebalance pg between OSD
> >> >> - raise the m

[ceph-users] Re: Scrub stuck and 'pg has invalid (post-split) stat'

2024-02-22 Thread Eugen Block
I found a config option to force scrubbing of PGs with invalid stats; what is your
current setting for it?


ceph config get osd osd_scrub_invalid_stats
true

The config reference states:


Forces extra scrub to fix stats marked as invalid.


But the default seems to be true, so I'd expect it's true in your case  
as well?


Zitat von Cedric :


Thanks Eugen for the suggestion, yes we have tried, also repeering
concerned PGs, still the same issue.

Looking at the code it seems the split-mode message is triggered when
the PG as ""stats_invalid": true,", here is the result of a query:

"stats_invalid": true,
"dirty_stats_invalid": false,
"omap_stats_invalid": false,
"hitset_stats_invalid": false,
"hitset_bytes_stats_invalid": false,
"pin_stats_invalid": false,
"manifest_stats_invalid": false,

I also provide again cluster informations that was lost in previous
missed reply all. Don't hesitate to ask more if needed I would be
glade to provide them.

Cédric


On Thu, Feb 22, 2024 at 11:04 AM Eugen Block  wrote:


Hm, I wonder if setting (and unsetting after a while) noscrub and
nodeep-scrub has any effect. Have you tried that?

Zitat von Cedric :

> Update: we have run fsck and re-shard on all bluestore volume, seems
> sharding were not applied.
>
> Unfortunately scrubs and deep-scrubs are still stuck on PGs of the
> pool that is suffering the issue, but other PGs scrubs well.
>
> The next step will be to remove the cache tier as suggested, but its
> not available yet as PGs needs to be scrubbed in order for the cache
> tier can be activated.
>
> As we are struggling to make this cluster works again, any help
> would be greatly appreciated.
>
> Cédric
>
>> On 20 Feb 2024, at 20:22, Cedric  wrote:
>>
>> Thanks Eugen, sorry about the missed reply to all.
>>
>> The reason we still have the cache tier is because we were not able
>> to flush all dirty entry to remove it (as per the procedure), so
>> the cluster as been migrated from HDD/SSD to NVME a while ago but
>> tiering remains, unfortunately.
>>
>> So actually we are trying to understand the root cause
>>
>> On Tue, Feb 20, 2024 at 1:43 PM Eugen Block  wrote:
>>>
>>> Please don't drop the list from your response.
>>>
>>> The first question coming to mind is, why do you have a cache-tier if
>>> all your pools are on nvme decices anyway? I don't see any benefit here.
>>> Did you try the suggested workaround and disable the cache-tier?
>>>
>>> Zitat von Cedric :
>>>
 Thanks Eugen, see attached infos.

 Some more details:

 - commands that actually hangs: ceph balancer status ; rbd -p vms ls ;
 rados -p vms_cache cache-flush-evict-all
 - all scrub running on vms_caches pgs are stall / start in a loop
 without actually doing anything
 - all io are 0 both from ceph status or iostat on nodes

 On Tue, Feb 20, 2024 at 10:00 AM Eugen Block  wrote:
>
> Hi,
>
> some more details would be helpful, for example what's the pool size
> of the cache pool? Did you issue a PG split before or during the
> upgrade? This thread [1] deals with the same problem, the described
> workaround was to set hit_set_count to 0 and disable the cache layer
> until that is resolved. Afterwards you could enable the cache layer
> again. But keep in mind that the code for cache tier is entirely
> removed in Reef (IIRC).
>
> Regards,
> Eugen
>
> [1]
>  
https://ceph-users.ceph.narkive.com/zChyOq5D/ceph-strange-issue-after-adding-a-cache-osd

>
> Zitat von Cedric :
>
>> Hello,
>>
>> Following an upgrade from Nautilus (14.2.22) to Pacific (16.2.13), we
>> encounter an issue with a cache pool becoming completely stuck,
>> relevant messages below:
>>
>> pg xx.x has invalid (post-split) stats; must scrub before tier agent
>> can activate
>>
>> In OSD logs, scrubs are starting in a loop without succeeding for all
>> pg of this pool.
>>
>> What we already tried without luck so far:
>>
>> - shutdown / restart OSD
>> - rebalance pg between OSD
>> - raise the memory on OSD
>> - repeer PG
>>
>> Any idea what is causing this? any help will be greatly appreciated
>>
>> Thanks
>>
>> Cédric
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>>>
>>>
>>>






___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Scrub stuck and 'pg has invalid (post-split) stat'

2024-02-22 Thread Cedric
Thanks Eugen for the suggestion; yes, we have tried that, and also repeered the
concerned PGs, but still the same issue.

Looking at the code, it seems the split-mode message is triggered when
the PG has "stats_invalid": true. Here is the result of a query:

"stats_invalid": true,
"dirty_stats_invalid": false,
"omap_stats_invalid": false,
"hitset_stats_invalid": false,
"hitset_bytes_stats_invalid": false,
"pin_stats_invalid": false,
"manifest_stats_invalid": false,

I am also providing again the cluster information that was lost in the previous
missed reply-all. Don't hesitate to ask for more if needed; I would be
glad to provide it.

Cédric


On Thu, Feb 22, 2024 at 11:04 AM Eugen Block  wrote:
>
> Hm, I wonder if setting (and unsetting after a while) noscrub and
> nodeep-scrub has any effect. Have you tried that?
>
> Zitat von Cedric :
>
> > Update: we have run fsck and re-shard on all bluestore volume, seems
> > sharding were not applied.
> >
> > Unfortunately scrubs and deep-scrubs are still stuck on PGs of the
> > pool that is suffering the issue, but other PGs scrubs well.
> >
> > The next step will be to remove the cache tier as suggested, but its
> > not available yet as PGs needs to be scrubbed in order for the cache
> > tier can be activated.
> >
> > As we are struggling to make this cluster works again, any help
> > would be greatly appreciated.
> >
> > Cédric
> >
> >> On 20 Feb 2024, at 20:22, Cedric  wrote:
> >>
> >> Thanks Eugen, sorry about the missed reply to all.
> >>
> >> The reason we still have the cache tier is because we were not able
> >> to flush all dirty entry to remove it (as per the procedure), so
> >> the cluster as been migrated from HDD/SSD to NVME a while ago but
> >> tiering remains, unfortunately.
> >>
> >> So actually we are trying to understand the root cause
> >>
> >> On Tue, Feb 20, 2024 at 1:43 PM Eugen Block  wrote:
> >>>
> >>> Please don't drop the list from your response.
> >>>
> >>> The first question coming to mind is, why do you have a cache-tier if
> >>> all your pools are on nvme decices anyway? I don't see any benefit here.
> >>> Did you try the suggested workaround and disable the cache-tier?
> >>>
> >>> Zitat von Cedric :
> >>>
>  Thanks Eugen, see attached infos.
> 
>  Some more details:
> 
>  - commands that actually hangs: ceph balancer status ; rbd -p vms ls ;
>  rados -p vms_cache cache-flush-evict-all
>  - all scrub running on vms_caches pgs are stall / start in a loop
>  without actually doing anything
>  - all io are 0 both from ceph status or iostat on nodes
> 
>  On Tue, Feb 20, 2024 at 10:00 AM Eugen Block  wrote:
> >
> > Hi,
> >
> > some more details would be helpful, for example what's the pool size
> > of the cache pool? Did you issue a PG split before or during the
> > upgrade? This thread [1] deals with the same problem, the described
> > workaround was to set hit_set_count to 0 and disable the cache layer
> > until that is resolved. Afterwards you could enable the cache layer
> > again. But keep in mind that the code for cache tier is entirely
> > removed in Reef (IIRC).
> >
> > Regards,
> > Eugen
> >
> > [1]
> > https://ceph-users.ceph.narkive.com/zChyOq5D/ceph-strange-issue-after-adding-a-cache-osd
> >
> > Zitat von Cedric :
> >
> >> Hello,
> >>
> >> Following an upgrade from Nautilus (14.2.22) to Pacific (16.2.13), we
> >> encounter an issue with a cache pool becoming completely stuck,
> >> relevant messages below:
> >>
> >> pg xx.x has invalid (post-split) stats; must scrub before tier agent
> >> can activate
> >>
> >> In OSD logs, scrubs are starting in a loop without succeeding for all
> >> pg of this pool.
> >>
> >> What we already tried without luck so far:
> >>
> >> - shutdown / restart OSD
> >> - rebalance pg between OSD
> >> - raise the memory on OSD
> >> - repeer PG
> >>
> >> Any idea what is causing this? any help will be greatly appreciated
> >>
> >> Thanks
> >>
> >> Cédric
> >> ___
> >> ceph-users mailing list -- ceph-users@ceph.io
> >> To unsubscribe send an email to ceph-users-le...@ceph.io
> >
> >
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >>>
> >>>
> >>>
>
>
>
--- RAW STORAGE ---
CLASS SIZEAVAIL USED  RAW USED  %RAW USED
nvme   419 TiB  143 TiB  276 TiB   276 TiB  65.78
TOTAL  419 TiB  143 TiB  276 TiB   276 TiB  65.78

--- POOLS ---
POOL    ID  PGS  STORED  (DATA)  (OMAP)  OBJECTS  USED  (DATA)  (OMAP)  %USED  MAX AVAIL  QUOTA OBJECTS  QUOTA BYTES  DIRTY  USED COMPR  UNDER COMPR
images

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-22 Thread Eugen Block
If it crashes after two minutes, you have your time window to look at.
Restart the MDS daemon and capture everything from then until the
crash.
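
A sketch of one way to capture such a window, using the cephadm/systemd unit naming shown elsewhere in this thread (FSID and MDS name are placeholders):

cephadm ls --no-detail | grep mds     # find the exact daemon name
journalctl -u ceph-<FSID>@mds.<MDS> --since "-15min" --no-pager > mds-crash.log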


Zitat von nguyenvand...@baoviet.com.vn:

it suck too long log, could you pls guide me how to grep/filter  
important things in logs ?

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-22 Thread nguyenvandiep
Feb 22 13:39:43 cephgw02 conmon[1340927]:   log_file 
/var/lib/ceph/crash/2024-02-22T06:39:43.618845Z_78ee38bc-9115-4bc6-8c3a-4bf42284c970/log
Feb 22 13:39:43 cephgw02 conmon[1340927]: --- end dump of recent events ---
Feb 22 13:39:45 cephgw02 systemd[1]: 
ceph-258af72a-cff3-11eb-a261-d4f5ef25154c@mds.cephfs.cephgw02.qqsavr.service: 
Main process exited, code=exited, status=134/n/a
Feb 22 13:39:45 cephgw02 systemd[1]: 
ceph-258af72a-cff3-11eb-a261-d4f5ef25154c@mds.cephfs.cephgw02.qqsavr.service: 
Failed with result 'exit-code'.
Feb 22 13:39:55 cephgw02 systemd[1]: 
ceph-258af72a-cff3-11eb-a261-d4f5ef25154c@mds.cephfs.cephgw02.qqsavr.service: 
Service RestartSec=10s expired, scheduling restart.
Feb 22 13:39:55 cephgw02 systemd[1]: 
ceph-258af72a-cff3-11eb-a261-d4f5ef25154c@mds.cephfs.cephgw02.qqsavr.service: 
Scheduled restart job, restart counter is at 4.
Feb 22 13:39:55 cephgw02 systemd[1]: Stopped Ceph mds.cephfs.cephgw02.qqsavr 
for 258af72a-cff3-11eb-a261-d4f5ef25154c.
Feb 22 13:39:55 cephgw02 systemd[1]: Starting Ceph mds.cephfs.cephgw02.qqsavr 
for 258af72a-cff3-11eb-a261-d4f5ef25154c...
Feb 22 13:39:56 cephgw02 bash[1341570]: 
636aedce612582726e7eb229eb63d0f641491994fc76f1de1828b61d07d6e8ac
Feb 22 13:39:56 cephgw02 systemd[1]: Started Ceph mds.cephfs.cephgw02.qqsavr 
for 258af72a-cff3-11eb-a261-d4f5ef25154c.
Feb 22 13:39:56 cephgw02 conmon[1341683]: debug 2024-02-22T06:39:56.151+ 
7fc4e3309780  0 set uid:gid to 167:167 (ceph:ceph)
Feb 22 13:39:56 cephgw02 conmon[1341683]: debug 2024-02-22T06:39:56.151+ 
7fc4e3309780  0 ceph version 16.2.4 (3cbe25cde3cfa028984618ad32de9edc4c1eaed0) 
pacific (stable), process ceph-mds, pid 7
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Some questions about cephadm

2024-02-22 Thread Eugen Block

Hi,

just responding to the last questions:


   - After the bootstrap, the Web interface was accessible :
  - How can I access the wizard page again? If I don't use it the first
  time I could not find another way to get it.


I don't know how to recall the wizard, but you should be able to  
create a new dashboard user with your desired role (e. g.  
administrator) from the CLI:


ceph dashboard ac-user-create <username> [<rolename>] -i <file-containing-password>
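
For example, something along these lines (username, role and password file are placeholders; the password has to be passed via a file):

echo -n 'MySecretPassword' > /tmp/dashboard_password.txt
ceph dashboard ac-user-create admin2 administrator -i /tmp/dashboard_password.txt
rm /tmp/dashboard_password.txt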


  - I had a problem with telemetry, I did not configure telemetry, then
  when I clicked the button, the web gui became inaccessible.!!!


You can see what happened in the active MGR log.

Zitat von wodel youchi :


Hi,

I have some questions about ceph using cephadm.

I used to deploy ceph using ceph-ansible, now I have to move to cephadm, I
am in my learning journey.


   - How can I tell my cluster that it's a part of an HCI deployment? With
   ceph-ansible it was easy using is_hci : yes
   - The documentation of ceph does not indicate what versions of grafana,
   prometheus, ...etc should be used with a certain version.
  - I am trying to deploy Quincy, I did a bootstrap to see what
  containers were downloaded and their version.
  - I am asking because I need to use a local registry to deploy those
  images.
   - After the bootstrap, the Web interface was accessible :
  - How can I access the wizard page again? If I don't use it the first
  time I could not find another way to get it.
  - I had a problem with telemetry, I did not configure telemetry, then
  when I clicked the button, the web gui became inaccessible.!!!



Regards.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-22 Thread nguyenvandiep
It's a very long log; could you please guide me on how to grep/filter the important
things in the logs?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Scrub stuck and 'pg has invalid (post-split) stat'

2024-02-22 Thread Eugen Block
Hm, I wonder if setting (and unsetting after a while) noscrub and  
nodeep-scrub has any effect. Have you tried that?
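
For reference, a sketch of that sequence, plus manually kicking a deep scrub on one of the affected PGs afterwards (the PG id is a placeholder):

ceph osd set noscrub
ceph osd set nodeep-scrub
# ...wait until the running scrubs have wound down...
ceph osd unset noscrub
ceph osd unset nodeep-scrub
ceph pg deep-scrub <pgid>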


Zitat von Cedric :

Update: we have run fsck and re-shard on all bluestore volume, seems  
sharding were not applied.


Unfortunately scrubs and deep-scrubs are still stuck on PGs of the  
pool that is suffering the issue, but other PGs scrubs well.


The next step will be to remove the cache tier as suggested, but its  
not available yet as PGs needs to be scrubbed in order for the cache  
tier can be activated.


As we are struggling to make this cluster works again, any help  
would be greatly appreciated.


Cédric


On 20 Feb 2024, at 20:22, Cedric  wrote:

Thanks Eugen, sorry about the missed reply to all.

The reason we still have the cache tier is because we were not able  
to flush all dirty entry to remove it (as per the procedure), so  
the cluster as been migrated from HDD/SSD to NVME a while ago but  
tiering remains, unfortunately.


So actually we are trying to understand the root cause

On Tue, Feb 20, 2024 at 1:43 PM Eugen Block  wrote:


Please don't drop the list from your response.

The first question coming to mind is, why do you have a cache-tier if
all your pools are on nvme decices anyway? I don't see any benefit here.
Did you try the suggested workaround and disable the cache-tier?

Zitat von Cedric :


Thanks Eugen, see attached infos.

Some more details:

- commands that actually hangs: ceph balancer status ; rbd -p vms ls ;
rados -p vms_cache cache-flush-evict-all
- all scrub running on vms_caches pgs are stall / start in a loop
without actually doing anything
- all io are 0 both from ceph status or iostat on nodes

On Tue, Feb 20, 2024 at 10:00 AM Eugen Block  wrote:


Hi,

some more details would be helpful, for example what's the pool size
of the cache pool? Did you issue a PG split before or during the
upgrade? This thread [1] deals with the same problem, the described
workaround was to set hit_set_count to 0 and disable the cache layer
until that is resolved. Afterwards you could enable the cache layer
again. But keep in mind that the code for cache tier is entirely
removed in Reef (IIRC).

Regards,
Eugen

[1]
https://ceph-users.ceph.narkive.com/zChyOq5D/ceph-strange-issue-after-adding-a-cache-osd

Zitat von Cedric :


Hello,

Following an upgrade from Nautilus (14.2.22) to Pacific (16.2.13), we
encounter an issue with a cache pool becoming completely stuck,
relevant messages below:

pg xx.x has invalid (post-split) stats; must scrub before tier agent
can activate

In OSD logs, scrubs are starting in a loop without succeeding for all
pg of this pool.

What we already tried without luck so far:

- shutdown / restart OSD
- rebalance pg between OSD
- raise the memory on OSD
- repeer PG

Any idea what is causing this? any help will be greatly appreciated

Thanks

Cédric
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io







___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-22 Thread Eugen Block

There are a couple of ways. Find your MDS daemon with:

ceph fs status  -> should show you the to-be-active MDS

On that host run:

cephadm logs --name mds.{MDS}

or alternatively:

cephadm ls --no-detail | grep mds

journalctl -u ceph-{FSID}@mds.{MDS} --no-pager > {MDS}.log

Zitat von nguyenvand...@baoviet.com.vn:


How can we get log of MDS, pls guide me T_T
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-22 Thread nguyenvandiep
How can we get the MDS log? Please guide me. T_T
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-22 Thread Eugen Block

What does the MDS log when it crashes?

Zitat von nguyenvand...@baoviet.com.vn:

We have 6 node ( 3 OSD-node and 3 service node), t2/3 OSD nodes was  
powered off and we got big problem

pls check ceph-s result below
now we cannot start mds service, ( we tried to start but it stopped  
after 2 minute)

Now my application cannot access to NFS exported Folder

What should we do

[root@cephgw01 /]# ceph -s
  cluster:
id: 258af72a-cff3-11eb-a261-d4f5ef25154c
health: HEALTH_WARN
3 failed cephadm daemon(s)
1 filesystem is degraded
insufficient standby MDS daemons available
1 nearfull osd(s)
Low space hindering backfill (add storage if this  
doesn't resolve itself): 21 pgs backfill_toofull

15 pool(s) nearfull
11 daemons have recently crashed

  services:
mon: 6 daemons, quorum  
cephgw03,cephosd01,cephgw01,cephosd03,cephgw02,cephosd02 (age 30h)
mgr: cephgw01.vwoffq(active, since 17h), standbys:  
cephgw02.nauphz, cephgw03.aipvii

mds: 1/1 daemons up
osd: 29 osds: 29 up (since 40h), 29 in (since 29h); 402  
remapped pgs

rgw: 2 daemons active (2 hosts, 1 zones)
tcmu-runner: 18 daemons active (2 hosts)

  data:
volumes: 0/1 healthy, 1 recovering
pools:   15 pools, 1457 pgs
objects: 36.87M objects, 25 TiB
usage:   75 TiB used, 41 TiB / 116 TiB avail
pgs: 17759672/110607480 objects misplaced (16.056%)
 1055 active+clean
 363  active+remapped+backfill_wait
 18   active+remapped+backfilling
 14   active+remapped+backfill_toofull
 7active+remapped+backfill_wait+backfill_toofull

  io:
client:   2.0 MiB/s rd, 395 KiB/s wr, 73 op/s rd, 19 op/s wr
recovery: 32 MiB/s, 45 objects/s
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-22 Thread nguyenvandiep
We have 6 nodes (3 OSD nodes and 3 service nodes); 2/3 OSD nodes were powered off
and we got a big problem.
Please check the ceph -s result below.
Right now we cannot start the MDS service (we tried to start it, but it stopped after
2 minutes).
Now my application cannot access the NFS-exported folder.

What should we do?

[root@cephgw01 /]# ceph -s
  cluster:
id: 258af72a-cff3-11eb-a261-d4f5ef25154c
health: HEALTH_WARN
3 failed cephadm daemon(s)
1 filesystem is degraded
insufficient standby MDS daemons available
1 nearfull osd(s)
Low space hindering backfill (add storage if this doesn't resolve 
itself): 21 pgs backfill_toofull
15 pool(s) nearfull
11 daemons have recently crashed
 
  services:
mon: 6 daemons, quorum 
cephgw03,cephosd01,cephgw01,cephosd03,cephgw02,cephosd02 (age 30h)
mgr: cephgw01.vwoffq(active, since 17h), standbys: cephgw02.nauphz, 
cephgw03.aipvii
mds: 1/1 daemons up
osd: 29 osds: 29 up (since 40h), 29 in (since 29h); 402 remapped pgs
rgw: 2 daemons active (2 hosts, 1 zones)
tcmu-runner: 18 daemons active (2 hosts)
 
  data:
volumes: 0/1 healthy, 1 recovering
pools:   15 pools, 1457 pgs
objects: 36.87M objects, 25 TiB
usage:   75 TiB used, 41 TiB / 116 TiB avail
pgs: 17759672/110607480 objects misplaced (16.056%)
 1055 active+clean
 363  active+remapped+backfill_wait
 18   active+remapped+backfilling
 14   active+remapped+backfill_toofull
 7active+remapped+backfill_wait+backfill_toofull
 
  io:
client:   2.0 MiB/s rd, 395 KiB/s wr, 73 op/s rd, 19 op/s wr
recovery: 32 MiB/s, 45 objects/s
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Size return by df

2024-02-22 Thread Albert Shih
Hi, 

I have one CephFS with one volume and subvolumes using erasure coding.

If I don't set any quota, when I run df on the client I get

0ccbc438-d109-4c5f-b47b-70f8df707c2c/vo   5,8P 78T  5,8P   2% /vo

The 78T seems to be the size used by Ceph on disk (on the hardware, I mean), and I
find that very good.

But if I set a quota

setfattr -n ceph.quota.max_bytes -v 109951162777600 vo

then on the same client I get

0ccbc438-d109-4c5f-b47b-70f8df707c2c/vo   100T 51T   50T  51% /vo

and that is the size of the data (I'm using erasure 4/2, so 51*1.5 ≈ 77 TB).

Is there any way to keep the first answer?

Regards

-- 
Albert SHIH 🦫 🐸
France
Heure locale/Local time:
jeu. 22 févr. 2024 08:44:17 CET
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] help me understand ceph snapshot sizes

2024-02-22 Thread garcetto
good morning,
  I am trying to understand Ceph snapshot sizing. For example, if I have a 2.7
GB volume and I create a snap on it, the sizing says:

(BEFORE SNAP)

rbd du volumes/volume-d954915c-1dc1-41cb-8bf0-0c67e7b6e080
NAME PROVISIONED USED

volume-d954915c-1dc1-41cb-8bf0-0c67e7b6e080 10 GiB 2.7 GiB

(AFTER SNAP)

rbd du volumes/volume-d954915c-1dc1-41cb-8bf0-0c67e7b6e080
NAME PROVISIONED USED

volume-d954915c-1dc1-41cb-8bf0-0c67e7b6e080@snap01 10 GiB 2.7 GiB
volume-d954915c-1dc1-41cb-8bf0-0c67e7b6e080 10 GiB 0 B

 10 GiB 2.7 GiB

Why is the snap reported as 2.7 GB? Isn't it going to be 0 GB at the beginning, and only
grow after COW starts doing its thing (copying the original blocks to the snap
before they are overwritten with new ones)?

Am I wrong?
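
One way to see how much data has actually changed since the snapshot is rbd diff (a sketch only, using the image and snap names from the example above; the awk filter skips any non-numeric header line):

rbd diff --from-snap snap01 volumes/volume-d954915c-1dc1-41cb-8bf0-0c67e7b6e080 \
  | awk '$2 ~ /^[0-9]+$/ {sum += $2} END {printf "%.1f MiB changed since snap01\n", sum/1048576}'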

thank you.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io