[ceph-users] Re: [EXTERNAL] Re: OSDs flapping with "_open_alloc loaded 132 GiB in 2930776 extents available 113 GiB"

2021-09-20 Thread Janne Johansson
Den mån 20 sep. 2021 kl 18:02 skrev Dave Piper :
> Okay - I've finally got full debug logs from the flapping OSDs. The raw logs 
> are both 100M each - I can email them directly if necessary. (Igor I've 
> already sent these your way.)
> Both flapping OSDs are reporting the same "bluefs _allocate failed to 
> allocate" errors as before.  I've also noticed additional errors about 
> corrupt blocks which I haven't noticed previously.  E.g.
> 2021-09-08T10:42:13.316+ 7f705c4f2f00  3 rocksdb: 
> [table/block_based_table_reader.cc:1117] Encountered error while reading data 
> from compression dictionary block Corruption: block checksum mismatch: 
> expected 0, got 2324967111  in db/501397.sst offset 18446744073709551615 size 
> 18446744073709551615

Those 18446744073709551615 numbers are -1 (or rather, the largest unsigned
64-bit int), so something is making a value wrap around below zero.
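(Quick illustration, not taken from the log itself: printing -1 as an unsigned
64-bit value gives exactly that number.)

printf '%u\n' -1        # bash on a 64-bit system
18446744073709551615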
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: rocksdb corruption with 16.2.6

2021-09-20 Thread Andrej Filipcic


Hi,

Some further investigation on the failed OSDs:

1 out of 8 OSDs actually has a hardware issue:

[16841006.029332] sd 0:0:10:0: [sdj] tag#96 FAILED Result: 
hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=2s
[16841006.037917] sd 0:0:10:0: [sdj] tag#34 FAILED Result: 
hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK cmd_age=2s
[16841006.047558] sd 0:0:10:0: [sdj] tag#96 Sense Key : Medium Error 
[current]
[16841006.057647] sd 0:0:10:0: [sdj] tag#34 CDB: Read(16) 88 00 00 00 00 
00 00 07 e7 70 00 00 00 10 00 00
[16841006.064693] sd 0:0:10:0: [sdj] tag#96 Add. Sense: Unrecovered read 
error
[16841006.073988] blk_update_request: I/O error, dev sdj, sector 518000 
op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[16841006.080949] sd 0:0:10:0: [sdj] tag#96 CDB: Read(16) 88 00 00 00 00 
00 0b 95 d9 80 00 00 00 08 00 00


smartctl:
Error 23 occurred at disk power-on lifetime: 6105 hours (254 days + 9 hours)
  When the command that caused the error occurred, the device was 
active or idle.


  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 80 d9 95 0b  Error: UNC at LBA = 0x0b95d980 = 194369920

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --    
  60 00 10 70 e7 07 40 00  14d+02:46:05.704  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00  14d+02:46:05.703  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00  14d+02:46:05.703  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00  14d+02:46:05.703  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00  14d+02:46:05.703  READ FPDMA QUEUED

So, let's say, this might be a HW fault, though the drive appears to be 
working fine.


But the other 7 show no HW-related issues. The HDDs are Seagate Exos 
X16, enterprise grade; the servers are Supermicro SSG-6029P-E1CR24L-AT059 
with ECC. There are no CPU or memory errors logged in the past months on 
the servers, which have been up for ~200 days. So it is unlikely to be a HW fault.


Is there something else that could be checked? I have left one OSD 
intact, so it can be checked further.


Best regards,
Andrej

On 20/09/2021 17:09, Neha Ojha wrote:

Can we please create a bluestore tracker issue for this
(if one does not exist already), where we can start capturing all the
relevant information needed to debug this? Given that this has been
encountered in previous 16.2.* versions, it doesn't sound like a
regression in 16.2.6 to me, rather an issue in pacific. In any case,
we'll prioritize fixing it.

Thanks,
Neha

On Mon, Sep 20, 2021 at 8:03 AM Andrej Filipcic  wrote:

On 20/09/2021 16:02, David Orman wrote:

Same question here, for clarity, was this on upgrading to 16.2.6 from
16.2.5? Or upgrading
from some other release?

from 16.2.5. but the OSD services were never restarted after upgrade to
.5, so it could be a leftover of previous issues.

Cheers,
Andrej

On Mon, Sep 20, 2021 at 8:57 AM Sean  wrote:

   I also ran into this with v16. In my case, trying to run a repair totally
exhausted the RAM on the box, and was unable to complete.

After removing/recreating the OSD, I did notice that it has a drastically
   smaller OMAP size than the other OSDs. I don’t know if that actually means
anything, but just wanted to mention it in case it does.

ID   CLASS  WEIGHT REWEIGHT  SIZE RAW USE  DATA OMAP META
AVAIL%USE   VAR   PGS  STATUS  TYPE NAME
14   hdd10.91409   1.0   11 TiB  3.3 TiB  3.2 TiB  4.6 MiB  5.4 GiB
   7.7 TiB  29.81  1.02   34  uposd.14
16   hdd10.91409   1.0   11 TiB  3.3 TiB  3.3 TiB   20 KiB  9.4 GiB
   7.6 TiB  30.03  1.03   35  uposd.16

~ Sean


On Sep 20, 2021 at 8:27:39 AM, Paul Mezzanini  wrote:


I got the exact same error on one of my OSDs when upgrading to 16.  I
used it as an exercise in trying to fix a corrupt RocksDB. I spent a few
days of poking with no success.  I got mostly tool crashes like you are
seeing with no forward progress.

I eventually just gave up, purged the OSD, did a smart long test on the
drive to be sure and then threw it back in the mix.  Been HEALTH OK for
a week now after it finished refilling the drive.


On 9/19/21 10:47 AM, Andrej Filipcic wrote:

2021-09-19T15:47:13.610+0200 7f8bc1f0e700  2 rocksdb:

[db_impl/db_impl_compaction_flush.cc:2344] Waiting after background

compaction error: Corruption: block checksum mismatch: expected

2427092066, got 4051549320  in db/251935.sst offset 18414386 size

4032, Accumulated background error counts: 1

2021-09-19T15:47:13.636+0200 7f8bbacf1700 -1 rocksdb: submit_common

error: Corruption: block checksum mismatch: expected 2427092066, got

4051549320  in db/251935.sst offset 18414386 size 4032 code = 2

Rocksdb transaction:

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


___

[ceph-users] RocksDB options for HDD, SSD, NVME Mixed productions

2021-09-20 Thread mhnx
Hello everyone!
I want to understand the concept and tune my rocksDB options on nautilus
14.2.16.

 osd.178 spilled over 102 GiB metadata from 'db' device (24 GiB used of
50 GiB) to slow device
 osd.180 spilled over 91 GiB metadata from 'db' device (33 GiB used of
50 GiB) to slow device

The problem is that I have the spillover warnings like the rest of the
community.
I tuned the RocksDB options with the settings below, but the problem still
exists and I wonder if I did anything wrong. I still have the spillovers,
and sometimes the index SSDs go down due to compaction problems and
cannot be started again until I do an offline compaction.

Let me tell you about my hardware first.
Every server in my system has:
HDD  - 19 x TOSHIBA MG08SCA16TEY 16.0TB for the EC pool.
SSD  -  3 x SAMSUNG MZILS960HEHP/007 GXL0 960GB
NVME -  2 x PM1725b 1.6TB

I'm using RAID 1 NVMe for the BlueStore DB. I don't have a separate WAL.
19 * 50GB = 950GB total usage on the NVMe. (I was thinking of using the rest but
regret it now.)

So! Finally let's check my RocksDB Options:
[osd]
bluefs_buffered_io = true
bluestore_rocksdb_options =
compression=kNoCompression,max_write_buffer_number=32,min_write_buffer_number_to_merge=2,recycle_log_file_num=32,compaction_style=kCompactionStyleLevel,write_buffer_size=67108864,target_file_size_base=67108864,max_background_compactions=31,level0_file_num_compaction_trigger=8,level0_slowdown_writes_trigger=32,level0_stop_writes_trigger=64,flusher_threads=8,compaction_readahead_size=2MB,compaction_threads=16,max_bytes_for_level_base=536870912,max_bytes_for_level_multiplier=10

*"ceph osd df tree"  *to see ssd and hdd usage, omap and meta.

> ID  CLASS WEIGHT REWEIGHT SIZERAW USE DATAOMAPMETA
> AVAIL   %USE  VAR  PGS STATUS TYPE NAME
> -28280.04810- 280 TiB 169 TiB 166 TiB 688 GiB 2.4 TiB 111
> TiB 60.40 1.00   -host MHNX1
> 178   hdd   14.60149  1.0  15 TiB 8.6 TiB 8.5 TiB  44 KiB 126 GiB 6.0
> TiB 59.21 0.98 174 up osd.178
> 179   ssd0.87329  1.0 894 GiB 415 GiB  89 GiB 321 GiB 5.4 GiB 479
> GiB 46.46 0.77 104 up osd.179


I know the size of the NVMe is not suitable for 16TB HDDs. I should have more,
but the expense is cutting us to pieces. Because of that I think I'll see the
spillovers no matter what I do. But maybe I can make it better
with your help!

My questions are:
1- What is the meaning of "(33 GiB used of 50 GiB)"?
2- Why is it not 50 GiB / 50 GiB?
3- Do I have 17 GiB of unused space on the DB partition? (One way to check is
sketched below.)
4- Is there anything wrong with my RocksDB options?
5- How can I be sure, and how do I find good RocksDB options for Ceph?
6- How can I measure the change and test it?
7- Do I need different RocksDB options for HDDs and SSDs?
8- If I stop using the NVMe RAID 1 to gain twice the size and resize the DBs to
160 GiB, is it worth the risk of the NVMe failing? Because I would lose 10 HDDs
at the same time, but I have 10 nodes and that's only 5% of the EC data. I use
m=8 k=2.
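As mentioned in question 3, one way to check the actual DB usage per OSD is the
bluefs section of the OSD perf counters, via the admin socket on the OSD host.
A rough sketch (the osd id and the grep selection are just examples):

ceph daemon osd.178 perf dump bluefs | grep -E 'db_total_bytes|db_used_bytes|slow_used_bytes'
# db_used_bytes vs db_total_bytes is the fast DB device;
# slow_used_bytes is how much metadata has spilled over to the HDD.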

P.S.: There are so many people asking and searching around this topic. I hope it
will work out this time.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Safe value for maximum speed backfilling

2021-09-20 Thread Kobi Ginon
You can see an example below of changing these on the fly:
sudo ceph tell osd.\* injectargs '--osd_max_backfills 4'
sudo ceph tell osd.\* injectargs '--osd_heartbeat_interval 15'
sudo ceph tell osd.\* injectargs '--osd_recovery_max_active 4'
sudo ceph tell osd.\* injectargs '--osd_recovery_op_priority 63'
sudo ceph tell osd.\* injectargs '--osd_client_op_priority 3'
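Note that injectargs only changes the running daemons. A sketch of making the
same change persistent via the config database (the values are examples, not
recommendations):

ceph config set osd osd_max_backfills 4
ceph config set osd osd_recovery_max_active 4
# verify what a given OSD is actually using
ceph config show osd.0 | grep -E 'osd_max_backfills|osd_recovery_max_active'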

‫בתאריך יום ב׳, 20 בספט׳ 2021 ב-17:29 מאת ‪Szabo, Istvan (Agoda)‬‏ <‪
istvan.sz...@agoda.com‬‏>:‬

> Hi,
>
> 7 node, ec 4:2 host based crush, ssd osds with nvme wal+db, what shouldn't
> cause any issue with these values?
>
> osd_max_backfills = 1
> osd_recovery_max_active = 1
> osd_recovery_op_priority = 1
>
> I want to speed it up but haven't really found any reference.
>
> Ty
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: etcd support

2021-09-20 Thread Kobi Ginon
Hi,
In general I see nothing you can do except using SSDs (like having a
pool of SSDs in your Ceph cluster, or using other shared storage with SSDs).
One option we use (not production-wise, just for lab testing when suffering
from a lack of hardware) is using memory (of the VM) for etcd.
This way etcd performance gets a boost,
but again, not good for production :)
Regards
kobi ginon

‫בתאריך יום ב׳, 20 בספט׳ 2021 ב-20:59 מאת ‪Tony Liu‬‏ <‪
tonyliu0...@hotmail.com‬‏>:‬

> Hi,
>
> I wonder if anyone could share some experiences in etcd support by Ceph.
> My users build Kubernetes cluster in VMs on OpenStack with Ceph.
> With HDD (DB/WAL on SSD) volume, etcd performance test fails sometimes
> because of latency. With SSD (all SSD) volume, it works fine.
> I wonder if there is anything I can improve with HDD volume, or it has to
> be
> SSD volume to support etcd?
>
>
> Thanks!
> Tony
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [EXTERNAL] Re: OSDs flapping with "_open_alloc loaded 132 GiB in 2930776 extents available 113 GiB"

2021-09-20 Thread Dave Piper
Okay - I've finally got full debug logs from the flapping OSDs. The raw logs 
are both 100M each - I can email them directly if necessary. (Igor I've already 
sent these your way.)

Both flapping OSDs are reporting the same "bluefs _allocate failed to allocate" 
errors as before.  I've also noticed additional errors about corrupt blocks 
which I haven't noticed previously.  E.g.

2021-09-08T10:42:13.316+ 7f705c4f2f00  3 rocksdb: 
[table/block_based_table_reader.cc:1117] Encountered error while reading data 
from compression dictionary block Corruption: block checksum mismatch: expected 
0, got 2324967111  in db/501397.sst offset 18446744073709551615 size 
18446744073709551615


FTR (I realised I never posted this before) our osd tree is:

[qs-admin@condor_sc0 ~]$ sudo docker exec fe4eb75fc98b ceph osd tree
ID  CLASS  WEIGHT   TYPE NAMESTATUS  REWEIGHT  PRI-AFF
-1 1.02539  root default
-7 0.34180  host condor_sc0
 1ssd  0.34180  osd.1  down 0  1.0
-5 0.34180  host condor_sc1
 0ssd  0.34180  osd.0up   1.0  1.0
-3 0.34180  host condor_sc2
 2ssd  0.34180  osd.2  down   1.0  1.0


I've still not managed to get the ceph-bluestore-tool output - will get back to 
you on that.
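(For the record, the sort of invocation I'm attempting - run against a stopped
OSD; the container name and OSD path are just the examples from earlier in this
thread:)

sudo docker exec fe4eb75fc98b ceph-bluestore-tool bluefs-bdev-sizes --path /var/lib/ceph/osd/ceph-1
sudo docker exec fe4eb75fc98b ceph-bluestore-tool fsck --deep --path /var/lib/ceph/osd/ceph-1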


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Safe value for maximum speed backfilling

2021-09-20 Thread Szabo, Istvan (Agoda)
Hi,

7 node, ec 4:2 host based crush, ssd osds with nvme wal+db, what shouldn't 
cause any issue with these values?

osd_max_backfills = 1
osd_recovery_max_active = 1
osd_recovery_op_priority = 1

I want to speed it up but haven't really found any reference.

Ty
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Getting cephadm "stderr:Inferring config" every minute in log - for a monitor that doesn't exist and shouldn't exist

2021-09-20 Thread Robert W. Eckert
That may be pointing in the right direction - I see

   {
   "style": "legacy",
   "name": "mon.rhel1.robeckert.us",
   "fsid": "fe3a7cb0-69ca-11eb-8d45-c86000d08867",
   "systemd_unit": "ceph-...@rhel1.robeckert.us",
   "enabled": false,
   "state": "stopped",
   "host_version": "16.2.5"
   },

And
{
"style": "cephadm:v1",
"name": "mon.rhel1",
"fsid": "fe3a7cb0-69ca-11eb-8d45-c86000d08867",
"systemd_unit": "ceph-fe3a7cb0-69ca-11eb-8d45-c86000d08867@mon.rhel1",
"enabled": true,
"state": "running",
"service_name": "mon",
"ports": [],
"ip": null,
"deployed_by": [

"quay.io/ceph/ceph@sha256:5d042251e1faa1408663508099cf97b256364300365d403ca5563a518060abac",

"quay.io/ceph/ceph@sha256:8a0f6f285edcd6488e2c91d3f9fa43534d37d7a9b37db1e0ff6691aae6466530"
],
"rank": null,
"rank_generation": null,
"memory_request": null,
"memory_limit": null,
"container_id": null,
"container_image_name": 
"quay.io/ceph/ceph@sha256:5d042251e1faa1408663508099cf97b256364300365d403ca5563a518060abac",
"container_image_id": null,
"container_image_digests": null,
"version": null,
"started": null,
"created": "2021-09-20T15:46:42.166486Z",
"deployed": "2021-09-20T15:46:41.136498Z",
"configured": "2021-09-20T15:47:23.002007Z"
}

As the output.

In /var/lib/ceph/mon (not 
/var/lib/ceph/fe3a7cb0-69ca-11eb-8d45-c86000d08867/mon), there is a link:
ceph-rhel1.robeckert.us -> 
/var/lib/ceph/fe3a7cb0-69ca-11eb-8d45-c86000d08867/mon.rhel1.robeckert.us/

I removed the link and the error did clear up.  (hopefully it will stay gone 
:-))
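For anyone hitting the same thing, roughly what that looked like here (the
FSID, hostname and paths are of course specific to my cluster):

cephadm ls | grep -B1 -A2 '"style": "legacy"'    # spot leftover legacy daemon entries
ls -l /var/lib/ceph/mon                          # the stale symlink lived here
rm /var/lib/ceph/mon/ceph-rhel1.robeckert.us     # removing it is what cleared the error for me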

Thanks,

Rob





-Original Message-
From: Fyodor Ustinov  
Sent: Monday, September 20, 2021 2:01 PM
To: Robert W. Eckert 
Cc: ceph-users 
Subject: Re: [ceph-users] Getting cephadm "stderr:Inferring config" every 
minute in log - for a monitor that doesn't exist and shouldn't exist

Hi!

It looks exactly the same as the problem I had. 

Try the `cephadm ls` command on the `rhel1.robeckert.us` node. 

- Original Message -
> From: "Robert W. Eckert" 
> To: "ceph-users" 
> Sent: Monday, 20 September, 2021 18:28:08
> Subject: [ceph-users] Getting cephadm "stderr:Inferring config" every 
> minute in log - for a monitor that doesn't exist and shouldn't exist

> Hi- after the upgrade to 16.2.6, I am now seeing this error:
> 
> 9/20/21 10:45:00 AM[ERR]cephadm exited with an error code: 1, 
> stderr:Inferring config 
> /var/lib/ceph/fe3a7cb0-69ca-11eb-8d45-c86000d08867/mon.rhel1.robeckert
> .us/config
> ERROR: [Errno 2] No such file or directory:
> '/var/lib/ceph/fe3a7cb0-69ca-11eb-8d45-c86000d08867/mon.rhel1.robeckert.us/config'
> Traceback (most recent call last): File 
> "/usr/share/ceph/mgr/cephadm/serve.py",
> line 1366, in _remote_connection yield (conn, connr) File 
> "/usr/share/ceph/mgr/cephadm/serve.py", line 1263, in _run_cephadm 
> code,
> '\n'.join(err))) orchestrator._interface.OrchestratorError: cephadm 
> exited with an error code: 1, stderr:Inferring config 
> /var/lib/ceph/fe3a7cb0-69ca-11eb-8d45-c86000d08867/mon.rhel1.robeckert
> .us/config
> ERROR: [Errno 2] No such file or directory:
> '/var/lib/ceph/fe3a7cb0-69ca-11eb-8d45-c86000d08867/mon.rhel1.robeckert.us/config'
> 
> The rhel1 server has a monitor under
> /var/lib/ceph/fe3a7cb0-69ca-11eb-8d45-c86000d08867/mon.rhel1 , and it 
> is up and active.  If I copy the
> /var/lib/ceph/fe3a7cb0-69ca-11eb-8d45-c86000d08867/mon.rhel1 to 
> /var/lib/ceph/fe3a7cb0-69ca-11eb-8d45-c86000d08867/mon.rhel1.robeckert
> .us the error clears, then cephadm removes the folder with the domain 
> name, and the error starts showing up in the log again.
> 
> After a few minutes, I get the all clear:
> 
> 9/20/21 11:00:00 AM[INF]overall HEALTH_OK
> 
> 9/20/21 10:58:38 AM[INF]Removing key for mon.
> 
> 9/20/21 10:58:37 AM[INF]Removing daemon mon.rhel1.robeckert.us from 
> rhel1.robeckert.us
> 
> 9/20/21 10:58:37 AM[INF]Removing monitor rhel1.robeckert.us from monmap...
> 
> 9/20/21 10:58:37 AM[INF]Safe to remove mon.rhel1.robeckert.us: not in 
> monmap (['rhel1', 'story', 'cube'])
> 
> 9/20/21 10:52:21 AM[INF]Cluster is now healthy
> 
> 9/20/21 10:52:21 AM[INF]Health check cleared: CEPHADM_REFRESH_FAILED (was:
> failed to probe daemons or devices)
> 
> 9/20/21 10:51:15 AM
> 
> 
> I checked all of the configurations and can't find any reason it wants 
> the monitor with the domain.
> 
> But then the errors start up again - I haven't found any messages 
> before they start up, I am going to monitor more closely.
> This doesn't seem to affect any functionality, just lots of messages in the 
> log.
> 
> Thanks,
> Rob
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an 
> email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: Getting cephadm "stderr:Inferring config" every minute in log - for a monitor that doesn't exist and shouldn't exist

2021-09-20 Thread Fyodor Ustinov
Hi!

It looks exactly the same as the problem I had. 

Try the `cephadm ls` command on the `rhel1.robeckert.us` node. 

- Original Message -
> From: "Robert W. Eckert" 
> To: "ceph-users" 
> Sent: Monday, 20 September, 2021 18:28:08
> Subject: [ceph-users] Getting cephadm "stderr:Inferring config" every minute 
> in log - for a monitor that doesn't exist
> and shouldn't exist

> Hi- after the upgrade to 16.2.6, I am now seeing this error:
> 
> 9/20/21 10:45:00 AM[ERR]cephadm exited with an error code: 1, stderr:Inferring
> config
> /var/lib/ceph/fe3a7cb0-69ca-11eb-8d45-c86000d08867/mon.rhel1.robeckert.us/config
> ERROR: [Errno 2] No such file or directory:
> '/var/lib/ceph/fe3a7cb0-69ca-11eb-8d45-c86000d08867/mon.rhel1.robeckert.us/config'
> Traceback (most recent call last): File 
> "/usr/share/ceph/mgr/cephadm/serve.py",
> line 1366, in _remote_connection yield (conn, connr) File
> "/usr/share/ceph/mgr/cephadm/serve.py", line 1263, in _run_cephadm code,
> '\n'.join(err))) orchestrator._interface.OrchestratorError: cephadm exited 
> with
> an error code: 1, stderr:Inferring config
> /var/lib/ceph/fe3a7cb0-69ca-11eb-8d45-c86000d08867/mon.rhel1.robeckert.us/config
> ERROR: [Errno 2] No such file or directory:
> '/var/lib/ceph/fe3a7cb0-69ca-11eb-8d45-c86000d08867/mon.rhel1.robeckert.us/config'
> 
> The rhel1 server has a monitor under
> /var/lib/ceph/fe3a7cb0-69ca-11eb-8d45-c86000d08867/mon.rhel1 , and it is up 
> and
> active.  If I copy the
> /var/lib/ceph/fe3a7cb0-69ca-11eb-8d45-c86000d08867/mon.rhel1 to
> /var/lib/ceph/fe3a7cb0-69ca-11eb-8d45-c86000d08867/mon.rhel1.robeckert.us the
> error clears, then cephadm removes the folder with the domain name, and the
> error starts showing up in the log again.
> 
> After a few minutes, I get the all clear:
> 
> 9/20/21 11:00:00 AM[INF]overall HEALTH_OK
> 
> 9/20/21 10:58:38 AM[INF]Removing key for mon.
> 
> 9/20/21 10:58:37 AM[INF]Removing daemon mon.rhel1.robeckert.us from
> rhel1.robeckert.us
> 
> 9/20/21 10:58:37 AM[INF]Removing monitor rhel1.robeckert.us from monmap...
> 
> 9/20/21 10:58:37 AM[INF]Safe to remove mon.rhel1.robeckert.us: not in monmap
> (['rhel1', 'story', 'cube'])
> 
> 9/20/21 10:52:21 AM[INF]Cluster is now healthy
> 
> 9/20/21 10:52:21 AM[INF]Health check cleared: CEPHADM_REFRESH_FAILED (was:
> failed to probe daemons or devices)
> 
> 9/20/21 10:51:15 AM
> 
> 
> I checked all of the configurations and can't find any reason it wants the
> monitor with the domain.
> 
> But then the errors start up again - I haven't found any messages before they
> start up, I am going to monitor more closely.
> This doesn't seem to affect any functionality, just lots of messages in the 
> log.
> 
> Thanks,
> Rob
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] etcd support

2021-09-20 Thread Tony Liu
Hi,

I wonder if anyone could share some experience with running etcd on Ceph.
My users build Kubernetes clusters in VMs on OpenStack with Ceph.
With an HDD (DB/WAL on SSD) volume, the etcd performance test sometimes fails
because of latency. With an SSD (all-SSD) volume, it works fine.
I wonder if there is anything I can improve with the HDD volume, or does it
have to be an SSD volume to support etcd?
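(For comparing the two volume types, a commonly used check is an
fdatasync-heavy fio run that approximates etcd's WAL writes - the path and
sizes below are only illustrative:)

fio --name=etcd-check --directory=/var/lib/etcd-test \
    --rw=write --ioengine=sync --fdatasync=1 \
    --bs=2300 --size=22m
# Look at the fsync/fdatasync latency percentiles in the output;
# etcd generally wants the 99th percentile to stay around or below 10ms.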


Thanks!
Tony
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: rocksdb corruption with 16.2.6

2021-09-20 Thread Paul Mezzanini
I was doing a rolling upgrade from 14.2.x -> 15.2.x (wait a week) -> 
16.2.5.  It was the last jump that had the hiccup. I'm doing the 16.2.5 
-> .6 upgrade as I type this.  So far, so good.


-paul


On 9/20/21 10:02 AM, David Orman wrote:

For clarity, was this on upgrading to 16.2.6 from 16.2.5? Or upgrading
from some other release?

On Mon, Sep 20, 2021 at 8:33 AM Paul Mezzanini  wrote:

I got the exact same error on one of my OSDs when upgrading to 16.  I
used it as an exercise in trying to fix a corrupt RocksDB. I spent a few
days of poking with no success.  I got mostly tool crashes like you are
seeing with no forward progress.

I eventually just gave up, purged the OSD, did a smart long test on the
drive to be sure and then threw it back in the mix.  Been HEALTH OK for
a week now after it finished refilling the drive.


On 9/19/21 10:47 AM, Andrej Filipcic wrote:

2021-09-19T15:47:13.610+0200 7f8bc1f0e700  2 rocksdb:
[db_impl/db_impl_compaction_flush.cc:2344] Waiting after background
compaction error: Corruption: block checksum mismatch: expected
2427092066, got 4051549320  in db/251935.sst offset 18414386 size
4032, Accumulated background error counts: 1
2021-09-19T15:47:13.636+0200 7f8bbacf1700 -1 rocksdb: submit_common
error: Corruption: block checksum mismatch: expected 2427092066, got
4051549320  in db/251935.sst offset 18414386 size 4032 code = 2
Rocksdb transaction:

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Getting cephadm "stderr:Inferring config" every minute in log - for a monitor that doesn't exist and shouldn't exist

2021-09-20 Thread Robert W. Eckert
Just after I sent, the error message started again:
9/20/21 11:30:00 AM
[WRN]
ERROR: [Errno 2] No such file or directory: 
'/var/lib/ceph/fe3a7cb0-69ca-11eb-8d45-c86000d08867/mon.rhel1.robeckert.us/config'

9/20/21 11:30:00 AM
[WRN]
host rhel1.robeckert.us `cephadm ceph-volume` failed: cephadm exited with an 
error code: 1, stderr:Inferring config 
/var/lib/ceph/fe3a7cb0-69ca-11eb-8d45-c86000d08867/mon.rhel1.robeckert.us/config

9/20/21 11:30:00 AM
[WRN]
[WRN] CEPHADM_REFRESH_FAILED: failed to probe daemons or devices

9/20/21 11:30:00 AM
[WRN]
Health detail: HEALTH_WARN failed to probe daemons or devices

9/20/21 11:29:45 AM
[ERR]
cephadm exited with an error code: 1, stderr:Inferring config 
/var/lib/ceph/fe3a7cb0-69ca-11eb-8d45-c86000d08867/mon.rhel1.robeckert.us/config
 ERROR: [Errno 2] No such file or directory: 
'/var/lib/ceph/fe3a7cb0-69ca-11eb-8d45-c86000d08867/mon.rhel1.robeckert.us/config'
 Traceback (most recent call last): File 
"/usr/share/ceph/mgr/cephadm/serve.py", line 1366, in _remote_connection yield 
(conn, connr) File "/usr/share/ceph/mgr/cephadm/serve.py", line 1263, in 
_run_cephadm code, '\n'.join(err))) orchestrator._interface.OrchestratorError: 
cephadm exited with an error code: 1, stderr:Inferring config 
/var/lib/ceph/fe3a7cb0-69ca-11eb-8d45-c86000d08867/mon.rhel1.robeckert.us/config
 ERROR: [Errno 2] No such file or directory: 
'/var/lib/ceph/fe3a7cb0-69ca-11eb-8d45-c86000d08867/mon.rhel1.robeckert.us/config'

9/20/21 11:28:39 AM
[ERR]
cephadm exited with an error code: 1, stderr:Inferring config 
/var/lib/ceph/fe3a7cb0-69ca-11eb-8d45-c86000d08867/mon.rhel1.robeckert.us/config
 ERROR: [Errno 2] No such file or directory: 
'/var/lib/ceph/fe3a7cb0-69ca-11eb-8d45-c86000d08867/mon.rhel1.robeckert.us/config'
 Traceback (most recent call last): File 
"/usr/share/ceph/mgr/cephadm/serve.py", line 1366, in _remote_connection yield 
(conn, connr) File "/usr/share/ceph/mgr/cephadm/serve.py", line 1263, in 
_run_cephadm code, '\n'.join(err))) orchestrator._interface.OrchestratorError: 
cephadm exited with an error code: 1, stderr:Inferring config 
/var/lib/ceph/fe3a7cb0-69ca-11eb-8d45-c86000d08867/mon.rhel1.robeckert.us/config
 ERROR: [Errno 2] No such file or directory: 
'/var/lib/ceph/fe3a7cb0-69ca-11eb-8d45-c86000d08867/mon.rhel1.robeckert.us/config'

9/20/21 11:27:37 AM
[ERR]
cephadm exited with an error code: 1, stderr:Inferring config 
/var/lib/ceph/fe3a7cb0-69ca-11eb-8d45-c86000d08867/mon.rhel1.robeckert.us/config
 ERROR: [Errno 2] No such file or directory: 
'/var/lib/ceph/fe3a7cb0-69ca-11eb-8d45-c86000d08867/mon.rhel1.robeckert.us/config'
 Traceback (most recent call last): File 
"/usr/share/ceph/mgr/cephadm/serve.py", line 1366, in _remote_connection yield 
(conn, connr) File "/usr/share/ceph/mgr/cephadm/serve.py", line 1263, in 
_run_cephadm code, '\n'.join(err))) orchestrator._interface.OrchestratorError: 
cephadm exited with an error code: 1, stderr:Inferring config 
/var/lib/ceph/fe3a7cb0-69ca-11eb-8d45-c86000d08867/mon.rhel1.robeckert.us/config
 ERROR: [Errno 2] No such file or directory: 
'/var/lib/ceph/fe3a7cb0-69ca-11eb-8d45-c86000d08867/mon.rhel1.robeckert.us/config'

9/20/21 11:26:31 AM
[ERR]
cephadm exited with an error code: 1, stderr:Inferring config 
/var/lib/ceph/fe3a7cb0-69ca-11eb-8d45-c86000d08867/mon.rhel1.robeckert.us/config
 ERROR: [Errno 2] No such file or directory: 
'/var/lib/ceph/fe3a7cb0-69ca-11eb-8d45-c86000d08867/mon.rhel1.robeckert.us/config'
 Traceback (most recent call last): File 
"/usr/share/ceph/mgr/cephadm/serve.py", line 1366, in _remote_connection yield 
(conn, connr) File "/usr/share/ceph/mgr/cephadm/serve.py", line 1263, in 
_run_cephadm code, '\n'.join(err))) orchestrator._interface.OrchestratorError: 
cephadm exited with an error code: 1, stderr:Inferring config 
/var/lib/ceph/fe3a7cb0-69ca-11eb-8d45-c86000d08867/mon.rhel1.robeckert.us/config
 ERROR: [Errno 2] No such file or directory: 
'/var/lib/ceph/fe3a7cb0-69ca-11eb-8d45-c86000d08867/mon.rhel1.robeckert.us/config'

9/20/21 11:25:29 AM
[ERR]
cephadm exited with an error code: 1, stderr:Inferring config 
/var/lib/ceph/fe3a7cb0-69ca-11eb-8d45-c86000d08867/mon.rhel1.robeckert.us/config
 ERROR: [Errno 2] No such file or directory: 
'/var/lib/ceph/fe3a7cb0-69ca-11eb-8d45-c86000d08867/mon.rhel1.robeckert.us/config'
 Traceback (most recent call last): File 
"/usr/share/ceph/mgr/cephadm/serve.py", line 1366, in _remote_connection yield 
(conn, connr) File "/usr/share/ceph/mgr/cephadm/serve.py", line 1263, in 
_run_cephadm code, '\n'.join(err))) orchestrator._interface.OrchestratorError: 
cephadm exited with an error code: 1, stderr:Inferring config 
/var/lib/ceph/fe3a7cb0-69ca-11eb-8d45-c86000d08867/mon.rhel1.robeckert.us/config
 ERROR: [Errno 2] No such file or directory: 
'/var/lib/ceph/fe3a7cb0-69ca-11eb-8d45-c86000d08867/mon.rhel1.robeckert.us/config'

9/20/21 11:24:28 AM
[ERR]
cephadm exited with an error code: 1, stderr:Inferring config 
/var/lib/ceph/fe3a7cb0-69ca-11eb-8d45-c86000d08867/

[ceph-users] Re: rocksdb corruption with 16.2.6

2021-09-20 Thread Mark Nelson

FWIW, we've had similar reports in the past:


https://tracker.ceph.com/issues/37282

https://tracker.ceph.com/issues/48002

https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/2GBK5NJFOSQGMN25GQ3CZNX4W2ZGQV5U/?sort=date

https://www.spinics.net/lists/ceph-users/msg59466.html

https://www.bountysource.com/issues/49313514-block-checksum-mismatch


...but we aren't the only ones:

https://github.com/facebook/rocksdb/issues/5251

https://github.com/facebook/rocksdb/issues/7033

https://jira.mariadb.org/browse/MDEV-20456

https://lists.launchpad.net/maria-discuss/msg05614.html

https://githubmemory.com/repo/openethereum/openethereum/issues/416

https://githubmemory.com/repo/FISCO-BCOS/FISCO-BCOS/issues/1895

https://groups.google.com/g/rocksdb/c/gUD4kCGTw-0/m/uLpFwkO5AgAJ


At least in one case for us, the user was using consumer grade SSDs 
without power loss protection.  I don't think we ever fully diagnosed if 
that was the cause though.  Another case potentially was related to high 
memory usage on the node.  Hardware errors are a legitimate concern here 
so probably checking dmesg/smartctl/etc is warranted.  ECC memory 
obviously helps too (or rather the lack of which makes it more difficult 
to diagnose).
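(Roughly the kind of checks meant here; device names are examples:)

dmesg -T | grep -iE 'i/o error|medium error|hardware error|mce'
smartctl -a /dev/sdX | grep -iE 'reallocated|pending|uncorrect'
ras-mc-ctl --errors       # if rasdaemon is installed, shows logged memory/CPU errors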



For folks that have experienced this, any info you can give related to 
the HW involved would be helpful.  We (and other projects) have seen 
similar things over the years but this is a notoriously difficult issue 
to track down given that it could be any one of many different things 
and it may or may not be our code.



Mark


On 9/20/21 10:09 AM, Neha Ojha wrote:

Can we please create a bluestore tracker issue for this
(if one does not exist already), where we can start capturing all the
relevant information needed to debug this? Given that this has been
encountered in previous 16.2.* versions, it doesn't sound like a
regression in 16.2.6 to me, rather an issue in pacific. In any case,
we'll prioritize fixing it.

Thanks,
Neha

On Mon, Sep 20, 2021 at 8:03 AM Andrej Filipcic  wrote:

On 20/09/2021 16:02, David Orman wrote:

Same question here, for clarity, was this on upgrading to 16.2.6 from
16.2.5? Or upgrading
from some other release?

from 16.2.5. but the OSD services were never restarted after upgrade to
.5, so it could be a leftover of previous issues.

Cheers,
Andrej

On Mon, Sep 20, 2021 at 8:57 AM Sean  wrote:

   I also ran into this with v16. In my case, trying to run a repair totally
exhausted the RAM on the box, and was unable to complete.

After removing/recreating the OSD, I did notice that it has a drastically
   smaller OMAP size than the other OSDs. I don’t know if that actually means
anything, but just wanted to mention it in case it does.

ID   CLASS  WEIGHT REWEIGHT  SIZE RAW USE  DATA OMAP META
AVAIL%USE   VAR   PGS  STATUS  TYPE NAME
14   hdd10.91409   1.0   11 TiB  3.3 TiB  3.2 TiB  4.6 MiB  5.4 GiB
   7.7 TiB  29.81  1.02   34  uposd.14
16   hdd10.91409   1.0   11 TiB  3.3 TiB  3.3 TiB   20 KiB  9.4 GiB
   7.6 TiB  30.03  1.03   35  uposd.16

~ Sean


On Sep 20, 2021 at 8:27:39 AM, Paul Mezzanini  wrote:


I got the exact same error on one of my OSDs when upgrading to 16.  I
used it as an exercise in trying to fix a corrupt RocksDB. I spent a few
days of poking with no success.  I got mostly tool crashes like you are
seeing with no forward progress.

I eventually just gave up, purged the OSD, did a smart long test on the
drive to be sure and then threw it back in the mix.  Been HEALTH OK for
a week now after it finished refilling the drive.


On 9/19/21 10:47 AM, Andrej Filipcic wrote:

2021-09-19T15:47:13.610+0200 7f8bc1f0e700  2 rocksdb:

[db_impl/db_impl_compaction_flush.cc:2344] Waiting after background

compaction error: Corruption: block checksum mismatch: expected

2427092066, got 4051549320  in db/251935.sst offset 18414386 size

4032, Accumulated background error counts: 1

2021-09-19T15:47:13.636+0200 7f8bbacf1700 -1 rocksdb: submit_common

error: Corruption: block checksum mismatch: expected 2427092066, got

4051549320  in db/251935.sst offset 18414386 size 4032 code = 2

Rocksdb transaction:

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


--
_
 prof. dr. Andrej Filipcic,   E-mail: andrej.filip...@ijs.si
 Department of Experimental High Energy Physics - F9
 Jozef Stefan Institute, Jamova 39, P.o.Box 3000
 SI-1001 Ljubljana, Slovenia
 Tel.: +386-1-477-3674Fax: +386-1-425-7074
--

[ceph-users] Getting cephadm "stderr:Inferring config" every minute in log - for a monitor that doesn't exist and shouldn't exist

2021-09-20 Thread Robert W. Eckert
Hi- after the upgrade to 16.2.6, I am now seeing this error:

9/20/21 10:45:00 AM[ERR]cephadm exited with an error code: 1, stderr:Inferring 
config 
/var/lib/ceph/fe3a7cb0-69ca-11eb-8d45-c86000d08867/mon.rhel1.robeckert.us/config
 ERROR: [Errno 2] No such file or directory: 
'/var/lib/ceph/fe3a7cb0-69ca-11eb-8d45-c86000d08867/mon.rhel1.robeckert.us/config'
 Traceback (most recent call last): File 
"/usr/share/ceph/mgr/cephadm/serve.py", line 1366, in _remote_connection yield 
(conn, connr) File "/usr/share/ceph/mgr/cephadm/serve.py", line 1263, in 
_run_cephadm code, '\n'.join(err))) orchestrator._interface.OrchestratorError: 
cephadm exited with an error code: 1, stderr:Inferring config 
/var/lib/ceph/fe3a7cb0-69ca-11eb-8d45-c86000d08867/mon.rhel1.robeckert.us/config
 ERROR: [Errno 2] No such file or directory: 
'/var/lib/ceph/fe3a7cb0-69ca-11eb-8d45-c86000d08867/mon.rhel1.robeckert.us/config'

The rhel1 server has a monitor under 
/var/lib/ceph/fe3a7cb0-69ca-11eb-8d45-c86000d08867/mon.rhel1 , and it is up and 
active.  If I copy the 
/var/lib/ceph/fe3a7cb0-69ca-11eb-8d45-c86000d08867/mon.rhel1 to 
/var/lib/ceph/fe3a7cb0-69ca-11eb-8d45-c86000d08867/mon.rhel1.robeckert.us the 
error clears, then cephadm removes the folder with the domain name, and the 
error starts showing up in the log again.

After a few minutes, I get the all clear:

9/20/21 11:00:00 AM[INF]overall HEALTH_OK

9/20/21 10:58:38 AM[INF]Removing key for mon.

9/20/21 10:58:37 AM[INF]Removing daemon mon.rhel1.robeckert.us from 
rhel1.robeckert.us

9/20/21 10:58:37 AM[INF]Removing monitor rhel1.robeckert.us from monmap...

9/20/21 10:58:37 AM[INF]Safe to remove mon.rhel1.robeckert.us: not in monmap 
(['rhel1', 'story', 'cube'])

9/20/21 10:52:21 AM[INF]Cluster is now healthy

9/20/21 10:52:21 AM[INF]Health check cleared: CEPHADM_REFRESH_FAILED (was: 
failed to probe daemons or devices)

9/20/21 10:51:15 AM


I checked all of the configurations and can't find any reason it wants the 
monitor with the domain.

But then the errors start up again - I haven't found any messages before they 
start up, I am going to monitor more closely.
This doesn't seem to affect any functionality, just lots of messages in the log.

Thanks,
Rob

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: rocksdb corruption with 16.2.6

2021-09-20 Thread Neha Ojha
Can we please create a bluestore tracker issue for this
(if one does not exist already), where we can start capturing all the
relevant information needed to debug this? Given that this has been
encountered in previous 16.2.* versions, it doesn't sound like a
regression in 16.2.6 to me, rather an issue in pacific. In any case,
we'll prioritize fixing it.

Thanks,
Neha

On Mon, Sep 20, 2021 at 8:03 AM Andrej Filipcic  wrote:
>
> On 20/09/2021 16:02, David Orman wrote:
> > Same question here, for clarity, was this on upgrading to 16.2.6 from
> > 16.2.5? Or upgrading
> > from some other release?
>
> from 16.2.5. but the OSD services were never restarted after upgrade to
> .5, so it could be a leftover of previous issues.
>
> Cheers,
> Andrej
> >
> > On Mon, Sep 20, 2021 at 8:57 AM Sean  wrote:
> >>   I also ran into this with v16. In my case, trying to run a repair totally
> >> exhausted the RAM on the box, and was unable to complete.
> >>
> >> After removing/recreating the OSD, I did notice that it has a drastically
> >>   smaller OMAP size than the other OSDs. I don’t know if that actually 
> >> means
> >> anything, but just wanted to mention it in case it does.
> >>
> >> ID   CLASS  WEIGHT REWEIGHT  SIZE RAW USE  DATA OMAP META
> >>AVAIL%USE   VAR   PGS  STATUS  TYPE NAME
> >> 14   hdd10.91409   1.0   11 TiB  3.3 TiB  3.2 TiB  4.6 MiB  5.4 GiB
> >>   7.7 TiB  29.81  1.02   34  uposd.14
> >> 16   hdd10.91409   1.0   11 TiB  3.3 TiB  3.3 TiB   20 KiB  9.4 GiB
> >>   7.6 TiB  30.03  1.03   35  uposd.16
> >>
> >> ~ Sean
> >>
> >>
> >> On Sep 20, 2021 at 8:27:39 AM, Paul Mezzanini  wrote:
> >>
> >>> I got the exact same error on one of my OSDs when upgrading to 16.  I
> >>> used it as an exercise on trying to fix a corrupt rocksdb. A spent a few
> >>> days of poking with no success.  I got mostly tool crashes like you are
> >>> seeing with no forward progress.
> >>>
> >>> I eventually just gave up, purged the OSD, did a smart long test on the
> >>> drive to be sure and then threw it back in the mix.  Been HEALTH OK for
> >>> a week now after it finished refilling the drive.
> >>>
> >>>
> >>> On 9/19/21 10:47 AM, Andrej Filipcic wrote:
> >>>
> >>> 2021-09-19T15:47:13.610+0200 7f8bc1f0e700  2 rocksdb:
> >>>
> >>> [db_impl/db_impl_compaction_flush.cc:2344] Waiting after background
> >>>
> >>> compaction error: Corruption: block checksum mismatch: expected
> >>>
> >>> 2427092066, got 4051549320  in db/251935.sst offset 18414386 size
> >>>
> >>> 4032, Accumulated background error counts: 1
> >>>
> >>> 2021-09-19T15:47:13.636+0200 7f8bbacf1700 -1 rocksdb: submit_common
> >>>
> >>> error: Corruption: block checksum mismatch: expected 2427092066, got
> >>>
> >>> 4051549320  in db/251935.sst offset 18414386 size 4032 code = 2
> >>>
> >>> Rocksdb transaction:
> >>>
> >>> ___
> >>> ceph-users mailing list -- ceph-users@ceph.io
> >>> To unsubscribe send an email to ceph-users-le...@ceph.io
> >>>
> >> ___
> >> ceph-users mailing list -- ceph-users@ceph.io
> >> To unsubscribe send an email to ceph-users-le...@ceph.io
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
> --
> _
> prof. dr. Andrej Filipcic,   E-mail: andrej.filip...@ijs.si
> Department of Experimental High Energy Physics - F9
> Jozef Stefan Institute, Jamova 39, P.o.Box 3000
> SI-1001 Ljubljana, Slovenia
> Tel.: +386-1-477-3674Fax: +386-1-425-7074
> -
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: rocksdb corruption with 16.2.6

2021-09-20 Thread Andrej Filipcic

On 20/09/2021 16:02, David Orman wrote:

Same question here, for clarity, was this on upgrading to 16.2.6 from
16.2.5? Or upgrading
from some other release?


from 16.2.5, but the OSD services were never restarted after the upgrade to 
.5, so it could be a leftover of previous issues.


Cheers,
Andrej


On Mon, Sep 20, 2021 at 8:57 AM Sean  wrote:

  I also ran into this with v16. In my case, trying to run a repair totally
exhausted the RAM on the box, and was unable to complete.

After removing/recreating the OSD, I did notice that it has a drastically
  smaller OMAP size than the other OSDs. I don’t know if that actually means
anything, but just wanted to mention it in case it does.

ID   CLASS  WEIGHT REWEIGHT  SIZE RAW USE  DATA OMAP META
   AVAIL%USE   VAR   PGS  STATUS  TYPE NAME
14   hdd10.91409   1.0   11 TiB  3.3 TiB  3.2 TiB  4.6 MiB  5.4 GiB
  7.7 TiB  29.81  1.02   34  uposd.14
16   hdd10.91409   1.0   11 TiB  3.3 TiB  3.3 TiB   20 KiB  9.4 GiB
  7.6 TiB  30.03  1.03   35  uposd.16

~ Sean


On Sep 20, 2021 at 8:27:39 AM, Paul Mezzanini  wrote:


I got the exact same error on one of my OSDs when upgrading to 16.  I
used it as an exercise in trying to fix a corrupt RocksDB. I spent a few
days of poking with no success.  I got mostly tool crashes like you are
seeing with no forward progress.

I eventually just gave up, purged the OSD, did a smart long test on the
drive to be sure and then threw it back in the mix.  Been HEALTH OK for
a week now after it finished refilling the drive.


On 9/19/21 10:47 AM, Andrej Filipcic wrote:

2021-09-19T15:47:13.610+0200 7f8bc1f0e700  2 rocksdb:

[db_impl/db_impl_compaction_flush.cc:2344] Waiting after background

compaction error: Corruption: block checksum mismatch: expected

2427092066, got 4051549320  in db/251935.sst offset 18414386 size

4032, Accumulated background error counts: 1

2021-09-19T15:47:13.636+0200 7f8bbacf1700 -1 rocksdb: submit_common

error: Corruption: block checksum mismatch: expected 2427092066, got

4051549320  in db/251935.sst offset 18414386 size 4032 code = 2

Rocksdb transaction:

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



--
_
   prof. dr. Andrej Filipcic,   E-mail: andrej.filip...@ijs.si
   Department of Experimental High Energy Physics - F9
   Jozef Stefan Institute, Jamova 39, P.o.Box 3000
   SI-1001 Ljubljana, Slovenia
   Tel.: +386-1-477-3674Fax: +386-1-425-7074
-

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: rocksdb corruption with 16.2.6

2021-09-20 Thread Sean
 In my case it happened after upgrading from v16.2.4 to v16.2.5 a couple
months ago.

~ Sean


On Sep 20, 2021 at 9:02:45 AM, David Orman  wrote:

> Same question here, for clarity, was this on upgrading to 16.2.6 from
> 16.2.5? Or upgrading
> from some other release?
>
> On Mon, Sep 20, 2021 at 8:57 AM Sean  wrote:
>
>
>  I also ran into this with v16. In my case, trying to run a repair totally
>
> exhausted the RAM on the box, and was unable to complete.
>
>
> After removing/recreating the OSD, I did notice that it has a drastically
>
>  smaller OMAP size than the other OSDs. I don’t know if that actually means
>
> anything, but just wanted to mention it in case it does.
>
>
> ID   CLASS  WEIGHT REWEIGHT  SIZE RAW USE  DATA OMAP META
>
>   AVAIL%USE   VAR   PGS  STATUS  TYPE NAME
>
> 14   hdd10.91409   1.0   11 TiB  3.3 TiB  3.2 TiB  4.6 MiB  5.4 GiB
>
>  7.7 TiB  29.81  1.02   34  uposd.14
>
> 16   hdd10.91409   1.0   11 TiB  3.3 TiB  3.3 TiB   20 KiB  9.4 GiB
>
>  7.6 TiB  30.03  1.03   35  uposd.16
>
>
> ~ Sean
>
>
>
> On Sep 20, 2021 at 8:27:39 AM, Paul Mezzanini  wrote:
>
>
> > I got the exact same error on one of my OSDs when upgrading to 16.  I
>
> > used it as an exercise on trying to fix a corrupt rocksdb. A spent a few
>
> > days of poking with no success.  I got mostly tool crashes like you are
>
> > seeing with no forward progress.
>
> >
>
> > I eventually just gave up, purged the OSD, did a smart long test on the
>
> > drive to be sure and then threw it back in the mix.  Been HEALTH OK for
>
> > a week now after it finished refilling the drive.
>
> >
>
> >
>
> > On 9/19/21 10:47 AM, Andrej Filipcic wrote:
>
> >
>
> > 2021-09-19T15:47:13.610+0200 7f8bc1f0e700  2 rocksdb:
>
> >
>
> > [db_impl/db_impl_compaction_flush.cc:2344] Waiting after background
>
> >
>
> > compaction error: Corruption: block checksum mismatch: expected
>
> >
>
> > 2427092066, got 4051549320  in db/251935.sst offset 18414386 size
>
> >
>
> > 4032, Accumulated background error counts: 1
>
> >
>
> > 2021-09-19T15:47:13.636+0200 7f8bbacf1700 -1 rocksdb: submit_common
>
> >
>
> > error: Corruption: block checksum mismatch: expected 2427092066, got
>
> >
>
> > 4051549320  in db/251935.sst offset 18414386 size 4032 code = 2
>
> >
>
> > Rocksdb transaction:
>
> >
>
> > ___
>
> > ceph-users mailing list -- ceph-users@ceph.io
>
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
> >
>
> ___
>
> ceph-users mailing list -- ceph-users@ceph.io
>
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OSDs unable to mount BlueFS after reboot

2021-09-20 Thread Stefan Kooman

On 9/20/21 12:00, Davíð Steinn Geirsson wrote:



Does the SAS controller run the latest firmware?


As far as I can tell yes. Avago's website does not seem to list these
anymore, but they are running firmware version 20 which is the latest I
can find references to in a web search.

This machine has been chugging along like this for years (it was a single-
node ZFS NFS server before) and I've never had any such issues before.




I'm not sure what your failure domain is, but I would certainly want to try
to reproduce this issue.


I'd be interested to hear any ideas you have about that. The failure domain
is host[1], but this is a 3-node cluster so there isn't much room for taking
a machine down for longer periods. Taking OSDs down is no problem.


Reboot for starters. And a "yank the power cord" next.



The two other machines in the cluster have very similar hardware and software
so I am concerned about seeing the same there on reboot. Backfilling these
16TB spinners takes a long time and is still running, I'm not going to reboot
either of the other nodes until that is finished.


Yeah, definitely don't reboot any other node until cluster is HEALTH_OK. 
But that's also the point, if those 3 hosts are all in the same rack and 
connected to the same power bar, sooner or later this might happen 
involuntarily. And if there is important data on there, you want to 
mitigate the risks now, not when it's too late.


Gr. Stefan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: rocksdb corruption with 16.2.6

2021-09-20 Thread David Orman
Same question here, for clarity, was this on upgrading to 16.2.6 from
16.2.5? Or upgrading
from some other release?

On Mon, Sep 20, 2021 at 8:57 AM Sean  wrote:
>
>  I also ran into this with v16. In my case, trying to run a repair totally
> exhausted the RAM on the box, and was unable to complete.
>
> After removing/recreating the OSD, I did notice that it has a drastically
>  smaller OMAP size than the other OSDs. I don’t know if that actually means
> anything, but just wanted to mention it in case it does.
>
> ID   CLASS  WEIGHT REWEIGHT  SIZE RAW USE  DATA OMAP META
>   AVAIL%USE   VAR   PGS  STATUS  TYPE NAME
> 14   hdd10.91409   1.0   11 TiB  3.3 TiB  3.2 TiB  4.6 MiB  5.4 GiB
>  7.7 TiB  29.81  1.02   34  uposd.14
> 16   hdd10.91409   1.0   11 TiB  3.3 TiB  3.3 TiB   20 KiB  9.4 GiB
>  7.6 TiB  30.03  1.03   35  uposd.16
>
> ~ Sean
>
>
> On Sep 20, 2021 at 8:27:39 AM, Paul Mezzanini  wrote:
>
> > I got the exact same error on one of my OSDs when upgrading to 16.  I
> > used it as an exercise on trying to fix a corrupt rocksdb. A spent a few
> > days of poking with no success.  I got mostly tool crashes like you are
> > seeing with no forward progress.
> >
> > I eventually just gave up, purged the OSD, did a smart long test on the
> > drive to be sure and then threw it back in the mix.  Been HEALTH OK for
> > a week now after it finished refilling the drive.
> >
> >
> > On 9/19/21 10:47 AM, Andrej Filipcic wrote:
> >
> > 2021-09-19T15:47:13.610+0200 7f8bc1f0e700  2 rocksdb:
> >
> > [db_impl/db_impl_compaction_flush.cc:2344] Waiting after background
> >
> > compaction error: Corruption: block checksum mismatch: expected
> >
> > 2427092066, got 4051549320  in db/251935.sst offset 18414386 size
> >
> > 4032, Accumulated background error counts: 1
> >
> > 2021-09-19T15:47:13.636+0200 7f8bbacf1700 -1 rocksdb: submit_common
> >
> > error: Corruption: block checksum mismatch: expected 2427092066, got
> >
> > 4051549320  in db/251935.sst offset 18414386 size 4032 code = 2
> >
> > Rocksdb transaction:
> >
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: rocksdb corruption with 16.2.6

2021-09-20 Thread David Orman
For clarity, was this on upgrading to 16.2.6 from 16.2.5? Or upgrading
from some other release?

On Mon, Sep 20, 2021 at 8:33 AM Paul Mezzanini  wrote:
>
> I got the exact same error on one of my OSDs when upgrading to 16.  I
> used it as an exercise on trying to fix a corrupt rocksdb. A spent a few
> days of poking with no success.  I got mostly tool crashes like you are
> seeing with no forward progress.
>
> I eventually just gave up, purged the OSD, did a smart long test on the
> drive to be sure and then threw it back in the mix.  Been HEALTH OK for
> a week now after it finished refilling the drive.
>
>
> On 9/19/21 10:47 AM, Andrej Filipcic wrote:
> > 2021-09-19T15:47:13.610+0200 7f8bc1f0e700  2 rocksdb:
> > [db_impl/db_impl_compaction_flush.cc:2344] Waiting after background
> > compaction error: Corruption: block checksum mismatch: expected
> > 2427092066, got 4051549320  in db/251935.sst offset 18414386 size
> > 4032, Accumulated background error counts: 1
> > 2021-09-19T15:47:13.636+0200 7f8bbacf1700 -1 rocksdb: submit_common
> > error: Corruption: block checksum mismatch: expected 2427092066, got
> > 4051549320  in db/251935.sst offset 18414386 size 4032 code = 2
> > Rocksdb transaction:
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OSDs unable to mount BlueFS after reboot

2021-09-20 Thread Davíð Steinn Geirsson

On Mon, Sep 20, 2021 at 10:38:37AM +0200, Stefan Kooman wrote:
> On 9/16/21 13:42, Davíð Steinn Geirsson wrote:
> 
> > 
> > The 4 affected drives are of 3 different types from 2 different vendors:
> > ST16000NM001G-2KK103
> > ST12000VN0007-2GS116
> > WD60EFRX-68MYMN1
> > 
> > They are all connected through an LSI2308 SAS controller in IT mode. Other
> > drives that did not fail are also connected to the same controller.
> > 
> > There are no expanders in this particular machine, only a direct-attach
> > SAS backplane.
> 
> Does the SAS controller run the latest firmware?

As far as I can tell yes. Avago's website does not seem to list these
anymore, but they are running firmware version 20 which is the latest I
can find references to in a web search.
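(For what it's worth, the version can be read roughly like this - assuming the
mpt2sas driver and, if installed, the LSI sas2flash utility:)

dmesg | grep -i mpt2sas | grep -i fwversion   # the driver logs the firmware version at load time
sas2flash -listall                            # lists controllers and firmware, if the utility is present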

This machine has been chugging along like this for years (it was a single-
node ZFS NFS server before) and I've never had any such issues before.


> 
> I'm not sure what your failure domain is, but I would certainly want to try
> to reproduce this issue.

I'd be interested to hear any ideas you have about that. The failure domain
is host[1], but this is a 3-node cluster so there isn't much room for taking
a machine down for longer periods. Taking OSDs down is no problem.

The two other machines in the cluster have very similar hardware and software
so I am concerned about seeing the same there on reboot. Backfilling these
16TB spinners takes a long time and is still running, I'm not going to reboot
either of the other nodes until that is finished.


> 
> Gr. Stefan

Regards,
Davíð

[1] Mostly. Failure domain is host for every pool using the default CRUSH
rules. There is also an EC pool with m=5 k=7, with a custom CRUSH rule to
pick 3 hosts and 4 OSDs from each of the hosts.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: rocksdb corruption with 16.2.6

2021-09-20 Thread Sean
 I also ran into this with v16. In my case, trying to run a repair totally
exhausted the RAM on the box, and was unable to complete.

After removing/recreating the OSD, I did notice that it has a drastically
 smaller OMAP size than the other OSDs. I don’t know if that actually means
anything, but just wanted to mention it in case it does.

ID   CLASS  WEIGHT REWEIGHT  SIZE RAW USE  DATA OMAP META
  AVAIL%USE   VAR   PGS  STATUS  TYPE NAME
14   hdd10.91409   1.0   11 TiB  3.3 TiB  3.2 TiB  4.6 MiB  5.4 GiB
 7.7 TiB  29.81  1.02   34  uposd.14
16   hdd10.91409   1.0   11 TiB  3.3 TiB  3.3 TiB   20 KiB  9.4 GiB
 7.6 TiB  30.03  1.03   35  uposd.16

~ Sean


On Sep 20, 2021 at 8:27:39 AM, Paul Mezzanini  wrote:

> I got the exact same error on one of my OSDs when upgrading to 16.  I
> used it as an exercise on trying to fix a corrupt rocksdb. A spent a few
> days of poking with no success.  I got mostly tool crashes like you are
> seeing with no forward progress.
>
> I eventually just gave up, purged the OSD, did a smart long test on the
> drive to be sure and then threw it back in the mix.  Been HEALTH OK for
> a week now after it finished refilling the drive.
>
>
> On 9/19/21 10:47 AM, Andrej Filipcic wrote:
>
> 2021-09-19T15:47:13.610+0200 7f8bc1f0e700  2 rocksdb:
>
> [db_impl/db_impl_compaction_flush.cc:2344] Waiting after background
>
> compaction error: Corruption: block checksum mismatch: expected
>
> 2427092066, got 4051549320  in db/251935.sst offset 18414386 size
>
> 4032, Accumulated background error counts: 1
>
> 2021-09-19T15:47:13.636+0200 7f8bbacf1700 -1 rocksdb: submit_common
>
> error: Corruption: block checksum mismatch: expected 2427092066, got
>
> 4051549320  in db/251935.sst offset 18414386 size 4032 code = 2
>
> Rocksdb transaction:
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: rocksdb corruption with 16.2.6

2021-09-20 Thread Paul Mezzanini
I got the exact same error on one of my OSDs when upgrading to 16.  I 
used it as an exercise in trying to fix a corrupt RocksDB. I spent a few 
days of poking with no success.  I got mostly tool crashes like you are 
seeing with no forward progress.


I eventually just gave up, purged the OSD, did a smart long test on the 
drive to be sure and then threw it back in the mix.  Been HEALTH OK for 
a week now after it finished refilling the drive.
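(The SMART long test part is just something like the following, with the device
name as an example:)

smartctl -t long /dev/sdX       # kick off the extended self-test
smartctl -l selftest /dev/sdX   # check the result once it has finished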



On 9/19/21 10:47 AM, Andrej Filipcic wrote:
2021-09-19T15:47:13.610+0200 7f8bc1f0e700  2 rocksdb: 
[db_impl/db_impl_compaction_flush.cc:2344] Waiting after background 
compaction error: Corruption: block checksum mismatch: expected 
2427092066, got 4051549320  in db/251935.sst offset 18414386 size 
4032, Accumulated background error counts: 1
2021-09-19T15:47:13.636+0200 7f8bbacf1700 -1 rocksdb: submit_common 
error: Corruption: block checksum mismatch: expected 2427092066, got 
4051549320  in db/251935.sst offset 18414386 size 4032 code = 2 
Rocksdb transaction: 

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OSDs unable to mount BlueFS after reboot

2021-09-20 Thread Stefan Kooman

On 9/16/21 13:42, Davíð Steinn Geirsson wrote:



The 4 affected drives are of 3 different types from 2 different vendors:
ST16000NM001G-2KK103
ST12000VN0007-2GS116
WD60EFRX-68MYMN1

They are all connected through an LSI2308 SAS controller in IT mode. Other
drives that did not fail are also connected to the same controller.

There are no expanders in this particular machine, only a direct-attach
SAS backplane.


Does the SAS controller run the latest firmware?

I'm not sure what your failure domain is, but I would certainly want to 
try to reproduce this issue.


Gr. Stefan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Drop of performance after Nautilus to Pacific upgrade

2021-09-20 Thread Luis Domingues
We tested Ceph 16.2.6, and indeed, the performance came back to what we expect
for this cluster.
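
For context, the benchmark referred to throughout this thread is a plain
rados bench with 128k objects; a minimal sketch of such a run (pool name,
duration and thread count are assumptions):

    # 60-second write benchmark with 128 KiB objects and 16 concurrent ops
    rados bench -p testpool 60 write -b 131072 -t 16 --no-cleanup
    # optional read passes against the objects left behind by --no-cleanup
    rados bench -p testpool 60 seq -t 16
    rados bench -p testpool 60 rand -t 16
    # remove the benchmark objects afterwards
    rados -p testpool cleanup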

Luis Domingues

‐‐‐ Original Message ‐‐‐

On Saturday, September 11th, 2021 at 9:55 AM, Luis Domingues 
 wrote:

> Hi Igor,
>
> I have an SSD for the physical DB volume. And indeed it has very high
> utilisation during the benchmark. I will test 16.2.6.
>
> Thanks,
>
> Luis Domingues
>
> ‐‐‐ Original Message ‐‐‐
>
> On Friday, September 10th, 2021 at 5:57 PM, Igor Fedotov ifedo...@suse.de 
> wrote:
>
> > Hi Luis,
> >
> > There is some chance that you're hit by https://tracker.ceph.com/issues/52089.
> >
> > What is your physical DB volume configuration - are there fast
> >
> > standalone disks for that? If so are they showing high utilization
> >
> > during the benchmark?
> >
> > It makes sense to try 16.2.6 once available - would the problem go away?
> >
> > Thanks,
> >
> > Igor
> >
> > On 9/5/2021 8:45 PM, Luis Domingues wrote:
> >
> > > Hello,
> > >
> > > I run a test cluster of 3 machines with 24 HDDs each, running bare-metal
> > > on CentOS 8. Long story short, I get a bandwidth of ~1,200 MB/s when I do
> > > a rados bench writing 128k objects, when the cluster is installed with
> > > Nautilus.
> > >
> > > When I upgrade the cluster to Pacific (using ceph-ansible to deploy
> > > and/or upgrade), my performance drops to ~400 MB/s of bandwidth doing the
> > > same rados bench.
> > >
> > > I am kind of clueless about what makes the performance drop so much. Does
> > > anyone have ideas about where I can dig to find the root of this
> > > difference?
> > >
> > > Thanks,
> > >
> > > Luis Domingues
> > >
> > > ceph-users mailing list -- ceph-users@ceph.io
> > >
> > > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
> > ceph-users mailing list -- ceph-users@ceph.io
> >
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
> ceph-users mailing list -- ceph-users@ceph.io
>
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: PGs stuck in unkown state

2021-09-20 Thread Mr. Gecko

Thank you Stefan!

My problem was that the ruleset I built had the failure domain set to 
rack, when I do not have any racks defined. I changed the failure domain 
to host as this is just a home lab environment. I reverted the ruleset 
on the pool, and it immediately started to recover and storage was 
available to my virtual machines again. I then fixed the failure domain 
and the ruleset works.
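
For anyone hitting the same symptom, the fix roughly corresponds to the
commands below (a minimal sketch; the rule name and pool name are assumptions):

    # create a replicated rule whose failure domain is "host" instead of "rack"
    ceph osd crush rule create-replicated rbd-by-host default host
    # point the affected pool at the new rule and verify it
    ceph osd pool set mypool crush_rule rbd-by-host
    ceph osd pool get mypool crush_rule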


On 9/20/21 01:37, Stefan Kooman wrote:

On 9/20/21 07:51, Mr. Gecko wrote:

Hello,

I'll start by explaining what I have done. I was adding some new
storage in an attempt to set up a cache pool according to
https://docs.ceph.com/en/latest/dev/cache-pool/ by doing the following.


1. I upgraded all servers in the cluster to Ceph 15.2.14, which put the
system into recovery for out-of-sync data.
2. I added 2 SSDs as OSDs to the cluster, which immediately caused Ceph
to balance onto the SSDs.

3. I added 2 new CRUSH rules which map to SSD storage vs HDD storage.


I guess this is where things went wrong. Have you tested the CRUSH rules
beforehand, to see whether the right OSDs get mapped, or any at all?


I would revert the CRUSH rule change for now to try to get your PGs
active+clean.


If that works, then try to find out (with crushtool, for example) why
the new CRUSH rule sets do not map the OSDs.


Gr. Stefan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Adding cache tier to an existing objectstore cluster possible?

2021-09-20 Thread Szabo, Istvan (Agoda)
These are the processes in iotop on one node. I think it's compacting, but it
is always like this and never finishes.


  59936 be/4 ceph  0.00 B/s  10.08 M/s  0.00 % 53.07 % ceph-osd -f --cluster ceph --id 46 --setuser ceph --setgroup ceph [bstore_kv_sync]
  66097 be/4 ceph  0.00 B/s   6.96 M/s  0.00 % 43.11 % ceph-osd -f --cluster ceph --id 48 --setuser ceph --setgroup ceph [bstore_kv_sync]
  63145 be/4 ceph  0.00 B/s   5.82 M/s  0.00 % 40.49 % ceph-osd -f --cluster ceph --id 47 --setuser ceph --setgroup ceph [bstore_kv_sync]
  51150 be/4 ceph  0.00 B/s   3.21 M/s  0.00 % 10.50 % ceph-osd -f --cluster ceph --id 43 --setuser ceph --setgroup ceph [bstore_kv_sync]
  53909 be/4 ceph  0.00 B/s   2.91 M/s  0.00 %  9.98 % ceph-osd -f --cluster ceph --id 44 --setuser ceph --setgroup ceph [bstore_kv_sync]
  57066 be/4 ceph  0.00 B/s   2.18 M/s  0.00 %  8.66 % ceph-osd -f --cluster ceph --id 45 --setuser ceph --setgroup ceph [bstore_kv_sync]
  36672 be/4 ceph  0.00 B/s   2.68 M/s  0.00 %  7.82 % ceph-osd -f --cluster ceph --id 42 --setuser ceph --setgroup ceph [bstore_kv_sync]
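
If it really is RocksDB compaction keeping the bstore_kv_sync threads busy,
the counters and a one-off manual compaction can be checked as sketched below
(the OSD id is an example; compacting a loaded OSD adds extra I/O, so treat
this as a diagnostic rather than a fix):

    # BlueFS / RocksDB counters for one of the busy OSDs
    ceph daemon osd.46 perf dump bluefs
    # trigger a manual compaction and see how long it takes
    ceph tell osd.46 compact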

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com
---

-Original Message-
From: Stefan Kooman  
Sent: Monday, September 20, 2021 2:13 PM
To: Szabo, Istvan (Agoda) ; ceph-users 

Subject: Re: [ceph-users] Adding cache tier to an existing objectstore cluster 
possible?



On 9/20/21 06:15, Szabo, Istvan (Agoda) wrote:
> Hi,
>
> I'm running out of ideas about why my WAL+DB NVMes are always maxed out, so I'm
> thinking I might have missed the cache tiering in front of my 4:2 EC pool. Is it
> possible to add it later?

Maybe I missed a post where you talked about WAL+DB being maxed out.
What Ceph version do you use? Maybe you suffer from issue #52244, which is fixed
in Pacific 16.2.6 with PR [1].

Gr. Stefan

[1]: https://github.com/ceph/ceph/pull/42773
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Adding cache tier to an existing objectstore cluster possible?

2021-09-20 Thread Stefan Kooman

On 9/20/21 06:15, Szabo, Istvan (Agoda) wrote:

Hi,

I'm running out of ideas about why my WAL+DB NVMes are always maxed out, so I'm
thinking I might have missed the cache tiering in front of my 4:2 EC pool. Is it
possible to add it later?


Maybe I missed a post where you talked about WAL+DB being maxed out. 
What Ceph version do you use? Maybe you suffer from issue #52244, which
is fixed in Pacific 16.2.6 with PR [1].


Gr. Stefan

[1]: https://github.com/ceph/ceph/pull/42773
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: PGs stuck in unkown state

2021-09-20 Thread Stefan Kooman

On 9/20/21 07:51, Mr. Gecko wrote:

Hello,

I'll start by explaining what I have done. I was adding some new storage
in an attempt to set up a cache pool according to
https://docs.ceph.com/en/latest/dev/cache-pool/ by doing the following.


1. I upgraded all servers in the cluster to Ceph 15.2.14, which put the
system into recovery for out-of-sync data.
2. I added 2 SSDs as OSDs to the cluster, which immediately caused Ceph to
balance onto the SSDs.

3. I added 2 new CRUSH rules which map to SSD storage vs HDD storage.


I guess this is where things went wrong. Have you tested the CRUSH rules
beforehand, to see whether the right OSDs get mapped, or any at all?


I would revert the CRUSH rule change for now to try to get your PGs
active+clean.


If that works, then try to find out (with crushtool, for example) why the
new CRUSH rule sets do not map the OSDs.
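
A rule can also be dry-run against the compiled CRUSH map without touching
the cluster; a minimal sketch (the rule id and replica count are assumptions):

    # export and decompile the current CRUSH map
    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt
    # simulate placements for rule 1 with 3 replicas; bad mappings mean the
    # rule cannot find enough OSDs within its failure domain
    crushtool -i crushmap.bin --test --rule 1 --num-rep 3 --show-mappings
    crushtool -i crushmap.bin --test --rule 1 --num-rep 3 --show-bad-mappings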


Gr. Stefan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Adding cache tier to an existing objectstore cluster possible?

2021-09-20 Thread Szabo, Istvan (Agoda)
Hi,

I'm running out of ideas about why my WAL+DB NVMes are always maxed out, so I'm
thinking I might have missed the cache tiering in front of my 4:2 EC pool. Is it
possible to add it later?
There are 9 nodes with 6x 15.3 TB SAS SSDs and 3x NVMe drives. Currently, of the
3 NVMes, 1 is used for the index and meta pools, and the other 2 are used for
WAL+DB in front of 3 SSDs each. I'm thinking of removing the WAL+DB NVMes and
adding them as a writeback cache pool.

The only thing that gives me a headache is this description:
https://docs.ceph.com/en/latest/rados/operations/cache-tiering/#a-word-of-caution
It feels like using it is not really suggested :/

Any experience with it?

Thank you.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Is it normal Ceph reports "Degraded data redundancy" in normal use?

2021-09-20 Thread Kai Stian Olstad

On 17.09.2021 16:10, Eugen Block wrote:
Since I'm trying to test different erasure coding plugins and
techniques, I don't want the balancer active.
So I tried setting it to none as Eugen suggested, and to my surprise
I did not get any degraded messages at all, and the cluster was in
HEALTH_OK the whole time.


Interesting, maybe the balancer works differently now? Or it works
differently under heavy load?


It would be strange if the balancer's normal operation put the
cluster into a degraded state.
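
For reference, disabling the balancer for this kind of test can be done either
by setting its mode to none or by switching the automatic optimization off; a
minimal sketch:

    ceph balancer status        # see the current mode and any queued plans
    ceph balancer mode none     # stop it from generating new plans
    ceph balancer off           # or disable automatic optimization entirely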




The only suspicious lines I see are these:

 Sep 17 06:30:01 pech-mon-1 conmon[1337]: debug
2021-09-17T06:30:01.402+ 7f66b0329700  1 heartbeat_map
reset_timeout 'Monitor::cpu_tp thread 0x7f66b0329700' had timed out
after 0.0s

But I'm not sure if this is related. The out OSDs shouldn't have any
impact on this test.

Did you monitor the network saturation during these tests with iftop
or something similar?


I did not, so I reran the test this morning.

All the servers have 2x 25 Gbit/s NICs bonded with LACP (802.3ad,
layer3+4 hashing).


The peak on the active monitor was 27 Mbit/s, and less on the other 2
monitors.
I also checked the CPU (Xeon 5222, 3.8 GHz) and none of the cores was
saturated,

and the network statistics show no errors or drops.
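
For anyone repeating this check, link saturation and drop counters can be
inspected with standard tools; a minimal sketch (the bond interface name is
an assumption):

    iftop -i bond0                  # live per-connection bandwidth on the bond
    sar -n DEV 1                    # per-second throughput, errors and drops
    ip -s link show bond0           # error/drop counters for the bond
    cat /proc/net/bonding/bond0     # LACP state of the member links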


So perhaps there is a bug in the balancer code?

--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: debugging radosgw sync errors

2021-09-20 Thread Boris Behrens
Ah, found it.
It was an SSL certificate that was invalid (some PoC which had started to mold).

Now the sync is running fine, but there is one bucket that got a ton of
data in the mdlog.
[root@s3db16 ~]# radosgw-admin mdlog list | grep temonitor | wc -l
No --period given, using current period=e8fc96f1-ae86-4dc1-b432-470b0772fded
284760
[root@s3db16 ~]# radosgw-admin mdlog list | grep name | wc -l
No --period given, using current period=e8fc96f1-ae86-4dc1-b432-470b0772fded
343078

Is it safe to clear the mdlog?
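
Before trimming anything, it is worth checking what the sync machinery still
needs; a minimal sketch of the usual inspection steps (the shard id and marker
are placeholders, and the exact mdlog trim options vary between releases, so
check radosgw-admin --help first):

    # overall multisite sync state as seen from this zone
    radosgw-admin sync status
    radosgw-admin metadata sync status
    # per-shard markers for the metadata log in the current period
    radosgw-admin mdlog status
    # trim one shard only, up to a marker both zones have already consumed
    radosgw-admin mdlog trim --shard-id=0 --end-marker=<marker>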

On Mon, 20 Sep 2021 at 01:00, Boris Behrens wrote:

> I just deleted the rados object from .rgw.data.root and this removed the
> bucket.instance, but this did not solve the problem.
>
> It looks like there is some access error when I try to run radosgw-admin
> metadata sync init:
> I get a 403 HTTP response code on the POST to the /admin/realm/period endpoint.
>
> I checked the system_key and added a new system user and set the keys with
> zone modify and period update --commit on both sides.
> This also did not help.
>
> After a weekend of digging through the mailing list and trying to fix it, I
> am totally stuck.
> I hope that one of you can help me.
>
>
>
>
> On Fri, 17 Sep 2021 at 17:54, Boris Behrens wrote:
>
>> While searching for other things I came across this:
>> [root ~]# radosgw-admin metadata list bucket | grep www1
>> "www1",
>> [root ~]# radosgw-admin metadata list bucket.instance | grep www1
>> "www1:ff7a8b0c-07e6-463a-861b-78f0adeba8ad.81095307.31103",
>> "www1.company.dev",
>> [root ~]# radosgw-admin bucket list | grep www1
>> "www1",
>> [root ~]# radosgw-admin metadata rm bucket.instance:www1.company.dev
>> ERROR: can't remove key: (22) Invalid argument
>>
>> Maybe this is part of the problem.
>>
>> Did somebody see this and know what to do?
>> --
>> The "UTF-8-Probleme" self-help group will, as an exception, meet in the
>> big hall this time.
>>
>
>
> --
> The "UTF-8-Probleme" self-help group will, as an exception, meet in the
> big hall this time.
>


-- 
The "UTF-8-Probleme" self-help group will, as an exception, meet in the
big hall this time.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: rocksdb corruption with 16.2.6

2021-09-20 Thread Andrej Filipcic



I attached it, but it did not work, so here it is:

https://www-f9.ijs.si/~andrej/ceph/ceph-osd.1049.log-20210920.gz

Cheers,
Andrej


On 9/20/21 9:41 AM, Dan van der Ster wrote:

On Sun, Sep 19, 2021 at 4:48 PM Andrej Filipcic  wrote:

I have attached a part of the osd log.

Hi Andrej. Did you mean to attach more than the snippets?
Could you also send the log of the first startup in 16.2.6 of an
now-corrupted osd?

Cheers, dan



--
_
   prof. dr. Andrej Filipcic,   E-mail: andrej.filip...@ijs.si
   Department of Experimental High Energy Physics - F9
   Jozef Stefan Institute, Jamova 39, P.o.Box 3000
   SI-1001 Ljubljana, Slovenia
   Tel.: +386-1-477-3674    Fax: +386-1-477-3166
-

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: rocksdb corruption with 16.2.6

2021-09-20 Thread Dan van der Ster
On Sun, Sep 19, 2021 at 4:48 PM Andrej Filipcic  wrote:
> I have attached a part of the osd log.

Hi Andrej. Did you mean to attach more than the snippets?
Could you also send the log of the first startup in 16.2.6 of an
now-corrupted osd?

Cheers, dan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Adding cache tier to an existing objectstore cluster possible?

2021-09-20 Thread Zakhar Kirpichenko
My experience was that placing DB+WAL on NVMe provided a much better and
much more consistent boost to an HDD-backed pool than a cache tier. My
biggest grief with the cache tier was its unpredictable write performance:
it would cache some writes and then immediately not cache others,
seemingly at random, and we couldn't influence this behavior with any
settings, well documented or otherwise. Read cache performance was
somewhat more predictable, but not nearly at the level our enterprise NVMe
drives could provide. Then I asked about this on IRC and the feedback I got
was basically "it is what it is, avoid using cache tier".
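
For reference, placing DB+WAL on NVMe as described here is done at OSD
creation time; a minimal sketch (device paths are assumptions, and the WAL is
co-located on the DB device when only --block.db is given):

    # BlueStore OSD with data on the HDD and DB (plus WAL) on an NVMe partition
    ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1p1
    # confirm where block, block.db and block.wal ended up
    ceph-volume lvm list
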
Zakhar
On Mon, Sep 20, 2021 at 9:56 AM Eugen Block  wrote:

> And we are quite happy with our cache tier. When we got new HDD OSDs
> we tested whether things would improve without the tier, but we had to stick
> with it; otherwise working with our VMs was almost impossible. But this
> is an RBD cache, so I can't tell how other protocols perform with a
> cache tier.
>
>
> Zitat von Zakhar Kirpichenko :
>
> > Hi,
> >
> > You can arbitrarily add or remove the cache tier, there's no problem with
> > that. The problem is that cache tier doesn't work well, I tried it in front
> > of replicated and EC-pools with very mixed results: when it worked there
> > wasn't as much of a speed/latency benefit as one would expect from
> > NVME-based cache, and most of the time it just didn't work with I/O very
> > obviously hitting the underlying "cold data" pool for no reason. This
> > behavior is likely why cache tier isn't recommended. I eventually
> > dismantled the cache tier and used NVME for WAL+DB.
> >
> > Best regards,
> > Zakhar
> >
> > On Mon, Sep 20, 2021 at 7:16 AM Szabo, Istvan (Agoda) <
> > istvan.sz...@agoda.com> wrote:
> >
> >> Hi,
> >>
> >> I'm running out of ideas about why my WAL+DB NVMes are always maxed out, so
> >> I'm thinking I might have missed the cache tiering in front of my 4:2
> >> EC pool. Is it possible to add it later?
> >> There are 9 nodes with 6x 15.3 TB SAS SSDs and 3x NVMe drives. Currently, of
> >> the 3 NVMes, 1 is used for the index and meta pools, and the other 2 are
> >> used for WAL+DB in front of 3 SSDs each. I'm thinking of removing the WAL+DB
> >> NVMes and adding them as a writeback cache pool.
> >>
> >> The only thing that gives me a headache is this description:
> >> https://docs.ceph.com/en/latest/rados/operations/cache-tiering/#a-word-of-caution
> >> It feels like using it is not really suggested :/
> >>
> >> Any experience with it?
> >>
> >> Thank you.
> >>
> >> ___
> >> ceph-users mailing list -- ceph-users@ceph.io
> >> To unsubscribe send an email to ceph-users-le...@ceph.io
> >>
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io