[ceph-users] Elasticsearch sync module | Ceph Issue

2024-03-11 Thread Lokendra Rathour
Hi Team,

We are working on Elasticsearch sync module integration with ceph.

Ceph version: 18.2.5 (reef)
Elasticsearch: 8.2.1

Problem statement:
Syncing between the zones is not happening:

Links followed to perform the integration:
https://ceph.io/en/news/blog/2017/new-luminous-rgw-metadata-search/
https://www.suse.com/c/rgw-elasticsearch/
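
For reference, the zone-es zone was created roughly along the lines of the commands in those guides (a sketch from memory, not an exact transcript: the Elasticsearch endpoint and num_shards below are placeholders, and the RGW endpoint assumes the zone-es gateway on storagenode1 port 8002 as configured further down):

radosgw-admin zone create --rgw-zonegroup=zone-zg --rgw-zone=zone-es \
    --endpoints=http://[fd00:fd00:fd00:9900::34]:8002 --tier-type=elasticsearch
radosgw-admin zone modify --rgw-zone=zone-es \
    --tier-config=endpoint=http://<elasticsearch-host>:9200,num_shards=10
radosgw-admin period update --commit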

Current output of 'radosgw-admin sync status':

  realm 1fc3fe5f-2bb8-4134-8200-4b26479f7709 (gold)
  zonegroup 59d8be05-0a31-4610-8f7f-d80a6d221f2e (zone-zg)
   zone ce1e53a5-5f30-42c5-bcca-a69b60411e1a (zone)
   current time 2024-03-11T02:31:17Z
zonegroup features enabled: resharding
   disabled: compress-encrypted
  metadata sync no sync (zone is master)
  data sync source: e012b80d-cb04-4ab2-b78f-13b1dea0156d (zone-es)
not syncing from zone

Configuration of ceph.conf:
[global]
ms bind ipv6 = true
ms bind ipv4 = false
mon initial members = storagenode1,storagenode2,storagenode3
osd pool default crush rule = -1
fsid = d5b53cd2-bef4-41a3-8667-135a3e02fc25
mon host =
[v2:[fd00:fd00:fd00:9900::34]:3300,v1:[fd00:fd00:fd00:9900::34]:6789],[v2:[fd00:fd00:fd00:9900::35]:3300,v1:[fd00:fd00:fd00:9900::35]:6789],[v2:[fd00:fd00:fd00:9900::36]:3300,v1:[fd00:fd00:fd00:9900::36]:6789]
public network = fd00:fd00:fd00:9900::/64
cluster network = eff0:eff0:eff0::/64

[osd]
osd memory target = 31380104806

[client.rgw.storagenode1.rgw0]
host = storagenode1
keyring = /var/lib/ceph/radosgw/ceph-rgw.storagenode1.rgw0/keyring
log file = /var/log/ceph/ceph-rgw-storagenode1.rgw0.log
rgw frontends = beast endpoint=[fd00:fd00:fd00:9900::34]:8080
rgw thread pool size = 512

[client.rgw.zone]
rgw_frontends="beast port=8000"
rgw_zone=zone
host = storagenode1
keyring = /etc/ceph/ceph.client.zone.keyring
log file = /var/log/radosgw/rgw.zone.radosgw.log


[client.rgw.zone-es]
rgw_frontends="beast port=8002"
rgw_zone=zone-es
host = storagenode1
keyring = /etc/ceph/ceph.client.zone-es.keyring
log file = /var/log/radosgw/rgw.zone-es.radosgw.log


Could anyone please help?


-- 
~ Lokendra
skype: lokendrarathour
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: bluestore_min_alloc_size and bluefs_shared_alloc_size

2024-03-11 Thread Alexander E. Patrakov
Hello Joel,

Please be aware that it is not recommended to keep a mix of OSDs
created with different bluestore_min_alloc_size values within the same
CRUSH device class. The consequence of such a mix is that the balancer
will not work properly - instead of evening out the OSD space
utilization, it will create a distribution with two bands.

This is a bug in the balancer. A ticket has been filed already:
https://tracker.ceph.com/issues/64715
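
To check whether you already have such a mix, something along these lines should work (just a sketch: the min_alloc_size field only appears in the OSD metadata on releases that carry the backport Anthony mentions further down, and the exact key name may differ):

# dump metadata for all OSDs and pull out each OSD id and the min_alloc_size it was created with
ceph osd metadata | grep -E '"id"|min_alloc_size'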

On Tue, Mar 12, 2024 at 4:45 AM Joel Davidow  wrote:
>
> For newly added osds, bfm_bytes_per_block is 4096. However, for osds
> that were added while the cluster was running octopus, bfm_bytes_per_block
> remains 65536.
>
> Based on
> https://github.com/ceph/ceph/blob/1c349451176cc5b4ebfb24b22eaaa754e05cff6c/src/os/bluestore/BitmapFreelistManager.cc
> and the space allocation section on page 360 of
> https://pdl.cmu.edu/PDL-FTP/Storage/ceph-exp-sosp19.pdf, it appears
> bfm_bytes_per_block is the bluestore_min_alloc_size that the osd was built
> with.
>
> Below is a sanitized example of what I was referring to as the osd label
> (which includes bfm_bytes_per_block) that was run on an osd built under
> octopus. The cluster was later upgraded to pacific.
>
> user@osd-host:/# ceph-bluestore-tool show-label --path
> /var/lib/ceph/osd/ceph-36
> inferring bluefs devices from bluestore path
> {
> "/var/lib/ceph/osd/ceph-36/block": {
> "osd_uuid": "",
> "size": 4000783007744,
> "btime": "2021-09-14T15:16:55.605860+",
> "description": "main",
> "bfm_blocks": "61047168",
> "bfm_blocks_per_key": "128",
> "bfm_bytes_per_block": "65536",
> "bfm_size": "4000783007744",
> "bluefs": "1",
> "ceph_fsid": "",
> "kv_backend": "rocksdb",
> "magic": "ceph osd volume v026",
> "mkfs_done": "yes",
> "osd_key": "",
> "osdspec_affinity": "",
> "ready": "ready",
> "require_osd_release": "16",
> "whoami": "36"
> }
> }
>
> I'm really interested in learning the answers to the questions in the
> original post.
>
> Thanks,
> Joel
>
> On Wed, Mar 6, 2024 at 12:11 PM Anthony D'Atri 
> wrote:
>
> >
> >
> > On Feb 28, 2024, at 17:55, Joel Davidow  wrote:
> >
> > Current situation
> > -
> > We have three Ceph clusters that were originally built via cephadm on
> > octopus and later upgraded to pacific. All osds are HDD (will be moving to
> > wal+db on SSD) and were resharded after the upgrade to enable rocksdb
> > sharding.
> >
> > The value for bluefs_shared_alloc_size has remained unchanged at 65536.
> >
> > The value for bluestore_min_alloc_size_hdd was 65536 in octopus but is
> > reported as 4096 by ceph daemon osd.<id> config show in pacific.
> >
> >
> > min_alloc_size is baked into a given OSD when it is created.  The central
> > config / runtime value does not affect behavior for existing OSDs.  The
> > only way to change it is to destroy / redeploy the OSD.
> >
> > There was a succession of PRs in the Octopus / Pacific timeframe around
> > default min_alloc_size for HDD and SSD device classes, including IIRC one
> > temporary reversion.
> >
> > However, the osd label after upgrading to pacific retains the value of
> > 65536 for bfm_bytes_per_block.
> >
> >
> > OSD label?
> >
> > I'm not sure if your Pacific release has the backport, but not that long
> > ago `ceph osd metadata` was amended to report the min_alloc_size that a
> > given OSD was built with.  If you don't have that, the OSD's startup log
> > should report it.
> >
> > -- aad
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io



-- 
Alexander E. Patrakov
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: bluestore_min_alloc_size and bluefs_shared_alloc_size

2024-03-11 Thread Joel Davidow
For newly added osds, bfm_bytes_per_block is 4096. However, for osds
that were added while the cluster was running octopus, bfm_bytes_per_block
remains 65536.

Based on
https://github.com/ceph/ceph/blob/1c349451176cc5b4ebfb24b22eaaa754e05cff6c/src/os/bluestore/BitmapFreelistManager.cc
and the space allocation section on page 360 of
https://pdl.cmu.edu/PDL-FTP/Storage/ceph-exp-sosp19.pdf, it appears
bfm_bytes_per_block is the bluestore_min_alloc_size that the osd was built
with.

Below is a sanitized example of what I was referring to as the osd label
(which includes bfm_bytes_per_block) that was run on an osd built under
octopus. The cluster was later upgraded to pacific.

user@osd-host:/# ceph-bluestore-tool show-label --path
/var/lib/ceph/osd/ceph-36
inferring bluefs devices from bluestore path
{
"/var/lib/ceph/osd/ceph-36/block": {
"osd_uuid": "",
"size": 4000783007744,
"btime": "2021-09-14T15:16:55.605860+",
"description": "main",
"bfm_blocks": "61047168",
"bfm_blocks_per_key": "128",
"bfm_bytes_per_block": "65536",
"bfm_size": "4000783007744",
"bluefs": "1",
"ceph_fsid": "",
"kv_backend": "rocksdb",
"magic": "ceph osd volume v026",
"mkfs_done": "yes",
"osd_key": "",
"osdspec_affinity": "",
"ready": "ready",
"require_osd_release": "16",
"whoami": "36"
}
}

I'm really interested in learning the answers to the questions in the
original post.

Thanks,
Joel

On Wed, Mar 6, 2024 at 12:11 PM Anthony D'Atri 
wrote:

>
>
> On Feb 28, 2024, at 17:55, Joel Davidow  wrote:
>
> Current situation
> -
> We have three Ceph clusters that were originally built via cephadm on
> octopus and later upgraded to pacific. All osds are HDD (will be moving to
> wal+db on SSD) and were resharded after the upgrade to enable rocksdb
> sharding.
>
> The value for bluefs_shared_alloc_size has remained unchanged at 65536.
>
> The value for bluestore_min_alloc_size_hdd was 65536 in octopus but is
> reported as 4096 by ceph daemon osd.<id> config show in pacific.
>
>
> min_alloc_size is baked into a given OSD when it is created.  The central
> config / runtime value does not affect behavior for existing OSDs.  The
> only way to change it is to destroy / redeploy the OSD.
>
> There was a succession of PRs in the Octopus / Pacific timeframe around
> default min_alloc_size for HDD and SSD device classes, including IIRC one
> temporary reversion.
>
> However, the osd label after upgrading to pacific retains the value of
> 65536 for bfm_bytes_per_block.
>
>
> OSD label?
>
> I'm not sure if your Pacific release has the backport, but not that long
> ago `ceph osd metadata` was amended to report the min_alloc_size that a
> given OSD was built with.  If you don't have that, the OSD's startup log
> should report it.
>
> -- aad
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] 18.2.2 dashboard really messed up.

2024-03-11 Thread Harry G Coin
Looking at ceph -s, all is well.  Looking at the dashboard, 85% of my 
capacity is 'warned', and 95% is 'in danger'.   There is no hint given 
as to the nature of the danger or reason for the warning.  Though 
apparently with merely 5% of my ceph world 'normal', the cluster reports 
'ok'.  Which, you know, seems contradictory.  I've used just under 40% 
of capacity.


Further down the dashboard, all the subsections of 'Cluster Utilization' 
are '1' and '0.5' with nothing whatever in the graphics area.


Previous versions of ceph presented a normal dashboard.

It's just a little half rack, 5 hosts, a few physical drives each, been 
running ceph for a couple years now.  Orchestrator is cephadm.  It's 
just about as 'plain vanilla' as it gets.  I've had to mute one alert, 
because cephadm refresh aborts when it finds drives on any host that 
have nothing to do with ceph and don't have a blkid_ip 'TYPE' key.  
That seems unrelated to a totally messed up dashboard.  (The tracker for that 
is here: https://tracker.ceph.com/issues/63502 ).


Any idea what the steps are to get useful stuff back on the dashboard?  
Any idea where I can learn what my 95% danger and 85% warning are 
'about'?  (You'd think 'danger' (the volcano is blowing up now!) would 
be worse than 'warning' (the volcano might blow up soon), so how can 
warning+danger > 100%, or if not additive, how can warning < danger?)


 Here's a bit of detail:

root@noc1:~# ceph -s
 cluster:
   id: 4067126d-01cb-40af-824a-881c130140f8
   health: HEALTH_OK
   (muted: CEPHADM_REFRESH_FAILED)

 services:
   mon: 5 daemons, quorum noc4,noc2,noc1,noc3,sysmon1 (age 70m)
   mgr: noc2.yhyuxd(active, since 82m), standbys: noc4.tvhgac, 
noc3.sybsfb, noc1.jtteqg

   mds: 1/1 daemons up, 3 standby
   osd: 27 osds: 27 up (since 20m), 27 in (since 2d)

 data:
   volumes: 1/1 healthy
   pools:   16 pools, 1809 pgs
   objects: 12.29M objects, 17 TiB
   usage:   44 TiB used, 67 TiB / 111 TiB avail
   pgs: 1793 active+clean
9    active+clean+scrubbing
7    active+clean+scrubbing+deep

 io:
   client:   5.6 MiB/s rd, 273 KiB/s wr, 41 op/s rd, 58 op/s wr

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: General best practice for stripe unit and count if I want to change object size

2024-03-11 Thread Ilya Dryomov
On Sat, Mar 9, 2024 at 4:42 AM Nathan Morrison  wrote:
>
> This was asked in reddit and was requested to post here:
>
> So in RBD, say I want to make an image that's got an object size of 1M
> instead of the default 4M (if it will be a VM say, and likely not have
> too many big files in it, just OS files mostly). I also know I don't
> wanna go too crazy and make 4k objects or the cluster will blow up with
> number of objects. What's a good rule of thumb on what to set the stripe
> unit and count in relation to object size?
>
> Also how can I see the stripe unit and count for an image, it seems "rbd
> info " doesn't show it, only object size (or order).
>
>
>
> Would this be sensible (assuming old img is default 4M obj size) or
> really stupid and why?
>
> rbd cp --object-size 1M --stripe-unit 1M --stripe-count 1 pool/old-img
> pool/new-img

Hi Nathan,

This (stripe-unit == object-size and stripe-count == 1) is the default,
so if you want to change the object size, passing just --object-size is
sufficient.
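
For example, reusing the names from your message, this shorter form should give the same result as the command above (a sketch, assuming pool/old-img exists):

rbd cp --object-size 1M pool/old-img pool/new-img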

Thanks,

Ilya
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Telemetry endpoint down?

2024-03-11 Thread Konstantin Shalygin
Hi Greg

Seems is up now, last report uploaded successfully

Thanks,
k

Sent from my iPhone

> On 11 Mar 2024, at 18:57, Gregory Farnum  wrote:
> 
> We had a lab outage Thursday and it looks like this service wasn’t
> restarted after that occurred. Fixed now and we’ll look at how to prevent
> that in future.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Telemetry endpoint down?

2024-03-11 Thread Gregory Farnum
We had a lab outage Thursday and it looks like this service wasn’t
restarted after that occurred. Fixed now and we’ll look at how to prevent
that in future.
-Greg

On Mon, Mar 11, 2024 at 6:46 AM Konstantin Shalygin  wrote:

> Hi, it seems the telemetry endpoint has been down for some days? We have connection
> errors from multiple places:
>
>
> 1:ERROR Mar 10 00:46:10.653 [564383]: opensock: Could not establish a
> connection to telemetry.ceph.com:443
> 2:ERROR Mar 10 01:48:20.061 [564383]: opensock: Could not establish a
> connection to telemetry.ceph.com:443
> 3:ERROR Mar 10 02:50:29.473 [564383]: opensock: Could not establish a
> connection to telemetry.ceph.com:443
> 4:ERROR Mar 10 03:52:38.877 [564383]: opensock: Could not establish a
> connection to telemetry.ceph.com:443
> 5:ERROR Mar 10 04:54:48.285 [564383]: opensock: Could not establish a
> connection to telemetry.ceph.com:443
> 6:ERROR Mar 10 05:56:57.693 [564383]: opensock: Could not establish a
> connection to telemetry.ceph.com:443
> 7:ERROR Mar 10 06:59:07.105 [564383]: opensock: Could not establish a
> connection to telemetry.ceph.com:443
> 8:ERROR Mar 10 08:01:16.509 [564383]: opensock: Could not establish a
> connection to telemetry.ceph.com:443
> 9:ERROR Mar 10 09:03:25.917 [564383]: opensock: Could not establish a
> connection to telemetry.ceph.com:443 
>
>
> Thanks,
> k
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: AMQPS support in Nautilus

2024-03-11 Thread Yuval Lifshitz
Hi Manuel,
I looked into the Nautilus documentation [1] and could not find anything about
amqps there.

Yuval

[1] https://docs.ceph.com/en/nautilus/radosgw/notifications/#create-a-topic

On Mon, Mar 11, 2024 at 12:50 AM Manuel Negron  wrote:

> Hello, I've been trying to set up bucket notifications using amqps (according
> to the documentation this is possible). However, when checking the logs, the push
> fails saying "invalid schema". Any ideas what the issue could be?
>
>
> kind regards
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] v18.2.2 Reef (hot-fix) released

2024-03-11 Thread Yuri Weinstein
We're happy to announce the 2nd hotfix release in the Reef series.
We recommend that all users update to this release.
For detailed release notes with links & changelog please refer to the
official blog entry at
https://ceph.io/en/news/blog/2024/v18-2-2-reef-released/

Notable Changes
---
* mgr/Prometheus: refine the orchestrator availability check to prevent
crashes in the prometheus module during startup. Introduce additional
checks to handle daemon_ids generated within the Rook environment, thus
preventing potential issues during RGW metrics metadata generation.

Related tracker: https://tracker.ceph.com/issues/64721

Getting Ceph

* Git at git://github.com/ceph/ceph.git
* Tarball at https://download.ceph.com/tarballs/ceph_18.2.2.orig.tar.gz
* Containers at https://quay.io/repository/ceph/ceph
* For packages, see https://docs.ceph.com/en/latest/install/get-packages/
* Release git sha1: 531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MANY_OBJECT_PER_PG on 1 pool which is cephfs_metadata

2024-03-11 Thread Eugen Block

Hi,

I assume you're still on a "low" pacific release? This was fixed by PR  
[1][2], which suppresses the warning when the autoscaler is on; it was  
merged into Pacific 16.2.8 [3].


I can't answer why the autoscaler doesn't increase the pg_num, but yes,  
you can increase it yourself. The cephfs_metadata pool should  
be on fast storage and doesn't hold huge amounts of data, so it should  
be relatively quick. What's your 'ceph osd df' output?
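
For example, something like this (just a sketch; 128 is only an illustrative target, pick a power of two that suits your OSD count):

ceph osd pool set cephfs_metadata pg_num 128
# or, since the autoscaler is on, raise the minimum it has to respect instead:
ceph osd pool set cephfs_metadata pg_num_min 128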


Regards,
Eugen

[1] https://tracker.ceph.com/issues/53644
[2] https://github.com/ceph/ceph/pull/45152
[3] https://docs.ceph.com/en/latest/releases/pacific/#v16-2-8-pacific

Zitat von Edouard FAZENDA :


Hello Ceph community,



Since this morning I have had a MANY_OBJECTS_PER_PG warning on one pool,
cephfs_metadata:



# ceph health detail

HEALTH_WARN 1 pools have many more objects per pg than average

[WRN] MANY_OBJECTS_PER_PG: 1 pools have many more objects per pg than
average

pool cephfs_metadata objects per pg (154151) is more than 10.0215 times
cluster average (15382)



I have autoscaling on for all the pools:



# ceph osd pool autoscale-status

POOL                         SIZE    TARGET SIZE  RATE  RAW CAPACITY  RATIO   TARGET RATIO  EFFECTIVE RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE
device_health_metrics        9523k                3.0   26827G        0.                                     1.0        1              on
cephfs_data                  5389G                2.0   26827G        0.4018                                 1.0      512              on
cephfs_metadata              19365M               2.0   26827G        0.0014                                 4.0       16              on
.rgw.root                    1323                 3.0   26827G        0.                                     1.0       32              on
default.rgw.log              23552                3.0   26827G        0.                                     1.0       32              on
default.rgw.control          0                    3.0   26827G        0.                                     1.0       32              on
default.rgw.meta             11911                3.0   26827G        0.                                     4.0        8              on
default.rgw.buckets.index    0                    3.0   26827G        0.                                     4.0        8              on
default.rgw.buckets.data     497.0G               3.0   26827G        0.0556                                 1.0       32              on
kubernetes                   177.2G               2.0   26827G        0.0132                                 1.0       32              on
default.rgw.buckets.non-ec   432                  3.0   26827G        0.                                     1.0       32              on



Currently the pg_num is 16 for the cephfs_metadata pool, but the autoscaler does not
set a NEW PG_NUM for it.



Here is the replicated size of all my pools:



# ceph osd dump | grep  'replicated size'

pool 1 'device_health_metrics' replicated size 3 min_size 2 crush_rule 0
object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 189372
flags hashpspool stripe_width 0 pg_num_min 1 application mgr_devicehealth

pool 10 'cephfs_data' replicated size 2 min_size 1 crush_rule 1 object_hash
rjenkins pg_num 512 pgp_num 512 autoscale_mode on last_change 189346 lfor
0/0/183690 flags hashpspool,selfmanaged_snaps stripe_width 0 application
cephfs

pool 11 'cephfs_metadata' replicated size 2 min_size 1 crush_rule 1
object_hash rjenkins pg_num 16 pgp_num 16 autoscale_mode on last_change
187861 lfor 0/187861/187859 flags hashpspool stripe_width 0
pg_autoscale_bias 4 pg_num_min 16 recovery_priority 5 application cephfs

pool 18 '.rgw.root' replicated size 3 min_size 2 crush_rule 0 object_hash
rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 5265 flags
hashpspool stripe_width 0 application rgw

pool 19 'default.rgw.log' replicated size 3 min_size 2 crush_rule 0
object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 5267
flags hashpspool stripe_width 0 application rgw

pool 20 'default.rgw.control' replicated size 3 min_size 2 crush_rule 0
object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 5269
flags hashpspool stripe_width 0 application rgw

pool 21 'default.rgw.meta' replicated size 3 min_size 2 crush_rule 0
object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode on last_change 5398
lfor 0/5398/5396 flags hashpspool stripe_width 0 pg_autoscale_bias 4
pg_num_min 8 application rgw

pool 22 'default.rgw.buckets.index' replicated size 3 min_size 2 crush_rule
0 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode on last_change 7491
lfor 0/7491/7489 flags hashpspool stripe_width 0 pg_autoscale_bias 4
pg_num_min 8 application rgw

pool 23 'default.rgw.buckets.data' replicated size 3 min_size 2 crush_rule 0
object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 7500
flags hashpspool stripe_width 0 application rgw

pool 24 'kubernetes' replicated size 2 min_size 1 crush_rule 1 object_hash
rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 189363 lfor
0/0/7560 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd

pool 25 'default.rgw.buckets.non-ec' replicated size 3 min_size 2 crush_rule
0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change
23983 

[ceph-users] Telemetry endpoint down?

2024-03-11 Thread Konstantin Shalygin
Hi, it seems the telemetry endpoint has been down for some days? We have connection errors 
from multiple places:


1:ERROR Mar 10 00:46:10.653 [564383]: opensock: Could not establish a 
connection to telemetry.ceph.com:443
2:ERROR Mar 10 01:48:20.061 [564383]: opensock: Could not establish a 
connection to telemetry.ceph.com:443
3:ERROR Mar 10 02:50:29.473 [564383]: opensock: Could not establish a 
connection to telemetry.ceph.com:443
4:ERROR Mar 10 03:52:38.877 [564383]: opensock: Could not establish a 
connection to telemetry.ceph.com:443
5:ERROR Mar 10 04:54:48.285 [564383]: opensock: Could not establish a 
connection to telemetry.ceph.com:443
6:ERROR Mar 10 05:56:57.693 [564383]: opensock: Could not establish a 
connection to telemetry.ceph.com:443
7:ERROR Mar 10 06:59:07.105 [564383]: opensock: Could not establish a 
connection to telemetry.ceph.com:443
8:ERROR Mar 10 08:01:16.509 [564383]: opensock: Could not establish a 
connection to telemetry.ceph.com:443
9:ERROR Mar 10 09:03:25.917 [564383]: opensock: Could not establish a 
connection to telemetry.ceph.com:443 


Thanks,
k
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Dashboard building issue "RuntimeError: memory access out of bounds"?

2024-03-11 Thread 张东川
Hi there,


I was building ceph with the tag "v19.0.0" on a Milk-V Pioneer board (RISC-V 
arch, OS is fedora-riscv 6.1.55).
I ran "./do_cmake.sh -DCMAKE_BUILD_TYPE=RelWithDebInfo -DWITH_SPDK=ON", then 
went to the "build" folder and ran the "ninja" command.
But it failed with the following dashboard error:


[1/2] cd /mnt/ssd/projects/test/ceph/build/src/pybind/mgr/dashboard/frontend
&& . /mnt/ssd/projects/test/ceph/build/src/pybind/mgr/dashboard/frontend/node-env/bin/activate
&& npm config set cache /mnt/ssd/projects/test/ceph/build/src/pybind/mgr/dashboard/frontend/node-env/.npm
--userconfig /mnt/ssd/projects/test/ceph/build/src/pybind/mgr/dashboard/frontend/node-env/.npmrc
&& deactivate
[2/2] dashboard frontend is being created
FAILED: src/pybind/mgr/dashboard/frontend/dist 
/mnt/ssd/projects/test/ceph/build/src/pybind/mgr/dashboard/frontend/dist
cd /mnt/ssd/projects/test/ceph/src/pybind/mgr/dashboard/frontend
&& . /mnt/ssd/projects/test/ceph/build/src/pybind/mgr/dashboard/frontend/node-env/bin/activate
&& DASHBOARD_FRONTEND_LANGS="" npm run build:localize -- --output-path 
/mnt/ssd/projects/test/ceph/build/src/pybind/mgr/dashboard/frontend/dist 
--configuration=production --progress=false && deactivate


> ceph-dashboard@0.0.0 build:localize
> node cd --env --pre && ng build --localize --output-path 
/mnt/ssd/projects/test/ceph/build/src/pybind/mgr/dashboard/frontend/dist 
--configuration=production --progress=false


[cd.js] './angular.backup.json' already exists, restoring it into 
'./angular.json'}
[cd.js] Preparing build of EN.
[cd.js] 'src/environments/environment.tpl.ts' was copied to 
'src/environments/environment.prod.ts'
[cd.js] 'src/environments/environment.tpl.ts' was copied to 
'src/environments/environment.ts'
[cd.js] Writing to ./angular.json
[cd.js] Placeholders were replace in 'src/environments/environment.prod.ts'
[cd.js] Placeholders were replace in 'src/environments/environment.ts'
  TypeScript compiler options "target" and 
"useDefineForClassFields" are set to "ES2022" and "false" respectively by the 
Angular CLI. To control ECMA version and features use the Browerslist 
configuration. For more information, see 
https://angular.io/guide/build#configuring-browser-compatibility
  NOTE: You can set the "target" to "ES2022" in the project's 
tsconfig to remove this warning.
wasm://wasm/7f04b08a:1




RuntimeError: memory access out of bounds
  at wasm://wasm/7f04b08a:wasm-function[2]:0x322
  at WasmHash.digest 
(/mnt/ssd/projects/test/ceph/src/pybind/mgr/dashboard/frontend/node_modules/webpack/lib/util/hash/wasm-hash.js:138:11)
  at BatchedHash.digest 
(/mnt/ssd/projects/test/ceph/src/pybind/mgr/dashboard/frontend/node_modules/webpack/lib/util/hash/BatchedHash.js:64:20)
  at 
/mnt/ssd/projects/test/ceph/src/pybind/mgr/dashboard/frontend/node_modules/webpack/lib/DefinePlugin.js:595:38
  at _next42 (eval at create 
(/mnt/ssd/projects/test/ceph/src/pybind/mgr/dashboard/frontend/node_modules/tapable/lib/HookCodeFactory.js:19:10),
 

[ceph-users] Re: PG damaged "failed_repair"

2024-03-11 Thread Eugen Block

Hi,

your ceph version seems to be 17.2.4, not 17.2.6 (which is the locally  
installed ceph version on the system where you ran the command). Could  
you add the 'ceph versions' output as well?


How is the load on the systems when the recovery starts? The OSDs  
crash after around 20 minutes, not immediately. That's why I assume  
it's some sort of resource bottleneck.


---snip---
Mar 08 23:46:05 beta-cen bash[922752]: debug  
2024-03-09T04:46:05.198+ 7f0a4bb5d700  0 log_channel(cluster) log  
[INF] : 2.1b continuing backfill to osd.6 from  
(9971'17067184,10014'17073660]  
2:dc1332a8:::rbd_data.99d921d8edc910.0198:head to  
10014'17073660
Mar 08 23:46:05 beta-cen bash[922752]: debug  
2024-03-09T04:46:05.198+ 7f0a4ab5b700  0 log_channel(cluster) log  
[INF] : 2.1d continuing backfill to osd.6 from  
(9972'32706188,10014'32712589]  
2:bc276b0b:::rbd_data.307ae0ca08e035.00019bd6:head to  
10014'32712589
Mar 08 23:46:05 beta-cen bash[922752]:  
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.2.4/rpm/el8/BUILD/ceph-17.2.4/src/osd/osd_types.cc: In function 'uint64_t SnapSet::get_clone_bytes(snapid_t) const' thread 7f0a4bb5d700 time  
2024-03-09T04:46:05.331039+
Mar 08 23:46:05 beta-cen bash[922752]:  
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.2.4/rpm/el8/BUILD/ceph-17.2.4/src/osd/osd_types.cc: 5888: FAILED  
ceph_assert(clone_overlap.count(clone))
Mar 08 23:46:05 beta-cen bash[922752]:  ceph version 17.2.4  
(1353ed37dec8d74973edc3d5d5908c20ad5a7332) quincy (stable)
Mar 08 23:46:05 beta-cen bash[922752]:  1:  
(ceph::__ceph_assert_fail(char const*, char const*, int, char  
const*)+0x135) [0x55c62ba0d631]
Mar 08 23:46:05 beta-cen bash[922752]:  2:  
/usr/bin/ceph-osd(+0x5977f7) [0x55c62ba0d7f7]
Mar 08 23:46:05 beta-cen bash[922752]:  3:  
(SnapSet::get_clone_bytes(snapid_t) const+0xe8) [0x55c62bdc1228]
Mar 08 23:46:05 beta-cen bash[922752]:  4:  
(PrimaryLogPG::add_object_context_to_pg_stat(std::shared_ptr,  
pg_stat_t*)+0x25e) [0x55c62bc4ff4e]
Mar 08 23:46:05 beta-cen bash[922752]:  5:  
(PrimaryLogPG::recover_backfill(unsigned long, ThreadPool::TPHandle&,  
bool*)+0x1281) [0x55c62bcbaeb1]
Mar 08 23:46:05 beta-cen bash[922752]:  6:  
(PrimaryLogPG::start_recovery_ops(unsigned long,  
ThreadPool::TPHandle&, unsigned long*)+0xe34) [0x55c62bcc0414]
Mar 08 23:46:05 beta-cen bash[922752]:  7: (OSD::do_recovery(PG*,  
unsigned int, unsigned long, ThreadPool::TPHandle&)+0x272)  
[0x55c62bb20852]
Mar 08 23:46:05 beta-cen bash[922752]:  8:  
(ceph::osd::scheduler::PGRecovery::run(OSD*, OSDShard*,  
boost::intrusive_ptr&, ThreadPool::TPHandle&)+0x1d) [0x55c62be0cdcd]
Mar 08 23:46:05 beta-cen bash[922752]:  9:  
(OSD::ShardedOpWQ::_process(unsigned int,  
ceph::heartbeat_handle_d*)+0x115f) [0x55c62bb21dbf]
Mar 08 23:46:05 beta-cen bash[922752]:  10:  
(ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x435)  
[0x55c62c27f8c5]
Mar 08 23:46:05 beta-cen bash[922752]:  11:  
(ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x55c62c281fe4]
Mar 08 23:46:05 beta-cen bash[922752]:  12:  
/lib64/libpthread.so.0(+0x81ca) [0x7f0a6bf991ca]

Mar 08 23:46:05 beta-cen bash[922752]:  13: clone()
---snip---


Zitat von Romain Lebbadi-Breteau :


Hi,

Sorry for the bad formatting. Here are the outputs again.

ceph osd df :

ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA      OMAP     META     AVAIL    %USE   VAR   PGS  STATUS
 3    hdd  1.81879         0      0 B      0 B       0 B      0 B      0 B      0 B      0     0     0  down
12    hdd  1.81879   1.0   1.8 TiB  385 GiB   383 GiB  6.7 MiB  1.4 GiB  1.4 TiB  20.66  1.73   18  up
13    hdd  1.81879   1.0   1.8 TiB  422 GiB   421 GiB  5.8 MiB  1.3 GiB  1.4 TiB  22.67  1.90   17  up
15    hdd  1.81879   1.0   1.8 TiB  264 GiB   263 GiB  4.6 MiB  1.1 GiB  1.6 TiB  14.17  1.19   14  up
16    hdd  9.09520   1.0   9.1 TiB  1.0 TiB  1023 GiB  8.8 MiB  2.6 GiB  8.1 TiB  11.01  0.92   65  up
17    hdd  1.81879   1.0   1.8 TiB  319 GiB   318 GiB  6.1 MiB  1.0 GiB  1.5 TiB  17.13  1.43   15  up
 1    hdd  5.45749   1.0   5.5 TiB  546 GiB   544 GiB  7.8 MiB  1.4 GiB  4.9 TiB   9.76  0.82   29  up
 4    hdd  5.45749   1.0   5.5 TiB  801 GiB   799 GiB  8.3 MiB  2.4 GiB  4.7 TiB  14.34  1.20   44  up
 8    hdd  5.45749   1.0   5.5 TiB  708 GiB   706 GiB  9.7 MiB  2.1 GiB  4.8 TiB  12.67  1.06   36  up
11    hdd  5.45749         0      0 B      0 B       0 B      0 B      0 B      0 B      0     0     0  down
14    hdd  1.81879   1.0   1.8 TiB  200 GiB   198 GiB  3.8 MiB  1.3 GiB  1.6 TiB  10.71  0.90   10  up
 0    hdd  9.09520         0      0 B      0 B       0 B      0 B      0 B      0 B      0     0     0  down
 5    hdd  9.09520   1.0  9.1 TiB  859