[ceph-users] RGW DNS bucket names with multi-tenancy

2019-11-01 Thread Florian Engelmann

 Hi,

is it possible to access buckets like:

https://../?

Some SDKs use DNS-style bucket names only. With such an SDK the endpoint 
would look like ".".
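For context, a sketch of what works today versus what such an SDK expects (rgw.example.com, tenant1 and mybucket are made-up names; the unsigned request only illustrates the URL form):

# path-style access with an explicit tenant (tenant:bucket syntax from the radosgw docs):
curl "https://rgw.example.com/tenant1:mybucket/"

# what a DNS-style-only SDK would try to resolve instead (hypothetical):
#   https://mybucket.rgw.example.com/   <- no obvious place for the tenant here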


All the best,
Florian




Re: [ceph-users] is rgw crypt default encryption key long term supported ?

2019-06-06 Thread Florian Engelmann

On 5/28/19 at 5:37 PM, Casey Bodley wrote:


On 5/28/19 11:17 AM, Scheurer François wrote:

Hi Casey


I greatly appreciate your quick and helpful answer :-)


It's unlikely that we'll do that, but if we do it would be announced 
with a long deprecation period and migration strategy.

Fine, just the answer we wanted to hear ;-)



However, I would still caution against using either as a strategy for
key management, especially when (as of mimic) the ceph configuration is
centralized in the ceph-mon database [1][2]. If there are gaps in our
sse-kms integration that makes it difficult to use in practice, I'd
really like to address those.

sse-kms is working great, no issues or gaps with it.
We already use it in our openstack (rocky) with barbican and 
ceph/radosgw (luminous).


But we have customers that want encryption by default, something like 
SSE-S3 (cf. below).

Do you know if there are plans to implement something similar?
I would love to see support for sse-s3. We've talked about building 
something around vault (which I think is what minio does?), but so far 
nobody has taken it up as a project.


What about accepting an empty HTTP header "x-amz-server-side-encryption" or 
"x-amz-server-side-encryption: AES256" if


rgw crypt default encryption key =

is enabled? Even if this RadosGW "default encryption key" feature is not 
implemented the same way SSE-S3 is, the data is still encrypted with 
AES256. This would improve compatibility with the S3 API and client 
tools like s3cmd and awscli.
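Concretely, the wish is that a request like the following is accepted by a gateway running with the default encryption key (a sketch; bucket and file names are made up, the key is the diagnostic one quoted from the docs below, and the rgw section name is just an example):

# client side: request SSE-S3 explicitly
aws s3 cp ./backup.tar s3://mybucket/backup.tar --sse AES256

# gateway side (ceph.conf of the radosgw):
# [client.rgw.gateway]
# rgw crypt default encryption key = 4YSmvJtBv0aZ7geVgAsdpRnLBEwWSWlMIGnRS8a9TSA=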





Using dm-crypt would cost too much time for the conversion (72x 8TB 
SATA disks...).
And dm-crypt is also storing its key on the monitors (cf. 
https://www.spinics.net/lists/ceph-users/msg52402.html).



Best Regards
Francois Scheurer

Amazon SSE-S3 description:

https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingServerSideEncryption.html 

Protecting Data Using Server-Side Encryption with Amazon S3-Managed 
Encryption Keys (SSE-S3)
Server-side encryption protects data at rest. Amazon S3 encrypts each 
object with a unique key. As an additional safeguard, it encrypts the 
key itself with a master key that it rotates regularly. Amazon S3 
server-side encryption uses one of the strongest block ciphers 
available, 256-bit Advanced Encryption Standard (AES-256), to encrypt 
your data.


https://docs.aws.amazon.com/AmazonS3/latest/API/RESTBucketPUTencryption.html 


The following is an example of the request body for setting SSE-S3.
<ServerSideEncryptionConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
   <Rule>
      <ApplyServerSideEncryptionByDefault>
         <SSEAlgorithm>AES256</SSEAlgorithm>
      </ApplyServerSideEncryptionByDefault>
   </Rule>
</ServerSideEncryptionConfiguration>

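For reference, the same default-encryption rule is set via the AWS CLI like this (bucket name made up; against radosgw this would only take effect once SSE-S3 support exists):

aws s3api put-bucket-encryption --bucket mybucket \
  --server-side-encryption-configuration \
  '{"Rules": [{"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}]}'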

From: Casey Bodley 
Sent: Tuesday, May 28, 2019 3:55 PM
To: Scheurer François; ceph-users@lists.ceph.com
Subject: Re: is rgw crypt default encryption key long term supported ?

Hi François,


Removing support for either of rgw_crypt_default_encryption_key or
rgw_crypt_s3_kms_encryption_keys would mean that objects encrypted with
those keys would no longer be accessible. It's unlikely that we'll do
that, but if we do it would be announced with a long deprecation period
and migration strategy.


However, I would still caution against using either as a strategy for
key management, especially when (as of mimic) the ceph configuration is
centralized in the ceph-mon database [1][2]. If there are gaps in our
sse-kms integration that makes it difficult to use in practice, I'd
really like to address those.


Casey


[1]
https://ceph.com/community/new-mimic-centralized-configuration-management/ 



[2]
http://docs.ceph.com/docs/mimic/rados/configuration/ceph-conf/#monitor-configuration-database 




On 5/28/19 6:39 AM, Scheurer François wrote:

Dear Casey, Dear Ceph Users,

The following is written in the radosgw documentation
(http://docs.ceph.com/docs/luminous/radosgw/encryption/):

rgw crypt default encryption key = 4YSmvJtBv0aZ7geVgAsdpRnLBEwWSWlMIGnRS8a9TSA=

    Important: This mode is for diagnostic purposes only! The ceph
    configuration file is not a secure method for storing encryption keys.
    Keys that are accidentally exposed in this way should be considered
    compromised.




Is the warning only about the key exposure risk, or does it also mean 
that the feature could be removed in the future?


There is also another similar parameter, "rgw crypt s3 kms encryption
keys" (cf. usage example in
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-October/030679.html).

 




Both parameters are still interesting (provided the ceph.conf is
encrypted), but we want to be sure that they will not be dropped in the 
future.





Best Regards

Francois



[ceph-users] sync rados objects to other cluster

2019-05-02 Thread Florian Engelmann

Hi,

we need to migrate a ceph pool used for gnocchi to another cluster in 
another datacenter. Gnocchi uses the python rados or cradox module to 
access the Ceph cluster. The pool is dedicated to gnocchi only. The 
source pool is based on HDD OSDs while the target pool is SSD only. As 
there are > 600.000 small objects (total = 12GB) in the pool, a

rados export ... - | ssh ... rados import -

takes too long (more than 2 hours), so we would lose 2 hours of billing 
data.


We will now try to add SSDs to the source cluster and modify the crush 
map to speed up the migration.


Are there any alternative options? Any "rsync" style RADOS tool?
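For the record, a plain per-object copy loop would look roughly like this - a sketch only, assuming both clusters are reachable from one host via two hypothetical config files and that the pool is called gnocchi:

rados -c /etc/ceph/source.conf -p gnocchi ls > objects.txt
while read -r obj; do
    rados -c /etc/ceph/source.conf -p gnocchi get "$obj" /tmp/obj.tmp
    rados -c /etc/ceph/target.conf -p gnocchi put "$obj" /tmp/obj.tmp
done < objects.txt
# note: plain get/put copies object data only, not omap data or xattrs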

All the best,
Flo




Re: [ceph-users] rbd cache limiting IOPS

2019-03-07 Thread Florian Engelmann

I was able to check the settings in use with a ceph.conf like:

[client.nova]
admin socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok
log file = /var/log/ceph/client.log
debug rbd = 20
debug librbd = 20
rbd_cache = true
rbd cache size = 268435456
rbd cache max dirty = 201326592
rbd cache target dirty = 134217728

and then ask the socket:

ceph --admin-daemon 
/var/run/ceph/ceph-client.nova.17276.94854801343568.asok config get 
rbd_cache_size

{
"rbd_cache_size": "268435456"
}


So the settings are recognized and used by qemu. But any cache size 
higher than the default (32MB) leads to strange IOPS results. IOPS are 
very constant with 32MB (~20.000 - 23.000), but if we define a bigger 
cache size (we tested from 64MB up to 256MB) the IOPS become very 
erratic (from 0 IOPS up to 23.000).


Setting "rbd cache max dirty" to 0 changes the behaviour to 
write-through, as far as I understood. I expected the latency to increase 
to at least 0.6 ms, which was the case, but I also expected the IOPS to 
increase to up to 60.000, which was not the case. IOPS stayed constant at 
~14.000 (4 jobs, QD=64).
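For reference, the fio job inside the guest looks roughly like this (a sketch from memory; block size, test size and runtime are assumptions - only numjobs=4 and iodepth=64 are the values mentioned above):

fio --name=randwrite --ioengine=libaio --direct=1 --rw=randwrite \
    --bs=4k --numjobs=4 --iodepth=64 --size=10G --runtime=60 \
    --time_based --group_reporting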




On 3/7/19 at 11:41 AM, Florian Engelmann wrote:

Hi,

we are running an Openstack environment with Ceph block storage. There 
are six nodes in the current Ceph cluster (12.2.10) with NVMe SSDs and a 
P4800X Optane for rocksdb and WAL.
The decision was made to use rbd writeback cache with KVM/QEMU. The 
write latency is incredibly good (~85 µs) and the read latency is still 
good (~0.6ms). But we are limited to ~23.000 IOPS in a KVM machine. So 
we did the same FIO benchmark after we disabled the rbd cache and got 
65.000 IOPS, but of course the write latency (QD1) increased to ~0.6ms.

We tried to tune:

rbd cache size -> 256MB
rbd cache max dirty -> 192MB
rbd cache target dirty -> 128MB

but we are still locked at ~23.000 IOPS with the writeback cache enabled.

Right now we are not sure if the tuned settings have been honoured by 
libvirt.


Which options do we have to increase IOPS while writeback cache is used?

All the best,
Florian




--

EveryWare AG
Florian Engelmann
Senior UNIX Systems Engineer
Zurlindenstrasse 52a
CH-8003 Zürich

tel: +41 44 466 60 00
fax: +41 44 466 60 10
mail: mailto:florian.engelm...@everyware.ch
web: http://www.everyware.ch




[ceph-users] rbd cache limiting IOPS

2019-03-07 Thread Florian Engelmann

Hi,

we are running an Openstack environment with Ceph block storage. There 
are six nodes in the current Ceph cluster (12.2.10) with NVMe SSDs and a 
P4800X Optane for rocksdb and WAL.
The decision was made to use rbd writeback cache with KVM/QEMU. The 
write latency is incredibly good (~85 µs) and the read latency is still 
good (~0.6ms). But we are limited to ~23.000 IOPS in a KVM machine. So 
we did the same FIO benchmark after we disabled the rbd cache and got 
65.000 IOPS, but of course the write latency (QD1) increased to ~0.6ms.

We tried to tune:

rbd cache size -> 256MB
rbd cache max dirty -> 192MB
rbd cache target dirty -> 128MB

but we are still locked at ~23.000 IOPS with the writeback cache enabled.

Right now we are not sure if the tuned settings have been honoured by 
libvirt.


Which options do we have to increase IOPS while writeback cache is used?

All the best,
Florian




[ceph-users] Openstack RBD EC pool

2019-02-15 Thread Florian Engelmann

Hi,

I tried to add an "archive" storage class to our Openstack environment by 
introducing a second storage backend offering RBD volumes with their 
data in an erasure-coded pool. As I have to specify a data-pool, I 
tried it as follows:



### keyring files:
ceph.client.cinder.keyring
ceph.client.cinder-ec.keyring

### ceph.conf
[global]
fsid = b5e30221-a214-353c-b66b-8c37b4349123
mon host = ceph-mon.service.i.ewcs.ch
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
###


## ceph.ec.conf
[global]
fsid = b5e30221-a214-353c-b66b-8c37b4349123
mon host = ceph-mon.service.i..
auth cluster required = cephx
auth service required = cephx
auth client required = cephx

[client.cinder-ec]
rbd default data pool = ewos1-prod_cinder_ec
#

# cinder-volume.conf
...
[ceph1-rp3-1]
volume_backend_name = ceph1-rp3-1
volume_driver = cinder.volume.drivers.rbd.RBDDriver
rbd_ceph_conf = /etc/ceph/ceph.conf
rbd_user = cinder
rbd_secret_uuid = xxxcc8b-xx-ae16xx
rbd_pool = cinder
rbd_flatten_volume_from_snapshot = false
rbd_max_clone_depth = 5
rbd_store_chunk_size = 4
rados_connect_timeout = -1
report_discard_supported = true
rbd_exclusive_cinder_pool = true
enable_deferred_deletion = true
deferred_deletion_delay = 259200
deferred_deletion_purge_interval = 3600

[ceph1-ec-1]
volume_backend_name = ceph1-ec-1
volume_driver = cinder.volume.drivers.rbd.RBDDriver
rbd_ceph_conf = /etc/ceph/ceph.ec.conf
rbd_user = cinder-ec
rbd_secret_uuid = xxcc8b-xx-ae16xx
rbd_pool = cinder_ec_metadata
rbd_flatten_volume_from_snapshot = false
rbd_max_clone_depth = 3
rbd_store_chunk_size = 4
rados_connect_timeout = -1
report_discard_supported = true
rbd_exclusive_cinder_pool = true
enable_deferred_deletion = true
deferred_deletion_delay = 259200
deferred_deletion_purge_interval = 3600
##


I created three pools (for cinder) like:
ceph osd pool create cinder 512 512 replicated rack_replicated_rule
ceph osd pool create cinder_ec_metadata 6 6 replicated rack_replicated_rule
ceph osd pool create cinder_ec 512 512 erasure ec32
ceph osd pool set cinder_ec allow_ec_overwrites true
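(For completeness, the ec32 profile referenced above was created beforehand with something along these lines - a sketch; the failure domain is an assumption based on the rack_replicated_rule used for the replicated pools:)

ceph osd erasure-code-profile set ec32 k=3 m=2 crush-failure-domain=rack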


I am able to use backend ceph1-rp3-1 without any errors (create, attach, 
delete, snapshot). I am also able to create volumes via:


openstack volume create --size 100 --type ec1 myvolume_ec

but I am not able to attach it to any instance. I get errors like:

==> libvirtd.log <==
2019-02-15 22:23:01.771+: 27895: error : 
qemuMonitorJSONCheckError:392 : internal error: unable to execute QEMU 
command 'device_add': Property 'scsi-hd.drive' can't find value 
'drive-scsi0-0-0-3'


My instance has three disks (root, swap and one replicated cinder volume) 
and looks like:



[libvirt domain XML for instance-254e (uuid 6d41c54b-753a-46c7-a573-bedf8822fbf5), 
mangled by the list archive. The relevant parts are the three rbd disks: the root 
disk nova/6d41c54b-753a-46c7-a573-bedf8822fbf5_disk, the swap disk 
nova/6d41c54b-753a-46c7-a573-bedf8822fbf5_disk.swap, and the attached cinder volume 
cinder/volume-01e8cb68-1f86-4142-958c-fdd1c301833a (with iotune values 125829120 
and 1000), all running on emulator /usr/bin/qemu-system-x86_64.]


Any ideas?

All the best,
Florian




[ceph-users] Optane still valid

2019-02-04 Thread Florian Engelmann

Hi,

we have built a 6 node NVMe-only Ceph cluster with 4x Intel DC P4510 8TB 
and one Intel DC P4800X 375GB Optane per node. Up to 10x P4510 can be 
installed in each node.
WAL and RocksDBs for all P4510 should be stored on the Optane (approx. 
30GB per RocksDB incl. WAL).
Internally, discussions arose as to whether the Optane would become a 
bottleneck once a certain number of P4510 per node is reached.
For us, the lowest possible latency is very important, which is why the 
Optane NVMes were bought. In view of the good performance of the P4510, 
the question arises whether the Optanes still have a noticeable effect 
or whether they are actually just SPOFs?
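For context, such a layout would be created with ceph-volume roughly like this (a sketch only; device, VG and LV names are made up, the ~30GB DB size is the one mentioned above):

lvcreate -n db-osd0 -L 30g optane
ceph-volume lvm create --bluestore --data /dev/nvme1n1 --block.db optane/db-osd0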



All the best,
Florian




[ceph-users] HDD spindown problem

2018-12-03 Thread Florian Engelmann

Hello,

we have been fighting an HDD spin-down problem on our production ceph 
cluster for two weeks now. The problem is not ceph related, but I guess 
this topic is interesting to the list and, to be honest, I hope to find 
a solution here.


We do use 6 OSD Nodes like:
OS: Suse 12 SP3
Ceph: SES 5.5 (12.2.8)
Server: Supermicro 6048R-E1CR36L
Controller: LSI 3008 (LSI3008-IT)
Disk: 12x Seagate ST8000NM0055-1RM112 8TB (SN05 firmware, some still 
SN02 and SN04)
NVMe: 1x Intel DC P3700 800GB (used for an 80GB RocksDB and 2GB WAL per 
OSD; only 7 disks are online right now - up to 9 disks will have 
their RocksDB/WAL on one NVMe SSD)



Problem:
This Ceph cluster is used for object storage (RadosGW) only, mostly for 
backups to S3 (RadosGW). There is not that much activity - 
mostly at night time. We do not want any HDD to spin down, but they do.
We tried to disable the spin-down timers using sdparm and also with 
the Seagate tool SeaChest, but "something" keeps re-enabling them:



Disable standby on all HDD:
for i in sd{c..n}; do 
/root/SeaChestUtilities/Linux/Lin64/SeaChest_PowerControl_191_1183_64 -d 
/dev/$i --onlySeagate --changePower --disableMode --powerMode standby ; 
done
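(For reference, the earlier sdparm attempt was along these lines - a sketch from memory, clearing the STANDBY field in the power condition mode page:)

for i in sd{c..n}; do sdparm --clear STANDBY --save /dev/$i; done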



Monitor standby timer status:

while true; do for i in sd{c..n}; do echo  "$(date) $i 
$(/root/SeaChestUtilities/Linux/Lin64/SeaChest_PowerControl_191_1183_64 
-d /dev/$i --onlySeagate --showEPCSettings -v0 | grep Stand)";  done; 
sleep 1 ; done


This will show:
Mon Dec  3 10:42:54 CET 2018 sdc Standby Z   0 9000 65535120   Y Y
Mon Dec  3 10:42:54 CET 2018 sdd Standby Z   0 9000 65535120   Y Y
Mon Dec  3 10:42:54 CET 2018 sde Standby Z   0 9000 65535120   Y Y
Mon Dec  3 10:42:54 CET 2018 sdf Standby Z   0 9000 65535120   Y Y
Mon Dec  3 10:42:54 CET 2018 sdg Standby Z   0 9000 65535120   Y Y
Mon Dec  3 10:42:54 CET 2018 sdh Standby Z   0 9000 65535120   Y Y
Mon Dec  3 10:42:54 CET 2018 sdi Standby Z   0 9000 65535120   Y Y
Mon Dec  3 10:42:55 CET 2018 sdj Standby Z   0 9000 65535120   Y Y
Mon Dec  3 10:42:55 CET 2018 sdk Standby Z   0 9000 65535120   Y Y
Mon Dec  3 10:42:55 CET 2018 sdl Standby Z   0 9000 65535120   Y Y
Mon Dec  3 10:42:55 CET 2018 sdm Standby Z   0 9000 65535120   Y Y
Mon Dec  3 10:42:55 CET 2018 sdn Standby Z   0 9000 65535120   Y Y



So everything is fine right now. The standby timer is 0 and disabled (no * 
shown), while the default value is 9000 and the saved timer is  (we 
saved this value so the disks keep a very long timer after reboots). But 
after an unknown amount of time (in this case ~7 minutes) things start 
to get weird:


Mon Dec  3 10:47:52 CET 2018 sdc Standby Z  *3500  9000 65535120   Y Y

[...]
65535120   Y Y
Mon Dec  3 10:48:07 CET 2018 sdc Standby Z  *3500  9000 65535120   Y Y
Mon Dec  3 10:48:09 CET 2018 sdc Standby Z  *3500  9000 65535120   Y Y
Mon Dec  3 10:48:12 CET 2018 sdc Standby Z  *4500  9000 65535120   Y Y
Mon Dec  3 10:48:14 CET 2018 sdc Standby Z  *4500  9000 65535120   Y Y
Mon Dec  3 10:48:16 CET 2018 sdc Standby Z  *4500  9000 65535120   Y Y
Mon Dec  3 10:48:19 CET 2018 sdc Standby Z  *4500  9000 65535120   Y Y
Mon Dec  3 10:48:21 CET 2018 sdc Standby Z  *4500  9000 65535120   Y Y
Mon Dec  3 10:48:23 CET 2018 sdc Standby Z  *5500  9000 65535120   Y Y
Mon Dec  3 10:48:26 CET 2018 sdc Standby Z  *5500  9000 65535120   Y Y
Mon Dec  3 10:48:28 CET 2018 sdc Standby Z  *5500  9000 65535120   Y Y
Mon Dec  3 10:48:30 CET 2018 sdc Standby Z  *5500  9000 65535120   Y Y
Mon Dec  3 10:48:32 CET 2018 sdc Standby Z  *5500  9000 65535120   Y Y
Mon Dec  3 10:48:35 CET 2018 sdc Standby Z  *5500  9000 65535120   Y Y
Mon Dec  3 10:48:37 CET 2018 sdc Standby Z  *5500  9000 65535120   Y Y
Mon Dec  3 10:48:40 CET 2018 sdc Standby Z  *5500  9000 65535120   Y Y
Mon Dec  3 10:48:42 CET 2018 sdc Standby Z  *6500  9000 65535120   Y Y
Mon Dec  3 10:48:44 CET 2018 sdc Standby Z  *6500  9000 65535120   Y Y
Mon Dec  3 10:48:47 CET 2018 sdc Standby Z  *6500  9000 65535120   Y Y
Mon Dec  3 10:48:49 CET 2018 sdc Standby Z  *6500  9000 65535120   Y Y
Mon Dec  3 10:48:52 CET 2018 sdc Standby Z  *7500  9000 65535120   Y Y
Mon Dec  3 10:48:52 CET 

Re: [ceph-users] RocksDB and WAL migration to new block device

2018-11-21 Thread Florian Engelmann

Hi Igor,

sad to say, but I failed to build the tool. I tried to build the whole 
project as documented here:


http://docs.ceph.com/docs/mimic/install/build-ceph/

But as my workstation is running Ubuntu, the binary fails on SLES:

./ceph-bluestore-tool --help
./ceph-bluestore-tool: symbol lookup error: ./ceph-bluestore-tool: 
undefined symbol: _ZNK7leveldb6Status8ToStringB5cxx11Ev


I copied all libraries to ~/lib and exported LD_LIBRARY_PATH, but it 
did not solve the problem.


Is there any simple method to build just the bluestore-tool, standalone 
and statically linked?


All the best,
Florian


On 11/21/18 at 9:34 AM, Igor Fedotov wrote:
Actually  (given that your devices are already expanded) you don't need 
to expand them once again - one can just update size labels with my new PR.


For new migrations you can use the updated bluefs expand command, which 
sets the size label automatically, though.



Thanks,
Igor
On 11/21/2018 11:11 AM, Florian Engelmann wrote:
Great support, Igor! Both thumbs up! We will try to build the tool 
today and expand those bluefs devices once again.



On 11/20/18 at 6:54 PM, Igor Fedotov wrote:

FYI: https://github.com/ceph/ceph/pull/25187


On 11/20/2018 8:13 PM, Igor Fedotov wrote:


On 11/20/2018 7:05 PM, Florian Engelmann wrote:

On 11/20/18 at 4:59 PM, Igor Fedotov wrote:



On 11/20/2018 6:42 PM, Florian Engelmann wrote:

Hi Igor,



what's your Ceph version?


12.2.8 (SES 5.5 - patched to the latest version)



Can you also check the output for

ceph-bluestore-tool show-label -p 


ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-0/
infering bluefs devices from bluestore path
{
    "/var/lib/ceph/osd/ceph-0//block": {
    "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
    "size": 8001457295360,
    "btime": "2018-06-29 23:43:12.088842",
    "description": "main",
    "bluefs": "1",
    "ceph_fsid": "a146-6561-307e-b032-c5cee2ee520c",
    "kv_backend": "rocksdb",
    "magic": "ceph osd volume v026",
    "mkfs_done": "yes",
    "ready": "ready",
    "whoami": "0"
    },
    "/var/lib/ceph/osd/ceph-0//block.wal": {
    "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
    "size": 524288000,
    "btime": "2018-06-29 23:43:12.098690",
    "description": "bluefs wal"
    },
    "/var/lib/ceph/osd/ceph-0//block.db": {
    "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
    "size": 524288000,
    "btime": "2018-06-29 23:43:12.098023",
    "description": "bluefs db"
    }
}





It should report 'size' labels for every volume, please check 
they contain new values.




That's exactly the problem: neither "ceph-bluestore-tool 
show-label" nor "ceph daemon osd.0 perf dump|jq '.bluefs'" 
recognized the new sizes. But we are 100% sure the new devices are 
used, as we already deleted the old ones...


We tried to delete the key "size" in order to add one with the new value, 
but:


ceph-bluestore-tool rm-label-key --dev 
/var/lib/ceph/osd/ceph-0/block.db -k size

key 'size' not present

even if:

ceph-bluestore-tool show-label --dev 
/var/lib/ceph/osd/ceph-0/block.db

{
    "/var/lib/ceph/osd/ceph-0/block.db": {
    "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
    "size": 524288000,
    "btime": "2018-06-29 23:43:12.098023",
    "description": "bluefs db"
    }
}

So it looks like the key "size" is "read-only"?


There was a bug in updating specific keys, see
https://github.com/ceph/ceph/pull/24352

This PR also eliminates the need to set sizes manually on 
bdev-expand.


I thought it had been backported to Luminous, but it looks like it 
wasn't.

Will submit a PR shortly.




Thank you so much Igor! So we have to decide how to proceed. Maybe 
you could help us here as well.


Option A: Wait for this fix to be available. -> could last weeks or 
even months
if you can build a custom version of ceph_bluestore_tool then this 
is a short path. I'll submit a patch today or tomorrow which you 
need to integrate into your private build.

Then you need to upgrade just the tool and apply new sizes.
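(Presumably by pointing the tool's set-label-key at the new sizes once the fixed build is available - a sketch only, using the 60GiB db / 2GiB wal values from this migration:)

ceph-bluestore-tool set-label-key --dev /var/lib/ceph/osd/ceph-0/block.db -k size -v 64424509440
ceph-bluestore-tool set-label-key --dev /var/lib/ceph/osd/ceph-0/block.wal -k size -v 2147483648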



Option B: Recreate OSDs "one-by-one". -> will take a very long time 
as well

No need for that IMO.


Option C: Is there some "lowlevel" command allowing us to fix those 
sizes?
Well, a hex editor might help here as well. What you need is just to 
update the 64-bit size value in the block.db and block.wal files. In my 
lab I can find it at offset 0x52. Most

Re: [ceph-users] RocksDB and WAL migration to new block device

2018-11-21 Thread Florian Engelmann
Great support, Igor! Both thumbs up! We will try to build the tool 
today and expand those bluefs devices once again.



On 11/20/18 at 6:54 PM, Igor Fedotov wrote:

FYI: https://github.com/ceph/ceph/pull/25187


On 11/20/2018 8:13 PM, Igor Fedotov wrote:


On 11/20/2018 7:05 PM, Florian Engelmann wrote:

On 11/20/18 at 4:59 PM, Igor Fedotov wrote:



On 11/20/2018 6:42 PM, Florian Engelmann wrote:

Hi Igor,



what's your Ceph version?


12.2.8 (SES 5.5 - patched to the latest version)



Can you also check the output for

ceph-bluestore-tool show-label -p 


ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-0/
infering bluefs devices from bluestore path
{
    "/var/lib/ceph/osd/ceph-0//block": {
    "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
    "size": 8001457295360,
    "btime": "2018-06-29 23:43:12.088842",
    "description": "main",
    "bluefs": "1",
    "ceph_fsid": "a146-6561-307e-b032-c5cee2ee520c",
    "kv_backend": "rocksdb",
    "magic": "ceph osd volume v026",
    "mkfs_done": "yes",
    "ready": "ready",
    "whoami": "0"
    },
    "/var/lib/ceph/osd/ceph-0//block.wal": {
    "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
    "size": 524288000,
    "btime": "2018-06-29 23:43:12.098690",
    "description": "bluefs wal"
    },
    "/var/lib/ceph/osd/ceph-0//block.db": {
    "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
    "size": 524288000,
    "btime": "2018-06-29 23:43:12.098023",
    "description": "bluefs db"
    }
}





It should report 'size' labels for every volume, please check they 
contain new values.




That's exactly the problem: neither "ceph-bluestore-tool 
show-label" nor "ceph daemon osd.0 perf dump|jq '.bluefs'" 
recognized the new sizes. But we are 100% sure the new devices are 
used, as we already deleted the old ones...


We tried to delete the key "size" in order to add one with the new value, but:

ceph-bluestore-tool rm-label-key --dev 
/var/lib/ceph/osd/ceph-0/block.db -k size

key 'size' not present

even if:

ceph-bluestore-tool show-label --dev /var/lib/ceph/osd/ceph-0/block.db
{
    "/var/lib/ceph/osd/ceph-0/block.db": {
    "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
    "size": 524288000,
    "btime": "2018-06-29 23:43:12.098023",
    "description": "bluefs db"
    }
}

So it looks like the key "size" is "read-only"?


There was a bug in updating specific keys, see
https://github.com/ceph/ceph/pull/24352

This PR also eliminates the need to set sizes manually on bdev-expand.

I thought it had been backported to Luminous, but it looks like it 
wasn't.

Will submit a PR shortly.




Thank you so much Igor! So we have to decide how to proceed. Maybe 
you could help us here as well.


Option A: Wait for this fix to be available. -> could last weeks or 
even months
if you can build a custom version of ceph_bluestore_tool then this is 
a short path. I'll submit a patch today or tomorrow which you need to 
integrate into your private build.

Then you need to upgrade just the tool and apply new sizes.



Option B: Recreate OSDs "one-by-one". -> will take a very long time 
as well

No need for that IMO.


Option C: Is there some "lowlevel" command allowing us to fix those 
sizes?
Well, a hex editor might help here as well. What you need is just to 
update the 64-bit size value in the block.db and block.wal files. In my 
lab I can find it at offset 0x52. Most probably this is the fixed 
location, but it's better to check beforehand - the existing value 
should correspond to the one reported with show-label. Or I can do 
that for you - please send the  first 4K chunks to me along with the 
corresponding label report.
Then update with the new values - the field has to contain exactly the 
same size as your new partition.
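(A sketch of how to check that field before patching it, using the offset and device paths from this thread; the value to write is whatever blockdev reports for the new partition:)

# read the 8-byte little-endian size field at offset 0x52 (= 82) of the label
dd if=/var/lib/ceph/osd/ceph-0/block.db bs=1 skip=82 count=8 2>/dev/null | xxd
# the exact size the field has to be set to
blockdev --getsize64 /dev/data/db-osd0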










Thanks,

Igor


On 11/20/2018 5:29 PM, Florian Engelmann wrote:

Hi,

today we migrated all of our rocksdb and wal devices to new ones. 
The new ones are much bigger (500MB for wal/db -> 60GB db and 2G 
WAL) and LVM based.


We migrated like:

    export OSD=x

    systemctl stop ceph-osd@$OSD

    lvcreate -n db-osd$OSD -L60g data || exit 1
    lvcreate -n wal-osd$OSD -L2g data || exit 1

    dd if=/var/lib/ceph/osd/ceph-$OSD/block.wal 
of=/dev/data/wal-osd$OSD bs=1M || exit 1
    dd if=/var/lib/ceph/osd/ceph-$OSD/block.db 
of=/dev/data/db-osd$OSD bs=1M  || exit 1


    rm -

Re: [ceph-users] RocksDB and WAL migration to new block device

2018-11-20 Thread Florian Engelmann

On 11/20/18 at 4:59 PM, Igor Fedotov wrote:



On 11/20/2018 6:42 PM, Florian Engelmann wrote:

Hi Igor,



what's your Ceph version?


12.2.8 (SES 5.5 - patched to the latest version)



Can you also check the output for

ceph-bluestore-tool show-label -p 


ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-0/
infering bluefs devices from bluestore path
{
    "/var/lib/ceph/osd/ceph-0//block": {
    "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
    "size": 8001457295360,
    "btime": "2018-06-29 23:43:12.088842",
    "description": "main",
    "bluefs": "1",
    "ceph_fsid": "a146-6561-307e-b032-c5cee2ee520c",
    "kv_backend": "rocksdb",
    "magic": "ceph osd volume v026",
    "mkfs_done": "yes",
    "ready": "ready",
    "whoami": "0"
    },
    "/var/lib/ceph/osd/ceph-0//block.wal": {
    "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
    "size": 524288000,
    "btime": "2018-06-29 23:43:12.098690",
    "description": "bluefs wal"
    },
    "/var/lib/ceph/osd/ceph-0//block.db": {
    "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
    "size": 524288000,
    "btime": "2018-06-29 23:43:12.098023",
    "description": "bluefs db"
    }
}





It should report 'size' labels for every volume, please check they 
contain new values.




That's exactly the problem: neither "ceph-bluestore-tool show-label" 
nor "ceph daemon osd.0 perf dump|jq '.bluefs'" recognized the new 
sizes. But we are 100% sure the new devices are used, as we already 
deleted the old ones...


We tried to delete the key "size" in order to add one with the new value, but:

ceph-bluestore-tool rm-label-key --dev 
/var/lib/ceph/osd/ceph-0/block.db -k size

key 'size' not present

even if:

ceph-bluestore-tool show-label --dev /var/lib/ceph/osd/ceph-0/block.db
{
    "/var/lib/ceph/osd/ceph-0/block.db": {
    "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
    "size": 524288000,
    "btime": "2018-06-29 23:43:12.098023",
    "description": "bluefs db"
    }
}

So it looks like the key "size" is "read-only"?


There was a bug in updating specific keys, see
https://github.com/ceph/ceph/pull/24352

This PR also eliminates the need to set sizes manually on bdev-expand.

I thought it had been backported to Luminous, but it looks like it wasn't.
Will submit a PR shortly.




Thank you so much Igor! So we have to decide how to proceed. Maybe you 
could help us here as well.


Option A: Wait for this fix to be available. -> could last weeks or even 
months


Option B: Recreate OSDs "one-by-one". -> will take a very long time as well

Option C: Is there some "lowlevel" command allowing us to fix those sizes?







Thanks,

Igor


On 11/20/2018 5:29 PM, Florian Engelmann wrote:

Hi,

today we migrated all of our rocksdb and wal devices to new ones. 
The new ones are much bigger (500MB for wal/db -> 60GB db and 2G 
WAL) and LVM based.


We migrated like:

    export OSD=x

    systemctl stop ceph-osd@$OSD

    lvcreate -n db-osd$OSD -L60g data || exit 1
    lvcreate -n wal-osd$OSD -L2g data || exit 1

    dd if=/var/lib/ceph/osd/ceph-$OSD/block.wal 
of=/dev/data/wal-osd$OSD bs=1M || exit 1
    dd if=/var/lib/ceph/osd/ceph-$OSD/block.db 
of=/dev/data/db-osd$OSD bs=1M  || exit 1


    rm -v /var/lib/ceph/osd/ceph-$OSD/block.db || exit 1
    rm -v /var/lib/ceph/osd/ceph-$OSD/block.wal || exit 1
    ln -vs /dev/data/db-osd$OSD /var/lib/ceph/osd/ceph-$OSD/block.db 
|| exit 1
    ln -vs /dev/data/wal-osd$OSD 
/var/lib/ceph/osd/ceph-$OSD/block.wal || exit 1



    chown -c ceph:ceph $(realpath /dev/data/db-osd$OSD) || exit 1
    chown -c ceph:ceph $(realpath /dev/data/wal-osd$OSD) || exit 1
    chown -ch ceph:ceph /var/lib/ceph/osd/ceph-$OSD/block.db || exit 1
    chown -ch ceph:ceph /var/lib/ceph/osd/ceph-$OSD/block.wal || exit 1


    ceph-bluestore-tool bluefs-bdev-expand --path 
/var/lib/ceph/osd/ceph-$OSD/ || exit 1


    systemctl start ceph-osd@$OSD


Everything went fine but it looks like the db and wal size is still 
the old one:


ceph daemon osd.0 perf dump|jq '.bluefs'
{
  "gift_bytes": 0,
  "reclaim_bytes": 0,
  "db_total_bytes": 524279808,
  "db_used_bytes": 330301440,
  "wal_total_bytes": 524283904,
  "wal_used_bytes": 69206016,
  "slow_total_bytes": 320058949632,
  "slow_used_bytes": 13606322176,

Re: [ceph-users] RocksDB and WAL migration to new block device

2018-11-20 Thread Florian Engelmann

Hi Igor,



what's your Ceph version?


12.2.8 (SES 5.5 - patched to the latest version)



Can you also check the output for

ceph-bluestore-tool show-label -p 


ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-0/
infering bluefs devices from bluestore path
{
"/var/lib/ceph/osd/ceph-0//block": {
"osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
"size": 8001457295360,
"btime": "2018-06-29 23:43:12.088842",
"description": "main",
"bluefs": "1",
"ceph_fsid": "a146-6561-307e-b032-c5cee2ee520c",
"kv_backend": "rocksdb",
"magic": "ceph osd volume v026",
"mkfs_done": "yes",
"ready": "ready",
"whoami": "0"
},
"/var/lib/ceph/osd/ceph-0//block.wal": {
"osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
"size": 524288000,
"btime": "2018-06-29 23:43:12.098690",
"description": "bluefs wal"
},
"/var/lib/ceph/osd/ceph-0//block.db": {
"osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
"size": 524288000,
"btime": "2018-06-29 23:43:12.098023",
"description": "bluefs db"
}
}





It should report 'size' labels for every volume, please check they 
contain new values.




That's exactly the problem: neither "ceph-bluestore-tool show-label" nor 
"ceph daemon osd.0 perf dump|jq '.bluefs'" recognized the new sizes. 
But we are 100% sure the new devices are used, as we already deleted the 
old ones...


We tried to delete the key "size" in order to add one with the new value, but:

ceph-bluestore-tool rm-label-key --dev /var/lib/ceph/osd/ceph-0/block.db 
-k size

key 'size' not present

even if:

ceph-bluestore-tool show-label --dev /var/lib/ceph/osd/ceph-0/block.db
{
"/var/lib/ceph/osd/ceph-0/block.db": {
"osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
"size": 524288000,
"btime": "2018-06-29 23:43:12.098023",
"description": "bluefs db"
}
}

So it looks like the key "size" is "read-only"?





Thanks,

Igor


On 11/20/2018 5:29 PM, Florian Engelmann wrote:

Hi,

today we migrated all of our rocksdb and wal devices to new ones. The 
new ones are much bigger (500MB for wal/db -> 60GB db and 2G WAL) and 
LVM based.


We migrated like:

    export OSD=x

    systemctl stop ceph-osd@$OSD

    lvcreate -n db-osd$OSD -L60g data || exit 1
    lvcreate -n wal-osd$OSD -L2g data || exit 1

    dd if=/var/lib/ceph/osd/ceph-$OSD/block.wal 
of=/dev/data/wal-osd$OSD bs=1M || exit 1
    dd if=/var/lib/ceph/osd/ceph-$OSD/block.db of=/dev/data/db-osd$OSD 
bs=1M  || exit 1


    rm -v /var/lib/ceph/osd/ceph-$OSD/block.db || exit 1
    rm -v /var/lib/ceph/osd/ceph-$OSD/block.wal || exit 1
    ln -vs /dev/data/db-osd$OSD /var/lib/ceph/osd/ceph-$OSD/block.db 
|| exit 1
    ln -vs /dev/data/wal-osd$OSD /var/lib/ceph/osd/ceph-$OSD/block.wal 
|| exit 1



    chown -c ceph:ceph $(realpath /dev/data/db-osd$OSD) || exit 1
    chown -c ceph:ceph $(realpath /dev/data/wal-osd$OSD) || exit 1
    chown -ch ceph:ceph /var/lib/ceph/osd/ceph-$OSD/block.db || exit 1
    chown -ch ceph:ceph /var/lib/ceph/osd/ceph-$OSD/block.wal || exit 1


    ceph-bluestore-tool bluefs-bdev-expand --path 
/var/lib/ceph/osd/ceph-$OSD/ || exit 1


    systemctl start ceph-osd@$OSD


Everything went fine but it looks like the db and wal size is still 
the old one:


ceph daemon osd.0 perf dump|jq '.bluefs'
{
  "gift_bytes": 0,
  "reclaim_bytes": 0,
  "db_total_bytes": 524279808,
  "db_used_bytes": 330301440,
  "wal_total_bytes": 524283904,
  "wal_used_bytes": 69206016,
  "slow_total_bytes": 320058949632,
  "slow_used_bytes": 13606322176,
  "num_files": 220,
  "log_bytes": 44204032,
  "log_compactions": 0,
  "logged_bytes": 31145984,
  "files_written_wal": 1,
  "files_written_sst": 1,
  "bytes_written_wal": 37753489,
  "bytes_written_sst": 238992
}


Even if the new block devices are recognized correctly:

2018-11-20 11:40:34.653524 7f70219b8d00  1 bdev(0x5647ea9ce200 
/var/lib/ceph/osd/ceph-0/block.db) open size 64424509440 (0xf, 
60GiB) block_size 4096 (4KiB) non-rotational
2018-11-20 11:40:34.653532 7f70219b8d00  1 bluefs add_block_device 
bdev 1 path /var/lib/ceph/osd/ceph-0/block.db size 60GiB



2018-11-20 11:40:34.662385 7f70219b8d00  1 bdev(0x5647ea9ce600 
/var/lib

[ceph-users] RocksDB and WAL migration to new block device

2018-11-20 Thread Florian Engelmann

Hi,

today we migrated all of our rocksdb and wal devices to new ones. The 
new ones are much bigger (500MB for wal/db -> 60GB db and 2G WAL) and 
LVM based.


We migrated like:

export OSD=x

systemctl stop ceph-osd@$OSD

lvcreate -n db-osd$OSD -L60g data || exit 1
lvcreate -n wal-osd$OSD -L2g data || exit 1

dd if=/var/lib/ceph/osd/ceph-$OSD/block.wal 
of=/dev/data/wal-osd$OSD bs=1M || exit 1
dd if=/var/lib/ceph/osd/ceph-$OSD/block.db of=/dev/data/db-osd$OSD 
bs=1M  || exit 1


rm -v /var/lib/ceph/osd/ceph-$OSD/block.db || exit 1
rm -v /var/lib/ceph/osd/ceph-$OSD/block.wal || exit 1
ln -vs /dev/data/db-osd$OSD /var/lib/ceph/osd/ceph-$OSD/block.db || 
exit 1
ln -vs /dev/data/wal-osd$OSD /var/lib/ceph/osd/ceph-$OSD/block.wal 
|| exit 1



chown -c ceph:ceph $(realpath /dev/data/db-osd$OSD) || exit 1
chown -c ceph:ceph $(realpath /dev/data/wal-osd$OSD) || exit 1
chown -ch ceph:ceph /var/lib/ceph/osd/ceph-$OSD/block.db || exit 1
chown -ch ceph:ceph /var/lib/ceph/osd/ceph-$OSD/block.wal || exit 1


ceph-bluestore-tool bluefs-bdev-expand --path 
/var/lib/ceph/osd/ceph-$OSD/ || exit 1


systemctl start ceph-osd@$OSD


Everything went fine but it looks like the db and wal size is still the 
old one:


ceph daemon osd.0 perf dump|jq '.bluefs'
{
  "gift_bytes": 0,
  "reclaim_bytes": 0,
  "db_total_bytes": 524279808,
  "db_used_bytes": 330301440,
  "wal_total_bytes": 524283904,
  "wal_used_bytes": 69206016,
  "slow_total_bytes": 320058949632,
  "slow_used_bytes": 13606322176,
  "num_files": 220,
  "log_bytes": 44204032,
  "log_compactions": 0,
  "logged_bytes": 31145984,
  "files_written_wal": 1,
  "files_written_sst": 1,
  "bytes_written_wal": 37753489,
  "bytes_written_sst": 238992
}


Even if the new block devices are recognized correctly:

2018-11-20 11:40:34.653524 7f70219b8d00  1 bdev(0x5647ea9ce200 
/var/lib/ceph/osd/ceph-0/block.db) open size 64424509440 (0xf, 
60GiB) block_size 4096 (4KiB) non-rotational
2018-11-20 11:40:34.653532 7f70219b8d00  1 bluefs add_block_device bdev 
1 path /var/lib/ceph/osd/ceph-0/block.db size 60GiB



2018-11-20 11:40:34.662385 7f70219b8d00  1 bdev(0x5647ea9ce600 
/var/lib/ceph/osd/ceph-0/block.wal) open size 2147483648 (0x8000, 
2GiB) block_size 4096 (4KiB) non-rotational
2018-11-20 11:40:34.662406 7f70219b8d00  1 bluefs add_block_device bdev 
0 path /var/lib/ceph/osd/ceph-0/block.wal size 2GiB



Are we missing some command to "notify" rocksdb about the new device size?

All the best,
Florian




Re: [ceph-users] Large omap objects - how to fix ?

2018-10-26 Thread Florian Engelmann
d add
--bucket $bucket --num-shards 8; done

# radosgw-admin reshard process



--
Kind regards,

Ben Morrice

__
Ben Morrice | e: ben.morr...@epfl.ch | t: +41-21-693-9670
EPFL / BBP
Biotech Campus
Chemin des Mines 9
1202 Geneva
Switzerland




--

EveryWare AG
Florian Engelmann
Systems Engineer
Zurlindenstrasse 52a
CH-8003 Zürich

tel: +41 44 466 60 00
fax: +41 44 466 60 10
mail: mailto:florian.engelm...@everyware.ch
web: http://www.everyware.ch




Re: [ceph-users] RGW how to delete orphans

2018-10-26 Thread Florian Engelmann

Hi,

we've got the same problem here. Our 12.2.5 RadosGWs crashed 
(unnoticed by us) about 30.000 times during ongoing multipart uploads. 
After a couple of days we ended up with:


xx-1.rgw.buckets.data   6   N/A   N/A   116TiB   87.22   17.1TiB   36264870   36.26M   3.63GiB   148MiB   194TiB


116TB data (194TB raw) while only:

for i in $(radosgw-admin bucket list | jq -r '.[]'); do  radosgw-admin 
bucket stats --bucket=$i | jq '.usage | ."rgw.main" | .size_kb' ; done | 
awk '{ SUM += $1} END { print SUM/1024/1024/1024 }'


46.0962

116 - 46 = 70TB

So 70TB of objects are orphans, right?

And there are 36.264.870 objects in our rgw.buckets.data pool.

So we started:

radosgw-admin orphans list-jobs --extra-info
[
{
"orphan_search_state": {
"info": {
"orphan_search_info": {
"job_name": "check-orph",
"pool": "zh-1.rgw.buckets.data",
"num_shards": 64,
"start_time": "2018-10-10 09:01:14.746436Z"
}
},
"stage": {
"orphan_search_stage": {
"search_stage": "iterate_bucket_index",
"shard": 0,
"marker": ""
}
}
}
}
]

writing stdout to: orphans.txt

I am not sure how to interpret the output, but:

cat orphans.txt | awk '/^storing / { SUM += $2} END { print SUM }'
2145042765

So how to interpret those output lines:
...
storing 16 entries at orphan.scan.check-orph.linked.62
storing 19 entries at orphan.scan.check-orph.linked.63
storing 13 entries at orphan.scan.check-orph.linked.0
storing 13 entries at orphan.scan.check-orph.linked.1
...

Is it like

"I am storing 16 'healthy' object 'names' to the shard 
orphan.scan.check-orph.linked.62"


Are these objects? What is meant by "entries"? Where are those "shards"? 
Are they files or objects in a pool? How can we track the progress of 
"orphans find"? Is the job still doing the right thing? What is the 
estimated run time on SATA disks with 194TB RAW?


The orphans find command has already stored 2.145.042.765 (more than 2 
billion) "entries"... while there are "only" 36 million objects...


Is the process still healthy and doing the right thing?

All the best,
Florian





On 10/3/17 at 10:48 AM, Andreas Calminder wrote:
The output, to stdout, is something like leaked: $objname. Am I supposed 
to pipe it to a log, grep for leaked: and pipe it to rados delete? Or am 
I supposed to dig around in the log pool to try and find the objects 
there? The information available is quite vague. Maybe Yehuda can shed 
some light on this issue?


Best regards,
/Andreas

On 3 Oct 2017 06:25, "Christian Wuerdig" wrote:


yes, at least that's how I'd interpret the information given in this
thread:

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-February/016521.html



On Tue, Oct 3, 2017 at 1:11 AM, Webert de Souza Lima
<webert.b...@gmail.com> wrote:
 > Hey Christian,
 >
 >> On 29 Sep 2017 12:32 a.m., "Christian Wuerdig"
 >> <christian.wuer...@gmail.com> wrote:
 >>>
 >>> I'm pretty sure the orphan find command does exactly just that -
 >>> finding orphans. I remember some emails on the dev list where
Yehuda
 >>> said he wasn't 100% comfortable of automating the delete just yet.
 >>> So the purpose is to run the orphan find tool and then delete the
 >>> orphaned objects once you're happy that they all are actually
 >>> orphaned.
 >>>
 >
 > so what you mean is that one should manually remove the result listed
 > objects that are output?
 >
 >
 > Regards,
 >
 > Webert Lima
 > DevOps Engineer at MAV Tecnologia
 > Belo Horizonte - Brasil
 >
 >




[ceph-users] understanding % used in ceph df

2018-10-19 Thread Florian Engelmann

Hi,


Our Ceph cluster is a 6-node cluster, each node having 8 disks. The 
cluster is used for object storage only (right now). We use EC 3+2 on 
the buckets.data pool.


We had a problem with RadosGW segfaulting (12.2.5) until we upgraded to 
12.2.8. We had almost 30.000 radosgw crashes, leading to millions of 
unreferenced objects (failed multipart uploads?). It filled our cluster 
so fast that we are now in danger of running out of space.


As you can see we are reweighting some OSDs right now. But the real 
question is how "used" is calculated in ceph df.


Global: %RAW USED = 76.49%

while

x-1.rgw.buckets.data Used = 90.32%

Am I right that this is because we should still be "able" to lose one OSD node?
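(A quick sanity check of the numbers - just the arithmetic, assuming the pool %USED is used/(used+max_avail) and raw usage scales with the EC overhead (k+m)/k:)

echo "115 / (115 + 12.4) * 100" | bc -l   # ~90.3, matching the 90.32 %USED of the data pool
echo "115 * 5 / 3" | bc -l                # ~191.7 TiB raw for EC 3+2, i.e. most of the 200TiB RAW USED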

If that's true, reweighting can only help a little to rebalance the 
capacity used on each node?


The only chance we have right now to survive until new HDDs arrive is to 
delete objects, right?



ceph -s
  cluster:
id: a146-6561-307e-b032-x
health: HEALTH_WARN
3 nearfull osd(s)
13 pool(s) nearfull
1 large omap objects
766760/180478374 objects misplaced (0.425%)

  services:
mon: 3 daemons, quorum ceph1-mon3,ceph1-mon2,ceph1-mon1
mgr: ceph1-mon2(active), standbys: ceph1-mon1, ceph1-mon3
osd: 36 osds: 36 up, 36 in; 24 remapped pgs
rgw: 3 daemons active
rgw-nfs: 2 daemons active

  data:
pools:   13 pools, 1424 pgs
objects: 36.10M objects, 115TiB
usage:   200TiB used, 61.6TiB / 262TiB avail
pgs: 766760/180478374 objects misplaced (0.425%)
 1400 active+clean
 16   active+remapped+backfill_wait
 8    active+remapped+backfilling

  io:
client:   3.05MiB/s rd, 0B/s wr, 1.12kop/s rd, 37op/s wr
recovery: 306MiB/s, 91objects/s

ceph df
GLOBAL:
SIZE   AVAIL   RAW USED %RAW USED
262TiB 61.6TiB   200TiB 76.49
POOLS:
    NAME                    ID  USED     %USED  MAX AVAIL  OBJECTS
    iscsi-images            1   35B      0      6.87TiB    5
    .rgw.root               2   3.57KiB  0      6.87TiB    18
    x-1.rgw.buckets.data    6   115TiB   90.32  12.4TiB    36090523
    x-1.rgw.control         7   0B       0      6.87TiB    8
    x-1.rgw.meta            8   943KiB   0      6.87TiB    3265
    x-1.rgw.log             9   0B       0      6.87TiB    407
    x-1.rgw.buckets.index   12  0B       0      6.87TiB    3096
    x-1.rgw.buckets.non-ec  13  0B       0      6.87TiB    1623
    default.rgw.meta        14  373B     0      6.87TiB    3
    default.rgw.control     15  0B       0      6.87TiB    8
    default.rgw.log         16  0B       0      6.87TiB    0
    scbench                 17  0B       0      6.87TiB    0
    rbdbench                18  1.00GiB  0.01   6.87TiB    260




Regards,
Flo

