Re: [ceph-users] ceph df space usage confusion - balancing needed?

2018-10-26 Thread Oliver Freyermuth
On 27.10.18 at 04:12, Linh Vu wrote:
> Should be fine as long as your "mgr/balancer/max_misplaced" is reasonable. I 
> find the default value of 0.05 decent enough, although from experience that 
> seems like 0.05% rather than 5% as suggested here: 
> http://docs.ceph.com/docs/luminous/mgr/balancer/  

Ok! I did actually choose 0.01. Interestingly, during the initial large 
rebalancing, it went up to > 2 % of misplaced objects (in small steps) until I 
decided to stop the balancer for a day to give the cluster
enough time to adapt. 
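
For reference, this is how I set it - just the config-key named above, so take
it as a sketch based on the Luminous balancer docs:

# ceph config-key set mgr/balancer/max_misplaced 0.01
# ceph balancer status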
 
> You can also choose to turn it on only during certain hours when the cluster 
> might be less busy. The config-keys are there somewhere (there's a post by 
> Dan van der Ster on the ML about them) but they don't actually work in 12.2.8 
> at least, when I tried them. I suggest just using cron to turn the balancer on 
> and off. 

I found that mail in the archives. Indeed, that seems helpful. I'll start with 
permanently leaving the balancer on for now and observe if it has any impact. 
Since we rarely change the cluster's layout,
it should effectively just sit there silently most of the time. 
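
If we ever go the cron route you mention, I would expect something as simple as
this to do it (times here are only an example, not something we run yet):

# crontab fragment: balancer active only at night
0 22 * * * /usr/bin/ceph balancer on
0 6 * * * /usr/bin/ceph balancer off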

Thanks!
Oliver

> 
> --
> *From:* Oliver Freyermuth 
> *Sent:* Friday, 26 October 2018 9:32:14 PM
> *To:* Linh Vu; Janne Johansson
> *Cc:* ceph-users@lists.ceph.com; Peter Wienemann
> *Subject:* Re: [ceph-users] ceph df space usage confusion - balancing needed?
>  
> Dear Cephalopodians,
> 
> thanks for all your feedback!
> 
> I finally "pushed the button" and let upmap run for ~36 hours.
> Previously, we had ~63 % usage of our CephFS with only 50 % raw usage, now, 
> we see only 53.77 % usage.
> 
> That's as close as I expect things to ever become, and we gained about 70 TiB 
> of free storage by that, which is almost one file server.
> So the outcome is really close to perfection :-).
> 
> I'm leaving the balancer active now in upmap mode. Any bad experiences with 
> leaving it active "forever"?
> 
> Cheers and many thanks again,
>     Oliver
> 
> On 23.10.18 at 01:14, Linh Vu wrote:
>> Upmap is awesome. I ran it on our new cluster before we started ingesting 
>> data, so that the PG count is balanced on all OSDs. After ingesting about 
>> 315TB, it's still beautifully balanced. Note: we have a few nodes with 8TB 
>> OSDs, and the rest on 10TBs. 
>> 
>> 
>> # ceph osd df plain
>> ID  CLASS    WEIGHT  REWEIGHT SIZE    USE     AVAIL   %USE  VAR  PGS 
>>   0   mf1hdd 7.27739  1.0 7.28TiB 2.06TiB 5.21TiB 28.34 1.01 144 
>>   1   mf1hdd 7.27739  1.0 7.28TiB 2.07TiB 5.21TiB 28.38 1.02 144 
>>   2   mf1hdd 7.27739  1.0 7.28TiB 2.03TiB 5.24TiB 27.96 1.00 142 
>>   3   mf1hdd 7.27739  1.0 7.28TiB 2.06TiB 5.21TiB 28.37 1.02 144 
>>   4   mf1hdd 7.27739  1.0 7.28TiB 2.03TiB 5.24TiB 27.96 1.00 142 
>>   5   mf1hdd 7.27739  1.0 7.28TiB 2.02TiB 5.26TiB 27.73 0.99 141 
>>   6   mf1hdd 7.27739  1.0 7.28TiB 2.03TiB 5.24TiB 27.94 1.00 142 
>>   7   mf1hdd 7.27739  1.0 7.28TiB 2.06TiB 5.21TiB 28.35 1.02 144 
>>   8   mf1hdd 7.27739  1.0 7.28TiB 2.02TiB 5.26TiB 27.76 0.99 141 
>>   9   mf1hdd 7.27739  1.0 7.28TiB 2.04TiB 5.24TiB 27.97 1.00 142 
>>  10   mf1hdd 7.27739  1.0 7.28TiB 2.06TiB 5.21TiB 28.35 1.02 144 
>>  11   mf1hdd 7.27739  1.0 7.28TiB 2.04TiB 5.24TiB 27.99 1.00 142 
>>  12   mf1hdd 7.27739  1.0 7.28TiB 2.02TiB 5.26TiB 27.75 0.99 141 
>>  13   mf1hdd 7.27739  1.0 7.28TiB 2.03TiB 5.24TiB 27.96 1.00 142 
>>  14   mf1hdd 7.27739  1.0 7.28TiB 2.02TiB 5.26TiB 27.78 0.99 141 
>>  15   mf1hdd 7.27739  1.0 7.28TiB 2.07TiB 5.21TiB 28.38 1.02 144 
>> 224 nvmemeta 0.02179  1.0 22.3GiB 1.52GiB 20.8GiB  6.82 0.24 185 
>> 225 nvmemeta 0.02179  1.0 22.4GiB 1.49GiB 20.9GiB  6.68 0.24 182 
>> 144   mf1hdd 8.91019  1.0 8.91TiB 2.48TiB 6.43TiB 27.81 1.00 173 
>> 145   mf1hdd 8.91019  1.0 8.91TiB 2.48TiB 6.43TiB 27.84 1.00 173 
>> 146   mf1hdd 8.91019  1.0 8.91TiB 2.48TiB 6.43TiB 27.84 1.00 173 
>> 147   mf1hdd 8.91019  1.0 8.91TiB 2.48TiB 6.43TiB 2

Re: [ceph-users] Lost machine with MON and MDS

2018-10-26 Thread Martin Verges
Hello,

Did you lose the only mon as well? If yes, restoring it is not easy but
possible. The mds is no problem, just install it.

--
Martin Verges
Managing director

Mobile: +49 174 9335695
E-Mail: martin.ver...@croit.io
Chat: https://t.me/MartinVerges

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263

Web: https://croit.io
YouTube: https://goo.gl/PGE1Bx


Maiko de Andrade  wrote on Fri., 26 Oct 2018,
19:40:

> Hi,
>
> I have 3 machines with a ceph config with cephfs. But I lost one machine,
> the only one with the mon and mds. Is it possible to recover cephfs? If yes, how?
>
> ceph: Ubuntu 16.05.5 (lost this machine)
> - mon
> - mds
> - osd
>
> ceph-osd-1: Ubuntu 16.05.5
> - osd
>
> ceph-osd-2: Ubuntu 16.05.5
> - osd
>
>
>
> []´s
> Maiko de Andrade
> MAX Brasil
> Desenvolvedor de Sistemas
> +55 51 91251756
> http://about.me/maiko


Re: [ceph-users] Client new version than server?

2018-10-26 Thread Martin Verges
Hello,

In my opinion it is not a problem. It could be a problem across major releases
(read the release notes to check for incompatibilities), but minor version
differences shouldn't be a problem.

In most environments I know of, different client versions are connecting to
the same cluster.

--
Martin Verges
Managing director

Mobile: +49 174 9335695
E-Mail: martin.ver...@croit.io
Chat: https://t.me/MartinVerges

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263

Web: https://croit.io
YouTube: https://goo.gl/PGE1Bx


Andre Goree  wrote on Sat., 27 Oct 2018, 00:02:

> I wanted to ask for thoughts/guidance on the case of running a newer
> version of Ceph on a client than the version of Ceph that is running on
> the server.
>
> E.g., I have a client machine running Ceph 12.2.8, while the server is
> running 12.2.4.  Is this a terrible idea?  My thoughts are to more
> thoroughly test 12.2.8 on the server side before upgrading my production
> server to 12.2.8.  However, I have a client that's been recently
> installed and thus pulled down the latest Luminous version (12.2.8).
>
> Thanks in advance.
>
> --
> Andre Goree
> -=-=-=-=-=-
> Email - andre at drenet.net
> Website   - http://blog.drenet.net
> PGP key   - http://www.drenet.net/pubkey.html
> -=-=-=-=-=-


Re: [ceph-users] Lost machine with MON and MDS

2018-10-26 Thread Yan, Zheng
ceph-mds stores all its data in the object store. You just need to create a new
ceph-mds on another machine.
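
Roughly like this - a sketch of the usual manual steps, with "newmds" as a
placeholder name and the caps to be adjusted to your setup:

mkdir -p /var/lib/ceph/mds/ceph-newmds
ceph auth get-or-create mds.newmds mon 'allow profile mds' osd 'allow rwx' mds 'allow *' \
    -o /var/lib/ceph/mds/ceph-newmds/keyring
chown -R ceph:ceph /var/lib/ceph/mds/ceph-newmds
systemctl start ceph-mds@newmds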

On Sat, Oct 27, 2018 at 1:40 AM Maiko de Andrade 
wrote:

> Hi,
>
> I have 3 machines with a ceph config with cephfs. But I lost one machine,
> the only one with the mon and mds. Is it possible to recover cephfs? If yes, how?
>
> ceph: Ubuntu 16.05.5 (lost this machine)
> - mon
> - mds
> - osd
>
> ceph-osd-1: Ubuntu 16.05.5
> - osd
>
> ceph-osd-2: Ubuntu 16.05.5
> - osd
>
>
>
> []´s
> Maiko de Andrade
> MAX Brasil
> Desenvolvedor de Sistemas
> +55 51 91251756
> http://about.me/maiko


[ceph-users] Client new version than server?

2018-10-26 Thread Andre Goree
I wanted to ask for thoughts/guidance on the case of running a newer 
version of Ceph on a client than the version of Ceph that is running on 
the server.


E.g., I have a client machine running Ceph 12.2.8, while the server is 
running 12.2.4.  Is this a terrible idea?  My thoughts are to more 
thoroughly test 12.2.8 on the server side before upgrading my production 
server to 12.2.8.  However, I have a client that's been recently 
installed and thus pulled down the latest Luminous version (12.2.8).
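
For context, this is how I am comparing the two sides (nothing more than the
stock commands):

# on the client
ceph --version
# on the cluster, per-daemon versions
ceph versions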


Thanks in advance.

--
Andre Goree
-=-=-=-=-=-
Email - andre at drenet.net
Website   - http://blog.drenet.net
PGP key   - http://www.drenet.net/pubkey.html
-=-=-=-=-=-


[ceph-users] Lost machine with MON and MDS

2018-10-26 Thread Maiko de Andrade
Hi,

I have 3 machines with a ceph config with cephfs. But I lost one machine, the only
one with the mon and mds. Is it possible to recover cephfs? If yes, how?

ceph: Ubuntu 16.05.5 (lost this machine)
- mon
- mds
- osd

ceph-osd-1: Ubuntu 16.05.5
- osd

ceph-osd-2: Ubuntu 16.05.5
- osd



[]´s
Maiko de Andrade
MAX Brasil
Desenvolvedor de Sistemas
+55 51 91251756
http://about.me/maiko


Re: [ceph-users] Large omap objects - how to fix ?

2018-10-26 Thread Florian Engelmann

Hi,

hijacking the hijacker! Sorry!

radosgw-admin bucket reshard --bucket somebucket --num-shards 8
*** NOTICE: operation will not remove old bucket index objects ***
*** these will need to be removed manually ***
tenant:
bucket name: somebucket
old bucket instance id: cb1594b3-a782-49d0-a19f-68cd48870a63.1923153.1
new bucket instance id: cb1594b3-a782-49d0-a19f-68cd48870a63.3119759.1
total entries: 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000 11000 
12000 13000 14000 15000 16000 17000 18000 19000 20000 21000 22000 23000 
24000 25000 26000 27000 28000 29000 30000 31000 32000 33000 34000 35000 
36000 37000 38000 39000 40000 41000 42000 43000 44000 45000 46000 47000 
48000 49000 50000 51000 52000 53000 54000 55000 56000 57000 58000 59000 
60000 61000 62000 63000 64000 65000 66000 67000 68000 69000 70000 71000 
72000 73000 74000 75000 76000 77000 78000 79000 80000 81000 82000 83000 
84000 85000 86000 87000 88000 89000 90000 91000 92000 93000 94000 95000 
96000 97000 98000 99000 100000 101000 102000 103000 104000 105000 106000 
107000 108000 109000 110000 111000 112000 113000 114000 115000 116000 
117000 118000 119000 120000 121000 122000 123000 124000 125000 126000 
127000 128000 129000 130000 131000 132000 133000 134000 135000 136000 
137000 138000 139000 140000 141000 142000 143000 144000 145000 146000 
147000 148000 149000 150000 151000 152000 153000 154000 155000 156000 
157000 158000 159000 160000 161000 162000 163000 164000 165000 166000 
167000 168000 169000 170000 171000 172000 173000 174000 175000 176000 
177000 178000 179000 180000 181000 182000 183000 184000 185000 186000 
187000 188000 189000 190000 191000 192000 193000 194000 195000 196000 
197000 198000 199000 200000 201000 202000 203000 204000 205000 206000 
207000 207660


What to do now?

ceph -s is still:

health: HEALTH_WARN
1 large omap objects

But I have no idea how to:
*** NOTICE: operation will not remove old bucket index objects ***
*** these will need to be removed manually ***
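
If I read the docs right, the old index shards would be dropped with something
like the line below (using the old bucket instance id printed above), but I
would appreciate a confirmation before running it:

radosgw-admin bi purge --bucket=somebucket \
    --bucket-id=cb1594b3-a782-49d0-a19f-68cd48870a63.1923153.1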


All the best,
Flo


On 10/26/18 at 3:56 PM, Alexandru Cucu wrote:

Hi,

Sorry to hijack this thread. I have a similar issue also with 12.2.8
recently upgraded from Jewel.

In my case all buckets are within limits:
 # radosgw-admin bucket limit check | jq '.[].buckets[].fill_status' | uniq
 "OK"

 # radosgw-admin bucket limit check | jq
'.[].buckets[].objects_per_shard'  | sort -n | uniq
 0
 1
 30
 109
 516
 5174
 50081
 50088
 50285
 50323
 50336
 51826

rgw_max_objs_per_shard is set to the default of 100k

---
Alex Cucu

On Fri, Oct 26, 2018 at 4:09 PM Ben Morrice  wrote:


Hello all,

After a recent Luminous upgrade (now running 12.2.8 with all OSDs
migrated to bluestore, upgraded from 11.2.0 and running filestore) I am
currently experiencing the warning 'large omap objects'.
I know this is related to large buckets in radosgw, and luminous
supports 'dynamic sharding' - however I feel that something is missing
from our configuration and I'm a bit confused on what the right approach
is to fix it.

First a bit of background info:

We previously had a multi site radosgw installation, however recently we
decommissioned the second site. With the radosgw multi-site
configuration we had 'bucket_index_max_shards = 0'. Since
decommissioning the second site, I have removed the secondary zonegroup
and changed 'bucket_index_max_shards' to be 16 for the single primary zone.
All our buckets do not have a 'num_shards' field when running
'radosgw-admin bucket stats --bucket '
Is this normal ?

Also - I'm finding it difficult to find out exactly what to do with the
buckets that are affected with 'large omap' (see commands below).
My interpretation of 'search the cluster log' is also listed below.

What do I need to do with the below buckets to get back to an overall
ceph HEALTH_OK state? :)


# ceph health detail
HEALTH_WARN 2 large omap objects
2 large objects found in pool '.bbp-gva-master.rgw.buckets.index'
Search the cluster log for 'Large omap object found' for more details.

# ceph osd pool get .bbp-gva-master.rgw.buckets.index pg_num
pg_num: 64

# for i in `ceph pg ls-by-pool .bbp-gva-master.rgw.buckets.index | tail
-n +2 | awk '{print $1}'`; do echo -n "$i: "; ceph pg $i query |grep
num_large_omap_objects | head -1 | awk '{print $2}'; done | grep ": 1"
137.1b: 1
137.36: 1

# cat buckets
#!/bin/bash
buckets=`radosgw-admin metadata list bucket |grep \" | cut -d\" -f2`
for i in $buckets
do
id=`radosgw-admin bucket stats --bucket $i |grep \"id\" | cut -d\" -f4`
pg=`ceph osd map .bbp-gva-master.rgw.buckets.index ${id} | awk
'{print $11}' | cut -d\( -f2 | cut -d\) -f1`
echo "$i:$id:$pg"
done
# ./buckets > pglist
# egrep '137.1b|137.36' pglist |wc -l
192

The following doesn't appear to change anything

# for bucket in `cut -d: -f1 pglist`; do radosgw-admin reshard add
--bucket $bucket --num-shards 8; done

# radosgw-admin reshard process

Re: [ceph-users] Ceph mds memory leak while replay

2018-10-26 Thread Yan, Zheng
Reset the source code and apply the attached patch. It should resolve the
memory issue.

good luck
Yan, Zheng






On Fri, Oct 26, 2018 at 2:41 AM Johannes Schlueter 
wrote:

> Hello,
>
> os: ubuntu bionic lts
> ceph v12.2.7 luminous (on one node we updated to ceph-mds 12.2.8 with no
> luck)
> 2 mds and 1 backup mds
>
> we just experienced a problem while restarting a mds. As it has begun to
> replay the journal, the node ran out of memory.
> A restart later, after giving about 175GB of swapfile, it still breaks
> down.
>
> As mentioned in a mailing list entry with a similar problem earlier, we
> restarted all mds nodes causing all nodes to leak. Now they just switch
> around as they break down and the backup starts the replay.
>
> Sincerely
>
> Patrick


patch
Description: Binary data


Re: [ceph-users] Large omap objects - how to fix ?

2018-10-26 Thread Alexandru Cucu
Hi,

Sorry to hijack this thread. I have a similar issue also with 12.2.8
recently upgraded from Jewel.

In my case all buckets are within limits:
# radosgw-admin bucket limit check | jq '.[].buckets[].fill_status' | uniq
"OK"

# radosgw-admin bucket limit check | jq
'.[].buckets[].objects_per_shard'  | sort -n | uniq
0
1
30
109
516
5174
50081
50088
50285
50323
50336
51826

rgw_max_objs_per_shard is set to the default of 100k
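
For reference, the buckets behind the ~50k values can be pulled out of the same
output with a filter like:

# radosgw-admin bucket limit check | jq '.[].buckets[] | select(.objects_per_shard > 40000)'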

---
Alex Cucu

On Fri, Oct 26, 2018 at 4:09 PM Ben Morrice  wrote:
>
> Hello all,
>
> After a recent Luminous upgrade (now running 12.2.8 with all OSDs
> migrated to bluestore, upgraded from 11.2.0 and running filestore) I am
> currently experiencing the warning 'large omap objects'.
> I know this is related to large buckets in radosgw, and luminous
> supports 'dynamic sharding' - however I feel that something is missing
> from our configuration and I'm a bit confused on what the right approach
> is to fix it.
>
> First a bit of background info:
>
> We previously had a multi site radosgw installation, however recently we
> decommissioned the second site. With the radosgw multi-site
> configuration we had 'bucket_index_max_shards = 0'. Since
> decommissioning the second site, I have removed the secondary zonegroup
> and changed 'bucket_index_max_shards' to be 16 for the single primary zone.
> All our buckets do not have a 'num_shards' field when running
> 'radosgw-admin bucket stats --bucket '
> Is this normal ?
>
> Also - I'm finding it difficult to find out exactly what to do with the
> buckets that are affected with 'large omap' (see commands below).
> My interpretation of 'search the cluster log' is also listed below.
>
> What do I need to do with the below buckets to get back to an overall
> ceph HEALTH_OK state? :)
>
>
> # ceph health detail
> HEALTH_WARN 2 large omap objects
> 2 large objects found in pool '.bbp-gva-master.rgw.buckets.index'
> Search the cluster log for 'Large omap object found' for more details.
>
> # ceph osd pool get .bbp-gva-master.rgw.buckets.index pg_num
> pg_num: 64
>
> # for i in `ceph pg ls-by-pool .bbp-gva-master.rgw.buckets.index | tail
> -n +2 | awk '{print $1}'`; do echo -n "$i: "; ceph pg $i query |grep
> num_large_omap_objects | head -1 | awk '{print $2}'; done | grep ": 1"
> 137.1b: 1
> 137.36: 1
>
> # cat buckets
> #!/bin/bash
> buckets=`radosgw-admin metadata list bucket |grep \" | cut -d\" -f2`
> for i in $buckets
> do
>id=`radosgw-admin bucket stats --bucket $i |grep \"id\" | cut -d\" -f4`
>pg=`ceph osd map .bbp-gva-master.rgw.buckets.index ${id} | awk
> '{print $11}' | cut -d\( -f2 | cut -d\) -f1`
>echo "$i:$id:$pg"
> done
> # ./buckets > pglist
> # egrep '137.1b|137.36' pglist |wc -l
> 192
>
> The following doesn't appear to change anything
>
> # for bucket in `cut -d: -f1 pglist`; do radosgw-admin reshard add
> --bucket $bucket --num-shards 8; done
>
> # radosgw-admin reshard process
>
>
>
> --
> Kind regards,
>
> Ben Morrice
>
> __
> Ben Morrice | e: ben.morr...@epfl.ch | t: +41-21-693-9670
> EPFL / BBP
> Biotech Campus
> Chemin des Mines 9
> 1202 Geneva
> Switzerland
>


Re: [ceph-users] Migrate/convert replicated pool to EC?

2018-10-26 Thread David Turner
It is indeed adding a placement target, not removing or replacing the existing
pool. The get/put wouldn't be a rados or even a ceph command; you would do it
through an S3 client.
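
For a single object that would be something along the lines of a copy onto
itself with the AWS CLI - endpoint and names below are placeholders, and
whether a self-copy is enough to land the data in the new placement is worth
verifying on a test bucket first:

aws --endpoint-url http://rgw.example.com s3 cp \
    s3://mybucket/mykey s3://mybucket/mykey \
    --metadata-directive REPLACE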

On Fri, Oct 26, 2018, 9:38 AM Matthew Vernon  wrote:

> Hi,
>
> On 26/10/2018 12:38, Alexandru Cucu wrote:
>
> > Have a look at this article:
> > https://ceph.com/geen-categorie/ceph-pool-migration/
>
> Thanks; that all looks pretty hairy especially for a large pool (ceph df
> says 1353T / 428,547,935 objects)...
>
> ...so something a bit more controlled/gradual and less
> manual-error-prone would make me happier!
>
> Regards,
>
> Matthew
>


Re: [ceph-users] Migrate/convert replicated pool to EC?

2018-10-26 Thread Matthew Vernon
Hi,

On 26/10/2018 12:38, Alexandru Cucu wrote:

> Have a look at this article:
> https://ceph.com/geen-categorie/ceph-pool-migration/

Thanks; that all looks pretty hairy especially for a large pool (ceph df
says 1353T / 428,547,935 objects)...

...so something a bit more controlled/gradual and less
manual-error-prone would make me happier!

Regards,

Matthew





Re: [ceph-users] Migrate/convert replicated pool to EC?

2018-10-26 Thread Matthew Vernon
Hi,

On 25/10/2018 17:57, David Turner wrote:
> There are no tools to migrate in either direction between EC and
> Replica. You can't even migrate an EC pool to a new EC profile.

Oh well :-/

> With RGW you can create a new data pool and new objects will be written
> to the new pool. If your objects have a lifecycle, then eventually
> you'll be moved to the new pool over time. Otherwise you can get there by
> rewriting all of the objects manually.

How does this work? I presume if I just change data_pool then everyone
will lose things currently in S3? So I guess this would be adding
another placement_target (can it share an index pool, or do I need a new
one of those too?) with the new pool, and making it the default_placement...

If I do that, is there a way to do manual migration of objects in
parallel? I imagine a dumb rados get/put or similar won't do the correct
plumbing...

Thanks,

Matthew





[ceph-users] Large omap objects - how to fix ?

2018-10-26 Thread Ben Morrice

Hello all,

After a recent Luminous upgrade (now running 12.2.8 with all OSDs 
migrated to bluestore, upgraded from 11.2.0 and running filestore) I am 
currently experiencing the warning 'large omap objects'.
I know this is related to large buckets in radosgw, and luminous 
supports 'dynamic sharding' - however I feel that something is missing 
from our configuration and I'm a bit confused on what the right approach 
is to fix it.


First a bit of background info:

We previously had a multi site radosgw installation, however recently we 
decommissioned the second site. With the radosgw multi-site 
configuration we had 'bucket_index_max_shards = 0'. Since 
decommissioning the second site, I have removed the secondary zonegroup 
and changed 'bucket_index_max_shards' to be 16 for the single primary zone.
All our buckets do not have a 'num_shards' field when running 
'radosgw-admin bucket stats --bucket '

Is this normal ?

Also - I'm finding it difficult to find out exactly what to do with the 
buckets that are affected with 'large omap' (see commands below).

My interpretation of 'search the cluster log' is also listed below.

What do I need to do with the below buckets to get back to an overall 
ceph HEALTH_OK state? :)



# ceph health detail
HEALTH_WARN 2 large omap objects
2 large objects found in pool '.bbp-gva-master.rgw.buckets.index'
Search the cluster log for 'Large omap object found' for more details.

# ceph osd pool get .bbp-gva-master.rgw.buckets.index pg_num
pg_num: 64

# for i in `ceph pg ls-by-pool .bbp-gva-master.rgw.buckets.index | tail 
-n +2 | awk '{print $1}'`; do echo -n "$i: "; ceph pg $i query |grep 
num_large_omap_objects | head -1 | awk '{print $2}'; done | grep ": 1"

137.1b: 1
137.36: 1

# cat buckets
#!/bin/bash
buckets=`radosgw-admin metadata list bucket |grep \" | cut -d\" -f2`
for i in $buckets
do
  id=`radosgw-admin bucket stats --bucket $i |grep \"id\" | cut -d\" -f4`
  pg=`ceph osd map .bbp-gva-master.rgw.buckets.index ${id} | awk 
'{print $11}' | cut -d\( -f2 | cut -d\) -f1`

  echo "$i:$id:$pg"
done
# ./buckets > pglist
# egrep '137.1b|137.36' pglist |wc -l
192
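
I guess I could also count omap keys per index object to spot the offending
ones directly - a rough, untested loop:

rados -p .bbp-gva-master.rgw.buckets.index ls | while read obj; do
  echo "$(rados -p .bbp-gva-master.rgw.buckets.index listomapkeys $obj | wc -l) $obj"
done | sort -n | tail -5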

The following doesn't appear to change anything

# for bucket in `cut -d: -f1 pglist`; do radosgw-admin reshard add 
--bucket $bucket --num-shards 8; done


# radosgw-admin reshard process
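
Is checking with the following the right way to see whether the reshard did
anything (<bucket> being one of the affected buckets)?

# radosgw-admin reshard list
# radosgw-admin reshard status --bucket=<bucket>
# radosgw-admin bucket stats --bucket=<bucket> | grep num_shards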



--
Kind regards,

Ben Morrice

__
Ben Morrice | e: ben.morr...@epfl.ch | t: +41-21-693-9670
EPFL / BBP
Biotech Campus
Chemin des Mines 9
1202 Geneva
Switzerland



Re: [ceph-users] Migrate/convert replicated pool to EC?

2018-10-26 Thread Alexandru Cucu
Hi,

Have a look at this article:
https://ceph.com/geen-categorie/ceph-pool-migration/

---
Alex Cucu

On Thu, Oct 25, 2018 at 7:31 PM Matthew Vernon  wrote:
>
> Hi,
>
> I thought I'd seen that it was possible to migrate a replicated pool to
> being erasure-coded (but not the converse); but I'm failing to find
> anything that says _how_.
>
> Have I misremembered? Can you migrate a replicated pool to EC? (if so, how?)
>
> ...our use case is moving our S3 pool which is quite large, so if we can
> convert in-place that would be ideal...
>
> Thanks,
>
> Matthew
>
>
> --
>  The Wellcome Sanger Institute is operated by Genome Research
>  Limited, a charity registered in England with number 1021457 and a
>  company registered in England with number 2742969, whose registered
>  office is 215 Euston Road, London, NW1 2BE.


Re: [ceph-users] ceph df space usage confusion - balancing needed?

2018-10-26 Thread Oliver Freyermuth
Dear Cephalopodians,

thanks for all your feedback!

I finally "pushed the button" and let upmap run for ~36 hours. 
Previously, we had ~63 % usage of our CephFS with only 50 % raw usage, now, we 
see only 53.77 % usage. 

That's as close as I expect things to ever become, and we gained about 70 TiB 
of free storage by that, which is almost one file server. 
So the outcome is really close to perfection :-). 

I'm leaving the balancer active now in upmap mode. Any bad experiences with 
leaving it active "forever"? 
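
For the record, enabling it here was just the standard sequence from the docs
(the min-compat-client step being what upmap mode requires):

# ceph osd set-require-min-compat-client luminous
# ceph balancer mode upmap
# ceph balancer on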

Cheers and many thanks again,
Oliver

On 23.10.18 at 01:14, Linh Vu wrote:
> Upmap is awesome. I ran it on our new cluster before we started ingesting 
> data, so that the PG count is balanced on all OSDs. After ingesting about 
> 315TB, it's still beautifully balanced. Note: we have a few nodes with 8TB 
> OSDs, and the rest on 10TBs. 
> 
> 
> # ceph osd df plain
> ID  CLASS    WEIGHT  REWEIGHT SIZE    USE     AVAIL   %USE  VAR  PGS 
>   0   mf1hdd 7.27739  1.0 7.28TiB 2.06TiB 5.21TiB 28.34 1.01 144 
>   1   mf1hdd 7.27739  1.0 7.28TiB 2.07TiB 5.21TiB 28.38 1.02 144 
>   2   mf1hdd 7.27739  1.0 7.28TiB 2.03TiB 5.24TiB 27.96 1.00 142 
>   3   mf1hdd 7.27739  1.0 7.28TiB 2.06TiB 5.21TiB 28.37 1.02 144 
>   4   mf1hdd 7.27739  1.0 7.28TiB 2.03TiB 5.24TiB 27.96 1.00 142 
>   5   mf1hdd 7.27739  1.0 7.28TiB 2.02TiB 5.26TiB 27.73 0.99 141 
>   6   mf1hdd 7.27739  1.0 7.28TiB 2.03TiB 5.24TiB 27.94 1.00 142 
>   7   mf1hdd 7.27739  1.0 7.28TiB 2.06TiB 5.21TiB 28.35 1.02 144 
>   8   mf1hdd 7.27739  1.0 7.28TiB 2.02TiB 5.26TiB 27.76 0.99 141 
>   9   mf1hdd 7.27739  1.0 7.28TiB 2.04TiB 5.24TiB 27.97 1.00 142 
>  10   mf1hdd 7.27739  1.0 7.28TiB 2.06TiB 5.21TiB 28.35 1.02 144 
>  11   mf1hdd 7.27739  1.0 7.28TiB 2.04TiB 5.24TiB 27.99 1.00 142 
>  12   mf1hdd 7.27739  1.0 7.28TiB 2.02TiB 5.26TiB 27.75 0.99 141 
>  13   mf1hdd 7.27739  1.0 7.28TiB 2.03TiB 5.24TiB 27.96 1.00 142 
>  14   mf1hdd 7.27739  1.0 7.28TiB 2.02TiB 5.26TiB 27.78 0.99 141 
>  15   mf1hdd 7.27739  1.0 7.28TiB 2.07TiB 5.21TiB 28.38 1.02 144 
> 224 nvmemeta 0.02179  1.0 22.3GiB 1.52GiB 20.8GiB  6.82 0.24 185 
> 225 nvmemeta 0.02179  1.0 22.4GiB 1.49GiB 20.9GiB  6.68 0.24 182 
> 144   mf1hdd 8.91019  1.0 8.91TiB 2.48TiB 6.43TiB 27.81 1.00 173 
> 145   mf1hdd 8.91019  1.0 8.91TiB 2.48TiB 6.43TiB 27.84 1.00 173 
> 146   mf1hdd 8.91019  1.0 8.91TiB 2.48TiB 6.43TiB 27.84 1.00 173 
> 147   mf1hdd 8.91019  1.0 8.91TiB 2.48TiB 6.43TiB 27.82 1.00 173 
> 148   mf1hdd 8.91019  1.0 8.91TiB 2.49TiB 6.42TiB 27.98 1.00 174 
> 149   mf1hdd 8.91019  1.0 8.91TiB 2.50TiB 6.41TiB 28.01 1.00 174 
> 150   mf1hdd 8.91019  1.0 8.91TiB 2.51TiB 6.40TiB 28.12 1.01 175 
> 151   mf1hdd 8.91019  1.0 8.91TiB 2.50TiB 6.41TiB 28.01 1.00 174 
> 152   mf1hdd 8.91019  1.0 8.91TiB 2.48TiB 6.43TiB 27.87 1.00 173 
> 153   mf1hdd 8.91019  1.0 8.91TiB 2.51TiB 6.40TiB 28.16 1.01 175 
> 154   mf1hdd 8.91019  1.0 8.91TiB 2.48TiB 6.43TiB 27.79 1.00 173 
> 155   mf1hdd 8.91019  1.0 8.91TiB 2.48TiB 6.43TiB 27.82 1.00 173 
> 156   mf1hdd 8.91019  1.0 8.91TiB 2.48TiB 6.43TiB 27.83 1.00 173 
> 157   mf1hdd 8.91019  1.0 8.91TiB 2.48TiB 6.43TiB 27.83 1.00 173 
> 158   mf1hdd 8.91019  1.0 8.91TiB 2.50TiB 6.41TiB 28.00 1.00 174 
> 159   mf1hdd 8.91019  1.0 8.91TiB 2.48TiB 6.43TiB 27.81 1.00 173 
> 242 nvmemeta 0.02179  1.0 22.3GiB 1.50GiB 20.8GiB  6.70 0.24 182 
> 243 nvmemeta 0.02179  1.0 22.4GiB 1.45GiB 20.9GiB  6.48 0.23 183 
> 160   mf1hdd 8.91019  1.0 8.91TiB 2.48TiB 6.43TiB 27.80 1.00 173 
> 161   mf1hdd 8.91019  1.0 8.91TiB 2.48TiB 6.43TiB 27.83 1.00 173 
> 162   mf1hdd 8.91019  1.0 8.91TiB 2.49TiB 6.42TiB 28.00 1.00 174 
> 163   mf1hdd 8.91019  1.0 8.91TiB 2.48TiB 6.43TiB 27.83 1.00 173 
> 164   mf1hdd 8.91019  1.0 8.91TiB 2.48TiB 6.43TiB 27.84 1.00 173 
> 165   mf1hdd 8.91019  1.0 8.91TiB 2.48TiB 6.43TiB 27.84 1.00 173 
> 166   mf1hdd 8.91019  1.0 8.91TiB 2.48TiB 6.43TiB 27.84 1.00 173 
> 167   mf1hdd 8.91019  1.0 8.91TiB 2.49TiB 6.42TiB 28.00 1.00 174 
> 168   mf1hdd 8.91019  1.0 8.91TiB 2.50TiB 6.41TiB 28.01 1.00 174 
> 169   mf1hdd 8.91019  1.0 8.91TiB 2.48TiB 6.43TiB 27.86 1.00 173 
> 170   mf1hdd 8.91019  1.0 8.91TiB 2.51TiB 6.40TiB 28.12 1.01 175 
> 171   mf1hdd 8.91019  1.0 8.91TiB 2.48TiB 6.43TiB 27.85 1.00 173 
> 172   mf1hdd 8.91019  1.0 8.91TiB 2.49TiB 6.42TiB 27.98 1.00 174 
> 173   mf1hdd 8.91019  1.0 8.91TiB 2.48TiB 6.43TiB 27.85 1.00 173 
> 174   mf1hdd 8.91019  1.0 8.91TiB 2.48TiB 6.43TiB 27.81 1.00 173 
> 175   mf1hdd 8.91019  1.0 8.91TiB 2.48TiB 6.43TiB 27.80 1.00 173 
> 244 nvmemeta 0.02179  1.0 22.3GiB 1.49GiB 20.9GiB  6.65 0.24 182 
> 245 nvmemeta 0.02179  1.0 22.4GiB 1.53GiB 20.8GiB  6.86 0.25 185 
> 176   mf1hdd 8.91019  1.0 8.91TiB 2.48TiB 6.43TiB 27.83 1.00 173 
> 177   mf1hdd 8.91019  1.0 8.91TiB 2.48TiB 6.43TiB 27.82 1.00 173

Re: [ceph-users] RGW how to delete orphans

2018-10-26 Thread Florian Engelmann

Hi,

we've got the same problem here. Our 12.2.5 RadosGWs crashed 
(unnoticed by us) about 30,000 times with ongoing multipart uploads. 
After a couple of days we ended up with:


xx-1.rgw.buckets.data   6   N/A   N/A   116TiB   87.22   17.1TiB   36264870   36.26M   3.63GiB   148MiB   194TiB


116TB data (194TB raw) while only:

for i in $(radosgw-admin bucket list | jq -r '.[]'); do  radosgw-admin 
bucket stats --bucket=$i | jq '.usage | ."rgw.main" | .size_kb' ; done | 
awk '{ SUM += $1} END { print SUM/1024/1024/1024 }'


46.0962

116 - 46 = 70TB

So 70TB of objects are orphans, right?

And there are 36.264.870 objects in our rgw.buckets.data pool.

So we started:

radosgw-admin orphans list-jobs --extra-info
[
{
"orphan_search_state": {
"info": {
"orphan_search_info": {
"job_name": "check-orph",
"pool": "zh-1.rgw.buckets.data",
"num_shards": 64,
"start_time": "2018-10-10 09:01:14.746436Z"
}
},
"stage": {
"orphan_search_stage": {
"search_stage": "iterate_bucket_index",
"shard": 0,
"marker": ""
}
}
}
}
]

writing stdout to: orphans.txt

I am not sure about how to interpret the output but:

cat orphans.txt | awk '/^storing / { SUM += $2} END { print SUM }'
2145042765

So how to interpret those output lines:
...
storing 16 entries at orphan.scan.check-orph.linked.62
storing 19 entries at orphan.scan.check-orph.linked.63
storing 13 entries at orphan.scan.check-orph.linked.0
storing 13 entries at orphan.scan.check-orph.linked.1
...

Is it like

"I am storing 16 'healthy' object 'names' to the shard 
orphan.scan.check-orph.linked.62"


Are these objects? What is meant by "entries"? Where are those "shards"? Are 
they files or objects in a pool? How can I tell the progress of 
"orphans find"? Is the job still doing the right thing? What is the estimated 
time to run on SATA disks with 194TB RAW?


The orphans find command has already stored 2,145,042,765 (more than 2 
billion) "entries"... while there are "only" 36 million objects...


Is the process still healthy and doing the right thing?

All the best,
Florian





On 10/3/17 at 10:48 AM, Andreas Calminder wrote:
The output, to stdout, is something like leaked: $objname. Am I supposed 
to pipe it to a log, grep for leaked: and pipe it to rados delete? Or am 
I supposed to dig around in the log pool to try and find the objects 
there? The information available is quite vague. Maybe Yehuda can shed 
some light on this issue?
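
I.e., is the intended workflow something along these lines (pool and job names
below are just placeholders)?

radosgw-admin orphans find --pool=default.rgw.buckets.data --job-id=orphans1 > orphans.log
grep '^leaked:' orphans.log | awk '{print $2}' | \
  while read obj; do rados -p default.rgw.buckets.data rm "$obj"; done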


Best regards,
/Andreas

On 3 Oct 2017 06:25, "Christian Wuerdig"  wrote:


yes, at least that's how I'd interpret the information given in this
thread:

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-February/016521.html



On Tue, Oct 3, 2017 at 1:11 AM, Webert de Souza Lima
<webert.b...@gmail.com> wrote:
 > Hey Christian,
 >
 >> On 29 Sep 2017 12:32 a.m., "Christian Wuerdig"
 >> <christian.wuer...@gmail.com> wrote:
 >>>
 >>> I'm pretty sure the orphan find command does exactly just that -
 >>> finding orphans. I remember some emails on the dev list where
Yehuda
 >>> said he wasn't 100% comfortable of automating the delete just yet.
 >>> So the purpose is to run the orphan find tool and then delete the
 >>> orphaned objects once you're happy that they all are actually
 >>> orphaned.
 >>>
 >
 > so what you mean is that one should manually remove the result listed
 > objects that are output?
 >
 >
 > Regards,
 >
 > Webert Lima
 > DevOps Engineer at MAV Tecnologia
 > Belo Horizonte - Brasil
 >
 >




Re: [ceph-users] Ceph mds memory leak while replay

2018-10-26 Thread Yan, Zheng
On Fri, Oct 26, 2018 at 3:53 PM Johannes Schlueter
 wrote:
>
> Hello,
> thanks for the reply.
> Before the restart there was HEALTH OK and for a few moments "slow request".
>
> Maybe helpful:
> #
> Events by type:
>   COMMITED: 12188
>   EXPORT: 196
>   IMPORTFINISH: 197
>   IMPORTSTART: 197
>   OPEN: 28096
>   SESSION: 2
>   SESSIONS: 64
>   SLAVEUPDATE: 8440
>   SUBTREEMAP: 256
>   UPDATE: 124222
> Errors: 0
>

how about rank1 (cephfs-journal-tool --rank 1 event get summary)



> Yan, Zheng  wrote on Fri., 26 Oct 2018, 09:13:
>>
>> On Fri, Oct 26, 2018 at 2:41 AM Johannes Schlueter
>>  wrote:
>> >
>> > Hello,
>> >
>> > os: ubuntu bionic lts
>> > ceph v12.2.7 luminous (on one node we updated to ceph-mds 12.2.8 with no 
>> > luck)
>> > 2 mds and 1 backup mds
>> >
>> > we just experienced a problem while restarting a mds. As it has begun to 
>> > replay the journal, the node ran out of memory.
>> > A restart later, after giving about 175GB of swapfile, it still breaks 
>> > down.
>> >
>> > As mentioned in a mailing list entry with a similar problem earlier, we
>> > restarted all mds nodes causing all nodes to leak. Now they just switch 
>> > around as they break down and the backup starts the replay.
>> >
>>
>> Did you see warning "Behind on trimming" before mds restart?
>>
>>
>>
>> > Sincerely
>> >
>> > Patrick


Re: [ceph-users] ceph df space usage confusion - balancing needed?

2018-10-26 Thread Konstantin Shalygin

upmap has been amazing and balanced my clusters far better than anything
else I've ever seen.  I would go so far as to say that upmap can achieve a
perfect balance.



Upmap is awesome. I ran it on our new cluster before we started ingesting data, 
so that the PG count is balanced on all OSDs.


Guys, do you remember whether the mainline or el kernel (krbd) already supports 
upmap?
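
The only check I know of so far is looking at what is connected and letting the
min-compat-client guard complain if something is too old (a sketch):

# shows features/releases of connected clients, including kernel clients
ceph features
# refuses if clients older than luminous are still connected
ceph osd set-require-min-compat-client luminous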




k



Re: [ceph-users] Ceph mds memory leak while replay

2018-10-26 Thread Johannes Schlueter
Hello,
thanks for the reply.
Before the restart there was HEALTH OK and for a few moments "slow request".

Maybe helpful:
# cephfs-journal-tool event get summary
Events by type:
  COMMITED: 12188
  EXPORT: 196
  IMPORTFINISH: 197
  IMPORTSTART: 197
  OPEN: 28096
  SESSION: 2
  SESSIONS: 64
  SLAVEUPDATE: 8440
  SUBTREEMAP: 256
  UPDATE: 124222
Errors: 0

Yan, Zheng  wrote on Fri., 26 Oct 2018, 09:13:

> On Fri, Oct 26, 2018 at 2:41 AM Johannes Schlueter
>  wrote:
> >
> > Hello,
> >
> > os: ubuntu bionic lts
> > ceph v12.2.7 luminous (on one node we updated to ceph-mds 12.2.8 with no
> luck)
> > 2 mds and 1 backup mds
> >
> > we just experienced a problem while restarting a mds. As it has begun to
> replay the journal, the node ran out of memory.
> > A restart later, after giving about 175GB of swapfile, it still breaks
> down.
> >
> > As mentioned in a mailing list entry with a similar problem earlier, we
> restarted all mds nodes causing all nodes to leak. Now they just switch
> around as they break down and the backup starts the replay.
> >
>
> Did you see warning "Behind on trimming" before mds restart?
>
>
>
> > Sincerely
> >
> > Patrick


Re: [ceph-users] Ceph mds memory leak while replay

2018-10-26 Thread Yan, Zheng
On Fri, Oct 26, 2018 at 2:41 AM Johannes Schlueter
 wrote:
>
> Hello,
>
> os: ubuntu bionic lts
> ceph v12.2.7 luminous (on one node we updated to ceph-mds 12.2.8 with no luck)
> 2 mds and 1 backup mds
>
> we just experienced a problem while restarting a mds. As it has begun to 
> replay the journal, the node ran out of memory.
> A restart later, after giving about 175GB of swapfile, it still breaks down.
>
> As mentioned in a mailing list entry with a similar problem earlier, we restarted 
> all mds nodes causing all nodes to leak. Now they just switch around as they 
> break down and the backup starts the replay.
>

Did you see warning "Behind on trimming" before mds restart?



> Sincerely
>
> Patrick