[ceph-users] Re: Frequest LARGE_OMAP_OBJECTS in cephfs metadata pool

2020-02-24 Thread Uday Bhaskar jalagam
Thanks Patrick, 

is this the bug you are referring to https://tracker.ceph.com/issues/42515 ? 

We also see performance issues mainly on metadata operations like finding file 
stats operations , however mds perf dump shows no sign of any latencies . could 
this bug cause any performance issues ? here is the perf dump metrics . 

https://pastebin.com/178anAe1

do you see any clue in this that could cause slow down in such operations ?  
our metadara pool has around 1.7 GB of data I gave mds cache 3 GB , I am not 
sure where to check how much used in the 3 GB or what is hit and miss 
count/ration in cache . 
We have huge cluster , there is definitely not enough IO that could saturate 
actual disk capacity so it is definitely MDS , not sure what to check here  to 
pin point the issues.  

Could you point me where I can start to go deep in troubleshooting this ?

Thanks,
Uday.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Frequest LARGE_OMAP_OBJECTS in cephfs metadata pool

2020-02-24 Thread Patrick Donnelly
It's probably a recently fixed openfiletable bug. Please upgrade to
v14.2.8 when it is released in the next week or so.

On Mon, Feb 24, 2020 at 1:46 PM Uday Bhaskar jalagam
 wrote:
>
> Hello Patrick,
>
> File system created around 4 months back. Using ceph version 14.2.3 version.
>
> [root@knode25 /]# ceph fs dump
> dumped fsmap epoch 577
> e577
> enable_multiple, ever_enabled_multiple: 0,0
> compat: compat={},rocompat={},incompat={1=base v0.20,2=client writeable 
> ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds 
> uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file 
> layout v2,10=snaprealm v2}
> legacy client fscid: 1
>
> Filesystem 'cephfs01' (1)
> fs_name cephfs01
> epoch   577
> flags   32
> created 2019-10-18 23:59:29.610249
> modified2020-02-22 03:13:09.425905
> tableserver 0
> root0
> session_timeout 60
> session_autoclose   300
> max_file_size   1099511627776
> min_compat_client   -1 (unspecified)
> last_failure0
> last_failure_osd_epoch  1608
> compat  compat={},rocompat={},incompat={1=base v0.20,2=client writeable 
> ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds 
> uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file 
> layout v2,10=snaprealm v2}
> max_mds 1
> in  0
> up  {0=2981519}
> failed
> damaged
> stopped
> data_pools  [2]
> metadata_pool   1
> inline_data disabled
> balancer
> standby_count_wanted1
> 2981519:
> [v2:10.131.16.30:6808/3209191719,v1:10.131.16.30:6809/3209191719] 
> 'cephfs01-b' mds.0.572 up:active seq 22141
> 2998684:[v2:10.131.16.89:6832/54557615,v1:10.131.16.89:6833/54557615] 
> 'cephfs01-a' mds.0.0 up:standby-replay seq 2
>
>
> [root@knode25 /]# ceph fs status
> cephfs01 - 290 clients
> 
> +--+++---+---+---+
> | Rank | State  |MDS |Activity   |  dns  |  inos |
> +--+++---+---+---+
> |  0   | active | cephfs01-b | Reqs:  333 /s | 2738k | 2735k |
> | 0-s  | standby-replay | cephfs01-a | Evts:  795 /s | 1368k | 1363k |
> +--+++---+---+---+
> +---+--+---+---+
> |Pool   |   type   |  used | avail |
> +---+--+---+---+
> | cephfs01-metadata | metadata | 2193M | 78.1T |
> |   cephfs01-data0  |   data   |  753G | 78.1T |
> +---+--+---+---+
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 
Patrick Donnelly, Ph.D.
He / Him / His
Senior Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Frequest LARGE_OMAP_OBJECTS in cephfs metadata pool

2020-02-24 Thread Uday Bhaskar jalagam
Hello Patrick, 

File system created around 4 months back. Using ceph version 14.2.3 version. 

[root@knode25 /]# ceph fs dump
dumped fsmap epoch 577
e577
enable_multiple, ever_enabled_multiple: 0,0
compat: compat={},rocompat={},incompat={1=base v0.20,2=client writeable 
ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses 
versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout 
v2,10=snaprealm v2}
legacy client fscid: 1

Filesystem 'cephfs01' (1)
fs_name cephfs01
epoch   577
flags   32
created 2019-10-18 23:59:29.610249
modified2020-02-22 03:13:09.425905
tableserver 0
root0
session_timeout 60
session_autoclose   300
max_file_size   1099511627776
min_compat_client   -1 (unspecified)
last_failure0
last_failure_osd_epoch  1608
compat  compat={},rocompat={},incompat={1=base v0.20,2=client writeable 
ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses 
versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout 
v2,10=snaprealm v2}
max_mds 1
in  0
up  {0=2981519}
failed
damaged
stopped
data_pools  [2]
metadata_pool   1
inline_data disabled
balancer
standby_count_wanted1
2981519:
[v2:10.131.16.30:6808/3209191719,v1:10.131.16.30:6809/3209191719] 'cephfs01-b' 
mds.0.572 up:active seq 22141
2998684:[v2:10.131.16.89:6832/54557615,v1:10.131.16.89:6833/54557615] 
'cephfs01-a' mds.0.0 up:standby-replay seq 2


[root@knode25 /]# ceph fs status
cephfs01 - 290 clients

+--+++---+---+---+
| Rank | State  |MDS |Activity   |  dns  |  inos |
+--+++---+---+---+
|  0   | active | cephfs01-b | Reqs:  333 /s | 2738k | 2735k |
| 0-s  | standby-replay | cephfs01-a | Evts:  795 /s | 1368k | 1363k |
+--+++---+---+---+
+---+--+---+---+
|Pool   |   type   |  used | avail |
+---+--+---+---+
| cephfs01-metadata | metadata | 2193M | 78.1T |
|   cephfs01-data0  |   data   |  753G | 78.1T |
+---+--+---+---+
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Changing allocation size

2020-02-24 Thread Kristof Coucke
Hi all,

A while back, I indicated we had an issue with our cluster filling up too
fast. After checking everything, we've concluded this was because we had a
lot of small files and the allocation size on the bluestore was too high
(64kb).
We are now recreating the OSD's (2 disk at the same time) but, this will
take a very long time as we're dealing with 130 OSDs.
The current process we're following is removing 2 osd's and recreating them.
We're using erasure coding (6 + 3).

Has anyone some advice on how we can move forward with this? We've already
increased some parameters to speed up recovery, but even then, it would
still cost us too much time.

If we could recreate them faster, that would be great... Or adapt the
allocation on the fly?

Any suggestions are welcome...

Thank you,

Kristof.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Frequest LARGE_OMAP_OBJECTS in cephfs metadata pool

2020-02-24 Thread Patrick Donnelly
On Mon, Feb 24, 2020 at 11:14 AM Uday Bhaskar jalagam
 wrote:
>
> Hello Team ,
>
> I am getting frequent LARGE_OMAP_OBJECTS 1 large omap objects in one of my 
> cephfs metadata pools , anyone can explain why would this pool getting into 
> this state frequently and how could I prevent this in future ?
>
> # ceph health detail
> HEALTH_WARN 1 large omap objects
> LARGE_OMAP_OBJECTS 1 large omap objects
> 1 large objects found in pool 'cephfs01-metadata'
> Search the cluster log for 'Large omap object found' for more details.

When was the file system created? What version is running? Please also
share `ceph fs dump`.

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Senior Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Frequest LARGE_OMAP_OBJECTS in cephfs metadata pool

2020-02-24 Thread Uday Bhaskar jalagam
Hello Team ,

I am getting frequent LARGE_OMAP_OBJECTS 1 large omap objects in one of my 
cephfs metadata pools , anyone can explain why would this pool getting into 
this state frequently and how could I prevent this in future ? 

# ceph health detail
HEALTH_WARN 1 large omap objects
LARGE_OMAP_OBJECTS 1 large omap objects
1 large objects found in pool 'cephfs01-metadata'
Search the cluster log for 'Large omap object found' for more details.

Thanks , 
Uday
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Limited performance

2020-02-24 Thread Fabian Zimmermann
Hi,

we currently creating a new cluster. This cluster is (as far as we can
tell) an config-copy (ansible) of our existing cluster, just 5 years later
- with new hardware (nvme instead of ssd, bigger disks, ...)

The setup:

* NVMe for Journals and "Cache"-Pool
* HDD with NVMe Journals for "Data"-Pool
* Cache-Pool as writeback-Tier on Data-Pool
* We are using 12.2.13 without bluestore.

If we run an rados benchmark against this pool, everything seems fine, but
as soon as we start a fio-benchmark

-<-
[global]
ioengine=rbd
clientname=cinder
pool=cinder
rbdname=fio_test
rw=write
bs=4M

[rbd_iodepth32]
iodepth=32
->-

after some seconds the bandwidth drops to <15 MB/s and our hdd-disks are
doing more IOs than our Journal-Disks.
We also unconfigured the caching completely, but the issue remains.

The output of "ceph osd pool stats" shows ~100 op/s, but our disks are
doing:
-<-
Device: rrqm/s   wrqm/s r/s w/srMB/swMB/s avgrq-sz
avgqu-sz   await r_await w_await  svctm  %util
nvme0n1   0.00 0.000.00  278.50 0.0034.07   250.51
0.140.500.000.50   0.03   0.80
nvme1n1   0.00 0.000.00   64.00 0.00 7.77   248.50
0.010.220.000.22   0.03   0.20
sda   0.00 1.500.00  557.00 0.0029.49   108.45
  180.57  160.590.00  160.59   1.80 100.00
sdb   0.0042.000.00  592.00 0.0028.2197.60
  176.51 1105.790.00 1105.79   1.69 100.00
sdc   0.0014.500.00  528.50 0.0027.95   108.31
  183.02  179.470.00  179.47   1.89 100.00
sde   0.00   134.500.00  223.50 0.0014.05   128.72
   17.38   60.050.00   60.05   0.89  20.00
sdg   0.0076.000.00  492.00 0.0026.32   109.54
  191.81 1474.960.00 1474.96   2.03 100.00
sdf   0.00 0.000.00  491.50 0.0026.76   111.49
  176.55  326.050.00  326.05   2.03 100.00
sdh   0.00 0.000.00  548.50 0.0026.7199.75
  204.39  327.570.00  327.57   1.82 100.00
sdi   0.00   112.000.00  526.00 0.0023.1590.14
  158.32 1325.610.00 1325.61   1.90 100.00
sdj   0.0012.000.00  641.00 0.0034.78   111.13
  185.51  278.290.00  278.29   1.56 100.00
sdk   0.0023.500.00  399.50 0.0020.38   104.46
  166.77  461.670.00  461.67   2.50 100.00
sdl   0.00   267.000.00  498.50 0.0034.46   141.58
  200.37  490.800.00  490.80   2.01 100.00
->-

Any hints how to debug the issue?

Thanks a lot,

 Fabian
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Migrating data to a more efficient EC pool

2020-02-24 Thread Vladimir Brik

Hello

I have ~300TB of data in default.rgw.buckets.data k2m2 pool and I would 
like to move it to a new k5m2 pool.


I found instructions using cache tiering[1], but they come with a vague 
scary warning, and it looks like EC-EC may not even be possible [2] (is 
it still the case?).


Can anybody recommend a safe procedure to copy an EC pool's data to 
another pool with a more efficient erasure coding? Perhaps there is a 
tool out there that could do it?


A few days of downtime would be tolerable, if it will simplify things. 
Also, I have enough free space to temporarily store the k2m2 data in a 
replicated pool (if EC-EC tiering is not possible, but EC-replicated and 
replicated-EC tiering is possible).


Is there a tool or some efficient way to verify that the content of two 
pools is the same?



Thanks,

Vlad

[1] https://ceph.io/geen-categorie/ceph-pool-migration/
[2] 
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-February/016109.html

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph @ SoCal Linux Expo

2020-02-24 Thread Gregory Farnum
Hey all, we're excited to be returning properly to SCaLE in
Pasadena[1] this year (March 5-8) with a Thursday Birds-of-a-Feather
session[2] and a booth in the expo hall. Please come by if you're
attending the conference or are in the area to get face time with
other area users and Ceph developers. :)

Also, I got drafted into organizing this so if you'd be willing to
help man the booth in exchange for an Expo pass, shoot me an email! I
think I've got 3 spots left.
-Greg
[1]: https://www.socallinuxexpo.org/scale/18x
[2]: https://www.socallinuxexpo.org/scale/18x/presentations/ceph-storage
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph-mon using 100% CPU after upgrade to 14.2.5

2020-02-24 Thread Dan van der Ster
Hi Bryan,

Did you ever learn more about this, or see it again?
I'm facing 100% ceph-mon CPU usage now, and putting my observations
here: https://tracker.ceph.com/issues/42830

Cheers, Dan

On Mon, Dec 16, 2019 at 10:58 PM Bryan Stillwell  wrote:
>
> Sasha,
>
> I was able to get past it by restarting the ceph-mon processes every time it 
> got stuck, but that's not a very good solution for a production cluster.
>
> Right now I'm trying to narrow down what is causing the problem.  Rebuilding 
> the OSDs with BlueStore doesn't seem to be enough.  I believe it could be 
> related to us using the extra space on the journal device as an SSD-based 
> OSD.  During the conversion process I'm removing this SSD-based OSD (since 
> with BlueStore the omap data is ending up on the SSD anyways), and I'm 
> suspecting it might be causing this problem.
>
> Bryan
>
> On Dec 14, 2019, at 10:27 AM, Sasha Litvak  
> wrote:
>
> Notice: This email is from an external sender.
>
> Bryan,
>
> Were you able to resolve this?  If yes, can you please share with the list?
>
> On Fri, Dec 13, 2019 at 10:08 AM Bryan Stillwell  
> wrote:
>>
>> Adding the dev list since it seems like a bug in 14.2.5.
>>
>> I was able to capture the output from perf top:
>>
>>   21.58%  libceph-common.so.0   [.] 
>> ceph::buffer::v14_2_0::list::append
>>   20.90%  libstdc++.so.6.0.19   [.] std::getline> std::char_traits, std::allocator >
>>   13.25%  libceph-common.so.0   [.] 
>> ceph::buffer::v14_2_0::list::append
>>   10.11%  libstdc++.so.6.0.19   [.] std::istream::sentry::sentry
>>8.94%  libstdc++.so.6.0.19   [.] std::basic_ios> std::char_traits >::clear
>>3.24%  libceph-common.so.0   [.] 
>> ceph::buffer::v14_2_0::ptr::unused_tail_length
>>1.69%  libceph-common.so.0   [.] std::getline> std::char_traits, std::allocator >@plt
>>1.63%  libstdc++.so.6.0.19   [.] 
>> std::istream::sentry::sentry@plt
>>1.21%  [kernel]  [k] __do_softirq
>>0.77%  libpython2.7.so.1.0   [.] PyEval_EvalFrameEx
>>0.55%  [kernel]  [k] _raw_spin_unlock_irqrestore
>>
>> I increased mon debugging to 20 and nothing stuck out to me.
>>
>> Bryan
>>
>> > On Dec 12, 2019, at 4:46 PM, Bryan Stillwell  
>> > wrote:
>> >
>> > On our test cluster after upgrading to 14.2.5 I'm having problems with the 
>> > mons pegging a CPU core while moving data around.  I'm currently 
>> > converting the OSDs from FileStore to BlueStore by marking the OSDs out in 
>> > multiple nodes, destroying the OSDs, and then recreating them with 
>> > ceph-volume lvm batch.  This seems too get the ceph-mon process into a 
>> > state where it pegs a CPU core on one of the mons:
>> >
>> > 1764450 ceph  20   0 4802412   2.1g  16980 S 100.0 28.1   4:54.72 
>> > ceph-mon
>> >
>> > Has anyone else run into this with 14.2.5 yet?  I didn't see this problem 
>> > while the cluster was running 14.2.4.
>> >
>> > Thanks,
>> > Bryan
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Unable to increase PG numbers

2020-02-24 Thread Andres Rojas Guerrero
I have tried to increase to 16, with the same result:

# ceph osd pool set cephfs_data pg_num 16
set pool 1 pg_num to 16

# ceph osd pool get cephfs_data pg_num
pg_num: 8


El 24/2/20 a las 15:10, Gabryel Mason-Williams escribió:
> Have you tried making a smaller increment instead of jumping from 8 to 128 as 
> that is quite a big leap?
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
> 

-- 
***
Andrés Rojas Guerrero
Unidad Sistemas Linux
Area Arquitectura Tecnológica
Secretaría General Adjunta de Informática
Consejo Superior de Investigaciones Científicas (CSIC)
Pinar 19
28006 - Madrid
Tel: +34 915680059 -- Ext. 990059
email: a.ro...@csic.es
ID comunicate.csic.es: @50852720l:matrix.csic.es
***
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Unable to increase PG numbers

2020-02-24 Thread Gabryel Mason-Williams
Have you tried making a smaller increment instead of jumping from 8 to 128 as 
that is quite a big leap?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Unable to increase PG numbers

2020-02-24 Thread Andres Rojas Guerrero
Hi, I have a Nautilus installation version 14.2.1 with a very unbalanced
cephfs pool, I have 430 osd in the cluster but this pool only have 8 PG
and PGP and 118 TB used :

# ceph -s
  cluster:
id: a2269da7-e399-484a-b6ae-4ee1a31a4154
health: HEALTH_WARN
1 nearfull osd(s)
2 pool(s) nearfull

  services:
mon: 3 daemons, quorum mon21,mon22,mon23 (age 7M)
mgr: mon23(active, since 8M), standbys: mon22, mon21
mds: cephfs:2 {0=mon21=up:active,1=mon22=up:active} 1 up:standby
osd: 430 osds: 430 up, 430 in

  data:
pools:   2 pools, 16 pgs
objects: 10.07M objects, 38 TiB
usage:   118 TiB used, 4.5 PiB / 4.6 PiB avail
pgs: 15 active+clean
 1  active+clean+scrubbing+deep

# ceph osd pool get cephfs_data pg_num
pg_num: 8

Due to this bad configuration I have this warning message:


# ceph status
  cluster:
id: a2269da7-e399-484a-b6ae-4ee1a31a4154
health: HEALTH_WARN
1 nearfull osd(s)
2 pool(s) nearfull

I've discover that some osd are full:

# ceph osd status

| 113 | osd23 | 9824G | 1351G |0   | 0   |0   | 0   |
exists,nearfull,up |

I've tried to reweight this osd:

ceph osd reweight osd.113 0.9

But the process of reweight doesn't start. Otherwise I've tried to
increase the PG and PGP numbers but it' doesn't work.

# ceph osd pool set cephfs_data pg_num 128
set pool 1 pg_num to 128

# ceph osd pool get cephfs_data pg_num
pg_num: 8


What could be the reason for this problem?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RGW do not show up in 'ceph status'

2020-02-24 Thread Andreas Haupt
Sorry for the noise - problem was introduced by a missing iptables rule
:-(

On Fri, 2020-02-21 at 09:04 +0100, Andreas Haupt wrote:
> Dear all,
> 
> we recently added two additional RGWs to our CEPH cluster (version
> 14.2.7). They work flawlessly, however they do not show up in 'ceph
> status':
> 
> [cephmon1] /root # ceph -s | grep -A 6 services
>   services:
> mon: 3 daemons, quorum cephmon1,cephmon2,cephmon3 (age 14h)
> mgr: cephmon1(active, since 14h), standbys: cephmon2, cephmon3
> mds: cephfs:1 {0=cephmon1=up:active} 2 up:standby
> osd: 168 osds: 168 up (since 2w), 168 in (since 6w)
> rgw: 1 daemon active (ceph-s3)
>  
> As you can see, only the first, old RGW (ceph-s3) is listed. Is there
> any place where the RGWs need to get "announced"? Any idea, how to
> debug this?
> 
> Thanks,
> Andreas
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
-- 
| Andreas Haupt| E-Mail: andreas.ha...@desy.de
|  DESY Zeuthen| WWW:http://www-zeuthen.desy.de/~ahaupt
|  Platanenallee 6 | Phone:  +49/33762/7-7359
|  D-15738 Zeuthen | Fax:+49/33762/7-7216



smime.p7s
Description: S/MIME cryptographic signature
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] One PG is stuck and reading is not possible

2020-02-24 Thread mikko . lampikoski
ceph version 12.2.13 luminous (stable)

My whole ceph cluster went to kind of read only state. Ceph status showed that 
client reads is 0 op/s for whole cluster. There was normal amount of writes 
going on.

I checked health and it said:

# ceph health detail
HEALTH_WARN Reduced data availability: 1 pg inactive, 1 pg peering
PG_AVAILABILITY Reduced data availability: 1 pg inactive, 1 pg peering
pg 26.13b is stuck peering for 25523.506788, current state peering, last acting 
[2,0,33]

All osds showed to be up and all monitors are good. All pools are 3/2 
(size/min) and space usage ~30%.

I fixed this by restarting forst osd.2 (nothing happened) and then restarted 
osd.0. After that everyting went back to normal.

So what can cause "stuck peering" and how can i prevent this event from 
happening again?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io