Re: [ceph-users] [Nfs-ganesha-devel] 2.7.3 with CEPH_FSAL Crashing

2019-10-10 Thread David C
Thanks, Patrick. It looks like the fix is awaiting review, so I guess my options
are to hold tight for 14.2.5 or patch it myself if I get desperate. I've seen
this crash about 4 times over the past 96 hours; is there anything I can do
to mitigate the issue in the meantime?
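
In case it's useful to anyone else in the same position, the stopgap I'm
considering is simply having systemd restart Ganesha when it dies. A minimal
sketch, assuming the stock nfs-ganesha.service unit on CentOS 7:

mkdir -p /etc/systemd/system/nfs-ganesha.service.d
cat > /etc/systemd/system/nfs-ganesha.service.d/restart.conf <<'EOF'
[Service]
Restart=on-failure
RestartSec=5
EOF
systemctl daemon-reload

That obviously doesn't fix anything; it just limits how long clients hang
when the crash does happen.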

On Wed, Oct 9, 2019 at 9:23 PM Patrick Donnelly  wrote:

> Looks like this bug: https://tracker.ceph.com/issues/41148
>
> On Wed, Oct 9, 2019 at 1:15 PM David C  wrote:
> >
> > Hi Daniel
> >
> > Thanks for looking into this. I hadn't installed ceph-debuginfo, here's
> the bt with line numbers:
> >
> > #0  operator uint64_t (this=0x10) at
> /usr/src/debug/ceph-14.2.2/src/include/object.h:123
> > #1  Client::fill_statx (this=this@entry=0x274b980, in=0x0,
> mask=mask@entry=341, stx=stx@entry=0x7fccdbefa210) at
> /usr/src/debug/ceph-14.2.2/src/client/Client.cc:7336
> > #2  0x7fce4ea1d4ca in fill_statx (stx=0x7fccdbefa210, mask=341,
> in=..., this=0x274b980) at
> /usr/src/debug/ceph-14.2.2/src/client/Client.h:898
> > #3  Client::_readdir_cache_cb (this=this@entry=0x274b980,
> dirp=dirp@entry=0x7fcb7d0e7860,
> > cb=cb@entry=0x7fce4e9d0950 <_readdir_single_dirent_cb(void*,
> dirent*, ceph_statx*, off_t, Inode*)>, p=p@entry=0x7fccdbefa6a0,
> caps=caps@entry=341,
> > getref=getref@entry=true) at
> /usr/src/debug/ceph-14.2.2/src/client/Client.cc:7999
> > #4  0x7fce4ea1e865 in Client::readdir_r_cb (this=0x274b980,
> d=0x7fcb7d0e7860,
> > cb=cb@entry=0x7fce4e9d0950 <_readdir_single_dirent_cb(void*,
> dirent*, ceph_statx*, off_t, Inode*)>, p=p@entry=0x7fccdbefa6a0,
> want=want@entry=1775,
> > flags=flags@entry=0, getref=true) at
> /usr/src/debug/ceph-14.2.2/src/client/Client.cc:8138
> > #5  0x7fce4ea1f3dd in Client::readdirplus_r (this=,
> d=, de=de@entry=0x7fccdbefa8c0, stx=stx@entry=0x7fccdbefa730,
> want=want@entry=1775,
> > flags=flags@entry=0, out=0x7fccdbefa720) at
> /usr/src/debug/ceph-14.2.2/src/client/Client.cc:8307
> > #6  0x7fce4e9c92d8 in ceph_readdirplus_r (cmount=,
> dirp=, de=de@entry=0x7fccdbefa8c0, stx=stx@entry
> =0x7fccdbefa730,
> > want=want@entry=1775, flags=flags@entry=0, out=out@entry=0x7fccdbefa720)
> at /usr/src/debug/ceph-14.2.2/src/libcephfs.cc:629
> > #7  0x7fce4ece7b0e in fsal_ceph_readdirplus (dir=,
> cred=, out=0x7fccdbefa720, flags=0, want=1775,
> stx=0x7fccdbefa730, de=0x7fccdbefa8c0,
> > dirp=, cmount=) at
> /usr/src/debug/nfs-ganesha-2.7.3/FSAL/FSAL_CEPH/statx_compat.h:314
> > #8  ceph_fsal_readdir (dir_pub=, whence=,
> dir_state=0x7fccdbefaa30, cb=0x522640 ,
> attrmask=122830,
> > eof=0x7fccdbefac0b) at
> /usr/src/debug/nfs-ganesha-2.7.3/FSAL/FSAL_CEPH/handle.c:211
> > #9  0x005256e1 in mdcache_readdir_uncached
> (directory=directory@entry=0x7fcaa8bb84a0, whence=,
> dir_state=, cb=,
> > attrmask=, eod_met=) at
> /usr/src/debug/nfs-ganesha-2.7.3/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:1654
> > #10 0x00517a88 in mdcache_readdir (dir_hdl=0x7fcaa8bb84d8,
> whence=0x7fccdbefab18, dir_state=0x7fccdbefab30, cb=0x432db0
> , attrmask=122830,
> > eod_met=0x7fccdbefac0b) at
> /usr/src/debug/nfs-ganesha-2.7.3/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:551
> > #11 0x0043434a in fsal_readdir 
> > (directory=directory@entry=0x7fcaa8bb84d8,
> cookie=cookie@entry=0, nbfound=nbfound@entry=0x7fccdbefac0c,
> > eod_met=eod_met@entry=0x7fccdbefac0b, attrmask=122830, 
> > cb=cb@entry=0x46f600
> , opaque=opaque@entry=0x7fccdbefac20)
> > at /usr/src/debug/nfs-ganesha-2.7.3/FSAL/fsal_helper.c:1164
> > #12 0x004705b9 in nfs4_op_readdir (op=0x7fcb7fed1f80,
> data=0x7fccdbefaea0, resp=0x7fcb7d106c40)
> > at
> /usr/src/debug/nfs-ganesha-2.7.3/Protocols/NFS/nfs4_op_readdir.c:664
> > #13 0x0045d120 in nfs4_Compound (arg=,
> req=, res=0x7fcb7e001000)
> > at /usr/src/debug/nfs-ganesha-2.7.3/Protocols/NFS/nfs4_Compound.c:942
> > #14 0x004512cd in nfs_rpc_process_request
> (reqdata=0x7fcb7e1d1950) at
> /usr/src/debug/nfs-ganesha-2.7.3/MainNFSD/nfs_worker_thread.c:1328
> > #15 0x00450766 in nfs_rpc_decode_request (xprt=0x7fcaf17fb0e0,
> xdrs=0x7fcb7e1ddb90) at
> /usr/src/debug/nfs-ganesha-2.7.3/MainNFSD/nfs_rpc_dispatcher_thread.c:1345
> > #16 0x7fce6165707d in svc_rqst_xprt_task (wpe=0x7fcaf17fb2f8) at
> /usr/src/debug/nfs-ganesha-2.7.3/libntirpc/src/svc_rqst.c:769
> > #17 0x7fce6165759a in svc_rqst_epoll_events (n_events= out>, sr_rec=0x56a24c0) at
> /usr/src/debug/nfs-ganesha-2.7.3/libntirpc/src/svc_rqst.c:941
> > #18 svc_rqst_epoll_loop (sr_rec=) at
> /usr/src/debug/nfs-ganesha-2.7.3/libntirpc/src/svc_rqst.c

Re: [ceph-users] [Nfs-ganesha-devel] 2.7.3 with CEPH_FSAL Crashing

2019-10-09 Thread David C
Hi Daniel

Thanks for looking into this. I hadn't installed ceph-debuginfo; here's the
bt with line numbers:

#0  operator uint64_t (this=0x10) at
/usr/src/debug/ceph-14.2.2/src/include/object.h:123
#1  Client::fill_statx (this=this@entry=0x274b980, in=0x0, mask=mask@entry=341,
stx=stx@entry=0x7fccdbefa210) at
/usr/src/debug/ceph-14.2.2/src/client/Client.cc:7336
#2  0x7fce4ea1d4ca in fill_statx (stx=0x7fccdbefa210, mask=341, in=...,
this=0x274b980) at /usr/src/debug/ceph-14.2.2/src/client/Client.h:898
#3  Client::_readdir_cache_cb (this=this@entry=0x274b980, dirp=dirp@entry
=0x7fcb7d0e7860,
cb=cb@entry=0x7fce4e9d0950 <_readdir_single_dirent_cb(void*, dirent*,
ceph_statx*, off_t, Inode*)>, p=p@entry=0x7fccdbefa6a0, caps=caps@entry=341,
getref=getref@entry=true) at
/usr/src/debug/ceph-14.2.2/src/client/Client.cc:7999
#4  0x7fce4ea1e865 in Client::readdir_r_cb (this=0x274b980,
d=0x7fcb7d0e7860,
cb=cb@entry=0x7fce4e9d0950 <_readdir_single_dirent_cb(void*, dirent*,
ceph_statx*, off_t, Inode*)>, p=p@entry=0x7fccdbefa6a0, want=want@entry
=1775,
flags=flags@entry=0, getref=true) at
/usr/src/debug/ceph-14.2.2/src/client/Client.cc:8138
#5  0x7fce4ea1f3dd in Client::readdirplus_r (this=,
d=, de=de@entry=0x7fccdbefa8c0, stx=stx@entry=0x7fccdbefa730,
want=want@entry=1775,
flags=flags@entry=0, out=0x7fccdbefa720) at
/usr/src/debug/ceph-14.2.2/src/client/Client.cc:8307
#6  0x7fce4e9c92d8 in ceph_readdirplus_r (cmount=,
dirp=, de=de@entry=0x7fccdbefa8c0, stx=stx@entry
=0x7fccdbefa730,
want=want@entry=1775, flags=flags@entry=0, out=out@entry=0x7fccdbefa720)
at /usr/src/debug/ceph-14.2.2/src/libcephfs.cc:629
#7  0x7fce4ece7b0e in fsal_ceph_readdirplus (dir=,
cred=, out=0x7fccdbefa720, flags=0, want=1775,
stx=0x7fccdbefa730, de=0x7fccdbefa8c0,
dirp=, cmount=) at
/usr/src/debug/nfs-ganesha-2.7.3/FSAL/FSAL_CEPH/statx_compat.h:314
#8  ceph_fsal_readdir (dir_pub=, whence=,
dir_state=0x7fccdbefaa30, cb=0x522640 ,
attrmask=122830,
eof=0x7fccdbefac0b) at
/usr/src/debug/nfs-ganesha-2.7.3/FSAL/FSAL_CEPH/handle.c:211
#9  0x005256e1 in mdcache_readdir_uncached
(directory=directory@entry=0x7fcaa8bb84a0, whence=,
dir_state=, cb=,
attrmask=, eod_met=) at
/usr/src/debug/nfs-ganesha-2.7.3/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:1654
#10 0x00517a88 in mdcache_readdir (dir_hdl=0x7fcaa8bb84d8,
whence=0x7fccdbefab18, dir_state=0x7fccdbefab30, cb=0x432db0
, attrmask=122830,
eod_met=0x7fccdbefac0b) at
/usr/src/debug/nfs-ganesha-2.7.3/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:551
#11 0x0043434a in fsal_readdir
(directory=directory@entry=0x7fcaa8bb84d8,
cookie=cookie@entry=0, nbfound=nbfound@entry=0x7fccdbefac0c,
eod_met=eod_met@entry=0x7fccdbefac0b, attrmask=122830, cb=cb@entry=0x46f600
, opaque=opaque@entry=0x7fccdbefac20)
at /usr/src/debug/nfs-ganesha-2.7.3/FSAL/fsal_helper.c:1164
#12 0x004705b9 in nfs4_op_readdir (op=0x7fcb7fed1f80,
data=0x7fccdbefaea0, resp=0x7fcb7d106c40)
at /usr/src/debug/nfs-ganesha-2.7.3/Protocols/NFS/nfs4_op_readdir.c:664
#13 0x0045d120 in nfs4_Compound (arg=,
req=, res=0x7fcb7e001000)
at /usr/src/debug/nfs-ganesha-2.7.3/Protocols/NFS/nfs4_Compound.c:942
#14 0x004512cd in nfs_rpc_process_request (reqdata=0x7fcb7e1d1950)
at /usr/src/debug/nfs-ganesha-2.7.3/MainNFSD/nfs_worker_thread.c:1328
#15 0x00450766 in nfs_rpc_decode_request (xprt=0x7fcaf17fb0e0,
xdrs=0x7fcb7e1ddb90) at
/usr/src/debug/nfs-ganesha-2.7.3/MainNFSD/nfs_rpc_dispatcher_thread.c:1345
#16 0x7fce6165707d in svc_rqst_xprt_task (wpe=0x7fcaf17fb2f8) at
/usr/src/debug/nfs-ganesha-2.7.3/libntirpc/src/svc_rqst.c:769
#17 0x7fce6165759a in svc_rqst_epoll_events (n_events=,
sr_rec=0x56a24c0) at
/usr/src/debug/nfs-ganesha-2.7.3/libntirpc/src/svc_rqst.c:941
#18 svc_rqst_epoll_loop (sr_rec=) at
/usr/src/debug/nfs-ganesha-2.7.3/libntirpc/src/svc_rqst.c:1014
#19 svc_rqst_run_task (wpe=0x56a24c0) at
/usr/src/debug/nfs-ganesha-2.7.3/libntirpc/src/svc_rqst.c:1050
#20 0x7fce6165f123 in work_pool_thread (arg=0x7fcd381c77b0) at
/usr/src/debug/nfs-ganesha-2.7.3/libntirpc/src/work_pool.c:181
#21 0x7fce5fc17dd5 in start_thread (arg=0x7fccdbefe700) at
pthread_create.c:307
#22 0x7fce5ed8eead in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:111
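
For reference, getting the line numbers was just a case of installing the
debuginfo packages and re-running the backtrace against the core. Roughly
(package names as per the versions listed earlier; the core path is an example):

yum install -y ceph-debuginfo nfs-ganesha-debuginfo gdb
gdb -q -batch -ex bt /usr/bin/ganesha.nfsd /path/to/core   # core path is an example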

On Mon, Oct 7, 2019 at 3:40 PM Daniel Gryniewicz  wrote:

> Client::fill_statx() is a fairly large function, so it's hard to know
> what's causing the crash.  Can you get line numbers from your backtrace?
>
> Daniel
>
> On 10/7/19 9:59 AM, David C wrote:
> > Hi All
> >
> > Further to my previous messages, I upgraded
> > to libcephfs2-14.2.2-0.el7.x86_64 as suggested and things certainly seem
> > a lot more stable, I have had some crashes though, could someone assist
> > in debugging this latest crash please?
> >
> > (gdb) bt
> > #0  0x7fce4e9fc1bb in Client::fi

Re: [ceph-users] [Nfs-ganesha-devel] 2.7.3 with CEPH_FSAL Crashing

2019-10-07 Thread David C
Hi All

Further to my previous messages, I upgraded
to libcephfs2-14.2.2-0.el7.x86_64 as suggested and things certainly seem a
lot more stable. I have had some crashes though; could someone assist in
debugging this latest crash, please?

(gdb) bt
#0  0x7fce4e9fc1bb in Client::fill_statx(Inode*, unsigned int,
ceph_statx*) () from /lib64/libcephfs.so.2
#1  0x7fce4ea1d4ca in Client::_readdir_cache_cb(dir_result_t*, int
(*)(void*, dirent*, ceph_statx*, long, Inode*), void*, int, bool) () from
/lib64/libcephfs.so.2
#2  0x7fce4ea1e865 in Client::readdir_r_cb(dir_result_t*, int
(*)(void*, dirent*, ceph_statx*, long, Inode*), void*, unsigned int,
unsigned int, bool) () from /lib64/libcephfs.so.2
#3  0x7fce4ea1f3dd in Client::readdirplus_r(dir_result_t*, dirent*,
ceph_statx*, unsigned int, unsigned int, Inode**) () from
/lib64/libcephfs.so.2
#4  0x7fce4ece7b0e in fsal_ceph_readdirplus (dir=,
cred=, out=0x7fccdbefa720, flags=0, want=1775,
stx=0x7fccdbefa730, de=0x7fccdbefa8c0, dirp=,
cmount=)
at /usr/src/debug/nfs-ganesha-2.7.3/FSAL/FSAL_CEPH/statx_compat.h:314
#5  ceph_fsal_readdir (dir_pub=, whence=,
dir_state=0x7fccdbefaa30, cb=0x522640 ,
attrmask=122830, eof=0x7fccdbefac0b) at
/usr/src/debug/nfs-ganesha-2.7.3/FSAL/FSAL_CEPH/handle.c:211
#6  0x005256e1 in mdcache_readdir_uncached
(directory=directory@entry=0x7fcaa8bb84a0, whence=,
dir_state=, cb=, attrmask=,
eod_met=)
at
/usr/src/debug/nfs-ganesha-2.7.3/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:1654
#7  0x00517a88 in mdcache_readdir (dir_hdl=0x7fcaa8bb84d8,
whence=0x7fccdbefab18, dir_state=0x7fccdbefab30, cb=0x432db0
, attrmask=122830, eod_met=0x7fccdbefac0b) at
/usr/src/debug/nfs-ganesha-2.7.3/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:551
#8  0x0043434a in fsal_readdir
(directory=directory@entry=0x7fcaa8bb84d8,
cookie=cookie@entry=0, nbfound=nbfound@entry=0x7fccdbefac0c,
eod_met=eod_met@entry=0x7fccdbefac0b, attrmask=122830, cb=cb@entry=0x46f600
, opaque=opaque@entry=0x7fccdbefac20)
at /usr/src/debug/nfs-ganesha-2.7.3/FSAL/fsal_helper.c:1164
#9  0x004705b9 in nfs4_op_readdir (op=0x7fcb7fed1f80,
data=0x7fccdbefaea0, resp=0x7fcb7d106c40) at
/usr/src/debug/nfs-ganesha-2.7.3/Protocols/NFS/nfs4_op_readdir.c:664
#10 0x0045d120 in nfs4_Compound (arg=,
req=, res=0x7fcb7e001000) at
/usr/src/debug/nfs-ganesha-2.7.3/Protocols/NFS/nfs4_Compound.c:942
#11 0x004512cd in nfs_rpc_process_request (reqdata=0x7fcb7e1d1950)
at /usr/src/debug/nfs-ganesha-2.7.3/MainNFSD/nfs_worker_thread.c:1328
#12 0x00450766 in nfs_rpc_decode_request (xprt=0x7fcaf17fb0e0,
xdrs=0x7fcb7e1ddb90) at
/usr/src/debug/nfs-ganesha-2.7.3/MainNFSD/nfs_rpc_dispatcher_thread.c:1345
#13 0x7fce6165707d in svc_rqst_xprt_task (wpe=0x7fcaf17fb2f8) at
/usr/src/debug/nfs-ganesha-2.7.3/libntirpc/src/svc_rqst.c:769
#14 0x7fce6165759a in svc_rqst_epoll_events (n_events=,
sr_rec=0x56a24c0) at
/usr/src/debug/nfs-ganesha-2.7.3/libntirpc/src/svc_rqst.c:941
#15 svc_rqst_epoll_loop (sr_rec=) at
/usr/src/debug/nfs-ganesha-2.7.3/libntirpc/src/svc_rqst.c:1014
#16 svc_rqst_run_task (wpe=0x56a24c0) at
/usr/src/debug/nfs-ganesha-2.7.3/libntirpc/src/svc_rqst.c:1050
#17 0x7fce6165f123 in work_pool_thread (arg=0x7fcd381c77b0) at
/usr/src/debug/nfs-ganesha-2.7.3/libntirpc/src/work_pool.c:181
#18 0x7fce5fc17dd5 in start_thread () from /lib64/libpthread.so.0
#19 0x7fce5ed8eead in clone () from /lib64/libc.so.6

Package versions:

nfs-ganesha-vfs-2.7.3-0.1.el7.x86_64
nfs-ganesha-debuginfo-2.7.3-0.1.el7.x86_64
nfs-ganesha-ceph-2.7.3-0.1.el7.x86_64
nfs-ganesha-2.7.3-0.1.el7.x86_64
libcephfs2-14.2.2-0.el7.x86_64
librados2-14.2.2-0.el7.x86_64

Ganesha export:

EXPORT
{
Export_ID=100;
Protocols = 4;
Transports = TCP;
Path = /;
Pseudo = /ceph/;
Access_Type = RW;
Attr_Expiration_Time = 0;
Disable_ACL = FALSE;
Manage_Gids = TRUE;
Filesystem_Id = 100.1;
FSAL {
Name = CEPH;
}
}

Ceph.conf:

[client]
mon host = --removed--
client_oc_size = 6291456000 #6GB
client_acl_type=posix_acl
client_quota = true
client_quota_df = true

Client mount options:

(rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=removed,local_lock=none,addr=removed)

On Fri, Jul 19, 2019 at 5:47 PM David C  wrote:

> Thanks, Jeff. I'll give 14.2.2 a go when it's released.
>
> On Wed, 17 Jul 2019, 22:29 Jeff Layton,  wrote:
>
>> Ahh, I just noticed you were running nautilus on the client side. This
>> patch went into v14.2.2, so once you update to that you should be good
>> to go.
>>
>> -- Jeff
>>
>> On Wed, 2019-07-17 at 17:10 -0400, Jeff Layton wrote:
>> > This is almost certainly the same bug that is fixed here:
>> >
>> > https://github.com/ceph/ceph/pull/28324
>> >
>> > It should get backported 

Re: [ceph-users] [Nfs-ganesha-devel] 2.7.3 with CEPH_FSAL Crashing

2019-07-19 Thread David C
Thanks, Jeff. I'll give 14.2.2 a go when it's released.

On Wed, 17 Jul 2019, 22:29 Jeff Layton,  wrote:

> Ahh, I just noticed you were running nautilus on the client side. This
> patch went into v14.2.2, so once you update to that you should be good
> to go.
>
> -- Jeff
>
> On Wed, 2019-07-17 at 17:10 -0400, Jeff Layton wrote:
> > This is almost certainly the same bug that is fixed here:
> >
> > https://github.com/ceph/ceph/pull/28324
> >
> > It should get backported soon-ish but I'm not sure which luminous
> > release it'll show up in.
> >
> > Cheers,
> > Jeff
> >
> > On Wed, 2019-07-17 at 10:36 +0100, David C wrote:
> > > Thanks for taking a look at this, Daniel. Below is the only
> interesting bit from the Ceph MDS log at the time of the crash but I
> suspect the slow requests are a result of the Ganesha crash rather than the
> cause of it. Copying the Ceph list in case anyone has any ideas.
> > >
> > > 2019-07-15 15:06:54.624007 7f5fda5bb700  0 log_channel(cluster) log
> [WRN] : 6 slow requests, 5 included below; oldest blocked for > 34.588509
> secs
> > > 2019-07-15 15:06:54.624017 7f5fda5bb700  0 log_channel(cluster) log
> [WRN] : slow request 33.113514 seconds old, received at 2019-07-15
> 15:06:21.510423: client_request(client.16140784:5571174 setattr
> mtime=2019-07-15 14:59:45.642408 #0x10009079cfb 2019-07
> > > -15 14:59:45.642408 caller_uid=1161, caller_gid=1131{}) currently
> failed to xlock, waiting
> > > 2019-07-15 15:06:54.624020 7f5fda5bb700  0 log_channel(cluster) log
> [WRN] : slow request 34.588509 seconds old, received at 2019-07-15
> 15:06:20.035428: client_request(client.16129440:1067288 create
> #0x1000907442e/filePathEditorRegistryPrefs.melDXAtss 201
> > > 9-07-15 14:59:53.694087 caller_uid=1161,
> caller_gid=1131{1131,4121,2330,2683,4115,2322,2779,2979,1503,3511,2783,2707,2942,2980,2258,2829,1238,1237,2793,1235,1249,2097,1154,2982,2983,3860,4101,1208,3638,3641,3644,3640,3643,3639,3642,3822,3945,4045,3521,35
> > > 22,3520,3523,}) currently failed to wrlock, waiting
> > > 2019-07-15 15:06:54.624025 7f5fda5bb700  0 log_channel(cluster) log
> [WRN] : slow request 34.583918 seconds old, received at 2019-07-15
> 15:06:20.040019: client_request(client.16140784:5570551 getattr pAsLsXsFs
> #0x1000907443b 2019-07-15 14:59:44.171408 cal
> > > ler_uid=1161, caller_gid=1131{}) currently failed to rdlock, waiting
> > > 2019-07-15 15:06:54.624028 7f5fda5bb700  0 log_channel(cluster) log
> [WRN] : slow request 34.580632 seconds old, received at 2019-07-15
> 15:06:20.043305: client_request(client.16129440:1067293 unlink
> #0x1000907442e/filePathEditorRegistryPrefs.melcdzxxc 201
> > > 9-07-15 14:59:53.701964 caller_uid=1161,
> caller_gid=1131{1131,4121,2330,2683,4115,2322,2779,2979,1503,3511,2783,2707,2942,2980,2258,2829,1238,1237,2793,1235,1249,2097,1154,2982,2983,3860,4101,1208,3638,3641,3644,3640,3643,3639,3642,3822,3945,4045,3521,35
> > > 22,3520,3523,}) currently failed to wrlock, waiting
> > > 2019-07-15 15:06:54.624032 7f5fda5bb700  0 log_channel(cluster) log
> [WRN] : slow request 34.538332 seconds old, received at 2019-07-15
> 15:06:20.085605: client_request(client.16129440:1067308 create
> #0x1000907442e/filePathEditorRegistryPrefs.melHHljMk 201
> > > 9-07-15 14:59:53.744266 caller_uid=1161,
> caller_gid=1131{1131,4121,2330,2683,4115,2322,2779,2979,1503,3511,2783,2707,2942,2980,2258,2829,1238,1237,2793,1235,1249,2097,1154,2982,2983,3860,4101,1208,3638,3641,3644,3640,3643,3639,3642,3822,3945,4045,3521,3522,3520,3523,})
> currently failed to wrlock, waiting
> > > 2019-07-15 15:06:55.014073 7f5fdcdc0700  1 mds.mds01 Updating MDS map
> to version 68166 from mon.2
> > > 2019-07-15 15:06:59.624041 7f5fda5bb700  0 log_channel(cluster) log
> [WRN] : 7 slow requests, 2 included below; oldest blocked for > 39.588571
> secs
> > > 2019-07-15 15:06:59.624048 7f5fda5bb700  0 log_channel(cluster) log
> [WRN] : slow request 30.495843 seconds old, received at 2019-07-15
> 15:06:29.128156: client_request(client.16129440:1072227 create
> #0x1000907442e/filePathEditorRegistryPrefs.mel58AQSv 2019-07-15
> 15:00:02.786754 caller_uid=1161,
> caller_gid=1131{1131,4121,2330,2683,4115,2322,2779,2979,1503,3511,2783,2707,2942,2980,2258,2829,1238,1237,2793,1235,1249,2097,1154,2982,2983,3860,4101,1208,3638,3641,3644,3640,3643,3639,3642,3822,3945,4045,3521,3522,3520,3523,})
> currently failed to wrlock, waiting
> > > 2019-07-15 15:06:59.624053 7f5fda5bb700  0 log_channel(cluster) log
> [WRN] : slow request 39.432848 seconds old, received at 2019-07-15
> 15:06:20.191151: client_request(client.16140784:5570649 mknod
> #0x1000907442e/fileP

Re: [ceph-users] [Nfs-ganesha-devel] 2.7.3 with CEPH_FSAL Crashing

2019-07-17 Thread David C
fda5bb700  0 log_channel(cluster) log [WRN] :
slow request 32.689838 seconds old, received at 2019-07-15 15:06:36.934283:
client_request(client.16129440:1072271 getattr pAsLsXsFs #0x1000907443b
2019-07-15 15:00:10.592734 caller_uid=1161,
caller_gid=1131{1131,4121,2330,2683,4115,2322,2779,2979,1503,3511,2783,2707,2942,2980,2258,2829,1238,1237,2793,1235,1249,2097,1154,2982,2983,3860,4101,1208,3638,3641,3644,3640,3643,3639,3642,3822,3945,4045,3521,3522,3520,3523,})
currently failed to rdlock, waiting
2019-07-15 15:07:09.624177 7f5fda5bb700  0 log_channel(cluster) log [WRN] :
slow request 34.962719 seconds old, received at 2019-07-15 15:06:34.661402:
client_request(client.16129440:1072256 getattr pAsLsXsFs #0x1000907443b
2019-07-15 15:00:08.319912 caller_uid=1161,
caller_gid=1131{1131,4121,2330,2683,4115,2322,2779,2979,1503,3511,2783,2707,2942,2980,2258,2829,1238,1237,2793,1235,1249,2097,1154,2982,2983,3860,4101,1208,3638,3641,3644,3640,3643,3639,3642,3822,3945,4045,3521,3522,3520,3523,})
currently failed to rdlock, waiting
2019-07-15 15:07:11.519928 7f5fdcdc0700  1 mds.mds01 Updating MDS map to
version 68169 from mon.2
2019-07-15 15:07:19.624272 7f5fda5bb700  0 log_channel(cluster) log [WRN] :
11 slow requests, 1 included below; oldest blocked for > 59.588812 secs
2019-07-15 15:07:19.624278 7f5fda5bb700  0 log_channel(cluster) log [WRN] :
slow request 32.164260 seconds old, received at 2019-07-15 15:06:47.459980:
client_request(client.16129440:1072326 getattr pAsLsXsFs #0x1000907443b
2019-07-15 15:00:21.118372 caller_uid=1161,
caller_gid=1131{1131,4121,2330,2683,4115,2322,2779,2979,1503,3511,2783,2707,2942,2980,2258,2829,1238,1237,2793,1235,1249,2097,1154,2982,2983,3860,4101,1208,3638,3641,3644,3640,3643,3639,3642,3822,3945,4045,3521,3522,3520,3523,})
currently failed to rdlock, waiting


On Tue, Jul 16, 2019 at 1:18 PM Daniel Gryniewicz  wrote:

> This is not one I've seen before, and a quick look at the code looks
> strange.  The only assert in that bit is asserting the parent is a
> directory, but the parent directory is not something that was passed in
> by Ganesha, but rather something that was looked up internally in
> libcephfs.  This is beyond my expertise, at this point.  Maybe some ceph
> logs would help?
>
> Daniel
>
> On 7/15/19 10:54 AM, David C wrote:
> > This list has been deprecated. Please subscribe to the new devel list at
> lists.nfs-ganesha.org.
> >
> >
> > Hi All
> >
> > I'm running 2.7.3 using the CEPH FSAL to export CephFS (Luminous), it
> > ran well for a few days and crashed. I have a coredump, could someone
> > assist me in debugging this please?
> >
> > (gdb) bt
> > #0  0x7f04dcab6207 in raise () from /lib64/libc.so.6
> > #1  0x7f04dcab78f8 in abort () from /lib64/libc.so.6
> > #2  0x7f04d2a9d6c5 in ceph::__ceph_assert_fail(char const*, char
> > const*, int, char const*) () from /usr/lib64/ceph/libceph-common.so.0
> > #3  0x7f04d2a9d844 in ceph::__ceph_assert_fail(ceph::assert_data
> > const&) () from /usr/lib64/ceph/libceph-common.so.0
> > #4  0x7f04cc807f04 in Client::_lookup_name(Inode*, Inode*, UserPerm
> > const&) () from /lib64/libcephfs.so.2
> > #5  0x7f04cc81c41f in Client::ll_lookup_inode(inodeno_t, UserPerm
> > const&, Inode**) () from /lib64/libcephfs.so.2
> > #6  0x7f04ccadbf0e in create_handle (export_pub=0x1baff10,
> > desc=, pub_handle=0x7f0470fd4718,
> > attrs_out=0x7f0470fd4740) at
> > /usr/src/debug/nfs-ganesha-2.7.3/FSAL/FSAL_CEPH/export.c:256
> > #7  0x00523895 in mdcache_locate_host (fh_desc=0x7f0470fd4920,
> > export=export@entry=0x1bafbf0, entry=entry@entry=0x7f0470fd48b8,
> > attrs_out=attrs_out@entry=0x0)
> >  at
> >
> /usr/src/debug/nfs-ganesha-2.7.3/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:1011
> > #8  0x0051d278 in mdcache_create_handle (exp_hdl=0x1bafbf0,
> > fh_desc=, handle=0x7f0470fd4900, attrs_out=0x0) at
> >
> /usr/src/debug/nfs-ganesha-2.7.3/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:1578
> > #9  0x0046d404 in nfs4_mds_putfh
> > (data=data@entry=0x7f0470fd4ea0) at
> > /usr/src/debug/nfs-ganesha-2.7.3/Protocols/NFS/nfs4_op_putfh.c:211
> > #10 0x0046d8e8 in nfs4_op_putfh (op=0x7f03effaf1d0,
> > data=0x7f0470fd4ea0, resp=0x7f03ec1de1f0) at
> > /usr/src/debug/nfs-ganesha-2.7.3/Protocols/NFS/nfs4_op_putfh.c:281
> > #11 0x0045d120 in nfs4_Compound (arg=,
> > req=, res=0x7f03ec1de9d0) at
> > /usr/src/debug/nfs-ganesha-2.7.3/Protocols/NFS/nfs4_Compound.c:942
> > #12 0x004512cd in nfs_rpc_process_request
> > (reqdata=0x7f03ee5ed4b0) at
> > /usr/src/debug/nfs-ganesha-2.7.3/MainNFSD/nfs_worker_thread.c:1328
> > #13 0x00450766 in nfs_rpc_

Re: [ceph-users] NFS-Ganesha CEPH_FSAL | potential locking issue

2019-05-17 Thread David C
Thanks for your response on that, Jeff. Pretty sure this is nothing to do
with Ceph or Ganesha, sorry for wasting your time. What I'm seeing is
related to writeback on the client. I can mitigate the behaviour a bit by
playing around with the vm.dirty* parameters.
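
For reference, this is the sort of thing I've been playing with (values are
just examples, not a recommendation - tune for your own RAM and workload):

# start background writeback earlier and cap dirty pages harder
sysctl -w vm.dirty_background_bytes=67108864   # 64MB
sysctl -w vm.dirty_bytes=268435456             # 256MB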




On Tue, Apr 16, 2019 at 7:07 PM Jeff Layton  wrote:

> On Tue, Apr 16, 2019 at 10:36 AM David C  wrote:
> >
> > Hi All
> >
> > I have a single export of my cephfs using the ceph_fsal [1]. A CentOS 7
> machine mounts a sub-directory of the export [2] and is using it for the
> home directory of a user (e.g everything under ~ is on the server).
> >
> > This works fine until I start a long sequential write into the home
> directory such as:
> >
> > dd if=/dev/zero of=~/deleteme bs=1M count=8096
> >
> > This saturates the 1GbE link on the client which is great but during the
> transfer, apps that are accessing files in home start to lock up. Google
> Chrome for example, which puts it's config in ~/.config/google-chrome/,
> locks up during the transfer, e.g I can't move between tabs, as soon as the
> transfer finishes, Chrome goes back to normal. Essentially the desktop
> environment reacts as I'd expect if the server was to go away. I'm using
> the MATE DE.
> >
> > However, if I mount a separate directory from the same export on the
> machine [3] and do the same write into that directory, my desktop
> experience isn't affected.
> >
> > I hope that makes some sense, it's a bit of a weird one to describe.
> This feels like a locking issue to me, although I can't explain why a
> single write into the root of a mount would affect access to other files
> under that same mount.
> >
>
> It's not a single write. You're doing 8G worth of 1M I/Os. The server
> then has to do all of those to the OSD backing store.
>
> > [1] CephFS export:
> >
> > EXPORT
> > {
> > Export_ID=100;
> > Protocols = 4;
> > Transports = TCP;
> > Path = /;
> > Pseudo = /ceph/;
> > Access_Type = RW;
> > Attr_Expiration_Time = 0;
> > Disable_ACL = FALSE;
> > Manage_Gids = TRUE;
> > Filesystem_Id = 100.1;
> > FSAL {
> > Name = CEPH;
> > }
> > }
> >
> > [2] Home directory mount:
> >
> > 10.10.10.226:/ceph/homes/username on /homes/username type nfs4
> (rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=10.10.10.135,local_lock=none,addr=10.10.10.226)
> >
> > [3] Test directory mount:
> >
> > 10.10.10.226:/ceph/testing on /tmp/testing type nfs4
> (rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=10.10.10.135,local_lock=none,addr=10.10.10.226)
> >
> > Versions:
> >
> > Luminous 12.2.10
> > nfs-ganesha-2.7.1-0.1.el7.x86_64
> > nfs-ganesha-ceph-2.7.1-0.1.el7.x86_64
> >
> > Ceph.conf on nfs-ganesha server:
> >
> > [client]
> > mon host = 10.10.10.210:6789, 10.10.10.211:6789,
> 10.10.10.212:6789
> > client_oc_size = 8388608000
> > client_acl_type=posix_acl
> > client_quota = true
> > client_quota_df = true
> >
>
> No magic bullets here, I'm afraid.
>
> Sounds like ganesha is probably just too swamped with write requests
> to do much else, but you'll probably want to do the legwork starting
> with the hanging application, and figure out what it's doing that
> takes so long. Is it some syscall? Which one?
>
> From there you can start looking at statistics in the NFS client to
> see what's going on there. Are certain RPCs taking longer than they
> should? Which ones?
>
> Once you know what's going on with the client, you can better tell
> what's going on with the server.
> --
> Jeff Layton 
>


Re: [ceph-users] IMPORTANT : NEED HELP : Low IOPS on hdd : MAX AVAIL Draining fast

2019-04-27 Thread David C
On Sat, 27 Apr 2019, 18:50 Nikhil R,  wrote:

> Guys,
> We now have a total of 105 osd’s on 5 baremetal nodes each hosting 21
> osd’s on HDD which are 7Tb with journals on HDD too. Each journal is about
> 5GB
>

This would imply you've got a separate HDD partition for journals. I don't
think there's any value in that, and it would probably be detrimental to
performance.

>
> We expanded our cluster last week and added 1 more node with 21 HDD and
> journals on same disk.
> Our client i/o is too heavy and we are not able to backfill even 1 thread
> during peak hours - incase we backfill during peak hours osd's are crashing
> causing undersized pg's and if we have another osd crash we wont be able to
> use our cluster due to undersized and recovery pg's. During non-peak we can
> just backfill 8-10 pgs.
> Due to this our MAX AVAIL is draining out very fast.
>

How much RAM have you got in your nodes? In my experience, that's a common
reason for crashing OSDs during recovery ops.

What does your recovery and backfill tuning look like?
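
e.g. the usual suspects, which you can throttle right down during peak hours
(example values only; adjust for your cluster):

ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'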



> We are thinking of adding 2 more baremetal nodes with 21 *7tb  osd’s on
>  HDD and add 50GB SSD Journals for these.
> We aim to backfill from the 105 osd’s a bit faster and expect writes of
> backfillis coming to these osd’s faster.
>

SSD journals would certainly help; just be sure it's a model that performs
well with Ceph.
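
The usual sanity check before buying is a single-job, sync 4k write test,
something like this (destructive - only run it against an empty device;
/dev/sdX is a placeholder):

fio --name=journal-test --filename=/dev/sdX --direct=1 --sync=1 --rw=write \
    --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based

Decent journal SSDs hold up well on this test; a lot of consumer drives fall
off a cliff with sync writes.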

>
> Is this a good viable idea?
> Thoughts please?
>

I'd recommend sharing more detail, e.g. the full spec of the nodes, Ceph
version, etc.

>
> -Nikhil


[ceph-users] NFS-Ganesha CEPH_FSAL | potential locking issue

2019-04-16 Thread David C
Hi All

I have a single export of my cephfs using the ceph_fsal [1]. A CentOS 7
machine mounts a sub-directory of the export [2] and is using it for the
home directory of a user (e.g everything under ~ is on the server).

This works fine until I start a long sequential write into the home
directory such as:

dd if=/dev/zero of=~/deleteme bs=1M count=8096

This saturates the 1GbE link on the client, which is great, but during the
transfer, apps that are accessing files in home start to lock up. Google
Chrome, for example, which puts its config in ~/.config/google-chrome/,
locks up during the transfer (e.g. I can't move between tabs); as soon as the
transfer finishes, Chrome goes back to normal. Essentially the desktop
environment reacts as I'd expect if the server were to go away. I'm using
the MATE DE.

However, if I mount a separate directory from the same export on the
machine [3] and do the same write into that directory, my desktop
experience isn't affected.

I hope that makes some sense; it's a bit of a weird one to describe. This
feels like a locking issue to me, although I can't explain why a single
write into the root of a mount would affect access to other files under
that same mount.

[1] CephFS export:

EXPORT
{
Export_ID=100;
Protocols = 4;
Transports = TCP;
Path = /;
Pseudo = /ceph/;
Access_Type = RW;
Attr_Expiration_Time = 0;
Disable_ACL = FALSE;
Manage_Gids = TRUE;
Filesystem_Id = 100.1;
FSAL {
Name = CEPH;
}
}

[2] Home directory mount:

10.10.10.226:/ceph/homes/username on /homes/username type nfs4
(rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=10.10.10.135,local_lock=none,addr=10.10.10.226)

[3] Test directory mount:

10.10.10.226:/ceph/testing on /tmp/testing type nfs4
(rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=10.10.10.135,local_lock=none,addr=10.10.10.226)

Versions:

Luminous 12.2.10
nfs-ganesha-2.7.1-0.1.el7.x86_64
nfs-ganesha-ceph-2.7.1-0.1.el7.x86_64

Ceph.conf on nfs-ganesha server:

[client]
mon host = 10.10.10.210:6789, 10.10.10.211:6789, 10.10.10.212:6789
client_oc_size = 8388608000
client_acl_type=posix_acl
client_quota = true
client_quota_df = true

Thanks,
David


Re: [ceph-users] mount cephfs on ceph servers

2019-03-12 Thread David C
Out of curiosity, are you guys re-exporting the fs to clients over
something like nfs or running applications directly on the OSD nodes?

On Tue, 12 Mar 2019, 18:28 Paul Emmerich,  wrote:

> Mounting kernel CephFS on an OSD node works fine with recent kernels
> (4.14+) and enough RAM in the servers.
>
> We did encounter problems with older kernels though
>
>
> Paul
>
> --
> Paul Emmerich
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90
>
> On Tue, Mar 12, 2019 at 10:07 AM Hector Martin 
> wrote:
> >
> > It's worth noting that most containerized deployments can effectively
> > limit RAM for containers (cgroups), and the kernel has limits on how
> > many dirty pages it can keep around.
> >
> > In particular, /proc/sys/vm/dirty_ratio (default: 20) means at most 20%
> > of your total RAM can be dirty FS pages. If you set up your containers
> > such that the cumulative memory usage is capped below, say, 70% of RAM,
> > then this might effectively guarantee that you will never hit this issue.
> >
> > On 08/03/2019 02:17, Tony Lill wrote:
> > > AFAIR the issue is that under memory pressure, the kernel will ask
> > > cephfs to flush pages, but that this in turn causes the osd (mds?) to
> > > require more memory to complete the flush (for network buffers, etc).
> As
> > > long as cephfs and the OSDs are feeding from the same kernel mempool,
> > > you are susceptible. Containers don't protect you, but a full VM, like
> > > xen or kvm? would.
> > >
> > > So if you don't hit the low memory situation, you will not see the
> > > deadlock, and you can run like this for years without a problem. I
> have.
> > > But you are most likely to run out of memory during recovery, so this
> > > could compound your problems.
> > >
> > > On 3/7/19 3:56 AM, Marc Roos wrote:
> > >>
> > >>
> > >> Container =  same kernel, problem is with processes using the same
> > >> kernel.
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> -Original Message-
> > >> From: Daniele Riccucci [mailto:devs...@posteo.net]
> > >> Sent: 07 March 2019 00:18
> > >> To: ceph-users@lists.ceph.com
> > >> Subject: Re: [ceph-users] mount cephfs on ceph servers
> > >>
> > >> Hello,
> > >> is the deadlock risk still an issue in containerized deployments? For
> > >> example with OSD daemons in containers and mounting the filesystem on
> > >> the host machine?
> > >> Thank you.
> > >>
> > >> Daniele
> > >>
> > >> On 06/03/19 16:40, Jake Grimmett wrote:
> > >>> Just to add "+1" on this datapoint, based on one month usage on Mimic
> > >>> 13.2.4 essentially "it works great for us"
> > >>>
> > >>> Prior to this, we had issues with the kernel driver on 12.2.2. This
> > >>> could have been due to limited RAM on the osd nodes (128GB / 45 OSD),
> > >>> and an older kernel.
> > >>>
> > >>> Upgrading the RAM to 256GB and using a RHEL 7.6 derived kernel has
> > >>> allowed us to reliably use the kernel driver.
> > >>>
> > >>> We keep 30 snapshots ( one per day), have one active metadata server,
> > >>> and change several TB daily - it's much, *much* faster than with
> fuse.
> > >>>
> > >>> Cluster has 10 OSD nodes, currently storing 2PB, using ec 8:2 coding.
> > >>>
> > >>> ta ta
> > >>>
> > >>> Jake
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> On 3/6/19 11:10 AM, Hector Martin wrote:
> >  On 06/03/2019 12:07, Zhenshi Zhou wrote:
> > > Hi,
> > >
> > > I'm gonna mount cephfs from my ceph servers for some reason,
> > > including monitors, metadata servers and osd servers. I know it's
> > > not a best practice. But what is the exact potential danger if I
> > > mount cephfs from its own server?
> > 
> >  As a datapoint, I have been doing this on two machines (single-host
> >  Ceph
> >  clusters) for months with no ill effects. The FUSE client performs a
> >  lot worse than the kernel client, so I switched to the latter, and
> >  it's been working well with no deadlocks.
> > 
> > --
> > Hector Martin (hec...@marcansoft.com)
> > Public Key: https://mrcn.st/pub

Re: [ceph-users] mount cephfs on ceph servers

2019-03-06 Thread David C
The general advice has been not to use the kernel client on an OSD node, as
you may see a deadlock under certain conditions. Using the fuse client
should be fine, or use the kernel client inside a VM.
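
e.g. (mon address and mount points are just examples; auth options omitted):

ceph-fuse -m 10.10.10.210:6789 /mnt/cephfs                   # fuse client, OK on an OSD node
mount -t ceph 10.10.10.210:6789:/ /mnt/cephfs -o name=admin  # kernel client, better from a VM or separate host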

On Wed, 6 Mar 2019, 03:07 Zhenshi Zhou,  wrote:

> Hi,
>
> I'm gonna mount cephfs from my ceph servers for some reason,
> including monitors, metadata servers and osd servers. I know it's
> not a best practice. But what is the exact potential danger if I mount
> cephfs from its own server?
>
> Thanks


Re: [ceph-users] [Nfs-ganesha-devel] NFS-Ganesha CEPH_FSAL ceph.quota.max_bytes not enforced

2019-03-04 Thread David C
On Mon, Mar 4, 2019 at 5:53 PM Jeff Layton  wrote:

>
> On Mon, 2019-03-04 at 17:26 +, David C wrote:
> > Looks like you're right, Jeff. Just tried to write into the dir and am
> > now getting the quota warning. So I guess it was the libcephfs cache
> > as you say. That's fine for me, I don't need the quotas to be too
> > strict, just a failsafe really.
> >
>
> Actually, I said it was likely the NFS client cache. The Linux kernel is
> allowed to aggressively cache writes if you're doing buffered I/O. The
> NFS client has no concept of the quota here, so you'd only see
> enforcement once those writes start getting flushed back to the server.
>

Ah sorry, that makes a lot of sense!

>
>
> > Interestingly, if I create a new dir, set the same 100MB quota, I can
> > write multiple files with "dd if=/dev/zero of=1G bs=1M count=1024
> > oflag=direct". Wouldn't that bypass the cache? I have the following in
> > my ganesha.conf which I believe effectively disables Ganesha's
> > caching:
> >
> > CACHEINODE {
> > Dir_Chunk = 0;
> > NParts = 1;
> > Cache_Size = 1;
> > }
> >
>
> Using direct I/O like that should take the NFS client cache out of the
> picture. That said, cephfs quota enforcement is pretty "lazy". According
> to http://docs.ceph.com/docs/mimic/cephfs/quota/ :
>
> "Quotas are imprecise. Processes that are writing to the file system
> will be stopped a short time after the quota limit is reached. They will
> inevitably be allowed to write some amount of data over the configured
> limit. How far over the quota they are able to go depends primarily on
> the amount of time, not the amount of data. Generally speaking writers
> will be stopped within 10s of seconds of crossing the configured limit."
>
> You can write quite a bit of data in 10s of seconds (multiple GBs is not
> unreasonable here).
>
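
Right, that adds up - assuming this is all going over the same 1GbE link as
my other exports (~110 MB/s), a few tens of seconds of enforcement lag is
easily a couple of GB past the limit:

# rough numbers: 1GbE is ~110 MB/s; quota enforced within "10s of seconds"
echo "$((110 * 10)) MB after 10s, $((110 * 30)) MB after 30s"   # ~1.1 GB to ~3.3 GB
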
> > On Mon, Mar 4, 2019 at 2:50 PM Jeff Layton  wrote:
>
> > > > > On Mon, 2019-03-04 at 09:11 -0500, Jeff Layton wrote:
> > This list has
> > > been deprecated. Please subscribe to the new devel list at
> > > lists.nfs-ganesha.org.
> > On Fri, 2019-03-01 at 15:49 +, David C
> > > wrote:
> > > This list has been deprecated. Please subscribe to the new
> > > devel list at lists.nfs-ganesha.org.
> > > Hi All
> > >
> > > Exporting
> > > cephfs with the CEPH_FSAL
> > >
> > > I set the following on a dir:
> > >
> >
> > > > setfattr -n ceph.quota.max_bytes -v 1 /dir
> > > setfattr -n
> > > ceph.quota.max_files -v 10 /dir
> > >
> > > From an NFSv4 client, the
> > > quota.max_bytes appears to be completely ignored, I can go GBs over
> > > the quota in the dir. The quota.max_files DOES work however, if I
> > > try and create more than 10 files, I'll get "Error opening file
> > > 'dir/new file': Disk quota exceeded" as expected.
> > >
> > > From a
> > > fuse-mount on the same server that is running nfs-ganesha, I've
> > > confirmed ceph.quota.max_bytes is enforcing the quota, I'm unable to
> > > copy more than 100MB into the dir.
> > >
> > > According to [1] and [2]
> > > this should work.
> > >
> > > Cluster is Luminous 12.2.10
> > >
> > > Package
> > > versions on nfs-ganesha server:
> > >
> > > nfs-ganesha-rados-grace-
> > > 2.7.1-0.1.el7.x86_64
> > > nfs-ganesha-2.7.1-0.1.el7.x86_64
> > > nfs-
> > > ganesha-vfs-2.7.1-0.1.el7.x86_64
> > > nfs-ganesha-ceph-2.7.1-
> > > 0.1.el7.x86_64
> > > libcephfs2-13.2.2-0.el7.x86_64
> > > ceph-fuse-
> > > 12.2.10-0.el7.x86_64
> > >
> > > My Ganesha export:
> > >
> > > EXPORT
> > > {
> >
> > > > Export_ID=100;
> > > Protocols = 4;
> > > Transports = TCP;
> >
> > > > Path = /;
> > > Pseudo = /ceph/;
> > > Access_Type = RW;
> > >
> > >Attr_Expiration_Time = 0;
> > > #Manage_Gids = TRUE;
> > >
> > >  Filesystem_Id = 100.1;
> > > FSAL {
> > > Name = CEPH;
> > >
> > >  }
> > > }
> > >
> > > My ceph.conf client section:
> > >
> > > [client]
> > >
> > >mon host = 10.10.10.210:6789, 10.10.10.211:6789,
> > > 10.10.10.212:6789
> > > client_oc_size = 8388608000
> > >
> > &

Re: [ceph-users] [Nfs-ganesha-devel] NFS-Ganesha CEPH_FSAL ceph.quota.max_bytes not enforced

2019-03-04 Thread David C
Looks like you're right, Jeff. I just tried to write into the dir and am now
getting the quota warning. So I guess it was the libcephfs cache as you
say. That's fine for me; I don't need the quotas to be too strict, just a
failsafe really.

Interestingly, if I create a new dir, set the same 100MB quota, I can write
multiple files with "dd if=/dev/zero of=1G bs=1M count=1024 oflag=direct".
Wouldn't that bypass the cache? I have the following in my ganesha.conf
which I believe effectively disables Ganesha's caching:

CACHEINODE {
Dir_Chunk = 0;
NParts = 1;
Cache_Size = 1;
}

Thanks,

On Mon, Mar 4, 2019 at 2:50 PM Jeff Layton  wrote:

> On Mon, 2019-03-04 at 09:11 -0500, Jeff Layton wrote:
> > This list has been deprecated. Please subscribe to the new devel list at
> lists.nfs-ganesha.org.
> > On Fri, 2019-03-01 at 15:49 +, David C wrote:
> > > This list has been deprecated. Please subscribe to the new devel list
> at lists.nfs-ganesha.org.
> > > Hi All
> > >
> > > Exporting cephfs with the CEPH_FSAL
> > >
> > > I set the following on a dir:
> > >
> > > setfattr -n ceph.quota.max_bytes -v 1 /dir
> > > setfattr -n ceph.quota.max_files -v 10 /dir
> > >
> > > From an NFSv4 client, the quota.max_bytes appears to be completely
> ignored, I can go GBs over the quota in the dir. The quota.max_files DOES
> work however, if I try and create more than 10 files, I'll get "Error
> opening file 'dir/new file': Disk quota exceeded" as expected.
> > >
> > > From a fuse-mount on the same server that is running nfs-ganesha, I've
> confirmed ceph.quota.max_bytes is enforcing the quota, I'm unable to copy
> more than 100MB into the dir.
> > >
> > > According to [1] and [2] this should work.
> > >
> > > Cluster is Luminous 12.2.10
> > >
> > > Package versions on nfs-ganesha server:
> > >
> > > nfs-ganesha-rados-grace-2.7.1-0.1.el7.x86_64
> > > nfs-ganesha-2.7.1-0.1.el7.x86_64
> > > nfs-ganesha-vfs-2.7.1-0.1.el7.x86_64
> > > nfs-ganesha-ceph-2.7.1-0.1.el7.x86_64
> > > libcephfs2-13.2.2-0.el7.x86_64
> > > ceph-fuse-12.2.10-0.el7.x86_64
> > >
> > > My Ganesha export:
> > >
> > > EXPORT
> > > {
> > > Export_ID=100;
> > > Protocols = 4;
> > > Transports = TCP;
> > > Path = /;
> > > Pseudo = /ceph/;
> > > Access_Type = RW;
> > > Attr_Expiration_Time = 0;
> > > #Manage_Gids = TRUE;
> > > Filesystem_Id = 100.1;
> > > FSAL {
> > > Name = CEPH;
> > > }
> > > }
> > >
> > > My ceph.conf client section:
> > >
> > > [client]
> > > mon host = 10.10.10.210:6789, 10.10.10.211:6789,
> 10.10.10.212:6789
> > > client_oc_size = 8388608000
> > > #fuse_default_permission=0
> > > client_acl_type=posix_acl
> > > client_quota = true
> > > client_quota_df = true
> > >
> > > Related links:
> > >
> > > [1] http://tracker.ceph.com/issues/16526
> > > [2] https://github.com/nfs-ganesha/nfs-ganesha/issues/100
> > >
> > > Thanks
> > > David
> > >
> >
> > It looks like you're having ganesha do the mount as "client.admin", and
> > I suspect that that may allow you to bypass quotas? You may want to try
> > creating a cephx user with less privileges, have ganesha connect as that
> > user and see if it changes things?
> >
>
> Actually, this may be wrong info.
>
> How are you testing being able to write to the file past quota? Are you
> using O_DIRECT I/O? If not, then it may just be that you're seeing the
> effect of the NFS client caching writes.
> --
> Jeff Layton 
>
>


[ceph-users] NFS-Ganesha CEPH_FSAL ceph.quota.max_bytes not enforced

2019-03-01 Thread David C
Hi All

Exporting cephfs with the CEPH_FSAL

I set the following on a dir:

setfattr -n ceph.quota.max_bytes -v 1 /dir
setfattr -n ceph.quota.max_files -v 10 /dir

From an NFSv4 client, the quota.max_bytes appears to be completely ignored;
I can go GBs over the quota in the dir. The *quota.max_files* DOES work,
however: if I try to create more than 10 files, I'll get "Error opening
file 'dir/new file': Disk quota exceeded" as expected.

From a fuse-mount on the same server that is running nfs-ganesha, I've
confirmed ceph.quota.max_bytes is enforcing the quota; I'm unable to copy
more than 100MB into the dir.
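
(That test was just a straight dd into the fuse mount, something along the
lines of "dd if=/dev/zero of=/mnt/cephfs-fuse/dir/test bs=1M count=200" - the
path is an example - which errors out with "Disk quota exceeded" part way
through, as expected.)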

According to [1] and [2] this should work.

Cluster is Luminous 12.2.10

Package versions on nfs-ganesha server:

nfs-ganesha-rados-grace-2.7.1-0.1.el7.x86_64
nfs-ganesha-2.7.1-0.1.el7.x86_64
nfs-ganesha-vfs-2.7.1-0.1.el7.x86_64
nfs-ganesha-ceph-2.7.1-0.1.el7.x86_64
libcephfs2-13.2.2-0.el7.x86_64
ceph-fuse-12.2.10-0.el7.x86_64

My Ganesha export:

EXPORT
{
Export_ID=100;
Protocols = 4;
Transports = TCP;
Path = /;
Pseudo = /ceph/;
Access_Type = RW;
Attr_Expiration_Time = 0;
#Manage_Gids = TRUE;
Filesystem_Id = 100.1;
FSAL {
Name = CEPH;
}
}

My ceph.conf client section:

[client]
mon host = 10.10.10.210:6789, 10.10.10.211:6789, 10.10.10.212:6789
client_oc_size = 8388608000
#fuse_default_permission=0
client_acl_type=posix_acl
client_quota = true
client_quota_df = true

Related links:

[1] http://tracker.ceph.com/issues/16526
[2] https://github.com/nfs-ganesha/nfs-ganesha/issues/100

Thanks
David


Re: [ceph-users] Cephfs recursive stats | rctime in the future

2019-02-28 Thread David C
On Wed, Feb 27, 2019 at 11:35 AM Hector Martin 
wrote:

> On 27/02/2019 19:22, David C wrote:
> > Hi All
> >
> > I'm seeing quite a few directories in my filesystem with rctime years in
> > the future. E.g
> >
> > ]# getfattr -d -m ceph.dir.* /path/to/dir
> > getfattr: Removing leading '/' from absolute path names
> > # file:  path/to/dir
> > ceph.dir.entries="357"
> > ceph.dir.files="1"
> > ceph.dir.rbytes="35606883904011"
> > ceph.dir.rctime="1851480065.090"
> > ceph.dir.rentries="12216551"
> > ceph.dir.rfiles="10540827"
> > ceph.dir.rsubdirs="1675724"
> > ceph.dir.subdirs="356"
> >
> > That's showing a last modified time of 2 Sept 2028, the day and month
> > are also wrong.
>
> Obvious question: are you sure the date/time on your cluster nodes and
> your clients is correct? Can you track down which files (if any) have
> the ctime in the future by following the rctime down the filesystem tree?
>

Times are all correct on the nodes and CephFS clients; however, the fs is
being exported over NFS. It's possible some NFS clients have the wrong time,
although I'm reasonably confident they are all correct, as the machines are
synced to local time servers and they use AD for auth; things wouldn't work
if the time was that wildly out of sync.

Good idea on checking down the tree. I've found the offending files but
can't find any explanation as to why they have a modified date so far in
the future.
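
In case it's useful to anyone else, I walked it down with something like this
(path is an example):

for d in /path/to/dir/*/; do
    echo "$(getfattr --only-values -n ceph.dir.rctime "$d" 2>/dev/null) $d"
done | sort -rn | head

i.e. keep descending into whichever subdir reports the bogus rctime until you
hit the individual files.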

For example, one dir is "/.config/caja/" in a user's home dir. The files in
this dir all have wildly different modified times: 1984, 1997, 2028...

It certainly feels like a MDS issue to me. I've used the recursive stats
since Jewel and I've never seen this before.

Any ideas?



> --
> Hector Martin (hec...@marcansoft.com)
> Public Key: https://mrcn.st/pub


[ceph-users] Cephfs recursive stats | rctime in the future

2019-02-27 Thread David C
Hi All

I'm seeing quite a few directories in my filesystem with rctime years in
the future. E.g

]# getfattr -d -m ceph.dir.* /path/to/dir
getfattr: Removing leading '/' from absolute path names
# file:  path/to/dir
ceph.dir.entries="357"
ceph.dir.files="1"
ceph.dir.rbytes="35606883904011"
ceph.dir.rctime="1851480065.090"
ceph.dir.rentries="12216551"
ceph.dir.rfiles="10540827"
ceph.dir.rsubdirs="1675724"
ceph.dir.subdirs="356"

That's showing a last modified time of 2 Sept 2028; the day and month are
also wrong.

Most dirs are still showing the correct rctime.

I've used the recursive stats for a few years now and they've always been
reliable. The last major changes I made to this cluster was an update to
Luminous 12.2.10, moving the metadata pool to an SSD backed pool and the
addition of a second Cephfs data pool.

I have just received a scrub error this morning with 1 inconsistent pg but
I've been noticing the incorrect rctimes for a while a now so not sure if
that's related.

Any help much appreciated

Thanks
David


Re: [ceph-users] CEPH_FSAL Nfs-ganesha

2019-01-30 Thread David C
Hi Patrick

Thanks for the info. If I did multiple exports, how does that work in terms
of the cache settings defined in ceph.conf - are those settings per CephFS
client or a shared cache? I.e. if I've defined client_oc_size, would that
be per export?
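
For context, I mean something like this (paths and IDs are just examples),
all backed by the single [client] section in ceph.conf:

EXPORT
{
    Export_ID=101;
    Path = /dir1;
    Pseudo = /ceph/dir1;
    FSAL { Name = CEPH; }
}

EXPORT
{
    Export_ID=102;
    Path = /dir2;
    Pseudo = /ceph/dir2;
    FSAL { Name = CEPH; }
}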

Cheers,

On Tue, Jan 15, 2019 at 6:47 PM Patrick Donnelly 
wrote:

> On Mon, Jan 14, 2019 at 7:11 AM Daniel Gryniewicz  wrote:
> >
> > Hi.  Welcome to the community.
> >
> > On 01/14/2019 07:56 AM, David C wrote:
> > > Hi All
> > >
> > > I've been playing around with the nfs-ganesha 2.7 exporting a cephfs
> > > filesystem, it seems to be working pretty well so far. A few questions:
> > >
> > > 1) The docs say " For each NFS-Ganesha export, FSAL_CEPH uses a
> > > libcephfs client,..." [1]. For arguments sake, if I have ten top level
> > > dirs in my Cephfs namespace, is there any value in creating a separate
> > > export for each directory? Will that potentially give me better
> > > performance than a single export of the entire namespace?
> >
> > I don't believe there are any advantages from the Ceph side.  From the
> > Ganesha side, you configure permissions, client ACLs, squashing, and so
> > on on a per-export basis, so you'll need different exports if you need
> > different settings for each top level directory.  If they can all use
> > the same settings, one export is probably better.
>
> There may be performance impact (good or bad) with having separate
> exports for CephFS. Each export instantiates a separate instance of
> the CephFS client which has its own bookkeeping and set of
> capabilities issued by the MDS. Also, each client instance has a
> separate big lock (potentially a big deal for performance). If the
> data for each export is disjoint (no hard links or shared inodes) and
> the NFS server is expected to have a lot of load, breaking out the
> exports can have a positive impact on performance. If there are hard
> links, then the clients associated with the exports will potentially
> fight over capabilities which will add to request latency.)
>
> --
> Patrick Donnelly


Re: [ceph-users] Radosgw s3 subuser permissions

2019-01-25 Thread Adam C. Emerson
On 24/01/2019, Marc Roos wrote:
>
>
> This should do it sort of.
>
> {
>   "Id": "Policy1548367105316",
>   "Version": "2012-10-17",
>   "Statement": [
> {
>   "Sid": "Stmt1548367099807",
>   "Effect": "Allow",
>   "Action": "s3:ListBucket",
>   "Principal": { "AWS": "arn:aws:iam::Company:user/testuser" },
>   "Resource": "arn:aws:s3:::archive"
> },
> {
>   "Sid": "Stmt1548369229354",
>   "Effect": "Allow",
>   "Action": [
> "s3:GetObject",
> "s3:PutObject",
> "s3:ListBucket"
>   ],
>   "Principal": { "AWS": "arn:aws:iam::Company:user/testuser" },
>   "Resource": "arn:aws:s3:::archive/folder2/*"
> }
>   ]
> }


Does this work well for sub-users? I hadn't worked on them as we were
focusing on the tenant/user case, but if someone's been using policy
with sub-users, I'd like to hear their experience and any problems
they run into.

-- 
Senior Software Engineer   Red Hat Storage, Ann Arbor, MI, US
IRC: Aemerson@OFTC, Actinic@Freenode
0x80F7544B90EDBFB9 E707 86BA 0C1B 62CC 152C  7C12 80F7 544B 90ED BFB9


Re: [ceph-users] How To Properly Failover a HA Setup

2019-01-21 Thread David C
It could also be the kernel client version - what are you running? I
remember older kernel clients didn't always deal with recovery scenarios
very well.
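(e.g. just compare uname -r across the clients.)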

On Mon, Jan 21, 2019 at 9:18 AM Marc Roos  wrote:

>
>
> I think his downtime is coming from the mds failover, that takes a while
> in my case to. But I am not using the cephfs that much yet.
>
>
>
> -Original Message-
> From: Robert Sander [mailto:r.san...@heinlein-support.de]
> Sent: 21 January 2019 10:05
> To: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] How To Properly Failover a HA Setup
>
> On 21.01.19 09:22, Charles Tassell wrote:
> > Hello Everyone,
> >
> >I've got a 3 node Jewel cluster setup, and I think I'm missing
> > something.  When I want to take one of my nodes down for maintenance
> > (kernel upgrades or the like) all of my clients (running the kernel
> > module for the cephfs filesystem) hang for a couple of minutes before
> > the redundant servers kick in.
>
> Have you set the noout flag before doing cluster maintenance?
>
> ceph osd set noout
>
> and afterwards
>
> ceph osd unset noout
>
> Regards
> --
> Robert Sander
> Heinlein Support GmbH
> Schwedter Str. 8/9b, 10119 Berlin
>
> https://www.heinlein-support.de
>
> Tel: 030 / 405051-43
> Fax: 030 / 405051-19
>
> Amtsgericht Berlin-Charlottenburg - HRB 93818 B
> Geschäftsführer: Peer Heinlein - Sitz: Berlin
>
>


Re: [ceph-users] CephFS - Small file - single thread - read performance.

2019-01-18 Thread David C
On Fri, 18 Jan 2019, 14:46 Marc Roos 
>
> [@test]# time cat 50b.img > /dev/null
>
> real0m0.004s
> user0m0.000s
> sys 0m0.002s
> [@test]# time cat 50b.img > /dev/null
>
> real0m0.002s
> user0m0.000s
> sys 0m0.002s
> [@test]# time cat 50b.img > /dev/null
>
> real0m0.002s
> user0m0.000s
> sys 0m0.001s
> [@test]# time cat 50b.img > /dev/null
>
> real0m0.002s
> user0m0.001s
> sys 0m0.001s
> [@test]#
>
> Luminous, centos7.6 kernel cephfs mount, 10Gbit, ssd meta, hdd data, mds
> 2,2Ghz
>

Did you drop the caches on your client before reading the file?
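
Something like this on the client before each run, otherwise you're mostly
timing the page cache:

sync; echo 3 > /proc/sys/vm/drop_caches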

>
>
>
> -Original Message-
> From: Alexandre DERUMIER [mailto:aderum...@odiso.com]
> Sent: 18 January 2019 15:37
> To: Burkhard Linke
> Cc: ceph-users
> Subject: Re: [ceph-users] CephFS - Small file - single thread - read
> performance.
>
> Hi,
> I don't have so big latencies:
>
> # time cat 50bytesfile > /dev/null
>
> real0m0,002s
> user0m0,001s
> sys 0m0,000s
>
>
> (It's on an ceph ssd cluster (mimic), kernel cephfs client (4.18), 10GB
> network with small latency too, client/server have 3ghz cpus)
>
>
>
> - Mail original -
> De: "Burkhard Linke" 
> À: "ceph-users" 
> Envoyé: Vendredi 18 Janvier 2019 15:29:45
> Objet: Re: [ceph-users] CephFS - Small file - single thread - read
> performance.
>
> Hi,
>
> On 1/18/19 3:11 PM, jes...@krogh.cc wrote:
> > Hi.
> >
> > We have the intention of using CephFS for some of our shares, which
> > we'd like to spool to tape as a part normal backup schedule. CephFS
> > works nice for large files but for "small" .. < 0.1MB .. there seem to
>
> > be a "overhead" on 20-40ms per file. I tested like this:
> >
> > root@abe:/nfs/home/jk# time cat /ceph/cluster/rsyncbackups/13kbfile >
> > /dev/null
> >
> > real 0m0.034s
> > user 0m0.001s
> > sys 0m0.000s
> >
> > And from local page-cache right after.
> > root@abe:/nfs/home/jk# time cat /ceph/cluster/rsyncbackups/13kbfile >
> > /dev/null
> >
> > real 0m0.002s
> > user 0m0.002s
> > sys 0m0.000s
> >
> > Giving a ~20ms overhead in a single file.
> >
> > This is about x3 higher than on our local filesystems (xfs) based on
> > same spindles.
> >
> > CephFS metadata is on SSD - everything else on big-slow HDD's (in both
>
> > cases).
> >
> > Is this what everyone else see?
>
>
> Each file access on client side requires the acquisition of a
> corresponding locking entity ('file capability') from the MDS. This adds
> an extra network round trip to the MDS. In the worst case the MDS needs
> to request a capability release from another client which still holds
> the cap (e.g. file is still in page cache), adding another extra network
> round trip.
>
>
> CephFS is not NFS, and has a strong consistency model. This comes at a
> price.
>
>
> Regards,
>
> Burkhard
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS - Small file - single thread - read performance.

2019-01-18 Thread David C
On Fri, Jan 18, 2019 at 2:12 PM  wrote:

> Hi.
>
> We have the intention of using CephFS for some of our shares, which we'd
> like to spool to tape as a part normal backup schedule. CephFS works nice
> for large files but for "small" .. < 0.1MB  .. there seem to be a
> "overhead" on 20-40ms per file. I tested like this:
>
> root@abe:/nfs/home/jk# time cat /ceph/cluster/rsyncbackups/13kbfile >
> /dev/null
>
> real0m0.034s
> user0m0.001s
> sys 0m0.000s
>
> And from local page-cache right after.
> root@abe:/nfs/home/jk# time cat /ceph/cluster/rsyncbackups/13kbfile >
> /dev/null
>
> real0m0.002s
> user0m0.002s
> sys 0m0.000s
>
> Giving a ~20ms overhead in a single file.
>
> This is about x3 higher than on our local filesystems (xfs) based on
> same spindles.
>
> CephFS metadata is on SSD - everything else on big-slow HDD's (in both
> cases).
>
> Is this what everyone else see?
>

Pretty much. Reading a file from a pool of Filestore spinners:

# time cat 13kb > /dev/null

real0m0.013s
user0m0.000s
sys 0m0.003s

That's after dropping the caches on the client; however, the file would
still have been in the page cache on the OSD nodes as I'd just created it.
If the file was coming straight off the spinners I'd expect to see something
closer to your time.

I guess if you wanted to improve the latency you would be looking at the
usual stuff, e.g. (off the top of my head):

- Faster network links/tuning your network
- Turning down Ceph debugging
- Trying a different striping layout on the dirs with the small files
(unlikely to have much effect)
- If you're using the fuse mount, try the kernel mount (or maybe vice versa)
- Play with mount options
- Tune CPU on MDS node

Still, even with all of that, it's unlikely you'll get to local file-system
performance; as Burkhard says, you have the locking overhead. You'll
probably need to look at getting more parallelism going in your rsyncs.
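
To give a rough idea of a couple of those, something like the below (paths
and values are only examples, worth testing well away from production):

# turn down debug logging on the OSDs at runtime
ceph tell osd.* injectargs '--debug_ms 0/0 --debug_osd 0/0'

# try a different striping layout on a dir (only affects newly created files)
setfattr -n ceph.dir.layout.stripe_count -v 2 /ceph/smallfiles
getfattr -n ceph.dir.layout /ceph/smallfiles

# crude way of running several rsyncs in parallel over the top-level dirs
ls /source | xargs -n 1 -P 8 -I{} rsync -a /source/{} /ceph/dest/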



>
> Thanks
>
> --
> Jesper
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Why does "df" on a cephfs not report same free space as "rados df" ?

2019-01-16 Thread David C
On Wed, 16 Jan 2019, 02:20 David Young wrote:

> Hi folks,
>
> My ceph cluster is used exclusively for cephfs, as follows:
>
> ---
> root@node1:~# grep ceph /etc/fstab
> node2:6789:/ /ceph ceph
> auto,_netdev,name=admin,secretfile=/root/ceph.admin.secret
> root@node1:~#
> ---
>
> "rados df" shows me the following:
>
> ---
> root@node1:~# rados df
> POOL_NAME  USED  OBJECTS CLONESCOPIES MISSING_ON_PRIMARY
> UNFOUND DEGRADEDRD_OPS  RDWR_OPS  WR
> cephfs_metadata 197 MiB49066  0 98132  0
> 00   9934744  55 GiB  57244243 232 GiB
> media   196 TiB 51768595  0 258842975  0
> 1   203534 477915206 509 TiB 165167618 292 TiB
>
> total_objects51817661
> total_used   266 TiB
> total_avail  135 TiB
> total_space  400 TiB
> root@node1:~#
> ---
>
> But "df" on the mounted cephfs volume shows me:
>
> ---
> root@node1:~# df -h /ceph
> Filesystem  Size  Used Avail Use% Mounted on
> 10.20.30.22:6789:/  207T  196T   11T  95% /ceph
> root@node1:~#
> ---
>
> And ceph -s shows me:
>
> ---
>   data:
> pools:   2 pools, 1028 pgs
> objects: 51.82 M objects, 196 TiB
> usage:   266 TiB used, 135 TiB / 400 TiB avail
> ---
>
> "media" is an EC pool with size of 5 (4+1), so I can expect 1TB of data to
> consume 1.25TB raw space.
>
> My question is, why does "df" show me I have 11TB free, when "rados df"
> shows me I have 135TB (raw) available?
>

Probably because your OSDs are quite unbalanced. What does your 'ceph osd
df' look like?



>
> Thanks!
> D
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] CEPH_FSAL Nfs-ganesha

2019-01-14 Thread David C
Hi All

I've been playing around with the nfs-ganesha 2.7 exporting a cephfs
filesystem, it seems to be working pretty well so far. A few questions:

1) The docs say " For each NFS-Ganesha export, FSAL_CEPH uses a libcephfs
client,..." [1]. For arguments sake, if I have ten top level dirs in my
Cephfs namespace, is there any value in creating a separate export for each
directory? Will that potentially give me better performance than a single
export of the entire namespace?

2) Tuning: are there any recommended parameters to tune? So far I've found
I had to increase client_oc_size which seemed quite conservative.
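
For reference, the sort of export block I've been testing with looks roughly
like this (export ID, paths and the cephx user are placeholders; I've been
putting libcephfs options such as client_oc_size in the [client] section of
ceph.conf on the Ganesha host):

EXPORT
{
    Export_ID = 1;
    Path = /;
    Pseudo = /cephfs;
    Access_Type = RW;
    Squash = No_Root_Squash;
    Protocols = 4;
    Transports = TCP;
    FSAL {
        Name = CEPH;
        User_Id = "ganesha";
    }
}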

Thanks
David

[1] http://docs.ceph.com/docs/mimic/cephfs/nfs/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephfs free space issue

2019-01-10 Thread David C
On Thu, Jan 10, 2019 at 4:07 PM Scottix  wrote:

> I just had this question as well.
>
> I am interested in what you mean by fullest, is it percentage wise or raw
> space. If I have an uneven distribution and adjusted it, would it make more
> space available potentially.
>

Yes - I'd recommend using pg-upmap if all your clients are Luminous+. I
"reclaimed" about 5TB of usable space recently by balancing my PGs.

@Yoann, you've got a fair bit of variance so you would likely benefit from
pg-upmap (or other rebalancing).
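
Roughly what enabling the balancer in upmap mode looks like on Luminous
(check 'ceph features' first to make sure no pre-Luminous clients are
connected):

ceph features
ceph osd set-require-min-compat-client luminous
ceph mgr module enable balancer   # if not already enabled
ceph balancer mode upmap
ceph balancer on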


> Thanks
> Scott
> On Thu, Jan 10, 2019 at 12:05 AM Wido den Hollander  wrote:
>
>>
>>
>> On 1/9/19 2:33 PM, Yoann Moulin wrote:
>> > Hello,
>> >
>> > I have a CEPH cluster in luminous 12.2.10 dedicated to cephfs.
>> >
>> > The raw size is 65.5 TB, with a replica 3, I should have ~21.8 TB
>> usable.
>> >
>> > But the size of the cephfs view by df is *only* 19 TB, is that normal ?
>> >
>>
>> Yes. Ceph will calculate this based on the fullest OSD. As data
>> distribution is never 100% perfect you will get such numbers.
>>
>> To go from raw to usable I use this calculation:
>>
>> (RAW / 3) * 0.85
>>
>> So yes, I take a 20%, sometimes even 30% buffer.
>>
>> Wido
>>
>> > Best regards,
>> >
>> > here some hopefully useful information :
>> >
>> >> apollo@icadmin004:~$ ceph -s
>> >>   cluster:
>> >> id: fc76846a-d0f0-4866-ae6d-d442fc885469
>> >> health: HEALTH_OK
>> >>
>> >>   services:
>> >> mon: 3 daemons, quorum icadmin006,icadmin007,icadmin008
>> >> mgr: icadmin006(active), standbys: icadmin007, icadmin008
>> >> mds: cephfs-3/3/3 up
>> {0=icadmin008=up:active,1=icadmin007=up:active,2=icadmin006=up:active}
>> >> osd: 40 osds: 40 up, 40 in
>> >>
>> >>   data:
>> >> pools:   2 pools, 2560 pgs
>> >> objects: 26.12M objects, 15.6TiB
>> >> usage:   49.7TiB used, 15.8TiB / 65.5TiB avail
>> >> pgs: 2560 active+clean
>> >>
>> >>   io:
>> >> client:   510B/s rd, 24.1MiB/s wr, 0op/s rd, 35op/s wr
>> >
>> >> apollo@icadmin004:~$ ceph df
>> >> GLOBAL:
>> >> SIZEAVAIL   RAW USED %RAW USED
>> >> 65.5TiB 15.8TiB  49.7TiB 75.94
>> >> POOLS:
>> >> NAMEID USED%USED MAX AVAIL
>>  OBJECTS
>> >> cephfs_data 1  15.6TiB 85.62   2.63TiB
>>  25874848
>> >> cephfs_metadata 2   571MiB  0.02   2.63TiB
>>  245778
>> >
>> >> apollo@icadmin004:~$ rados df
>> >> POOL_NAME   USEDOBJECTS  CLONES COPIES   MISSING_ON_PRIMARY
>> UNFOUND DEGRADED RD_OPS RD  WR_OPS   WR
>> >> cephfs_data 15.6TiB 25874848  0 77624544  0
>>00  324156851 25.9TiB 20114360 9.64TiB
>> >> cephfs_metadata  571MiB   245778  0   737334  0
>>00 1802713236 87.7TiB 75729412 16.0TiB
>> >>
>> >> total_objects26120626
>> >> total_used   49.7TiB
>> >> total_avail  15.8TiB
>> >> total_space  65.5TiB
>> >
>> >> apollo@icadmin004:~$ ceph osd pool ls detail
>> >> pool 1 'cephfs_data' replicated size 3 min_size 2 crush_rule 0
>> object_hash rjenkins pg_num 2048 pgp_num 2048 last_change 6197 lfor 0/3885
>> flags hashpspool stripe_width 0 application cephfs
>> >> pool 2 'cephfs_metadata' replicated size 3 min_size 2 crush_rule 0
>> object_hash rjenkins pg_num 512 pgp_num 512 last_change 6197 lfor 0/703
>> flags hashpspool stripe_width 0 application cephfs
>> >
>> >> apollo@icadmin004:~$ df -h /apollo/
>> >> Filesystem Size  Used Avail Use% Mounted on
>> >> 10.90.36.16,10.90.36.17,10.90.36.18:/   19T   16T  2.7T  86% /apollo
>> >
>> >> apollo@icadmin004:~$ ceph fs get cephfs
>> >> Filesystem 'cephfs' (1)
>> >> fs_name  cephfs
>> >> epoch49277
>> >> flagsc
>> >> created  2018-01-23 14:06:43.460773
>> >> modified 2019-01-09 14:17:08.520888
>> >> tableserver  0
>> >> root 0
>> >> session_timeout  60
>> >> session_autoclose300
>> >> max_file_size1099511

Re: [ceph-users] Balancer=on with crush-compat mode

2019-01-05 Thread David C
On Sat, 5 Jan 2019, 13:38 Marc Roos wrote:
> I have straw2, balancer=on, crush-compat and it gives worst spread over
> my ssd drives (4 only) being used by only 2 pools. One of these pools
> has pg 8. Should I increase this to 16 to create a better result, or
> will it never be any better.
>
> For now I like to stick to crush-compat, so I can use a default centos7
> kernel.
>

PG upmap is supported in the CentOS 7.5+ kernels.

>
> Luminous 12.2.8, 3.10.0-862.14.4.el7.x86_64, CentOS Linux release
> 7.5.1804 (Core)
>
>
>
> [@c01 ~]# cat balancer-1-before.txt | egrep '^19|^20|^21|^30'
> 19   ssd 0.48000  1.0  447GiB  164GiB  283GiB 36.79 0.93  31
> 20   ssd 0.48000  1.0  447GiB  136GiB  311GiB 30.49 0.77  32
> 21   ssd 0.48000  1.0  447GiB  215GiB  232GiB 48.02 1.22  30
> 30   ssd 0.48000  1.0  447GiB  151GiB  296GiB 33.72 0.86  27
>
> [@c01 ~]# ceph osd df | egrep '^19|^20|^21|^30'
> 19   ssd 0.48000  1.0  447GiB  157GiB  290GiB 35.18 0.87  30
> 20   ssd 0.48000  1.0  447GiB  125GiB  322GiB 28.00 0.69  30
> 21   ssd 0.48000  1.0  447GiB  245GiB  202GiB 54.71 1.35  30
> 30   ssd 0.48000  1.0  447GiB  217GiB  230GiB 48.46 1.20  30
>
> [@c01 ~]# ceph osd pool ls detail | egrep 'fs_meta|rbd.ssd'
> pool 19 'fs_meta' replicated size 3 min_size 2 crush_rule 5 object_hash
> rjenkins pg_num 16 pgp_num 16 last_change 22425 lfor 0/9035 flags
> hashpspool stripe_width 0 application cephfs
> pool 54 'rbd.ssd' replicated size 3 min_size 2 crush_rule 5 object_hash
> rjenkins pg_num 8 pgp_num 8 last_change 24666 flags hashpspool
> stripe_width 0 application rbd
>
> [@c01 ~]# ceph df |egrep 'ssd|fs_meta'
> fs_meta   19  170MiB  0.07
> 240GiB 2451382
> fs_data.ssd   33  0B 0
> 240GiB   0
> rbd.ssd   54  266GiB 52.57
> 240GiB   75902
> fs_data.ec21.ssd  55  0B 0
> 480GiB   0
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] CephFS client df command showing raw space after adding second pool to mds

2019-01-03 Thread David C
Hi All

Luminous 12.2.12
Single MDS
Replicated pools

A 'df' on a CephFS kernel client used to show me the usable space (i.e. the
raw space with the replication overhead applied). This was when I just had
a single cephfs data pool.

After adding a second pool to the mds and using file layouts to map a
directory to that pool, a df is now showing the raw space. It's not the end
of the world but was handy to see the usable space.
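
For context, the second pool was added and the directory mapped to it along
these lines (pool and path names here are just examples):

ceph fs add_data_pool cephfs cephfs_data_ssd
setfattr -n ceph.dir.layout.pool -v cephfs_data_ssd /mnt/cephfs/fast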

I'm fairly sure the change was me adding the second pool, although I'm not
100% sure.

I'm seeing this behavior on the latest Centos 7.6 kernel and a 4.14 kernel,
is this expected?

Thanks,
David
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Help with setting device-class rule on pool without causing data to move

2019-01-03 Thread David C
Thanks, Sage! That did the trick.

Wido, seems like an interesting approach but I wasn't brave enough to
attempt it!

Eric, I suppose this does the same thing that the crushtool reclassify
feature does?

Thank you both for your suggestions.

For posterity:

-  I grabbed some 14.0.1 packages, extracted crushtool
and libceph-common.so.1
- Ran 'crushtool -i cm --reclassify --reclassify-root default hdd -o
cm_reclassified'
- Compared the maps with:

crushtool -i cm --compare cm_reclassified

That suggested I would get an acceptable amount of data reshuffling, which I
expected. I didn't use --set-subtree-class as I'd already added SSD drives
to the cluster.

My ultimate goal was to migrate the cephfs_metadata pool onto SSD drives
while leaving the cephfs_data pool on the HDD drives. The device classes
feature made that really trivial, I just created an intermediary rule which
would use both HDD and SSD hosts (I didn't have any mixed devices in
hosts), set the metadata pool to use the new rule, waited for recovery and
then set the metadata pool to use an SSD-only rule. I'm not sure if that
intermediary stage was strictly necessary; I was concerned about inactive
PGs.
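
In command form that was something like the following (rule and pool names
are illustrative):

# intermediary rule with no device class, so it spans HDD and SSD hosts
ceph osd crush rule create-replicated replicated_any default host
# final SSD-only rule
ceph osd crush rule create-replicated replicated_ssd default host ssd

ceph osd pool set cephfs_metadata crush_rule replicated_any
# ...wait for recovery, then:
ceph osd pool set cephfs_metadata crush_rule replicated_ssd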

Thanks,
David

On Mon, Dec 31, 2018 at 6:06 PM Eric Goirand  wrote:

> Hi David,
>
> CERN has provided with a python script to swap the correct bucket IDs
> (default <-> hdd), you can find it here :
>
> https://github.com/cernceph/ceph-scripts/blob/master/tools/device-class-id-swap.py
>
> The principle is the following :
> - extract the CRUSH map
> - run the script on it => it creates a new CRUSH file.
> - edit the CRUSH map and modify the rule associated with the pool(s) you
> want to associate with HDD OSDs only like :
> => step take default WITH step take default class hdd
>
> Then recompile and reinject the new CRUSH map and voilà !
>
> Your cluster should be using only the HDD OSDs without rebalancing (or a
> very small amount).
>
> In case you have forgotten something, just reapply the former CRUSH map
> and start again.
>
> Cheers and Happy new year 2019.
>
> Eric
>
>
>
> On Sun, Dec 30, 2018, 21:16 David C  wrote:
>
>> Hi All
>>
>> I'm trying to set the existing pools in a Luminous cluster to use the hdd
>> device-class but without moving data around. If I just create a new rule
>> using the hdd class and set my pools to use that new rule it will cause a
>> huge amount of data movement even though the pgs are all already on HDDs.
>>
>> There is a thread on ceph-large [1] which appears to have the solution
>> but I can't get my head around what I need to do. I'm not too clear on
>> which IDs I need to swap. Could someone give me some pointers on this
>> please?
>>
>> [1]
>> http://lists.ceph.com/pipermail/ceph-large-ceph.com/2018-April/000109.html
>>
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Help with setting device-class rule on pool without causing data to move

2018-12-30 Thread David C
Hi All

I'm trying to set the existing pools in a Luminous cluster to use the hdd
device-class but without moving data around. If I just create a new rule
using the hdd class and set my pools to use that new rule it will cause a
huge amount of data movement even though the pgs are all already on HDDs.

There is a thread on ceph-large [1] which appears to have the solution but
I can't get my head around what I need to do. I'm not too clear on which
IDs I need to swap. Could someone give me some pointers on this please?

[1]
http://lists.ceph.com/pipermail/ceph-large-ceph.com/2018-April/000109.html
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Bluestore nvme DB/WAL size

2018-12-21 Thread David C
I'm in a similar situation, currently running filestore with spinners and
journals on NVME partitions which are about 1% of the size of the OSD. If I
migrate to bluestore, I'll still only have that 1% available. Per the docs,
if my block.db device fills up, the metadata is going to spill back onto
the block device, which will incur an understandable performance penalty.
The question is, will there be more of a performance hit in that scenario
versus
if the block.db was on the spinner and just the WAL was on the NVME?
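
For reference, if I do migrate, each OSD would be rebuilt with something
like this (device names are placeholders):

ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1p1
# or, to keep only the WAL on flash:
ceph-volume lvm create --bluestore --data /dev/sdb --block.wal /dev/nvme0n1p1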

On Fri, Dec 21, 2018 at 9:01 AM Janne Johansson  wrote:

> Den tors 20 dec. 2018 kl 22:45 skrev Vladimir Brik
> :
> > Hello
> > I am considering using logical volumes of an NVMe drive as DB or WAL
> > devices for OSDs on spinning disks.
> > The documentation recommends against DB devices smaller than 4% of slow
> > disk size. Our servers have 16x 10TB HDDs and a single 1.5TB NVMe, so
> > dividing it equally will result in each OSD getting ~90GB DB NVMe
> > volume, which is a lot less than 4%. Will this cause problems down the
> road?
>
> Well, apart from the reply you already got on "one nvme fails all the
> HDDs it is WAL/DB for",
> the recommendations are about getting the best out of them, especially
> for the DB I suppose.
>
> If one can size stuff up before, then following recommendations is a
> good choice, but I think
> you should test using it for WALs for instance, and bench it against
> another host with data,
> wal and db on the HDD and see if it helps a lot in your expected use case.
>
> --
> May the most significant bit of your life be positive.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] active+recovering+degraded after cluster reboot

2018-12-15 Thread David C
Yep, that cleared it. Sorry for the noise!

On Sun, Dec 16, 2018 at 12:16 AM David C  wrote:

> Hi Paul
>
> Thanks for the response. Not yet, just being a bit cautious ;) I'll go
> ahead and do that.
>
> Thanks
> David
>
>
> On Sat, 15 Dec 2018, 23:39 Paul Emmerich 
>> Did you unset norecover?
>>
>>
>> Paul
>>
>> --
>> Paul Emmerich
>>
>> Looking for help with your Ceph cluster? Contact us at https://croit.io
>>
>> croit GmbH
>> Freseniusstr. 31h
>> 81247 München
>> www.croit.io
>> Tel: +49 89 1896585 90
>>
>> On Sun, Dec 16, 2018 at 12:22 AM David C  wrote:
>> >
>> > Hi All
>> >
>> > I have what feels like a bit of a rookie question
>> >
>> > I shutdown a Luminous 12.2.1 cluster with noout,nobackfill,norecover set
>> >
>> > Before shutting down, all PGs were active+clean
>> >
>> > I brought the cluster up, all daemons started and all but 2 PGs are
>> active+clean
>> >
>> > I have 2 pgs showing: "active+recovering+degraded"
>> >
>> > It's been reporting this for about an hour with no signs of clearing on
>> it's own
>> >
>> > Ceph health detail shows: PG_DEGRADED Degraded data redundancy:
>> 2/131709267 objects degraded (0.000%), 2 pgs unclean, 2 pgs degraded
>> >
>> > I've tried restarting MONs and all OSDs in the cluster.
>> >
>> > How would you recommend I proceed at this point?
>> >
>> > Thanks
>> > David
>> >
>> >
>> >
>> >
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] active+recovering+degraded after cluster reboot

2018-12-15 Thread David C
Hi Paul

Thanks for the response. Not yet, just being a bit cautious ;) I'll go
ahead and do that.

Thanks
David


On Sat, 15 Dec 2018, 23:39 Paul Emmerich wrote:

> Did you unset norecover?
>
>
> Paul
>
> --
> Paul Emmerich
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90
>
> On Sun, Dec 16, 2018 at 12:22 AM David C  wrote:
> >
> > Hi All
> >
> > I have what feels like a bit of a rookie question
> >
> > I shutdown a Luminous 12.2.1 cluster with noout,nobackfill,norecover set
> >
> > Before shutting down, all PGs were active+clean
> >
> > I brought the cluster up, all daemons started and all but 2 PGs are
> active+clean
> >
> > I have 2 pgs showing: "active+recovering+degraded"
> >
> > It's been reporting this for about an hour with no signs of clearing on
> it's own
> >
> > Ceph health detail shows: PG_DEGRADED Degraded data redundancy:
> 2/131709267 objects degraded (0.000%), 2 pgs unclean, 2 pgs degraded
> >
> > I've tried restarting MONs and all OSDs in the cluster.
> >
> > How would you recommend I proceed at this point?
> >
> > Thanks
> > David
> >
> >
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] active+recovering+degraded after cluster reboot

2018-12-15 Thread David C
Hi All

I have what feels like a bit of a rookie question

I shutdown a Luminous 12.2.1 cluster with noout,nobackfill,norecover set

Before shutting down, all PGs were active+clean

I brought the cluster up, all daemons started and all but 2 PGs are
active+clean

I have 2 pgs showing: "active+recovering+degraded"

It's been reporting this for about an hour with no signs of clearing on
it's own

Ceph health detail shows: PG_DEGRADED Degraded data redundancy: 2/131709267
objects degraded (0.000%), 2 pgs unclean, 2 pgs degraded

I've tried restarting MONs and all OSDs in the cluster.

How would you recommend I proceed at this point?

Thanks
David
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Deploying an Active/Active NFS Cluster over CephFS

2018-12-12 Thread David C
Hi Jeff

Many thanks for this! Looking forward to testing it out.

Could you elaborate a bit on why Nautilus is recommended for this set-up,
please? Would attempting this with a Luminous cluster be a non-starter?



On Wed, 12 Dec 2018, 12:16 Jeff Layton wrote:

> (Sorry for the duplicate email to ganesha lists, but I wanted to widen
> it to include the ceph lists)
>
> In response to some cries for help over IRC, I wrote up this blog post
> the other day, which discusses how to set up parallel serving over
> CephFS:
>
>
> https://jtlayton.wordpress.com/2018/12/10/deploying-an-active-active-nfs-cluster-over-cephfs/
>
> Feel free to comment if you have questions. We may be want to eventually
> turn this into a document in the ganesha or ceph trees as well.
>
> Cheers!
> --
> Jeff Layton 
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CEPH DR RBD Mount

2018-11-30 Thread David C
Is that one big xfs filesystem? Are you able to mount with krbd?

On Tue, 27 Nov 2018, 13:49 Vikas Rana wrote:

> Hi There,
>
> We are replicating a 100TB RBD image to DR site. Replication works fine.
>
> rbd --cluster cephdr mirror pool status nfs --verbose
>
> health: OK
>
> images: 1 total
>
> 1 replaying
>
>
>
> dir_research:
>
>   global_id:   11e9cbb9-ce83-4e5e-a7fb-472af866ca2d
>
>   state:   up+replaying
>
>   description: replaying, master_position=[object_number=591701,
> tag_tid=1, entry_tid=902879873], mirror_position=[object_number=446354,
> tag_tid=1, entry_tid=727653146], entries_behind_master=175226727
>
>   last_update: 2018-11-14 16:17:23
>
>
>
>
> We then, use nbd to map the RBD image at the DR site but when we try to
> mount it, we get
>
>
> # mount /dev/nbd2 /mnt
>
> mount: block device /dev/nbd2 is write-protected, mounting read-only
>
> *mount: /dev/nbd2: can't read superblock*
>
>
>
> We are using 12.2.8.
>
>
> Any help will be greatly appreciated.
>
>
> Thanks,
>
> -Vikas
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Apply bucket policy to bucket for LDAP user: what is the correct identifier for principal

2018-10-11 Thread Adam C. Emerson
Ha Son Hai  wrote:
> Hello everyone,
> I try to apply the bucket policy to my bucket for LDAP user but it doesn't 
> work.
> For user created by radosgw-admin, the policy works fine.
>
> {
>
>   "Version": "2012-10-17",
>
>   "Statement": [{
>
> "Effect": "Allow",
>
> "Principal": {"AWS": ["arn:aws:iam:::user/radosgw-user"]},
>
> "Action": "s3:*",
>
> "Resource": [
>
>   "arn:aws:s3:::shared-tenant-test",
>
>   "arn:aws:s3:::shared-tenant-test/*"
>
> ]
>
>   }]
>
> }

LDAP users essentially are RGW users, so it should be this same
format. As I understand RGW's LDAP interface (I have not worked with
LDAP personally), every LDAP user gets a corresponding RGW user whose
name is derived from rgw_ldap_dnattr, often 'uid' or 'cn', but this is
dependent on site.
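
As a rough illustration (attribute and user name are made up here): with
rgw_ldap_dnattr = uid and an LDAP entry of uid=jdoe, the principal in the
policy would use the derived RGW user:

"Principal": {"AWS": ["arn:aws:iam:::user/jdoe"]}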

If you can check that part of the configuration, and if that doesn't work,
send some logs and I'll take a look. If something fishy is going on we can
try opening a bug.

Thank you.

-- 
Senior Software Engineer   Red Hat Storage, Ann Arbor, MI, US
IRC: Aemerson@OFTC, Actinic@Freenode
0x80F7544B90EDBFB9 E707 86BA 0C1B 62CC 152C  7C12 80F7 544B 90ED BFB9
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] list admin issues

2018-10-06 Thread David C
Same issue here, Gmail user, member of different lists but only get
disabled on ceph-users. Happens about once a month but had three in Sept.

On Sat, 6 Oct 2018, 18:28 Janne Johansson,  wrote:

> Den lör 6 okt. 2018 kl 15:06 skrev Elias Abacioglu
> :
> >
> > Hi,
> >
> > I'm bumping this old thread cause it's getting annoying. My membership
> get disabled twice a month.
> > Between my two Gmail accounts I'm in more than 25 mailing lists and I
> see this behavior only here. Why is only ceph-users only affected? Maybe
> Christian was on to something, is this intentional?
> > Reality is that there is a lot of ceph-users with Gmail accounts,
> perhaps it wouldn't be so bad to actually trying to figure this one out?
> >
> > So can the maintainers of this list please investigate what actually
> gets bounced? Look at my address if you want.
> > I got disabled 20181006, 20180927, 20180916, 20180725, 20180718 most
> recently.
> > Please help!
>
> Same here.
>
>
> --
> May the most significant bit of your life be positive.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Luminous new OSD being over filled

2018-09-03 Thread David C
Hi Marc

I like that approach although I think I'd go in smaller weight increments.

Still a bit confused by the behaviour I'm seeing; it looks like I've got
things weighted correctly. Red Hat's docs recommend doing an OSD at a time,
and I'm sure that's how I've done it on other clusters in the past, although
they would have been running older versions.

Thanks,

On Mon, Sep 3, 2018 at 1:45 PM Marc Roos  wrote:

>
>
> I am adding a node like this, I think it is more efficient, because in
> your case you will have data being moved within the added node (between
> the newly added osd's there). So far no problems with this.
>
> Maybe limit your
> ceph tell osd.* injectargs --osd_max_backfills=X
> Because pg's being moved are taking space until the move is completed.
>
> sudo -u ceph ceph osd crush reweight osd.23 1 (all osd's in the node)
> sudo -u ceph ceph osd crush reweight osd.24 1
> sudo -u ceph ceph osd crush reweight osd.25 1
> sudo -u ceph ceph osd crush reweight osd.26 1
> sudo -u ceph ceph osd crush reweight osd.27 1
> sudo -u ceph ceph osd crush reweight osd.28 1
> sudo -u ceph ceph osd crush reweight osd.29 1
>
> And then after recovery
>
> sudo -u ceph ceph osd crush reweight osd.23 2
> sudo -u ceph ceph osd crush reweight osd.24 2
> sudo -u ceph ceph osd crush reweight osd.25 2
> sudo -u ceph ceph osd crush reweight osd.26 2
> sudo -u ceph ceph osd crush reweight osd.27 2
> sudo -u ceph ceph osd crush reweight osd.28 2
> sudo -u ceph ceph osd crush reweight osd.29 2
>
> Etc etc
>
>
> -Original Message-
> From: David C [mailto:dcsysengin...@gmail.com]
> Sent: maandag 3 september 2018 14:34
> To: ceph-users
> Subject: [ceph-users] Luminous new OSD being over filled
>
> Hi all
>
>
> Trying to add a new host to a Luminous cluster, I'm doing one OSD at a
> time. I've only added one so far but it's getting too full.
>
> The drive is the same size (4TB) as all others in the cluster, all OSDs
> have crush weight of 3.63689. Average usage on the drives is 81.70%
>
>
> With the new OSD I start with a crush weight 0 and steadily increase.
> It's currently crush weight 3.0 and is 94.78% full. If I increase to
> 3.63689 it's going to hit too full.
>
>
> It's been a while since I've added a host to an existing cluster. Any
> idea why the drive is getting too full? Do I just have to leave this one
> with a lower crush weight and then continue adding the drives and then
> eventually even out the crush weights?
>
> Thanks
> David
>
>
>
>
>
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Luminous new OSD being over filled

2018-09-03 Thread David C
Hi all

Trying to add a new host to a Luminous cluster, I'm doing one OSD at a
time. I've only added one so far but it's getting too full.

The drive is the same size (4TB) as all others in the cluster, all OSDs
have crush weight of 3.63689. Average usage on the drives is 81.70%

With the new OSD I start with a crush weight 0 and steadily increase. It's
currently crush weight 3.0 and is 94.78% full. If I increase to 3.63689
it's going to hit too full.

It's been a while since I've added a host to an existing cluster. Any idea
why the drive is getting too full? Do I just have to leave this one with a
lower crush weight and then continue adding the drives and then eventually
even out the crush weights?

Thanks
David
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Luminous missing osd_backfill_full_ratio

2018-09-03 Thread David C
In the end it was because I hadn't completed the upgrade with "ceph osd
require-osd-release luminous"; after setting that I had the default
backfillfull ratio (0.9 I think) and was able to change it with "ceph osd
set-backfillfull-ratio".

Potential gotcha for a Jewel -> Luminous upgrade: if you delay the
"...require-osd-release luminous" step for whatever reason, it appears to
leave you with no backfillfull limit.
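
For anyone hitting the same thing, the sequence that sorted it for me was
roughly (ratio value is just an example):

ceph osd require-osd-release luminous
ceph osd dump | grep full_ratio
ceph osd set-backfillfull-ratio 0.85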

Still having a bit of an issue with new OSDs overfilling but I'll start a
new thread for that.

Cheers,

On Thu, Aug 30, 2018 at 10:34 PM David Turner  wrote:

> This moved to the PG map in luminous. I think it might have been there in
> Jewel as well.
>
> http://docs.ceph.com/docs/luminous/man/8/ceph/#pg
> ceph pg set_full_ratio 
> ceph pg set_backfillfull_ratio 
> ceph pg set_nearfull_ratio 
>
>
> On Thu, Aug 30, 2018, 1:57 PM David C  wrote:
>
>> Hi All
>>
>> I feel like this is going to be a silly query with a hopefully simple
>> answer. I don't seem to have the osd_backfill_full_ratio config option on
>> my OSDs and can't inject it. This a Lumimous 12.2.1 cluster that was
>> upgraded from Jewel.
>>
>> I added an OSD to the cluster and woke up the next day to find the OSD
>> had hit OSD_FULL. I'm pretty sure the reason it filled up was because the
>> new host was weighted too high (I initially add two OSDs but decided to
>> only backfill one at a time). The thing that surprised me was why a
>> backfill full ratio didn't kick in to prevent this from happening.
>>
>> One potentially key piece of info is I haven't run the "ceph osd
>> require-osd-release luminous" command yet (I wasn't sure what impact this
>> would have so was waiting for a window with quiet client I/O).
>>
>> ceph osd dump is showing zero for all full ratios:
>>
>> # ceph osd dump | grep full_ratio
>> full_ratio 0
>> backfillfull_ratio 0
>> nearfull_ratio 0
>>
>> Do I simply need to run ceph osd set -backfillfull-ratio? Or am I missing
>> something here. I don't understand why I don't have a default backfill_full
>> ratio on this cluster.
>>
>> Thanks,
>>
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Help Basically..

2018-09-02 Thread David C
Does "ceph health detail" work?
Have you manually confirmed the OSDs on the nodes are working?
What was the replica size of the pools?
Are you seeing any progress with the recovery?
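
A few commands whose output would help narrow it down:

ceph osd tree
ceph osd dump | grep pool      # replica size / min_size per pool
ceph pg dump_stuck inactive
ceph pg dump_stuck unclean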



On Sun, Sep 2, 2018 at 9:42 AM Lee  wrote:

> Running 0.94.5 as part of a Openstack enviroment, our ceph setup is 3x OSD
> Nodes 3x MON Nodes, yesterday we had a aircon outage in our hosting
> enviroment, 1 OSD node failed (offline with a the journal SSD dead) left
> with 2 nodes running correctly, 2 hours later a second OSD node failed
> complaining of readwrite errors to the physical drives, i assume this was a
> heat issue as when rebooted this came back online ok and ceph started to
> repair itself. We have since brought the first failed node back on by
> replacing the ssd and recreating the journals hoping it would all repair..
> Our pools are min 2 repl.
>
> The problem we have is client IO (read) is totally blocked, and when I
> query the stuck PG's it just hangs..
>
> For example the check version command just errors with:
>
> Error EINTR: problem getting command descriptions from on various OSD's so
> I cannot even query the inactive PG's
>
> root@node31-a4:~# ceph -s
> cluster 7c24e1b9-24b3-4a1b-8889-9b2d7fd88cd2
>  health HEALTH_WARN
> 83 pgs backfill
> 2 pgs backfill_toofull
> 3 pgs backfilling
> 48 pgs degraded
> 1 pgs down
> 31 pgs incomplete
> 1 pgs recovering
> 29 pgs recovery_wait
> 1 pgs stale
> 48 pgs stuck degraded
> 31 pgs stuck inactive
> 1 pgs stuck stale
> 148 pgs stuck unclean
> 17 pgs stuck undersized
> 17 pgs undersized
> 599 requests are blocked > 32 sec
> recovery 111489/4697618 objects degraded (2.373%)
> recovery 772268/4697618 objects misplaced (16.440%)
> recovery 1/2171314 unfound (0.000%)
>  monmap e5: 3 mons at {bc07s12-a7=
> 172.27.16.11:6789/0,bc07s13-a7=172.27.16.21:6789/0,bc07s14-a7=172.27.16.15:6789/0
> }
> election epoch 198, quorum 0,1,2
> bc07s12-a7,bc07s14-a7,bc07s13-a7
>  osdmap e18727: 25 osds: 25 up, 25 in; 90 remapped pgs
>   pgmap v70996322: 1792 pgs, 13 pools, 8210 GB data, 2120 kobjects
> 16783 GB used, 6487 GB / 23270 GB avail
> 111489/4697618 objects degraded (2.373%)
> 772268/4697618 objects misplaced (16.440%)
> 1/2171314 unfound (0.000%)
> 1639 active+clean
>   66 active+remapped+wait_backfill
>   30 incomplete
>   25 active+recovery_wait+degraded
>   15 active+undersized+degraded+remapped+wait_backfill
>4 active+recovery_wait+degraded+remapped
>4 active+clean+scrubbing
>2 active+remapped+wait_backfill+backfill_toofull
>1 down+incomplete
>1 active+remapped+backfilling
>1 active+clean+scrubbing+deep
>1 stale+active+undersized+degraded
>1 active+undersized+degraded+remapped+backfilling
>1 active+degraded+remapped+backfilling
>1 active+recovering+degraded
> recovery io 29385 kB/s, 7 objects/s
>   client io 5877 B/s wr, 1 op/s
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Luminous missing osd_backfill_full_ratio

2018-08-30 Thread David C
Hi All

I feel like this is going to be a silly query with a hopefully simple
answer. I don't seem to have the osd_backfill_full_ratio config option on
my OSDs and can't inject it. This is a Luminous 12.2.1 cluster that was
upgraded from Jewel.

I added an OSD to the cluster and woke up the next day to find the OSD had
hit OSD_FULL. I'm pretty sure the reason it filled up was because the new
host was weighted too high (I initially add two OSDs but decided to only
backfill one at a time). The thing that surprised me was why a backfill
full ratio didn't kick in to prevent this from happening.

One potentially key piece of info is I haven't run the "ceph osd
require-osd-release luminous" command yet (I wasn't sure what impact this
would have so was waiting for a window with quiet client I/O).

ceph osd dump is showing zero for all full ratios:

# ceph osd dump | grep full_ratio
full_ratio 0
backfillfull_ratio 0
nearfull_ratio 0

Do I simply need to run ceph osd set-backfillfull-ratio? Or am I missing
something here. I don't understand why I don't have a default backfill_full
ratio on this cluster.

Thanks,
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cephfs meta data pool to ssd and measuring performance difference

2018-07-30 Thread David C
Something like smallfile perhaps? https://github.com/bengland2/smallfile

Or you could just time creating/reading lots of files.

With read benching you would want to ensure you've cleared your mds cache
or use a dataset larger than the cache.
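
A minimal smallfile run might look something like this (thread and file
counts are arbitrary, and I haven't double-checked the exact flags):

python smallfile_cli.py --top /mnt/cephfs/smalltest --operation create \
    --threads 8 --files 20000 --file-size 4
sync; echo 3 > /proc/sys/vm/drop_caches
python smallfile_cli.py --top /mnt/cephfs/smalltest --operation read \
    --threads 8 --files 20000 --file-size 4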

I'd be interested in seeing your results; I have this on the to-do list myself.

On 25 Jul 2018 15:18, "Marc Roos"  wrote:



>From this thread, I got how to move the meta data pool from the hdd's to
the ssd's.
https://www.spinics.net/lists/ceph-users/msg39498.html

ceph osd pool get fs_meta crush_rule
ceph osd pool set fs_meta crush_rule replicated_ruleset_ssd

I guess this can be done on a live system?

What would be a good test to show the performance difference between the
old hdd and the new ssd?


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS+NFS For VMWare

2018-07-02 Thread David C
On Sat, 30 Jun 2018, 21:48 Nick Fisk,  wrote:

> Hi Paul,
>
>
>
> Thanks for your response, is there anything you can go into more detail on
> and share with the list? I’m sure it would be much appreciated by more than
> just myself.
>
>
>
> I was planning on Kernel CephFS and NFS server, both seem to achieve
> better performance, although stability is of greater concern.
>
FWIW, a recent nfs-ganesha could be more stable than kernel NFS. I've had a
fair few issues with knfsd exporting CephFS; it works fine until there is an
issue with your cluster, such as an MDS going down or slow requests, and you
can end up with your nfsd processes in the dreaded uninterruptible sleep.

Also consider CTDB for basic active/active NFS on CephFS; it works fine for
normal Linux clients, though I'm not sure how well it would work with ESXi.
If you want to use CTDB with Ganesha I think you're restricted to using the
plain VFS FSAL; I don't think the Ceph FSAL will give you the consistent
file handles you need for client failover to work properly (although I could
be wrong there).
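
If it helps, the basic CTDB plumbing is just a couple of files on each
gateway, along these lines (addresses are examples):

# /etc/ctdb/nodes - internal IP of every gateway node, same file everywhere
10.0.0.11
10.0.0.12

# /etc/ctdb/public_addresses - floating IPs CTDB moves between nodes
192.168.1.100/24 eth0
192.168.1.101/24 eth0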



>
> Thanks,
>
> Nick
>
> *From:* Paul Emmerich [mailto:paul.emmer...@croit.io]
> *Sent:* 29 June 2018 17:57
> *To:* Nick Fisk 
> *Cc:* ceph-users 
> *Subject:* Re: [ceph-users] CephFS+NFS For VMWare
>
>
>
> VMWare can be quite picky about NFS servers.
>
> Some things that you should test before deploying anything with that in
> production:
>
>
>
> * failover
>
> * reconnects after NFS reboots or outages
>
> * NFS3 vs NFS4
>
> * Kernel NFS (which kernel version? cephfs-fuse or cephfs-kernel?) vs NFS
> Ganesha (VFS FSAL vs. Ceph FSAL)
>
> * Stress tests with lots of VMWare clients - we had a setup than ran fine
> with 5 big VMWare hypervisors but started to get random deadlocks once we
> added 5 more
>
>
>
> We are running CephFS + NFS + VMWare in production but we've encountered
> *a lot* of problems until we got that stable for a few configurations.
>
> Be prepared to debug NFS problems at a low level with tcpdump and a
> careful read of the RFC and NFS server source ;)
>
>
>
> Paul
>
>
>
> 2018-06-29 18:48 GMT+02:00 Nick Fisk :
>
> This is for us peeps using Ceph with VMWare.
>
>
>
> My current favoured solution for consuming Ceph in VMWare is via RBD’s
> formatted with XFS and exported via NFS to ESXi. This seems to perform
> better than iSCSI+VMFS which seems to not play nicely with Ceph’s PG
> contention issues particularly if working with thin provisioned VMDK’s.
>
>
>
> I’ve still been noticing some performance issues however, mainly
> noticeable when doing any form of storage migrations. This is largely due
> to the way vSphere transfers VM’s in 64KB IO’s at a QD of 32. vSphere does
> this so Arrays with QOS can balance the IO easier than if larger IO’s were
> submitted. However Ceph’s PG locking means that only one or two of these
> IO’s can happen at a time, seriously lowering throughput. Typically you
> won’t be able to push more than 20-25MB/s during a storage migration
>
>
>
> There is also another issue in that the IO needed for the XFS journal on
> the RBD, can cause contention and effectively also means every NFS write IO
> sends 2 down to Ceph. This can have an impact on latency as well. Due to
> possible PG contention caused by the XFS journal updates when multiple IO’s
> are in flight, you normally end up making more and more RBD’s to try and
> spread the load. This normally means you end up having to do storage
> migrations…..you can see where I’m getting at here.
>
>
>
> I’ve been thinking for a while that CephFS works around a lot of these
> limitations.
>
>
>
> 1.   It supports fancy striping, so should mean there is less per
> object contention
>
> 2.   There is no FS in the middle to maintain a journal and other
> associated IO
>
> 3.   A single large NFS mount should have none of the disadvantages
> seen with a single RBD
>
> 4.   No need to migrate VM’s about because of #3
>
> 5.   No need to fstrim after deleting VM’s
>
> 6.   Potential to do away with pacemaker and use LVS to do
> active/active NFS as ESXi does its own locking with files
>
>
>
> With this in mind I exported a CephFS mount via NFS and then mounted it to
> an ESXi host as a test.
>
>
>
> Initial results are looking very good. I’m seeing storage migrations to
> the NFS mount going at over 200MB/s, which equates to several thousand IO’s
> and seems to be writing at the intended QD32.
>
>
>
> I need to do more testing to make sure everything works as intended, but
> like I say, promising initial results.
>
>
>
> Further testing needs to be done to see what sort of MDS performance is
> required, I would imagine that since we are mainly dealing with large
> files, it might not be that critical. I also need to consider the stability
> of CephFS, RBD is relatively simple and is in use by a large proportion of
> the Ceph community. CephFS is a lot easier to “upset”.
>
>
>
> Nick
>
>
> ___
> ceph-users 

Re: [ceph-users] Nfs-ganesha 2.6 packages in ceph repo

2018-05-16 Thread David C
Hi Oliver

Thanks for following up. I just picked this up again today and it was
indeed librados2...the package wasn't installed! It's working now, haven't
tested much but I haven't noticed any problems yet. This is with
nfs-ganesha-2.6.1-0.1.el7.x86_64, libcephfs2-12.2.5-0.el7.x86_64 and
librados2-12.2.5-0.el7.x86_64. Thanks for the pointer on that.

I'd be interested to hear your experience with ganesha with cephfs if
you're happy to share some insights. Any tuning you would recommend?

Thanks,

On Wed, May 16, 2018 at 4:14 PM, Oliver Freyermuth <
freyerm...@physik.uni-bonn.de> wrote:

> Hi David,
>
> did you already manage to check your librados2 version and manage to pin
> down the issue?
>
> Cheers,
> Oliver
>
> Am 11.05.2018 um 17:15 schrieb Oliver Freyermuth:
> > Hi David,
> >
> > Am 11.05.2018 um 16:55 schrieb David C:
> >> Hi Oliver
> >>
> >> Thanks for the detailed reponse! I've downgraded my libcephfs2 to
> 12.2.4 and still get a similar error:
> >>
> >> load_fsal :NFS STARTUP :CRIT :Could not dlopen
> module:/usr/lib64/ganesha/libfsalceph.so Error:/lib64/libcephfs.so.2:
> undefined symbol: _Z14common_preinitRK18CephInitParameters1
> 8code_environment_ti
> >> load_fsal :NFS STARTUP :MAJ :Failed to load module 
> >> (/usr/lib64/ganesha/libfsalceph.so)
> because: Can not access a needed shared library
> >>
> >> I'm on CentOS 7.4, using the following package versions:
> >>
> >> # rpm -qa | grep ganesha
> >> nfs-ganesha-2.6.1-0.1.el7.x86_64
> >> nfs-ganesha-vfs-2.6.1-0.1.el7.x86_64
> >> nfs-ganesha-ceph-2.6.1-0.1.el7.x86_64
> >>
> >> # rpm -qa | grep ceph
> >> libcephfs2-12.2.4-0.el7.x86_64
> >> nfs-ganesha-ceph-2.6.1-0.1.el7.x86_64
> >
> > Mhhhm - that sounds like a messup in the dependencies.
> > The symbol you are missing should be provided by
> > librados2-12.2.4-0.el7.x86_64
> > which contains
> > /usr/lib64/ceph/ceph/libcephfs-common.so.0
> > Do you have a different version of librados2 installed? If so, I wonder
> how yum / rpm allowed that ;-).
> >
> > Thinking again, it might also be (if you indeed have a different version
> there) that this is the cause also for the previous error.
> > If the problematic symbol is indeed not exposed, but can be resolved
> only if both libraries (libcephfs-common and libcephfs) are loaded in
> unison with matching versions,
> > it might be that also 12.2.5 works fine...
> >
> > First thing, in any case, is to checkout which version of librados2 you
> are using ;-).
> >
> > Cheers,
> >   Oliver
> >
> >>
> >> I don't have the ceph user space components installed, assuming they're
> not nesscary apart from libcephfs2? Any idea why it's giving me this error?
> >>
> >> Thanks,
> >>
> >> On Fri, May 11, 2018 at 2:17 AM, Oliver Freyermuth <
> freyerm...@physik.uni-bonn.de <mailto:freyerm...@physik.uni-bonn.de>>
> wrote:
> >>
> >> Hi David,
> >>
> >> for what it's worth, we are running with nfs-ganesha 2.6.1 from
> Ceph repos on CentOS 7.4 with the following set of versions:
> >> libcephfs2-12.2.4-0.el7.x86_64
> >> nfs-ganesha-2.6.1-0.1.el7.x86_64
> >> nfs-ganesha-ceph-2.6.1-0.1.el7.x86_64
> >> Of course, we plan to upgrade to 12.2.5 soon-ish...
> >>
> >> Am 11.05.2018 um 00:05 schrieb David C:
> >> > Hi All
> >> >
> >> > I'm testing out the nfs-ganesha-2.6.1-0.1.el7.x86_64.rpm package
> from http://download.ceph.com/nfs-ganesha/rpm-V2.6-stable/luminous/x86_64/
> <http://download.ceph.com/nfs-ganesha/rpm-V2.6-stable/luminous/x86_64/>
> >> >
> >> > It's failing to load /usr/lib64/ganesha/libfsalceph.so
> >> >
> >> > With libcephfs-12.2.1 installed I get the following error in my
> ganesha log:
> >> >
> >> > load_fsal :NFS STARTUP :CRIT :Could not dlopen
> module:/usr/lib64/ganesha/libfsalceph.so Error:
> >> > /usr/lib64/ganesha/libfsalceph.so: undefined symbol:
> ceph_set_deleg_timeout
> >> > load_fsal :NFS STARTUP :MAJ :Failed to load module
> (/usr/lib64/ganesha/libfsalceph.so) because
> >> > : Can not access a needed shared library
> >>
> >> That looks like an ABI incompatibility, probably the nfs-ganesha
> packages should block this libcephfs2-version (and older ones).
> >>
> >> >
> >> >
> >> 

Re: [ceph-users] Cephfs write fail when node goes down

2018-05-15 Thread David C
I've seen similar behavior with cephfs client around that age, try 4.14+

On 15 May 2018 1:57 p.m., "Josef Zelenka" 
wrote:

Client's kernel is 4.4.0. Regarding the hung osd request, i'll have to
check, the issue is gone now, so i'm not sure if i'll find what you are
suggesting. It's rather odd, because Ceph's failover worked for us every
time, so i'm trying to figure out whether it is a ceph or app issue.



On 15/05/18 02:57, Yan, Zheng wrote:
> On Mon, May 14, 2018 at 5:37 PM, Josef Zelenka
>  wrote:
>> Hi everyone, we've encountered an unusual thing in our setup(4 nodes, 48
>> OSDs, 3 monitors - ceph Jewel, Ubuntu 16.04 with kernel 4.4.0).
Yesterday,
>> we were doing a HW upgrade of the nodes, so they went down one by one -
the
>> cluster was in good shape during the upgrade, as we've done this numerous
>> times and we're quite sure that the redundancy wasn't screwed up while
doing
>> this. However, during this upgrade one of the clients that does backups
to
>> cephfs(mounted via the kernel driver) failed to write the backup file
>> correctly to the cluster with the following trace after we turned off
one of
>> the nodes:
>>
>> [2585732.529412]  8800baa279a8 813fb2df 880236230e00
>> 8802339c
>> [2585732.529414]  8800baa28000 88023fc96e00 7fff
>> 8800baa27b20
>> [2585732.529415]  81840ed0 8800baa279c0 818406d5
>> 
>> [2585732.529417] Call Trace:
>> [2585732.529505]  [] ? cpumask_next_and+0x2f/0x40
>> [2585732.529558]  [] ? bit_wait+0x60/0x60
>> [2585732.529560]  [] schedule+0x35/0x80
>> [2585732.529562]  [] schedule_timeout+0x1b5/0x270
>> [2585732.529607]  [] ? kvm_clock_get_cycles+0x1e/0x20
>> [2585732.529609]  [] ? bit_wait+0x60/0x60
>> [2585732.529611]  [] io_schedule_timeout+0xa4/0x110
>> [2585732.529613]  [] bit_wait_io+0x1b/0x70
>> [2585732.529614]  [] __wait_on_bit_lock+0x4e/0xb0
>> [2585732.529652]  [] __lock_page+0xbb/0xe0
>> [2585732.529674]  [] ?
autoremove_wake_function+0x40/0x40
>> [2585732.529676]  [] pagecache_get_page+0x17d/0x1c0
>> [2585732.529730]  [] ? ceph_pool_perm_check+0x48/0x700
>> [ceph]
>> [2585732.529732]  []
grab_cache_page_write_begin+0x26/0x40
>> [2585732.529738]  [] ceph_write_begin+0x48/0xe0 [ceph]
>> [2585732.529739]  [] generic_perform_write+0xce/0x1c0
>> [2585732.529763]  [] ? file_update_time+0xc9/0x110
>> [2585732.529769]  [] ceph_write_iter+0xf89/0x1040
[ceph]
>> [2585732.529792]  [] ?
__alloc_pages_nodemask+0x159/0x2a0
>> [2585732.529808]  [] new_sync_write+0x9b/0xe0
>> [2585732.529811]  [] __vfs_write+0x26/0x40
>> [2585732.529812]  [] vfs_write+0xa9/0x1a0
>> [2585732.529814]  [] SyS_write+0x55/0xc0
>> [2585732.529817]  []
entry_SYSCALL_64_fastpath+0x16/0x71
>>
>>
> is there any hang osd request in /sys/kernel/debug/ceph//osdc?
>
>> I have encountered this behavior on Luminous, but not on Jewel. Anyone
who
>> has a clue why the write fails? As far as i'm concerned, it should always
>> work if all the PGs are available. Thanks
>> Josef
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Nfs-ganesha 2.6 packages in ceph repo

2018-05-11 Thread David C
Hi Oliver

Thanks for the detailed response! I've downgraded my libcephfs2 to 12.2.4
and still get a similar error:

load_fsal :NFS STARTUP :CRIT :Could not dlopen
module:/usr/lib64/ganesha/libfsalceph.so
Error:/lib64/libcephfs.so.2: undefined symbol: _Z14common_
preinitRK18CephInitParameters18code_environment_ti
load_fsal :NFS STARTUP :MAJ :Failed to load module
(/usr/lib64/ganesha/libfsalceph.so)
because: Can not access a needed shared library

I'm on CentOS 7.4, using the following package versions:

# rpm -qa | grep ganesha
nfs-ganesha-2.6.1-0.1.el7.x86_64
nfs-ganesha-vfs-2.6.1-0.1.el7.x86_64
nfs-ganesha-ceph-2.6.1-0.1.el7.x86_64

# rpm -qa | grep ceph
libcephfs2-12.2.4-0.el7.x86_64
nfs-ganesha-ceph-2.6.1-0.1.el7.x86_64

I don't have the Ceph userspace components installed, assuming they're not
necessary apart from libcephfs2? Any idea why it's giving me this error?

Thanks,

On Fri, May 11, 2018 at 2:17 AM, Oliver Freyermuth <
freyerm...@physik.uni-bonn.de> wrote:

> Hi David,
>
> for what it's worth, we are running with nfs-ganesha 2.6.1 from Ceph repos
> on CentOS 7.4 with the following set of versions:
> libcephfs2-12.2.4-0.el7.x86_64
> nfs-ganesha-2.6.1-0.1.el7.x86_64
> nfs-ganesha-ceph-2.6.1-0.1.el7.x86_64
> Of course, we plan to upgrade to 12.2.5 soon-ish...
>
> Am 11.05.2018 um 00:05 schrieb David C:
> > Hi All
> >
> > I'm testing out the nfs-ganesha-2.6.1-0.1.el7.x86_64.rpm package from
> http://download.ceph.com/nfs-ganesha/rpm-V2.6-stable/luminous/x86_64/
> >
> > It's failing to load /usr/lib64/ganesha/libfsalceph.so
> >
> > With libcephfs-12.2.1 installed I get the following error in my ganesha
> log:
> >
> > load_fsal :NFS STARTUP :CRIT :Could not dlopen
> module:/usr/lib64/ganesha/libfsalceph.so Error:
> > /usr/lib64/ganesha/libfsalceph.so: undefined symbol:
> ceph_set_deleg_timeout
> > load_fsal :NFS STARTUP :MAJ :Failed to load module
> (/usr/lib64/ganesha/libfsalceph.so) because
> > : Can not access a needed shared library
>
> That looks like an ABI incompatibility, probably the nfs-ganesha packages
> should block this libcephfs2-version (and older ones).
>
> >
> >
> > With libcephfs-12.2.5 installed I get:
> >
> > load_fsal :NFS STARTUP :CRIT :Could not dlopen
> module:/usr/lib64/ganesha/libfsalceph.so Error:
> > /lib64/libcephfs.so.2: undefined symbol: _ZNK5FSMap10parse_
> roleEN5boost17basic_string_viewIcSt11char_traitsIcEEEP10mds_role_tRSo
> > load_fsal :NFS STARTUP :MAJ :Failed to load module
> (/usr/lib64/ganesha/libfsalceph.so) because
> > : Can not access a needed shared library
>
> That looks ugly and makes me fear for our planned 12.2.5-upgrade.
> Interestingly, we do not have that symbol on 12.2.4:
> # nm -D /lib64/libcephfs.so.2 | grep FSMap
>  U _ZNK5FSMap10parse_roleERKSsP10mds_role_tRSo
>  U _ZNK5FSMap13print_summaryEPN4ceph9FormatterEPSo
> and NFS-Ganesha works fine.
>
> Looking at:
> https://github.com/ceph/ceph/blob/v12.2.4/src/mds/FSMap.h
> versus
> https://github.com/ceph/ceph/blob/v12.2.5/src/mds/FSMap.h
> it seems this commit:
> https://github.com/ceph/ceph/commit/7d8b3c1082b6b870710989773f3cd9
> 8a472b9a3d
> changed libcephfs2 ABI.
>
> I've no idea how that's usually handled and whether ABI breakage should
> occur within point releases (I would not have expected that...).
> At least, this means either:
> - ABI needs to be reverted to the old state.
> - A new NFS Ganesha build is needed. Probably, if this is a common thing,
> builds should be automated and be synchronized to ceph releases,
>   and old versions should be kept around.
>
> I'll hold back our update to 12.2.5 until this is resolved, so many thanks
> from my side!
>
> Let's see who jumps in to resolve it...
>
> Cheers,
> Oliver
> >
> >
> > My cluster is running 12.2.1
> >
> > All package versions:
> >
> > nfs-ganesha-2.6.1-0.1.el7.x86_64
> > nfs-ganesha-ceph-2.6.1-0.1.el7.x86_64
> > libcephfs2-12.2.5-0.el7.x86_64
> >
> > Can anyone point me in the right direction?
> >
> > Thanks,
> > David
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Nfs-ganesha 2.6 packages in ceph repo

2018-05-10 Thread David C
Hi All

I'm testing out the nfs-ganesha-2.6.1-0.1.el7.x86_64.rpm package from
http://download.ceph.com/nfs-ganesha/rpm-V2.6-stable/luminous/x86_64/

It's failing to load /usr/lib64/ganesha/libfsalceph.so

With libcephfs-12.2.1 installed I get the following error in my ganesha log:

load_fsal :NFS STARTUP :CRIT :Could not dlopen
> module:/usr/lib64/ganesha/libfsalceph.so Error:
> /usr/lib64/ganesha/libfsalceph.so: undefined symbol: ceph_set_deleg_timeout
> load_fsal :NFS STARTUP :MAJ :Failed to load module
> (/usr/lib64/ganesha/libfsalceph.so) because
> : Can not access a needed shared library
>

With libcephfs-12.2.5 installed I get:

load_fsal :NFS STARTUP :CRIT :Could not dlopen
> module:/usr/lib64/ganesha/libfsalceph.so Error:
> /lib64/libcephfs.so.2: undefined symbol:
> _ZNK5FSMap10parse_roleEN5boost17basic_string_viewIcSt11char_traitsIcEEEP10mds_role_tRSo
> load_fsal :NFS STARTUP :MAJ :Failed to load module
> (/usr/lib64/ganesha/libfsalceph.so) because
> : Can not access a needed shared library
>

My cluster is running 12.2.1

All package versions:

nfs-ganesha-2.6.1-0.1.el7.x86_64
nfs-ganesha-ceph-2.6.1-0.1.el7.x86_64
libcephfs2-12.2.5-0.el7.x86_64

Can anyone point me in the right direction?

Thanks,
David
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Poor read performance.

2018-04-25 Thread David C
How does your rados bench look?

Have you tried playing around with read ahead and striping?
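
If it's useful, a minimal baseline along these lines is where I'd start (the
pool name, device and values are placeholders, adjust for your environment):

# raw RADOS baseline from a client on the OSD network
rados bench -p testpool 60 write --no-cleanup
rados bench -p testpool 60 seq
rados bench -p testpool 60 rand
rados -p testpool cleanup

# read ahead for a kernel-mapped rbd device (inside a VM the device would be
# e.g. vda; for librbd/QEMU guests the rbd readahead options in ceph.conf
# play a similar role)
echo 4096 > /sys/block/rbd0/queue/read_ahead_kb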


On Tue, 24 Apr 2018 17:53 Jonathan Proulx,  wrote:

> Hi All,
>
> I seem to be seeing consistently poor read performance on my cluster
> relative to both write performance and the read performance of a single
> backend disk, by quite a lot.
>
> The cluster is luminous with 174 7.2k SAS drives across 12 storage servers
> with 10G ethernet and jumbo frames.  Drives are a mix of 4T and 2T,
> bluestore with DB on SSD.
>
> The performance I really care about is over rbd for VMs in my
> OpenStack, but 'rbd bench' seems to line up pretty well with 'fio' tests
> inside VMs, so here is a more or less typical random write rbd bench (from a
> monitor node with a 10G connection on the same net as the osds):
>
> rbd bench  --io-total=4G --io-size 4096 --io-type write \
> --io-pattern rand --io-threads 16 mypool/myvol
>
> 
>
> elapsed:   361  ops:  1048576  ops/sec:  2903.82  bytes/sec: 11894034.98
>
> The same for random read is an order of magnitude lower:
>
> rbd bench  --io-total=4G --io-size 4096 --io-type read \
> --io-pattern rand --io-threads 16  mypool/myvol
>
> elapsed:  3354  ops:  1048576  ops/sec:   312.60  bytes/sec: 1280403.47
>
> (sequential reads and a bigger io-size help, but not a lot)
>
> ceph -s from during the read bench, to get a sense of the relative traffic:
>
>   cluster:
> id: 
> health: HEALTH_OK
>
>   services:
> mon: 3 daemons, quorum ceph-mon0,ceph-mon1,ceph-mon2
> mgr: ceph-mon0(active), standbys: ceph-mon2, ceph-mon1
> osd: 174 osds: 174 up, 174 in
> rgw: 3 daemon active
>
>   data:
> pools:   19 pools, 10240 pgs
> objects: 17342k objects, 80731 GB
> usage:   240 TB used, 264 TB / 505 TB avail
> pgs: 10240 active+clean
>
>   io:
> client:   4296 kB/s rd, 417 MB/s wr, 1635 op/s rd, 3518 op/s wr
>
>
> During deep-scrubs overnight I can see the disks doing >500MBps reads
> and ~150 read iops (each at peak), while during the read bench (including all
> traffic from ~1k VMs) individual osd data partitions peak around 25
> read iops and 1.5MBps read bandwidth, so it seems like there should be
> performance to spare.
>
> Obviously, given my disk choices, this isn't designed as a particularly
> high performance setup, but I do expect a bit more performance out of
> it.
>
> Are my expectations wrong? If not, any clues as to what I've done (or failed
> to do) that is wrong?
>
> Pretty sure read/write was much more symmetric in earlier versions (a subset
> of the same hardware and the filestore backend) but I used a different perf
> tool so I don't want to make direct comparisons.
>
> -Jon
>
> --
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephfs performance issue

2018-03-29 Thread David C
Pretty sure you're getting stung by: http://tracker.ceph.com/issues/17563

Consider using an elrepo kernel, 4.14 works well for me.
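
For anyone following along, the rough steps to pull in a mainline kernel from
elrepo on CentOS 7 are something like the below (the release RPM URL is from
memory, so double check it against elrepo.org before running):

rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
yum install https://www.elrepo.org/elrepo-release-7.el7.elrepo.noarch.rpm
yum --enablerepo=elrepo-kernel install kernel-ml
grub2-set-default 0
reboot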



On Thu, 29 Mar 2018, 09:46 Dan van der Ster,  wrote:

> On Thu, Mar 29, 2018 at 10:31 AM, Robert Sander
>  wrote:
> > On 29.03.2018 09:50, ouyangxu wrote:
> >
> >> I'm using Ceph 12.2.4 with CentOS 7.4, and tring to use cephfs for
> >> MariaDB deployment,
> >
> > Don't do this.
> > As the old saying goes: If it hurts, stop doing it.
>
> Why not? Let's find out where and why the perf is lacking, then fix it!
>
> -- dan
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cephfs MDS slow requests

2018-03-14 Thread David C
Thanks, John. I'm pretty sure the root of my slow OSD issues is filestore
subfolder splitting.


On Wed, Mar 14, 2018 at 2:17 PM, John Spray <jsp...@redhat.com> wrote:

> On Tue, Mar 13, 2018 at 7:17 PM, David C <dcsysengin...@gmail.com> wrote:
> > Hi All
> >
> > I have a Samba server that is exporting directories from a Cephfs Kernel
> > mount. Performance has been pretty good for the last year but users have
> > recently been complaining of short "freezes", these seem to coincide with
> > MDS related slow requests in the monitor ceph.log such as:
> >
> >> 2018-03-13 13:34:58.461030 osd.15 osd.15 10.10.10.211:6812/13367 5752 :
> >> cluster [WRN] slow request 31.834418 seconds old, received at 2018-03-13
> >> 13:34:26.626474: osd_repop(mds.0.5495:810644 3.3e e14085/14019
> >> 3:7cea5bac:::10001a88b8f.:head v 14085'846936) currently
> commit_sent
> >> 2018-03-13 13:34:59.461270 osd.15 osd.15 10.10.10.211:6812/13367 5754 :
> >> cluster [WRN] slow request 32.832059 seconds old, received at 2018-03-13
> >> 13:34:26.629151: osd_repop(mds.0.5495:810671 2.dc2 e14085/14020
> >> 2:43bdcc3f:::10001e91a91.:head v 14085'21394) currently
> commit_sent
> >> 2018-03-13 14:23:57.409427 osd.30 osd.30 10.10.10.212:6824/14997 5708 :
> >> cluster [WRN] slow request 30.536832 seconds old, received at 2018-03-13
> >> 14:23:26.872513: osd_repop(mds.0.5495:865403 2.fb6 e14085/14077
> >> 2:6df955ef:::10001e93542.00c4:head v 14085'21296) currently
> commit_sent
> >> 2018-03-13 14:23:57.409449 osd.30 osd.30 10.10.10.212:6824/14997 5709 :
> >> cluster [WRN] slow request 30.529640 seconds old, received at 2018-03-13
> >> 14:23:26.879704: osd_repop(mds.0.5495:865407 2.595 e14085/14019
> >> 2:a9a56101:::10001e93542.00c8:head v 14085'20437) currently
> commit_sent
> >> 2018-03-13 14:23:57.409453 osd.30 osd.30 10.10.10.212:6824/14997 5710 :
> >> cluster [WRN] slow request 30.503138 seconds old, received at 2018-03-13
> >> 14:23:26.906207: osd_repop(mds.0.5495:865423 2.ea e14085/14055
> >> 2:57096bbf:::10001e93542.00d8:head v 14085'21147) currently
> commit_sent
> >
> >
> > --
> >
> > Looking in the MDS log, with debug set to 4, it's full of
> "setfilelockrule
> > 1" and "setfilelockrule 2":
> >
> >> 2018-03-13 14:23:00.446905 7fde43e73700  4 mds.0.server
> >> handle_client_request client_request(client.9174621:141162337
> >> setfilelockrule 1, type 4, owner 14971048052668053939, pid 7, start 120,
> >> length 1, wait 0 #0x10001e8dc37 2018-03-13 14:22:58.838521
> caller_uid=1155,
> >> caller_gid=1131{}) v2
> >> 2018-03-13 14:23:00.447050 7fde43e73700  4 mds.0.server
> >> handle_client_request client_request(client.9174621:141162338
> >> setfilelockrule 2, type 4, owner 14971048137043556787, pid 4632, start
> 0,
> >> length 0, wait 0 #0x10001e8dc37 2018-03-13 14:22:58.838521 caller_uid=0,
> >> caller_gid=0{}) v2
> >> 2018-03-13 14:23:00.447258 7fde43e73700  4 mds.0.server
> >> handle_client_request client_request(client.9174621:141162339
> >> setfilelockrule 2, type 4, owner 14971048137043550643, pid 4632, start
> 0,
> >> length 0, wait 0 #0x10001e8dc37 2018-03-13 14:22:58.838521 caller_uid=0,
> >> caller_gid=0{}) v2
> >> 2018-03-13 14:23:00.447393 7fde43e73700  4 mds.0.server
> >> handle_client_request client_request(client.9174621:141162340
> >> setfilelockrule 1, type 4, owner 14971048052668053939, pid 7, start 124,
> >> length 1, wait 0 #0x10001e8dc37 2018-03-13 14:22:58.838521
> caller_uid=1155,
> >> caller_gid=1131{}) v2
>
> The MDS reporting slow requests when file locking is in use is a bug; the
> ticket is:
> http://tracker.ceph.com/issues/22428
>
> Probably only indirectly related to the stuck OSD requests: perhaps
> the application itself is having trouble promptly releasing locks
> because it is hung up on flushing its data to slow OSDs.
>
> John
>
> >
> > --
> >
> > I don't have a particularly good monitoring set up on this cluster yet,
> but
> > a cursory look at a few things such as iostat doesn't seem to suggest
> OSDs
> > are being hammered.
> >
> > Some questions:
> >
> > 1) Can anyone recommend a way of diagnosing this issue?
> > 2) Are the multiple "setfilelockrule" per inode to be expected? I assume
> > this is something to do with the Samba oplocks.
> > 3) What's the recommended highest MDS debug setting before performance
> > starts to be adversely 

Re: [ceph-users] Luminous | PG split causing slow requests

2018-03-14 Thread David C
On Mon, Feb 26, 2018 at 6:08 PM, David Turner <drakonst...@gmail.com> wrote:

> The slow requests are absolutely expected on filestore subfolder
> splitting.  You can however stop an OSD, split it's subfolders, and start
> it back up.  I perform this maintenance once/month.  I changed my settings
> to [1]these, but I only suggest doing something this drastic if you're
> committed to manually split your PGs regularly.  In my environment that
> needs to be once/month.
>

Hi David, to be honest I've still not completely got my head around the
filestore splitting, but one thing's for sure it's causing major IO issues
on my small cluster. If I understand correctly, your settings in [1]
completely disable "online" merging and splitting. Have I got that right?

Why is your filestore_merge_threshold -16 as opposed to -1?

You say you need to do your offline splitting on a monthly basis in your
environment but how are you arriving at that conclusion? What would I need
to monitor to discover how frequently I would need to do a split?

Thanks for all your help on this
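
For reference, my understanding of the manual equivalent of your script, per
OSD, is roughly the following sketch (the OSD id and pool name are
placeholders, and I'd read the script itself before trusting this):

ceph osd set noout
systemctl stop ceph-osd@12

# apply the split/merge thresholds currently in ceph.conf to the PG dirs
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
    --journal-path /var/lib/ceph/osd/ceph-12/journal \
    --op apply-layout-settings --pool cephfs_data

systemctl start ceph-osd@12
ceph osd unset noout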


>
> Along with those settings, I use [2]this script to perform the subfolder
> splitting. It will change your config file to [3]these settings, perform
> the subfolder splitting, change them back to what you currently have, and
> start your OSDs back up.  using a negative merge threshold prevents
> subfolder merging which is useful for some environments.
>
> The script automatically sets noout and unset it for you afterwards as
> well it won't start unless the cluster is health_ok.  Feel free to use it
> as is or pick from it what's useful for you.  I highly suggest that anyone
> feeling the pains of subfolder splitting to do some sort of offline
> splitting to get through it.  If you're using some sort of config
> management like salt or puppet, be sure to disable it so that the config
> won't be overwritten while the subfolders are being split.
>
>
> [1] filestore_merge_threshold = -16
>  filestore_split_multiple = 256
>
> [2] https://gist.github.com/drakonstein/cb76c7696e65522ab0e699b7ea1ab1c4
>
> [3] filestore_merge_threshold = -1
>  filestore_split_multiple = 1
> On Mon, Feb 26, 2018 at 12:18 PM David C <dcsysengin...@gmail.com> wrote:
>
>> Thanks, David. I think I've probably used the wrong terminology here, I'm
>> not splitting PGs to create more PGs. This is the PG folder splitting that
>> happens automatically, I believe it's controlled by the
>> "filestore_split_multiple" setting (which is 8 on my OSDs, I believe that's
>> the Luminous default...). Increasing heartbeat grace would probably still
>> be a good idea to prevent the flapping. I'm trying to understand if the
>> slow requests is to be expected or if I need to tune something or look at
>> hardware.
>>
>> On Mon, Feb 26, 2018 at 4:19 PM, David Turner <drakonst...@gmail.com>
>> wrote:
>>
>>> Splitting PG's is one of the most intensive and disruptive things you
>>> can, and should, do to a cluster.  Tweaking recovery sleep, max backfills,
>>> and heartbeat grace should help with this.  Heartbeat grace can be set high
>>> enough to mitigate the OSDs flapping which slows things down by peering and
>>> additional recovery, while still being able to detect OSDs that might fail
>>> and go down.  The recovery sleep and max backfills are the settings you
>>> want to look at for mitigating slow requests.  I generally tweak those
>>> while watching iostat of some OSDs and ceph -s to make sure I'm not giving
>>> too  much priority to the recovery operations so that client IO can still
>>> happen.
>>>
>>> On Mon, Feb 26, 2018 at 11:10 AM David C <dcsysengin...@gmail.com>
>>> wrote:
>>>
>>>> Hi All
>>>>
>>>> I have a 12.2.1 cluster, all filestore OSDs, OSDs are spinners,
>>>> journals on NVME. Cluster primarily used for CephFS, ~20M objects.
>>>>
>>>> I'm seeing some OSDs getting marked down, it appears to be related to
>>>> PG splitting, e.g:
>>>>
>>>> 2018-02-26 10:27:27.935489 7f140dbe2700  1 _created [C,D] has 5121
>>>>> objects, starting split.
>>>>>
>>>>
>>>> Followed by:
>>>>
>>>> 2018-02-26 10:27:58.242551 7f141cc3f700  0 log_channel(cluster) log
>>>>> [WRN] : 9 slow requests, 5 included below; oldest blocked for > 30.308128
>>>>> secs
>>>>> 2018-02-26 10:27:58.242563 7f141cc3f700  0 log_channel(cluster) log
>>>>> [WRN] : slow request 30.151105 seconds old, received at 2018-02-26
>>>>> 10

Re: [ceph-users] Cephfs MDS slow requests

2018-03-13 Thread David C
Thanks for the detailed response, Greg. A few follow ups inline:

On 13 Mar 2018 20:52, "Gregory Farnum" <gfar...@redhat.com> wrote:

On Tue, Mar 13, 2018 at 12:17 PM, David C <dcsysengin...@gmail.com> wrote:
> Hi All
>
> I have a Samba server that is exporting directories from a Cephfs Kernel
> mount. Performance has been pretty good for the last year but users have
> recently been complaining of short "freezes", these seem to coincide with
> MDS related slow requests in the monitor ceph.log such as:
>
>> 2018-03-13 13:34:58.461030 osd.15 osd.15 10.10.10.211:6812/13367 5752 :
>> cluster [WRN] slow request 31.834418 seconds old, received at 2018-03-13
>> 13:34:26.626474: osd_repop(mds.0.5495:810644 3.3e e14085/14019
>> 3:7cea5bac:::10001a88b8f.:head v 14085'846936) currently
commit_sent
>> 2018-03-13 13:34:59.461270 osd.15 osd.15 10.10.10.211:6812/13367 5754 :
>> cluster [WRN] slow request 32.832059 seconds old, received at 2018-03-13
>> 13:34:26.629151: osd_repop(mds.0.5495:810671 2.dc2 e14085/14020
>> 2:43bdcc3f:::10001e91a91.:head v 14085'21394) currently
commit_sent
>> 2018-03-13 14:23:57.409427 osd.30 osd.30 10.10.10.212:6824/14997 5708 :
>> cluster [WRN] slow request 30.536832 seconds old, received at 2018-03-13
>> 14:23:26.872513: osd_repop(mds.0.5495:865403 2.fb6 e14085/14077
>> 2:6df955ef:::10001e93542.00c4:head v 14085'21296) currently
commit_sent
>> 2018-03-13 14:23:57.409449 osd.30 osd.30 10.10.10.212:6824/14997 5709 :
>> cluster [WRN] slow request 30.529640 seconds old, received at 2018-03-13
>> 14:23:26.879704: osd_repop(mds.0.5495:865407 2.595 e14085/14019
>> 2:a9a56101:::10001e93542.00c8:head v 14085'20437) currently
commit_sent
>> 2018-03-13 14:23:57.409453 osd.30 osd.30 10.10.10.212:6824/14997 5710 :
>> cluster [WRN] slow request 30.503138 seconds old, received at 2018-03-13
>> 14:23:26.906207: osd_repop(mds.0.5495:865423 2.ea e14085/14055
>> 2:57096bbf:::10001e93542.00d8:head v 14085'21147) currently
commit_sent

Well, that means your OSDs are getting operations that commit quickly
to a journal but are taking a while to get into the backing
filesystem. (I assume this is on filestore based on that message
showing up at all, but could be missing something.)


Yep it's filestore. Journals are on Intel P3700 NVME, data and metadata
pools both on 7200rpm SATA. Sounds like I might benefit from moving
metadata to a dedicated SSD pool.

In the meantime, are there any recommended tunables? Filestore max/min sync
interval for example?
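
For what it's worth, these are the knobs I had in mind, checked and set
roughly like this (the numbers are only illustrative, not a recommendation):

# current values, via the admin socket on an OSD node
ceph daemon osd.0 config show | grep sync_interval

# example ceph.conf override
[osd]
    filestore min sync interval = 0.01
    filestore max sync interval = 10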


>
>
> --
>
> Looking in the MDS log, with debug set to 4, it's full of "setfilelockrule
> 1" and "setfilelockrule 2":
>
>> 2018-03-13 14:23:00.446905 7fde43e73700  4 mds.0.server
>> handle_client_request client_request(client.9174621:141162337
>> setfilelockrule 1, type 4, owner 14971048052668053939, pid 7, start 120,
>> length 1, wait 0 #0x10001e8dc37 2018-03-13 14:22:58.838521
caller_uid=1155,
>> caller_gid=1131{}) v2
>> 2018-03-13 14:23:00.447050 7fde43e73700  4 mds.0.server
>> handle_client_request client_request(client.9174621:141162338
>> setfilelockrule 2, type 4, owner 14971048137043556787, pid 4632, start 0,
>> length 0, wait 0 #0x10001e8dc37 2018-03-13 14:22:58.838521 caller_uid=0,
>> caller_gid=0{}) v2
>> 2018-03-13 14:23:00.447258 7fde43e73700  4 mds.0.server
>> handle_client_request client_request(client.9174621:141162339
>> setfilelockrule 2, type 4, owner 14971048137043550643, pid 4632, start 0,
>> length 0, wait 0 #0x10001e8dc37 2018-03-13 14:22:58.838521 caller_uid=0,
>> caller_gid=0{}) v2
>> 2018-03-13 14:23:00.447393 7fde43e73700  4 mds.0.server
>> handle_client_request client_request(client.9174621:141162340
>> setfilelockrule 1, type 4, owner 14971048052668053939, pid 7, start 124,
>> length 1, wait 0 #0x10001e8dc37 2018-03-13 14:22:58.838521
caller_uid=1155,
>> caller_gid=1131{}) v2

And that is clients setting (and releasing) advisory locks on files. I
don't think this should directly have anything to do with the slow OSD
requests (file locking is ephemeral state, not committed to disk), but
if you have new applications running which are taking file locks on
shared files that could definitely impede other clients and slow
things down more generally.
-Greg


Sounds like that could be a red herring then. It seems like my issue is
users chucking lots of small writes at the cephfs mount.


>
>
> --
>
> I don't have a particularly good monitoring set up on this cluster yet,
but
> a cursory look at a few things such as iostat doesn't seem to suggest OSDs
> are being hammered.
>
> Some questions:
>
> 1) Can anyone recommend a way of diagnosing this issue?
&

[ceph-users] Cephfs MDS slow requests

2018-03-13 Thread David C
Hi All

I have a Samba server that is exporting directories from a Cephfs Kernel
mount. Performance has been pretty good for the last year but users have
recently been complaining of short "freezes", these seem to coincide with
MDS related slow requests in the monitor ceph.log such as:

2018-03-13 13:34:58.461030 osd.15 osd.15 10.10.10.211:6812/13367 5752 :
> cluster [WRN] slow request 31.834418 seconds old, received at 2018-03-13
> 13:34:26.626474: osd_repop(mds.0.5495:810644 3.3e e14085/14019
> 3:7cea5bac:::10001a88b8f.:head v 14085'846936) currently commit_sent
> 2018-03-13 13:34:59.461270 osd.15 osd.15 10.10.10.211:6812/13367 5754 :
> cluster [WRN] slow request 32.832059 seconds old, received at 2018-03-13
> 13:34:26.629151: osd_repop(mds.0.5495:810671 2.dc2 e14085/14020
> 2:43bdcc3f:::10001e91a91.:head v 14085'21394) currently commit_sent
> 2018-03-13 14:23:57.409427 osd.30 osd.30 10.10.10.212:6824/14997 5708 :
> cluster [WRN] slow request 30.536832 seconds old, received at 2018-03-13
> 14:23:26.872513: osd_repop(mds.0.5495:865403 2.fb6 e14085/14077
> 2:6df955ef:::10001e93542.00c4:head v 14085'21296) currently commit_sent
> 2018-03-13 14:23:57.409449 osd.30 osd.30 10.10.10.212:6824/14997 5709 :
> cluster [WRN] slow request 30.529640 seconds old, received at 2018-03-13
> 14:23:26.879704: osd_repop(mds.0.5495:865407 2.595 e14085/14019
> 2:a9a56101:::10001e93542.00c8:head v 14085'20437) currently commit_sent
> 2018-03-13 14:23:57.409453 osd.30 osd.30 10.10.10.212:6824/14997 5710 :
> cluster [WRN] slow request 30.503138 seconds old, received at 2018-03-13
> 14:23:26.906207: osd_repop(mds.0.5495:865423 2.ea e14085/14055
> 2:57096bbf:::10001e93542.00d8:head v 14085'21147) currently commit_sent


-- 

Looking in the MDS log, with debug set to 4, it's full of "setfilelockrule
1" and "setfilelockrule 2":

2018-03-13 14:23:00.446905 7fde43e73700  4 mds.0.server
> handle_client_request client_request(client.9174621:141162337
> setfilelockrule 1, type 4, owner 14971048052668053939, pid 7, start 120,
> length 1, wait 0 #0x10001e8dc37 2018-03-13 14:22:58.838521 caller_uid=1155,
> caller_gid=1131{}) v2
> 2018-03-13 14:23:00.447050 7fde43e73700  4 mds.0.server
> handle_client_request client_request(client.9174621:141162338
> setfilelockrule 2, type 4, owner 14971048137043556787, pid 4632, start 0,
> length 0, wait 0 #0x10001e8dc37 2018-03-13 14:22:58.838521 caller_uid=0,
> caller_gid=0{}) v2
> 2018-03-13 14:23:00.447258 7fde43e73700  4 mds.0.server
> handle_client_request client_request(client.9174621:141162339
> setfilelockrule 2, type 4, owner 14971048137043550643, pid 4632, start 0,
> length 0, wait 0 #0x10001e8dc37 2018-03-13 14:22:58.838521 caller_uid=0,
> caller_gid=0{}) v2
> 2018-03-13 14:23:00.447393 7fde43e73700  4 mds.0.server
> handle_client_request client_request(client.9174621:141162340
> setfilelockrule 1, type 4, owner 14971048052668053939, pid 7, start 124,
> length 1, wait 0 #0x10001e8dc37 2018-03-13 14:22:58.838521 caller_uid=1155,
> caller_gid=1131{}) v2


-- 

I don't have a particularly good monitoring set up on this cluster yet, but
a cursory look at a few things such as iostat doesn't seem to suggest OSDs
are being hammered.

Some questions:

1) Can anyone recommend a way of diagnosing this issue?
2) Are the multiple "setfilelockrule" per inode to be expected? I assume
this is something to do with the Samba oplocks.
3) What's the recommended highest MDS debug setting before performance
starts to be adversely affected (I'm aware log files will get huge)?
4) What's the best way of matching inodes in the MDS log to the file names
in cephfs?

Hardware/Versions:

Luminous 12.1.1
Cephfs client 3.10.0-514.2.2.el7.x86_64
Samba 4.4.4
4 node cluster, each node 1xIntel 3700 NVME, 12x SATA, 40Gbps networking

Thanks in advance!

Cheers,
David
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Corrupted files on CephFS since Luminous upgrade

2018-02-28 Thread David C
On 27 Feb 2018 06:46, "Jan Pekař - Imatic" <jan.pe...@imatic.cz> wrote:

I think I hit the same issue.
I have corrupted data on cephfs and I don't remember having this issue before
Luminous (I did the same tests before).

It is on my test 1-node cluster with less memory than recommended (so the
server is swapping), but it shouldn't lose data (it never did before).
So slow requests may appear in the log like Florent B mentioned.

My test is to take some bigger files (a few GB) and copy them to cephfs, or
from cephfs to cephfs, and stress the cluster so the data copying stalls for a
while. It resumes after a few seconds/minutes and everything looks ok (no
error on copying). But the copied file may be silently corrupted.

I checked the files with MD5SUM and compared some corrupted files in detail.
They were missing some 4MB blocks of data (the cephfs object size) - the
corrupted file had those blocks filled with zeroes.

My idea is that something goes wrong when the cluster is under pressure and
the client wants to save a block. The client gets an OK and continues with the
next block, so the data is lost and the corrupted block is filled with zeros.

I tried the 4.x kernel client and the ceph-fuse client with the same result.

I'm using erasure coding for the cephfs data pool with a cache tier, and my
storage is a mix of bluestore and filestore.

How can I help to debug, or what should I do to help find the problem?


Always worrying to see the dreaded C word. I operate a Luminous cluster
with a pretty varied workload and have yet to see any signs of corruption,
although of course that doesn't mean it's not happening. Initial questions:

- What's the history of your cluster? Was this an upgrade or a fresh Luminous
install?
- Was ceph healthy when you ran this test?
- Are you accessing this one-node cluster from the node itself or from a
separate client?

I'd recommend starting a new thread with more details; it sounds like it's
pretty reproducible for you, so maybe crank up your debugging and send logs.
http://docs.ceph.com/docs/luminous/dev/kernel-client-troubleshooting/
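
If you do crank up debugging, a rough sketch of what that looks like for each
client type (this assumes debugfs is available on the kernel-client box, and
the paths/levels are the usual ones rather than anything specific to your
setup):

# kernel client: verbose output from the ceph/libceph modules into dmesg
mount -t debugfs none /sys/kernel/debug    # if not already mounted
echo 'module ceph +p'    > /sys/kernel/debug/dynamic_debug/control
echo 'module libceph +p' > /sys/kernel/debug/dynamic_debug/control

# ceph-fuse: raise client logging in ceph.conf on the client
[client]
    debug client = 20
    debug objecter = 20
    log file = /var/log/ceph/ceph-fuse.log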


With regards
Jan Pekar


On 14.12.2017 15:41, Yan, Zheng wrote:

> On Thu, Dec 14, 2017 at 8:52 PM, Florent B <flor...@coppint.com> wrote:
>
>> On 14/12/2017 03:38, Yan, Zheng wrote:
>>
>>> On Thu, Dec 14, 2017 at 12:49 AM, Florent B <flor...@coppint.com> wrote:
>>>
>>>>
>>>> Systems are on Debian Jessie : kernel 3.16.0-4-amd64 & libfuse 2.9.3-15.
>>>>
>>>> I don't know pattern of corruption, but according to error message in
>>>> Dovecot, it seems to expect data to read but reach EOF.
>>>>
>>>> All seems fine using fuse_disable_pagecache (no more corruption, and
>>>> performance increased : no more MDS slow requests on filelock requests).
>>>>
>>>
>>> I checked ceph-fuse changes since kraken, didn't find any clue. I
>>> would be helpful if you can try recent version kernel.
>>>
>>> Regards
>>> Yan, Zheng
>>>
>>
>> Problem occurred this morning even with fuse_disable_pagecache=true.
>>
>> It seems to be a lock issue between imap & lmtp processes.
>>
>> Dovecot uses fcntl as locking method. Is there any change about it in
>> Luminous ? I switched to flock to see if problem is still there...
>>
>>
> I don't remember there being any change.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
-- 

Ing. Jan Pekař
jan.pe...@imatic.cz | +420603811737

Imatic | Jagellonská 14 | Praha 3 | 130 00
http://www.imatic.cz

--

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Luminous | PG split causing slow requests

2018-02-27 Thread David C
This is super helpful, thanks for sharing, David. I need to do a bit more
reading on this.

On 26 Feb 2018 6:08 p.m., "David Turner" <drakonst...@gmail.com> wrote:

The slow requests are absolutely expected on filestore subfolder
splitting.  You can however stop an OSD, split it's subfolders, and start
it back up.  I perform this maintenance once/month.  I changed my settings
to [1]these, but I only suggest doing something this drastic if you're
committed to manually split your PGs regularly.  In my environment that
needs to be once/month.

Along with those settings, I use [2]this script to perform the subfolder
splitting. It will change your config file to [3]these settings, perform
the subfolder splitting, change them back to what you currently have, and
start your OSDs back up.  Using a negative merge threshold prevents
subfolder merging, which is useful for some environments.

The script automatically sets noout and unsets it for you afterwards; as well,
it won't start unless the cluster is health_ok.  Feel free to use it as is
or pick from it what's useful for you.  I highly suggest that anyone
feeling the pains of subfolder splitting to do some sort of offline
splitting to get through it.  If you're using some sort of config
management like salt or puppet, be sure to disable it so that the config
won't be overwritten while the subfolders are being split.


[1] filestore_merge_threshold = -16
 filestore_split_multiple = 256

[2] https://gist.github.com/drakonstein/cb76c7696e65522ab0e699b7ea1ab1c4

[3] filestore_merge_threshold = -1
 filestore_split_multiple = 1
On Mon, Feb 26, 2018 at 12:18 PM David C <dcsysengin...@gmail.com> wrote:

> Thanks, David. I think I've probably used the wrong terminology here, I'm
> not splitting PGs to create more PGs. This is the PG folder splitting that
> happens automatically, I believe it's controlled by the
> "filestore_split_multiple" setting (which is 8 on my OSDs, I believe that's
> the Luminous default...). Increasing heartbeat grace would probably still
> be a good idea to prevent the flapping. I'm trying to understand if the
> slow requests is to be expected or if I need to tune something or look at
> hardware.
>
> On Mon, Feb 26, 2018 at 4:19 PM, David Turner <drakonst...@gmail.com>
> wrote:
>
>> Splitting PG's is one of the most intensive and disruptive things you
>> can, and should, do to a cluster.  Tweaking recovery sleep, max backfills,
>> and heartbeat grace should help with this.  Heartbeat grace can be set high
>> enough to mitigate the OSDs flapping which slows things down by peering and
>> additional recovery, while still being able to detect OSDs that might fail
>> and go down.  The recovery sleep and max backfills are the settings you
>> want to look at for mitigating slow requests.  I generally tweak those
>> while watching iostat of some OSDs and ceph -s to make sure I'm not giving
>> too  much priority to the recovery operations so that client IO can still
>> happen.
>>
>> On Mon, Feb 26, 2018 at 11:10 AM David C <dcsysengin...@gmail.com> wrote:
>>
>>> Hi All
>>>
>>> I have a 12.2.1 cluster, all filestore OSDs, OSDs are spinners, journals
>>> on NVME. Cluster primarily used for CephFS, ~20M objects.
>>>
>>> I'm seeing some OSDs getting marked down, it appears to be related to PG
>>> splitting, e.g:
>>>
>>> 2018-02-26 10:27:27.935489 7f140dbe2700  1 _created [C,D] has 5121
>>>> objects, starting split.
>>>>
>>>
>>> Followed by:
>>>
>>> 2018-02-26 10:27:58.242551 7f141cc3f700  0 log_channel(cluster) log
>>>> [WRN] : 9 slow requests, 5 included below; oldest blocked for > 30.308128
>>>> secs
>>>> 2018-02-26 10:27:58.242563 7f141cc3f700  0 log_channel(cluster) log
>>>> [WRN] : slow request 30.151105 seconds old, received at 2018-02-26
>>>> 10:27:28.091312: osd_op(mds.0.5339:811969 3.5c
>>>> 3:3bb9d743:::200.0018c6c4:head [write 73416~5897 [fadvise_dontneed]] snapc
>>>> 0=[] ondisk+write+known_if_redirected+full_force e13994) currently
>>>> commit_sent
>>>> 2018-02-26 10:27:58.242569 7f141cc3f700  0 log_channel(cluster) log
>>>> [WRN] : slow request 30.133441 seconds old, received at 2018-02-26
>>>> 10:27:28.108976: osd_op(mds.0.5339:811970 3.5c
>>>> 3:3bb9d743:::200.0018c6c4:head [write 79313~4866 [fadvise_dontneed]] snapc
>>>> 0=[] ondisk+write+known_if_redirected+full_force e13994) currently
>>>> commit_sent
>>>> 2018-02-26 10:27:58.242574 7f141cc3f700  0 log_channel(cluster) log
>>>> [WRN] : slow request 30.083401 seconds old, received at 2018-02-26

Re: [ceph-users] Luminous | PG split causing slow requests

2018-02-26 Thread David C
Thanks, David. I think I've probably used the wrong terminology here; I'm
not splitting PGs to create more PGs. This is the PG folder splitting that
happens automatically, I believe it's controlled by the
"filestore_split_multiple" setting (which is 8 on my OSDs, I believe that's
the Luminous default...). Increasing heartbeat grace would probably still
be a good idea to prevent the flapping. I'm trying to understand if the
slow requests are to be expected or if I need to tune something or look at
hardware.
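
In case it helps anyone else reading, the knobs David mentions below can be
applied at runtime and then persisted, along these lines (the values are only
examples, and on Luminous the device-specific variants such as
osd_recovery_sleep_hdd may be what actually takes effect; the heartbeat grace
is, as far as I know, also consulted on the mon side, so it may need setting
there too):

# throttle recovery/backfill while watching iostat and ceph -s
ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_sleep 0.1'

# give OSDs longer before they are marked down during heavy splitting
ceph tell osd.* injectargs '--osd_heartbeat_grace 60'

# persist in ceph.conf so a restart doesn't revert them
[osd]
    osd max backfills = 1
    osd recovery sleep = 0.1
    osd heartbeat grace = 60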

On Mon, Feb 26, 2018 at 4:19 PM, David Turner <drakonst...@gmail.com> wrote:

> Splitting PG's is one of the most intensive and disruptive things you can,
> and should, do to a cluster.  Tweaking recovery sleep, max backfills, and
> heartbeat grace should help with this.  Heartbeat grace can be set high
> enough to mitigate the OSDs flapping which slows things down by peering and
> additional recovery, while still being able to detect OSDs that might fail
> and go down.  The recovery sleep and max backfills are the settings you
> want to look at for mitigating slow requests.  I generally tweak those
> while watching iostat of some OSDs and ceph -s to make sure I'm not giving
> too  much priority to the recovery operations so that client IO can still
> happen.
>
> On Mon, Feb 26, 2018 at 11:10 AM David C <dcsysengin...@gmail.com> wrote:
>
>> Hi All
>>
>> I have a 12.2.1 cluster, all filestore OSDs, OSDs are spinners, journals
>> on NVME. Cluster primarily used for CephFS, ~20M objects.
>>
>> I'm seeing some OSDs getting marked down, it appears to be related to PG
>> splitting, e.g:
>>
>> 2018-02-26 10:27:27.935489 7f140dbe2700  1 _created [C,D] has 5121
>>> objects, starting split.
>>>
>>
>> Followed by:
>>
>> 2018-02-26 10:27:58.242551 7f141cc3f700  0 log_channel(cluster) log [WRN]
>>> : 9 slow requests, 5 included below; oldest blocked for > 30.308128 secs
>>> 2018-02-26 10:27:58.242563 7f141cc3f700  0 log_channel(cluster) log
>>> [WRN] : slow request 30.151105 seconds old, received at 2018-02-26
>>> 10:27:28.091312: osd_op(mds.0.5339:811969 3.5c
>>> 3:3bb9d743:::200.0018c6c4:head [write 73416~5897 [fadvise_dontneed]] snapc
>>> 0=[] ondisk+write+known_if_redirected+full_force e13994) currently
>>> commit_sent
>>> 2018-02-26 10:27:58.242569 7f141cc3f700  0 log_channel(cluster) log
>>> [WRN] : slow request 30.133441 seconds old, received at 2018-02-26
>>> 10:27:28.108976: osd_op(mds.0.5339:811970 3.5c
>>> 3:3bb9d743:::200.0018c6c4:head [write 79313~4866 [fadvise_dontneed]] snapc
>>> 0=[] ondisk+write+known_if_redirected+full_force e13994) currently
>>> commit_sent
>>> 2018-02-26 10:27:58.242574 7f141cc3f700  0 log_channel(cluster) log
>>> [WRN] : slow request 30.083401 seconds old, received at 2018-02-26
>>> 10:27:28.159016: osd_op(mds.9174516.0:444202 3.5c
>>> 3:3bb9d743:::200.0018c6c4:head [stat] snapc 0=[]
>>> ondisk+read+rwordered+known_if_redirected+full_force e13994) currently
>>> waiting for rw locks
>>> 2018-02-26 10:27:58.242579 7f141cc3f700  0 log_channel(cluster) log
>>> [WRN] : slow request 30.072310 seconds old, received at 2018-02-26
>>> 10:27:28.170107: osd_op(mds.0.5339:811971 3.5c
>>> 3:3bb9d743:::200.0018c6c4:head [write 84179~1941 [fadvise_dontneed]] snapc
>>> 0=[] ondisk+write+known_if_redirected+full_force e13994) currently
>>> waiting for rw locks
>>> 2018-02-26 10:27:58.242584 7f141cc3f700  0 log_channel(cluster) log
>>> [WRN] : slow request 30.308128 seconds old, received at 2018-02-26
>>> 10:27:27.934288: osd_op(mds.0.5339:811964 3.5c
>>> 3:3bb9d743:::200.0018c6c4:head [write 0~62535 [fadvise_dontneed]] snapc
>>> 0=[] ondisk+write+known_if_redirected+full_force e13994) currently
>>> commit_sent
>>> 2018-02-26 10:27:59.242768 7f141cc3f700  0 log_channel(cluster) log
>>> [WRN] : 47 slow requests, 5 included below; oldest blocked for > 31.308410
>>> secs
>>> 2018-02-26 10:27:59.242776 7f141cc3f700  0 log_channel(cluster) log
>>> [WRN] : slow request 30.349575 seconds old, received at 2018-02-26
>>> 10:27:28.893124:
>>
>>
>> I'm also experiencing some MDS crash issues which I think could be
>> related.
>>
>> Is there anything I can do to mitigate the slow requests problem? The
>> rest of the time the cluster is performing pretty well.
>>
>> Thanks,
>> David
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Luminous | PG split causing slow requests

2018-02-26 Thread David C
Hi All

I have a 12.2.1 cluster, all filestore OSDs, OSDs are spinners, journals on
NVME. Cluster primarily used for CephFS, ~20M objects.

I'm seeing some OSDs getting marked down, it appears to be related to PG
splitting, e.g:

2018-02-26 10:27:27.935489 7f140dbe2700  1 _created [C,D] has 5121 objects,
> starting split.
>

Followed by:

2018-02-26 10:27:58.242551 7f141cc3f700  0 log_channel(cluster) log [WRN] :
> 9 slow requests, 5 included below; oldest blocked for > 30.308128 secs
> 2018-02-26 10:27:58.242563 7f141cc3f700  0 log_channel(cluster) log [WRN]
> : slow request 30.151105 seconds old, received at 2018-02-26
> 10:27:28.091312: osd_op(mds.0.5339:811969 3.5c
> 3:3bb9d743:::200.0018c6c4:head [write 73416~5897 [fadvise_dontneed]] snapc
> 0=[] ondisk+write+known_if_redirected+full_force e13994) currently
> commit_sent
> 2018-02-26 10:27:58.242569 7f141cc3f700  0 log_channel(cluster) log [WRN]
> : slow request 30.133441 seconds old, received at 2018-02-26
> 10:27:28.108976: osd_op(mds.0.5339:811970 3.5c
> 3:3bb9d743:::200.0018c6c4:head [write 79313~4866 [fadvise_dontneed]] snapc
> 0=[] ondisk+write+known_if_redirected+full_force e13994) currently
> commit_sent
> 2018-02-26 10:27:58.242574 7f141cc3f700  0 log_channel(cluster) log [WRN]
> : slow request 30.083401 seconds old, received at 2018-02-26
> 10:27:28.159016: osd_op(mds.9174516.0:444202 3.5c
> 3:3bb9d743:::200.0018c6c4:head [stat] snapc 0=[]
> ondisk+read+rwordered+known_if_redirected+full_force e13994) currently
> waiting for rw locks
> 2018-02-26 10:27:58.242579 7f141cc3f700  0 log_channel(cluster) log [WRN]
> : slow request 30.072310 seconds old, received at 2018-02-26
> 10:27:28.170107: osd_op(mds.0.5339:811971 3.5c
> 3:3bb9d743:::200.0018c6c4:head [write 84179~1941 [fadvise_dontneed]] snapc
> 0=[] ondisk+write+known_if_redirected+full_force e13994) currently waiting
> for rw locks
> 2018-02-26 10:27:58.242584 7f141cc3f700  0 log_channel(cluster) log [WRN]
> : slow request 30.308128 seconds old, received at 2018-02-26
> 10:27:27.934288: osd_op(mds.0.5339:811964 3.5c
> 3:3bb9d743:::200.0018c6c4:head [write 0~62535 [fadvise_dontneed]] snapc
> 0=[] ondisk+write+known_if_redirected+full_force e13994) currently
> commit_sent
> 2018-02-26 10:27:59.242768 7f141cc3f700  0 log_channel(cluster) log [WRN]
> : 47 slow requests, 5 included below; oldest blocked for > 31.308410 secs
> 2018-02-26 10:27:59.242776 7f141cc3f700  0 log_channel(cluster) log [WRN]
> : slow request 30.349575 seconds old, received at 2018-02-26
> 10:27:28.893124:


I'm also experiencing some MDS crash issues which I think could be related.

Is there anything I can do to mitigate the slow requests problem? The rest
of the time the cluster is performing pretty well.

Thanks,
David
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] MDS crash Luminous

2018-02-26 Thread David C
Thanks for the tips, John. I'll increase the debug level as suggested.
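
For the record, a minimal version of John's two suggestions looks something
like this (the mds.mdshostname daemon name is taken from the log above and is
effectively a placeholder; ceph-debuginfo is needed for readable symbols in
the backtrace):

# raise MDS logging at runtime via the admin socket on the MDS host
ceph daemon mds.mdshostname config set debug_mds 7

# if it hangs at a predictable time, grab a backtrace of all threads
gdb -p $(pidof ceph-mds) -batch -ex 'thread apply all bt' > mds-threads.txt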

On 25 Feb 2018 20:56, "John Spray" <jsp...@redhat.com> wrote:

> On Sat, Feb 24, 2018 at 10:13 AM, David C <dcsysengin...@gmail.com> wrote:
> > Hi All
> >
> > I had an MDS go down on a 12.2.1 cluster, the standby took over but I
> don't
> > know what caused the issue. Scrubs are scheduled to start at 23:00 on
> this
> > cluster but this appears to have started a minute before.
> >
> > Can anyone help me with diagnosing this please. Here's the relevant bit
> from
> > the MDS log:
>
> The messages about the heartbeat map not being healthy are a sign that
> somewhere in the MDS a thread is getting stuck and not letting others
> get in there to do work.  The daemon responds to that by stopping
> sending beacons to the monitors, who in turn blacklist the misbehaving
> MDS daemon.
>
> You'll have a better shot at working out what got jammed up if "debug
> mds" is set to something like 7, or if this is happening predictably
> at 22:59:30 you could even attach gdb to the running process and grab
> a backtrace of all threads.
>
> John
>
> > 2018-02-23 22:59:30.702915 7f26e0612700  1 mds.beacon.mdshostname _send
> > skipping beacon, heartbeat map not healthy
> > 2018-02-23 22:59:32.960228 7f26e461a700  1 heartbeat_map is_healthy
> > 'MDSRank' had timed out after 15
> > 2018-02-23 22:59:34.703001 7f26e0612700  1 heartbeat_map is_healthy
> > 'MDSRank' had timed out after 15
> > 2018-02-23 22:59:342018-02-23 22:59:02.702284 7f26e0612700  1
> heartbeat_map
> > is_healthy 'MDSRank' had timed out after 15
> > 2018-02-23 22:59:02.702334 7f26e0612700  1 mds.beacon.mdshostname _send
> > skipping beacon, heartbeat map not healthy
> > 2018-02-23 22:59:02.959726 7f26e461a700  1 heartbeat_map is_healthy
> > 'MDSRank' had timed out after 15
> > 2018-02-23 22:59:06.702354 7f26e0612700  1 heartbeat_map is_healthy
> > 'MDSRank' had timed out after 15
> > 2018-02-23 22:59:06.702366 7f26e0612700  1 mds.beacon.mdshostname _send
> > skipping beacon, heartbeat map not healthy
> > 2018-02-23 22:59:07.959804 7f26e461a700  1 heartbeat_map is_healthy
> > 'MDSRank' had timed out after 15
> > 2018-02-23 22:59:10.702421 7f26e0612700  1 heartbeat_map is_healthy
> > 'MDSRank' had timed out after 15
> > 2018-02-23 22:59:10.702434 7f26e0612700  1 mds.beacon.mdshostname _send
> > skipping beacon, heartbeat map not healthy
> > 2018-02-23 22:59:12.959876 7f26e461a700  1 heartbeat_map is_healthy
> > 'MDSRank' had timed out after 15
> > 2018-02-23 22:59:14.702522 7f26e0612700  1 heartbeat_map is_healthy
> > 'MDSRank' had timed out after 15
> > 2018-02-23 22:59:14.702535 7f26e0612700  1 mds.beacon.mdshostname _send
> > skipping beacon, heartbeat map not healthy
> > 2018-02-23 22:59:17.959985 7f26e461a700  1 heartbeat_map is_healthy
> > 'MDSRank' had timed out after 15
> > 2018-02-23 22:59:18.702645 7f26e0612700  1 heartbeat_map is_healthy
> > 'MDSRank' had timed out after 15
> > 2018-02-23 22:59:18.702670 7f26e0612700  1 mds.beacon.mdshostname _send
> > skipping beacon, heartbeat map not healthy
> > 2018-02-23 22:59:22.702742 7f26e0612700  1 heartbeat_map is_healthy
> > 'MDSRank' had timed out after 15
> > 2018-02-23 22:59:22.702754 7f26e0612700  1 mds.beacon.mdshostname _send
> > skipping beacon, heartbeat map not healthy
> > 2018-02-23 22:59:22.960063 7f26e461a700  1 heartbeat_map is_healthy
> > 'MDSRank' had timed out after 15
> > 2018-02-23 22:59:26.702841 7f26e0612700  1 heartbeat_map is_healthy
> > 'MDSRank' had timed out after 15
> > 2018-02-23 22:59:26.702854 7f26e0612700  1 mds.beacon.mdshostname _send
> > skipping beacon, heartbeat map not healthy
> > 2018-02-23 22:59:27.960141 7f26e461a700  1 heartbeat_map is_healthy
> > 'MDSRank' had timed out after 15
> > 2018-02-23 22:59:30.702903 7f26e0612700  1 heartbeat_map is_healthy
> > 'MDSRank' had timed out after 15
> > .703014 7f26e0612700  1 mds.beacon.mdshostname _send skipping beacon,
> > heartbeat map not healthy
> > 2018-02-23 22:59:37.960301 7f26e461a700  1 heartbeat_map is_healthy
> > 'MDSRank' had timed out after 15
> > 2018-02-23 22:59:38.703063 7f26e0612700  1 heartbeat_map is_healthy
> > 'MDSRank' had timed out after 15
> > 2018-02-23 22:59:38.703075 7f26e0612700  1 mds.beacon.mdshostname _send
> > skipping beacon, heartbeat map not healthy
> > 2018-02-23 22:59:42.703147 7f26e0612700  1 heartbeat_map is_healthy
> > 'MDSRank' had timed out after 15
> > 2018-02-23 22:59:42.703160 7f26e0612700  1 mds.beacon.mdshostname _send
> 

[ceph-users] MDS crash Luminous

2018-02-24 Thread David C
Hi All

I had an MDS go down on a 12.2.1 cluster, the standby took over but I don't
know what caused the issue. Scrubs are scheduled to start at 23:00 on this
cluster but this appears to have started a minute before.

Can anyone help me with diagnosing this please. Here's the relevant bit
from the MDS log:

2018-02-23 22:59:30.702915 7f26e0612700  1 mds.beacon.mdshostname _send
skipping beacon, heartbeat map not healthy
2018-02-23 22:59:32.960228 7f26e461a700  1 heartbeat_map is_healthy
'MDSRank' had timed out after 15
2018-02-23 22:59:34.703001 7f26e0612700  1 heartbeat_map is_healthy
'MDSRank' had timed out after 15
2018-02-23 22:59:342018-02-23 22:59:02.702284 7f26e0612700  1 heartbeat_map
is_healthy 'MDSRank' had timed out after 15
2018-02-23 22:59:02.702334 7f26e0612700  1 mds.beacon.mdshostname _send
skipping beacon, heartbeat map not healthy
2018-02-23 22:59:02.959726 7f26e461a700  1 heartbeat_map is_healthy
'MDSRank' had timed out after 15
2018-02-23 22:59:06.702354 7f26e0612700  1 heartbeat_map is_healthy
'MDSRank' had timed out after 15
2018-02-23 22:59:06.702366 7f26e0612700  1 mds.beacon.mdshostname _send
skipping beacon, heartbeat map not healthy
2018-02-23 22:59:07.959804 7f26e461a700  1 heartbeat_map is_healthy
'MDSRank' had timed out after 15
2018-02-23 22:59:10.702421 7f26e0612700  1 heartbeat_map is_healthy
'MDSRank' had timed out after 15
2018-02-23 22:59:10.702434 7f26e0612700  1 mds.beacon.mdshostname _send
skipping beacon, heartbeat map not healthy
2018-02-23 22:59:12.959876 7f26e461a700  1 heartbeat_map is_healthy
'MDSRank' had timed out after 15
2018-02-23 22:59:14.702522 7f26e0612700  1 heartbeat_map is_healthy
'MDSRank' had timed out after 15
2018-02-23 22:59:14.702535 7f26e0612700  1 mds.beacon.mdshostname _send
skipping beacon, heartbeat map not healthy
2018-02-23 22:59:17.959985 7f26e461a700  1 heartbeat_map is_healthy
'MDSRank' had timed out after 15
2018-02-23 22:59:18.702645 7f26e0612700  1 heartbeat_map is_healthy
'MDSRank' had timed out after 15
2018-02-23 22:59:18.702670 7f26e0612700  1 mds.beacon.mdshostname _send
skipping beacon, heartbeat map not healthy
2018-02-23 22:59:22.702742 7f26e0612700  1 heartbeat_map is_healthy
'MDSRank' had timed out after 15
2018-02-23 22:59:22.702754 7f26e0612700  1 mds.beacon.mdshostname _send
skipping beacon, heartbeat map not healthy
2018-02-23 22:59:22.960063 7f26e461a700  1 heartbeat_map is_healthy
'MDSRank' had timed out after 15
2018-02-23 22:59:26.702841 7f26e0612700  1 heartbeat_map is_healthy
'MDSRank' had timed out after 15
2018-02-23 22:59:26.702854 7f26e0612700  1 mds.beacon.mdshostname _send
skipping beacon, heartbeat map not healthy
2018-02-23 22:59:27.960141 7f26e461a700  1 heartbeat_map is_healthy
'MDSRank' had timed out after 15
2018-02-23 22:59:30.702903 7f26e0612700  1 heartbeat_map is_healthy
'MDSRank' had timed out after 15
.703014 7f26e0612700  1 mds.beacon.mdshostname _send skipping beacon,
heartbeat map not healthy
2018-02-23 22:59:37.960301 7f26e461a700  1 heartbeat_map is_healthy
'MDSRank' had timed out after 15
2018-02-23 22:59:38.703063 7f26e0612700  1 heartbeat_map is_healthy
'MDSRank' had timed out after 15
2018-02-23 22:59:38.703075 7f26e0612700  1 mds.beacon.mdshostname _send
skipping beacon, heartbeat map not healthy
2018-02-23 22:59:42.703147 7f26e0612700  1 heartbeat_map is_healthy
'MDSRank' had timed out after 15
2018-02-23 22:59:42.703160 7f26e0612700  1 mds.beacon.mdshostname _send
skipping beacon, heartbeat map not healthy
2018-02-23 22:59:42.960414 7f26e461a700  1 heartbeat_map is_healthy
'MDSRank' had timed out after 15
2018-02-23 22:59:46.703209 7f26e0612700  1 heartbeat_map is_healthy
'MDSRank' had timed out after 15
2018-02-23 22:59:46.703222 7f26e0612700  1 mds.beacon.mdshostname _send
skipping beacon, heartbeat map not healthy
2018-02-23 22:59:47.960487 7f26e461a700  1 heartbeat_map is_healthy
'MDSRank' had timed out after 15
2018-02-23 22:59:50.703305 7f26e0612700  1 heartbeat_map is_healthy
'MDSRank' had timed out after 15
2018-02-23 22:59:50.703319 7f26e0612700  1 mds.beacon.mdshostname _send
skipping beacon, heartbeat map not healthy
2018-02-23 22:59:52.960569 7f26e461a700  1 heartbeat_map is_healthy
'MDSRank' had timed out after 15
2018-02-23 22:59:54.703365 7f26e0612700  1 heartbeat_map is_healthy
'MDSRank' had timed out after 15
2018-02-23 22:59:54.703377 7f26e0612700  1 mds.beacon.mdshostname _send
skipping beacon, heartbeat map not healthy
2018-02-23 22:59:57.960642 7f26e461a700  1 heartbeat_map is_healthy
'MDSRank' had timed out after 15
2018-02-23 22:59:58.703447 7f26e0612700  1 heartbeat_map is_healthy
'MDSRank' had timed out after 15
2018-02-23 22:59:58.703461 7f26e0612700  1 mds.beacon.mdshostname _send
skipping beacon, heartbeat map not healthy
2018-02-23 22:59:59.717665 7f26e0e13700  1 heartbeat_map reset_timeout
'MDSRank' had timed out after 15
2018-02-23 22:59:59.719194 7f26dd60c700 -1 mds.0.journaler.mdlog(rw)
_finish_write_head got (108) Cannot send after transport 

[ceph-users] C++17 and C++ ABI on master

2018-01-08 Thread Adam C. Emerson
Good day,

I've just merged some changs into master that set us up to compile
with C++17. This will require a reasonably new compiler to build
master.

Due to a change in how 'noexcept' is handled (it is now part of the type
signature of a function), mangled symbol names of noexcept functions are
different, so if you have custom clients using the C++ libraries, you may
need to recompile.

Do not worry, there should be no change to the C ABI. Any C clients
should be unaffected.

Thank you.

-- 
Senior Software Engineer   Red Hat Storage, Ann Arbor, MI, US
IRC: Aemerson@OFTC, Actinic@Freenode
0x80F7544B90EDBFB9 E707 86BA 0C1B 62CC 152C  7C12 80F7 544B 90ED BFB9
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to use vfs_ceph

2017-12-21 Thread David C
At a glance that looks OK, though I've not tested this in a while. Silly
question, but does your Samba package definitely ship with the Ceph vfs?
That's caught me out in the past.

Have you tried exporting a sub dir? Maybe 777 it, although that shouldn't make
a difference.
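
For the "does it ship the module" check, on RHEL/CentOS something along these
lines usually answers it (the paths are the common defaults, adjust for your
distro):

# vfs objects are just shared libraries in Samba's vfs directory
ls /usr/lib64/samba/vfs/ceph.so

# or ask yum which package, if any, provides it
yum provides '*/samba/vfs/ceph.so'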

On 21 Dec 2017 13:16, "Felix Stolte"  wrote:

> Hello folks,
>
> is anybody using the vfs_ceph module for exporting cephfs as samba shares?
> We are running ceph jewel with cephx enabled. Manpage of vfs_ceph only
> references the option ceph:config_file. How do I need to configure my share
> (or maybe ceph.conf)?
>
> log.smbd:  '/' does not exist or permission denied when connecting to
> [vfs] Error was Transport endpoint is not connected
>
> I have a user ctdb with keyring file /etc/ceph/ceph.client.ctdb.keyring
> with permissions:
>
> caps: [mds] allow rw
> caps: [mon] allow r
> caps: [osd] allow rwx pool=cephfs_metadata, allow rwx pool=cephfs_data
>
> I can mount cephfs with cephf-fuse using the id ctdb and its keyfile.
>
> My share definition is:
>
> [vfs]
> comment = vfs
> path = /
> read only = No
> vfs objects = acl_xattr ceph
> ceph:user_id = ctdb
> ceph:config_file = /etc/ceph/ceph.conf
>
>
> Any advice is appreciated.
>
> Regards Felix
>
> --
> Forschungszentrum Jülich GmbH
> 52425 Jülich
> Sitz der Gesellschaft: Jülich
> Eingetragen im Handelsregister des Amtsgerichts Düren Nr. HR B 3498
> Vorsitzender des Aufsichtsrats: MinDir. Dr. Karl Eugen Huthmacher
> Geschäftsführung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender),
> Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt,
> Prof. Dr. Sebastian M. Schmidt
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] active+remapped+backfill_toofull

2017-12-20 Thread David C
You should just need to restart the relevant OSDs for the new backfill
threshold to kick in.
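
To spell that out (and hedging a bit, since the right knob depends on the
release): on Luminous the backfillfull ratio lives in the OSDMap, while on
Jewel and older it's an OSD option that isn't runtime-changeable (hence the
"unchangeable" in the injectargs output quoted below). Roughly:

# Luminous: no restart needed
ceph osd set-backfillfull-ratio 0.92

# Jewel and older: set in ceph.conf on the OSD nodes...
[osd]
    osd backfill full ratio = 0.92

# ...then restart the OSDs one at a time, waiting for recovery in between
systemctl restart ceph-osd@3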

On 20 Dec 2017 00:14, "Nghia Than" <cont...@trungnghia.info> wrote:

I added more OSDs a few days ago to reduce usage to under 70% (the nearfull
and full ratios are higher than this value) and it is still stuck at
backfill_toofull while rebalancing data.

I tried to change the backfill full ratio and it shows an error (unchangeable)
as below:

[root@storcp ~]# ceph tell osd.\* injectargs '--osd_backfill_full_ratio
0.92'

osd.0: osd_backfill_full_ratio = '0.92' (unchangeable)

osd.1: osd_backfill_full_ratio = '0.92' (unchangeable)

osd.2: osd_backfill_full_ratio = '0.92' (unchangeable)

osd.3: osd_backfill_full_ratio = '0.92' (unchangeable)

osd.4: osd_backfill_full_ratio = '0.92' (unchangeable)

osd.5: osd_backfill_full_ratio = '0.92' (unchangeable)

osd.6: osd_backfill_full_ratio = '0.92' (unchangeable)

osd.7: osd_backfill_full_ratio = '0.92' (unchangeable)

osd.8: osd_backfill_full_ratio = '0.92' (unchangeable)

osd.9: osd_backfill_full_ratio = '0.92' (unchangeable)

osd.10: osd_backfill_full_ratio = '0.92' (unchangeable)

osd.11: osd_backfill_full_ratio = '0.92' (unchangeable)

osd.12: osd_backfill_full_ratio = '0.92' (unchangeable)

osd.13: osd_backfill_full_ratio = '0.92' (unchangeable)

osd.14: osd_backfill_full_ratio = '0.92' (unchangeable)

osd.15: osd_backfill_full_ratio = '0.92' (unchangeable)

osd.16: osd_backfill_full_ratio = '0.92' (unchangeable)

osd.17: osd_backfill_full_ratio = '0.92' (unchangeable)

osd.18: osd_backfill_full_ratio = '0.92' (unchangeable)

osd.19: osd_backfill_full_ratio = '0.92' (unchangeable)

osd.20: osd_backfill_full_ratio = '0.92' (unchangeable)

osd.21: osd_backfill_full_ratio = '0.92' (unchangeable)

osd.22: osd_backfill_full_ratio = '0.92' (unchangeable)

osd.23: osd_backfill_full_ratio = '0.92' (unchangeable)

osd.24: osd_backfill_full_ratio = '0.92' (unchangeable)

osd.25: osd_backfill_full_ratio = '0.92' (unchangeable)

osd.26: osd_backfill_full_ratio = '0.92' (unchangeable)

osd.27: osd_backfill_full_ratio = '0.92' (unchangeable)

osd.28: osd_backfill_full_ratio = '0.92' (unchangeable)

[root@storcp ~]#

On Wed, Dec 20, 2017 at 1:57 AM, David C <dcsysengin...@gmail.com> wrote:

> What's your backfill full ratio? You may be able to get healthy by
> increasing your backfill full ratio (in small increments). But your next
> immediate task should be to add more OSDs or remove data.
>
>
> On 19 Dec 2017 4:26 p.m., "Nghia Than" <cont...@trungnghia.info> wrote:
>
> Hi,
>
> My CEPH is stuck at this for few days, we added new OSD and nothing
> changed:
>
> - *17 pgs backfill_toofull*
> - *17 pgs stuck unclean*
> - *recovery 21/5156264 objects degraded (0.000%)*
> - *recovery 52908/5156264 objects misplaced (1.026%)*
> - *8 near full osd(s)*
>
> ​And here is my ceph health detail:
>
> HEALTH_WARN 17 pgs backfill_toofull; 17 pgs stuck unclean; recovery
> 21/5156264 objects degraded (0.000%); recovery 52908/5156264 objects
> misplaced (1.026%); 8 near full osd(s)
>
> pg 1.231 is stuck unclean for 4367.09, current state
> active+remapped+backfill_toofull, last acting [24,9]
>
> pg 1.1e8 is stuck unclean for 7316.364770, current state
> active+remapped+backfill_toofull, last acting [16,3]
>
> pg 1.188 is stuck unclean for 7315.400227, current state
> active+remapped+backfill_toofull, last acting [11,7]
>
> pg 1.158 is stuck unclean for 7321.511627, current state
> active+remapped+backfill_toofull, last acting [11,17]
>
> pg 1.81 is stuck unclean for 4366.683703, current state
> active+remapped+backfill_toofull, last acting [10,24]
>
> pg 1.332 is stuck unclean for 7315.248115, current state
> active+remapped+backfill_toofull, last acting [23,1]
>
> pg 1.2c2 is stuck unclean for 4365.635413, current state
> active+remapped+backfill_toofull, last acting [24,13]
>
> pg 1.3c6 is stuck unclean for 7320.816089, current state
> active+remapped+backfill_toofull, last acting [11,20]
>
> pg 1.26f is stuck unclean for 7315.882215, current state
> active+remapped+backfill_toofull, last acting [28,8]
>
> pg 1.236 is stuck unclean for 7322.152706, current state
> active+remapped+backfill_toofull, last acting [8,26]
>
> pg 1.249 is stuck unclean for 4366.885751, current state
> active+remapped+backfill_toofull, last acting [9,24]
>
> pg 1.7b is stuck unclean for 7315.353072, current state
> active+remapped+backfill_toofull, last acting [28,3]
>
> pg 1.1ec is stuck unclean for 7315.981062, current state
> active+remapped+backfill_toofull, last acting [16,0]
>
> pg 1.248 is stuck unclean for 7324.062482, current state
> active+remapped+backfill_toofull, last acting [16,3]
>
> pg 1.e4 is stuck unclean for 4370.009328, current state
> active+remapped+backfill_toofull, last ac

Re: [ceph-users] active+remapped+backfill_toofull

2017-12-19 Thread David C
What's your backfill full ratio? You may be able to get healthy by
increasing your backfill full ratio (in small increments). But your next
immediate task should be to add more OSDs or remove data.


On 19 Dec 2017 4:26 p.m., "Nghia Than"  wrote:

Hi,

My CEPH is stuck at this for few days, we added new OSD and nothing changed:

- *17 pgs backfill_toofull*
- *17 pgs stuck unclean*
- *recovery 21/5156264 objects degraded (0.000%)*
- *recovery 52908/5156264 objects misplaced (1.026%)*
- *8 near full osd(s)*

​And here is my ceph health detail:

HEALTH_WARN 17 pgs backfill_toofull; 17 pgs stuck unclean; recovery
21/5156264 objects degraded (0.000%); recovery 52908/5156264 objects
misplaced (1.026%); 8 near full osd(s)

pg 1.231 is stuck unclean for 4367.09, current state
active+remapped+backfill_toofull, last acting [24,9]

pg 1.1e8 is stuck unclean for 7316.364770, current state
active+remapped+backfill_toofull, last acting [16,3]

pg 1.188 is stuck unclean for 7315.400227, current state
active+remapped+backfill_toofull, last acting [11,7]

pg 1.158 is stuck unclean for 7321.511627, current state
active+remapped+backfill_toofull, last acting [11,17]

pg 1.81 is stuck unclean for 4366.683703, current state
active+remapped+backfill_toofull, last acting [10,24]

pg 1.332 is stuck unclean for 7315.248115, current state
active+remapped+backfill_toofull, last acting [23,1]

pg 1.2c2 is stuck unclean for 4365.635413, current state
active+remapped+backfill_toofull, last acting [24,13]

pg 1.3c6 is stuck unclean for 7320.816089, current state
active+remapped+backfill_toofull, last acting [11,20]

pg 1.26f is stuck unclean for 7315.882215, current state
active+remapped+backfill_toofull, last acting [28,8]

pg 1.236 is stuck unclean for 7322.152706, current state
active+remapped+backfill_toofull, last acting [8,26]

pg 1.249 is stuck unclean for 4366.885751, current state
active+remapped+backfill_toofull, last acting [9,24]

pg 1.7b is stuck unclean for 7315.353072, current state
active+remapped+backfill_toofull, last acting [28,3]

pg 1.1ec is stuck unclean for 7315.981062, current state
active+remapped+backfill_toofull, last acting [16,0]

pg 1.248 is stuck unclean for 7324.062482, current state
active+remapped+backfill_toofull, last acting [16,3]

pg 1.e4 is stuck unclean for 4370.009328, current state
active+remapped+backfill_toofull, last acting [21,24]

pg 1.144 is stuck unclean for 7317.998393, current state
active+remapped+backfill_toofull, last acting [26,3]

pg 0.5f is stuck unclean for 5877.987814, current state
active+remapped+backfill_toofull, last acting [24,5]

pg 1.3c6 is active+remapped+backfill_toofull, acting [11,20]

pg 1.332 is active+remapped+backfill_toofull, acting [23,1]

pg 1.2c2 is active+remapped+backfill_toofull, acting [24,13]

pg 1.26f is active+remapped+backfill_toofull, acting [28,8]

pg 1.249 is active+remapped+backfill_toofull, acting [9,24]

pg 1.248 is active+remapped+backfill_toofull, acting [16,3]

pg 1.236 is active+remapped+backfill_toofull, acting [8,26]

pg 1.e4 is active+remapped+backfill_toofull, acting [21,24]

pg 0.5f is active+remapped+backfill_toofull, acting [24,5]

pg 1.7b is active+remapped+backfill_toofull, acting [28,3]

pg 1.81 is active+remapped+backfill_toofull, acting [10,24]

pg 1.144 is active+remapped+backfill_toofull, acting [26,3]

pg 1.158 is active+remapped+backfill_toofull, acting [11,17]

pg 1.188 is active+remapped+backfill_toofull, acting [11,7]

pg 1.1e8 is active+remapped+backfill_toofull, acting [16,3]

pg 1.1ec is active+remapped+backfill_toofull, acting [16,0]

pg 1.231 is active+remapped+backfill_toofull, acting [24,9]

recovery 21/5156264 objects degraded (0.000%)

recovery 52908/5156264 objects misplaced (1.026%)

osd.3 is near full at 92%

osd.4 is near full at 91%

osd.12 is near full at 92%

osd.17 is near full at 86%

osd.18 is near full at 87%

osd.23 is near full at 90%

osd.27 is near full at 85%
osd.28 is near full at 85%​

I tried reweighting OSDs to a smaller weight but nothing changed. This is my
full_ratio dump:

[root@storcp ~]# ceph pg dump |grep full_ratio

dumped all in format plain

full_ratio 0.95

nearfull_ratio 0.85
[root@storcp ~]#

And ceph osd df:

[root@storcp ~]# ceph osd df

ID WEIGHT  REWEIGHT SIZE   USE    AVAIL  %USE  VAR  PGS

 0 0.86800  1.0   888G   754G   134G 84.91 1.09 102

 1 0.86800  1.0   888G   734G   154G 82.63 1.06  90

 2 0.86800  1.0   888G   548G   339G 61.77 0.79  75

 9 0.86800  1.0   888G   658G   230G 74.09 0.95  81

10 0.86800  1.0   888G   659G   229G 74.17 0.95  79

11 0.86800  1.0   888G   706G   182G 79.49 1.02  91

18 0.86800  1.0   888G   774G   114G 87.14 1.12  94

 3 0.86800  1.0   888G   823G 67037M 92.63 1.19  99

 4 0.86800  1.0   888G   816G 73780M 91.89 1.18 102

 5 0.86800  1.0   888G   608G   279G 68.51 0.88  76

12 0.86800  1.0   888G   818G 72144M 92.07 1.18 111

13 0.86800  1.0   888G   657G   231G 73.94 

Re: [ceph-users] Ceph luminous nfs-ganesha-ceph

2017-12-14 Thread David C
Is this nfs-ganesha exporting CephFS?
Are you using NFS for a VMware datastore?
What are you using for the NFS failover?

We need more info, but this does sound like a VMware/NFS question rather
than specifically a Ceph/nfs-ganesha one.

On Thu, Dec 14, 2017 at 1:47 PM, nigel davies  wrote:

> Hey all
>
> I am in the process of trying to set up a VMware storage environment.
> I've been reading and found that iSCSI (on the Jewel release) can cause issues
> and the datastore can drop out.
>
> I've been looking at using nfs-ganesha with my Ceph platform; it all looked
> good until I looked at failover to our 2nd NFS server. I believe I set up
> the server right and gave it the two IP addresses of the NFS servers.
>
> When I shut down the live NFS server, the datastore becomes "inactive", and even
> after I bring the NFS server back up, it still shows as inactive.
>
> I would be grateful for any advice on this in case I am missing something.
> I am using NFS 4.1 as I've been advised it will support the failover.
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] public/cluster network

2017-12-11 Thread David C
Hi Roman

Whilst you can define multiple subnets in the public network directive, the
MONs still only bind to a single IP. Your clients need to be able to route
to that IP. From what you're saying, 172.x.x.x/24 is an isolated network,
so a client on the 10.x.x.x network is not going to be able to access the
cluster.
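If in doubt, something like "ceph mon dump" will show the addresses the MONs have
actually registered, i.e. the IPs your clients need to be able to reach.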

On Mon, Dec 11, 2017 at 9:15 AM, Roman  wrote:

> Hi all,
>
> We would like to implement the following setup.
> Our cloud nodes (CNs) for virtual machines have two 10 Gbps NICs:
> 10.x.y.z/22 (routed through the backbone) and 172.x.y.z/24 (available only
> on servers within a single rack). CNs and ceph nodes are in the same rack.
> Ceph nodes have two 10 Gbps NICs in the same networks. We are going to use
> 172.x.y.z/24 as the ceph cluster network for all ceph components' traffic (i.e.
> OSD/MGR/MON/MDS). But apart from that we are thinking about using the same
> cluster network for CN interactions with the ceph nodes (since the network
> within a single rack switch is expected to be much faster than the one routed
> via the backbone).
> So 172.x.y.z/24 is for the following: pure ceph traffic, CNs <=> ceph
> nodes; 10.x.y.z/22 is for the remaining types of ceph clients, like VMs with
> mounted cephfs shares (since VMs don't have access to the 172.x.y.z/24 net).
> So I wonder if it's possible to implement something like the following:
> always use 172.x.y.z/24 if it is available on both source and destination,
> otherwise use 10.x.y.z/22.
> We have just tried to specify the following in ceph.conf:
> cluster network = 172.x.y.z/24
> public network = 172.x.y.z/24, 10.x.y.z/22
>
> But it doesn't seem to work.
> There is an entry in the Red Hat Knowledgebase portal [1] called "Ceph Multiple
> public networks" but there is no solution provided yet.
>
> [1] https://access.redhat.com/solutions/1463363
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] List directory in cephfs blocking very long time

2017-12-05 Thread David C
6, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
> 0x7f8c65b89000
> read(3, "root:x:0:0:root:/root:/bin/bash\n"..., 4096) = 1330
> close(3)= 0
> munmap(0x7f8c65b89000, 4096)= 0
> socket(PF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 3
> connect(3, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) =
> -1 ENOENT (No such file or directory)
> close(3)= 0
> socket(PF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 3
> connect(3, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) =
> -1 ENOENT (No such file or directory)
> close(3)= 0
> open("/etc/group", O_RDONLY|O_CLOEXEC)  = 3
> fstat(3, {st_mode=S_IFREG|0644, st_size=597, ...}) = 0
> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
> 0x7f8c65b89000
> read(3, "root:x:0:\nbin:x:1:\ndaemon:x:2:\ns"..., 4096) = 597
> close(3)= 0
> munmap(0x7f8c65b89000, 4096)= 0
> openat(AT_FDCWD, "base", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
> getdents(3, /* 533 entries */, 32768)   = 32720
> lstat("base/dbus-glib-0.100-7.el7.x86_64.rpm", {st_mode=S_IFREG|0777,
> st_size=104756, ...}) = 0
> lstat("base/tomcat-admin-webapps-7.0.54-2.el7_1.noarch.rpm",
> {st_mode=S_IFREG|0777, st_size=38872, ...}) = 0
> lstat("base/hunspell-gl-0.20080515-8.el7.noarch.rpm",
> {st_mode=S_IFREG|0777, st_size=212276, ...}) = 0
> lstat("base/libreoffice-langpack-de-4.3.7.2-5.el7.x86_64.rpm",
> {st_mode=S_IFREG|0777, st_size=7570712, ...}) = 0
> lstat("base/paktype-naskh-basic-fonts-4.1-3.el7.noarch.rpm",
> {st_mode=S_IFREG|0777, st_size=436532, ...}) = 0
> lstat("base/openlmi-networking-0.3.0-3.el7.x86_64.rpm",
> {st_mode=S_IFREG|0777, st_size=148344, ...}) = 0
> lstat("base/kdenetwork-kopete-libs-4.10.5-8.el7_0.x86_64.rpm",
> {st_mode=S_IFREG|0777, st_size=1191908, ...}) = 0
> lstat("base/libwmf-0.2.8.4-41.el7_1.x86_64.rpm", {st_mode=S_IFREG|0777,
> st_size=137716, ...}) = 0
> lstat("base/texlive-texconfig-bin-svn27344.0-38.20130427_r30134.el7.noarch.rpm",
> {st_mode=S_IFREG|0777, st_size=16908, ...}) = 0
> lstat("base/gcc-c++-4.8.5-4.el7.x86_64.rpm", {st_mode=S_IFREG|0777,
> st_size=7508064, ...}) = 0
> ==
>
> --
> Best Regards
> Jian Zhang
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Replaced a disk, first time. Quick question

2017-12-04 Thread David C
On Mon, Dec 4, 2017 at 4:39 PM, Drew Weaver  wrote:

> Howdy,
>
>
>
> I replaced a disk today because it was marked as Predicted failure. These
> were the steps I took
>
>
>
> ceph osd out osd17
>
> ceph -w #waited for it to get done
>
> systemctl stop ceph-osd@osd17
>
> ceph osd purge osd17 --yes-i-really-mean-it
>
> umount /var/lib/ceph/osd/ceph-osdX
>
>
>
> I noticed that after I ran the ‘osd out’ command that it started moving
> data around.
>

That's normal

>
>
> 19446/16764 objects degraded (115.999%) <-- I noticed that number seems odd
>

I don't think that's normal!

>
>
> So then I replaced the disk
>
> Created a new label on it
>
> Ceph-deploy osd prepare OSD5:sdd
>
>
>
> THIS time, it started rebuilding
>
>
>
> 40795/16764 objects degraded (243.349%) <-- Now I’m really concerned.
>
>
>
> Perhaps I don’t quite understand what the numbers are telling me but is it
> normal for it to be rebuilding more objects than exist?
>
See:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-September/020682.html,
which seems to be a similar issue to yours.

I'd recommend providing more info, Ceph version, bluestore or filestore,
crushmap etc.
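The output of something like the following would be a good starting point (exact
commands depend on your release):

ceph versions
ceph osd tree
ceph osd df
ceph osd metadata <osd-id> | grep osd_objectstore    # filestore vs bluestore
ceph osd getcrushmap -o crush.bin && crushtool -d crush.bin -o crush.txt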

>
>
> Thanks,
>
> -Drew
>
>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] "failed to open ino"

2017-11-29 Thread David C
On Tue, Nov 28, 2017 at 1:50 PM, Jens-U. Mozdzen <jmozd...@nde.ag> wrote:

> Hi David,
>
> Zitat von David C <dcsysengin...@gmail.com>:
>
>> On 27 Nov 2017 1:06 p.m., "Jens-U. Mozdzen" <jmozd...@nde.ag> wrote:
>>
>> Hi David,
>>
>> Zitat von David C <dcsysengin...@gmail.com>:
>>
>> Hi Jens
>>
>>>
>>> We also see these messages quite frequently, mainly the "replicating
>>> dir...". Only seen "failed to open ino" a few times so didn't do any real
>>> investigation. Our set up is very similar to yours, 12.2.1,
>>> active/standby
>>> MDS and exporting cephfs through KNFS (hoping to replace with Ganesha
>>> soon).
>>>
>>>
>> been there, done that - using Ganesha more than doubled the run-time of
>> our
>> jobs, while with knfsd, the run-time is about the same for CephFS-based
>> and
>> "local disk"-based files. But YMMV, so if you see speeds with Ganesha that
>> are similar to knfsd, please report back with details...
>>
>>
>> I'd be interested to know if you tested Ganesha over a cephfs kernel mount
>> (ie using the VFS fsal) or if you used the Ceph fsal. Also the server and
>> client versions you tested.
>>
>
> I had tested Ganesha only via the Ceph FSAL. Our Ceph nodes (including the
> one used as a Ganesha server) are running 
> ceph-12.2.1+git.1507910930.aea79b8b7a
> on OpenSUSE 42.3, SUSE's kernel 4.4.76-1-default (which has a number of
> back-ports in it), Ganesha is at version nfs-ganesha-2.5.2.0+git.150427
> 5777.a9d23b98f.
>
> The NFS clients are a broad mix of current and older systems.
>
> Prior to Luminous, Ganesha writes were terrible due to a bug with fsync
>> calls in the mds code. The fix went into the mds and client code. If
>> you're
>> doing Ganesha over the top of the kernel mount you'll need a pretty recent
>> kernel to see the write improvements.
>>
>
> As we were testing the Ceph FSAL, this should not be the cause.
>
> From my limited Ganesha testing so far, reads are better when exporting the
>> kernel mount, writes are much better with the Ceph fsal. But that's
>> expected for me as I'm using the CentOS kernel. I was hoping the
>> aforementioned fix would make it into the rhel 7.4 kernel but doesn't look
>> like it has.
>>
>
> When exporting the kernel-mounted CephFS via kernel nfsd, we see similar
> speeds to serving the same set of files from a local bcache'd RAID1 array
> on SAS disks. This is for a mix of reads and writes, mostly small files
> (compile jobs, some packaging).
>

I'm surprised your knfs writes are that good on a 4.4 kernel (assuming your
exports aren't async). At least when I tested with the mainline 4.4 kernel
it was still super slow for me. It's only in 4.12 or 4.13 where they
improve. It sounds like Suse have potentially backported some good stuff!



>
> From what I can see, it would have to be A/A/P, since MDS demands at least
>> one stand-by.
>>
>>
>> That's news to me.
>>
>
> From http://docs.ceph.com/docs/master/cephfs/multimds/ :
>
> "Each CephFS filesystem has a max_mds setting, which controls how many
> ranks will be created. The actual number of ranks in the filesystem will
> only be increased if a spare daemon is available to take on the new rank.
> For example, if there is only one MDS daemon running, and max_mds is set to
> two, no second rank will be created."
>
> Might well be I was mis-reading this... I had first read it to mean that a
> spare daemon needs to be available *while running* A/A, but the example
> sounds like the spare is required when *switching to* A/A.
>

Yep I think you're right. Further down that page it states: "Even with
multiple active MDS daemons, a highly available system still requires
standby daemons to take over if any of the servers running an active daemon
fail."

I assumed if an active MDS failed, the surviving MDS(s) would just pick up
the workload. The question is, would losing an MDS in a cluster with no
standbys stop all metadata IO or would it just be a health warning? I need
to do some playing around with this at some point.
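(For anyone wanting to try the same on 12.2.x, my understanding is that switching to
multi-active is roughly:

ceph fs set <fs_name> max_mds 2

possibly preceded by "ceph fs set <fs_name> allow_multimds true" depending on the
exact release, and with enough MDS daemons running to fill the ranks plus ideally at
least one standby.)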



> Is it possible you still had standby config in your ceph.conf?
>>
>
> Not sure what you're asking for, is this related to active/active or to
> our Ganesha tests? We have not yet tried to switch to A/A, so our config
> actually contains standby parameters.
>

It was in relation to A/A, but query answered above.

>
> Regards,
> Jens
>
> --
> Jens-U. Mozdzen voice   : +49-40-559 51 75
> NDE Netzdesign und -entwicklung AG  fax : +49-40-559 51 77
> Postfach 61 03 15   mobile  : +49-179-4 98 21 98
> D-22423 Hamburg e-mail  : jmozd...@nde.ag
>
> Vorsitzende des Aufsichtsrates: Angelika Torlée-Mozdzen
>   Sitz und Registergericht: Hamburg, HRB 90934
>   Vorstand: Jens-U. Mozdzen
>USt-IdNr. DE 814 013 983
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] "failed to open ino"

2017-11-27 Thread David C
On 27 Nov 2017 1:06 p.m., "Jens-U. Mozdzen" <jmozd...@nde.ag> wrote:

Hi David,

Zitat von David C <dcsysengin...@gmail.com>:

Hi Jens
>
> We also see these messages quite frequently, mainly the "replicating
> dir...". Only seen "failed to open ino" a few times so didn't do any real
> investigation. Our set up is very similar to yours, 12.2.1, active/standby
> MDS and exporting cephfs through KNFS (hoping to replace with Ganesha
> soon).
>

been there, done that - using Ganesha more than doubled the run-time of our
jobs, while with knfsd, the run-time is about the same for CephFS-based and
"local disk"-based files. But YMMV, so if you see speeds with Ganesha that
are similar to knfsd, please report back with details...


I'd be interested to know if you tested Ganesha over a cephfs kernel mount
(ie using the VFS fsal) or if you used the Ceph fsal. Also the server and
client versions you tested.

Prior to Luminous, Ganesha writes were terrible due to a bug with fsync
calls in the mds code. The fix went into the mds and client code. If you're
doing Ganesha over the top of the kernel mount you'll need a pretty recent
kernel to see the write improvements.

From my limited Ganesha testing so far, reads are better when exporting the
kernel mount, writes are much better with the Ceph fsal. But that's
expected for me as I'm using the CentOS kernel. I was hoping the
aforementioned fix would make it into the rhel 7.4 kernel but doesn't look
like it has.

I currently use async on my nfs exports as writes are really poor
otherwise. I'm comfortable with the risks that entails.
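(In /etc/exports terms that's just the async option, e.g. something like
"/srv/nfs/share *(rw,async,no_root_squash)" -- the path and the other options here
are only placeholders.)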






Interestingly, the paths reported in "replicating dir" are usually
> dirs exported through Samba (generally Windows profile dirs). Samba runs
> really well for us and there doesn't seem to be any impact on users. I
> expect we wouldn't see these messages if running active/active MDS but I'm
> still a bit cautious about implementing that (am I being overly cautious I
> wonder?).
>

From what I can see, it would have to be A/A/P, since MDS demands at least
one stand-by.


That's news to me. Is it possible you still had standby config in your
ceph.conf?


Regards,
Jens
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] "failed to open ino"

2017-11-27 Thread David C
Hi Jens

We also see these messages quite frequently, mainly the "replicating
dir...". Only seen "failed to open ino" a few times so didn't do any real
investigation. Our set up is very similar to yours, 12.2.1, active/standby
MDS and exporting cephfs through KNFS (hoping to replace with Ganesha
soon). Interestingly, the paths reported in "replicating dir" are usually
dirs exported through Samba (generally Windows profile dirs). Samba runs
really well for us and there doesn't seem to be any impact on users. I
expect we wouldn't see these messages if running active/active MDS but I'm
still a bit cautious about implementing that (am I being overly cautious I
wonder?).

Thanks,

On Mon, Nov 27, 2017 at 10:57 AM, Jens-U. Mozdzen  wrote:

> Hi,
>
> Zitat von "Yan, Zheng" :
>
>> On Sat, Nov 25, 2017 at 2:27 AM, Jens-U. Mozdzen  wrote:
>>
>>> [...]
>>> In the log of the active MDS, we currently see the following two inodes
>>> reported over and over again, about every 30 seconds:
>>>
>>> --- cut here ---
>>> 2017-11-24 18:24:16.496397 7fa308cf0700  0 mds.0.cache  failed to open
>>> ino
>>> [...]
>>>
>>
>> It's likely caused by NFS export.  MDS reveals this error message if
>> NFS client tries to access a deleted file. The error causes NFS client
>> to return -ESTALE.
>>
>
> thank you for pointing me at this potential cause - as we're still using
> NFS access during that job (old clients without native CephFS support), it
> may be we have some yet unnoticed stale NFS file handles. I'll have a
> closer look, indeed!
>
>
> Regards,
> Jens
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS 12.2.0 -> 12.2.1 change in inode caching behaviour

2017-11-27 Thread David C
Yep, that did it! Thanks, Zheng. I should read release notes more carefully!
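For anyone else hitting this: mds_cache_memory_limit is expressed in bytes and takes
over from the old inode-count based mds_cache_size, so something like the following
in the [mds] section of ceph.conf (value purely illustrative), followed by an MDS
restart, should do it:

mds cache memory limit = 17179869184    # ~16 GiB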

On Fri, Nov 24, 2017 at 7:09 AM, Yan, Zheng <uker...@gmail.com> wrote:

> On Thu, Nov 23, 2017 at 9:17 PM, David C <dcsysengin...@gmail.com> wrote:
> > Hi All
> >
> > I upgraded my 12.2.0 cluster to 12.2.1 a month or two back. I've noticed
> > that the number of inodes held in cache is only approx 1/5th of my
> > inode_max. This is a surprise to me as with 12.2.0, and before that
> Jewel,
> > after starting an MDS server, the cache would typically fill to the max
> > within 24 hours. I have ~8 million entries in my file system, and most of
> this
> > is fairly hot data. I'm seeing quite frequent "failing to respond to
> cache
> > pressure" messages, I just have two Kernel clients accessing the
> filesystem.
> >
> > Are there some new defaults I need to change perhaps? Or potentially a
> bug?
> >
>
> we introduced config option 'mds_cache_memory_limit'.
>
>
> Regards
> Yan, Zheng
>
> > Output of perf dump mds:
> >
> >>> "mds": {
> >>>
> >>> "request": 184132091,
> >>>
> >>> "reply": 184132064,
> >>>
> >>> "reply_latency": {
> >>>
> >>> "avgcount": 184132064,
> >>>
> >>> "sum": 125364.905594355,
> >>>
> >>> "avgtime": 0.000680842
> >>>
> >>> },
> >>>
> >>> "forward": 0,
> >>>
> >>> "dir_fetch": 9846671,
> >>>
> >>> "dir_commit": 562495,
> >>>
> >>> "dir_split": 0,
> >>>
> >>> "dir_merge": 0,
> >>>
> >>> "inode_max": 250,
> >>>
> >>> "inodes": 444642,
> >>>
> >>> "inodes_top": 185845,
> >>>
> >>> "inodes_bottom": 127878,
> >>>
> >>> "inodes_pin_tail": 130919,
> >>>
> >>> "inodes_pinned": 179149,
> >>>
> >>> "inodes_expired": 135604208,
> >>>
> >>> "inodes_with_caps": 165900,
> >>>
> >>> "caps": 165948,
> >>>
> >>> "subtrees": 2,
> >>>
> >>> "traverse": 187280168,
> >>>
> >>> "traverse_hit": 185739606,
> >>>
> >>> "traverse_forward": 0,
> >>>
> >>> "traverse_discover": 0,
> >>>
> >>> "traverse_dir_fetch": 118150,
> >>>
> >>> "traverse_remote_ino": 8,
> >>>
> >>> "traverse_lock": 60256,
> >>>
> >>> "load_cent": 18413221445,
> >>>
> >>> "q": 0,
> >>>
> >>> "exported": 0,
> >>>
> >>> "exported_inodes": 0,
> >>>
> >>> "imported": 0,
> >>>
> >>> "imported_inodes": 0
> >>>
> >>> }
> >>
> >>
> >
> > A few extra details:
> >
> > Running two MDS servers, one is standby.
> > Both have mds_cache_size = 250.
> > CentOS 7.3 servers.
> > Kernel clients are CentOS 7.3 (3.10.0-514.2.2.el7.x86_64)
> >
> > Thanks,
> > David
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] CephFS 12.2.0 -> 12.2.1 change in inode caching behaviour

2017-11-23 Thread David C
Hi All

I upgraded my 12.2.0 cluster to 12.2.1 a month or two back. I've noticed
that the number of inodes held in cache is only approx 1/5th of my
inode_max. This is a surprise to me as with 12.2.0, and before that Jewel,
after starting an MDS server, the cache would typically fill to the max
within 24 hours. I have ~8 million entries in my file system, and most of this
is fairly hot data. I'm seeing quite frequent "failing to respond to cache
pressure" messages, I just have two Kernel clients accessing the filesystem.

Are there some new defaults I need to change perhaps? Or potentially a bug?

Output of perf dump mds:

"mds": {
>
> "request": 184132091,
>
> "reply": 184132064,
>
> "reply_latency": {
>
> "avgcount": 184132064,
>
> "sum": 125364.905594355,
>
> "avgtime": 0.000680842
>
> },
>
> "forward": 0,
>
> "dir_fetch": 9846671,
>
> "dir_commit": 562495,
>
> "dir_split": 0,
>
> "dir_merge": 0,
>
> "inode_max": 250,
>
> "inodes": 444642,
>
> "inodes_top": 185845,
>
> "inodes_bottom": 127878,
>
> "inodes_pin_tail": 130919,
>
> "inodes_pinned": 179149,
>
> "inodes_expired": 135604208,
>
> "inodes_with_caps": 165900,
>
> "caps": 165948,
>
> "subtrees": 2,
>
> "traverse": 187280168,
>
> "traverse_hit": 185739606,
>
> "traverse_forward": 0,
>
> "traverse_discover": 0,
>
> "traverse_dir_fetch": 118150,
>
> "traverse_remote_ino": 8,
>
> "traverse_lock": 60256,
>
> "load_cent": 18413221445,
>
> "q": 0,
>
> "exported": 0,
>
> "exported_inodes": 0,
>
> "imported": 0,
>
> "imported_inodes": 0
>
> }
>
>
>
A few extra details:

Running two MDS servers, one is standby.
Both have mds_cache_size = 250.
CentOS 7.3 servers.
Kernel clients are CentOS 7.3 (3.10.0-514.2.2.el7.x86_64)

Thanks,
David
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] s3 bucket policys

2017-11-07 Thread Adam C. Emerson
On 07/11/2017, Simon Leinen wrote:
> Simon Leinen writes:
> > Adam C Emerson writes:
> >> On 03/11/2017, Simon Leinen wrote:
> >> [snip]
> >>> Is this supported by the Luminous version of RadosGW?
> 
> >> Yes! There's a few bugfixes in master that are making their way into
> >> Luminous, but Luminous has all the features at present.
> 
> > Does that mean it should basically work in 10.2.1?
> 
> Sorry, I meant to say "in 12.2.1"!!!

Yes! I believe so. There are some bug fixes not in there, but the
whole feature is basically there.


-- 
Senior Software Engineer   Red Hat Storage, Ann Arbor, MI, US
IRC: Aemerson@OFTC, Actinic@Freenode
0x80F7544B90EDBFB9 E707 86BA 0C1B 62CC 152C  7C12 80F7 544B 90ED BFB9
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] s3 bucket policys

2017-11-06 Thread Adam C. Emerson
On 06/11/2017, nigel davies wrote:
> ok i am using Jewel vershion
> 
> when i try setting permissions using s3cmd or an php script using s3client
> 
> i get the error
> 
>  encoding="UTF-8"?>InvalidArgumenttest_bucket
> (truncated...)
>InvalidArgument (client):  -  encoding="UTF-8"?>InvalidArgumenttest_buckettx
> 
> a-005a005b91-109f-default109f-default-default
> 
> 
> 
> in the log on the s3 server i get
> 
> 2017-11-06 12:54:41.987704 7f67a9feb700  0 failed to parse input: {
> "Version": "2012-10-17",
> "Statement": [
> {
> "Sid": "usr_upload_can_write",
> "Effect": "Allow",
> "Principal": {"AWS": ["arn:aws:iam:::user/test"]},
> "Action": ["s3:ListBucket", "s3:PutObject"],
> "Resource": ["arn:aws:s3:::test_bucket"]
> }
> 2017-11-06 12:54:41.988219 7f67a9feb700  1 == req done
> req=0x7f67a9fe57e0 op status=-22 http_status=400 ==
> 
> 
> Any advice on this one

Well! If you upgrade to Luminous the advice I gave you will work
perfectly. Also Luminous has a bunch of awesome, wonderful new
features like Bluestore in it (and really what other enterprise
storage platform promises to color your data such a lovely hue?)

But, if you can't, I think something like:

s3cmd setacl s3://bucket_name --acl-grant=read:someuser
s3cmd setacl s3://bucket_name --acl-grant=write:differentuser

Should work. Other people than I know a lot more about ACLs.

-- 
Senior Software Engineer   Red Hat Storage, Ann Arbor, MI, US
IRC: Aemerson@OFTC, Actinic@Freenode
0x80F7544B90EDBFB9 E707 86BA 0C1B 62CC 152C  7C12 80F7 544B 90ED BFB9
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] s3 bucket policys

2017-11-03 Thread Adam C. Emerson
On 03/11/2017, Simon Leinen wrote:
[snip]
> Is this supported by the Luminous version of RadosGW?

Yes! There's a few bugfixes in master that are making their way into
Luminous, but Luminous has all the features at present.

> (Or even Jewel?)

No!

> Does this work with Keystone integration, i.e. can we refer to Keystone
> users as principals?

In principle probably. I haven't tried it and I don't really know much
about Keystone at present. It is hooked into the various
IdentityApplier classes and if RGW thinks a Keystone user is a 'user'
and you supply whatever RGW thinks its username is, then it should
work fine. I haven't tried it, though.

> Let's say there are many read-only users rather than just one.  Would we
> simply add a new clause under "Statement" for each such user, or is
> there a better way? (I understand that RadosGW doesn't support groups,
> which could solve this elegantly and efficiently.)

If you want to give a large number of users the same permissions, just
put them all in the Principal array.
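For example (user and bucket names made up), a single read-only statement could look
roughly like this:

{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "readers",
    "Effect": "Allow",
    "Principal": {"AWS": ["arn:aws:iam:::user/alice", "arn:aws:iam:::user/bob", "arn:aws:iam:::user/carol"]},
    "Action": ["s3:ListBucket", "s3:GetObject"],
    "Resource": ["arn:aws:s3:::mybucket", "arn:aws:s3:::mybucket/*"]
  }]
}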

-- 
Senior Software Engineer   Red Hat Storage, Ann Arbor, MI, US
IRC: Aemerson@OFTC, Actinic@Freenode
0x80F7544B90EDBFB9 E707 86BA 0C1B 62CC 152C  7C12 80F7 544B 90ED BFB9
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Bad IO performance CephFS vs. NFS for block size 4k/128k

2017-09-04 Thread c . monty
Hello!

I'm validating IO performance of CephFS vs. NFS.

Therefore I have mounted the relevant filesystems on the same client.
Then I start fio with the following parameters:
action = randwrite randrw
blocksize = 4k 128k 8m
rwmixread = 70 50 30
32 jobs run in parallel
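(As an illustration, one combination of those parameters corresponds to an invocation
roughly like:

fio --name=test --directory=<mountpoint> --rw=randrw --rwmixread=30 --bs=128k \
    --size=2g --numjobs=32 --runtime=60 --time_based --group_reporting

where the size and runtime values are only examples.)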

The NFS share is striped over 5 virtual disks with a 4+1 RAID5 configuration; 
each disk has ~8TB.
The CephFS is configured on 2 MDS servers (1 up:active, 1 up:standby); each MDS 
has 47 OSDs, where 1 OSD is represented by a single 8TB disk.
(The disks of RAID5 and OSD are identical.)

What I can see is that the IO performance of blocksize 8m is slightly better 
with CephFS, but worse (by a factor of 4-10) with blocksize 4k / 128k. Here are the 
stats for randrw with mix 30:
ld9930:/home # tail -n 3 ld9930-fio-test-cephfs-randrw30-8m
Run status group 0 (all jobs):
   READ: bw=335MiB/s (351MB/s), 335MiB/s-335MiB/s (351MB/s-351MB/s), io=19.7GiB 
(21.2GB), run=60099-60099msec
  WRITE: bw=753MiB/s (789MB/s), 753MiB/s-753MiB/s (789MB/s-789MB/s), io=44.2GiB 
(47.5GB), run=60099-60099msec

ld9930:/home # tail -n 3 ld9930-fio-test-nfs-randrw30-8m
Run status group 0 (all jobs):
   READ: bw=324MiB/s (340MB/s), 324MiB/s-324MiB/s (340MB/s-340MB/s), io=19.0GiB 
(20.5GB), run=60052-60052msec
  WRITE: bw=725MiB/s (760MB/s), 725MiB/s-725MiB/s (760MB/s-760MB/s), io=42.6GiB 
(45.7GB), run=60052-60052msec

ld9930:/home # tail -n 3 ld9930-fio-test-nfs-randrw30-128k
Run status group 0 (all jobs):
   READ: bw=287MiB/s (301MB/s), 287MiB/s-287MiB/s (301MB/s-301MB/s), io=16.9GiB 
(18.7GB), run=60006-60006msec
  WRITE: bw=667MiB/s (700MB/s), 667MiB/s-667MiB/s (700MB/s-700MB/s), io=39.1GiB 
(41.1GB), run=60006-60006msec

ld9930:/home # tail -n 3 ld9930-fio-test-cephfs-randrw30-128k
Run status group 0 (all jobs):
   READ: bw=69.2MiB/s (72.6MB/s), 69.2MiB/s-69.2MiB/s (72.6MB/s-72.6MB/s), 
io=4172MiB (4375MB), run=60310-60310msec
  WRITE: bw=161MiB/s (169MB/s), 161MiB/s-161MiB/s (169MB/s-169MB/s), io=9732MiB 
(10.3GB), run=60310-60310msec

ld9930:/home # tail -n 3 ld9930-fio-test-cephfs-randrw30-4k
Run status group 0 (all jobs):
   READ: bw=5631KiB/s (5766kB/s), 5631KiB/s-5631KiB/s (5766kB/s-5766kB/s), 
io=330MiB (346MB), run=60043-60043msec
  WRITE: bw=12.8MiB/s (13.4MB/s), 12.8MiB/s-12.8MiB/s (13.4MB/s-13.4MB/s), 
io=767MiB (804MB), run=60043-60043msec

ld9930:/home # tail -n 3 ld9930-fio-test-nfs-randrw30-4k
Run status group 0 (all jobs):
   READ: bw=77.2MiB/s (80.8MB/s), 77.2MiB/s-77.2MiB/s (80.8MB/s-80.8MB/s), 
io=4621MiB (4846MB), run=60004-60004msec
  WRITE: bw=180MiB/s (188MB/s), 180MiB/s-180MiB/s (188MB/s-188MB/s), io=10.6GiB 
(11.4GB), run=60004-60004msec


This implies that for good IO performance only data with blocksize > 128k (I 
guess > 1M) should be used.
Can anybody confirm this?

THX
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] "Zombie" ceph-osd@xx.service remain fromoldinstallation

2017-08-03 Thread c . monty
3. August 2017 16:37, "Burkhard Linke" 
 wrote:

> Hi,
> 
> On 03.08.2017 16:31, c.mo...@web.de wrote:
> 
>> Hello!
>> 
>> I have purged my ceph and reinstalled it.
>> ceph-deploy purge node1 node2 node3
>> ceph-deploy purgedata node1 node2 node3
>> ceph-deploy forgetkeys
>> 
>> All disks configured as OSDs are physically in two servers.
>> Due to some restrictions I needed to modify the total number of disks usable 
>> as OSDs, which means I now have fewer disks than before.
>> have now less disks as before.
>> 
>> The installation with ceph-deploy finished w/o errors.
>> 
>> However, if I start all OSDs (on any of the servers) I get some services 
>> with status "failed".
>> ceph-osd@70.service loaded failed failed Ceph object storage daemon
>> ceph-osd@71.service loaded failed failed Ceph object storage daemon
>> ceph-osd@92.service loaded failed failed Ceph object storage daemon
>> ceph-osd@93.service loaded failed failed Ceph object storage daemon
>> ceph-osd@94.service loaded failed failed Ceph object storage daemon
>> ceph-osd@95.service loaded failed failed Ceph object storage daemon
>> ceph-osd@96.service loaded failed failed Ceph object storage daemon
>> 
>> Any of these services belong to the previous installation.
>> 
>> If I stop any of the failed service and disable it, e.g.
>> systemctl stop ceph-osd@70.service
>> systemctl disable ceph-osd@70.service
>> the status is correct.
>> 
>> However, when I trigger
>> systemctl restart ceph-osd.target
>> these zombie services get in status "auto-restart" first and then "fail" 
>> again.
>> 
>> As a workaround I need to mask the zombie services, but this should not be a 
>> final solution:
>> systemctl mask ceph-osd@70.service
>> 
>> Question:
>> How can I get rid of the zombie services "ceph-osd@xx.service"?
> 
> If you are sure that these OSD are "zombie", you can remove the dependencies 
> for ceph-osd.target.
> In case of CentOS, these are symlinks in 
> /etc/systemd/system/ceph-osd.target.wants/ .
> 
> Do not forget to reload systemd afterwards. There might also be a nice 
> systemctl command for
> removing dependencies.
> 
> Regards,
> Burkhard
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

I was looking for this file already, but couldn't find it. My OS is SLES 12SP2.

This is the current content:
ld4464:~ # ll /etc/systemd/system/ceph*
lrwxrwxrwx 1 root root9 Aug  3 16:17 
/etc/systemd/system/ceph-osd@47.service -> /dev/null
lrwxrwxrwx 1 root root9 Aug  3 16:17 
/etc/systemd/system/ceph-osd@48.service -> /dev/null
lrwxrwxrwx 1 root root9 Aug  3 16:17 
/etc/systemd/system/ceph-osd@52.service -> /dev/null
lrwxrwxrwx 1 root root9 Aug  3 16:17 
/etc/systemd/system/ceph-osd@53.service -> /dev/null
lrwxrwxrwx 1 root root9 Aug  3 16:17 
/etc/systemd/system/ceph-osd@54.service -> /dev/null
lrwxrwxrwx 1 root root9 Aug  3 16:17 
/etc/systemd/system/ceph-osd@55.service -> /dev/null
lrwxrwxrwx 1 root root9 Aug  3 16:17 
/etc/systemd/system/ceph-osd@56.service -> /dev/null
lrwxrwxrwx 1 root root9 Aug  3 16:17 
/etc/systemd/system/ceph-osd@57.service -> /dev/null
lrwxrwxrwx 1 root root9 Aug  3 16:17 
/etc/systemd/system/ceph-osd@58.service -> /dev/null
lrwxrwxrwx 1 root root9 Aug  3 16:17 
/etc/systemd/system/ceph-osd@59.service -> /dev/null
lrwxrwxrwx 1 root root9 Aug  3 16:17 
/etc/systemd/system/ceph-osd@60.service -> /dev/null
lrwxrwxrwx 1 root root9 Aug  3 16:17 
/etc/systemd/system/ceph-osd@61.service -> /dev/null
lrwxrwxrwx 1 root root9 Aug  3 16:17 
/etc/systemd/system/ceph-osd@62.service -> /dev/null
lrwxrwxrwx 1 root root9 Aug  3 16:17 
/etc/systemd/system/ceph-osd@63.service -> /dev/null
lrwxrwxrwx 1 root root9 Aug  3 16:17 
/etc/systemd/system/ceph-osd@64.service -> /dev/null
lrwxrwxrwx 1 root root9 Aug  3 16:17 
/etc/systemd/system/ceph-osd@65.service -> /dev/null
lrwxrwxrwx 1 root root9 Aug  3 16:17 
/etc/systemd/system/ceph-osd@66.service -> /dev/null
lrwxrwxrwx 1 root root9 Aug  3 16:17 
/etc/systemd/system/ceph-osd@67.service -> /dev/null
lrwxrwxrwx 1 root root9 Aug  3 16:17 
/etc/systemd/system/ceph-osd@68.service -> /dev/null
lrwxrwxrwx 1 root root9 Aug  3 16:17 
/etc/systemd/system/ceph-osd@69.service -> /dev/null
lrwxrwxrwx 1 root root9 Aug  3 16:17 
/etc/systemd/system/ceph-osd@70.service -> /dev/null
lrwxrwxrwx 1 root root9 Aug  3 16:17 
/etc/systemd/system/ceph-osd@71.service -> /dev/null
lrwxrwxrwx 1 root root9 Aug  3 16:17 
/etc/systemd/system/ceph-osd@92.service -> /dev/null
lrwxrwxrwx 1 root root9 Aug  3 16:17 
/etc/systemd/system/ceph-osd@93.service -> /dev/null
lrwxrwxrwx 1 root root9 Aug  3 16:17 
/etc/systemd/system/ceph-osd@94.service -> /dev/null
lrwxrwxrwx 1 root root9 Aug  3 16:17 
/etc/systemd/system/ceph-osd@95.service -> /dev/null
lrwxrwxrwx 1 root root9 Aug  3 16:17 

[ceph-users] "Zombie" ceph-osd@xx.service remain from old installation

2017-08-03 Thread c . monty
Hello!

I have purged my ceph and reinstalled it.
ceph-deploy purge node1 node2 node3
ceph-deploy purgedata node1 node2 node3
ceph-deploy forgetkeys

All disks configured as OSDs are physically in two servers.
Due to some restrictions I needed to modify the total number of disks usable as 
OSDs, which means I now have fewer disks than before.

The installation with ceph-deploy finished w/o errors.

However, if I start all OSDs (on any of the servers) I get some services with 
status "failed".
ceph-osd@70.service 
   loaded failed failedCeph object storage daemon
ceph-osd@71.service 
   loaded failed failedCeph object storage daemon
ceph-osd@92.service 
   loaded failed failedCeph object storage daemon
ceph-osd@93.service 
   loaded failed failedCeph object storage daemon
ceph-osd@94.service 
   loaded failed failedCeph object storage daemon
ceph-osd@95.service 
   loaded failed failedCeph object storage daemon
ceph-osd@96.service 
   loaded failed failedCeph object storage daemon

Any of these services belong to the previous installation.

If I stop any of the failed service and disable it, e.g.
systemctl stop ceph-osd@70.service
systemctl disable ceph-osd@70.service
the status is correct.

However, when I trigger 
systemctl restart ceph-osd.target
these zombie services get in status "auto-restart" first and then "fail" again.

As a workaround I need to mask the zombie services, but this should not be a 
final solution: systemctl mask ceph-osd@70.service

Question:
How can I get rid of the zombie services "ceph-osd@xx.service"?

THX
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Defining quota in CephFS - quota is ignored

2017-07-26 Thread c . monty
26 July 2017 11:29, "Wido den Hollander"  wrote:

>> Op 26 juli 2017 om 11:26 schreef c.mo...@web.de:
>> 
>> Hello!
>> 
>> Based on the documentation for defining quotas in CephFS for any directory
>> (http://docs.ceph.com/docs/master/cephfs/quota), I defined a quota for 
>> attribute max_bytes:
>> ld4257:~ # getfattr -n ceph.quota.max_bytes /mnt/ceph-fuse/MTY/
>> getfattr: Removing leading '/' from absolute path names
>> # file: mnt/ceph-fuse/MTY/
>> ceph.quota.max_bytes="1"
>> 
>> To validate if the quota is working, I write a 128MB file in 
>> /mnt/ceph-fuse/MTY:
>> ld4257:~ # dd if=/dev/zero of=/mnt/ceph-fuse/MTY/128MBfile bs=64M count=2
>> 2+0 records in
>> 2+0 records out
>> 134217728 bytes (134 MB, 128 MiB) copied, 0.351206 s, 382 MB/s
>> 
>> This file is created correctly, and the utilization statistics confirm it:
>> ld4257:~ # rados df
>> pool name KB objects clones degraded unfound rd rd KB wr wr KB
>> hdb-backup 131072 32 0 0 0 8 8 43251 88572586
>> hdb-backup_metadata 27920 27 0 0 0 301 168115 6459 55386
>> rbd 0 0 0 0 0 0 0 0 0
>> templates 0 0 0 0 0 0 0 0 0
>> total used 9528188 59
>> total avail 811829446772
>> total space 811838974960
>> 
>> Question:
>> Why can I create a file with size 128MB after defining a quota of 100MB?
> 
> What kernel version does the client use? Quotas rely on client support.
> 
> Also, quotas are lazy and can take a bit of time before they start to block 
> writes.
> 
> Wido
> 
>> THX
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


ld4257:~ # uname -r
4.4.59-92.24-default

In the meantime I have created 4 files with 210MB of space allocation in total:
ld4257:~ # ll -h /mnt/ceph-fuse/MTY/
total 210M
-rw-r--r-- 1 root root 100M Jul 26 13:40 100MBfile_from_ld4257
-rw-r--r-- 1 root root  10M Jul 26 11:54 10MBfile_from_ld4257
-rw-r--r-- 1 root root  50M Jul 26 11:55 50MBfile_from_ld2398
-rw-r--r-- 1 root root  50M Jul 26 11:56 50MBfile_from_ld4257
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Defining quota in CephFS - quota is ignored

2017-07-26 Thread c . monty
Hello!

Based on the documentation for defining quotas in CephFS for any directory 
(http://docs.ceph.com/docs/master/cephfs/quota/), I defined a quota for 
attribute max_bytes:
ld4257:~ # getfattr -n ceph.quota.max_bytes /mnt/ceph-fuse/MTY/
getfattr: Removing leading '/' from absolute path names
# file: mnt/ceph-fuse/MTY/
ceph.quota.max_bytes="1"
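(The quota itself was set with setfattr as described in that documentation,
presumably something along the lines of:
setfattr -n ceph.quota.max_bytes -v 100000000 /mnt/ceph-fuse/MTY/
where 100000000 bytes is the intended 100 MB limit.)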

To validate if the quota is working, I write a 128MB file in /mnt/ceph-fuse/MTY:
ld4257:~ # dd if=/dev/zero of=/mnt/ceph-fuse/MTY/128MBfile bs=64M count=2
2+0 records in
2+0 records out
134217728 bytes (134 MB, 128 MiB) copied, 0.351206 s, 382 MB/s

This file is created correctly, and the utilization statistics confirm it:
ld4257:~ # rados df
pool name            KB      objects clones degraded unfound rd  rd KB  wr    wr KB
hdb-backup           131072  32      0      0        0       8   8      43251 88572586
hdb-backup_metadata  27920   27      0      0        0       301 168115 6459  55386
rbd                  0       0       0      0        0       0   0      0     0
templates            0       0       0      0        0       0   0      0     0
  total used 9528188   59
  total avail   811829446772
  total space   811838974960


Question:
Why can I create a file with size 128MB after defining a quota of 100MB?


THX
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Mounting pool, but where are the files?

2017-07-25 Thread c . monty
Understood.

Would you recommend having a dedicated pool for the data that is directly 
written using librados and another pool for the filesystem (CephFS)?
24 July 2017 19:46, "David Turner"  wrote:
You might be able to read these objects using s3fs if you're using a 
RadosGW. But like John mentioned, you cannot write them as objects into the 
pool and read them as files from the filesystem. 
 On Mon, Jul 24, 2017, 12:07 PM John Spray  wrote: On Mon, Jul 24, 2017 at 4:52 
PM,  wrote:
> Hello!
>
> I created CephFS according to documentation:
> $ ceph osd pool create hdb-backup 
> $ ceph osd pool create hdb-backup_metadata 
> $ ceph fs new   
>
> I can mount this pool with user admin:
> ld4257:/etc/ceph # mount -t ceph 10.96.5.37,10.96.5.38,10.96.5.38:/ 
> /mnt/cephfs -o name=admin,secretfile=/etc/ceph/ceph.client.admin.key

Need to untangle the terminology a bit.

What you're mounting is a filesystem; the filesystem is storing its
data in pools. Pools are a lower-level concept than filesystems.

> ld4257:/etc/ceph # mount | grep ceph
> 10.96.5.37,10.96.5.38,10.96.5.38:/ on /mnt/cephfs type ceph 
> (rw,relatime,name=admin,secret=,acl)
>
> To verify which pool is mounted, I checked this:
> ld4257:/etc/ceph # ceph osd lspools
> 0 rbd,1 templates,3 hdb-backup,4 hdb-backup_metadata,
>
> ld4257:/etc/ceph # cephfs /mnt/cephfs/ show_layout
> WARNING: This tool is deprecated. Use the layout.* xattrs to query and modify 
> layouts.
> layout.data_pool: 3
> layout.object_size: 4194304
> layout.stripe_unit: 4194304
> layout.stripe_count: 1
>
> So, I guess the correct pool "hdb-backup" is now mounted to /mnt/cephfs.
>
> Then I pushed some files in this pool.

I think you mean that you put some objects into your pool. So at this
stage you have not created any files, cephfs doesn't know anything
about these objects. You would need to really create files (i.e.
write to your mount) to have files that exist in cephfs.

> I can display the relevant objects now:
> ld4257:/etc/ceph # rados -p hdb-backup ls
> MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:7269
> MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:6357
> MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:772
> MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:14039
> MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:1803
> MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:5549
> MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:15797
> MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:20624
> MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:7322
> MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:5208
> MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:17479
> MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:14361
> MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:16963
> MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:4694
> MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:1391
> MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:1199
> MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:11359
> MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:11995
> [...]
>
> (This is just an extract, there are many more object.)
>
> Now, the question is:
> Can I display these files with CephFS?

Unfortunately not -- you would need to write your data in as files
(via a cephfs mount) to read it back as files.

John

>
> When I check the content of /mnt/cephfs, there's only one directory "MTY" 
> that I have created; this directory is not related to the output of rados at 
> all:
> ld4257:/etc/ceph # ll /mnt/cephfs/
> total 0
> drwxr-xr-x 1 root root 0 Jul 24 15:57 MTY
>
> THX
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com (mailto:ceph-users@lists.ceph.com)
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
> (http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com)
___
ceph-users mailing list
ceph-users@lists.ceph.com (mailto:ceph-users@lists.ceph.com)
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
(http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Mounting pool, but where are the files?

2017-07-24 Thread c . monty
Hello!

I created CephFS according to documentation:
$ ceph osd pool create hdb-backup 
$ ceph osd pool create hdb-backup_metadata 
$ ceph fs new   

I can mount this pool with user admin:
ld4257:/etc/ceph # mount -t ceph 10.96.5.37,10.96.5.38,10.96.5.38:/ /mnt/cephfs 
-o name=admin,secretfile=/etc/ceph/ceph.client.admin.key

ld4257:/etc/ceph # mount | grep ceph
10.96.5.37,10.96.5.38,10.96.5.38:/ on /mnt/cephfs type ceph 
(rw,relatime,name=admin,secret=,acl)

To verify which pool is mounted, I checked this:
ld4257:/etc/ceph # ceph osd lspools
0 rbd,1 templates,3 hdb-backup,4 hdb-backup_metadata,

ld4257:/etc/ceph # cephfs /mnt/cephfs/ show_layout
WARNING: This tool is deprecated.  Use the layout.* xattrs to query and modify 
layouts.
layout.data_pool: 3
layout.object_size:   4194304
layout.stripe_unit:   4194304
layout.stripe_count:  1

So, I guess the correct pool "hdb-backup" is now mounted to /mnt/cephfs.

Then I pushed some files in this pool.
I can display the relevant objects now:
ld4257:/etc/ceph # rados -p hdb-backup ls
MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:7269
MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:6357
MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:772
MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:14039
MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:1803
MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:5549
MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:15797
MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:20624
MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:7322
MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:5208
MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:17479
MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:14361
MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:16963
MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:4694
MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:1391
MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:1199
MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:11359
MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:11995
[...]

(This is just an extract, there are many more object.)

Now, the question is:
Can I display these files with CephFS?

When I check the content of /mnt/cephfs, there's only one directory "MTY" that 
I have created; this directory is not related to the output of rados at all:
ld4257:/etc/ceph # ll /mnt/cephfs/
total 0
drwxr-xr-x 1 root root 0 Jul 24 15:57 MTY

THX
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Mount CephFS with dedicated user fails: mount error 13 = Permission denied

2017-07-24 Thread c . monty
THX.
Mount is working now.

The auth list for user mtyadm is now:
client.mtyadm
 key: AQAlyXVZEfsYNRAAM4jHuV1Br7lpRx1qaINO+A==
 caps: [mds] allow r,allow rw path=/MTY
 caps: [mon] allow r
 caps: [osd] allow rw pool=hdb-backup,allow rw pool=hdb-backup_metadata
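For reference, updating an existing user's caps can be done with something like:
ceph auth caps client.mtyadm mds 'allow r,allow rw path=/MTY' mon 'allow r' osd 'allow rw pool=hdb-backup,allow rw pool=hdb-backup_metadata'
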
24 July 2017 13:25, "Дмитрий Глушенок"  wrote:
Check your kernel version, prior to 4.9 it was needed to allow read on root 
path: 
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-December/014804.html 
(http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-December/014804.html) 
 
On 24 July 2017, at 12:36, c.mo...@web.de (mailto:c.mo...@web.de) wrote: 

Hello!

I want to mount CephFS with a dedicated user in order to avoid putting the 
admin key on every client host.
Therefore I created a user account
ceph auth get-or-create client.mtyadm mon 'allow r' mds 'allow rw path=/MTY' 
osd 'allow rw pool=hdb-backup,allow rw pool=hdb-backup_metadata' -o 
/etc/ceph/ceph.client.mtyadm.keyring
and wrote out the keyring
ceph-authtool -p -n client.mtyadm ceph.client.mtyadm.keyring > 
ceph.client.mtyadm.key

This user is now displayed in auth list:
client.mtyadm
key: AQBYu3VZLg66LBAAGM1jW+cvNE6BoJWfsORZKA==
caps: [mds] allow rw path=/MTY
caps: [mon] allow r
caps: [osd] allow rw pool=hdb-backup,allow rw pool=hdb-backup_metadata

When I try to mount directory /MTY on the client host I get this error:
ld2398:/etc/ceph # mount -t ceph ldcephmon1,ldcephmon2,ldcephmon2:/MTY 
/mnt/cephfs -o name=mtyadm,secretfile=/etc/ceph/ceph.client.mtyadm.key
mount error 13 = Permission denied

The mount works using admin though:
ld2398:/etc/ceph # mount -t ceph ldcephmon1,ldcephmon2,ldcephmon2:/MTY 
/mnt/cephfs -o name=admin,secretfile=/etc/ceph/ceph.client.admin.key
ld2398:/etc/ceph # mount | grep cephfs
10.96.5.37,10.96.5.38,10.96.5.38:/MTY on /mnt/cephfs type ceph 
(rw,relatime,name=admin,secret=,acl)

What is causing this mount error?

THX
___
ceph-users mailing list
ceph-users@lists.ceph.com (mailto:ceph-users@lists.ceph.com)
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
(http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com)
--
Dmitry Glushenok
Jet Infosystems
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Mount CephFS with dedicated user fails: mount error 13 = Permission denied

2017-07-24 Thread c . monty
Hello!

I want to mount CephFS with a dedicated user in order to avoid putting the 
admin key on every client host.
Therefore I created a user account
ceph auth get-or-create client.mtyadm mon 'allow r' mds 'allow rw path=/MTY' 
osd 'allow rw pool=hdb-backup,allow rw pool=hdb-backup_metadata' -o 
/etc/ceph/ceph.client.mtyadm.keyring
and wrote out the keyring
ceph-authtool -p -n client.mtyadm ceph.client.mtyadm.keyring > 
ceph.client.mtyadm.key

This user is now displayed in auth list:
client.mtyadm
key: AQBYu3VZLg66LBAAGM1jW+cvNE6BoJWfsORZKA==
caps: [mds] allow rw path=/MTY
caps: [mon] allow r
caps: [osd] allow rw pool=hdb-backup,allow rw pool=hdb-backup_metadata

When I try to mount directory /MTY on the client host I get this error:
ld2398:/etc/ceph # mount -t ceph ldcephmon1,ldcephmon2,ldcephmon2:/MTY 
/mnt/cephfs -o name=mtyadm,secretfile=/etc/ceph/ceph.client.mtyadm.key
mount error 13 = Permission denied

The mount works using admin though:
ld2398:/etc/ceph # mount -t ceph ldcephmon1,ldcephmon2,ldcephmon2:/MTY 
/mnt/cephfs -o name=admin,secretfile=/etc/ceph/ceph.client.admin.key
ld2398:/etc/ceph # mount | grep cephfs
10.96.5.37,10.96.5.38,10.96.5.38:/MTY on /mnt/cephfs type ceph 
(rw,relatime,name=admin,secret=,acl)

What is causing this mount error?

THX
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Writing data to pools other than filesystem

2017-07-20 Thread c . monty
Hello!

My understanding is that I create one (big) pool for all DB backups written to 
storage.
The clients have restricted access to a specific directory only, meaning they can 
mount only this directory.

Can I define a quota for a specific directory, or only for the pool?
And do I need to define the OSD Restriction?
"To prevent clients from writing or reading data to pools other than those in 
use for CephFS, set an OSD authentication capability that restricts access to 
the CephFS data pool(s)."
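I assume such a restriction would look something like the following (user, pool and
directory names are placeholders):
ceph auth get-or-create client.abcadm mon 'allow r' mds 'allow rw path=/ABC' osd 'allow rw pool=abc_data,allow rw pool=abc_metadata'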

THX
20 July 2017 14:00, "David"  wrote:
 I think the multiple namespace feature would be more appropriate for your use 
case. So that would be multiple file systems within the same pools rather than 
multiple pools in a single filesystem.
With that said, that might be overkill for your requirement. You might be able 
to achieve what you need with path restriction: 
http://docs.ceph.com/docs/master/cephfs/client-auth/ 
(http://docs.ceph.com/docs/master/cephfs/client-auth/)   
On Thu, Jul 20, 2017 at 10:23 AM,  wrote:

19 July 2017 17:34, "LOPEZ Jean-Charles"  wrote:

> Hi,
>
> you must add the extra pools to your current file system configuration: ceph 
> fs add_data_pool
> {fs_name} {pool_name}
>
> Once this is done, you just have to create some specific directory layout 
> within CephFS to modify
> the name of the pool targetted by a specific directory. See
> http://docs.ceph.com/docs/master/cephfs/file-layouts 
> (http://docs.ceph.com/docs/master/cephfs/file-layouts)
>
> Just set the ceph.dir.layout.pool attribute to the appropriate Pool ID of the 
> new pool.
>
> Regards
> JC
>
>> On Jul 19, 2017, at 07:59, c.mo...@web.de (mailto:c.mo...@web.de) wrote:
>>
>> Hello!
>>
>> I want to organize data in pools and therefore created additional pools:
>> ceph osd lspools
>> 0 rbd,1 templates,2 hdb-backup,3 cephfs_data,4 cephfs_metadata,
>>
>> As you can see, pools "cephfs_data" and "cephfs_metadata" belong to a Ceph 
>> filesystem.
>>
>> Question:
>> How can I write data to other pools, e.g. hdb-backup?
>>
>> THX
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com (mailto:ceph-users@lists.ceph.com)
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
>> (http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com)

Hello JC,

thanks for your reply.

I'm not sure why I should add pools to a current file system configuration.
Therefore it could be helpful to explain my use case.

The Ceph Storage Cluster should provide storage for database backups.
For security reasons I am considering creating one pool per database, identified by 
a unique id (e.g. ABC).
And for each pool only a dedicated user (+ the ceph admin) can access (read / 
write) the data in the related pool;
this user is unique for each database (e.g. abcadm).

The first question is:
Do I need to create two RADOS pools as documented in guide 'Create a Ceph 
filesystem' (http://docs.ceph.com/docs/master/cephfs/createfs/ 
(http://docs.ceph.com/docs/master/cephfs/createfs/)) for each database id:
"A Ceph filesystem requires at least two RADOS pools, one for data and one for 
metadata."
If yes, this would mean to create the following pools:
$ ceph osd pool create abc_data 
$ ceph osd pool create abc_metadata 
$ ceph osd pool create xyz_data 
$ ceph osd pool create xyz_metadata 

Or should I create only one "File System Pool" (= cephfs_data and 
cephfs_metadata) and add all database pools to this file system?
In that case, how can I ensure that admin "abcadm" cannot modify files 
belonging to database XYZ?
THX
___
ceph-users mailing list
ceph-users@lists.ceph.com (mailto:ceph-users@lists.ceph.com)
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
(http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Writing data to pools other than filesystem

2017-07-20 Thread c . monty
19 July 2017 17:34, "LOPEZ Jean-Charles"  wrote:

> Hi,
> 
> you must add the extra pools to your current file system configuration: ceph 
> fs add_data_pool
> {fs_name} {pool_name}
> 
> Once this is done, you just have to create some specific directory layout 
> within CephFS to modify
> the name of the pool targetted by a specific directory. See
> http://docs.ceph.com/docs/master/cephfs/file-layouts
> 
> Just set the ceph.dir.layout.pool attribute to the appropriate Pool ID of the 
> new pool.
> 
> Regards
> JC
> 
>> On Jul 19, 2017, at 07:59, c.mo...@web.de wrote:
>> 
>> Hello!
>> 
>> I want to organize data in pools and therefore created additional pools:
>> ceph osd lspools
>> 0 rbd,1 templates,2 hdb-backup,3 cephfs_data,4 cephfs_metadata,
>> 
>> As you can see, pools "cephfs_data" and "cephfs_metadata" belong to a Ceph 
>> filesystem.
>> 
>> Question:
>> How can I write data to other pools, e.g. hdb-backup?
>> 
>> THX
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Hello JC,

thanks for your reply.

I'm not sure why I should add pools to a current file system configuration.
Therefore it could be helpful to explain my use case.

The Ceph Storage Cluster should provide storage for database backups.
For security reasons I consider to create one pool per database identified by 
an unique id (e.g. ABC).
And for each pool only a dedicated user (+ ceph admin) can access (read / 
write) the data in the related pool;
this user is unique for each database (e.g. abcadm).

The first question is:
Do I need to create two RADOS pools as documented in guide 'Create a Ceph 
filesystem' (http://docs.ceph.com/docs/master/cephfs/createfs/) for each 
database id:
"A Ceph filesystem requires at least two RADOS pools, one for data and one for 
metadata."
If yes, this would mean to create the following pools:
$ ceph osd pool create abc_data 
$ ceph osd pool create abc_metadata 
$ ceph osd pool create xyz_data 
$ ceph osd pool create xyz_metadata 

Or should I create only one "File System Pool" (= cephfs_data and 
cephfs_metadata) and add all database pools to this file system?
In that case, how can I ensure that admin "abcadm" cannot modify files 
belonging to database XYZ?

THX
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Writing data to pools other than filesystem

2017-07-19 Thread c . monty
Hello!

I want to organize data in pools and therefore created additional pools:
ceph osd lspools
0 rbd,1 templates,2 hdb-backup,3 cephfs_data,4 cephfs_metadata,

As you can see, pools "cephfs_data" and "cephfs_metadata" belong to a Ceph 
filesystem.

Question:
How can I write data to other pools, e.g. hdb-backup?

THX
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Bucket policies in Luminous

2017-07-12 Thread Adam C. Emerson
Graham Allan Wrote:
> I thought I'd try out the new bucket policy support in Luminous. My goal
> was simply to permit access on a bucket to another user.
[snip]
> Thanks for any ideas,

It's probably the 'blank' tenant. I'll make up a test case to exercise
this and come up with a patch for it. Sorry about the trouble.

-- 
Senior Software Engineer   Red Hat Storage, Ann Arbor, MI, US
IRC: Aemerson@{RedHat, OFTC}
0x80F7544B90EDBFB9 E707 86BA 0C1B 62CC 152C  7C12 80F7 544B 90ED BFB9
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] libceph: auth method 'x' error -1

2017-07-12 Thread c . monty
Hi!

I have installed Ceph using ceph-deploy.
The Ceph Storage Cluster setup includes these nodes:
ld4257 Monitor0 + Admin
ld4258 Monitor1
ld4259 Monitor2
ld4464 OSD0
ld4465 OSD1

Ceph Health status is OK.

However, I cannot mount Ceph FS.
When I enter this command on ld4257
mount -t ceph ldcephmon1,ldcephmon2,ldcephmon3:/ /mnt/cephfs/ -o
name=client.openattic,secret=[secretkey]
I get this error:
mount error 1 = Operation not permitted
In syslog I find these entries:
[ 3657.493337] libceph: client264233 fsid
5f6f168d-2ade-4d16-a7e6-3704f93ad94e
[ 3657.493542] libceph: auth method 'x' error -1

When I use another mount command on ld4257
mount.ceph ld4257,ld4258,ld4259:/cephfs /mnt/cephfs/ -o
name=client.openattic,secretfile=/etc/ceph/ceph.client.openattic.keyring
I get this error:
secret is not valid base64: Invalid argument.
adding ceph secret key to kernel failed: Invalid argument.
failed to parse ceph_options

Question:
Is mount-option "secretfile" not supported anymore?
How can I fix the authentication error?

THX
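
In case it helps (a guess from the error text, not a verified fix): the
secretfile option expects a file that contains only the base64 key, not a
whole keyring section, and the name= option takes the ID without the
"client." prefix. Something along these lines:

$ ceph auth get-key client.openattic > /etc/ceph/openattic.secret
$ mount -t ceph ld4257,ld4258,ld4259:/ /mnt/cephfs/ -o \
    name=openattic,secretfile=/etc/ceph/openattic.secret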
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-rest-api's behavior

2017-03-28 Thread Mika c
Hi Brad,
   Thanks for your help. I found that was my problem: I forgot to end the file
name with the word "keyring".

And sorry to bother you again: is it possible to create a minimum-privilege
client for the API to run under?
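
For anyone hitting the same thing: with the default settings the client looks
for /etc/ceph/$cluster.$name.keyring (as the strace output further down also
shows), so for this client the expected file name would be, assuming the
default cluster name "ceph":

$ ceph auth get client.symphony -o /etc/ceph/ceph.client.symphony.keyring

With that file name in place, the explicit keyring= line in ceph.conf should
not be needed.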



Best wishes,
Mika


2017-03-24 19:32 GMT+08:00 Brad Hubbard <bhubb...@redhat.com>:

> On Fri, Mar 24, 2017 at 8:20 PM, Mika c <mika.leaf...@gmail.com> wrote:
> > Hi Brad,
> >  Thanks for your reply. The environment already created keyring file
> and
> > put it in /etc/ceph but not working.
>
> What was it called?
>
> > I have to write config into ceph.conf like below.
> >
> > ---ceph.conf start---
> > [client.symphony]
> > log_file = /var/log/ceph/rest-api.log
> > keyring = /etc/ceph/ceph.client.symphony
> > public addr = 0.0.0.0:5000
> > restapi base url = /api/v0.1
> > ---ceph.conf end---
> >
> >
> > Another question: do I have to give this client admin-like capabilities?
> > I just want to read some information like health or df.
> >
> > If this client is set up with restricted capabilities like:
> > ---
> > client.symphony
> >    key: AQBP8NRYGehDKRAAzyChAvAivydLqRBsHeTPjg==
> >    caps: [mon] allow r
> >    caps: [osd] allow rx
> > ---
> > Error list:
> > Traceback (most recent call last):
> >  File "/usr/bin/ceph-rest-api", line 59, in <module>
> >    rest,
> >  File "/usr/lib/python2.7/dist-packages/ceph_rest_api.py", line 495, in
> > generate_app
> >    addr, port = api_setup(app, conf, cluster, clientname, clientid, args)
> >  File "/usr/lib/python2.7/dist-packages/ceph_rest_api.py", line 146, in
> > api_setup
> >    target=('osd', int(osdid)))
> >  File "/usr/lib/python2.7/dist-packages/ceph_rest_api.py", line 84, in
> > get_command_descriptions
> >    raise EnvironmentError(ret, err)
> > EnvironmentError: [Errno -1] Can't get command descriptions:
> >
> >
> >
> >
> > Best wishes,
> > Mika
> >
> >
> > 2017-03-24 16:21 GMT+08:00 Brad Hubbard <bhubb...@redhat.com>:
> >>
> >> On Fri, Mar 24, 2017 at 4:06 PM, Mika c <mika.leaf...@gmail.com> wrote:
> >> > Hi all,
> >> >  Same question with CEPH 10.2.3 and 11.2.0.
> >> >   Is this command only for client.admin ?
> >> >
> >> > client.symphony
> >> >key: AQD0tdRYjhABEhAAaG49VhVXBTw0MxltAiuvgg==
> >> >caps: [mon] allow *
> >> >caps: [osd] allow *
> >> >
> >> > Traceback (most recent call last):
> >> >  File "/usr/bin/ceph-rest-api", line 43, in 
> >> >rest,
> >> >  File "/usr/lib/python2.7/dist-packages/ceph_rest_api.py", line 504,
> in
> >> > generate_a
> >> > pp
> >> >addr, port = api_setup(app, conf, cluster, clientname, clientid,
> >> > args)
> >> >  File "/usr/lib/python2.7/dist-packages/ceph_rest_api.py", line 106,
> in
> >> > api_setup
> >> >app.ceph_cluster.connect()
> >> >  File "rados.pyx", line 811, in rados.Rados.connect
> >> > (/tmp/buildd/ceph-11.2.0/obj-x
> >> > 86_64-linux-gnu/src/pybind/rados/pyrex/rados.c:10178)
> >> > rados.ObjectNotFound: error connecting to the cluster
> >>
> >> # strace -eopen /bin/ceph-rest-api |& grep keyring
> >> open("/etc/ceph/ceph.client.restapi.keyring", O_RDONLY) = -1 ENOENT
> >> (No such file or directory)
> >> open("/etc/ceph/ceph.keyring", O_RDONLY) = -1 ENOENT (No such file or
> >> directory)
> >> open("/etc/ceph/keyring", O_RDONLY) = -1 ENOENT (No such file or
> >> directory)
> >> open("/etc/ceph/keyring.bin", O_RDONLY) = -1 ENOENT (No such file or
> >> directory)
> >>
> >> # ceph auth get-or-create client.restapi mon 'allow *' mds 'allow *'
> >> osd 'allow *' >/etc/ceph/ceph.client.restapi.keyring
> >>
> >> # /bin/ceph-rest-api
> >>  * Running on http://0.0.0.0:5000/
> >>
> >> >
> >> >
> >> >
> >> > Best wishes,
> >> > Mika
> >> >
> >> >
> >

Re: [ceph-users] ceph-rest-api's behavior

2017-03-24 Thread Mika c
Hi Brad,
 Thanks for your reply. The environment already has a keyring file created
and put in /etc/ceph, but it is not working.
I had to write the config into ceph.conf like below.

---ceph.conf start---
[client.symphony]
log_file = /var/log/ceph/rest-api.log
keyring = /etc/ceph/ceph.client.symphony
public addr = 0.0.0.0:5000
restapi base url = /api/v0.1
---ceph.conf end---


Another question: do I have to give this client admin-like capabilities?
I just want to read some information like health or df.

If this client is set up with restricted capabilities like:
---
client.symphony
   key: AQBP8NRYGehDKRAAzyChAvAivydLqRBsHeTPjg==
   caps: [mon] allow r
   caps: [osd] allow rx
---
Error list:
Traceback (most recent call last):
 File "/usr/bin/ceph-rest-api", line 59, in <module>
   rest,
 File "/usr/lib/python2.7/dist-packages/ceph_rest_api.py", line 495, in
generate_app
   addr, port = api_setup(app, conf, cluster, clientname, clientid, args)
 File "/usr/lib/python2.7/dist-packages/ceph_rest_api.py", line 146, in
api_setup
   target=('osd', int(osdid)))
 File "/usr/lib/python2.7/dist-packages/ceph_rest_api.py", line 84, in
get_command_descriptions
   raise EnvironmentError(ret, err)
EnvironmentError: [Errno -1] Can't get command descriptions:




Best wishes,
Mika


2017-03-24 16:21 GMT+08:00 Brad Hubbard <bhubb...@redhat.com>:

> On Fri, Mar 24, 2017 at 4:06 PM, Mika c <mika.leaf...@gmail.com> wrote:
> > Hi all,
> >  Same question with CEPH 10.2.3 and 11.2.0.
> >   Is this command only for client.admin ?
> >
> > client.symphony
> >key: AQD0tdRYjhABEhAAaG49VhVXBTw0MxltAiuvgg==
> >caps: [mon] allow *
> >caps: [osd] allow *
> >
> > Traceback (most recent call last):
> >  File "/usr/bin/ceph-rest-api", line 43, in 
> >rest,
> >  File "/usr/lib/python2.7/dist-packages/ceph_rest_api.py", line 504, in
> > generate_a
> > pp
> >addr, port = api_setup(app, conf, cluster, clientname, clientid, args)
> >  File "/usr/lib/python2.7/dist-packages/ceph_rest_api.py", line 106, in
> > api_setup
> >app.ceph_cluster.connect()
> >  File "rados.pyx", line 811, in rados.Rados.connect
> > (/tmp/buildd/ceph-11.2.0/obj-x
> > 86_64-linux-gnu/src/pybind/rados/pyrex/rados.c:10178)
> > rados.ObjectNotFound: error connecting to the cluster
>
> # strace -eopen /bin/ceph-rest-api |& grep keyring
> open("/etc/ceph/ceph.client.restapi.keyring", O_RDONLY) = -1 ENOENT
> (No such file or directory)
> open("/etc/ceph/ceph.keyring", O_RDONLY) = -1 ENOENT (No such file or
> directory)
> open("/etc/ceph/keyring", O_RDONLY) = -1 ENOENT (No such file or
> directory)
> open("/etc/ceph/keyring.bin", O_RDONLY) = -1 ENOENT (No such file or
> directory)
>
> # ceph auth get-or-create client.restapi mon 'allow *' mds 'allow *'
> osd 'allow *' >/etc/ceph/ceph.client.restapi.keyring
>
> # /bin/ceph-rest-api
>  * Running on http://0.0.0.0:5000/
>
> >
> >
> >
> > Best wishes,
> > Mika
> >
> >
> > 2016-03-03 12:25 GMT+08:00 Shinobu Kinjo <shinobu...@gmail.com>:
> >>
> >> Yes.
> >>
> >> On Wed, Jan 27, 2016 at 1:10 PM, Dan Mick <dm...@redhat.com> wrote:
> >> > Is the client.test-admin key in the keyring read by ceph-rest-api?
> >> >
> >> > On 01/22/2016 04:05 PM, Shinobu Kinjo wrote:
> >> >> Does anyone have any idea about that?
> >> >>
> >> >> Rgds,
> >> >> Shinobu
> >> >>
> >> >> - Original Message -
> >> >> From: "Shinobu Kinjo" <ski...@redhat.com>
> >> >> To: "ceph-users" <ceph-users@lists.ceph.com>
> >> >> Sent: Friday, January 22, 2016 7:15:36 AM
> >> >> Subject: ceph-rest-api's behavior
> >> >>
> >> >> Hello,
> >> >>
> >> >> "ceph-rest-api" works greatly with client.admin.
> >> >> But with client.test-admin which I created just after building the
> Ceph
> >> >> cluster , it does not work.
> >> >>
> >> >>  ~$ ceph auth get-or-create client.test-admin mon 'allow *' mds
> 'allow
> >> >> *' osd 'allow *'
> >> >>
> >> >>  ~$ sudo ceph auth list
> >> >>  installed auth entries:
> >> >>   

Re: [ceph-users] ceph-rest-api's behavior

2017-03-24 Thread Mika c
Hi all,
 Same question with CEPH 10.2.3 and 11.2.0.
  Is this command only for client.admin ?

client.symphony
   key: AQD0tdRYjhABEhAAaG49VhVXBTw0MxltAiuvgg==
   caps: [mon] allow *
   caps: [osd] allow *

Traceback (most recent call last):
 File "/usr/bin/ceph-rest-api", line 43, in <module>
   rest,
 File "/usr/lib/python2.7/dist-packages/ceph_rest_api.py", line 504, in
generate_app
   addr, port = api_setup(app, conf, cluster, clientname, clientid, args)
 File "/usr/lib/python2.7/dist-packages/ceph_rest_api.py", line 106, in
api_setup
   app.ceph_cluster.connect()
 File "rados.pyx", line 811, in rados.Rados.connect
(/tmp/buildd/ceph-11.2.0/obj-x86_64-linux-gnu/src/pybind/rados/pyrex/rados.c:10178)
rados.ObjectNotFound: error connecting to the cluster



Best wishes,
Mika


2016-03-03 12:25 GMT+08:00 Shinobu Kinjo :

> Yes.
>
> On Wed, Jan 27, 2016 at 1:10 PM, Dan Mick  wrote:
> > Is the client.test-admin key in the keyring read by ceph-rest-api?
> >
> > On 01/22/2016 04:05 PM, Shinobu Kinjo wrote:
> >> Does anyone have any idea about that?
> >>
> >> Rgds,
> >> Shinobu
> >>
> >> - Original Message -
> >> From: "Shinobu Kinjo" 
> >> To: "ceph-users" 
> >> Sent: Friday, January 22, 2016 7:15:36 AM
> >> Subject: ceph-rest-api's behavior
> >>
> >> Hello,
> >>
> >> "ceph-rest-api" works greatly with client.admin.
> >> But with client.test-admin which I created just after building the Ceph
> cluster , it does not work.
> >>
> >>  ~$ ceph auth get-or-create client.test-admin mon 'allow *' mds 'allow
> *' osd 'allow *'
> >>
> >>  ~$ sudo ceph auth list
> >>  installed auth entries:
> >>...
> >>  client.test-admin
> >>   key: AQCOVaFWTYr2ORAAKwruANTLXqdHOchkVvRApg==
> >>   caps: [mds] allow *
> >>   caps: [mon] allow *
> >>   caps: [osd] allow *
> >>
> >>  ~$ ceph-rest-api -n client.test-admin
> >>  Traceback (most recent call last):
> >>File "/bin/ceph-rest-api", line 59, in 
> >>  rest,
> >>File "/usr/lib/python2.7/site-packages/ceph_rest_api.py", line 504,
> in generate_app
> >>  addr, port = api_setup(app, conf, cluster, clientname, clientid,
> args)
> >>File "/usr/lib/python2.7/site-packages/ceph_rest_api.py", line 106,
> in api_setup
> >>  app.ceph_cluster.connect()
> >>File "/usr/lib/python2.7/site-packages/rados.py", line 485, in
> connect
> >>  raise make_ex(ret, "error connecting to the cluster")
> >>  rados.ObjectNotFound: error connecting to the cluster
> >>
> >> # ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)
> >>
> >> Is that expected behavior?
> >> Or if I've missed anything, please point it out to me.
> >>
> >> Rgds,
> >> Shinobu
> >> ___
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>
> >
> >
> > --
> > Dan Mick
> > Red Hat, Inc.
> > Ceph docs: http://ceph.com/docs
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> --
> Email:
> shin...@linux.com
> GitHub:
> shinobu-x
> Blog:
> Life with Distributed Computational System based on OpenSource
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ImportError: No module named ceph_deploy.cli

2017-03-23 Thread c . monty
Hello!

I have installed 
ceph-deploy-1.5.36git.1479985814.c561890-6.6.noarch.rpm
on SLES11 SP4.

When I start ceph-deploy, I get an error:
ceph@ldcephadm:~/dlm-lve-cluster> ceph-deploy new ldcephmon1
Traceback (most recent call last):
  File "/usr/bin/ceph-deploy", line 18, in <module>
    from ceph_deploy.cli import main
ImportError: No module named ceph_deploy.cli


What is causing this error?

THX
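
Two quick checks that usually narrow this down (a sketch; exact paths depend
on how the package lays things out on SLES):

$ python -c 'import ceph_deploy.cli; print(ceph_deploy.cli.__file__)'
$ rpm -ql ceph-deploy | grep 'ceph_deploy/cli'

If the module was installed into a site-packages directory that is not on the
sys.path of the python interpreter behind /usr/bin/ceph-deploy, this is
exactly the ImportError you would see.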
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ONE pg deep-scrub blocks cluster

2016-08-02 Thread c

On 2016-08-02 13:30, c wrote:

Hello Guys,

this time without the original acting-set osd.4, 16 and 28. The issue
still exists...

[...]

For the record, this ONLY happens with this PG and no others that
share
the same OSDs, right?


Yes, right.

[...]

When doing the deep-scrub, monitor (atop, etc) all 3 nodes and
see if a
particular OSD (HDD) stands out, as I would expect it to.


Now I logged all disks via atop each 2 seconds while the deep-scrub
was running ( atop -w osdXX_atop 2 ).
As you expected all disks was 100% busy - with constant 150MB
(osd.4), 130MB (osd.28) and 170MB (osd.16)...

- osd.4 (/dev/sdf) http://slexy.org/view/s21emd2u6j [1]
- osd.16 (/dev/sdm): http://slexy.org/view/s20vukWz5E [2]
- osd.28 (/dev/sdh): http://slexy.org/view/s20YX0lzZY [3]
[...]
But what is causing this? A deep-scrub on all other disks - same
model and ordered at the same time - seems to not have this issue.

[...]

Next week, I will do this

1.1 Remove osd.4 completely from Ceph - again (the actual primary
for PG 0.223)


osd.4 is now removed completely.
The Primary PG is now on "osd.9"

# ceph pg map 0.223
osdmap e8671 pg 0.223 (0.223) -> up [9,16,28] acting [9,16,28]


1.2 xfs_repair -n /dev/sdf1 (osd.4): to see possible error


xfs_repair did not find/show any error


1.3 ceph pg deep-scrub 0.223
- Log with " ceph tell osd.4,16,28 injectargs "--debug_osd 5/5"


Because now osd.9 is the Primary PG i have set the debug_osd on this 
too:

ceph tell osd.9 injectargs "--debug_osd 5/5"

and run the deep-scrub on 0.223 (and againg nearly all of my VMs stop
working for a while)
Start @ 15:33:27
End @ 15:48:31

The "ceph.log"
- http://slexy.org/view/s2WbdApDLz

The related LogFiles (OSDs 9,16 and 28) and the LogFile via atop for 
the osds


LogFile - osd.9 (/dev/sdk)
- ceph-osd.9.log: http://slexy.org/view/s2kXeLMQyw
- atop Log: http://slexy.org/view/s21wJG2qr8

LogFile - osd.16 (/dev/sdh)
- ceph-osd.16.log: http://slexy.org/view/s20D6WhD4d
- atop Log: http://slexy.org/view/s2iMjer8rC

LogFile - osd.28 (/dev/sdm)
- ceph-osd.28.log: http://slexy.org/view/s21dmXoEo7
- atop log: http://slexy.org/view/s2gJqzu3uG


2.1 Remove osd.16 completely from Ceph


osd.16 is now removed completely - now replaced with osd.17 witihin
the acting set.

# ceph pg map 0.223
osdmap e9017 pg 0.223 (0.223) -> up [9,17,28] acting [9,17,28]


2.2 xfs_repair -n /dev/sdh1


xfs_repair did not find/show any error


2.3 ceph pg deep-scrub 0.223
- Log with " ceph tell osd.9,17,28 injectargs "--debug_osd 5/5"


and run the deep-scrub on 0.223 (and againg nearly all of my VMs stop
working for a while)

Start @ 2016-08-02 10:02:44
End @ 2016-08-02 10:17:22

The "Ceph.log": http://slexy.org/view/s2ED5LvuV2

LogFile - osd.9 (/dev/sdk)
- ceph-osd.9.log: http://slexy.org/view/s21z9JmwSu
- atop Log: http://slexy.org/view/s20XjFZFEL

LogFile - osd.17 (/dev/sdi)
- ceph-osd.17.log: http://slexy.org/view/s202fpcZS9
- atop Log: http://slexy.org/view/s2TxeR1JSz

LogFile - osd.28 (/dev/sdm)
- ceph-osd.28.log: http://slexy.org/view/s2eCUyC7xV
- atop log: http://slexy.org/view/s21AfebBqK


3.1 Remove osd.28 completely from Ceph


Now osd.28 is also removed completely from Ceph - now replaced with 
osd.23


# ceph pg map 0.223
osdmap e9363 pg 0.223 (0.223) -> up [9,17,23] acting [9,17,23]


3.2 xfs_repair -n /dev/sdm1


As expected: xfs_repair did not find/show any error


3.3 ceph pg deep-scrub 0.223
- Log with " ceph tell osd.9,17,23 injectargs "--debug_osd 5/5"


... againg nearly all of my VMs stop working for a while...

Now are all "original" OSDs (4,16,28) removed which was in the
acting-set when i wrote my first eMail to this mailinglist. But the
issue still exists with different OSDs (9,17,23) as the acting-set
while the questionable PG 0.223 is still the same!

In suspicion that the "tunable" could be the cause, i have now changed
this back to "default" via " ceph osd crush tunables default ".
This will take a whille... then i will do " ceph pg deep-scrub 0.223 "
again (without osds 4,16,28)...


Really, I do not know what is going on here.

Ceph finished recovering to the "default" tunables, but the issue still 
exists! :*(


The acting set has changed again:

# ceph pg map 0.223
osdmap e11230 pg 0.223 (0.223) -> up [9,11,20] acting [9,11,20]

But when I start " ceph pg deep-scrub 0.223 ", again nearly all of my 
VMs stop working for a while!


Does anyone have an idea where I should look to find the cause of this?


It seems that every time, the primary OSD of the acting set of PG 0.223 
(*4*,16,28; *9*,17,23 or *9*,11,20) ends up "currently waiting for 
subops from 9,X", and the deep-scrub always takes nearly 15 minutes to 
finish.


My output from " ceph pg 0.223 query "

- http://slexy.org/view/s21d6qUqnV

Mehmet


Re: [ceph-users] ONE pg deep-scrub blocks cluster

2016-08-02 Thread c

Hello Guys,

this time without the original acting-set osd.4, 16 and 28. The issue 
still exists...


[...]

For the record, this ONLY happens with this PG and no others that
share
the same OSDs, right?


Yes, right.

[...]

When doing the deep-scrub, monitor (atop, etc) all 3 nodes and
see if a
particular OSD (HDD) stands out, as I would expect it to.


Now I logged all disks via atop each 2 seconds while the deep-scrub
was running ( atop -w osdXX_atop 2 ).
As you expected all disks was 100% busy - with constant 150MB
(osd.4), 130MB (osd.28) and 170MB (osd.16)...

- osd.4 (/dev/sdf) http://slexy.org/view/s21emd2u6j [1]
- osd.16 (/dev/sdm): http://slexy.org/view/s20vukWz5E [2]
- osd.28 (/dev/sdh): http://slexy.org/view/s20YX0lzZY [3]
[...]
But what is causing this? A deep-scrub on all other disks - same
model and ordered at the same time - seems to not have this issue.

[...]

Next week, I will do this

1.1 Remove osd.4 completely from Ceph - again (the actual primary
for PG 0.223)


osd.4 is now removed completely.
The Primary PG is now on "osd.9"

# ceph pg map 0.223
osdmap e8671 pg 0.223 (0.223) -> up [9,16,28] acting [9,16,28]


1.2 xfs_repair -n /dev/sdf1 (osd.4): to see possible error


xfs_repair did not find/show any error


1.3 ceph pg deep-scrub 0.223
- Log with " ceph tell osd.4,16,28 injectargs "--debug_osd 5/5"


Because osd.9 is now the primary for this PG, I have set debug_osd on it 
too:

ceph tell osd.9 injectargs "--debug_osd 5/5"

and run the deep-scrub on 0.223 (and again nearly all of my VMs stop
working for a while)

Start @ 15:33:27
End @ 15:48:31

The "ceph.log"
- http://slexy.org/view/s2WbdApDLz

The related LogFiles (OSDs 9,16 and 28) and the LogFile via atop for the 
osds


LogFile - osd.9 (/dev/sdk)
- ceph-osd.9.log: http://slexy.org/view/s2kXeLMQyw
- atop Log: http://slexy.org/view/s21wJG2qr8

LogFile - osd.16 (/dev/sdh)
- ceph-osd.16.log: http://slexy.org/view/s20D6WhD4d
- atop Log: http://slexy.org/view/s2iMjer8rC

LogFile - osd.28 (/dev/sdm)
- ceph-osd.28.log: http://slexy.org/view/s21dmXoEo7
- atop log: http://slexy.org/view/s2gJqzu3uG


2.1 Remove osd.16 completely from Ceph


osd.16 is now removed completely - now replaced with osd.17 within the 
acting set.


# ceph pg map 0.223
osdmap e9017 pg 0.223 (0.223) -> up [9,17,28] acting [9,17,28]


2.2 xfs_repair -n /dev/sdh1


xfs_repair did not find/show any error


2.3 ceph pg deep-scrub 0.223
- Log with " ceph tell osd.9,17,28 injectargs "--debug_osd 5/5"


and run the deep-scrub on 0.223 (and again nearly all of my VMs stop 
working for a while)


Start @ 2016-08-02 10:02:44
End @ 2016-08-02 10:17:22

The "Ceph.log": http://slexy.org/view/s2ED5LvuV2

LogFile - osd.9 (/dev/sdk)
- ceph-osd.9.log: http://slexy.org/view/s21z9JmwSu
- atop Log: http://slexy.org/view/s20XjFZFEL

LogFile - osd.17 (/dev/sdi)
- ceph-osd.17.log: http://slexy.org/view/s202fpcZS9
- atop Log: http://slexy.org/view/s2TxeR1JSz

LogFile - osd.28 (/dev/sdm)
- ceph-osd.28.log: http://slexy.org/view/s2eCUyC7xV
- atop log: http://slexy.org/view/s21AfebBqK


3.1 Remove osd.28 completely from Ceph


Now osd.28 is also removed completely from Ceph - now replaced with 
osd.23


# ceph pg map 0.223
osdmap e9363 pg 0.223 (0.223) -> up [9,17,23] acting [9,17,23]


3.2 xfs_repair -n /dev/sdm1


As expected: xfs_repair did not find/show any error


3.3 ceph pg deep-scrub 0.223
- Log with " ceph tell osd.9,17,23 injectargs "--debug_osd 5/5"


... again nearly all of my VMs stop working for a while...

Now are all "original" OSDs (4,16,28) removed which was in the 
acting-set when i wrote my first eMail to this mailinglist. But the 
issue still exists with different OSDs (9,17,23) as the acting-set while 
the questionable PG 0.223 is still the same!


In suspicion that the "tunable" could be the cause, i have now changed 
this back to "default" via " ceph osd crush tunables default ".
This will take a whille... then i will do " ceph pg deep-scrub 0.223 " 
again (without osds 4,16,28)...


For the records: Although nearly all disks are busy i have no 
slow/blocked requests and i am watching the logfiles for nearly 20 
minutes now...


Your help is realy appreciated!
- Mehmet

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ONE pg deep-scrub blocks cluster

2016-08-01 Thread c

Hello Guys,

your help is really appreciated!

[...]

For the record, this ONLY happens with this PG and no others that
share
the same OSDs, right?


Yes, right.
[...]

When doing the deep-scrub, monitor (atop, etc) all 3 nodes and
see if a
particular OSD (HDD) stands out, as I would expect it to.


Now I logged all disks via atop each 2 seconds while the deep-scrub
was running ( atop -w osdXX_atop 2 ).
As you expected all disks was 100% busy - with constant 150MB
(osd.4), 130MB (osd.28) and 170MB (osd.16)...

- osd.4 (/dev/sdf) http://slexy.org/view/s21emd2u6j [1]
- osd.16 (/dev/sdm): http://slexy.org/view/s20vukWz5E [2]
- osd.28 (/dev/sdh): http://slexy.org/view/s20YX0lzZY [3]
[...]
But what is causing this? A deep-scrub on all other disks - same
model and ordered at the same time - seems to not have this issue.

[...]

Next week, I will do this

1.1 Remove osd.4 completely from Ceph - again (the actual primary
for PG 0.223)


osd.4 is now removed completely.
The Primary PG is now on "osd.9"

# ceph pg map 0.223
osdmap e8671 pg 0.223 (0.223) -> up [9,16,28] acting [9,16,28]


1.2 xfs_repair -n /dev/sdf1 (osd.4): to see possible error


xfs_repair did not show any error


1.3 ceph pg deep-scrub 0.223
- Log with " ceph tell osd.4,16,28 injectargs "--debug_osd 5/5"


Because osd.9 is now the primary for this PG, I have set debug_osd on it too:
ceph tell osd.9 injectargs "--debug_osd 5/5"

and run the deep-scrub on 0.223 (and again nearly all of my VMs stop 
working for a while)

Start @ 15:33:27
End @ 15:48:31

The "ceph.log"
- http://slexy.org/view/s2WbdApDLz

The related LogFiles (OSDs 9,16 and 28) and the LogFile via atop for the 
osds


LogFile - osd.9 (/dev/sdk)
- ceph-osd.9.log: http://slexy.org/view/s2kXeLMQyw
- atop Log: http://slexy.org/view/s21wJG2qr8

LogFile - osd.16 (/dev/sdm)
- ceph-osd.16.log: http://slexy.org/view/s20D6WhD4d
- atop Log: http://slexy.org/view/s2iMjer8rC

LogFile - osd.28 (/dev/sdh)
- ceph-osd.28.log: http://slexy.org/view/s21dmXoEo7
- atop log: http://slexy.org/view/s2gJqzu3uG


2.1 Remove osd.16 completely from Ceph
2.2 xfs_repair -n /dev/sdm1
2.3 ceph pg deep-scrub 0.223
- Log with " ceph tell osd.4,16,28 injectargs "--debug_osd 5/5"


Tomorrow I will remove osd.16 in addition to osd.4 and do the same.
Acting set for PG 0.223: 9, ?, 28


3.1 Remove osd.28 completely from Ceph
3.2 xfs_repair -n /dev/sdm1
3.3 ceph pg deep-scrub 0.223
- Log with " ceph tell osd.4,16,28 injectargs "--debug_osd 5/5"


After that, osd.28 will follow.
acting set for pg 0.223 then: 9,?,?

If my VMs stop for a while even though the previously mentioned disks 
(4,16,28) are no longer in the cluster, then there must be an issue with 
this PG!



smartctl may not show anything out of sorts until the marginally
bad sector or sectors finally goes bad and gets remapped.  The
only hint may be buried in the raw read error rate, seek error
rate or other error counts like ecc or crc errors.  The long test
you are running may or may not show any new information.


The long smartctl checks did not find any issues.

Perhaps it is notable that I have set the tunables to "jewel" since 
installation.
The "sortbitwise" flag is also set, because this is the default for a 
Jewel installation.


My next mail will follow tomorrow.

Did you guys find something in the attached logs that I did not see?

- Mehmet

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ONE pg deep-scrub blocks cluster

2016-07-30 Thread c

On 2016-07-30 14:04, Marius Vaitiekunas wrote:

Hi,

We had a similar issue. If you use radosgw and have large buckets,
this pg could hold a bucket index. 


Hello Marius,

thanks for your hint.

But it seems I forgot to mention that we are using Ceph only as RBD for 
our virtual machines for now.

So no radosgw for now.

- Mehmet



On Friday, 29 July 2016, c <c...@elchaka.de> wrote:


Hi Christian,
Hello Bill,

thank you very much for your Post.


For the record, this ONLY happens with this PG and no others that
share
the same OSDs, right?


Yes, right.


If so then we're looking at something (HDD or FS wise) that's
specific to
the data of this PG.

When doing the deep-scrub, monitor (atop, etc) all 3 nodes and
see if a
particular OSD (HDD) stands out, as I would expect it to.


Now I logged all disks via atop each 2 seconds while the deep-scrub
was running ( atop -w osdXX_atop 2 ).
As you expected all disks was 100% busy - with constant 150MB
(osd.4), 130MB (osd.28) and 170MB (osd.16)...

- osd.4 (/dev/sdf) http://slexy.org/view/s21emd2u6j [1]
- osd.16 (/dev/sdm): http://slexy.org/view/s20vukWz5E [2]
- osd.28 (/dev/sdh): http://slexy.org/view/s20YX0lzZY [3]

You can have a look on this logs via "atop -r FILE" and jump to the
time when you press "b" and type "17:12:31".
With "t" you can "walk" forward and "T" backward through the
logfile.

But what is causing this? A deep-scrub on all other disks - same
model and ordered at the same time - seems to not have this issue.


Bill<

Removing osd.4 and still getting the scrub problems removes its
drive from consideration as the culprit.  Try the same thing
again for osd.16 and then osd.28.



Christian<

Since you already removed osd.4 with the same result, continue to
cycle through the other OSDs.
Running a fsck on the (out) OSDs might be helpful, too.


Next week, I will do this

1.1 Remove osd.4 completely from Ceph - again (the actual primary
for PG 0.223)
1.2 xfs_repair -n /dev/sdf1 (osd.4): to see possible error
1.3 ceph pg deep-scrub 0.223
- Log with " ceph tell osd.4,16,28 injectargs "--debug_osd 5/5"

2.1 Remove osd.16 completely from Ceph
2.2 xfs_repair -n /dev/sdm1
2.3 ceph pg deep-scrub 0.223
- Log with " ceph tell osd.4,16,28 injectargs "--debug_osd 5/5"

3.1 Remove osd.16 completely from Ceph
3.2 xfs_repair -n /dev/sdm1
3.3 ceph pg deep-scrub 0.223
- Log with " ceph tell osd.4,16,28 injectargs "--debug_osd 5/5"


smartctl may not show anything out of sorts until the marginally
bad sector or sectors finally goes bad and gets remapped.  The
only hint may be buried in the raw read error rate, seek error
rate or other error counts like ecc or crc errors.  The long test
you are running may or may not show any new information.


I will write you again next week when I have done the tests above.

- Mehmet

On 2016-07-29 03:05, Christian Balzer wrote:
Hello,

On Thu, 28 Jul 2016 14:46:58 +0200 c wrote:

Hello Ceph alikes :)

i have a strange issue with one PG (0.223) combined with
"deep-scrub".

Always when ceph - or I manually - run a " ceph pg deep-scrub 0.223
",
this leads to many "slow/block requests" so that nearly all of my
VMs
stop working for a while.

For the record, this ONLY happens with this PG and no others that
share
the same OSDs, right?

If so then we're looking at something (HDD or FS wise) that's
specific to
the data of this PG.

When doing the deep-scrub, monitor (atop, etc) all 3 nodes and see
if a
particular OSD (HDD) stands out, as I would expect it to.

Since you already removed osd.4 with the same result, continue to
cycle
through the other OSDs.
Running a fsck on the (out) OSDs might be helpful, too.

Christian

This happens only to this one PG 0.223 and in combination with
deep-scrub (!). All other Placement Groups where a deep-scrub
occurs are
fine. The mentioned PG also works fine when a "normal scrub"
occurs.

These OSDs are involved:

#> ceph pg map 0.223
osdmap e7047 pg 0.223 (0.223) -> up [4,16,28] acting [4,16,28]

*The LogFiles*

"deep-scrub" starts @ 2016-07-28 12:44:00.588542 and takes
approximately
12 Minutes (End: 2016-07-28 12:56:31.891165)
- ceph.log: http://pastebin.com/FSY45VtM [4]

I have done " ceph tell osd injectargs '--debug-osd = 5/5' " for
the
related OSDs 4,16 and 28

LogFile - osd.4
- ceph-osd.4.log: http://slexy.org/view/s20zzAfxFH [5]

LogFile - osd.16
- ceph-osd.16.log: http://slexy.org/view/s25H3Zvkb0 [6]

LogFile - osd.28
- ceph-osd.28.log: http://slexy.org/view/s21Ecpwd70 [7]

I have checked the disks 4,16 and 28 with smartctl and could not
any
issues - also there are no odd "dmesg" messages.

*ceph -s*
     cluster 98a410bf-b823-47e4-ad17-4543afa24992
      health HEALTH_OK
      monmap e2: 3 mons at


{monitor1=172.16.0.2:6789/0,monitor3=172.16.0.4:6789/0,monitor2=172.16.0.3:6789/0

[8]}
      

Re: [ceph-users] ONE pg deep-scrub blocks cluster

2016-07-29 Thread c

Hi Christian,
Hello Bill,

thank you very much for your Post.


For the record, this ONLY happens with this PG and no others that share
the same OSDs, right?


Yes, right.

If so then we're looking at something (HDD or FS wise) that's specific 
to

the data of this PG.

When doing the deep-scrub, monitor (atop, etc) all 3 nodes and see if a
particular OSD (HDD) stands out, as I would expect it to.


Now I logged all disks via atop every 2 seconds while the deep-scrub was 
running ( atop -w osdXX_atop 2 ).
As you expected, all disks were 100% busy - with constant 150MB (osd.4), 
130MB (osd.28) and 170MB (osd.16)...


- osd.4 (/dev/sdf) http://slexy.org/view/s21emd2u6j
- osd.16 (/dev/sdm): http://slexy.org/view/s20vukWz5E
- osd.28 (/dev/sdh): http://slexy.org/view/s20YX0lzZY

You can have a look at these logs via "atop -r FILE" and jump to the time 
by pressing "b" and typing "17:12:31".

With "t" you can walk forward and with "T" backward through the logfile.

But what is causing this? A deep-scrub on all other disks - same model 
and ordered at the same time - does not seem to have this issue.



>Bill<
Removing osd.4 and still getting the scrub problems removes its drive 
from consideration as the culprit.  Try the same thing again for osd.16 
and then osd.28.



>Christian<
Since you already removed osd.4 with the same result, continue to cycle 
through the other OSDs.

Running a fsck on the (out) OSDs might be helpful, too.


Next week, I will do this

1.1 Remove osd.4 completely from Ceph - again (the actual primary for PG 
0.223)

1.2 xfs_repair -n /dev/sdf1 (osd.4): to see possible error
1.3 ceph pg deep-scrub 0.223
- Log with " ceph tell osd.4,16,28 injectargs "--debug_osd 5/5"

2.1 Remove osd.16 completely from Ceph
2.2 xfs_repair -n /dev/sdm1
2.3 ceph pg deep-scrub 0.223
- Log with " ceph tell osd.4,16,28 injectargs "--debug_osd 5/5"

3.1 Remove osd.28 completely from Ceph
3.2 xfs_repair -n /dev/sdm1
3.3 ceph pg deep-scrub 0.223
- Log with " ceph tell osd.4,16,28 injectargs "--debug_osd 5/5"

smartctl may not show anything out of sorts until the marginally bad 
sector or sectors finally goes bad and gets remapped.  The only hint 
may be buried in the raw read error rate, seek error rate or other 
error counts like ecc or crc errors.  The long test you are running may 
or may not show any new information.
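
For reference, a rough way to look at those counters (a sketch; ATA drives 
report them via the attribute table, while SAS drives like these show error 
counter logs and the grown defect list instead):

$ smartctl -A /dev/sdf | egrep -i 'raw_read_error|seek_error|realloc|crc'
$ smartctl -a /dev/sdf | egrep -i 'grown defect|uncorrected|non-medium'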


I will write you again next week when I have done the tests above.

- Mehmet



On 2016-07-29 03:05, Christian Balzer wrote:

Hello,

On Thu, 28 Jul 2016 14:46:58 +0200 c wrote:


Hello Ceph alikes :)

i have a strange issue with one PG (0.223) combined with "deep-scrub".

Always when ceph - or I manually - run a " ceph pg deep-scrub 0.223 ",
this leads to many "slow/block requests" so that nearly all of my VMs
stop working for a while.


For the record, this ONLY happens with this PG and no others that share
the same OSDs, right?

If so then we're looking at something (HDD or FS wise) that's specific 
to

the data of this PG.

When doing the deep-scrub, monitor (atop, etc) all 3 nodes and see if a
particular OSD (HDD) stands out, as I would expect it to.

Since you already removed osd.4 with the same result, continue to cycle
through the other OSDs.
Running a fsck on the (out) OSDs might be helpful, too.

Christian


This happens only to this one PG 0.223 and in combination with
deep-scrub (!). All other Placement Groups where a deep-scrub occurs 
are

fine. The mentioned PG also works fine when a "normal scrub" occurs.

These OSDs are involved:

#> ceph pg map 0.223
osdmap e7047 pg 0.223 (0.223) -> up [4,16,28] acting [4,16,28]

*The LogFiles*

"deep-scrub" starts @ 2016-07-28 12:44:00.588542 and takes 
approximately

12 Minutes (End: 2016-07-28 12:56:31.891165)
- ceph.log: http://pastebin.com/FSY45VtM

I have done " ceph tell osd injectargs '--debug-osd = 5/5' " for the
related OSDs 4,16 and 28

LogFile - osd.4
- ceph-osd.4.log: http://slexy.org/view/s20zzAfxFH

LogFile - osd.16
- ceph-osd.16.log: http://slexy.org/view/s25H3Zvkb0

LogFile - osd.28
- ceph-osd.28.log: http://slexy.org/view/s21Ecpwd70

I have checked the disks 4,16 and 28 with smartctl and could not any
issues - also there are no odd "dmesg" messages.

*ceph -s*
 cluster 98a410bf-b823-47e4-ad17-4543afa24992
  health HEALTH_OK
  monmap e2: 3 mons at
{monitor1=172.16.0.2:6789/0,monitor3=172.16.0.4:6789/0,monitor2=172.16.0.3:6789/0}
 election epoch 38, quorum 0,1,2 
monitor1,monitor2,monitor3

  osdmap e7047: 30 osds: 30 up, 30 in
 flags sortbitwise
   pgmap v3253519: 1024 pgs, 1 pools, 2858 GB data, 692 kobjects
 8577 GB used, 96256 GB / 102 TB avail
 1024 active+clean
   client io 396 kB/s rd, 3141 kB/s wr, 55 op/s rd, 269 op/s wr

This is my Setup:

*Softwa

Re: [ceph-users] ONE pg deep-scrub blocks cluster

2016-07-28 Thread c

On 2016-07-28 15:26, Bill Sharer wrote:

I suspect the data for one or more shards on this osd's underlying
filesystem has a marginally bad sector or sectors.  A read from the
deep scrub may be causing the drive to perform repeated seeks and
reads of the sector until it gets a good read from the filesystem.
You might want to look at the SMART info on the drive or drives in the
RAID set to see what the error counts suggest about this.  You may
also be looking at a drive that's about to fail.

Bill Sharer


Hello Bill,

thank you for reading and answering my eMail :)

As I wrote, I have already checked the disks via "smartctl"

- osd.4: http://slexy.org/view/s2LR5ncr8G
- osd.16: http://slexy.org/view/s2LH6FBcYP
- osd.28: http://slexy.org/view/s21Yod9dUw

A long test " smartctl --test=long /dev/DISK " is now running on all disks - 
to be really on the safe side. This will take a while.


There is no RAID used for the OSDs!

I forgot to mention that, as a test, I had removed "osd.4" completely from 
the cluster and ran " ceph pg deep-scrub 0.223 " again, with the same result 
(nearly all of my VMs stop working for a while).


- Mehmet



On 07/28/2016 08:46 AM, c wrote:

Hello Ceph alikes :)

i have a strange issue with one PG (0.223) combined with "deep-scrub".

Always when ceph - or I manually - run a " ceph pg deep-scrub 0.223 ", 
this leads to many "slow/block requests" so that nearly all of my VMs 
stop working for a while.


This happens only to this one PG 0.223 and in combination with 
deep-scrub (!). All other Placement Groups where a deep-scrub occurs 
are fine. The mentioned PG also works fine when a "normal scrub" 
occurs.


These OSDs are involved:

#> ceph pg map 0.223
osdmap e7047 pg 0.223 (0.223) -> up [4,16,28] acting [4,16,28]

*The LogFiles*

"deep-scrub" starts @ 2016-07-28 12:44:00.588542 and takes 
approximately 12 Minutes (End: 2016-07-28 12:56:31.891165)

- ceph.log: http://pastebin.com/FSY45VtM

I have done " ceph tell osd injectargs '--debug-osd = 5/5' " for the 
related OSDs 4,16 and 28


LogFile - osd.4
- ceph-osd.4.log: http://slexy.org/view/s20zzAfxFH

LogFile - osd.16
- ceph-osd.16.log: http://slexy.org/view/s25H3Zvkb0

LogFile - osd.28
- ceph-osd.28.log: http://slexy.org/view/s21Ecpwd70

I have checked the disks 4,16 and 28 with smartctl and could not any 
issues - also there are no odd "dmesg" messages.


*ceph -s*
cluster 98a410bf-b823-47e4-ad17-4543afa24992
 health HEALTH_OK
 monmap e2: 3 mons at 
{monitor1=172.16.0.2:6789/0,monitor3=172.16.0.4:6789/0,monitor2=172.16.0.3:6789/0}

election epoch 38, quorum 0,1,2 monitor1,monitor2,monitor3
 osdmap e7047: 30 osds: 30 up, 30 in
flags sortbitwise
  pgmap v3253519: 1024 pgs, 1 pools, 2858 GB data, 692 kobjects
8577 GB used, 96256 GB / 102 TB avail
1024 active+clean
  client io 396 kB/s rd, 3141 kB/s wr, 55 op/s rd, 269 op/s wr

This is my Setup:

*Software/OS*

- Jewel
#> ceph tell osd.* version | grep version | uniq
"version": "ceph version 10.2.2 
(45107e21c568dd033c2f0a3107dec8f0b0e58374)"

#> ceph tell mon.* version
[...] ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)

- Ubuntu 16.04 LTS on all OSD and MON Server
#> uname -a
Linux galawyn 4.4.0-31-generic #50-Ubuntu SMP Wed Jul 13 00:07:12 UTC 
2016 x86_64 x86_64 x86_64 GNU/Linux


*Server*

3x OSD Server, each with
- 2x Intel(R) Xeon(R) CPU E5-2603 v3 @ 1.60GHz ==> 12 Cores, no 
Hyper-Threading

- 64GB RAM
- 10x 4TB HGST 7K4000 SAS2 (6GB/s) Disks as OSDs
- 1x INTEL SSDPEDMD400G4 (Intel DC P3700 NVMe) as Journaling Device 
for 10-12 Disks

- 1x Samsung SSD 840/850 Pro only for the OS

3x MON Server
- Two of them with 1x Intel(R) Xeon(R) CPU E3-1265L V2 @ 2.50GHz (4 
Cores, 8 Threads)
- The third one has 2x Intel(R) Xeon(R) CPU L5430  @ 2.66GHz ==> 8 
Cores, no Hyper-Threading

- 32 GB RAM
- 1x Raid 10 (4 Disks)

*Network*
- Each Server and Client has an active connection @ 1x 10GB; A second 
connection is also connected via 10GB but provides only a Backup 
connection when the active Switch fails - no LACP possible.

- We do not use Jumbo Frames yet..
- Public and Cluster-Network related Ceph traffic is going through 
this one active 10GB Interface on each Server.


Any ideas what is going on?
Can I provide more input to find a solution?

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ONE pg deep-scrub blocks cluster

2016-07-28 Thread c

Hello Ceph alikes :)

i have a strange issue with one PG (0.223) combined with "deep-scrub".

Whenever Ceph - or I manually - run " ceph pg deep-scrub 0.223 ", 
this leads to many "slow/blocked requests", so that nearly all of my VMs 
stop working for a while.


This happens only to this one PG 0.223 and in combination with 
deep-scrub (!). All other Placement Groups where a deep-scrub occurs are 
fine. The mentioned PG also works fine when a "normal scrub" occurs.


These OSDs are involved:

#> ceph pg map 0.223
osdmap e7047 pg 0.223 (0.223) -> up [4,16,28] acting [4,16,28]

*The LogFiles*

"deep-scrub" starts @ 2016-07-28 12:44:00.588542 and takes approximately 
12 Minutes (End: 2016-07-28 12:56:31.891165)

- ceph.log: http://pastebin.com/FSY45VtM

I have done " ceph tell osd injectargs '--debug-osd = 5/5' " for the 
related OSDs 4,16 and 28


LogFile - osd.4
- ceph-osd.4.log: http://slexy.org/view/s20zzAfxFH

LogFile - osd.16
- ceph-osd.16.log: http://slexy.org/view/s25H3Zvkb0

LogFile - osd.28
- ceph-osd.28.log: http://slexy.org/view/s21Ecpwd70

I have checked disks 4, 16 and 28 with smartctl and could not find any 
issues - there are also no odd "dmesg" messages.


*ceph -s*
cluster 98a410bf-b823-47e4-ad17-4543afa24992
 health HEALTH_OK
 monmap e2: 3 mons at 
{monitor1=172.16.0.2:6789/0,monitor3=172.16.0.4:6789/0,monitor2=172.16.0.3:6789/0}

election epoch 38, quorum 0,1,2 monitor1,monitor2,monitor3
 osdmap e7047: 30 osds: 30 up, 30 in
flags sortbitwise
  pgmap v3253519: 1024 pgs, 1 pools, 2858 GB data, 692 kobjects
8577 GB used, 96256 GB / 102 TB avail
1024 active+clean
  client io 396 kB/s rd, 3141 kB/s wr, 55 op/s rd, 269 op/s wr

This is my Setup:

*Software/OS*

- Jewel
#> ceph tell osd.* version | grep version | uniq
"version": "ceph version 10.2.2 
(45107e21c568dd033c2f0a3107dec8f0b0e58374)"

#> ceph tell mon.* version
[...] ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)

- Ubuntu 16.04 LTS on all OSD and MON Server
#> uname -a
Linux galawyn 4.4.0-31-generic #50-Ubuntu SMP Wed Jul 13 00:07:12 UTC 
2016 x86_64 x86_64 x86_64 GNU/Linux


*Server*

3x OSD Server, each with
- 2x Intel(R) Xeon(R) CPU E5-2603 v3 @ 1.60GHz ==> 12 Cores, no 
Hyper-Threading

- 64GB RAM
- 10x 4TB HGST 7K4000 SAS2 (6GB/s) Disks as OSDs
- 1x INTEL SSDPEDMD400G4 (Intel DC P3700 NVMe) as Journaling Device for 
10-12 Disks

- 1x Samsung SSD 840/850 Pro only for the OS

3x MON Server
- Two of them with 1x Intel(R) Xeon(R) CPU E3-1265L V2 @ 2.50GHz (4 
Cores, 8 Threads)
- The third one has 2x Intel(R) Xeon(R) CPU L5430  @ 2.66GHz ==> 8 
Cores, no Hyper-Threading

- 32 GB RAM
- 1x Raid 10 (4 Disks)

*Network*
- Each Server and Client has an active connection @ 1x 10GB; A second 
connection is also connected via 10GB but provides only a Backup 
connection when the active Switch fails - no LACP possible.

- We do not use Jumbo Frames yet..
- Public and Cluster-Network related Ceph traffic is going through this 
one active 10GB Interface on each Server.


Any ideas what is going on?
Can I provide more input to find a solution?
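
For reference, the kind of extra input that usually helps in such threads can 
be captured while the deep-scrub runs (a sketch, assuming the default admin 
socket setup; the "ceph daemon" commands have to run on the host carrying the 
OSD):

$ ceph health detail
$ ceph daemon osd.4 dump_ops_in_flight    # ops currently in flight on this OSD
$ ceph daemon osd.4 dump_historic_ops     # recent slowest ops with per-step timings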

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Replacing Initial-Mon

2016-05-25 Thread c...@dolphin-it.de


Hi!

Our monitors need a hardware replacement and we would like to reinstall them.
Can I shut them down one by one, add new hardware with the same IPv4 and IPv6 
addresses, and redeploy them from our admin machine?

Is there anything I missed? Something that I need to pay attention to because 
they are the initial three mons?

Thanks!
Kevin
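
For illustration, the per-node cycle with ceph-deploy is often sketched 
roughly like this (an assumption about the exact subcommands of your 
ceph-deploy version - please check its --help first; "mon1" is a placeholder 
hostname):

$ ceph quorum_status            # confirm the two remaining mons keep quorum
$ ceph-deploy mon destroy mon1  # retire the old monitor
# ...reinstall the node with the same hostname, IPv4 and IPv6 addresses...
$ ceph-deploy install mon1
$ ceph-deploy mon add mon1      # recreate the mon and let it rejoin the quorum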


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

